docs/dev/AudioInputDebug.md
Mumble does quite a bit of signal processing on the raw microphone input, so if something breaks it may not be immediately apparent where it breaks.
For this reason, the --dump-input-streams option was added to help tap into various parts of the DSP chain and find where the issue is. Consider
it the digital equivalent of probing the signal path of a piece of analog audio gear with an oscilloscope.
As the option was introduced to debug the echo canceller, the default tap points are at the input and output of that algorithm, but if you are going
to debug some C++ code, you should not have problems moving a write() to an ofstream here and there should you need to, right?
--dump-input-streams
You'll need to run Mumble from the command line, and the directory from where you run it will be where the dumped files will be written.
$ ./mumble --dump-input-streams
Then log into a server as usual, and start using Mumble. It's usually good enough to just run it for 10 to 20 seconds and then quit. Unless your bug happens only after some time or occurs at random, there's no need to accumulate gigabytes of dumped audio. It's also best to make reproducible tests, like playing the same video or speaking the same phrase, so that results can be compared.
After closing Mumble, there should be 3 new files in the directory you launched it from:
- raw_microphone_dump
- speaker_dump
- processed_microphone_dump

Please note that if you run Mumble again, those files will be overwritten. They are also overwritten whenever the AudioInput class is reinstantiated, such as when going through the audio wizard. If you find it difficult to get the data you want, for example because closing the audio wizard clears your files, terminate Mumble with Ctrl-C at any moment and the files won't be erased.
These files contain the raw PCM streams that have been sampled. No header, no file format; nothing. Just data. This makes the dumping code as simple as possible, and you also don't have to change the header every time you tap a point with a different sample rate or encoding, as there's no header.
To open the raw files, you can use Audacity. Select File > Import > Raw Data.
Since there's no metadata, Audacity will ask you what's in those files:
- Encoding: Signed 16 bit PCM in the default tap point (i.e. you haven't modified write()). Mumble's signal path is partly 16 bit and partly float, so remember to select 32 bit float if you move the tap points to some float part of the Mumble audio path.
- Byte order: Little-endian if you're on an x86 CPU, which you most likely are.
- Channels: 1 for the microphone signal path, but may be more for the speaker readback if you use multichannel echo cancellation.
- Sample rate: 48000 for the default tap point, as Mumble's audio chain resamples everything to 48 kHz regardless of what your audio card is configured to. Change accordingly when tapping before the resampler.

In Audacity you can open multiple tracks and mute them individually, so it's usually a good idea to open all three tracks to compare.
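If typing the import settings above into Audacity for every dump gets tedious, the same parameters can be baked into a WAV header once. The following is a minimal sketch using only Python's standard `wave` module; the function name `raw_to_wav` and its defaults are assumptions made for illustration and are not part of Mumble. It only covers integer PCM (the default 16 bit tap points), since `wave` does not write float samples.

```python
import wave


def raw_to_wav(raw_path, wav_path, channels=1, sample_rate=48000, sample_width=2):
    """Wrap a headerless little-endian PCM dump in a WAV header.

    Defaults match the default tap points described above:
    mono, 48000 Hz, signed 16 bit (2 bytes per sample).
    Returns the number of frames written.
    """
    with open(raw_path, "rb") as src:
        pcm = src.read()

    # A dump cut short by Ctrl-C may end in a partial frame; drop it.
    frame_size = channels * sample_width
    pcm = pcm[: len(pcm) - (len(pcm) % frame_size)]

    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(sample_rate)
        dst.writeframes(pcm)

    return len(pcm) // frame_size
```

For example, `raw_to_wav("raw_microphone_dump", "raw_microphone_dump.wav")` produces a file that opens directly in Audacity. Pass a different `channels` value for a multichannel speaker_dump, and a different `sample_rate` if you tapped before the resampler.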
The audio dumps have an additional property that is fundamental for debugging the echo canceller: they're synchronous. If you open them all in Audacity, you'll be able to see not only what gets passed to the echo canceller, but also the relative timing between the signals.
This is fundamental for an echo canceller, which can break simply because the microphone data arrives before the speaker data (how can the echo canceller predict an echo from the future?), or because the speaker data is so far ahead that the delay exceeds its limited filter length.
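Besides eyeballing the tracks in Audacity, the relative timing can also be estimated numerically. Below is a rough sketch, not something shipped with Mumble: it loads two mono 16 bit dumps and brute-forces a cross-correlation over a small range of lags. The helper names `read_s16le` and `estimate_lag` are made up for this example, and it assumes both dumps are mono (a multichannel speaker_dump would need to be deinterleaved first). It is plain Python and therefore slow, so feed it a short excerpt, e.g. a few thousand samples.

```python
import sys
from array import array


def read_s16le(path, max_samples=None):
    """Load a mono signed 16 bit little-endian dump as a list of ints."""
    with open(path, "rb") as f:
        data = f.read() if max_samples is None else f.read(2 * max_samples)
    samples = array("h")
    # Ignore a trailing odd byte left by an interrupted dump.
    samples.frombytes(data[: len(data) - (len(data) % 2)])
    if sys.byteorder == "big":
        samples.byteswap()
    return list(samples)


def estimate_lag(speaker, microphone, max_lag):
    """Return the lag, in samples, at which microphone best matches speaker.

    Positive: the echo in the microphone trails the speaker (healthy case).
    Negative: the microphone leads, i.e. the echo comes "from the future".
    """
    n = min(len(speaker), len(microphone))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair speaker[i] with microphone[i + lag] where both indices exist.
        start = max(0, -lag)
        stop = min(n, n - lag)
        score = sum(speaker[i] * microphone[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

At 48000 samples per second, a result of 480 samples corresponds to 10 ms. A negative result on a recording with audible echo would point at exactly the misalignment described above.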
--print-echocancel-queue option
Now that I've mentioned the requirement for the echo canceller to have well aligned inputs, it's a good moment to introduce the
--print-echocancel-queue option. When running Mumble with this option, the current state of the queue in the Resynchronizer class, which is used to align
the microphone and speaker readback streams, is printed on the command line. Moreover, if packets are dropped (which is necessary to keep the signals
aligned if the OS/pulseaudio/audio card is playing tricks on us), those will be printed as well.
Documentation on the Resynchronizer class lives in a comment in the AudioInput.h file, but it doesn't hurt to repeat it here, also because the
state machine design is an image and so doesn't fit in a C++ comment.
According to https://www.speex.org/docs/manual/speex-manual/node7.html "It is important that, at any time, any echo that is present in the input has already been sent to the echo canceller as echo_frame." Thus, we artificially introduce a small lag in the microphone by means of a queue, so as to be sure the speaker data always precedes the microphone.
There are conflicting requirements for the queue:
The current implementation uses a 5-element queue, with a control state machine that introduces packet drops to keep the fill level at 2 elements or more (plus or minus one) and below 4 elements. With a 10 ms chunk, this queue should introduce a lag of about 20 ms to the voice (roughly two queued chunks of 10 ms each).
Here m means a microphone chunk was received, s means a speaker chunk was received, and the number in the state is the queue fill level. The design tries to keep the limit cycle of the queue add/remove pattern between 1 and 4 elements. This prevents the queue from operating in a limit cycle between 0 and 1 elements (queue too empty, so the speaker data risks arriving after the microphone data) or in a limit cycle between 4 and 5 elements (queue too full, so part of the precious filter length that should cancel real echo is wasted on accumulated delay).
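To get a feel for how packet drops can steer the fill level, here is a deliberately simplified toy model. It is NOT the Resynchronizer's actual state machine (see AudioInput.h for that): the thresholds `LOW_WATER` and `HIGH_WATER`, the class `ToyResynchronizer` and the function `simulate` are all hypothetical and exist only for this illustration. It simply drops a speaker chunk when the queue is nearly empty and drops a microphone chunk when the queue is nearly full.

```python
from collections import deque

CHUNK_MS = 10   # chunk duration quoted in the text
LOW_WATER = 1   # hypothetical: at or below this fill level, speaker chunks are dropped
HIGH_WATER = 4  # hypothetical: at or above this fill level, microphone chunks are dropped


class ToyResynchronizer:
    """Simplified illustration only; not Mumble's actual Resynchronizer."""

    def __init__(self):
        self.queue = deque()
        self.dropped = 0

    def add_mic(self, chunk):
        """Queue a microphone chunk, or drop it if the queue is too full."""
        if len(self.queue) >= HIGH_WATER:
            self.dropped += 1
            return
        self.queue.append(chunk)

    def add_speaker(self, chunk):
        """Pair a speaker chunk with the oldest queued microphone chunk.

        Returns (mic_chunk, speaker_chunk), or None when the speaker chunk
        is dropped so that the queue gets a chance to refill.
        """
        if len(self.queue) <= LOW_WATER:
            self.dropped += 1
            return None
        return self.queue.popleft(), chunk


def simulate(events):
    """Feed a string of 'm'/'s' events; return (fill levels, drop count)."""
    resync = ToyResynchronizer()
    levels = []
    for index, event in enumerate(events):
        if event == "m":
            resync.add_mic(index)
        else:
            resync.add_speaker(index)
        levels.append(len(resync.queue))
    return levels, resync.dropped
```

Feeding it a steady `"msmsms"` pattern drops the very first speaker chunk and then settles into a fill level alternating between 1 and 2, so each microphone chunk waits behind one other chunk before it is paired. A burst of microphone chunks such as `"mmmmmm"` fills the queue to 4 and drops the rest. Multiplying the fill level by `CHUNK_MS` gives a rough idea of the lag added to the voice.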
To avoid regressions being introduced in the echo cancellation feature, it is beneficial to have a controlled test that can be easily reproduced to test whether the echo canceller works.
You will need:
Here's the step by step guide:
--dump-input-streams option
speaker_dump when testing multichannel echo cancellation

Example of an echo canceller bug: the speaker data lags behind the microphone data. As a result, only the note is cancelled, but the voice is not.

Example of a working echo canceller.