doc/help/Drivers-Audio.md
The Audio driver lets Serial Studio treat any OS-level audio input device as an analog data source: microphones, line-in, audio interfaces, USB DACs, virtual loopback devices, anything the OS can record from.
It can also feed an analog signal into Serial Studio when no dedicated driver fits. A vibration sensor through a microphone preamp, or a current shunt through a line input, both work the same as a real microphone.
Digital audio is the discrete-time, discrete-amplitude representation of an analog sound waveform. Three numbers describe it:
The encoding scheme is almost always PCM (Pulse-Code Modulation): each sample is the amplitude at that instant, encoded as an integer or float. No compression, no transformations, no per-sample headers.
```mermaid
flowchart TB
    A[Continuous analog signal] --> B[ADC sampling at 48 kHz]
    B --> C["Sequence of integer samples<br/>(16-bit, signed)"]
    C --> D["PCM stream:<br/>...0x1234, 0xABCD, 0x0FF0..."]
```
The Nyquist-Shannon sampling theorem states that, to faithfully reconstruct a signal containing frequencies up to f, the sample rate must be greater than 2f. A 44.1 kHz sample rate therefore captures frequencies up to just under 22.05 kHz, which exceeds the upper limit of human hearing (about 20 kHz). That margin is one reason CD audio settled on 44.1 kHz.
Higher sample rates (96 and 192 kHz) are common in studio work, mostly to provide headroom during processing rather than to capture sound above 22 kHz. For Serial Studio's purposes:
Sampling below twice the highest signal frequency causes aliasing: high frequencies fold back into the audible band as ghost signals at the wrong pitch. Most audio hardware filters out high frequencies before sampling to prevent this. Custom hardware feeding Serial Studio should bandwidth-limit its input the same way.
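To make the fold-back concrete, the sketch below computes where a pure tone appears after sampling. It is a textbook calculation, not code from Serial Studio; the function name `alias_frequency` is made up for this illustration.

```python
def alias_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Return the apparent frequency of a real tone at f_hz after sampling.

    Hypothetical helper for illustration only; it assumes no anti-aliasing
    filter sits in front of the ADC.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    # Fold the tone into one sampling period, then mirror it into [0, fs/2].
    folded = f_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    # A 30 kHz tone captured at 48 kHz lands inside the audible band.
    print(alias_frequency(30_000, 48_000))
    # A 1 kHz tone is below Nyquist and is left where it is.
    print(alias_frequency(1_000, 48_000))
```

A tone below half the sample rate comes back unchanged; anything above it is mirrored around the Nyquist frequency, which is why the input has to be band-limited before it reaches the ADC.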
Each PCM sample's bit depth determines the smallest amplitude difference that can be represented. The signal-to-noise ratio (SNR) of a perfectly-quantised sine wave is roughly 6 dB per bit:
| Bit depth | Theoretical SNR | Use case |
|---|---|---|
| 8-bit | ~48 dB | Voice memos, low-quality streaming |
| 16-bit | ~96 dB | CD-quality, most consumer audio |
| 24-bit | ~144 dB | Studio recording, master tapes |
| 32-bit float | effectively unlimited | Mixing, processing |
Anything below the bit-depth's noise floor is lost. 16-bit is usually adequate for Serial Studio applications; 24-bit or 32-bit float adds headroom for signals that vary across many orders of magnitude.
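The "6 dB per bit" rule of thumb comes from the ratio between full scale and one quantisation step, 20·log10(2^N). For a full-scale sine wave the standard textbook result adds about 1.76 dB on top. The sketch below evaluates both; the function names are illustrative only.

```python
import math


def rule_of_thumb_snr_db(bits: int) -> float:
    """Ratio of full scale to one quantisation step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)


def ideal_sine_snr_db(bits: int) -> float:
    """Textbook SNR of a full-scale sine with ideal uniform quantisation.

    Equivalent to 6.02 * bits + 1.76 dB; real converters fall short of this.
    """
    return rule_of_thumb_snr_db(bits) + 10 * math.log10(1.5)


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(f"{bits}-bit: {rule_of_thumb_snr_db(bits):.1f} dB "
              f"(sine: {ideal_sine_snr_db(bits):.1f} dB)")
```

The table rounds the first figure down to 6 dB per bit, which is accurate enough for choosing a sample format.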
A stereo signal is two PCM streams interleaved sample by sample: L, R, L, R, L, R, .... A 4-channel interface gives 1, 2, 3, 4, 1, 2, 3, 4, .... The OS exposes each channel as a separate stream of samples that share the same sample rate and bit depth.
In Serial Studio each input channel can drive its own dataset. A 4-input audio interface with sensors on each input therefore yields four independent telemetry streams.
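The interleaved layout can be split into one sequence per channel with a strided slice. This is a minimal sketch of the idea, not the driver's own code; `deinterleave` is a hypothetical helper.

```python
from typing import List, Sequence


def deinterleave(samples: Sequence[int], channels: int) -> List[List[int]]:
    """Split an interleaved buffer (ch0, ch1, ..., ch0, ch1, ...) per channel."""
    if channels <= 0:
        raise ValueError("channel count must be positive")
    if len(samples) % channels != 0:
        raise ValueError("buffer does not contain a whole number of sample periods")
    # Channel c owns every `channels`-th sample, starting at offset c.
    return [list(samples[c::channels]) for c in range(channels)]


if __name__ == "__main__":
    stereo = [10, -10, 20, -20, 30, -30]  # L, R, L, R, L, R
    left, right = deinterleave(stereo, 2)
    print(left, right)
```

With a 4-input interface, the same call with `channels=4` yields the four sample sequences that would each feed their own dataset.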
The audio driver is built on miniaudio, a single-header cross-platform audio library. miniaudio talks directly to:
This avoids the overhead of QtMultimedia and gives the driver direct access to low-latency callback-based capture.
The audio driver is the most thread-heavy of all Serial Studio drivers:
- `Qt::PreciseTimer` at highest priority, drains the captured-buffer queue, and forwards data downstream.
- `now - (N-1) / sample_rate` so the first sample carries the correct acquisition time, not the moment the OS got around to firing the callback.

This timestamp accuracy is what keeps audio data lined up in CSV exports and session reports even when the audio backend buffer is large. See Threading and Timing Guarantees for the full timestamp-ownership rules.
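The back-dating expression `now - (N-1) / sample_rate` can be illustrated as follows. The sketch assumes that `now` is the time at which the last of the N samples in a buffer was captured and that later samples step forward by one sample period each; both are assumptions made for this illustration, and `sample_timestamps` is a hypothetical name rather than a Serial Studio function.

```python
from typing import List


def sample_timestamps(now_s: float, n: int, sample_rate_hz: float) -> List[float]:
    """Assign a timestamp to each of n samples in a buffer that ends at now_s."""
    if n <= 0 or sample_rate_hz <= 0:
        raise ValueError("n and sample_rate_hz must be positive")
    period = 1.0 / sample_rate_hz
    # Back-date the first sample so the last one lands on `now_s`.
    first = now_s - (n - 1) * period
    return [first + i * period for i in range(n)]


if __name__ == "__main__":
    stamps = sample_timestamps(now_s=10.0, n=481, sample_rate_hz=48_000)
    print(f"first={stamps[0]:.6f} s, last={stamps[-1]:.6f} s")
```

A 481-sample buffer at 48 kHz spans 10 ms, so stamping every sample with the callback time would smear the whole buffer onto a single instant.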
The driver converts each captured buffer to CSV text: one line per sample period, channels separated by commas (L,R for stereo). Under line-delimited framing (the Quick Plot default), each line becomes one frame carrying its own sample-clock timestamp, so a 48000 Hz capture produces 48000 frames per second. The frame parser sees the decoded sample values as text, which can be:
The FFT and Waterfall widgets share the same per-dataset settings (fftSamples, fftSamplingRate, fftMin, fftMax), so a single audio channel can drive both views simultaneously.
In Quick Plot mode the dashboard configures itself: each channel becomes a dataset in an Audio Input group with FFT enabled, fftSamplingRate set to the device sample rate, fftSamples sized to the power of two covering roughly 50 ms of signal (256 to 8192), and plot ranges taken from the sample format's limits. In a project, set those per-dataset values yourself.
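When setting `fftSamples` by hand in a project, a similar size can be derived from the sample rate. The sketch below assumes that "covering" means rounding up to the next power of two and that the result is clamped to the 256 to 8192 range mentioned above; the exact rounding Quick Plot uses is not documented here, and `quick_plot_fft_size` is a hypothetical name.

```python
def quick_plot_fft_size(sample_rate_hz: int, window_ms: int = 50,
                        smallest: int = 256, largest: int = 8192) -> int:
    """Smallest power of two holding `window_ms` of signal, clamped to a range.

    Assumption: round up, then clamp. The real heuristic may round differently.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    target = sample_rate_hz * window_ms / 1000  # samples in the window
    size = smallest
    while size < target and size < largest:
        size *= 2
    return size


if __name__ == "__main__":
    for rate in (8_000, 22_050, 44_100, 48_000, 192_000):
        print(rate, quick_plot_fft_size(rate))
```

Under these assumptions both 44.1 kHz and 48 kHz devices end up with 4096 samples, and very high rates are capped at 8192.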
| Setting | Controls |
|---|---|
| Input Device | Which OS audio device to capture from. |
| Sample Rate | Capture rate in Hz; only rates the device reports are offered. First-run default is 44100 Hz (22050 Hz on Windows) when the device supports it. |
| Sample Format | Unsigned 8-bit, Signed 16-bit, Signed 24-bit, Signed 32-bit, or Float 32-bit, filtered to what the device supports. |
| Channels | Mono, Stereo, or a multichannel layout (3.0 up to 7.1), depending on the device. |
| Output Device | Optional playback device, with its own Sample Format and Channels selectors. |
Selections persist across sessions and are saved with the project by stable identifiers (device name, rate in Hz, format name, channel count), so they survive index changes when devices are plugged or unplugged. None of them can change while the device is open; disconnect first.
When an Output Device is configured, the driver opens in duplex mode and data written to it plays back as audio. Each outgoing frame is a comma-separated list with one value per playback channel: integer sample values for the integer formats, -1.0 to 1.0 for Float 32-bit.
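An outgoing frame can be built from normalised levels as sketched below. The format names come from the Sample Format setting; the linear mapping from -1.0..1.0 onto each integer range, the six-decimal float formatting, and the helper name `output_frame` are assumptions made for this example. Any line terminator the project's framing requires is left out.

```python
from typing import Iterable

# Value ranges of the integer sample formats (standard PCM limits).
INTEGER_LIMITS = {
    "Unsigned 8-bit": (0, 255),
    "Signed 16-bit": (-32768, 32767),
    "Signed 24-bit": (-8388608, 8388607),
    "Signed 32-bit": (-2147483648, 2147483647),
}


def output_frame(levels: Iterable[float], sample_format: str) -> str:
    """Format one value per playback channel as a comma-separated frame."""
    fields = []
    for level in levels:
        level = max(-1.0, min(1.0, level))  # keep within the normalised range
        if sample_format == "Float 32-bit":
            fields.append(f"{level:.6f}")
        else:
            low, high = INTEGER_LIMITS[sample_format]
            # Assumed mapping: -1.0 -> low, +1.0 -> high, linear in between.
            fields.append(str(round(low + (level + 1.0) / 2.0 * (high - low))))
    return ",".join(fields)


if __name__ == "__main__":
    print(output_frame([0.0, 1.0], "Signed 16-bit"))
    print(output_frame([0.5, -0.5], "Float 32-bit"))
```

For stereo playback the list holds two levels, left then right, matching the channel order used on the input side.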
The same settings are scriptable through the io.audio.* commands of the JSON-RPC API: setInputDevice and setOutputDevice (deviceIndex), setSampleRate (rateIndex), setInputSampleFormat and setOutputSampleFormat (formatIndex), and setInputChannelConfig and setOutputChannelConfig (channelIndex), plus the read-only listInputDevices, listOutputDevices, listSampleRates, listInputFormats, listOutputFormats, and getConfig. Every setter takes a zero-based index into the option list, in the same order as the Setup Panel. When the in-app AI issues these commands, they sit behind the Allow device control toggle.
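Because every setter takes a zero-based index, a script usually needs to translate a wanted value into its position in the matching list first. The sketch below shows only that lookup. The option list is made-up sample data standing in for whatever `listSampleRates` returns, and the `command`/`params` dictionary is a placeholder shape, not the actual wire format of the API.

```python
from typing import Any, Dict, Sequence


def setter_call(command: str, param: str,
                options: Sequence[Any], wanted: Any) -> Dict[str, Any]:
    """Pair an io.audio.* setter with the zero-based index of `wanted`.

    The returned dictionary is a hypothetical representation; wrap it in
    whatever request envelope the API transport expects.
    """
    try:
        index = list(options).index(wanted)
    except ValueError:
        raise ValueError(f"{wanted!r} is not offered by the device") from None
    return {"command": f"io.audio.{command}", "params": {param: index}}


if __name__ == "__main__":
    # Hypothetical rates, in the order the Setup Panel would list them.
    rates = [8_000, 22_050, 44_100, 48_000, 96_000]
    print(setter_call("setSampleRate", "rateIndex", rates, 48_000))
```

Looking the index up at run time, rather than hard-coding it, keeps a script working when a different device offers a different list.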
For step-by-step setup, see the Protocol Setup Guides, Audio Input section.
- `arecord -l` (ALSA) or PulseAudio's `pavucontrol` to confirm the device exists and is not muted.
- `fftSamplingRate` on the dataset to match the audio sample rate. If the sample rate is 48 kHz and `fftSamplingRate` is left at its default of 100, the frequency axis is scaled by 480x. Quick Plot mode sets it automatically; projects do not.
- `fftSamples` or disable the waterfall in the per-dataset settings.
- `io.audio.*` command set for scripted control.
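The 480x figure for a mismatched `fftSamplingRate` follows from the standard relation between an FFT bin and its frequency, `k * rate / size`. The sketch below applies that relation; it assumes the widget labels its axis this way, and `bin_frequency_hz` is a hypothetical helper.

```python
def bin_frequency_hz(bin_index: int, fft_size: int, sampling_rate_hz: float) -> float:
    """Centre frequency of FFT bin `bin_index` for a given size and rate."""
    if fft_size <= 0:
        raise ValueError("fft_size must be positive")
    return bin_index * sampling_rate_hz / fft_size


if __name__ == "__main__":
    fft_size = 4096
    peak_bin = 85  # arbitrary bin chosen for the example
    actual = bin_frequency_hz(peak_bin, fft_size, 48_000)  # rate the device used
    labelled = bin_frequency_hz(peak_bin, fft_size, 100)   # default fftSamplingRate
    print(f"actual {actual:.1f} Hz, labelled {labelled:.3f} Hz, "
          f"ratio {actual / labelled:.0f}x")
```

The ratio is independent of the bin and of `fftSamples`; it depends only on the two rates, so the spectrum keeps its shape but the axis labels are wrong.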