src/fx/audio/README.md
Real-time audio processing modules for FastLED, optimized for embedded platforms (ESP32).
Current Location: fx/audio/
This directory contains high-level audio effects and detectors:
- audio_processor.h/cpp - High-level facade for easy orchestration of all detectors
- detectors/ - All audio detector implementations (beat, vocal, percussion, chord, key, mood, etc.)
- advanced/ - Advanced signal processing modules (sound-to-midi, etc.)

For low-level infrastructure, see:

- fl/audio/ - AudioContext (shared FFT caching), AudioDetector (base class), core primitives
- fl/audio/README.md - Infrastructure documentation

This directory contains audio processing modules for creating music-reactive LED effects:

- High-level orchestration (audio_processor.h)
- Beat detection (detectors/beat.h)
- Sound-to-MIDI conversion (sound_to_midi.h)
- Specialized detectors (detectors/)

All modules are designed for real-time operation on resource-constrained platforms with minimal latency and CPU overhead.
The easiest way to use audio detection is through the AudioProcessor facade:
#include "fx/audio/audio_processor.h"
fl::AudioProcessor audio;
void setup() {
// Register callbacks for events you care about
audio.onBeat([]() {
// Flash on beat
fill_solid(leds, NUM_LEDS, CRGB::White);
});
audio.onVocal([](bool active) {
// Change color when vocals detected
if (active) {
fill_solid(leds, NUM_LEDS, CRGB::Purple);
}
});
audio.onKick([]() {
// React to kick drum
leds[0] = CRGB::Red;
});
}
void loop() {
while (fl::AudioSample sample = audioInput.next()) {
audio.update(sample); // Updates all registered detectors
}
FastLED.show();
}
See fx/audio/audio_processor.h for the complete API with 40+ event callbacks.
The FastLED Audio System provides 17 fully-implemented detectors organized by tier:

Tier 1 (4 detectors):

| Detector | File | Purpose |
|---|---|---|
| BeatDetector | detectors/beat.h | Rhythmic pulse detection with tempo tracking |
| FrequencyBands | detectors/frequency_bands.h | Bass/mid/treble frequency abstraction |
| EnergyAnalyzer | detectors/energy_analyzer.h | Overall loudness and RMS tracking |
| TempoAnalyzer | detectors/tempo_analyzer.h | BPM tracking with confidence scoring |
Tier 2 (6 detectors):

| Detector | File | Purpose |
|---|---|---|
| TransientDetector | detectors/transient.h | Attack detection and transient analysis |
| NoteDetector | detectors/note.h | Musical note detection (MIDI-compatible) |
| DownbeatDetector | detectors/downbeat.h | Measure-level timing and meter detection |
| DynamicsAnalyzer | detectors/dynamics_analyzer.h | Loudness trends (crescendo/diminuendo) |
| PitchDetector | detectors/pitch.h | Pitch tracking with confidence |
| SilenceDetector | detectors/silence.h | Auto-standby and silence detection |
Tier 3 (7 detectors):

| Detector | File | Purpose |
|---|---|---|
| VocalDetector | detectors/vocal.h | Human voice detection using spectral analysis |
| PercussionDetector | detectors/percussion.h | Drum-specific detection (kick/snare/hihat) |
| ChordDetector | detectors/chord.h | Real-time chord recognition |
| KeyDetector | detectors/key.h | Musical key detection (major/minor/modes) |
| MoodAnalyzer | detectors/mood_analyzer.h | Emotional content analysis (circumplex model) |
| BuildupDetector | detectors/buildup.h | EDM buildup tension tracking |
| DropDetector | detectors/drop.h | EDM drop impact detection |
Total: 17 detectors + AudioProcessor facade = 18 high-level components
All detectors use the AudioContext pattern for optimal FFT sharing. The examples below exercise common detector combinations through the facade:
#include "fx/audio/audio_processor.h"
fl::AudioProcessor audio;
void setup() {
audio.onVocal([](bool active) {
if (active) {
// Vocals detected - change color to purple
fill_solid(leds, NUM_LEDS, CRGB::Purple);
} else {
// No vocals - return to normal
fill_solid(leds, NUM_LEDS, CRGB::Blue);
}
});
audio.onVocalStart([]() {
// Vocal segment started
Serial.println("Vocals started!");
});
audio.onVocalEnd([]() {
// Vocal segment ended
Serial.println("Vocals ended!");
});
}
void loop() {
while (fl::AudioSample sample = audioInput.next()) {
audio.update(sample);
}
FastLED.show();
}
#include "fx/audio/audio_processor.h"
fl::AudioProcessor audio;
void setup() {
audio.onKick([]() {
// Kick drum hit - flash red in bass section
fill_solid(leds, NUM_LEDS/3, CRGB::Red);
});
audio.onSnare([]() {
// Snare hit - flash yellow in mid section
fill_solid(leds + NUM_LEDS/3, NUM_LEDS/3, CRGB::Yellow);
});
audio.onHiHat([]() {
// Hi-hat hit - sparkle in treble section
leds[random16(NUM_LEDS)] = CRGB::White;
});
audio.onPercussion([](const char* type) {
Serial.print("Percussion: ");
Serial.println(type); // "kick", "snare", or "hihat"
});
}
#include "fx/audio/audio_processor.h"
fl::AudioProcessor audio;
void setup() {
audio.onChord([](const char* chord_name) {
Serial.print("Chord detected: ");
Serial.println(chord_name); // e.g., "Cmaj", "Gmin", "D7"
// Map chord to color
CRGB color = getColorForChord(chord_name);
fill_solid(leds, NUM_LEDS, color);
});
}
CRGB getColorForChord(const char* chord) {
// Map chords to colors based on circle of fifths
if (strstr(chord, "C")) return CRGB::Red;
if (strstr(chord, "G")) return CRGB::Orange;
if (strstr(chord, "D")) return CRGB::Yellow;
if (strstr(chord, "A")) return CRGB::Green;
if (strstr(chord, "E")) return CRGB::Cyan;
if (strstr(chord, "B")) return CRGB::Blue;
if (strstr(chord, "F")) return CRGB::Purple;
return CRGB::White;
}
#include "fx/audio/audio_processor.h"
fl::AudioProcessor audio;
void setup() {
audio.onMood([](float valence, float arousal) {
// valence: -1 (negative) to +1 (positive)
// arousal: -1 (calm) to +1 (energetic)
// Map mood to color palette
if (valence > 0 && arousal > 0) {
// Happy/Excited - bright warm colors
fill_solid(leds, NUM_LEDS, CRGB::Orange);
} else if (valence > 0 && arousal < 0) {
// Content/Relaxed - soft cool colors
fill_solid(leds, NUM_LEDS, CRGB::CornflowerBlue);
} else if (valence < 0 && arousal > 0) {
// Angry/Tense - intense reds
fill_solid(leds, NUM_LEDS, CRGB::DarkRed);
} else {
// Sad/Depressed - cool dark colors
fill_solid(leds, NUM_LEDS, CRGB::DarkBlue);
}
});
}
#include "fx/audio/audio_processor.h"
fl::AudioProcessor audio;
uint8_t buildupIntensity = 0;
void setup() {
audio.onBuildup([](float tension) {
// tension: 0.0 (no buildup) to 1.0 (peak tension)
buildupIntensity = tension * 255;
// Increase brightness and color saturation during buildup
FastLED.setBrightness(buildupIntensity);
fill_solid(leds, NUM_LEDS, CHSV(0, 255, buildupIntensity));
});
audio.onDrop([]() {
// Drop detected - massive flash!
fill_solid(leds, NUM_LEDS, CRGB::White);
FastLED.setBrightness(255);
FastLED.show();
delay(50);
});
}
#include "fx/audio/audio_processor.h"
fl::AudioProcessor audio;
void setup() {
// Combine multiple detectors for rich interactions
audio.onBeat([]() {
// Flash on beat
fill_solid(leds, NUM_LEDS, CRGB::White);
});
audio.onVocal([](bool active) {
if (active) {
// Vocals present - use warm colors
FastLED.setTemperature(Tungsten100W);
} else {
// No vocals - use cool colors
FastLED.setTemperature(ClearBlueSky);
}
});
audio.onKick([]() {
// Kick drum - pulse bass LEDs
for (int i = 0; i < NUM_LEDS/4; i++) {
leds[i] = CRGB::Red;
}
});
audio.onChord([](const char* chord) {
// Chord change - shift hue
static uint8_t hue = 0;
hue += 32;
fill_solid(leds, NUM_LEDS, CHSV(hue, 255, 200));
});
audio.onMood([](float valence, float arousal) {
// Mood influences overall palette
uint8_t sat = (arousal + 1) * 127; // More energetic = more saturated
uint8_t bri = (valence + 1) * 127; // More positive = brighter
fadeToBlackBy(leds, NUM_LEDS, 255 - bri);
// (sat is illustrative here; it could drive palette saturation)
});
}
Files: detectors/beat.h, detectors/beat.cpp
Real-time beat detection using the AudioContext pattern for shared FFT computation.
The Beat Detector provides rhythmic pulse detection and tempo tracking using spectral flux analysis of the shared FFT from AudioContext. It detects both individual onsets and maintains a tempo estimate with confidence scoring.
The easiest way to use beat detection is through the AudioProcessor facade:
#include "fx/audio/audio_processor.h"
fl::AudioProcessor audio;
void setup() {
// Register callbacks for beat events
audio.onBeat([]() {
// Flash on beat
fill_solid(leds, NUM_LEDS, CRGB::White);
});
audio.onTempoChange([](float bpm, float confidence) {
Serial.print("Tempo: ");
Serial.print(bpm);
Serial.print(" BPM");
});
audio.onOnset([](float strength) {
Serial.print("Onset strength: ");
Serial.println(strength);
});
}
void loop() {
while (fl::AudioSample sample = audioInput.next()) {
audio.update(sample); // Updates beat detector automatically
}
FastLED.show();
}
See examples/BeatDetection/BeatDetection.ino for a complete example with LED visualization and BPM-based color coding.
Files: advanced/sound_to_midi.h, advanced/sound_to_midi.cpp
Sound to MIDI converts audio input into MIDI Note On/Off events. It supports both monophonic (single note) and polyphonic (multiple simultaneous notes) pitch detection, with automatic velocity calculation and anti-jitter filtering.
Uses a YIN/MPM-like autocorrelation algorithm for pitch detection:
- Sliding window (hop_size < frame_size): incoming samples accumulate in a ring buffer
- Every hop_size samples, the last frame_size samples are extracted with Hann windowing
- Note On fires after note_hold_frames consecutive confident detections
- Note Off fires after silence_frames_off consecutive silent frames

Uses FFT-based spectral peak detection:

- Sliding window (hop_size < frame_size): incoming samples accumulate in a ring buffer
- Every hop_size samples, the last frame_size samples are extracted with windowing

#include "fx/audio/advanced/sound_to_midi.h"
// Create configuration with sliding window and auto-tuning
SoundToMIDI cfg;
cfg.sample_rate_hz = 16000;
cfg.frame_size = 512;
cfg.hop_size = 256; // 50% overlap for improved onset detection
cfg.confidence_threshold = 0.82f;
// Optional: Enable K-of-M filtering to reduce spurious events
cfg.enable_k_of_m = true;
cfg.k_of_m_onset = 2; // Require 2 detections in last 3 frames
cfg.k_of_m_window = 3;
// Optional: Enable auto-tuning for adaptive thresholds
cfg.auto_tune_enable = true; // Automatically adapts to noise floor
// Create monophonic engine
SoundToMIDIMono engine(cfg);
// Set callbacks
engine.onNoteOn = [](uint8_t note, uint8_t velocity) {
Serial.print("Note ON: ");
Serial.print(note);
Serial.print(" vel=");
Serial.println(velocity);
};
engine.onNoteOff = [](uint8_t note) {
Serial.print("Note OFF: ");
Serial.println(note);
};
// In your audio processing loop (can feed any chunk size):
float audioBuffer[512];
// ... fill audioBuffer with audio samples ...
engine.processFrame(audioBuffer, 512);
#include "fx/audio/sound_to_midi.h"
// Create configuration with sliding window and advanced features
SoundToMIDI cfg;
cfg.sample_rate_hz = 44100;
cfg.frame_size = 2048;
cfg.hop_size = 512; // 75% overlap for dense chord analysis
cfg.fmin_hz = 80.0f; // Lower limit for guitar/piano
cfg.fmax_hz = 2000.0f;
// Polyphonic-specific settings
cfg.window_type = WindowType::Hann;
cfg.harmonic_filter_enable = true;
cfg.parabolic_interp = true;
cfg.pcp_enable = true; // Pitch class profile for stability
// Optional: Enable K-of-M filtering (per-note persistence)
cfg.enable_k_of_m = true;
cfg.k_of_m_onset = 3; // Require 3 detections in last 4 frames
cfg.k_of_m_window = 4;
// Optional: Enable auto-tuning for dynamic environments
cfg.auto_tune_enable = true;
cfg.auto_tune_peak_margin_db = 6.0f; // Adaptive peak threshold
// Create polyphonic engine
SoundToMIDIPoly engine(cfg);
// Optional: Monitor auto-tuning adjustments
engine.setAutoTuneCallback([](const char* param, float old_val, float new_val) {
Serial.print("Auto-tune: ");
Serial.print(param);
Serial.print(" ");
Serial.print(old_val);
Serial.print(" → ");
Serial.println(new_val);
});
// Set callbacks (same as monophonic)
engine.onNoteOn = [](uint8_t note, uint8_t velocity) {
// Multiple notes can trigger simultaneously
Serial.print("Note ON: ");
Serial.println(note);
};
engine.onNoteOff = [](uint8_t note) {
Serial.print("Note OFF: ");
Serial.println(note);
};
// Process audio (can feed any chunk size)
engine.processFrame(audioBuffer, 2048);
| Parameter | Default | Description |
|---|---|---|
| sample_rate_hz | 16000 | Input sample rate (16000-48000 Hz) |
| frame_size | 512 | Analysis window size (512 for 16kHz, 1024+ for 44kHz) |
| hop_size | 512 | Step size between frames (set < frame_size for overlap, e.g., 256 = 50% overlap) |
| Parameter | Default | Description |
|---|---|---|
| fmin_hz | 40.0 | Minimum detectable frequency (E1 ≈ 41.2 Hz) |
| fmax_hz | 1600.0 | Maximum detectable frequency (G6 ≈ 1568 Hz) |
| Parameter | Default | Description |
|---|---|---|
| confidence_threshold | 0.80 | Minimum confidence [0-1] to accept pitch |
| note_hold_frames | 3 | Consecutive frames required before Note On |
| silence_frames_off | 3 | Consecutive silent frames before Note Off |
| rms_gate | 0.010 | RMS threshold below which signal is silent |
| Parameter | Default | Description |
|---|---|---|
| vel_gain | 5.0 | Gain multiplier for RMS → velocity |
| vel_floor | 10 | Minimum MIDI velocity (1-127) |
| velocity_from_peak_mag | true | Use peak magnitude for velocity (poly only) |
| Parameter | Default | Description |
|---|---|---|
| note_change_semitone_threshold | 1 | Semitones required to trigger note change |
| note_change_hold_frames | 3 | Frames new note must persist before switching |
| median_filter_size | 1 | Median filter window (1=off, 3-5 for noisy input, auto-adjusted if auto-tune enabled) |
| Parameter | Default | Description |
|---|---|---|
| enable_k_of_m | false | Enable K-of-M onset persistence filtering |
| k_of_m_onset | 2 | Require K detections in last M frames (monophonic) or per-note (polyphonic) |
| k_of_m_window | 3 | Window size M for K-of-M detection |
| Parameter | Default | Description |
|---|---|---|
| window_type | Hann | Window function (None, Hann, Hamming, Blackman) |
| spectral_tilt_db_per_decade | 0.0 | Spectral tilt compensation (+3 boosts highs) |
| smoothing_mode | Box3 | Magnitude smoothing (None, Box3, Tri5, AdjAvg) |
| peak_threshold_db | -40.0 | Magnitude threshold for peak detection (dB) |
| parabolic_interp | true | Sub-bin frequency refinement |
| harmonic_filter_enable | true | Suppress overtones |
| harmonic_tolerance_cents | 35.0 | Cents tolerance for harmonic detection |
| octave_mask | 0xFF | Bitmask for enabled octaves (bits 0-7) |
| pcp_enable | false | Enable pitch class profile stabilizer |
Auto-tuning automatically adapts detection thresholds based on input characteristics:
| Parameter | Default | Description |
|---|---|---|
| auto_tune_enable | false | Enable auto-tuning (master switch) |
| auto_tune_rms_margin | 1.8 | RMS gate = noise_floor × margin (1.5-2.0) |
| auto_tune_peak_margin_db | 8.0 | Peak threshold = noise_floor_db + margin (6-10 dB) |
| auto_tune_update_rate_hz | 5.0 | Update frequency (5-10 Hz recommended) |
| auto_tune_param_smoothing | 0.95 | Smoothing factor for updates (0.9-0.99, higher = smoother) |
| auto_tune_threshold_step | 0.02 | Step size for threshold adjustments |
| auto_tune_calibration_time_sec | 1.0 | Initial calibration period (seconds) |
| auto_tune_rms_gate_min/max | 0.005/0.100 | Limits for RMS gate adaptation |
| auto_tune_confidence_min/max | 0.60/0.95 | Limits for confidence threshold (mono) |
| auto_tune_peak_db_min/max | -60.0/-20.0 | Limits for peak threshold (poly) |
| auto_tune_notes_per_sec_min/max | 1.0/10.0 | Target event rate range (mono) |
| auto_tune_peaks_per_frame_min/max | 1.0/5.0 | Target peak rate range (poly) |
What Auto-Tuning Does:
- Adjusts rms_gate (mono) and peak_threshold_db (poly) based on noise floor + margin
- Adjusts median_filter_size (mono) for stability
- Adjusts note_hold_frames and silence_frames_off based on observed note durations

Enable auto-tuning for environments with varying noise levels, dynamic range, or when optimal settings are unknown.
SoundToMIDI cfg;
cfg.frame_size = 1024;
cfg.hop_size = 256; // 75% overlap (4× analysis density)
SoundToMIDIMono engine(cfg);
// Feed audio in arbitrary chunk sizes - engine handles buffering
float chunk[128];
while (audio_available) {
read_audio(chunk, 128);
engine.processFrame(chunk, 128); // Analysis triggered every hop_size samples
}
Overlap Guidelines:

- hop_size = frame_size (no overlap): lowest CPU cost
- hop_size = frame_size / 2 (50% overlap): good default for responsive onset detection
- hop_size = frame_size / 4 (75% overlap): dense analysis for chords and fast passages
cfg.enable_k_of_m = true;
cfg.k_of_m_onset = 2; // Require 2 detections
cfg.k_of_m_window = 3; // In last 3 frames
// Monophonic: Reduces spurious onset/offset events
// Polyphonic: Per-note tracking - each MIDI note requires K detections
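To make the mechanism concrete, here is a minimal sketch of K-of-M persistence using a shift-register bitmask (the `KOfMFilter` name and `push()` method are illustrative, not part of the library API):

```cpp
#include <stdint.h>

struct KOfMFilter {
    uint8_t k;             // detections required...
    uint8_t m;             // ...within the last m frames (m <= 31)
    uint32_t history = 0;  // one bit per recent frame

    KOfMFilter(uint8_t k_, uint8_t m_) : k(k_), m(m_) {}

    // Push one frame's raw detection; returns true once K of the
    // last M frames contained a detection.
    bool push(bool detected) {
        history = ((history << 1) | (detected ? 1u : 0u)) & ((1u << m) - 1u);
        uint8_t count = 0;
        for (uint8_t i = 0; i < m; ++i) {
            count += (history >> i) & 1u;
        }
        return count >= k;
    }
};

// Usage: KOfMFilter onset(2, 3); if (onset.push(rawOnset)) { /* Note On */ }
```

Polyphonic mode would keep one such filter per MIDI note.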
Recommended Settings:

- Monophonic: k_of_m_onset = 2, k_of_m_window = 3
- Polyphonic: k_of_m_onset = 3, k_of_m_window = 4
engine.setAutoTuneCallback([](const char* param_name, float old_val, float new_val) {
Serial.print("Auto-tune: ");
Serial.print(param_name);
Serial.print(" changed from ");
Serial.print(old_val);
Serial.print(" to ");
Serial.println(new_val);
});
const AutoTuneState& state = engine.getAutoTuneState();
Serial.print("Noise floor: ");
Serial.println(state.noise_rms_est);
Serial.print("Confidence EMA: ");
Serial.println(state.confidence_ema);
Serial.print("Event rate: ");
Serial.println(state.event_rate_ema);
Serial.print("Pitch variance: ");
Serial.println(state.pitch_variance_ema);
// Check if still in calibration phase
if (state.in_calibration) {
Serial.println("Calibrating...");
}
SoundToMIDIPoly poly(cfg);
// Adjust sensitivity
poly.setPeakThresholdDb(-35.0f); // More sensitive
// Filter octave range
poly.setOctaveMask(0x3C); // Only octaves 2-5 (bits 2,3,4,5)
// Boost high frequencies
poly.setSpectralTilt(3.0f); // +3 dB/decade
// Change smoothing
poly.setSmoothingMode(SmoothingMode::Tri5); // Triangular 5-point
The conversion from frequency to MIDI note number uses the formula:
MIDI note = 69 + 12 × log₂(frequency / 440 Hz)
Where 69 is MIDI note A4 (440 Hz).
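For example, a small helper with a few worked values (an illustrative sketch; `frequencyToMidi` is not a library function):

```cpp
#include <math.h>
#include <stdint.h>

// Convert a detected frequency to the nearest MIDI note number,
// clamped to the valid 0-127 range.
uint8_t frequencyToMidi(float freq_hz) {
    float note = 69.0f + 12.0f * log2f(freq_hz / 440.0f);
    long n = lroundf(note);
    if (n < 0) n = 0;
    if (n > 127) n = 127;
    return (uint8_t)n;
}

// frequencyToMidi(440.00f) -> 69 (A4)
// frequencyToMidi(261.63f) -> 60 (C4, middle C)
// frequencyToMidi(82.41f)  -> 40 (E2, low E on a guitar)
```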
The sliding window provides overlapped analysis for both mono and poly engines:
Ring Buffer Management:
- Capacity: frame_size + hop_size floats
- When accumulated >= hop_size, triggers analysis

Frame Extraction:

- Copies the most recent frame_size samples from the ring buffer
- Applies a Hann window: w[n] = 0.5 × (1 - cos(2π×n/(N-1)))

Overlap Benefits:

- More frequent analysis for faster onset detection and lower effective latency
- Denser pitch/chord tracking without increasing frame_size
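Putting the pieces above together, a compact sketch of the buffering mechanics (names like `SlidingWindow` and `feed` are illustrative, not the engine's actual internals):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct SlidingWindow {
    size_t frame_size, hop_size;
    std::vector<float> ring;  // capacity: frame_size + hop_size floats
    size_t accumulated = 0;

    SlidingWindow(size_t frame, size_t hop)
        : frame_size(frame), hop_size(hop), ring(frame + hop, 0.0f) {}

    // Feed any chunk size; analyze() fires once per hop_size samples.
    template <typename AnalyzeFn>
    void feed(const float* samples, size_t n, AnalyzeFn analyze) {
        for (size_t i = 0; i < n; ++i) {
            // A real ring buffer tracks head/tail indices; the O(n)
            // rotate here keeps the sketch short.
            ring.erase(ring.begin());
            ring.push_back(samples[i]);
            if (++accumulated >= hop_size) {
                accumulated = 0;
                // Extract the most recent frame_size samples, Hann-windowed.
                std::vector<float> frame(ring.end() - frame_size, ring.end());
                for (size_t j = 0; j < frame_size; ++j) {
                    float w = 0.5f * (1.0f - cosf(2.0f * 3.14159265f * (float)j
                                                  / (float)(frame_size - 1)));
                    frame[j] *= w;
                }
                analyze(frame.data(), frame_size);
            }
        }
    }
};
```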
The YIN algorithm finds the fundamental frequency by:

- Computing the difference function d(τ) between the frame and a lag-τ shifted copy
- Normalizing it into the cumulative mean normalized difference d'(τ)
- Selecting the first lag where d'(τ) dips below a threshold, refined by parabolic interpolation

Confidence is derived from 1 - d'(τ), where a lower difference means higher confidence.
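A minimal sketch of that core loop (textbook YIN, written O(n²) for clarity; the module's real implementation is optimized and its internal names differ):

```cpp
#include <cstddef>
#include <vector>

// Returns the best lag (period in samples), or 0 if no confident pitch.
// Frequency = sample_rate / lag; confidence = 1 - d'(lag).
size_t yinBestLag(const float* x, size_t n, float threshold, float* confidence) {
    size_t maxLag = n / 2;
    std::vector<float> d(maxLag, 0.0f);
    for (size_t tau = 1; tau < maxLag; ++tau) {
        for (size_t i = 0; i < maxLag; ++i) {
            float diff = x[i] - x[i + tau];
            d[tau] += diff * diff;  // difference function d(tau)
        }
    }
    float runningSum = 0.0f;
    for (size_t tau = 1; tau < maxLag; ++tau) {
        runningSum += d[tau];
        // cumulative mean normalized difference d'(tau)
        float dPrime = (runningSum > 0.0f) ? d[tau] * tau / runningSum : 0.0f;
        if (dPrime < threshold) {  // first dip below the threshold wins
            if (confidence) *confidence = 1.0f - dPrime;
            return tau;  // refine with parabolic interpolation in practice
        }
    }
    if (confidence) *confidence = 0.0f;
    return 0;
}
```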
Reduces spurious events by requiring persistence across frames:
Monophonic Mode: an onset or offset is committed only after K raw detections within the last M frames.
Polyphonic Mode: each MIDI note keeps its own K-of-M history, so notes in a chord gate on and off independently.
Latency Impact: worst-case added latency is (M - K + 1) × hop_size / sample_rate.

Polyphonic mode analyzes the frequency spectrum:

- Window and FFT each extracted frame, then smooth the magnitude spectrum
- Pick local maxima above peak_threshold_db, refining each with parabolic interpolation
- Suppress harmonics of already-accepted peaks, then map surviving peaks to MIDI notes
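A simplified sketch of the peak-picking step (the `pickPeaks` helper is illustrative; the real pipeline adds the smoothing, interpolation, and harmonic filtering configured above):

```cpp
#include <math.h>
#include <stddef.h>
#include <stdint.h>

// mag_db: FFT magnitude spectrum in dB; bin_hz = sample_rate / fft_size.
void pickPeaks(const float* mag_db, size_t num_bins, float bin_hz,
               float peak_threshold_db,
               void (*emit)(uint8_t midi_note, float magnitude_db)) {
    for (size_t k = 1; k + 1 < num_bins; ++k) {
        bool localMax = mag_db[k] > mag_db[k - 1] && mag_db[k] >= mag_db[k + 1];
        if (localMax && mag_db[k] > peak_threshold_db) {
            float freq = (float)k * bin_hz;  // refine with parabolic interp
            long note = lroundf(69.0f + 12.0f * log2f(freq / 440.0f));
            if (note >= 0 && note <= 127) {
                emit((uint8_t)note, mag_db[k]);
            }
        }
    }
}
```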
Noise Floor Estimation:
Adaptive Thresholds:
rms_gate = max(min_rms, noise_rms_est × margin)
peak_threshold_db = noise_mag_db_est + margin_db
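A sketch of the noise-floor tracking behind these formulas (the `NoiseFloorTracker` struct and its field names are illustrative):

```cpp
// Exponentially smoothed noise-floor estimate; cf. auto_tune_param_smoothing.
struct NoiseFloorTracker {
    float noise_rms_est = 0.01f;
    float smoothing = 0.95f;  // higher = smoother, slower adaptation

    // Update only on frames classified as background (no active note).
    void update(float frame_rms, bool note_active) {
        if (!note_active) {
            noise_rms_est = smoothing * noise_rms_est
                          + (1.0f - smoothing) * frame_rms;
        }
    }

    // rms_gate = max(min_rms, noise_rms_est × margin)
    float rmsGate(float min_rms, float margin) const {
        float gate = noise_rms_est * margin;
        return gate > min_rms ? gate : min_rms;
    }
};
```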
Jitter Monitoring (Monophonic):
Event Rate Control:
Hold-time Optimization:
note_hold_frames = 0.75 × avg_duration / hop_period
silence_frames_off = 0.5 × avg_gap / hop_period

Calibration Phase:
Monophonic Mode:
Polyphonic Mode:
Sliding Window Buffers (per engine):
- Ring buffer: frame_size + hop_size floats (e.g., 1024+256 = 5KB @ 75% overlap)
- Frame buffer: frame_size floats (e.g., 4KB for 1024 samples)
- Window workspace: frame_size floats (e.g., 4KB)

Auto-tuning State:
Total Memory (Polyphonic, Full Features):
Algorithmic Latency:
- Frame fill: frame_size / sample_rate (e.g., 1024/16000 = 64ms)
- Hop interval: hop_size / sample_rate (e.g., 256/16000 = 16ms with 75% overlap)
- K-of-M delay: (M-K+1) × hop_size / sample_rate (e.g., 2 hops = 32ms for K=2, M=3)

Total Typical Latency:
Note: Overlap reduces effective latency for onset detection by analyzing more frequently.
ESP32 / ESP32-S3 (512KB SRAM):
ESP32-C3 (400KB SRAM):
Memory-constrained platforms (<100KB SRAM):
Both modules require platforms with sufficient memory:
#if SKETCH_HAS_LOTS_OF_MEMORY
// Beat detector and sound-to-MIDI available
#else
// Not available on memory-constrained platforms
#endif
Tested platforms:
Status: Production-ready
Date Completed: 2025-01-16
Total Components: 20 (3 infrastructure + 17 detectors)
Tier 1 (4 detectors): BeatDetector, FrequencyBands, EnergyAnalyzer, TempoAnalyzer
Tier 2 (6 detectors): TransientDetector, NoteDetector, DownbeatDetector, DynamicsAnalyzer, PitchDetector, SilenceDetector
Tier 3 (7 detectors): VocalDetector, PercussionDetector, ChordDetector, KeyDetector, MoodAnalyzer, BuildupDetector, DropDetector
FastLED provides features not found in competing libraries:
- Sliding window STFT with configurable overlap (hop_size < frame_size)
- K-of-M onset persistence filtering (enable_k_of_m, k_of_m_onset, k_of_m_window)
- Auto-tuning with introspection (getAutoTuneState(), setAutoTuneCallback())

examples/BeatDetection/BeatDetection.ino - Real-time beat detection with LED visualization
MIT License (same as FastLED)
See implementation documentation in project root:
- AUTO_TUNE_IMPLEMENTATION.md - Auto-tuning extension design and implementation
- TASK.md - Sliding window STFT design and multi-frame logic
- TASK2.md - Core integration plan for sliding window (completed)

These documents provide detailed algorithms, design decisions, and testing strategies for the advanced features.
See project root CLAUDE.md for coding standards and guidelines.