Back to Fastled

FastLED Audio Effects

src/fx/audio/README.md

3.10.336.0 KB
Original Source

FastLED Audio Effects

Real-time audio processing modules for FastLED, optimized for embedded platforms (ESP32).

Directory Organization

Current Location: fx/audio/

This directory contains high-level audio effects and detectors:

  • audio_processor.h/cpp - High-level facade for easy orchestration of all detectors
  • detectors/ - All audio detector implementations (beat, vocal, percussion, chord, key, mood, etc.)
  • advanced/ - Advanced signal processing modules (sound-to-midi, etc.)

For low-level infrastructure, see:

  • fl/audio/ - AudioContext (shared FFT caching), AudioDetector (base class), core primitives
  • fl/audio/README.md - Infrastructure documentation

Overview

This directory contains audio processing modules for creating music-reactive LED effects:

Main Modules

  1. AudioProcessor - High-level facade for easy detector orchestration (audio_processor.h)
  2. Beat Detector - Real-time beat detection and tempo tracking for EDM (detectors/beat.h)
  3. Sound to MIDI - Monophonic/polyphonic pitch detection with MIDI events (sound_to_midi.h)
  4. 17+ Audio Detectors - Vocal, percussion, chord, key, mood, buildup, drop, and more (detectors/)

All modules are designed for real-time operation on resource-constrained platforms with minimal latency and CPU overhead.


Quick Start with AudioProcessor

The easiest way to use audio detection is through the AudioProcessor facade:

cpp
#include "fx/audio/audio_processor.h"

fl::AudioProcessor audio;

void setup() {
    // Register callbacks for events you care about
    audio.onBeat([]() {
        // Flash on beat
        fill_solid(leds, NUM_LEDS, CRGB::White);
    });

    audio.onVocal([](bool active) {
        // Change color when vocals detected
        if (active) {
            fill_solid(leds, NUM_LEDS, CRGB::Purple);
        }
    });

    audio.onKick([]() {
        // React to kick drum
        leds[0] = CRGB::Red;
    });
}

void loop() {
    while (fl::AudioSample sample = audioInput.next()) {
        audio.update(sample);  // Updates all registered detectors
    }
    FastLED.show();
}

See fx/audio/audio_processor.h for the complete API with 40+ event callbacks.


Complete Detector Catalog

The FastLED Audio System provides 17 fully-implemented detectors organized by tier:

Tier 1: Essential Detectors (4 core components)

DetectorFilePurpose
BeatDetectordetectors/beat.hRhythmic pulse detection with tempo tracking
FrequencyBandsdetectors/frequency_bands.hBass/mid/treble frequency abstraction
EnergyAnalyzerdetectors/energy_analyzer.hOverall loudness and RMS tracking
TempoAnalyzerdetectors/tempo_analyzer.hBPM tracking with confidence scoring

Tier 2: Enhancement Detectors (6 important features)

DetectorFilePurpose
TransientDetectordetectors/transient.hAttack detection and transient analysis
NoteDetectordetectors/note.hMusical note detection (MIDI-compatible)
DownbeatDetectordetectors/downbeat.hMeasure-level timing and meter detection
DynamicsAnalyzerdetectors/dynamics_analyzer.hLoudness trends (crescendo/diminuendo)
PitchDetectordetectors/pitch.hPitch tracking with confidence
SilenceDetectordetectors/silence.hAuto-standby and silence detection

Tier 3: Differentiator Detectors (7 unique features)

DetectorFilePurpose
VocalDetectordetectors/vocal.hHuman voice detection using spectral analysis
PercussionDetectordetectors/percussion.hDrum-specific detection (kick/snare/hihat)
ChordDetectordetectors/chord.hReal-time chord recognition
KeyDetectordetectors/key.hMusical key detection (major/minor/modes)
MoodAnalyzerdetectors/mood_analyzer.hEmotional content analysis (circumplex model)
BuildupDetectordetectors/buildup.hEDM buildup tension tracking
DropDetectordetectors/drop.hEDM drop impact detection

Total: 17 detectors + AudioProcessor facade = 18 high-level components

All detectors use the AudioContext pattern for optimal FFT sharing:

  • FFT computed once per frame, shared by all detectors
  • Typical performance: 3-8ms per frame with 5-7 active detectors
  • Memory footprint: ~1.5KB for 5-7 active detectors

Individual Detector Modules

Quick Start Examples

VocalDetector - Human Voice Detection

cpp
#include "fx/audio/audio_processor.h"

fl::AudioProcessor audio;

void setup() {
    audio.onVocal([](bool active) {
        if (active) {
            // Vocals detected - change color to purple
            fill_solid(leds, NUM_LEDS, CRGB::Purple);
        } else {
            // No vocals - return to normal
            fill_solid(leds, NUM_LEDS, CRGB::Blue);
        }
    });

    audio.onVocalStart([]() {
        // Vocal segment started
        Serial.println("Vocals started!");
    });

    audio.onVocalEnd([]() {
        // Vocal segment ended
        Serial.println("Vocals ended!");
    });
}

void loop() {
    while (fl::AudioSample sample = audioInput.next()) {
        audio.update(sample);
    }
    FastLED.show();
}

PercussionDetector - Drum Detection

cpp
#include "fx/audio/audio_processor.h"

fl::AudioProcessor audio;

void setup() {
    audio.onKick([]() {
        // Kick drum hit - flash red in bass section
        fill_solid(leds, NUM_LEDS/3, CRGB::Red);
    });

    audio.onSnare([]() {
        // Snare hit - flash yellow in mid section
        fill_solid(leds + NUM_LEDS/3, NUM_LEDS/3, CRGB::Yellow);
    });

    audio.onHiHat([]() {
        // Hi-hat hit - sparkle in treble section
        leds[random16(NUM_LEDS)] = CRGB::White;
    });

    audio.onPercussion([](const char* type) {
        Serial.print("Percussion: ");
        Serial.println(type);  // "kick", "snare", or "hihat"
    });
}

ChordDetector - Chord Recognition

cpp
#include "fx/audio/audio_processor.h"

fl::AudioProcessor audio;

void setup() {
    audio.onChord([](const char* chord_name) {
        Serial.print("Chord detected: ");
        Serial.println(chord_name);  // e.g., "Cmaj", "Gmin", "D7"

        // Map chord to color
        CRGB color = getColorForChord(chord_name);
        fill_solid(leds, NUM_LEDS, color);
    });
}

CRGB getColorForChord(const char* chord) {
    // Map chords to colors based on circle of fifths
    if (strstr(chord, "C")) return CRGB::Red;
    if (strstr(chord, "G")) return CRGB::Orange;
    if (strstr(chord, "D")) return CRGB::Yellow;
    if (strstr(chord, "A")) return CRGB::Green;
    if (strstr(chord, "E")) return CRGB::Cyan;
    if (strstr(chord, "B")) return CRGB::Blue;
    if (strstr(chord, "F")) return CRGB::Purple;
    return CRGB::White;
}

MoodAnalyzer - Emotional Content

cpp
#include "fx/audio/audio_processor.h"

fl::AudioProcessor audio;

void setup() {
    audio.onMood([](float valence, float arousal) {
        // valence: -1 (negative) to +1 (positive)
        // arousal: -1 (calm) to +1 (energetic)

        // Map mood to color palette
        if (valence > 0 && arousal > 0) {
            // Happy/Excited - bright warm colors
            fill_solid(leds, NUM_LEDS, CRGB::Orange);
        } else if (valence > 0 && arousal < 0) {
            // Content/Relaxed - soft cool colors
            fill_solid(leds, NUM_LEDS, CRGB::CornflowerBlue);
        } else if (valence < 0 && arousal > 0) {
            // Angry/Tense - intense reds
            fill_solid(leds, NUM_LEDS, CRGB::DarkRed);
        } else {
            // Sad/Depressed - cool dark colors
            fill_solid(leds, NUM_LEDS, CRGB::DarkBlue);
        }
    });
}

BuildupDetector & DropDetector - EDM Structure

cpp
#include "fx/audio/audio_processor.h"

fl::AudioProcessor audio;
uint8_t buildupIntensity = 0;

void setup() {
    audio.onBuildup([](float tension) {
        // tension: 0.0 (no buildup) to 1.0 (peak tension)
        buildupIntensity = tension * 255;

        // Increase brightness and color saturation during buildup
        FastLED.setBrightness(buildupIntensity);
        fill_solid(leds, NUM_LEDS, CHSV(0, 255, buildupIntensity));
    });

    audio.onDrop([]() {
        // Drop detected - massive flash!
        fill_solid(leds, NUM_LEDS, CRGB::White);
        FastLED.setBrightness(255);
        FastLED.show();
        delay(50);
    });
}

Multi-Detector Combined Example

cpp
#include "fx/audio/audio_processor.h"

fl::AudioProcessor audio;

void setup() {
    // Combine multiple detectors for rich interactions
    audio.onBeat([]() {
        // Flash on beat
        fill_solid(leds, NUM_LEDS, CRGB::White);
    });

    audio.onVocal([](bool active) {
        if (active) {
            // Vocals present - use warm colors
            FastLED.setTemperature(Tungsten100W);
        } else {
            // No vocals - use cool colors
            FastLED.setTemperature(ClearBlueSky);
        }
    });

    audio.onKick([]() {
        // Kick drum - pulse bass LEDs
        for (int i = 0; i < NUM_LEDS/4; i++) {
            leds[i] = CRGB::Red;
        }
    });

    audio.onChord([](const char* chord) {
        // Chord change - shift hue
        static uint8_t hue = 0;
        hue += 32;
        fill_solid(leds, NUM_LEDS, CHSV(hue, 255, 200));
    });

    audio.onMood([](float valence, float arousal) {
        // Mood influences overall palette
        uint8_t sat = (arousal + 1) * 127;  // More energetic = more saturated
        uint8_t bri = (valence + 1) * 127;   // More positive = brighter
        fadeToBlackBy(leds, NUM_LEDS, 255 - bri);
    });
}

Beat Detector

Files: detectors/beat.h, detectors/beat.cpp

Real-time beat detection using the AudioContext pattern for shared FFT computation.

What is it?

The Beat Detector provides rhythmic pulse detection and tempo tracking using spectral flux analysis of the shared FFT from AudioContext. It detects both individual onsets and maintains a tempo estimate with confidence scoring.

Key Features

  • Spectral flux onset detection with adaptive thresholds
  • Tempo tracking with BPM estimation
  • Beat phase tracking for predicted beat times
  • Adaptive threshold using flux history
  • Temporal filtering to prevent false triggers
  • Shared FFT - Optimal performance through AudioContext

Quick Start

The easiest way to use beat detection is through the AudioProcessor facade:

cpp
#include "fx/audio/audio_processor.h"

fl::AudioProcessor audio;

void setup() {
    // Register callbacks for beat events
    audio.onBeat([]() {
        // Flash on beat
        fill_solid(leds, NUM_LEDS, CRGB::White);
    });

    audio.onTempoChange([](float bpm, float confidence) {
        Serial.print("Tempo: ");
        Serial.print(bpm);
        Serial.print(" BPM");
    });

    audio.onOnset([](float strength) {
        Serial.print("Onset strength: ");
        Serial.println(strength);
    });
}

void loop() {
    while (fl::AudioSample sample = audioInput.next()) {
        audio.update(sample);  // Updates beat detector automatically
    }
    FastLED.show();
}

Example

See examples/BeatDetection/BeatDetection.ino for a complete example with LED visualization and BPM-based color coding.


Sound to MIDI

Files: advanced/sound_to_midi.h, advanced/sound_to_midi.cpp

What is it?

Sound to MIDI converts audio input into MIDI Note On/Off events. It supports both monophonic (single note) and polyphonic (multiple simultaneous notes) pitch detection, with automatic velocity calculation and anti-jitter filtering.

Key Features

  • Monophonic pitch detection using YIN/MPM algorithm
  • Polyphonic pitch detection using FFT-based peak detection
  • RMS-based velocity calculation with configurable gain
  • Anti-jitter filtering (semitone threshold, hold frames, median filter)
  • Onset/offset hysteresis to prevent rapid note toggling
  • Sliding window STFT - integrated ring buffer with configurable overlap (hop_size < frame_size)
  • K-of-M multi-frame onset detection - reduces spurious events by requiring persistence across frames
  • Peak continuity tracking - per-note frequency/magnitude history for improved stability (polyphonic)
  • Auto-tuning - automatically adapts thresholds based on input characteristics (noise floor, event rate, jitter)
  • Callback-based event system for Note On/Off events

How It Works

Monophonic Mode (Single Notes)

Uses a YIN/MPM-like autocorrelation algorithm for pitch detection:

  1. Sample Buffering (if hop_size < frame_size): Incoming samples accumulate in ring buffer
  2. Frame Extraction: Every hop_size samples, extract last frame_size samples with Hann windowing
  3. Pitch Detection: Analyzes windowed frame to find fundamental frequency via autocorrelation
  4. Confidence Check: Only accepts pitch if confidence exceeds threshold (default 0.80, auto-tuned if enabled)
  5. Note Conversion: Converts frequency to MIDI note number
  6. K-of-M Filtering (optional): Require pitch to appear in K of last M frames before onset
  7. Stability Filtering:
    • Semitone threshold - ignore small pitch changes
    • Hold frames - require note to persist before switching
    • Median filter - smooth out noisy detections (auto-adjusted based on jitter)
  8. Onset/Offset Detection:
    • Onset: Note held for note_hold_frames consecutive frames
    • Offset: Silence detected for silence_frames_off consecutive frames
  9. Velocity Calculation: RMS amplitude × gain → MIDI velocity (1-127)
  10. Auto-tuning (optional): Adapts RMS gate, confidence threshold, median filter size based on noise floor and jitter

Polyphonic Mode (Multiple Notes)

Uses FFT-based spectral peak detection:

  1. Sample Buffering (if hop_size < frame_size): Incoming samples accumulate in ring buffer
  2. Frame Extraction: Every hop_size samples, extract last frame_size samples with windowing
  3. FFT Analysis: Computes magnitude spectrum with optional windowing (Hann/Hamming/Blackman)
  4. Spectral Conditioning:
    • Spectral tilt compensation
    • Magnitude smoothing (box, triangular, or adjacent average)
  5. Peak Detection: Finds spectral peaks above threshold (auto-tuned if enabled)
  6. Parabolic Interpolation: Refines peak frequencies to sub-bin accuracy
  7. Harmonic Filtering: Suppresses overtones that are harmonics of fundamentals
  8. Octave Masking: Filter peaks by octave range (bitmask)
  9. K-of-M Filtering (optional): Per-note persistence tracking - require K detections in M frames
  10. Peak Continuity: Track per-note frequency/magnitude history for stable sustained notes
  11. PCP Stabilization (optional): Pitch class profile helps stabilize note detection
  12. Note Tracking: Independent onset/offset for each note (128 MIDI notes)
  13. Auto-tuning (optional): Adapts peak threshold, harmonic filter settings based on noise floor and event rate

Quick Start - Monophonic

cpp
#include "fx/audio/advanced/sound_to_midi.h"

// Create configuration with sliding window and auto-tuning
SoundToMIDI cfg;
cfg.sample_rate_hz = 16000;
cfg.frame_size = 512;
cfg.hop_size = 256;              // 50% overlap for improved onset detection
cfg.confidence_threshold = 0.82f;

// Optional: Enable K-of-M filtering to reduce spurious events
cfg.enable_k_of_m = true;
cfg.k_of_m_onset = 2;            // Require 2 detections in last 3 frames
cfg.k_of_m_window = 3;

// Optional: Enable auto-tuning for adaptive thresholds
cfg.auto_tune_enable = true;     // Automatically adapts to noise floor

// Create monophonic engine
SoundToMIDIMono engine(cfg);

// Set callbacks
engine.onNoteOn = [](uint8_t note, uint8_t velocity) {
    Serial.print("Note ON: ");
    Serial.print(note);
    Serial.print(" vel=");
    Serial.println(velocity);
};

engine.onNoteOff = [](uint8_t note) {
    Serial.print("Note OFF: ");
    Serial.println(note);
};

// In your audio processing loop (can feed any chunk size):
float audioBuffer[512];
// ... fill audioBuffer with audio samples ...
engine.processFrame(audioBuffer, 512);

Quick Start - Polyphonic

cpp
#include "fx/audio/sound_to_midi.h"

// Create configuration with sliding window and advanced features
SoundToMIDI cfg;
cfg.sample_rate_hz = 44100;
cfg.frame_size = 2048;
cfg.hop_size = 512;              // 75% overlap for dense chord analysis
cfg.fmin_hz = 80.0f;             // Lower limit for guitar/piano
cfg.fmax_hz = 2000.0f;

// Polyphonic-specific settings
cfg.window_type = WindowType::Hann;
cfg.harmonic_filter_enable = true;
cfg.parabolic_interp = true;
cfg.pcp_enable = true;           // Pitch class profile for stability

// Optional: Enable K-of-M filtering (per-note persistence)
cfg.enable_k_of_m = true;
cfg.k_of_m_onset = 3;            // Require 3 detections in last 4 frames
cfg.k_of_m_window = 4;

// Optional: Enable auto-tuning for dynamic environments
cfg.auto_tune_enable = true;
cfg.auto_tune_peak_margin_db = 6.0f;  // Adaptive peak threshold

// Create polyphonic engine
SoundToMIDIPoly engine(cfg);

// Optional: Monitor auto-tuning adjustments
engine.setAutoTuneCallback([](const char* param, float old_val, float new_val) {
    Serial.print("Auto-tune: ");
    Serial.print(param);
    Serial.print(" ");
    Serial.print(old_val);
    Serial.print(" → ");
    Serial.println(new_val);
});

// Set callbacks (same as monophonic)
engine.onNoteOn = [](uint8_t note, uint8_t velocity) {
    // Multiple notes can trigger simultaneously
    Serial.print("Note ON: ");
    Serial.println(note);
};

engine.onNoteOff = [](uint8_t note) {
    Serial.print("Note OFF: ");
    Serial.println(note);
};

// Process audio (can feed any chunk size)
engine.processFrame(audioBuffer, 2048);

Configuration Reference

Audio Parameters

ParameterDefaultDescription
sample_rate_hz16000Input sample rate (16000-48000 Hz)
frame_size512Analysis window size (512 for 16kHz, 1024+ for 44kHz)
hop_size512Step size between frames (set < frame_size for overlap, e.g., 256 = 50% overlap)

Pitch Detection Range

ParameterDefaultDescription
fmin_hz40.0Minimum detectable frequency (E1 ≈ 41.2 Hz)
fmax_hz1600.0Maximum detectable frequency (G6 ≈ 1568 Hz)

Detection Thresholds

ParameterDefaultDescription
confidence_threshold0.80Minimum confidence [0-1] to accept pitch
note_hold_frames3Consecutive frames required before Note On
silence_frames_off3Consecutive silent frames before Note Off
rms_gate0.010RMS threshold below which signal is silent

Velocity Calculation

ParameterDefaultDescription
vel_gain5.0Gain multiplier for RMS → velocity
vel_floor10Minimum MIDI velocity (1-127)
velocity_from_peak_magtrueUse peak magnitude for velocity (poly only)

Anti-Jitter (Monophonic)

ParameterDefaultDescription
note_change_semitone_threshold1Semitones required to trigger note change
note_change_hold_frames3Frames new note must persist before switching
median_filter_size1Median filter window (1=off, 3-5 for noisy input, auto-adjusted if auto-tune enabled)

K-of-M Multi-frame Filtering

ParameterDefaultDescription
enable_k_of_mfalseEnable K-of-M onset persistence filtering
k_of_m_onset2Require K detections in last M frames (monophonic) or per-note (polyphonic)
k_of_m_window3Window size M for K-of-M detection

Polyphonic Spectral Processing

ParameterDefaultDescription
window_typeHannWindow function (None, Hann, Hamming, Blackman)
spectral_tilt_db_per_decade0.0Spectral tilt compensation (+3 boosts highs)
smoothing_modeBox3Magnitude smoothing (None, Box3, Tri5, AdjAvg)
peak_threshold_db-40.0Magnitude threshold for peak detection (dB)
parabolic_interptrueSub-bin frequency refinement
harmonic_filter_enabletrueSuppress overtones
harmonic_tolerance_cents35.0Cents tolerance for harmonic detection
octave_mask0xFFBitmask for enabled octaves (bit 0-7)
pcp_enablefalseEnable pitch class profile stabilizer

Auto-Tuning

Auto-tuning automatically adapts detection thresholds based on input characteristics:

ParameterDefaultDescription
auto_tune_enablefalseEnable auto-tuning (master switch)
auto_tune_rms_margin1.8RMS gate = noise_floor × margin (1.5-2.0)
auto_tune_peak_margin_db8.0Peak threshold = noise_floor_db + margin (6-10 dB)
auto_tune_update_rate_hz5.0Update frequency (5-10 Hz recommended)
auto_tune_param_smoothing0.95Smoothing factor for updates (0.9-0.99, higher = smoother)
auto_tune_threshold_step0.02Step size for threshold adjustments
auto_tune_calibration_time_sec1.0Initial calibration period (seconds)
auto_tune_rms_gate_min/max0.005/0.100Limits for RMS gate adaptation
auto_tune_confidence_min/max0.60/0.95Limits for confidence threshold (mono)
auto_tune_peak_db_min/max-60.0/-20.0Limits for peak threshold (poly)
auto_tune_notes_per_sec_min/max1.0/10.0Target event rate range (mono)
auto_tune_peaks_per_frame_min/max1.0/5.0Target peak rate range (poly)

What Auto-Tuning Does:

  • Noise Floor Tracking: Estimates ambient noise during silence (EMA with slow time constant)
  • Adaptive Gates: Adjusts rms_gate (mono) and peak_threshold_db (poly) based on noise floor + margin
  • Jitter Monitoring: Tracks pitch variance and adjusts median_filter_size (mono) for stability
  • Event Rate Control: Adjusts thresholds to maintain target note/peak rates (prevents floods or missed notes)
  • Hold-time Optimization: Learns typical note durations and gaps to set optimal note_hold_frames
  • Calibration: 1-second startup period to establish initial noise floor

Enable auto-tuning for environments with varying noise levels, dynamic range, or when optimal settings are unknown.

Advanced Usage

Sliding Window with Custom Overlap

cpp
SoundToMIDI cfg;
cfg.frame_size = 1024;
cfg.hop_size = 256;     // 75% overlap (4× analysis density)

SoundToMIDIMono engine(cfg);

// Feed audio in arbitrary chunk sizes - engine handles buffering
float chunk[128];
while (audio_available) {
    read_audio(chunk, 128);
    engine.processFrame(chunk, 128);  // Analysis triggered every hop_size samples
}

Overlap Guidelines:

  • 50% overlap (hop = frame/2): Good balance, 2× CPU cost
  • 75% overlap (hop = frame/4): Better onset detection, 4× CPU cost
  • No overlap (hop = frame): Legacy mode, minimal CPU

K-of-M Filtering for Noise Rejection

cpp
cfg.enable_k_of_m = true;
cfg.k_of_m_onset = 2;     // Require 2 detections
cfg.k_of_m_window = 3;    // In last 3 frames

// Monophonic: Reduces spurious onset/offset events
// Polyphonic: Per-note tracking - each MIDI note requires K detections

Recommended Settings:

  • Clean input: K=2, M=3 (minimal filtering)
  • Moderate noise: K=2, M=4 or K=3, M=4
  • High noise: K=3, M=5 (aggressive filtering, slight latency increase)

Auto-Tuning Monitoring

cpp
engine.setAutoTuneCallback([](const char* param_name, float old_val, float new_val) {
    Serial.print("Auto-tune: ");
    Serial.print(param_name);
    Serial.print(" changed from ");
    Serial.print(old_val);
    Serial.print(" to ");
    Serial.println(new_val);
});

const AutoTuneState& state = engine.getAutoTuneState();
Serial.print("Noise floor: ");
Serial.println(state.noise_rms_est);
Serial.print("Confidence EMA: ");
Serial.println(state.confidence_ema);
Serial.print("Event rate: ");
Serial.println(state.event_rate_ema);
Serial.print("Pitch variance: ");
Serial.println(state.pitch_variance_ema);

// Check if still in calibration phase
if (state.in_calibration) {
    Serial.println("Calibrating...");
}

Polyphonic Runtime Adjustments

cpp
SoundToMIDIPoly poly(cfg);

// Adjust sensitivity
poly.setPeakThresholdDb(-35.0f);  // More sensitive

// Filter octave range
poly.setOctaveMask(0x3C);  // Only octaves 2-5 (bits 2,3,4,5)

// Boost high frequencies
poly.setSpectralTilt(3.0f);  // +3 dB/decade

// Change smoothing
poly.setSmoothingMode(SmoothingMode::Tri5);  // Triangular 5-point

Frequency to MIDI Note Conversion

The conversion from frequency to MIDI note number uses the formula:

MIDI note = 69 + 12 × log₂(frequency / 440 Hz)

Where 69 is MIDI note A4 (440 Hz). For example:

  • C4 (Middle C) = 60 (261.63 Hz)
  • A4 = 69 (440 Hz)
  • C5 = 72 (523.25 Hz)

Algorithm Details

Sliding Window STFT (Integrated into Core)

The sliding window provides overlapped analysis for both mono and poly engines:

Ring Buffer Management:

  • Size: frame_size + hop_size floats
  • Accumulates incoming samples regardless of chunk size
  • When accumulated >= hop_size, triggers analysis

Frame Extraction:

  • Extract last frame_size samples from ring buffer
  • Apply Hann window: w[n] = 0.5 × (1 - cos(2π×n/(N-1)))
  • Pass windowed frame to pitch detection

Overlap Benefits:

  • Eliminates edge effects at frame boundaries
  • Improves onset detection recall (catches events missed between frames)
  • Reduces spectral leakage through windowing
  • Enables multi-frame K-of-M filtering

YIN/MPM Pitch Detection (Monophonic)

The YIN algorithm finds the fundamental frequency by:

  1. Autocorrelation: Compute difference function d(τ) for various lags τ
  2. Cumulative Mean Normalization: Normalize d(τ) to get d'(τ)
  3. Threshold Crossing: Find first τ where d'(τ) < threshold
  4. Parabolic Interpolation: Refine τ to sub-sample accuracy
  5. Frequency Calculation: f₀ = sample_rate / τ

Confidence is derived from 1 - d'(τ), where lower difference means higher confidence.

K-of-M Multi-frame Onset Detection

Reduces spurious events by requiring persistence across frames:

Monophonic Mode:

  • Circular buffer tracks last M onset/voiced states (bool array)
  • On each frame, check if ≥K of last M frames are voiced
  • Emit NoteOn only when K-of-M threshold met
  • Same logic for NoteOff (require K silent frames in M)

Polyphonic Mode:

  • Per-note tracking: 128 independent K-of-M counters
  • Each MIDI note tracks on_count and off_count
  • Note emitted when specific note appears in K of M frames
  • Prevents one-frame blips without global suppression

Latency Impact:

  • Worst case: (M - K + 1) × hop_size / sample_rate
  • Example: K=2, M=3, hop=256, fs=16000 → 32ms additional latency
  • Benefit: Dramatically reduced false trigger rate

FFT Peak Detection (Polyphonic)

Polyphonic mode analyzes the frequency spectrum:

  1. Windowing: Apply window function (Hann/Hamming/Blackman) to reduce spectral leakage
  2. FFT: Compute magnitude spectrum
  3. Conditioning: Apply tilt compensation and smoothing
  4. Peak Finding: Detect local maxima above threshold
  5. Interpolation: Parabolic interpolation for sub-bin accuracy
  6. Harmonic Filtering: Remove peaks that are harmonics of lower peaks
  7. Note Tracking: Manage Note On/Off for each detected note independently

Auto-Tuning Adaptive Algorithms

Noise Floor Estimation:

  • RMS amplitude EMA (α=0.05, slow tracking)
  • Spectral magnitude median/EMA in dB
  • Updated only during silence (mono) or low peak count (poly)
  • Time constant: ~0.5 seconds

Adaptive Thresholds:

rms_gate = max(min_rms, noise_rms_est × margin)
peak_threshold_db = noise_mag_db_est + margin_db

Jitter Monitoring (Monophonic):

  • Tracks pitch variance as squared semitone difference
  • High variance (>1.0): increase median_filter_size to 3-5
  • Low variance (<0.25): reduce to 1 for minimal latency

Event Rate Control:

  • Monophonic: Track note-on events per second
  • Polyphonic: Track peaks per frame
  • Adjust thresholds to maintain target ranges
  • Prevents both floods and starvation

Hold-time Optimization:

  • EMA of note durations and inter-note gaps
  • note_hold_frames = 0.75 × avg_duration / hop_period
  • silence_frames_off = 0.5 × avg_gap / hop_period
  • Bounded to reasonable ranges (1-10 frames)

Calibration Phase:

  • 1-second startup period (configurable)
  • Establishes initial noise floor
  • No events emitted during calibration
  • Smooth transition to normal operation

Use Cases

Monophonic Applications

  • Singing/vocal pitch tracking
  • Single-instrument recording (flute, trumpet, etc.)
  • Melody extraction
  • Pitch-to-CV conversion for synthesizers

Polyphonic Applications

  • Guitar/piano chord detection
  • Multi-voice harmony analysis
  • Polyphonic MIDI recording
  • Music transcription

Performance Characteristics

CPU Usage

Monophonic Mode:

  • Base processing: ~1-2ms per frame @ 512 samples (ESP32 @ 240MHz)
  • With 50% overlap: ~2-4ms (2× analysis rate)
  • With 75% overlap: ~4-8ms (4× analysis rate)
  • Auto-tuning overhead: <1% additional CPU
  • K-of-M filtering: negligible overhead

Polyphonic Mode:

  • Base processing: ~5-10ms per frame @ 2048 samples (ESP32-S3 @ 240MHz)
  • With 50% overlap: ~10-20ms (2× analysis rate)
  • With 75% overlap: ~20-40ms (4× analysis rate)
  • Auto-tuning overhead: <1% additional CPU
  • K-of-M per-note tracking: ~0.5ms for 128 notes

Memory Footprint

Sliding Window Buffers (per engine):

  • Ring buffer: frame_size + hop_size floats (e.g., 1024+256 = 5KB @ 75% overlap)
  • Window coefficients: frame_size floats (e.g., 4KB for 1024 samples)
  • Analysis frame: frame_size floats (e.g., 4KB)
  • Total: ~13KB for frame_size=1024, 75% overlap

Auto-tuning State:

  • AutoTuneState struct: ~200 bytes
  • K-of-M history: ~16 bytes (mono) or ~512 bytes (poly, 128 notes × 4 bytes)
  • Peak continuity (poly): ~3KB (128 notes × 24 bytes)

Total Memory (Polyphonic, Full Features):

  • ~20KB for frame_size=1024, 75% overlap, all features enabled
  • Suitable for ESP32/ESP32-S3 (512KB SRAM)

Latency

Algorithmic Latency:

  • Input to first analysis: frame_size / sample_rate (e.g., 1024/16000 = 64ms)
  • Analysis to next analysis: hop_size / sample_rate (e.g., 256/16000 = 16ms with 75% overlap)
  • K-of-M adds: ≤ (M-K+1) × hop_size / sample_rate (e.g., 2 hops = 32ms for K=2, M=3)

Total Typical Latency:

  • Monophonic (no overlap): 64ms
  • Monophonic (75% overlap, K-of-M): 64ms + 32ms = 96ms
  • Polyphonic (75% overlap, K-of-M): 128ms + 48ms = 176ms

Note: Overlap reduces effective latency for onset detection by analyzing more frequently.

Recommendations by Platform

ESP32 / ESP32-S3 (512KB SRAM):

  • ✅ Monophonic: 50-75% overlap, auto-tune, K-of-M
  • ✅ Polyphonic: 50% overlap recommended, 75% possible if frame rate allows
  • ✅ All features supported

ESP32-C3 (400KB SRAM):

  • ✅ Monophonic: All features supported
  • ⚠️ Polyphonic: Use frame_size=1024, 50% overlap max

Memory-constrained platforms (<100KB SRAM):

  • ❌ Not recommended - use simpler beat detector or external processing

Platform Requirements

Both modules require platforms with sufficient memory:

cpp
#if SKETCH_HAS_LOTS_OF_MEMORY
// Beat detector and sound-to-MIDI available
#else
// Not available on memory-constrained platforms
#endif

Tested platforms:

  • ✅ ESP32 (520KB SRAM)
  • ✅ ESP32-S3 (512KB SRAM)
  • ❌ Arduino Uno (2KB SRAM - insufficient)
  • ❌ Arduino Nano (2KB SRAM - insufficient)

Implementation Status

FastLED Audio System v2.0 - ✅ COMPLETE

Status: Production-ready Date Completed: 2025-01-16 Total Components: 20 (3 infrastructure + 17 detectors)

✅ Core Infrastructure (100% Complete)

  • AudioContext - Shared computation state with lazy FFT
  • AudioDetector - Base class interface
  • AudioProcessor - High-level facade with lazy instantiation

✅ All Detectors Implemented (100% Complete)

Tier 1 (4 detectors): BeatDetector, FrequencyBands, EnergyAnalyzer, TempoAnalyzer Tier 2 (6 detectors): TransientDetector, NoteDetector, DownbeatDetector, DynamicsAnalyzer, PitchDetector, SilenceDetector Tier 3 (7 detectors): VocalDetector, PercussionDetector, ChordDetector, KeyDetector, MoodAnalyzer, BuildupDetector, DropDetector

✅ Quality Metrics

  • All 104 C++ unit tests passing
  • Zero test failures
  • Python/C++ linting passed
  • Memory footprint: ~1.5KB for 5-7 active detectors
  • Performance: ~3-8ms per frame (typical)

✅ Unique Differentiators

FastLED provides features not found in competing libraries:

  1. VocalDetector - Human voice detection using spectral analysis
  2. ChordDetector - Real-time chord recognition
  3. KeyDetector - Musical key detection (major/minor/modes)
  4. MoodAnalyzer - Emotional content analysis (circumplex model)
  5. BuildupDetector - EDM buildup tension tracking
  6. DropDetector - EDM drop impact detection
  7. Shared FFT - Optimal performance through smart caching

Sound to MIDI Features

✅ Core Detection (100% Complete)

  • Monophonic YIN/MPM pitch detection
  • Polyphonic FFT spectral peak detection
  • RMS-based velocity calculation
  • Anti-jitter filtering (semitone threshold, hold frames, median filter)
  • Onset/offset hysteresis
  • Callback-based event system

✅ Sliding Window STFT (100% Complete)

  • Internal ring buffer with configurable overlap (hop_size < frame_size)
  • Automatic Hann windowing when overlap enabled
  • Streaming API - feed arbitrary chunk sizes
  • Zero API changes - backward compatible
  • Integrated into core engines (not external wrapper)

✅ K-of-M Multi-frame Onset Detection (100% Complete)

  • Circular buffer tracking last M frames
  • Monophonic: onset/offset persistence filtering
  • Polyphonic: per-note tracking (independent K-of-M for each MIDI note)
  • Configurable via enable_k_of_m, k_of_m_onset, k_of_m_window
  • Disabled by default for backward compatibility

✅ Peak Continuity Tracking (Infrastructure Complete, 80%)

  • Per-note frequency and magnitude history
  • Frames-absent tracking for gap detection
  • Foundation for advanced matching algorithms
  • Future: Cross-frame peak matching with continuity_cents threshold

✅ Auto-Tuning (100% Complete)

  • Noise floor estimation (RMS and spectral magnitude)
  • Adaptive thresholds (RMS gate, confidence, peak_threshold_db)
  • Jitter monitoring with median filter adaptation
  • Event rate control (notes/sec for mono, peaks/frame for poly)
  • Hold-time optimization based on learned note durations
  • Octave statistics and harmonic filter adaptation (poly)
  • Calibration phase on startup
  • API: getAutoTuneState(), setAutoTuneCallback()
  • <1% CPU overhead, <1KB memory footprint

Testing Status

  • ✅ All existing tests pass (backward compatibility verified)
  • ✅ 9 new sliding window tests (4 mono, 5 poly)
  • ✅ K-of-M filtering tests
  • ✅ Auto-tuning tests
  • ✅ Linting passes (C++, Python, JavaScript)
  • ✅ Compilation on ESP32 platforms

Examples

  • examples/BeatDetection/BeatDetection.ino - Real-time beat detection with LED visualization
    • Demonstrates the AudioProcessor facade for beat detection
    • LED strip flashes on beat with BPM-based color coding
    • Shows tempo tracking and onset detection callbacks

License

MIT License (same as FastLED)


References

Sound to MIDI - Core Algorithms

  1. De Cheveigné & Kawahara (2002) - YIN, a fundamental frequency estimator for speech and music
  2. McLeod & Wyvill (2005) - A smarter way to find pitch (MPM)
  3. Smith (2011) - Spectral Audio Signal Processing (parabolic interpolation)

Sound to MIDI - Advanced Features

See implementation documentation in project root:

  • AUTO_TUNE_IMPLEMENTATION.md - Auto-tuning extension design and implementation
  • TASK.md - Sliding window STFT design and multi-frame logic
  • TASK2.md - Core integration plan for sliding window (completed)

These documents provide detailed algorithms, design decisions, and testing strategies for the advanced features.


Contributing

See project root CLAUDE.md for coding standards and guidelines.