src/platforms/README_SPI_ADVANCED.md
FastLED's Advanced SPI system enables automatic parallel LED strip control by intelligently routing multiple SPI-based LED strips through hardware acceleration.
Instead of manually configuring parallel SPI, FastLED automatically detects when multiple LED strips share the same clock pin and promotes them to multi-lane hardware SPI:
// This code automatically uses Quad-SPI (4 parallel lanes):
FastLED.addLeds<APA102, 23, 18>(leds1, 100); // Same clock pin (18)
FastLED.addLeds<APA102, 19, 18>(leds2, 100); // Same clock pin (18)
FastLED.addLeds<APA102, 22, 18>(leds3, 100); // Same clock pin (18)
FastLED.addLeds<APA102, 21, 18>(leds4, 100); // Same clock pin (18)
// ↑ FastLED detects 4 strips on clock pin 18 → enables Quad-SPI
Result: All 4 strips transmit simultaneously in ~0.16ms instead of sequentially in ~8ms (50× faster).
The Advanced SPI system uses a 4-layer architecture that separates routing, aggregation, interleaving, and hardware:
┌─────────────────────────────────────────────────────────────────────┐
│ FastLED User API │
│ FastLED.addLeds<APA102, PIN, CLK>() × N strips │
└──────────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────────┐
│ SPIDeviceProxy │
│ • Template-based per-controller proxy │
│ • Mirrors ESP32SPIOutput interface │
│ • Buffers writes for multi-lane SPI │
│ • Routes to appropriate backend │
└──────────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────────┐
│ SPI Bus Manager (Singleton) │
│ • Groups devices by shared clock pin │
│ • Detects conflicts and promotes to S1/S2/S4 │
│ • Manages bus lifecycle (reference counting) │
│ • Routes transmissions to correct lane │
└──────────────────────────────┬──────────────────────────────────────┘
│
┌──────────────┼──────────────┐
│ │ │
┌───────────────▼──────┐ ┌────▼────────┐ ┌─▼──────────────────────┐
│ Single-SPI (S1) │ │ Dual (S2) │ │ Quad-SPI (S4) │
│ ESP32SPIOutput │ │ SPIDual + │ │ SPIQuad + │
│ (direct HW) │ │ Transposer │ │ Transposer │
└──────────────────────┘ └─────────────┘ └────────────────────────┘
│ │ │
└───────────────────────┴────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────────┐
│ Hardware SPI Peripheral │
│ ESP32 SPI2/SPI3 with DMA (HSPI/VSPI buses) │
└─────────────────────────────────────────────────────────────────────┘
- addLeds<>() creates LED controllers
- SPIDeviceProxy<> wraps each controller, routes SPI calls
- SPIBusManager (singleton) groups by clock pin, manages aggregation

When you call FastLED.addLeds<APA102, DATA_PIN, CLOCK_PIN>():
1. LED Controller created (e.g., APA102Controller)
↓
2. Controller creates SPIDeviceProxy<DATA_PIN, CLOCK_PIN, SPEED>
↓
3. SPIDeviceProxy::init() called
↓
4. Proxy registers with Bus Manager:
mBusManager = &getSPIBusManager(); // Get singleton
mHandle = mBusManager->registerDevice(CLOCK_PIN, DATA_PIN, this);
↓
5. Bus Manager groups device by CLOCK_PIN
↓
6. Bus Manager assigns:
- bus_id (which hardware SPI bus)
- lane_id (which lane on that bus: 0-3)
↓
7. Returns SPIBusHandle { bus_id, lane_id, is_valid }
Example: 4 strips on clock pin 18
Device 1: registerDevice(CLK=18, DATA=23) → Handle { bus_id=0, lane_id=0 }
Device 2: registerDevice(CLK=18, DATA=19) → Handle { bus_id=0, lane_id=1 }
Device 3: registerDevice(CLK=18, DATA=22) → Handle { bus_id=0, lane_id=2 }
Device 4: registerDevice(CLK=18, DATA=21) → Handle { bus_id=0, lane_id=3 }
Bus Manager creates:
Bus 0: Type=QUAD_SPI, Clock=18, Lanes=[23, 19, 22, 21]
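The grouping step above can be sketched in a few lines. This is a minimal illustration, not the FastLED implementation: `ClockPinRegistry` and `Handle` are hypothetical names, and the only behavior assumed from the text is that devices sharing a clock pin land on the same bus and receive lane IDs in registration order, up to 4 lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for SPIBusHandle; field names follow the flow above.
struct Handle {
    uint8_t bus_id;
    uint8_t lane_id;
    bool is_valid;
};

// Hypothetical registry: one bus per distinct clock pin, at most 4 lanes per bus.
class ClockPinRegistry {
public:
    Handle registerDevice(uint8_t clock_pin, uint8_t data_pin) {
        auto it = mBusByClock.find(clock_pin);
        if (it == mBusByClock.end()) {
            // First device on this clock pin: open a new bus.
            it = mBusByClock.emplace(clock_pin, mLanes.size()).first;
            mLanes.emplace_back();
        }
        std::vector<uint8_t>& lanes = mLanes[it->second];
        if (lanes.size() >= 4) {
            return Handle{0, 0, false};  // A 5th device has no lane left.
        }
        lanes.push_back(data_pin);
        return Handle{static_cast<uint8_t>(it->second),
                      static_cast<uint8_t>(lanes.size() - 1), true};
    }

    // Data pins registered on a bus, in lane order.
    const std::vector<uint8_t>& lanesOf(uint8_t bus_id) const {
        assert(bus_id < mLanes.size());
        return mLanes[bus_id];
    }

private:
    std::map<uint8_t, std::size_t> mBusByClock;  // clock pin -> bus index
    std::vector<std::vector<uint8_t>> mLanes;    // bus index -> data pins
};
```

Registering data pins 23, 19, 22, 21 on clock pin 18 yields lanes 0 through 3 on bus 0, matching the example above; a device on a different clock pin opens a second bus.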
On first FastLED.show(), bus manager initializes hardware:
1. SPIBusManager::initialize() called
↓
2. For each bus with devices:
↓
3. Determine bus type based on device count:
- 1 device → SINGLE_SPI (no aggregation)
- 2 devices → DUAL_SPI (if supported)
- 3-4 devices → QUAD_SPI (if supported)
↓
4. Allocate backend hardware:
- SINGLE_SPI: Each device creates own ESP32SPIOutput
- DUAL_SPI: Create SPIDual controller, configure 2 lanes
- QUAD_SPI: Create SPIQuad controller, configure 4 lanes
↓
5. Store hardware controller in bus info:
mBuses[bus_id].quad_controller = quad; // or dual_controller
↓
6. Mark bus as initialized
When you call FastLED.show():
For each LED controller:
1. Controller calls showPixels() on its chipset
↓
2. Chipset writes to SPIDeviceProxy:
proxy.select(); // Begin transaction
proxy.writeByte(0xFF); // Write start frame
proxy.writeByte(pixel_data); // Write LED data
proxy.writeWord(end_frame); // Write end frame
proxy.release(); // End transaction
proxy.finalizeTransmission(); // Flush buffered data
↓
3. SPIDeviceProxy routes based on backend:
If SINGLE_SPI (mSingleSPI != nullptr):
→ Direct passthrough to ESP32SPIOutput
→ Immediate hardware transmission
If DUAL/QUAD_SPI (mSingleSPI == nullptr):
→ Buffer writes in mWriteBuffer (fl::vector<uint8_t>)
→ On finalizeTransmission():
↓
4. SPIDeviceProxy calls Bus Manager:
mBusManager->transmit(mHandle, mWriteBuffer.data(), mWriteBuffer.size());
↓
5. Bus Manager stores per-lane data:
mBuses[bus_id].lanes[lane_id].buffer = data;
mBuses[bus_id].lanes[lane_id].size = length;
↓
6. SPIDeviceProxy then calls the Bus Manager's finalizeTransmission(mHandle):
↓
7. Bus Manager checks if all lanes ready:
- Increments ready_count
- If ready_count == total_lanes → transmit!
↓
8. Bus Manager calls transposer:
SPITransposer::transpose4(lanes, max_size, output_buffer)
→ Bit-interleaves all lane data into single buffer
↓
9. Bus Manager calls hardware:
quad->transmit(output_buffer);
quad->waitComplete();
↓
10. Hardware SPI peripheral transmits via DMA (0% CPU)
→ All lanes transmit simultaneously
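Steps 5 through 7 amount to a small barrier: each lane hands over its buffer, and only the call that completes the set triggers the hardware. The sketch below illustrates that idea under simplifying assumptions; `BusSync` and its `flushes` counter are hypothetical, and the transpose and DMA steps are reduced to a counter increment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-bus state, loosely modeled on the lane buffers described above.
struct BusSync {
    struct Lane {
        const uint8_t* buffer = nullptr;
        std::size_t size = 0;
    };

    Lane lanes[4];
    uint8_t total_lanes = 0;
    uint8_t ready_count = 0;
    unsigned flushes = 0;  // How many times the (simulated) hardware was triggered.

    // Store a pointer to the lane's data; nothing is sent yet.
    void transmit(uint8_t lane_id, const uint8_t* data, std::size_t length) {
        assert(lane_id < total_lanes);
        lanes[lane_id].buffer = data;
        lanes[lane_id].size = length;
    }

    // Returns true when this call completed the set and triggered a flush.
    bool finalizeTransmission() {
        ++ready_count;
        if (ready_count < total_lanes) {
            return false;  // Still waiting for other lanes.
        }
        ++flushes;  // Real code would transpose and start DMA here.
        ready_count = 0;
        for (Lane& lane : lanes) {
            lane = Lane{};
        }
        return true;
    }
};
```

With `total_lanes = 4`, the first three `finalizeTransmission()` calls return false and the fourth returns true, which is why all lanes leave the chip in the same transaction.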
When a device is destroyed (e.g., program exit):
1. SPIDeviceProxy::~SPIDeviceProxy() called
↓
2. Proxy unregisters from Bus Manager:
mBusManager->unregisterDevice(mHandle);
↓
3. Bus Manager removes device from bus:
- Decrements reference count for that bus
↓
4. If last device on bus (refcount == 0):
↓
5. Bus Manager releases hardware:
releaseBusHardware(bus_id);
↓
6. Delete hardware controller:
delete mBuses[bus_id].quad_controller; // or dual_controller
mBuses[bus_id].quad_controller = nullptr;
↓
7. Mark bus as uninitialized
Reference Counting Example:
4 devices on Bus 0 (Quad-SPI):
Device 1 destroyed → refcount = 3 (keep hardware)
Device 2 destroyed → refcount = 2 (keep hardware)
Device 3 destroyed → refcount = 1 (keep hardware)
Device 4 destroyed → refcount = 0 → releaseBusHardware() called
→ SPIQuad controller deleted
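The reference-counting rule can be sketched as follows. This is an illustration only: `BusLifetime` and `FakeQuadController` are hypothetical names, and the sketch swaps the raw `delete` shown above for `std::unique_ptr` so that release is a single `reset()` call.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

struct FakeQuadController {};  // Hypothetical stand-in for the SPIQuad controller.

// Hypothetical per-bus lifetime tracker.
struct BusLifetime {
    uint8_t reference_count = 0;
    std::unique_ptr<FakeQuadController> quad_controller;

    void registerDevice() { ++reference_count; }

    // Allocate the backend once, on first use.
    void initialize() {
        if (!quad_controller) {
            quad_controller.reset(new FakeQuadController());
        }
    }

    // Returns true when the last device left and the hardware was released.
    bool unregisterDevice() {
        assert(reference_count > 0);
        --reference_count;
        if (reference_count != 0) {
            return false;  // Other devices still use this bus.
        }
        quad_controller.reset();
        return true;
    }

    bool hasHardware() const { return quad_controller != nullptr; }
};
```

Unregistering three of four devices leaves the controller alive; the fourth call returns true and the controller is gone.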
Location: src/platforms/shared/spi_manager.h
The Bus Manager is a singleton that acts as the central routing and lifecycle controller for all SPI devices.
class SPIBusManager {
public:
// Register a device (called by SPIDeviceProxy::init())
SPIBusHandle registerDevice(uint8_t clock_pin, uint8_t data_pin, void* controller);
// Unregister a device (called by SPIDeviceProxy destructor)
void unregisterDevice(SPIBusHandle handle);
// Initialize all buses (called on first FastLED.show())
bool initialize();
// Transmit data for a specific device/lane
void transmit(SPIBusHandle handle, const uint8_t* data, size_t length);
// Finalize transmission (triggers actual hardware transmission when all lanes ready)
void finalizeTransmission(SPIBusHandle handle);
// Query bus information
const SPIBusInfo* getBusInfo(uint8_t bus_id) const;
bool isDeviceEnabled(SPIBusHandle handle) const;
// Cleanup (called by destructor)
void reset();
private:
struct SPIBusInfo {
SPIBusType bus_type; // SINGLE_SPI, DUAL_SPI, QUAD_SPI
uint8_t clock_pin;
uint8_t data_pins[4]; // Up to 4 data lanes
uint8_t num_lanes;
uint8_t reference_count; // Number of active devices
bool initialized;
// Backend hardware (only one is active)
SPIQuad* quad_controller; // For QUAD_SPI
SPIDual* dual_controller; // For DUAL_SPI (future)
// Per-lane buffering
struct LaneBuffer {
const uint8_t* buffer;
size_t size;
} lanes[4];
uint8_t ready_count; // How many lanes have data ready
};
fl::vector<SPIBusInfo> mBuses; // All registered buses
bool mInitialized;
};
// Global singleton accessor
SPIBusManager& getSPIBusManager();
All production code uses the global singleton:
// In SPIDeviceProxy::init()
mBusManager = &getSPIBusManager(); // Get global instance
This ensures all devices across different LED controllers share the same bus manager.
Implementation:
// Singleton accessor (thread-safe in C++11)
inline SPIBusManager& getSPIBusManager() {
static SPIBusManager instance; // Created once, lives forever
return instance;
}
Note: Unit tests create local instances for isolation, avoiding global state pollution during testing.
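The difference between the shared singleton and a test-local instance can be seen in a small sketch. `BusManagerSketch` is a hypothetical stand-in that only counts devices; it is not the real `SPIBusManager`.

```cpp
#include <cassert>

// Hypothetical stand-in for the bus manager: only tracks a device count.
struct BusManagerSketch {
    int device_count = 0;
    void registerDevice() { ++device_count; }
    void unregisterDevice() {
        assert(device_count > 0);
        --device_count;
    }
};

// Function-local static: constructed on first call, shared by every caller.
inline BusManagerSketch& getBusManagerSketch() {
    static BusManagerSketch instance;
    return instance;
}
```

Every call to `getBusManagerSketch()` returns the same object, while a `BusManagerSketch` declared inside a test keeps its own count and leaves the global one untouched.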
enum class SPIBusType {
SOFT_SPI, // Software bit-bang (fallback)
SINGLE_SPI, // 1 device, hardware SPI (no aggregation)
DUAL_SPI, // 2 devices, hardware dual-lane SPI
QUAD_SPI // 3-4 devices, hardware quad-lane SPI
};
Promotion Logic:
Device count on shared clock pin → Bus type
1 device → SINGLE_SPI (ESP32SPIOutput)
2 devices → DUAL_SPI (SPIDual, if supported)
3 devices → QUAD_SPI (SPIQuad, uses 3 of 4 lanes)
4 devices → QUAD_SPI (SPIQuad, all 4 lanes)
5+ devices → ERROR (disable conflicting devices, warn user)
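As a compact restatement of the table above, the mapping from device count to bus type might look like the sketch below. The names `BusTypeSketch`, `selectBusType`, and the `UNSUPPORTED` value are hypothetical, and the sketch leaves out the "if supported" platform check.

```cpp
#include <cstdint>

// Hypothetical mirror of the bus types; UNSUPPORTED models the error row.
enum class BusTypeSketch : uint8_t { SINGLE_SPI, DUAL_SPI, QUAD_SPI, UNSUPPORTED };

// Maps the number of devices sharing one clock pin to a bus type.
constexpr BusTypeSketch selectBusType(unsigned device_count) {
    return device_count == 1 ? BusTypeSketch::SINGLE_SPI
         : device_count == 2 ? BusTypeSketch::DUAL_SPI
         : (device_count == 3 || device_count == 4) ? BusTypeSketch::QUAD_SPI
         : BusTypeSketch::UNSUPPORTED;  // 0 or 5+ devices
}
```

Because the function is `constexpr`, the mapping can be checked at compile time with `static_assert`.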
Standard SPI (1 data line):
Clock: ──┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌──
└─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘
MOSI: ──1───0───1───1───0───0───1───0── (8 clocks = 1 byte)
Dual-SPI (2 data lines):
Clock: ──┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌──
└─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘
D0: ──1───0───1───1───0───0───1───0── Strip 1 data
D1: ──0───1───1───0───0───1───1───0── Strip 2 data
(8 clocks = 2 bytes, one per strip)
Quad-SPI (4 data lines):
Clock: ──┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌──
└─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘
D0: ──1───1───0───0───1───1───0───0── Strip 1 data
D1: ──0───0───1───1───0───0───1───1── Strip 2 data
D2: ──1───1───1───1───0───0───0───0── Strip 3 data
D3: ──0───1───1───0───0───1───1───0── Strip 4 data
(8 clocks = 4 bytes, one per strip)
All LED strips share a single clock line but have separate data lines:
Quad-SPI Wiring:
ESP32
CLK (GPIO 18) ──┬────────> Strip 1 Clock
├────────> Strip 2 Clock
├────────> Strip 3 Clock
└────────> Strip 4 Clock
D0 (GPIO 23) ──────────> Strip 1 Data (MOSI/D0)
D1 (GPIO 19) ──────────> Strip 2 Data (MISO/D1)
D2 (GPIO 22) ──────────> Strip 3 Data (WP/D2)
D3 (GPIO 21) ──────────> Strip 4 Data (HD/D3)
Dual-SPI Wiring:
ESP32
CLK (GPIO 18) ──┬────────> Strip 1 Clock
└────────> Strip 2 Clock
D0 (GPIO 23) ──────────> Strip 1 Data (MOSI/D0)
D1 (GPIO 19) ──────────> Strip 2 Data (MISO/D1)
Location: src/platforms/shared/spi_transposer.h (Unified)
Each LED strip has its own independent data stream:
Lane 0: [0xAB, 0xCD, 0xEF, ...] → Strip 1
Lane 1: [0x12, 0x34, 0x56, ...] → Strip 2
Lane 2: [0x78, 0x9A, 0xBC, ...] → Strip 3
Lane 3: [0xDE, 0xF0, 0x11, ...] → Strip 4
But Quad-SPI hardware sends 4 bits per clock cycle (1 bit per data line):
Clock cycle 1: D3=bit7_lane3, D2=bit7_lane2, D1=bit7_lane1, D0=bit7_lane0
Clock cycle 2: D3=bit6_lane3, D2=bit6_lane2, D1=bit6_lane1, D0=bit6_lane0
...
Clock cycle 8: D3=bit0_lane3, D2=bit0_lane2, D1=bit0_lane1, D0=bit0_lane0
The transposer interleaves bits from all lanes into output bytes.
Input (4 lanes, 1 byte each):
Lane 0: 0xAB = 10101011
Lane 1: 0x12 = 00010010
Lane 2: 0xEF = 11101111
Lane 3: 0x78 = 01111000
Output (4 interleaved bytes):
Each output byte contains 2 bits from each lane (MSB first):
Byte format: [L3b7 L3b6 L2b7 L2b6 L1b7 L1b6 L0b7 L0b6]
Out[0] = 0b01110010 (bits 7:6 from each lane)
= L3[7:6]=01, L2[7:6]=11, L1[7:6]=00, L0[7:6]=10
Out[1] = 0b11100110 (bits 5:4 from each lane)
= L3[5:4]=11, L2[5:4]=10, L1[5:4]=01, L0[5:4]=10
Out[2] = 0b10110010 (bits 3:2 from each lane)
= L3[3:2]=10, L2[3:2]=11, L1[3:2]=00, L0[3:2]=10
Out[3] = 0b00111011 (bits 1:0 from each lane)
= L3[1:0]=00, L2[1:0]=11, L1[1:0]=10, L0[1:0]=11
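The packing in this worked example can be checked mechanically. The sketch below assumes the layout used by the example (each output byte holds a 2-bit chunk from every lane, lane 3 in the top bits and lane 0 in the bottom bits); `QuadBytes`, `packChunk`, and `interleave4` are hypothetical names.

```cpp
#include <cstdint>

// Four output bytes produced from one input byte per lane.
struct QuadBytes {
    uint8_t out[4];
};

// Packs the 2-bit chunk starting at bit `shift` of each lane into one byte:
// lane 3 -> bits 7:6, lane 2 -> bits 5:4, lane 1 -> bits 3:2, lane 0 -> bits 1:0.
constexpr uint8_t packChunk(uint8_t l0, uint8_t l1, uint8_t l2, uint8_t l3, int shift) {
    return static_cast<uint8_t>((((l3 >> shift) & 0x03) << 6) |
                                (((l2 >> shift) & 0x03) << 4) |
                                (((l1 >> shift) & 0x03) << 2) |
                                ((l0 >> shift) & 0x03));
}

// MSB-first: out[0] carries bits 7:6 of every lane, out[3] carries bits 1:0.
constexpr QuadBytes interleave4(uint8_t l0, uint8_t l1, uint8_t l2, uint8_t l3) {
    return QuadBytes{{packChunk(l0, l1, l2, l3, 6), packChunk(l0, l1, l2, l3, 4),
                      packChunk(l0, l1, l2, l3, 2), packChunk(l0, l1, l2, l3, 0)}};
}
```

For the inputs 0xAB, 0x12, 0xEF, 0x78 on lanes 0 to 3, this packing gives 0x72, 0xE6, 0xB2, 0x3B.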
Input (2 lanes, 1 byte each):
Lane 0: 0xAB = 10101011 (hi=0xA, lo=0xB)
Lane 1: 0x12 = 00010010 (hi=0x1, lo=0x2)
Output (2 interleaved bytes):
Each output byte contains 1 nibble from each lane:
Out[0] = 0x1A (hi nibbles: Lane1=0x1, Lane0=0xA)
Out[1] = 0x2B (lo nibbles: Lane1=0x2, Lane0=0xB)
Quad-SPI (Direct 2-bit extraction):
void interleave_byte_optimized(uint8_t* dest,
uint8_t a, uint8_t b,
uint8_t c, uint8_t d) {
// Extract 2-bit chunks directly (faster than bit-by-bit)
dest[0] = ((d & 0xC0)) | ((c & 0xC0) >> 2) |
((b & 0xC0) >> 4) | ((a & 0xC0) >> 6);
dest[1] = ((d & 0x30) << 2) | ((c & 0x30)) |
((b & 0x30) >> 2) | ((a & 0x30) >> 4);
dest[2] = ((d & 0x0C) << 4) | ((c & 0x0C) << 2) |
((b & 0x0C)) | ((a & 0x0C) >> 2);
dest[3] = ((d & 0x03) << 6) | ((c & 0x03) << 4) |
((b & 0x03) << 2) | ((a & 0x03));
}
Dual-SPI (Nibble extraction):
void interleave_byte_optimized(uint8_t* dest, uint8_t a, uint8_t b) {
// Each output byte = 4 bits from each lane
dest[0] = ((a >> 4) & 0x0F) | (((b >> 4) & 0x0F) << 4); // Hi nibbles
dest[1] = (a & 0x0F) | ((b & 0x0F) << 4); // Lo nibbles
}
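A quick way to gain confidence in an interleaver is a round trip: pack, unpack, and compare. The sketch below does this for the Dual-SPI nibble layout shown above. `DualPair`, `interleave2`, and `deinterleave2` are hypothetical names, and the unpacking is an illustration rather than the test stub's actual `extractLanes()`.

```cpp
#include <cstdint>

// Two bytes: either the interleaved output pair or the recovered lane pair.
struct DualPair {
    uint8_t first;
    uint8_t second;
};

// Forward packing, same layout as the Dual-SPI function above:
// first  = [lane1 hi nibble | lane0 hi nibble]
// second = [lane1 lo nibble | lane0 lo nibble]
constexpr DualPair interleave2(uint8_t a, uint8_t b) {
    return DualPair{static_cast<uint8_t>((a >> 4) | (b & 0xF0)),
                    static_cast<uint8_t>((a & 0x0F) | ((b & 0x0F) << 4))};
}

// Inverse: rebuilds lane 0 (first) and lane 1 (second) from the two output bytes.
constexpr DualPair deinterleave2(uint8_t out0, uint8_t out1) {
    return DualPair{static_cast<uint8_t>(((out0 & 0x0F) << 4) | (out1 & 0x0F)),
                    static_cast<uint8_t>((out0 & 0xF0) | (out1 >> 4))};
}
```

With lane bytes 0xAB and 0x12, packing yields 0x1A and 0x2B, and unpacking those two bytes returns the original pair.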
| Operation | Quad-SPI (4 lanes) | Dual-SPI (2 lanes) |
|---|---|---|
| Interleave 100 LEDs | ~50-100µs | ~25-50µs |
| Interleave 500 LEDs | ~250µs | ~125µs |
Note: Interleaving is CPU-bound but runs once per frame. Transmission is DMA-driven (0% CPU).
LED strips often have different lengths:
Strip 1: 60 LEDs → 240 bytes
Strip 2: 100 LEDs → 400 bytes
Strip 3: 80 LEDs → 320 bytes
Strip 4: 120 LEDs → 480 bytes (LONGEST)
If we transmit without padding:
Time →
Strip 1: ████████████░░░░░░░░░░░░ (finishes first, latches early)
Strip 2: ████████████████████░░░░ (finishes late)
Strip 3: ████████████████░░░░░░░░ (finishes early)
Strip 4: ████████████████████████ (finishes last)
Problem: Strips latch at different times → visual tearing
Shorter strips are padded at the beginning with invisible black LED frames:
Strip 1: ░░░░░░░░░░░░████████████ (60 real + padding to 120)
Strip 2: ░░░░████████████████████ (100 real + padding to 120)
Strip 3: ░░░░░░░░████████████████ (80 real + padding to 120)
Strip 4: ████████████████████████ (120 real, no padding)
All strips finish simultaneously → synchronized latch ✓
Each chipset has its own black LED format:
APA102/SK9822 (4 bytes per LED):
{0xE0, 0x00, 0x00, 0x00} // Brightness=0, RGB=0
LPD8806 (3 bytes per LED, 7-bit GRB + MSB=1):
{0x80, 0x80, 0x80} // G=0, R=0, B=0 (MSB=1 required)
WS2801 (3 bytes per LED, RGB):
{0x00, 0x00, 0x00} // R=0, G=0, B=0
P9813 (4 bytes per LED, flag + BGR):
{0xFF, 0x00, 0x00, 0x00} // Flag byte + B=0, G=0, R=0
The transposer repeats the padding frame to fill shorter lanes:
static uint8_t getLaneByte(const LaneData& lane,
size_t byte_idx,
size_t max_size) {
size_t lane_size = lane.payload.size();
if (byte_idx >= max_size) {
return 0x00; // Out of bounds
}
// Calculate padding needed
size_t padding_bytes = max_size - lane_size;
if (byte_idx < padding_bytes) {
// In padding region - repeat padding frame
size_t frame_size = lane.padding_frame.size();
if (frame_size == 0) {
return 0x00; // No padding frame supplied; avoid modulo by zero
}
return lane.padding_frame[byte_idx % frame_size];
} else {
// In data region
return lane.payload[byte_idx - padding_bytes];
}
}
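To see the front-padding rule on concrete data, here is a self-contained sketch that materializes a padded lane with `std::vector`. The helper name `padLaneFront` is hypothetical, and the real transposer reads bytes on demand rather than building a padded copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: left-pad a lane to max_size by repeating the chipset's
// black-LED frame, then append the real payload.
std::vector<uint8_t> padLaneFront(const std::vector<uint8_t>& payload,
                                  const std::vector<uint8_t>& padding_frame,
                                  std::size_t max_size) {
    assert(payload.size() <= max_size);
    std::vector<uint8_t> out;
    out.reserve(max_size);
    const std::size_t padding_bytes = max_size - payload.size();
    for (std::size_t i = 0; i < padding_bytes; ++i) {
        // Fall back to zero bytes when no padding frame is supplied.
        out.push_back(padding_frame.empty() ? uint8_t{0x00}
                                            : padding_frame[i % padding_frame.size()]);
    }
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

Padding a 4-byte APA102 payload to 12 bytes with the frame {0xE0, 0x00, 0x00, 0x00} produces two black frames followed by the payload, so the real data ends at the same byte offset as the longest lane.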
Understanding ESP32 SPI peripheral allocation is critical for LED control implementation:
| Platform | SPI0 | SPI1 | SPI2 | SPI3 | Available for LEDs | FastLED Uses |
|---|---|---|---|---|---|---|
| ESP32 (classic) | Flash cache | Flash | ✅ General | ✅ General | SPI2 + SPI3 (2 hosts) | ✅ SPI Engine |
| ESP32-S2 | Flash cache | Flash | ✅ General | ✅ General | SPI2 + SPI3 (2 hosts) | ✅ SPI Engine |
| ESP32-S3 | Flash cache | Flash | ✅ General | ✅ General | SPI2 + SPI3 (2 hosts) | ✅ SPI Engine |
| ESP32-C3 | Flash cache | Flash | ✅ General | ❌ N/A | SPI2 only (1 host) | ⚠️ SPI Engine (limited) |
| ESP32-C6 | Flash cache | Flash | ✅ General | ❌ N/A | SPI2 only (1 host) | ❌ RMT5 (SPI not used) |
| ESP32-H2 | Flash cache | Flash | ✅ General | ❌ N/A | SPI2 only (1 host) | ❌ RMT5 (SPI not used) |
| ESP32-P4 | Flash cache | Flash | ✅ General | ✅ General | SPI2 + SPI3 (2 hosts) | ✅ SPI Engine + Octal |
Key Points:
- spi_bus_initialize() documentation explicitly states "SPI0/1 is not supported" for general use

| Platform | Dual-SPI | Quad-SPI | Buses | Notes |
|---|---|---|---|---|
| ESP32 | ✅ | ✅ | 2 (HSPI, VSPI) | Full support |
| ESP32-S2 | ✅ | ✅ | 2 | Full support |
| ESP32-S3 | ✅ | ✅ | 2 | Full support |
| ESP32-C3 | ✅ | ❌ | 1 (SPI2) | Dual-SPI only (2 lanes max) |
| ESP32-C2 | ✅ | ❌ | 1 (SPI2) | Dual-SPI only (2 lanes max) |
| ESP32-C6 | ⚠️ | ❌ | 1 (SPI2) | SPI2 available but not used - RMT5 preferred (better performance, preserves SPI2 for users) |
| ESP32-H2 | ⚠️ | ❌ | 1 (SPI2) | SPI2 available but not used - RMT5 preferred |
| ESP32-P4 | ✅ | ⚠️ | 2 | Supports Octal-SPI (8 lanes, future) |
| Teensy 4.0/4.1 | ✅ | ⚠️ | 3 (SPI, SPI1, SPI2) | LPSPI supports dual/quad via WIDTH register; Quad mode requires data2/data3 pins (PCS2/PCS3) not exposed on standard boards. See LP_SPI.md for implementation details |
| Testing | ✅ | ✅ | N/A | Mock drivers for unit tests |
- Implement the SPIQuad interface (or SPIDual) for your platform
- Override SPIQuad::createInstances() with strong definition
- Update quad_spi_platform.h

Example:
// src/platforms/rp2040/spi_quad_rp2040.cpp
#ifdef ARDUINO_ARCH_RP2040
class SPIQuadRP2040 : public SPIQuad {
// Implement interface using RP2040 PIO
};
fl::vector<SPIQuad*> SPIQuad::createInstances() {
static SPIQuadRP2040 controller0(0, "PIO0");
return {&controller0};
}
#endif
Hardware: ESP32 @ 240 MHz, 40 MHz SPI clock
Sequential Software SPI (baseline):
Strip 1: 2.0 ms
Strip 2: 2.0 ms Total: 8.0 ms
Strip 3: 2.0 ms (4 strips × 2.0 ms each)
Strip 4: 2.0 ms
CPU usage: 100% during transmission
Quad-SPI Hardware DMA:
Transpose: 0.08 ms (CPU, one-time)
Transmit: 0.08 ms (DMA, zero CPU)
Total: 0.16 ms
Speedup: 8.0 / 0.16 = 50× faster
Effective: ~27× with frame overhead
CPU usage: 0% during transmission
Sequential Software SPI (baseline):
Strip 1: 2.0 ms
Strip 2: 2.0 ms Total: 4.0 ms
CPU usage: 100% during transmission
Dual-SPI Hardware DMA:
Transpose: 0.04 ms (CPU, one-time)
Transmit: 0.16 ms (DMA, zero CPU)
Total: 0.20 ms
Speedup: 4.0 / 0.20 = 20× faster
CPU usage: 0% during transmission
| Operation | Quad-SPI (4×100 LEDs) | Dual-SPI (2×100 LEDs) |
|---|---|---|
| Bit-interleaving | 50-100µs (CPU) | 25-50µs (CPU) |
| DMA transmission | ~80µs (0% CPU) | ~160µs (0% CPU) |
| Total | ~160µs | ~200µs |
| Speedup | 50× | 20× |
Quad-SPI (4 lanes):
| LEDs per Strip | Transpose | Transmit @ 40MHz | Total |
|---|---|---|---|
| 50 | 25µs | 40µs | 65µs |
| 100 | 50µs | 80µs | 130µs |
| 200 | 100µs | 160µs | 260µs |
| 500 | 250µs | 400µs | 650µs |
Dual-SPI (2 lanes):
| LEDs per Strip | Transpose | Transmit @ 40MHz | Total |
|---|---|---|---|
| 50 | 12µs | 80µs | 92µs |
| 100 | 25µs | 160µs | 185µs |
| 200 | 50µs | 320µs | 370µs |
| 500 | 125µs | 800µs | 925µs |
Note: Transmission is DMA-driven (0% CPU), so CPU can do other work during transmission.
| Clock Speed | Transmission Time (4×100 LEDs) | Notes |
|---|---|---|
| 10 MHz | 320µs | Safe for long wires |
| 20 MHz | 160µs | Default, good balance |
| 40 MHz | 80µs | Fast, short wires only |
| 80 MHz | 40µs | Very fast, signal integrity issues |
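The transmission times in this table follow from simple arithmetic if each lane clocks out one bit per SPI clock and every LED takes 4 bytes (the APA102 size used earlier). The sketch below works under those assumptions and ignores start and end frames; `transmitMicros` is a hypothetical helper name.

```cpp
#include <cstdint>

// Estimated time in microseconds to clock out one strip's pixel data on its own lane.
// Lanes run in parallel, so the strip count does not appear in the formula.
constexpr uint32_t transmitMicros(uint32_t leds_per_strip, uint32_t clock_hz,
                                  uint32_t bytes_per_led = 4) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(leds_per_strip) * bytes_per_led * 8u * 1000000u) / clock_hz);
}
```

For 100 LEDs this gives 3200 clock cycles, which is 320µs at 10 MHz and 80µs at 40 MHz, in line with the rows above.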
Just add multiple strips with the same clock pin - FastLED automatically detects and enables multi-lane SPI:
#include <FastLED.h>
#define CLOCK_PIN 18
#define NUM_LEDS 100
CRGB leds_strip1[NUM_LEDS];
CRGB leds_strip2[NUM_LEDS];
CRGB leds_strip3[NUM_LEDS];
CRGB leds_strip4[NUM_LEDS];
void setup() {
// Add 4 strips sharing clock pin 18
// FastLED auto-detects and enables Quad-SPI (4 parallel lanes)
FastLED.addLeds<APA102, 23, CLOCK_PIN>(leds_strip1, NUM_LEDS);
FastLED.addLeds<APA102, 19, CLOCK_PIN>(leds_strip2, NUM_LEDS);
FastLED.addLeds<APA102, 22, CLOCK_PIN>(leds_strip3, NUM_LEDS);
FastLED.addLeds<APA102, 21, CLOCK_PIN>(leds_strip4, NUM_LEDS);
}
void loop() {
// Set colors independently
fill_rainbow(leds_strip1, NUM_LEDS, 0, 7);
fill_rainbow(leds_strip2, NUM_LEDS, 64, 7);
fill_rainbow(leds_strip3, NUM_LEDS, 128, 7);
fill_rainbow(leds_strip4, NUM_LEDS, 192, 7);
// All 4 strips transmit in parallel (hardware DMA)
FastLED.show();
}
#include <FastLED.h>
#define CLOCK_PIN 18
#define NUM_LEDS 100
CRGB leds_strip1[NUM_LEDS];
CRGB leds_strip2[NUM_LEDS];
void setup() {
// Add 2 strips sharing clock pin 18
// FastLED auto-detects and enables Dual-SPI (2 parallel lanes)
FastLED.addLeds<APA102, 23, CLOCK_PIN>(leds_strip1, NUM_LEDS);
FastLED.addLeds<APA102, 19, CLOCK_PIN>(leds_strip2, NUM_LEDS);
}
void loop() {
// Set colors independently
fill_rainbow(leds_strip1, NUM_LEDS, 0, 7);
fill_rainbow(leds_strip2, NUM_LEDS, 128, 7);
// Both strips transmit in parallel (hardware DMA)
FastLED.show();
}
For low-level control:
#include "platforms/shared/spi_hw_4.h"
#include "platforms/shared/spi_transposer.h"
void setup() {
// Get available Quad-SPI controllers
const auto& controllers = fl::SpiHw4::getAll();
if (controllers.empty()) {
Serial.println("No Quad-SPI hardware available");
return;
}
fl::SpiHw4* quad = controllers[0];
// Configure hardware
fl::SpiHw4::Config config;
config.bus_num = 2; // HSPI
config.clock_speed_hz = 40000000; // 40 MHz
config.clock_pin = 18;
config.data0_pin = 23;
config.data1_pin = 19;
config.data2_pin = 22;
config.data3_pin = 21;
if (!quad->begin(config)) {
Serial.println("Failed to initialize Quad-SPI");
return;
}
// Prepare lane data
fl::vector<uint8_t> lane0_data = {0xFF, 0x00, 0x00}; // Red LED
fl::vector<uint8_t> lane1_data = {0xFF, 0xFF, 0x00}; // Green LED
fl::optional<fl::SPITransposer::LaneData> lanes[4];
lanes[0] = fl::SPITransposer::LaneData{
fl::span<const uint8_t>(lane0_data.data(), lane0_data.size()),
fl::span<const uint8_t>() // No padding
};
lanes[1] = fl::SPITransposer::LaneData{
fl::span<const uint8_t>(lane1_data.data(), lane1_data.size()),
fl::span<const uint8_t>()
};
// Transpose and transmit
size_t max_size = fl::fl_max(lane0_data.size(), lane1_data.size());
fl::vector<uint8_t> output(max_size * 4);
const char* error = nullptr;
if (fl::SPITransposer::transpose4(lanes[0], lanes[1], lanes[2], lanes[3],
fl::span<uint8_t>(output), &error)) {
quad->transmit(fl::span<const uint8_t>(output));
quad->waitComplete();
} else {
Serial.printf("Transpose failed: %s\n", error);
}
}
Check if multi-lane SPI is available:
#include "platforms/quad_spi_platform.h"
#if FASTLED_HAS_QUAD_SPI
// Compile-time check (API availability)
Serial.println("Quad-SPI API is available");
#endif
// Runtime check (actual hardware)
const auto& controllers = fl::SpiHw4::getAll();
if (!controllers.empty()) {
Serial.printf("Found %u Quad-SPI controllers:\n", static_cast<unsigned>(controllers.size()));
for (const auto& ctrl : controllers) {
Serial.printf(" - %s (bus %d)\n", ctrl->getName(), ctrl->getBusId());
}
}
Caller Responsibility:
- (max_size * 4 bytes for Quad, * 2 for Dual)
- (waitComplete())
- transpose() function

Platform Responsibility:
- end() or destructor

SPITransposer:
- transpose2(), transpose4(), etc. simultaneously

SpiHw2/SpiHw4 Implementations:
- transmit() calls
- waitComplete() must be called before next transmission

SPIBusManager:
Transpose Errors:
const char* error = nullptr;
if (!transpose(lanes, max_size, output, &error)) {
// error points to static string (no need to free)
Serial.printf("Error: %s\n", error);
}
Common errors:
- "Output buffer too small" - Need max_size * 4 bytes (Quad) or * 2 (Dual)
- "Invalid max_size (zero)" - Must specify non-zero size
- "All lanes are empty" - At least one lane needs data

Hardware Errors:
if (!quad->begin(config)) {
// Check config parameters
// - Invalid bus_num (must be 2 or 3)
// - Invalid pins
// - Bus already in use
}
if (!quad->transmit(buffer)) {
// DMA queue full or transmission error
quad->waitComplete(); // Clear pending
quad->transmit(buffer); // Retry
}
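The single retry above can be generalized into a bounded loop so a persistent failure does not go unnoticed. This is a sketch under assumptions: `transmitWithRetry` is a hypothetical helper, it relies only on a driver exposing `transmit(buffer)` returning `bool` and `waitComplete()`, and `FlakyDriver` is a test double, not a FastLED class.

```cpp
#include <cassert>

// Tries transmit() up to max_attempts times, draining pending work between attempts.
template <typename Driver, typename Buffer>
bool transmitWithRetry(Driver& driver, const Buffer& buffer, int max_attempts) {
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (driver.transmit(buffer)) {
            return true;
        }
        driver.waitComplete();  // Clear pending transfers before the next attempt.
    }
    return false;  // Caller decides how to report the failure.
}

// Hypothetical test double: fails a fixed number of times, then succeeds.
struct FlakyDriver {
    int failures_left;
    int waits;

    bool transmit(const int& /*buffer*/) {
        if (failures_left > 0) {
            --failures_left;
            return false;
        }
        return true;
    }

    void waitComplete() { ++waits; }
};
```

A driver that fails twice succeeds on the third attempt; one that keeps failing makes the helper return false after `max_attempts`, so the caller can log or skip the frame.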
ESP32:
Max Transfer Size:
- Config::max_transfer_sz

Platforms override factory via weak linkage:
Default (no hardware):
// src/platforms/shared/spi_hw_4.cpp
FL_LINK_WEAK
fl::vector<SpiHw4*> SpiHw4::createInstances() {
return {}; // Empty vector
}
ESP32 override (strong definition):
// src/platforms/esp/32/spi_hw_4_esp32.cpp
fl::vector<SpiHw4*> SpiHw4::createInstances() {
static SpiHw4ESP32 controller2(2, "HSPI");
static SpiHw4ESP32 controller3(3, "VSPI");
return {&controller2, &controller3};
}
Linker picks strong definition when ESP32 code is linked.
Quad-SPI Mock: src/platforms/stub/spi_4_stub.h
Dual-SPI Mock: src/platforms/stub/spi_2_stub.h
Test-only implementation that captures transmissions:
#ifdef FASTLED_TESTING
auto controllers = fl::SpiHw4::getAll();
fl::SpiHw4Stub* stub = fl::toStub(controllers[0]);
// Perform transmission
stub->transmit(data);
// Inspect captured data
const auto& transmitted = stub->getLastTransmission();
REQUIRE(transmitted.size() == expected_size);
// De-interleave to verify per-lane data
auto lanes = stub->extractLanes(4, bytes_per_lane);
REQUIRE(lanes[0][0] == 0xAB);
REQUIRE(lanes[1][0] == 0x12);
#endif
Unit Tests:
- tests/test_quad_spi.cpp - Quad-SPI transposer and hardware interface
- tests/test_spi_bus_manager.cpp - Bus manager lifecycle and routing
- tests/test_dual_spi.cpp - Dual-SPI (future)

Test Categories:
Bit-interleaving correctness:
Bus Manager:
Hardware interface:
Integration:
# All tests
uv run test.py
# Specific test
uv run test.py quad_spi
uv run test.py spi_bus_manager
# With QEMU (ESP32 hardware emulation)
uv run test.py --qemu esp32s3
Enable verbose logging:
#define FASTLED_DEBUG 1
#include <FastLED.h>
// Prints diagnostic info about multi-lane SPI initialization
Inspect transmitted data:
// In testing environment
fl::SpiHw4Stub* stub = fl::toStub(quad);
const auto& data = stub->getLastTransmission();
for (size_t i = 0; i < data.size(); i += 4) {
printf("Interleaved[%zu]: %02X %02X %02X %02X\n",
i/4, data[i], data[i+1], data[i+2], data[i+3]);
}
Verify de-interleaving:
auto extracted = stub->extractLanes(4, bytes_per_lane);
for (uint8_t lane = 0; lane < 4; ++lane) {
printf("Lane %d: ", lane);
for (auto byte : extracted[lane]) {
printf("%02X ", byte);
}
printf("\n");
}
FastLED's Advanced SPI system provides intelligent, automatic parallel LED control.
Key Advantages:
For Users: Just add strips with shared clock pins - FastLED handles everything automatically!
For Developers: Extensible architecture supports new platforms, chipsets, and higher lane counts (Octal-SPI).
┌─────────────────────────────────────────────────────────────────────┐
│ User Code │
│ FastLED.addLeds<APA102, PIN, CLK>() × N │
└──────────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────────┐
│ LED Controller (APA102Controller) │
│ • Manages LED buffer (CRGB array) │
│ • Calls chipset showPixels() │
└──────────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────────┐
│ SPIDeviceProxy<DATA, CLOCK, SPEED> │
│ • Template-based per-controller proxy │
│ • init() → registers with Bus Manager │
│ • writeByte/writeWord() → buffers or passthrough │
│ • finalizeTransmission() → triggers bus manager │
└──────────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────────┐
│ SPIBusManager (Singleton) │
│ • registerDevice(clock, data) → assigns bus_id + lane_id │
│ • Groups devices by shared clock pin │
│ • Determines bus type: SINGLE / DUAL / QUAD │
│ • transmit(handle, data) → stores per-lane buffers │
│ • finalizeTransmission() → triggers interleave + hardware │
│ • Reference counting → cleanup on last unregister │
└──────────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────┼──────────────────────┐
│ │ │
┌───────▼────────┐ ┌──────────▼─────────┐ ┌────────▼───────────────┐
│ SINGLE_SPI │ │ DUAL_SPI │ │ QUAD_SPI │
│ (1 device) │ │ (2 devices) │ │ (3-4 devices) │
├────────────────┤ ├────────────────────┤ ├────────────────────────┤
│ ESP32SPIOutput │ │ SPITransposer │ │ SPITransposer │
│ (direct HW) │ │ (transpose2) │ │ (transpose4) │
│ │ │ ↓ │ │ ↓ │
│ │ │ SpiHw2 driver │ │ SpiHw4 driver │
└────────┬───────┘ └──────────┬─────────┘ └────────┬───────────────┘
│ │ │
└─────────────────────┴──────────────────────┘
│
┌──────────────────────────────▼──────────────────────────────────────┐
│ ESP32 SPI Peripheral (SPI2/SPI3) │
│ • Hardware DMA-driven transmission │
│ • 0% CPU usage during transmission │
│ • Supports Single/Dual/Quad/Octal modes │
└─────────────────────────────────────────────────────────────────────┘
- src/platforms/shared/spi_manager.h
- src/platforms/esp/32/spi_device_proxy.h
- src/platforms/shared/spi_hw_4.h
- src/platforms/shared/spi_hw_2.h
- src/platforms/shared/spi_transposer.h
- src/platforms/esp/32/spi_hw_4_esp32.cpp
- src/platforms/esp/32/spi_hw_2_esp32.cpp
- src/platforms/esp/32/fastspi_esp32.h
- src/platforms/arm/teensy/teensy4_common/spi_hw_4_mxrt1062.cpp
- src/platforms/arm/teensy/teensy4_common/spi_hw_2_mxrt1062.cpp
- src/platforms/arm/mxrt1062/fastspi_arm_mxrt1062.h
- src/platforms/arm/mxrt1062/spi_device_proxy.h
- LP_SPI.md (quad-mode pin configuration details)
- src/platforms/quad_spi_platform.h
- tests/test_spi_bus_manager.cpp
- tests/test_quad_spi.cpp
- tests/test_dual_spi.cpp
- src/platforms/stub/spi_4_stub.h
- src/platforms/stub/spi_2_stub.h
- examples/SpecialDrivers/ESP/QuadSPI/Basic/QuadSPI_Basic.ino
- src/platforms/README_SPI_ADVANCED.md (unified guide)