Back to Fastled

Automatic Parallel Output for RP2040/RP2350

src/platforms/arm/rp/rpcommon/PARALLEL.md

3.10.319.7 KB
Original Source

Automatic Parallel Output for RP2040/RP2350

Overview

The automatic parallel output driver enables seamless parallel LED control on RP2040/RP2350 platforms using the standard FastLED API. Unlike the manual ParallelClocklessController, this driver:

  • ✅ Works with standard FastLED.addLeds() calls
  • ✅ Automatically detects consecutive GPIO pins
  • ✅ Groups them for efficient parallel output (2, 4, or 8 pins)
  • ✅ Falls back to sequential output for non-consecutive pins
  • ✅ Uses the same PIO/DMA resources as manual setup

Quick Start

Enable the Driver

Define FASTLED_RP2040_CLOCKLESS_PIO_AUTO before including FastLED.h:

cpp
#define FASTLED_RP2040_CLOCKLESS_PIO_AUTO 1
#include <FastLED.h>

#define NUM_LEDS 100

// Standard FastLED arrays
CRGB leds1[NUM_LEDS];
CRGB leds2[NUM_LEDS];
CRGB leds3[NUM_LEDS];
CRGB leds4[NUM_LEDS];

void setup() {
    // Just use standard addLeds() - automatic parallel grouping!
    FastLED.addLeds<WS2812, 2, GRB>(leds1, NUM_LEDS);  // GPIO 2
    FastLED.addLeds<WS2812, 3, GRB>(leds2, NUM_LEDS);  // GPIO 3 (grouped with 2)
    FastLED.addLeds<WS2812, 4, GRB>(leds3, NUM_LEDS);  // GPIO 4 (grouped with 2-3)
    FastLED.addLeds<WS2812, 5, GRB>(leds4, NUM_LEDS);  // GPIO 5 (grouped with 2-4)
}

void loop() {
    // Update LEDs
    fill_rainbow(leds1, NUM_LEDS, millis() / 10);
    // Standard FastLED.show() - all 4 strips output in parallel!
    FastLED.show();
}

How It Works

Automatic Pin Grouping

When FastLED.show() is called, the driver:

  1. Collects all registered pins from addLeds() calls
  2. Sorts pins numerically (e.g., [5, 2, 4, 3] → [2, 3, 4, 5])
  3. Detects consecutive runs:
    • [2, 3, 4, 5] → Single 4-pin group
    • [2, 3, 5, 6] → Two 2-pin groups
    • [2, 5, 10] → Three single-pin groups (sequential fallback)
  4. Creates parallel groups for consecutive pins (2, 4, or 8 pins)
  5. Allocates PIO/DMA resources per group
  6. Outputs all groups (parallel or sequential)

Parallel Group Sizes

Consecutive PinsGroup SizePIO Output Mode
2 pins2-lane parallelBit-transposed
3 pins2-lane + 1 sequentialMixed
4 pins4-lane parallelBit-transposed
5-7 pins4-lane + fallbackMixed
8+ pins8-lane parallelBit-transposed

Non-consecutive pins fall back to sequential (non-parallel) output.

Example: Mixed Groups

cpp
// Pins: 2, 3, 4, 5, 10, 11, 15
FastLED.addLeds<WS2812, 2>(...);  // ┐
FastLED.addLeds<WS2812, 3>(...);  // ├─ Group 1: 4-lane parallel (GPIO 2-5)
FastLED.addLeds<WS2812, 4>(...);  // │
FastLED.addLeds<WS2812, 5>(...);  // ┘
FastLED.addLeds<WS2812, 10>(...); // ┬─ Group 2: 2-lane parallel (GPIO 10-11)
FastLED.addLeds<WS2812, 11>(...); // ┘
FastLED.addLeds<WS2812, 15>(...); // ── Group 3: Sequential (GPIO 15 alone)

Resources Used:

  • Group 1 (4-pin): 1 PIO state machine + 1 DMA channel
  • Group 2 (2-pin): 1 PIO state machine + 1 DMA channel
  • Group 3 (1-pin): Uses existing non-parallel driver (TODO: implement fallback)

Hardware Requirements

Pin Consecutiveness (Critical!)

RP2040/RP2350 PIO hardware requires consecutive GPIO pins for parallel output.

This is a hardware limitation of the PIO out pins, N instruction.

Valid Configurations:

GPIO 2-3   (2 pins)
GPIO 2-5   (4 pins)
GPIO 10-17 (8 pins)
GPIO 0-7   (8 pins)

Invalid Configurations:

GPIO 2, 4, 6, 8  (non-consecutive - will use sequential fallback)
GPIO 1, 3, 5     (non-consecutive - will use sequential fallback)

GPIO Pin Availability

  • RP2040: 30 GPIO pins (GPIO 0-29)
  • RP2350: 48 GPIO pins (GPIO 0-47, some reserved)

Avoid using:

  • GPIO 19-20 (USB UART on some boards)
  • GPIO 23-25 (SPI flash on Pico)
  • Pins used for I2C, SPI, UART if needed

Recommended consecutive ranges:

  • GPIO 2-9 (8 consecutive pins)
  • GPIO 10-17 (8 consecutive pins)
  • GPIO 20-27 (8 consecutive pins, avoid if using USB UART)

Performance

Bit Transposition Overhead

The driver transposes LED data from standard RGB format to bit-parallel format:

Group SizeTranspose Time (100 LEDs)CPU Overhead
2 strips~20 µs @ 133 MHz<1%
4 strips~35 µs @ 133 MHz~2%
8 strips~60 µs @ 133 MHz~3%

Transpose algorithms:

  • 8-strip: Hacker's Delight (optimized)
  • 4-strip: Nibble extraction
  • 2-strip: Bit extraction

Frame Rate

WS2812B timing limits:

  • 100 LEDs × 4 strips = ~12 ms per frame
  • Maximum ~83 FPS (timing limited, not CPU limited)

Practical frame rates:

  • 50 FPS: 20 ms delay (recommended)
  • 60 FPS: 16.7 ms delay
  • 83 FPS: No delay (WS2812 timing limit)

Memory Usage

Buffer allocation (RGB mode):

  • RectangularDrawBuffer: (max_leds × 3 bytes) × num_strips
    • Example: 100 LEDs × 4 strips = 1200 bytes
  • Transpose buffer: max_leds × 24 bytes per group
    • Example: 100 LEDs = 2400 bytes per group
  • Total for 4 strips × 100 LEDs: ~3600 bytes (1200 + 2400)

Buffer allocation (RGBW mode):

  • RectangularDrawBuffer: (max_leds × 4 bytes) × num_strips
    • Example: 100 LEDs × 4 strips = 1600 bytes
  • Transpose buffer: max_leds × 32 bytes per group
    • Example: 100 LEDs = 3200 bytes per group
  • Total for 4 strips × 100 LEDs: ~4800 bytes (1600 + 3200)

Memory location:

  • RP2040: Main SRAM (264 KB)
  • RP2350: Main SRAM (520 KB)
  • No PSRAM support (unlike ESP32-S3)

Resource Usage

Per parallel group:

  • 1 PIO state machine (from 8 available: 2 PIOs × 4 SMs)
  • 1 DMA channel (from 12 available)
  • ~32 PIO instructions for timing program

Example: 4-pin + 2-pin + 1-pin groups:

  • 2 PIO state machines (4-pin + 2-pin groups)
  • 2 DMA channels
  • 1 sequential fallback (1-pin group, uses existing driver)

Comparison: Manual vs Automatic

Manual Parallel Setup (Old Way)

cpp
#include <platforms/arm/rp/rp2040/clockless_arm_rp2040.h>

CRGB leds[4][100];

fl::ParallelClocklessController<
    2, 4,               // Base pin, 4 lanes
    400, 850, 50000,    // WS2812B timing
    GRB
> controller;

void setup() {
    for (int i = 0; i < 4; i++) {
        controller.addStrip(i, leds[i], 100);
    }
    controller.init();
}

void loop() {
    controller.showLeds(0xFF);  // NOT FastLED.show()!
}

Automatic Parallel Setup (New Way)

cpp
#define FASTLED_RP2040_CLOCKLESS_PIO_AUTO 1
#include <FastLED.h>

CRGB leds1[100], leds2[100], leds3[100], leds4[100];

void setup() {
    FastLED.addLeds<WS2812, 2, GRB>(leds1, 100);
    FastLED.addLeds<WS2812, 3, GRB>(leds2, 100);
    FastLED.addLeds<WS2812, 4, GRB>(leds3, 100);
    FastLED.addLeds<WS2812, 5, GRB>(leds4, 100);
}

void loop() {
    FastLED.show();  // Standard API!
}

Benefits:

  • ✅ Standard FastLED API (no custom classes)
  • ✅ Works with all FastLED features (brightness, color correction, etc.)
  • ✅ Automatic grouping detection
  • ✅ Graceful fallback for non-consecutive pins
  • ✅ Same performance as manual setup

Limitations

Hardware Constraints

  1. Consecutive pins required for parallel output

    • Hardware limitation of PIO out pins, N instruction
    • Non-consecutive pins fall back to sequential output
  2. Maximum 12 DMA channels (shared with other peripherals)

    • Each parallel group uses 1 DMA channel
    • Sequential groups may share resources
  3. Maximum 8 PIO state machines (2 PIOs × 4 SMs)

    • Each parallel group uses 1 SM
    • Shared with other PIO-based features (SPI, I2C, etc.)

Software Constraints

  1. Variable strip lengths supported but padded to maximum

    • If strips have different LED counts, all are padded to the longest
    • Example: [50, 100, 75] LEDs → all padded to 100 LEDs internally
  2. RGBW mode fully supported

    • RGBW uses 4 bytes per LED vs 3 for RGB
    • Buffer sizes increase accordingly (32 bytes vs 24 bytes per LED for transpose)
    • Dedicated RGBW transpose functions for optimal performance
    • Mixed RGB/RGBW strips in same parallel group supported (see RGBW section below)
  3. Sequential fallback TODO for single pins

    • Single non-consecutive pins currently use placeholder
    • Will integrate with existing clockless driver in future update

Troubleshooting

"Failed to claim PIO state machine"

Cause: All 8 PIO state machines are in use by other peripherals.

Solutions:

  • Reduce number of parallel groups (use fewer consecutive pins)
  • Disable other PIO-based features (e.g., parallel SPI)
  • Use sequential output for some strips

"Failed to claim DMA channel"

Cause: All 12 DMA channels are in use.

Solutions:

  • Reduce number of parallel groups
  • Disable other DMA-based features
  • Use sequential output for some strips

Non-consecutive pins not outputting

Cause: Sequential fallback not fully implemented yet.

Solutions:

  • Use consecutive pins for now
  • Wait for future update with sequential fallback integration

Strips flicker or show wrong colors

Possible causes:

  1. Power supply issues (most common)

    • Ensure adequate current capacity (60 mA per LED max)
    • Use external 5V power supply for >50 LEDs
    • Add bypass capacitors (100-1000 µF)
  2. GPIO pin conflict

    • Check that pins aren't used by other peripherals
    • Avoid GPIO 19-20 (USB UART) if using Serial
  3. Timing issues

    • WS2812B requires precise timing (handled by PIO)
    • Ensure no interrupt-heavy code in loop()

Advanced Usage

Dynamic Strip Addition

Strips can be added at runtime:

cpp
void setup() {
    // Initial strips
    FastLED.addLeds<WS2812, 2, GRB>(leds1, 100);
    FastLED.addLeds<WS2812, 3, GRB>(leds2, 100);
}

void addMoreStrips() {
    // Add more strips later (auto-regroups on next show())
    FastLED.addLeds<WS2812, 4, GRB>(leds3, 100);
    FastLED.addLeds<WS2812, 5, GRB>(leds4, 100);
    // Next FastLED.show() will detect 4-pin group
}

Note: Pin grouping is re-evaluated when the strip configuration changes.

RGBW Support

The RP2040/RP2350 automatic parallel driver fully supports RGBW (4-channel) LED strips like SK6812.

Basic RGBW Usage

cpp
#define FASTLED_RP2040_CLOCKLESS_PIO_AUTO 1
#include <FastLED.h>

#define NUM_LEDS 100
CRGB leds[NUM_LEDS];

void setup() {
    // Add RGBW strip (SK6812 or similar)
    FastLED.addLeds<WS2812, 2, GRB>(leds, NUM_LEDS).setRgbw(RgbwDefault());
}

void loop() {
    // Set colors normally - white channel calculated automatically
    fill_solid(leds, NUM_LEDS, CRGB::White);
    FastLED.show();
}

Parallel RGBW Strips

Multiple RGBW strips work with automatic parallel grouping:

cpp
CRGB leds1[100], leds2[100], leds3[100], leds4[100];

void setup() {
    // All 4 strips will output in parallel (GPIO 2-5)
    FastLED.addLeds<WS2812, 2, GRB>(leds1, 100).setRgbw(RgbwDefault());
    FastLED.addLeds<WS2812, 3, GRB>(leds2, 100).setRgbw(RgbwDefault());
    FastLED.addLeds<WS2812, 4, GRB>(leds3, 100).setRgbw(RgbwDefault());
    FastLED.addLeds<WS2812, 5, GRB>(leds4, 100).setRgbw(RgbwDefault());
}

Mixed RGB and RGBW Strips

You can mix RGB and RGBW strips in the same parallel group:

cpp
CRGB rgb_leds[100];
CRGB rgbw_leds[100];

void setup() {
    // GPIO 2: RGB strip (WS2812)
    FastLED.addLeds<WS2812, 2, GRB>(rgb_leds, 100);

    // GPIO 3: RGBW strip (SK6812)
    FastLED.addLeds<WS2812, 3, GRB>(rgbw_leds, 100).setRgbw(RgbwDefault());

    // These will be grouped for parallel output
}

Important: When mixing RGB and RGBW strips in a parallel group:

  • The entire group operates in RGBW mode (4 bytes per LED)
  • RGB strips have their W channel set to 0 (no visible effect)
  • This simplifies hardware configuration (single PIO program per group)
  • Slight memory overhead for RGB strips, but negligible in practice

RGBW Performance

Strip TypeBytes per LEDTranspose Buffer (100 LEDs)
RGB (3 channels)32,400 bytes (24 bytes/LED)
RGBW (4 channels)43,200 bytes (32 bytes/LED)

Frame time impact:

  • RGBW: ~33% more data to transfer (4 bytes vs 3)
  • PIO timing: ~1.25 µs per byte (WS2812 protocol)
  • 100 RGBW LEDs: ~500 µs vs ~375 µs for RGB
  • Still well within real-time performance requirements

RGBW Transpose Functions

The transpose functions support both RGB and RGBW via a bytes_per_led parameter:

  • transpose_8strips(input, output, num_leds, bytes_per_led): 8 parallel strips
  • transpose_4strips(input, output, num_leds, bytes_per_led): 4 parallel strips
  • transpose_2strips(input, output, num_leds, bytes_per_led): 2 parallel strips

The bytes_per_led parameter defaults to 3 (RGB) but can be set to 4 for RGBW.

The correct function is automatically selected based on:

  1. Number of consecutive pins in the group (2, 4, or 8)
  2. Whether any strip in the group has RGBW enabled (sets bytes_per_led = 4)

No performance regression: RGB-only groups use bytes_per_led = 3 (default).

Debug Output

Enable FastLED debug output to see grouping decisions:

cpp
#define FASTLED_DEBUG_LEVEL 1
#define FASTLED_RP2040_CLOCKLESS_PIO_AUTO 1
#include <FastLED.h>

Example output:

Detecting pin groups from 4 pins
Created 4-pin parallel group at GPIO 2
Allocated resources for 4-pin parallel group at GPIO 2 (PIO0, SM0, DMA0)
Transposed 4-pin group at GPIO 2 (100 LEDs, 2400 bytes)
Parallel output for 4 pins starting at GPIO 2 (2400 bytes)

Technical Details

Data Flow

  1. User calls FastLED.show()
  2. beginShowLeds() phase:
    • Each controller queues its pin to singleton group
    • Group collects all pins
  3. showPixels() phase:
    • Each controller writes pixel data to RectangularDrawBuffer
    • Buffer allocates rectangular memory (padded to max LED count)
  4. endShowLeds() phase:
    • First controller triggers output
    • Detect/rebuild pin groups if configuration changed
    • Transpose data for parallel groups
    • Start DMA transfers to PIO state machines
  5. Guard ensures single output per frame

Bit Transposition Format

RGB Mode (3 channels)

Input (Standard RGB):

Strip 0: [R0][G0][B0][R1][G1][B1]...
Strip 1: [R0][G0][B0][R1][G1][B1]...
Strip 2: [R0][G0][B0][R1][G1][B1]...
Strip 3: [R0][G0][B0][R1][G1][B1]...

Output (Bit-Transposed for 4-pin PIO):

Byte 0:  [0][0][0][0][S3_R0_b7][S2_R0_b7][S1_R0_b7][S0_R0_b7]  // MSB of R0
Byte 1:  [0][0][0][0][S3_R0_b6][S2_R0_b6][S1_R0_b6][S0_R0_b6]
...
Byte 7:  [0][0][0][0][S3_R0_b0][S2_R0_b0][S1_R0_b0][S0_R0_b0]  // LSB of R0
Byte 8:  [0][0][0][0][S3_G0_b7][S2_G0_b7][S1_G0_b7][S0_G0_b7]  // MSB of G0
...
Byte 23: [0][0][0][0][S3_B0_b0][S2_B0_b0][S1_B0_b0][S0_B0_b0]  // LSB of B0

Total: 24 bytes per LED (8 bytes per channel × 3 channels)

RGBW Mode (4 channels)

Input (Standard RGBW):

Strip 0: [R0][G0][B0][W0][R1][G1][B1][W1]...
Strip 1: [R0][G0][B0][W0][R1][G1][B1][W1]...
Strip 2: [R0][G0][B0][W0][R1][G1][B1][W1]...
Strip 3: [R0][G0][B0][W0][R1][G1][B1][W1]...

Output (Bit-Transposed for 4-pin PIO):

Byte 0:  [0][0][0][0][S3_R0_b7][S2_R0_b7][S1_R0_b7][S0_R0_b7]  // MSB of R0
...
Byte 7:  [0][0][0][0][S3_R0_b0][S2_R0_b0][S1_R0_b0][S0_R0_b0]  // LSB of R0
Byte 8:  [0][0][0][0][S3_G0_b7][S2_G0_b7][S1_G0_b7][S0_G0_b7]  // MSB of G0
...
Byte 15: [0][0][0][0][S3_G0_b0][S2_G0_b0][S1_G0_b0][S0_G0_b0]  // LSB of G0
Byte 16: [0][0][0][0][S3_B0_b7][S2_B0_b7][S1_B0_b7][S0_B0_b7]  // MSB of B0
...
Byte 23: [0][0][0][0][S3_B0_b0][S2_B0_b0][S1_B0_b0][S0_B0_b0]  // LSB of B0
Byte 24: [0][0][0][0][S3_W0_b7][S2_W0_b7][S1_W0_b7][S0_W0_b7]  // MSB of W0
...
Byte 31: [0][0][0][0][S3_W0_b0][S2_W0_b0][S1_W0_b0][S0_W0_b0]  // LSB of W0

Total: 32 bytes per LED (8 bytes per channel × 4 channels)

PIO out pins, 4 instruction: Outputs lower 4 bits to GPIO 2-5 simultaneously.

Architecture Diagram

┌──────────────────────────────────────────────────────────────────┐
│ User Code: FastLED.addLeds<WS2812, PIN>()                       │
└──────────────────┬───────────────────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────────────────────┐
│ ClocklessController_RP2040_PIO_WS2812<PIN>                      │
│ - One instance per addLeds() call                               │
│ - Stores: mPin (GPIO number)                                    │
└──────────────────┬───────────────────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────────────────────┐
│ RP2040ParallelGroup (Singleton)                                 │
│ - RectangularDrawBuffer: Collects all LED data                  │
│ - Pin grouping detection: Sorts & detects consecutive runs      │
│ - PinGroup array: Stores groups with allocated PIO/DMA          │
└──────────────────┬───────────────────────────────────────────────┘
                   │
        ┌──────────┴──────────┬──────────┬──────────┐
        ▼                     ▼          ▼          ▼
┌─────────────────┐  ┌─────────────┐  ...  ┌─────────────┐
│ PinGroup 1      │  │ PinGroup 2  │       │ PinGroup N  │
│ - base_pin: 2   │  │ - base_pin  │       │ - base_pin  │
│ - num_pins: 4   │  │ - num_pins  │       │ - num_pins  │
│ - PIO0, SM0     │  │ - PIO/SM    │       │ - PIO/SM    │
│ - DMA0          │  │ - DMA       │       │ - DMA       │
│ - Transpose buf │  │ - Buffer    │       │ - Buffer    │
└────────┬────────┘  └──────┬──────┘       └──────┬──────┘
         │                  │                     │
         ▼                  ▼                     ▼
┌─────────────────────────────────────────────────────────┐
│ Hardware: PIO State Machines + DMA                      │
│ - PIO: Precise WS2812 timing via custom program         │
│ - DMA: Non-blocking data transfer                       │
│ - GPIO: Parallel output to consecutive pins             │
└─────────────────────────────────────────────────────────┘

Examples

See examples/SpecialDrivers/RP/Parallel_IO.ino for a complete working example.

License

This driver is part of FastLED and is licensed under the MIT License.

Contributing

Found a bug? Have a suggestion? Please open an issue or pull request on GitHub: https://github.com/FastLED/FastLED

Version History

  • v1.0.0 (2025-01): Initial release
    • Automatic consecutive pin detection
    • 2/4/8-lane parallel output
    • Integration with standard FastLED API
    • RectangularDrawBuffer for multi-strip management