docs/benchmarking.md
This document describes how to benchmark Redpanda Connect connectors — the standard approach, the tools involved, and how to record and report results.
Each connector that needs benchmarking gets a self-contained bench/ directory inside its implementation package (e.g. internal/impl/<component>/bench/). The benchmark suite should be fully reproducible from a single task invocation and should measure throughput of the connector under realistic conditions.
The general approach:
- Use the benchmark processor to measure throughput
Place benchmarking files in internal/impl/<component>/bench/:
internal/impl/<component>/bench/
├── README.md # How to run, prerequisites, expected output
├── Taskfile.yaml # Task runner for orchestration
├── benchmark_config.yaml # Redpanda Connect pipeline config
├── docker-compose.yml # (optional) Multi-service setups
├── create.sql # (optional) Schema creation scripts
├── users.sql # (optional) Data generation scripts
└── main.go # (optional) Programmatic data seeding
Use Docker to run the service locally. Define tasks in Taskfile.yaml for starting, stopping, and managing the container. Use the same image that production would use — avoid "lite" or "local" variants unless that's the only option (e.g. DynamoDB Local), and document the limitation.
version: '3'
tasks:
  service:up:
    cmd: |
      docker run -d \
        --name <service-name> \
        -p <host-port>:<container-port> \
        -e <ENV_VARS> \
        <image>
  service:down:
    cmd: docker rm -fv <service-name>
  service:logs:
    cmd: docker logs -f <service-name>
For benchmarks involving multiple services (e.g. source and destination clusters), use a docker-compose.yml instead.
Reproducibility controls — For consistent results across runs, pin resources in your docker-compose:
- CPU pinning (cpuset): Prevents OS scheduling noise. Assign dedicated cores to each container so they don't compete.
- Memory limits (mem_limit): Prevents the OOM killer and keeps conditions consistent.
- Set GOMAXPROCS and GOMEMLIMIT on Connect containers to control goroutine scheduling and GC pressure.
See the migrator benchmark for an example that pins source, destination, loader, and migrator to separate CPU sets:
migrator:
  environment:
    GOMAXPROCS: "3"
    GOMEMLIMIT: "3GiB"
  cpuset: "5,6,7"
  mem_limit: 3500M
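To confirm that the GOMAXPROCS and GOMEMLIMIT values from the compose file are the ones the Go runtime actually picked up, a small check like the following can be run inside the container. This is a minimal sketch and not part of the migrator benchmark; effectiveLimits is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// effectiveLimits reports the scheduler and memory settings the Go runtime
// is currently using. These reflect GOMAXPROCS and GOMEMLIMIT when set.
// (Hypothetical helper, for illustration only.)
func effectiveLimits() (procs int, memLimitBytes int64) {
	// GOMAXPROCS(0) queries the current value without changing it.
	procs = runtime.GOMAXPROCS(0)
	// A negative argument returns the current soft memory limit unchanged.
	memLimitBytes = debug.SetMemoryLimit(-1)
	return procs, memLimitBytes
}

func main() {
	procs, limit := effectiveLimits()
	fmt.Printf("GOMAXPROCS=%d GOMEMLIMIT=%d bytes\n", procs, limit)
}
```

When GOMEMLIMIT is unset, the reported limit is math.MaxInt64, which is an easy way to spot a container that did not inherit the intended environment.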
Dataset design — Use multiple tables with different schemas (e.g. users, products, orders) rather than one giant table. This is more realistic and matters for CDC connectors where per-table parallelism is a factor. Use realistic row sizes (1-2KB is typical).
There are three approaches depending on the connector:
SQL scripts — For database connectors, write SQL scripts that generate bulk data. Use stored procedures with loops for large datasets:
-- Example: generate 500,000 rows
DECLARE @i INT = 0;
WHILE @i < 500000
BEGIN
    INSERT INTO users (name, email, created_at)
    VALUES (CONCAT('user-', @i), CONCAT('user', @i, '@example.com'), GETDATE());
    SET @i = @i + 1;
END
Add Taskfile entries for each data generation script:
data:users:
  cmd: task sqlcmd EXTRA_ARGS="-i users.sql"
Go seeder program — For services with native Go SDKs (e.g. DynamoDB), write a main.go that seeds data using concurrent workers. Use BatchWriteItem or equivalent bulk APIs for speed. See the DynamoDB benchmark for a reference implementation using 16 concurrent workers to insert 450k items.
Bloblang generate input — For benchmarks that just need raw message throughput (e.g. migrator benchmarks), use a Redpanda Connect config with generate input:
input:
  generate:
    interval: "" # As fast as possible
    count: 30_000_000
    batch_size: 1_000
    mapping: |
      root = "<your payload here>"
Create benchmark_config.yaml — a Redpanda Connect config that reads from the connector under test and sinks to drop: {} (discard output). The key element is the benchmark processor which logs rolling throughput statistics:
http:
  debug_endpoints: true # Required for profiling
input:
  <your_connector>:
    # connector-specific config
    batching:
      count: 1000 # Tune batch size for throughput
output:
  processors:
    - benchmark:
        interval: 1s # How often to log stats
        count_bytes: true # Report MB/sec in addition to msg/sec
  drop: {} # Discard output; we only care about read throughput
logger:
  level: INFO
metrics:
  prometheus:
    add_process_metrics: true
    add_go_metrics: true
Key configuration points:
- http.debug_endpoints: true exposes pprof endpoints at localhost:4195 for CPU/memory/blocking profiling.
- The benchmark processor logs msg/sec and bytes/sec at the configured interval.
- drop: {} eliminates output overhead so you measure only input throughput.
Batch size tuning: The batching.count parameter has a significant impact on throughput and varies widely across connectors. Existing benchmarks range from 1,000 (SQL Server, DynamoDB) to 140,000 (Oracle CDC). Experiment with this value: too small means excessive per-batch overhead, too large means memory pressure and latency spikes. Document what you tested and what worked best.
Docker image architecture — On Apple Silicon (ARM), make sure you're using the correct image architecture. The migrator benchmark explicitly uses redpandadata/connect:edge-arm64. Running an x86 image under Rosetta/QEMU emulation will tank throughput numbers and produce misleading results.
Wire everything together in the Taskfile so task (or task run) executes the full sequence:
run:
  cmds:
    - task: service:up
    - task: create
    - task: seed
    - go run ../../../../cmd/redpanda-connect/main.go run ./benchmark_config.yaml
Or for manual step-by-step execution:
# Start the service
task service:up
# Create schema and seed data
task create
task seed
# Run the benchmark
go run ../../../../cmd/redpanda-connect/main.go run ./benchmark_config.yaml
You should see rolling throughput logs:
INFO rolling stats: 101000 msg/sec, 135 MB/sec @service=redpanda-connect ...
INFO rolling stats: 104000 msg/sec, 139 MB/sec @service=redpanda-connect ...
Most CDC connectors maintain a checkpoint/cursor. Add a task to clear it between runs:
drop-checkpoint:
  cmd: <command to drop checkpoint table/cache>
Every bench/ directory must have a README.md that includes:
- Prerequisites (brew install sqlcmd, Docker, etc.)
For CDC connectors, benchmark both modes separately, since they have very different performance characteristics:
- ... SELECT. Oracle CDC snapshot hit ~140K msg/sec vs ~50K for streaming.
Report both numbers. Snapshot throughput establishes a ceiling; streaming throughput is what customers will actually experience.
Some source systems have retention windows for change data:
- ... rman_setup.rman in the Oracle benchmark).
Document any retention-related constraints in the benchmark README so others don't waste time debugging "0 msg/sec" output.
For deeper investigation, use the profiling tools in resources/docker/profiling/:
# Start Prometheus + Grafana monitoring stack
cd resources/docker/profiling
task up
# Grafana: http://localhost:3000
# Prometheus: http://localhost:9090
# Capture profiles (requires debug_endpoints: true in your config)
task profile:cpu # 30s CPU profile
task profile:mem # Memory heap profile
task profile:block # Goroutine blocking profile
# View profiles in browser
task pprof:cpu
task pprof:mem
task pprof:block
For long-running profiling sessions, consider a streaming data generator that produces continuous load (see the migrator's loader-streaming.yaml which generates ~100MB/s indefinitely).
Record benchmark results in the PR description. Include:
Good benchmark PRs don't just report numbers — they investigate where the bottleneck is. Techniques used in past benchmarks:
- Use a sql_raw input or a simpler input to rule out the connector's own code as the bottleneck vs the source system (done in the SQL Server CDC benchmark).
- Inspect sql.DBStats to see how many connections are actually in use. The SQL Server benchmark revealed only 1 of 100 connections was active, proving the bottleneck was single-threaded reads, not Connect.
For post-hoc analysis, you can write benchmark output to a file instead of (or in addition to) dropping it:
output:
  processors:
    - benchmark:
        interval: 1s
        count_bytes: true
  file:
    path: "./results.json"
    codec: lines
In addition to the PR description, add or update a results file in docs/benchmark-results/. Each connector gets its own file (e.g. mssqlserver-cdc.md). Append new runs as dated sections so we can track performance over time.
When adding a new result, include:
Example from the SQL Server CDC benchmark PR:
Runtime: ~4m 30s
Dataset: 1.4 KB × 21,198,489 rows = 24.1 GB
INFO rolling stats: 101000 msg/sec, 135 MB/sec
INFO rolling stats: 104000 msg/sec, 139 MB/sec
INFO rolling stats: 103000 msg/sec, 138 MB/sec
For a non-technical overview suitable for sales, marketing, and other non-engineering audiences, see the Performance Summary.
| Component | Bench Suite | Results | Throughput | Notes |
|---|---|---|---|---|
| Redpanda Migrator | internal/impl/redpanda/migrator/bench/ | results | 1 GB/s+, 1M msg/sec | Cluster-to-cluster, 30GB transfer |
| SQL Server CDC | internal/impl/mssqlserver/bench/ | results | ~135 MB/sec, 100K msg/sec | Single connection bottleneck |
| Oracle CDC | internal/impl/oracledb/bench/ | results | ~50K msg/sec (streaming) | LogMiner single-threaded limitation |
| DynamoDB CDC | internal/impl/aws/dynamodb/bench/ | results | ~200 MB/sec, 100K msg/sec | DynamoDB Local, 3 tables x 150K items |
Benchmark results go stale. Follow these practices to keep them current:
When adding a new benchmark suite — Create a corresponding results file in docs/benchmark-results/, update the table in this document, and update docs/benchmark-results/SUMMARY.md.
When modifying a connector's performance path — Re-run the benchmark and append a new dated section to the results file. This includes changes to batching, buffering, connection handling, serialization, or any code that sits in the hot path.
When re-running an existing benchmark — Always append (don't replace) so we can track performance over time. Include the date, PR link, and what changed since the last run.
During code review — The /review skill includes a benchmarking check. It will flag PRs that add or modify bench/ directories without updating results files, and PRs that include throughput numbers in the description without recording them in docs/benchmark-results/. It will also note when performance-critical connector changes may warrant a benchmark re-run.
For unit-level benchmarks of internal components (serialization, conversion, etc.), use standard Go testing.B benchmarks in *_test.go files. Use b.ReportMetric() to report domain-specific metrics (e.g. spans/sec) and b.ReportAllocs() for allocation tracking:
func BenchmarkConvert(b *testing.B) {
	// setup...
	b.ReportAllocs()
	for b.Loop() {
		// operation under test
	}
	b.ReportMetric(float64(itemCount)/b.Elapsed().Seconds(), "items/sec")
}
These are complementary to the integration-level benchmarks described above and are useful for isolating performance of specific code paths.