docs/benchmark-results/iceberg.md
Benchmarks for the iceberg output using a local REST catalog backed by MinIO (S3-compatible).
See internal/impl/iceberg/bench/ for the benchmark configs and run instructions.
Synthetic events generated at maximum speed (generate input with count: 0, interval: ""), written to a single Iceberg table. Varying GOMAXPROCS and batching.count.
Environment: Intel Core i7-10850H @ 2.70GHz, 32 GB RAM, WSL2 (Linux 6.6.87.2), x86_64, MinIO + REST catalog running in Docker (localhost)
Dataset: Synthetic events, ~142 B per message (id, user_id, event_type, value, info, ts) — measured at the pipeline processor. Actual bytes written to MinIO will differ due to Parquet columnar compression.
Count: 1,000,000 messages per run
| GOMAXPROCS | batch=1000 | batch=5000 | batch=10000 |
|---|---|---|---|
| 1 | 757 | 3,105 | 5,442 |
| 2 | 1,186 | 4,408 | 6,763 |
| 4 | 1,147 | 4,758 | 8,483 |
| 8 | 1,056 | 4,107 | 8,231 |
| (unbounded) |
| GOMAXPROCS | batch=1000 | batch=5000 | batch=10000 |
|---|---|---|---|
| 1 | 106 | 435 | 774 |
| 2 | 166 | 618 | 961 |
| 4 | 161 | 667 | 1206 |
| 8 | 148 | 576 | 1170 |
| (unbounded) |
Observations:
Fixed at GOMAXPROCS=4, varying batching.count and max_in_flight to measure the impact of concurrent catalog commits.
Environment: Intel Core i7-10850H @ 2.70GHz, 32 GB RAM, WSL2 (Linux 6.6.87.2), x86_64, MinIO + REST catalog running in Docker (localhost)
Dataset: Synthetic events, ~142 B per message
Count: 1,000,000 messages per run
| max_in_flight | batch=5000 | batch=10000 |
|---|---|---|
| 4 | 4,758 | 8,483 |
| 8 | 7,105 | 13,839 |
| 16 | 12,973 | 23,316 |
| 32 | 20,462 | 34,835 |
| 64 | 34,993 | 33,703 |
| 128 | 33,911 | 33,742 |
| max_in_flight | batch=5000 | batch=10000 |
|---|---|---|
| 4 | 0.67 | 1.21 |
| 8 | 1.00 | 2.00 |
| 16 | 1.80 | 3.30 |
| 32 | 2.90 | 5.00 |
| 64 | 5.00 | 4.80 |
| 128 | 4.80 | 4.80 |
Observations:
max_in_flight is the most impactful knob: at batch=10000, throughput scales from 8,483 (MIF=4) → 13,839 (MIF=8) → 23,316 (MIF=16) → 34,835 (MIF=32) msg/sec — a 4x gain by increasing concurrent commits.max_in_flight) and it scales linearly until MinIO saturates.Environment: Intel Core i7-10850H @ 2.70GHz, 32 GB RAM, WSL2 (Linux 6.6.87.2), x86_64 Dataset: 10,000,000 synthetic events, MinIO + Iceberg REST catalog in Docker
Both connectors use a 10s commit window and 16 Kafka partitions. The transformation computes 5 derived fields per message (event_id, value_usd, value_tier, ts_ms, is_high_value).
| Connector | Throughput |
|---|---|
| Kafka Connect (Tabular) | 84,745 msg/s |
| Redpanda Connect | 61,349 msg/s |
| Connector | Kafka CPUs | Throughput |
|---|---|---|
| Kafka Connect (Tabular) | unbounded | 37,037 msg/s |
| Redpanda Connect | unbounded | 47,272 msg/s |
| Redpanda Connect | 1 | 45,248 msg/s |
| Redpanda Connect | 2 | 48,829 msg/s |
bench-events-transformed), then Kafka Connect reads from that topic and sinks to Iceberg. The two-stage I/O cuts throughput by more than half.