MySQL CDC Benchmark Results

Environment: Intel Core i7-10850H @ 2.70GHz, 32 GB RAM, WSL2 (Linux 6.6.87.2), x86_64


Redpanda Connect — Pure MySQL Read Throughput (no Kafka)

Redpanda Connect mysql_cdc input reading a full snapshot of cart (10,000,000 rows × ~600 B) and dropping all output immediately. No Kafka, no sink — this measures the raw MySQL read ceiling. Varying GOMAXPROCS and batching.count.

See internal/impl/mysql/bench/ for configs and run instructions.

```bash
task bench:load:cart COUNT=10000000
task bench:run CORES=1 BATCH=1000
task bench:run CORES=2 BATCH=1000
# ...
```

msg/sec

| GOMAXPROCS | batch=1000 | batch=5000 | batch=10000 |
|---|---|---|---|
| 1 | 99,977 | 103,433 | 104,630 |
| 2 | 163,592 | 173,022 | 173,045 |
| 4 | 187,419 | 187,439 | 187,462 |
| 8 | 191,439 | 187,464 | 187,464 |

MB/sec

| GOMAXPROCS | batch=1000 | batch=5000 | batch=10000 |
|---|---|---|---|
| 1 | 60 | 62 | 63 |
| 2 | 98 | 104 | 104 |
| 4 | 113 | 113 | 113 |
| 8 | 115 | 113 | 113 |
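The MB/sec figures follow from the msg/sec figures and the row size. The sketch below is an illustrative check, not part of the benchmark harness: it assumes the "~600 B" row size is exactly 600 bytes and that MB means 1,000,000 bytes, and the helper name `mb_per_sec` is hypothetical.

```python
ROW_BYTES = 600  # assumption: the "~600 B" row size stated above, taken as exact

# (msg/sec, reported MB/sec) pairs for batch=1000 from the two tables above.
REPORTED = {
    1: (99_977, 60),
    2: (163_592, 98),
    4: (187_419, 113),
    8: (191_439, 115),
}

def mb_per_sec(msg_per_sec, row_bytes=ROW_BYTES):
    """Convert msg/sec to decimal MB/sec (1 MB = 1,000,000 bytes)."""
    return msg_per_sec * row_bytes / 1_000_000

for cores, (rate, reported_mb) in REPORTED.items():
    print(f"GOMAXPROCS={cores}: {mb_per_sec(rate):.1f} MB/sec computed, "
          f"{reported_mb} MB/sec reported")
```

Because the row size is only approximate, the computed values agree with the reported ones to within about 1 MB/sec rather than exactly.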

Observations:

  • Core scaling is strong from 1→2 cores (~1.64–1.67×), then rapidly plateaus: 2→4 is ~1.08–1.15× and 4→8 is at most ~1.02×.
  • Throughput saturates at ~187K msg/sec from 4 cores onward; additional cores provide no benefit on this machine.
  • Batch size has little effect at any core count. At 1 core the range is only 100K→105K; at 4+ cores all batch sizes converge to roughly the same value.
  • Peak throughput: 191,439 msg/sec, 115 MB/sec at CORES=8 BATCH=1000.
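
The scaling factors quoted in these observations can be recomputed from the msg/sec table. This is a minimal sketch: the dictionary layout and the helper name `scaling_factors` are illustrative and not taken from the benchmark code.

```python
# msg/sec from the table above, keyed by GOMAXPROCS, then by batching.count.
MSG_PER_SEC = {
    1: {1000: 99_977, 5000: 103_433, 10000: 104_630},
    2: {1000: 163_592, 5000: 173_022, 10000: 173_045},
    4: {1000: 187_419, 5000: 187_439, 10000: 187_462},
    8: {1000: 191_439, 5000: 187_464, 10000: 187_464},
}

def scaling_factors(table, batch):
    """Return {(from_cores, to_cores): speedup} for consecutive core counts."""
    cores = sorted(table)
    return {
        (lo, hi): table[hi][batch] / table[lo][batch]
        for lo, hi in zip(cores, cores[1:])
    }

for batch in (1000, 5000, 10000):
    factors = scaling_factors(MSG_PER_SEC, batch)
    summary = ", ".join(f"{lo}->{hi}: {f:.2f}x" for (lo, hi), f in factors.items())
    print(f"batch={batch}: {summary}")
```

At batch=5000 this gives roughly 1.67×, 1.08×, and 1.00× for the three core doublings, which is the plateau described above.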

Kafka Connect JDBC Sink Comparison

10,000,000 rows written from Kafka to MySQL via Confluent JDBC Sink connector. Schema/payload JSON envelope, 16 partitions.

See internal/impl/mysql/bench/mysql-write/jdbc-sink/ for configs and run instructions.

```bash
task bench:load COUNT=10000000
task bench:run TASKS=16
```

msg/sec

batch.size = 3000

| tasks.max | msg/sec |
|---|---|
| 4 | 18,518 |
| 8 | 31,250 |
| 16 | 42,553 |

batch.size = 10000

| tasks.max | msg/sec |
|---|---|
| 16 | 43,859 |

Observations:

  • Peak throughput at batch=3000: 42,553 msg/sec at 16 tasks. Increasing to batch=10000 yields marginal improvement (43,859 msg/sec) — batch size is not the bottleneck.
  • Task scaling diminishes quickly: 4→8 tasks ~1.7×, 8→16 tasks ~1.4×.
  • MySQL CDC is roughly 4.5× faster than Kafka Connect JDBC Sink at peak (191K vs 43K msg/sec), with a fraction of the infrastructure.
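
One way to quantify the diminishing task scaling is per-task throughput relative to the smallest task count. The sketch below uses the batch.size = 3000 figures from the table above. The helper names are hypothetical, and "efficiency" here is just a rough ratio against linear scaling, not a metric reported by the connector.

```python
# msg/sec at batch.size = 3000 from the table above, keyed by tasks.max.
JDBC_SINK = {4: 18_518, 8: 31_250, 16: 42_553}

def per_task_throughput(results):
    """msg/sec divided by task count."""
    return {tasks: rate / tasks for tasks, rate in results.items()}

def relative_efficiency(results):
    """Per-task throughput relative to the smallest task count (1.0 = linear)."""
    per_task = per_task_throughput(results)
    baseline = per_task[min(per_task)]
    return {tasks: value / baseline for tasks, value in per_task.items()}

per_task = per_task_throughput(JDBC_SINK)
for tasks, eff in relative_efficiency(JDBC_SINK).items():
    print(f"tasks.max={tasks}: {per_task[tasks]:,.0f} msg/sec per task, "
          f"{eff:.0%} of linear scaling")
```

Relative to 4 tasks, per-task throughput falls to roughly 84% at 8 tasks and roughly 57% at 16 tasks.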

Debezium Kafka Source Connector Comparison

Debezium MySQL source connector reading 10,000,000-row cart snapshot into a Kafka topic. Varying max.batch.size, max.queue.size, and max.poll.records.

See internal/impl/mysql/bench/mysql-read/debezium/ for configs and run instructions.

msg/sec

| fetch.size | batch.size | queue.size | elapsed | msg/sec |
|---|---|---|---|---|
| 1,000 | 1,000 | 4,000 | 841s | 11,890 |
| 5,000 | 5,000 | 20,000 | 747s | 13,386 |
| 10,000 | 10,000 | 40,000 | 781s | 12,804 |

Observations:

  • Peak throughput: 13,386 msg/sec at fetch=5000/batch=5000/queue=20000.
  • Increasing to fetch=10000 gives no further gain (12,804 msg/sec) — throughput plateaus around 13K msg/sec regardless of fetch/batch size.
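
The msg/sec column is the row count divided by the elapsed time. A short sketch of that arithmetic follows. It assumes the full 10,000,000 rows were delivered in each run and that the reported figures are truncated to whole numbers, which is what the table values suggest. The function name `throughput` is illustrative.

```python
ROWS = 10_000_000  # snapshot size used in this benchmark

# Elapsed seconds from the table above, keyed by the shared fetch/batch size.
ELAPSED_S = {1_000: 841, 5_000: 747, 10_000: 781}

def throughput(rows, elapsed_s):
    """Average msg/sec over the whole run."""
    return rows / elapsed_s

for size, secs in ELAPSED_S.items():
    print(f"fetch/batch={size}: {int(throughput(ROWS, secs)):,} msg/sec")
```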

Redpanda Connect — MySQL → Kafka

Redpanda Connect mysql_cdc input reading 10,000,000-row cart snapshot into a Kafka topic (kafka_franz output). Varying GOMAXPROCS and batching.count.

See internal/impl/mysql/bench/mysql-read/rpcn/ for configs and run instructions.

```bash
task bench:build
task bench:load COUNT=10000000
task bench:all OUT=results.txt
```

msg/sec

| GOMAXPROCS | batch=1,000 | batch=5,000 | batch=10,000 |
|---|---|---|---|
| 1 | 15,085 | 27,849 | 46,137 |
| 2 | 39,253 | 38,760 | 41,322 |
| 4 | 29,412 | 45,455 | 45,455 |
| 8 | 29,412 | 45,455 | 45,872 |
| unbounded | 28,592 | 41,908 | 50,440 |

Observations:

  • Peak throughput: 50,440 msg/sec (unbounded cores, batch=10000), roughly 3.8× faster than Debezium Kafka Source (13,386 msg/sec).
  • Core scaling is strong from 1→2 cores (~2.6× at batch=1000) but plateaus across 2→4→8 cores, and at batch=1000 throughput even drops from ~39K to ~29K. Kafka write throughput is the bottleneck beyond 2 cores.
  • Batch size matters most at 1 core (15K→46K, ~3×); at 2+ cores the gain narrows significantly as the bottleneck shifts to Kafka.
  • Beyond batch=5000 there is no meaningful gain at 2, 4, or 8 cores. The exceptions are 1 core (27,849→46,137) and unbounded cores (41,908→50,440).
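
The effect of batch size at each core count can be read off the table as a ratio between adjacent batch sizes. A minimal sketch, with a hypothetical helper name `batch_gain` and the table values copied in by hand:

```python
# msg/sec from the table above, keyed by GOMAXPROCS setting, then batching.count.
MSG_PER_SEC = {
    "1": {1000: 15_085, 5000: 27_849, 10000: 46_137},
    "2": {1000: 39_253, 5000: 38_760, 10000: 41_322},
    "4": {1000: 29_412, 5000: 45_455, 10000: 45_455},
    "8": {1000: 29_412, 5000: 45_455, 10000: 45_872},
    "unbounded": {1000: 28_592, 5000: 41_908, 10000: 50_440},
}

def batch_gain(row, small, large):
    """Throughput ratio when moving from batch size `small` to `large`."""
    return row[large] / row[small]

for cores, row in MSG_PER_SEC.items():
    print(f"GOMAXPROCS={cores}: "
          f"1000->5000 {batch_gain(row, 1000, 5000):.2f}x, "
          f"5000->10000 {batch_gain(row, 5000, 10000):.2f}x")
```

The 5000→10000 ratio is 1.00× at 4 cores and about 1.01× at 8 cores, but about 1.66× at 1 core and about 1.20× with unbounded cores.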

Redpanda Connect — Kafka → MySQL

Redpanda Connect consuming from a Kafka topic (kafka_franz input) and writing to MySQL (sql_insert output). Same Kafka broker as above (cpus: 3), 16 partitions. Varying GOMAXPROCS and batching.count.

See internal/impl/mysql/bench/mysql-write/rpcn/ for configs and run instructions.

```bash
task bench:load COUNT=10000000
task bench:run CORES=1 BATCH=10000
task bench:run CORES=4 BATCH=10000
# ...
```

msg/sec (writes to MySQL)

| GOMAXPROCS | batch=10000 |
|---|---|
| 4 | 64,102 |
| 8 | 60,975 |

Observations:

  • Throughput plateaus at 4→8 cores (~60-64K msg/sec) — MySQL write throughput is the bottleneck, not CPU.
  • Redpanda Connect is ~1.5× faster than Kafka Connect JDBC Sink at peak (64K vs 43K msg/sec) using fewer resources (4 cores vs 16 tasks).
  • Compared to MySQL CDC (191K msg/sec), the Kafka→MySQL write path is ~3× slower — the extra hop through Kafka and MySQL insert overhead both contribute.

Kafka Connect CDC (Debezium Source — Change Events)

Debezium MySQL source connector streaming CDC change events (inserts) for 10,000,000 rows into a Kafka topic. Varying max.batch.size and max.queue.size.

See internal/impl/mysql/bench/mysql-read/debezium/ for configs and run instructions.

msg/sec

| batch.size | queue.size | elapsed | msg/sec |
|---|---|---|---|
| 1,000 | 4,000 | ~549s | 18,227 |
| 5,000 | 20,000 | 392s | 25,510 |
| 10,000 | 40,000 | 427s | 23,419 |

Observations:

  • Peak throughput: 25,510 msg/sec at batch=5000/queue=20000.
  • Increasing to batch=10000 gives no further gain (23,419 msg/sec); throughput plateaus at around 23–25K msg/sec once batch size reaches 5000.
  • Batch size has diminishing returns beyond 5000; the bottleneck is Debezium's internal processing, not fetch/queue sizing.

Redpanda Connect CDC — MySQL → Kafka

Redpanda Connect mysql_cdc input streaming CDC change events (inserts) for 10,000,000 rows into a Kafka topic (kafka_franz output). Varying GOMAXPROCS and batching.count.

See internal/impl/mysql/bench/mysql-read/rpcn/ for configs and run instructions.

```bash
task bench:build
task bench:load:cdc
task bench:all:cdc COUNT=10000000 OUT=cdc_results.txt
```

msg/sec

| GOMAXPROCS | batch=1,000 | batch=5,000 | batch=10,000 |
|---|---|---|---|
| 1 | 17,361 | 19,920 | 19,920 |
| 2 | 18,939 | 15,873 | 15,974 |
| 4 | 15,873 | 15,873 | 15,823 |
| 8 | 16,077 | 15,773 | 16,287 |

Observations:

  • Peak throughput: 19,920 msg/sec at 1 core, batch=5000/10000.
  • Adding more cores provides no benefit: throughput stays within ~16–20K msg/sec across 1→8 cores and is highest at 1 core. The bottleneck is the single-threaded CDC reader, not CPU or Kafka write parallelism.
  • Batch size has minimal impact beyond 5000; the gain from 1000→5000 at 1 core (~15%) disappears entirely at 2+ cores.
  • Debezium CDC is ~1.3× faster (~25K vs ~20K msg/sec) — the only benchmark where Redpanda Connect does not win. All other workloads (snapshot read, Kafka→MySQL write) favor Redpanda Connect.
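
The head-to-head ratios quoted across the sections above can be collected in one place. The sketch below uses only peak figures already reported in this document. The workload labels and the helper name `speedup` are illustrative.

```python
# Peak msg/sec reported above: (Redpanda Connect, comparison system).
PEAKS = {
    "snapshot -> Kafka (vs Debezium)": (50_440, 13_386),
    "Kafka -> MySQL (vs JDBC Sink)": (64_102, 43_859),
    "CDC -> Kafka (vs Debezium)": (19_920, 25_510),
}

def speedup(rpcn, other):
    """Ratio of Redpanda Connect peak to the comparison peak (>1 means faster)."""
    return rpcn / other

for workload, (rpcn, other) in PEAKS.items():
    ratio = speedup(rpcn, other)
    faster = "Redpanda Connect" if ratio > 1 else "comparison system"
    print(f"{workload}: {ratio:.2f}x ({faster} faster)")
```

A ratio below 1 for the CDC row corresponds to the ~1.3× Debezium advantage noted above (25,510 / 19,920 ≈ 1.28).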