AWS S3 Benchmark Results

docs/benchmark-results/aws-s3.md

Environment: Intel Core i7-10850H @ 2.70GHz, 32 GB RAM, WSL2 (Linux 6.6.87.2), x86_64

See ../../internal/impl/aws/s3/bench/ for configs and run instructions.
Read benchmarks are under bench/read/, write benchmarks under bench/write/.


📊 PERFORMANCE REPORT

Executive Summary

This benchmark evaluates S3 read and write throughput across three approaches:

  • Sequential S3 access (aws_s3 bucket walk)
  • Kafka Connect S3 Sink/Source
  • Redpanda Connect S3 pipelines (single and multi-instance)

Key Findings

  • S3 access is latency-bound unless parallelized.
    Sequential reads and writes are dominated by per-request overhead.

  • Kafka Connect achieves the highest throughput (~250k msg/s)
    due to parallel task execution and large-batch S3 writes.

  • Redpanda Connect is throughput-capped (~60k–73k msg/s)
    due to shared output constraints limiting S3 write concurrency.

  • Batching is the dominant factor for write performance.
    Larger batches significantly reduce S3 request overhead.

  • LocalStack introduces artificial ceilings.
    Results reflect LocalStack’s single-node S3 implementation rather than real AWS scalability.


Performance Summary

| Workload | Best Throughput | Limiting Factor |
|---|---|---|
| Bucket walk (1 KB) | ~563 msg/s | Request latency |
| Bucket walk (1 MB) | ~190 msg/s (~199 MB/s) | Transfer bandwidth |
| Kafka Connect S3 Source | ~73k msg/s | S3 read throughput |
| Kafka Connect S3 Sink | ~250k msg/s | S3 write throughput |
| Redpanda Connect (single) | ~61k msg/s | Shared S3 writer |
| Redpanda Connect (multi) | ~73k msg/s | S3 backend saturation |

Key Conclusions

  • Parallelism is the primary driver of performance.
    Systems that issue multiple concurrent S3 requests achieve significantly higher throughput.

  • Batch size is critical for write-heavy workloads.
    Larger batches reduce request overhead and improve efficiency.

  • Architectural differences dominate tuning effects.
    Kafka Connect scales via independent tasks; Redpanda Connect is constrained by shared output.

  • Measured ceilings are environment-dependent.
    LocalStack limits concurrency; real AWS S3 would likely increase absolute throughput and widen scaling differences.
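The parallelism conclusion can be illustrated with a back-of-the-envelope model. The sketch below assumes every S3 request costs a fixed latency and that in-flight requests do not slow each other down, so throughput is concurrency divided by latency, up to some backend ceiling. The function name, the latency (taken as the inverse of the ~563 msg/s bucket-walk figure), and the cap value are illustrative assumptions, not measurements of either connector's internals.

```python
def estimated_throughput(concurrency: int, latency_s: float, backend_cap: float) -> float:
    """Idealized requests/sec for `concurrency` in-flight requests.

    Assumes a fixed per-request latency and no contention between requests;
    `backend_cap` stands in for whatever ceiling the S3 backend imposes.
    """
    if concurrency < 1 or latency_s <= 0:
        raise ValueError("concurrency must be >= 1 and latency_s must be > 0")
    return min(concurrency / latency_s, backend_cap)


# Per-request latency implied by the sequential 1 KB bucket walk (~563 msg/s).
SEQUENTIAL_LATENCY_S = 1 / 563
# Hypothetical ceiling, chosen only to show where scaling would flatten out.
HYPOTHETICAL_BACKEND_CAP = 3000.0

for workers in (1, 2, 4, 8, 16):
    rate = estimated_throughput(workers, SEQUENTIAL_LATENCY_S, HYPOTHETICAL_BACKEND_CAP)
    print(f"{workers:>2} in-flight requests -> ~{rate:.0f} req/s")
```

In this model, throughput grows linearly with concurrency until the cap is reached, which mirrors the "scales, then saturates" pattern reported for both connectors.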


READ BENCHMARKS

Bucket Walk — Small Objects (1 KB)

200,000 objects × 1 KB. Default aws_s3 input in bucket walk mode (no SQS), LocalStack.

msg/sec

| GOMAXPROCS | size=1024 |
|---|---|
| 1 | 563 |
| 2 | 556 |
| 4 | 548 |
| 8 | 544 |

kB/sec

| GOMAXPROCS | size=1024 |
|---|---|
| 1 | 577 |
| 2 | 569 |
| 4 | 561 |
| 8 | 557 |

Bucket Walk — Large Objects (1 MB)

20,000 objects × 1 MB. Same setup.

msg/sec

| GOMAXPROCS | size=1048576 |
|---|---|
| 1 | 190 |
| 2 | 186 |
| 4 | 179 |
| 8 | 180 |

MB/sec

| GOMAXPROCS | size=1048576 |
|---|---|
| 1 | 199 |
| 2 | 195 |
| 4 | 188 |
| 8 | 188 |
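The byte-rate figures appear to be the message rate multiplied by the object size, expressed in decimal units (1 kB = 1,000 bytes, 1 MB = 1,000,000 bytes). That unit convention is an inference from the numbers, not something the report states. A small sketch that recomputes the byte rates from the msg/sec values:

```python
# msg/sec per GOMAXPROCS setting, copied from the bucket-walk results.
MSG_PER_SEC = {
    1024: {1: 563, 2: 556, 4: 548, 8: 544},
    1_048_576: {1: 190, 2: 186, 4: 179, 8: 180},
}


def bytes_per_sec(msg_per_sec: float, object_size: int) -> float:
    """Payload bytes transferred per second at a given message rate."""
    return msg_per_sec * object_size


for size, rows in MSG_PER_SEC.items():
    # Assumed convention: decimal kB for the 1 KB run, decimal MB for the 1 MB run.
    unit, divisor = ("kB/s", 1e3) if size < 1_000_000 else ("MB/s", 1e6)
    for procs, rate in rows.items():
        value = bytes_per_sec(rate, size) / divisor
        print(f"GOMAXPROCS={procs} size={size}: {rate} msg/s = {value:.1f} {unit}")
```

Because the published msg/sec values are already rounded, a recomputed byte rate can differ from the reported one by about 1 unit.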

Kafka Connect S3 Source — Read Throughput

Results

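| TASKS | FLUSH | ELAPSED(s) | MSG/S |
|---|---|---|---|
| 1 | 5000 | 49 | 61224 |
| 1 | 10000 | 50 | 60000 |
| 1 | 50000 | 66 | 45454 |
| 2 | 5000 | 41 | 73170 |
| 2 | 10000 | 42 | 71428 |
| 2 | 50000 | 51 | 58823 |
| 4 | 5000 | 42 | 71428 |
| 4 | 10000 | 42 | 71428 |
| 4 | 50000 | 50 | 60000 |
| 8 | 5000 | 42 | 71428 |
| 8 | 10000 | 41 | 73170 |
| 8 | 50000 | 57 | 52631 |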
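The MSG/S column looks like a fixed message count divided by whole-second elapsed time: multiplying the two columns gives roughly 3,000,000 in every row. The total is inferred from the numbers and is not stated in the report, so treat it as an assumption. A quick consistency check over the rows as parsed from the results above:

```python
# (tasks, flush, elapsed_s, msg_per_sec), parsed from the source results.
ROWS = [
    (1, 5000, 49, 61224), (1, 10000, 50, 60000), (1, 50000, 66, 45454),
    (2, 5000, 41, 73170), (2, 10000, 42, 71428), (2, 50000, 51, 58823),
    (4, 5000, 42, 71428), (4, 10000, 42, 71428), (4, 50000, 50, 60000),
    (8, 5000, 42, 71428), (8, 10000, 41, 73170), (8, 50000, 57, 52631),
]

# Assumed dataset size; inferred, not documented.
ASSUMED_TOTAL_MESSAGES = 3_000_000


def implied_total(elapsed_s: int, msg_per_sec: int) -> int:
    """Message count implied by one row (rate was truncated, so slightly low)."""
    return elapsed_s * msg_per_sec


for tasks, flush, elapsed_s, rate in ROWS:
    total = implied_total(elapsed_s, rate)
    print(f"tasks={tasks} flush={flush}: {elapsed_s}s x {rate} msg/s = {total}")
```

If the assumption holds, a one-second difference in elapsed time moves the reported rate by 1–2k msg/s near the top of the table, so differences of that size between rows may be within measurement resolution.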

Read Observations

  • Throughput is latency-bound for bucket walk.
    Sequential GetObject calls make HTTP round-trip time the dominant factor.

  • CPU parallelism has no impact.
    Raising GOMAXPROCS from 1 to 8 does not improve throughput (563 to 544 msg/s at 1 KB, 190 to 180 msg/s at 1 MB), which is consistent with serialized I/O.

  • Object size determines efficiency.
    Small objects (~1 KB) are dominated by request overhead; large objects (~1 MB) achieve high throughput due to efficient data transfer.

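  • Small-object workloads are inefficient.
    A 1024× increase in object size (1 KB to 1 MB) yields ~345× higher byte throughput (577 kB/s to 199 MB/s) while the message rate drops only ~3× (563 to 190 msg/s), showing that per-request overhead dominates for small objects.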

  • Kafka Connect source follows the same S3 limits.
    Single-task throughput (~60k msg/s) matches Redpanda write ceilings, indicating S3 request cost dominates.

  • Parallelism improves read throughput up to saturation (~73k msg/s).
    Beyond that, S3 becomes the bottleneck.

  • LocalStack underestimates real latency impact.
    Real S3 deployments will show lower msg/sec due to network RTT.
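The object-size observations can be turned into a rough cost model. The sketch below assumes each sequential read costs a fixed per-request overhead plus a transfer time proportional to object size, and fits those two parameters to the two GOMAXPROCS=1 bucket-walk data points. The linear model and the function name are assumptions made for illustration; with only two points the fit is exact by construction and says nothing about how well the model holds at other sizes.

```python
# GOMAXPROCS=1 bucket-walk results: (object size in bytes, msg/sec).
SMALL_BYTES, SMALL_RATE = 1024, 563
LARGE_BYTES, LARGE_RATE = 1_048_576, 190


def fit_overhead_model(size_a, rate_a, size_b, rate_b):
    """Solve time_per_object = overhead + size / bandwidth from two points.

    Returns (overhead in seconds, bandwidth in bytes/sec).
    """
    time_a, time_b = 1 / rate_a, 1 / rate_b
    seconds_per_byte = (time_b - time_a) / (size_b - size_a)
    overhead_s = time_a - seconds_per_byte * size_a
    return overhead_s, 1 / seconds_per_byte


overhead_s, bandwidth = fit_overhead_model(SMALL_BYTES, SMALL_RATE, LARGE_BYTES, LARGE_RATE)
print(f"fixed overhead ~ {overhead_s * 1e3:.2f} ms, bandwidth ~ {bandwidth / 1e6:.0f} MB/s")

for size in (SMALL_BYTES, LARGE_BYTES):
    total_s = overhead_s + size / bandwidth
    share = overhead_s / total_s
    print(f"size={size}: {share:.1%} of per-object time is fixed overhead")
```

Under these assumptions the fixed cost is on the order of 1.8 ms per request and the streaming rate is roughly 300 MB/s, so nearly all of the time for a 1 KB object is overhead, while a 1 MB object spends most of its time transferring data.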


WRITE BENCHMARKS

Kafka Connect S3 Sink — Write Throughput

Best Configurations

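| TASKS | FLUSH | POLL | FETCH MIN | MSG/S |
|---|---|---|---|---|
| 16 | 50000 | 1000 | 1MB | 250000 |
| 2 | 50000 | 5000 | 4MB | 230769 |
| 4 | 50000 | 1000 | 1MB | 230769 |
| 8 | 50000 | 5000 | 1MB | 230769 |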

Redpanda Connect S3 Sink — Single Process

Best Configurations

| THREADS | FLUSH | FETCH MIN | MSG/S |
|---|---|---|---|
| 2 | 5000 | 1MB | 61224 |
| 2 | 10000 | 1MB | 61224 |
| 4 | 10000 | 4MB | 61224 |
| 8 | 10000 | 4MB | 61224 |

Redpanda Connect S3 Sink — Multi-Instance

Best Configurations

| INSTANCES | FLUSH | FETCH MIN | MSG/S |
|---|---|---|---|
| 2 | 5000 | 1MB | 73170 |
| 8 | 10000 | 1MB | 73170 |

Write Observations

Kafka Connect

  • flush.size is the dominant factor.
    Larger batches significantly improve throughput by reducing the number of S3 PUT operations.

  • Parallelism helps but saturates quickly.
    Increasing tasks improves throughput until S3 becomes the limiting factor.

  • Timing effects create discrete result bands.
    Flush interval and commit timing introduce measurable latency variance.

  • Practical ceiling: ~230k–250k msg/s.
    This reflects LocalStack S3 limits rather than Kafka itself.
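The effect of flush.size on request count is simple arithmetic. The sketch below counts the S3 objects (and therefore PUT requests) needed to write a fixed number of records at different flush sizes. It assumes a single partition, one PUT per flushed file, and no time-based rotation; the 3,000,000-record total is an illustrative figure, and the function name is hypothetical.

```python
def put_requests(total_records: int, flush_size: int) -> int:
    """Number of flushed files for `total_records`, one PUT per file.

    Simplification: single partition, count-based flushing only,
    and a final partial file for any remainder.
    """
    if total_records < 0 or flush_size < 1:
        raise ValueError("total_records must be >= 0 and flush_size >= 1")
    return -(-total_records // flush_size)  # integer ceiling division


ILLUSTRATIVE_TOTAL = 3_000_000  # assumed workload size, not from the report

for flush_size in (5_000, 10_000, 50_000):
    puts = put_requests(ILLUSTRATIVE_TOTAL, flush_size)
    print(f"flush.size={flush_size}: {puts} PUT requests")
```

A 10× larger flush size means 10× fewer PUTs for the same data, which is why per-request overhead shrinks as batches grow. With multiple partitions each partition flushes independently, so real request counts would be higher than this single-partition estimate.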


Redpanda Connect

  • Single-process throughput is capped (~60k msg/s).
    Performance is invariant across thread count and configuration.

  • Processing parallelism does not translate to S3 parallelism.
    A shared output path limits scalability.

  • Multiple instances improve throughput (~73k msg/s).
    Parallel S3 writers across processes unlock limited scaling.

  • Scaling saturates quickly.
    Beyond 2 instances, gains disappear due to S3 bottlenecks.

  • Smaller flush sizes perform better.
    They avoid delays caused by timer-based flushes.
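The flush-size observation can be illustrated with a toy batching policy that flushes when a count is reached or when a period expires, whichever comes first. This is a simplified simulation, not Redpanda Connect's implementation: the function name is hypothetical, the timer is assumed to start at the first message of each batch, and the arrival pattern (one message per millisecond) is made up for the example.

```python
def simulate_flushes(arrival_times_ms, count, period_ms):
    """Return a list of (flush_time_ms, batch_size, reason) tuples.

    A batch flushes with reason "count" once it holds `count` messages,
    or with reason "period" once `period_ms` has passed since its first message.
    """
    flushes = []
    batch_start = None
    size = 0
    for t in arrival_times_ms:
        # The timer fired before this message arrived: close the open batch.
        if size and t - batch_start >= period_ms:
            flushes.append((batch_start + period_ms, size, "period"))
            size = 0
        if size == 0:
            batch_start = t
        size += 1
        if size >= count:
            flushes.append((t, size, "count"))
            size = 0
    if size:
        # Whatever is left waits for the timer.
        flushes.append((batch_start + period_ms, size, "period"))
    return flushes


arrivals = range(1000)  # hypothetical load: one message per millisecond for 1 s

for count in (100, 5000):
    result = simulate_flushes(arrivals, count=count, period_ms=500)
    reasons = sorted({reason for _, _, reason in result})
    print(f"count={count}: {len(result)} flushes, triggered by {reasons}")
```

When the count threshold is larger than what arrives within one period, every batch waits for the timer and the configured count never takes effect, which is one plausible reading of why smaller flush sizes performed better here.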


FINAL COMPARISON

| Metric | Redpanda Connect | Kafka Connect |
|---|---|---|
| Peak throughput | 61k msg/s (single) / 73k msg/s (multi) | 250k msg/s |
| Typical throughput | 51k–61k msg/s | 111k–230k msg/s |
| Parameter sensitivity | Low | High |
| Scaling model | Process-level | Task-level |
| Output concurrency | Limited | High |
| Resource footprint | ~200 MB RSS | ~2 GB JVM |

Summary

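  • Kafka Connect achieves ~3.4× (vs. multi-instance) to ~4.1× (vs. single-process) higher peak throughput than Redpanda Connect (250k vs. 73k and 61k msg/s), driven by multiple independent S3 writers and strong batching efficiency.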

  • Redpanda Connect is limited by shared output constraints.
    Internal concurrency improves processing but not S3 write parallelism.

  • Batching is critical for Kafka Connect but largely ineffective for Redpanda Connect.