The TensorZero Gateway was built from the ground up with performance in mind.
It's written in Rust and designed to handle extreme concurrency with sub-millisecond overhead.

<Tip>
See the "Optimize latency and throughput" guide for more details on maximizing performance in production settings.
</Tip>

We benchmarked the TensorZero Gateway against the popular LiteLLM Proxy (LiteLLM Gateway).
On a c7i.xlarge instance on AWS (4 vCPUs, 8 GB RAM), LiteLLM fails when load reaches 1,000 QPS, with the vast majority of requests timing out.
The TensorZero Gateway handles 10,000 QPS on the same instance with a 100% success rate and sub-millisecond latencies.
Even at low loads where LiteLLM is stable (100 QPS), TensorZero at 10,000 QPS achieves significantly lower latencies: a mean of 0.37ms versus 4.91ms.
Building in Rust (TensorZero) led to consistent sub-millisecond latency overhead under extreme load, whereas Python (LiteLLM) becomes a bottleneck even at moderate loads.

| Latency | LiteLLM Proxy (100 QPS) | LiteLLM Proxy (500 QPS) | LiteLLM Proxy (1,000 QPS) | TensorZero Gateway (10,000 QPS) |
| :-----: | :---------------------: | :---------------------: | :-----------------------: | :-----------------------------: |
|  Mean   |         4.91ms          |         7.45ms          |          Failure          |             0.37ms              |
|   50%   |         4.83ms          |         5.81ms          |          Failure          |             0.35ms              |
|   90%   |         5.26ms          |         10.02ms         |          Failure          |             0.50ms              |
|   95%   |         5.41ms          |         13.40ms         |          Failure          |             0.58ms              |
|   99%   |         5.87ms          |         39.69ms         |          Failure          |             0.94ms              |

At 1,000 QPS, LiteLLM fails entirely with the vast majority of requests timing out, while TensorZero continues to operate smoothly even at 10x that load.
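
To make "significantly lower" concrete, the short Python sketch below divides each LiteLLM latency in the table by the corresponding TensorZero latency. The figures are copied from the table. The dictionary layout and the `latency_ratios` helper are illustrative names, not part of either project. Note that the comparison spans different load levels (LiteLLM at 100 or 500 QPS, TensorZero at 10,000 QPS), so the ratios likely understate the gap at equal load.

```python
# Latency overhead in milliseconds, copied from the benchmark table.
# The 1,000 QPS LiteLLM column is omitted because that run failed.
LATENCY_MS = {
    "Mean": {"litellm_100": 4.91, "litellm_500": 7.45, "tensorzero_10k": 0.37},
    "50%": {"litellm_100": 4.83, "litellm_500": 5.81, "tensorzero_10k": 0.35},
    "90%": {"litellm_100": 5.26, "litellm_500": 10.02, "tensorzero_10k": 0.50},
    "95%": {"litellm_100": 5.41, "litellm_500": 13.40, "tensorzero_10k": 0.58},
    "99%": {"litellm_100": 5.87, "litellm_500": 39.69, "tensorzero_10k": 0.94},
}


def latency_ratios(table, baseline, candidate):
    """Return baseline latency divided by candidate latency for each row."""
    return {row: cols[baseline] / cols[candidate] for row, cols in table.items()}


# Hypothetical usage: print how many times higher LiteLLM's latency is per row.
for scenario in ("litellm_100", "litellm_500"):
    ratios = latency_ratios(LATENCY_MS, scenario, "tensorzero_10k")
    for row, ratio in ratios.items():
        print(f"{scenario} {row}: {ratio:.1f}x the TensorZero latency")
```

Dividing row by row rather than comparing only the means shows how the gap changes across the distribution: it is wider at the tail for the 500 QPS run than for the 100 QPS run.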
Technical Notes:

- The benchmarks ran on a c7i.xlarge instance on AWS (4 vCPUs, 8 GB RAM) running Ubuntu 24.04.2 LTS.
- We set `observability.enabled = false` (i.e. disabled logging inferences to your database) in the TensorZero Gateway to make the scenarios comparable. (Even then, the observability features run asynchronously in the background, so they wouldn't materially affect latency given a powerful enough database deployment.)
- We used TensorZero 2025.5.7 and LiteLLM 1.74.9.

Read more about the technical details and reproduction instructions here.