The TensorZero Gateway is designed from the ground up with performance in mind. Even with default settings, the gateway is fast and lightweight enough to be unnoticeable in most applications. The best practices below help you tune the gateway further for production deployments that demand maximum performance.
<Tip>

The TensorZero Gateway can achieve less than 1 ms of P99 latency overhead at 10,000+ QPS.
See Benchmarks for details.

</Tip>

By default, the gateway writes observability data asynchronously (`async_writes`): it returns the response to the client immediately, without waiting for the database writes to complete.
Each database insert is handled right away in its own background task.
For high-throughput applications, you can use `gateway.observability.batch_writes` instead, which collects multiple records and writes them to the database together in batches, making writes more efficient under heavy load.
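As a rough sketch, enabling batch writes in the gateway's TOML configuration file (conventionally `tensorzero.toml`) might look like the following. It assumes `batch_writes` is a table with an `enabled` flag; check the Configuration Reference for the exact field names, any batch-tuning options it exposes, and whether `async_writes` must also be set to `false` when batching, since the two modes are alternatives.

```toml
# Sketch: enable batched observability writes.
# Assumption: `batch_writes` is a table with an `enabled` flag;
# see the Configuration Reference for exact fields and tuning options.
[gateway.observability.batch_writes]
enabled = true
```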
If you need strict data durability guarantees (i.e. data must be persisted in the database before the gateway sends a response), you can disable asynchronous writes by setting `gateway.observability.async_writes = false`.
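In the gateway's TOML configuration file (conventionally `tensorzero.toml`), synchronous writes correspond to a fragment like the following. This is a minimal sketch built from the `gateway.observability.async_writes` key named above.

```toml
# Wait for observability data to be persisted before responding.
# This trades some response latency for stronger durability guarantees.
[gateway.observability]
async_writes = false
```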
As a rule of thumb, consider the following decision matrix:
| | High throughput | Low throughput |
|---|---|---|
| Latency is critical | `batch_writes` | `async_writes` (default) |
| Latency is not critical | `batch_writes` | Synchronous writes (`async_writes = false`) |
See the Configuration Reference for more details.