Databento Adapter Benchmarks

Numbers measured 2026-06-26 on AMD Ryzen Threadripper 9980X under rustc 1.96.0, bench-lto profile (release opts + lto = "fat" + codegen-units = 1, debug = full). ASLR is disabled via setarch -R for the run. The CPU governor was performance for this capture.

Refresh on substantive perf change or before release; bump the date. Absolute numbers vary by machine; only same-machine deltas are meaningful.
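Since only same-machine deltas are meaningful, a refresh is best read as a relative change against the previous capture. The following is a minimal sketch of that comparison. The function names, the 5% threshold, and the "current" value are hypothetical and not part of the bench suite; only the baseline figure comes from this file.

```rust
/// Relative change of a new median against a baseline, in percent.
/// Positive values mean the new run is slower.
fn percent_delta(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns * 100.0
}

/// Hypothetical policy: flag a row when it is more than `threshold_pct` slower.
fn is_regression(baseline_ns: f64, current_ns: f64, threshold_pct: f64) -> bool {
    percent_delta(baseline_ns, current_ns) > threshold_pct
}

fn main() {
    // Baseline: dbn_stream_decode/mbo median from this file (2.06 µs).
    let baseline_ns = 2_060.0;
    // Made-up value standing in for a re-run on the same machine.
    let current_ns = 2_200.0;

    let delta = percent_delta(baseline_ns, current_ns);
    println!("delta = {delta:+.1}%");
    println!("regression = {}", is_regression(baseline_ns, current_ns, 5.0));
}
```

Pick the threshold from the run-to-run noise observed on the machine in question rather than from this example.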

How to reproduce

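```bash
# Pin the CPU governor to performance for the run.
sudo cpupower frequency-set -g performance
# Run the three bench targets with ASLR disabled (setarch -R).
CARGO_BUILD_JOBS=16 setarch "$(uname -m)" -R cargo bench -p nautilus-databento --profile bench-lto \
    --bench data --bench micros --bench clients
# Switch the governor back to powersave afterwards.
sudo cpupower frequency-set -g powersave
```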

For policy and the general noise-reduction recipe see BENCHMARKING.md at the repo root.

DBN stream decode (data.rs)

Fixture file -> Databento typed record. Covers file open, zstd setup/decode, and DBN decode. Stops before Nautilus instrument lookup, precision resolution, and domain type construction.

| Bench | Median | Throughput |
|---|---|---|
| dbn_stream_decode/mbo | 2.06 µs | 970 k/s |
| dbn_stream_decode/mbp1 | 2.03 µs | 983 k/s |
| dbn_stream_decode/mbp10 | 3.31 µs | 604 k/s |
| dbn_stream_decode/trades | 2.06 µs | 971 k/s |
| dbn_stream_decode/ohlcv_1s | 2.02 µs | 990 k/s |
| dbn_stream_decode/status | 2.13 µs | 1.88 M/s |

Historical loader (data.rs)

Fixture file -> Nautilus domain value through the public DatabentoDataLoader API. Covers file open, zstd + DBN decode, instrument resolution when needed, price precision resolution, Nautilus type construction, and collection into the public return shape. No async runtime and no channel.

The benches use the same compressed fixtures as the Databento tests and seed ESM4.GLBX with price precision 2.

| Bench | Median | Throughput |
|---|---|---|
| historical_loader/mbo_deltas | 2.14 µs | 934 k/s |
| historical_loader/mbp1_quotes | 2.21 µs | 905 k/s |
| historical_loader/mbp10_depth | 3.97 µs | 504 k/s |
| historical_loader/bbo_quotes | 3.66 µs | 1.09 M/s |
| historical_loader/cmbp_quotes | 2.28 µs | 879 k/s |
| historical_loader/cbbo_quotes | 2.18 µs | 916 k/s |
| historical_loader/tbbo_trades | 3.46 µs | 578 k/s |
| historical_loader/trades | 2.16 µs | 924 k/s |
| historical_loader/bars | 2.30 µs | 871 k/s |
| historical_loader/status | 2.13 µs | 1.87 M/s |
| historical_loader/imbalance | 9.06 µs | 221 k/s |
| historical_loader/statistics | 1.89 µs | 1.06 M/s |

Large MBO fixture diagnostics (data.rs)

The larger MBO diagnostics use tests/test_data/databento/esh4-glbx-mdp3-20231225.mbo.dbn.zst, a committed 997 KB DBN fixture with 68,792 raw MBO records and 65,819 decoded deltas. They exercise sustained decode and loader behavior without depending on local-only data files.

| Bench | Median | Throughput |
|---|---|---|
| large_mbo/dbn_stream_decode | 2.75 ms | 25.1 M/s |
| large_mbo/loader_collect | 5.67 ms | 11.6 M/s |
| large_mbo/loader_stream_count | 5.35 ms | 12.3 M/s |
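The throughput column can be cross-checked against the record counts quoted for this fixture. The sketch below assumes throughput is the element count divided by the median, with raw MBO records counted for the decode row and decoded deltas for the loader rows. That pairing is an assumption that happens to match the published figures, and `throughput_per_sec` is a hypothetical helper, not a function from the bench suite.

```rust
use std::time::Duration;

/// Elements processed per second for one iteration taking `median`.
fn throughput_per_sec(elements: u64, median: Duration) -> f64 {
    elements as f64 / median.as_secs_f64()
}

fn main() {
    // Counts from the fixture description: 68,792 raw records, 65,819 deltas.
    let rows = [
        ("large_mbo/dbn_stream_decode", 68_792_u64, Duration::from_micros(2_750)),
        ("large_mbo/loader_collect", 65_819, Duration::from_micros(5_670)),
        ("large_mbo/loader_stream_count", 65_819, Duration::from_micros(5_350)),
    ];

    for (name, elements, median) in rows {
        let millions = throughput_per_sec(elements, median) / 1e6;
        println!("{name}: {millions:.1} M/s");
    }
}
```

Results that differ from the table in the last digit are most likely due to the medians being rounded to three significant figures before publication.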

Client end-to-end (clients.rs)

Deterministic local-network benches for the public client surfaces. The historical rows route DatabentoHistoricalClient through a local HTTP fixture server that returns committed DBN zstd fixtures. The live row routes DatabentoFeedHandler through the mock LSG protocol server. The trade row measures one session with authenticate, subscribe, start, symbol mapping, 100 trade records, message-channel receive, and close. The MBO row keeps one live session open after subscribe/start warmup, then measures repeated 10,000-record MBO byte bursts through socket read, DBN decode, live handler dispatch, MBO buffering, and message-channel receive.

These rows are useful for same-machine regressions in client orchestration. They are not external Databento service latency claims.

| Bench | Median | Throughput |
|---|---|---|
| historical_client/trades_http | 65.5 µs | 30.5 k/s |
| historical_client/mbp1_quotes_http | 66.3 µs | 30.2 k/s |
| live_client/trades_mock_lsg | 41.1 ms | 2.43 k/s |
| live_client/mbo_stream_mock_lsg | 3.40 ms | 2.94 M/s |

Component breakdown (micros.rs)

Diagnostic benches that decompose the pipeline numbers above. Use these to localise where time goes when a loader bench regresses.

record_decode measures already-decoded Databento records converted into Nautilus domain values.

| Bench | Median |
|---|---|
| record_decode/mbo_delta | 13.0 ns |
| record_decode/mbo_trade | 24.9 ns |
| record_decode/trade | 25.0 ns |
| record_decode/mbp1_quote | 32.3 ns |
| record_decode/mbp1_trade | 48.4 ns |
| record_decode/mbp10_depth | 187 ns |
| record_decode/bbo_quote | 19.4 ns |
| record_decode/cmbp_quote | 32.2 ns |
| record_decode/cmbp_trade | 81.4 ns |
| record_decode/tbbo | 47.2 ns |
| record_decode/ohlcv | 16.4 ns |
| record_decode/status | 11.3 ns |
| record_decode/imbalance | 16.3 ns |
| record_decode/statistics | 5.34 ns |
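One way to use these rows is to set them against the per-record cost implied by the historical_loader throughput column. The sketch below does that for three schemas. The pairing of loader rows with record_decode rows (for example historical_loader/trades with record_decode/trade) is an assumption made for illustration, and `decode_share_pct` is a hypothetical helper.

```rust
/// Share of per-record loader time attributable to direct record
/// conversion, in percent.
fn decode_share_pct(loader_records_per_sec: f64, decode_ns: f64) -> f64 {
    // Per-record loader cost implied by the throughput column.
    let loader_ns_per_record = 1e9 / loader_records_per_sec;
    decode_ns / loader_ns_per_record * 100.0
}

fn main() {
    // (schema, historical_loader throughput in records/s, record_decode median in ns)
    let rows = [
        ("trades", 924e3, 25.0),
        ("mbp10_depth", 504e3, 187.0),
        ("imbalance", 221e3, 16.3),
    ];

    for (schema, loader_tput, decode_ns) in rows {
        let share = decode_share_pct(loader_tput, decode_ns);
        println!("{schema}: direct decode is about {share:.1}% of loader time per record");
    }
}
```

Under these assumptions direct conversion accounts for a small fraction of the loader time for trades and imbalance, and a larger but still minority share for MBP10.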

record_dispatch measures the generic RecordRef branch chain used by the loader and live feed handler.

| Bench | Median |
|---|---|
| record_dispatch/trade | 36.9 ns |
| record_dispatch/mbp10_depth | 237 ns |
| record_dispatch/ohlcv | 27.9 ns |

atom isolates primitive price, quantity, precision, record-header, and trade-ID construction costs.

| Bench | Median |
|---|---|
| atom/decode_price_or_undef | 387 ps |
| atom/decode_price_increment | 6.72 ns |
| atom/decode_quantity | 5.93 ns |
| atom/precision_from_raw | 1.15 ns |
| atom/trade_id_from_sequence | 12.9 ns |
| atom/record_header_ref | 192 ps |

Notes

  • File-backed benches include open and zstd setup costs because those costs are part of historical loader usage. The fixtures are tiny, so these rows are regression baselines for the public loader API rather than sustained streaming throughput claims.
  • Direct record decode is not the historical-loader bottleneck for most schemas. File open, zstd setup/decode, DBN stream iteration, and collection dominate the µs-level loader rows.
  • MBP10 direct decode is the largest pure Nautilus conversion row because it constructs 10 bid orders, 10 ask orders, and both count arrays.
  • CMBP trade rows include deterministic trade-ID derivation because CMBP/TCBBO schemas do not publish native trade IDs. The derivation hashes the instrument id, timestamps, price, size, and side without allocating an intermediate InstrumentId string, then formats the hash through a fixed hex buffer.
  • historical_loader/imbalance is materially slower than its direct decode row. If imbalance ingestion matters for a production workload, profile the stream path before changing domain construction.
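The trade-ID derivation described in the CMBP note can be sketched as follows. This is an illustration of the general technique only: the adapter's actual hash function, field order, and ID width are not stated here, so FNV-1a, the `TradeKey` struct, and the 16-character output are all assumptions. The symbol and venue are hashed as separate slices so that no combined instrument ID string is allocated, and the hash is written into a fixed stack buffer as hex.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Folds `bytes` into a running 64-bit FNV-1a hash.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical inputs for a trade that carries no native trade ID.
struct TradeKey<'a> {
    symbol: &'a str,
    venue: &'a str,
    ts_event: u64,
    ts_recv: u64,
    price_raw: i64,
    size_raw: u64,
    side: u8,
}

/// Derives a deterministic 16-character lowercase hex ID without heap allocation.
fn derive_trade_id(key: &TradeKey<'_>) -> [u8; 16] {
    // Feeding "symbol", ".", "venue" hashes the same bytes as "symbol.venue".
    let mut h = fnv1a(FNV_OFFSET, key.symbol.as_bytes());
    h = fnv1a(h, b".");
    h = fnv1a(h, key.venue.as_bytes());
    h = fnv1a(h, &key.ts_event.to_le_bytes());
    h = fnv1a(h, &key.ts_recv.to_le_bytes());
    h = fnv1a(h, &key.price_raw.to_le_bytes());
    h = fnv1a(h, &key.size_raw.to_le_bytes());
    h = fnv1a(h, &[key.side]);

    // Write the 64-bit hash as 16 hex digits, most significant nibble first.
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut out = [0u8; 16];
    for (i, slot) in out.iter_mut().enumerate() {
        let shift = 60 - 4 * i as u32;
        *slot = HEX[((h >> shift) & 0xf) as usize];
    }
    out
}

fn main() {
    // Illustrative values only.
    let key = TradeKey {
        symbol: "ESM4",
        venue: "GLBX",
        ts_event: 1_700_000_000_000_000_000,
        ts_recv: 1_700_000_000_000_000_500,
        price_raw: 5_000_250_000_000,
        size_raw: 3,
        side: b'B',
    };
    let id = derive_trade_id(&key);
    let text = std::str::from_utf8(&id).expect("hex digits are ASCII");
    println!("{text}");
}
```

The same inputs always yield the same ID, which is the property a replayable historical load needs. A production implementation would also have to consider collision risk for its chosen hash width.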