Derive Adapter Benchmarks

crates/adapters/derive/benches/BENCHMARKS.md

Numbers measured 2026-05-30 on AMD Ryzen Threadripper 9980X under rustc 1.95.0, bench-lto profile (release opts + lto = "fat" + codegen-units = 1, debug = full). The CPU governor is pinned to performance and ASLR is disabled via setarch -R. The suite is run once to warm caches and settle clocks, then measured; a cold first run inflates every row by roughly 10%.

Refresh on substantive perf change or before release; bump the date. Absolute numbers vary by machine; only same-machine deltas are meaningful.

How to reproduce

```bash
sudo cpupower frequency-set -g performance
setarch -R cargo bench -p nautilus-derive --profile bench-lto \
    --bench data --bench exec --bench micros --bench signing  # warm-up
setarch -R cargo bench -p nautilus-derive --profile bench-lto \
    --bench data --bench exec --bench micros --bench signing  # measured
sudo cpupower frequency-set -g powersave  # restore default
```

For policy and the general noise-reduction recipe see BENCHMARKING.md at the repo root.

Inbound pipeline (data.rs)

Raw WS frame bytes -> Nautilus domain type. Covers frame decode (single-pass into a typed frame) + channel decode (raw payload bytes -> typed struct) + parse + Nautilus type construction. No I/O, no async runtime, no channel. The bars row is the REST OHLCV path (Derive has no WS candle channel): it decodes the candle record and builds a Bar.

Rows are ordered from the most fundamental market-data stream (book deltas) through quotes, trades, and the remaining ticker-derived streams (mark/index/funding), then the REST bars row, with the options-specific greeks stream last.

| Bench | Median | Throughput |
| --- | --- | --- |
| inbound_pipeline/book_deltas | 473 ns | 2.12 M/s |
| inbound_pipeline/quotes | 1.64 µs | 610 k/s |
| inbound_pipeline/trades | 742 ns | 1.35 M/s |
| inbound_pipeline/mark_price | 1.59 µs | 631 k/s |
| inbound_pipeline/index_price | 1.59 µs | 631 k/s |
| inbound_pipeline/funding_rate | 1.57 µs | 639 k/s |
| inbound_pipeline/bars | 670 ns | 1.49 M/s |
| inbound_pipeline/option_greeks | 2.35 µs | 426 k/s |
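
The Throughput column appears to be the reciprocal of the median, that is, single-threaded items per second. A minimal sketch of that conversion follows; the function names are illustrative and not part of the bench suite.

```rust
/// Converts a median per-item latency in nanoseconds into items per second.
/// Assumes strictly serial execution: one item completes before the next starts.
pub fn throughput_per_sec(median_ns: f64) -> f64 {
    assert!(median_ns > 0.0, "median must be positive");
    1.0e9 / median_ns
}

/// Formats a rate with the k/M suffixes used in the tables.
/// The precision choices here are an assumption that mimics the tables' look.
pub fn format_rate(per_sec: f64) -> String {
    if per_sec >= 1.0e6 {
        format!("{:.2} M/s", per_sec / 1.0e6)
    } else if per_sec >= 1.0e3 {
        format!("{:.0} k/s", per_sec / 1.0e3)
    } else {
        format!("{:.0} /s", per_sec)
    }
}
```

For example, `format_rate(throughput_per_sec(742.0))` gives `1.35 M/s`, matching the trades row. Because the published medians are rounded to three significant figures, a recomputed rate can differ from the table in the last digit (473 ns gives 2.11 M/s against the listed 2.12 M/s).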

Execution pipeline (exec.rs)

Strategy command (OrderAny / cancel) -> wire bytes ready to send. submit_limit, submit_market, and modify cover the signed private/order and private/replace path (normalize + ABI encode + EIP-712 sign + JSON serialize). cancel covers the unsigned private/cancel path (build + serialize). Derive supports only Limit and Market orders, so there is no stop-order row.

| Bench | Median | Throughput |
| --- | --- | --- |
| exec_pipeline/submit_limit | 42.1 µs | 23.8 k/s |
| exec_pipeline/submit_market | 42.1 µs | 23.7 k/s |
| exec_pipeline/modify | 42.1 µs | 23.7 k/s |
| exec_pipeline/cancel | 45.8 ns | 21.8 M/s |

Signing (signing.rs)

sign_trade_action is the EIP-712 order signature (ABI encode + keccak + secp256k1) the order-submit path pays per order. rest_auth_headers is the EIP-191 timestamp signature the HTTP read path pays per request. signer_from_key is the secp256k1 key expansion, paid once at client startup.

| Bench | Median |
| --- | --- |
| sign_trade_action | 42.0 µs |
| rest_auth_headers | 40.9 µs |
| signer_from_key | 31.6 µs |
| abi_encode_trade | 236 ns |
| nonce_next | 45.8 ns |

Dispatch (exec.rs)

Venue WS payload (DeriveOrdersSubscriptionData, DeriveTradesSubscriptionData) -> events emitted via ExecutionEventEmitter. Covers parse + dedup + identity lookup + event construction through dispatch_orders_payload / dispatch_trades_payload. orders_untracked forwards a raw status report (no registered identity); orders_tracked and trades_fill resolve a registered identity and emit OrderAccepted / OrderFilled events.

| Bench | Median | Throughput |
| --- | --- | --- |
| dispatch/orders_untracked | 8.53 µs | 117 k/s |
| dispatch/orders_tracked | 9.01 µs | 111 k/s |
| dispatch/trades_fill | 8.45 µs | 118 k/s |
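
The dedup and identity-lookup steps named above can be pictured with a small std-only sketch. Everything in it is hypothetical: `DispatchState`, `Event`, and `dispatch_trade` are illustrative stand-ins, not the adapter's `WsDispatchState` or its event types. Forwarding a raw report for an unregistered trade is also an assumption, borrowed from the `orders_untracked` behaviour described above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical event type; the adapter's real events carry far more fields.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// The trade resolved to a registered order identity.
    Filled { client_order_id: String, trade_id: String },
    /// No registered identity: forward the raw report (assumed fallback).
    RawReport { venue_order_id: String, trade_id: String },
}

/// Hypothetical stand-in for long-lived dispatch state.
#[derive(Debug, Default)]
pub struct DispatchState {
    seen_trades: HashSet<String>,
    identities: HashMap<String, String>, // venue order ID -> client order ID
}

impl DispatchState {
    /// Registers the identity that later payloads resolve against.
    pub fn register(&mut self, venue_order_id: &str, client_order_id: &str) {
        self.identities
            .insert(venue_order_id.to_string(), client_order_id.to_string());
    }

    /// Dedup first, then identity lookup, then event construction.
    /// Allocates on every call for clarity; a hot path would avoid that.
    pub fn dispatch_trade(&mut self, venue_order_id: &str, trade_id: &str) -> Option<Event> {
        // HashSet::insert returns false when the value was already present.
        if !self.seen_trades.insert(trade_id.to_string()) {
            return None; // duplicate trade: emit nothing
        }
        let event = match self.identities.get(venue_order_id) {
            Some(client_order_id) => Event::Filled {
                client_order_id: client_order_id.clone(),
                trade_id: trade_id.to_string(),
            },
            None => Event::RawReport {
                venue_order_id: venue_order_id.to_string(),
                trade_id: trade_id.to_string(),
            },
        };
        Some(event)
    }
}
```

Dispatching the same trade ID twice returns an event the first time and `None` the second, which is the duplicate-hit path that `atom/dedup_trade_hit` isolates.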

Component breakdown (micros.rs)

Diagnostic benches that decompose the pipeline numbers above. Use these to localise where time goes when a pipeline bench regresses. decode_only is the raw-bytes -> typed-message cost; parse_only is the typed-message -> Nautilus domain cost; the two sum to approximately the matching inbound number (for book_deltas, 423 ns + 49.9 ns ≈ 473 ns). order_report and fill_report decompose the inbound execution path that dispatch runs end-to-end.

| Bench | Median |
| --- | --- |
| decode_only/orderbook | 423 ns |
| decode_only/ticker | 1.56 µs |
| parse_only/orderbook_deltas | 49.9 ns |
| parse_only/trade | 32.2 ns |
| parse_only/ticker_quote | 36.6 ns |
| parse_only/order_report | 90.7 ns |
| parse_only/fill_report | 109 ns |
| atom/decimal_from_str | 6.97 ns |
| atom/price_from_decimal_dp | 6.54 ns |
| atom/price_combined | 12.3 ns |
| atom/trade_id_new | 8.89 ns |
| atom/uuid4_new | 12.9 ns |
| atom/state_construct_primed | 4.11 µs |
| atom/state_drop_primed | 1.13 µs |
| atom/dedup_trade_hit | 11.6 ns |
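
When a pipeline row regresses, one way to read this table is to compare decode plus parse against the end-to-end median. The sketch below does that arithmetic with the published figures; the `Breakdown` type and the constants are illustrative helpers, not part of the bench suite.

```rust
/// One decode/parse pair set against its end-to-end pipeline row (nanoseconds).
#[derive(Debug, Clone, Copy)]
pub struct Breakdown {
    pub decode_ns: f64,
    pub parse_ns: f64,
    pub pipeline_ns: f64,
}

impl Breakdown {
    /// Fraction of the pipeline median spent in decode.
    pub fn decode_share(&self) -> f64 {
        self.decode_ns / self.pipeline_ns
    }

    /// Pipeline time that neither micro-bench accounts for.
    pub fn residual_ns(&self) -> f64 {
        self.pipeline_ns - (self.decode_ns + self.parse_ns)
    }
}

/// decode_only/orderbook + parse_only/orderbook_deltas vs inbound_pipeline/book_deltas.
pub const BOOK_DELTAS: Breakdown = Breakdown {
    decode_ns: 423.0,
    parse_ns: 49.9,
    pipeline_ns: 473.0,
};

/// decode_only/ticker + parse_only/ticker_quote vs inbound_pipeline/quotes.
pub const QUOTES: Breakdown = Breakdown {
    decode_ns: 1560.0,
    parse_ns: 36.6,
    pipeline_ns: 1640.0,
};
```

With these inputs, decode is about 89% of book_deltas and 95% of quotes. The residual is near zero for book_deltas and roughly 43 ns for quotes; that remainder is whatever the two micro-benches do not capture, plus run-to-run noise. A regression that grows the residual rather than either component points outside decode and parse.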

Notes

  • Inbound decode avoids the Value intermediate. The frame parses in a single pass into a typed struct, capturing params.data as a serde_json::value::RawValue (the raw payload bytes); each channel parser then decodes those bytes straight into its typed struct. Nothing materialises the frame, or the large data subtree, into a serde_json::Value tree. This roughly halved every inbound row versus the prior Value-based decode (e.g. decode_only/ticker 3.11 µs -> 1.56 µs, book_deltas 1.06 µs -> 0.47 µs).
  • Inbound is still decode-dominated. decode_only accounts for ~90% of book_deltas (423 ns of 473 ns) and ~95% of quotes (1.56 µs of 1.64 µs). Parse itself is sub-50 ns for a book delta and under 40 ns for a quote/trade; Decimal, Price, UUID4, and TradeId construction are all sub-15 ns.
  • The four ticker-derived rows share one decode in production. quotes, mark_price, index_price, and funding_rate each measure a standalone DeriveTickerMsg decode (~1.56 µs) plus a sub-40 ns parse, so they all land at ~1.6 µs. The live data client decodes a ticker frame once and derives all four from that single message; summing the four rows overcounts. The lever for all of them is the ticker decode, not the per-stream parse.
  • option_greeks is the heaviest inbound row (2.35 µs) because the option slim ticker carries the option_pricing block (delta/gamma/vega/theta/rho, IVs, forward) on top of the shared ticker fields.
  • Exec is signature-bound. sign_trade_action (EIP-712: ABI encode + keccak + secp256k1) is 42.0 µs and dominates submit_limit/submit_market and modify (all ~42.1 µs); ABI encode (236 ns) and JSON serialize are noise next to it. cancel is unsigned and lands at 46 ns. Optimisations that don't change the signing scheme won't move the signed rows. rest_auth_headers (EIP-191) costs ~41 µs because it is the same secp256k1 sign.
  • signer_from_key is amortised. The 31.6 µs secp256k1 key expansion runs once when the execution client constructs its signer, not per order.
  • Dispatch runs against a fresh WsDispatchState each iteration. The state is rebuilt in the iter_batched setup closure (excluded from timing), so the measured time is parse + dedup + identity lookup + ExecutionEventEmitter send. The channel send adds variance; these rows are noisier than the inbound and exec groups. Production state lives forever, so the steady-state dedup hit is ~12 ns (atom/dedup_trade_hit) rather than the per-iteration construct + drop (atom/state_construct_primed + atom/state_drop_primed).
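
The "decode once, derive four" point in the ticker note can be sketched as follows. This is a simplified illustration, not the adapter's code. `TickerMsg` is a hypothetical five-field stand-in for DeriveTickerMsg, and the comma-separated input replaces the real JSON decode so the example needs no dependencies.

```rust
/// Hypothetical, heavily simplified ticker message.
#[derive(Debug, Clone, PartialEq)]
pub struct TickerMsg {
    pub bid: f64,
    pub ask: f64,
    pub mark: f64,
    pub index: f64,
    pub funding: f64,
}

/// Stand-in for the expensive step: decodes "bid,ask,mark,index,funding".
pub fn decode_ticker(raw: &str) -> Option<TickerMsg> {
    let fields: Option<Vec<f64>> = raw
        .split(',')
        .map(|s| s.trim().parse::<f64>().ok())
        .collect();
    let fields = fields?;
    match fields.as_slice() {
        [bid, ask, mark, index, funding] => Some(TickerMsg {
            bid: *bid,
            ask: *ask,
            mark: *mark,
            index: *index,
            funding: *funding,
        }),
        _ => None, // wrong field count
    }
}

// Cheap per-stream parses: each only reads fields from the decoded message.
pub fn parse_quote(msg: &TickerMsg) -> (f64, f64) {
    (msg.bid, msg.ask)
}
pub fn parse_mark_price(msg: &TickerMsg) -> f64 {
    msg.mark
}
pub fn parse_index_price(msg: &TickerMsg) -> f64 {
    msg.index
}
pub fn parse_funding_rate(msg: &TickerMsg) -> f64 {
    msg.funding
}

/// All four ticker-derived updates produced from a single decode.
#[derive(Debug, PartialEq)]
pub struct TickerUpdates {
    pub quote: (f64, f64),
    pub mark: f64,
    pub index: f64,
    pub funding: f64,
}

/// Production-style handling: one decode, four parses.
pub fn on_ticker_frame(raw: &str) -> Option<TickerUpdates> {
    let msg = decode_ticker(raw)?; // paid once per frame
    Some(TickerUpdates {
        quote: parse_quote(&msg),
        mark: parse_mark_price(&msg),
        index: parse_index_price(&msg),
        funding: parse_funding_rate(&msg),
    })
}
```

Using the figures above, one shared decode (~1.56 µs) plus four sub-40 ns parses stays under about 1.72 µs per ticker frame. Summing the four standalone rows gives roughly 6.4 µs, which is why the rows should not be added together.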