docs/concepts/custom_data.md
Nautilus Trader supports custom data authored in Python and Rust, and moves that data through the same runtime, persistence, and query pipeline used by the rest of the platform.
This document explains how custom data is:
The custom-data architecture satisfies the following requirements:
- `CustomData` wrapper at the PyO3 boundary.
- `ParquetDataCatalog` using dynamic type registration
  instead of hardcoded schemas.

There are two supported authoring modes:
| Mode | Example | Registration path | Encode/decode path | Wrapper backend |
|---|---|---|---|---|
| Pure Python | `@customdataclass_pyo3` class | `register_custom_data_class(...)` | Python callback + Arrow C FFI | `PythonCustomDataWrapper` |
| Same-binary Rust | `#[custom_data]` or `#[custom_data(pyo3)]` type | `ensure_custom_data_registered::<T>()` and native extractor | Native Rust | Native Rust payload |
Both modes converge on the same outer PyO3 `CustomData` wrapper and the same
`DataType` identity model.
sequenceDiagram
participant U as User code
participant P as Python layer
participant R as Rust model/catalog
participant G as Global DataRegistry
participant S as Storage
U->>P: define class/type
U->>P: register_custom_data_class(...) or module init
P->>R: install type registration
R->>G: store JSON/Arrow/extractor handlers
U->>P: CustomData(data_type, data)
P->>R: write_custom_data([...])
R->>G: lookup encoder by type_name
G-->>R: encoder
R->>S: write RecordBatch to Parquet
U->>P: query(type_name, ...)
P->>R: query catalog
R->>S: read RecordBatch + metadata
R->>G: lookup decoder by type_name
G-->>R: decoder
R-->>P: CustomData wrappers
P-->>U: typed data via .data
### `DataRegistry`

`crates/model/src/data/registry.rs` is the central runtime registry module for
custom data in the main process. Registration uses atomic `DashMap::entry()` so
that concurrent `register_*` and `ensure_*` calls do not race.
The module contains several `OnceLock`-initialized `DashMap` singletons:

- `type_name`.
- `type_name`.
- `Arc<dyn CustomDataTrait>`.

Instead of hardcoding every type into the main binary, Nautilus resolves
handlers at runtime using the `type_name` stored in `DataType` and Parquet
metadata.
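The runtime lookup described above can be sketched in plain Python. This is an illustrative analogy only, not the Rust implementation: `_JSON_DECODERS`, `register_json_decoder`, and `decode` are hypothetical names, and the "first registration wins" policy is an assumption made for the sketch.

```python
from typing import Callable, Dict

# Hypothetical stand-in for one registry map: handlers keyed by type_name.
_JSON_DECODERS: Dict[str, Callable[[dict], object]] = {}


def register_json_decoder(type_name: str, decoder: Callable[[dict], object]) -> bool:
    """Insert the decoder only if the key is vacant; return True if it is now registered."""
    # dict.setdefault inserts only when the key is absent, which loosely
    # mirrors an entry()-style "insert if vacant" step (assumed policy).
    return _JSON_DECODERS.setdefault(type_name, decoder) is decoder


def decode(type_name: str, payload: dict) -> object:
    """Resolve the handler by type_name at call time instead of hardcoding types."""
    decoder = _JSON_DECODERS.get(type_name)
    if decoder is None:
        raise KeyError(f"no decoder registered for {type_name!r}")
    return decoder(payload)
```

A caller registers a handler once and later resolves it purely by name, which is the property that lets the catalog decode types it was never compiled against.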
### `CustomData`

The outer PyO3 `CustomData` wrapper is the common container that crosses the
FFI boundary.

Constructor signature: `CustomData(data_type, data)`, where the `DataType` comes
first, then the inner payload.
It contains:

- `DataType`.
- `CustomDataTrait` (wrapped in
  `Arc<dyn CustomDataTrait>`).

Timestamps (`ts_event`, `ts_init`) are delegated to the inner
`CustomDataTrait` implementation and exposed as properties on the wrapper.
On the Python side, `CustomData` exposes value semantics: `__eq__` and
`__repr__` are implemented (equality uses the Rust `PartialEq` logic).
Instances are intentionally unhashable so that equality remains consistent with
the inner payload comparison.
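The wrapper behaviour described above (delegated timestamps, value equality, no hashing) can be illustrated with a small pure-Python sketch. `Payload` and `Wrapper` are hypothetical stand-ins, not the PyO3 classes, and comparing both the data type and the payload in `__eq__` is an assumption about what the Rust `PartialEq` covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payload:
    """Hypothetical inner payload carrying its own timestamps."""
    value: float
    ts_event: int
    ts_init: int


class Wrapper:
    """Hypothetical stand-in for the outer wrapper."""

    def __init__(self, data_type: str, data: Payload) -> None:
        self.data_type = data_type
        self.data = data

    @property
    def ts_event(self) -> int:
        return self.data.ts_event  # delegated to the inner payload

    @property
    def ts_init(self) -> int:
        return self.data.ts_init  # delegated to the inner payload

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Wrapper):
            return NotImplemented
        # Assumed comparison: data type plus inner payload.
        return self.data_type == other.data_type and self.data == other.data

    __hash__ = None  # explicitly unhashable, matching the documented intent

    def __repr__(self) -> str:
        return f"Wrapper(data_type={self.data_type!r}, data={self.data!r})"
```

Two wrappers with equal payloads compare equal, while `hash(wrapper)` raises `TypeError`, so instances cannot be used as dictionary keys or set members.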
This wrapper is shared across both custom-data modes. User code interacts with one API even though the underlying payload may be:
### `CustomData` JSON envelope

When serialized to JSON (e.g. for `to_json_bytes` / `from_json_bytes`, SQL
cache, or Redis), `CustomData` uses a single canonical envelope so that
deserialization does not depend on user payload field names:
- `type`: The custom type name (from `CustomDataTrait::type_name`).
- `data_type`: An object with `type_name`, `metadata`, and optional
  `identifier`.
- `payload`: The inner payload only (the result of `CustomDataTrait::to_json`
  parsed as a value). Registered deserializers receive only this value in
  `from_json`, so user structs can use any field names (including `value`)
  without conflicting with wrapper metadata.

This envelope is produced by Rust `CustomData` serialization and consumed by
`DataRegistry` when deserializing custom data from JSON.
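As a rough illustration of the envelope shape, the sketch below builds and parses the three documented keys with the standard `json` module. The helper names `to_envelope` and `from_envelope` and the `decoders` mapping are hypothetical; only the key names `type`, `data_type`, and `payload` come from the description above.

```python
import json
from typing import Callable, Dict, Optional, Tuple


def to_envelope(type_name: str, metadata: dict, identifier: Optional[str], payload: dict) -> bytes:
    """Wrap a payload in the documented three-key envelope."""
    data_type = {"type_name": type_name, "metadata": metadata}
    if identifier is not None:
        data_type["identifier"] = identifier  # optional per the description
    envelope = {"type": type_name, "data_type": data_type, "payload": payload}
    return json.dumps(envelope).encode()


def from_envelope(raw: bytes, decoders: Dict[str, Callable[[dict], object]]) -> Tuple[dict, object]:
    """Pick the decoder by the envelope's type and hand it only the payload."""
    envelope = json.loads(raw)
    decoder = decoders[envelope["type"]]
    return envelope["data_type"], decoder(envelope["payload"])
```

Because the decoder only ever sees `payload`, a user payload may itself contain fields called `type` or `value` without colliding with the wrapper keys.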
### `DataType`

`DataType` identifies custom data for routing and persistence.

Constructor: `DataType(type_name, metadata=None, identifier=None)`.

It includes:

- `type_name`.
- `metadata`.
- `identifier` (used only for catalog pathing, not for routing or
  equality).

Equality, hashing, and topic routing are derived from `type_name` and
`metadata` only. Two `DataType` values with the same type name and metadata but
different identifiers compare equal and publish to the same message bus topic.
The identifier affects only the storage path under
`data/custom/<type_name>/<identifier...>`.
Custom-data storage and queries use `DataType`, not just the bare Rust/Python
class name. This allows the same logical type to be stored under different
metadata or identifiers while still decoding through the same registered
handler.
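The identity rules above can be modelled with a frozen dataclass whose `identifier` field is excluded from comparison and hashing. `DataTypeSketch` and `storage_path` are hypothetical names for illustration. Representing metadata as a tuple of pairs and splitting the identifier on `/` into path segments are assumptions of the sketch, and identifier normalization is not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataTypeSketch:
    """Hypothetical model: identity is (type_name, metadata); identifier only affects paths."""
    type_name: str
    metadata: Tuple[Tuple[str, str], ...] = ()
    # compare=False removes the field from both __eq__ and the generated __hash__.
    identifier: Optional[str] = field(default=None, compare=False)

    def storage_path(self) -> str:
        parts = ["data", "custom", self.type_name]
        if self.identifier:
            # Assumption: a slash-separated identifier maps to nested segments.
            parts.extend(self.identifier.split("/"))
        return "/".join(parts)
```

Two values that differ only in `identifier` compare equal and hash identically, yet resolve to different storage paths, which matches the separation between routing identity and catalog layout.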
Registration bridges the gap between Python objects and Rust trait objects.
flowchart TD
A[User-defined custom type] --> B{Mode}
B --> C[Pure Python]
B --> D[Same-binary Rust]
C --> F[register_custom_data_class]
D --> G[ensure_custom_data_registered and native extractor]
F --> I[Python callbacks registered]
G --> J[Native JSON and Arrow handlers registered]
I --> L[Main-process DataRegistry]
J --> L
When Python code calls `register_custom_data_class(MyType)`:

- `PythonCustomDataWrapper`.
- `DataRegistry`.

This path is flexible and user-friendly, but Arrow encoding and reconstruction rely on Python callbacks.
For Rust types defined inside Nautilus:

- `#[custom_data]` or `#[custom_data(pyo3)]` generates the necessary trait,
  JSON, and Arrow implementations.
- `ensure_custom_data_registered::<T>()` inserts native schema/encoder/decoder
  handlers into `DataRegistry`.

This path stays fully native in Rust for encode/decode.
`register_custom_data_class(...)` resolves types in the following order:
That ordering preserves the fastest available path for types already known natively by the main binary.
Internally, the outer `CustomData` wrapper can hold different payload
implementations.

### `PythonCustomDataWrapper`

Used for pure Python custom data.

Responsibilities:

- `ts_event`, `ts_init`, and `type_name`.
- `CustomDataTrait`.

This is the fallback path when the main process does not have a native Rust representation for the type.
For Rust types compiled into Nautilus, the inner payload is the concrete Rust
type itself and can be downcast directly from `Arc<dyn CustomDataTrait>`.
No Python callback path is needed for serialization or decode.
Built-in Nautilus data types have schemas and encoders known statically to the
Rust binary. Custom data does not. The persistence layer therefore resolves
custom data dynamically using the registered `type_name`.

`ParquetDataCatalog` expects custom writes to come in as `CustomData` values.
The custom-data write path:

- `type_name`, `metadata`, and `identifier` from `DataType`.
- `DataRegistry`.
- `RecordBatch`.
- `data_type` column containing the persisted `DataType`.
- `type_name` and `metadata` to the Arrow schema.

The path layout is:

`data/custom/<type_name>/<identifier...>`

Identifiers are normalized before becoming path segments.
On query:

- `type_name` from schema metadata.
- `DataRegistry` for the registered decoder.
- `RecordBatch` into `Vec<Data>`.
- `CustomData` with the original `DataType`.

This makes custom-data query resolution symmetric with write-time registration.
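The symmetry between the write and query paths can be sketched with an in-memory stand-in for the catalog. Everything here is hypothetical (`ENCODERS`, `DECODERS`, `STORAGE`, `write_custom`, `query_custom`): plain dict rows replace Arrow `RecordBatch` data, and a dict replaces Parquet files. The point is only that the reader recovers `type_name` from stored metadata and resolves the decoder from it.

```python
import json
from typing import Callable, Dict, List, Tuple

ENCODERS: Dict[str, Callable[[object], dict]] = {}
DECODERS: Dict[str, Callable[[dict], object]] = {}
STORAGE: Dict[str, dict] = {}  # path -> {"schema_metadata": ..., "rows": [...]}


def write_custom(type_name: str, metadata: dict, identifier: str, items: list) -> str:
    encode = ENCODERS[type_name]             # look up the encoder by type_name
    rows = [encode(item) for item in items]  # stand-in for building a RecordBatch
    path = f"data/custom/{type_name}/{identifier}"
    STORAGE[path] = {
        # type_name and metadata travel with the stored data
        "schema_metadata": {"type_name": type_name, "metadata": json.dumps(metadata)},
        "rows": rows,
    }
    return path


def query_custom(path: str) -> List[Tuple[tuple, object]]:
    stored = STORAGE[path]
    meta = stored["schema_metadata"]
    decode = DECODERS[meta["type_name"]]     # resolve the decoder from stored metadata
    data_type = (meta["type_name"], json.loads(meta["metadata"]))
    # Pair each decoded item with its data type, as a stand-in for the wrapper.
    return [(data_type, decode(row)) for row in stored["rows"]]
```

Nothing on the query side needs to know the concrete class in advance; the stored metadata alone is enough to find the matching handler.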
When converting a Feather stream to Parquet (e.g. after a backtest), the
custom-data branch decodes batches and writes them via
`write_custom_data_batch` so that custom data written through the Feather
writer is correctly converted to Parquet.
Pure Python custom data cannot provide native Rust Arrow encode logic directly.
For those types, Nautilus uses the Arrow C FFI interface to pass `RecordBatch`
data between Python and Rust without serialization overhead.
sequenceDiagram
participant R as Rust encoder
participant P as Python custom class
participant F as Arrow C FFI structs
participant C as Parquet writer
R->>P: encode_record_batch_py(items)
P->>P: build pyarrow.RecordBatch
P-->>F: _export_to_c (FFI_ArrowArray + FFI_ArrowSchema)
F-->>R: reconstruct native RecordBatch
R->>C: write Parquet
For pure Python classes:

- `encode_record_batch_py(...)` on the Python class.
- `pyarrow.RecordBatch._export_to_c` into Arrow C FFI structs.
- `RecordBatch` from the FFI structs and writes it.

For the reverse direction:

- `RecordBatch` into Arrow C FFI structs.
- `RecordBatch._import_from_c`.
- `decode_record_batch_py(metadata, batch)` on the class.
- `PythonCustomDataWrapper`.

The Arrow C FFI bridge is not used for same-binary Rust custom data. Those types use native Rust encode/decode handlers registered in the main process.
When custom data is loaded back from the catalog, reconstruction depends on the backend:

- `from_dict` or `from_json`.

In all cases the caller receives the same outer `CustomData` wrapper at the
PyO3 API boundary.
Custom data is not only a persistence feature. It also participates in Nautilus runtime routing.
Relevant integrations include:
- `crates/data/src/engine/mod.rs` publishes `CustomData` through the message
  bus.
- `crates/common/src/msgbus/switchboard.rs` derives custom topics from
  `DataType`.
- `crates/common/src/actor/*` routes custom data into actor subscriptions.
- `crates/trading/src/python/strategy.rs` exposes custom data to Python
  strategy `on_data`.
- `crates/backtest/src/engine.rs` treats `Data::Custom` as
  data-engine-delivered input rather than exchange-routed data.

A registered custom type can be persisted, queried, subscribed to, and consumed through the same runtime interfaces as other data families.
The SQL cache/database layer also supports `CustomData`.

Current behavior:

- `custom` table.
- `data_type`, `metadata`, `identifier`, and full
  JSON payload.
- `CustomData` using `CustomData::from_json_bytes(...)`.
- `add_custom_data` and `load_custom_data`.
- `custom:<ts_init_020>:<uuid>` with full `CustomData` JSON as value.
- `add_custom_data` and `load_custom_data` filter by `DataType`
  (`type_name`, `metadata`, `identifier`) and return results sorted by `ts_init`;
  this is exposed via the PyO3 `RedisCacheDatabase` API.

The Cython `@customdataclass` system is separate from this architecture.
This document describes the PyO3 custom-data system:
- `CustomData`.

This architecture gives Nautilus two important properties:
The result is one conceptual custom-data system with two backends, rather than separate feature silos for Python-only and Rust-only data types.