docs/content/reference/store-compaction.md
The Rerun datastore continuously compacts data as it comes in, in order to find a sweet spot between ingestion speed, query performance, and memory overhead.
Compaction is triggered by both a row-count threshold and a byte-size threshold, whichever is hit first.
This closely parallels the micro-batching mechanism running on the SDK side.
You can configure these thresholds using the following environment variables:
* `RERUN_CHUNK_MAX_BYTES`: sets the threshold, in bytes, after which a `Chunk` cannot be compacted any further.
  Defaults to `RERUN_CHUNK_MAX_BYTES=4194304` (4MiB).
* `RERUN_CHUNK_MAX_ROWS`: sets the threshold, in rows, after which a `Chunk` cannot be compacted any further.
  Defaults to `RERUN_CHUNK_MAX_ROWS=4096`.
* `RERUN_CHUNK_MAX_ROWS_IF_UNSORTED`: sets the threshold, in rows, after which a `Chunk` cannot be compacted any further.
  Applies specifically to non-time-sorted chunks, which can be slower to query.
  Defaults to `RERUN_CHUNK_MAX_ROWS_IF_UNSORTED=1024`.
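
As an illustrative sketch, the thresholds can be overridden from a shell before launching the viewer. The values below are arbitrary examples (roughly doubling the documented defaults), not tuning recommendations, and the recording filename is hypothetical:

```shell
# Override the compaction thresholds for this shell session.
# Example values only; the documented defaults are 4194304 bytes,
# 4096 rows, and 1024 rows for unsorted chunks.
export RERUN_CHUNK_MAX_BYTES=8388608            # 8MiB instead of the 4MiB default
export RERUN_CHUNK_MAX_ROWS=8192                # instead of the 4096-row default
export RERUN_CHUNK_MAX_ROWS_IF_UNSORTED=2048    # instead of the 1024-row default

# Then launch the viewer as usual, e.g.:
# rerun my_recording.rrd
```

Larger thresholds generally mean fewer, bigger chunks (less per-chunk overhead, more work per compaction pass); smaller ones mean the opposite. Measure with your own workload before changing them.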