docs/scanbench.md
scanbench benchmarks region scans directly from storage through:
greptime datanode scanbench ...
cargo build -p cmd --bin greptime
./target/debug/greptime datanode scanbench \
--config <CONFIG_TOML> \
--region-id <REGION_ID> \
--table-dir <TABLE_DIR> \
[--scanner <seq|unordered|series>] \
[--scan-config <SCAN_CONFIG_JSON>] \
[--parallelism <N>] \
[--iterations <N>] \
[--path-type <bare|data|metadata>] \
[--force-flat-format] \
[--enable-wal] \
[--pprof-file <FLAMEGRAPH_SVG>] \
[--pprof-after-warmup] \
[--verbose]
- --config: Datanode/standalone TOML config.
- --region-id: Region ID in one of:
  - <u64> (example: 4398046511104)
  - <table_id>:<region_number> (example: 1024:0)
- --table-dir: Table directory used in open request (example: greptime/public/1024).
- --scanner: Scan strategy. Default: seq.
  - seq: default scan
  - unordered: time-windowed distribution
  - series: per-series distribution
- --scan-config: JSON file to tune scan request.
- --parallelism: Simulated scan parallelism. Default: 1.
- --iterations: Benchmark iterations. Default: 1.
- --path-type: Region path type (bare, data, metadata). Default: bare.
- --force-flat-format: Force reading the region in flat format. Default: disabled.
- --enable-wal: Enable WAL replay when opening the region. Default: disabled. When enabled, scanbench uses the log store configured in the [wal] section of the config TOML (raft-engine or Kafka). When disabled or when no WAL is configured, a NoopLogStore is used.
- --pprof-file: Output flamegraph path (Unix only).
- --pprof-after-warmup: Start profiling after the first iteration, using it as a warmup. Requires --pprof-file. Default: disabled.
- --verbose / -v: Enable verbose output.

Scan config JSON example:
{
"projection": [0, 1, 2],
"projection_names": ["host", "cpu"],
"filters": ["host = 'web-1'", "cpu > 80"],
"series_row_selector": "last_row"
}
Notes:
- Set either projection (column indexes) or projection_names (column names), not both.
- projection_names uses exact (case-sensitive) column name matching.
- filters is a list of SQL expressions (not full SQL statements), e.g. "host = 'web-1'".
- series_row_selector currently supports only "last_row".

Default sequential scan:
./target/debug/greptime datanode scanbench \
--config /path/to/config.toml \
--region-id 1024:0 \
--table-dir greptime/public/1024
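
The two --region-id forms listed above appear to be related by packing the table ID into the upper 32 bits and the region number into the lower 32 bits. The documented examples agree with this, since 1024 << 32 = 4398046511104. The sketch below converts between the two forms under that assumption. The helper names region_id_to_u64 and region_id_from_u64 are hypothetical and not part of greptime.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of greptime). Assumption: a region ID packs
# the table ID in the upper 32 bits and the region number in the lower 32 bits.

# region_id_to_u64 <table_id>:<region_number>  ->  prints the u64 form
region_id_to_u64() {
  local table_id="${1%%:*}"       # text before the colon
  local region_number="${1##*:}"  # text after the colon
  echo $(( (table_id << 32) | region_number ))
}

# region_id_from_u64 <u64>  ->  prints the <table_id>:<region_number> form
region_id_from_u64() {
  local id="$1"
  echo "$(( id >> 32 )):$(( id & 0xFFFFFFFF ))"
}

region_id_to_u64 1024:0           # prints 4398046511104
region_id_from_u64 4398046511104  # prints 1024:0
```

Shell arithmetic is signed 64-bit, so this sketch is only suitable for table IDs below 2^31.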
Unordered scan with parallelism:
./target/debug/greptime datanode scanbench \
--config /path/to/config.toml \
--region-id 1024:0 \
--table-dir greptime/public/1024 \
--scanner unordered \
--parallelism 8 \
--iterations 5
Series scan with scan config and flamegraph:
./target/debug/greptime datanode scanbench \
--config /path/to/config.toml \
--region-id 1024:0 \
--table-dir greptime/public/1024 \
--scanner series \
--scan-config /path/to/scan-config.json \
--pprof-file /tmp/scanbench.svg
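
A scan config file for the command above could be generated as in the following sketch. It follows the notes on the scan config by setting projection_names only, without projection. The column names and filter values are placeholders copied from the example JSON, and the helper name write_scan_config and the output path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a scan config that sets projection_names only (not projection).
# Column names and filter values are placeholders; adjust them to the schema
# of the region being scanned.

# write_scan_config <output_path>  (hypothetical helper, not part of greptime)
write_scan_config() {
  cat > "$1" <<'EOF'
{
  "projection_names": ["host", "cpu"],
  "filters": ["host = 'web-1'", "cpu > 80"],
  "series_row_selector": "last_row"
}
EOF
}

SCAN_CONFIG="${TMPDIR:-/tmp}/scan-config.json"
write_scan_config "$SCAN_CONFIG"
echo "wrote $SCAN_CONFIG"
```

The resulting file can then be passed as --scan-config "$SCAN_CONFIG".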
Force flat-format read:
./target/debug/greptime datanode scanbench \
--config /path/to/config.toml \
--region-id 1024:0 \
--table-dir greptime/public/1024 \
--force-flat-format
Scan with WAL replay enabled (uses [wal] config from TOML):
./target/debug/greptime datanode scanbench \
--config /path/to/config.toml \
--region-id 1024:0 \
--table-dir greptime/public/1024 \
--enable-wal
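
To compare the three documented scanners on the same region, the examples above can be wrapped in a loop. This is a minimal sketch that uses only the flags listed in this document. The paths and region ID are the placeholder values from the examples, the run helper is hypothetical, and the script falls back to printing each command when the binary has not been built.

```shell
#!/usr/bin/env bash
# Sketch: run the same benchmark once per documented scanner.
# All values below are placeholders taken from the examples above.
GREPTIME="${GREPTIME:-./target/debug/greptime}"
CONFIG="${CONFIG:-/path/to/config.toml}"
REGION_ID="${REGION_ID:-1024:0}"
TABLE_DIR="${TABLE_DIR:-greptime/public/1024}"

# run <command...>  (hypothetical helper, not part of greptime)
# Executes the command if the greptime binary exists; otherwise prints it.
run() {
  if [ -x "$GREPTIME" ]; then
    "$@"
  else
    echo "$@"
  fi
}

for scanner in seq unordered series; do
  run "$GREPTIME" datanode scanbench \
    --config "$CONFIG" \
    --region-id "$REGION_ID" \
    --table-dir "$TABLE_DIR" \
    --scanner "$scanner" \
    --iterations 5
done
```

Setting GREPTIME, CONFIG, REGION_ID, or TABLE_DIR in the environment overrides the placeholder defaults.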