programs/keeper-utils/README.md
A comprehensive utility tool for managing and analyzing ClickHouse Keeper data, including snapshots and changelogs.
Build the tool as part of the main ClickHouse build process:
cd /path/to/ClickHouse
mkdir -p build
cd build
cmake ..
make -j$(nproc) clickhouse-keeper-utils
The binary will be available at ./programs/clickhouse-keeper-utils.
Dump the current state of a Keeper cluster by loading snapshots and applying changelogs.
clickhouse-keeper-utils dump-state [options]
--snapshot-path (required): Path to the directory containing Keeper snapshots
--log-path (required): Path to the directory containing Keeper changelogs
--output-file, -o: Write output to file instead of stdout
--output-format, -f: Output format when using --output-file (default: CSVWithNamesAndTypes)
--parallel-output: Enable parallel output format processing when using --output-file
--with-acl: Include ACL (Access Control List) information (only used with node tree output)
--debug-mode: Enable debug output for log entries
--end-index: Process changelog entries up to this index (exclusive). Useful for examining the state at a specific point in time.
--dump-sessions: Dump session information instead of the node tree
--help, -h: Show help message

Node Tree (default):
Use --with-acl to include ACL information

Session Information (--dump-sessions):
# Basic usage with default CSV format
clickhouse-keeper-utils dump-state --snapshot-path /var/lib/clickhouse/coordination/snapshots \
--log-path /var/lib/clickhouse/coordination/logs
# Save output to a file with JSON format
clickhouse-keeper-utils dump-state --snapshot-path /var/lib/clickhouse/coordination/snapshots \
--log-path /var/lib/clickhouse/coordination/logs \
--output-file output.json --output-format JSONEachRow
# Enable parallel output processing for better performance
clickhouse-keeper-utils dump-state --snapshot-path /var/lib/clickhouse/coordination/snapshots \
--log-path /var/lib/clickhouse/coordination/logs \
--output-file output.csv --parallel-output
# Include ACL information in the output
clickhouse-keeper-utils dump-state --snapshot-path /var/lib/clickhouse/coordination/snapshots \
--log-path /var/lib/clickhouse/coordination/logs \
--output-file output_with_acl.csv --with-acl
# Combine options: parallel processing with ACL information
clickhouse-keeper-utils dump-state --snapshot-path /var/lib/clickhouse/coordination/snapshots \
--log-path /var/lib/clickhouse/coordination/logs \
--output-file output_parallel_acl.csv --parallel-output --with-acl
# Dump session information instead of node tree
clickhouse-keeper-utils dump-state --snapshot-path /var/lib/clickhouse/coordination/snapshots \
--log-path /var/lib/clickhouse/coordination/logs \
--output-file sessions.csv --dump-sessions
# Dump session information with parallel processing
clickhouse-keeper-utils dump-state --snapshot-path /var/lib/clickhouse/coordination/snapshots \
--log-path /var/lib/clickhouse/coordination/logs \
--output-file sessions_parallel.csv --dump-sessions --parallel-output
# Dump state up to a specific changelog index
clickhouse-keeper-utils dump-state --snapshot-path /var/lib/clickhouse/coordination/snapshots \
--log-path /var/lib/clickhouse/coordination/logs \
--output-file state_at_index_1000.csv --end-index 1001
# Debug state up to a specific index with detailed logging
clickhouse-keeper-utils dump-state --snapshot-path /var/lib/clickhouse/coordination/snapshots \
--log-path /var/lib/clickhouse/coordination/logs \
--output-file debug_state.csv --debug-mode --end-index 5001
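Because --end-index is exclusive, reproducing the state as of a given entry means passing that entry's index plus one, as the examples above do (1001 for index 1000). A minimal sketch of a helper for this conversion follows. The function name end_index_for is hypothetical and not part of the tool.

```shell
# Hypothetical helper (not part of clickhouse-keeper-utils): convert the last
# changelog index you want applied (inclusive) into the exclusive value
# expected by --end-index.
end_index_for() {
    last_index=$1
    # Accept only plain decimal non-negative integers; leading zeros are
    # rejected because shell arithmetic would read them as octal.
    case "$last_index" in
        ''|*[!0-9]*|0?*)
            echo "expected a decimal non-negative integer, got '$last_index'" >&2
            return 1
            ;;
    esac
    echo $((last_index + 1))
}

# Example: dump the state as of entry 1000 (paths are the ones used above).
# clickhouse-keeper-utils dump-state \
#     --snapshot-path /var/lib/clickhouse/coordination/snapshots \
#     --log-path /var/lib/clickhouse/coordination/logs \
#     --output-file state_at_index_1000.csv \
#     --end-index "$(end_index_for 1000)"
```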
When using --dump-sessions, the output includes the following information for each session:
Session: The session ID
Timeout: Session timeout in milliseconds
Ephemeral nodes: Number of ephemeral nodes owned by this session
Auth: Authentication information (scheme and ID) for the session

Analyze Keeper snapshots and print basic information.
clickhouse-keeper-utils snapshot-analyzer [options]
--snapshot-path <path>: Path to the snapshots directory or a specific snapshot file (.bin or .bin.zstd). This is a required argument.
--full-storage: If specified, the full storage (including node data) is loaded from the snapshot. This provides more detailed information like node count and digest but is slower. By default, only the node paths are loaded.
--with-node-stats: If specified (and --full-storage is not), it calculates and displays statistics about the biggest subtrees, such as the top 10 nodes with the most descendants.
--help, -h: Displays the help message.

Analyze all snapshots in a directory (basic info):
clickhouse-keeper-utils snapshot-analyzer --snapshot-path /var/lib/clickhouse/coordination/snapshots/
Analyze a specific snapshot file with subtree statistics:
clickhouse-keeper-utils snapshot-analyzer --snapshot-path /var/lib/clickhouse/coordination/snapshots/snapshot_123.bin --with-node-stats
Analyze a snapshot with full storage loaded:
clickhouse-keeper-utils snapshot-analyzer --snapshot-path /var/lib/clickhouse/coordination/snapshots/snapshot_123.bin --full-storage
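When a directory holds several snapshots, it can be convenient to analyze only the most recent one. The sketch below assumes snapshot files are named snapshot_<index>.bin or snapshot_<index>.bin.zstd, matching the snapshot_123.bin example above, and that a higher index means a newer snapshot. Both are assumptions, and the function name latest_snapshot is hypothetical.

```shell
# Hypothetical helper: print the path of the snapshot with the highest index
# in a directory. Assumes names of the form snapshot_<index>.bin[.zstd].
latest_snapshot() {
    dir=$1
    best_index=-1
    best_file=
    for f in "$dir"/snapshot_*.bin "$dir"/snapshot_*.bin.zstd; do
        [ -e "$f" ] || continue        # skip glob patterns that matched nothing
        name=${f##*/}                  # strip the directory part
        index=${name#snapshot_}        # drop the "snapshot_" prefix
        index=${index%%.*}             # drop ".bin" or ".bin.zstd"
        case "$index" in
            ''|*[!0-9]*) continue ;;   # ignore names without a numeric index
        esac
        # Compare numerically so that snapshot_123 beats snapshot_9.
        if [ "$index" -gt "$best_index" ]; then
            best_index=$index
            best_file=$f
        fi
    done
    # Returns non-zero when no snapshot was found.
    [ -n "$best_file" ] && printf '%s\n' "$best_file"
}

# Example:
# clickhouse-keeper-utils snapshot-analyzer --with-node-stats \
#     --snapshot-path "$(latest_snapshot /var/lib/clickhouse/coordination/snapshots)"
```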
Analyze Keeper changelogs and print information about them.
clickhouse-keeper-utils changelog-analyzer --log-path <path> [--changelog <file>]
--log-path (required): Path to the directory containing Keeper changelogs
--changelog: Analyze a specific changelog file
--help, -h: Show help message

# Analyze all changelogs in directory
clickhouse-keeper-utils changelog-analyzer --log-path /var/lib/clickhouse/coordination/logs
# Analyze specific changelog file
clickhouse-keeper-utils changelog-analyzer --log-path /var/lib/clickhouse/coordination/logs \
--changelog changelog_1.bin
Extract a range of entries from a changelog to a new file.
clickhouse-keeper-utils changelog-splicer --source <file> --destination <dir> [--start-index <n>] --end-index <n>
--source (required): Path to source changelog file
--destination (required): Directory where to save the output changelog file
--start-index: Start index (inclusive). If not specified, uses the source changelog's start index
--end-index (required): End index (exclusive)
--help, -h: Show help message

# Extract specific range of entries
clickhouse-keeper-utils changelog-splicer --source /var/lib/clickhouse/coordination/logs/changelog_1.bin \
--destination /tmp --start-index 100 --end-index 200
# Extract from beginning of changelog to specified index
clickhouse-keeper-utils changelog-splicer --source /var/lib/clickhouse/coordination/logs/changelog_1.bin \
--destination /tmp --end-index 200
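The half-open range means that --start-index 100 --end-index 200 keeps entries 100 through 199. The sketch below validates a range before the splicer is run and reports which entries it covers. The name splice_range_info is hypothetical, and the count assumes every index in the range is present in the source changelog.

```shell
# Hypothetical helper: describe the half-open range [start, end) that
# changelog-splicer would extract for the given --start-index/--end-index.
splice_range_info() {
    start=$1
    end=$2
    for value in "$start" "$end"; do
        # Accept only plain decimal integers (no sign, no leading zeros).
        case "$value" in
            ''|*[!0-9]*|0?*)
                echo "not a decimal index: '$value'" >&2
                return 1
                ;;
        esac
    done
    if [ "$start" -ge "$end" ]; then
        echo "empty range: start ($start) must be less than end ($end)" >&2
        return 1
    fi
    echo "first=$start last=$((end - 1)) count=$((end - start))"
}

# Example: check the range, then run the splicer only if it is valid.
# splice_range_info 100 200 && clickhouse-keeper-utils changelog-splicer \
#     --source /var/lib/clickhouse/coordination/logs/changelog_1.bin \
#     --destination /tmp --start-index 100 --end-index 200
```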
Deserialize and display the contents of a Keeper changelog file in a human-readable format, with options to filter by index range and include detailed request information.
clickhouse-keeper-utils changelog-deserializer --changelog-path <file> [options]
--changelog-path (required): Path to the changelog file to deserialize
--output-file, -o: Write output to file instead of stdout
--output-format, -f: Output format (default: CSVWithNamesAndTypes)
--parallel-output: Enable parallel output format processing
--with-requests: Include deserialized request details (session, ZXID, operation type, etc.)
--start-index: Start index (inclusive) of entries to process (default: 0)
--end-index: End index (exclusive) of entries to process (default: max uint64_t)
--help, -h: Show help message

Basic columns (always included):
log_index: The index of the log entry
term: The Raft term of the entry
entry_type: Type of the log entry (app_log, config_log, etc.)
entry_size: Size of the entry data in bytes
entry_crc32: CRC32 checksum of the entry

Additional columns (with --with-requests):
session_id: Client session ID for the request
zxid: ZooKeeper Transaction ID
request_timestamp: Timestamp when the request was made
digest_value: Digest value for request verification
digest_version: Version of the digest algorithm
op_num: Type of operation (Create, Set, Get, etc.)
xid: Transaction ID for the request
request_idx: Index of the request (0 for single requests, 1-N for multi requests)
path: Path affected by the request
has_watch: Whether the request includes a watch
version: Version number for the node (for versioned operations)
is_ephemeral: Whether the node is ephemeral (for create operations)
is_sequential: Whether the node is sequential (for create operations)
data: Node data (for create/set operations)

# Basic usage with console output
clickhouse-keeper-utils changelog-deserializer --changelog-path /var/lib/clickhouse/coordination/logs/changelog_1.bin
# Save to file with JSON format
clickhouse-keeper-utils changelog-deserializer --changelog-path /var/lib/clickhouse/coordination/logs/changelog_1.bin \
--output-file output.json --output-format JSONEachRow
# Process a specific range of entries
clickhouse-keeper-utils changelog-deserializer --changelog-path /var/lib/clickhouse/coordination/logs/changelog_1.bin \
--start-index 100 --end-index 200
# Include detailed request information
clickhouse-keeper-utils changelog-deserializer --changelog-path /var/lib/clickhouse/coordination/logs/changelog_1.bin \
--with-requests
# Process a specific range with request details and save to CSV
clickhouse-keeper-utils changelog-deserializer --changelog-path /var/lib/clickhouse/coordination/logs/changelog_1.bin \
--start-index 1000 --end-index 2000 --with-requests --output-file requests.csv
# Enable parallel processing for large files
clickhouse-keeper-utils changelog-deserializer --changelog-path /var/lib/clickhouse/coordination/logs/changelog_1.bin \
--output-file output.csv --parallel-output
# Combine all options
clickhouse-keeper-utils changelog-deserializer --changelog-path /var/lib/clickhouse/coordination/logs/changelog_1.bin \
--start-index 5000 --end-index 10000 --with-requests --output-file filtered_requests.csv --parallel-output
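With --with-requests, a multi request produces several rows for one log entry (request_idx 1-N). The sketch below counts rows per log_index in TSVWithNamesAndTypes output. It relies on that format putting column names on the first line and column types on the second, and it finds the column by name instead of by position. The function name rows_per_log_index is hypothetical.

```shell
# Hypothetical helper: read TSVWithNamesAndTypes on stdin and print
# "<log_index><TAB><row count>" for every log entry, in input order.
rows_per_log_index() {
    awk -F '\t' '
        NR == 1 {
            # Header line: locate the log_index column by name.
            for (i = 1; i <= NF; i++) if ($i == "log_index") col = i
            if (!col) { print "log_index column not found" > "/dev/stderr"; exit 1 }
            next
        }
        NR == 2 { next }    # second line holds the column types
        {
            key = $col
            if (!(key in count)) order[++n] = key    # remember first-seen order
            count[key]++
        }
        END { for (i = 1; i <= n; i++) print order[i] "\t" count[order[i]] }
    '
}

# Example: write the deserialized requests to a TSV file, then summarize it.
# clickhouse-keeper-utils changelog-deserializer \
#     --changelog-path /var/lib/clickhouse/coordination/logs/changelog_1.bin \
#     --with-requests --output-file requests.tsv --output-format TSVWithNamesAndTypes
# rows_per_log_index < requests.tsv
```

TSV is used here rather than the default CSV because tabs inside values are escaped in ClickHouse TSV output, so splitting on a tab character is safe even when the data column contains arbitrary text.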
With --with-requests, the output includes detailed information about each request, including session information, operation type, and node data.
Rows that belong to the same multi request share the same log_index but have different request_idx values.
The entry_type field indicates the type of log entry (e.g., app_log for regular operations, config_log for configuration changes).
The op_num field shows the type of ZooKeeper operation (Create, Set, Get, etc.) using the standard ZooKeeper op codes.
Use --parallel-output for better performance.

The dump-state command supports various output formats, including:
CSV / CSVWithNames / CSVWithNamesAndTypes (default)
JSON / JSONEachRow
TSV / TSVRaw / TSVWithNames / TSVWithNamesAndTypes

For better performance, use --parallel-output and redirect output to a file.

This utility is part of ClickHouse and is distributed under the Apache 2.0 License.