ClickHouse

Plugin: go.d.plugin Module: clickhouse

Overview

This collector retrieves performance data from ClickHouse for connections, queries, resources, replication, IO, and data operations (inserts, selects, merges) using HTTP requests and ClickHouse system tables. It monitors your ClickHouse server's health and activity.

It sends HTTP requests to the ClickHouse HTTP interface, executing SELECT queries to retrieve data from various system tables. Specifically, it collects metrics from the following tables:

- system.metrics
- system.async_metrics
- system.events
- system.disks
- system.parts
- system.processes
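
To see the kind of rows these tables return, you can send a SELECT to the HTTP interface by hand. The following is a minimal sketch and not the collector's own code: `ch_query` and the `CH_URL` variable are hypothetical names, and the URL falls back to the default address this collector probes.

```shell
# Sketch only: ch_query and CH_URL are illustrative names, not part of Netdata.
# Sends one SQL statement to the ClickHouse HTTP interface as the request body.
ch_query() {
  curl -sS --max-time 5 \
    --data-binary "$1" \
    "${CH_URL:-http://127.0.0.1:8123}/"
}
```

For example, `ch_query "SELECT metric, value FROM system.metrics LIMIT 5 FORMAT TSVWithNames"` should print a header line and a few metric rows when the server is reachable.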

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

Default Behavior

Auto-Detection

By default, it detects ClickHouse instances running on localhost that are listening on port 8123. On startup, it tries to collect metrics from:

http://127.0.0.1:8123
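
If auto-detection finds nothing, it can help to confirm that something answers on that address. The ClickHouse HTTP interface exposes a `/ping` endpoint that replies `Ok.` when the server is up. The sketch below uses it; `ch_ping` is a hypothetical helper name, and this check is not necessarily the one the collector performs.

```shell
# Sketch only: ch_ping is an illustrative name, not a Netdata command.
# Prints "up" if the /ping endpoint answers "Ok.", otherwise "down".
ch_ping() {
  base="${1:-http://127.0.0.1:8123}"
  reply="$(curl -sS --max-time 1 "$base/ping" 2>/dev/null)" || reply=""
  if [ "$reply" = "Ok." ]; then
    echo "up"
  else
    echo "down"
  fi
}
```

Calling `ch_ping` with no argument checks the default address; pass another base URL, such as `ch_ping http://192.0.2.1:8123`, to check a remote instance.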

Limits

The default configuration for this integration does not impose any limits on data collection.

Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

Setup

You can configure the clickhouse collector in two ways:

Method	Best for	How to
UI	Fast setup without editing files	Go to Nodes → Configure this node → Collectors → Jobs, search for clickhouse, then click + to add a job.
File	If you prefer configuring via file, or need to automate deployments (e.g., with Ansible)	Edit `go.d/clickhouse.conf` and add a job.

:::important

UI configuration requires a paid Netdata Cloud plan.

:::

Prerequisites

No action required.

Configuration

Options

The following options can be defined globally: update_every, autodetection_retry.

<details open><summary>Config options</summary>

Group	Option	Description	Default	Required
Collection	update_every	Data collection interval (seconds).	1	no
	autodetection_retry	Autodetection retry interval (seconds). Set 0 to disable.	0	no
Target	url	Target endpoint URL.	http://127.0.0.1:8123	yes
	timeout	HTTP request timeout (seconds).	1	no
HTTP Auth	username	Username for Basic HTTP authentication.		no
	password	Password for Basic HTTP authentication.		no
	bearer_token_file	Path to a file containing a bearer token (used for `Authorization: Bearer`).		no
TLS	tls_skip_verify	Skip TLS certificate and hostname verification (insecure).	no	no
	tls_ca	Path to CA bundle used to validate the server certificate.		no
	tls_cert	Path to client TLS certificate (for mTLS).		no
	tls_key	Path to client TLS private key (for mTLS).		no
Proxy	proxy_url	HTTP proxy URL.		no
	proxy_username	Username for proxy Basic HTTP authentication.		no
	proxy_password	Password for proxy Basic HTTP authentication.		no
Request	method	HTTP method to use.	GET	no
	body	Request body (e.g., for POST/PUT).		no
	headers	Additional HTTP headers (one per line as key: value).		no
	not_follow_redirects	Do not follow HTTP redirects.	no	no
	force_http2	Force HTTP/2 (including h2c over TCP).	no	no
Functions	functions.top_queries.disabled	Disable the top-queries function.	no	no
	functions.top_queries.timeout	Query timeout (seconds). Uses collector timeout if not set.		no
	functions.top_queries.limit	Maximum number of queries to return.	500	no
Virtual Node	vnode	Associates this data collection job with a Virtual Node.		no

</details>

via UI

Configure the clickhouse collector from the Netdata web interface:

1. Go to Nodes.
2. Select the node where you want the clickhouse data-collection job to run and click the gear icon (Configure this node). That node will run the data collection.
3. The Collectors → Jobs view opens by default.
4. In the Search box, type clickhouse (or scroll the list) to locate the clickhouse collector.
5. Click the + next to the clickhouse collector to add a new job.
6. Fill in the job fields, then click Test to verify the configuration and Submit to save.
   - Test runs the job with the provided settings and shows whether data can be collected.
   - If it fails, an error message appears with details (for example, connection refused, timeout, or command execution errors), so you can adjust and retest.

via File

The configuration file name for this integration is go.d/clickhouse.conf.

The file format is YAML. Generally, the structure is:

```yaml
update_every: 1
autodetection_retry: 0
jobs:
  - name: some_name1
  - name: some_name2
```

You can edit the configuration file using the edit-config script from the Netdata config directory.

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/clickhouse.conf
```

Examples

Basic

A basic example configuration.

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:8123
```

HTTP authentication

Basic HTTP authentication.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:8123
    username: username
    password: password
```

</details>
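
Before putting credentials into a job, you may want to confirm that ClickHouse accepts them over HTTP. The sketch below sends Basic authentication with `curl --user` and asks the server which user it sees; `ch_whoami` is a hypothetical helper name, and the URL default is the address used in the example above. Note that a password passed on the command line can be visible to other local users.

```shell
# Sketch only: ch_whoami is an illustrative name, not part of Netdata.
# $1 = username, $2 = password, $3 = base URL (optional).
# Prints the user name ClickHouse reports for these credentials.
ch_whoami() {
  curl -sS --max-time 5 \
    --user "$1:$2" \
    --data-binary "SELECT currentUser()" \
    "${3:-http://127.0.0.1:8123}/"
}
```

If `ch_whoami username password` prints the expected user name, the same `username` and `password` values should work in the job configuration.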

HTTPS with self-signed certificate

ClickHouse with enabled HTTPS and self-signed certificate.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: https://127.0.0.1:8123
    tls_skip_verify: yes
```

</details>

Multi-instance

Note: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:8123

  - name: remote
    url: http://192.0.2.1:8123
```

</details>
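
When the list of instances is long or comes from an inventory, the jobs section can be generated rather than written by hand. The sketch below is one way to do that; `print_jobs` is a hypothetical helper that takes `name=url` pairs and prints YAML in the same shape as the multi-instance example. It does not check that names are unique, so that remains your responsibility.

```shell
# Sketch only: print_jobs is an illustrative helper, not part of Netdata.
# Each argument is a "name=url" pair; job names must be unique.
print_jobs() {
  echo "jobs:"
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    url="${pair#*=}"     # text after the first "="
    printf '  - name: %s\n    url: %s\n' "$name" "$url"
  done
}
```

For example, `print_jobs local=http://127.0.0.1:8123 remote=http://192.0.2.1:8123` prints a two-job configuration. Review the output before saving it to go.d/clickhouse.conf.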

Alerts

The following alerts are available:

Alert name	On metric	Description
clickhouse_restarted	clickhouse.uptime	ClickHouse has recently been restarted
clickhouse_queries_preempted	clickhouse.queries_preempted	ClickHouse has queries that are stopped and waiting due to priority setting
clickhouse_long_running_query	clickhouse.longest_running_query_time	ClickHouse has a long-running query exceeding the threshold
clickhouse_rejected_inserts	clickhouse.rejected_inserts	ClickHouse has INSERT queries that are rejected due to a high number of active data parts in a MergeTree partition
clickhouse_delayed_inserts	clickhouse.delayed_inserts	ClickHouse has INSERT queries that are throttled due to a high number of active data parts in a MergeTree partition
clickhouse_replication_lag	clickhouse.replicas_max_absolute_delay	ClickHouse is experiencing replication lag greater than 5 minutes
clickhouse_replicated_readonly_tables	clickhouse.replicated_readonly_tables	ClickHouse has replicated tables in readonly state due to ZooKeeper session loss or startup without ZooKeeper configured
clickhouse_max_part_count_for_partition	clickhouse.max_part_count_for_partition	ClickHouse has a high number of parts per partition
clickhouse_distributed_connections_failures	clickhouse.distributed_connections_fail_exhausted_retries	ClickHouse has failed distributed connections after exhausting all retry attempts
clickhouse_distributed_files_to_insert	clickhouse.distributed_files_to_insert	ClickHouse has a high number of pending files to process for asynchronous insertion into Distributed tables

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

Per ClickHouse instance

These metrics refer to the entire monitored application.

This scope has no labels.

Metrics:

Metric	Dimensions	Unit
clickhouse.connections	tcp, http, mysql, postgresql, interserver	connections
clickhouse.slow_reads	slow	reads/s
clickhouse.read_backoff	read_backoff	events/s
clickhouse.memory_usage	used	bytes
clickhouse.running_queries	running	queries
clickhouse.queries_preempted	preempted	queries
clickhouse.queries	successful, failed	queries/s
clickhouse.select_queries	successful, failed	selects/s
clickhouse.insert_queries	successful, failed	inserts/s
clickhouse.queries_memory_limit_exceeded	mem_limit_exceeded	queries/s
clickhouse.longest_running_query_time	longest_query_time	seconds
clickhouse.queries_latency	queries_time	microseconds
clickhouse.select_queries_latency	selects_time	microseconds
clickhouse.insert_queries_latency	inserts_time	microseconds
clickhouse.io	reads, writes	bytes/s
clickhouse.iops	reads, writes	ops/s
clickhouse.io_errors	read, write	errors/s
clickhouse.io_seeks	lseek	ops/s
clickhouse.io_file_opens	file_open	ops/s
clickhouse.replicated_parts_current_activity	fetch, send, check	parts
clickhouse.replicas_max_absolute_delay	replication_delay	seconds
clickhouse.replicated_readonly_tables	read_only	tables
clickhouse.replicated_data_loss	data_loss	events
clickhouse.replicated_part_fetches	successful, failed	fetches/s
clickhouse.inserted_rows	inserted	rows/s
clickhouse.inserted_bytes	inserted	bytes/s
clickhouse.rejected_inserts	rejected	inserts/s
clickhouse.delayed_inserts	delayed	inserts/s
clickhouse.delayed_inserts_throttle_time	delayed_inserts_throttle_time	milliseconds
clickhouse.selected_bytes	selected	bytes/s
clickhouse.selected_rows	selected	rows/s
clickhouse.selected_parts	selected	parts/s
clickhouse.selected_ranges	selected	ranges/s
clickhouse.selected_marks	selected	marks/s
clickhouse.merges	merge	ops/s
clickhouse.merges_latency	merges_time	milliseconds
clickhouse.merged_uncompressed_bytes	merged_uncompressed	bytes/s
clickhouse.merged_rows	merged	rows/s
clickhouse.merge_tree_data_writer_inserted_rows	inserted	rows/s
clickhouse.merge_tree_data_writer_uncompressed_bytes	inserted	bytes/s
clickhouse.merge_tree_data_writer_compressed_bytes	written	bytes/s
clickhouse.uncompressed_cache_requests	hits, misses	requests/s
clickhouse.mark_cache_requests	hits, misses	requests/s
clickhouse.max_part_count_for_partition	max_parts_partition	parts
clickhouse.parts_count	temporary, pre_active, active, deleting, delete_on_destroy, outdated, wide, compact	parts
clickhouse.distributed_connections	active	connections
clickhouse.distributed_connections_attempts	connection	attempts/s
clickhouse.distributed_connections_fail_retries	connection_retry	fails/s
clickhouse.distributed_connections_fail_exhausted_retries	connection_retry_exhausted	fails/s
clickhouse.distributed_files_to_insert	pending_insertions	files
clickhouse.distributed_rejected_inserts	rejected	inserts/s
clickhouse.distributed_delayed_inserts	delayed	inserts/s
clickhouse.distributed_delayed_inserts_latency	delayed_time	milliseconds
clickhouse.distributed_sync_insertion_timeout_exceeded	sync_insertion	timeouts/s
clickhouse.distributed_async_insertions_failures	async_insertions	failures/s
clickhouse.uptime	uptime	seconds

Per disk

These metrics refer to the Disk.

Labels:

Label	Description
disk_name	Name of the disk as defined in the server configuration.

Metrics:

Metric	Dimensions	Unit
clickhouse.disk_space_usage	free, used	bytes

Per table

These metrics refer to the Database Table.

Labels:

Label	Description
database	Name of the database.
table	Name of the table.

Metrics:

Metric	Dimensions	Unit
clickhouse.database_table_size	size	bytes
clickhouse.database_table_parts	parts	parts
clickhouse.database_table_rows	rows	rows
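
To cross-check these per-table numbers against the server, you can aggregate `system.parts` yourself. The sketch below prints one such query; `table_stats_sql` is a hypothetical helper name, the column aliases are chosen for illustration, and the collector's real query may differ (for example, in which parts it counts).

```shell
# Sketch only: table_stats_sql is an illustrative name; the collector's real query may differ.
# Prints a query that sums size, part count, and rows of active parts per table.
table_stats_sql() {
  cat <<'SQL'
SELECT database, table,
       sum(bytes_on_disk) AS size_bytes,
       count()            AS part_count,
       sum(rows)          AS total_rows
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY size_bytes DESC
FORMAT TSVWithNames
SQL
}
```

One way to run it is `table_stats_sql | curl -sS --data-binary @- http://127.0.0.1:8123/`, which sends the printed query to the HTTP interface as the request body.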

Live Data

This collector exposes real-time functions for interactive troubleshooting in the Live tab.

Top Queries

Retrieves and aggregates SQL query performance metrics from the ClickHouse system.query_log table.

This function queries the system.query_log table, which contains information about executed queries including timing, resource usage, and execution statistics. Queries are grouped by their normalized hash (normalized_query_hash) to aggregate statistics for identical query patterns with different literal values.

Use cases:

- Identify slow queries that consume the most execution time
- Find frequently executed queries that may benefit from optimization
- Analyze I/O patterns by examining read/written rows and bytes

Query text is truncated at 4096 characters for display purposes.
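
The grouping described above can be approximated with a plain aggregation over `system.query_log`. The sketch below prints such a query; it is not the collector's actual SQL, `top_queries_sql` is a hypothetical helper name, and the column aliases are illustrative. The optional argument is the row limit, defaulting to 500.

```shell
# Sketch only: top_queries_sql is an illustrative name; this is not the collector's actual query.
# Prints an aggregation over finished queries, sorted by total time. $1 = row limit (default 500).
top_queries_sql() {
  limit="${1:-500}"
  cat <<SQL
SELECT normalized_query_hash,
       any(substring(query, 1, 4096)) AS query_text,
       count()                        AS calls,
       sum(query_duration_ms)         AS total_time_ms,
       avg(query_duration_ms)         AS avg_time_ms,
       sum(read_rows)                 AS read_rows_total,
       max(memory_usage)              AS max_memory
FROM system.query_log
WHERE type = 'QueryFinish'
GROUP BY normalized_query_hash
ORDER BY total_time_ms DESC
LIMIT ${limit}
FORMAT TSVWithNames
SQL
}
```

For example, `top_queries_sql 10 | curl -sS --data-binary @- http://127.0.0.1:8123/` would list the ten query patterns with the highest cumulative execution time, assuming the connecting user can read `system.query_log`.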

Aspect	Description
Name	`Clickhouse:top-queries`
Require Cloud	yes
Performance	Queries `system.query_log` table and aggregates by `normalized_query_hash`:
• On busy systems with high query throughput, the table can grow large
• Default limit of 500 rows balances usefulness with performance
Security	Query text may contain unmasked literal values including potentially sensitive data:
• Personal information in query parameters
• Business data and internal identifiers
• Access should be restricted to authorized personnel only
Availability	Available when:
• The collector has successfully connected to ClickHouse
• `system.query_log` table is accessible
• Returns HTTP 503 if `system.query_log` is not accessible
• Returns HTTP 500 if the query fails
• Returns HTTP 504 if the query times out

Prerequisites

Grant access to `system.query_log`

Ensure the Netdata user can read system.query_log on the target ClickHouse instance.

1. Verify query_log is enabled (enabled by default):

   ```sql
   SELECT * FROM system.query_log LIMIT 1;
   ```

2. If using a dedicated monitoring user, grant SELECT access:

   ```sql
   GRANT SELECT ON system.query_log TO netdata_user;
   ```

:::info

- The query_log table is enabled by default in ClickHouse.
- Only queries with type='QueryFinish' are included in the results.
- The normalized_query_hash column is used for grouping when available.

:::

Parameters

Parameter	Type	Description	Required	Default	Options
Filter By	select	Select the primary sort column. The available options include total execution time, number of calls, rows read, and more. Defaults to total execution time to focus on most resource-intensive queries.	yes	totalTime

Returns

Aggregated query statistics from system.query_log, grouped by normalized query hash. Each row represents a unique query pattern with cumulative metrics across all executions.

Column	Type	Unit	Visibility	Description
Query ID	string		hidden	Unique hash identifier for the normalized query pattern. Queries with identical structure but different literal values share the same hash.
Query	string			SQL query text from one of the executions. Truncated to 4096 characters. Use this to identify the actual SQL being executed.
Database	string			Database name where the query was executed. Empty string for queries without a database context or system queries.
User	string			ClickHouse user that executed the query. Useful for identifying query sources and implementing per-user resource monitoring.
Calls	integer			Total number of times this query pattern has been executed. High values indicate frequently run queries that impact overall server load.
Total Time	duration	milliseconds		Cumulative execution time across all executions. High values indicate queries that consume significant server resources over time.
Avg Time	duration	milliseconds		Average execution time per query run. Use this to compare typical performance across different query patterns.
Min Time	duration	milliseconds	hidden	Minimum execution time observed for a single execution. Helps identify best-case query performance.
Max Time	duration	milliseconds	hidden	Maximum execution time observed for a single execution. Large gaps between min and max may indicate data skew or resource contention.
Read Rows	integer			Total number of rows read from storage across all executions. High values suggest queries scanning large amounts of data that may benefit from better filtering or indexing.
Read Bytes	integer			Total bytes read from storage across all executions. Indicates I/O load and data transfer volume for the query pattern.
Written Rows	integer		hidden	Total number of rows written across all executions. Relevant for INSERT, CREATE, or materialized view queries.
Written Bytes	integer		hidden	Total bytes written across all executions. Indicates storage impact of write operations.
Result Rows	integer			Total number of rows returned to clients across all executions. A high ratio of read rows to result rows indicates filtering or aggregation happening on large datasets.
Result Bytes	integer		hidden	Total bytes returned to clients across all executions. Large values may indicate queries returning more data than necessary.
Max Memory	float		hidden	Maximum memory used during any single execution. High values may indicate queries at risk of hitting memory limits under load.

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the clickhouse collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn't working.

1. Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins setting under [directories].

   ```bash
   cd /usr/libexec/netdata/plugins.d/
   ```

2. Switch to the netdata user.

   ```bash
   sudo -u netdata -s
   ```

3. Run the go.d.plugin to debug the collector:

   ```bash
   ./go.d.plugin -d -m clickhouse
   ```

   To debug a specific job:

   ```bash
   ./go.d.plugin -d -m clickhouse -j jobName
   ```

Getting Logs

If you're encountering problems with the clickhouse collector, follow these steps to retrieve logs and identify potential issues:

1. Run the command specific to your system (systemd, non-systemd, or Docker container).
2. Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

```bash
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep clickhouse
```

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector's name:

```bash
grep clickhouse /var/log/netdata/collector.log
```

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.
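
Because the file spans every restart, trimming the output to the most recent matches is often enough. The following is a small sketch of that idea; `recent_collector_logs` is a hypothetical helper name, and the default path is the typical location mentioned above.

```shell
# Sketch only: recent_collector_logs is an illustrative name, not a Netdata command.
# Prints the last N lines that mention the collector.
# $1 = log file (optional), $2 = number of lines (optional, default 50).
recent_collector_logs() {
  log_file="${1:-/var/log/netdata/collector.log}"
  count="${2:-50}"
  grep clickhouse "$log_file" | tail -n "$count"
}
```

Running `recent_collector_logs` with no arguments shows the last 50 matching lines from the default log file.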

Docker Container

If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:

```bash
docker logs netdata 2>&1 | grep clickhouse
```
