docs/en/operations/opentelemetry.md
OpenTelemetry is an open standard for collecting traces and metrics from a distributed application. ClickHouse has some support for OpenTelemetry.
ClickHouse accepts trace context HTTP headers, as described by the W3C recommendation. It also accepts trace context over the native protocol that is used for communication between ClickHouse servers or between the client and server. For manual testing, trace context headers conforming to the Trace Context recommendation can be supplied to `clickhouse-client` using the `--opentelemetry-traceparent` and `--opentelemetry-tracestate` flags.
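For a quick local check, a valid `traceparent` value can be generated by hand. Per the W3C Trace Context recommendation, it consists of four dash-separated fields: a two-digit version, a 16-byte trace ID, an 8-byte parent span ID, and a flags byte, all in lowercase hex. A minimal Python sketch (the helper name is ours, not part of ClickHouse):

```python
import re
import secrets

def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent header: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes  -> 16 hex chars
    flags = "01" if sampled else "00"  # 01 = sampled flag set
    return f"00-{trace_id}-{parent_id}-{flags}"

# Shape of a well-formed header: 00-<32 hex>-<16 hex>-<2 hex>
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")

header = make_traceparent()
assert TRACEPARENT_RE.match(header)
```

A value generated this way can then be passed to `clickhouse-client` via the `--opentelemetry-traceparent` flag described above.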
If no parent trace context is supplied, or the provided trace context does not comply with the W3C standard above, ClickHouse can start a new trace, with probability controlled by the `opentelemetry_start_trace_probability` setting.
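The decision can be pictured as a simple Bernoulli draw against the configured probability. The sketch below is purely illustrative (the real logic lives inside the server; `should_start_trace` is a hypothetical helper):

```python
import random

def should_start_trace(start_trace_probability: float) -> bool:
    """Illustrative sketch of a probability-gated trace start.

    Mirrors the idea behind opentelemetry_start_trace_probability:
    0 never starts a trace, 1 always does.
    """
    return random.random() < start_trace_probability

# With probability 0 no query is traced; with 1 every query is.
assert not should_start_trace(0.0)
assert should_start_trace(1.0)
```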
The trace context is propagated to downstream services in the following cases:

- Queries to remote ClickHouse servers, such as when using the Distributed table engine.
- The `url` table function. Trace context information is sent in HTTP headers.
ClickHouse supports OpenTelemetry tracing for ClickHouse Keeper requests (ZooKeeper-compatible coordination service). This feature provides detailed visibility into the lifecycle of Keeper operations, from client request submission through server-side processing.
To enable tracing for Keeper requests, configure the following settings in your ZooKeeper/Keeper client configuration:
```xml
<clickhouse>
    <zookeeper>
        <node>
            <host>keeper1</host>
            <port>9181</port>
        </node>

        <!-- Enable OpenTelemetry tracing context propagation -->
        <pass_opentelemetry_tracing_context>true</pass_opentelemetry_tracing_context>
    </zookeeper>
</clickhouse>
```
When tracing is enabled, ClickHouse creates spans for both client-side and server-side Keeper operations:
Client-side spans:

- `zookeeper.create` — Create a new node
- `zookeeper.get` — Get node data
- `zookeeper.set` — Set node data
- `zookeeper.remove` — Remove a node
- `zookeeper.list` — List child nodes
- `zookeeper.exists` — Check if a node exists
- `zookeeper.multi` — Execute multiple operations atomically
- `zookeeper.client.requests_queue` — Time spent queueing requests before sending

Server-side spans (Keeper):

- `keeper.receive_request` — Receiving and parsing the request from the client
- `keeper.dispatcher.requests_queue` — Request queuing in the dispatcher
- `keeper.write.pre_commit` — Preprocessing write requests before Raft commit
- `keeper.write.commit` — Processing write requests after Raft commit
- `keeper.read.wait_for_write` — Read requests waiting for dependent writes
- `keeper.read.process` — Processing read requests
- `keeper.dispatcher.responses_queue` — Response queuing in the dispatcher
- `keeper.send_response` — Sending the response to the client

To manage tracing overhead, Keeper implements dynamic sampling: the sampling rate automatically adjusts between 1/10,000 and 1/10 based on request size. All requests (sampled and unsampled) have their durations recorded to histogram metrics for performance monitoring.
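The exact adjustment heuristic is not spelled out here; the sketch below only illustrates clamping a size-dependent rate to the documented 1/10,000–1/10 range. The function name, the inverse-proportional rule, and its constant are assumptions, not Keeper's actual formula:

```python
def keeper_sampling_rate(request_size_bytes: int) -> float:
    """Illustrative sketch: pick a sampling rate between the documented
    bounds of 1/10,000 and 1/10, sampling larger requests more rarely.
    The real Keeper heuristic may differ."""
    MIN_RATE, MAX_RATE = 1 / 10_000, 1 / 10
    # Assumption: rate inversely proportional to size; the constant is made up.
    rate = 100.0 / max(request_size_bytes, 1)
    return min(MAX_RATE, max(MIN_RATE, rate))

assert keeper_sampling_rate(1) == 1 / 10              # small requests: capped at 1/10
assert keeper_sampling_rate(10_000_000) == 1 / 10_000 # huge requests: floored at 1/10,000
```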
ClickHouse creates trace spans for each query and some of the query execution stages, such as query planning or distributed queries.
To be useful, the tracing information has to be exported to a monitoring system that supports OpenTelemetry, such as Jaeger or Prometheus. ClickHouse avoids a dependency on a particular monitoring system, instead only providing the tracing data through a system table. OpenTelemetry trace span information required by the standard is stored in the `system.opentelemetry_span_log` table.

The table must be enabled in the server configuration; see the `opentelemetry_span_log` element in the default config file `config.xml`. It is enabled by default.
The tags or attributes are saved as two parallel arrays containing the keys and values. Use `ARRAY JOIN` to work with them.
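For example, a query along these lines unrolls the parallel arrays into one row per attribute (a hypothetical query, assuming the `attribute.names`/`attribute.values` columns described above):

```sql
-- One output row per (span, attribute) pair.
SELECT
    operation_name,
    name,
    value
FROM system.opentelemetry_span_log
ARRAY JOIN
    attribute.names AS name,
    attribute.values AS value
```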
The `log_query_settings` setting allows logging changes to query settings during query execution. When enabled, any modifications made to query settings are recorded in the OpenTelemetry span log. This feature is particularly useful in production environments for tracking configuration changes that may affect query performance.
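As a hedged example of how this might be used in a session (the exact attribute names recorded in the span log are not specified here):

```sql
SET log_query_settings = 1;

-- The per-query settings change below would then be recorded
-- in system.opentelemetry_span_log for this query.
SELECT count()
FROM system.one
SETTINGS max_threads = 1;
```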
At the moment, there is no ready-made tool that can export the tracing data from ClickHouse to a monitoring system.

For testing, it is possible to set up the export using a materialized view with the URL engine over the `system.opentelemetry_span_log` table, which would push the arriving log data to an HTTP endpoint of a trace collector. For example, to push the minimal span data to a Zipkin instance running at `http://localhost:9411`, in Zipkin v2 JSON format:
```sql
CREATE MATERIALIZED VIEW default.zipkin_spans
ENGINE = URL('http://127.0.0.1:9411/api/v2/spans', 'JSONEachRow')
SETTINGS output_format_json_named_tuples_as_objects = 1,
    output_format_json_array_of_rows = 1 AS
SELECT
    lower(hex(trace_id)) AS traceId,
    CASE WHEN parent_span_id = 0 THEN '' ELSE lower(hex(parent_span_id)) END AS parentId,
    lower(hex(span_id)) AS id,
    operation_name AS name,
    start_time_us AS timestamp,
    finish_time_us - start_time_us AS duration,
    cast(tuple('clickhouse'), 'Tuple(serviceName text)') AS localEndpoint,
    cast(tuple(
        attribute.values[indexOf(attribute.names, 'db.statement')]),
        'Tuple("db.statement" text)') AS tags
FROM system.opentelemetry_span_log
```
In case of any errors, the part of the log data affected by the error will be silently lost. Check the server log for error messages if the data does not arrive.