
.. _ClickHouseTimeseriesAdvanced:

ClickHouse Timeseries
#####################

This page describes the ClickHouse timeseries driver in detail. For a high-level overview of all timeseries drivers, see :ref:`BasicConceptsTimeseries`.

.. note::

   ClickHouse timeseries support requires an Enterprise M or better license and the
   :code:`HAVE_CLICKHOUSE` compile-time flag. It is not available on Windows.

Overview
--------

The ClickHouse timeseries driver stores all traffic metrics produced by ntopng (interface traffic, per-host statistics, application protocol breakdowns, etc.) in a ClickHouse columnar database. This provides high write throughput, flexible time-range queries, automatic data retention via TTL, and optional cluster/cloud deployment.

All timeseries data lands in a single table (``timeseries``) whose rows carry a schema name, a timestamp, a tag map (key-value pairs used for filtering), and a metric map (the actual numeric measurements). The driver is a drop-in replacement for the RRD and InfluxDB drivers: the same ntopng charts and API calls work regardless of which driver is active.
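As an illustration of the row shape described above, a single interface-traffic data point could be written as follows (the schema name and all values here are hypothetical examples, not rows ntopng necessarily produces):

.. code-block:: sql

   -- Hypothetical example row: one 'iface:traffic' data point for interface 0
   INSERT INTO ntopng.timeseries (schema_name, tstamp, tags, metrics)
   VALUES (
       'iface:traffic',
       now(),
       map('ifid', '0'),
       map('bytes', 1048576.0, 'packets', 1024.0)
   );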

Configuration
-------------

Deployment Modes
~~~~~~~~~~~~~~~~


The ClickHouse timeseries driver is selected via the ``-F`` option, which also configures the
connection parameters. Please check the **Flows Dump** section for configuring ClickHouse.

.. note::

   The same ``-F`` flag is used for ClickHouse **flow dump** (historical flows) and ClickHouse
   **timeseries**. When ``-F clickhouse`` (or a variant) is set, both feature sets share the
   same ClickHouse server connection and database.
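For illustration only, a typical invocation might look like the following; the host, database name, and credentials are placeholders, and the authoritative ``-F`` syntax is documented in the **Flows Dump** section:

.. code-block:: text

   # Placeholder values; see the Flows Dump section for the full -F syntax
   ntopng -i eth0 -F "clickhouse;127.0.0.1;ntopng;default;default"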

Timeseries Driver Preference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The active timeseries driver can also be changed at runtime from the ntopng preferences UI, under the *Timeseries Driver* preference. Setting it to ``clickhouse`` activates this driver for all new timeseries writes.

Data Retention
~~~~~~~~~~~~~~


Retention is controlled by the *Timeseries Data Retention* preference (in days, default 365).
The value is applied as a ClickHouse ``TTL`` clause on the ``timeseries`` table. When retention
changes, the table DDL is re-issued with the new TTL and ClickHouse handles background cleanup
automatically — no manual purge is required.
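The effect of a retention change can be pictured with the equivalent standalone DDL below. This is a sketch: ntopng re-issues the table definition itself, and the 180-day value is just an example.

.. code-block:: sql

   -- Example: shrink retention to 180 days; ClickHouse drops expired
   -- parts in the background, so no manual purge is needed
   ALTER TABLE ntopng.timeseries
       MODIFY TTL tstamp + toIntervalDay(180);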

Database Schema
---------------

Table Definition
~~~~~~~~~~~~~~~~

On first startup (or when the retention period changes), ntopng creates the following table in the configured ClickHouse database (default: ``ntopng``):

.. code-block:: sql

   CREATE TABLE IF NOT EXISTS ntopng.timeseries (
       schema_name LowCardinality(String),
       tstamp      DateTime CODEC(Delta, ZSTD),
       tags        Map(LowCardinality(String), String),
       metrics     Map(LowCardinality(String), Float64)
   ) ENGINE = MergeTree()
   PARTITION BY toYYYYMM(tstamp)
   ORDER BY (schema_name, tstamp)
   TTL tstamp + toIntervalDay(<retention_days>)

Column Details
~~~~~~~~~~~~~~


``schema_name`` (``LowCardinality(String)``)
  Identifies the type of measurement, e.g. ``iface:traffic``, ``host:traffic``,
  ``interface:ndpi``. Using ``LowCardinality`` avoids redundant string storage for the small
  set of distinct schema identifiers.

``tstamp`` (``DateTime CODEC(Delta, ZSTD)``)
  Unix timestamp of the data point. The Delta codec stores differences between consecutive
  timestamps (reducing entropy for regular time series), then ZSTD compresses the result.

``tags`` (``Map(LowCardinality(String), String)``)
  Key-value pairs used to filter and group data. Keys are bounded-cardinality identifiers such
  as ``ifid``, ``host``, or ``protocol``. Values are the corresponding entity identifiers.
  Example: ``{'ifid': '0', 'host': '192.168.1.1'}``.

``metrics`` (``Map(LowCardinality(String), Float64)``)
  The numeric measurements. Keys are metric names (``bytes``, ``packets``, ``duration_ms``,
  etc.); values are ``Float64`` measurements.
  Example: ``{'bytes': 1048576.0, 'packets': 1024.0}``.

Partitioning and Ordering
~~~~~~~~~~~~~~~~~~~~~~~~~

Data is partitioned monthly via ``toYYYYMM(tstamp)``, which allows ClickHouse to prune entire partitions when querying a specific time range. Within each partition, rows are sorted by ``(schema_name, tstamp)``, optimising the most common query pattern: filter by schema, then scan a time range.
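A typical read benefits from both the partition key and the sort key. The following sketch (the schema name and tag values are illustrative examples) aggregates one host's traffic per hour; the ``tstamp`` predicate prunes monthly partitions, while the ``schema_name`` predicate exploits the sort key:

.. code-block:: sql

   -- Hourly byte totals for one host over the last day (example values)
   SELECT
       toStartOfHour(tstamp) AS hour,
       sum(metrics['bytes']) AS bytes
   FROM ntopng.timeseries
   WHERE schema_name = 'host:traffic'
     AND tags['ifid'] = '0'
     AND tags['host'] = '192.168.1.1'
     AND tstamp >= now() - INTERVAL 1 DAY
   GROUP BY hour
   ORDER BY hour;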

Query Behaviour
---------------

Counter vs. Gauge Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~


Each timeseries schema declares whether its metrics are *counters* (monotonically increasing
totals, e.g. cumulative bytes) or *gauges* (instantaneous values, e.g. CPU load percentage).

- **Counter** metrics: the driver fetches one extra time bucket before the requested range,
  computes per-bucket deltas using a ``lag()`` window function, and divides by the bucket
  width to produce rates (e.g., bytes/second).
- **Gauge** metrics: the driver aggregates (``avg``, ``max``, or ``min``) directly within
  each time bucket.
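The counter-to-rate conversion can be sketched in SQL along these lines, using ClickHouse's ``lagInFrame`` window function; a fixed 300-second bucket width and the schema/tag values are simplifying assumptions, not the driver's exact query:

.. code-block:: sql

   -- Per-bucket byte rate derived from a cumulative counter; the first row
   -- of the range has no predecessor, which is why the driver fetches one
   -- extra bucket before the requested range
   SELECT
       tstamp,
       (metrics['bytes']
        - lagInFrame(metrics['bytes']) OVER (ORDER BY tstamp)) / 300
           AS bytes_per_sec
   FROM ntopng.timeseries
   WHERE schema_name = 'iface:traffic'
     AND tags['ifid'] = '0'
     AND tstamp >= now() - INTERVAL 1 HOUR
   ORDER BY tstamp;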

Comparison with Other Drivers
------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 25 25 30

   * - Aspect
     - RRD
     - InfluxDB
     - ClickHouse
   * - Storage
     - Per-entity ``.rrd`` files
     - InfluxDB server (line protocol)
     - Centralised columnar table
   * - Write model
     - Synchronous file write per entity
     - HTTP line-protocol POST
     - Async in-memory queue → batch INSERT (TCP)
   * - Flush interval
     - Each periodic activity
     - Each periodic activity
     - Every 5 seconds (up to 2 000 rows/batch)
   * - Retention model
     - Fixed RRA archives (1 year total)
     - Retention policies
     - TTL clause, configurable in days
   * - Top-K queries
     - Scan many files; slow at scale
     - InfluxDB continuous queries
     - Single SQL query; fast at scale
   * - Cluster/HA
     - None (local filesystem)
     - InfluxDB Enterprise / OSS cluster
     - ClickHouse native cluster replication
   * - License required
     - Community+
     - Community+
     - Enterprise M+