Back to Ntopng

ntopng ClickHouse Database Schema

doc/README.clickhouse_schema.md

6.614.1 KB
Original Source

ntopng ClickHouse Database Schema

This document describes the ClickHouse database schema used by ntopng to persist flow records, alerts, assets, and related analytical data. The schema is defined in two files that must be kept in sync:

FilePurpose
httpdocs/misc/db_schema_clickhouse.sqlSingle-node ClickHouse deployment (uses MergeTree family engines)
httpdocs/misc/db_schema_clickhouse_cluster.sqlMulti-node replicated cluster deployment (uses ReplicatedMergeTree / ReplicatedReplacingMergeTree engines and ON CLUSTER '$CLUSTER' DDL)

Both files are parsed and executed by ntopng at startup via pro/src/ClickHouseDB.cpp. Statements are separated by @ delimiters rather than standard semicolons to allow ntopng's SQL runner to feed them one at a time. Each file contains both CREATE TABLE IF NOT EXISTS statements (idempotent schema bootstrap) and ALTER TABLE … ADD/MODIFY COLUMN IF NOT EXISTS statements (idempotent schema migrations for upgrades from older versions).


Table Overview

TableEngineDescription
flowsMergeTreePer-flow telemetry records
hourly_flowsMergeTreeHourly aggregated flow summaries
hourly_asnMergeTreeHourly per-ASN traffic statistics
active_monitoring_alertsMergeTreeHistorical active-monitoring alerts
host_alertsMergeTreeHistorical host alerts
mac_alertsMergeTreeHistorical MAC/device alerts
snmp_alertsMergeTreeHistorical SNMP alerts
network_alertsMergeTreeHistorical subnet alerts
as_alertsMergeTreeHistorical Autonomous System alerts
interface_alertsMergeTreeHistorical interface alerts
user_alertsMergeTreeHistorical user alerts
system_alertsMergeTreeHistorical system alerts
engaged_*_alertsMemoryCurrently active (engaged) alerts
vulnerability_scan_dataMergeTreePer-host vulnerability scan results
vulnerability_scan_reportMergeTreeVulnerability scan summary reports
mitre_table_infoReplacingMergeTreeMITRE ATT&CK mappings
assetsReplacingMergeTreeNetwork asset inventory

Views that merge historical and engaged tables are described in the Views section.


flows

The central table. Every bidirectional network flow observed by ntopng — whether captured locally or received via NetFlow/sFlow/IPFIX — produces one row. Partitioned by day on FIRST_SEEN; ordered by (FIRST_SEEN, IPV4_SRC_ADDR, IPV4_DST_ADDR).

Key column groups

GroupColumns
IdentityFLOW_ID, NTOPNG_INSTANCE_NAME, INTERFACE_ID
TimingFIRST_SEEN, LAST_SEEN
Network 5-tuplePROTOCOL, IPV4_SRC_ADDR / IPV6_SRC_ADDR, IP_SRC_PORT, IPV4_DST_ADDR / IPV6_DST_ADDR, IP_DST_PORT
Traffic countersPACKETS, TOTAL_BYTES, SRC2DST_BYTES, DST2SRC_BYTES, SRC2DST_PACKETS, DST2SRC_PACKETS
Layer 7L7_PROTO, L7_PROTO_MASTER, L7_CATEGORY, INFO, FLOW_RISK
QoS / latencySRC2DST_DSCP, DST2SRC_DSCP, CLIENT_NW_LATENCY_US, SERVER_NW_LATENCY_US, QOE_SCORE
Host metadataSRC_LABEL, DST_LABEL, SRC_COUNTRY_CODE, DST_COUNTRY_CODE, SRC_ASN, DST_ASN, SRC_PEER_ASN, DST_PEER_ASN, SRC_MAC, DST_MAC
Network topologyVLAN_ID, COMMUNITY_ID, OBSERVATION_POINT_ID, INTERFACE_ROLE, PROBE_IP, EXPORTER_SITE, INPUT_SNMP, OUTPUT_SNMP, SRC_NETWORK_ID, DST_NETWORK_ID
Host poolsSRC_HOST_POOL_ID, DST_HOST_POOL_ID
Process info (eBPF)SRC_PROC_NAME, DST_PROC_NAME, SRC_PROC_USER_NAME, DST_PROC_USER_NAME
TCP stateSRC2DST_TCP_FLAGS, DST2SRC_TCP_FLAGS, MINOR_CONNECTION_STATE, MAJOR_CONNECTION_STATE
FingerprintsCLIENT_FINGERPRINT, TCP_FINGERPRINT
NATPOST_NAT_IPV4_SRC_ADDR, POST_NAT_SRC_PORT, POST_NAT_IPV4_DST_ADDR, POST_NAT_DST_PORT
WirelessWLAN_SSID, WTP_MAC_ADDRESS
Alert fieldsSTATUS, SCORE, SEVERITY, ALERT_STATUS, ALERT_CATEGORY, ALERT_JSON, ALERTS_MAP, IS_ALERT_DELETED, FLOW_RISK
Threat intel flagsIS_CLI_ATTACKER, IS_CLI_VICTIM, IS_CLI_BLACKLISTED, IS_SRV_ATTACKER, IS_SRV_VICTIM, IS_SRV_BLACKLISTED
User annotationUSER_LABEL, USER_LABEL_TSTAMP, PROFILE, REQUIRE_ATTENTION
Protocol detailPROTOCOL_INFO_JSON, DOMAIN_NAME

Note on alert flows: A flow row doubles as an alert record when STATUS != 0. The flow_alerts_view filters for these rows and joins them with mitre_table_info for ATT&CK enrichment.

Country encoding: SRC_COUNTRY_CODE and DST_COUNTRY_CODE store two ISO 3166-1 ASCII letters packed into a UInt16 (high byte = first letter). The flow_alerts_view decodes them with char(bitShiftRight(..., 8), bitAnd(..., 0xFF)).


hourly_flows

Hourly rollup of the flows table. Multiple raw flows sharing the same 5-tuple within an hour are collapsed into one row with summed byte/packet counters. Used for long-term trend analysis and dashboards that do not need per-flow granularity.

Compared to flows it omits per-flow alert fields, process info, fingerprints, TCP flags, and NAT columns, and adds:

  • NUM_FLOWS UInt32 — number of raw flows collapsed into this row.

hourly_asn

Hourly aggregated traffic statistics broken down by source/destination ASN pair. Used for AS-level traffic analysis and BGP peer analytics. Partitioned by day on FIRST_SEEN; ordered by (FIRST_SEEN, SRC_ASN, DST_ASN).

Columns include SRC_ASN, DST_ASN, SRC_PEER_ASN, DST_PEER_ASN, directional byte/packet counters, PROBE_IP, and SNMP interface indices.


Alert Tables

ntopng models nine entity types that can generate alerts. Each entity type has a pair of tables:

Historical tableEngaged (in-memory) tableView
active_monitoring_alertsengaged_active_monitoring_alertsactive_monitoring_alerts_view
host_alertsengaged_host_alertshost_alerts_view
mac_alertsengaged_mac_alertsmac_alerts_view
snmp_alertsengaged_snmp_alertssnmp_alerts_view
network_alertsengaged_network_alertsnetwork_alerts_view
as_alertsengaged_as_alertsas_alerts_view
interface_alertsengaged_interface_alertsinterface_alerts_view
user_alertsengaged_user_alertsuser_alerts_view
system_alertsengaged_system_alertssystem_alerts_view

Common columns (all alert tables)

ColumnTypeDescription
rowidUUIDUnique row identifier
alert_idUInt32Alert type (maps to ntopng alert type enum)
alert_statusUInt8Lifecycle state: 0 = engaged/active, 1 = released/archived
interface_idUInt16ntopng interface; 65535 = system/global scope
tstampDateTimeAlert start time
tstamp_endDateTimeAlert resolution time (epoch zero if still active)
severityUInt8Severity level (maps to AlertLevel enum)
scoreUInt16Numeric risk/impact score
granularityUInt8Check interval that triggered the alert (1=1min, 2=5min, …)
counterUInt32Consecutive detection count
descriptionStringHuman-readable description
jsonStringAdditional context as JSON
user_labelStringUser-defined free-text label
user_label_tstampDateTimeWhen user_label was last set
alert_categoryUInt8Category (maps to AlertCategory enum)
require_attentionBooleanFlagged for manual review

Entity-specific columns

active_monitoring_alerts: resolved_ip, resolved_name, measurement, measure_threshold, measure_value

host_alerts: ip_version, ip, vlan_id, name, is_attacker, is_victim, is_client, is_server, host_pool_id, network, country

mac_alerts: address, device_type, name, is_attacker, is_victim

snmp_alerts: ip, port (ifIndex), name, port_name

network_alerts: local_network_id, name, alias

as_alerts: asn, name, alias

interface_alerts: ifid, subtype, name, alias

user_alerts: user

system_alerts: name


Engaged Alert Tables

Each engaged_*_alerts table uses the Memory engine and holds only the currently-firing (engaged) alerts. Rows are inserted when an alert fires and deleted when it resolves. They have identical column schemas to their MergeTree counterparts.

These tables are dropped and recreated on every ntopng startup to prevent stale engaged alerts from persisting across restarts.


vulnerability_scan_data

Stores raw per-host vulnerability scan results produced by the ntopng Vulnerability Scanner (VS) module. Partitioned by day on LAST_SCAN; ordered by (LAST_SCAN, HOST, SCAN_TYPE).

ColumnDescription
HOSTIP address or hostname of the scanned target
SCAN_TYPEScan type (e.g. nmap, openvas)
LAST_SCANTimestamp of the most recent scan
JSON_INFOFull scan results as JSON
VS_RESULT_FILEPath to the raw result file on disk

vulnerability_scan_report

One row per completed vulnerability scan report. Partitioned by day on REPORT_DATE.

ColumnDescription
REPORT_NAMEUser-defined report name
REPORT_DATEReport generation timestamp
REPORT_JSON_INFOFull report metadata as JSON
NUM_SCANNED_HOSTSHosts scanned
NUM_CVESCVEs identified
NUM_TCP_PORTSOpen TCP ports found
NUM_UDP_PORTSOpen UDP ports found

mitre_table_info

Maps ntopng alert type IDs and entity type IDs to MITRE ATT&CK framework entries. Uses ReplacingMergeTree keyed on (ALERT_ID, ENTITY_ID) to prevent duplicates.

ColumnDescription
ALERT_IDntopng alert type ID
ENTITY_IDntopng entity type ID (1=host, 4=flow, …)
TACTICMITRE ATT&CK tactic identifier
TECHNIQUEMITRE ATT&CK technique identifier
SUB_TECHNIQUESub-technique identifier (0 if none)
MITRE_IDATT&CK ID string (e.g. T1046, T1595.002)

This table is joined by host_alerts_view and flow_alerts_view to surface MITRE context alongside alert data.


assets

Network asset inventory. One row per discovered or imported asset (host, MAC address, network device). Uses ReplacingMergeTree(version) so that re-observations update existing rows rather than creating duplicates. Primary key is (type, key).

ColumnDescription
typeAsset category (e.g. host, mac, network_device)
keyUnique key within the type (IP address, MAC address, etc.)
ifidntopng interface where the asset was observed
ipIP address (empty if not applicable)
macMAC address
vlanVLAN (0 if untagged)
networkntopng local-network ID
nameResolved hostname or user-defined name
device_typeDevice category (maps to DeviceType enum)
manufacturerManufacturer from MAC OUI lookup
first_seenFirst observation timestamp
last_seenMost recent observation timestamp
gateway_macMAC address of the gateway used to reach this asset
json_infoAdditional metadata as JSON (OS info, open ports, etc.)
versionMonotonic counter used by ReplacingMergeTree for deduplication
os_typeDetected operating system type
modelHardware model string

Views

Views are recreated on each ntopng startup (DROP VIEW IF EXISTS + CREATE VIEW IF NOT EXISTS).

Per-entity alert views

Each *_alerts_view is a UNION ALL of its historical MergeTree table and its in-memory engaged counterpart, providing a unified query surface for both resolved and currently-active alerts:

sql
-- example
SELECT * FROM host_alerts
UNION ALL
SELECT * FROM engaged_host_alerts

host_alerts_view additionally LEFT JOINs mitre_table_info (on ENTITY_ID = 1) to attach ATT&CK tactic/technique columns.

flow_alerts_view

Selects only alert flows from the flows table (STATUS != 0 AND IS_ALERT_DELETED != 1), renames columns to a friendlier lowercase schema (e.g. IPV4_SRC_ADDR → resolved cli_ip string), and LEFT JOINs mitre_table_info (on ENTITY_ID = 4).

all_alerts_view

A single UNION ALL across all per-entity alert views plus alert flows, exposing a minimal common schema: entity_id, interface_id, alert_id, alert_status, require_attention, tstamp, tstamp_end, severity, score, alert_category. Entity ID values:

entity_idSource
0interface_alerts_view
1host_alerts_view
2network_alerts_view
3snmp_alerts_view
4flows (alert rows)
5mac_alerts_view
7user_alerts_view
8active_monitoring_alerts_view
9system_alerts_view
10as_alerts_view

Schema Versioning and Migrations

The schema files follow an additive migration pattern: new columns are added with ALTER TABLE … ADD COLUMN IF NOT EXISTS statements appended at the end of the file. Columns are never removed via migration (only via explicit DROP COLUMN IF EXISTS for columns that were renamed or replaced). This ensures forward compatibility — an older ntopng version can read a database created by a newer version without crashing, it will simply ignore unknown columns.

When adding a new column to the schema:

  1. Add it to the CREATE TABLE statement.
  2. Add a corresponding ALTER TABLE … ADD COLUMN IF NOT EXISTS migration statement.
  3. Keep db_schema_clickhouse.sql and db_schema_clickhouse_cluster.sql in sync.
  4. Also update pro/include/FlowsTable.h, pro/src/ClickHouseDB.cpp, scripts/lua/modules/tag_utils.lua, and scripts/lua/modules/historical_flow_utils.lua as noted in the header comment of the schema files.