Client Metrics

Internal documentation for the NetBird client metrics system.

Overview

Client metrics track connection performance, sync durations, and login durations using the InfluxDB line protocol (influxdb.go). Each event is pushed once, then cleared.

Metrics collection is always active (for debug bundles). Push to backend is:

  • Disabled by default (opt-in via NB_METRICS_PUSH_ENABLED=true)
  • Managed at daemon layer (survives engine restarts)

Architecture

Layer Separation

```text
Daemon Layer (connect.go)
  ├─ Creates ClientMetrics instance once
  ├─ Starts/stops push lifecycle
  └─ Updates AgentInfo on profile switch
      │
      ▼
Engine Layer (engine.go)
  └─ Records metrics via ClientMetrics methods
```

Ingest Server

Clients do not talk to InfluxDB directly. An ingest server sits between clients and InfluxDB:

```text
Client ──POST──▶ Ingest Server (:8087) ──▶ InfluxDB (internal)
                  │
                  ├─ Validates line protocol
                  ├─ Allowlists measurements, fields, and tags
                  ├─ Rejects out-of-bound values
                  └─ Serves remote config at /config
```

  • No secret/token-based client auth: the ingest server holds the InfluxDB token server-side. Clients must send a hashed peer ID in the X-Peer-ID header.
  • InfluxDB is not exposed: it is only accessible within the Docker network
  • Source: ingest/main.go

Metrics Collected

Connection Stage Timing

Measurement: netbird_peer_connection

| Field | Timestamps | Description |
|---|---|---|
| signaling_to_connection_seconds | SignalingReceived → ConnectionReady | ICE/relay negotiation time after the first signal is received from the remote peer |
| connection_to_wg_handshake_seconds | ConnectionReady → WgHandshakeSuccess | WireGuard cryptographic handshake latency once the transport layer is ready |
| total_seconds | SignalingReceived → WgHandshakeSuccess | End-to-end connection time anchored at the first received signal |

Tags:

  • deployment_type: "cloud" | "selfhosted" | "unknown"
  • connection_type: "ice" | "relay"
  • attempt_type: "initial" | "reconnection"
  • version: NetBird version string
  • os: Operating system (linux, darwin, windows, android, ios, etc.)
  • arch: CPU architecture (amd64, arm64, etc.)

Note: SignalingReceived is set when the first offer or answer arrives from the remote peer (in both initial and reconnection paths). It excludes the potentially unbounded wait for the remote peer to come online.

Sync Duration

Measurement: netbird_sync

| Field | Description |
|---|---|
| duration_seconds | Time to process a sync message from management server |

Tags:

  • deployment_type: "cloud" | "selfhosted" | "unknown"
  • version: NetBird version string
  • os: Operating system (linux, darwin, windows, android, ios, etc.)
  • arch: CPU architecture (amd64, arm64, etc.)

Login Duration

Measurement: netbird_login

| Field | Description |
|---|---|
| duration_seconds | Time to complete the login/auth exchange with management server |

Tags:

  • deployment_type: "cloud" | "selfhosted" | "unknown"
  • result: "success" | "failure"
  • version: NetBird version string
  • os: Operating system (linux, darwin, windows, android, ios, etc.)
  • arch: CPU architecture (amd64, arm64, etc.)

Buffer Limits

The InfluxDB backend limits in-memory sample storage to prevent unbounded growth when pushes fail:

  • Max age: Samples older than 5 days are dropped
  • Max size: Estimated buffer size capped at 5 MB (~20k samples)

Configuration

Client Environment Variables

| Variable | Default | Description |
|---|---|---|
| NB_METRICS_PUSH_ENABLED | false | Enable metrics push to backend |
| NB_METRICS_SERVER_URL | (from remote config) | Ingest server URL (e.g., https://ingest.netbird.io) |
| NB_METRICS_INTERVAL | (from remote config) | Push interval (e.g., "1m", "30m", "4h") |
| NB_METRICS_FORCE_SENDING | false | Skip remote config, push unconditionally |
| NB_METRICS_CONFIG_URL | https://ingest.netbird.io/config | Remote push config URL |

NB_METRICS_SERVER_URL and NB_METRICS_INTERVAL override their respective values but do not bypass remote config eligibility checks (version range). Use NB_METRICS_FORCE_SENDING=true to skip all remote config gating.

Ingest Server Environment Variables

| Variable | Default | Description |
|---|---|---|
| INGEST_LISTEN_ADDR | :8087 | Listen address |
| INFLUXDB_URL | http://influxdb:8086/api/v2/write?org=netbird&bucket=metrics&precision=ns | InfluxDB write endpoint |
| INFLUXDB_TOKEN | (required) | InfluxDB auth token (server-side only) |
| CONFIG_METRICS_SERVER_URL | (empty, which disables /config) | server_url in the remote config JSON (the URL clients push metrics to) |
| CONFIG_VERSION_SINCE | 0.0.0 | Minimum client version to push metrics |
| CONFIG_VERSION_UNTIL | 99.99.99 | Maximum client version to push metrics |
| CONFIG_PERIOD_MINUTES | 5 | Push interval in minutes |

The ingest server serves a remote config JSON at GET /config when CONFIG_METRICS_SERVER_URL is set. Clients can use NB_METRICS_CONFIG_URL=http://<ingest>/config to fetch it.
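
The version range set through CONFIG_VERSION_SINCE and CONFIG_VERSION_UNTIL is what clients are checked against before pushing. A rough sketch of such a check is shown below. The functions `parseVersion`, `compare`, and `eligible` are hypothetical, both bounds are assumed to be inclusive, and only plain MAJOR.MINOR.PATCH strings are handled, so development or pre-release version strings are rejected here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "1.2.3" (or "v1.2.3") into [3]int{1, 2, 3}.
// Pre-release suffixes are not handled in this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid version component %q in %q", p, v)
		}
		out[i] = n
	}
	return out, nil
}

// compare returns -1, 0, or 1 when a is lower than, equal to, or higher than b.
func compare(a, b [3]int) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// eligible reports whether since <= version <= until (bounds assumed inclusive).
func eligible(version, since, until string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	lo, err := parseVersion(since)
	if err != nil {
		return false, err
	}
	hi, err := parseVersion(until)
	if err != nil {
		return false, err
	}
	return compare(v, lo) >= 0 && compare(v, hi) <= 0, nil
}

func main() {
	ok, err := eligible("0.70.5", "0.0.0", "99.99.99")
	fmt.Println(ok, err)
}
```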

Configuration Precedence

For URL and Interval, the precedence is:

  1. Environment variable - NB_METRICS_SERVER_URL / NB_METRICS_INTERVAL
  2. Remote config - fetched from NB_METRICS_CONFIG_URL
  3. Default - 5 minute interval, URL from remote config

Push Behavior

  1. StartPush() spawns background goroutine with timer
  2. First push happens immediately on startup
  3. Periodically: push() → Export() → HTTP POST to ingest server
  4. On failure: log error, continue (non-blocking)
  5. On success: Reset() clears pushed samples
  6. StopPush() cancels context and waits for goroutine

Samples are collected with exact timestamps, pushed once, then cleared. No data is resent.

Local Development Setup

1. Configure and Start Services

```bash
# From this directory (client/internal/metrics/infra)
cp .env.example .env
# Edit .env to set INFLUXDB_ADMIN_PASSWORD, INFLUXDB_ADMIN_TOKEN, and GRAFANA_ADMIN_PASSWORD
docker compose up -d
```

This starts the ingest server (listening on :8087), InfluxDB, and Grafana.

2. Configure Client

```bash
export NB_METRICS_PUSH_ENABLED=true
export NB_METRICS_FORCE_SENDING=true
export NB_METRICS_SERVER_URL=http://localhost:8087
export NB_METRICS_INTERVAL=1m
```

3. Run Client

```bash
cd ../../../..
go run ./client/ up
```

4. View in Grafana

5. Verify Data

```bash
# Query via InfluxDB (using admin token from .env)
docker compose exec influxdb influx query \
  'from(bucket: "metrics") |> range(start: -1h)' \
  --org netbird

# Check ingest server health
curl http://localhost:8087/health
```