ZeroClaw emits structured logs, Prometheus metrics, and OpenTelemetry traces. Logs and metrics are on by default in a release build; tracing is opt-in.

| Platform | Default destination |
|---|---|
| Linux (systemd) | journald — journalctl --user -u zeroclaw |
| macOS (launchd) | ~/Library/Logs/ZeroClaw/zeroclaw.log (stdout), zeroclaw.err (stderr) |
| Homebrew on macOS | $HOMEBREW_PREFIX/var/log/zeroclaw.log |
| Windows | %LOCALAPPDATA%\ZeroClaw\logs\zeroclaw.log |
| Docker | container stdout — docker logs zeroclaw |
| Foreground (zeroclaw daemon without service) | stderr |
Set the level via the `RUST_LOG` env var. Examples:

```shell
RUST_LOG=zeroclaw=info                  # default — high-signal events
RUST_LOG=zeroclaw=debug                 # verbose — per tool call, per provider call
RUST_LOG=zeroclaw::agent=trace          # very verbose — just the agent loop
RUST_LOG=warn,zeroclaw::security=debug  # quiet except the security subsystem
```
For persistent changes, set the value in your service unit:

```ini
# ~/.config/systemd/user/zeroclaw.service.d/override.conf
[Service]
Environment=RUST_LOG=zeroclaw=info,zeroclaw::security=debug
```
Logs are JSON by default in service mode (easier for Loki/ELK ingestion) and pretty-printed on a TTY (interactive `zeroclaw daemon`). Force one or the other:

```toml
[observability]
log_format = "json" # or "pretty"
```
The logger redacts known secret patterns (sk-*, ghp_*, xox[baprs]-*, ya29.*, AIza*, etc.) regardless of log level. Redaction happens at the logger layer — your log files and the journal never see them.
Audit this with:

```shell
grep -E 'sk-|ghp_|xox[baprs]' /path/to/log/file | head
# should return zero results in a normal run
```
If you see an unredacted secret, file an issue — the redaction list is in crates/zeroclaw-infra/src/redact.rs.
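As an illustration of how pattern-based masking works, here is a sketch — not the implementation in `redact.rs`, and with deliberately simplified patterns:

```shell
# Sketch of pattern-based secret redaction. Simplified patterns only; the
# authoritative list lives in crates/zeroclaw-infra/src/redact.rs.
redact() {
  sed -E \
    -e 's/sk-[A-Za-z0-9_-]+/[REDACTED]/g' \
    -e 's/ghp_[A-Za-z0-9]+/[REDACTED]/g' \
    -e 's/xox[baprs]-[A-Za-z0-9-]+/[REDACTED]/g'
}

echo 'provider call failed, key sk-abc123DEF' | redact
# prints: provider call failed, key [REDACTED]
```

The real redaction layer sits between the tracing subscriber and the writer, so masking applies no matter which destination the log goes to.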
Prometheus exposition on the gateway:

```shell
curl -s http://localhost:42617/metrics
```
Key metrics:
| Metric | Labels | What it measures |
|---|---|---|
| zeroclaw_provider_calls_total | provider, outcome | Provider calls by outcome (ok / timeout / error) |
| zeroclaw_provider_latency_ms | provider | Histogram of provider call latency |
| zeroclaw_tokens_total | provider, kind (input/output) | Token counters |
| zeroclaw_tool_calls_total | tool, outcome | Tool invocations by outcome |
| zeroclaw_tool_duration_ms | tool | Tool-execution histogram |
| zeroclaw_channel_events_total | channel, direction (inbound/outbound) | Message flow |
| zeroclaw_channel_errors_total | channel, kind | Disconnects, rate limits, auth failures |
| zeroclaw_memory_searches_total | | Memory retrieval calls |
| zeroclaw_policy_blocks_total | policy, tool | Security-policy denials |
A minimal Prometheus scrape config:

```yaml
scrape_configs:
  - job_name: zeroclaw
    static_configs:
      - targets: ["localhost:42617"]
```
The Grafana dashboard (in the templates repo) visualises the above.
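These counters also feed alerting. As a sketch (the rule name, 5% threshold, and windows below are illustrative choices, not shipped defaults), a Prometheus alert on provider error rate might look like:

```yaml
groups:
  - name: zeroclaw-examples          # illustrative rule group
    rules:
      - alert: ZeroClawProviderErrorRate
        # fraction of provider calls not ending in outcome="ok" over 5m
        expr: |
          sum(rate(zeroclaw_provider_calls_total{outcome!="ok"}[5m]))
            / sum(rate(zeroclaw_provider_calls_total[5m])) > 0.05
        for: 10m
```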
Tracing uses OpenTelemetry over OTLP/HTTP. It is off by default — enable it in config:

```toml
[observability.otel]
enabled = true
endpoint = "http://localhost:4318"
service_name = "zeroclaw"
```
Spans you'll see:

- `agent.loop` — one per inbound message, covers the whole turn
- `provider.chat` — a provider call, with attributes for model, token counts, and retry count
- `tool.invoke` — a tool call, with outcome and duration
- `security.validate` — policy check with decision
- `memory.search` — retrieval with hit count and max score

Pair with Jaeger or Tempo for distributed tracing if you run multiple ZeroClaw instances or are instrumenting downstream services.
Separate from the general logs, tool receipts are written to:

```text
<workspace>/receipts/<yyyy-mm-dd>.ndjson
```

One JSON line per tool invocation. Greppable, append-only, persistent across restarts. See Tool receipts.
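Because receipts are plain NDJSON, a quick outcome summary needs nothing beyond standard shell tools. A sketch, assuming each line carries an `outcome` field — the sample lines and field names here are hypothetical, not the actual receipt schema:

```shell
# Count tool invocations by outcome in one day's shard.
# Sample data stands in for a real <workspace>/receipts/<yyyy-mm-dd>.ndjson;
# the "tool"/"outcome" field names are assumptions about the schema.
cat > receipts-sample.ndjson <<'EOF'
{"tool":"shell","outcome":"ok"}
{"tool":"http","outcome":"error"}
{"tool":"shell","outcome":"ok"}
EOF

grep -o '"outcome":"[a-z]*"' receipts-sample.ndjson | sort | uniq -c | sort -rn
```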
The gateway exposes three health views:

```shell
curl -s http://localhost:42617/health           # { "status": "ok", "version": "0.7.4" }
curl -s http://localhost:42617/health/channels  # per-channel status
curl -s http://localhost:42617/health/providers # per-provider status + error rate
```
Point Uptime Kuma / your monitor at /health for a binary liveness check.
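A liveness probe can be as small as a curl plus a grep. A sketch — the canned `resp` stands in for the live call so the snippet runs without a daemon:

```shell
# Binary liveness check against /health. The canned response below stands in
# for: resp="$(curl -s http://localhost:42617/health)"
resp='{ "status": "ok", "version": "0.7.4" }'

if echo "$resp" | grep -q '"status": *"ok"'; then
  echo healthy
else
  echo unhealthy
fi
# prints: healthy
```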
ZeroClaw doesn't rotate its own logs — the OS handles that:

- systemd: journald manages rotation (tune it in /etc/systemd/journald.conf)
- Docker: the json-file driver rotates at 10 MB with 3 retained files; configure in daemon.json

For the receipts log, rotate manually or via logrotate if the file grows faster than you want to retain:
```text
<workspace>/receipts/2026-04-25.ndjson
<workspace>/receipts/2026-04-26.ndjson
...
```
Day-sharded means you can drop old files individually.
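Dropping old shards can then be a one-line cron job. A sketch, demonstrated against a throwaway directory — point the path at your real workspace and pick your own retention window:

```shell
# Prune day-sharded receipt files older than 30 days.
# A throwaway directory keeps this runnable; use your real workspace path.
WORKSPACE="$(mktemp -d)"
mkdir -p "$WORKSPACE/receipts"
touch -d '40 days ago' "$WORKSPACE/receipts/2026-03-16.ndjson"  # stale shard
touch "$WORKSPACE/receipts/2026-04-25.ndjson"                   # fresh shard

find "$WORKSPACE/receipts" -name '*.ndjson' -mtime +30 -delete

ls "$WORKSPACE/receipts"
# prints: 2026-04-25.ndjson
```

Note that `touch -d` is GNU coreutils syntax; on macOS use `touch -t` instead.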