docs/netdata-ai/investigations/index.md
Ask Netdata anything about your infrastructure and get a deeply researched answer in minutes. Investigations turn your question and context into an analysis that correlates metrics, anomalies, and events across your systems.
Two easy entry points:
- Troubleshoot with AI button (top‑right): Captures the current chart, dashboard, or service context automatically; you then add your question.
- Insights → New Investigation: Blank canvas for any custom prompt.

Reports complete in ~2 minutes and are saved in Insights; you’ll get an email when ready.
Think of it like briefing a teammate. Include timeframes, environments, related services, symptoms, and recent changes. Example formats:
Request: Why are my checkout‑service pods crashing repeatedly?
Context:
- Started after: deployment at 14:00 UTC of version 2.3.1
- Impact: Customer checkout failures, lost revenue ~$X/hour
- Recent changes: payment gateway integration update; workers 10→20
- Logs: "connection refused to payment-service:8080", "Java heap space"
- Environment: production / eks-prod-us-east-1
- Related: payment-service, inventory-service, redis-session-store
Request: Compare metrics before/after the user‑authentication‑service deploy.
Context:
- Service: user-authentication-service v2.2.0
- Deployed: 2025‑01‑24 09:00 UTC
- Changes: JWT→Redis sessions; Argon2 hashing added
- Concern: intermittent logouts; rising redis_connected_clients
- Windows: 24h before vs 24h after
Request: Identify underutilized nodes for cost savings.
Context:
- Monthly compute: ~$12K
- Mixed workloads (prod + staging)
- Dev envs run 24/7; batch nodes idle 20h/day
- Goal: save $2–3K/month without reliability impact
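
If you keep incident details in a script or runbook, the Request/Context layout used in the examples above can be assembled programmatically. The following is a minimal sketch in Python. `InvestigationPrompt` and `render` are hypothetical names chosen for illustration and are not part of any Netdata API; the output is plain text that you would paste into the New Investigation prompt yourself.

```python
from dataclasses import dataclass, field


@dataclass
class InvestigationPrompt:
    """Hypothetical helper: one request plus labeled context lines."""

    request: str
    # Maps a context label (e.g. "Goal") to its value; insertion order is kept.
    context: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Return the prompt in the Request/Context layout shown above."""
        lines = [f"Request: {self.request}"]
        if self.context:
            lines.append("Context:")
            lines.extend(f"- {label}: {value}" for label, value in self.context.items())
        return "\n".join(lines)


# Values taken from the cost-savings example; the labels are illustrative.
prompt = InvestigationPrompt(
    request="Identify underutilized nodes for cost savings.",
    context={
        "Monthly compute": "~$12K",
        "Workloads": "mixed (prod + staging)",
        "Usage pattern": "dev envs run 24/7; batch nodes idle 20h/day",
        "Goal": "save $2–3K/month without reliability impact",
    },
)

print(prompt.render())
```

Keeping the context as labeled pairs makes it harder to forget a field such as the timeframe or the goal, which is the information an investigation is likely to need most.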