Custom Investigations let you ask open-ended questions about your infrastructure and receive deeply researched reports powered by AI. Unlike traditional dashboards or query languages, this conversational interface analyzes your real-time, high-fidelity data to answer complex operational questions in minutes.
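Create an investigation whenever you need deep analysis, for example to troubleshoot crashing pods, compare system behavior before and after a deployment, or find cost savings.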
The key to a powerful investigation is context. Think of it like briefing a teammate: the more details you share, the better the analysis.
Your Request:
Why are my checkout-service pods crashing repeatedly?
Your Context:
- Started after: deployment at 14:00 UTC of version 2.3.1
- Impact: Customer checkout failures, lost revenue ~$X/hour
- Recent changes: Updated payment gateway integration, increased worker threads from 10 to 20
- Error pattern in logs: "connection refused to payment-service:8080", "Java heap space"
- Environment: production / eks-prod-us-east-1
- Related services: payment-service, inventory-service, redis-session-store
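If your team writes briefs like this often, a small helper can keep them in a consistent shape. The sketch below is not part of the product: `format_brief` and its arguments are hypothetical, and it only builds the text you would paste into the investigation prompt.

```python
def format_brief(request: str, context: dict[str, str]) -> str:
    """Render a request and its context in the layout used in this example."""
    lines = ["Your Request:", request, "Your Context:"]
    # One bullet per context field, in insertion order.
    lines += [f"- {key}: {value}" for key, value in context.items()]
    return "\n".join(lines)


brief = format_brief(
    "Why are my checkout-service pods crashing repeatedly?",
    {
        "Started after": "deployment at 14:00 UTC of version 2.3.1",
        "Environment": "production / eks-prod-us-east-1",
        "Related services": "payment-service, inventory-service, redis-session-store",
    },
)
print(brief)
```

Keeping the context as key/value pairs makes it easy to spot a missing field, such as the environment or the time of the change, before you start the investigation.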
Your Request:
Compare system metrics before and after the recent user-authentication-service deployment.
Your Context:
- Service: user-authentication-service v2.2.0
- Deployed: 2025-01-24 09:00 UTC
- Changes: Switched from JWT to Redis sessions, added Argon2 password hashing
- Specific concerns: Users reporting intermittent logouts, suspicious increase in redis_connected_clients
- Time windows: 24h before deployment vs 24h after
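Stating the comparison windows as explicit timestamps removes any ambiguity about what "before" and "after" mean. A minimal sketch, assuming you compute them locally before writing the context; `comparison_windows` is a hypothetical helper, not a product API.

```python
from datetime import datetime, timedelta, timezone


def comparison_windows(deployed_at: datetime, span: timedelta = timedelta(hours=24)):
    """Return ((before_start, before_end), (after_start, after_end))."""
    return (deployed_at - span, deployed_at), (deployed_at, deployed_at + span)


# Deployment time taken from the example context above.
deployed = datetime(2025, 1, 24, 9, 0, tzinfo=timezone.utc)
before, after = comparison_windows(deployed)

print("before:", before[0].isoformat(), "to", before[1].isoformat())
print("after: ", after[0].isoformat(), "to", after[1].isoformat())
```

For this deployment the windows run from 2025-01-23 09:00 UTC to 2025-01-24 09:00 UTC, and from 2025-01-24 09:00 UTC to 2025-01-25 09:00 UTC.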
Your Request:
Identify underutilized nodes for cost optimization.
Your Context:
- Monthly AWS bill: $12K for compute
- Environment: Mixed workloads (prod + staging on same cluster)
- Known issues: Dev environments run 24/7, batch processing nodes idle 20h/day
- Goal: Find $2-3K/month in savings without impacting reliability
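It can help to restate the goal as a share of current spend so the report can be judged against it. The arithmetic below uses only the figures from the example context; the variable names are illustrative.

```python
monthly_compute_bill = 12_000        # USD, from the example context
savings_goal = (2_000, 3_000)        # USD per month, low and high targets

# Savings target as a fraction of the compute bill.
low, high = (goal / monthly_compute_bill for goal in savings_goal)

# Batch nodes are reported idle for 20 of every 24 hours.
idle_fraction = 20 / 24

print(f"Savings goal: {low:.0%} to {high:.0%} of compute spend")
print(f"Batch nodes idle: {idle_fraction:.0%} of each day")
```

Here the goal works out to roughly 17% to 25% of compute spend, and the batch nodes sit idle about 83% of the time.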
You can create investigations in two ways:
Click "Ask AI" next to any alert, or use the "Alert Troubleshooting" option in the Insights tab. This automatically captures your current context, including the specific alert, timeframe, and affected services. Add your question and any extra context, then start the investigation.
:::note
Track AI credit usage from Settings → Usage & Billing → AI Credits.
:::
You can schedule recurring investigations from the Insights tab to run daily, weekly, or monthly. Use this to automate weekly health checks, monthly optimization reviews, or SLO conformance reports.