.agents/skills/detect-prod-regressions/SKILL.md
Run this skill as an evidence-first production sweep. The deliverable is a set of measured candidate bugs summarized for a human reviewer in a compact table, plus a short summary of what was checked. Do not touch Linear automatically.
Always review all production environments unless the user explicitly narrows the scope:
prod-usprod-euprod-hipaaprod-jpQuery both Datadog sites when needed. Default to the EU site for prod-eu and
the US site for the other production envs, but verify by querying facets or
running a small count query rather than assuming where a tag lives.
Use the user's stated window when provided. Otherwise:
Include the recent window and baseline window in any later handoff to
linear-bug-triage after the human explicitly approves sharing findings in
Linear.
Before querying Datadog, load the relevant Datadog MCP guidance for logs,
traces, metrics, and visualizations. Use the existing
datadog-query-recipes skill for query
syntax, production environment coverage, tenant/public API usage, and queue
consumer measurements. Use
debug-issue-with-datadog when a
candidate regression becomes an incident-style root-cause analysis.
For each environment, check:
Datadog errors
env, service, route/resource,
status code, error.type, and error.message.Datadog error logs
status:error logs by service, route/resource, source,
error.message, tenant/project identifiers when present, and environment.Datadog error spans
env, service, resource_name, http.route,
error.type, and error.message.API route latencies
service:web HTTP spans and metrics such as
trace.http_request.duration when available.Keep Datadog links for every query, trace, dashboard, or flamegraph used as evidence.
Before doing anything in Linear, show the human reviewer a findings table in chat and ask for permission to share the findings in Linear. The table should include one row per candidate with:
comment existing, create new, or none).Use a concrete markdown table shaped like this:
| Candidate | Envs | Service / Route | Recent Window | Baseline | Delta | Evidence | Proposed Linear Action |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Public scores latency regression | prod-us | `web` `GET /api/public/v2/scores/index` | p95 `7.44s`, p99 `12.77s`, count `71,448` | p95 `4.34s`, p99 `6.60s`, count `26,024` | p95 `+72%`, p99 `+94%`, volume `+174%` | [metrics](https://app.datadoghq.com/) [spans](https://app.datadoghq.com/) [trace](https://app.datadoghq.com/) | comment existing |
| ClickHouse socket hang up cluster | prod-us | `web-iso` `POST /` | errors `14,088` | errors `6,026` | `+134%` errors | [spans](https://app.datadoghq.com/) [trace](https://app.datadoghq.com/) | comment existing |
| Ingestion DLQ monitor noise | prod-eu | `worker-cpu` `langfuse.queue.ingestion.dlq_length` | max `150` | max `150` | flat | [metrics](https://app.datadoghq.eu/) | none |
If a requested signal is unavailable, write No measurements found in the
relevant cell instead of leaving it blank.
Only if the human explicitly approves, use
linear-bug-triage for Linear search,
deduplication, evidence comments, Triage issue creation, labels, and ticket
formatting. Treat that skill as the source of truth for Linear behavior once
approval is granted.
Hand off:
No measurements found for unavailable signals.Summarize: