.agents/skills/debug-issue-with-datadog/references/output-template.md
The analysis should be structured so it can be pasted directly as the first
investigative comment on the Linear issue. The example to anchor on is the
first comment on LFE-9475 (PostHog Integration Processing Failures).
Findings come first, recommendations last. If the data doesn't support a hypothesis, say so — do not invent root causes to fill the template.
(projectId, dominant cause) or by cluster.

## Datadog APM + log analysis (<N>-day window, <YYYY-MM-DD> → <YYYY-MM-DD>)
Source: APM spans with `resource_name:"process <queue-name>"` across EU and US.
### Volume & error rate — <one-line summary of regional split>
| Region | Total spans | Errors | Error rate |
|---|---|---|---|
| EU (`prod-eu`) | <n> | <n> | **<pct>%** |
| US (`prod-us`) | <n> | <n> | **<pct>%** |
<One sentence explaining where the noise actually lives.>
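When filling in the volume table, the error-rate column can be derived from the raw span counts rather than typed by hand. The following is a minimal sketch, assuming the counts have already been read off the Datadog query; the `RegionCounts` shape and both helper names are hypothetical and not part of any Datadog API.

```typescript
// Hypothetical shape for per-region counts copied from the APM query results.
interface RegionCounts {
  region: string; // display name, e.g. "EU"
  env: string; // environment tag, e.g. "prod-eu"
  totalSpans: number;
  errors: number;
}

// Error rate as a percentage string with two decimals; "0.00" when there are no spans.
function errorRatePct(c: RegionCounts): string {
  if (c.totalSpans === 0) return "0.00";
  return ((c.errors / c.totalSpans) * 100).toFixed(2);
}

// Render one markdown row in the same column order as the volume table.
function volumeRow(c: RegionCounts): string {
  return `| ${c.region} (\`${c.env}\`) | ${c.totalSpans} | ${c.errors} | **${errorRatePct(c)}%** |`;
}
```

Computing the percentage from the two counts keeps the table internally consistent if the window is re-queried and the totals change.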
### Hotspots — concentrated on ~<N> <region> projects
<Region> errors break down by `(projectId, error.message)`:
| ProjectId | Errors | Dominant cause |
|---|---|---|
| `<projectId>` | <n> | `<error message>` (<n>) + others |
| ... | ... | ... |
<Optional: contrast with another subsystem if relevant — e.g.
"Unlike blob storage, PostHog has multiple distinct root causes — not one
hotspot pattern.">
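The `(projectId, error.message)` breakdown behind the hotspot table can be reproduced offline from exported error spans. This is a rough sketch under stated assumptions: the `ErrorSpan` and `HotspotRow` shapes and the `hotspots` function are hypothetical, and the export format of the real spans may differ.

```typescript
// Hypothetical minimal shape of an exported error span.
interface ErrorSpan {
  projectId: string;
  errorMessage: string;
}

// One row of the hotspot table.
interface HotspotRow {
  projectId: string;
  errors: number; // total errors for the project
  dominantCause: string; // most frequent error message
  dominantCount: number; // how many errors carry the dominant message
}

function hotspots(spans: ErrorSpan[]): HotspotRow[] {
  // Count errors per project, then per error message within each project.
  const counts: Record<string, Record<string, number>> = {};
  for (const s of spans) {
    const causes = counts[s.projectId] || (counts[s.projectId] = {});
    causes[s.errorMessage] = (causes[s.errorMessage] || 0) + 1;
  }

  const rows: HotspotRow[] = Object.keys(counts).map((projectId) => {
    const causes = counts[projectId];
    let errors = 0;
    let dominantCause = "";
    let dominantCount = 0;
    for (const message of Object.keys(causes)) {
      const count = causes[message];
      errors += count;
      if (count > dominantCount) {
        dominantCause = message;
        dominantCount = count;
      }
    }
    return { projectId, errors, dominantCause, dominantCount };
  });

  // Highest error count first, matching the order of the hotspot table.
  return rows.sort((a, b) => b.errors - a.errors);
}
```

Each returned row maps onto one table line: `dominantCount` is the number shown in parentheses after the dominant message, and `errors - dominantCount` is what "+ others" stands for.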
## Root cause by error class
### 1. `<error message>` — <n> errors, <n> projects
<2–4 sentences explaining what this error class actually is at the
implementation level (which library, which call site). Then list candidate
causes in order of likelihood. Mark which ones are confirmed by the data
vs. speculative.>
### 2. `<error message>` — <n> errors, mostly <n> projects
<Same pattern.>
### 3. <next class>
<...>
### <N>. <Symptom of upstream failure>
<Use this slot when a class is a *symptom* of another class rather than an
independent bug — call it out so suggested patches don't double-count.>
## Suggested patches
### P0 — <one-line summary, e.g. "Auto-disable integrations on persistent auth failures">
<Why this is P0 — what noise it kills, what data it stops corrupting, what
unblocks downstream work.>
```ts
// <relative path from repo root>
// Short code sketch (5–20 lines). It does not need to compile —
// it must communicate the shape of the change.
```
### P0 — <next P0>
<...>
### P1 — <smaller / less urgent fix>
<Same shape.>
### P2 — <separate-but-surfaced finding>
<E.g. a Prisma pool sizing issue surfaced incidentally by this analysis but
not the original bug. Call it out with its own section so it doesn't get
lost.>
### Regional split explanation (only if relevant)
<One paragraph explaining why EU vs. US asymmetry exists — usually not an
infra bug, just where the affected tenants happen to live.>
Dashboards:
- EU APM: <url>
- US APM: <url>
- (logs / metrics / monitor links as relevant)
worker/src/features/<…>/<file>.ts is unfinished.