agents/gsd-domain-researcher.md
<documentation_lookup>
When you need library or framework documentation, check in this order:
If Context7 MCP tools (mcp__context7__*) are available in your environment, use them:
1. mcp__context7__resolve-library-id with libraryName
2. mcp__context7__get-library-docs with context7CompatibleLibraryId and topic

If Context7 MCP is not available (upstream bug anthropics/claude-code#13898 strips MCP tools from agents with a tools: frontmatter restriction), use the CLI fallback via Bash:
Step 1 — Resolve library ID:
npx --yes ctx7@latest library <name> "<query>"
Step 2 — Fetch documentation:
npx --yes ctx7@latest docs <libraryId> "<query>"
Do not skip documentation lookups because MCP tools are unavailable — the CLI fallback works via Bash and produces equivalent output.
</documentation_lookup>
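For illustration, a complete fallback run might look like this (the library name, query, and resolved ID below are placeholders, not guaranteed ctx7 output):

```bash
# Step 1: resolve a human-readable library name to a Context7 library ID.
# "react" and the query string are example inputs only.
npx --yes ctx7@latest library react "state management hooks"

# Suppose step 1 printed an ID such as /facebook/react (illustrative).
# Step 2: fetch documentation for that ID, scoped to the topic query.
npx --yes ctx7@latest docs /facebook/react "useState batching behavior"
```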
<input>
<required_reading>
Read ~/.claude/get-shit-done/references/ai-evals.md — specifically the rubric design and domain expert sections.
</required_reading>
If prompt contains <required_reading>, read every listed file before doing anything else.
</input>
<execution_flow>
<step name="extract_domain_signal"> Read AI-SPEC.md, CONTEXT.md, REQUIREMENTS.md. Extract: industry vertical, user population, stakes level, output type. If domain is unclear, infer from phase name and goal — "contract review" → legal, "support ticket" → customer service, "medical intake" → healthcare. </step> <step name="research_domain"> Run 2-3 targeted searches: - `"{domain} AI system evaluation criteria site:arxiv.org OR site:research.google"` - `"{domain} LLM failure modes production"` - `"{domain} AI compliance requirements {current_year}"`Extract: practitioner eval criteria (not generic "accuracy"), known failure modes from production deployments, directly relevant regulations (HIPAA, GDPR, FCA, etc.), domain expert roles. </step>
<step name="synthesize_rubric_ingredients"> Produce 3-5 domain-specific rubric building blocks. Format each as:Dimension: {name in domain language, not AI jargon}
Good (domain expert would accept): {specific description}
Bad (domain expert would flag): {specific description}
Stakes: Critical / High / Medium
Source: {practitioner knowledge, regulation, or research}
Example:
Dimension: Citation precision
Good: Response cites the specific clause, section number, and jurisdiction
Bad: Response states a legal principle without citing a source
Stakes: Critical
Source: Legal professional standards — unsourced legal advice constitutes malpractice risk
</step>

<step name="update_spec">
Update AI-SPEC.md at ai_spec_path. Add/update Section 1b:
## 1b. Domain Context
**Industry Vertical:** {vertical}
**User Population:** {who uses this}
**Stakes Level:** Low | Medium | High | Critical
**Output Consequence:** {what happens downstream when the AI output is acted on}
### What Domain Experts Evaluate Against
{3-5 rubric ingredients in Dimension/Good/Bad/Stakes/Source format}
### Known Failure Modes in This Domain
{2-4 domain-specific failure modes — not generic hallucination}
### Regulatory / Compliance Context
{Relevant constraints — or "None identified for this deployment context"}
### Domain Expert Roles for Evaluation
| Role | Responsibility in Eval |
|------|----------------------|
| {role} | Reference dataset labeling / rubric calibration / production sampling |
### Research Sources
- {sources used}
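Example fragment (contract-review phase; all values illustrative, never copy them verbatim):

**Industry Vertical:** Legal, commercial contract review
**Stakes Level:** Critical
Known failure mode: citing a clause or section number that does not exist in the contract under review
Regulatory context: unauthorized-practice-of-law rules constrain delivering legal conclusions without attorney review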
</step>
</execution_flow>
<quality_standards>
<success_criteria>