get-shit-done/templates/AI-SPEC.md
AI design contract generated by /gsd-ai-integration-phase. Consumed by gsd-planner and gsd-eval-auditor. Locks framework selection, implementation guidance, and evaluation strategy before planning begins.
System Type: <!-- RAG | Multi-Agent | Conversational | Extraction | Autonomous Agent | Content Generation | Code Automation | Hybrid -->
Description:
<!-- One-paragraph description of what this AI system does, who uses it, and what "good" looks like -->

Critical Failure Modes:
<!-- The 3-5 behaviors that absolutely cannot go wrong in this system -->

Researched by gsd-domain-researcher. Grounds the evaluation strategy in domain expert knowledge.
Industry Vertical: <!-- healthcare | legal | finance | customer service | education | developer tooling | e-commerce | etc. -->
User Population: <!-- who uses this system and in what context -->
Stakes Level: <!-- Low | Medium | High | Critical -->
Output Consequence: <!-- what happens downstream when the AI output is acted on -->
| Role | Responsibility |
|---|---|
| <!-- e.g., Senior practitioner --> | <!-- Dataset labeling / rubric calibration / production sampling --> |
Selected Framework: <!-- e.g., LlamaIndex v0.10.x -->
Version: <!-- Pin the version -->
Rationale:
<!-- Why this framework fits this system type, team context, and production requirements -->

Alternatives Considered:
| Framework | Ruled Out Because |
|---|---|
Vendor Lock-In Accepted: <!-- Yes / No / Partial — document the trade-off consciously -->
Fetched from official docs by gsd-ai-researcher. Distilled for this specific use case.
# Install command(s)
# Key imports for this use case
# Minimal working example for this system type
| Concept | What It Is | When You Use It |
|---|---|---|
project/
├── # Framework-specific folder layout
Model Configuration:
<!-- Which model(s), temperature, max tokens, and other key parameters -->

Core Pattern:
<!-- The primary implementation pattern for this system type in this framework -->

Tool Use:
<!-- Tools/integrations needed and how to configure them -->

State Management:
<!-- How state is persisted, retrieved, and updated -->

Context Window Strategy:
<!-- How to manage context limits for this system type -->

Written by gsd-ai-researcher. Cross-cutting patterns every developer building AI systems needs, independent of framework choice.
# Pydantic output model for this system type
| Dimension | Rubric (Pass/Fail or 1-5) | Measurement Approach | Priority |
|---|---|---|---|
| | | Code / LLM Judge / Human | Critical / High / Medium |
Primary Tool: <!-- e.g., RAGAS + Langfuse -->
Setup:
# Install and configure
CI/CD Integration:
# Command to run evals in CI/CD pipeline
Size: <!-- e.g., 20 examples to start -->
Composition:
<!-- What scenario types the dataset covers: critical paths, edge cases, failure modes -->

Labeling:
<!-- Who labels examples and how (domain expert, LLM judge with calibration, etc.) -->

| Guardrail | Trigger | Intervention |
|---|---|---|
| | | Block / Escalate / Flag |
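
One way the Guardrail / Trigger / Intervention rows could be wired into code is sketched below. This is only an illustration under stated assumptions: the `Guardrail` dataclass, the `apply_guardrails` helper, the severity ordering (Block over Escalate over Flag), and the three example rules are all hypothetical and are not part of this template or of any framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Intervention(Enum):
    """The three intervention types named in the guardrail table."""
    BLOCK = "block"
    ESCALATE = "escalate"
    FLAG = "flag"


@dataclass(frozen=True)
class Guardrail:
    """One table row: a name, a trigger predicate, and an intervention."""
    name: str
    trigger: Callable[[str], bool]  # returns True when the output violates the rule
    intervention: Intervention


# Assumed severity order, used to pick one action when several guardrails fire.
_SEVERITY = [Intervention.BLOCK, Intervention.ESCALATE, Intervention.FLAG]


def apply_guardrails(output: str, guardrails: List[Guardrail]) -> Optional[Intervention]:
    """Return the most severe intervention triggered by `output`, or None."""
    fired = {g.intervention for g in guardrails if g.trigger(output)}
    for level in _SEVERITY:
        if level in fired:
            return level
    return None


# Hypothetical example rows; real triggers come from the filled-in table.
EXAMPLE_GUARDRAILS = [
    Guardrail("empty_output", lambda text: not text.strip(), Intervention.BLOCK),
    Guardrail("mentions_ssn", lambda text: "ssn" in text.lower(), Intervention.ESCALATE),
    Guardrail("too_long", lambda text: len(text) > 2000, Intervention.FLAG),
]
```

Keeping each row as data rather than as branching logic means the table and the code can be reviewed side by side; the severity ordering is a design choice that should be confirmed per project.
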
| Metric | Sampling Strategy | Action on Degradation |
|---|---|---|
Tracing Tool: <!-- e.g., Langfuse self-hosted -->
Key Metrics to Track:
<!-- 3-5 metrics that will be monitored in production -->

Alert Thresholds:
<!-- When to page/alert -->

Smart Sampling Strategy:
<!-- How to select interactions for human review — signal-based filters -->
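
As a rough sketch of what signal-based filtering for human review might look like: every interaction carrying a negative signal is selected, plus a small random baseline so that silent failures still surface. The `Interaction` fields, the `"thumbs_down"` label, the thresholds, and the `select_for_review` helper are assumptions made for illustration only; the actual signals depend on the tracing tool and metrics chosen above.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interaction:
    """Hypothetical trace record; field names are illustrative only."""
    interaction_id: str
    user_feedback: Optional[str] = None   # e.g. "thumbs_down"
    guardrail_fired: bool = False
    judge_score: Optional[float] = None   # assumed 0.0-1.0 score from an LLM judge


def select_for_review(
    interactions: List[Interaction],
    baseline_rate: float = 0.02,
    low_score: float = 0.5,
    seed: Optional[int] = None,
) -> List[Interaction]:
    """Pick interactions with a negative signal, plus a random baseline sample."""
    rng = random.Random(seed)
    selected = []
    for item in interactions:
        has_signal = (
            item.user_feedback == "thumbs_down"
            or item.guardrail_fired
            or (item.judge_score is not None and item.judge_score < low_score)
        )
        # Signal-bearing items are always reviewed; the rest are sampled at baseline_rate.
        if has_signal or rng.random() < baseline_rate:
            selected.append(item)
    return selected
```

Setting `baseline_rate=0.0` reduces this to pure signal-based selection, which can be useful when reviewer time is the bottleneck.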