
Architecture and Decision Guide

.agents/skills/building-pydantic-ai-agents/references/ARCHITECTURE.md


Detailed decision trees, comparison tables, and an architecture overview for Pydantic AI.


Task-Family References

Use this file when comparing options or choosing between abstractions.

If the user already knows what they want to do, load the narrower task-family guide for that task instead.

Decision Trees

Choosing a Tool Registration Method

Need RunContext (deps, usage, messages)?
├── Yes → Use @agent.tool
└── No → Pure function, no context needed?
    ├── Yes → Use @agent.tool_plain
    └── Tools defined outside agent file?
        ├── Yes → Use tools=[Tool(...)] in constructor
        └── Dynamic tools based on context?
            ├── Yes → Use ToolPrepareFunc
            └── Multiple related tools as a group?
                └── Yes → Use FunctionToolset
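Read top to bottom, the tree behaves like a chain of checks where the first match wins. As a minimal sketch, the hypothetical helper below (choose_tool_registration is not a Pydantic AI function) returns the registration style the tree recommends. It returns None when no branch applies, because the tree itself has no leaf for that case.

```python
from typing import Optional


def choose_tool_registration(
    needs_run_context: bool,
    pure_function: bool = False,
    defined_outside_agent_file: bool = False,
    dynamic_per_context: bool = False,
    grouped_tools: bool = False,
) -> Optional[str]:
    """Hypothetical helper: answer the tree's questions in order; first match wins."""
    if needs_run_context:
        return "@agent.tool"  # tool receives RunContext as its first argument
    if pure_function:
        return "@agent.tool_plain"
    if defined_outside_agent_file:
        return "tools=[Tool(...)]"  # passed to the Agent constructor
    if dynamic_per_context:
        return "ToolPrepareFunc"
    if grouped_tools:
        return "FunctionToolset"
    return None  # the tree has no leaf for this combination
```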

Choosing an Output Mode

Need structured data with Pydantic validation?
├── Yes → Does provider support native JSON mode?
│   ├── Yes, and you want it → Use NativeOutput(MyModel)
│   └── No, or prefer consistency → Use ToolOutput(MyModel) [default]
└── No → Need custom parsing logic?
    ├── Yes → Use TextOutput(parser_fn)
    └── No → Just plain text?
        └── Yes → Use output_type=str [default]

Dynamic schema at runtime?
└── Yes → Use StructuredDict(json_schema)
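The output-mode tree can be sketched the same way. choose_output_mode is a hypothetical helper, not part of Pydantic AI, and MyModel, parser_fn, and json_schema are the tree's own placeholders. The sketch assumes the dynamic-schema question is checked first, since the tree lists it as a separate branch.

```python
def choose_output_mode(
    structured: bool,
    provider_has_native_json: bool = False,
    prefer_native: bool = False,
    custom_parsing: bool = False,
    schema_known_only_at_runtime: bool = False,
) -> str:
    """Hypothetical helper mirroring the output-mode decision tree."""
    if schema_known_only_at_runtime:
        return "StructuredDict(json_schema)"  # schema is only known at runtime
    if structured:
        if provider_has_native_json and prefer_native:
            return "NativeOutput(MyModel)"
        return "ToolOutput(MyModel)"  # default for structured data
    if custom_parsing:
        return "TextOutput(parser_fn)"
    return "output_type=str"  # default for plain text
```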

Choosing a Multi-Agent Pattern

Child agent returns result to parent?
├── Yes → Use agent delegation via tools
└── No → Permanent hand-off to specialist?
    ├── Yes → Use output functions
    └── Application code between agents?
        ├── Yes → Use programmatic hand-off
        └── Complex state machine?
            └── Yes → Use Graph-based control

Choosing How to Extend Agent Behavior

Need reusable behavior across agents (tools + hooks + instructions)?
├── Yes → Build a custom capability (subclass AbstractCapability)
└── No → Just intercepting lifecycle events?
    ├── Yes → Complex interception needing tools/instructions too?
    │   ├── Yes → Subclass AbstractCapability
    │   └── No → Use Hooks capability with decorators
    └── No → Defining agents from config files?
        ├── Yes → Use Agent.from_file() with YAML/JSON specs
        └── No → Just adding tools?
            ├── Yes → Use @agent.tool or Toolset
            └── No → Pass args directly to the Agent constructor

Choosing a Capability

Need model thinking/reasoning?
├── Yes → Use Thinking(effort='high')
└── Need web search?
    ├── Yes → Use WebSearch() (auto-fallback to local)
    └── Need URL fetching?
        ├── Yes → Use WebFetch()
        └── Need MCP servers?
            ├── Yes → Use MCP()
            └── Need lifecycle hooks only?
                ├── Yes → Use Hooks()
                └── Need to filter/modify tool defs per step?
                    └── Yes → Use PrepareTools()

Choosing a Testing Approach

Need deterministic, fast tests?
├── Yes → Use TestModel with agent.override()
└── Need specific tool call behavior?
    ├── Yes → Use FunctionModel
    └── Testing against real API (integration)?
        └── Yes → Use pytest-recording with VCR cassettes

Comparison Tables

Output Mode Comparison

| Scenario | Mode | Notes |
| --- | --- | --- |
| Need structured data and want maximum provider compatibility | ToolOutput (default) | Works with any model that supports tool calling; supports streaming |
| Want the provider to natively enforce JSON schema compliance | NativeOutput | OpenAI, Anthropic, and Google only; limited streaming |
| Provider doesn't support tools or JSON mode | PromptedOutput | Works with any model, as a fallback |
| LLM returns non-JSON structured text (Markdown, YAML, domain-specific) | TextOutput | Custom parsing function |
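TextOutput wraps an ordinary function that turns the model's raw text into the value you want. The parser below is a stdlib-only sketch for a hypothetical "key: value" line format; parse_key_values is an invented name. In an agent it would be passed as output_type=TextOutput(parse_key_values), assuming a plain text-to-value function as described in the table.

```python
def parse_key_values(text: str) -> dict[str, str]:
    """Parse lines like 'name: Ada' into a dict, skipping blank or malformed lines."""
    result: dict[str, str] = {}
    for line in text.splitlines():
        # Split at the first colon only, so values may themselves contain colons.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # not a 'key: value' line
        result[key.strip()] = value.strip()
    return result
```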

Model Provider Prefixes

| Provider | Prefix | Example |
| --- | --- | --- |
| OpenAI | openai: | openai:gpt-5.2 |
| Anthropic | anthropic: | anthropic:claude-sonnet-4-6 |
| Google (AI Studio) | google-gla: | google-gla:gemini-3-pro-preview |
| Google (Vertex) | google-vertex: | google-vertex:gemini-3-pro-preview |
| Groq | groq: | groq:llama-3.3-70b-versatile |
| Mistral | mistral: | mistral:mistral-large-latest |
| Cohere | cohere: | cohere:command-r-plus-08-2024 |
| AWS Bedrock | bedrock: | bedrock:anthropic.claude-sonnet-4-6 |
| Azure | azure: | azure:gpt-5.2 |
| OpenRouter | openrouter: | openrouter:anthropic/claude-sonnet-4-6 |
| xAI | xai: | xai:grok-3 |
| DeepSeek | deepseek: | deepseek:deepseek-chat |
| Fireworks | fireworks: | fireworks:accounts/fireworks/models/llama-v3p3-70b-instruct |
| Together | together: | together:meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo |
| Ollama (local) | ollama: | ollama:llama3.2 |
| GitHub Models | github: | github:openai/gpt-5.2 |
| Hugging Face | huggingface: | huggingface:meta-llama/Llama-3.3-70B-Instruct |
| Cerebras | cerebras: | cerebras:llama-4-scout-17b-16e-instruct |
| Heroku | heroku: | heroku:claude-sonnet-4-6 |

Additional prefixes: litellm:, nebius:, ovhcloud:, alibaba:, sambanova:, vercel:, outlines:, moonshotai:. For truly custom providers, subclass Model; for an OpenAI-compatible API, use OpenAIChatModel with an OpenAIProvider pointed at a custom base_url.

Tool Decorator Comparison

| Scenario | Decorator |
| --- | --- |
| Tool needs access to deps, usage stats, messages, or retry info | @agent.tool (RunContext is the required first parameter) |
| Pure function, no agent context needed | @agent.tool_plain |
| Tools defined in a separate module or shared across agents | Tool(fn), passed to the agent constructor via tools=[...] |

Built-in Capabilities

| Capability | What it provides | Usable in YAML specs |
| --- | --- | --- |
| Thinking | Model thinking/reasoning at configurable effort | Yes |
| Hooks | Decorator-based lifecycle hook registration | No |
| WebSearch | Web search (provider builtin when supported, local fallback otherwise) | Yes |
| WebFetch | URL fetching (provider builtin when supported, custom fallback otherwise) | Yes |
| ImageGeneration | Image generation (provider builtin when supported, custom fallback otherwise) | Yes |
| MCP | MCP server access (provider builtin when supported, direct connection otherwise) | Yes |
| PrepareTools | Filters or modifies tool definitions per step | No |
| PrefixTools | Wraps a capability and prefixes its tool names | Yes |
| BuiltinTool | Registers a builtin tool with the agent | Yes |
| Toolset | Wraps an AbstractToolset | No |
| HistoryProcessor | Wraps a history processor function | No |

When to Use Each Agent Method

| Scenario | Method |
| --- | --- |
| Building a chatbot or assistant that shows tool calls, progress, and output in real time | agent.run(event_stream_handler=...), which streams all events while running to completion |
| Running an autonomous agent, batch job, or background task | agent.run() |
| Writing a CLI tool, script, or Jupyter notebook (no async) | agent.run_sync() |
| Streaming final text word by word to a UI | agent.run_stream() |
| Synchronous streaming for CLI tools or scripts (no async) | agent.run_stream_sync() |
| Receiving an async iterable of typed events (tool calls, results, final output) | agent.run_stream_events() |
| Inspecting or modifying state between agent steps, or human-in-the-loop approval | agent.iter() |

See Run Methods and Streaming for event_stream_handler details.

Architecture Overview

Agent execution flow: Agent.run() → UserPromptNode → ModelRequestNode → CallToolsNode → (loop back to ModelRequestNode, or end)
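As a rough sketch of that loop in plain Python (not Pydantic AI's graph implementation; ToyResponse and run_toy_agent are hypothetical names), each iteration asks the model, runs any tools it requested, and feeds the results back until the model answers with text. The comments map each step onto the node it loosely corresponds to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyResponse:
    text: Optional[str] = None  # final answer, if the model gave one
    tool_calls: List[str] = field(default_factory=list)  # tools the model asked for


def run_toy_agent(
    prompt: str,
    model: Callable[[List[str]], ToyResponse],
    tools: Dict[str, Callable[[], str]],
    max_steps: int = 10,
) -> str:
    history = [f"user: {prompt}"]  # like UserPromptNode: seed the conversation
    for _ in range(max_steps):
        response = model(history)  # like ModelRequestNode: call the model
        if not response.tool_calls:  # like CallToolsNode with nothing to run: end
            if response.text is None:
                raise RuntimeError("model returned neither text nor tool calls")
            return response.text
        for name in response.tool_calls:  # like CallToolsNode: run tools, then loop
            history.append(f"tool {name}: {tools[name]()}")
    raise RuntimeError("step limit reached without a final answer")
```

The step limit is a design choice of the sketch: it keeps a model that keeps requesting tools from looping forever.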

Key generic types:

  • Agent[AgentDepsT, OutputDataT] — dependency type + output type
  • RunContext[AgentDepsT] — available in tools and system prompts
  • AbstractCapability[AgentDepsT] — base class for reusable behavior bundles
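A stdlib-only sketch of how these type parameters relate, using only typing.Generic. ToyAgent and ToyRunContext are hypothetical stand-ins, not Pydantic AI's classes, and DbConfig is an invented dependency type. The point is that the deps type chosen for the agent is the same type tools see on their context.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

DepsT = TypeVar("DepsT")  # plays the role of AgentDepsT
OutputT = TypeVar("OutputT")  # plays the role of OutputDataT


@dataclass
class ToyRunContext(Generic[DepsT]):
    deps: DepsT  # the dependency object passed to run()


@dataclass
class ToyAgent(Generic[DepsT, OutputT]):
    tool: Callable[[ToyRunContext[DepsT]], OutputT]

    def run(self, deps: DepsT) -> OutputT:
        # Deps reach the tool through the context, so a type checker can confirm
        # that the agent's deps type, the tool's context type, and the output agree.
        return self.tool(ToyRunContext(deps))


@dataclass
class DbConfig:  # invented dependency type for the example
    dsn: str


toy_agent: ToyAgent[DbConfig, int] = ToyAgent(tool=lambda ctx: len(ctx.deps.dsn))
```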

Agent construction:

  • Python: Agent(model, instructions=..., tools=..., capabilities=...)
  • Declarative: Agent.from_file('agent.yaml') or Agent.from_spec({...})

Capabilities are the primary extension point — they bundle tools, lifecycle hooks, instructions, and model settings into reusable units. Built-in capabilities include Thinking, WebSearch, WebFetch, Hooks, MCP, and more.

Lifecycle hooks (via Hooks or AbstractCapability) intercept every stage: before_run → before_model_request → after_model_request → before_tool_execute → after_tool_execute → after_run. The model-request and tool hooks repeat on each step of the loop.
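To make the firing order concrete, here is a hypothetical registry and run loop in plain Python. ToyHooks, toy_run, and the on() decorator are invented for illustration and do not reproduce the signatures of Pydantic AI's Hooks capability. The sketch assumes chronological order: after_model_request fires once the model has responded, before the tools that response requested are executed, and the per-step hooks repeat for every model step.

```python
from typing import Callable, Dict, List

HOOK_NAMES = [
    "before_run", "before_model_request", "after_model_request",
    "before_tool_execute", "after_tool_execute", "after_run",
]


class ToyHooks:
    """Hypothetical registry mapping each hook name to its callbacks."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[Callable[[str], None]]] = {n: [] for n in HOOK_NAMES}

    def on(self, name: str) -> Callable[[Callable[[str], None]], Callable[[str], None]]:
        # Decorator-style registration, loosely echoing "Hooks with decorators".
        def register(fn: Callable[[str], None]) -> Callable[[str], None]:
            self._callbacks[name].append(fn)
            return fn
        return register

    def fire(self, name: str) -> None:
        for fn in self._callbacks[name]:
            fn(name)


def toy_run(hooks: ToyHooks, tool_calls_per_step: List[int]) -> None:
    """Fire hooks for a run whose model steps request the given numbers of tool calls."""
    hooks.fire("before_run")
    for n_calls in tool_calls_per_step:
        hooks.fire("before_model_request")
        hooks.fire("after_model_request")
        for _ in range(n_calls):
            hooks.fire("before_tool_execute")
            hooks.fire("after_tool_execute")
    hooks.fire("after_run")


hooks = ToyHooks()
seen: List[str] = []
for hook_name in HOOK_NAMES:
    hooks.on(hook_name)(seen.append)  # record every hook as it fires
toy_run(hooks, [1, 0])  # one step with a tool call, then a final text step
```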

Model string format: "provider:model-name" (e.g., "openai:gpt-5.2", "anthropic:claude-sonnet-4-6", "google-gla:gemini-3-pro-preview")
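Splitting a model string only at its first colon keeps any colon inside the model name intact. The helper below is a hypothetical illustration of the "provider:model-name" format, not Pydantic AI's own parser.

```python
from typing import Tuple


def split_model_string(model: str) -> Tuple[str, str]:
    """Split 'provider:model-name' at the first colon only."""
    provider, sep, name = model.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:model-name', got {model!r}")
    return provider, name
```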

Output modes:

  • ToolOutput: structured data via tool calls (default for Pydantic models)
  • NativeOutput: provider-native structured output (the provider enforces the JSON schema)
  • PromptedOutput: prompt-based structured extraction
  • TextOutput: plain text passed through a custom parsing function