docs/models/openrouter.md
To use OpenRouterModel, you need to either install pydantic-ai, or install pydantic-ai-slim with the openrouter optional group:
pip/uv-add "pydantic-ai-slim[openrouter]"
To use OpenRouter, first create an API key at openrouter.ai/keys.
You can set the OPENROUTER_API_KEY environment variable and use [OpenRouterProvider][pydantic_ai.providers.openrouter.OpenRouterProvider] by name:
from pydantic_ai import Agent
agent = Agent('openrouter:anthropic/claude-sonnet-4.6')
...
Or initialise the model and provider directly:
from pydantic_ai import Agent
from pydantic_ai.models.openrouter import OpenRouterModel
from pydantic_ai.providers.openrouter import OpenRouterProvider
model = OpenRouterModel(
'anthropic/claude-sonnet-4.6',
provider=OpenRouterProvider(api_key='your-openrouter-api-key'),
)
agent = Agent(model)
...
OpenRouter has an app attribution feature to track your application in their public ranking and analytics.
You can pass in an app_url and app_title when initializing the provider to enable app attribution.
from pydantic_ai.providers.openrouter import OpenRouterProvider
provider=OpenRouterProvider(
api_key='your-openrouter-api-key',
app_url='https://your-app.com',
app_title='Your App',
),
...
You can customize model behavior using [OpenRouterModelSettings][pydantic_ai.models.openrouter.OpenRouterModelSettings]:
from pydantic_ai import Agent
from pydantic_ai.models.openrouter import OpenRouterModel, OpenRouterModelSettings
settings = OpenRouterModelSettings(
openrouter_reasoning={
'effort': 'high',
},
openrouter_usage={
'include': True,
}
)
model = OpenRouterModel('openai/gpt-5.2')
agent = Agent(model, model_settings=settings)
...
For Anthropic models via OpenRouter, you can enable eager input streaming to reduce latency for tool calls with large inputs.
Set [anthropic_eager_input_streaming][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_eager_input_streaming] in [AnthropicModelSettings][pydantic_ai.models.anthropic.AnthropicModelSettings]:
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModelSettings
from pydantic_ai.models.openrouter import OpenRouterModel
model = OpenRouterModel('anthropic/claude-sonnet-4-5')
settings = AnthropicModelSettings(anthropic_eager_input_streaming=True)
agent = Agent(model, model_settings=settings)
...
OpenRouter supports prompt caching for downstream providers that implement it. Pydantic AI's OpenRouter cache settings control explicit cache_control breakpoints for Anthropic and Gemini models:
OpenRouterModelSettings.openrouter_cache_instructions][pydantic_ai.models.openrouter.OpenRouterModelSettings.openrouter_cache_instructions] to True or specify '5m' / '1h' directlyOpenRouterModelSettings.openrouter_cache_messages][pydantic_ai.models.openrouter.OpenRouterModelSettings.openrouter_cache_messages] to True to automatically cache the last message in the conversationOpenRouterModelSettings.openrouter_cache_tool_definitions][pydantic_ai.models.openrouter.OpenRouterModelSettings.openrouter_cache_tool_definitions] to True or specify '5m' / '1h' directlyCachePoint][pydantic_ai.messages.CachePoint]: Insert a CachePoint marker in user messages to cache everything before it!!! note "Provider Differences"
- Anthropic models support prefix-based caching for both system instructions and message content. TTL values ('5m', '1h') are passed through to the provider.
- Gemini models support caching for system instructions and normal message content, but OpenRouter uses only the last breakpoint across normal message content for Gemini caching.
Use openrouter_cache_messages or [CachePoint][pydantic_ai.messages.CachePoint] when that final message boundary is intentional; use openrouter_cache_instructions only for fully static system context. TTL values are ignored by Gemini.
Cached Gemini systemInstruction content is immutable, so put dynamic prompt segments in a later user message instead of after cached system instructions.
- Minimum token thresholds apply; see OpenRouter's minimum token requirements for current provider-specific values.
Use [OpenRouterModelSettings][pydantic_ai.models.openrouter.OpenRouterModelSettings] to enable explicit caching for system instructions, the last conversation message, and tool definitions:
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openrouter import OpenRouterModel, OpenRouterModelSettings
model = OpenRouterModel('anthropic/claude-sonnet-4.6')
agent = Agent(
model,
instructions='You are a specialized assistant with deep domain knowledge...',
model_settings=OpenRouterModelSettings(
openrouter_cache_instructions=True, # Cache system instructions (broadly supported)
openrouter_cache_messages=True, # Cache the last message (best with Anthropic)
openrouter_cache_tool_definitions=True, # Cache tool definitions (Anthropic only)
),
)
@agent.tool
def search_docs(ctx: RunContext, query: str) -> str:
"""Search documentation."""
return f'Results for {query}'
...
Each setting accepts True or an explicit '5m' / '1h' TTL value. True sends Anthropic's default '5m' TTL for Anthropic models; Gemini ignores TTL values and manages cache lifetime itself. Check result.usage().cache_write_tokens on initial writes and result.usage().cache_read_tokens on reuse, including subsequent calls with message_history=result.all_messages().
OpenRouter uses provider sticky routing after prompt-cached requests to improve cache locality. For cache-sensitive workflows that need stricter provider control or disabled fallbacks, also set [openrouter_provider][pydantic_ai.models.openrouter.OpenRouterModelSettings.openrouter_provider], for example with {'order': ['anthropic'], 'allow_fallbacks': False}.
Use [CachePoint][pydantic_ai.messages.CachePoint] markers to control exactly where cache boundaries are placed:
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.openrouter import OpenRouterModel
model = OpenRouterModel('anthropic/claude-sonnet-4.6')
agent = Agent(model)
prompt = [
'Long reference document or context to cache...',
CachePoint(), # Cache everything before this point
'Now answer my question about the context above',
]
...
Pass the prompt list to agent.run_sync(prompt). Everything before the CachePoint() marker is cached. You can place multiple markers for fine-grained control over cache boundaries.
!!! warning "Anthropic cache-breakpoint ordering"
Anthropic processes cache breakpoints in a fixed order — tool definitions, then system instructions, then messages — and rejects a '1h' breakpoint that appears after a '5m' one in that sequence. When mixing TTLs across CachePoint markers or the cache settings on an Anthropic model, place the longer ('1h') breakpoints before the shorter ('5m') ones. Anthropic also allows at most four explicit breakpoints per request; excess breakpoints are dropped (oldest first) before the request is sent.
OpenRouter supports web search via its plugins. You can enable it using the [WebSearchTool][pydantic_ai.native_tools.WebSearchTool].
You can customize the web search behavior using the search_context_size parameter on [WebSearchTool][pydantic_ai.native_tools.WebSearchTool]:
from pydantic_ai import Agent
from pydantic_ai.capabilities import NativeTool
from pydantic_ai.models.openrouter import OpenRouterModel
from pydantic_ai.native_tools import WebSearchTool
tool = WebSearchTool(search_context_size='high')
model = OpenRouterModel('openai/gpt-4.1')
agent = Agent(
model,
capabilities=[NativeTool(tool)],
)
result = agent.run_sync('What is the latest news in AI?')