packages/coding-agent/docs/models.md
Add custom providers and models (Ollama, vLLM, LM Studio, proxies) via `~/.pi/agent/models.json`.

For local models (Ollama, LM Studio, vLLM), only `id` is required per model:
```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}
```
The `apiKey` is required, but Ollama ignores it, so any value works.

Some OpenAI-compatible servers do not understand the `developer` role used for reasoning-capable models. For those providers, set `compat.supportsDeveloperRole` to `false` so pi sends the system prompt as a `system` message instead. If the server also does not support `reasoning_effort`, set `compat.supportsReasoningEffort` to `false` too.

You can set `compat` at the provider level to apply to all models, or at the model level to override a specific model. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.
```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "gpt-oss:20b",
          "reasoning": true
        }
      ]
    }
  }
}
```
Override defaults when you need specific values:
```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        {
          "id": "llama3.1:8b",
          "name": "Llama 3.1 8B (Local)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 128000,
          "maxTokens": 32000,
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}
```
The file reloads each time you open `/model`, so you can edit it mid-session; no restart is needed.
Use `google-generative-ai` with a `baseUrl` to add models from Google AI Studio, including custom Gemma 4 entries:
```json
{
  "providers": {
    "my-google": {
      "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
      "api": "google-generative-ai",
      "apiKey": "GEMINI_API_KEY",
      "models": [
        {
          "id": "gemma-4-31b-it",
          "name": "Gemma 4 31B",
          "input": ["text", "image"],
          "contextWindow": 262144,
          "reasoning": true
        }
      ]
    }
  }
}
```
The `baseUrl` is required when adding custom models to the `google-generative-ai` API type.
| API | Description |
|---|---|
| `openai-completions` | OpenAI Chat Completions (most compatible) |
| `openai-responses` | OpenAI Responses API |
| `anthropic-messages` | Anthropic Messages API |
| `google-generative-ai` | Google Generative AI |
Set `api` at the provider level (default for all models) or at the model level (override per model).
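For example, a sketch where one model overrides the provider default (the endpoint and model ids here are hypothetical):

```json
{
  "providers": {
    "mixed-api": {
      "baseUrl": "https://api.example.com/v1",
      "apiKey": "EXAMPLE_API_KEY",
      "api": "openai-completions",
      "models": [
        { "id": "chat-model" },
        { "id": "responses-model", "api": "openai-responses" }
      ]
    }
  }
}
```

Here `chat-model` inherits the provider's `openai-completions` default, while `responses-model` is called through the Responses API.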
| Field | Description |
|---|---|
| `baseUrl` | API endpoint URL |
| `api` | API type (see above) |
| `apiKey` | API key (see value resolution below) |
| `headers` | Custom headers (see value resolution below) |
| `authHeader` | Set `true` to add `Authorization: Bearer <apiKey>` automatically |
| `models` | Array of model configurations |
| `modelOverrides` | Per-model overrides for built-in models on this provider |
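If a provider needs `Authorization: Bearer <apiKey>` added explicitly, a minimal sketch might look like this (hypothetical endpoint and model id):

```json
{
  "providers": {
    "my-gateway": {
      "baseUrl": "https://gateway.example.com/v1",
      "api": "openai-completions",
      "apiKey": "GATEWAY_API_KEY",
      "authHeader": true,
      "models": [{ "id": "gateway-model" }]
    }
  }
}
```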
The `apiKey` and `headers` fields support three value formats:

- A shell command prefixed with `!` — the command is executed and its stdout is used:
  - `"apiKey": "!security find-generic-password -ws 'anthropic'"`
  - `"apiKey": "!op read 'op://vault/item/credential'"`
- An environment variable name: `"apiKey": "MY_API_KEY"`
- A literal value: `"apiKey": "sk-..."`
For `models.json`, shell commands are resolved at request time. pi intentionally does not apply built-in TTL, stale-value reuse, or recovery logic for arbitrary commands: different commands need different caching and failure strategies, and pi cannot infer the right one.

If your command is slow, expensive, rate-limited, or should keep using a previous value on transient failures, wrap it in your own script or command that implements the caching or TTL behavior you want.
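As a rough sketch (assuming `!` commands run through a shell; `my-fetch-token` and the cache path are placeholders), the TTL logic can live inside the command itself:

```json
{
  "providers": {
    "my-proxy": {
      "baseUrl": "https://proxy.example.com/v1",
      "api": "openai-completions",
      "apiKey": "!sh -c 'f=$HOME/.cache/pi-token; [ -n \"$(find \"$f\" -mmin -60 2>/dev/null)\" ] && cat \"$f\" || my-fetch-token | tee \"$f\"'",
      "models": [{ "id": "proxy-model" }]
    }
  }
}
```

This reuses a cached token younger than 60 minutes and calls the fetcher only otherwise; the same logic can equally live in a standalone script referenced as `"!my-script.sh"`.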
`/model` availability checks use configured auth presence and do not execute shell commands.
A provider mixing environment variables and a shell command:

```json
{
  "providers": {
    "custom-proxy": {
      "baseUrl": "https://proxy.example.com/v1",
      "apiKey": "MY_API_KEY",
      "api": "anthropic-messages",
      "headers": {
        "x-portkey-api-key": "PORTKEY_API_KEY",
        "x-secret": "!op read 'op://vault/item/secret'"
      },
      "models": [...]
    }
  }
}
```
| Field | Required | Default | Description |
|---|---|---|---|
| `id` | Yes | — | Model identifier (passed to the API) |
| `name` | No | `id` | Human-readable model label. Used for matching (`--model` patterns) and shown in model details/status text. |
| `api` | No | provider's `api` | Override the provider's API for this model |
| `reasoning` | No | `false` | Supports extended thinking |
| `thinkingLevelMap` | No | omitted | Maps pi thinking levels to provider values and marks unsupported levels (see below) |
| `input` | No | `["text"]` | Input types: `["text"]` or `["text", "image"]` |
| `contextWindow` | No | `128000` | Context window size in tokens |
| `maxTokens` | No | `16384` | Maximum output tokens |
| `cost` | No | all zeros | `{"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0}` (per million tokens) |
| `compat` | No | provider `compat` | Provider compatibility overrides. Merged with provider-level `compat` when both are set. |
Current behavior:

- `/model` and `--list-models` list entries by model `id`.
- `name` is used for model matching and detail/status text.

Use `thinkingLevelMap` on a model to describe model-specific thinking controls. Keys are pi thinking levels: `off`, `minimal`, `low`, `medium`, `high`, `xhigh`.
Values are tristate:

| Value | Meaning |
|---|---|
| omitted | Level is supported and uses the provider's default mapping |
| string | Level is supported and this value is sent to the provider |
| `null` | Level is unsupported and hidden/skipped/clamped away |
Example for a model that only supports `off`, `high`, and `max` reasoning:

```json
{
  "id": "deepseek-v4-pro",
  "reasoning": true,
  "thinkingLevelMap": {
    "minimal": null,
    "low": null,
    "medium": null,
    "high": "high",
    "xhigh": "max"
  }
}
```
Example for a model where thinking cannot be disabled:

```json
{
  "id": "always-thinking-model",
  "reasoning": true,
  "thinkingLevelMap": {
    "off": null
  }
}
```
Migration: older configs that used `compat.reasoningEffortMap` should move that mapping to model-level `thinkingLevelMap`. Use `null` for levels that should not appear in the UI.
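As a sketch (assuming the old map used the same level keys), an entry like:

```json
{
  "id": "my-model",
  "reasoning": true,
  "compat": { "reasoningEffortMap": { "minimal": null, "xhigh": "max" } }
}
```

becomes:

```json
{
  "id": "my-model",
  "reasoning": true,
  "thinkingLevelMap": { "minimal": null, "xhigh": "max" }
}
```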
Route a built-in provider through a proxy without redefining models:
```json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1"
    }
  }
}
```
All built-in Anthropic models remain available, and existing OAuth or API key auth continues to work.

To merge custom models into a built-in provider, include the `models` array:
```json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1",
      "apiKey": "ANTHROPIC_API_KEY",
      "api": "anthropic-messages",
      "models": [...]
    }
  }
}
```
Merge semantics:

- Custom models are matched by `id` within the provider.
- If a custom model's `id` matches a built-in model `id`, the custom model replaces that built-in model.
- If the `id` is new, it is added alongside built-in models.

Use `modelOverrides` to customize specific built-in models without replacing the provider's full model list:
```json
{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "name": "Claude Sonnet 4 (Bedrock Route)",
          "compat": {
            "openRouterRouting": {
              "only": ["amazon-bedrock"]
            }
          }
        }
      }
    }
  }
}
```
`modelOverrides` supports these fields per model: `name`, `reasoning`, `input`, `cost` (partial), `contextWindow`, `maxTokens`, `headers`, `compat`.

Behavior notes:

- `modelOverrides` are applied to built-in provider models.
- You can combine `baseUrl`/`headers` with `modelOverrides`.
- If `models` is also defined for a provider, custom models are merged after built-in overrides. A custom model with the same `id` replaces the overridden built-in model entry.

For providers or proxies using `api: "anthropic-messages"`, use `compat.supportsEagerToolInputStreaming` to control Anthropic fine-grained tool streaming compatibility.

By default pi sends per-tool `eager_input_streaming: true`. If a proxy or Anthropic-compatible backend rejects that field, set `supportsEagerToolInputStreaming` to `false`. Pi will omit `tools[].eager_input_streaming` and instead send the legacy `fine-grained-tool-streaming-2025-05-14` beta header for tool-enabled requests.
```json
{
  "providers": {
    "anthropic-proxy": {
      "baseUrl": "https://proxy.example.com",
      "api": "anthropic-messages",
      "apiKey": "ANTHROPIC_PROXY_KEY",
      "compat": {
        "supportsEagerToolInputStreaming": false,
        "supportsLongCacheRetention": true
      },
      "models": [
        {
          "id": "claude-opus-4-7",
          "reasoning": true,
          "input": ["text", "image"]
        }
      ]
    }
  }
}
```
| Field | Description |
|---|---|
| `supportsEagerToolInputStreaming` | Whether the provider accepts per-tool `eager_input_streaming`. Default: `true`. Set to `false` to omit that field and use the legacy fine-grained tool streaming beta header on tool-enabled requests. |
| `supportsLongCacheRetention` | Whether the provider accepts Anthropic long cache retention (`cache_control.ttl: "1h"`) when cache retention is long. Default: `true`. |
For providers with partial OpenAI compatibility, use the `compat` field:

- Provider-level `compat` applies defaults to all models under that provider.
- Model-level `compat` overrides provider-level values for that model.

```json
{
  "providers": {
    "local-llm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "compat": {
        "supportsUsageInStreaming": false,
        "maxTokensField": "max_tokens"
      },
      "models": [...]
    }
  }
}
```
| Field | Description |
|---|---|
| `supportsStore` | Provider supports the `store` field |
| `supportsDeveloperRole` | Use `developer` vs `system` role |
| `supportsReasoningEffort` | Support for the `reasoning_effort` parameter |
| `supportsUsageInStreaming` | Supports `stream_options: { include_usage: true }` (default: `true`) |
| `maxTokensField` | Use `max_completion_tokens` or `max_tokens` |
| `requiresToolResultName` | Include `name` on tool result messages |
| `requiresAssistantAfterToolResult` | Insert an assistant message before a user message that follows tool results |
| `requiresThinkingAsText` | Convert thinking blocks to plain text |
| `requiresReasoningContentOnAssistantMessages` | Include empty `reasoning_content` on all replayed assistant messages when reasoning is enabled |
| `thinkingFormat` | Use `reasoning_effort`, `deepseek`, `zai`, `qwen`, or `qwen-chat-template` thinking parameters |
| `cacheControlFormat` | Use Anthropic-style `cache_control` markers on the system prompt, last tool definition, and last user/assistant text content. Currently only `anthropic` is supported. |
| `supportsStrictMode` | Include the `strict` field in tool definitions |
| `supportsLongCacheRetention` | Whether the provider accepts long cache retention when cache retention is long: `prompt_cache_retention: "24h"` for OpenAI prompt caching, or `cache_control.ttl: "1h"` when `cacheControlFormat` is `anthropic`. Default: `true`. |
| `openRouterRouting` | OpenRouter provider routing preferences. This object is sent as-is in the `provider` field of the OpenRouter API request. |
| `vercelGatewayRouting` | Vercel AI Gateway routing config for provider selection (`only`, `order`) |
`qwen` uses top-level `enable_thinking`. Use `qwen-chat-template` for local Qwen-compatible servers that require `chat_template_kwargs.enable_thinking`.
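For example, a minimal sketch for a local Qwen-compatible server (the port and model id are placeholders):

```json
{
  "providers": {
    "local-qwen": {
      "baseUrl": "http://localhost:8000/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "compat": { "thinkingFormat": "qwen-chat-template" },
      "models": [{ "id": "qwen3-32b", "reasoning": true }]
    }
  }
}
```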
`cacheControlFormat: "anthropic"` is for OpenAI-compatible providers that expose Anthropic-style prompt caching through `cache_control` markers on text content and tool definitions.
OpenRouter routing example:
```json
{
  "providers": {
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "OPENROUTER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "openrouter/anthropic/claude-3.5-sonnet",
          "name": "OpenRouter Claude 3.5 Sonnet",
          "compat": {
            "openRouterRouting": {
              "allow_fallbacks": true,
              "require_parameters": false,
              "data_collection": "deny",
              "zdr": true,
              "enforce_distillable_text": false,
              "order": ["anthropic", "amazon-bedrock", "google-vertex"],
              "only": ["anthropic", "amazon-bedrock"],
              "ignore": ["gmicloud", "friendli"],
              "quantizations": ["fp16", "bf16"],
              "sort": {
                "by": "price",
                "partition": "model"
              },
              "max_price": {
                "prompt": 10,
                "completion": 20
              },
              "preferred_min_throughput": {
                "p50": 100,
                "p90": 50
              },
              "preferred_max_latency": {
                "p50": 1,
                "p90": 3,
                "p99": 5
              }
            }
          }
        }
      ]
    }
  }
}
```
Vercel AI Gateway example:
```json
{
  "providers": {
    "vercel-ai-gateway": {
      "baseUrl": "https://ai-gateway.vercel.sh/v1",
      "apiKey": "AI_GATEWAY_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "moonshotai/kimi-k2.5",
          "name": "Kimi K2.5 (Fireworks via Vercel)",
          "reasoning": true,
          "input": ["text", "image"],
          "cost": { "input": 0.6, "output": 3, "cacheRead": 0, "cacheWrite": 0 },
          "contextWindow": 262144,
          "maxTokens": 262144,
          "compat": {
            "vercelGatewayRouting": {
              "only": ["fireworks", "novita"],
              "order": ["fireworks", "novita"]
            }
          }
        }
      ]
    }
  }
}
```