packages/coding-agent/docs/models.md
Add custom providers and models (Ollama, vLLM, LM Studio, proxies) via `~/.pi/agent/models.json`.

For local models (Ollama, LM Studio, vLLM), only `id` is required per model:
```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}
```
The `apiKey` is required, but Ollama ignores it, so any value works.

Some OpenAI-compatible servers do not understand the `developer` role used for reasoning-capable models. For those providers, set `compat.supportsDeveloperRole` to `false` so pi sends the system prompt as a `system` message instead. If the server also does not support `reasoning_effort`, set `compat.supportsReasoningEffort` to `false` too.

You can set `compat` at the provider level to apply to all models, or at the model level to override a specific model. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.
```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "gpt-oss:20b",
          "reasoning": true
        }
      ]
    }
  }
}
```
Override defaults when you need specific values:
```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        {
          "id": "llama3.1:8b",
          "name": "Llama 3.1 8B (Local)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 128000,
          "maxTokens": 32000,
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}
```
The file reloads each time you open `/model`, so you can edit it mid-session; no restart is needed.
Use `google-generative-ai` with a `baseUrl` to add models from Google AI Studio, including custom Gemma 4 entries:
```json
{
  "providers": {
    "my-google": {
      "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
      "api": "google-generative-ai",
      "apiKey": "GEMINI_API_KEY",
      "models": [
        {
          "id": "gemma-4-31b-it",
          "name": "Gemma 4 31B",
          "input": ["text", "image"],
          "contextWindow": 262144,
          "reasoning": true
        }
      ]
    }
  }
}
```
The `baseUrl` is required when adding custom models to the `google-generative-ai` API type.
| API | Description |
|---|---|
| `openai-completions` | OpenAI Chat Completions (most compatible) |
| `openai-responses` | OpenAI Responses API |
| `anthropic-messages` | Anthropic Messages API |
| `google-generative-ai` | Google Generative AI |
Set `api` at the provider level (default for all models) or at the model level (override per model).
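For example, a sketch where one model overrides the provider default (the endpoint and model ids here are hypothetical):

```json
{
  "providers": {
    "mixed-api": {
      "baseUrl": "https://api.example.com/v1",
      "apiKey": "EXAMPLE_API_KEY",
      "api": "openai-completions",
      "models": [
        { "id": "chat-model" },
        { "id": "responses-model", "api": "openai-responses" }
      ]
    }
  }
}
```

Here `chat-model` inherits the provider's `openai-completions` default, while `responses-model` is called through the Responses API.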
| Field | Description |
|---|---|
| `baseUrl` | API endpoint URL |
| `api` | API type (see above) |
| `apiKey` | API key (see value resolution below) |
| `headers` | Custom headers (see value resolution below) |
| `authHeader` | Set `true` to add `Authorization: Bearer <apiKey>` automatically |
| `models` | Array of model configurations |
| `modelOverrides` | Per-model overrides for built-in models on this provider |
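If a provider needs `Authorization: Bearer <apiKey>` added explicitly, a minimal sketch might look like this (hypothetical endpoint and model id):

```json
{
  "providers": {
    "my-gateway": {
      "baseUrl": "https://gateway.example.com/v1",
      "api": "openai-completions",
      "apiKey": "GATEWAY_API_KEY",
      "authHeader": true,
      "models": [{ "id": "gateway-model" }]
    }
  }
}
```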
The `apiKey` and `headers` fields support three value formats:

- A shell command prefixed with `!` — the command is executed and its stdout is used:
  - `"apiKey": "!security find-generic-password -ws 'anthropic'"`
  - `"apiKey": "!op read 'op://vault/item/credential'"`
- An environment variable name: `"apiKey": "MY_API_KEY"`
- A literal value: `"apiKey": "sk-..."`
For `models.json`, shell commands are resolved at request time. pi intentionally does not apply built-in TTL, stale-value reuse, or recovery logic for arbitrary commands: different commands need different caching and failure strategies, and pi cannot infer the right one.

If your command is slow, expensive, rate-limited, or should keep using a previous value on transient failures, wrap it in your own script or command that implements the caching or TTL behavior you want.
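As a rough sketch (assuming `!` commands run through a shell; `my-fetch-token` and the cache path are placeholders), the TTL logic can live inside the command itself:

```json
{
  "providers": {
    "my-proxy": {
      "baseUrl": "https://proxy.example.com/v1",
      "api": "openai-completions",
      "apiKey": "!sh -c 'f=$HOME/.cache/pi-token; [ -n \"$(find \"$f\" -mmin -60 2>/dev/null)\" ] && cat \"$f\" || my-fetch-token | tee \"$f\"'",
      "models": [{ "id": "proxy-model" }]
    }
  }
}
```

This reuses a cached token younger than 60 minutes and calls the fetcher only otherwise; the same logic can equally live in a standalone script referenced as `"!my-script.sh"`.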
`/model` availability checks use configured auth presence and do not execute shell commands.
A provider mixing environment variables and a shell command:

```json
{
  "providers": {
    "custom-proxy": {
      "baseUrl": "https://proxy.example.com/v1",
      "apiKey": "MY_API_KEY",
      "api": "anthropic-messages",
      "headers": {
        "x-portkey-api-key": "PORTKEY_API_KEY",
        "x-secret": "!op read 'op://vault/item/secret'"
      },
      "models": [...]
    }
  }
}
```
| Field | Required | Default | Description |
|---|---|---|---|
| `id` | Yes | — | Model identifier (passed to the API) |
| `name` | No | `id` | Human-readable model label. Used for matching (`--model` patterns) and shown in model details/status text. |
| `api` | No | provider's `api` | Override the provider's API for this model |
| `reasoning` | No | `false` | Supports extended thinking |
| `thinkingLevelMap` | No | omitted | Maps pi thinking levels to provider values and marks unsupported levels (see below) |
| `input` | No | `["text"]` | Input types: `["text"]` or `["text", "image"]` |
| `contextWindow` | No | `128000` | Context window size in tokens |
| `maxTokens` | No | `16384` | Maximum output tokens |
| `cost` | No | all zeros | `{"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0}` (per million tokens) |
| `compat` | No | provider `compat` | Provider compatibility overrides. Merged with provider-level `compat` when both are set. |
Current behavior:

- `/model` and `--list-models` list entries by model `id`.
- `name` is used for model matching and detail/status text.

Use `thinkingLevelMap` on a model to describe model-specific thinking controls. Keys are pi thinking levels: `off`, `minimal`, `low`, `medium`, `high`, `xhigh`.
Values are tristate:

| Value | Meaning |
|---|---|
| omitted | Level is supported and uses the provider's default mapping |
| string | Level is supported and this value is sent to the provider |
| `null` | Level is unsupported and hidden/skipped/clamped away |
Example for a model that only supports `off`, `high`, and `max` reasoning:

```json
{
  "id": "deepseek-v4-pro",
  "reasoning": true,
  "thinkingLevelMap": {
    "minimal": null,
    "low": null,
    "medium": null,
    "high": "high",
    "xhigh": "max"
  }
}
```
Example for a model where thinking cannot be disabled:

```json
{
  "id": "always-thinking-model",
  "reasoning": true,
  "thinkingLevelMap": {
    "off": null
  }
}
```
Migration: older configs that used `compat.reasoningEffortMap` should move that mapping to model-level `thinkingLevelMap`. Use `null` for levels that should not appear in the UI.
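As a sketch (assuming the old map used the same level keys), an entry like:

```json
{
  "id": "my-model",
  "reasoning": true,
  "compat": { "reasoningEffortMap": { "minimal": null, "xhigh": "max" } }
}
```

becomes:

```json
{
  "id": "my-model",
  "reasoning": true,
  "thinkingLevelMap": { "minimal": null, "xhigh": "max" }
}
```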
Route a built-in provider through a proxy without redefining models:
```json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1"
    }
  }
}
```
All built-in Anthropic models remain available, and existing OAuth or API key auth continues to work.

To merge custom models into a built-in provider, include the `models` array:
```json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1",
      "apiKey": "ANTHROPIC_API_KEY",
      "api": "anthropic-messages",
      "models": [...]
    }
  }
}
```
Merge semantics:

- Custom models are matched by `id` within the provider.
- If a custom model's `id` matches a built-in model `id`, the custom model replaces that built-in model.
- If the `id` is new, it is added alongside built-in models.

Use `modelOverrides` to customize specific built-in models without replacing the provider's full model list:
```json
{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "name": "Claude Sonnet 4 (Bedrock Route)",
          "compat": {
            "openRouterRouting": {
              "only": ["amazon-bedrock"]
            }
          }
        }
      }
    }
  }
}
```
`modelOverrides` supports these fields per model: `name`, `reasoning`, `input`, `cost` (partial), `contextWindow`, `maxTokens`, `headers`, `compat`.

Behavior notes:

- `modelOverrides` are applied to built-in provider models.
- You can combine `baseUrl`/`headers` with `modelOverrides`.
- If `models` is also defined for a provider, custom models are merged after built-in overrides. A custom model with the same `id` replaces the overridden built-in model entry.

For providers or proxies using `api: "anthropic-messages"`, use `compat.supportsEagerToolInputStreaming` to control Anthropic fine-grained tool streaming compatibility.

By default pi sends per-tool `eager_input_streaming: true`. If a proxy or Anthropic-compatible backend rejects that field, set `supportsEagerToolInputStreaming` to `false`. Pi will omit `tools[].eager_input_streaming` and instead send the legacy `fine-grained-tool-streaming-2025-05-14` beta header for tool-enabled requests.
```json
{
  "providers": {
    "anthropic-proxy": {
      "baseUrl": "https://proxy.example.com",
      "api": "anthropic-messages",
      "apiKey": "ANTHROPIC_PROXY_KEY",
      "compat": {
        "supportsEagerToolInputStreaming": false,
        "supportsLongCacheRetention": true
      },
      "models": [
        {
          "id": "claude-opus-4-7",
          "reasoning": true,
          "input": ["text", "image"]
        }
      ]
    }
  }
}
```
| Field | Description |
|---|---|
| `supportsEagerToolInputStreaming` | Whether the provider accepts per-tool `eager_input_streaming`. Default: `true`. Set to `false` to omit that field and use the legacy fine-grained tool streaming beta header on tool-enabled requests. |
| `supportsLongCacheRetention` | Whether the provider accepts Anthropic long cache retention (`cache_control.ttl: "1h"`) when cache retention is long. Default: `true`. |
For providers with partial OpenAI compatibility, use the `compat` field:

- Provider-level `compat` applies defaults to all models under that provider.
- Model-level `compat` overrides provider-level values for that model.

```json
{
  "providers": {
    "local-llm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "compat": {
        "supportsUsageInStreaming": false,
        "maxTokensField": "max_tokens"
      },
      "models": [...]
    }
  }
}
```
| Field | Description |
|---|---|
| `supportsStore` | Provider supports the `store` field |
| `supportsDeveloperRole` | Use `developer` vs `system` role |
| `supportsReasoningEffort` | Support for the `reasoning_effort` parameter |
| `supportsUsageInStreaming` | Supports `stream_options: { include_usage: true }` (default: `true`) |
| `maxTokensField` | Use `max_completion_tokens` or `max_tokens` |
| `requiresToolResultName` | Include `name` on tool result messages |
| `requiresAssistantAfterToolResult` | Insert an assistant message before a user message that follows tool results |
| `requiresThinkingAsText` | Convert thinking blocks to plain text |
| `requiresReasoningContentOnAssistantMessages` | Include empty `reasoning_content` on all replayed assistant messages when reasoning is enabled |
| `thinkingFormat` | Use `reasoning_effort`, `deepseek`, `zai`, `qwen`, or `qwen-chat-template` thinking parameters |
| `cacheControlFormat` | Use Anthropic-style `cache_control` markers on the system prompt, last tool definition, and last user/assistant text content. Currently only `anthropic` is supported. |
| `supportsStrictMode` | Include the `strict` field in tool definitions |
| `supportsLongCacheRetention` | Whether the provider accepts long cache retention when cache retention is long: `prompt_cache_retention: "24h"` for OpenAI prompt caching, or `cache_control.ttl: "1h"` when `cacheControlFormat` is `anthropic`. Default: `true`. |
| `openRouterRouting` | OpenRouter provider routing preferences. This object is sent as-is in the `provider` field of the OpenRouter API request. |
| `vercelGatewayRouting` | Vercel AI Gateway routing config for provider selection (`only`, `order`) |
`qwen` uses top-level `enable_thinking`. Use `qwen-chat-template` for local Qwen-compatible servers that require `chat_template_kwargs.enable_thinking`.
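For example, a minimal sketch for a local Qwen-compatible server (the port and model id are placeholders):

```json
{
  "providers": {
    "local-qwen": {
      "baseUrl": "http://localhost:8000/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "compat": { "thinkingFormat": "qwen-chat-template" },
      "models": [{ "id": "qwen3-32b", "reasoning": true }]
    }
  }
}
```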
`cacheControlFormat: "anthropic"` is for OpenAI-compatible providers that expose Anthropic-style prompt caching through `cache_control` markers on text content and tool definitions.
OpenRouter routing example:
```json
{
  "providers": {
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "OPENROUTER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "openrouter/anthropic/claude-3.5-sonnet",
          "name": "OpenRouter Claude 3.5 Sonnet",
          "compat": {
            "openRouterRouting": {
              "allow_fallbacks": true,
              "require_parameters": false,
              "data_collection": "deny",
              "zdr": true,
              "enforce_distillable_text": false,
              "order": ["anthropic", "amazon-bedrock", "google-vertex"],
              "only": ["anthropic", "amazon-bedrock"],
              "ignore": ["gmicloud", "friendli"],
              "quantizations": ["fp16", "bf16"],
              "sort": {
                "by": "price",
                "partition": "model"
              },
              "max_price": {
                "prompt": 10,
                "completion": 20
              },
              "preferred_min_throughput": {
                "p50": 100,
                "p90": 50
              },
              "preferred_max_latency": {
                "p50": 1,
                "p90": 3,
                "p99": 5
              }
            }
          }
        }
      ]
    }
  }
}
```
Vercel AI Gateway example:
```json
{
  "providers": {
    "vercel-ai-gateway": {
      "baseUrl": "https://ai-gateway.vercel.sh/v1",
      "apiKey": "AI_GATEWAY_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "moonshotai/kimi-k2.5",
          "name": "Kimi K2.5 (Fireworks via Vercel)",
          "reasoning": true,
          "input": ["text", "image"],
          "cost": { "input": 0.6, "output": 3, "cacheRead": 0, "cacheWrite": 0 },
          "contextWindow": 262144,
          "maxTokens": 262144,
          "compat": {
            "vercelGatewayRouting": {
              "only": ["fireworks", "novita"],
              "order": ["fireworks", "novita"]
            }
          }
        }
      ]
    }
  }
}
```