Custom Models

Add custom providers and models (Ollama, vLLM, LM Studio, proxies) via ~/.pi/agent/models.json.

Minimal Example

For local models (Ollama, LM Studio, vLLM), only id is required per model:

json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}

The apiKey is required but Ollama ignores it, so any value works.

Some OpenAI-compatible servers do not understand the developer role used for reasoning-capable models. For those providers, set compat.supportsDeveloperRole to false so pi sends the system prompt as a system message instead. If the server also does not support reasoning_effort, set compat.supportsReasoningEffort to false too.

You can set compat at the provider level to apply to all models, or at the model level to override a specific model. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.

json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "gpt-oss:20b",
          "reasoning": true
        }
      ]
    }
  }
}

Full Example

Override defaults when you need specific values:

json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        {
          "id": "llama3.1:8b",
          "name": "Llama 3.1 8B (Local)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 128000,
          "maxTokens": 32000,
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}

The file is reloaded each time you open /model, so you can edit it mid-session without restarting.

Google AI Studio Example

Use google-generative-ai with a baseUrl to add models from Google AI Studio, including custom Gemma 4 entries:

json
{
  "providers": {
    "my-google": {
      "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
      "api": "google-generative-ai",
      "apiKey": "GEMINI_API_KEY",
      "models": [
        {
          "id": "gemma-4-31b-it",
          "name": "Gemma 4 31B",
          "input": ["text", "image"],
          "contextWindow": 262144,
          "reasoning": true
        }
      ]
    }
  }
}

The baseUrl is required when adding custom models to the google-generative-ai API type.

Supported APIs

| API | Description |
| --- | --- |
| openai-completions | OpenAI Chat Completions (most compatible) |
| openai-responses | OpenAI Responses API |
| anthropic-messages | Anthropic Messages API |
| google-generative-ai | Google Generative AI |

Set api at provider level (default for all models) or model level (override per model).
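
For example, a minimal sketch (the provider name and model ids here are hypothetical) where the provider defaults to openai-completions but one model opts into openai-responses:

json
{
  "providers": {
    "my-proxy": {
      "baseUrl": "https://proxy.example.com/v1",
      "api": "openai-completions",
      "apiKey": "PROXY_API_KEY",
      "models": [
        { "id": "generic-model" },
        { "id": "responses-model", "api": "openai-responses" }
      ]
    }
  }
}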

Provider Configuration

| Field | Description |
| --- | --- |
| baseUrl | API endpoint URL |
| api | API type (see above) |
| apiKey | API key (see value resolution below) |
| headers | Custom headers (see value resolution below) |
| authHeader | Set true to add Authorization: Bearer <apiKey> automatically |
| models | Array of model configurations |
| modelOverrides | Per-model overrides for built-in models on this provider |
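
For example, a sketch (hypothetical gateway and model) using authHeader with an OpenAI-compatible endpoint that expects a standard bearer token:

json
{
  "providers": {
    "bearer-gateway": {
      "baseUrl": "https://gateway.example.com/v1",
      "api": "openai-completions",
      "apiKey": "GATEWAY_API_KEY",
      "authHeader": true,
      "models": [{ "id": "some-model" }]
    }
  }
}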

Value Resolution

The apiKey and headers fields support three formats:

  • Shell command: "!command" executes and uses stdout
    json
    "apiKey": "!security find-generic-password -ws 'anthropic'"
    "apiKey": "!op read 'op://vault/item/credential'"
    
  • Environment variable: Uses the value of the named variable
    json
    "apiKey": "MY_API_KEY"
    
  • Literal value: Used directly
    json
    "apiKey": "sk-..."
    

For models.json, shell commands are resolved at request time. pi intentionally does not apply built-in TTL, stale reuse, or recovery logic for arbitrary commands. Different commands need different caching and failure strategies, and pi cannot infer the right one.

If your command is slow, expensive, rate-limited, or should keep using a previous value on transient failures, wrap it in your own script or command that implements the caching or TTL behavior you want.
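
For example, assuming a hypothetical cached-key script on your PATH that fetches the credential and caches it with its own TTL, reference it like any other shell command:

json
"apiKey": "!cached-key anthropic"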

/model availability checks use configured auth presence and do not execute shell commands.

Custom Headers

json
{
  "providers": {
    "custom-proxy": {
      "baseUrl": "https://proxy.example.com/v1",
      "apiKey": "MY_API_KEY",
      "api": "anthropic-messages",
      "headers": {
        "x-portkey-api-key": "PORTKEY_API_KEY",
        "x-secret": "!op read 'op://vault/item/secret'"
      },
      "models": [...]
    }
  }
}

Model Configuration

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| id | Yes | | Model identifier (passed to the API) |
| name | No | id | Human-readable model label. Used for matching (--model patterns) and shown in model details/status text. |
| api | No | provider's api | Override provider's API for this model |
| reasoning | No | false | Supports extended thinking |
| thinkingLevelMap | No | omitted | Maps pi thinking levels to provider values and marks unsupported levels (see below) |
| input | No | ["text"] | Input types: ["text"] or ["text", "image"] |
| contextWindow | No | 128000 | Context window size in tokens |
| maxTokens | No | 16384 | Maximum output tokens |
| cost | No | all zeros | {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0} (per million tokens) |
| compat | No | provider compat | Provider compatibility overrides. Merged with provider-level compat when both are set. |

Current behavior:

  • /model and --list-models list entries by model id.
  • The configured name is used for model matching and detail/status text.

Thinking Level Map

Use thinkingLevelMap on a model to describe model-specific thinking controls. Keys are pi thinking levels: off, minimal, low, medium, high, xhigh.

Values are tristate:

| Value | Meaning |
| --- | --- |
| omitted | Level is supported and uses the provider's default mapping |
| string | Level is supported and this value is sent to the provider |
| null | Level is unsupported and hidden/skipped/clamped away |

Example for a model that only supports off, high, and max reasoning:

json
{
  "id": "deepseek-v4-pro",
  "reasoning": true,
  "thinkingLevelMap": {
    "minimal": null,
    "low": null,
    "medium": null,
    "high": "high",
    "xhigh": "max"
  }
}

Example for a model where thinking cannot be disabled:

json
{
  "id": "always-thinking-model",
  "reasoning": true,
  "thinkingLevelMap": {
    "off": null
  }
}

Migration: older configs that used compat.reasoningEffortMap should move that mapping to model-level thinkingLevelMap. Use null for levels that should not appear in the UI.
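
As a sketch of that migration, a legacy model whose compat.reasoningEffortMap disabled the low level (the model id below is hypothetical) would now be written as:

json
{
  "id": "legacy-model",
  "reasoning": true,
  "thinkingLevelMap": {
    "low": null
  }
}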

Overriding Built-in Providers

Route a built-in provider through a proxy without redefining models:

json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1"
    }
  }
}

All built-in Anthropic models remain available. Existing OAuth or API key auth continues to work.

To merge custom models into a built-in provider, include the models array:

json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1",
      "apiKey": "ANTHROPIC_API_KEY",
      "api": "anthropic-messages",
      "models": [...]
    }
  }
}

Merge semantics:

  • Built-in models are kept.
  • Custom models are upserted by id within the provider.
  • If a custom model id matches a built-in model id, the custom model replaces that built-in model.
  • If a custom model id is new, it is added alongside built-in models.
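
Putting those rules together, a sketch (the model ids are illustrative, not necessarily current built-in ids): the first entry replaces a built-in model with the same id, while the second is added alongside the built-ins:

json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1",
      "apiKey": "ANTHROPIC_API_KEY",
      "api": "anthropic-messages",
      "models": [
        { "id": "claude-sonnet-4", "maxTokens": 64000 },
        { "id": "claude-internal-preview", "reasoning": true }
      ]
    }
  }
}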

Per-model Overrides

Use modelOverrides to customize specific built-in models without replacing the provider's full model list.

json
{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "name": "Claude Sonnet 4 (Bedrock Route)",
          "compat": {
            "openRouterRouting": {
              "only": ["amazon-bedrock"]
            }
          }
        }
      }
    }
  }
}

modelOverrides supports these fields per model: name, reasoning, input, cost (partial), contextWindow, maxTokens, headers, compat.

Behavior notes:

  • modelOverrides are applied to built-in provider models.
  • Unknown model IDs are ignored.
  • You can combine provider-level baseUrl/headers with modelOverrides.
  • If models is also defined for a provider, custom models are merged after built-in overrides. A custom model with the same id replaces the overridden built-in model entry.
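
For example, a sketch that adjusts only the pricing and output limit of one built-in model (the id is illustrative; cost may be partial):

json
{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "cost": { "input": 2.5 },
          "maxTokens": 32000
        }
      }
    }
  }
}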

Anthropic Messages Compatibility

For providers or proxies using api: "anthropic-messages", use compat.supportsEagerToolInputStreaming to control Anthropic fine-grained tool streaming compatibility.

By default pi sends per-tool eager_input_streaming: true. If a proxy or Anthropic-compatible backend rejects that field, set supportsEagerToolInputStreaming to false. Pi will omit tools[].eager_input_streaming and send the legacy fine-grained-tool-streaming-2025-05-14 beta header for tool-enabled requests instead.

json
{
  "providers": {
    "anthropic-proxy": {
      "baseUrl": "https://proxy.example.com",
      "api": "anthropic-messages",
      "apiKey": "ANTHROPIC_PROXY_KEY",
      "compat": {
        "supportsEagerToolInputStreaming": false,
        "supportsLongCacheRetention": true
      },
      "models": [
        {
          "id": "claude-opus-4-7",
          "reasoning": true,
          "input": ["text", "image"]
        }
      ]
    }
  }
}

| Field | Description |
| --- | --- |
| supportsEagerToolInputStreaming | Whether the provider accepts per-tool eager_input_streaming. Default: true. Set to false to omit that field and use the legacy fine-grained tool streaming beta header on tool-enabled requests. |
| supportsLongCacheRetention | Whether the provider accepts Anthropic long cache retention (cache_control.ttl: "1h") when cache retention is long. Default: true. |

OpenAI Compatibility

For providers with partial OpenAI compatibility, use the compat field.

  • Provider-level compat applies defaults to all models under that provider.
  • Model-level compat overrides provider-level values for that model.

json
{
  "providers": {
    "local-llm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "compat": {
        "supportsUsageInStreaming": false,
        "maxTokensField": "max_tokens"
      },
      "models": [...]
    }
  }
}

| Field | Description |
| --- | --- |
| supportsStore | Provider supports store field |
| supportsDeveloperRole | Use developer vs system role |
| supportsReasoningEffort | Support for reasoning_effort parameter |
| supportsUsageInStreaming | Supports stream_options: { include_usage: true } (default: true) |
| maxTokensField | Use max_completion_tokens or max_tokens |
| requiresToolResultName | Include name on tool result messages |
| requiresAssistantAfterToolResult | Insert an assistant message before a user message after tool results |
| requiresThinkingAsText | Convert thinking blocks to plain text |
| requiresReasoningContentOnAssistantMessages | Include empty reasoning_content on all replayed assistant messages when reasoning is enabled |
| thinkingFormat | Use reasoning_effort, deepseek, zai, qwen, or qwen-chat-template thinking parameters |
| cacheControlFormat | Use Anthropic-style cache_control markers on the system prompt, last tool definition, and last user/assistant text content. Currently only anthropic is supported. |
| supportsStrictMode | Include the strict field in tool definitions |
| supportsLongCacheRetention | Whether the provider accepts long cache retention when cache retention is long: prompt_cache_retention: "24h" for OpenAI prompt caching, or cache_control.ttl: "1h" when cacheControlFormat is anthropic. Default: true. |
| openRouterRouting | OpenRouter provider routing preferences. This object is sent as-is in the provider field of the OpenRouter API request. |
| vercelGatewayRouting | Vercel AI Gateway routing config for provider selection (only, order) |

qwen uses top-level enable_thinking. Use qwen-chat-template for local Qwen-compatible servers that require chat_template_kwargs.enable_thinking.
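
For instance, a sketch for a hypothetical local Qwen-compatible server exposed through an OpenAI-compatible endpoint:

json
{
  "providers": {
    "local-qwen": {
      "baseUrl": "http://localhost:8000/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "compat": {
        "thinkingFormat": "qwen-chat-template"
      },
      "models": [
        { "id": "qwen3-32b", "reasoning": true }
      ]
    }
  }
}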

cacheControlFormat: "anthropic" is for OpenAI-compatible providers that expose Anthropic-style prompt caching through cache_control markers on text content and tool definitions.
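
A sketch of such a provider (names are hypothetical; whether your proxy actually honors cache_control markers is an assumption to verify):

json
{
  "providers": {
    "caching-proxy": {
      "baseUrl": "https://proxy.example.com/v1",
      "api": "openai-completions",
      "apiKey": "PROXY_API_KEY",
      "compat": {
        "cacheControlFormat": "anthropic"
      },
      "models": [...]
    }
  }
}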

Example:

json
{
  "providers": {
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "OPENROUTER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "openrouter/anthropic/claude-3.5-sonnet",
          "name": "OpenRouter Claude 3.5 Sonnet",
          "compat": {
            "openRouterRouting": {
              "allow_fallbacks": true,
              "require_parameters": false,
              "data_collection": "deny",
              "zdr": true,
              "enforce_distillable_text": false,
              "order": ["anthropic", "amazon-bedrock", "google-vertex"],
              "only": ["anthropic", "amazon-bedrock"],
              "ignore": ["gmicloud", "friendli"],
              "quantizations": ["fp16", "bf16"],
              "sort": {
                "by": "price",
                "partition": "model"
              },
              "max_price": {
                "prompt": 10,
                "completion": 20
              },
              "preferred_min_throughput": {
                "p50": 100,
                "p90": 50
              },
              "preferred_max_latency": {
                "p50": 1,
                "p90": 3,
                "p99": 5
              }
            }
          }
        }
      ]
    }
  }
}

Vercel AI Gateway example:

json
{
  "providers": {
    "vercel-ai-gateway": {
      "baseUrl": "https://ai-gateway.vercel.sh/v1",
      "apiKey": "AI_GATEWAY_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "moonshotai/kimi-k2.5",
          "name": "Kimi K2.5 (Fireworks via Vercel)",
          "reasoning": true,
          "input": ["text", "image"],
          "cost": { "input": 0.6, "output": 3, "cacheRead": 0, "cacheWrite": 0 },
          "contextWindow": 262144,
          "maxTokens": 262144,
          "compat": {
            "vercelGatewayRouting": {
              "only": ["fireworks", "novita"],
              "order": ["fireworks", "novita"]
            }
          }
        }
      ]
    }
  }
}