docs/users/configuration/model-providers.md

Qwen Code allows you to configure multiple model providers through the `modelProviders` setting in your `settings.json`. This enables you to switch between different AI models and providers using the `/model` command.

Use `modelProviders` to declare curated model lists per auth type that the `/model` picker can switch between. Keys must be valid auth types (`openai`, `anthropic`, `gemini`, etc.). Each entry requires an `id` and must include `envKey`, with optional `name`, `description`, `baseUrl`, and `generationConfig`. Credentials are never persisted in settings; the runtime reads them from `process.env[envKey]`. Qwen OAuth models remain hard-coded and cannot be overridden.

> [!NOTE]
> Only the `/model` command exposes non-default auth types. Anthropic, Gemini, etc., must be defined via `modelProviders`. The `/auth` command lists Qwen OAuth, Alibaba Cloud Coding Plan, and API Key as the built-in authentication options.

> [!NOTE]
> **Model uniqueness:** Models within the same `authType` are uniquely identified by the combination of `id` + `baseUrl`. This means you can define the same model ID (e.g., `"gpt-4o"`) multiple times under a single `authType` as long as each entry has a different `baseUrl` — for example, one pointing to OpenAI directly and another to a proxy endpoint. If two entries share both the same `id` and the same `baseUrl` (or both omit `baseUrl`), the first occurrence wins and subsequent duplicates are skipped with a warning.

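As an illustration (a sketch only — the proxy URL and its environment variable are placeholders), the following catalog defines `"gpt-4o"` twice under `openai`; both entries are kept because their `baseUrl` values differ:

```jsonc
{
  "modelProviders": {
    "openai": [
      {
        "id": "gpt-4o",
        "name": "GPT-4o (OpenAI)",
        "envKey": "OPENAI_API_KEY",
        "baseUrl": "https://api.openai.com/v1"
      },
      {
        "id": "gpt-4o",
        "name": "GPT-4o (proxy)",
        "envKey": "PROXY_API_KEY", // placeholder env var
        "baseUrl": "https://proxy.example.com/v1" // placeholder proxy endpoint
      }
    ]
  }
}
```
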
Below are comprehensive configuration examples for different authentication types, showing the available parameters and their combinations.

The `modelProviders` object keys must be valid `authType` values. Currently supported auth types are:

| Auth Type | Description |
| --- | --- |
| `openai` | OpenAI-compatible APIs (OpenAI, Azure OpenAI, local inference servers like vLLM/Ollama) |
| `anthropic` | Anthropic Claude API |
| `gemini` | Google Gemini API |
| `qwen-oauth` | Qwen OAuth (hard-coded, cannot be overridden in `modelProviders`) |

> [!WARNING]
> If an invalid auth type key is used (e.g., a typo like `"openai-custom"`), the configuration will be silently skipped and the models will not appear in the `/model` picker. Always use one of the supported auth type values listed above.

Qwen Code uses the following official SDKs to send requests to each provider:

| Auth Type | SDK Package |
| --- | --- |
| `openai` | `openai` - Official OpenAI Node.js SDK |
| `anthropic` | `@anthropic-ai/sdk` - Official Anthropic SDK |
| `gemini` | `@google/genai` - Official Google GenAI SDK |
| `qwen-oauth` | `openai` with custom provider (DashScope-compatible) |

This means the `baseUrl` you configure should be compatible with the corresponding SDK's expected API format. For example, when using the `openai` auth type, the endpoint must accept OpenAI API format requests.

### OpenAI-Compatible APIs (`openai`)

This auth type supports not only OpenAI's official API but also any OpenAI-compatible endpoint, including aggregated model providers like OpenRouter.

```json
{
  "env": {
    "OPENAI_API_KEY": "sk-your-actual-openai-key-here",
    "OPENROUTER_API_KEY": "sk-or-your-actual-openrouter-key-here"
  },
  "modelProviders": {
    "openai": [
      {
        "id": "gpt-4o",
        "name": "GPT-4o",
        "envKey": "OPENAI_API_KEY",
        "baseUrl": "https://api.openai.com/v1",
        "generationConfig": {
          "timeout": 60000,
          "maxRetries": 3,
          "enableCacheControl": true,
          "contextWindowSize": 128000,
          "modalities": {
            "image": true
          },
          "customHeaders": {
            "X-Client-Request-ID": "req-123"
          },
          "extra_body": {
            "enable_thinking": true,
            "service_tier": "priority"
          },
          "samplingParams": {
            "temperature": 0.2,
            "top_p": 0.8,
            "max_tokens": 4096,
            "presence_penalty": 0.1,
            "frequency_penalty": 0.1
          }
        }
      },
      {
        "id": "gpt-4o-mini",
        "name": "GPT-4o Mini",
        "envKey": "OPENAI_API_KEY",
        "baseUrl": "https://api.openai.com/v1",
        "generationConfig": {
          "timeout": 30000,
          "samplingParams": {
            "temperature": 0.5,
            "max_tokens": 2048
          }
        }
      },
      {
        "id": "openai/gpt-4o",
        "name": "GPT-4o (via OpenRouter)",
        "envKey": "OPENROUTER_API_KEY",
        "baseUrl": "https://openrouter.ai/api/v1",
        "generationConfig": {
          "timeout": 120000,
          "maxRetries": 3,
          "samplingParams": {
            "temperature": 0.7
          }
        }
      }
    ]
  }
}
```

### Anthropic (`anthropic`)

```json
{
  "env": {
    "ANTHROPIC_API_KEY": "sk-ant-your-actual-anthropic-key-here"
  },
  "modelProviders": {
    "anthropic": [
      {
        "id": "claude-3-5-sonnet",
        "name": "Claude 3.5 Sonnet",
        "envKey": "ANTHROPIC_API_KEY",
        "baseUrl": "https://api.anthropic.com/v1",
        "generationConfig": {
          "timeout": 120000,
          "maxRetries": 3,
          "contextWindowSize": 200000,
          "samplingParams": {
            "temperature": 0.7,
            "max_tokens": 8192,
            "top_p": 0.9
          }
        }
      },
      {
        "id": "claude-3-opus",
        "name": "Claude 3 Opus",
        "envKey": "ANTHROPIC_API_KEY",
        "baseUrl": "https://api.anthropic.com/v1",
        "generationConfig": {
          "timeout": 180000,
          "samplingParams": {
            "temperature": 0.3,
            "max_tokens": 4096
          }
        }
      }
    ]
  }
}
```

### Gemini (`gemini`)

```json
{
  "env": {
    "GEMINI_API_KEY": "AIza-your-actual-gemini-key-here"
  },
  "modelProviders": {
    "gemini": [
      {
        "id": "gemini-2.0-flash",
        "name": "Gemini 2.0 Flash",
        "envKey": "GEMINI_API_KEY",
        "baseUrl": "https://generativelanguage.googleapis.com",
        "capabilities": {
          "vision": true
        },
        "generationConfig": {
          "timeout": 60000,
          "maxRetries": 2,
          "contextWindowSize": 1000000,
          "schemaCompliance": "auto",
          "samplingParams": {
            "temperature": 0.4,
            "top_p": 0.95,
            "max_tokens": 8192,
            "top_k": 40
          }
        }
      }
    ]
  }
}
```

### Local Inference Servers (OpenAI-Compatible)

Most local inference servers (vLLM, Ollama, LM Studio, etc.) provide an OpenAI-compatible API endpoint. Configure them using the `openai` auth type with a local `baseUrl`:

```json
{
  "env": {
    "OLLAMA_API_KEY": "ollama",
    "VLLM_API_KEY": "not-needed",
    "LMSTUDIO_API_KEY": "lm-studio"
  },
  "modelProviders": {
    "openai": [
      {
        "id": "qwen2.5-7b",
        "name": "Qwen2.5 7B (Ollama)",
        "envKey": "OLLAMA_API_KEY",
        "baseUrl": "http://localhost:11434/v1",
        "generationConfig": {
          "timeout": 300000,
          "maxRetries": 1,
          "contextWindowSize": 32768,
          "samplingParams": {
            "temperature": 0.7,
            "top_p": 0.9,
            "max_tokens": 4096
          }
        }
      },
      {
        "id": "llama-3.1-8b",
        "name": "Llama 3.1 8B (vLLM)",
        "envKey": "VLLM_API_KEY",
        "baseUrl": "http://localhost:8000/v1",
        "generationConfig": {
          "timeout": 120000,
          "maxRetries": 2,
          "contextWindowSize": 128000,
          "samplingParams": {
            "temperature": 0.6,
            "max_tokens": 8192
          }
        }
      },
      {
        "id": "local-model",
        "name": "Local Model (LM Studio)",
        "envKey": "LMSTUDIO_API_KEY",
        "baseUrl": "http://localhost:1234/v1",
        "generationConfig": {
          "timeout": 60000,
          "samplingParams": {
            "temperature": 0.5
          }
        }
      }
    ]
  }
}
```

For local servers that don't require authentication, you can use any placeholder value for the API key:

```bash
# For Ollama (no auth required)
export OLLAMA_API_KEY="ollama"

# For vLLM (if no auth is configured)
export VLLM_API_KEY="not-needed"
```

> [!NOTE]
> The `extra_body` parameter is only supported for OpenAI-compatible providers (`openai`, `qwen-oauth`). It is ignored for the Anthropic and Gemini providers.

> [!NOTE]
> **About `envKey`:** The `envKey` field specifies the name of an environment variable, not the actual API key value. For the configuration to work, you need to ensure the corresponding environment variable is set with your real API key. There are two ways to do this:
>
> - **Option 1: Using a `.env` file (recommended for security):**
>
>   ```bash
>   # ~/.qwen/.env (or project root)
>   OPENAI_API_KEY=sk-your-actual-key-here
>   ```
>
>   Be sure to add `.env` to your `.gitignore` to prevent accidentally committing secrets.
>
> - **Option 2: Using the `env` field in `settings.json` (as shown in the examples above):**
>
>   ```json
>   { "env": { "OPENAI_API_KEY": "sk-your-actual-key-here" } }
>   ```
>
> Each provider example includes an `env` field to illustrate how the API key should be configured.

## Alibaba Cloud Coding Plan

Alibaba Cloud Coding Plan provides a pre-configured set of Qwen models optimized for coding tasks. This feature is available for users with Alibaba Cloud Coding Plan API access and offers a simplified setup experience with automatic model configuration updates.

When you authenticate with an Alibaba Cloud Coding Plan API key using the `/auth` command, Qwen Code automatically configures the following models:

| Model ID | Name | Description |
| --- | --- | --- |
| `qwen3.5-plus` | qwen3.5-plus | Advanced model with thinking enabled |
| `qwen3-coder-plus` | qwen3-coder-plus | Optimized for coding tasks |
| `qwen3-max-2026-01-23` | qwen3-max-2026-01-23 | Latest max model with thinking enabled |

After you authenticate with the `/auth` command in Qwen Code, these models are automatically configured and added to your `/model` picker.

Alibaba Cloud Coding Plan supports two regions:

| Region | Endpoint | Description |
| --- | --- | --- |
| China | `https://coding.dashscope.aliyuncs.com/v1` | Mainland China endpoint |
| Global/International | `https://coding-intl.dashscope.aliyuncs.com/v1` | International endpoint |

The region is selected during authentication and stored in `settings.json` under `codingPlan.region`. To switch regions, re-run the `/auth` command and select a different region.

When you configure Coding Plan through the `/auth` command, the API key is stored using the reserved environment variable name `BAILIAN_CODING_PLAN_API_KEY`. By default, it is stored in the `env` field of your `settings.json` file.

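As a rough illustration (a sketch only — the exact fields written by `/auth` may differ, and the region value shown is an assumption), the resulting user `settings.json` might contain something like:

```jsonc
{
  "env": {
    // Written by /auth; consider moving it to ~/.qwen/.env instead (see the warning below).
    "BAILIAN_CODING_PLAN_API_KEY": "your-api-key-here"
  },
  "codingPlan": {
    "region": "china" // assumed value; the region is chosen during /auth
  }
}
```
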
> [!WARNING]
> **Security recommendation:** For better security, move the API key from `settings.json` to a separate `.env` file and load it as an environment variable. For example:
>
> ```bash
> # ~/.qwen/.env
> BAILIAN_CODING_PLAN_API_KEY=your-api-key-here
> ```
>
> Then ensure this file is added to your `.gitignore` if you're using project-level settings.

Coding Plan model configurations are versioned. When Qwen Code detects a newer version of the model template, you will be prompted to update; accepting the update refreshes your configured Coding Plan models to the latest template. This ensures you always have access to the latest model configurations and features without manual intervention.

If you prefer to manually configure Coding Plan models, you can add them to your `settings.json` like any OpenAI-compatible provider:

```json
{
  "modelProviders": {
    "openai": [
      {
        "id": "qwen3-coder-plus",
        "name": "qwen3-coder-plus",
        "description": "Qwen3-Coder via Alibaba Cloud Coding Plan",
        "envKey": "YOUR_CUSTOM_ENV_KEY",
        "baseUrl": "https://coding.dashscope.aliyuncs.com/v1"
      }
    ]
  }
}
```

> [!NOTE]
> When using manual configuration:
>
> - You can use any environment variable name for `envKey`
> - You do not need to configure `codingPlan.*`
> - Automatic updates will not apply to manually configured Coding Plan models

> [!WARNING]
> If you also use automatic Coding Plan configuration, automatic updates may overwrite your manual configurations if they use the same `envKey` and `baseUrl` as the automatic configuration. To avoid this, ensure your manual configuration uses a different `envKey` if possible.

## Configuration Precedence

The effective auth/model/credential values are chosen per field using the following precedence (first present wins). You can combine `--auth-type` with `--model` to point directly at a provider entry; these CLI flags are applied before the lower layers.

| Layer (highest → lowest) | authType | model | apiKey | baseUrl | apiKeyEnvKey | proxy |
| --- | --- | --- | --- | --- | --- | --- |
| Programmatic overrides | `/auth` | `/auth` input | `/auth` input | `/auth` input | — | — |
| Model provider selection | — | `modelProvider.id` | `env[modelProvider.envKey]` | `modelProvider.baseUrl` | `modelProvider.envKey` | — |
| CLI arguments | `--auth-type` | `--model` | `--openaiApiKey` (or provider-specific equivalents) | `--openaiBaseUrl` (or provider-specific equivalents) | — | — |
| Environment variables | — | Provider-specific mapping (e.g. `OPENAI_MODEL`) | Provider-specific mapping (e.g. `OPENAI_API_KEY`) | Provider-specific mapping (e.g. `OPENAI_BASE_URL`) | — | — |
| Settings (`settings.json`) | `security.auth.selectedType` | `model.name` | `security.auth.apiKey` | `security.auth.baseUrl` | — | — |
| Default / computed | Falls back to `AuthType.QWEN_OAUTH` | Built-in default (OpenAI ⇒ `qwen3-coder-plus`) | — | — | — | `Config.getProxy()` if configured |

\* When present, CLI auth flags override settings. Otherwise, `security.auth.selectedType` or the implicit default determines the auth type. Qwen OAuth and OpenAI are the only auth types surfaced without extra configuration.

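As a small illustration of the per-field layering (a sketch, not a definitive trace): with the settings below, running `qwen --model gpt-4o-mini` resolves the model to `gpt-4o-mini` because the CLI argument outranks `model.name`, while the auth type still comes from `security.auth.selectedType` since no `--auth-type` flag is passed.

```jsonc
// ~/.qwen/settings.json
{
  "security": {
    "auth": { "selectedType": "openai" } // used because no --auth-type flag is passed
  },
  "model": { "name": "gpt-4o" } // overridden by --model gpt-4o-mini on the CLI
}
```
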
> [!WARNING]
> **Deprecation of `security.auth.apiKey` and `security.auth.baseUrl`:** Directly configuring API credentials via `security.auth.apiKey` and `security.auth.baseUrl` in `settings.json` is deprecated. These settings were used in earlier versions for credentials entered through the UI, but that credential input flow was removed in version 0.10.1, and the fields will be removed entirely in a future release. Migrate to `modelProviders` for all model and credential configuration, and use `envKey` to reference environment variables instead of hardcoding credentials in settings files.

The configuration resolution follows a strict layering model with one crucial rule: the `modelProvider` layer is impermeable.

When a modelProvider model IS selected (e.g., a provider-configured model chosen via the `/model` command):

- The provider's `generationConfig` is applied atomically
- Fields defined in `modelProviders[].generationConfig` use the provider's values
- Fields the provider does not define are `undefined` (not inherited from settings)

If a model is listed in `modelProviders`, put all model-specific generation settings for that model in the matching provider entry. Top-level `model.generationConfig` values, including `contextWindowSize`, `modalities`, `customHeaders`, and `extra_body`, are ignored for provider models. Configure those fields under `modelProviders[authType][].generationConfig` for them to apply.

When NO modelProvider model is selected (e.g., using `--model` with a raw model ID, or using CLI/env/settings directly), each field of `generationConfig` is resolved independently through the following layers:

| Priority | Source | Behavior |
| --- | --- | --- |
| 1 | Programmatic overrides | Runtime `/model`, `/auth` changes |
| 2 | `modelProviders[authType][].generationConfig` | Impermeable layer - completely replaces all `generationConfig` fields; lower layers do not participate |
| 3 | `settings.model.generationConfig` | Only used for Runtime Models (when no provider model is selected) |
| 4 | Content-generator defaults | Provider-specific defaults (e.g., OpenAI vs Gemini) - only for Runtime Models |

The following fields are treated as atomic objects - provider values completely replace the entire object, and no merging occurs:

- `samplingParams` - temperature, top_p, max_tokens, etc.
- `customHeaders` - custom HTTP headers
- `extra_body` - extra request body parameters

```jsonc
// User settings (~/.qwen/settings.json)
{
  "model": {
    "generationConfig": {
      "timeout": 30000,
      "samplingParams": { "temperature": 0.5, "max_tokens": 1000 }
    }
  }
}
```

```jsonc
// modelProviders configuration
{
  "modelProviders": {
    "openai": [
      {
        "id": "gpt-4o",
        "envKey": "OPENAI_API_KEY",
        "generationConfig": {
          "timeout": 60000,
          "samplingParams": { "temperature": 0.2 }
        }
      }
    ]
  }
}
```

When `gpt-4o` is selected from `modelProviders`:

- `timeout = 60000` (from provider, overrides settings)
- `samplingParams.temperature = 0.2` (from provider, completely replaces the settings object)
- `samplingParams.max_tokens = undefined` (not defined in the provider, and the provider layer does not inherit from settings — fields not provided are explicitly set to `undefined`)

When using a raw model via `--model gpt-4` (not from `modelProviders`, which creates a Runtime Model):

- `timeout = 30000` (from settings)
- `samplingParams.temperature = 0.5` (from settings)
- `samplingParams.max_tokens = 1000` (from settings)

The merge strategy for `modelProviders` itself is REPLACE: the entire `modelProviders` object from project settings overrides the corresponding section in user settings, rather than merging the two.

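For example (a sketch with placeholder entries, assuming the project-scope file lives at `<project>/.qwen/settings.json`), if both scopes define `modelProviders`, only the project catalog takes effect; the two lists are not concatenated:

```jsonc
// ~/.qwen/settings.json (user scope)
{
  "modelProviders": {
    "openai": [
      { "id": "gpt-4o", "envKey": "OPENAI_API_KEY" },
      { "id": "gpt-4o-mini", "envKey": "OPENAI_API_KEY" }
    ]
  }
}
```

```jsonc
// <project>/.qwen/settings.json (project scope) — this object replaces the user catalog above entirely
{
  "modelProviders": {
    "openai": [
      { "id": "llama-3.1-8b", "envKey": "VLLM_API_KEY", "baseUrl": "http://localhost:8000/v1" }
    ]
  }
}
```
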
## Reasoning Configuration (`reasoning`)

The optional `reasoning` field under `generationConfig` controls how aggressively the model reasons before responding. The Anthropic and Gemini converters always honor it. The OpenAI-compatible pipeline honors it unless `generationConfig.samplingParams` is set — see the "Interaction with `samplingParams`" caveat below.

```jsonc
{
  "modelProviders": {
    "openai": [
      {
        "id": "deepseek-v4-pro",
        "name": "DeepSeek V4 Pro",
        "baseUrl": "https://api.deepseek.com/v1",
        "envKey": "DEEPSEEK_API_KEY",
        "generationConfig": {
          // The four-tier scale:
          //   'low' | 'medium' — server-mapped to 'high' on DeepSeek
          //   'high'           — default reasoning intensity
          //   'max'            — DeepSeek-specific extra-strong tier
          // Or set `false` to disable reasoning entirely.
          "reasoning": { "effort": "max" }
        }
      }
    ]
  }
}
```

| Protocol / provider | Wire shape | Notes |
| --- | --- | --- |
| OpenAI / DeepSeek (`api.deepseek.com`) | Flat `reasoning_effort: <effort>` body parameter | When `reasoning.effort` is set in the nested config shape, it's rewritten to flat `reasoning_effort` and `'low'`/`'medium'` are normalized to `'high'`, `'xhigh'` to `'max'` — mirroring DeepSeek's server-side back-compat. Top-level `samplingParams.reasoning_effort` or `extra_body.reasoning_effort` overrides skip this normalization and ship verbatim. |
| OpenAI (other compatible servers) | `reasoning: { effort, ... }` passed through verbatim | Set via `samplingParams` (e.g. `samplingParams.reasoning_effort` for GPT-5/o-series) when the provider expects a different shape. |
| Anthropic (real `api.anthropic.com`) | `output_config: { effort }` plus the `effort-2025-11-24` beta header | Real Anthropic accepts `'low'`/`'medium'`/`'high'` only. `'max'` is clamped to `'high'` with a `debugLogger.warn` line (once per generator); if you want max effort, switch the baseURL to a DeepSeek-compatible endpoint that supports it. |
| Anthropic (`api.deepseek.com/anthropic`) | Same `output_config: { effort }` + beta header | `'max'` is passed through unchanged. |
| Gemini (`@google/genai`) | `thinkingConfig: { includeThoughts: true, thinkingLevel }` | `'low'` → `LOW`, `'high'`/`'max'` → `HIGH`, others → `THINKING_LEVEL_UNSPECIFIED` (Gemini has no MAX tier). |

### `reasoning: false`

Setting `reasoning: false` (the literal boolean) explicitly disables thinking on every provider — useful for cheap side queries that don't benefit from reasoning. This is honored at the request level too via `request.config.thinkingConfig.includeThoughts: false` for one-off calls (e.g. suggestion generation).

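For instance, a provider entry might disable thinking for a cheap utility model (a sketch; the model id is just a placeholder):

```jsonc
{
  "modelProviders": {
    "openai": [
      {
        "id": "gpt-4o-mini", // placeholder model for cheap side queries
        "envKey": "OPENAI_API_KEY",
        "generationConfig": {
          "reasoning": false // literal boolean: disables thinking on every provider
        }
      }
    ]
  }
}
```
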
On an `api.deepseek.com` baseURL, the OpenAI pipeline emits the explicit `thinking: { type: 'disabled' }` field that DeepSeek V4+ requires — the server-side default is `'enabled'`, so simply omitting `reasoning_effort` would still pay thinking latency/cost. Self-hosted DeepSeek backends (sglang/vllm) and other OpenAI-compatible servers do not receive this field; if you need to disable thinking on those, inject `thinking: { type: 'disabled' }` (or whatever knob your inference framework exposes) via `samplingParams`/`extra_body`.

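For self-hosted backends, a sketch of that manual injection might look like the following (the endpoint and model id are placeholders, and the accepted knob depends on your inference framework):

```jsonc
{
  "modelProviders": {
    "openai": [
      {
        "id": "deepseek-v4", // placeholder self-hosted model id
        "envKey": "VLLM_API_KEY",
        "baseUrl": "http://localhost:8000/v1", // placeholder sglang/vllm endpoint
        "generationConfig": {
          "extra_body": {
            // Not emitted automatically for non-api.deepseek.com hosts;
            // add it here only if your backend accepts this knob.
            "thinking": { "type": "disabled" }
          }
        }
      }
    ]
  }
}
```
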
### Interaction with `samplingParams` (OpenAI-compatible only)

> [!WARNING]
> When `generationConfig.samplingParams` is set on an OpenAI-compatible provider, the pipeline ships those keys to the wire verbatim and skips the separate `reasoning` injection entirely. So a config like `{ samplingParams: { temperature: 0.5 }, reasoning: { effort: 'max' } }` will silently drop the reasoning field on OpenAI/DeepSeek requests.
>
> If you set `samplingParams`, include the reasoning knob inside it directly — for DeepSeek that's `samplingParams.reasoning_effort`, for GPT-5/o-series it's `samplingParams.reasoning_effort` (their flat field) or `samplingParams.reasoning` (the nested object). For OpenRouter and other providers the field name varies; consult the provider docs.
>
> The Anthropic and Gemini converters are unaffected — they always read `reasoning.effort` directly regardless of `samplingParams`.

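A sketch of the DeepSeek case, with the reasoning knob folded into `samplingParams` itself (the field name may differ for other providers):

```jsonc
{
  "modelProviders": {
    "openai": [
      {
        "id": "deepseek-v4-pro",
        "envKey": "DEEPSEEK_API_KEY",
        "baseUrl": "https://api.deepseek.com/v1",
        "generationConfig": {
          "samplingParams": {
            "temperature": 0.5,
            // Goes inside samplingParams; a separate "reasoning" object would be dropped here.
            "reasoning_effort": "max"
          }
        }
      }
    ]
  }
}
```
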
### `budget_tokens`

You can pin an exact thinking-token budget by including `budget_tokens` alongside `effort`:

```json
"reasoning": { "effort": "high", "budget_tokens": 50000 }
```

For Anthropic this becomes `thinking.budget_tokens`. For OpenAI/DeepSeek the field is preserved but currently ignored by the server — `reasoning_effort` is the load-bearing knob.

## Provider Models vs. Runtime Models

Qwen Code distinguishes between two types of model configurations:

- **Provider Models** - defined in the `modelProviders` configuration; they appear in the `/model` command list with full metadata (name, description, capabilities)
- **Runtime Models** - configured via CLI arguments (`--model`), environment variables, or settings, without `modelProviders`

When you configure a model without using `modelProviders`, Qwen Code automatically creates a RuntimeModelSnapshot to preserve your configuration:

```bash
# This creates a RuntimeModelSnapshot with ID: $runtime|openai|my-custom-model
qwen --auth-type openai --model my-custom-model --openaiApiKey $KEY --openaiBaseUrl https://api.example.com/v1
```

The snapshot:

- Appears in the `/model` command list as a runtime option
- Can be selected again via `/model $runtime|openai|my-custom-model`

| Aspect | Provider Model | Runtime Model |
| --- | --- | --- |
| Configuration source | `modelProviders` in settings | CLI, env, settings layers |
| Configuration atomicity | Complete, impermeable package | Layered, each field resolved independently |
| Reusability | Always available in `/model` list | Captured as snapshot, appears if complete |
| Team sharing | Yes (via committed settings) | No (user-local) |
| Credential storage | Reference via `envKey` only | May capture actual key in snapshot |

> [!IMPORTANT]
> Define `modelProviders` in the user-scope `~/.qwen/settings.json` whenever possible and avoid persisting credential overrides in any scope. Keeping the provider catalog in user settings prevents merge/override conflicts between project and user scopes and ensures `/auth` and `/model` updates always write back to a consistent scope.

- `/model` and `/auth` persist `model.name` (where applicable) and `security.auth.selectedType` to the closest writable scope that already defines `modelProviders`; otherwise they fall back to the user scope. This keeps workspace/user files in sync with the active provider catalog.
- For models that are not defined in `modelProviders`, the resolver mixes CLI/env/settings layers, creating Runtime Models. This is fine for single-provider setups but cumbersome when frequently switching. Define provider catalogs whenever multi-model workflows are common so that switches stay atomic, source-attributed, and debuggable.