docs/en/models/openai.mdx
OpenAI offers the most complete capability coverage: it serves text chat, vision understanding, image generation, speech-to-text (ASR), text-to-speech (TTS), and embeddings. A single open_ai_api_key lets the Agent use all of these capabilities.
{
  "model": "gpt-5.5",
  "open_ai_api_key": "YOUR_API_KEY",
  "open_ai_api_base": "https://api.openai.com/v1"
}
| Parameter | Description |
|---|---|
| model | Same as OpenAI's model parameter; supports gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, the gpt-5 series, gpt-4.1, the o-series, etc. Agent mode defaults to gpt-5.5; use gpt-5.4 for better cost-efficiency |
| open_ai_api_key | Create one on the OpenAI Platform |
| open_ai_api_base | Optional; change it to access a third-party proxy |
| bot_type | Not required when using OpenAI's official models; set to openai when accessing other vendors via the compatible protocol |
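
The parameters above can be read with a few lines of Python. This is a minimal sketch, not the Agent's actual loader: the helper name `load_openai_settings` is hypothetical, and it assumes that a missing open_ai_api_base falls back to the official endpoint shown in the example and that a missing model falls back to the Agent-mode default, gpt-5.5.

```python
import json

# Assumption: the official endpoint is the default when open_ai_api_base is omitted.
DEFAULT_API_BASE = "https://api.openai.com/v1"


def load_openai_settings(raw: str) -> dict:
    """Parse a JSON config string and return the OpenAI-related settings."""
    config = json.loads(raw)
    api_key = config.get("open_ai_api_key")
    if not api_key:
        raise ValueError("open_ai_api_key is required")
    return {
        # Assumption: fall back to the Agent-mode default model.
        "model": config.get("model", "gpt-5.5"),
        "api_key": api_key,
        # Optional; point it at a third-party proxy if needed.
        "api_base": config.get("open_ai_api_base", DEFAULT_API_BASE).rstrip("/"),
        # Only needed for other vendors on the OpenAI-compatible protocol.
        "bot_type": config.get("bot_type"),
    }


settings = load_openai_settings('{"model": "gpt-5.4", "open_ai_api_key": "YOUR_API_KEY"}')
print(settings["model"], settings["api_base"])
```

Stripping the trailing slash keeps later URL joins predictable when a proxy address is pasted with one.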
OpenAI models such as gpt-5.5, gpt-5.4, gpt-4o, and gpt-4.1 support vision natively. Once open_ai_api_key is configured, the Agent's Vision tool automatically uses the main model to recognize images. If the main model does not support vision, or you want to choose the vision model explicitly, set it in the configuration file:
{
  "tools": {
    "vision": {
      "model": "gpt-5.4-mini"
    }
  }
}
Supported Vision models: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5, gpt-4.1, gpt-4.1-mini, gpt-4o.
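
The selection rule described above (an explicit tools.vision.model wins, otherwise the main model is used if it supports vision) could be sketched as follows. The function name `resolve_vision_model` is hypothetical, and returning `None` when no usable model is found is an assumption about how a caller might want to handle that case, not documented Agent behavior.

```python
from typing import Optional

# Taken from the supported Vision model list above.
VISION_MODELS = {
    "gpt-5.5", "gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano",
    "gpt-5", "gpt-4.1", "gpt-4.1-mini", "gpt-4o",
}


def resolve_vision_model(config: dict) -> Optional[str]:
    """Pick the model the Vision tool would use for a given config."""
    explicit = config.get("tools", {}).get("vision", {}).get("model")
    if explicit:
        return explicit  # an explicit setting always takes precedence
    main = config.get("model")
    # Fall back to the main model only if it is known to support vision.
    return main if main in VISION_MODELS else None


print(resolve_vision_model({"model": "gpt-5.5"}))
print(resolve_vision_model({
    "model": "some-text-only-model",  # placeholder name for illustration
    "tools": {"vision": {"model": "gpt-5.4-mini"}},
}))
```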
Specify the image generation model in the configuration file; the Agent automatically routes image generation skill calls to OpenAI:
{
  "skills": {
    "image-generation": {
      "model": "gpt-image-2"
    }
  }
}
Supported image generation models: gpt-image-2, gpt-image-1.
{
  "voice_to_text": "openai",
  "voice_to_text_model": "gpt-4o-mini-transcribe"
}
| Parameter | Description |
|---|---|
| voice_to_text | Set to openai to enable OpenAI speech-to-text |
| voice_to_text_model | Optional, defaults to gpt-4o-mini-transcribe; can also be gpt-4o-transcribe, whisper-1 |
Credentials are automatically reused from open_ai_api_key.
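
To illustrate the default model and the credential reuse, here is a small sketch of how the speech-to-text settings might be resolved from the config. The helper `resolve_asr_settings` is hypothetical and only mirrors the rules in the table above; it is not the Agent's real code path.

```python
from typing import Optional

DEFAULT_ASR_MODEL = "gpt-4o-mini-transcribe"


def resolve_asr_settings(config: dict) -> Optional[dict]:
    """Return the effective speech-to-text settings, or None if OpenAI ASR is off."""
    if config.get("voice_to_text") != "openai":
        return None
    return {
        # voice_to_text_model is optional and falls back to the documented default.
        "model": config.get("voice_to_text_model", DEFAULT_ASR_MODEL),
        # No separate credential: the main open_ai_api_key is reused.
        "api_key": config["open_ai_api_key"],
    }


print(resolve_asr_settings({"voice_to_text": "openai", "open_ai_api_key": "YOUR_API_KEY"}))
```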
{
  "text_to_voice": "openai",
  "text_to_voice_model": "tts-1",
  "tts_voice_id": "alloy"
}
| Parameter | Description |
|---|---|
| text_to_voice_model | tts-1, tts-1-hd, gpt-4o-mini-tts |
| tts_voice_id | Voices: alloy, echo, fable, onyx, nova, shimmer, ash, ballad, coral, sage, verse |
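
A typo in the model or voice name is easy to make, so it can help to check the values against the lists above before starting the Agent. The sketch below uses a hypothetical helper, `check_tts_config`. It treats a missing model or voice as a problem, which is an assumption, since this page does not state whether defaults exist.

```python
from typing import List

# Values copied from the parameter table above.
TTS_MODELS = {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"}
TTS_VOICES = {
    "alloy", "echo", "fable", "onyx", "nova", "shimmer",
    "ash", "ballad", "coral", "sage", "verse",
}


def check_tts_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    if config.get("text_to_voice") != "openai":
        return []  # OpenAI TTS is not selected, nothing to check
    problems = []
    model = config.get("text_to_voice_model")
    if model not in TTS_MODELS:
        problems.append(f"unknown text_to_voice_model: {model!r}")
    voice = config.get("tts_voice_id")
    if voice not in TTS_VOICES:
        problems.append(f"unknown tts_voice_id: {voice!r}")
    return problems


print(check_tts_config({
    "text_to_voice": "openai",
    "text_to_voice_model": "tts-1",
    "tts_voice_id": "alloy",
}))
```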
{
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small"
}
Available models: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. After changing the embedding model, run /memory rebuild-index to rebuild the index.
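
The rebuild is needed because vectors produced by different embedding models are not comparable with each other, so an index built with the old model cannot be searched with queries embedded by the new one. A deployment script could detect this situation with a check like the one below. The function `needs_index_rebuild` is a hypothetical helper, and treating a change of embedding_provider as a trigger as well is an assumption.

```python
def needs_index_rebuild(old_config: dict, new_config: dict) -> bool:
    """True if the embedding settings differ between two config versions."""
    keys = ("embedding_provider", "embedding_model")
    return any(old_config.get(k) != new_config.get(k) for k in keys)


old = {"embedding_provider": "openai", "embedding_model": "text-embedding-3-small"}
new = {"embedding_provider": "openai", "embedding_model": "text-embedding-3-large"}

if needs_index_rebuild(old, new):
    print("Embedding settings changed: run /memory rebuild-index")
```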