Back to Chatgpt On Wechat

OpenAI

docs/en/models/openai.mdx

2.0.93.1 KB
Original Source

OpenAI offers the most complete coverage and can simultaneously serve text chat, vision understanding, image generation, speech-to-text (ASR), text-to-speech (TTS), and embedding. A single open_ai_api_key lets the Agent use all of these capabilities.

<Tip> All capabilities below can be configured in one place via the "Model Management" page in the Web Console, with no need to manually edit the configuration file. </Tip>

Text Chat

json
{
  "model": "gpt-5.5",
  "open_ai_api_key": "YOUR_API_KEY",
  "open_ai_api_base": "https://api.openai.com/v1"
}
ParameterDescription
modelSame as OpenAI's model parameter; supports gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, the gpt-5 series, gpt-4.1, the o-series, etc. Agent mode defaults to gpt-5.5; use gpt-5.4 for better cost-efficiency
open_ai_api_keyCreate one on the OpenAI Platform
open_ai_api_baseOptional; change it to access a third-party proxy
bot_typeNot required when using OpenAI's official models; set to openai when accessing other vendors via the compatible protocol

Image Understanding

OpenAI models like gpt-5.5, gpt-5.4, gpt-4o, and gpt-4.1 natively support vision. Once open_ai_api_key is configured, the Agent's Vision tool automatically uses the main model to recognize images. If the main model does not support vision or you want to specify it explicitly, set it in the configuration file:

json
{
  "tools": {
    "vision": {
      "model": "gpt-5.4-mini"
    }
  }
}

Supported Vision models: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5, gpt-4.1, gpt-4.1-mini, gpt-4o.

Image Generation

Specify the image generation model in the configuration file; the Agent automatically routes image generation skill calls to OpenAI:

json
{
  "skills": {
    "image-generation": {
      "model": "gpt-image-2"
    }
  }
}

Supported image generation models: gpt-image-2, gpt-image-1.

Speech-to-Text (ASR)

json
{
  "voice_to_text": "openai",
  "voice_to_text_model": "gpt-4o-mini-transcribe"
}
ParameterDescription
voice_to_textSet to openai to enable OpenAI speech-to-text
voice_to_text_modelOptional, defaults to gpt-4o-mini-transcribe; can also be gpt-4o-transcribe, whisper-1

Credentials are automatically reused from open_ai_api_key.

Text-to-Speech (TTS)

json
{
  "text_to_voice": "openai",
  "text_to_voice_model": "tts-1",
  "tts_voice_id": "alloy"
}
ParameterDescription
text_to_voice_modeltts-1, tts-1-hd, gpt-4o-mini-tts
tts_voice_idVoices: alloy, echo, fable, onyx, nova, shimmer, ash, ballad, coral, sage, verse

Embedding

json
{
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small"
}

Available models: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. After changing the embedding, run /memory rebuild-index to rebuild the index.