
image-generation - Image Generation


A general-purpose image generation and editing skill supporting six providers: OpenAI, Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax, and LinkAI. Configure any one provider's key to start using it; configure multiple to enable automatic fallback.

Supported Models

| Provider | Models / Aliases | Notes |
|---|---|---|
| OpenAI | `gpt-image-2`, `gpt-image-1` | General-purpose, high quality; supports the `quality` parameter |
| Gemini Nano Banana | `nano-banana-2`, `nano-banana-pro`, `nano-banana` | Correspond to the image variants of `gemini-3.1-flash`, `gemini-3-pro`, and `gemini-2.5-flash`, respectively |
| Seedream (Volcengine Ark) | `seedream-5.0-lite`, `seedream-4.5` | Native 2K–4K; fuses up to 14 reference images |
| Qwen (DashScope) | `qwen-image-2.0`, `qwen-image-2.0-pro` | Strong at Chinese text rendering and text-image layouts |
| MiniMax | `image-01` | Fast and simple |
| LinkAI | Any model | Universal gateway, used as the fallback |

Model Selection

By default, "auto routing + automatic fallback" is used:

  1. Pick the first configured provider in the order OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI
  2. On errors such as 401, model not enabled, or network issues, automatically switch to the next provider
  3. If the user specifies a model in the conversation (e.g. "use seedream to draw a cat"), the corresponding provider is promoted to the front
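The routing rules above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual code: the lowercase provider identifiers, the `ProviderError` exception (standing in for 401s, disabled models, and network failures), and the `call` callback that would perform the real request are all hypothetical. `requested` is assumed to be a provider identifier already resolved from the user's wording.

```python
from typing import Callable, Dict, List, Optional

# Default priority order from the docs (identifiers are hypothetical).
DEFAULT_ORDER = ["openai", "gemini", "seedream", "qwen", "minimax", "linkai"]


class ProviderError(Exception):
    """Hypothetical error for failures that should trigger fallback,
    such as a 401, a model that is not enabled, or a network issue."""


def route(configured: List[str], requested: Optional[str] = None) -> List[str]:
    """Return configured providers in try order, promoting the requested one."""
    order = [p for p in DEFAULT_ORDER if p in configured]
    if requested in order:
        order.remove(requested)
        order.insert(0, requested)
    return order


def generate_with_fallback(
    prompt: str,
    configured: List[str],
    call: Callable[[str, str], str],
    requested: Optional[str] = None,
) -> str:
    """Try providers in routing order; on ProviderError move to the next one."""
    errors: Dict[str, str] = {}
    for provider in route(configured, requested):
        try:
            return call(provider, prompt)
        except ProviderError as exc:
            errors[provider] = str(exc)  # remember why this provider failed
    raise RuntimeError(f"all providers failed: {errors}")


if __name__ == "__main__":
    def fake_call(provider: str, prompt: str) -> str:
        # Simulate a Seedream key that is rejected with a 401.
        if provider == "seedream":
            raise ProviderError("401 Unauthorized")
        return f"{provider}: image for {prompt!r}"

    # Seedream is promoted to the front, fails, and Qwen answers.
    print(generate_with_fallback("a cat", ["qwen", "seedream", "linkai"], fake_call, requested="seedream"))
    # prints: qwen: image for 'a cat'
```

Keeping the ordering (`route`) separate from the retry loop makes the promotion rule easy to check on its own.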

To pin a specific model:

```json
{
  "skills": {
    "image-generation": {
      "model": "seedream-5.0-lite"
    }
  }
}
```
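Each model name in the Supported Models table belongs to a single provider, so a pinned `model` presumably determines which provider is called. A minimal lookup sketch, assuming the configuration has been loaded into a Python dict with the nesting shown above; the provider identifiers and the `provider_for_model` name are hypothetical, and LinkAI is left out because it accepts any model name.

```python
from typing import Optional

# Model names copied from the Supported Models table; provider ids are hypothetical.
MODEL_TO_PROVIDER = {
    "gpt-image-2": "openai",
    "gpt-image-1": "openai",
    "nano-banana-2": "gemini",
    "nano-banana-pro": "gemini",
    "nano-banana": "gemini",
    "seedream-5.0-lite": "seedream",
    "seedream-4.5": "seedream",
    "qwen-image-2.0": "qwen",
    "qwen-image-2.0-pro": "qwen",
    "image-01": "minimax",
}


def provider_for_model(config: dict) -> Optional[str]:
    """Return the provider behind skills["image-generation"]["model"], or None."""
    model = config.get("skills", {}).get("image-generation", {}).get("model")
    if not model:
        return None  # nothing pinned: auto routing applies
    return MODEL_TO_PROVIDER.get(model)  # None for names not in the table


if __name__ == "__main__":
    cfg = {"skills": {"image-generation": {"model": "seedream-5.0-lite"}}}
    print(provider_for_model(cfg))  # prints: seedream
```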

Configuring API Keys

<Tip> It is recommended to configure providers from the "Model Management" page in the [Web Console](/channels/web). Chat model keys configured there are automatically reused by the image generation skill — no need to set them twice. You can also edit the configuration file manually or temporarily set keys in a conversation using the `env_config` tool. </Tip>

Credentials are shared with the main model providers:

| Field | Provider |
|---|---|
| `openai_api_key` | OpenAI |
| `gemini_api_key` | Gemini |
| `ark_api_key` | Volcengine Ark (Seedream) |
| `dashscope_api_key` | Alibaba DashScope (Qwen) |
| `minimax_api_key` | MiniMax |
| `linkai_api_key` | LinkAI |
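A sketch of how the set of usable providers could be derived from these fields, assuming a flat configuration dict whose keys match the Field column. The provider identifiers and the `configured_providers` name are hypothetical. The mapping is written in the default routing order, so the result is already ordered for fallback.

```python
from typing import Dict, List

# Field -> provider, listed in the default routing order (provider ids hypothetical).
KEY_FIELDS: Dict[str, str] = {
    "openai_api_key": "openai",
    "gemini_api_key": "gemini",
    "ark_api_key": "seedream",
    "dashscope_api_key": "qwen",
    "minimax_api_key": "minimax",
    "linkai_api_key": "linkai",
}


def configured_providers(config: Dict[str, object]) -> List[str]:
    """Return providers whose key is present and not blank, in routing order."""
    return [
        provider
        for field, provider in KEY_FIELDS.items()  # dicts keep insertion order
        if str(config.get(field) or "").strip()
    ]


if __name__ == "__main__":
    cfg = {"ark_api_key": "ark-xxx", "openai_api_key": "", "linkai_api_key": "lk-xxx"}
    print(configured_providers(cfg))  # prints: ['seedream', 'linkai']
```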

Enabling and Disabling

The skill automatically adjusts its status based on API keys:

  • Key configured: the Agent calls the skill directly when it receives a drawing request
  • Key not configured: the skill still appears in context (marked as "needs configuration") — the Agent will guide the user to set up a key

To control it manually:

```text
/skill disable image-generation    # Disable
/skill enable image-generation     # Re-enable
```

Equivalent terminal commands: `cow skill disable image-generation` / `cow skill enable image-generation`.
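Taken together, the automatic and manual rules give the skill three possible states. A minimal sketch, assuming the manually disabled skills are available as a collection of names and that a manual disable takes precedence over configured keys; the `skill_status` function and the `"ready"` / `"disabled"` labels are hypothetical (only "needs configuration" comes from this page).

```python
from typing import Collection

# Key fields from the "Configuring API Keys" table.
KEY_FIELDS = (
    "openai_api_key",
    "gemini_api_key",
    "ark_api_key",
    "dashscope_api_key",
    "minimax_api_key",
    "linkai_api_key",
)


def skill_status(config: dict, disabled: Collection[str] = ()) -> str:
    """Hypothetical status resolution for the image-generation skill."""
    if "image-generation" in disabled:
        return "disabled"  # assumed: a manual disable wins over configured keys
    if any(str(config.get(field) or "").strip() for field in KEY_FIELDS):
        return "ready"  # the Agent can call the skill directly
    return "needs configuration"  # still listed; the Agent guides key setup
```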

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `prompt` | string | Yes | | Image description |
| `image_url` | string / list | No | `null` | Input image for editing, as a local path or URL; pass a list for multi-image fusion |
| `quality` | string | No | `auto` | `low` / `medium` / `high`; supported only by some providers |
| `size` | string | No | `auto` | `512` / `1K` / `2K` / `3K` / `4K`, or a pixel size such as `1024x1024` |
| `aspect_ratio` | string | No | `null` | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9`; Gemini also supports `1:4` / `4:1` / `1:8` / `8:1` |

<Warning> **Higher quality and larger sizes cost more and take longer.** For everyday conversations, use the defaults (`auto`) or `quality=low` + `size=1K`, which takes about 20 seconds per image. For posters, or when high resolution is explicitly requested, use `quality=high` + `size=2K`/`4K`, which may take 1–5 minutes. </Warning>
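A sketch of checking arguments against the documented values before a request is sent. The value sets are copied from the Parameters table; the `validate` function and the lowercase provider name `"gemini"` are hypothetical, and a real provider may accept or reject values beyond what this check covers.

```python
import re
from typing import Optional

# Allowed values copied from the Parameters table.
QUALITIES = {"auto", "low", "medium", "high"}
SIZES = {"auto", "512", "1K", "2K", "3K", "4K"}
RATIOS = {"1:1", "3:2", "2:3", "16:9", "9:16", "21:9"}
GEMINI_EXTRA_RATIOS = {"1:4", "4:1", "1:8", "8:1"}
PIXEL_SIZE = re.compile(r"\d+x\d+")  # e.g. "1024x1024"


def validate(provider: str, quality: str = "auto", size: str = "auto",
             aspect_ratio: Optional[str] = None) -> None:
    """Raise ValueError if an argument falls outside the documented values."""
    if quality not in QUALITIES:
        raise ValueError(f"unsupported quality: {quality!r}")
    if size not in SIZES and not PIXEL_SIZE.fullmatch(size):
        raise ValueError(f"unsupported size: {size!r}")
    if aspect_ratio is not None:
        allowed = (RATIOS | GEMINI_EXTRA_RATIOS) if provider == "gemini" else RATIOS
        if aspect_ratio not in allowed:
            raise ValueError(f"{provider} does not accept aspect_ratio {aspect_ratio!r}")


if __name__ == "__main__":
    validate("openai", quality="low", size="1K")   # everyday preset
    validate("openai", quality="high", size="2K")  # poster preset
    validate("gemini", aspect_ratio="1:8")         # Gemini-only ratio
```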

Common Use Cases

  • Text-to-image: generate illustrations, posters, icons, avatars, storyboards, etc. from a description
  • Image-to-image: change styles, swap elements, add decorations or text on an existing image
  • Multi-image fusion: combine multiple reference images into one (outfit swaps, character group photos, etc.)
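The three use cases differ mainly in what is passed as `image_url`: nothing, a single image, or a list. A sketch of that mapping, with hypothetical function names and file paths; it only assembles and classifies arguments and does not call any provider.

```python
from typing import List, Optional, Union

ImageInput = Union[str, List[str]]


def build_args(prompt: str, image_url: Optional[ImageInput] = None, **options: str) -> dict:
    """Assemble skill arguments; omit image_url for text-to-image."""
    args = {"prompt": prompt, **options}
    if image_url is not None:
        args["image_url"] = image_url
    return args


def use_case(args: dict) -> str:
    """Classify a request by the shape of its image_url argument."""
    images = args.get("image_url")
    if not images:
        return "text-to-image"
    if isinstance(images, str) or len(images) == 1:
        return "image-to-image"
    return "multi-image fusion"


if __name__ == "__main__":
    print(use_case(build_args("a watercolor fox")))  # prints: text-to-image
    print(use_case(build_args("add a red scarf", "./fox.png")))  # prints: image-to-image
    print(use_case(build_args("group photo", ["./a.png", "https://example.com/b.png"])))  # prints: multi-image fusion
```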

<Note>

- Set the Bash timeout to 600 seconds: each provider has a 300-second HTTP timeout, and the script may try multiple providers in sequence.
- Input images are automatically compressed to ≤ 4 MB, with the longest edge ≤ 4096 px.
- Gemini / Seedream / Qwen / MiniMax do not support the `quality` parameter.
- Seedream defaults to 2K; `seedream-5.0-lite` supports up to 3K and `seedream-4.5` up to 4K.

</Note>
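The longest-edge limit for input images is plain arithmetic: scale both sides by `4096 / longest_edge` so the aspect ratio is preserved. A minimal sketch of that calculation only; the `fit_longest_edge` name is hypothetical, and the separate ≤ 4 MB byte-size compression step is not shown.

```python
from typing import Tuple

MAX_EDGE = 4096  # px, the longest-edge limit stated in the note


def fit_longest_edge(width: int, height: int, max_edge: int = MAX_EDGE) -> Tuple[int, int]:
    """Scale (width, height) so the longest edge is at most max_edge, keeping aspect ratio."""
    if max(width, height) <= max_edge:
        return width, height  # already small enough, leave untouched
    if width >= height:
        return max_edge, max(1, round(height * max_edge / width))
    return max(1, round(width * max_edge / height)), max_edge


if __name__ == "__main__":
    print(fit_longest_edge(6000, 4000))  # prints: (4096, 2731)
    print(fit_longest_edge(1024, 768))   # prints: (1024, 768)
```

Pinning the longest side to exactly `max_edge` (rather than multiplying both sides by a float scale) avoids an off-by-one from floating-point rounding on that side.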