docs/tools/image-generation.md
The `image_generate` tool lets the agent create and edit images using your
configured providers. Generated images are delivered automatically as media
attachments in the agent's reply.
Codex OAuth uses the same `openai/gpt-image-2` model ref. When an
`openai-codex` OAuth profile is configured, OpenClaw routes image
requests through that OAuth profile instead of first trying
`OPENAI_API_KEY`. Explicit `models.providers.openai` config (API key,
custom/Azure base URL) opts back into the direct OpenAI Images API
route.
The agent calls `image_generate` automatically. No tool allow-listing is
needed: the tool is enabled by default when a provider is available.
| Goal | Model ref | Auth |
|---|---|---|
| OpenAI image generation with API billing | openai/gpt-image-2 | OPENAI_API_KEY |
| OpenAI image generation with Codex subscription auth | openai/gpt-image-2 | OpenAI Codex OAuth |
| OpenAI transparent-background PNG/WebP | openai/gpt-image-1.5 | OPENAI_API_KEY or OpenAI Codex OAuth |
| DeepInfra image generation | deepinfra/black-forest-labs/FLUX-1-schnell | DEEPINFRA_API_KEY |
| OpenRouter image generation | openrouter/google/gemini-3.1-flash-image-preview | OPENROUTER_API_KEY |
| LiteLLM image generation | litellm/gpt-image-2 | LITELLM_API_KEY |
| Google Gemini image generation | google/gemini-3.1-flash-image-preview | GEMINI_API_KEY or GOOGLE_API_KEY |
The same `image_generate` tool handles text-to-image and reference-image
editing. Use `image` for one reference or `images` for multiple references.
Provider-supported output hints such as `quality`, `outputFormat`, and
`background` are forwarded when available and reported as ignored when a
provider does not support them. Bundled transparent-background support is
OpenAI-specific; other providers may still preserve PNG alpha if their
backend emits it.
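As an illustration, a reference-image edit that combines two inputs might pass tool parameters like the following. This is a sketch: it assumes the flat parameter object used by the OpenAI options example on this page, and the prompt and file paths are placeholders.

```json5
{
  // Text prompt describing the edit
  prompt: "Place the product from the first image on the background from the second",
  // Multiple references go in `images`; a single reference would use `image`
  images: ["/path/to/product.png", "/path/to/background.jpg"],
  // Output hint: forwarded when the provider supports it,
  // otherwise reported as ignored
  outputFormat: "png",
}
```

Check the provider tables on this page for how many references each provider accepts.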
| Provider | Default model | Edit support | Auth |
|---|---|---|---|
| ComfyUI | workflow | Yes (1 image, workflow-configured) | COMFY_API_KEY or COMFY_CLOUD_API_KEY for cloud |
| DeepInfra | black-forest-labs/FLUX-1-schnell | Yes (1 image) | DEEPINFRA_API_KEY |
| fal | fal-ai/flux/dev | Yes | FAL_KEY |
| Google | gemini-3.1-flash-image-preview | Yes | GEMINI_API_KEY or GOOGLE_API_KEY |
| LiteLLM | gpt-image-2 | Yes (up to 5 input images) | LITELLM_API_KEY |
| MiniMax | image-01 | Yes (subject reference) | MINIMAX_API_KEY or MiniMax OAuth (minimax-portal) |
| OpenAI | gpt-image-2 | Yes (up to 4 images) | OPENAI_API_KEY or OpenAI Codex OAuth |
| OpenRouter | google/gemini-3.1-flash-image-preview | Yes (up to 5 input images) | OPENROUTER_API_KEY |
| Vydra | grok-imagine | No | VYDRA_API_KEY |
| xAI | grok-imagine-image | Yes (up to 5 images) | XAI_API_KEY |
Use `action: "list"` to inspect available providers and models at runtime:

```
/tool image_generate action=list
```
| Capability | ComfyUI | DeepInfra | fal | Google | MiniMax | OpenAI | Vydra | xAI |
|---|---|---|---|---|---|---|---|---|
| Generate (max count) | Workflow-defined | 4 | 4 | 4 | 9 | 4 | 1 | 4 |
| Edit / reference | 1 image (workflow) | 1 image | 1 image | Up to 5 images | 1 image (subject ref) | Up to 5 images | — | Up to 5 images |
| Size control | — | ✓ | ✓ | ✓ | — | Up to 4K | — | — |
| Aspect ratio | — | — | ✓ (generate only) | ✓ | ✓ | — | — | ✓ |
| Resolution (1K/2K/4K) | — | — | ✓ | ✓ | — | — | — | 1K, 2K |
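For example, a Google call that uses the count, aspect-ratio, and resolution controls from the table might look like this. It is a sketch that assumes the parameter names used elsewhere on this page; the prompt is a placeholder. Because the call sets `model`, only that model is tried.

```json5
{
  model: "google/gemini-3.1-flash-image-preview",
  prompt: "A lighthouse on a cliff at dusk, cinematic lighting",
  // Number of images to generate in this call
  count: 2,
  // Aspect-ratio hint
  aspectRatio: "16:9",
  // Resolution hint (1K/2K/4K column in the table above)
  resolution: "2K",
}
```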
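Set a primary image generation model and an ordered fallback list under `agents.defaults.imageGenerationModel`:

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-2",
        timeoutMs: 180_000,
        fallbacks: [
          "openrouter/google/gemini-3.1-flash-image-preview",
          "google/gemini-3.1-flash-image-preview",
          "fal/fal-ai/flux/dev",
        ],
      },
    },
  },
}
```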
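To keep the candidate list limited to the models you name, the same block can be combined with `agents.defaults.mediaGenerationAutoProviderFallback: false`, the setting described in the auto-detection note on this page. A minimal sketch:

```json5
{
  agents: {
    defaults: {
      // Only explicit model/primary/fallbacks entries are tried;
      // auth-aware auto-detected provider defaults are skipped
      mediaGenerationAutoProviderFallback: false,
      imageGenerationModel: {
        primary: "openai/gpt-image-2",
        fallbacks: ["google/gemini-3.1-flash-image-preview"],
      },
    },
  },
}
```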
OpenClaw tries providers in this order:

1. `model` parameter from the tool call (if the agent specifies one).
2. `imageGenerationModel.primary` from config.
3. `imageGenerationModel.fallbacks` in order.

If a provider fails (auth error, rate limit, etc.), the next configured candidate is tried automatically. If all fail, the error includes details from each attempt.
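A per-call override sits at the top of that order. As a sketch with a placeholder prompt, a tool call that pins one model and gives it extra time could pass:

```json5
{
  prompt: "A detailed isometric city map",
  // Exact override: only this provider/model is tried,
  // with no fall-through to primary, fallbacks, or auto-detected providers
  model: "fal/fal-ai/flux/dev",
  // Overrides agents.defaults.imageGenerationModel.timeoutMs for this call
  timeoutMs: 300000,
}
```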
<AccordionGroup>
  <Accordion title="Per-call model overrides are exact">
    A per-call `model` override tries only that provider/model and does not continue to configured primary/fallback or auto-detected providers.
  </Accordion>
  <Accordion title="Auto-detection is auth-aware">
    A provider default only enters the candidate list when OpenClaw can actually authenticate that provider. Set `agents.defaults.mediaGenerationAutoProviderFallback: false` to use only explicit `model`, `primary`, and `fallbacks` entries.
  </Accordion>
  <Accordion title="Timeouts">
    Set `agents.defaults.imageGenerationModel.timeoutMs` for slow image backends. A per-call `timeoutMs` tool parameter overrides the configured default.
  </Accordion>
  <Accordion title="Inspect at runtime">
    Use `action: "list"` to inspect the currently registered providers, their default models, and auth env-var hints.
  </Accordion>
</AccordionGroup>

OpenAI, OpenRouter, Google, DeepInfra, fal, MiniMax, ComfyUI, and xAI support editing reference images. Pass a reference image path or URL:

```
"Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"
```
OpenAI, OpenRouter, Google, and xAI support up to 5 reference images via the
`images` parameter. DeepInfra, fal, MiniMax, and ComfyUI support 1.
The `openai/gpt-image-1.5`, `openai/gpt-image-1`, and
`openai/gpt-image-1-mini` models can still be selected explicitly. Use
`gpt-image-1.5` for transparent-background PNG/WebP output; the current
`gpt-image-2` API rejects `background: "transparent"`.
`gpt-image-2` supports both text-to-image generation and
reference-image editing through the same `image_generate` tool.
OpenClaw forwards `prompt`, `count`, `size`, `quality`, `outputFormat`,
and reference images to OpenAI. OpenAI does **not** receive
`aspectRatio` or `resolution` directly; when possible OpenClaw maps
those into a supported `size`, otherwise the tool reports them as
ignored overrides.
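For instance, an OpenAI request that passes `aspectRatio` relies on that mapping. A sketch with a placeholder prompt:

```json5
{
  model: "openai/gpt-image-2",
  prompt: "A minimalist poster of a mountain range",
  count: 2,
  quality: "low",
  // Not sent to OpenAI directly: mapped to a supported `size` when
  // possible, otherwise reported as an ignored override
  aspectRatio: "16:9",
}
```

Pass `size` directly instead when you need a specific OpenAI output size.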
OpenAI-specific options live under the `openai` object:
```json
{
"quality": "low",
"outputFormat": "jpeg",
"openai": {
"background": "opaque",
"moderation": "low",
"outputCompression": 60,
"user": "end-user-42"
}
}
```
`openai.background` accepts `transparent`, `opaque`, or `auto`;
transparent outputs require `outputFormat` `png` or `webp` and a
transparency-capable OpenAI image model. OpenClaw routes default
`gpt-image-2` transparent-background requests to `gpt-image-1.5`.
`openai.outputCompression` applies to JPEG/WebP outputs.
The top-level `background` hint is provider-neutral and currently maps
to the same OpenAI `background` request field when the OpenAI provider
is selected. Providers that do not declare background support return
it in `ignoredOverrides` instead of receiving the unsupported parameter.
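A transparent-background request through the provider-neutral hint could look like this. It is a sketch that selects `gpt-image-1.5` explicitly because `gpt-image-2` rejects `background: "transparent"`; the prompt is a placeholder.

```json5
{
  model: "openai/gpt-image-1.5",
  prompt: "A simple red circle sticker on a transparent background",
  // Transparent output requires png or webp
  outputFormat: "png",
  // Provider-neutral hint, mapped to OpenAI's `background` request field
  background: "transparent",
}
```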
To route OpenAI image generation through an Azure OpenAI deployment
instead of `api.openai.com`, see
[Azure OpenAI endpoints](/providers/openai#azure-openai-endpoints).
To use OpenRouter as the primary image generation model:

```json5
{
agents: {
defaults: {
imageGenerationModel: {
primary: "openrouter/google/gemini-3.1-flash-image-preview",
},
},
},
}
```
OpenClaw forwards `prompt`, `count`, reference images, and
Gemini-compatible `aspectRatio` / `resolution` hints to OpenRouter.
Current built-in OpenRouter image model shortcuts include
`google/gemini-3.1-flash-image-preview`,
`google/gemini-3-pro-image-preview`, and `openai/gpt-5.4-image-2`. Use
`action: "list"` to see what your configured plugin exposes.
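For instance, a call that routes a reference-image edit through one of those OpenRouter shortcuts might pass the following. This sketch assumes the `openrouter/` prefix used in the model tables on this page; the prompt and path are placeholders.

```json5
{
  model: "openrouter/google/gemini-3-pro-image-preview",
  prompt: "Turn this sketch into a clean vector-style illustration",
  image: "/path/to/sketch.png",
  // Gemini-compatible hint forwarded to OpenRouter
  aspectRatio: "1:1",
}
```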
MiniMax image generation uses the `image-01` model through either auth path:

- `minimax/image-01` for API-key setups
- `minimax-portal/image-01` for OAuth setups
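For example, an OAuth-based MiniMax setup could select the portal model as primary. A minimal sketch reusing the config shape shown on this page:

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        // Use "minimax/image-01" instead for API-key setups
        primary: "minimax-portal/image-01",
      },
    },
  },
}
```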
For xAI image generation:

- Models: `xai/grok-imagine-image`, `xai/grok-imagine-image-pro`
- Count: up to 4
- References: one `image` or up to five `images`
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2`
- Resolutions: `1K`, `2K`
- Outputs: returned as OpenClaw-managed image attachments
OpenClaw intentionally does not expose xAI-native `quality`, `mask`,
`user`, or extra native-only aspect ratios until those controls exist
in the shared cross-provider `image_generate` contract.
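An xAI call within those limits might look like this (a sketch; the prompt is a placeholder and the values come from the list above):

```json5
{
  model: "xai/grok-imagine-image",
  prompt: "A retro travel poster for a lunar resort",
  // Up to 4 images per call
  count: 2,
  // One of the supported aspect ratios
  aspectRatio: "3:2",
  // 1K or 2K
  resolution: "2K",
}
```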
From the CLI, a transparent-background PNG request selects a transparency-capable OpenAI model explicitly:
```bash
openclaw infer image generate \
  --model openai/gpt-image-1.5 \
  --output-format png \
  --background transparent \
  --prompt "A simple red circle sticker on a transparent background" \
  --json
```
The same `--output-format` and `--background` flags are available on
`openclaw infer image edit`; `--openai-background` remains as an
OpenAI-specific alias. Bundled providers other than OpenAI do not declare
explicit background control today, so `background: "transparent"` is reported
as ignored for them.