docs/tools/image-generation.md
The `image_generate` tool lets the agent create and edit images using your
configured providers. In chat sessions, image generation runs asynchronously:
OpenClaw records a background task, returns the task id immediately, and wakes
the agent when the provider finishes. The completion agent must send generated
images through the `message` tool. If the requester session is inactive and
some generated images are still missing from message-tool delivery, OpenClaw
sends an idempotent direct fallback with only the missing images.
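The "only the missing images" fallback can be pictured as a set difference that is safe to retry. The following is a minimal sketch, not OpenClaw's implementation: `missing_images`, `direct_fallback`, the `send` callback, and the use of plain string ids are all hypothetical names chosen for illustration.

```python
from typing import Callable, List, Set


def missing_images(generated: List[str], delivered: Set[str]) -> List[str]:
    """Return generated image ids that were not delivered yet, in order."""
    return [image_id for image_id in generated if image_id not in delivered]


def direct_fallback(
    generated: List[str],
    delivered: Set[str],
    send: Callable[[str], None],
) -> List[str]:
    """Send only the missing images and record each delivery.

    Recording deliveries in `delivered` is what makes a second call a no-op,
    which is the idempotency property described above.
    """
    sent = []
    for image_id in missing_images(generated, delivered):
        send(image_id)
        delivered.add(image_id)
        sent.append(image_id)
    return sent
```

Calling `direct_fallback` twice with the same `delivered` set sends each missing image once; the second call finds nothing left to send.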
Codex OAuth uses the same `openai/gpt-image-2` model ref. When an
`openai-codex` OAuth profile is configured, OpenClaw routes image
requests through that OAuth profile instead of first trying
`OPENAI_API_KEY`. Explicit `models.providers.openai` config (API key,
custom/Azure base URL) opts back into the direct OpenAI Images API
route.
The agent calls `image_generate` automatically. No tool allow-listing
needed - it is enabled by default when a provider is available. The tool
returns a background task id, then the completion agent sends the generated
attachment through the `message` tool when it is ready.
| Goal | Model ref | Auth |
|---|---|---|
| OpenAI image generation with API billing | openai/gpt-image-2 | OPENAI_API_KEY |
| OpenAI image generation with Codex subscription auth | openai/gpt-image-2 | OpenAI Codex OAuth |
| OpenAI transparent-background PNG/WebP | openai/gpt-image-1.5 | OPENAI_API_KEY or OpenAI Codex OAuth |
| DeepInfra image generation | deepinfra/black-forest-labs/FLUX-1-schnell | DEEPINFRA_API_KEY |
| OpenRouter image generation | openrouter/google/gemini-3.1-flash-image-preview | OPENROUTER_API_KEY |
| LiteLLM image generation | litellm/gpt-image-2 | LITELLM_API_KEY |
| Google Gemini image generation | google/gemini-3.1-flash-image-preview | GEMINI_API_KEY or GOOGLE_API_KEY |
The same `image_generate` tool handles text-to-image and reference-image
editing. Use `image` for one reference or `images` for multiple references.
Provider-supported output hints such as `quality`, `outputFormat`, and
`background` are forwarded when available and reported as ignored when a
provider does not support them. Bundled transparent-background support is
OpenAI-specific; other providers may still preserve PNG alpha if their
backend emits it.
| Provider | Default model | Edit support | Auth |
|---|---|---|---|
| ComfyUI | workflow | Yes (1 image, workflow-configured) | COMFY_API_KEY or COMFY_CLOUD_API_KEY for cloud |
| DeepInfra | black-forest-labs/FLUX-1-schnell | Yes (1 image) | DEEPINFRA_API_KEY |
| fal | fal-ai/flux/dev | Yes (model-specific limits) | FAL_KEY |
| Google | gemini-3.1-flash-image-preview | Yes | GEMINI_API_KEY or GOOGLE_API_KEY |
| LiteLLM | gpt-image-2 | Yes (up to 5 input images) | LITELLM_API_KEY |
| MiniMax | image-01 | Yes (subject reference) | MINIMAX_API_KEY or MiniMax OAuth (minimax-portal) |
| OpenAI | gpt-image-2 | Yes (up to 4 images) | OPENAI_API_KEY or OpenAI Codex OAuth |
| OpenRouter | google/gemini-3.1-flash-image-preview | Yes (up to 5 input images) | OPENROUTER_API_KEY |
| Vydra | grok-imagine | No | VYDRA_API_KEY |
| xAI | grok-imagine-image | Yes (up to 5 images) | XAI_API_KEY |
Use `action: "list"` to inspect available providers and models at runtime:

```text
/tool image_generate action=list
```

Use `action: "status"` to inspect the active image-generation task for the
current session:

```text
/tool image_generate action=status
```

| Capability | ComfyUI | DeepInfra | fal | Google | MiniMax | OpenAI | Vydra | xAI |
|---|---|---|---|---|---|---|---|---|
| Generate (max count) | Workflow-defined | 4 | 4 | 4 | 9 | 4 | 1 | 4 |
| Edit / reference | 1 image (workflow) | 1 image | Flux: 1; GPT: 10; NB2: 14 | Up to 5 images | 1 image (subject ref) | Up to 5 images | - | Up to 5 images |
| Size control | - | ✓ | ✓ | ✓ | - | Up to 4K | - | - |
| Aspect ratio | - | - | ✓ | ✓ | ✓ | - | - | ✓ |
| Resolution (1K/2K/4K) | - | - | ✓ | ✓ | - | - | - | 1K, 2K |

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-2",
        timeoutMs: 180_000,
        fallbacks: [
          "openrouter/google/gemini-3.1-flash-image-preview",
          "google/gemini-3.1-flash-image-preview",
          "fal/fal-ai/flux/dev",
        ],
      },
    },
  },
}
```

OpenClaw tries providers in this order:

1. `model` parameter from the tool call (if the agent specifies one).
2. `imageGenerationModel.primary` from config.
3. `imageGenerationModel.fallbacks` in order.

If a provider fails (auth error, rate limit, etc.), the next configured candidate is tried automatically. If all fail, the error includes details from each attempt.
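
The selection order and failure handling can be sketched roughly as follows. This is an illustration under assumptions, not OpenClaw's code: `ImageModelConfig`, `candidate_models`, `generate_with_fallback`, and the `generate` callback are hypothetical names, and auto-detected provider defaults are left out. The sketch treats a per-call `model` as exact, as this page describes for per-call overrides.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ImageModelConfig:
    """Hypothetical mirror of the imageGenerationModel config block."""
    primary: Optional[str] = None
    fallbacks: List[str] = field(default_factory=list)


def candidate_models(config: ImageModelConfig, override: Optional[str] = None) -> List[str]:
    """Build the ordered candidate list: exact override, else primary then fallbacks."""
    if override:
        return [override]
    candidates: List[str] = []
    for ref in [config.primary, *config.fallbacks]:
        if ref and ref not in candidates:
            candidates.append(ref)
    return candidates


def generate_with_fallback(candidates: List[str], generate: Callable[[str], str]) -> str:
    """Try each candidate in order; on total failure, report every attempt."""
    errors = []
    for ref in candidates:
        try:
            return generate(ref)
        except Exception as exc:  # auth error, rate limit, etc.
            errors.append(f"{ref}: {exc}")
    raise RuntimeError("all image providers failed: " + "; ".join(errors))
```

Collecting one message per attempt is what lets the final error include details from each provider that was tried.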

<AccordionGroup>
  <Accordion title="Per-call model overrides are exact">
    A per-call `model` override tries only that provider/model and does not
    continue to configured primary/fallback or auto-detected providers.
  </Accordion>
  <Accordion title="Auto-detection is auth-aware">
    A provider default only enters the candidate list when OpenClaw can
    actually authenticate that provider. Set
    `agents.defaults.mediaGenerationAutoProviderFallback: false` to use only
    explicit `model`, `primary`, and `fallbacks` entries.
  </Accordion>
  <Accordion title="Timeouts">
    Set `agents.defaults.imageGenerationModel.timeoutMs` for slow image
    backends. A per-call `timeoutMs` tool parameter overrides the configured
    default, and configured defaults override plugin-authored provider
    defaults. Google and OpenRouter hosted image providers use 180 second
    defaults; xAI and Azure OpenAI image generation use 600 seconds. Codex
    dynamic-tool calls use a 120 second `image_generate` bridge default and
    honor the same timeout budget when configured, bounded by OpenClaw's
    600000 ms dynamic-tool bridge maximum.
  </Accordion>
  <Accordion title="Inspect at runtime">
    Use `action: "list"` to inspect the currently registered providers,
    their default models, and auth env-var hints.
  </Accordion>
</AccordionGroup>

OpenAI, OpenRouter, Google, DeepInfra, fal, MiniMax, ComfyUI, and xAI support editing reference images. Pass a reference image path or URL:
"Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"
OpenAI, OpenRouter, Google, and xAI support up to 5 reference images via the
`images` parameter. fal supports 1 reference image for Flux image-to-image, up
to 10 for GPT Image 2 edits, and up to 14 for Nano Banana 2 edits. DeepInfra,
MiniMax, and ComfyUI support 1.
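
A caller-side check of these limits might look like the sketch below. It is illustrative only: `MAX_REFERENCE_IMAGES` and `check_reference_count` are hypothetical names, the provider is assumed to be the first segment of the model ref, and fal and ComfyUI are omitted because their limits depend on the model or workflow.

```python
from typing import Dict, List, Optional

# Limits copied from the paragraph above; keys are assumed model-ref prefixes.
MAX_REFERENCE_IMAGES: Dict[str, int] = {
    "openai": 5,
    "openrouter": 5,
    "google": 5,
    "xai": 5,
    "minimax": 1,
}


def check_reference_count(
    model_ref: str,
    image: Optional[str] = None,
    images: Optional[List[str]] = None,
) -> List[str]:
    """Merge `image` and `images`, then reject requests over the provider limit."""
    provider = model_ref.split("/", 1)[0]
    refs = ([image] if image else []) + list(images or [])
    if provider not in MAX_REFERENCE_IMAGES:
        raise KeyError(f"no reference-image limit known for provider {provider!r}")
    limit = MAX_REFERENCE_IMAGES[provider]
    if len(refs) > limit:
        raise ValueError(f"{provider} accepts at most {limit} reference image(s), got {len(refs)}")
    return refs
```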
The `openai/gpt-image-1.5`, `openai/gpt-image-1`, and
`openai/gpt-image-1-mini` models can still be selected explicitly. Use
`gpt-image-1.5` for transparent-background PNG/WebP output; the current
`gpt-image-2` API rejects `background: "transparent"`.
`gpt-image-2` supports both text-to-image generation and
reference-image editing through the same `image_generate` tool.
OpenClaw forwards `prompt`, `count`, `size`, `quality`, `outputFormat`,
and reference images to OpenAI. OpenAI does **not** receive
`aspectRatio` or `resolution` directly; when possible OpenClaw maps
those into a supported `size`, otherwise the tool reports them as
ignored overrides.
OpenAI-specific options live under the `openai` object:
```json
{
  "quality": "low",
  "outputFormat": "jpeg",
  "openai": {
    "background": "opaque",
    "moderation": "low",
    "outputCompression": 60,
    "user": "end-user-42"
  }
}
```
`openai.background` accepts `transparent`, `opaque`, or `auto`;
transparent outputs require `outputFormat` `png` or `webp` and a
transparency-capable OpenAI image model. OpenClaw routes default
`gpt-image-2` transparent-background requests to `gpt-image-1.5`.
`openai.outputCompression` applies to JPEG/WebP outputs.
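
The transparent-background rules above can be summarized in a small sketch. It is an assumption-laden illustration, not OpenClaw's routing code: `resolve_openai_image_model` is a hypothetical helper, and it assumes that only the default model is rerouted while an explicitly selected model is passed through unchanged.

```python
from typing import Optional

VALID_BACKGROUNDS = {"transparent", "opaque", "auto"}
TRANSPARENT_FORMATS = {"png", "webp"}


def resolve_openai_image_model(
    background: str,
    output_format: str,
    explicit_model: Optional[str] = None,
) -> str:
    """Pick the OpenAI image model for a request, following the rules above."""
    if background not in VALID_BACKGROUNDS:
        raise ValueError(f"unsupported background: {background!r}")
    model = explicit_model or "gpt-image-2"
    if background != "transparent":
        return model
    if output_format not in TRANSPARENT_FORMATS:
        raise ValueError("transparent output requires outputFormat png or webp")
    if explicit_model is None:
        # Default gpt-image-2 requests are rerouted to a transparency-capable model.
        return "gpt-image-1.5"
    # Assumption: an explicit choice is respected as-is, even if the API later rejects it.
    return model
```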
The top-level `background` hint is provider-neutral and currently maps
to the same OpenAI `background` request field when the OpenAI provider
is selected. Providers that do not declare background support return
it in `ignoredOverrides` instead of receiving the unsupported parameter.
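
One way to picture the forwarded-versus-ignored split is the sketch below. It is hypothetical: `split_overrides` is not an OpenClaw function, and representing `ignoredOverrides` as a list of hint names is an assumption made for illustration.

```python
from typing import Any, Dict, List, Set, Tuple


def split_overrides(
    requested: Dict[str, Any],
    supported: Set[str],
) -> Tuple[Dict[str, Any], List[str]]:
    """Separate hints a provider declares support for from those it does not.

    Returns (forwarded, ignored): `forwarded` is sent to the provider,
    `ignored` lists hint names to report back instead of sending.
    """
    forwarded = {name: value for name, value in requested.items() if name in supported}
    ignored = [name for name in requested if name not in supported]
    return forwarded, ignored
```

For example, a provider that declares only `quality` would receive `quality` while `background` is reported as ignored rather than sent as an unsupported parameter.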
To route OpenAI image generation through an Azure OpenAI deployment
instead of `api.openai.com`, see
[Azure OpenAI endpoints](/providers/openai#azure-openai-endpoints).
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openrouter/google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
```
OpenClaw forwards `prompt`, `count`, reference images, and
Gemini-compatible `aspectRatio` / `resolution` hints to OpenRouter.
Current built-in OpenRouter image model shortcuts include
`google/gemini-3.1-flash-image-preview`,
`google/gemini-3-pro-image-preview`, and `openai/gpt-5.4-image-2`. Use
`action: "list"` to see what your configured plugin exposes.
- `minimax/image-01` for API-key setups
- `minimax-portal/image-01` for OAuth setups
- Models: `xai/grok-imagine-image`, `xai/grok-imagine-image-quality`
- Count: up to 4
- References: one `image` or up to five `images`
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2`
- Resolutions: `1K`, `2K`
- Outputs: returned as OpenClaw-managed image attachments
OpenClaw intentionally does not expose xAI-native `quality`, `mask`,
`user`, or extra native-only aspect ratios until those controls exist
in the shared cross-provider `image_generate` contract.
Equivalent CLI:

```bash
openclaw infer image generate \
  --model openai/gpt-image-1.5 \
  --output-format png \
  --background transparent \
  --prompt "A simple red circle sticker on a transparent background" \
  --json
```

The same `--output-format` and `--background` flags are available on
`openclaw infer image edit`; `--openai-background` remains as an
OpenAI-specific alias. Bundled providers other than OpenAI do not declare
explicit background control today, so `background: "transparent"` is reported
as ignored for them.
imageGenerationModel config