docs/tools/music-generation.md
The `music_generate` tool lets the agent create music or audio through the
shared music-generation capability with configured providers: ComfyUI,
fal, Google, MiniMax, and OpenRouter today.
For session-backed agent runs, OpenClaw starts music generation as a background task, tracks it in the task ledger, then wakes the agent again when the track is ready so the agent can tell the user and attach the finished audio. Generated-media completions are delivered by the agent through the message tool; OpenClaw does not auto-post the file as a fallback if the completion agent writes only a private final reply. The completion wake explicitly warns the agent that normal final replies are private for this route.
<Note>
The built-in shared tool only appears when at least one music-generation provider is available. If you do not see `music_generate` in your agent's tools, configure `agents.defaults.musicGenerationModel` or set up a provider API key.
</Note>

The agent calls `music_generate` automatically. No tool allow-listing is needed.
</Step>
</Steps>
For direct synchronous contexts without a session-backed agent run,
the built-in tool still falls back to inline generation and returns
the final media path in the tool result.
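The two return paths described above (a background task for session-backed runs, an inline media path otherwise) could be modeled by a caller as a discriminated union. This is a minimal sketch only: the type name `MusicGenerateResult`, the `kind` discriminator, and the `taskId` and `mediaPath` field names are assumptions for illustration, not the actual tool result schema.

```typescript
// Hypothetical result shapes; the real tool result schema may differ.
type MusicGenerateResult =
  | { kind: "started"; taskId: string } // session-backed run: background task created
  | { kind: "completed"; mediaPath: string }; // direct synchronous context: inline generation

// Turn either result shape into a short human-readable summary.
function summarizeResult(result: MusicGenerateResult): string {
  switch (result.kind) {
    case "started":
      return `Generation started as background task ${result.taskId}.`;
    case "completed":
      return `Track ready at ${result.mediaPath}.`;
  }
}
```

Handling both shapes in one place keeps calling code independent of whether the run is session-backed.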
Example prompts:
Generate a cinematic piano track with soft strings and no vocals.
Generate an energetic chiptune loop about launching a rocket at sunrise.
| Provider | Default model | Reference inputs | Supported controls | Auth |
|---|---|---|---|---|
| ComfyUI | workflow | Up to 1 image | Workflow-defined music or audio | COMFY_API_KEY, COMFY_CLOUD_API_KEY |
| fal | fal-ai/minimax-music/v2.6 | None | lyrics, instrumental, durationSeconds, format | FAL_KEY or FAL_API_KEY |
| Google | lyria-3-clip-preview | Up to 10 images | lyrics, instrumental, format | GEMINI_API_KEY, GOOGLE_API_KEY |
| MiniMax | music-2.6 | None | lyrics, instrumental, durationSeconds, format=mp3 | MINIMAX_API_KEY or MiniMax OAuth |
| OpenRouter | google/lyria-3-pro-preview | Up to 1 image | lyrics, instrumental, durationSeconds, format | OPENROUTER_API_KEY |
The explicit mode contract used by music_generate, contract tests, and the
shared live sweep:
| Provider | generate | edit | Edit limit | Shared live lanes |
|---|---|---|---|---|
| ComfyUI | ✓ | ✓ | 1 image | Not in the shared sweep; covered by extensions/comfy/comfy.live.test.ts |
| fal | ✓ | — | None | generate |
| Google | ✓ | ✓ | 10 images | generate, edit |
| MiniMax | ✓ | — | None | generate |
| OpenRouter | ✓ | ✓ | 1 image | generate, edit |
Use action: "list" to inspect available shared providers and models at
runtime:
/tool music_generate action=list
Use action: "status" to inspect the active session-backed music task:
/tool music_generate action=status
Direct generation example:
/tool music_generate prompt="Dreamy lo-fi hip hop with vinyl texture and gentle rain" instrumental=true
Provider request timeouts are operator configuration only. OpenClaw uses
agents.defaults.musicGenerationModel.timeoutMs when configured, raises values
below 120000ms to 120000ms, and otherwise defaults provider requests to
300000ms.
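The timeout rule above can be sketched as a small function. The function and constant names are hypothetical and this is not OpenClaw's implementation; only the 120000 ms floor and the 300000 ms default come from the text.

```typescript
// Values taken from the description above; the names are assumptions.
const MIN_TIMEOUT_MS = 120000;
const DEFAULT_TIMEOUT_MS = 300000;

// Hypothetical helper: resolve the effective provider request timeout.
function resolveMusicTimeoutMs(configuredTimeoutMs?: number): number {
  if (configuredTimeoutMs === undefined) {
    // No operator configuration: use the default.
    return DEFAULT_TIMEOUT_MS;
  }
  // Configured values below the floor are raised to the floor.
  return Math.max(configuredTimeoutMs, MIN_TIMEOUT_MS);
}
```

Under these assumptions, a configured `timeoutMs` of 60000 resolves to 120000, while 600000 is used unchanged.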
Session-backed music generation runs as a background task:

- `music_generate` creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
- While a task is `queued` or `running`, later `music_generate` calls in the same session return task status instead of starting another generation. Use `action: "status"` to check explicitly.
- `openclaw tasks list` or `openclaw tasks show <taskId>` inspects queued, running, and terminal status.
- [...] `music_generate` again.

| State | Meaning |
|---|---|
| `queued` | Task created, waiting for the provider to accept it. |
| `running` | Provider is processing (typically 30 seconds to 3 minutes depending on provider and duration). |
| `succeeded` | Track ready; the agent wakes and posts it to the conversation. |
| `failed` | Provider error or timeout; the agent wakes with error details. |
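The lifecycle states and the duplicate-call behavior described above can be illustrated with a short sketch. The `MusicTask` and `CallOutcome` types and the `handleGenerateCall` function are hypothetical names chosen for illustration; only the four state names and the "return status while a task is active" rule come from the text.

```typescript
// The four task states listed in the table above.
type TaskState = "queued" | "running" | "succeeded" | "failed";

// Hypothetical task record; the real task ledger entry may carry more fields.
interface MusicTask {
  taskId: string;
  state: TaskState;
}

// Hypothetical outcome of a music_generate call in a session.
type CallOutcome =
  | { kind: "status"; task: MusicTask } // an active task exists: report it
  | { kind: "started"; task: MusicTask }; // no active task: a new one was created

// A task counts as active while it is queued or running.
function isActive(state: TaskState): boolean {
  return state === "queued" || state === "running";
}

// Return the existing task's status while it is active; otherwise start a new task.
function handleGenerateCall(existing: MusicTask | undefined, newTaskId: string): CallOutcome {
  if (existing && isActive(existing.state)) {
    return { kind: "status", task: existing };
  }
  return { kind: "started", task: { taskId: newTaskId, state: "queued" } };
}
```

In this sketch a task in a terminal state (`succeeded` or `failed`) does not block a new generation.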
Check status from the CLI:
openclaw tasks list
openclaw tasks show <taskId>
openclaw tasks cancel <taskId>
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
        fallbacks: ["fal/fal-ai/minimax-music/v2.6", "minimax/music-2.6"],
      },
    },
  },
}
OpenClaw tries providers in this order:

1. `model` parameter from the tool call (if the agent specifies one).
2. `musicGenerationModel.primary` from config.
3. `musicGenerationModel.fallbacks` in order.

If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt.

Set `agents.defaults.mediaGenerationAutoProviderFallback: false` to use only
explicit `model`, `primary`, and `fallbacks` entries.
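The selection order and failure handling above can be sketched as follows. This is an illustration under stated assumptions, not OpenClaw's code: `MusicModelConfig`, `buildCandidates`, and `tryInOrder` are hypothetical names, the attempt callback stands in for a provider call, and it is kept synchronous for brevity even though real provider requests would be asynchronous.

```typescript
// Hypothetical shape of the musicGenerationModel config block.
interface MusicModelConfig {
  primary?: string;
  fallbacks?: string[];
}

// Build the explicit candidate list: tool-call model, then primary, then fallbacks.
function buildCandidates(requestedModel: string | undefined, config: MusicModelConfig): string[] {
  const candidates: string[] = [];
  if (requestedModel) candidates.push(requestedModel);
  if (config.primary) candidates.push(config.primary);
  candidates.push(...(config.fallbacks ?? []));
  return candidates;
}

// Try each candidate in order; if all fail, throw one error that lists every attempt.
function tryInOrder<T>(candidates: string[], attempt: (model: string) => T): T {
  const failures: string[] = [];
  for (const model of candidates) {
    try {
      return attempt(model);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${model}: ${message}`);
    }
  }
  throw new Error(`All music providers failed: ${failures.join("; ")}`);
}
```

Collecting a per-candidate message before throwing is one way to surface details from each attempt in the final error.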
If you are debugging ComfyUI-specific behavior, see ComfyUI. If you are debugging shared provider behavior, start with fal, Google (Gemini), MiniMax, or OpenRouter.
The shared music-generation contract supports explicit mode declarations:

- `generate` for prompt-only generation.
- `edit` when the request includes one or more reference images.

New provider implementations should prefer explicit mode blocks:
capabilities: {
  generate: {
    maxTracks: 1,
    supportsLyrics: true,
    supportsFormat: true,
  },
  edit: {
    enabled: true,
    maxTracks: 1,
    maxInputImages: 1,
    supportsFormat: true,
  },
}
Legacy flat fields such as maxInputImages, supportsLyrics, and
supportsFormat are not enough to advertise edit support. Providers
should declare generate and edit explicitly so live tests, contract
tests, and the shared music_generate tool can validate mode support
deterministically.
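To show why explicit mode blocks make validation deterministic, here is a minimal sketch of a mode check against a capabilities object shaped like the example above. The interface names, the `checkMode` function, and the result shape are assumptions for illustration; the shared contract's real types may differ.

```typescript
// Hypothetical types mirroring the capabilities example above.
interface GenerateCapabilities {
  maxTracks: number;
  supportsLyrics?: boolean;
  supportsFormat?: boolean;
}

interface EditCapabilities {
  enabled: boolean;
  maxTracks: number;
  maxInputImages: number;
  supportsFormat?: boolean;
}

interface MusicCapabilities {
  generate: GenerateCapabilities;
  edit?: EditCapabilities;
}

type ModeCheck =
  | { ok: true; mode: "generate" | "edit" }
  | { ok: false; reason: string };

// Pick the mode from the number of reference images and validate it
// against the provider's explicit declarations.
function checkMode(caps: MusicCapabilities, inputImageCount: number): ModeCheck {
  if (inputImageCount === 0) {
    return { ok: true, mode: "generate" }; // prompt-only request
  }
  const edit = caps.edit;
  if (!edit || !edit.enabled) {
    return { ok: false, reason: "edit mode is not declared by this provider" };
  }
  if (inputImageCount > edit.maxInputImages) {
    return { ok: false, reason: `at most ${edit.maxInputImages} reference image(s) supported` };
  }
  return { ok: true, mode: "edit" };
}
```

With an explicit `edit` block, a request with reference images either matches a declared limit or is rejected with a clear reason, rather than being inferred from flat fields.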
Opt-in live coverage for the shared bundled providers:
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
Repo wrapper:
pnpm test:live:media music
This live file uses already-exported provider env vars ahead of stored auth
profiles by default, and runs both generate and declared edit coverage when
the provider enables edit mode. Coverage today:
- `google`: generate plus edit
- `fal`: generate only
- `minimax`: generate only
- `openrouter`: generate plus edit
- `comfy`: separate Comfy live coverage, not the shared provider sweep

Opt-in live coverage for the bundled ComfyUI music path:
OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
The Comfy live file also covers comfy image and video workflows when those sections are configured.