Music generation

The `music_generate` tool lets the agent create music or audio through the shared music-generation capability with configured providers: Google, MiniMax, and workflow-configured ComfyUI today.

For session-backed agent runs, OpenClaw starts music generation as a background task, tracks it in the task ledger, then wakes the agent again when the track is ready so the agent can post the finished audio back into the original channel.

note

The built-in shared tool only appears when at least one music-generation provider is available. If you do not see `music_generate` in your agent's tools, configure `agents.defaults.musicGenerationModel` or set up a provider API key.

Quick start

Configure auth

Set an API key for at least one provider — for example `GEMINI_API_KEY` or `MINIMAX_API_KEY`.


Pick a default model (optional)

```json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
```

Ask the agent

*"Generate an upbeat synthpop track about a night drive through a neon city."*

The agent calls `music_generate` automatically. No tool allow-listing needed.

For direct synchronous contexts without a session-backed agent run,
the built-in tool still falls back to inline generation and returns
the final media path in the tool result.

Configure the workflow

Configure `plugins.entries.comfy.config.music` with a workflow JSON and prompt/output nodes.


Cloud auth (optional)

For Comfy Cloud, set `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY`.

Call the tool

```text
/tool music_generate prompt="Warm ambient synth loop with soft tape texture"
```

Example prompts:


text
Generate a cinematic piano track with soft strings and no vocals.


text
Generate an energetic chiptune loop about launching a rocket at sunrise.

Supported providers

| Provider | Default model | Reference inputs | Supported controls | Auth |
| --- | --- | --- | --- | --- |
| ComfyUI | `workflow` | Up to 1 image | Workflow-defined music or audio | `COMFY_API_KEY`, `COMFY_CLOUD_API_KEY` |
| Google | `lyria-3-clip-preview` | Up to 10 images | `lyrics`, `instrumental`, `format` | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
| MiniMax | `music-2.6` | None | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY` or MiniMax OAuth |

Capability matrix

The explicit mode contract used by `music_generate`, contract tests, and the shared live sweep:

| Provider | `generate` | `edit` | Edit limit | Shared live lanes |
| --- | --- | --- | --- | --- |
| ComfyUI | ✓ | ✓ | 1 image | Not in the shared sweep; covered by `extensions/comfy/comfy.live.test.ts` |
| Google | ✓ | ✓ | 10 images | `generate`, `edit` |
| MiniMax | ✓ | - | None | `generate` |

Use `action: "list"` to inspect available shared providers and models at runtime:


text
/tool music_generate action=list

Use `action: "status"` to inspect the active session-backed music task:


text
/tool music_generate action=status

Direct generation example:


text
/tool music_generate prompt="Dreamy lo-fi hip hop with vinyl texture and gentle rain" instrumental=true

Tool parameters

- `prompt`: Music generation prompt. Required for `action: "generate"`.
- `action`: `"status"` returns the current session task; `"list"` inspects providers.
- `model`: Provider/model override (e.g. `google/lyria-3-pro-preview`, `comfy/workflow`).
- `lyrics`: Optional lyrics when the provider supports explicit lyric input.
- `instrumental`: Request instrumental-only output when the provider supports it.
- Single reference image path or URL.
- Multiple reference images (up to 10 on supporting providers).
- `durationSeconds`: Target duration in seconds when the provider supports duration hints.
- `format`: Output format hint when the provider supports it.
- Output filename hint.
- Optional provider request timeout in milliseconds.
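As a rough illustration of how these parameters map onto the `/tool music_generate key=value` form used in the examples on this page, here is a minimal sketch. `buildMusicGenerateCommand` and `MusicGenerateArgs` are hypothetical helpers, not part of OpenClaw. The sketch assumes an omitted `action` means `generate` (as the examples suggest), and because quoting rules for embedded double quotes are not documented here, it refuses such values rather than guessing.

```typescript
// Hypothetical helper for illustration only; not an OpenClaw API.
type MusicGenerateArgs = {
  action?: "generate" | "status" | "list";
  prompt?: string;
  model?: string;
  lyrics?: string;
  instrumental?: boolean;
  durationSeconds?: number;
  format?: string;
};

function formatValue(value: string | number | boolean): string {
  if (typeof value !== "string") return String(value);
  // Escaping for embedded quotes is not documented on this page, so refuse.
  if (value.includes('"')) {
    throw new Error("embedded double quotes are not handled by this sketch");
  }
  // Quote only values that contain whitespace, as in the page's examples.
  return /\s/.test(value) ? `"${value}"` : value;
}

function buildMusicGenerateCommand(args: MusicGenerateArgs): string {
  // Assumption: an omitted action behaves like "generate".
  const action = args.action ?? "generate";
  if (action === "generate" && !args.prompt) {
    throw new Error('prompt is required for action "generate"');
  }
  const parts = ["/tool music_generate"];
  if (args.action) parts.push(`action=${args.action}`);
  const order = [
    "prompt",
    "model",
    "lyrics",
    "instrumental",
    "durationSeconds",
    "format",
  ] as const;
  for (const key of order) {
    const value = args[key];
    if (value !== undefined) parts.push(`${key}=${formatValue(value)}`);
  }
  return parts.join(" ");
}

// Reproduces the direct generation example shown earlier on this page.
const exampleCommand = buildMusicGenerateCommand({
  prompt: "Dreamy lo-fi hip hop with vinyl texture and gentle rain",
  instrumental: true,
});
```

Building the command from a typed object keeps the required-`prompt` rule in one place; the reference-image, filename, and timeout parameters are left out because their names are not given here.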

note

Not all providers support all parameters. OpenClaw still validates hard limits such as input counts before submission. When a provider supports duration but uses a shorter maximum than the requested value, OpenClaw clamps to the closest supported duration. Truly unsupported optional hints are ignored with a warning when the selected provider or model cannot honor them. Tool results report applied settings; `details.normalization` captures any requested-to-applied mapping.
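The normalization rules in this note can be sketched roughly as follows. This is an illustration under stated assumptions, not OpenClaw's implementation: `ProviderLimits`, `MusicRequest`, and `normalizeRequest` are hypothetical names, `maxDurationSeconds` is an assumed field, and the real shape of `details.normalization` may differ.

```typescript
// Hypothetical shapes for illustration; OpenClaw's real types may differ.
type ProviderLimits = {
  maxInputImages: number;
  supportsLyrics: boolean;
  maxDurationSeconds?: number; // undefined: provider takes no duration hint
};

type MusicRequest = {
  prompt: string;
  lyrics?: string;
  durationSeconds?: number;
  images?: string[];
};

type NormalizedRequest = {
  applied: { prompt: string; lyrics?: string; durationSeconds?: number; images: string[] };
  normalization: Record<string, { requested: unknown; applied: unknown }>;
  warnings: string[];
};

function normalizeRequest(req: MusicRequest, limits: ProviderLimits): NormalizedRequest {
  const images = req.images ?? [];
  // Hard limit: reject before submission instead of silently dropping inputs.
  if (images.length > limits.maxInputImages) {
    throw new Error(`too many reference images: ${images.length} > ${limits.maxInputImages}`);
  }
  const out: NormalizedRequest = {
    applied: { prompt: req.prompt, images },
    normalization: {},
    warnings: [],
  };

  if (req.durationSeconds !== undefined) {
    if (limits.maxDurationSeconds === undefined) {
      // Unsupported optional hint: ignore it and warn.
      out.warnings.push("durationSeconds ignored: provider takes no duration hint");
    } else {
      // Supported but capped: clamp to the provider maximum.
      const applied = Math.min(req.durationSeconds, limits.maxDurationSeconds);
      out.applied.durationSeconds = applied;
      if (applied !== req.durationSeconds) {
        out.normalization.durationSeconds = { requested: req.durationSeconds, applied };
      }
    }
  }

  if (req.lyrics !== undefined) {
    if (limits.supportsLyrics) out.applied.lyrics = req.lyrics;
    else out.warnings.push("lyrics ignored: provider does not accept lyric input");
  }
  return out;
}

// Illustrative limits, not taken from any real provider.
const normalizedExample = normalizeRequest(
  { prompt: "ambient loop", durationSeconds: 300, lyrics: "la la" },
  { maxInputImages: 0, supportsLyrics: false, maxDurationSeconds: 120 },
);
```

In this example the duration is clamped from 300 to 120 and recorded in `normalization`, while the lyrics hint is dropped with a single warning.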

Async behavior

Session-backed music generation runs as a background task:

- Background task: `music_generate` creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
- Duplicate prevention: while a task is `queued` or `running`, later `music_generate` calls in the same session return task status instead of starting another generation. Use `action: "status"` to check explicitly.
- Status lookup: `openclaw tasks list` or `openclaw tasks show <taskId>` inspects queued, running, and terminal status.
- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
- Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight, so the model does not blindly call `music_generate` again.
- No-session fallback: direct/local contexts without a real agent session run inline and return the final audio result in the same turn.
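The duplicate-prevention rule above can be pictured with a small in-memory model. This is only a sketch: `SessionTaskLedger` and its methods are hypothetical, and OpenClaw's actual task ledger is not shown on this page. It assumes at most one in-flight music task per session.

```typescript
// Hypothetical in-memory ledger for illustration; not OpenClaw's task ledger.
type TaskState = "queued" | "running" | "succeeded" | "failed";
type MusicTask = { taskId: string; sessionId: string; state: TaskState };

class SessionTaskLedger {
  private tasks = new Map<string, MusicTask>(); // latest task per sessionId
  private nextId = 1;

  // Returns the in-flight task for the session, if any.
  active(sessionId: string): MusicTask | undefined {
    const task = this.tasks.get(sessionId);
    if (task && (task.state === "queued" || task.state === "running")) return task;
    return undefined;
  }

  // Starts a task unless one is already queued or running for the session,
  // in which case the existing task is returned as a status response.
  startOrStatus(sessionId: string): { started: boolean; task: MusicTask } {
    const existing = this.active(sessionId);
    if (existing) return { started: false, task: existing };
    const task: MusicTask = { taskId: `task-${this.nextId++}`, sessionId, state: "queued" };
    this.tasks.set(sessionId, task);
    return { started: true, task };
  }

  // Marks the session's task as terminal so a later call can start a new one.
  finish(sessionId: string, state: "succeeded" | "failed"): void {
    const task = this.tasks.get(sessionId);
    if (task) task.state = state;
  }
}

const ledger = new SessionTaskLedger();
const firstCall = ledger.startOrStatus("session-a"); // starts a new task
const secondCall = ledger.startOrStatus("session-a"); // reports the same task
ledger.finish("session-a", "succeeded");
const thirdCall = ledger.startOrStatus("session-a"); // starts again after completion
```

Keying the ledger by session keeps the check cheap: a second call only needs to look at the session's latest task and its state.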

Task lifecycle

| State | Meaning |
| --- | --- |
| `queued` | Task created, waiting for the provider to accept it. |
| `running` | Provider is processing (typically 30 seconds to 3 minutes depending on provider and duration). |
| `succeeded` | Track ready; the agent wakes and posts it to the conversation. |
| `failed` | Provider error or timeout; the agent wakes with error details. |

Check status from the CLI:


bash
openclaw tasks list
openclaw tasks show <taskId>
openclaw tasks cancel <taskId>

Configuration

Model selection


json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
        fallbacks: ["minimax/music-2.6"],
      },
    },
  },
}

Provider selection order

OpenClaw tries providers in this order:

1. `model` parameter from the tool call (if the agent specifies one).
2. `musicGenerationModel.primary` from config.
3. `musicGenerationModel.fallbacks` in order.
4. Auto-detection using auth-backed provider defaults only:
   - current default provider first;
   - remaining registered music-generation providers in provider-id order.

If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt.

Set `agents.defaults.mediaGenerationAutoProviderFallback: false` to use only explicit `model`, `primary`, and `fallbacks` entries.
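The ordering and failover rules in this section could be sketched as follows. This is a minimal illustration, not OpenClaw's implementation: `SelectionInput`, `candidateModels`, and `generateWithFailover` are hypothetical names, the auto-detected list is assumed to arrive already sorted (current default provider first, then provider-id order), and skipping repeated entries is an assumption.

```typescript
// Hypothetical types and names; only the ordering rules come from this page.
type SelectionInput = {
  toolModel?: string; // `model` from the tool call
  primary?: string; // musicGenerationModel.primary
  fallbacks?: string[]; // musicGenerationModel.fallbacks
  autoProviderFallback?: boolean; // mediaGenerationAutoProviderFallback
  autoDetected?: string[]; // auth-backed provider defaults, pre-sorted
};

function candidateModels(input: SelectionInput): string[] {
  const ordered: Array<string | undefined> = [
    input.toolModel,
    input.primary,
    ...(input.fallbacks ?? []),
  ];
  // Auto-detection is appended unless it is explicitly switched off.
  if (input.autoProviderFallback !== false) {
    ordered.push(...(input.autoDetected ?? []));
  }
  // Assumption: a model listed twice is only tried once.
  const seen = new Set<string>();
  const result: string[] = [];
  for (const model of ordered) {
    if (model !== undefined && !seen.has(model)) {
      seen.add(model);
      result.push(model);
    }
  }
  return result;
}

// Tries each candidate in order and reports every failure if all of them fail.
async function generateWithFailover<T>(
  candidates: string[],
  attempt: (model: string) => Promise<T>,
): Promise<T> {
  const failures: string[] = [];
  for (const model of candidates) {
    try {
      return await attempt(model);
    } catch (err) {
      failures.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all music providers failed:\n${failures.join("\n")}`);
}

const exampleCandidates = candidateModels({
  primary: "google/lyria-3-clip-preview",
  fallbacks: ["minimax/music-2.6"],
  autoDetected: ["google/lyria-3-clip-preview"],
});
```

With the configuration from the model selection example, this yields the Google model first and the MiniMax model second; the auto-detected entry adds nothing because it is already in the list.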

Provider notes

Choosing the right path

- Shared provider-backed: when you want model selection, provider failover, and the built-in async task/status flow.
- Plugin path (ComfyUI): when you need a custom workflow graph or a provider that is not part of the shared bundled music capability.

If you are debugging ComfyUI-specific behavior, see ComfyUI. If you are debugging shared provider behavior, start with Google (Gemini) or MiniMax.

Provider capability modes

The shared music-generation contract supports explicit mode declarations:

- `generate` for prompt-only generation.
- `edit` when the request includes one or more reference images.

New provider implementations should prefer explicit mode blocks:


typescript
capabilities: {
  generate: {
    maxTracks: 1,
    supportsLyrics: true,
    supportsFormat: true,
  },
  edit: {
    enabled: true,
    maxTracks: 1,
    maxInputImages: 1,
    supportsFormat: true,
  },
}

Legacy flat fields such as `maxInputImages`, `supportsLyrics`, and `supportsFormat` are not enough to advertise edit support. Providers should declare `generate` and `edit` explicitly so live tests, contract tests, and the shared `music_generate` tool can validate mode support deterministically.
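To show why explicit mode blocks make validation deterministic, here is a minimal sketch of mode resolution. `ModeCapabilities` and `resolveMode` are hypothetical names that mirror the fields in the capability example; the real contract types may differ. The example values follow the capability matrix on this page (Google edits with up to 10 images, MiniMax is generate-only).

```typescript
// Hypothetical types mirroring the explicit mode blocks; not the real contract.
type ModeCapabilities = {
  generate: { maxTracks: number; supportsLyrics?: boolean; supportsFormat?: boolean };
  edit?: { enabled: boolean; maxTracks: number; maxInputImages: number; supportsFormat?: boolean };
};

type MusicMode = "generate" | "edit";

function resolveMode(caps: ModeCapabilities, referenceImages: string[]): MusicMode {
  // Prompt-only requests always use generate mode.
  if (referenceImages.length === 0) return "generate";
  // Any reference image means edit mode, which must be declared explicitly.
  const edit = caps.edit;
  if (!edit || !edit.enabled) {
    throw new Error("provider does not declare edit mode");
  }
  if (referenceImages.length > edit.maxInputImages) {
    throw new Error(
      `too many reference images: ${referenceImages.length} > ${edit.maxInputImages}`,
    );
  }
  return "edit";
}

// Illustrative capability objects based on the capability matrix.
const googleCaps: ModeCapabilities = {
  generate: { maxTracks: 1, supportsLyrics: true, supportsFormat: true },
  edit: { enabled: true, maxTracks: 1, maxInputImages: 10, supportsFormat: true },
};
const minimaxCaps: ModeCapabilities = {
  generate: { maxTracks: 1, supportsLyrics: true, supportsFormat: true },
};

const promptOnlyMode = resolveMode(googleCaps, []);
const withImageMode = resolveMode(googleCaps, ["cover.png"]);
```

Because the mode depends only on the declared blocks and the number of reference images, the same check can run in contract tests and at call time without provider-specific branches.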

Live tests

Opt-in live coverage for the shared bundled providers:


bash
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts

Repo wrapper:


bash
pnpm test:live:media music

This live file loads missing provider env vars from `~/.profile`, prefers live/env API keys ahead of stored auth profiles by default, and runs both `generate` and declared `edit` coverage when the provider enables edit mode. Coverage today:

- `google`: `generate` plus `edit`
- `minimax`: `generate` only
- `comfy`: separate Comfy live coverage, not the shared provider sweep

Opt-in live coverage for the bundled ComfyUI music path:


bash
OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts

The Comfy live file also covers comfy image and video workflows when those sections are configured.

Related

- Background tasks: task tracking for detached `music_generate` runs
- ComfyUI
- Configuration reference: `musicGenerationModel` config
- Google (Gemini)
- MiniMax
- Models: model configuration and failover
- Tools overview
