
Image generation

The `image_generate` tool lets the agent create and edit images using your configured providers. Generated images are delivered automatically as media attachments in the agent's reply.

Note: The tool only appears when at least one image-generation provider is available. If you do not see `image_generate` in your agent's tools, configure `agents.defaults.imageGenerationModel`, set up a provider API key, or sign in with OpenAI Codex OAuth.

Quick start

Configure auth

Set an API key for at least one provider (for example `OPENAI_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`) or sign in with OpenAI Codex OAuth.

Pick a default model (optional)

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-2",
        timeoutMs: 180_000,
      },
    },
  },
}
```


Codex OAuth uses the same `openai/gpt-image-2` model ref. When an `openai-codex` OAuth profile is configured, OpenClaw routes image requests through that OAuth profile instead of first trying `OPENAI_API_KEY`. Explicit `models.providers.openai` config (API key, custom/Azure base URL) opts back into the direct OpenAI Images API route.

Ask the agent

*"Generate an image of a friendly robot mascot."*


The agent calls `image_generate` automatically. No tool allow-listing is needed; the tool is enabled by default when a provider is available.

Warning: For OpenAI-compatible LAN endpoints such as LocalAI, keep the custom `models.providers.openai.baseUrl` and explicitly opt in with `browser.ssrfPolicy.dangerouslyAllowPrivateNetwork: true`. Private and internal image endpoints remain blocked by default.

Common routes

| Goal | Model ref | Auth |
| --- | --- | --- |
| OpenAI image generation with API billing | `openai/gpt-image-2` | `OPENAI_API_KEY` |
| OpenAI image generation with Codex subscription auth | `openai/gpt-image-2` | OpenAI Codex OAuth |
| OpenAI transparent-background PNG/WebP | `openai/gpt-image-1.5` | `OPENAI_API_KEY` or OpenAI Codex OAuth |
| DeepInfra image generation | `deepinfra/black-forest-labs/FLUX-1-schnell` | `DEEPINFRA_API_KEY` |
| OpenRouter image generation | `openrouter/google/gemini-3.1-flash-image-preview` | `OPENROUTER_API_KEY` |
| LiteLLM image generation | `litellm/gpt-image-2` | `LITELLM_API_KEY` |
| Google Gemini image generation | `google/gemini-3.1-flash-image-preview` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |

The same `image_generate` tool handles text-to-image and reference-image editing. Use `image` for one reference or `images` for multiple references. Provider-supported output hints such as `quality`, `outputFormat`, and `background` are forwarded when available and reported as ignored when a provider does not support them. Bundled transparent-background support is OpenAI-specific; other providers may still preserve PNG alpha if their backend emits it.
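The forward-or-ignore behaviour for output hints can be pictured with a small Python sketch. The function name `split_output_hints` and the capability sets are illustrative assumptions, not OpenClaw internals; only the hint names `quality`, `outputFormat`, and `background` come from this page.

```python
def split_output_hints(requested, supported):
    """Split requested output hints into forwarded and ignored ones.

    `requested` maps hint name -> value. `supported` is the set of hint
    names a provider declares (hypothetical data, for illustration only).
    """
    applied = {
        name: value
        for name, value in requested.items()
        if value is not None and name in supported
    }
    ignored = sorted(
        name
        for name, value in requested.items()
        if value is not None and name not in supported
    )
    return applied, ignored


requested = {"quality": "high", "outputFormat": "png", "background": "transparent"}

# Assumed capability sets; real providers declare their own support.
openai_like = {"quality", "outputFormat", "background"}
other_provider = {"outputFormat"}

print(split_output_hints(requested, openai_like))
print(split_output_hints(requested, other_provider))
```

In this sketch the second call forwards only `outputFormat` and lists the other two hints as ignored, which mirrors how unsupported hints are reported rather than silently dropped.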

Supported providers

| Provider | Default model | Edit support | Auth |
| --- | --- | --- | --- |
| ComfyUI | `workflow` | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud |
| DeepInfra | `black-forest-labs/FLUX-1-schnell` | Yes (1 image) | `DEEPINFRA_API_KEY` |
| fal | `fal-ai/flux/dev` | Yes | `FAL_KEY` |
| Google | `gemini-3.1-flash-image-preview` | Yes | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| LiteLLM | `gpt-image-2` | Yes (up to 5 input images) | `LITELLM_API_KEY` |
| MiniMax | `image-01` | Yes (subject reference) | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) |
| OpenAI | `gpt-image-2` | Yes (up to 4 images) | `OPENAI_API_KEY` or OpenAI Codex OAuth |
| OpenRouter | `google/gemini-3.1-flash-image-preview` | Yes (up to 5 input images) | `OPENROUTER_API_KEY` |
| Vydra | `grok-imagine` | No | `VYDRA_API_KEY` |
| xAI | `grok-imagine-image` | Yes (up to 5 images) | `XAI_API_KEY` |

Use `action: "list"` to inspect available providers and models at runtime:

```text
/tool image_generate action=list
```

Provider capabilities

| Capability | ComfyUI | DeepInfra | fal | Google | MiniMax | OpenAI | Vydra | xAI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Generate (max count) | Workflow-defined | 4 | 4 | 4 | 9 | 4 | 1 | 4 |
| Edit / reference | 1 image (workflow) | 1 image | 1 image | Up to 5 images | 1 image (subject ref) | Up to 5 images | — | Up to 5 images |
| Size control | — | ✓ | ✓ | ✓ | — | Up to 4K | — | — |
| Aspect ratio | — | — | ✓ (generate only) | ✓ | ✓ | — | — | ✓ |
| Resolution (1K/2K/4K) | — | — | ✓ | ✓ | — | — | — | 1K, 2K |

Tool parameters

- `prompt`: Image generation prompt. Required for `action: "generate"`.
- `action`: Use `"list"` to inspect available providers and models at runtime.
- `model`: Provider/model override (e.g. `openai/gpt-image-2`). Use `openai/gpt-image-1.5` for transparent OpenAI backgrounds.
- `image`: Single reference image path or URL for edit mode.
- `images`: Multiple reference images for edit mode (up to 5 on supporting providers).
- `size`: Size hint: `1024x1024`, `1536x1024`, `1024x1536`, `2048x2048`, `3840x2160`.
- Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`.
- Resolution hint.
- `quality`: Quality hint when the provider supports it.
- `outputFormat`: Output format hint when the provider supports it.
- `background`: Background hint when the provider supports it. Use `transparent` with `outputFormat: "png"` or `"webp"` for transparency-capable providers.
- `count`: Number of images to generate (1–4).
- Optional provider request timeout in milliseconds.
- Output filename hint.
- OpenAI-only hints: `background`, `moderation`, `outputCompression`, and `user`.

Note: Not all providers support all parameters. When a fallback provider supports a nearby geometry option instead of the exact requested one, OpenClaw remaps to the closest supported size, aspect ratio, or resolution before submission. Unsupported output hints are dropped for providers that do not declare support and reported in the tool result. Tool results report the applied settings; `details.normalization` captures any requested-to-applied translation.
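As a rough illustration of remapping a requested size to a nearby supported one, the Python sketch below picks the candidate with the closest aspect ratio and then the closest pixel area. The distance metric, the function names, and the shape of the `normalization` record are all assumptions made for this example; this page does not specify how OpenClaw measures closeness.

```python
def parse_size(size):
    """Turn a string such as "1536x1024" into a (width, height) tuple."""
    width, height = size.lower().split("x")
    return int(width), int(height)


def closest_size(requested, supported):
    """Pick the supported size nearest to the requested one.

    Assumed metric: aspect-ratio difference first, pixel-area difference
    as the tie-breaker.
    """
    req_w, req_h = parse_size(requested)

    def distance(candidate):
        cand_w, cand_h = parse_size(candidate)
        return (
            abs(cand_w / cand_h - req_w / req_h),
            abs(cand_w * cand_h - req_w * req_h),
        )

    return min(supported, key=distance)


def normalize_size(requested, supported):
    """Return the applied size plus a requested-to-applied record if remapped."""
    applied = requested if requested in supported else closest_size(requested, supported)
    result = {"size": applied}
    if applied != requested:
        # Hypothetical record shape, loosely modelled on `details.normalization`.
        result["normalization"] = {"size": {"requested": requested, "applied": applied}}
    return result


supported_sizes = ["1024x1024", "1536x1024", "1024x1536"]
print(normalize_size("3840x2160", supported_sizes))
print(normalize_size("1024x1024", supported_sizes))
```

Under this metric a 16:9 request lands on the landscape `1536x1024` option, and an exact match passes through without a normalization record.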

Configuration

Model selection


```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-2",
        timeoutMs: 180_000,
        fallbacks: [
          "openrouter/google/gemini-3.1-flash-image-preview",
          "google/gemini-3.1-flash-image-preview",
          "fal/fal-ai/flux/dev",
        ],
      },
    },
  },
}
```

Provider selection order

OpenClaw tries providers in this order:

1. `model` parameter from the tool call (if the agent specifies one).
2. `imageGenerationModel.primary` from config.
3. `imageGenerationModel.fallbacks` in order.
4. Auto-detection, using auth-backed provider defaults only:
   - current default provider first;
   - remaining registered image-generation providers in provider-id order.

If a provider fails (auth error, rate limit, etc.), the next configured candidate is tried automatically. If all fail, the error includes details from each attempt.
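The ordering and failover described above can be sketched in Python as follows. This is a minimal model, not OpenClaw's implementation: `candidate_order`, `generate_with_fallback`, and `fake_generate` are hypothetical names, and de-duplicating repeated model refs is an assumption this page does not confirm.

```python
def candidate_order(tool_model, config, auto_detected):
    """Build the ordered list of model refs to try.

    Order: tool-call `model`, then `primary`, then `fallbacks`, then
    auto-detected provider defaults. Duplicates are skipped (assumed).
    """
    candidates = []
    if tool_model:
        candidates.append(tool_model)
    if config.get("primary"):
        candidates.append(config["primary"])
    candidates.extend(config.get("fallbacks", []))
    candidates.extend(auto_detected)

    ordered, seen = [], set()
    for ref in candidates:
        if ref not in seen:
            seen.add(ref)
            ordered.append(ref)
    return ordered


def generate_with_fallback(candidates, generate):
    """Try each candidate in order and collect errors from failed attempts."""
    errors = []
    for ref in candidates:
        try:
            return ref, generate(ref)
        except Exception as exc:  # auth error, rate limit, etc.
            errors.append(f"{ref}: {exc}")
    raise RuntimeError("all image providers failed: " + "; ".join(errors))


config = {
    "primary": "openai/gpt-image-2",
    "fallbacks": ["google/gemini-3.1-flash-image-preview", "fal/fal-ai/flux/dev"],
}


def fake_generate(ref):
    """Stand-in for a provider call; the OpenAI route fails in this demo."""
    if ref.startswith("openai/"):
        raise RuntimeError("rate limit")
    return f"image from {ref}"


order = candidate_order(None, config, [])
used, image = generate_with_fallback(order, fake_generate)
print(used, "->", image)
```

Because the simulated primary fails, the demo falls through to the first fallback; if every candidate failed, the raised error would carry one entry per attempt.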

Image editing

OpenAI, OpenRouter, Google, DeepInfra, fal, MiniMax, ComfyUI, and xAI support editing reference images. Pass a reference image path or URL:


```text
"Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"
```

OpenAI, OpenRouter, Google, and xAI support up to 5 reference images via the `images` parameter. DeepInfra, fal, MiniMax, and ComfyUI support 1.
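To show how the `image` and `images` parameters relate to these limits, here is a small Python sketch that builds tool-call arguments. The helper `build_edit_args` and the lookup table are hypothetical; the limits are copied from this section and the providers table, and a real tool call may validate differently.

```python
# Reference-image limits as stated on this page (illustrative lookup only).
MAX_REFERENCE_IMAGES = {
    "openai": 5,
    "openrouter": 5,
    "google": 5,
    "xai": 5,
    "deepinfra": 1,
    "fal": 1,
    "minimax": 1,
    "comfyui": 1,
}


def build_edit_args(provider, prompt, references):
    """Build `image_generate` arguments for an edit request.

    Uses `image` for a single reference and `images` for several.
    Providers missing from the table are treated as having no edit support.
    """
    limit = MAX_REFERENCE_IMAGES.get(provider, 0)
    if len(references) > limit:
        raise ValueError(
            f"{provider} accepts at most {limit} reference image(s), got {len(references)}"
        )
    args = {"action": "generate", "prompt": prompt}
    if len(references) == 1:
        args["image"] = references[0]
    elif references:
        args["images"] = list(references)
    return args


print(build_edit_args("fal", "Watercolor version of this photo", ["/path/to/photo.jpg"]))
print(
    build_edit_args(
        "openai",
        "Combine the character with the palette",
        ["/path/to/character.png", "/path/to/palette.jpg"],
    )
)
```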

Provider deep dives

Examples

```text
/tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1
```

```text
/tool image_generate action=generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png background=transparent
```


Equivalent CLI:

```bash
openclaw infer image generate \
  --model openai/gpt-image-1.5 \
  --output-format png \
  --background transparent \
  --prompt "A simple red circle sticker on a transparent background" \
  --json
```

```text
/tool image_generate action=generate model=openai/gpt-image-2 prompt="Two visual directions for a calm productivity app icon" size=1024x1024 count=2
```

```text
/tool image_generate action=generate model=openai/gpt-image-2 prompt="Keep the subject, replace the background with a bright studio setup" image=/path/to/reference.png size=1024x1536
```

```text
/tool image_generate action=generate model=openai/gpt-image-2 prompt="Combine the character identity from the first image with the color palette from the second" images='["/path/to/character.png","/path/to/palette.jpg"]' size=1536x1024
```

The same `--output-format` and `--background` flags are available on `openclaw infer image edit`; `--openai-background` remains as an OpenAI-specific alias. Bundled providers other than OpenAI do not declare explicit background control today, so `background: "transparent"` is reported as ignored for them.

Related

- Tools overview: all available agent tools
- ComfyUI: local ComfyUI and Comfy Cloud workflow setup
- fal: fal image and video provider setup
- Google (Gemini): Gemini image provider setup
- MiniMax: MiniMax image provider setup
- OpenAI: OpenAI Images provider setup
- Vydra: Vydra image, video, and speech setup
- xAI: Grok image, video, search, code execution, and TTS setup
- Configuration reference: `imageGenerationModel` config
- Models: model configuration and failover
