OpenClaw Docs

    Documentation Mirror

    Documentation Overview

    Auth credential semantics
    Scheduled tasks
    Hooks
    Automation & tasks
    Standing orders
    Task flow
    Background tasks
    BlueBubbles
    Broadcast groups
    Channel routing
    Discord
    Feishu
    Google Chat
    Group messages
    Groups
    iMessage
    Chat channels
    IRC
    LINE
    Channel location parsing
    Matrix
    Matrix migration
    Matrix push rules for quiet previews
    Mattermost
    Microsoft Teams
    Nextcloud Talk
    Nostr
    Pairing
    QA channel
    QQ bot
    Signal
    Slack
    Synology Chat
    Telegram
    Tlon
    Channel troubleshooting
    Twitch
    WeChat
    WhatsApp
    Yuanbao
    Zalo
    Zalo personal
    CI pipeline
    ACP
    Agent
    Agents
    Approvals
    Backup
    Browser
    Channels
    Clawbot
    `openclaw commitments`
    Completion
    Config
    Configure
    Cron
    Daemon
    Dashboard
    Devices
    Directory
    DNS
    Docs
    Doctor
    Flows (redirect)
    Gateway
    Health
    Hooks
    CLI reference
    Inference CLI
    Logs
    MCP
    Memory
    Message
    Migrate
    Models
    Node
    Nodes
    Onboard
    Pairing
    Plugins
    Proxy
    QR
    Reset
    Sandbox CLI
    Secrets
    Security
    Sessions
    Setup
    Skills
    Status
    System
    `openclaw tasks`
    TUI
    Uninstall
    Update
    Voicecall
    Webhooks
    Wiki
    Active memory
    Agent runtime
    Agent loop
    Agent runtimes
    Agent workspace
    Gateway architecture
    Channel docking
    Inferred commitments
    Compaction
    Context
    Context engine
    Delegate architecture
    Dreaming
    Experimental features
    Features
    Markdown formatting
    Memory overview
    Builtin memory engine
    Honcho memory
    QMD memory engine
    Memory search
    Messages
    Model failover
    Model providers
    Models CLI
    Multi-agent routing
    OAuth
    OpenClaw App SDK
    Presence
    QA overview
    Matrix QA
    Command queue
    Steering queue
    Retry policy
    Session management
    Session pruning
    Session tools
    SOUL.md personality guide
    Streaming and chunking
    System prompt
    Timezones
    TypeBox
    Typing indicators
    Usage tracking
    Date and time
    Node + tsx crash
    Diagnostics flags
    Authentication
    Background exec and process tool
    Bonjour discovery
    Bridge protocol
    CLI backends
    Configuration — agents
    Configuration — channels
    Configuration — tools and custom providers
    Configuration
    Configuration examples
    Configuration reference
    Diagnostics export
    Discovery and transports
    Doctor
    Gateway lock
    Health checks
    Heartbeat
    Gateway runbook
    Local models
    Gateway logging
    Multiple gateways
    Network model
    OpenAI chat completions
    OpenResponses API
    OpenShell
    OpenTelemetry export
    Gateway-owned pairing
    Prometheus metrics
    Gateway protocol
    Remote access
    Remote gateway setup
    Sandbox vs tool policy vs elevated
    Sandboxing
    Secrets management
    Secrets apply plan contract
    Security audit checks
    Security
    Tailscale
    Tools invoke API
    Troubleshooting
    Trusted proxy auth
    Debugging
    Environment variables
    FAQ
    FAQ: first-run setup
    FAQ: models and auth
    GPT-5.5 / Codex agentic parity
    GPT-5.5 / Codex parity maintainer notes
    Help
    Scripts
    Testing
    Testing: live suites
    General troubleshooting
    OpenClaw
    Ansible
    Azure
    Bun (experimental)
    ClawDock
    Release channels
    DigitalOcean
    Docker
    Docker VM runtime
    exe.dev
    Fly.io
    GCP
    Hetzner
    Hostinger
    Install
    Installer internals
    Kubernetes
    macOS VMs
    Migration guide
    Migrating from Claude
    Migrating from Hermes
    Nix
    Node.js
    Northflank
    Oracle Cloud
    Podman
    Railway
    Raspberry Pi
    Render
    Uninstall
    Updating
    Logging
    Network
    Audio and voice notes
    Camera capture
    Image and media support
    Nodes
    Location command
    Media understanding
    Talk mode
    Node troubleshooting
    Voice wake
    Pi integration architecture
    Pi development workflow
    Android app
    Platforms
    iOS app
    Linux app
    Gateway on macOS
    Canvas
    Gateway lifecycle
    macOS dev setup
    Health checks (macOS)
    Menu bar icon
    macOS logging
    Menu bar
    Peekaboo bridge
    macOS permissions
    Remote control
    macOS signing
    Skills (macOS)
    Voice overlay
    Voice wake (macOS)
    WebChat (macOS)
    macOS IPC
    macOS app
    Windows
    Plugin internals
    Plugin architecture internals
    Building plugins
    Plugin bundles
    Codex Computer Use
    Codex harness
    Community plugins
    Plugin compatibility
    Google Meet plugin
    Plugin hooks
    Plugin manifest
    Memory LanceDB
    Memory wiki
    Message presentation
    Agent harness plugins
    Building channel plugins
    Channel turn kernel
    Plugin entry points
    Plugin SDK migration
    Plugin SDK overview
    Building provider plugins
    Plugin runtime helpers
    Plugin setup and config
    Plugin SDK subpaths
    Plugin testing
    Skill workshop plugin
    Voice call plugin
    Webhooks plugin
    Zalo personal plugin
    OpenProse
    Alibaba Model Studio
    Anthropic
    Arcee AI
    Azure Speech
    Amazon Bedrock
    Amazon Bedrock Mantle
    Chutes
    Claude Max API proxy
    Cloudflare AI gateway
    ComfyUI
    Deepgram
    Deepinfra
    DeepSeek
    ElevenLabs
    Fal
    Fireworks
    GitHub Copilot
    GLM (Zhipu)
    Google (Gemini)
    Gradium
    Groq
    Hugging Face (inference)
    Provider directory
    Inferrs
    Inworld
    Kilocode
    LiteLLM
    LM Studio
    MiniMax
    Mistral
    Model provider quickstart
    Moonshot AI
    NVIDIA
    Ollama
    OpenAI
    OpenCode
    OpenCode Go
    OpenRouter
    Perplexity
    Qianfan
    Qwen
    Runway
    SGLang
    StepFun
    Synthetic
    Tencent Cloud (TokenHub)
    Together AI
    Venice AI
    Vercel AI gateway
    vLLM
    Volcengine (Doubao)
    Vydra
    xAI
    Xiaomi MiMo
    Z.AI
    Default AGENTS.md
    Release policy
    API usage and costs
    Credits
    Device model database
    Full release validation
    Memory configuration reference
    OpenClaw App SDK API design
    Prompt caching
    Rich output protocol
    RPC adapters
    SecretRef credential surface
    Session management deep dive
    AGENTS.md template
    BOOT.md template
    BOOTSTRAP.md template
    HEARTBEAT.md template
    IDENTITY template
    SOUL.md template
    TOOLS.md template
    USER template
    Tests
    Token use and costs
    Transcript hygiene
    Onboarding reference
    Contributing to the threat model
    Threat model (MITRE ATLAS)
    Formal verification (security models)
    Network proxy
    Agent bootstrapping
    Docs directory
    Getting started
    Docs hubs
    OpenClaw lore
    Onboarding (macOS app)
    Onboarding overview
    Personal assistant setup
    Setup
    Showcase
    Onboarding (CLI)
    CLI automation
    CLI setup reference
    ACP agents
    ACP agents — setup
    Agent send
    apply_patch tool
    Brave search
    Browser (OpenClaw-managed)
    Browser control API
    Browser troubleshooting
    Browser login
    WSL2 + Windows + remote Chrome CDP troubleshooting
    BTW side questions
    ClawHub
    Code execution
    Creating skills
    Diffs
    DuckDuckGo search
    Elevated mode
    Exa search
    Exec tool
    Exec approvals
    Exec approvals — advanced
    Firecrawl
    Gemini search
    Grok search
    Image generation
    Tools and plugins
    Kimi search
    LLM task
    Lobster
    Tool-loop detection
    Media overview
    MiniMax search
    Multi-agent sandbox and tools
    Music generation
    Ollama web search
    PDF tool
    Perplexity search
    Plugins
    Reactions
    SearXNG search
    Skills
    Skills config
    Slash commands
    Sub-agents
    Tavily
    Thinking levels
    Tokenjuice
    Trajectory bundles
    Text-to-speech
    Video generation
    Web search
    Web fetch
    Linux server
    Control UI
    Dashboard
    Web
    TUI
    WebChat

    OpenAPI Specs

    openapi
    Real-time Synchronized Documentation

    Last sync: 01/05/2026 06:59:57

    Note: This content is mirrored from docs.openclaw.ai and is subject to their terms and conditions.

    v2.4.0 Production

    Technical reference for the OpenClaw framework. Real-time synchronization with the official documentation engine.

    Use this file to discover all available pages before exploring further.

    Testing: live suites

    For quick start, QA runners, unit/integration suites, and Docker flows, see Testing. This page covers the live (network-touching) test suites: model matrix, CLI backends, ACP, and media-provider live tests, plus credential handling.

    Live: local profile smoke commands

    Source `~/.profile` before ad hoc live checks so provider keys and local tool paths match your shell:

    ```bash
    source ~/.profile
    ```

    Safe media smoke:

    ```bash
    pnpm openclaw infer tts convert --local --json \
      --text "OpenClaw live smoke." \
      --output /tmp/openclaw-live-smoke.mp3
    ```

    Safe voice-call readiness smoke:

    ```bash
    pnpm openclaw voicecall setup --json
    pnpm openclaw voicecall smoke --to "+15555550123"
    ```

    `voicecall smoke` is a dry run unless `--yes` is also present. Use `--yes` only when you intentionally want to place a real notify call. For Twilio, Telnyx, and Plivo, a successful readiness check requires a public webhook URL; local-only loopback/private fallbacks are rejected by design.
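    As a sketch, an intentional real notify call would combine the flags above (the number is a placeholder; substitute one you control):

    ```bash
    # Deliberate live call: --yes disables the dry run described above, so this
    # actually dials. TARGET is a placeholder number.
    TARGET="+15555550123"
    pnpm openclaw voicecall smoke --to "$TARGET" --yes
    ```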

    Live: Android node capability sweep

    • Test: `src/gateway/android-node.capabilities.live.test.ts`
    • Script: `pnpm android:test:integration`
    • Goal: invoke every command currently advertised by a connected Android node and assert command contract behavior.
    • Scope:
      • Preconditioned/manual setup (the suite does not install/run/pair the app).
      • Command-by-command gateway `node.invoke` validation for the selected Android node.
    • Required pre-setup:
      • Android app already connected + paired to the gateway.
      • App kept in foreground.
      • Permissions/capture consent granted for capabilities you expect to pass.
    • Optional target overrides:
      • `OPENCLAW_ANDROID_NODE_ID` or `OPENCLAW_ANDROID_NODE_NAME`.
      • `OPENCLAW_ANDROID_GATEWAY_URL` / `OPENCLAW_ANDROID_GATEWAY_TOKEN` / `OPENCLAW_ANDROID_GATEWAY_PASSWORD`.
    • Full Android setup details: Android App
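    A minimal sketch of a targeted run using the override variables above (the node name and gateway URL values are placeholders, not defaults):

    ```bash
    # Point the sweep at a specific paired node and gateway; values are placeholders.
    export OPENCLAW_ANDROID_NODE_NAME="pixel-8-lab"
    export OPENCLAW_ANDROID_GATEWAY_URL="ws://127.0.0.1:18789"
    pnpm android:test:integration
    ```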

    Live: model smoke (profile keys)

    Live tests are split into two layers so we can isolate failures:

    • “Direct model” tells us the provider/model can answer at all with the given key.
    • “Gateway smoke” tells us the full gateway+agent pipeline works for that model (sessions, history, tools, sandbox policy, etc.).

    Layer 1: Direct model completion (no gateway)

    • Test: `src/agents/models.profiles.live.test.ts`
    • Goal:
      • Enumerate discovered models
      • Use `getApiKeyForModel` to select models you have creds for
      • Run a small completion per model (and targeted regressions where needed)
    • How to enable:
      • `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)
    • Set `OPENCLAW_LIVE_MODELS=modern` (or `all`, an alias for modern) to actually run this suite; otherwise it skips, keeping `pnpm test:live` focused on gateway smoke
    • How to select models:
      • `OPENCLAW_LIVE_MODELS=modern` to run the modern allowlist (Opus/Sonnet 4.6+, GPT-5.2 + Codex, Gemini 3, DeepSeek V4, GLM 4.7, MiniMax M2.7, Grok 4)
      • `OPENCLAW_LIVE_MODELS=all` is an alias for the modern allowlist
      • or `OPENCLAW_LIVE_MODELS="openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,..."` (comma allowlist)
      • Modern/all sweeps default to a curated high-signal cap; set `OPENCLAW_LIVE_MAX_MODELS=0` for an exhaustive modern sweep or a positive number for a smaller cap.
      • Exhaustive sweeps use `OPENCLAW_LIVE_TEST_TIMEOUT_MS` for the whole direct-model test timeout. Default: 60 minutes.
      • Direct-model probes run with 20-way parallelism by default; set `OPENCLAW_LIVE_MODEL_CONCURRENCY` to override.
    • How to select providers:
      • `OPENCLAW_LIVE_PROVIDERS="google,google-antigravity,google-gemini-cli"` (comma allowlist)
    • Where keys come from:
      • By default: profile store and env fallbacks
      • Set `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to enforce the profile store only
    • Why this exists:
      • Separates “provider API is broken / key is invalid” from “gateway agent pipeline is broken”
      • Contains small, isolated regressions (example: OpenAI Responses/Codex Responses reasoning replay + tool-call flows)
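    The knobs above can be combined; a sketch of an exhaustive direct-model sweep (the concurrency value is an arbitrary illustration, not a recommendation):

    ```bash
    # Exhaustive modern sweep, profile-store keys only, reduced parallelism.
    export OPENCLAW_LIVE_MODELS=modern
    export OPENCLAW_LIVE_MAX_MODELS=0             # 0 lifts the curated cap
    export OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1   # ignore env-var key fallbacks
    export OPENCLAW_LIVE_MODEL_CONCURRENCY=10     # default is 20-way
    pnpm test:live src/agents/models.profiles.live.test.ts
    ```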

    Layer 2: Gateway + dev agent smoke (what "@openclaw" actually does)

    • Test: `src/gateway/gateway-models.profiles.live.test.ts`
    • Goal:
      • Spin up an in-process gateway
      • Create/patch an `agent:dev:*` session (model override per run)
      • Iterate models-with-keys and assert:
        • a “meaningful” response (no tools)
        • a real tool invocation works (read probe)
        • optional extra tool probes (exec+read probe)
        • OpenAI regression paths (tool-call-only → follow-up) keep working
    • Probe details (so you can explain failures quickly):
      • `read` probe: the test writes a nonce file in the workspace and asks the agent to `read` it and echo the nonce back.
      • `exec+read` probe: the test asks the agent to `exec`-write a nonce into a temp file, then `read` it back.
      • image probe: the test attaches a generated PNG (cat + randomized code) and expects the model to return `cat <CODE>`.
      • Implementation reference: `src/gateway/gateway-models.profiles.live.test.ts` and `src/gateway/live-image-probe.ts`.
    • How to enable:
      • `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)
    • How to select models:
      • Default: the modern allowlist (Opus/Sonnet 4.6+, GPT-5.2 + Codex, Gemini 3, DeepSeek V4, GLM 4.7, MiniMax M2.7, Grok 4)
      • `OPENCLAW_LIVE_GATEWAY_MODELS=all` is an alias for the modern allowlist
      • Or set `OPENCLAW_LIVE_GATEWAY_MODELS="provider/model"` (or a comma list) to narrow
      • Modern/all gateway sweeps default to a curated high-signal cap; set `OPENCLAW_LIVE_GATEWAY_MAX_MODELS=0` for an exhaustive modern sweep or a positive number for a smaller cap.
    • How to select providers (avoid “OpenRouter everything”):
      • `OPENCLAW_LIVE_GATEWAY_PROVIDERS="google,google-antigravity,google-gemini-cli,openai,anthropic,zai,minimax"` (comma allowlist)
    • Tool + image probes are always on in this live test:
      • `read` probe + `exec+read` probe (tool stress)
      • the image probe runs when the model advertises image input support
      • Flow (high level):
        • The test generates a tiny PNG with “CAT” + a random code (`src/gateway/live-image-probe.ts`)
        • Sends it via `agent` with `attachments: [{ mimeType: "image/png", content: "<base64>" }]`
        • The gateway parses attachments into `images[]` (`src/gateway/server-methods/agent.ts` + `src/gateway/chat-attachments.ts`)
        • The embedded agent forwards a multimodal user message to the model
        • Assertion: the reply contains `cat` + the code (OCR tolerance: minor mistakes allowed)
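    The model- and provider-selection knobs above compose; as a sketch, an exhaustive gateway sweep restricted to a few first-party providers (the provider subset is an arbitrary illustration):

    ```bash
    # Gateway smoke over selected providers only, with the curated cap lifted.
    export OPENCLAW_LIVE_GATEWAY_PROVIDERS="openai,anthropic,google"
    export OPENCLAW_LIVE_GATEWAY_MAX_MODELS=0   # 0 = exhaustive modern sweep
    pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
    ```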

    tip

    To see what you can test on your machine (and the exact `provider/model` ids), run:
    ```bash
    openclaw models list
    openclaw models list --json
    ```

    Live: CLI backend smoke (Claude, Codex, Gemini, or other local CLIs)

    • Test: `src/gateway/gateway-cli-backend.live.test.ts`
    • Goal: validate the Gateway + agent pipeline using a local CLI backend, without touching your default config.
    • Backend-specific smoke defaults live with the owning extension's `cli-backend.ts` definition.
    • Enable:
      • `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)
      • `OPENCLAW_LIVE_CLI_BACKEND=1`
    • Defaults:
      • Default provider/model: `claude-cli/claude-sonnet-4-6`
      • Command/args/image behavior come from the owning CLI backend plugin metadata.
    • Overrides (optional):
      • `OPENCLAW_LIVE_CLI_BACKEND_MODEL="codex-cli/gpt-5.5"`
      • `OPENCLAW_LIVE_CLI_BACKEND_COMMAND="/full/path/to/codex"`
      • `OPENCLAW_LIVE_CLI_BACKEND_ARGS='["exec","--json","--color","never","--sandbox","read-only","--skip-git-repo-check"]'`
      • `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_PROBE=1` to send a real image attachment (paths are injected into the prompt). Docker recipes default this off unless explicitly requested.
      • `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_ARG="--image"` to pass image file paths as CLI args instead of prompt injection.
      • `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_MODE="repeat"` (or `"list"`) to control how image args are passed when `IMAGE_ARG` is set.
      • `OPENCLAW_LIVE_CLI_BACKEND_RESUME_PROBE=1` to send a second turn and validate the resume flow.
      • `OPENCLAW_LIVE_CLI_BACKEND_MODEL_SWITCH_PROBE=1` to opt into the Claude Sonnet → Opus same-session continuity probe when the selected model supports a switch target. Docker recipes default this off for aggregate reliability.
      • `OPENCLAW_LIVE_CLI_BACKEND_MCP_PROBE=1` to opt into the MCP/tool loopback probe. Docker recipes default this off unless explicitly requested.

    Example:

    ```bash
    OPENCLAW_LIVE_CLI_BACKEND=1 \
    OPENCLAW_LIVE_CLI_BACKEND_MODEL="codex-cli/gpt-5.5" \
    pnpm test:live src/gateway/gateway-cli-backend.live.test.ts
    ```

    Cheap Gemini MCP config smoke:

    ```bash
    OPENCLAW_LIVE_TEST=1 \
    pnpm test:live src/agents/cli-runner/bundle-mcp.gemini.live.test.ts
    ```

    This does not ask Gemini to generate a response. It writes the same system settings OpenClaw gives Gemini, then runs `gemini --debug mcp list` to prove a saved `transport: "streamable-http"` server is normalized to Gemini's HTTP MCP shape and can connect to a local streamable-HTTP MCP server.

    Docker recipe:

    ```bash
    pnpm test:docker:live-cli-backend
    ```

    Single-provider Docker recipes:

    ```bash
    pnpm test:docker:live-cli-backend:claude
    pnpm test:docker:live-cli-backend:claude-subscription
    pnpm test:docker:live-cli-backend:codex
    pnpm test:docker:live-cli-backend:gemini
    ```

    Notes:

    • The Docker runner lives at `scripts/test-live-cli-backend-docker.sh`.
    • It runs the live CLI-backend smoke inside the repo Docker image as the non-root `node` user.
    • It resolves CLI smoke metadata from the owning extension, then installs the matching Linux CLI package (`@anthropic-ai/claude-code`, `@openai/codex`, or `@google/gemini-cli`) into a cached writable prefix at `OPENCLAW_DOCKER_CLI_TOOLS_DIR` (default: `~/.cache/openclaw/docker-cli-tools`).
    • `pnpm test:docker:live-cli-backend:claude-subscription` requires portable Claude Code subscription OAuth through either `~/.claude/.credentials.json` with `claudeAiOauth.subscriptionType` or `CLAUDE_CODE_OAUTH_TOKEN` from `claude setup-token`. It first proves direct `claude -p` in Docker, then runs two Gateway CLI-backend turns without preserving Anthropic API-key env vars. This subscription lane disables the Claude MCP/tool and image probes by default because Claude currently routes third-party app usage through extra-usage billing instead of normal subscription plan limits.
    • The live CLI-backend smoke now exercises the same end-to-end flow for Claude, Codex, and Gemini: a text turn, an image classification turn, then an MCP `cron` tool call verified through the gateway CLI.
    • Claude's default smoke also patches the session from Sonnet to Opus and verifies the resumed session still remembers an earlier note.
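    As a sketch of the subscription lane, assuming you already ran `claude setup-token` and exported its output (the token value below is a placeholder):

    ```bash
    # Subscription-auth lane: uses CLAUDE_CODE_OAUTH_TOKEN rather than an API key.
    # Placeholder token; substitute the real output of `claude setup-token`.
    export CLAUDE_CODE_OAUTH_TOKEN="sk-ant-oat-placeholder"
    pnpm test:docker:live-cli-backend:claude-subscription
    ```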

    Live: ACP bind smoke (`/acp spawn ... --bind here`)

    • Test: `src/gateway/gateway-acp-bind.live.test.ts`
    • Goal: validate the real ACP conversation-bind flow with a live ACP agent:
      • send `/acp spawn <agent> --bind here`
      • bind a synthetic message-channel conversation in place
      • send a normal follow-up on that same conversation
      • verify the follow-up lands in the bound ACP session transcript
    • Enable:
      • `pnpm test:live src/gateway/gateway-acp-bind.live.test.ts`
      • `OPENCLAW_LIVE_ACP_BIND=1`
    • Defaults:
      • ACP agents in Docker: `claude,codex,gemini`
      • ACP agent for direct `pnpm test:live ...`: `claude`
      • Synthetic channel: Slack DM-style conversation context
      • ACP backend: `acpx`
    • Overrides:
      • `OPENCLAW_LIVE_ACP_BIND_AGENT=claude`
      • `OPENCLAW_LIVE_ACP_BIND_AGENT=codex`
      • `OPENCLAW_LIVE_ACP_BIND_AGENT=droid`
      • `OPENCLAW_LIVE_ACP_BIND_AGENT=gemini`
      • `OPENCLAW_LIVE_ACP_BIND_AGENT=opencode`
      • `OPENCLAW_LIVE_ACP_BIND_AGENTS=claude,codex,gemini`
      • `OPENCLAW_LIVE_ACP_BIND_AGENT_COMMAND='npx -y @agentclientprotocol/claude-agent-acp@<version>'`
      • `OPENCLAW_LIVE_ACP_BIND_CODEX_MODEL=gpt-5.5`
      • `OPENCLAW_LIVE_ACP_BIND_OPENCODE_MODEL=opencode/kimi-k2.6`
      • `OPENCLAW_LIVE_ACP_BIND_REQUIRE_TRANSCRIPT=1`
      • `OPENCLAW_LIVE_ACP_BIND_REQUIRE_CRON=1`
      • `OPENCLAW_LIVE_ACP_BIND_PARENT_MODEL=openai/gpt-5.5`
    • Notes:
      • This lane uses the gateway `chat.send` surface with admin-only synthetic originating-route fields so tests can attach message-channel context without pretending to deliver externally.
      • When `OPENCLAW_LIVE_ACP_BIND_AGENT_COMMAND` is unset, the test uses the embedded `acpx` plugin's built-in agent registry for the selected ACP harness agent.
      • Bound-session cron MCP creation is best-effort by default because external ACP harnesses can cancel MCP calls after the bind/image proof has passed; set `OPENCLAW_LIVE_ACP_BIND_REQUIRE_CRON=1` to make that post-bind cron probe strict.

    Example:

    ```bash
    OPENCLAW_LIVE_ACP_BIND=1 \
    OPENCLAW_LIVE_ACP_BIND_AGENT=claude \
    pnpm test:live src/gateway/gateway-acp-bind.live.test.ts
    ```

    Docker recipe:

    ```bash
    pnpm test:docker:live-acp-bind
    ```

    Single-agent Docker recipes:

    ```bash
    pnpm test:docker:live-acp-bind:claude
    pnpm test:docker:live-acp-bind:codex
    pnpm test:docker:live-acp-bind:droid
    pnpm test:docker:live-acp-bind:gemini
    pnpm test:docker:live-acp-bind:opencode
    ```

    Docker notes:

    • The Docker runner lives at `scripts/test-live-acp-bind-docker.sh`.
    • By default, it runs the ACP bind smoke against the aggregate live CLI agents in sequence: `claude`, `codex`, then `gemini`.
    • Use `OPENCLAW_LIVE_ACP_BIND_AGENTS=claude`, `OPENCLAW_LIVE_ACP_BIND_AGENTS=codex`, `OPENCLAW_LIVE_ACP_BIND_AGENTS=droid`, `OPENCLAW_LIVE_ACP_BIND_AGENTS=gemini`, or `OPENCLAW_LIVE_ACP_BIND_AGENTS=opencode` to narrow the matrix.
    • It sources `~/.profile`, stages the matching CLI auth material into the container, then installs the requested live CLI (`@anthropic-ai/claude-code`, `@openai/codex`, Factory Droid via `https://app.factory.ai/cli`, `@google/gemini-cli`, or `opencode-ai`) if missing. The ACP backend itself is the bundled embedded `acpx/runtime` package from the `acpx` plugin.
    • The Droid Docker variant stages `~/.factory` for settings, forwards `FACTORY_API_KEY`, and requires that API key because local Factory OAuth/keyring auth is not portable into the container. It uses ACPX's built-in `droid exec --output-format acp` registry entry.
    • The OpenCode Docker variant is a strict single-agent regression lane. It writes a temporary `OPENCODE_CONFIG_CONTENT` default model from `OPENCLAW_LIVE_ACP_BIND_OPENCODE_MODEL` (default `opencode/kimi-k2.6`) after sourcing `~/.profile`, and `pnpm test:docker:live-acp-bind:opencode` requires a bound assistant transcript instead of accepting the generic post-bind skip.
    • Direct `acpx` CLI calls are only a manual/workaround path for comparing behavior outside the Gateway. The Docker ACP bind smoke exercises OpenClaw's embedded `acpx` runtime backend.
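    For instance, narrowing the aggregate Docker runner to one agent via the documented matrix variable (equivalent in spirit to the per-agent recipes above):

    ```bash
    # Narrow the aggregate ACP bind matrix to Codex only.
    export OPENCLAW_LIVE_ACP_BIND_AGENTS=codex
    pnpm test:docker:live-acp-bind
    ```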

    Live: Codex app-server harness smoke

    • Goal: validate the plugin-owned Codex harness through the normal gateway `agent` method:
      • load the bundled `codex` plugin
      • select `OPENCLAW_AGENT_RUNTIME=codex`
      • send a first gateway agent turn to `openai/gpt-5.5` with the Codex harness forced
      • send a second turn to the same OpenClaw session and verify the app-server thread can resume
      • run `/codex status` and `/codex models` through the same gateway command path
      • optionally run two Guardian-reviewed escalated shell probes: one benign command that should be approved, and one fake-secret upload that should be denied so the agent asks back
    • Test: `src/gateway/gateway-codex-harness.live.test.ts`
    • Enable: `OPENCLAW_LIVE_CODEX_HARNESS=1`
    • Default model: `openai/gpt-5.5`
    • Optional image probe: `OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=1`
    • Optional MCP/tool probe: `OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=1`
    • Optional Guardian probe: `OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=1`
    • The smoke sets `OPENCLAW_AGENT_HARNESS_FALLBACK=none` so a broken Codex harness cannot pass by silently falling back to PI.
    • Auth: Codex app-server auth comes from the local Codex subscription login. Docker smokes can also provide `OPENAI_API_KEY` for non-Codex probes when applicable, plus optionally copied `~/.codex/auth.json` and `~/.codex/config.toml`.

    Local recipe:

    ```bash
    source ~/.profile
    OPENCLAW_LIVE_CODEX_HARNESS=1 \
    OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=1 \
    OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=1 \
    OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=1 \
    OPENCLAW_LIVE_CODEX_HARNESS_MODEL=openai/gpt-5.5 \
    pnpm test:live -- src/gateway/gateway-codex-harness.live.test.ts
    ```

    Docker recipe:

    ```bash
    source ~/.profile
    pnpm test:docker:live-codex-harness
    ```

    Docker notes:

    • The Docker runner lives at `scripts/test-live-codex-harness-docker.sh`.
    • It sources the mounted `~/.profile`, passes `OPENAI_API_KEY`, copies Codex CLI auth files when present, installs `@openai/codex` into a writable mounted npm prefix, stages the source tree, then runs only the Codex-harness live test.
    • Docker enables the image, MCP/tool, and Guardian probes by default. Set `OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=0`, `OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=0`, or `OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=0` when you need a narrower debug run.
    • Docker also exports `OPENCLAW_AGENT_HARNESS_FALLBACK=none`, matching the live test config so legacy aliases or PI fallback cannot hide a Codex harness regression.
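    A narrower Docker debug pass using the probe kill switches above might look like (which probes to keep is situational):

    ```bash
    # Text-focused debug run: skip the Guardian and image probes.
    export OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=0
    export OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=0
    pnpm test:docker:live-codex-harness
    ```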

    Recommended live recipes

    Narrow, explicit allowlists are fastest and least flaky:

    • Single model, direct (no gateway):
      • `OPENCLAW_LIVE_MODELS="openai/gpt-5.5" pnpm test:live src/agents/models.profiles.live.test.ts`
    • Single model, gateway smoke:
      • `OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.5" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
    • Tool calling across several providers:
      • `OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,google/gemini-3-flash-preview,deepseek/deepseek-v4-flash,zai/glm-5.1,minimax/MiniMax-M2.7" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
    • Google focus (Gemini API key + Antigravity):
      • Gemini (API key): `OPENCLAW_LIVE_GATEWAY_MODELS="google/gemini-3-flash-preview" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
      • Antigravity (OAuth): `OPENCLAW_LIVE_GATEWAY_MODELS="google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-pro-high" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
    • Google adaptive thinking smoke:
      • If local keys live in your shell profile: `source ~/.profile`
      • Gemini 3 dynamic default: `pnpm openclaw qa manual --provider-mode live-frontier --model google/gemini-3.1-pro-preview --alt-model google/gemini-3.1-pro-preview --message '/think adaptive Reply exactly: GEMINI_ADAPTIVE_OK' --timeout-ms 180000`
      • Gemini 2.5 dynamic budget: `pnpm openclaw qa manual --provider-mode live-frontier --model google/gemini-2.5-flash --alt-model google/gemini-2.5-flash --message '/think adaptive Reply exactly: GEMINI25_ADAPTIVE_OK' --timeout-ms 180000`

    Notes:

    • `google/...` uses the Gemini API (API key).
    • `google-antigravity/...` uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).
    • `google-gemini-cli/...` uses the local Gemini CLI on your machine (separate auth + tooling quirks).
    • Gemini API vs Gemini CLI:
      • API: OpenClaw calls Google’s hosted Gemini API over HTTP (API key / profile auth); this is what most users mean by “Gemini”.
      • CLI: OpenClaw shells out to a local `gemini` binary; it has its own auth and can behave differently (streaming/tool support/version skew).
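    The three Google prefixes above route to different backends. As a sketch only (the function name and route labels here are illustrative, not OpenClaw APIs), the dispatch is a simple prefix match:

    ```shell
    #!/bin/sh
    # Illustrative sketch of the provider-prefix routing described above.
    # The more specific prefixes must be matched before the bare "google/".
    route_for() {
      case "$1" in
        google-antigravity/*) echo "antigravity-oauth-bridge" ;;
        google-gemini-cli/*)  echo "local-gemini-cli" ;;
        google/*)             echo "hosted-gemini-api" ;;
        *)                    echo "other-provider" ;;
      esac
    }

    route_for "google/gemini-3-flash-preview"          # hosted Gemini API (API key)
    route_for "google-antigravity/gemini-3-pro-high"   # Antigravity OAuth bridge
    ```

    Note the case-branch ordering: because every route shares the `google` stem, the bare `google/*` pattern has to come last or it would shadow the others.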

    Live: model matrix (what we cover)

    There is no fixed “CI model list” (live is opt-in), but these are the recommended models to cover regularly on a dev machine with keys.

    Modern smoke set (tool calling + image)

    This is the “common models” run we expect to keep working:

    • OpenAI (non-Codex): `openai/gpt-5.5`
    • OpenAI Codex OAuth: `openai-codex/gpt-5.5`
    • Anthropic: `anthropic/claude-opus-4-6` (or `anthropic/claude-sonnet-4-6`)
    • Google (Gemini API): `google/gemini-3.1-pro-preview` and `google/gemini-3-flash-preview` (avoid older Gemini 2.x models)
    • Google (Antigravity): `google-antigravity/claude-opus-4-6-thinking` and `google-antigravity/gemini-3-flash`
    • DeepSeek: `deepseek/deepseek-v4-flash` and `deepseek/deepseek-v4-pro`
    • Z.AI (GLM): `zai/glm-5.1`
    • MiniMax: `minimax/MiniMax-M2.7`

    Run gateway smoke with tools + image:

    ```bash
    OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,google/gemini-3.1-pro-preview,google/gemini-3-flash-preview,google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-flash,deepseek/deepseek-v4-flash,zai/glm-5.1,minimax/MiniMax-M2.7" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
    ```
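    Before kicking off a long allowlist run, it can help to eyeball which provider families a comma-separated model list actually covers. A minimal sketch (the helper name is ours, not part of OpenClaw):

    ```shell
    #!/bin/sh
    # List the unique provider prefixes in a comma-separated model allowlist.
    providers_in() {
      printf '%s\n' "$1" | tr ',' '\n' | cut -d/ -f1 | sort -u
    }

    providers_in "openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,google/gemini-3-flash-preview"
    # → anthropic, google, openai, openai-codex (one per line)
    ```

    Aggregator ids like `openrouter/google/...` intentionally collapse to the aggregator prefix, since that is the provider whose credentials the run needs.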

    Baseline: tool calling (Read + optional Exec)

    Pick at least one per provider family:

    • OpenAI: `openai/gpt-5.5`
    • Anthropic: `anthropic/claude-opus-4-6` (or `anthropic/claude-sonnet-4-6`)
    • Google: `google/gemini-3-flash-preview` (or `google/gemini-3.1-pro-preview`)
    • DeepSeek: `deepseek/deepseek-v4-flash`
    • Z.AI (GLM): `zai/glm-5.1`
    • MiniMax: `minimax/MiniMax-M2.7`

    Optional additional coverage (nice to have):

    • xAI: `xai/grok-4` (or latest available)
    • Mistral: `mistral/`… (pick one “tools”-capable model you have enabled)
    • Cerebras: `cerebras/`… (if you have access)
    • LM Studio: `lmstudio/`… (local; tool calling depends on API mode)

    Vision: image send (attachment → multimodal message)

    Include at least one image-capable model in `OPENCLAW_LIVE_GATEWAY_MODELS` (Claude/Gemini/OpenAI vision-capable variants, etc.) to exercise the image probe.

    Aggregators / alternate gateways

    If you have keys enabled, we also support testing via:

    • OpenRouter: `openrouter/...` (hundreds of models; use `openclaw models scan` to find tool+image capable candidates)
    • OpenCode: `opencode/...` for Zen and `opencode-go/...` for Go (auth via `OPENCODE_API_KEY` / `OPENCODE_ZEN_API_KEY`)

    More providers you can include in the live matrix (if you have creds/config):

    • Built-in: `openai`, `openai-codex`, `anthropic`, `google`, `google-vertex`, `google-antigravity`, `google-gemini-cli`, `zai`, `openrouter`, `opencode`, `opencode-go`, `xai`, `groq`, `cerebras`, `mistral`, `github-copilot`
    • Via `models.providers` (custom endpoints): `minimax` (cloud/API), plus any OpenAI/Anthropic-compatible proxy (LM Studio, vLLM, LiteLLM, etc.)

    tip

    Do not hardcode "all models" in docs. The authoritative list is whatever `discoverModels(...)` returns on your machine plus whatever keys are available.

    Credentials (never commit)

    Live tests discover credentials the same way the CLI does. Practical implications:

    • If the CLI works, live tests should find the same keys.

    • If a live test says “no creds”, debug the same way you’d debug `openclaw models list` / model selection.

    • Per-agent auth profiles: `~/.openclaw/agents/<agentId>/agent/auth-profiles.json` (this is what “profile keys” means in the live tests)

    • Config: `~/.openclaw/openclaw.json` (or `OPENCLAW_CONFIG_PATH`)

    • Legacy state dir: `~/.openclaw/credentials/` (copied into the staged live home when present, but not the main profile-key store)

    • Live local runs copy the active config, per-agent `auth-profiles.json` files, legacy `credentials/`, and supported external CLI auth dirs into a temp test home by default; staged live homes skip `workspace/` and `sandboxes/`, and `agents.*.workspace` / `agentDir` path overrides are stripped so probes stay off your real host workspace.
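    The credential sources above can be probed with a few lines of shell. This is a sketch that only checks the paths listed in this section; the helper name and output labels are illustrative:

    ```shell
    #!/bin/sh
    # Print which OpenClaw credential sources exist under a state dir
    # (defaults to ~/.openclaw). Labels are illustrative, not OpenClaw output.
    list_auth_sources() {
      root="${1:-$HOME/.openclaw}"
      [ -f "$root/openclaw.json" ] && echo "config: $root/openclaw.json"
      for p in "$root"/agents/*/agent/auth-profiles.json; do
        [ -f "$p" ] && echo "profile: $p"
      done
      [ -d "$root/credentials" ] && echo "legacy: $root/credentials"
      return 0
    }

    list_auth_sources
    ```

    If this prints nothing, the live tests will likely report “no creds” too, which is the same failure mode as `openclaw models list` finding no providers.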

    If you want to rely on env keys (e.g. exported in your `~/.profile`), run local tests after `source ~/.profile`, or use the Docker runners below (they can mount `~/.profile` into the container).
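    A quick way to confirm that your login-shell keys actually made it into the current environment before a live run (the helper name is ours; substitute whichever provider vars you rely on):

    ```shell
    #!/bin/sh
    # Report whether each named env var is set and non-empty.
    check_keys() {
      for var in "$@"; do
        if [ -n "$(printenv "$var")" ]; then
          echo "$var: set"
        else
          echo "$var: missing"
        fi
      done
    }

    # Example vars; the exact names depend on which providers you test.
    check_keys OPENAI_API_KEY GEMINI_API_KEY ANTHROPIC_API_KEY
    ```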

    Deepgram live (audio transcription)

    • Test: `extensions/deepgram/audio.live.test.ts`
    • Enable:

      ```bash
      DEEPGRAM_API_KEY=... DEEPGRAM_LIVE_TEST=1 pnpm test:live extensions/deepgram/audio.live.test.ts
      ```

    BytePlus coding plan live

    • Test: `extensions/byteplus/live.test.ts`
    • Enable:

      ```bash
      BYTEPLUS_API_KEY=... BYTEPLUS_LIVE_TEST=1 pnpm test:live extensions/byteplus/live.test.ts
      ```

    • Optional model override: `BYTEPLUS_CODING_MODEL=ark-code-latest`

    ComfyUI workflow media live

    • Test: `extensions/comfy/comfy.live.test.ts`
    • Enable:

      ```bash
      OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
      ```

    • Scope:
      • Exercises the bundled comfy image, video, and `music_generate` paths
      • Skips each capability unless `plugins.entries.comfy.config.<capability>` is configured
      • Useful after changing comfy workflow submission, polling, downloads, or plugin registration

    Image generation live

    • Test: `test/image-generation.runtime.live.test.ts`
    • Command:

      ```bash
      pnpm test:live test/image-generation.runtime.live.test.ts
      ```

    • Harness:

      ```bash
      pnpm test:live:media image
      ```

    • Scope:
      • Enumerates every registered image-generation provider plugin
      • Loads missing provider env vars from your login shell (`~/.profile`) before probing
      • Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
      • Skips providers with no usable auth/profile/model
      • Runs each configured provider through the shared image-generation runtime:
        • `<provider>:generate`
        • `<provider>:edit` when the provider declares edit support
    • Current bundled providers covered: `deepinfra`, `fal`, `google`, `minimax`, `openai`, `openrouter`, `vydra`, `xai`
    • Optional narrowing:
      • `OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="openai,google,openrouter,xai"`
      • `OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="deepinfra"`
      • `OPENCLAW_LIVE_IMAGE_GENERATION_MODELS="openai/gpt-image-2,google/gemini-3.1-flash-image-preview,openrouter/google/gemini-3.1-flash-image-preview,xai/grok-imagine-image"`
      • `OPENCLAW_LIVE_IMAGE_GENERATION_CASES="google:flash-generate,google:pro-edit,openrouter:generate,xai:default-generate,xai:default-edit"`
    • Optional auth behavior:
      • `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
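    The `provider:case` filter shown in `OPENCLAW_LIVE_IMAGE_GENERATION_CASES` follows an exact-match, comma-separated shape. A sketch of that matching semantics as described above (the function is illustrative, not OpenClaw code; we assume an empty filter admits every case):

    ```shell
    #!/bin/sh
    # Succeed when a provider/case pair passes a comma-separated
    # "provider:case" filter; an empty filter admits everything.
    case_matches() {
      filter="$1" provider="$2" case_id="$3"
      if [ -z "$filter" ]; then
        return 0
      fi
      printf '%s\n' "$filter" | tr ',' '\n' | grep -qx "$provider:$case_id"
    }

    filter="google:flash-generate,xai:default-edit"
    case_matches "$filter" google flash-generate && echo "google:flash-generate runs"
    case_matches "$filter" xai default-generate || echo "xai:default-generate skipped"
    ```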

    For the shipped CLI path, add an `infer` smoke after the provider/runtime live test passes:

    ```bash
    OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_INFER_CLI_TEST=1 pnpm test:live -- test/image-generation.infer-cli.live.test.ts
    openclaw infer image providers --json
    openclaw infer image generate \
      --model google/gemini-3.1-flash-image-preview \
      --prompt "Minimal flat test image: one blue square on a white background, no text." \
      --output ./openclaw-infer-image-smoke.png \
      --json
    ```

    This covers CLI argument parsing, config/default-agent resolution, bundled plugin activation, on-demand bundled runtime-dependency repair, the shared image-generation runtime, and the live provider request.

    Music generation live

    • Test: `extensions/music-generation-providers.live.test.ts`
    • Enable:

      ```bash
      OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
      ```

    • Harness:

      ```bash
      pnpm test:live:media music
      ```

    • Scope:
      • Exercises the shared bundled music-generation provider path
      • Currently covers Google and MiniMax
      • Loads provider env vars from your login shell (`~/.profile`) before probing
      • Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
      • Skips providers with no usable auth/profile/model
      • Runs both declared runtime modes when available:
        • `generate` with prompt-only input
        • `edit` when the provider declares `capabilities.edit.enabled`
      • Current shared-lane coverage:
        • `google`: `generate`, `edit`
        • `minimax`: `generate`
        • `comfy`: separate Comfy live file, not this shared sweep
    • Optional narrowing:
      • `OPENCLAW_LIVE_MUSIC_GENERATION_PROVIDERS="google,minimax"`
      • `OPENCLAW_LIVE_MUSIC_GENERATION_MODELS="google/lyria-3-clip-preview,minimax/music-2.6"`
    • Optional auth behavior:
      • `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides

    Video generation live

    • Test: `extensions/video-generation-providers.live.test.ts`
    • Enable:

      ```bash
      OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts
      ```

    • Harness:

      ```bash
      pnpm test:live:media video
      ```

    • Scope:
      • Exercises the shared bundled video-generation provider path
      • Defaults to the release-safe smoke path: non-FAL providers, one text-to-video request per provider, one-second lobster prompt, and a per-provider operation cap from `OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS` (`180000` by default)
      • Skips FAL by default because provider-side queue latency can dominate release time; pass `--video-providers fal` or `OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="fal"` to run it explicitly
      • Loads provider env vars from your login shell (`~/.profile`) before probing
      • Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
      • Skips providers with no usable auth/profile/model
      • Runs only `generate` by default
      • Set `OPENCLAW_LIVE_VIDEO_GENERATION_FULL_MODES=1` to also run declared transform modes when available:
        • `imageToVideo` when the provider declares `capabilities.imageToVideo.enabled` and the selected provider/model accepts buffer-backed local image input in the shared sweep
        • `videoToVideo` when the provider declares `capabilities.videoToVideo.enabled` and the selected provider/model accepts buffer-backed local video input in the shared sweep
      • Current declared-but-skipped `imageToVideo` providers in the shared sweep:
        • `vydra`, because bundled `veo3` is text-only and bundled `kling` requires a remote image URL
      • Provider-specific Vydra coverage:

        ```bash
        OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_VYDRA_VIDEO=1 pnpm test:live -- extensions/vydra/vydra.live.test.ts
        ```

        That file runs `veo3` text-to-video plus a `kling` lane that uses a remote image URL fixture by default.
      • Current `videoToVideo` live coverage:
        • `runway`, only when the selected model is `runway/gen4_aleph`
      • Current declared-but-skipped `videoToVideo` providers in the shared sweep:
        • `alibaba`, `qwen`, `xai`, because those paths currently require remote `http(s)` / MP4 reference URLs
        • `google`, because the current shared Gemini/Veo lane uses local buffer-backed input and that path is not accepted in the shared sweep
        • `openai`, because the current shared lane lacks org-specific video inpaint/remix access guarantees
    • Optional narrowing:
      • `OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="deepinfra,google,openai,runway"`
      • `OPENCLAW_LIVE_VIDEO_GENERATION_MODELS="google/veo-3.1-fast-generate-preview,openai/sora-2,runway/gen4_aleph"`
      • `OPENCLAW_LIVE_VIDEO_GENERATION_SKIP_PROVIDERS=""` to include every provider in the default sweep, including FAL
      • `OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS=60000` to reduce each provider operation cap for an aggressive smoke run
    • Optional auth behavior:
      • `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
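    The timeout cap above follows the usual env-with-fallback pattern: the env var overrides, otherwise the `180000` ms default applies. A sketch (the variable name is from this section; the helper function is ours):

    ```shell
    #!/bin/sh
    # Resolve the per-provider operation cap: env override wins, else 180000 ms.
    resolve_timeout_ms() {
      echo "${OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS:-180000}"
    }

    resolve_timeout_ms
    OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS=60000 resolve_timeout_ms
    ```

    Note that `:-` (not `-`) is the right expansion here, so an empty-but-set variable still falls back to the default rather than producing an empty cap.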

    Media live harness

    • Command:

      ```bash
      pnpm test:live:media
      ```

    • Purpose:
      • Runs the shared image, music, and video live suites through one repo-native entrypoint
      • Auto-loads missing provider env vars from `~/.profile`
      • Auto-narrows each suite to providers that currently have usable auth by default
      • Reuses `scripts/test-live.mjs`, so heartbeat and quiet-mode behavior stay consistent
    • Examples:

      ```bash
      pnpm test:live:media
      pnpm test:live:media image video --providers openai,google,minimax
      pnpm test:live:media video --video-providers openai,runway --all-providers
      pnpm test:live:media music --quiet
      ```

    Related

    • Testing — unit, integration, QA, and Docker suites
