
Tests

- Full testing kit (suites, live, Docker): Testing
- `pnpm test:force`: Kills any lingering gateway process holding the default control port, then runs the full Vitest suite with an isolated gateway port so server tests don’t collide with a running instance. Use this when a prior gateway run left port 18789 occupied.
- `pnpm test:coverage`: Runs the unit suite with V8 coverage (via `vitest.unit.config.ts`). This is a loaded-file unit coverage gate, not whole-repo all-file coverage. Thresholds are 70% lines/functions/statements and 55% branches. Because `coverage.all` is false, the gate measures files loaded by the unit coverage suite instead of treating every split-lane source file as uncovered.
- `pnpm test:coverage:changed`: Runs unit coverage only for files changed since `origin/main`.
- `pnpm test:changed`: cheap smart changed test run. It runs precise targets from direct test edits, sibling `*.test.ts` files, explicit source mappings, and the local import graph. Broad/config/package changes are skipped unless they map to precise tests.
- `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed`: explicit broad changed test run. Use it when a test harness/config/package edit should fall back to Vitest's broader changed-test behavior.
- `pnpm changed:lanes`: shows the architectural lanes triggered by the diff against `origin/main`.
- `pnpm check:changed`: runs the smart changed check gate for the diff against `origin/main`. It runs typecheck, lint, and guard commands for the affected architectural lanes, but does not run Vitest tests. Use `pnpm test:changed` or explicit `pnpm test <target>` for test proof.
- `pnpm test`: routes explicit file/directory targets through scoped Vitest lanes. Untargeted runs use fixed shard groups and expand to leaf configs for local parallel execution; the extension group always expands to the per-extension shard configs instead of one giant root-project process.
- Test wrapper runs end with a short `[test] passed|failed|skipped ... in ...` summary. Vitest's own duration line stays the per-shard detail.
- Shared OpenClaw test state: use `src/test-utils/openclaw-test-state.ts` from Vitest when a test needs an isolated `HOME`, `OPENCLAW_STATE_DIR`, `OPENCLAW_CONFIG_PATH`, config fixture, workspace, agent dir, or auth-profile store.
- Process E2E helpers: use `test/helpers/openclaw-test-instance.ts` when a Vitest process-level E2E test needs a running Gateway, CLI env, log capture, and cleanup in one place.
- Docker/Bash E2E helpers: lanes that source `scripts/lib/docker-e2e-image.sh` can pass `docker_e2e_test_state_shell_b64 <label> <scenario>` into the container and decode it with `scripts/lib/openclaw-e2e-instance.sh`; multi-home scripts can pass `docker_e2e_test_state_function_b64` and call `openclaw_test_state_create <label> <scenario>` in each flow. Lower-level callers can use `scripts/lib/openclaw-test-state.mjs shell --label <name> --scenario <name>` for an in-container shell snippet, or `node scripts/lib/openclaw-test-state.mjs -- create --label <name> --scenario <name> --env-file <path> --json` for a sourceable host env file. The `--` before `create` keeps newer Node runtimes from treating `--env-file` as a Node flag. Docker/Bash lanes that launch a Gateway can source `scripts/lib/openclaw-e2e-instance.sh` inside the container for entrypoint resolution, mock OpenAI startup, Gateway foreground/background launch, readiness probes, state env export, log dumps, and process cleanup.
- Full, extension, and include-pattern shard runs update local timing data in `.artifacts/vitest-shard-timings.json`; later whole-config runs use those timings to balance slow and fast shards. Include-pattern CI shards append the shard name to the timing key, which keeps filtered shard timings visible without replacing whole-config timing data. Set `OPENCLAW_TEST_PROJECTS_TIMINGS=0` to ignore the local timing artifact.
- Selected `plugin-sdk` and `commands` test files now route through dedicated light lanes that keep only `test/setup.ts`, leaving runtime-heavy cases on their existing lanes.
- Source files with sibling tests map to that sibling before falling back to wider directory globs. Helper edits under `src/channels/plugins/contracts/test-helpers`, `src/plugin-sdk/test-helpers`, and `src/plugins/contracts` use a local import graph to run importing tests instead of broad-running every shard when the dependency path is precise.
- `auto-reply` now also splits into three dedicated configs (`core`, `top-level`, `reply`) so the reply harness does not dominate the lighter top-level status/token/helper tests.
- Base Vitest config now defaults to `pool: "threads"` and `isolate: false`, with the shared non-isolated runner enabled across the repo configs.
- `pnpm test:channels` runs `vitest.channels.config.ts`.
- `pnpm test:extensions` and `pnpm test extensions` run all extension/plugin shards. Heavy channel plugins, the browser plugin, and OpenAI run as dedicated shards; other plugin groups stay batched. Use `pnpm test extensions/<id>` for one bundled plugin lane.
- `pnpm test:perf:imports`: enables Vitest import-duration + import-breakdown reporting, while still using scoped lane routing for explicit file/directory targets.
- `pnpm test:perf:imports:changed`: same import profiling, but only for files changed since `origin/main`.
- `pnpm test:perf:changed:bench -- --ref <git-ref>` benchmarks the routed changed-mode path against the native root-project run for the same committed git diff.
- `pnpm test:perf:changed:bench -- --worktree` benchmarks the current worktree change set without committing first.
- `pnpm test:perf:profile:main`: writes a CPU profile for the Vitest main thread (`.artifacts/vitest-main-profile`).
- `pnpm test:perf:profile:runner`: writes CPU + heap profiles for the unit runner (`.artifacts/vitest-runner-profile`).
- `pnpm test:perf:groups --full-suite --allow-failures --output .artifacts/test-perf/baseline-before.json`: runs every full-suite Vitest leaf config serially and writes grouped duration data plus per-config JSON/log artifacts. The Test Performance Agent uses this as its baseline before attempting slow-test fixes.
- `pnpm test:perf:groups:compare .artifacts/test-perf/baseline-before.json .artifacts/test-perf/after-agent.json`: compares grouped reports after a performance-focused change.
- Gateway integration: opt-in via `OPENCLAW_TEST_INCLUDE_GATEWAY=1 pnpm test` or `pnpm test:gateway`.
- `pnpm test:e2e`: Runs gateway end-to-end smoke tests (multi-instance WS/HTTP/node pairing). Defaults to `threads` + `isolate: false` with adaptive workers in `vitest.e2e.config.ts`; tune with `OPENCLAW_E2E_WORKERS=<n>` and set `OPENCLAW_E2E_VERBOSE=1` for verbose logs.
- `pnpm test:live`: Runs provider live tests (minimax/zai). Requires API keys and `LIVE=1` (or provider-specific `*_LIVE_TEST=1`) to unskip.
- `pnpm test:docker:all`: Builds the shared live-test image, packs OpenClaw once as an npm tarball, builds/reuses a bare Node/Git runner image plus a functional image that installs that tarball into `/app`, then runs Docker smoke lanes with `OPENCLAW_SKIP_DOCKER_BUILD=1` through a weighted scheduler.
  - The bare image (`OPENCLAW_DOCKER_E2E_BARE_IMAGE`) is used for installer/update/plugin-dependency lanes; those lanes mount the prebuilt tarball instead of using copied repo sources. The functional image (`OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE`) is used for normal built-app functionality lanes.
  - `scripts/package-openclaw-for-docker.mjs` is the single local/CI package packer and validates the tarball plus `dist/postinstall-inventory.json` before Docker consumes it. Docker lane definitions live in `scripts/lib/docker-e2e-scenarios.mjs`; planner logic lives in `scripts/lib/docker-e2e-plan.mjs`; `scripts/test-docker-all.mjs` executes the selected plan. `node scripts/test-docker-all.mjs --plan-json` emits the scheduler-owned CI plan for selected lanes, image kinds, package/live-image needs, state scenarios, and credential checks without building or running Docker.
  - `OPENCLAW_DOCKER_ALL_PARALLELISM=<n>` controls process slots and defaults to 10; `OPENCLAW_DOCKER_ALL_TAIL_PARALLELISM=<n>` controls the provider-sensitive tail pool and defaults to 10. Heavy lane caps default to `OPENCLAW_DOCKER_ALL_LIVE_LIMIT=9`, `OPENCLAW_DOCKER_ALL_NPM_LIMIT=10`, and `OPENCLAW_DOCKER_ALL_SERVICE_LIMIT=7`; provider caps default to one heavy lane per provider via `OPENCLAW_DOCKER_ALL_LIVE_CLAUDE_LIMIT=4`, `OPENCLAW_DOCKER_ALL_LIVE_CODEX_LIMIT=4`, and `OPENCLAW_DOCKER_ALL_LIVE_GEMINI_LIMIT=4`. Use `OPENCLAW_DOCKER_ALL_WEIGHT_LIMIT` or `OPENCLAW_DOCKER_ALL_DOCKER_LIMIT` for larger hosts. If one lane exceeds the effective weight or resource cap on a low-parallelism host, it can still start from an empty pool and will run alone until it releases capacity. Lane starts are staggered by 2 seconds by default to avoid local Docker daemon create storms; override with `OPENCLAW_DOCKER_ALL_START_STAGGER_MS=<ms>`.
  - The runner preflights Docker by default, cleans stale OpenClaw E2E containers, emits active-lane status every 30 seconds, shares provider CLI tool caches between compatible lanes, retries transient live-provider failures once by default (`OPENCLAW_DOCKER_ALL_LIVE_RETRIES=<n>`), and stores lane timings in `.artifacts/docker-tests/lane-timings.json` for longest-first ordering on later runs. Use `OPENCLAW_DOCKER_ALL_DRY_RUN=1` to print the lane manifest without running Docker, `OPENCLAW_DOCKER_ALL_STATUS_INTERVAL_MS=<ms>` to tune status output, or `OPENCLAW_DOCKER_ALL_TIMINGS=0` to disable timing reuse.
  - Use `OPENCLAW_DOCKER_ALL_LIVE_MODE=skip` for deterministic/local lanes only or `OPENCLAW_DOCKER_ALL_LIVE_MODE=only` for live-provider lanes only; package aliases are `pnpm test:docker:local:all` and `pnpm test:docker:live:all`. Live-only mode merges main and tail live lanes into one longest-first pool so provider buckets can pack Claude, Codex, and Gemini work together.
  - The runner stops scheduling new pooled lanes after the first failure unless `OPENCLAW_DOCKER_ALL_FAIL_FAST=0` is set, and each lane has a 120-minute fallback timeout overrideable with `OPENCLAW_DOCKER_ALL_LANE_TIMEOUT_MS`; selected live/tail lanes use tighter per-lane caps. CLI backend Docker setup commands have their own timeout via `OPENCLAW_LIVE_CLI_BACKEND_SETUP_TIMEOUT_SECONDS` (default 180).
  - Per-lane logs, `summary.json`, `failures.json`, and phase timings are written under `.artifacts/docker-tests/<run-id>/`; use `pnpm test:docker:timings <summary.json>` to inspect slow lanes and `pnpm test:docker:rerun <run-id|summary.json|failures.json>` to print cheap targeted rerun commands.
- `pnpm test:docker:browser-cdp-snapshot`: Builds a Chromium-backed source E2E container, starts raw CDP plus an isolated Gateway, runs `browser doctor --deep`, and verifies CDP role snapshots include link URLs, cursor-promoted clickables, iframe refs, and frame metadata.
- CLI backend live Docker probes can be run as focused lanes, for example `pnpm test:docker:live-cli-backend:codex`, `pnpm test:docker:live-cli-backend:codex:resume`, or `pnpm test:docker:live-cli-backend:codex:mcp`. Claude and Gemini have matching `:resume` and `:mcp` aliases.
- `pnpm test:docker:openwebui`: Starts Dockerized OpenClaw + Open WebUI, signs in through Open WebUI, checks `/api/models`, then runs a real proxied chat through `/api/chat/completions`. Requires a usable live model key (for example OpenAI in `~/.profile`), pulls an external Open WebUI image, and is not expected to be CI-stable like the normal unit/e2e suites.
- `pnpm test:docker:mcp-channels`: Starts a seeded Gateway container and a second client container that spawns `openclaw mcp serve`, then verifies routed conversation discovery, transcript reads, attachment metadata, live event queue behavior, outbound send routing, and Claude-style channel + permission notifications over the real stdio bridge. The Claude notification assertion reads the raw stdio MCP frames directly so the smoke reflects what the bridge actually emits.
- `pnpm test:docker:upgrade-survivor`: Installs the packed OpenClaw tarball over a dirty old-user fixture, runs package update plus non-interactive doctor without live provider or channel keys, then starts a loopback Gateway and checks that agents, channel config, plugin allowlists, workspace/session files, stale plugin runtime-deps state, startup, and RPC status survive.
- `pnpm test:docker:published-upgrade-survivor`: Installs `openclaw@latest` by default, seeds realistic existing-user files without live provider or channel keys, configures that baseline with a baked `openclaw config set` command recipe, updates that published install to the packed OpenClaw tarball, runs non-interactive doctor, writes `.artifacts/upgrade-survivor/summary.json`, then starts a loopback Gateway and checks that configured intents, workspace/session files, stale plugin config/runtime-deps state, startup, and RPC status survive or repair cleanly. Override the baseline with `OPENCLAW_UPGRADE_SURVIVOR_BASELINE_SPEC`; Package Acceptance exposes the same value as `published_upgrade_survivor_baseline`.
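The `pnpm test:docker:all` entry above describes three scheduling behaviors: longest-first ordering from recorded lane timings, a weight cap that an oversized lane may still bypass when the pool is empty, and staggered lane starts. The following is a minimal sketch of those three ideas only. The `Lane` shape and the function names `orderLongestFirst`, `canStart`, and `staggerOffsets` are hypothetical and are not taken from `scripts/test-docker-all.mjs`; the real scheduler also applies per-resource and per-provider caps that are omitted here.

```typescript
// Hypothetical lane shape for illustration; the real lane definitions live in
// scripts/lib/docker-e2e-scenarios.mjs and may look different.
interface Lane {
  name: string;
  weight: number; // assumed relative cost of running the lane
  lastDurationMs: number; // assumed value read from a prior timing file
}

// Longest-first ordering: lanes with the largest recorded duration go first.
function orderLongestFirst(lanes: Lane[]): Lane[] {
  return [...lanes].sort((a, b) => b.lastDurationMs - a.lastDurationMs);
}

// Admission check: a lane fits when the pool stays within the weight limit.
// An oversized lane may still start when nothing else is running, so it runs
// alone instead of waiting forever.
function canStart(lane: Lane, active: Lane[], weightLimit: number): boolean {
  if (active.length === 0) return true;
  const used = active.reduce((sum, l) => sum + l.weight, 0);
  return used + lane.weight <= weightLimit;
}

// Start offsets for staggered launches; 2000 ms mirrors the documented default.
function staggerOffsets(count: number, staggerMs = 2000): number[] {
  return Array.from({ length: count }, (_, i) => i * staggerMs);
}
```

With a weight limit of 10, a lane of weight 12 is admitted only when `active` is empty, which matches the "run alone until it releases capacity" behavior described in the text.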

Local PR gate

For local PR land/gate checks, run:

- `pnpm check:changed`
- `pnpm check`
- `pnpm check:test-types`
- `pnpm build`
- `pnpm test`
- `pnpm check:docs`

If `pnpm test` flakes on a loaded host, rerun once before treating it as a regression, then isolate with `pnpm test <path/to/test>`. For memory-constrained hosts, use:

- `OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test`
- `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=/tmp/openclaw-vitest-cache pnpm test:changed`

Model latency bench (local keys)

Script: `scripts/bench-model.ts`

Usage:

- `source ~/.profile && pnpm tsx scripts/bench-model.ts --runs 10`
- Optional env: `MINIMAX_API_KEY`, `MINIMAX_BASE_URL`, `MINIMAX_MODEL`, `ANTHROPIC_API_KEY`
- Default prompt: “Reply with a single word: ok. No punctuation or extra text.”

Last run (2025-12-31, 20 runs):

- minimax median 1279ms (min 1114, max 2431)
- opus median 2454ms (min 1224, max 3170)

CLI startup bench

Script: `scripts/bench-cli-startup.ts`

Usage:

- `pnpm test:startup:bench`
- `pnpm test:startup:bench:smoke`
- `pnpm test:startup:bench:save`
- `pnpm test:startup:bench:update`
- `pnpm test:startup:bench:check`
- `pnpm tsx scripts/bench-cli-startup.ts`
- `pnpm tsx scripts/bench-cli-startup.ts --runs 12`
- `pnpm tsx scripts/bench-cli-startup.ts --preset real`
- `pnpm tsx scripts/bench-cli-startup.ts --preset real --case status --case gatewayStatus --runs 3`
- `pnpm tsx scripts/bench-cli-startup.ts --preset real --case tasksJson --case tasksListJson --case tasksAuditJson --runs 3`
- `pnpm tsx scripts/bench-cli-startup.ts --entry openclaw.mjs --entry-secondary dist/entry.js --preset all`
- `pnpm tsx scripts/bench-cli-startup.ts --preset all --output .artifacts/cli-startup-bench-all.json`
- `pnpm tsx scripts/bench-cli-startup.ts --preset real --case gatewayStatusJson --output .artifacts/cli-startup-bench-smoke.json`
- `pnpm tsx scripts/bench-cli-startup.ts --preset real --cpu-prof-dir .artifacts/cli-cpu`
- `pnpm tsx scripts/bench-cli-startup.ts --json`

Presets:

- `startup`: `--version`, `--help`, `health`, `health --json`, `status --json`, `status`
- `real`: `health`, `status`, `status --json`, `sessions`, `sessions --json`, `tasks --json`, `tasks list --json`, `tasks audit --json`, `agents list --json`, `gateway status`, `gateway status --json`, `gateway health --json`, `config get gateway.port`
- `all`: both presets

Output includes `sampleCount`, avg, p50, p95, min/max, exit-code/signal distribution, and max RSS summaries for each command. Optional `--cpu-prof-dir` / `--heap-prof-dir` writes V8 profiles per run so timing and profile capture use the same harness.
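To make the timing fields concrete, here is a minimal sketch of how `sampleCount`, avg, p50, p95, and min/max could be derived from a list of per-run durations. It assumes the nearest-rank percentile method; the method actually used by `scripts/bench-cli-startup.ts` is not stated in this page, and the `TimingSummary` type and `summarize` function are hypothetical names for illustration.

```typescript
// Hypothetical summary shape; field names follow the prose above, not the
// script's real JSON output.
interface TimingSummary {
  sampleCount: number;
  avg: number;
  p50: number;
  p95: number;
  min: number;
  max: number;
}

// Nearest-rank percentile over an ascending, non-empty array.
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  return sortedAsc[Math.max(0, rank - 1)];
}

// Summarize per-run durations (milliseconds) for one benchmarked command.
function summarize(samplesMs: number[]): TimingSummary {
  if (samplesMs.length === 0) {
    throw new Error("at least one sample is required");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const total = sorted.reduce((sum, value) => sum + value, 0);
  return {
    sampleCount: sorted.length,
    avg: total / sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    min: sorted[0],
    max: sorted[sorted.length - 1],
  };
}
```

With only a handful of runs, p95 under nearest-rank collapses to the maximum sample, which is one reason to raise `--runs` when tail latency matters.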

Saved output conventions:

- `pnpm test:startup:bench:smoke` writes the targeted smoke artifact at `.artifacts/cli-startup-bench-smoke.json`
- `pnpm test:startup:bench:save` writes the full-suite artifact at `.artifacts/cli-startup-bench-all.json` using `runs=5` and `warmup=1`
- `pnpm test:startup:bench:update` refreshes the checked-in baseline fixture at `test/fixtures/cli-startup-bench.json` using `runs=5` and `warmup=1`

Checked-in fixture:

- `test/fixtures/cli-startup-bench.json`
- Refresh with `pnpm test:startup:bench:update`
- Compare current results against the fixture with `pnpm test:startup:bench:check`

Onboarding E2E (Docker)

Docker is optional; this is only needed for containerized onboarding smoke tests.

Full cold-start flow in a clean Linux container:


```bash
scripts/e2e/onboard-docker.sh
```

This script drives the interactive wizard via a pseudo-tty, verifies config/workspace/session files, then starts the gateway and runs `openclaw health`.

QR import smoke (Docker)

Ensures the maintained QR runtime helper loads under the supported Docker Node runtimes (Node 24 default, Node 22 compatible):


```bash
pnpm test:docker:qr
```
