
CI pipeline

OpenClaw CI runs on every push to `main` and every pull request. The `preflight` job classifies the diff and turns expensive lanes off when only unrelated areas changed. Manual `workflow_dispatch` runs intentionally bypass smart scoping and fan out the full graph for release candidates and broad validation. Android lanes stay opt-in through `include_android`. Release-only plugin coverage lives in the separate `Plugin Prerelease` workflow and only runs from `Full Release Validation` or an explicit manual dispatch.

Pipeline overview

Job	Purpose	When it runs
`preflight`	Detect docs-only changes, changed scopes, changed extensions, and build the CI manifest	Always on non-draft pushes and PRs
`security-scm-fast`	Private key detection and workflow audit via `zizmor`	Always on non-draft pushes and PRs
`security-dependency-audit`	Dependency-free production lockfile audit against npm advisories	Always on non-draft pushes and PRs
`security-fast`	Required aggregate for the fast security jobs	Always on non-draft pushes and PRs
`check-dependencies`	Production Knip dependency-only pass plus the unused-file allowlist guard	Node-relevant changes
`build-artifacts`	Build `dist/`, Control UI, built-artifact checks, and reusable downstream artifacts	Node-relevant changes
`checks-fast-core`	Fast Linux correctness lanes such as bundled/plugin-contract/protocol checks	Node-relevant changes
`checks-fast-contracts-channels`	Sharded channel contract checks with a stable aggregate check result	Node-relevant changes
`checks-node-core-test`	Core Node test shards, excluding channel, bundled, contract, and extension lanes	Node-relevant changes
`check`	Sharded main local gate equivalent: prod types, lint, guards, test types, and strict smoke	Node-relevant changes
`check-additional`	Architecture, boundary, extension-surface guards, package-boundary, and gateway-watch shards	Node-relevant changes
`build-smoke`	Built-CLI smoke tests and startup-memory smoke	Node-relevant changes
`checks`	Verifier for built-artifact channel tests	Node-relevant changes
`checks-node-compat-node22`	Node 22 compatibility build and smoke lane	Manual CI dispatch for releases
`check-docs`	Docs formatting, lint, and broken-link checks	Docs changed
`skills-python`	Ruff + pytest for Python-backed skills	Python-skill-relevant changes
`checks-windows`	Windows-specific process/path tests plus shared runtime import specifier regressions	Windows-relevant changes
`macos-node`	macOS TypeScript test lane using the shared built artifacts	macOS-relevant changes
`macos-swift`	Swift lint, build, and tests for the macOS app	macOS-relevant changes
`android`	Android unit tests for both flavors plus one debug APK build	Android-relevant changes
`test-performance-agent`	Daily Codex slow-test optimization after trusted activity	Main CI success or manual dispatch

Fail-fast order

`preflight` decides which lanes exist at all. The `docs-scope` and `changed-scope` logic are steps inside this job, not standalone jobs.
`security-scm-fast`, `security-dependency-audit`, `security-fast`, `check`, `check-additional`, `check-docs`, and `skills-python` fail quickly without waiting on the heavier artifact and platform matrix jobs.
`build-artifacts` overlaps with the fast Linux lanes so downstream consumers can start as soon as the shared build is ready.
Heavier platform and runtime lanes fan out after that: `checks-fast-core`, `checks-fast-contracts-channels`, `checks-node-core-test`, `checks`, `checks-windows`, `macos-node`, `macos-swift`, and `android`.

GitHub may mark superseded jobs as `cancelled` when a newer push lands on the same PR or `main` ref. Treat that as CI noise unless the newest run for the same ref is also failing. Aggregate shard checks use `!cancelled() && always()` so they still report normal shard failures but do not queue after the whole workflow has already been superseded. The automatic CI concurrency key is versioned (`CI-v7-*`) so a GitHub-side zombie in an old queue group cannot indefinitely block newer main runs. Manual full-suite runs use `CI-manual-v1-*` and do not cancel in-progress runs.
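
The triage rule for cancelled runs can be written down as a small helper. This is only a sketch: the function name and the three result labels are hypothetical, and the conclusion strings are the ones GitHub reports for workflow runs.

```bash
# Hypothetical helper: decide whether a CI run needs attention.
# $1 = conclusion of the run you are looking at
# $2 = conclusion of the newest run for the same ref
triage_ci_run() {
  local this_run="$1" newest_run="$2"
  if [ "$this_run" != "cancelled" ]; then
    echo "inspect"          # a real result: read it directly
  elif [ "$newest_run" = "failure" ]; then
    echo "inspect-newest"   # superseded, but the replacement run is red too
  else
    echo "noise"            # superseded by a newer run that is not failing
  fi
}
```

The two conclusions could come from something like `gh run list --workflow ci.yml --branch main --limit 2 --json conclusion`, assuming the CI workflow file is `ci.yml` as in the dispatch examples.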

Scope and routing

Scope logic lives in `scripts/ci-changed-scope.mjs` and is covered by unit tests in `src/scripts/ci-changed-scope.test.ts`. Manual dispatch skips changed-scope detection and makes the preflight manifest act as if every scoped area changed.

CI workflow edits validate the Node CI graph plus workflow linting, but do not force Windows, Android, or macOS native builds by themselves; those platform lanes stay scoped to platform source changes.
CI routing-only edits, selected cheap core-test fixture edits, and narrow plugin contract helper/test-routing edits use a fast Node-only manifest path: `preflight`, security, and a single `checks-fast-core` task. That path skips build artifacts, Node 22 compatibility, channel contracts, full core shards, bundled-plugin shards, and additional guard matrices when the change is limited to the routing or helper surfaces the fast task exercises directly.
Windows Node checks are scoped to Windows-specific process/path wrappers, npm/pnpm/UI runner helpers, package manager config, and the CI workflow surfaces that execute that lane; unrelated source, plugin, install-smoke, and test-only changes stay on the Linux Node lanes.
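
To illustrate the idea of scoping by changed paths, here is a minimal sketch of a docs-only check. It is not the logic in `scripts/ci-changed-scope.mjs`; the function name, the result labels, and the path patterns (`docs/` and `*.md`) are assumptions chosen for the example.

```bash
# Hypothetical sketch: read changed paths on stdin and report whether only
# documentation changed. The patterns below are assumed, not the real rules.
classify_changed_paths() {
  local path seen=0
  while IFS= read -r path; do
    [ -z "$path" ] && continue
    seen=1
    case "$path" in
      docs/*|*.md) ;;               # documentation surface (assumed patterns)
      *) echo "node"; return 0 ;;   # anything else keeps the Node lanes on
    esac
  done
  if [ "$seen" -eq 1 ]; then echo "docs-only"; else echo "empty"; fi
}
```

A local call could look like `git diff --name-only origin/main...HEAD | classify_changed_paths`; for the real classifier, `pnpm changed:lanes` is the supported entry point.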

The slowest Node test families are split or balanced so each job stays small without over-reserving runners: channel contracts run as three weighted shards, small core unit lanes are paired, auto-reply runs as four balanced workers (with the reply subtree split into agent-runner, dispatch, and commands/state-routing shards), and agentic gateway/plugin configs are spread across the existing source-only agentic Node jobs instead of waiting on built artifacts. Broad browser, QA, media, and miscellaneous plugin tests use their dedicated Vitest configs instead of the shared plugin catch-all. Include-pattern shards record timing entries using the CI shard name, so `.artifacts/vitest-shard-timings.json` can distinguish a whole config from a filtered shard.

`check-additional` keeps package-boundary compile/canary work together and separates runtime topology architecture from gateway watch coverage; the boundary guard shard runs its small independent guards concurrently inside one job. Gateway watch, channel tests, and the core support-boundary shard run concurrently inside `build-artifacts` after `dist/` and `dist-runtime/` are already built.

Android CI runs both `testPlayDebugUnitTest` and `testThirdPartyDebugUnitTest` and then builds the Play debug APK. The third-party flavor has no separate source set or manifest; its unit-test lane still compiles the flavor with the SMS/call-log BuildConfig flags, while avoiding a duplicate debug APK packaging job on every Android-relevant push.

The `check-dependencies` shard runs `pnpm deadcode:dependencies` (a production Knip dependency-only pass pinned to the latest Knip version, with pnpm's minimum release age disabled for the `dlx` install) and `pnpm deadcode:unused-files`, which compares Knip's production unused-file findings against `scripts/deadcode-unused-files.allowlist.mjs`. The unused-file guard fails when a PR adds a new unreviewed unused file or leaves a stale allowlist entry, while preserving intentional dynamic plugin, generated, build, live-test, and package bridge surfaces that Knip cannot resolve statically.
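
The two failure conditions of an allowlist guard (a finding missing from the allowlist, and an allowlist entry that no longer matches a finding) amount to a set difference in both directions. The sketch below shows that comparison on two plain-text files with one path per line; it is an illustration only, since the real guard reads Knip output and a `.mjs` allowlist, and the function name is hypothetical.

```bash
# Hypothetical sketch of an allowlist guard.
# usage: check_unused_files FINDINGS_FILE ALLOWLIST_FILE   (one path per line)
check_unused_files() {
  local findings="$1" allowlist="$2" tmp status=0
  tmp=$(mktemp -d) || return 2
  sort -u "$findings" > "$tmp/found"
  sort -u "$allowlist" > "$tmp/allowed"
  # Only in the findings: unused files that nobody has reviewed yet.
  comm -23 "$tmp/found" "$tmp/allowed" | sed 's/^/unreviewed: /' > "$tmp/report"
  # Only in the allowlist: stale entries that no longer match a finding.
  comm -13 "$tmp/found" "$tmp/allowed" | sed 's/^/stale: /' >> "$tmp/report"
  if [ -s "$tmp/report" ]; then
    cat "$tmp/report"
    status=1
  fi
  rm -rf "$tmp"
  return "$status"
}
```

Failing on stale entries as well as new findings keeps the allowlist from growing into a permanent ignore list.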

Manual dispatches

Manual CI dispatches run the same job graph as normal CI but force every non-Android scoped lane on: Linux Node shards, bundled-plugin shards, channel contracts, Node 22 compatibility, `check`, `check-additional`, build smoke, docs checks, Python skills, Windows, macOS, and Control UI i18n. Standalone manual CI dispatches run Android only with `include_android=true`; the full release umbrella enables Android by passing `include_android=true`. Plugin prerelease static checks, the release-only `agentic-plugins` shard, the full extension batch sweep, and plugin prerelease Docker lanes are excluded from CI. The Docker prerelease suite runs only when `Full Release Validation` dispatches the separate `Plugin Prerelease` workflow with the release-validation gate enabled.

Manual runs use a unique concurrency group so a release-candidate full suite is not cancelled by another push or PR run on the same ref. The optional `target_ref` input lets a trusted caller run that graph against a branch, tag, or full commit SHA while using the workflow file from the selected dispatch ref.


bash
gh workflow run ci.yml --ref release/YYYY.M.D
gh workflow run ci.yml --ref main -f target_ref=<branch-or-sha> -f include_android=true
gh workflow run full-release-validation.yml --ref main -f ref=<branch-or-sha>

Runners

Runner	Jobs
`ubuntu-24.04`	`preflight`, fast security jobs and aggregates (`security-scm-fast`, `security-dependency-audit`, `security-fast`), fast protocol/contract/bundled checks, sharded channel contract checks, `check` shards except lint, `check-additional` shards and aggregates, Node test aggregate verifiers, docs checks, Python skills, workflow-sanity, labeler, auto-response; install-smoke preflight also uses GitHub-hosted Ubuntu so the Blacksmith matrix can queue earlier
`blacksmith-4vcpu-ubuntu-2404`	`CodeQL Critical Quality`, lower-weight extension shards, `checks-fast-core`, `checks-node-compat-node22`, `check-prod-types`, and `check-test-types`
`blacksmith-8vcpu-ubuntu-2404`	`build-artifacts`, build-smoke, Linux Node test shards, bundled plugin test shards, `android`
`blacksmith-16vcpu-ubuntu-2404`	`check-lint` (CPU-sensitive enough that 8 vCPU cost more than they saved); install-smoke Docker builds (32-vCPU queue time cost more than it saved)
`blacksmith-16vcpu-windows-2025`	`checks-windows`
`blacksmith-6vcpu-macos-latest`	`macos-node` on `openclaw/openclaw`; forks fall back to `macos-latest`
`blacksmith-12vcpu-macos-latest`	`macos-swift` on `openclaw/openclaw`; forks fall back to `macos-latest`

Local equivalents


bash
pnpm changed:lanes                            # inspect the local changed-lane classifier for origin/main...HEAD
pnpm check:changed                            # smart local check gate: changed typecheck/lint/guards by boundary lane
pnpm check                                    # fast local gate: prod tsgo + sharded lint + parallel fast guards
pnpm check:test-types
pnpm check:timed                              # same gate with per-stage timings
pnpm build:strict-smoke
pnpm check:architecture
pnpm test:gateway:watch-regression
pnpm test                                     # vitest tests
pnpm test:changed                             # cheap smart changed Vitest targets
pnpm test:channels
pnpm test:contracts:channels
pnpm check:docs                               # docs format + lint + broken links
pnpm build                                    # build dist when CI artifact/build-smoke lanes matter
pnpm ci:timings                               # summarize the latest origin/main push CI run
pnpm ci:timings:recent                        # compare recent successful main CI runs
node scripts/ci-run-timings.mjs <run-id>      # summarize wall time, queue time, and slowest jobs
node scripts/ci-run-timings.mjs --latest-main # ignore issue/comment noise and choose origin/main push CI
node scripts/ci-run-timings.mjs --recent 10   # compare recent successful main CI runs
pnpm test:perf:groups --full-suite --allow-failures --output .artifacts/test-perf/baseline-before.json
pnpm test:perf:groups:compare .artifacts/test-perf/baseline-before.json .artifacts/test-perf/after-agent.json

Full Release Validation

`Full Release Validation` is the manual umbrella workflow for "run everything before release." It accepts a branch, tag, or full commit SHA, dispatches the manual `CI` workflow with that target, dispatches `Plugin Prerelease` for release-only plugin/package/static/Docker proof, and dispatches `OpenClaw Release Checks` for install smoke, package acceptance, Docker release-path suites, live/E2E, OpenWebUI, QA Lab parity, Matrix, and Telegram lanes. It can also run the post-publish `NPM Telegram Beta E2E` workflow when a published package spec is provided.

See Full release validation for the stage matrix, exact workflow job names, profile differences, artifacts, and focused rerun handles.

`release_profile` controls live/provider breadth passed into release checks. The manual release workflows default to `stable`; use `full` only when you intentionally want the broad advisory provider/media matrix.

`minimum` keeps the fastest OpenAI/core release-critical lanes.
`stable` adds the stable provider/backend set.
`full` runs the broad advisory provider/media matrix.

The umbrella records the dispatched child run ids, and the final `Verify full validation` job re-checks current child run conclusions and appends slowest-job tables for each child run. If a child workflow is rerun and turns green, rerun only the parent verifier job to refresh the umbrella result and timing summary.

For recovery, both `Full Release Validation` and `OpenClaw Release Checks` accept `rerun_group`. Use `all` for a release candidate, `ci` for only the normal full CI child, `plugin-prerelease` for only the plugin prerelease child, `release-checks` for every release child, or a narrower group: `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`, `qa-parity`, `qa-live`, or `npm-telegram` on the umbrella. This keeps a failed release box rerun bounded after a focused fix.
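
As a sketch of how a focused rerun might be dispatched, the helper below validates the group name against the values listed above and prints the `gh` command instead of running it. The function name is hypothetical; the workflow file name and the `ref` input follow the manual dispatch examples earlier on this page, and passing `rerun_group` with `-f` is an assumption about how the input is exposed.

```bash
# Hypothetical helper: print (do not run) a focused umbrella rerun dispatch.
# usage: print_rerun_dispatch <ref> <rerun_group>
print_rerun_dispatch() {
  local ref="$1" group="$2"
  case "$group" in
    all|ci|plugin-prerelease|release-checks) ;;
    install-smoke|cross-os|live-e2e|package|qa|qa-parity|qa-live|npm-telegram) ;;
    *) echo "unknown rerun_group: $group" >&2; return 1 ;;
  esac
  echo "gh workflow run full-release-validation.yml --ref main -f ref=$ref -f rerun_group=$group"
}
```

Printing the command first makes it easy to review the ref and group before spending release-check runner time.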

`OpenClaw Release Checks` uses the trusted workflow ref to resolve the selected ref once into a `release-package-under-test` tarball, then passes that artifact to both the live/E2E release-path Docker workflow and the package acceptance shard. That keeps the package bytes consistent across release boxes and avoids repacking the same candidate in multiple child jobs.

Duplicate `Full Release Validation` runs for `ref=main` and `rerun_group=all` supersede the older umbrella. The parent monitor cancels any child workflow it has already dispatched when the parent is cancelled, so newer main validation does not sit behind a stale two-hour release-check run. Release branch/tag validation and focused rerun groups keep `cancel-in-progress: false`.

Live and E2E shards

The release live/E2E child keeps broad native `pnpm test:live` coverage, but it runs it as named shards through `scripts/test-live-shard.mjs` instead of one serial job:

`native-live-src-agents`
`native-live-src-gateway-core`
provider-filtered `native-live-src-gateway-profiles` jobs
`native-live-src-gateway-backends`
`native-live-test`
`native-live-extensions-a-k`
`native-live-extensions-l-n`
`native-live-extensions-openai`
`native-live-extensions-o-z-other`
`native-live-extensions-xai`
split media audio/video shards and provider-filtered music shards

That keeps the same file coverage while making slow live provider failures easier to rerun and diagnose. The aggregate `native-live-extensions-o-z`, `native-live-extensions-media`, and `native-live-extensions-media-music` shard names remain valid for manual one-shot reruns.

The native live media shards run in `ghcr.io/openclaw/openclaw-live-media-runner:ubuntu-24.04`, built by the `Live Media Runner Image` workflow. That image preinstalls `ffmpeg` and `ffprobe`; media jobs only verify the binaries before setup. Keep Docker-backed live suites on normal Blacksmith runners, because container jobs are the wrong place to launch nested Docker tests.

Docker-backed live model/backend shards use a separate shared `ghcr.io/openclaw/openclaw-live-test:<sha>` image per selected commit. The live release workflow builds and pushes that image once, then the Docker live model, provider-sharded gateway, CLI backend, ACP bind, and Codex harness shards run with `OPENCLAW_SKIP_DOCKER_BUILD=1`. Gateway Docker shards carry explicit script-level `timeout` caps below the workflow job timeout so a stuck container or cleanup path fails fast instead of consuming the whole release-check budget. If those shards rebuild the full source Docker target independently, the release run is misconfigured and will waste wall clock on duplicate image builds.

Package Acceptance

Use `Package Acceptance` when the question is "does this installable OpenClaw package work as a product?" It is different from normal CI: normal CI validates the source tree, while package acceptance validates a single tarball through the same Docker E2E harness users exercise after install or update.

Jobs

`resolve_package` checks out `workflow_ref`, resolves one package candidate, writes `.artifacts/docker-e2e-package/openclaw-current.tgz`, writes `.artifacts/docker-e2e-package/package-candidate.json`, uploads both as the `package-under-test` artifact, and prints the source, workflow ref, package ref, version, SHA-256, and profile in the GitHub step summary.
`docker_acceptance` calls `openclaw-live-and-e2e-checks-reusable.yml` with `ref=workflow_ref` and `package_artifact_name=package-under-test`. The reusable workflow downloads that artifact, validates the tarball inventory, prepares package-digest Docker images when needed, and runs the selected Docker lanes against that package instead of packing the workflow checkout. When a profile selects multiple targeted `docker_lanes`, the reusable workflow prepares the package and shared images once, then fans those lanes out as parallel targeted Docker jobs with unique artifacts.
`package_telegram` optionally calls `NPM Telegram Beta E2E`. It runs when `telegram_mode` is not `none` and installs the same `package-under-test` artifact when Package Acceptance resolved one; standalone Telegram dispatch can still install a published npm spec.
`summary` fails the workflow if package resolution, Docker acceptance, or the optional Telegram lane failed.

Candidate sources

`source=npm` accepts only `openclaw@beta`, `openclaw@latest`, or an exact OpenClaw release version such as `openclaw@2026.4.27-beta.2`. Use this for published beta/stable acceptance.
`source=ref` packs a trusted `package_ref` branch, tag, or full commit SHA. The resolver fetches OpenClaw branches/tags, verifies the selected commit is reachable from repository branch history or a release tag, installs deps in a detached worktree, and packs it with `scripts/package-openclaw-for-docker.mjs`.
`source=url` downloads an HTTPS `.tgz`; `package_sha256` is required.
`source=artifact` downloads one `.tgz` from `artifact_run_id` and `artifact_name`; `package_sha256` is optional but should be supplied for externally shared artifacts.
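
The input rules for `source=npm` and `source=url` can be pre-checked locally before dispatching. The sketch below is not the resolver's own validation: both function names are hypothetical, and the exact-version pattern (`YYYY.M.D` with an optional `-beta.N` suffix) is inferred from the example versions on this page.

```bash
# Hypothetical pre-flight checks for Package Acceptance inputs.

# Accept only openclaw@beta, openclaw@latest, or an exact release version.
is_allowed_npm_spec() {
  case "$1" in
    openclaw@beta|openclaw@latest) return 0 ;;
  esac
  # Assumed version shape: YYYY.M.D with an optional -beta.N suffix.
  printf '%s\n' "$1" |
    grep -Eq '^openclaw@[0-9]{4}\.[0-9]{1,2}\.[0-9]{1,2}(-beta\.[0-9]+)?$'
}

# A SHA-256 digest is 64 hexadecimal characters.
is_sha256() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{64}$'
}
```

For example, `is_allowed_npm_spec 'openclaw@^2026.4.0'` fails because ranges are not exact versions, which matches the "exact release version" wording above.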

Keep `workflow_ref` and `package_ref` separate. `workflow_ref` is the trusted workflow/harness code that runs the test. `package_ref` is the source commit that gets packed when `source=ref`. This lets the current test harness validate older trusted source commits without running old workflow logic.

Suite profiles

`smoke`: `npm-onboard-channel-agent`, `gateway-network`, `config-reload`
`package`: `npm-onboard-channel-agent`, `doctor-switch`, `update-channel-switch`, `upgrade-survivor`, `published-upgrade-survivor`, `bundled-channel-deps-compat`, `plugins-offline`, `plugin-update`
`product`: `package` plus `mcp-channels`, `cron-mcp-cleanup`, `openai-web-search-minimal`, `openwebui`
`full`: full Docker release-path chunks with OpenWebUI
`custom`: exact `docker_lanes`; required when `suite_profile=custom`

The `package` profile uses offline plugin coverage so published-package validation is not gated on live ClawHub availability. The optional Telegram lane reuses the `package-under-test` artifact in `NPM Telegram Beta E2E`, with the published npm spec path kept for standalone dispatches.

Release checks call Package Acceptance with `source=ref`, `package_ref=<release-ref>`, `workflow_ref=<release workflow ref>`, `suite_profile=custom`, `docker_lanes='bundled-channel-deps-compat plugins-offline'`, and `telegram_mode=mock-openai`. Release-path Docker chunks cover the overlapping package/update/plugin lanes; Package Acceptance keeps the artifact-native bundled-channel compat, offline plugin, and Telegram proof against the same resolved package tarball. Cross-OS release checks still cover OS-specific onboarding, installer, and platform behavior; package/update product validation should start with Package Acceptance.

The `published-upgrade-survivor` Docker lane validates one published package baseline per run. In Package Acceptance, the resolved `package-under-test` tarball is always the candidate and `published_upgrade_survivor_baseline` selects the published baseline, defaulting to `openclaw@latest`; failed-lane rerun commands preserve that baseline. Local runs can set `OPENCLAW_UPGRADE_SURVIVOR_BASELINE_SPEC` to an exact package such as `openclaw@2026.4.15`. The published lane configures the baseline with a baked `openclaw config set` command recipe, then records recipe steps in `summary.json`. Broader previous-version coverage should shard Package Acceptance across exact `published_upgrade_survivor_baseline` values.

The Windows packaged and installer fresh lanes also verify that an installed package can import a browser-control override from a raw absolute Windows path. The OpenAI cross-OS agent-turn smoke uses `OPENCLAW_CROSS_OS_OPENAI_MODEL` when set and otherwise defaults to `openai/gpt-5.4-mini`, so the install and gateway proof stays fast and deterministic.

Legacy compatibility windows

Package Acceptance has bounded legacy-compatibility windows for already-published packages. Packages through `2026.4.25`, including `2026.4.25-beta.*`, may use the compatibility path:

known private QA entries in `dist/postinstall-inventory.json` may point at tarball-omitted files;
`doctor-switch` may skip the `gateway install --wrapper` persistence subcase when the package does not expose that flag;
`update-channel-switch` may prune missing `pnpm.patchedDependencies` from the tarball-derived fake git fixture and may log missing persisted `update.channel`;
plugin smokes may read legacy install-record locations or accept missing marketplace install-record persistence;
`plugin-update` may allow config metadata migration while still requiring the install record and no-reinstall behavior to stay unchanged.

The published `2026.4.26` package may also warn for local build metadata stamp files that were already shipped. Later packages must satisfy the modern contracts; the same conditions fail instead of warning or skipping.
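
The version windows above can be summarized as a three-way classification. The sketch below is only an illustration of those cutoffs: the function name and the labels `legacy`, `stamp-warning`, and `modern` are hypothetical, and it assumes `YYYY.M.D` versions without leading zeros, optionally followed by a prerelease suffix.

```bash
# Hypothetical sketch: classify a package version against the windows above.
# usage: compat_mode <version>    e.g. compat_mode 2026.4.25-beta.1
compat_mode() {
  local base year rest month day key
  base="${1%%-*}"        # drop any prerelease suffix such as -beta.2
  year="${base%%.*}"
  rest="${base#*.}"
  month="${rest%%.*}"
  day="${rest#*.}"
  # Build a sortable integer, e.g. 2026.4.25 -> 20260425.
  key=$(( year * 10000 + month * 100 + day ))
  if [ "$key" -le 20260425 ]; then
    echo "legacy"          # may use the compatibility path
  elif [ "$key" -eq 20260426 ]; then
    echo "stamp-warning"   # may warn for shipped build metadata stamp files
  else
    echo "modern"          # must satisfy the modern contracts
  fi
}
```

Treating `2026.4.25-beta.*` like `2026.4.25` follows the "including" wording above; how the real harness handles a prerelease of a later version is not stated here.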

Examples


bash
# Validate the current beta package with product-level coverage.
gh workflow run package-acceptance.yml \
  --ref main \
  -f workflow_ref=main \
  -f source=npm \
  -f package_spec=openclaw@beta \
  -f suite_profile=product \
  -f telegram_mode=mock-openai

# Pack and validate a release branch with the current harness.
gh workflow run package-acceptance.yml \
  --ref main \
  -f workflow_ref=main \
  -f source=ref \
  -f package_ref=release/YYYY.M.D \
  -f suite_profile=package \
  -f telegram_mode=mock-openai

# Validate a tarball URL. SHA-256 is mandatory for source=url.
gh workflow run package-acceptance.yml \
  --ref main \
  -f workflow_ref=main \
  -f source=url \
  -f package_url=https://example.com/openclaw-current.tgz \
  -f package_sha256=<64-char-sha256> \
  -f suite_profile=smoke

# Reuse a tarball uploaded by another Actions run.
gh workflow run package-acceptance.yml \
  --ref main \
  -f workflow_ref=main \
  -f source=artifact \
  -f artifact_run_id=<run-id> \
  -f artifact_name=package-under-test \
  -f suite_profile=custom \
  -f docker_lanes='install-e2e plugin-update'

When debugging a failed package acceptance run, start at the `resolve_package` summary to confirm the package source, version, and SHA-256. Then inspect the `docker_acceptance` child run and its Docker artifacts: `.artifacts/docker-tests/**/summary.json`, `failures.json`, lane logs, phase timings, and rerun commands. Prefer rerunning the failed package profile or exact Docker lanes instead of rerunning full release validation.

Install smoke

The separate `Install Smoke` workflow reuses the same scope script through its own `preflight` job. It splits smoke coverage into `run_fast_install_smoke` and `run_full_install_smoke`:

Fast path runs for pull requests touching Docker/package surfaces, bundled plugin package/manifest changes, or core plugin/channel/gateway/Plugin SDK surfaces that the Docker smoke jobs exercise. Source-only bundled plugin changes, test-only edits, and docs-only edits do not reserve Docker workers. The fast path builds the root Dockerfile image once, checks the CLI, runs the agents delete shared-workspace CLI smoke, runs the container gateway-network e2e, verifies a bundled extension build arg, and runs the bounded bundled-plugin Docker profile under a 240-second aggregate command timeout (each scenario's Docker run capped separately).
Full path keeps QR package install and installer Docker/update coverage for nightly scheduled runs, manual dispatches, workflow-call release checks, and pull requests that truly touch installer/package/Docker surfaces. In full mode, install-smoke prepares or reuses one target-SHA GHCR root Dockerfile smoke image, then runs QR package install, root Dockerfile/gateway smokes, installer/update smokes, and the fast bundled-plugin Docker E2E as separate jobs so installer work does not wait behind the root image smokes.

`main` pushes (including merge commits) do not force the full path; when changed-scope logic would request full coverage on a push, the workflow keeps the fast Docker smoke and leaves the full install smoke to nightly or release validation.
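
The fast/full decision described above can be sketched as a function of the triggering event and what the scope step asked for. This is an illustration, not the workflow's actual expression: the function name and the `full`/`fast`/`skip` labels are hypothetical, and the event names are the standard GitHub Actions ones.

```bash
# Hypothetical sketch of the install-smoke path selection.
# usage: install_smoke_mode <event> <scope_wants_full> <scope_wants_fast>
# where the last two arguments are "true" or "false".
install_smoke_mode() {
  local event="$1" wants_full="$2" wants_fast="$3"
  case "$event" in
    schedule|workflow_dispatch|workflow_call)
      echo "full" ;;   # nightly, manual, and release-check callers
    push)
      # Pushes never force the full path; a full request is downgraded to fast.
      if [ "$wants_full" = true ] || [ "$wants_fast" = true ]; then
        echo "fast"
      else
        echo "skip"
      fi ;;
    pull_request)
      if [ "$wants_full" = true ]; then echo "full"
      elif [ "$wants_fast" = true ]; then echo "fast"
      else echo "skip"
      fi ;;
    *) echo "skip" ;;
  esac
}
```

The asymmetry between `push` and `pull_request` is the point: the same scope result yields `full` on a pull request but only `fast` on a push.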

The slow Bun global install image-provider smoke is separately gated by `run_bun_global_install_smoke`. It runs on the nightly schedule and from the release checks workflow, and manual `Install Smoke` dispatches can opt into it, but pull requests and `main` pushes do not. QR and installer Docker tests keep their own install-focused Dockerfiles.

Local Docker E2E

`pnpm test:docker:all` prebuilds one shared live-test image, packs OpenClaw once as an npm tarball, and builds two shared `scripts/e2e/Dockerfile` images:

a bare Node/Git runner for installer/update/plugin-dependency lanes;
a functional image that installs the same tarball into `/app` for normal functionality lanes.

Docker lane definitions live in `scripts/lib/docker-e2e-scenarios.mjs`, planner logic lives in `scripts/lib/docker-e2e-plan.mjs`, and the runner only executes the selected plan. The scheduler selects the image per lane with `OPENCLAW_DOCKER_E2E_BARE_IMAGE` and `OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE`, then runs lanes with `OPENCLAW_SKIP_DOCKER_BUILD=1`.

Tunables

Variable	Default	Purpose
`OPENCLAW_DOCKER_ALL_PARALLELISM`	10	Main-pool slot count for normal lanes.
`OPENCLAW_DOCKER_ALL_TAIL_PARALLELISM`	10	Provider-sensitive tail-pool slot count.
`OPENCLAW_DOCKER_ALL_LIVE_LIMIT`	9	Concurrent live lane cap so providers do not throttle.
`OPENCLAW_DOCKER_ALL_NPM_LIMIT`	10	Concurrent npm install lane cap.
`OPENCLAW_DOCKER_ALL_SERVICE_LIMIT`	7	Concurrent multi-service lane cap.
`OPENCLAW_DOCKER_ALL_START_STAGGER_MS`	2000	Stagger between lane starts to avoid Docker daemon create storms; set `0` for no stagger.
`OPENCLAW_DOCKER_ALL_LANE_TIMEOUT_MS`	7200000	Per-lane fallback timeout (120 minutes); selected live/tail lanes use tighter caps.
`OPENCLAW_DOCKER_ALL_DRY_RUN`	unset	`1` prints the scheduler plan without running lanes.
`OPENCLAW_DOCKER_ALL_LANES`	unset	Comma-separated exact lane list; skips cleanup smoke so agents can reproduce one failed lane.

A lane heavier than its effective cap can still start from an empty pool, then runs alone until it releases capacity. The local aggregate preflights Docker, removes stale OpenClaw E2E containers, emits active-lane status, persists lane timings for longest-first ordering, and stops scheduling new pooled lanes after the first failure by default.
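
The admission rule in the first sentence above (an oversized lane may start only when the pool is empty) can be sketched as follows. This is a simplified model, not the scheduler in the repository: the function name is hypothetical and lane weights are treated as plain integers.

```bash
# Hypothetical sketch of the weighted-pool admission rule.
# usage: can_start_lane <lane_weight> <slots_in_use> <cap>
can_start_lane() {
  local weight="$1" in_use="$2" cap="$3"
  # An empty pool always admits one lane, even one heavier than the cap,
  # so an oversized lane can never be starved forever.
  [ "$in_use" -eq 0 ] && return 0
  # Otherwise the lane must fit in the remaining capacity.
  [ $(( in_use + weight )) -le "$cap" ]
}
```

With a cap of 10, a weight-12 lane starts from an empty pool (`can_start_lane 12 0 10` succeeds), and while it runs nothing else is admitted (`can_start_lane 1 12 10` fails).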

Reusable live/E2E workflow

The reusable live/E2E workflow asks `scripts/test-docker-all.mjs --plan-json` which package, image kind, live image, lane, and credential coverage is required. `scripts/docker-e2e.mjs` then converts that plan into GitHub outputs and summaries. It either packs OpenClaw through `scripts/package-openclaw-for-docker.mjs`, downloads a current-run package artifact, or downloads a package artifact from `package_artifact_run_id`; validates the tarball inventory; builds and pushes package-digest-tagged bare/functional GHCR Docker E2E images through Blacksmith's Docker layer cache when the plan needs package-installed lanes; and reuses provided `docker_e2e_bare_image` and `docker_e2e_functional_image` inputs or existing package-digest images instead of rebuilding. Docker image pulls are retried with a bounded 180-second per-attempt timeout so a stuck registry/cache stream retries quickly instead of consuming most of the CI critical path.
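
A bounded per-attempt timeout with retries can be sketched with the `timeout` utility from GNU coreutils. The helper below is a generic illustration rather than the workflow's own step: the function name is hypothetical, and the attempt count is an assumption because the text above only states the 180-second per-attempt bound.

```bash
# Hypothetical sketch: retry a command with a per-attempt time limit.
# usage: retry_with_timeout <attempts> <seconds> <command...>
retry_with_timeout() {
  local attempts="$1" seconds="$2" try=1
  shift 2
  while [ "$try" -le "$attempts" ]; do
    # timeout(1) kills the command if it runs longer than <seconds>.
    if timeout "$seconds" "$@"; then
      return 0
    fi
    echo "attempt $try/$attempts failed or timed out" >&2
    try=$(( try + 1 ))
  done
  return 1
}
```

A pull would then look like `retry_with_timeout 3 180 docker pull "$image"`, where `3` is a placeholder attempt count and `$image` is whatever image reference the caller already has.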

Release-path chunks

Release Docker coverage runs smaller chunked jobs with `OPENCLAW_SKIP_DOCKER_BUILD=1` so each chunk pulls only the image kind it needs and executes multiple lanes through the same weighted scheduler:

`OPENCLAW_DOCKER_ALL_PROFILE=release-path`
`OPENCLAW_DOCKER_ALL_CHUNK=core | package-update-openai | package-update-anthropic | package-update-core | plugins-runtime-plugins | plugins-runtime-services | plugins-runtime-install-a..h | bundled-channels`

Current release Docker chunks are `core`, `package-update-openai`, `package-update-anthropic`, `package-update-core`, `plugins-runtime-plugins`, `plugins-runtime-services`, `plugins-runtime-install-a` through `plugins-runtime-install-h`, `bundled-channels-core`, `bundled-channels-update-a`, `bundled-channels-update-discord`, `bundled-channels-update-b`, and `bundled-channels-contracts`. The aggregate `bundled-channels` chunk remains available for manual one-shot reruns, and `plugins-runtime-core`, `plugins-runtime`, and `plugins-integrations` remain aggregate plugin/runtime aliases. The `install-e2e` lane alias remains the aggregate manual rerun alias for both provider installer lanes. The `bundled-channels` chunk runs split `bundled-channel-*` and `bundled-channel-update-*` lanes rather than the serial all-in-one `bundled-channel-deps` lane.
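
For scripting against the chunk names, the list above can be expanded into one name per line. The function name is hypothetical, and the list is copied from this page, so it will drift if the workflow changes.

```bash
# Hypothetical helper: print the current release Docker chunk names,
# one per line, expanding plugins-runtime-install-a through -h.
release_path_chunks() {
  local suffix
  printf '%s\n' core package-update-openai package-update-anthropic \
    package-update-core plugins-runtime-plugins plugins-runtime-services
  for suffix in a b c d e f g h; do
    printf 'plugins-runtime-install-%s\n' "$suffix"
  done
  printf '%s\n' bundled-channels-core bundled-channels-update-a \
    bundled-channels-update-discord bundled-channels-update-b \
    bundled-channels-contracts
}
```

A loop such as `release_path_chunks | while read -r chunk; do ...; done` could then set `OPENCLAW_DOCKER_ALL_PROFILE=release-path` and `OPENCLAW_DOCKER_ALL_CHUNK="$chunk"` per iteration; whether the local `pnpm test:docker:all` entry point honors those variables the same way the CI jobs do is an assumption to verify first.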

OpenWebUI is folded into `plugins-runtime-services` when full release-path coverage requests it, and keeps a standalone `openwebui` chunk only for OpenWebUI-only dispatches. Bundled-channel update lanes retry once for transient npm network failures.

Each chunk uploads `.artifacts/docker-tests/` with lane logs, timings, `summary.json`, `failures.json`, phase timings, scheduler plan JSON, slow-lane tables, and per-lane rerun commands. The workflow `docker_lanes` input runs selected lanes against the prepared images instead of the chunk jobs. That keeps failed-lane debugging bounded to one targeted Docker job, which prepares, downloads, or reuses the package artifact for that run; if a selected lane is a live Docker lane, the targeted job builds the live-test image locally for that rerun. Generated per-lane GitHub rerun commands include `package_artifact_run_id`, `package_artifact_name`, and prepared image inputs when those values exist, so a failed lane can reuse the exact package and images from the failed run.


bash
pnpm test:docker:rerun <run-id>      # download Docker artifacts and print combined/per-lane targeted rerun commands
pnpm test:docker:timings <summary>   # slow-lane and phase critical-path summaries

The scheduled live/E2E workflow runs the full release-path Docker suite daily.

Plugin Prerelease

`Plugin Prerelease` is more expensive product/package coverage, so it is a separate workflow dispatched by `Full Release Validation` or by an explicit operator. Normal pull requests, `main` pushes, and standalone manual CI dispatches keep that suite off. It balances bundled plugin tests across eight extension workers; those extension shard jobs run up to two plugin config groups at a time with one Vitest worker per group and a larger Node heap so import-heavy plugin batches do not create extra CI jobs. The release-only Docker prerelease path batches targeted Docker lanes in small groups to avoid reserving dozens of runners for one-to-three-minute jobs.
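The docs do not say how the eight-way balancing is computed. One common approach, shown here purely as an assumption, is greedy longest-first assignment: sort test groups by weight and hand each to the least-loaded worker. The `{ name, weight }` shape is hypothetical; weights might be previous run durations.

```javascript
// Greedy longest-first balancing across a fixed number of workers.
function balanceAcrossWorkers(groups, workerCount = 8) {
  const workers = Array.from({ length: workerCount }, () => ({ load: 0, groups: [] }));
  const sorted = [...groups].sort((a, b) => b.weight - a.weight);
  for (const group of sorted) {
    // Pick the worker with the smallest accumulated load so far.
    const target = workers.reduce((min, w) => (w.load < min.load ? w : min));
    target.groups.push(group.name);
    target.load += group.weight;
  }
  return workers;
}
```

Greedy assignment is not optimal in general, but it is cheap and keeps the slowest shard close to the average when there are many more groups than workers.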

QA Lab

QA Lab has dedicated CI lanes outside the main smart-scoped workflow.

The `Parity gate` workflow runs on matching PR changes and manual dispatch; it builds the private QA runtime and compares the mock GPT-5.5 and Opus 4.6 agentic packs.
The `QA-Lab - All Lanes` workflow runs nightly on `main` and on manual dispatch; it fans out the mock parity gate, live Matrix lane, and live Telegram and Discord lanes as parallel jobs. Live jobs use the `qa-live-shared` environment, and Telegram/Discord use Convex leases.

Release checks run Matrix and Telegram live transport lanes with the deterministic mock provider and mock-qualified models (`mock-openai/gpt-5.5` and `mock-openai/gpt-5.5-alt`) so the channel contract is isolated from live model latency and normal provider-plugin startup. The live transport gateway disables memory search because QA parity covers memory behavior separately; provider connectivity is covered by the separate live model, native provider, and Docker provider suites.

Matrix uses `--profile fast` for scheduled and release gates, adding `--fail-fast` only when the checked-out CLI supports it. The CLI default and manual workflow input remain `all`; manual `matrix_profile=all` dispatch always shards full Matrix coverage into `transport`, `media`, `e2ee-smoke`, `e2ee-deep`, and `e2ee-cli` jobs.

`OpenClaw Release Checks` also runs the release-critical QA Lab lanes before release approval; its QA parity gate runs the candidate and baseline packs as parallel lane jobs, then downloads both artifacts into a small report job for the final parity comparison.

Do not put the PR landing path behind `Parity gate` unless the change actually touches QA runtime, model-pack parity, or a surface the parity workflow owns. For normal channel, config, docs, or unit-test fixes, treat it as an optional signal and follow the scoped CI/check evidence instead.

CodeQL

The `CodeQL` workflow is intentionally a narrow first-pass security scanner, not the full repository sweep. Daily, manual, and non-draft pull request guard runs scan Actions workflow code plus the highest-risk JavaScript/TypeScript surfaces with high-confidence security queries filtered to high/critical `security-severity`.

The pull request guard stays light: it only starts for changes under `.github/actions`, `.github/codeql`, `.github/workflows`, `packages`, or `src`, and it runs the same high-confidence security matrix as the scheduled workflow. Android and macOS CodeQL stay out of PR defaults.
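The path condition for the pull request guard amounts to a prefix check. In the workflow itself this is presumably expressed as path filters; the sketch below only restates the rule in code, and the function names are hypothetical.

```javascript
// Path roots copied from the paragraph above.
const GUARD_ROOTS = [".github/actions", ".github/codeql", ".github/workflows", "packages", "src"];

// True when `path` is the root itself or lives below it (not a mere name prefix).
function isUnderRoot(path, root) {
  return path === root || path.startsWith(`${root}/`);
}

// The guard starts when at least one changed file falls under a guarded root.
function shouldRunSecurityGuard(changedPaths) {
  return changedPaths.some((p) => GUARD_ROOTS.some((root) => isUnderRoot(p, root)));
}
```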

Security categories

Category	Surface
`/codeql-security-high/core-auth-secrets`	Auth, secrets, sandbox, cron, and gateway baseline
`/codeql-security-high/channel-runtime-boundary`	Core channel implementation contracts plus the channel plugin runtime, gateway, Plugin SDK, secrets, and audit touchpoints
`/codeql-security-high/network-ssrf-boundary`	Core SSRF, IP parsing, network guard, web-fetch, and Plugin SDK SSRF policy surfaces
`/codeql-security-high/mcp-process-tool-boundary`	MCP servers, process execution helpers, outbound delivery, and agent tool-execution gates
`/codeql-security-high/plugin-trust-boundary`	Plugin install, loader, manifest, registry, runtime-dependency staging, source-loading, and Plugin SDK package contract trust surfaces

Platform-specific security shards

`CodeQL Android Critical Security`: scheduled Android security shard. Builds the Android app manually for CodeQL on the smallest Blacksmith Linux runner accepted by workflow sanity. Uploads under `/codeql-critical-security/android`.
`CodeQL macOS Critical Security`: weekly/manual macOS security shard. Builds the macOS app manually for CodeQL on Blacksmith macOS, filters dependency build results out of uploaded SARIF, and uploads under `/codeql-critical-security/macos`. Kept outside daily defaults because the macOS build dominates runtime even when clean.

Critical Quality categories

`CodeQL Critical Quality` is the matching non-security shard. It runs only error-severity, non-security JavaScript/TypeScript quality queries over narrow high-value surfaces on the smaller Blacksmith Linux runner. Its pull request guard is intentionally smaller than the scheduled profile: non-draft PRs only run the matching `agent-runtime-boundary`, `config-boundary`, `core-auth-secrets`, `channel-runtime-boundary`, `gateway-runtime-boundary`, `memory-runtime-boundary`, `mcp-process-runtime-boundary`, `provider-runtime-boundary`, `session-diagnostics-boundary`, `plugin-boundary`, `plugin-sdk-package-contract`, and `plugin-sdk-reply-runtime` shards for agent command/model/tool execution and reply dispatch code, config schema/migration/IO code, auth/secrets/sandbox/security code, core channel and bundled channel plugin runtime, gateway protocol/server-method, memory runtime/SDK glue, MCP/process/outbound delivery, provider runtime/model catalog, session diagnostics/delivery queues, plugin loader, Plugin SDK/package-contract, or Plugin SDK reply runtime changes. CodeQL config and quality workflow changes run all twelve PR quality shards.

Manual dispatch accepts:


text
profile=all|agent-runtime-boundary|config-boundary|core-auth-secrets|channel-runtime-boundary|gateway-runtime-boundary|memory-runtime-boundary|mcp-process-runtime-boundary|plugin-boundary|plugin-sdk-package-contract|plugin-sdk-reply-runtime|provider-runtime-boundary|session-diagnostics-boundary

The narrow profiles are teaching/iteration hooks for running one quality shard in isolation.

Category	Surface
`/codeql-critical-quality/core-auth-secrets`	Auth, secrets, sandbox, cron, and gateway security boundary code
`/codeql-critical-quality/config-boundary`	Config schema, migration, normalization, and IO contracts
`/codeql-critical-quality/gateway-runtime-boundary`	Gateway protocol schemas and server method contracts
`/codeql-critical-quality/channel-runtime-boundary`	Core channel and bundled channel plugin implementation contracts
`/codeql-critical-quality/agent-runtime-boundary`	Command execution, model/provider dispatch, auto-reply dispatch and queues, and ACP control-plane runtime contracts
`/codeql-critical-quality/mcp-process-runtime-boundary`	MCP servers and tool bridges, process supervision helpers, and outbound delivery contracts
`/codeql-critical-quality/memory-runtime-boundary`	Memory host SDK, memory runtime facades, memory Plugin SDK aliases, memory runtime activation glue, and memory doctor commands
`/codeql-critical-quality/session-diagnostics-boundary`	Reply queue internals, session delivery queues, outbound session binding/delivery helpers, diagnostic event/log bundle surfaces, and session doctor CLI contracts
`/codeql-critical-quality/plugin-sdk-reply-runtime`	Plugin SDK inbound reply dispatch, reply payload/chunking/runtime helpers, channel reply options, delivery queues, and session/thread binding helpers
`/codeql-critical-quality/provider-runtime-boundary`	Model catalog normalization, provider auth and discovery, provider runtime registration, provider defaults/catalogs, and web/search/fetch/embedding registries
`/codeql-critical-quality/ui-control-plane`	Control UI bootstrap, local persistence, gateway control flows, and task control-plane runtime contracts
`/codeql-critical-quality/web-media-runtime-boundary`	Core web fetch/search, media IO, media understanding, image-generation, and media-generation runtime contracts
`/codeql-critical-quality/plugin-boundary`	Loader, registry, public-surface, and Plugin SDK entrypoint contracts
`/codeql-critical-quality/plugin-sdk-package-contract`	Published package-side Plugin SDK source and plugin package contract helpers

Quality stays separate from security so quality findings can be scheduled, measured, disabled, or expanded without obscuring security signal. Swift, Python, and bundled-plugin CodeQL expansion should be added back as scoped or sharded follow-up work only after the narrow profiles have stable runtime and signal.

Maintenance workflows

Docs Agent

The `Docs Agent` workflow is an event-driven Codex maintenance lane for keeping existing docs aligned with recently landed changes. It has no pure schedule: a successful non-bot push CI run on `main` can trigger it, and manual dispatch can run it directly. Workflow-run invocations skip when `main` has moved on or when another non-skipped Docs Agent run was created in the last hour. When it runs, it reviews the commit range from the previous non-skipped Docs Agent source SHA to current `main`, so one hourly run can cover all main changes accumulated since the last docs pass.
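The skip rules for workflow-run invocations can be sketched as a small decision function. All field names are hypothetical, timestamps are epoch milliseconds, and the sketch covers only workflow-run triggers, not manual dispatch.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;

// `lastRunCreatedAt` and `lastSourceSha` describe the previous non-skipped run.
function docsAgentDecision({ triggerSha, mainSha, lastRunCreatedAt, lastSourceSha, now }) {
  if (triggerSha !== mainSha) {
    return { run: false, reason: "main has moved on" };
  }
  if (lastRunCreatedAt != null && now - lastRunCreatedAt < ONE_HOUR_MS) {
    return { run: false, reason: "another run was created in the last hour" };
  }
  // Review everything since the previous non-skipped run's source SHA.
  return { run: true, range: `${lastSourceSha}..${mainSha}` };
}
```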

Test Performance Agent

The `Test Performance Agent` workflow is an event-driven Codex maintenance lane for slow tests. It has no pure schedule: a successful non-bot push CI run on `main` can trigger it, but it skips if another workflow-run invocation already ran or is running that UTC day. Manual dispatch bypasses that daily activity gate. The lane builds a full-suite grouped Vitest performance report, lets Codex make only small coverage-preserving test performance fixes instead of broad refactors, then reruns the full-suite report and rejects changes that reduce the passing baseline test count. If the baseline has failing tests, Codex may fix only obvious failures and the after-agent full-suite report must pass before anything is committed. When `main` advances before the bot push lands, the lane rebases the validated patch, reruns `pnpm check:changed`, and retries the push; conflicting stale patches are skipped. It uses GitHub-hosted Ubuntu so the Codex action can keep the same drop-sudo safety posture as the docs agent.
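The two acceptance rules stated above (never reduce the passing count; a failing baseline must end fully green) can be sketched as follows. The `{ passed, failed }` report shape is an assumption, and the real lane may apply further checks that this sketch does not model.

```javascript
// Compare the before-agent and after-agent full-suite reports.
function acceptAgentChange(before, after) {
  if (after.passed < before.passed) {
    return { accept: false, reason: "passing test count dropped" };
  }
  if (before.failed > 0 && after.failed > 0) {
    return { accept: false, reason: "baseline had failures and the after-agent report still fails" };
  }
  return { accept: true };
}
```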

Duplicate PRs After Merge

The `Duplicate PRs After Merge` workflow is a manual maintainer workflow for post-land duplicate cleanup. It defaults to dry-run and only closes explicitly listed PRs when `apply=true`. Before mutating GitHub, it verifies that the landed PR is merged and that each duplicate has either a shared referenced issue or overlapping changed hunks.


bash
gh workflow run duplicate-after-merge.yml \
  -f landed_pr=70532 \
  -f duplicate_prs='70530,70592' \
  -f apply=true
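The pre-mutation verification can be pictured with a minimal sketch. The PR and hunk shapes (`merged`, `issues`, `hunks` with inclusive `start`/`end` lines) are hypothetical; the workflow's actual data model is not documented here.

```javascript
// Two hunks overlap when they touch the same file and their line ranges intersect.
function hunksOverlap(a, b) {
  return a.file === b.file && a.start <= b.end && b.start <= a.end;
}

// A duplicate is closable only if the landed PR is merged and the duplicate
// shares a referenced issue or has at least one overlapping changed hunk.
function canCloseDuplicate(landed, duplicate) {
  if (!landed.merged) return false;
  const sharedIssue = duplicate.issues.some((issue) => landed.issues.includes(issue));
  const overlapping = duplicate.hunks.some((d) => landed.hunks.some((l) => hunksOverlap(d, l)));
  return sharedIssue || overlapping;
}
```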

Local check gates and changed routing

Local changed-lane logic lives in `scripts/changed-lanes.mjs` and is executed by `scripts/check-changed.mjs`. That local check gate is stricter about architecture boundaries than the broad CI platform scope:

core production changes run core prod and core test typecheck plus core lint/guards;
core test-only changes run only core test typecheck plus core lint;
extension production changes run extension prod and extension test typecheck plus extension lint;
extension test-only changes run extension test typecheck plus extension lint;
public Plugin SDK or plugin-contract changes expand to extension typecheck because extensions depend on those core contracts (Vitest extension sweeps stay explicit test work);
release metadata-only version bumps run targeted version/config/root-dependency checks;
unknown root/config changes fail safe to all check lanes.
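The routing rules above can be restated as a lookup with a fail-safe default. The change-kind keys and lane names are illustrative labels, not the identifiers used by `scripts/changed-lanes.mjs`, and the Plugin SDK row reflects one reading of "expand to extension typecheck".

```javascript
const CORE_PROD = ["core-prod-typecheck", "core-test-typecheck", "core-lint", "core-guards"];
const EXTENSION_TYPECHECK = ["extension-prod-typecheck", "extension-test-typecheck"];

const LANES_BY_CHANGE = new Map([
  ["core-prod", CORE_PROD],
  ["core-test", ["core-test-typecheck", "core-lint"]],
  ["extension-prod", [...EXTENSION_TYPECHECK, "extension-lint"]],
  ["extension-test", ["extension-test-typecheck", "extension-lint"]],
  // Assumed: core lanes plus extension typecheck for public contract changes.
  ["plugin-sdk", [...CORE_PROD, ...EXTENSION_TYPECHECK]],
  ["release-metadata", ["version-check", "config-check", "root-dependency-check"]],
]);

const ALL_LANES = [...new Set([...LANES_BY_CHANGE.values()].flat())];

// Union the lanes for every detected change kind; anything unknown runs everything.
function lanesFor(changeKinds) {
  const lanes = new Set();
  for (const kind of changeKinds) {
    const mapped = LANES_BY_CHANGE.get(kind);
    if (!mapped) return ALL_LANES; // unknown root/config change: fail safe
    mapped.forEach((lane) => lanes.add(lane));
  }
  return [...lanes];
}
```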

Local changed-test routing lives in `scripts/test-projects.test-support.mjs` and is intentionally cheaper than `check:changed`: direct test edits run themselves, source edits prefer explicit mappings, then sibling tests and import-graph dependents. Shared group-room delivery config is one of the explicit mappings: changes to the group visible-reply config, source reply delivery mode, or the message-tool system prompt route through the core reply tests plus Discord and Slack delivery regressions so a shared default change fails before the first PR push. Use `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` only when the change is harness-wide enough that the cheap mapped set is not a trustworthy proxy.

Testbox validation

Run Testbox from the repo root and prefer a fresh warmed box for broad proof. Before spending a slow gate on a box that was reused, expired, or just reported an unexpectedly large sync, run `pnpm testbox:sanity` inside the box first.

The sanity check fails fast when required root files such as `pnpm-lock.yaml` disappeared or when `git status --short` shows at least 200 tracked deletions. That usually means the remote sync state is not a trustworthy copy of the PR; stop that box and warm a fresh one instead of debugging the product test failure. For intentional large-deletion PRs, set `OPENCLAW_TESTBOX_ALLOW_MASS_DELETIONS=1` for that sanity run.
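A minimal sketch of the two sanity conditions, assuming the check parses `git status --short` output. The function names and input shape are hypothetical, and only `pnpm-lock.yaml` is listed as a required file because it is the only one the text names.

```javascript
const MASS_DELETION_THRESHOLD = 200; // "at least 200 tracked deletions"

// Count lines with `D` in the index (first) or worktree (second) status column.
// Untracked entries (`??`) are ignored because they are not tracked deletions.
function countTrackedDeletions(statusText) {
  return statusText
    .split("\n")
    .filter((line) => line.length >= 3 && !line.startsWith("??"))
    .filter((line) => line[0] === "D" || line[1] === "D").length;
}

function sanityCheck({ statusText, existingRootFiles, env = {} }) {
  const required = ["pnpm-lock.yaml"];
  const missing = required.filter((file) => !existingRootFiles.includes(file));
  if (missing.length > 0) return { ok: false, reason: `missing ${missing.join(", ")}` };

  const deletions = countTrackedDeletions(statusText);
  const allowed = env.OPENCLAW_TESTBOX_ALLOW_MASS_DELETIONS === "1";
  if (deletions >= MASS_DELETION_THRESHOLD && !allowed) {
    return { ok: false, reason: `${deletions} tracked deletions` };
  }
  return { ok: true };
}
```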

`pnpm testbox:run` also terminates a local Blacksmith CLI invocation that stays in the sync phase for more than five minutes without post-sync output. Set `OPENCLAW_TESTBOX_SYNC_TIMEOUT_MS=0` to disable that guard, or use a larger millisecond value for unusually large local diffs.
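How the timeout variable might be interpreted, as a sketch: unset means the five-minute default, `0` disables the guard, and any other positive number is a millisecond budget. The function name is hypothetical, and the fallback for invalid values is an assumption because the docs do not describe it.

```javascript
const DEFAULT_SYNC_TIMEOUT_MS = 5 * 60 * 1000; // "more than five minutes"

// Returns the guard timeout in milliseconds, or null when the guard is disabled.
function resolveSyncTimeout(env = {}) {
  const raw = env.OPENCLAW_TESTBOX_SYNC_TIMEOUT_MS;
  if (raw === undefined || raw === "") return DEFAULT_SYNC_TIMEOUT_MS;
  const value = Number(raw);
  // Assumed behavior: ignore unparseable or negative values.
  if (!Number.isFinite(value) || value < 0) return DEFAULT_SYNC_TIMEOUT_MS;
  return value === 0 ? null : value;
}
```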
