    Last sync: 01/05/2026 07:00:09

    Note: This content is mirrored from docs.openclaw.ai and is subject to their terms and conditions.


    Testing

    OpenClaw has three Vitest suites (unit/integration, e2e, live) and a small set of Docker runners. This doc is a "how we test" guide:

    • What each suite covers (and what it deliberately does not cover).
    • Which commands to run for common workflows (local, pre-push, debugging).
    • How live tests discover credentials and select models/providers.
    • How to add regressions for real-world model/provider issues.

    Note: **QA stack (qa-lab, qa-channel, live transport lanes)** is documented separately:
    • QA overview: architecture, command surface, scenario authoring.
    • Matrix QA: reference for `pnpm openclaw qa matrix`.
    • QA channel: the synthetic transport plugin used by repo-backed scenarios.

    This page covers running the regular test suites and Docker/Parallels runners. The QA-specific runners section below lists the concrete `qa` invocations and points back at the references above.

    Quick start

    Most days:

    • Full gate (expected before push): `pnpm build && pnpm check && pnpm check:test-types && pnpm test`
    • Faster local full-suite run on a roomy machine: `pnpm test:max`
    • Direct Vitest watch loop: `pnpm test:watch`
    • Direct file targeting now routes extension/channel paths too: `pnpm test extensions/discord/src/monitor/message-handler.preflight.test.ts`
    • Prefer targeted runs first when you are iterating on a single failure.
    • Docker-backed QA site: `pnpm qa:lab:up`
    • Linux VM-backed QA lane: `pnpm openclaw qa suite --runner multipass --scenario channel-chat-baseline`
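The full gate listed above is a plain `&&` chain. A minimal sketch of the same sequence as a function that reports which step failed is shown here; `full_gate` and the `RUNNER` override are hypothetical names used only for illustration, not part of the repo.

```shell
# Sketch of a pre-push gate runner (hypothetical helper, not shipped by OpenClaw).
# Set RUNNER=echo to print the steps instead of executing them.
full_gate() {
  runner="${RUNNER:-}"
  for step in "pnpm build" "pnpm check" "pnpm check:test-types" "pnpm test"; do
    # $step is intentionally unquoted so it splits into command + arguments.
    $runner $step || { echo "gate failed at: $step" >&2; return 1; }
  done
}
```

Running `RUNNER=echo full_gate` prints the four steps in order without invoking `pnpm`; calling `full_gate` from a Git `pre-push` hook is one way to make the gate automatic.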

    When you touch tests or want extra confidence:

    • Coverage gate: `pnpm test:coverage`
    • E2E suite: `pnpm test:e2e`

    When debugging real providers/models (requires real creds):

    • Live suite (models + gateway tool/image probes): `pnpm test:live`
    • Target one live file quietly: `pnpm test:live -- src/agents/models.profiles.live.test.ts`
    • Docker live model sweep: `pnpm test:docker:live-models`
      • Each selected model now runs a text turn plus a small file-read-style probe. Models whose metadata advertises `image` input also run a tiny image turn. Disable the extra probes with `OPENCLAW_LIVE_MODEL_FILE_PROBE=0` or `OPENCLAW_LIVE_MODEL_IMAGE_PROBE=0` when isolating provider failures.
      • CI coverage: daily `OpenClaw Scheduled Live And E2E Checks` and manual `OpenClaw Release Checks` both call the reusable live/E2E workflow with `include_live_suites: true`, which includes separate Docker live model matrix jobs sharded by provider.
      • For focused CI reruns, dispatch `OpenClaw Live And E2E Checks (Reusable)` with `include_live_suites: true` and `live_models_only: true`.
      • Add new high-signal provider secrets to `scripts/ci-hydrate-live-auth.sh` plus `.github/workflows/openclaw-live-and-e2e-checks-reusable.yml` and its scheduled/release callers.
    • Native Codex bound-chat smoke: `pnpm test:docker:live-codex-bind`
      • Runs a Docker live lane against the Codex app-server path, binds a synthetic Slack DM with `/codex bind`, exercises `/codex fast` and `/codex permissions`, then verifies a plain reply and an image attachment route through the native plugin binding instead of ACP.
    • Codex app-server harness smoke: `pnpm test:docker:live-codex-harness`
      • Runs gateway agent turns through the plugin-owned Codex app-server harness, verifies `/codex status` and `/codex models`, and by default exercises image, cron MCP, sub-agent, and Guardian probes. Disable the sub-agent probe with `OPENCLAW_LIVE_CODEX_HARNESS_SUBAGENT_PROBE=0` when isolating other Codex app-server failures. For a focused sub-agent check, disable the other probes: `OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=0 OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=0 OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=0 OPENCLAW_LIVE_CODEX_HARNESS_SUBAGENT_PROBE=1 pnpm test:docker:live-codex-harness`. This exits after the sub-agent probe unless `OPENCLAW_LIVE_CODEX_HARNESS_SUBAGENT_ONLY=0` is set.
    • Crestodian rescue command smoke: `pnpm test:live:crestodian-rescue-channel`
      • Opt-in belt-and-suspenders check for the message-channel rescue command surface. It exercises `/crestodian status`, queues a persistent model change, replies `/crestodian yes`, and verifies the audit/config write path.
    • Crestodian planner Docker smoke: `pnpm test:docker:crestodian-planner`
      • Runs Crestodian in a configless container with a fake Claude CLI on `PATH` and verifies the fuzzy planner fallback translates into an audited typed config write.
    • Crestodian first-run Docker smoke: `pnpm test:docker:crestodian-first-run`
      • Starts from an empty OpenClaw state dir, routes bare `openclaw` to Crestodian, applies setup/model/agent/Discord plugin + SecretRef writes, validates config, and verifies audit entries. The same Ring 0 setup path is also covered in QA Lab by `pnpm openclaw qa suite --scenario crestodian-ring-zero-setup`.
    • Moonshot/Kimi cost smoke: with `MOONSHOT_API_KEY` set, run `openclaw models list --provider moonshot --json`, then run an isolated `openclaw agent --local --session-id live-kimi-cost --message 'Reply exactly: KIMI_LIVE_OK' --thinking off --json` against `moonshot/kimi-k2.6`. Verify the JSON reports Moonshot/K2.6 and the assistant transcript stores normalized `usage.cost`.
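When isolating provider failures in the Docker live model sweep, the two probe toggles mentioned above can be composed by a small helper. This is a sketch only: `live_probe_env` is a hypothetical function name, and only the two `OPENCLAW_LIVE_MODEL_*_PROBE` variables come from this page.

```shell
# Print env assignments that disable the extra live-model probes.
# Arguments: any of "file" and "image"; unknown names are rejected.
# (Hypothetical helper for illustration.)
live_probe_env() {
  out=""
  for probe in "$@"; do
    case "$probe" in
      file)  out="$out OPENCLAW_LIVE_MODEL_FILE_PROBE=0" ;;
      image) out="$out OPENCLAW_LIVE_MODEL_IMAGE_PROBE=0" ;;
      *) echo "unknown probe: $probe" >&2; return 2 ;;
    esac
  done
  # Trim the leading space before printing.
  echo "${out# }"
}

# Usage (runs the real sweep, so it needs Docker and live credentials):
#   env $(live_probe_env file image) pnpm test:docker:live-models
```

Passing no arguments prints an empty line, which leaves both probes enabled.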

    Tip: When you only need one failing case, prefer narrowing live tests via the allowlist env vars described below.

    QA-specific runners

    These commands sit beside the main test suites when you need QA-lab realism:

    CI runs QA Lab in dedicated workflows. `Parity gate` runs on matching PRs and from manual dispatch with mock providers. `QA-Lab - All Lanes` runs nightly on `main` and from manual dispatch with the mock parity gate, live Matrix lane, Convex-managed live Telegram lane, and Convex-managed live Discord lane as parallel jobs. Scheduled QA and release checks pass Matrix `--profile fast` explicitly, while the Matrix CLI and manual workflow input defaults remain `all`; manual dispatch can shard `all` into `transport`, `media`, `e2ee-smoke`, `e2ee-deep`, and `e2ee-cli` jobs. `OpenClaw Release Checks` runs parity plus the fast Matrix and Telegram lanes before release approval, using `mock-openai/gpt-5.5` for release transport checks so they stay deterministic and avoid normal provider-plugin startup. These live transport gateways disable memory search; memory behavior stays covered by the QA parity suites.

    Full release live media shards use `ghcr.io/openclaw/openclaw-live-media-runner:ubuntu-24.04`, which already has `ffmpeg` and `ffprobe`. Docker live model/backend shards use the shared `ghcr.io/openclaw/openclaw-live-test:<sha>` image built once per selected commit, then pull it with `OPENCLAW_SKIP_DOCKER_BUILD=1` instead of rebuilding inside every shard.
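As a rough illustration of the shared-image pattern above, a shard could derive the image reference from the commit and skip its own build. The helper name `live_test_image` is hypothetical, and whether the tag is a full or abbreviated SHA is an assumption not stated on this page.

```shell
# Build the shared live-test image reference for a commit SHA.
# (Hypothetical helper; the tag format beyond "<sha>" is an assumption.)
live_test_image() {
  sha="$1"
  [ -n "$sha" ] || { echo "usage: live_test_image <sha>" >&2; return 2; }
  echo "ghcr.io/openclaw/openclaw-live-test:$sha"
}

# In a shard (assumes docker is installed and the image was already pushed):
#   docker pull "$(live_test_image "$GITHUB_SHA")"
#   OPENCLAW_SKIP_DOCKER_BUILD=1 pnpm test:docker:live-models
```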

    • `pnpm openclaw qa suite`
      • Runs repo-backed QA scenarios directly on the host.
      • Runs multiple selected scenarios in parallel by default with isolated gateway workers. `qa-channel` defaults to concurrency 4 (bounded by the selected scenario count). Use `--concurrency <count>` to tune the worker count, or `--concurrency 1` for the older serial lane.
      • Exits non-zero when any scenario fails. Use `--allow-failures` when you want artifacts without a failing exit code.
      • Supports provider modes `live-frontier`, `mock-openai`, and `aimock`. `aimock` starts a local AIMock-backed provider server for experimental fixture and protocol-mock coverage without replacing the scenario-aware `mock-openai` lane.
    • `pnpm test:gateway:cpu-scenarios`
      • Runs the gateway startup bench plus a small mock QA Lab scenario pack (`channel-chat-baseline`, `memory-failure-fallback`, `gateway-restart-inflight-run`) and writes a combined CPU observation summary under `.artifacts/gateway-cpu-scenarios/`.
      • Flags only sustained hot CPU observations by default (`--cpu-core-warn` plus `--hot-wall-warn-ms`), so short startup bursts are recorded as metrics without looking like the minutes-long gateway peg regression.
      • Uses built `dist` artifacts; run a build first when the checkout does not already have fresh runtime output.
    • `pnpm openclaw qa suite --runner multipass`
      • Runs the same QA suite inside a disposable Multipass Linux VM.
      • Keeps the same scenario-selection behavior as `qa suite` on the host.
      • Reuses the same provider/model selection flags as `qa suite`.
      • Live runs forward the supported QA auth inputs that are practical for the guest: env-based provider keys, the QA live provider config path, and `CODEX_HOME` when present.
      • Output dirs must stay under the repo root so the guest can write back through the mounted workspace.
      • Writes the normal QA report + summary plus Multipass logs under `.artifacts/qa-e2e/...`.
    • `pnpm qa:lab:up`
      • Starts the Docker-backed QA site for operator-style QA work.
    • `pnpm test:docker:npm-onboard-channel-agent`
      • Builds an npm tarball from the current checkout, installs it globally in Docker, runs non-interactive OpenAI API-key onboarding, configures Telegram by default, verifies enabling the plugin installs runtime dependencies on demand, runs doctor, and runs one local agent turn against a mocked OpenAI endpoint.
      • Use `OPENCLAW_NPM_ONBOARD_CHANNEL=discord` to run the same packaged-install lane with Discord.
    • `pnpm test:docker:session-runtime-context`
      • Runs a deterministic built-app Docker smoke for embedded runtime context transcripts. It verifies hidden OpenClaw runtime context is persisted as a non-display custom message instead of leaking into the visible user turn, then seeds an affected broken session JSONL and verifies `openclaw doctor --fix` rewrites it to the active branch with a backup.
    • `pnpm test:docker:npm-telegram-live`
      • Installs an OpenClaw package candidate in Docker, runs installed-package onboarding, configures Telegram through the installed CLI, then reuses the live Telegram QA lane with that installed package as the SUT Gateway.
      • Defaults to `OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC=openclaw@beta`; set `OPENCLAW_NPM_TELEGRAM_PACKAGE_TGZ=/path/to/openclaw-current.tgz` or `OPENCLAW_CURRENT_PACKAGE_TGZ` to test a resolved local tarball instead of installing from the registry.
      • Uses the same Telegram env credentials or Convex credential source as `pnpm openclaw qa telegram`. For CI/release automation, set `OPENCLAW_NPM_TELEGRAM_CREDENTIAL_SOURCE=convex` plus `OPENCLAW_QA_CONVEX_SITE_URL` and the role secret. If `OPENCLAW_QA_CONVEX_SITE_URL` and a Convex role secret are present in CI, the Docker wrapper selects Convex automatically.
      • `OPENCLAW_NPM_TELEGRAM_CREDENTIAL_ROLE=ci|maintainer` overrides the shared `OPENCLAW_QA_CREDENTIAL_ROLE` for this lane only.
      • GitHub Actions exposes this lane as the manual maintainer workflow `NPM Telegram Beta E2E`. It does not run on merge. The workflow uses the `qa-live-shared` environment and Convex CI credential leases.
    • GitHub Actions also exposes `Package Acceptance` for side-run product proof against one candidate package. It accepts a trusted ref, published npm spec, HTTPS tarball URL plus SHA-256, or tarball artifact from another run, uploads the normalized `openclaw-current.tgz` as `package-under-test`, then runs the existing Docker E2E scheduler with smoke, package, product, full, or custom lane profiles. Set `telegram_mode=mock-openai` or `live-frontier` to run the Telegram QA workflow against the same `package-under-test` artifact.
      • Latest beta product proof:
        bash
        gh workflow run package-acceptance.yml --ref main \
          -f source=npm \
          -f package_spec=openclaw@beta \
          -f suite_profile=product \
          -f telegram_mode=mock-openai
      • Exact tarball URL proof requires a digest:
        bash
        gh workflow run package-acceptance.yml --ref main \
          -f source=url \
          -f package_url=https://registry.npmjs.org/openclaw/-/openclaw-VERSION.tgz \
          -f package_sha256=<sha256> \
          -f suite_profile=package
      • Artifact proof downloads a tarball artifact from another Actions run:
        bash
        gh workflow run package-acceptance.yml --ref main \
          -f source=artifact \
          -f artifact_run_id=<run-id> \
          -f artifact_name=<artifact-name> \
          -f suite_profile=smoke
    • `pnpm test:docker:bundled-channel-deps`
      • Packs and installs the current OpenClaw build in Docker, starts the Gateway with OpenAI configured, then enables bundled channel/plugins via config edits.
      • Verifies setup discovery leaves unconfigured plugin runtime dependencies absent, the first configured Gateway or doctor run installs each bundled plugin's runtime dependencies on demand, and a second restart does not reinstall dependencies that were already activated.
      • Also installs a known older npm baseline, enables Telegram before running `openclaw update --tag <candidate>`, and verifies the candidate's post-update doctor repairs bundled channel runtime dependencies without a harness-side postinstall repair.
    • `pnpm test:parallels:npm-update`
      • Runs the native packaged-install update smoke across Parallels guests. Each selected platform first installs the requested baseline package, then runs the installed `openclaw update` command in the same guest and verifies the installed version, update status, gateway readiness, and one local agent turn.
      • Use `--platform macos`, `--platform windows`, or `--platform linux` while iterating on one guest. Use `--json` for the summary artifact path and per-lane status.
      • The OpenAI lane uses `openai/gpt-5.5` for the live agent-turn proof by default. Pass `--model <provider/model>` or set `OPENCLAW_PARALLELS_OPENAI_MODEL` when deliberately validating another OpenAI model.
      • Wrap long local runs in a host timeout so Parallels transport stalls cannot consume the rest of the testing window:
        bash
        timeout --foreground 150m pnpm test:parallels:npm-update -- --json
        timeout --foreground 90m pnpm test:parallels:npm-update -- --platform windows --json
      • The script writes nested lane logs under `/tmp/openclaw-parallels-npm-update.*`. Inspect `windows-update.log`, `macos-update.log`, or `linux-update.log` before assuming the outer wrapper is hung.
      • Windows update can spend 10 to 15 minutes in post-update doctor/runtime dependency repair on a cold guest; that is still healthy when the nested npm debug log is advancing.
      • Do not run this aggregate wrapper in parallel with individual Parallels macOS, Windows, or Linux smoke lanes. They share VM state and can collide on snapshot restore, package serving, or guest gateway state.
      • The post-update proof runs the normal bundled plugin surface because capability facades such as speech, image generation, and media understanding are loaded through bundled runtime APIs even when the agent turn itself only checks a simple text response.

    • `pnpm openclaw qa aimock`
      • Starts only the local AIMock provider server for direct protocol smoke testing.
    • `pnpm openclaw qa matrix`
      • Runs the Matrix live QA lane against a disposable Docker-backed Tuwunel homeserver. Source-checkout only: packaged installs do not ship `qa-lab`.
      • Full CLI, profile/scenario catalog, env vars, and artifact layout: Matrix QA.
    • `pnpm openclaw qa telegram`
      • Runs the Telegram live QA lane against a real private group using the driver and SUT bot tokens from env.
      • Requires `OPENCLAW_QA_TELEGRAM_GROUP_ID`, `OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN`, and `OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN`. The group id must be the numeric Telegram chat id.
      • Supports `--credential-source convex` for shared pooled credentials. Use env mode by default, or set `OPENCLAW_QA_CREDENTIAL_SOURCE=convex` to opt into pooled leases.
      • Exits non-zero when any scenario fails. Use `--allow-failures` when you want artifacts without a failing exit code.
      • Requires two distinct bots in the same private group, with the SUT bot exposing a Telegram username.
      • For stable bot-to-bot observation, enable Bot-to-Bot Communication Mode in `@BotFather` for both bots and ensure the driver bot can observe group bot traffic.
      • Writes a Telegram QA report, summary, and observed-messages artifact under `.artifacts/qa-e2e/...`. Replying scenarios include RTT from driver send request to observed SUT reply.
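The env-mode requirements for `pnpm openclaw qa telegram` listed above can be checked before starting a run. The following is a sketch under stated assumptions: `telegram_qa_preflight` is a hypothetical helper, and comparing the two tokens is only a cheap proxy for the "two distinct bots" requirement.

```shell
# Preflight for env-mode Telegram QA: all three variables must be set and the
# group id must be numeric (Telegram group chat ids are often negative).
# (Hypothetical helper for illustration; not part of the OpenClaw CLI.)
telegram_qa_preflight() {
  for name in OPENCLAW_QA_TELEGRAM_GROUP_ID \
              OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN \
              OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || { echo "missing: $name" >&2; return 1; }
  done
  # Strip one optional leading minus sign, then require digits only.
  digits="${OPENCLAW_QA_TELEGRAM_GROUP_ID#-}"
  case "$digits" in
    ''|*[!0-9]*) echo "group id must be numeric" >&2; return 1 ;;
  esac
  [ "$OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN" != "$OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN" ] \
    || { echo "driver and SUT bots must be distinct" >&2; return 1; }
}
```

A typical use would be `telegram_qa_preflight && pnpm openclaw qa telegram`, so a missing variable fails fast instead of partway through the lane.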

    Live transport lanes share one standard contract so new transports do not drift; the per-lane coverage matrix lives in QA overview → Live transport coverage.

    `qa-channel` is the broad synthetic suite and is not part of that matrix.
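For `qa-channel`, the `qa suite` concurrency rule described earlier (default 4, bounded by the selected scenario count) can be read as a simple minimum. The sketch below assumes an explicit `--concurrency` value is capped the same way, which this page does not state; `effective_concurrency` is a hypothetical name.

```shell
# Effective worker count: the requested concurrency (default 4) capped by the
# number of selected scenarios. (Illustrative model, not the CLI's code.)
effective_concurrency() {
  scenarios="$1"
  requested="${2:-4}"
  if [ "$requested" -gt "$scenarios" ]; then
    echo "$scenarios"
  else
    echo "$requested"
  fi
}
```

Under this model, selecting two scenarios yields two workers, and passing `1` as the second argument corresponds to the serial lane.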

    Shared Telegram credentials via Convex (v1)

    When `--credential-source convex` (or `OPENCLAW_QA_CREDENTIAL_SOURCE=convex`) is enabled for `openclaw qa telegram`, QA lab acquires an exclusive lease from a Convex-backed pool, heartbeats that lease while the lane is running, and releases the lease on shutdown.

    Reference Convex project scaffold:

    • `qa/convex-credential-broker/`

    Required env vars:

    • `OPENCLAW_QA_CONVEX_SITE_URL` (for example `https://your-deployment.convex.site`)
    • One secret for the selected role:
      • `OPENCLAW_QA_CONVEX_SECRET_MAINTAINER` for `maintainer`
      • `OPENCLAW_QA_CONVEX_SECRET_CI` for `ci`
    • Credential role selection:
      • CLI: `--credential-role maintainer|ci`
      • Env default: `OPENCLAW_QA_CREDENTIAL_ROLE` (defaults to `ci` in CI, `maintainer` otherwise)
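The role default and the role-to-secret mapping above can be sketched as two small functions. Both names are hypothetical, and detecting "in CI" through a non-empty `CI` variable is an assumption; this page does not say how the CLI detects CI.

```shell
# Resolve the credential role from the environment.
# Assumption: "in CI" means the CI variable is set to a non-empty value.
resolve_credential_role() {
  if [ -n "${OPENCLAW_QA_CREDENTIAL_ROLE:-}" ]; then
    echo "$OPENCLAW_QA_CREDENTIAL_ROLE"
  elif [ -n "${CI:-}" ]; then
    echo ci
  else
    echo maintainer
  fi
}

# Map a role to the name of the secret variable it needs.
secret_var_for_role() {
  case "$1" in
    maintainer) echo OPENCLAW_QA_CONVEX_SECRET_MAINTAINER ;;
    ci)         echo OPENCLAW_QA_CONVEX_SECRET_CI ;;
    *) echo "unknown role: $1" >&2; return 2 ;;
  esac
}
```

For example, `secret_var_for_role "$(resolve_credential_role)"` prints the variable name to check before a Convex-backed run, without printing the secret itself.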

    Optional env vars:

    • `OPENCLAW_QA_CREDENTIAL_LEASE_TTL_MS` (default `1200000`)
    • `OPENCLAW_QA_CREDENTIAL_HEARTBEAT_INTERVAL_MS` (default `30000`)
    • `OPENCLAW_QA_CREDENTIAL_ACQUIRE_TIMEOUT_MS` (default `90000`)
    • `OPENCLAW_QA_CREDENTIAL_HTTP_TIMEOUT_MS` (default `15000`)
    • `OPENCLAW_QA_CONVEX_ENDPOINT_PREFIX` (default `/qa-credentials/v1`)
    • `OPENCLAW_QA_CREDENTIAL_OWNER_ID` (optional trace id)
    • `OPENCLAW_QA_ALLOW_INSECURE_HTTP=1` allows loopback `http://` Convex URLs for local-only development.

    `OPENCLAW_QA_CONVEX_SITE_URL` should use `https://` in normal operation.
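The URL rules above (HTTPS by default, loopback `http://` only with `OPENCLAW_QA_ALLOW_INSECURE_HTTP=1`) and the default endpoint prefix can be combined in a short sketch. `broker_endpoint` is a hypothetical helper, and treating only `localhost` and `127.0.0.1` as loopback is an assumption.

```shell
# Validate the broker site URL and join it with the endpoint prefix.
# (Hypothetical helper; the loopback host list is an assumption.)
broker_endpoint() {
  site="${OPENCLAW_QA_CONVEX_SITE_URL:-}"
  prefix="${OPENCLAW_QA_CONVEX_ENDPOINT_PREFIX:-/qa-credentials/v1}"
  case "$site" in
    https://*) ;;
    http://localhost|http://localhost:*|http://localhost/*|\
    http://127.0.0.1|http://127.0.0.1:*|http://127.0.0.1/*)
      [ "${OPENCLAW_QA_ALLOW_INSECURE_HTTP:-}" = "1" ] \
        || { echo "http:// requires OPENCLAW_QA_ALLOW_INSECURE_HTTP=1" >&2; return 1; } ;;
    *) echo "site URL must use https://" >&2; return 1 ;;
  esac
  # Drop one trailing slash from the site URL before appending prefix + path.
  echo "${site%/}$prefix$1"
}
```

With the example URL from this section, `broker_endpoint /acquire` prints `https://your-deployment.convex.site/qa-credentials/v1/acquire`.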

    Maintainer admin commands (pool add/remove/list) require `OPENCLAW_QA_CONVEX_SECRET_MAINTAINER` specifically.

    CLI helpers for maintainers:

    bash
    pnpm openclaw qa credentials doctor
    pnpm openclaw qa credentials add --kind telegram --payload-file qa/telegram-credential.json
    pnpm openclaw qa credentials list --kind telegram
    pnpm openclaw qa credentials remove --credential-id <credential-id>

    Use `doctor` before live runs to check the Convex site URL, broker secrets, endpoint prefix, HTTP timeout, and admin/list reachability without printing secret values. Use `--json` for machine-readable output in scripts and CI utilities.

    Default endpoint contract (`OPENCLAW_QA_CONVEX_SITE_URL` + `/qa-credentials/v1`):

    • text
      POST /acquire
      • Request:
        text
        { kind, ownerId, actorRole, leaseTtlMs, heartbeatIntervalMs }
      • Success:
        text
        { status: "ok", credentialId, leaseToken, payload, leaseTtlMs?, heartbeatIntervalMs? }
      • Exhausted/retryable:
        text
        { status: "error", code: "POOL_EXHAUSTED" | "NO_CREDENTIAL_AVAILABLE", ... }
    • text
      POST /heartbeat
      • Request:
        text
        { kind, ownerId, actorRole, credentialId, leaseToken, leaseTtlMs }
      • Success:
        text
        { status: "ok" }
        (or empty
        text
        2xx
        )
    • text
      POST /release
      • Request:
        text
        { kind, ownerId, actorRole, credentialId, leaseToken }
      • Success:
        text
        { status: "ok" }
        (or empty
        text
        2xx
        )
    • text
      POST /admin/add
      (maintainer secret only)
      • Request:
        text
        { kind, actorId, payload, note?, status? }
      • Success:
        text
        { status: "ok", credential }
    • text
      POST /admin/remove
      (maintainer secret only)
      • Request:
        text
        { credentialId, actorId }
      • Success:
        text
        { status: "ok", changed, credential }
      • Active lease guard:
        text
        { status: "error", code: "LEASE_ACTIVE", ... }
    • text
      POST /admin/list
      (maintainer secret only)
      • Request:
        text
        { kind?, status?, includePayload?, limit? }
      • Success:
        text
        { status: "ok", credentials, count }
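
    As a rough illustration of the acquire contract above, the sketch below types the request and response shapes and classifies a response as leased, retryable, or failed. The field names come from the contract; the field types, the `classifyAcquire` and `endpointUrl` helpers, and the idea that every other error code is non-retryable are assumptions.

```typescript
// Sketch only: field names follow the documented contract, field types and
// helper names are assumptions.
interface AcquireRequest {
  kind: string;
  ownerId: string;
  actorRole: string;
  leaseTtlMs: number;
  heartbeatIntervalMs: number;
}

interface AcquireSuccess {
  status: "ok";
  credentialId: string;
  leaseToken: string;
  payload: unknown;
  leaseTtlMs?: number;
  heartbeatIntervalMs?: number;
}

interface BrokerError {
  status: "error";
  code: string;
}

type AcquireResponse = AcquireSuccess | BrokerError;

type AcquireOutcome =
  | { outcome: "leased"; lease: AcquireSuccess }
  | { outcome: "retry"; code: string }
  | { outcome: "fail"; code: string };

// Codes the contract lists as exhausted/retryable.
const RETRYABLE_CODES = ["POOL_EXHAUSTED", "NO_CREDENTIAL_AVAILABLE"];

// Decide what a caller should do with an /acquire response.
function classifyAcquire(res: AcquireResponse): AcquireOutcome {
  if (res.status === "ok") return { outcome: "leased", lease: res };
  if (RETRYABLE_CODES.indexOf(res.code) !== -1) {
    return { outcome: "retry", code: res.code };
  }
  return { outcome: "fail", code: res.code };
}

// Join site URL + endpoint prefix + route, e.g. ".../qa-credentials/v1/acquire".
function endpointUrl(siteUrl: string, prefix: string, route: string): string {
  return `${siteUrl.replace(/\/+$/, "")}${prefix}${route}`;
}

// Example request body for POST /acquire (all values are placeholders).
const exampleRequest: AcquireRequest = {
  kind: "telegram",
  ownerId: "local-run",
  actorRole: "ci",
  leaseTtlMs: 120000,
  heartbeatIntervalMs: 30000,
};
```

    A caller would typically loop on the `retry` outcome until the acquire timeout elapses and treat `fail` as terminal.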

    Payload shape for Telegram kind:

    • text
      { groupId: string, driverToken: string, sutToken: string }
    • text
      groupId
      must be a numeric Telegram chat id string.
    • text
      admin/add
      validates this shape for
      text
      kind: "telegram"
      and rejects malformed payloads.
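
    A minimal sketch of the shape check described above, useful for validating a payload file before running `credentials add`. The `parseTelegramPayload` helper is hypothetical; accepting a leading minus sign (Telegram group chat ids are negative) and requiring non-empty tokens are assumptions about what the broker considers well-formed.

```typescript
// Sketch only: mirrors the documented Telegram payload shape; the helper
// names and the exact strictness are assumptions.
interface TelegramQaPayload {
  groupId: string;
  driverToken: string;
  sutToken: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Validate an untrusted value (for example parsed JSON) against the shape.
function parseTelegramPayload(input: unknown): TelegramQaPayload {
  if (typeof input !== "object" || input === null) {
    throw new Error("payload must be an object");
  }
  const { groupId, driverToken, sutToken } = input as Record<string, unknown>;

  // groupId must be a numeric chat id carried as a string.
  if (typeof groupId !== "string" || !/^-?\d+$/.test(groupId)) {
    throw new Error("groupId must be a numeric Telegram chat id string");
  }
  if (!isNonEmptyString(driverToken)) {
    throw new Error("driverToken must be a non-empty string");
  }
  if (!isNonEmptyString(sutToken)) {
    throw new Error("sutToken must be a non-empty string");
  }
  return { groupId, driverToken, sutToken };
}
```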

    Adding a channel to QA

    The architecture and scenario-helper names for new channel adapters live in QA overview → Adding a channel. The minimum bar: implement the transport runner on the shared

    text
    qa-lab
    host seam, declare
    text
    qaRunners
    in the plugin manifest, mount as
    text
    openclaw qa <runner>
    , and author scenarios under
    text
    qa/scenarios/
    .

    Test suites (what runs where)

    Think of the suites as “increasing realism” (and increasing flakiness/cost):

    Unit / integration (default)

    • Command:
      text
      pnpm test
    • Config: untargeted runs use the
      text
      vitest.full-*.config.ts
      shard set and may expand multi-project shards into per-project configs for parallel scheduling
    • Files: core/unit inventories under
      text
      src/**/*.test.ts
      ,
      text
      packages/**/*.test.ts
      , and
      text
      test/**/*.test.ts
      ; UI unit tests run in the dedicated
      text
      unit-ui
      shard
    • Scope:
      • Pure unit tests
      • In-process integration tests (gateway auth, routing, tooling, parsing, config)
      • Deterministic regressions for known bugs
    • Expectations:
      • Runs in CI
      • No real keys required
      • Should be fast and stable
      • Resolver and public-surface loader tests must prove broad
        text
        api.js
        and
        text
        runtime-api.js
        fallback behavior with generated tiny plugin fixtures, not real bundled plugin source APIs. Real plugin API loads belong in plugin-owned contract/integration suites.

    Stability (gateway)

    • Command:
      text
      pnpm test:stability:gateway
    • Config:
      text
      vitest.gateway.config.ts
      , forced to one worker
    • Scope:
      • Starts a real loopback Gateway with diagnostics enabled by default
      • Drives synthetic gateway message, memory, and large-payload churn through the diagnostic event path
      • Queries
        text
        diagnostics.stability
        over the Gateway WS RPC
      • Covers diagnostic stability bundle persistence helpers
      • Asserts the recorder remains bounded, synthetic RSS samples stay under the pressure budget, and per-session queue depths drain back to zero
    • Expectations:
      • CI-safe and keyless
      • Narrow lane for stability-regression follow-up, not a substitute for the full Gateway suite

    E2E (gateway smoke)

    • Command:
      text
      pnpm test:e2e
    • Config:
      text
      vitest.e2e.config.ts
    • Files:
      text
      src/**/*.e2e.test.ts
      ,
      text
      test/**/*.e2e.test.ts
      , and bundled-plugin E2E tests under
      text
      extensions/
    • Runtime defaults:
      • Uses Vitest
        text
        threads
        with
        text
        isolate: false
        , matching the rest of the repo.
      • Uses adaptive workers (CI: up to 2, local: 1 by default).
      • Runs in silent mode by default to reduce console I/O overhead.
    • Useful overrides:
      • text
        OPENCLAW_E2E_WORKERS=<n>
        to force worker count (capped at 16).
      • text
        OPENCLAW_E2E_VERBOSE=1
        to re-enable verbose console output.
    • Scope:
      • Multi-instance gateway end-to-end behavior
      • WebSocket/HTTP surfaces, node pairing, and heavier networking
    • Expectations:
      • Runs in CI (when enabled in the pipeline)
      • No real keys required
      • More moving parts than unit tests (can be slower)
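
    The worker defaults and the `OPENCLAW_E2E_WORKERS` cap above can be read as the following resolution order. This is a sketch, not the repo's config code: `resolveE2eWorkers` is a hypothetical helper, and both the CPU-based CI adaptation and the fallback for invalid overrides are assumptions.

```typescript
// Sketch only: the 16-worker cap, "CI: up to 2", and "local: 1" come from the
// text above; everything else is an assumption.
interface E2eWorkerInputs {
  override?: string; // raw OPENCLAW_E2E_WORKERS value, if set
  isCI: boolean;
  cpuCount: number;
}

const MAX_E2E_WORKERS = 16;

function resolveE2eWorkers(inputs: E2eWorkerInputs): number {
  // An explicit override wins, but is capped at 16.
  if (inputs.override !== undefined && inputs.override !== "") {
    const forced = Number(inputs.override);
    if (Number.isInteger(forced) && forced > 0) {
      return Math.min(forced, MAX_E2E_WORKERS);
    }
    // Invalid values fall through to the defaults (assumption).
  }
  // CI adapts up to 2 workers; local runs default to 1.
  if (inputs.isCI) return Math.max(1, Math.min(2, inputs.cpuCount));
  return 1;
}
```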

    E2E: OpenShell backend smoke

    • Command:
      text
      pnpm test:e2e:openshell
    • File:
      text
      extensions/openshell/src/backend.e2e.test.ts
    • Scope:
      • Starts an isolated OpenShell gateway on the host via Docker
      • Creates a sandbox from a temporary local Dockerfile
      • Exercises OpenClaw's OpenShell backend over real
        text
        sandbox ssh-config
        + SSH exec
      • Verifies remote-canonical filesystem behavior through the sandbox fs bridge
    • Expectations:
      • Opt-in only; not part of the default
        text
        pnpm test:e2e
        run
      • Requires a local
        text
        openshell
        CLI plus a working Docker daemon
      • Uses isolated
        text
        HOME
        /
        text
        XDG_CONFIG_HOME
        , then destroys the test gateway and sandbox
    • Useful overrides:
      • text
        OPENCLAW_E2E_OPENSHELL=1
        to enable the test when running the broader e2e suite manually
      • text
        OPENCLAW_E2E_OPENSHELL_COMMAND=/path/to/openshell
        to point at a non-default CLI binary or wrapper script

    Live (real providers + real models)

    • Command:
      text
      pnpm test:live
    • Config:
      text
      vitest.live.config.ts
    • Files:
      text
      src/**/*.live.test.ts
      ,
      text
      test/**/*.live.test.ts
      , and bundled-plugin live tests under
      text
      extensions/
    • Default: enabled by
      text
      pnpm test:live
      (sets
      text
      OPENCLAW_LIVE_TEST=1
      )
    • Scope:
      • “Does this provider/model actually work today with real creds?”
      • Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
    • Expectations:
      • Not CI-stable by design (real networks, real provider policies, quotas, outages)
      • Costs money / uses rate limits
      • Prefer running narrowed subsets instead of “everything”
    • Live runs source
      text
      ~/.profile
      to pick up missing API keys.
    • By default, live runs still isolate
      text
      HOME
      and copy config/auth material into a temp test home so unit fixtures cannot mutate your real
      text
      ~/.openclaw
      .
    • Set
      text
      OPENCLAW_LIVE_USE_REAL_HOME=1
      only when you intentionally need live tests to use your real home directory.
    • text
      pnpm test:live
      now defaults to a quieter mode: it keeps
      text
      [live] ...
      progress output, but suppresses the extra
      text
      ~/.profile
      notice and mutes gateway bootstrap logs/Bonjour chatter. Set
      text
      OPENCLAW_LIVE_TEST_QUIET=0
      if you want the full startup logs back.
    • API key rotation (provider-specific): set
      text
      *_API_KEYS
      with comma/semicolon format or
      text
      *_API_KEY_1
      ,
      text
      *_API_KEY_2
      (for example
      text
      OPENAI_API_KEYS
      ,
      text
      ANTHROPIC_API_KEYS
      ,
      text
      GEMINI_API_KEYS
      ) or per-live override via
      text
      OPENCLAW_LIVE_*_KEY
      ; tests retry on rate limit responses.
    • Progress/heartbeat output:
      • Live suites now emit progress lines to stderr so long provider calls are visibly active even when Vitest console capture is quiet.
      • text
        vitest.live.config.ts
        disables Vitest console interception so provider/gateway progress lines stream immediately during live runs.
      • Tune direct-model heartbeats with
        text
        OPENCLAW_LIVE_HEARTBEAT_MS
        .
      • Tune gateway/probe heartbeats with
        text
        OPENCLAW_LIVE_GATEWAY_HEARTBEAT_MS
        .
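
    The API key rotation formats mentioned above (`*_API_KEYS` as a comma/semicolon list, or numbered `*_API_KEY_1`, `*_API_KEY_2`) could be collected roughly like this. The helpers are hypothetical: merging both forms, trimming, de-duplicating, and round-robin rotation are assumptions about reasonable behavior, not the live harness's actual logic.

```typescript
// Sketch only: env naming follows the text above; the merge order,
// de-duplication, and rotation strategy are assumptions.
type KeyEnv = Record<string, string | undefined>;

// Collect keys for a provider prefix such as "OPENAI" or "ANTHROPIC".
function collectApiKeys(env: KeyEnv, provider: string): string[] {
  const raw: string[] = [];

  // List form: OPENAI_API_KEYS="key-a,key-b;key-c"
  const list = env[`${provider}_API_KEYS`];
  if (list !== undefined) raw.push(...list.split(/[,;]/));

  // Numbered form: OPENAI_API_KEY_1, OPENAI_API_KEY_2, ... until the first gap.
  for (let i = 1; ; i++) {
    const key = env[`${provider}_API_KEY_${i}`];
    if (key === undefined) break;
    raw.push(key);
  }

  // Trim, drop empties, and de-duplicate while preserving order.
  const keys: string[] = [];
  for (const candidate of raw) {
    const key = candidate.trim();
    if (key !== "" && keys.indexOf(key) === -1) keys.push(key);
  }
  return keys;
}

// Pick the next key index after a rate-limit response (simple round-robin).
function nextKeyIndex(current: number, total: number): number {
  if (total <= 0) throw new Error("no API keys configured");
  return (current + 1) % total;
}
```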

    Which suite should I run?

    Use this decision guide:

    • Editing logic/tests: run
      text
      pnpm test
      (and
      text
      pnpm test:coverage
      if you changed a lot)
    • Touching gateway networking / WS protocol / pairing: add
      text
      pnpm test:e2e
    • Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed
      text
      pnpm test:live

    Live (network-touching) tests

    For the live model matrix, CLI backend smokes, ACP smokes, Codex app-server harness, and all media-provider live tests (Deepgram, BytePlus, ComfyUI, image, music, video, media harness) — plus credential handling for live runs — see Testing — live suites.

    Docker runners (optional "works in Linux" checks)

    These Docker runners split into two buckets:

    • Live-model runners:
      text
      test:docker:live-models
      and
      text
      test:docker:live-gateway
      run only their matching profile-key live file inside the repo Docker image (
      text
      src/agents/models.profiles.live.test.ts
      and
      text
      src/gateway/gateway-models.profiles.live.test.ts
      ), mounting your local config dir and workspace (and sourcing
      text
      ~/.profile
      if mounted). The matching local entrypoints are
      text
      test:live:models-profiles
      and
      text
      test:live:gateway-profiles
      .
    • Docker live runners default to a smaller smoke cap so a full Docker sweep stays practical:
      text
      test:docker:live-models
      defaults to
      text
      OPENCLAW_LIVE_MAX_MODELS=12
      , and
      text
      test:docker:live-gateway
      defaults to
      text
      OPENCLAW_LIVE_GATEWAY_SMOKE=1
      ,
      text
      OPENCLAW_LIVE_GATEWAY_MAX_MODELS=8
      ,
      text
      OPENCLAW_LIVE_GATEWAY_STEP_TIMEOUT_MS=45000
      , and
      text
      OPENCLAW_LIVE_GATEWAY_MODEL_TIMEOUT_MS=90000
      . Override those env vars when you explicitly want the larger exhaustive scan.
    • text
      test:docker:all
      builds the live Docker image once via
      text
      test:docker:live-build
      , packs OpenClaw once as an npm tarball through
      text
      scripts/package-openclaw-for-docker.mjs
      , then builds/reuses two
      text
      scripts/e2e/Dockerfile
      images. The bare image is only the Node/Git runner for install/update/plugin-dependency lanes; those lanes mount the prebuilt tarball. The functional image installs the same tarball into
      text
      /app
      for built-app functionality lanes. Docker lane definitions live in
      text
      scripts/lib/docker-e2e-scenarios.mjs
      ; planner logic lives in
      text
      scripts/lib/docker-e2e-plan.mjs
      ;
      text
      scripts/test-docker-all.mjs
      executes the selected plan. The aggregate uses a weighted local scheduler:
      text
      OPENCLAW_DOCKER_ALL_PARALLELISM
      controls process slots, while resource caps keep heavy live, npm-install, and multi-service lanes from all starting at once. If a single lane is heavier than the active caps, the scheduler can still start it when the pool is empty and then keeps it running alone until capacity is available again. Defaults are 10 slots,
      text
      OPENCLAW_DOCKER_ALL_LIVE_LIMIT=9
      ,
      text
      OPENCLAW_DOCKER_ALL_NPM_LIMIT=10
      , and
      text
      OPENCLAW_DOCKER_ALL_SERVICE_LIMIT=7
      ; tune
      text
      OPENCLAW_DOCKER_ALL_WEIGHT_LIMIT
      or
      text
      OPENCLAW_DOCKER_ALL_DOCKER_LIMIT
      only when the Docker host has more headroom. The runner performs a Docker preflight by default, removes stale OpenClaw E2E containers, prints status every 30 seconds, stores successful lane timings in
      text
      .artifacts/docker-tests/lane-timings.json
      , and uses those timings to start longer lanes first on later runs. Use
      text
      OPENCLAW_DOCKER_ALL_DRY_RUN=1
      to print the weighted lane manifest without building or running Docker, or
      text
      node scripts/test-docker-all.mjs --plan-json
      to print the CI plan for selected lanes, package/image needs, and credentials.
    • text
      Package Acceptance
      is the GitHub-native package gate for "does this installable tarball work as a product?" It resolves one candidate package from
      text
      source=npm
      ,
      text
      source=ref
      ,
      text
      source=url
      , or
      text
      source=artifact
      , uploads it as
      text
      package-under-test
      , then runs the reusable Docker E2E lanes against that exact tarball instead of repacking the selected ref.
      text
      workflow_ref
      selects the trusted workflow/harness scripts, while
      text
      package_ref
      selects the source commit/branch/tag to pack when
      text
      source=ref
      ; this lets current acceptance logic validate older trusted commits. Profiles are ordered by breadth:
      text
      smoke
      is quick install/channel/agent plus gateway/config,
      text
      package
      is the package/update/plugin contract plus the keyless upgrade-survivor fixture, the published-baseline upgrade survivor lane, and the default native replacement for most Parallels package/update coverage,
      text
      product
      adds MCP channels, cron/subagent cleanup, OpenAI web search, and OpenWebUI, and
      text
      full
      runs the release-path Docker chunks with OpenWebUI. For
      text
      published-upgrade-survivor
      , Package Acceptance always uses
      text
      package-under-test
      as the candidate and
      text
      published_upgrade_survivor_baseline
      as the published baseline, defaulting to
      text
      openclaw@latest
      ; shard broader coverage by dispatching multiple runs with exact baseline values. The published lane configures its baseline with a baked
      text
      openclaw config set
      command recipe, then records recipe steps in the lane summary. Release validation runs a custom package delta (
      text
      bundled-channel-deps-compat plugins-offline
      ) plus Telegram package QA because the release-path Docker chunks already cover the overlapping package/update/plugin lanes. Targeted GitHub Docker rerun commands generated from artifacts include the prior package artifact, prepared image inputs, and the published upgrade-survivor baseline when available, so failed lanes can avoid rebuilding the package and images.
    • Build and release checks run
      text
      scripts/check-cli-bootstrap-imports.mjs
      after tsdown. The guard walks the static built graph from
      text
      dist/entry.js
      and
      text
      dist/cli/run-main.js
      and fails if pre-dispatch startup imports package dependencies such as Commander, prompt UI, undici, or logging before command dispatch; it also keeps the bundled gateway run chunk under budget and rejects static imports of known cold gateway paths. Packaged CLI smoke also covers root help, onboard help, doctor help, status, config schema, and a model-list command.
    • Package Acceptance legacy compatibility is capped at
      text
      2026.4.25
      (
      text
      2026.4.25-beta.*
      included). Through that cutoff, the harness tolerates only shipped-package metadata gaps: omitted private QA inventory entries, missing
      text
      gateway install --wrapper
      , missing patch files in the tarball-derived git fixture, missing persisted
      text
      update.channel
      , legacy plugin install-record locations, missing marketplace install-record persistence, and config metadata migration during
      text
      plugins update
      . For packages after
      text
      2026.4.25
      , those paths are strict failures.
    • Container smoke runners:
      text
      test:docker:openwebui
      ,
      text
      test:docker:onboard
      ,
      text
      test:docker:npm-onboard-channel-agent
      ,
      text
      test:docker:update-channel-switch
      ,
      text
      test:docker:upgrade-survivor
      ,
      text
      test:docker:published-upgrade-survivor
      ,
      text
      test:docker:session-runtime-context
      ,
      text
      test:docker:agents-delete-shared-workspace
      ,
      text
      test:docker:gateway-network
      ,
      text
      test:docker:browser-cdp-snapshot
      ,
      text
      test:docker:mcp-channels
      ,
      text
      test:docker:pi-bundle-mcp-tools
      ,
      text
      test:docker:cron-mcp-cleanup
      ,
      text
      test:docker:plugins
      ,
      text
      test:docker:plugin-update
      , and
      text
      test:docker:config-reload
      boot one or more real containers and verify higher-level integration paths.
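
    The weighted scheduler described for `test:docker:all` can be pictured with the sketch below. It is a simplified model under stated assumptions, not the logic in `scripts/test-docker-all.mjs`: the `Lane` shape, the idea that a lane's weight counts against both the slot total and each resource it touches, and the `pickStartable` helper are all hypothetical. It does reproduce the two documented behaviors: longer lanes start first, and an oversized lane may start only when the pool is empty and then runs alone.

```typescript
// Sketch only: a toy model of the weighted scheduler; names and the weight
// accounting are assumptions.
type Resource = "live" | "npm" | "service";

interface Lane {
  name: string;
  weight: number; // process slots consumed
  resources: Resource[]; // resource caps this lane counts against
  lastSeconds?: number; // recorded timing from a previous successful run
}

interface Caps {
  slots: number; // OPENCLAW_DOCKER_ALL_PARALLELISM
  limits: Record<Resource, number>; // per-resource limits
}

// Can `lane` start next to the currently active lanes?
function fits(lane: Lane, active: Lane[], caps: Caps): boolean {
  // An empty pool always accepts one lane, even if it exceeds the caps.
  if (active.length === 0) return true;

  const usedSlots = active.reduce((sum, l) => sum + l.weight, 0);
  if (usedSlots + lane.weight > caps.slots) return false;

  for (const resource of lane.resources) {
    const used = active
      .filter((l) => l.resources.indexOf(resource) !== -1)
      .reduce((sum, l) => sum + l.weight, 0);
    if (used + lane.weight > caps.limits[resource]) return false;
  }
  return true;
}

// Choose which pending lanes to start now, longest recorded lanes first.
function pickStartable(pending: Lane[], running: Lane[], caps: Caps): Lane[] {
  const ordered = pending
    .slice()
    .sort((a, b) => (b.lastSeconds ?? 0) - (a.lastSeconds ?? 0));
  const active = running.slice();
  const started: Lane[] = [];
  for (const lane of ordered) {
    if (fits(lane, active, caps)) {
      active.push(lane);
      started.push(lane);
    }
  }
  return started;
}
```

    Because an oversized lane pushes usage past the caps, no other lane passes `fits` until it finishes, which matches the "runs alone until capacity is available again" behavior.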

    The live-model Docker runners also bind-mount only the needed CLI auth homes (or all supported ones when the run is not narrowed), then copy them into the container home before the run so external-CLI OAuth can refresh tokens without mutating the host auth store:

    • Direct models:
      text
      pnpm test:docker:live-models
      (script:
      text
      scripts/test-live-models-docker.sh
      )
    • ACP bind smoke:
      text
      pnpm test:docker:live-acp-bind
      (script:
      text
      scripts/test-live-acp-bind-docker.sh
      ; covers Claude, Codex, and Gemini by default, with strict Droid/OpenCode coverage via
      text
      pnpm test:docker:live-acp-bind:droid
      and
      text
      pnpm test:docker:live-acp-bind:opencode
      )
    • CLI backend smoke:
      text
      pnpm test:docker:live-cli-backend
      (script:
      text
      scripts/test-live-cli-backend-docker.sh
      )
    • Codex app-server harness smoke:
      text
      pnpm test:docker:live-codex-harness
      (script:
      text
      scripts/test-live-codex-harness-docker.sh
      )
    • Gateway + dev agent:
      text
      pnpm test:docker:live-gateway
      (script:
      text
      scripts/test-live-gateway-models-docker.sh
      )
    • Observability smoke:
      text
      pnpm qa:otel:smoke
      is a private QA source-checkout lane. It is intentionally not part of package Docker release lanes because the npm tarball omits QA Lab.
    • Open WebUI live smoke:
      text
      pnpm test:docker:openwebui
      (script:
      text
      scripts/e2e/openwebui-docker.sh
      )
    • Onboarding wizard (TTY, full scaffolding):
      text
      pnpm test:docker:onboard
      (script:
      text
      scripts/e2e/onboard-docker.sh
      )
    • Npm tarball onboarding/channel/agent smoke:
      text
      pnpm test:docker:npm-onboard-channel-agent
      installs the packed OpenClaw tarball globally in Docker, configures OpenAI via env-ref onboarding plus Telegram by default, verifies that doctor repairs activated plugin runtime deps, and runs one mocked OpenAI agent turn. Reuse a prebuilt tarball with
      text
      OPENCLAW_CURRENT_PACKAGE_TGZ=/path/to/openclaw-*.tgz
      , skip the host rebuild with
      text
      OPENCLAW_NPM_ONBOARD_HOST_BUILD=0
      , or switch channel with
      text
      OPENCLAW_NPM_ONBOARD_CHANNEL=discord
      .
    • Update channel switch smoke:
      text
      pnpm test:docker:update-channel-switch
      installs the packed OpenClaw tarball globally in Docker, switches from package
      text
      stable
      to git
      text
      dev
      , verifies the persisted channel and plugin post-update work, then switches back to package
      text
      stable
      and checks update status.
    • Upgrade survivor smoke:
      text
      pnpm test:docker:upgrade-survivor
      installs the packed OpenClaw tarball over a dirty old-user fixture with agents, channel config, plugin allowlists, stale plugin runtime-deps state, and existing workspace/session files. It runs package update plus non-interactive doctor without live provider or channel keys, then starts a loopback Gateway and checks config/state preservation plus startup/status budgets.
    • Published upgrade survivor smoke:
      text
      pnpm test:docker:published-upgrade-survivor
      installs
      text
      openclaw@latest
      by default, seeds realistic existing-user files, configures that baseline with a baked command recipe, validates the resulting config, updates that published install to the candidate tarball, runs non-interactive doctor, writes
      text
      .artifacts/upgrade-survivor/summary.json
      , then starts a loopback Gateway and checks configured intents, state preservation, startup, and status budgets. Override the baseline with
      text
      OPENCLAW_UPGRADE_SURVIVOR_BASELINE_SPEC
      ; Package Acceptance exposes the same value as
      text
      published_upgrade_survivor_baseline
      .
    • Session runtime context smoke:
      text
      pnpm test:docker:session-runtime-context
      verifies hidden runtime context transcript persistence plus doctor repair of affected duplicated prompt-rewrite branches.
    • Bun global install smoke:
      text
      bash scripts/e2e/bun-global-install-smoke.sh
      packs the current tree, installs it with
      text
      bun install -g
      in an isolated home, and verifies
      text
      openclaw infer image providers --json
      returns bundled image providers instead of hanging. Reuse a prebuilt tarball with
      text
      OPENCLAW_BUN_GLOBAL_SMOKE_PACKAGE_TGZ=/path/to/openclaw-*.tgz
      , skip the host build with
      text
      OPENCLAW_BUN_GLOBAL_SMOKE_HOST_BUILD=0
      , or copy
      text
      dist/
      from a built Docker image with
      text
      OPENCLAW_BUN_GLOBAL_SMOKE_DIST_IMAGE=openclaw-dockerfile-smoke:local
      .
    • Installer Docker smoke:
      text
      bash scripts/test-install-sh-docker.sh
      shares one npm cache across its root, update, and direct-npm containers. Update smoke defaults to npm
      text
      latest
      as the stable baseline before upgrading to the candidate tarball. Override with
      text
      OPENCLAW_INSTALL_SMOKE_UPDATE_BASELINE=2026.4.22
      locally, or with the Install Smoke workflow's
      text
      update_baseline_version
      input on GitHub. Non-root installer checks keep an isolated npm cache so root-owned cache entries do not mask user-local install behavior. Set
      text
      OPENCLAW_INSTALL_SMOKE_NPM_CACHE_DIR=/path/to/cache
      to reuse the root/update/direct-npm cache across local reruns.
    • Install Smoke CI skips the duplicate direct-npm global update with
      text
      OPENCLAW_INSTALL_SMOKE_SKIP_NPM_GLOBAL=1
      ; run the script locally without that env var when direct
      text
      npm install -g
      coverage is needed.
    • Agents delete shared workspace CLI smoke:
      text
      pnpm test:docker:agents-delete-shared-workspace
      (script:
      text
      scripts/e2e/agents-delete-shared-workspace-docker.sh
      ) builds the root Dockerfile image by default, seeds two agents with one workspace in an isolated container home, runs
      text
      agents delete --json
      , and verifies valid JSON plus retained workspace behavior. Reuse the install-smoke image with
      text
      OPENCLAW_AGENTS_DELETE_SHARED_WORKSPACE_E2E_IMAGE=openclaw-dockerfile-smoke:local OPENCLAW_AGENTS_DELETE_SHARED_WORKSPACE_E2E_SKIP_BUILD=1
      .
    • Gateway networking (two containers, WS auth + health):
      text
      pnpm test:docker:gateway-network
      (script:
      text
      scripts/e2e/gateway-network-docker.sh
      )
    • Browser CDP snapshot smoke:
      text
      pnpm test:docker:browser-cdp-snapshot
      (script:
      text
      scripts/e2e/browser-cdp-snapshot-docker.sh
      ) builds the source E2E image plus a Chromium layer, starts Chromium with raw CDP, runs
      text
      browser doctor --deep
      , and verifies CDP role snapshots cover link URLs, cursor-promoted clickables, iframe refs, and frame metadata.
    • OpenAI Responses web_search minimal reasoning regression:
      text
      pnpm test:docker:openai-web-search-minimal
      (script:
      text
      scripts/e2e/openai-web-search-minimal-docker.sh
      ) runs a mocked OpenAI server through Gateway, verifies
      text
      web_search
      raises
      text
      reasoning.effort
      from
      text
      minimal
      to
      text
      low
      , then forces the provider schema reject and checks the raw detail appears in Gateway logs.
    • MCP channel bridge (seeded Gateway + stdio bridge + raw Claude notification-frame smoke):
      text
      pnpm test:docker:mcp-channels
      (script:
      text
      scripts/e2e/mcp-channels-docker.sh
      )
    • Pi bundle MCP tools (real stdio MCP server + embedded Pi profile allow/deny smoke):
      text
      pnpm test:docker:pi-bundle-mcp-tools
      (script:
      text
      scripts/e2e/pi-bundle-mcp-tools-docker.sh
      )
    • Cron/subagent MCP cleanup (real Gateway + stdio MCP child teardown after isolated cron and one-shot subagent runs):
      text
      pnpm test:docker:cron-mcp-cleanup
      (script:
      text
      scripts/e2e/cron-mcp-cleanup-docker.sh
      )
    • Plugins (install smoke, ClawHub kitchen-sink install/uninstall, marketplace updates, and Claude-bundle enable/inspect):
      text
      pnpm test:docker:plugins
      (script:
      text
      scripts/e2e/plugins-docker.sh
      ). Set
      text
      OPENCLAW_PLUGINS_E2E_CLAWHUB=0
      to skip the ClawHub block, or override the default kitchen-sink package/runtime pair with
      text
      OPENCLAW_PLUGINS_E2E_CLAWHUB_SPEC
      and
      text
      OPENCLAW_PLUGINS_E2E_CLAWHUB_ID
      . Without
      text
      OPENCLAW_CLAWHUB_URL
      /
      text
      CLAWHUB_URL
      , the test uses a hermetic local ClawHub fixture server.
    • Plugin update unchanged smoke:
      text
      pnpm test:docker:plugin-update
      (script:
      text
      scripts/e2e/plugin-update-unchanged-docker.sh
      )
    • Config reload metadata smoke:
      text
      pnpm test:docker:config-reload
      (script:
      text
      scripts/e2e/config-reload-source-docker.sh
      )
    • Bundled plugin runtime deps:
      text
      pnpm test:docker:bundled-channel-deps
      builds a small Docker runner image by default, builds and packs OpenClaw once on the host, then mounts that tarball into each Linux install scenario. Reuse the image with
      text
      OPENCLAW_SKIP_DOCKER_BUILD=1
      , skip the host rebuild after a fresh local build with
      text
      OPENCLAW_BUNDLED_CHANNEL_HOST_BUILD=0
      , or point at an existing tarball with
      text
      OPENCLAW_CURRENT_PACKAGE_TGZ=/path/to/openclaw-*.tgz
      . The full Docker aggregate and release-path bundled-channel chunks pre-pack this tarball once, then shard bundled channel checks into independent lanes, including separate update lanes for Telegram, Discord, Slack, Feishu, memory-lancedb, and ACPX. Release chunks split channel smokes, update targets, and setup/runtime contracts into
      text
      bundled-channels-core
      ,
      text
      bundled-channels-update-a
      ,
      text
      bundled-channels-update-b
      , and
      text
      bundled-channels-contracts
      ; the aggregate
      text
      bundled-channels
      chunk remains available for manual reruns. The release workflow also splits provider installer chunks and bundled plugin install/uninstall chunks; legacy
      text
      package-update
      ,
      text
      plugins-runtime
      , and
      text
      plugins-integrations
      chunks remain aggregate aliases for manual reruns. Use
      text
      OPENCLAW_BUNDLED_CHANNELS=telegram,slack
      to narrow the channel matrix when running the bundled lane directly, or
      text
      OPENCLAW_BUNDLED_CHANNEL_UPDATE_TARGETS=telegram,acpx
      to narrow the update scenario. Per-scenario Docker runs default to
      text
      OPENCLAW_BUNDLED_CHANNEL_DOCKER_RUN_TIMEOUT=900s
      ; the multi-target update scenario defaults to
      text
      OPENCLAW_BUNDLED_CHANNEL_UPDATE_DOCKER_RUN_TIMEOUT=2400s
      . The lane also verifies that
      text
      channels.<id>.enabled=false
      and
      text
      plugins.entries.<id>.enabled=false
      suppress doctor/runtime-dependency repair.
    • Narrow bundled plugin runtime deps while iterating by disabling unrelated scenarios, for example:
      text
      OPENCLAW_BUNDLED_CHANNEL_SCENARIOS=0 OPENCLAW_BUNDLED_CHANNEL_UPDATE_SCENARIO=0 OPENCLAW_BUNDLED_CHANNEL_ROOT_OWNED_SCENARIO=0 OPENCLAW_BUNDLED_CHANNEL_SETUP_ENTRY_SCENARIO=0 pnpm test:docker:bundled-channel-deps
      .

    To prebuild and reuse the shared functional image manually:

    bash
    OPENCLAW_DOCKER_E2E_IMAGE=openclaw-docker-e2e-functional:local pnpm test:docker:e2e-build
    OPENCLAW_DOCKER_E2E_IMAGE=openclaw-docker-e2e-functional:local OPENCLAW_SKIP_DOCKER_BUILD=1 pnpm test:docker:mcp-channels

    Suite-specific image overrides such as

    text
    OPENCLAW_GATEWAY_NETWORK_E2E_IMAGE
    still win when set. When
    text
    OPENCLAW_SKIP_DOCKER_BUILD=1
    points at a remote shared image, the scripts pull it if it is not already local. The QR and installer Docker tests keep their own Dockerfiles because they validate package/install behavior rather than the shared built-app runtime.
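
    The image precedence described above (suite-specific override first, then the shared image, with `OPENCLAW_SKIP_DOCKER_BUILD=1` switching from build to reuse/pull) might be modeled like this. The `planImage` helper, the `ImagePlan` shape, and the caller-supplied fallback image are hypothetical; the real shell scripts may resolve these values differently.

```typescript
// Sketch only: precedence follows the text above; names and the fallback
// parameter are assumptions.
type ImageEnv = Record<string, string | undefined>;

interface ImagePlan {
  image: string;
  build: boolean; // build the image before running the lane
  pullIfMissing: boolean; // pull a remote image that is not present locally
}

function planImage(env: ImageEnv, suiteImageVar: string, fallbackImage: string): ImagePlan {
  // Suite-specific override (e.g. OPENCLAW_GATEWAY_NETWORK_E2E_IMAGE) wins.
  const suiteImage = env[suiteImageVar];
  const sharedImage = env["OPENCLAW_DOCKER_E2E_IMAGE"];
  const image = suiteImage || sharedImage || fallbackImage;

  // With the skip flag set, reuse the image instead of building it.
  const skipBuild = env["OPENCLAW_SKIP_DOCKER_BUILD"] === "1";
  return { image, build: !skipBuild, pullIfMissing: skipBuild };
}
```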

    The live-model Docker runners also bind-mount the current checkout read-only and stage it into a temporary workdir inside the container. This keeps the runtime image slim while still running Vitest against your exact local source/config. The staging step skips large local-only caches and app build outputs such as

    text
    .pnpm-store
    ,
    text
    .worktrees
    ,
    text
    __openclaw_vitest__
    , and app-local
    text
    .build
    or Gradle output directories so Docker live runs do not spend minutes copying machine-specific artifacts. They also set
    text
    OPENCLAW_SKIP_CHANNELS=1
    so gateway live probes do not start real Telegram/Discord/etc. channel workers inside the container.
    text
    test:docker:live-models
    still runs
    text
    pnpm test:live
    , so pass through
    text
    OPENCLAW_LIVE_GATEWAY_*
    as well when you need to narrow or exclude gateway live coverage from that Docker lane.
    text
    test:docker:openwebui
    is a higher-level compatibility smoke: it starts an OpenClaw gateway container with the OpenAI-compatible HTTP endpoints enabled, starts a pinned Open WebUI container against that gateway, signs in through Open WebUI, verifies
    text
    /api/models
    exposes
    text
    openclaw/default
    , then sends a real chat request through Open WebUI's
    text
    /api/chat/completions
    proxy. The first run can be noticeably slower because Docker may need to pull the Open WebUI image and Open WebUI may need to finish its own cold-start setup. This lane expects a usable live model key, and
    text
    OPENCLAW_PROFILE_FILE
    (
    text
    ~/.profile
    by default) is the primary way to provide it in Dockerized runs. Successful runs print a small JSON payload like
    text
    { "ok": true, "model": "openclaw/default", ... }
    .

    `test:docker:mcp-channels` is intentionally deterministic and does not need a real Telegram, Discord, or iMessage account. It boots a seeded Gateway container, starts a second container that spawns `openclaw mcp serve`, then verifies routed conversation discovery, transcript reads, attachment metadata, live event queue behavior, outbound send routing, and Claude-style channel + permission notifications over the real stdio MCP bridge. The notification check inspects the raw stdio MCP frames directly so the smoke validates what the bridge actually emits, not just what a specific client SDK happens to surface.

    `test:docker:pi-bundle-mcp-tools` is deterministic and does not need a live model key. It builds the repo Docker image, starts a real stdio MCP probe server inside the container, materializes that server through the embedded Pi bundle MCP runtime, executes the tool, then verifies `coding` and `messaging` keep `bundle-mcp` tools while `minimal` and `tools.deny: ["bundle-mcp"]` filter them.
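
    The expectation matrix that this lane verifies can be written down as a small predicate. This is an illustrative model only, not the runtime's filtering code: `ToolPolicy` and `keepsBundleMcpTools` are hypothetical names, and the sketch assumes the three profiles named above are the only ones that matter here.

```typescript
// Hypothetical model of the documented expectations; not the runtime's actual filter.
type ToolProfile = "coding" | "messaging" | "minimal";

interface ToolPolicy {
  profile: ToolProfile;
  deny?: string[]; // mirrors the documented tools.deny list
}

// Returns true when bundle-mcp tools should remain visible under the policy.
function keepsBundleMcpTools(policy: ToolPolicy): boolean {
  if (policy.profile === "minimal") return false; // minimal filters bundle-mcp tools
  const denied = policy.deny ?? [];
  return denied.indexOf("bundle-mcp") === -1; // an explicit deny entry also filters them
}
```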

    `test:docker:cron-mcp-cleanup` is deterministic and does not need a live model key. It starts a seeded Gateway with a real stdio MCP probe server, runs an isolated cron turn and a `/subagents spawn` one-shot child turn, then verifies the MCP child process exits after each run.

    Manual ACP plain-language thread smoke (not CI):

    • `bun scripts/dev/discord-acp-plain-language-smoke.ts --channel <discord-channel-id> ...`
    • Keep this script for regression/debug workflows. It may be needed again for ACP thread routing validation, so do not delete it.

    Useful env vars:

    • `OPENCLAW_CONFIG_DIR=...` (default: `~/.openclaw`) mounted to `/home/node/.openclaw`
    • `OPENCLAW_WORKSPACE_DIR=...` (default: `~/.openclaw/workspace`) mounted to `/home/node/.openclaw/workspace`
    • `OPENCLAW_PROFILE_FILE=...` (default: `~/.profile`) mounted to `/home/node/.profile` and sourced before running tests
    • `OPENCLAW_DOCKER_PROFILE_ENV_ONLY=1` to verify only env vars sourced from `OPENCLAW_PROFILE_FILE`, using temporary config/workspace dirs and no external CLI auth mounts
    • `OPENCLAW_DOCKER_CLI_TOOLS_DIR=...` (default: `~/.cache/openclaw/docker-cli-tools`) mounted to `/home/node/.npm-global` for cached CLI installs inside Docker
    • External CLI auth dirs/files under `$HOME` are mounted read-only under `/host-auth...`, then copied into `/home/node/...` before tests start
      • Default dirs: `.minimax`
      • Default files: `~/.codex/auth.json`, `~/.codex/config.toml`, `.claude.json`, `~/.claude/.credentials.json`, `~/.claude/settings.json`, `~/.claude/settings.local.json`
      • Narrowed provider runs mount only the needed dirs/files inferred from `OPENCLAW_LIVE_PROVIDERS` / `OPENCLAW_LIVE_GATEWAY_PROVIDERS`
      • Override manually with `OPENCLAW_DOCKER_AUTH_DIRS=all`, `OPENCLAW_DOCKER_AUTH_DIRS=none`, or a comma list like `OPENCLAW_DOCKER_AUTH_DIRS=.claude,.codex`
    • `OPENCLAW_LIVE_GATEWAY_MODELS=...` / `OPENCLAW_LIVE_MODELS=...` to narrow the run
    • `OPENCLAW_LIVE_GATEWAY_PROVIDERS=...` / `OPENCLAW_LIVE_PROVIDERS=...` to filter providers in-container
    • `OPENCLAW_SKIP_DOCKER_BUILD=1` to reuse an existing `openclaw:local-live` image for reruns that do not need a rebuild
    • `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to ensure creds come from the profile store (not env)
    • `OPENCLAW_OPENWEBUI_MODEL=...` to choose the model exposed by the gateway for the Open WebUI smoke
    • `OPENCLAW_OPENWEBUI_PROMPT=...` to override the nonce-check prompt used by the Open WebUI smoke
    • `OPENWEBUI_IMAGE=...` to override the pinned Open WebUI image tag
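
    The three documented forms of `OPENCLAW_DOCKER_AUTH_DIRS` (`all`, `none`, or a comma list) could be interpreted roughly as follows. This is a sketch under assumptions, not the Docker runner's real parser: `AuthDirSelection` and `parseAuthDirsOverride` are hypothetical names, and treating an unset or empty value as "fall back to the default/inferred mounts" is an assumption.

```typescript
// Hypothetical result type for the override; "default" means no override was given.
type AuthDirSelection =
  | { mode: "default" }
  | { mode: "all" }
  | { mode: "none" }
  | { mode: "list"; entries: string[] };

// Interprets the raw env value: "all", "none", or a comma-separated list.
function parseAuthDirsOverride(raw: string | undefined): AuthDirSelection {
  const value = (raw ?? "").trim();
  if (value === "") return { mode: "default" }; // assumption: unset keeps default behavior
  if (value === "all") return { mode: "all" };
  if (value === "none") return { mode: "none" };
  const entries = value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0); // tolerate stray commas and spaces
  return { mode: "list", entries };
}
```

    For example, `parseAuthDirsOverride(".claude,.codex")` yields a `list` selection with the two entries `.claude` and `.codex`.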

    Docs sanity

    Run docs checks after doc edits: `pnpm check:docs`. Run full Mintlify anchor validation when you need in-page heading checks too: `pnpm docs:check-links:anchors`.

    Offline regression (CI-safe)

    These are “real pipeline” regressions without real providers:

    • Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.test.ts` (case: "runs a mock OpenAI tool call end-to-end via gateway agent loop")
    • Gateway wizard (WS `wizard.start` / `wizard.next`, writes config + auth enforced): `src/gateway/gateway.test.ts` (case: "runs wizard over ws and writes auth token config")

    Agent reliability evals (skills)

    We already have a few CI-safe tests that behave like “agent reliability evals”:

    • Mock tool-calling through the real gateway + agent loop (`src/gateway/gateway.test.ts`).
    • End-to-end wizard flows that validate session wiring and config effects (`src/gateway/gateway.test.ts`).

    What’s still missing for skills (see Skills):

    • Decisioning: when skills are listed in the prompt, does the agent pick the right skill (or avoid irrelevant ones)?
    • Compliance: does the agent read `SKILL.md` before use and follow required steps/args?
    • Workflow contracts: multi-turn scenarios that assert tool order, session history carryover, and sandbox boundaries.

    Future evals should stay deterministic first:

    • A scenario runner using mock providers to assert tool calls + order, skill file reads, and session wiring.
    • A small suite of skill-focused scenarios (use vs avoid, gating, prompt injection).
    • Optional live evals (opt-in, env-gated) only after the CI-safe suite is in place.
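
    As a rough illustration of what a deterministic scenario check might assert, the sketch below works on a recorded list of tool calls such as a mock provider run could produce. Everything here is hypothetical: the `ToolCall` shape, the `read` tool name, and both helper functions are illustrative stand-ins rather than existing OpenClaw APIs.

```typescript
// Hypothetical record of one tool call captured during a mock-provider run.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// True when the recorded tool names contain `expected` as an ordered subsequence.
function callsInOrder(calls: ToolCall[], expected: string[]): boolean {
  let next = 0;
  for (const call of calls) {
    if (next < expected.length && call.tool === expected[next]) next++;
  }
  return next === expected.length;
}

// True when the skill file was read before the skill's tool was first used.
// A run that never uses the skill passes, since there is nothing to violate.
function readSkillFileBeforeUse(calls: ToolCall[], skillFile: string, skillTool: string): boolean {
  let sawRead = false;
  for (const call of calls) {
    if (call.tool === "read" && call.args["path"] === skillFile) sawRead = true;
    if (call.tool === skillTool) return sawRead; // the first use decides the outcome
  }
  return true;
}
```

    Checks of this kind cover the "Compliance" and "Workflow contracts" gaps without a live model, because the recorded calls come from a scripted provider.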

    Contract tests (plugin and channel shape)

    Contract tests verify that every registered plugin and channel conforms to its interface contract. They iterate over all discovered plugins and run a suite of shape and behavior assertions. The default `pnpm test` unit lane intentionally skips these shared seam and smoke files; run the contract commands explicitly when you touch shared channel or provider surfaces.

    Commands

    • All contracts: `pnpm test:contracts`
    • Channel contracts only: `pnpm test:contracts:channels`
    • Provider contracts only: `pnpm test:contracts:plugins`

    Channel contracts

    Located in `src/channels/plugins/contracts/*.contract.test.ts`:

    • plugin - Basic plugin shape (id, name, capabilities)
    • setup - Setup wizard contract
    • session-binding - Session binding behavior
    • outbound-payload - Message payload structure
    • inbound - Inbound message handling
    • actions - Channel action handlers
    • threading - Thread ID handling
    • directory - Directory/roster API
    • group-policy - Group policy enforcement
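
    To make the "plugin" entry above concrete, here is a hedged sketch of a shape check applied to every discovered plugin. The field names (id, name, capabilities) come from the list; the `ChannelPluginCandidate` type and both functions are hypothetical, and the real contract suite in the repo may assert more or different properties.

```typescript
// Hypothetical view of a discovered plugin; fields are unknown until checked.
interface ChannelPluginCandidate {
  id?: unknown;
  name?: unknown;
  capabilities?: unknown;
}

// Returns a list of shape problems for one plugin (empty when the shape is fine).
function pluginShapeProblems(plugin: ChannelPluginCandidate): string[] {
  const problems: string[] = [];
  if (typeof plugin.id !== "string" || plugin.id.length === 0) {
    problems.push("id must be a non-empty string");
  }
  if (typeof plugin.name !== "string" || plugin.name.length === 0) {
    problems.push("name must be a non-empty string");
  }
  if (typeof plugin.capabilities !== "object" || plugin.capabilities === null) {
    problems.push("capabilities must be an object");
  }
  return problems;
}

// Runs the shape check over all plugins and labels each failure with the plugin id
// (or its position when the id itself is missing).
function collectContractFailures(plugins: ChannelPluginCandidate[]): string[] {
  const failures: string[] = [];
  plugins.forEach((plugin, index) => {
    const label = typeof plugin.id === "string" && plugin.id.length > 0 ? plugin.id : `#${index}`;
    for (const problem of pluginShapeProblems(plugin)) {
      failures.push(`${label}: ${problem}`);
    }
  });
  return failures;
}
```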

    Provider status contracts

    Located in `src/plugins/contracts/*.contract.test.ts`.

    • status - Channel status probes
    • registry - Plugin registry shape

    Provider contracts

    Located in `src/plugins/contracts/*.contract.test.ts`:

    • auth - Auth flow contract
    • auth-choice - Auth choice/selection
    • catalog - Model catalog API
    • discovery - Plugin discovery
    • loader - Plugin loading
    • runtime - Provider runtime
    • shape - Plugin shape/interface
    • wizard - Setup wizard

    When to run

    • After changing plugin-sdk exports or subpaths
    • After adding or modifying a channel or provider plugin
    • After refactoring plugin registration or discovery

    Contract tests run in CI and do not require real API keys.

    Adding regressions (guidance)

    When you fix a provider/model issue discovered in live:

    • Add a CI-safe regression if possible (mock/stub provider, or capture the exact request-shape transformation)
    • If it’s inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars
    • Prefer targeting the smallest layer that catches the bug:
      • provider request conversion/replay bug → direct models test
      • gateway session/history/tool pipeline bug → gateway live smoke or CI-safe gateway mock test
    • SecretRef traversal guardrail:
      • `src/secrets/exec-secret-ref-id-parity.test.ts` derives one sampled target per SecretRef class from registry metadata (`listSecretTargetRegistryEntries()`), then asserts traversal-segment exec ids are rejected.
      • If you add a new `includeInPlan` SecretRef target family in `src/secrets/target-registry-data.ts`, update `classifyTargetClass` in that test. The test intentionally fails on unclassified target ids so new classes cannot be skipped silently.
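
    The two ideas behind the SecretRef guardrail, rejecting traversal segments and failing loudly on unclassified targets, can be sketched as below. This is not the code in `src/secrets/exec-secret-ref-id-parity.test.ts`: `hasTraversalSegment` and `classifyOrThrow` are hypothetical names, and treating exec ids as slash- or backslash-separated paths where `.` and `..` count as traversal segments is an assumption.

```typescript
// Assumption: an exec id is path-like, and "." or ".." segments count as traversal.
function hasTraversalSegment(execId: string): boolean {
  return execId
    .split(/[\\/]/)
    .some((segment) => segment === "." || segment === "..");
}

// Hypothetical classifier: throws on unknown target ids so that a newly added
// target family cannot be skipped silently.
function classifyOrThrow(targetId: string, knownClasses: Record<string, string>): string {
  const targetClass = knownClasses[targetId];
  if (typeof targetClass !== "string") {
    throw new Error(`unclassified SecretRef target id: ${targetId}`);
  }
  return targetClass;
}
```

    Throwing instead of returning a fallback class is what turns a forgotten classification into a test failure rather than a silent gap.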

    Related

    • Testing live
    • CI
