Technical reference for the OpenClaw framework.
OpenClaw already worked well with tool-using frontier models, but GPT-5.5 and Codex-style models were still underperforming in a few practical ways: stopping after plan-only turns, tripping over strict tool schemas, and receiving misleading `/elevated full` permission hints.

This parity program fixes those gaps in four reviewable slices.
This slice adds an opt-in `strict-agentic` execution contract.

When enabled, OpenClaw stops accepting plan-only turns as “good enough” completion. If the model only says what it intends to do and does not actually use tools or make progress, OpenClaw retries with an act-now steer and then fails closed with an explicit blocked state instead of silently ending the task.
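The contract above can be sketched as a small retry loop. This is a hypothetical illustration of the described behavior; `TurnOutcome`, `enforceStrictAgentic`, and the steer text are assumptions for this sketch, not OpenClaw's real API:

```typescript
// Hypothetical sketch of the strict-agentic loop. `TurnOutcome`, `runTurn`,
// and the steer text are illustrative assumptions, not OpenClaw's real API.
type TurnOutcome =
  | { kind: "progress" }                 // model used tools / advanced the task
  | { kind: "plan-only" }                // model only described what it would do
  | { kind: "blocked"; reason: string }; // explicit, machine-readable stop

function enforceStrictAgentic(
  runTurn: (steer?: string) => TurnOutcome,
  maxRetries = 1,
): TurnOutcome {
  let outcome = runTurn();
  // A plan-only turn is not accepted as completion: retry with an act-now steer.
  for (let i = 0; i < maxRetries && outcome.kind === "plan-only"; i++) {
    outcome = runTurn("Act now: take the next concrete tool step.");
  }
  // Fail closed: surface a blocked state instead of silently ending the task.
  if (outcome.kind === "plan-only") {
    return { kind: "blocked", reason: "plan-only turn after act-now steer" };
  }
  return outcome;
}
```

The key design choice is that a plan-only turn can never be a terminal success: it either converts into progress after the steer, or it becomes an explicit `blocked` state.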
This improves the GPT-5.5 experience most on approval turns and `update_plan`-style updates that previously ended without tool follow-through.

This slice makes OpenClaw tell the truth about two things: why a request actually failed, and whether `/elevated full` is really available.
That means GPT-5.5 gets better runtime signals for missing scope, auth refresh failures, HTTP 403 auth failures, proxy issues, DNS or timeout failures, and blocked full-access modes. The model is less likely to hallucinate the wrong remediation or keep asking for a permission mode the runtime cannot provide.
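A minimal sketch of how such failure causes could be mapped to explicit runtime hints. The hint names and the shape of `err` are assumptions for illustration, not OpenClaw's actual runtime vocabulary:

```typescript
// Illustrative failure classifier. The hint names and the `err` shape are
// assumptions for this sketch, not OpenClaw's actual runtime vocabulary.
type RuntimeHint =
  | "missing-scope"
  | "auth-refresh-failed"
  | "auth-forbidden"
  | "dns-or-timeout"
  | "unknown";

function classifyFailure(err: {
  status?: number;
  code?: string;
  missingScope?: boolean;
}): RuntimeHint {
  if (err.missingScope) return "missing-scope";
  if (err.status === 401) return "auth-refresh-failed";
  if (err.status === 403) return "auth-forbidden";
  // Node-style socket error codes for DNS and timeout failures.
  if (err.code === "ENOTFOUND" || err.code === "ETIMEDOUT") return "dns-or-timeout";
  return "unknown";
}
```

Surfacing a named hint instead of generic failure text is what lets the model pick the right remediation rather than guessing.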
This slice improves two kinds of correctness:
The tool-compat work reduces schema friction for strict OpenAI/Codex tool registration, especially around parameter-free tools and strict object-root expectations. The replay/liveness work makes long-running tasks more observable, so paused, blocked, and abandoned states are visible instead of disappearing into generic failure text.
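The parameter-free case can be sketched concretely. `ToolSpec` is a hypothetical shape for this illustration; the output follows the JSON Schema conventions that strict validators commonly expect (an object root, explicit empty `properties`, and `additionalProperties: false`):

```typescript
// Sketch of normalizing a tool's parameters for strict object-root schemas.
// `ToolSpec` is a hypothetical shape, not OpenClaw's real registration type.
interface ToolSpec {
  name: string;
  parameters?: {
    properties?: Record<string, unknown>;
    required?: string[];
  };
}

function toStrictParameters(tool: ToolSpec) {
  const params = tool.parameters ?? {};
  return {
    type: "object" as const,
    properties: params.properties ?? {}, // parameter-free tools get an empty object
    required: params.required ?? [],
    additionalProperties: false, // strict validators reject undeclared keys
  };
}
```

A tool that omits `parameters` entirely still registers cleanly, because it is normalized to an explicit empty object schema instead of a missing field.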
This slice adds the first-wave QA-lab parity pack so GPT-5.5 and Opus 4.6 can be exercised through the same scenarios and compared using shared evidence.
The parity pack is the proof layer. It does not change runtime behavior by itself.
After you have two `qa-suite-summary.json` files (one per model), run:

```bash
pnpm openclaw qa parity-report \
  --repo-root . \
  --candidate-summary .artifacts/qa-e2e/gpt55/qa-suite-summary.json \
  --baseline-summary .artifacts/qa-e2e/opus46/qa-suite-summary.json \
  --output-dir .artifacts/qa-e2e/parity
```
That command writes `qa-agentic-parity-report.md` and `qa-agentic-parity-summary.json`, including an overall `pass`/`fail` verdict.

Before this work, GPT-5.5 on OpenClaw could feel less agentic than Opus in real coding sessions because the runtime tolerated behaviors that are especially harmful for GPT-5-style models: plan-only stops, misleading permission hints, and failures that silently disappeared.
The goal is not to make GPT-5.5 imitate Opus. The goal is to give GPT-5.5 a runtime contract that rewards real progress, supplies cleaner tool and permission semantics, and turns failure modes into explicit machine- and human-readable states.
That changes the user experience from silent stalls after plausible plans to visible progress or explicit blocked states.
| Before this program | After PR A-D |
|---|---|
| GPT-5.5 could stop after a reasonable plan without taking the next tool step | PR A turns “plan only” into “act now or surface a blocked state” |
| Strict tool schemas could reject parameter-free or OpenAI/Codex-shaped tools in confusing ways | PR C makes provider-owned tool registration and invocation more predictable |
| Asking for `/elevated full` could yield misleading permission hints | PR B gives GPT-5.5 and the user truthful runtime and permission hints |
| Replay or compaction failures could feel like the task silently disappeared | PR C surfaces paused, blocked, abandoned, and replay-invalid outcomes explicitly |
| “GPT-5.5 feels worse than Opus” was mostly anecdotal | PR D turns that into the same scenario pack, the same metrics, and a hard pass/fail gate |
```mermaid
flowchart TD
    A["User request"] --> B["Embedded Pi runtime"]
    B --> C["Strict-agentic execution contract"]
    B --> D["Provider-owned tool compatibility"]
    B --> E["Runtime truthfulness"]
    B --> F["Replay and liveness state"]
    C --> G["Tool call or explicit blocked state"]
    D --> G
    E --> G
    F --> G
    G --> H["QA-lab parity pack"]
    H --> I["Scenario report and parity gate"]
```
```mermaid
flowchart LR
    A["Merged runtime slices (PR A-C)"] --> B["Run GPT-5.5 parity pack"]
    A --> C["Run Opus 4.6 parity pack"]
    B --> D["qa-suite-summary.json"]
    C --> E["qa-suite-summary.json"]
    D --> F["openclaw qa parity-report"]
    E --> F
    F --> G["qa-agentic-parity-report.md"]
    F --> H["qa-agentic-parity-summary.json"]
    H --> I{"Gate pass?"}
    I -- "yes" --> J["Evidence-backed parity claim"]
    I -- "no" --> K["Keep runtime/review loop open"]
```
The first-wave parity pack currently covers five scenarios:
- `approval-turn-tool-followthrough`: checks that the model does not stop at “I’ll do that” after a short approval. It should take the first concrete action in the same turn.
- `model-switch-tool-continuity`: checks that tool-using work remains coherent across model/runtime switching boundaries instead of resetting into commentary or losing execution context.
- `source-docs-discovery-report`: checks that the model can read source and docs, synthesize findings, and continue the task agentically rather than producing a thin summary and stopping early.
- `image-understanding-attachment`: checks that mixed-mode tasks involving attachments remain actionable and do not collapse into vague narration.
- `compaction-retry-mutating-tool`: checks that a task with a real mutating write keeps replay-unsafety explicit instead of quietly looking replay-safe if the run compacts, retries, or loses replay state under pressure.
| Scenario | What it tests | Good GPT-5.5 behavior | Failure signal |
|---|---|---|---|
| `approval-turn-tool-followthrough` | Short approval turns after a plan | Starts the first concrete tool action immediately instead of restating intent | Plan-only follow-up, no tool activity, or blocked turn without a real blocker |
| `model-switch-tool-continuity` | Runtime/model switching under tool use | Preserves task context and continues acting coherently | Resets into commentary, loses tool context, or stops after switch |
| `source-docs-discovery-report` | Source reading + synthesis + action | Finds sources, uses tools, and produces a useful report without stalling | Thin summary, missing tool work, or incomplete-turn stop |
| `image-understanding-attachment` | Attachment-driven agentic work | Interprets the attachment, connects it to tools, and continues the task | Vague narration, attachment ignored, or no concrete next action |
| `compaction-retry-mutating-tool` | Mutating work under compaction pressure | Performs a real write and keeps replay-unsafety explicit after the side effect | Mutating write happens but replay safety is implied, missing, or contradictory |
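A scenario entry in the pack could be modeled as a small descriptor pairing expected behavior with failure signals. This is a hypothetical shape for illustration; the field names are not the parity pack's real schema:

```typescript
// Hypothetical shape for a first-wave scenario entry; the field names are
// illustrative assumptions, not the parity pack's real schema.
interface ParityScenario {
  id: string;
  tests: string;
  goodBehavior: string;
  failureSignals: string[];
}

const approvalFollowthrough: ParityScenario = {
  id: "approval-turn-tool-followthrough",
  tests: "short approval turns after a plan",
  goodBehavior: "starts the first concrete tool action immediately",
  failureSignals: [
    "plan-only follow-up",
    "no tool activity",
    "blocked turn without a real blocker",
  ],
};
```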
GPT-5.5 can only be considered at parity or better when the merged runtime passes the parity pack and the runtime-truthfulness regressions at the same time.
Required outcomes:

- The merged runtime passes all five first-wave scenarios for GPT-5.5.
- The deterministic runtime-truthfulness suites, including `/elevated full` hint behavior, stay green.

For the first-wave harness, the gate compares the GPT-5.5 candidate summary against the Opus 4.6 baseline summary on the agreed metrics: completion, stop behavior, and valid tool use.
Parity evidence is intentionally split across two layers: deterministic runtime suites owned by PR A-C, and the scenario-level parity pack owned by PR D.

| Completion gate item | Owning PR | Evidence source | Pass signal |
|---|---|---|---|
| GPT-5.5 no longer stalls after planning | PR A | `approval-turn-tool-followthrough` | Approval turns trigger real work or an explicit blocked state |
| GPT-5.5 no longer fakes progress or tool completion | PR A + PR D | Parity report scenario outcomes and fake-success count | No suspicious pass results and no commentary-only completion |
| GPT-5.5 no longer gives false `/elevated full` hints | PR B | Deterministic truthfulness suites | Blocked reasons and full-access hints stay runtime-accurate |
| Replay/liveness failures stay explicit | PR C + PR D | PR C lifecycle/replay suites plus `compaction-retry-mutating-tool` | Mutating work keeps replay-unsafety explicit instead of silently disappearing |
| GPT-5.5 matches or beats Opus 4.6 on the agreed metrics | PR D | `qa-agentic-parity-report.md` and `qa-agentic-parity-summary.json` | Same scenario coverage and no regression on completion, stop behavior, or valid tool use |
Use the verdict in `qa-agentic-parity-summary.json` as the single `pass`/`fail` signal for the parity claim, alongside the truthful `/elevated full` regressions.

Use `strict-agentic` when plan-only completion is unacceptable and you want an act-now retry followed by an explicit blocked state on failure. Keep the default contract when plan-only turns are an acceptable way for a task to end.
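A CI job could gate directly on that verdict. This sketch assumes `verdict` and `fakeSuccessCount` fields in `qa-agentic-parity-summary.json`; those names are assumptions about the file's shape, not a documented schema:

```typescript
// CI-style sketch of gating on the parity summary. The `verdict` and
// `fakeSuccessCount` field names are assumptions, not a documented schema.
interface ParitySummary {
  verdict: "pass" | "fail";
  fakeSuccessCount?: number;
}

function gatePasses(summary: ParitySummary): boolean {
  // Any fake-success result fails the gate even if the verdict reads pass.
  return summary.verdict === "pass" && (summary.fakeSuccessCount ?? 0) === 0;
}
```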