Technical reference for the OpenClaw framework.
This note explains how to review the GPT-5.5 / Codex parity program as four merge units without losing the original six-contract architecture.
**PR A**
Owns: `executionContract`, `update_plan`
Does not own:

**PR B**
Owns: `/elevated full`
Does not own:

**PR C**
Owns:
Does not own:

**PR D**
Owns:
Does not own:
| Original contract | Merge unit |
|---|---|
| Provider transport/auth correctness | PR B |
| Tool contract/schema compatibility | PR C |
| Same-turn execution | PR A |
| Permission truthfulness | PR B |
| Replay/continuation/liveness correctness | PR C |
| Benchmark/release gate | PR D |
PR D is the proof layer. It should not be the reason runtime-correctness PRs are delayed.
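The contract-to-merge-unit table above can be expressed as a small invariant check, which is useful in review to confirm no contract was dropped when the six contracts were folded into four PRs. This is an illustrative sketch; the names are copied from the table, and nothing here is part of the actual OpenClaw codebase.

```typescript
// Map each original contract to the merge unit that owns it (from the table above).
type MergeUnit = "PR A" | "PR B" | "PR C" | "PR D";

const contractOwner: Record<string, MergeUnit> = {
  "Provider transport/auth correctness": "PR B",
  "Tool contract/schema compatibility": "PR C",
  "Same-turn execution": "PR A",
  "Permission truthfulness": "PR B",
  "Replay/continuation/liveness correctness": "PR C",
  "Benchmark/release gate": "PR D",
};

// Invariant: all six contracts are mapped, and every merge unit owns at least one.
function checkCoverage(): boolean {
  const units = new Set(Object.values(contractOwner));
  return Object.keys(contractOwner).length === 6 && units.size === 4;
}
```

A reviewer checking a scope change can re-run this after editing the map; a `false` result means a contract lost its owner or a merge unit lost its purpose.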
Expected artifacts from PR D:

- `qa-suite-report.md`
- `qa-suite-summary.json`
- `qa-agentic-parity-report.md`
- `qa-agentic-parity-summary.json`

Do not claim GPT-5.5 parity or superiority over Opus 4.6 until:
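A loader for the summary artifact keeps parity claims mechanical rather than anecdotal. The field names below (`model`, `total`, `passed`, `fakeSuccess`) are assumptions for illustration, not the actual `qa-suite-summary.json` schema:

```typescript
// Hypothetical summary shape; the real qa-suite-summary.json schema may differ.
interface QaSuiteSummary {
  model: string;       // e.g. "gpt-5.5" or "opus-4.6" (assumed identifiers)
  total: number;       // scenarios run
  passed: number;      // scenarios passed
  fakeSuccess: number; // fake tool completions detected
}

function parseSummary(raw: string): QaSuiteSummary {
  const data = JSON.parse(raw);
  // Reject summaries that are missing fields or internally inconsistent.
  for (const key of ["model", "total", "passed", "fakeSuccess"]) {
    if (!(key in data)) throw new Error(`missing field: ${key}`);
  }
  if (data.passed > data.total) throw new Error("passed exceeds total");
  return data as QaSuiteSummary;
}
```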
```mermaid
flowchart LR
  A["PR A-C merged"] --> B["Run GPT-5.5 parity pack"]
  A --> C["Run Opus 4.6 parity pack"]
  B --> D["qa-suite-summary.json"]
  C --> E["qa-suite-summary.json"]
  D --> F["qa parity-report"]
  E --> F
  F --> G["Markdown report + JSON verdict"]
  G --> H{"Pass?"}
  H -- "yes" --> I["Parity claim allowed"]
  H -- "no" --> J["Keep runtime fixes / review loop open"]
```
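The "Pass?" branch in the flow above could be sketched as a pure comparison over the two summaries. The rule shown (same scenario coverage, candidate pass rate at or above baseline) and the field names are assumptions; the real verdict logic lives in PR D:

```typescript
// Assumed minimal per-model summary; see qa-suite-summary.json for the real shape.
interface ParitySummary {
  model: string;
  total: number;  // scenarios run
  passed: number; // scenarios passed
}

// Verdict sketch: GPT-5.5 must match or beat the Opus 4.6 pass rate,
// and both runs must cover the identical number of scenarios.
function parityVerdict(candidate: ParitySummary, baseline: ParitySummary) {
  const sameCoverage = candidate.total === baseline.total;
  const candidateRate = candidate.passed / candidate.total;
  const baselineRate = baseline.passed / baseline.total;
  return { pass: sameCoverage && candidateRate >= baselineRate, candidateRate, baselineRate };
}
```

Keeping coverage equality in the verdict matters: a higher pass rate on a smaller scenario set is not evidence of parity.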
The parity harness is not the only evidence source. Keep this split explicit in review:
Use this when you are ready to land a parity PR and want a repeatable, low-risk sequence.
- `r:*`
- `pnpm check:changed`
- `pnpm test:changed`
- `/land` the PR into `main`

If any one of the evidence bar items is missing, request changes instead of merging.
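The evidence bar can be checked mechanically before `/land`. This sketch only verifies that the four PR D artifacts exist on disk; the directory layout is an assumption, and a real gate would also validate their contents:

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Artifact names from the expected-artifacts list above; the directory is assumed.
const REQUIRED_ARTIFACTS = [
  "qa-suite-report.md",
  "qa-suite-summary.json",
  "qa-agentic-parity-report.md",
  "qa-agentic-parity-summary.json",
];

// Returns the artifacts that are missing; an empty list means the bar is met.
function missingEvidence(dir: string): string[] {
  return REQUIRED_ARTIFACTS.filter((name) => !existsSync(join(dir, name)));
}
```

A non-empty result maps directly to "request changes instead of merging".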
| Completion gate item | Primary owner | Review artifact |
|---|---|---|
| No plan-only stalls | PR A | strict-agentic runtime tests and `approval-turn-tool-followthrough` |
| No fake progress or fake tool completion | PR A + PR D | parity fake-success count plus scenario-level report details |
| No false `/elevated full` claims | PR B | deterministic runtime-truthfulness suites |
| Replay/liveness failures remain explicit | PR C + PR D | lifecycle/replay suites plus `compaction-retry-mutating-tool` |
| GPT-5.5 matches or beats Opus 4.6 | PR D | `qa-agentic-parity-report.md` plus `qa-agentic-parity-summary.json` |
| User-visible problem before | Review signal after |
|---|---|
| GPT-5.5 stopped after planning | PR A shows act-or-block behavior instead of commentary-only completion |
| Tool use felt brittle with strict OpenAI/Codex schemas | PR C keeps tool registration and parameter-free invocation predictable |
| Guidance about `/elevated full` could be false | PR B ties guidance to actual runtime capability and blocked reasons |
| Long tasks could disappear into replay/compaction ambiguity | PR C emits explicit paused, blocked, abandoned, and replay-invalid state |
| Parity claims were anecdotal | PR D produces a report plus JSON verdict with the same scenario coverage on both models |