
    OpenClaw Docs (v2.4.0)

    Note: This content is mirrored from docs.openclaw.ai and is subject to their terms and conditions. Last sync: 01/05/2026 07:00:08.

    Token use & costs

    OpenClaw tracks tokens, not characters. Tokens are model-specific, but most OpenAI-style models average ~4 characters per token for English text.
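That rule of thumb can be sketched as a quick estimator. This is only an illustration of the ~4 characters/token average, not an OpenClaw API; `estimate_tokens` is a hypothetical helper, and real tokenizers vary by model and language.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for English text under the ~4 chars/token heuristic."""
    return max(1, round(len(text) / chars_per_token))

# A 400-character English paragraph lands near 100 tokens under this heuristic.
print(estimate_tokens("x" * 400))  # → 100
```

For accurate counts, rely on the provider-reported usage fields described below rather than character math.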

    How the system prompt is built

    OpenClaw assembles its own system prompt on every run. It includes:

    • Tool list + short descriptions
    • Skills list (metadata only; instructions are loaded on demand with `read`). The compact skills block is bounded by `skills.limits.maxSkillsPromptChars`, with an optional per-agent override at `agents.list[].skillsLimits.maxSkillsPromptChars`.
    • Self-update instructions
    • Workspace + bootstrap files (`AGENTS.md`, `SOUL.md`, `TOOLS.md`, `IDENTITY.md`, `USER.md`, `HEARTBEAT.md`, `BOOTSTRAP.md` when new, plus `MEMORY.md` when present). Lowercase root `memory.md` is not injected; it is legacy repair input for `openclaw doctor --fix` when paired with `MEMORY.md`. Large files are truncated by `agents.defaults.bootstrapMaxChars` (default: 12000), and total bootstrap injection is capped by `agents.defaults.bootstrapTotalMaxChars` (default: 60000). `memory/*.md` daily files are not part of the normal bootstrap prompt; they remain on-demand via memory tools on ordinary turns, but reset/startup model runs can prepend a one-shot startup-context block with recent daily memory for that first turn. Bare chat `/new` and `/reset` commands are acknowledged without invoking the model. The startup prelude is controlled by `agents.defaults.startupContext`.
    • Time (UTC + user timezone)
    • Reply tags + heartbeat behavior
    • Runtime metadata (host/OS/model/thinking)

    See the full breakdown in System Prompt.

    What counts in the context window

    Everything the model receives counts toward the context limit:

    • System prompt (all sections listed above)
    • Conversation history (user + assistant messages)
    • Tool calls and tool results
    • Attachments/transcripts (images, audio, files)
    • Compaction summaries and pruning artifacts
    • Provider wrappers or safety headers (not visible, but still counted)

    Some runtime-heavy surfaces have their own explicit caps:

    • `agents.defaults.contextLimits.memoryGetMaxChars`
    • `agents.defaults.contextLimits.memoryGetDefaultLines`
    • `agents.defaults.contextLimits.toolResultMaxChars`
    • `agents.defaults.contextLimits.postCompactionMaxChars`

    Per-agent overrides live under `agents.list[].contextLimits`. These knobs are for bounded runtime excerpts and injected runtime-owned blocks. They are separate from bootstrap limits, startup-context limits, and skills prompt limits.
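A config sketch tying these knobs together. The key paths come from the list above; the numeric values are illustrative choices of mine, not shipped defaults.

```yaml
agents:
  defaults:
    contextLimits:
      toolResultMaxChars: 20000      # cap each injected tool result (example value)
      postCompactionMaxChars: 8000   # cap post-compaction blocks (example value)
  list:
    - id: "research"
      contextLimits:
        toolResultMaxChars: 40000    # per-agent override for a tool-heavy agent
```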

    For images, OpenClaw downscales transcript/tool image payloads before provider calls. Use `agents.defaults.imageMaxDimensionPx` (default: `1200`) to tune this:

    • Lower values usually reduce vision-token usage and payload size.
    • Higher values preserve more visual detail for OCR/UI-heavy screenshots.

    For a practical breakdown (per injected file, tools, skills, and system prompt size), use `/context list` or `/context detail`. See Context.
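For example, a screenshot-heavy setup might lower the cap below the default. The `800` here is an illustrative value, not a recommendation from the docs.

```yaml
agents:
  defaults:
    imageMaxDimensionPx: 800  # downscale harder than the 1200 default to cut vision-token usage
```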

    How to see current token usage

    Use these in chat:

    • `/status` → emoji-rich status card with the session model, context usage, last response input/output tokens, and estimated cost (API key only).
    • `/usage off|tokens|full` → appends a per-response usage footer to every reply.
      • Persists per session (stored as `responseUsage`).
      • OAuth auth hides cost (tokens only).
    • `/usage cost` → shows a local cost summary from OpenClaw session logs.

    Other surfaces:

    • TUI/Web TUI: `/status` + `/usage` are supported.
    • CLI: `openclaw status --usage` and `openclaw channels list` show normalized provider quota windows (`X% left`, not per-response costs). Current usage-window providers: Anthropic, GitHub Copilot, Gemini CLI, OpenAI Codex, MiniMax, Xiaomi, and z.ai.

    Usage surfaces normalize common provider-native field aliases before display. For OpenAI-family Responses traffic, that includes both `input_tokens`/`output_tokens` and `prompt_tokens`/`completion_tokens`, so transport-specific field names do not change `/status`, `/usage`, or session summaries. Gemini CLI JSON usage is normalized too: reply text comes from `response`, and `stats.cached` maps to `cacheRead`, with `stats.input_tokens - stats.cached` used when the CLI omits an explicit `stats.input` field. For native OpenAI-family Responses traffic, WebSocket/SSE usage aliases are normalized the same way, and totals fall back to normalized input + output when `total_tokens` is missing or `0`.

    When the current session snapshot is sparse, `/status` and `session_status` can also recover token/cache counters and the active runtime model label from the most recent transcript usage log. Existing nonzero live values still take precedence over transcript fallback values, and larger prompt-oriented transcript totals can win when stored totals are missing or smaller. Usage auth for provider quota windows comes from provider-specific hooks when available; otherwise OpenClaw falls back to matching OAuth/API-key credentials from auth profiles, env, or config. Assistant transcript entries persist the same normalized usage shape, including `usage.cost` when the active model has pricing configured and the provider returns usage metadata. This gives `/usage cost` and transcript-backed session status a stable source even after the live runtime state is gone.
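The alias handling described above can be sketched as follows. The field names are the ones the text lists; the function itself is an illustration, not OpenClaw's implementation.

```python
def normalize_usage(raw: dict) -> dict:
    """Collapse provider-native usage aliases into one shape (illustrative sketch)."""
    input_tokens = raw.get("input_tokens", raw.get("prompt_tokens", 0))
    output_tokens = raw.get("output_tokens", raw.get("completion_tokens", 0))
    # Fall back to input + output when total_tokens is missing or 0.
    total = raw.get("total_tokens") or (input_tokens + output_tokens)
    return {"input": input_tokens, "output": output_tokens, "total": total}

print(normalize_usage({"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 0}))
# → {'input': 120, 'output': 30, 'total': 150}
```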

    OpenClaw keeps provider usage accounting separate from the current context snapshot. Provider `usage.total` can include cached input, output, and multiple tool-loop model calls, so it is useful for cost and telemetry but can overstate the live context window. Context displays and diagnostics use the latest prompt snapshot (`promptTokens`, or the last model call when no prompt snapshot is available) for `context.used`.
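The snapshot preference reduces to a simple rule. This is an illustrative sketch of the stated behavior, not OpenClaw code; the names are hypothetical.

```python
def context_used(prompt_tokens, last_call_input, usage_total):
    """Prefer the latest prompt snapshot for context.used; usage_total may
    include cached input and tool-loop calls, so it can overstate the window."""
    return prompt_tokens if prompt_tokens else last_call_input

assert context_used(9000, 8500, 25000) == 9000   # snapshot wins over inflated usage.total
assert context_used(None, 8500, 25000) == 8500   # fall back to the last model call
```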

    Cost estimation (when shown)

    Costs are estimated from your model pricing config:

    `models.providers.<provider>.models[].cost`

    These are USD per 1M tokens for `input`, `output`, `cacheRead`, and `cacheWrite`. If pricing is missing, OpenClaw shows tokens only. OAuth tokens never show dollar cost.

    Gateway startup also performs an optional background pricing bootstrap for configured model refs that do not already have local pricing. That bootstrap fetches remote OpenRouter and LiteLLM pricing catalogs. Set `models.pricing.enabled: false` to skip those startup catalog fetches on offline or restricted networks; explicit `models.providers.*.models[].cost` entries continue to drive local cost estimates.
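The arithmetic behind those estimates is straightforward. The `cost` keys mirror the config fields above; the prices in the example are hypothetical, and the function is an illustration rather than OpenClaw's pricing code.

```python
def estimate_cost(usage: dict, cost: dict) -> float:
    """USD cost from token counts and per-1M-token prices (illustrative)."""
    return sum(usage.get(k, 0) / 1_000_000 * cost.get(k, 0.0)
               for k in ("input", "output", "cacheRead", "cacheWrite"))

# e.g. a hypothetical $3/M input, $15/M output model:
print(round(estimate_cost({"input": 200_000, "output": 10_000},
                          {"input": 3.0, "output": 15.0}), 4))  # → 0.75
```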

    Cache TTL and pruning impact

    Provider prompt caching only applies within the cache TTL window. OpenClaw can optionally run cache-ttl pruning: it prunes the session once the cache TTL has expired, then resets the cache window so subsequent requests can re-use the freshly cached context instead of re-caching the full history. This keeps cache write costs lower when a session goes idle past the TTL.

    Configure it in Gateway configuration and see the behavior details in Session pruning.

    Heartbeat can keep the cache warm across idle gaps. If your model cache TTL is `1h`, setting the heartbeat interval just under that (e.g., `55m`) can avoid re-caching the full prompt, reducing cache write costs.

    In multi-agent setups, you can keep one shared model config and tune cache behavior per agent with `agents.list[].params.cacheRetention`.

    For a full knob-by-knob guide, see Prompt Caching.

    For Anthropic API pricing, cache reads are significantly cheaper than input tokens, while cache writes are billed at a higher multiplier. See Anthropic’s prompt caching pricing for the latest rates and TTL multipliers: https://docs.anthropic.com/docs/build-with-claude/prompt-caching

    Example: keep 1h cache warm with heartbeat

    ```yaml
    agents:
      defaults:
        model:
          primary: "anthropic/claude-opus-4-6"
        models:
          "anthropic/claude-opus-4-6":
            params:
              cacheRetention: "long"
        heartbeat:
          every: "55m"
    ```

    Example: mixed traffic with per-agent cache strategy

    ```yaml
    agents:
      defaults:
        model:
          primary: "anthropic/claude-opus-4-6"
        models:
          "anthropic/claude-opus-4-6":
            params:
              cacheRetention: "long" # default baseline for most agents
      list:
        - id: "research"
          default: true
          heartbeat:
            every: "55m" # keep long cache warm for deep sessions
        - id: "alerts"
          params:
            cacheRetention: "none" # avoid cache writes for bursty notifications
    ```

    `agents.list[].params` merges on top of the selected model's `params`, so you can override only `cacheRetention` and inherit other model defaults unchanged.

    Example: enable Anthropic 1M context beta header

    Anthropic's 1M context window is currently beta-gated. OpenClaw can inject the required `anthropic-beta` value when you enable `context1m` on supported Opus or Sonnet models.

    ```yaml
    agents:
      defaults:
        models:
          "anthropic/claude-opus-4-6":
            params:
              context1m: true
    ```

    This maps to Anthropic's `context-1m-2025-08-07` beta header, and it applies only when `context1m: true` is set on that model entry.

    Requirement: the credential must be eligible for long-context usage. If not, Anthropic responds with a provider-side rate limit error for that request.

    If you authenticate Anthropic with OAuth/subscription tokens (`sk-ant-oat-*`), OpenClaw skips the `context-1m-*` beta header because Anthropic currently rejects that combination with HTTP 401.
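The gating rules above can be sketched as a small decision function. This is an illustration of the stated behavior only; the function name and header-building shape are hypothetical, not OpenClaw internals.

```python
def anthropic_beta_headers(context1m_enabled: bool, credential: str) -> dict:
    """Attach the 1M-context beta header only when enabled on the model entry
    and the credential is not an OAuth/subscription token (sk-ant-oat-*)."""
    if context1m_enabled and not credential.startswith("sk-ant-oat-"):
        return {"anthropic-beta": "context-1m-2025-08-07"}
    return {}

assert anthropic_beta_headers(True, "sk-ant-api03-example") == {"anthropic-beta": "context-1m-2025-08-07"}
assert anthropic_beta_headers(True, "sk-ant-oat-example") == {}   # OAuth: header skipped
assert anthropic_beta_headers(False, "sk-ant-api03-example") == {}
```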

    Tips for reducing token pressure

    • Use `/compact` to summarize long sessions.
    • Trim large tool outputs in your workflows.
    • Lower `agents.defaults.imageMaxDimensionPx` for screenshot-heavy sessions.
    • Keep skill descriptions short (the skill list is injected into the prompt).
    • Prefer smaller models for verbose, exploratory work.

    See Skills for the exact skill list overhead formula.

    Related

    • API usage and costs
    • Prompt caching
    • Usage tracking

    © 2024 TaskFlow Mirror

    Powered by TaskFlow Sync Engine