docs/reference/token-use.md
OpenClaw tracks tokens, not characters. Tokens are model-specific, but most OpenAI-style models average ~4 characters per token for English text.
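As a rough rule of thumb at ~4 characters per token, a file hitting the default 12,000-character bootstrap cap (see below) costs on the order of 12,000 / 4 ≈ 3,000 tokens of context.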
OpenClaw assembles its own system prompt on every run. It includes:

- A compact skills block (full skill instructions are loaded on demand when read). The block is bounded by `skills.limits.maxSkillsPromptChars`, with an optional per-agent override at `agents.list[].skillsLimits.maxSkillsPromptChars`.
- Bootstrap files (`AGENTS.md`, `SOUL.md`, `TOOLS.md`, `IDENTITY.md`, `USER.md`, `HEARTBEAT.md`, `BOOTSTRAP.md` when new, plus `MEMORY.md` when present). Lowercase root `memory.md` is not injected; it is legacy repair input for `openclaw doctor --fix` when paired with `MEMORY.md`. Large files are truncated by `agents.defaults.bootstrapMaxChars` (default: 12000), and total bootstrap injection is capped by `agents.defaults.bootstrapTotalMaxChars` (default: 60000).

`memory/*.md` daily files are not part of the normal bootstrap prompt; they remain on-demand via memory tools on ordinary turns, but reset/startup model runs can prepend a one-shot startup-context block with recent daily memory for that first turn. Bare chat `/new` and `/reset` commands are acknowledged without invoking the model. The startup prelude is controlled by `agents.defaults.startupContext`.

See the full breakdown in System Prompt.
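A minimal sketch of these caps in config (the bootstrap values are the documented defaults; the skills limits and the `research` agent id are illustrative):

```yaml
agents:
  defaults:
    bootstrapMaxChars: 12000       # per-file truncation cap (documented default)
    bootstrapTotalMaxChars: 60000  # total bootstrap injection cap (documented default)
  list:
    - id: "research"               # illustrative agent
      skillsLimits:
        maxSkillsPromptChars: 6000 # illustrative per-agent override
skills:
  limits:
    maxSkillsPromptChars: 8000     # illustrative global bound on the compact skills block
```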
Everything the model receives counts toward the context limit: the system prompt, conversation history, tool results, and image payloads all consume input tokens.
Some runtime-heavy surfaces have their own explicit caps:

- `agents.defaults.contextLimits.memoryGetMaxChars`
- `agents.defaults.contextLimits.memoryGetDefaultLines`
- `agents.defaults.contextLimits.toolResultMaxChars`
- `agents.defaults.contextLimits.postCompactionMaxChars`

Per-agent overrides live under `agents.list[].contextLimits`. These knobs are for bounded runtime excerpts and injected runtime-owned blocks. They are separate from bootstrap limits, startup-context limits, and skills prompt limits.
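For example (the key paths are as documented above; all values and the per-agent comments are illustrative, not documented defaults):

```yaml
agents:
  defaults:
    contextLimits:
      memoryGetMaxChars: 20000      # illustrative cap on memory-get excerpts
      memoryGetDefaultLines: 200    # illustrative default line count for memory gets
      toolResultMaxChars: 30000     # illustrative cap on injected tool results
      postCompactionMaxChars: 8000  # illustrative cap on the post-compaction block
  list:
    - id: "research"
      contextLimits:
        toolResultMaxChars: 50000   # illustrative per-agent override
```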
For images, OpenClaw downscales transcript/tool image payloads before provider calls. Use `agents.defaults.imageMaxDimensionPx` (default: 1200) to tune this.
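For example, to downscale more aggressively than the default (800 is an illustrative value):

```yaml
agents:
  defaults:
    imageMaxDimensionPx: 800  # default is 1200; smaller means fewer image tokens per payload
```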
For a practical breakdown (per injected file, tools, skills, and system prompt size), use `/context list` or `/context detail`. See Context.
Use these in chat:

- `/status` → emoji-rich status card with the session model, context usage, last response input/output tokens, and estimated cost (API key only).
- `/usage off|tokens|full` → appends a per-response usage footer to every reply (`responseUsage`).
- `/usage cost` → shows a local cost summary from OpenClaw session logs.

Other surfaces:

- `/status` + `/usage` are supported.
- `openclaw status --usage` and `openclaw channels list` show normalized provider quota windows (X% left, not per-response costs). Current usage-window providers: Anthropic, GitHub Copilot, Gemini CLI, OpenAI Codex, MiniMax, Xiaomi, and z.ai.

Usage surfaces normalize common provider-native field aliases before display. For OpenAI-family Responses traffic, that includes both `input_tokens`/`output_tokens` and `prompt_tokens`/`completion_tokens`, so transport-specific field names do not change `/status`, `/usage`, or session summaries.
Gemini CLI JSON usage is normalized too: reply text comes from `response`, and `stats.cached` maps to `cacheRead`, with `stats.input_tokens - stats.cached` used when the CLI omits an explicit `stats.input` field.

For native OpenAI-family Responses traffic, WebSocket/SSE usage aliases are normalized the same way, and totals fall back to normalized input + output when `total_tokens` is missing or 0.
When the current session snapshot is sparse, `/status` and `session_status` can
also recover token/cache counters and the active runtime model label from the
most recent transcript usage log. Existing nonzero live values still take
precedence over transcript fallback values, and larger prompt-oriented
transcript totals can win when stored totals are missing or smaller.
Usage auth for provider quota windows comes from provider-specific hooks when
available; otherwise OpenClaw falls back to matching OAuth/API-key credentials
from auth profiles, env, or config.
Assistant transcript entries persist the same normalized usage shape, including `usage.cost` when the active model has pricing configured and the provider returns usage metadata. This gives `/usage cost` and transcript-backed session status a stable source even after the live runtime state is gone.
OpenClaw keeps provider usage accounting separate from the current context snapshot. Provider `usage.total` can include cached input, output, and multiple tool-loop model calls, so it is useful for cost and telemetry but can overstate the live context window. Context displays and diagnostics use the latest prompt snapshot (`promptTokens`, or the last model call when no prompt snapshot is available) for `context.used`.
Costs are estimated from your model pricing config: `models.providers.<provider>.models[].cost`. These are USD per 1M tokens for `input`, `output`, `cacheRead`, and `cacheWrite`. If pricing is missing, OpenClaw shows tokens only. OAuth tokens never show dollar cost.
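A sketch of one pricing entry (the `id` field shape is assumed for illustration, and the rates are placeholders, not real prices):

```yaml
models:
  providers:
    anthropic:
      models:
        - id: "claude-opus-4-6"  # assumed model-entry shape for illustration
          cost:
            input: 15.0       # USD per 1M input tokens (placeholder)
            output: 75.0      # USD per 1M output tokens (placeholder)
            cacheRead: 1.5    # USD per 1M cache-read tokens (placeholder)
            cacheWrite: 18.75 # USD per 1M cache-write tokens (placeholder)
```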
After sidecars and channels reach the Gateway ready path, OpenClaw starts an optional background pricing bootstrap for configured model refs that do not already have local pricing. That bootstrap fetches the remote OpenRouter and LiteLLM pricing catalogs. Set `models.pricing.enabled: false` to skip those catalog fetches on offline or restricted networks; explicit `models.providers.*.models[].cost` entries continue to drive local cost estimates.
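For offline or restricted networks:

```yaml
models:
  pricing:
    enabled: false  # skip OpenRouter/LiteLLM catalog fetches; local cost entries still apply
```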
Provider prompt caching only applies within the cache TTL window. OpenClaw can optionally run cache-ttl pruning: it prunes the session once the cache TTL has expired, then resets the cache window so subsequent requests can re-use the freshly cached context instead of re-caching the full history. This keeps cache write costs lower when a session goes idle past the TTL.
Configure it in Gateway configuration and see the behavior details in Session pruning.
Heartbeat can keep the cache warm across idle gaps. If your model cache TTL
is 1h, setting the heartbeat interval just under that (e.g., 55m) can avoid
re-caching the full prompt, reducing cache write costs.
In multi-agent setups, you can keep one shared model config and tune cache behavior per agent with `agents.list[].params.cacheRetention`.
For a full knob-by-knob guide, see Prompt Caching.
For Anthropic API pricing, cache reads are significantly cheaper than input tokens, while cache writes are billed at a higher multiplier. See Anthropic’s prompt caching pricing for the latest rates and TTL multipliers: https://docs.anthropic.com/docs/build-with-claude/prompt-caching
For example, combining these: long cache retention with a heartbeat just under a 1h cache TTL:

```yaml
agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-6"
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "long"
    heartbeat:
      every: "55m"
```
And a per-agent variant of the same setup:

```yaml
agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-6"
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "long"  # default baseline for most agents
  list:
    - id: "research"
      default: true
      heartbeat:
        every: "55m"  # keep long cache warm for deep sessions
    - id: "alerts"
      params:
        cacheRetention: "none"  # avoid cache writes for bursty notifications
```
`agents.list[].params` merges on top of the selected model's params, so you can override only `cacheRetention` and inherit other model defaults unchanged.
Anthropic's 1M context window is currently beta-gated. OpenClaw can inject the required `anthropic-beta` value when you enable `context1m` on supported Opus or Sonnet models.
```yaml
agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          context1m: true
```
This maps to Anthropic's `context-1m-2025-08-07` beta header and only applies when `context1m: true` is set on that model entry. Requirement: the credential must be eligible for long-context usage; if not, Anthropic responds with a provider-side rate-limit error for that request.
If you authenticate Anthropic with OAuth/subscription tokens (`sk-ant-oat-*`), OpenClaw skips the `context-1m-*` beta header because Anthropic currently rejects that combination with HTTP 401.
To keep token use down:

- `/compact` to summarize long sessions.
- Lower `agents.defaults.imageMaxDimensionPx` for screenshot-heavy sessions.

See Skills for the exact skill list overhead formula.