docs/reference/transcript-hygiene.md
OpenClaw applies provider-specific fixes to transcripts before a run (building model context). Most of these are in-memory adjustments used to satisfy strict provider requirements. A separate session-file repair pass may also rewrite stored JSONL before the session is loaded, but only for malformed lines or persisted turns that are invalid durable records. Delivered assistant replies are preserved on disk; provider-specific assistant-prefill stripping happens only while constructing outbound payloads. When a repair occurs, the original file is backed up alongside the session file.
Scope includes:
If you need transcript storage details, see:
Runtime/system context can be added to the model prompt for a turn, but it is not end-user-authored content. OpenClaw keeps a separate transcript-facing prompt body for Gateway replies, queued followups, ACP, CLI, and embedded Pi runs. Stored visible user turns use that transcript body instead of the runtime-enriched prompt.
For legacy sessions that already persisted runtime wrappers, Gateway history surfaces apply a display projection before returning messages to WebChat, TUI, REST, or SSE clients.
All transcript hygiene is centralized in the embedded runner:
- src/agents/transcript-policy.ts
- sanitizeSessionHistory in src/agents/pi-embedded-runner/replay-history.ts

The policy uses provider, modelApi, and modelId to decide what to apply.
Separate from transcript hygiene, session files are repaired (if needed) before load:
- repairSessionFileIfNeeded in src/agents/session-file-repair.ts
- run/attempt.ts and compact.ts (embedded runner)

Image payloads are always sanitized to prevent provider-side rejection due to size limits (downscale/recompress oversized base64 images).
This also helps control image-driven token pressure for vision-capable models. Lower max dimensions generally reduce token usage; higher dimensions preserve detail.
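The downscaling step described above can be sketched as a pure dimension calculation. This is a minimal illustration, not OpenClaw's implementation: `Size` and `fitWithinMaxDimension` are hypothetical names, and the sketch assumes that the limit applies to the longer side and that aspect ratio is preserved. Only the 1200 px default comes from this page.

```typescript
// Hypothetical helper for illustration; not part of OpenClaw's API.
interface Size {
  width: number;
  height: number;
}

// Returns the dimensions an image would be scaled to so that its longer
// side does not exceed maxDimensionPx. Images already within the limit
// are returned unchanged (never upscaled).
function fitWithinMaxDimension(size: Size, maxDimensionPx: number = 1200): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxDimensionPx) {
    return { width: size.width, height: size.height };
  }
  const scale = maxDimensionPx / longest;
  return {
    // Clamp to at least 1 px so extreme aspect ratios stay valid.
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```

Under these assumptions, a 2400x1200 image maps to 1200x600, while an 800x600 image is left untouched. Lowering the limit shrinks the payload and the token cost at the expense of detail.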
Implementation:
- sanitizeSessionMessagesImages in src/agents/pi-embedded-helpers/images.ts
- sanitizeContentBlocksImages in src/agents/tool-images.ts
- agents.defaults.imageMaxDimensionPx (default: 1200).

Assistant tool-call blocks that are missing both input and arguments are dropped
before model context is built. This prevents provider rejections from partially
persisted tool calls (for example, after a rate limit failure).
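As a rough sketch of the rule above, the filter below drops assistant tool-call blocks that carry neither `input` nor `arguments`. The `ContentBlock` and `Message` shapes, the `dropEmptyToolCalls` name, and the block type names `"toolCall"` and `"tool_use"` are assumptions made for illustration; the real logic lives in sanitizeToolCallInputs and may differ.

```typescript
// Hypothetical shapes for illustration; the real transcript types may differ.
type ContentBlock = {
  type: string;
  input?: unknown;
  arguments?: unknown;
  [key: string]: unknown;
};

interface Message {
  role: string;
  content: ContentBlock[];
}

// Drops assistant tool-call blocks that have neither `input` nor `arguments`.
// The default tool-call type names are assumptions, not confirmed values.
function dropEmptyToolCalls(
  messages: Message[],
  toolCallTypes: Set<string> = new Set(["toolCall", "tool_use"]),
): Message[] {
  return messages.map((message) => {
    if (message.role !== "assistant") return message;
    const content = message.content.filter(
      (block) =>
        !toolCallTypes.has(block.type) ||
        block.input != null ||
        block.arguments != null,
    );
    // Reuse the original object when nothing was removed.
    return content.length === message.content.length
      ? message
      : { ...message, content };
  });
}
```

The sketch returns new message objects rather than mutating its input, which matches the in-memory nature of the adjustment described here: the stored transcript is not modified by this step.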
Implementation:
- sanitizeToolCallInputs in src/agents/session-transcript-repair.ts
- sanitizeSessionHistory in src/agents/pi-embedded-runner/replay-history.ts

When an agent sends a prompt into another session via sessions_send (including
agent-to-agent reply/announce steps), OpenClaw persists the created user turn with:
- message.provenance.kind = "inter_session"

OpenClaw also prepends a same-turn [Inter-session message ... isUser=false]
marker before the routed prompt text so the active model call can distinguish
foreign session output from external end-user instructions. This marker includes
the source session, channel, and tool when available. The transcript still uses
role: "user" for provider compatibility, but the visible text and provenance
metadata both mark the turn as inter-session data.
During context rebuild, OpenClaw applies the same marker to older persisted inter-session user turns that only have provenance metadata.
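The rebuild-time behavior can be sketched as an idempotent transform over user turns. This is an illustration under stated assumptions: the simplified `UserTurn` shape and the `applyInterSessionMarker` name are hypothetical, and the full marker string is passed in by the caller because its exact contents are not spelled out here. Only the `inter_session` provenance kind and the `[Inter-session message` prefix come from this page.

```typescript
// Hypothetical, simplified turn shape for illustration only.
interface UserTurn {
  role: "user";
  text: string;
  provenance?: { kind?: string };
}

// Prefix taken from the marker described on this page.
const MARKER_PREFIX = "[Inter-session message";

// Prepends `marker` to inter-session turns that do not already carry one.
// Turns without inter-session provenance are returned unchanged.
function applyInterSessionMarker(turn: UserTurn, marker: string): UserTurn {
  if (turn.provenance?.kind !== "inter_session") return turn;
  if (turn.text.startsWith(MARKER_PREFIX)) return turn; // already marked
  return { ...turn, text: `${marker}\n${turn.text}` };
}
```

Checking for the prefix before prepending means the transform can run on every context rebuild without stacking markers on turns that were already marked when they were persisted.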
OpenAI / OpenAI Codex
- rs_* state paired with assistant output items.
- prompt_cache_key.
- aborted outputs for missing tool calls.
- aborted to match Codex replay normalization.

OpenAI-compatible Gemma 4
Google (Generative AI / Gemini CLI / Antigravity)
Anthropic / Minimax (Anthropic-compatible)
Amazon Bedrock (Converse API)
content: [], so
persisted assistant turns with stopReason: "error" and empty content are also
repaired on disk before load.

Mistral (including model-id based detection)
OpenRouter Gemini
- thought_signature values (keep base64).

OpenRouter Anthropic
Everything else
Before the 2026.1.22 release, OpenClaw applied multiple layers of transcript hygiene:
- _/-).
- <final> tags from assistant text before persistence.

This complexity caused cross-provider regressions (notably openai-responses
call_id|fc_id pairing). The 2026.1.22 cleanup removed the extension, centralized
logic in the runner, and made OpenAI no-touch beyond image sanitization.