docs/plan/codex-context-engine-harness.md
Draft implementation specification.
Make the bundled Codex app-server harness honor the same OpenClaw context-engine lifecycle contract that embedded PI turns already honor.
A session using agents.defaults.embeddedHarness.runtime: "codex" or a
codex/* model should still let the selected context-engine plugin, such as
lossless-claw, control context assembly, post-turn ingest, maintenance, and
OpenClaw-level compaction policy as far as the Codex app-server boundary allows.
The embedded run loop resolves the configured context engine once per run before selecting a concrete low-level harness:
- src/agents/pi-embedded-runner/run.ts
- resolveContextEngine(params.config)
- passes contextEngine and contextTokenBudget into runEmbeddedAttemptWithBackend(...)

runEmbeddedAttemptWithBackend(...) delegates to the selected agent harness:
- src/agents/pi-embedded-runner/run/backend.ts
- src/agents/harness/selection.ts

The Codex app-server harness is registered by the bundled Codex plugin:
- extensions/codex/index.ts
- extensions/codex/harness.ts

The Codex harness implementation receives the same EmbeddedRunAttemptParams as PI-backed attempts:

- extensions/codex/src/app-server/run-attempt.ts

That means the required hook point is in OpenClaw-controlled code. The external
boundary is the Codex app-server protocol itself: OpenClaw can control what it
sends to thread/start, thread/resume, and turn/start, and can observe
notifications, but it cannot change Codex's internal thread store or native
compactor.
Embedded PI attempts call the context-engine lifecycle directly:
Relevant PI code:
- src/agents/pi-embedded-runner/run/attempt.ts
- src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts
- src/agents/pi-embedded-runner/context-engine-maintenance.ts

Codex app-server attempts currently run generic agent-harness hooks and mirror
the transcript, but do not call params.contextEngine.bootstrap,
params.contextEngine.assemble, params.contextEngine.afterTurn,
params.contextEngine.ingestBatch, params.contextEngine.ingest, or
params.contextEngine.maintain.
Relevant Codex code:
- extensions/codex/src/app-server/run-attempt.ts
- extensions/codex/src/app-server/thread-lifecycle.ts
- extensions/codex/src/app-server/event-projector.ts
- extensions/codex/src/app-server/compact.ts

For Codex harness turns, OpenClaw should preserve this lifecycle:
- systemPromptAddition.
- afterTurn if implemented, otherwise ingestBatch/ingest, using the mirrored transcript snapshot.

Codex owns its native thread and any internal extended history. OpenClaw should not try to mutate the app-server's internal history except through supported protocol calls.
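The post-turn step of the lifecycle above (afterTurn if implemented, otherwise ingestBatch/ingest) can be sketched as a small planning function. This is a minimal sketch under stated assumptions: `AgentMessageLike`, `ContextEngineLike`, `FinalizePlan`, and `planPostTurnFinalize` are hypothetical stand-ins, not OpenClaw's real types or helpers, and passing only the messages added after `prePromptMessageCount` to the ingest fallbacks is assumed to match PI behavior rather than confirmed by this spec.

```typescript
// Simplified stand-ins for illustration; the real OpenClaw types carry more fields.
type AgentMessageLike = { role: string; content: string };

interface ContextEngineLike {
  afterTurn?: (params: {
    messages: AgentMessageLike[];
    prePromptMessageCount: number;
  }) => Promise<void>;
  ingestBatch?: (params: { messages: AgentMessageLike[] }) => Promise<void>;
  ingest?: (params: { message: AgentMessageLike }) => Promise<void>;
}

type FinalizePlan =
  | { kind: "afterTurn"; messages: AgentMessageLike[] }
  | { kind: "ingestBatch"; messages: AgentMessageLike[] }
  | { kind: "ingest"; messages: AgentMessageLike[] }
  | { kind: "none" };

// Decide which post-turn hook to use. afterTurn receives the whole snapshot;
// the ingest fallbacks receive only the messages added since the prompt was
// built (an assumption in this sketch).
function planPostTurnFinalize(
  engine: ContextEngineLike,
  snapshot: AgentMessageLike[],
  prePromptMessageCount: number,
): FinalizePlan {
  if (engine.afterTurn) return { kind: "afterTurn", messages: snapshot };
  const newMessages = snapshot.slice(prePromptMessageCount);
  if (newMessages.length === 0) return { kind: "none" };
  if (engine.ingestBatch) return { kind: "ingestBatch", messages: newMessages };
  if (engine.ingest) return { kind: "ingest", messages: newMessages };
  return { kind: "none" };
}
```

Keeping the decision separate from the async engine calls makes the fallback order easy to unit-test without a real engine.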
OpenClaw's transcript mirror remains the source for OpenClaw features:
- /new and /reset bookkeeping

The context-engine interface returns OpenClaw AgentMessage[], not a Codex
thread patch. Codex app-server turn/start accepts a current user input, while
thread/start and thread/resume accept developer instructions.
Therefore the implementation needs a projection layer. The safe first version should avoid pretending it can replace Codex internal history. It should inject assembled context as deterministic prompt/developer-instruction material around the current turn.
For engines like lossless-claw, the assembled context should be deterministic for unchanged inputs. Do not add timestamps, random ids, or nondeterministic ordering to generated context text.
Harness selection remains as-is:
- runtime: "pi" forces PI
- runtime: "codex" selects the registered Codex harness
- runtime: "auto" lets plugin harnesses claim supported providers
- auto runs use PI

This work changes what happens after the Codex harness is selected.
Today the reusable lifecycle helpers live under the PI runner:
- src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts
- src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts
- src/agents/pi-embedded-runner/context-engine-maintenance.ts

Codex should not import from an implementation path whose name implies PI if we can avoid it.
Create a harness-neutral module, for example:
- src/agents/harness/context-engine-lifecycle.ts

Move or re-export:

- runAttemptContextEngineBootstrap
- assembleAttemptContextEngine
- finalizeAttemptContextEngineTurn
- buildAfterTurnRuntimeContext
- buildAfterTurnRuntimeContextFromUsage
- runContextEngineMaintenance

Keep PI imports working either by re-exporting from the old files or updating PI call sites in the same PR.
The neutral helper names should not mention PI.
Suggested names:
- bootstrapHarnessContextEngine
- assembleHarnessContextEngine
- finalizeHarnessContextEngineTurn
- buildHarnessContextEngineRuntimeContext
- runHarnessContextEngineMaintenance

Add a new module:

- extensions/codex/src/app-server/context-engine-projection.ts

Responsibilities:

- AgentMessage[], original mirrored history, and current prompt.

Proposed API:
    export type CodexContextProjection = {
      developerInstructionAddition?: string;
      promptText: string;
      assembledMessages: AgentMessage[];
      prePromptMessageCount: number;
    };

    export function projectContextEngineAssemblyForCodex(params: {
      assembledMessages: AgentMessage[];
      originalHistoryMessages: AgentMessage[];
      prompt: string;
      systemPromptAddition?: string;
    }): CodexContextProjection;
Recommended first projection:
- systemPromptAddition into developer instructions.
- promptText.

Example prompt shape:
    OpenClaw assembled context for this turn:
    <conversation_context>
    [user]
    ...
    [assistant]
    ...
    </conversation_context>
    Current user request:
    ...
This is less elegant than native Codex history surgery, but it is implementable inside OpenClaw and preserves context-engine semantics.
Future improvement: if Codex app-server exposes a protocol for replacing or supplementing thread history, swap this projection layer to use that API.
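A minimal sketch of the text projection described above, assuming messages can be reduced to a role plus plain text. `AgentMessageLike` is a hypothetical simplification of OpenClaw's AgentMessage, and returning the raw prompt when nothing was assembled is an assumption of this sketch, not a decision made by this spec. A real implementation would also need to handle non-text content and message text that collides with the delimiters.

```typescript
// Hypothetical simplified message shape; the real AgentMessage is richer.
type AgentMessageLike = { role: string; content: string };

type CodexContextProjection = {
  developerInstructionAddition?: string;
  promptText: string;
  assembledMessages: AgentMessageLike[];
  prePromptMessageCount: number;
};

function projectContextEngineAssemblyForCodex(params: {
  assembledMessages: AgentMessageLike[];
  originalHistoryMessages: AgentMessageLike[];
  prompt: string;
  systemPromptAddition?: string;
}): CodexContextProjection {
  // One "[role]" header followed by the message text, in input order.
  const sections = params.assembledMessages.map(
    (message) => `[${message.role}]\n${message.content}`,
  );
  // Assumption: with no assembled messages, send the prompt unchanged.
  const promptText =
    sections.length === 0
      ? params.prompt
      : [
          "OpenClaw assembled context for this turn:",
          "<conversation_context>",
          ...sections,
          "</conversation_context>",
          "Current user request:",
          params.prompt,
        ].join("\n");
  const addition = params.systemPromptAddition?.trim();
  return {
    // Omit the field entirely when there is nothing to add.
    ...(addition ? { developerInstructionAddition: addition } : {}),
    promptText,
    assembledMessages: params.assembledMessages,
    prePromptMessageCount: params.originalHistoryMessages.length,
  };
}
```

The function uses only fixed strings and input order, so identical inputs yield identical output text.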
In extensions/codex/src/app-server/run-attempt.ts:
- fs.stat(params.sessionFile) before mirroring writes.
- SessionManager or use a narrow session manager adapter if the helper requires it.
- params.contextEngine exists.

Pseudo-flow:
    const hadSessionFile = await fileExists(params.sessionFile);
    const sessionManager = SessionManager.open(params.sessionFile);
    const historyMessages = sessionManager.buildSessionContext().messages;

    await bootstrapHarnessContextEngine({
      hadSessionFile,
      contextEngine: params.contextEngine,
      sessionId: params.sessionId,
      sessionKey: sandboxSessionKey,
      sessionFile: params.sessionFile,
      sessionManager,
      runtimeContext: buildHarnessContextEngineRuntimeContext(...),
      runMaintenance: runHarnessContextEngineMaintenance,
      warn,
    });
Use the same sessionKey convention as the Codex tool bridge and transcript
mirror. Today Codex computes sandboxSessionKey from params.sessionKey or
params.sessionId; use that consistently unless there is a reason to preserve
raw params.sessionKey.
thread/start / thread/resume and turn/start

In runCodexAppServerAttempt:

- assemble(...) when params.contextEngine exists.
- turn/start

The existing hook call:
    resolveAgentHarnessBeforePromptBuildResult({
      prompt: params.prompt,
      developerInstructions: buildDeveloperInstructions(params),
      messages: historyMessages,
      ctx: hookContext,
    });
should become context-aware:
- buildDeveloperInstructions(params)
- before_prompt_build with the projected prompt/developer instructions

This order lets generic prompt hooks see the same prompt Codex will receive. If
we need strict PI parity, run context-engine assembly before hook composition,
because PI applies context-engine systemPromptAddition to the final system
prompt after its prompt pipeline. The important invariant is that both context
engine and hooks get a deterministic, documented order.
Recommended order for first implementation:
- buildDeveloperInstructions(params)
- assemble()
- systemPromptAddition to developer instructions
- resolveAgentHarnessBeforePromptBuildResult(...)
- startOrResumeThread(...)
- buildTurnStartParams(...)

The spec should be encoded in tests so future changes do not reorder it by accident.
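One way to pin the recommended order in a test is to route each step through an injected callback and record the call sequence. The sketch below is illustrative only: `PrepareSteps` and `prepareCodexTurnSketch` are hypothetical, the callback signatures are simplified and synchronous (the real steps are async), and appending systemPromptAddition after the base developer instructions is an assumption.

```typescript
// Hypothetical, simplified step signatures; the real helpers are async and
// take richer parameters.
type PromptParts = { prompt: string; developerInstructions: string };

type PrepareSteps = {
  buildDeveloperInstructions: () => string;
  assemble: () => { systemPromptAddition?: string; promptText: string };
  runBeforePromptBuildHook: (input: PromptParts) => PromptParts;
  startOrResumeThread: (developerInstructions: string) => string; // returns a thread id
  buildTurnStartParams: (
    threadId: string,
    prompt: string,
  ) => { threadId: string; prompt: string };
};

function prepareCodexTurnSketch(steps: PrepareSteps): { threadId: string; prompt: string } {
  const base = steps.buildDeveloperInstructions();
  const assembled = steps.assemble();
  // Assumption: the addition is appended after the base instructions.
  const developerInstructions = assembled.systemPromptAddition
    ? `${base}\n\n${assembled.systemPromptAddition}`
    : base;
  // Hooks run after projection, so they see what Codex will receive.
  const hooked = steps.runBeforePromptBuildHook({
    prompt: assembled.promptText,
    developerInstructions,
  });
  const threadId = steps.startOrResumeThread(hooked.developerInstructions);
  return steps.buildTurnStartParams(threadId, hooked.prompt);
}
```

A test can pass callbacks that push their own name onto an array and then assert the array contents, which fails loudly if a later refactor reorders the steps.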
The projection helper must produce byte-stable output for identical inputs:
Use fixed delimiters and explicit sections.
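The stability requirement can be checked mechanically. The sketch below assumes a hypothetical `renderStableSection` helper that sorts keys before rendering, plus a generic `isStableAcrossRuns` check; neither name comes from the OpenClaw codebase. Comparing JavaScript strings for equality is used here as a stand-in for comparing the encoded bytes.

```typescript
// Render key/value metadata with fixed delimiters and a sorted key order so
// that object insertion order cannot leak into the output.
function renderStableSection(tag: string, fields: Record<string, string>): string {
  const body = Object.keys(fields)
    .sort() // default sort compares UTF-16 code units, independent of locale
    .map((key) => `${key}: ${fields[key]}`)
    .join("\n");
  return `<${tag}>\n${body}\n</${tag}>`;
}

// Run a projection several times on the same input and report whether every
// run produced exactly the same text.
function isStableAcrossRuns<T>(
  project: (input: T) => string,
  input: T,
  runs = 3,
): boolean {
  const first = project(input);
  for (let run = 1; run < runs; run++) {
    if (project(input) !== first) return false;
  }
  return true;
}
```

A projection that embeds a counter, timestamp, or random id fails this check on the second run, which is the class of regression the requirement is meant to catch.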
Codex's CodexAppServerEventProjector builds a local messagesSnapshot for the
current turn. mirrorTranscriptBestEffort(...) writes that snapshot into the
OpenClaw transcript mirror.
After mirroring succeeds or fails, call the context-engine finalizer with the best available message snapshot:
- afterTurn expects the session snapshot, not only the current turn.
- historyMessages + result.messagesSnapshot if the session file cannot be reopened.

Pseudo-flow:
    const prePromptMessageCount = historyMessages.length;
    await mirrorTranscriptBestEffort(...);
    const finalMessages = readMirroredSessionHistoryMessages(params.sessionFile)
      ?? [...historyMessages, ...result.messagesSnapshot];

    await finalizeHarnessContextEngineTurn({
      contextEngine: params.contextEngine,
      promptError: Boolean(finalPromptError),
      aborted: finalAborted,
      yieldAborted,
      sessionIdUsed: params.sessionId,
      sessionKey: sandboxSessionKey,
      sessionFile: params.sessionFile,
      messagesSnapshot: finalMessages,
      prePromptMessageCount,
      tokenBudget: params.contextTokenBudget,
      runtimeContext: buildHarnessContextEngineRuntimeContextFromUsage({
        attempt: params,
        workspaceDir: effectiveWorkspace,
        agentDir,
        tokenBudget: params.contextTokenBudget,
        lastCallUsage: result.attemptUsage,
        promptCache: result.promptCache,
      }),
      runMaintenance: runHarnessContextEngineMaintenance,
      sessionManager,
      warn,
    });
If mirroring fails, still call afterTurn with the fallback snapshot, but log
that the context engine is ingesting from fallback turn data.
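The snapshot choice and the fallback log can be sketched as one small function. This assumes the mirrored history is read elsewhere and arrives as `undefined` when the session file cannot be reopened; `selectFinalSnapshot`, `AgentMessageLike`, and the warning text are hypothetical names for illustration.

```typescript
// Hypothetical simplified message shape.
type AgentMessageLike = { role: string; content: string };

type FinalSnapshot = {
  messages: AgentMessageLike[];
  source: "mirror" | "fallback";
};

function selectFinalSnapshot(params: {
  // undefined when the mirrored session file could not be reopened
  mirroredMessages: AgentMessageLike[] | undefined;
  historyMessages: AgentMessageLike[];
  turnMessages: AgentMessageLike[];
  warn: (message: string) => void;
}): FinalSnapshot {
  if (params.mirroredMessages) {
    return { messages: params.mirroredMessages, source: "mirror" };
  }
  // Make the degraded path visible without logging transcript contents.
  params.warn("codex context engine finalize: ingesting from fallback turn data");
  return {
    messages: [...params.historyMessages, ...params.turnMessages],
    source: "fallback",
  };
}
```

Returning the source alongside the messages lets the caller attach it to structured logs or diagnostics.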
Codex results include normalized usage from app-server token notifications when available. Pass that usage into the context-engine runtime context.
If Codex app-server eventually exposes cache read/write details, map them into
ContextEnginePromptCacheInfo. Until then, omit promptCache rather than
inventing zeros.
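A minimal sketch of the "omit rather than invent" rule, assuming simplified shapes. `UsageLike` and `PromptCacheLike` are hypothetical stand-ins; the field names are illustrative and are not the real fields of ContextEnginePromptCacheInfo.

```typescript
// Hypothetical stand-ins; field names are illustrative only.
type UsageLike = { inputTokens: number; outputTokens: number };
type PromptCacheLike = { readTokens: number; writeTokens: number };

type RuntimeContextSketch = {
  tokenBudget?: number;
  lastCallUsage?: UsageLike;
  promptCache?: PromptCacheLike;
};

function buildRuntimeContextSketch(params: {
  tokenBudget?: number;
  lastCallUsage?: UsageLike;
  promptCache?: PromptCacheLike;
}): RuntimeContextSketch {
  const context: RuntimeContextSketch = {};
  if (params.tokenBudget !== undefined) context.tokenBudget = params.tokenBudget;
  if (params.lastCallUsage) context.lastCallUsage = params.lastCallUsage;
  // Attach cache info only when the backend actually reported it, so the
  // engine can tell "no data" apart from "zero cache activity".
  if (params.promptCache) context.promptCache = params.promptCache;
  return context;
}
```

An engine that checks for the presence of the field can then skip cache-based heuristics for Codex turns instead of misreading zeros as a cold cache.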
There are two compaction systems:
- compact()
- thread/compact/start

Do not silently conflate them.

/compact and explicit OpenClaw compaction

When the selected context engine has info.ownsCompaction === true, explicit
OpenClaw compaction should prefer the context engine's compact() result for
the OpenClaw transcript mirror and plugin state.
When the selected Codex harness has a native thread binding, we may additionally request Codex native compaction to keep the app-server thread healthy, but this must be reported as a separate backend action in details.
Recommended behavior:
- contextEngine.info.ownsCompaction === true:
  - compact() first
  - details.codexNativeCompaction

This likely requires changing extensions/codex/src/app-server/compact.ts or
wrapping it from the generic compaction path, depending on where
maybeCompactAgentHarnessSession(...) is invoked.
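To illustrate keeping the two compaction systems separate in the reported result, here is a sketch under stated assumptions: `CompactOutcomeLike`, `ExplicitCompactionReport`, and `reportExplicitCompaction` are hypothetical names, the outcome shape is simplified, and only the `details.codexNativeCompaction` key comes from this document.

```typescript
// Hypothetical simplified outcome shape for either compaction system.
type CompactOutcomeLike = { ok: boolean; compacted: boolean; reason?: string };

type ExplicitCompactionReport = {
  primary: "context-engine" | "codex-native";
  result: CompactOutcomeLike;
  details: { codexNativeCompaction?: CompactOutcomeLike };
};

function reportExplicitCompaction(params: {
  ownsCompaction: boolean;
  engineResult?: CompactOutcomeLike;
  nativeResult?: CompactOutcomeLike;
}): ExplicitCompactionReport | undefined {
  if (params.ownsCompaction && params.engineResult) {
    // The engine's result is primary; native compaction is reported
    // separately and never overwrites it.
    return {
      primary: "context-engine",
      result: params.engineResult,
      details: params.nativeResult
        ? { codexNativeCompaction: params.nativeResult }
        : {},
    };
  }
  if (params.nativeResult) {
    // The engine does not own compaction: native compaction is primary.
    return { primary: "codex-native", result: params.nativeResult, details: {} };
  }
  return undefined;
}
```

With this shape, a failed native compaction shows up under details without turning a successful context-engine compaction into a reported failure.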
Codex may emit contextCompaction item events during a turn. Keep the current
before/after compaction hook emission in event-projector.ts, but do not treat
that as a completed context-engine compaction.
For engines that own compaction, emit an explicit diagnostic when Codex performs native compaction anyway:
- compaction stream is acceptable
- { backend: "codex-app-server", ownsCompaction: true }

This makes the split auditable.
The existing Codex harness reset(...) clears the Codex app-server binding from
the OpenClaw session file. Preserve that behavior.
Also ensure context-engine state cleanup continues to happen through existing OpenClaw session lifecycle paths. Do not add Codex-specific cleanup unless the context-engine lifecycle currently misses reset/delete events for all harnesses.
Follow PI semantics:
Codex-specific additions:
Add tests under extensions/codex/src/app-server:
- run-attempt.context-engine.test.ts
  - bootstrap when a session file exists.
  - assemble with mirrored messages, token budget, tool names, citations mode, model id, and prompt.
  - systemPromptAddition is included in developer instructions.
  - afterTurn after transcript mirroring.
  - afterTurn, Codex calls ingestBatch or per-message ingest.
- context-engine-projection.test.ts
- compact.context-engine.test.ts
- extensions/codex/src/app-server/run-attempt.test.ts if present, otherwise nearest Codex app-server run tests.
- extensions/codex/src/app-server/event-projector.test.ts only if compaction event details change.
- src/agents/harness/selection.test.ts should not need changes unless config behavior changes; it should remain stable.

Add or extend live Codex harness smoke tests:
- plugins.slots.contextEngine to a test engine
- agents.defaults.model to a codex/* model
- agents.defaults.embeddedHarness.runtime = "codex"

Avoid requiring lossless-claw in OpenClaw core tests. Use a small in-repo fake context engine plugin.
Add debug logs around Codex context-engine lifecycle calls:
- codex context engine bootstrap started/completed/failed
- codex context engine assemble applied
- codex context engine finalize completed/failed
- codex context engine maintenance skipped with reason
- codex native compaction completed alongside context-engine compaction

Avoid logging full prompts or transcript contents.
Add structured fields where useful:
- sessionId
- sessionKey redacted or omitted according to existing logging practice
- engineId
- threadId
- turnId
- assembledMessageCount
- estimatedTokens
- hasSystemPromptAddition

This should be backward-compatible:

- assemble fails, Codex should continue with the original prompt path.

Should assembled context be injected entirely into the user prompt, entirely into developer instructions, or split?
Recommendation: split. Put systemPromptAddition in developer instructions;
put assembled transcript context in the user prompt wrapper. This best matches
the current Codex protocol without mutating native thread history.
Should Codex native compaction be disabled when a context engine owns compaction?
Recommendation: no, not initially. Codex native compaction may still be necessary to keep the app-server thread alive. But it must be reported as native Codex compaction, not as context-engine compaction.
Should before_prompt_build run before or after context-engine assembly?
Recommendation: after context-engine projection for Codex, so generic harness hooks see the actual prompt/developer instructions Codex will receive. If PI parity requires the opposite, encode the chosen order in tests and document it here.
Can Codex app-server accept a future structured context/history override?
Unknown. If it can, replace the text projection layer with that protocol and keep the lifecycle calls unchanged.
- codex/* embedded harness turn invokes the selected context engine's assemble lifecycle.
- systemPromptAddition affects Codex developer instructions.
- afterTurn or ingest fallback.