docs/concepts/message-lifecycle-refactor.md
This page is the target design for replacing scattered channel turn, reply dispatch, preview streaming, and outbound delivery helpers with one durable message lifecycle.
The short version:
begin, render, preview or stream, final send, commit, fail.

The current channel stack grew from several valid local needs:
- runtime.channel.turn.run.
- runtime.channel.turn.runPrepared.
- dispatchInboundReplyWithBase, recordInboundSessionAndDispatchReply, reply payload helpers, reply chunking, reply references, and outbound runtime helpers.

That shape fixes local bugs, but it leaves OpenClaw with too many public concepts and too many places where delivery semantics can drift.
The reliability issue that exposed this is:
Telegram polling update acked
-> assistant final text exists
-> process restarts before sendMessage succeeds
-> final response is lost
The target invariant is broader than Telegram: once core decides a visible outbound message should exist, the intent must be durable before the platform send is attempted, and the platform receipt must be committed after success. That gives OpenClaw at-least-once recovery. Exactly-once behavior exists only for adapters that can prove native idempotency or reconcile an unknown-after-send attempt against platform state before replay.
That is the end state for this refactor, not a description of every current path. During migration, existing outbound helpers can still fall through to a direct send when best-effort queue writes fail. The refactor is complete only when durable final sends fail closed or explicitly opt out with a documented non-durable policy.
- channel.turn callers during migration.
- runtime.channel.turn.* in the first phase.

Vercel Chat has a good public mental model:

- Chat
- Thread
- Channel
- Message
- postMessage, editMessage, deleteMessage, stream, startTyping, and history fetches

OpenClaw should borrow the vocabulary, not copy the surface.
What OpenClaw needs beyond that model:
thread.post() style promises are not enough for OpenClaw. They hide the
transaction boundary that decides whether a send is recoverable.
The new domain should live under an internal core namespace such as
src/channels/message/*.
It has four concepts:
core.messages.receive(...)
core.messages.send(...)
core.messages.live(...)
core.messages.state(...)
receive owns inbound lifecycle.
send owns outbound lifecycle.
live owns preview, edit, progress, and stream state.
state owns durable intent storage, receipts, idempotency, recovery, locks, and
dedupe.
A normalized message is platform-neutral:
type ChannelMessage = {
id: string;
channel: string;
accountId?: string;
direction: "inbound" | "outbound";
target: MessageTarget;
sender?: MessageActor;
body?: MessageBody;
attachments?: MessageAttachment[];
relation?: MessageRelation;
origin?: MessageOrigin;
timestamp?: number;
raw?: unknown;
};
The target describes where the message lives:
type MessageTarget = {
kind: "direct" | "group" | "channel" | "thread";
id: string;
label?: string;
spaceId?: string;
parentId?: string;
threadId?: string;
nativeChannelId?: string;
};
Reply is a relation, not an API root:
type MessageRelation =
| {
kind: "reply";
inboundMessageId?: string;
replyToId?: string;
threadId?: string;
quote?: MessageQuote;
}
| {
kind: "followup";
sessionKey?: string;
previousMessageId?: string;
}
| {
kind: "broadcast";
reason?: string;
}
| {
kind: "system";
reason:
| "approval"
| "task"
| "hook"
| "cron"
| "subagent"
| "message_tool"
| "cli"
| "control_ui"
| "automation"
| "error";
};
This lets the same send path handle normal replies, cron notifications, approval prompts, task completions, message-tool sends, CLI or Control UI sends, subagent results, and automation sends.
Origin describes who produced a message and how OpenClaw should treat echoes of that message. It is separate from relation: a message can be a reply to a user and still be OpenClaw-originated operational output.
type MessageOrigin =
| {
source: "openclaw";
schemaVersion: 1;
kind: "gateway_failure";
code: "agent_failed_before_reply" | "missing_api_key" | "model_login_expired";
echoPolicy: "drop_bot_room_echo";
}
| {
source: "user" | "external_bot" | "platform" | "unknown";
};
Core owns the meaning of OpenClaw-originated output. Channels own how that origin is encoded into their transport.
The first required use is gateway failure output. Humans should still see
messages such as "Agent failed before reply" or "Missing API key", but tagged
OpenClaw operational output must not be accepted as bot-authored input in shared
rooms when allowBots is enabled.
Receipts are first-class:
type MessageReceipt = {
primaryPlatformMessageId?: string;
platformMessageIds: string[];
parts: MessageReceiptPart[];
threadId?: string;
replyToId?: string;
editToken?: string;
deleteToken?: string;
url?: string;
sentAt: number;
raw?: unknown;
};
type MessageReceiptPart = {
platformMessageId: string;
kind: "text" | "media" | "voice" | "card" | "preview" | "unknown";
index: number;
threadId?: string;
replyToId?: string;
editToken?: string;
deleteToken?: string;
url?: string;
raw?: unknown;
};
Receipts are the bridge from durable intent to future edit, delete, preview finalization, duplicate suppression, and recovery.
A receipt can describe one platform message or a multi-part delivery. Chunked text, media plus text, voice plus text, and card fallbacks must preserve all platform ids while still exposing a primary id for threading and later edits.
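
As an illustration of the multi-part case, the sketch below folds per-part results into one receipt. The `buildReceipt` helper is hypothetical, the types are trimmed copies of the ones above, and choosing the lowest-index part as the primary id is an assumption rather than a rule stated in this design.

```typescript
// Trimmed copies of the receipt types above; only the fields used here.
type MessageReceiptPart = {
  platformMessageId: string;
  kind: "text" | "media" | "voice" | "card" | "preview" | "unknown";
  index: number;
};

type MessageReceipt = {
  primaryPlatformMessageId?: string;
  platformMessageIds: string[];
  parts: MessageReceiptPart[];
  sentAt: number;
};

// Hypothetical helper: keep every platform id and expose one primary id.
// Assumption: the lowest-index part anchors threading and later edits.
function buildReceipt(parts: MessageReceiptPart[], sentAt: number): MessageReceipt {
  const ordered = [...parts].sort((a, b) => a.index - b.index);
  return {
    primaryPlatformMessageId: ordered[0]?.platformMessageId,
    platformMessageIds: ordered.map((part) => part.platformMessageId),
    parts: ordered,
    sentAt,
  };
}

const receipt = buildReceipt(
  [
    { platformMessageId: "m-2", kind: "text", index: 1 },
    { platformMessageId: "m-1", kind: "media", index: 0 },
  ],
  1_700_000_000_000,
);
console.log(receipt.primaryPlatformMessageId, receipt.platformMessageIds);
```
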
Receiving should not be a bare helper call. The core needs a context that knows dedupe, routing, session recording, and platform ack policy.
type MessageReceiveContext = {
id: string;
channel: string;
accountId?: string;
input: ChannelMessage;
ack: ReceiveAckController;
route: MessageRouteController;
session: MessageSessionController;
log: MessageLifecycleLogger;
dedupe(): Promise<ReceiveDedupeResult>;
resolve(): Promise<ResolvedInboundMessage>;
record(resolved: ResolvedInboundMessage): Promise<RecordResult>;
dispatch(recorded: RecordResult): Promise<DispatchResult>;
commit(result: DispatchResult): Promise<void>;
fail(error: unknown): Promise<void>;
};
Receive flow:
platform event
-> begin receive context
-> normalize
-> classify
-> dedupe and self-echo gate
-> route and authorize
-> record inbound session metadata
-> dispatch agent run
-> durable outbound sends happen through send context
-> commit receive
-> ack platform when policy allows
Ack is not one thing. The receive contract must keep these signals separate:
ReceiveAckPolicy controls transport or polling acknowledgement only. It must
not be reused for read receipts or status reactions.
Before bot authorization, receive must apply the shared OpenClaw echo policy when the channel can decode message origin metadata:
function shouldDropOpenClawEcho(params: {
origin?: MessageOrigin;
isBotAuthor: boolean;
isRoomish: boolean;
}): boolean {
return (
params.isBotAuthor &&
params.isRoomish &&
params.origin?.source === "openclaw" &&
params.origin.kind === "gateway_failure" &&
params.origin.echoPolicy === "drop_bot_room_echo"
);
}
This drop is tag-based, not text-based. A bot-authored room message with the
same visible gateway-failure text but without OpenClaw origin metadata still
goes through normal allowBots authorization.
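
A short usage sketch of the tag-based rule follows. The predicate is repeated (with a local variable for narrowing) so the block runs on its own; the sample values are illustrative only.

```typescript
type MessageOrigin =
  | {
      source: "openclaw";
      schemaVersion: 1;
      kind: "gateway_failure";
      code: "agent_failed_before_reply" | "missing_api_key" | "model_login_expired";
      echoPolicy: "drop_bot_room_echo";
    }
  | { source: "user" | "external_bot" | "platform" | "unknown" };

// Same logic as the predicate above, restated so this block is self-contained.
function shouldDropOpenClawEcho(params: {
  origin?: MessageOrigin;
  isBotAuthor: boolean;
  isRoomish: boolean;
}): boolean {
  const tagged = params.origin;
  return (
    params.isBotAuthor &&
    params.isRoomish &&
    tagged !== undefined &&
    tagged.source === "openclaw" &&
    tagged.kind === "gateway_failure" &&
    tagged.echoPolicy === "drop_bot_room_echo"
  );
}

const gatewayFailure: MessageOrigin = {
  source: "openclaw",
  schemaVersion: 1,
  kind: "gateway_failure",
  code: "missing_api_key",
  echoPolicy: "drop_bot_room_echo",
};

// Tagged bot echo in a shared room: dropped before bot authorization.
const taggedRoomEcho = shouldDropOpenClawEcho({
  origin: gatewayFailure,
  isBotAuthor: true,
  isRoomish: true,
});
// Same visible text but no origin metadata: not dropped here.
const untaggedRoomEcho = shouldDropOpenClawEcho({ isBotAuthor: true, isRoomish: true });
// Tagged message in a direct conversation: not dropped by this rule.
const taggedDirect = shouldDropOpenClawEcho({
  origin: gatewayFailure,
  isBotAuthor: true,
  isRoomish: false,
});
console.log({ taggedRoomEcho, untaggedRoomEcho, taggedDirect });
```
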
Ack policy is explicit:
type ReceiveAckPolicy =
| { kind: "immediate"; reason: "webhook-timeout" | "platform-contract" }
| { kind: "after-record" }
| { kind: "after-durable-send" }
| { kind: "manual" };
Telegram polling now uses the receive-context ack policy for its persisted
restart watermark. The tracker still observes grammY updates as they enter the
middleware chain, but OpenClaw persists only the safe completed update id after
successful dispatch, leaving failed or lower pending updates replayable after a
restart. Telegram's upstream getUpdates fetch offset is still controlled by
the polling library, so the remaining deeper cut is a fully durable polling
source if we need platform-level redelivery beyond OpenClaw's restart
watermark. Webhook platforms may need immediate HTTP ack, but they still need
inbound dedupe and durable outbound send intents because webhooks can redeliver.
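
One way to read the policy is as a gate over receive lifecycle stages. The sketch below assumes three hypothetical stage names (`received`, `recorded`, `durable-send-committed`) and a hypothetical `canAckAt` helper; neither is part of the design above.

```typescript
type ReceiveAckPolicy =
  | { kind: "immediate"; reason: "webhook-timeout" | "platform-contract" }
  | { kind: "after-record" }
  | { kind: "after-durable-send" }
  | { kind: "manual" };

// Hypothetical stage names, ordered from earliest to latest.
type ReceiveStage = "received" | "recorded" | "durable-send-committed";

const stageOrder: Record<ReceiveStage, number> = {
  received: 0,
  recorded: 1,
  "durable-send-committed": 2,
};

// Returns true when core may acknowledge the transport at this stage.
// "manual" never acks automatically; the channel acks on its own.
function canAckAt(policy: ReceiveAckPolicy, stage: ReceiveStage): boolean {
  switch (policy.kind) {
    case "immediate":
      return true;
    case "after-record":
      return stageOrder[stage] >= stageOrder.recorded;
    case "after-durable-send":
      return stageOrder[stage] >= stageOrder["durable-send-committed"];
    case "manual":
      return false;
  }
}

console.log(canAckAt({ kind: "after-durable-send" }, "recorded"));
console.log(canAckAt({ kind: "after-durable-send" }, "durable-send-committed"));
```
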
Sending is also context based:
type MessageSendContext = {
id: string;
channel: string;
accountId?: string;
message: ChannelMessage;
intent: DurableSendIntent;
attempt: number;
signal: AbortSignal;
previousReceipt?: MessageReceipt;
preview?: LiveMessageState;
log: MessageLifecycleLogger;
render(): Promise<RenderedMessageBatch>;
previewUpdate(rendered: RenderedMessageBatch): Promise<LiveMessageState>;
send(rendered: RenderedMessageBatch): Promise<MessageReceipt>;
edit(receipt: MessageReceipt, rendered: RenderedMessageBatch): Promise<MessageReceipt>;
delete(receipt: MessageReceipt): Promise<void>;
commit(receipt: MessageReceipt): Promise<void>;
fail(error: unknown): Promise<void>;
};
Preferred orchestration:
await core.messages.withSendContext(message, async (ctx) => {
const rendered = await ctx.render();
if (ctx.preview?.canFinalizeInPlace) {
return await ctx.edit(ctx.preview.receipt, rendered);
}
return await ctx.send(rendered);
});
The helper expands to:
begin durable intent
-> render
-> optional preview/edit/stream work
-> mark sending
-> final platform send or final edit
-> mark committing with raw receipt
-> commit receipt
-> ack durable intent
-> fail durable intent on classified failure
The intent must exist before transport I/O. A restart after begin but before commit is recoverable.
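
The ordering above can be sketched with an in-memory store. Everything here is hypothetical scaffolding (`InMemoryIntentStore`, `sendWithIntent`, a synchronous `platformSend` callback); a real store would be persistent and the transport asynchronous. The point is only that the intent is written before the transport call and the receipt is recorded after it.

```typescript
type IntentStatus = "pending" | "sending" | "committing" | "sent" | "failed";
type Intent = { id: string; status: IntentStatus; receiptId?: string; error?: string };

// Stand-in for a durable store; records every status transition.
class InMemoryIntentStore {
  readonly transitions: string[] = [];
  private readonly intents = new Map<string, Intent>();

  put(intent: Intent): void {
    this.intents.set(intent.id, { ...intent });
    this.transitions.push(`${intent.id}:${intent.status}`);
  }

  get(id: string): Intent | undefined {
    return this.intents.get(id);
  }
}

function sendWithIntent(
  store: InMemoryIntentStore,
  id: string,
  platformSend: () => string,
): Intent {
  let intent: Intent = { id, status: "pending" };
  store.put(intent); // begin durable intent before any transport I/O
  intent = { ...intent, status: "sending" };
  store.put(intent);

  let receiptId: string;
  try {
    receiptId = platformSend();
  } catch (error) {
    // Classified failure before platform success: the intent is marked failed.
    intent = { ...intent, status: "failed", error: String(error) };
    store.put(intent);
    return intent;
  }

  // A crash between these two writes is the unknown-after-send window.
  intent = { ...intent, status: "committing", receiptId };
  store.put(intent);
  intent = { ...intent, status: "sent" };
  store.put(intent);
  return intent;
}

const store = new InMemoryIntentStore();
const delivered = sendWithIntent(store, "intent-1", () => "platform-msg-42");
const rejected = sendWithIntent(store, "intent-2", () => {
  throw new Error("network down");
});
console.log(store.transitions);
```
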
The dangerous boundary is after platform success and before receipt commit. If a
process dies there, OpenClaw cannot know whether the platform message exists
unless the adapter provides native idempotency or a receipt reconciliation path.
Those attempts must resume in unknown_after_send, not blindly replay. Channels
without reconciliation may choose at-least-once replay only if duplicate visible
messages are an acceptable, documented tradeoff for that channel and relation.
The current SDK reconciliation bridge requires the adapter to declare
reconcileUnknownSend, then asks durableFinal.reconcileUnknownSend to
classify an unknown entry as sent, not_sent, or unresolved; only not_sent
permits replay, and unresolved entries stay terminal or retry only the
reconciliation check.
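
The three-way classification can be read as a small decision table. The action names below are hypothetical labels for this sketch; the mapping itself follows the rule that only not_sent permits replay.

```typescript
type ReconcileOutcome = "sent" | "not_sent" | "unresolved";

// Hypothetical action labels for what recovery does next.
type RecoveryAction = "commit_receipt" | "replay_send" | "hold_for_reconcile";

function actionForUnknownSend(outcome: ReconcileOutcome): RecoveryAction {
  switch (outcome) {
    case "sent":
      // The platform message exists: commit its receipt, never resend.
      return "commit_receipt";
    case "not_sent":
      // The only outcome that permits replay.
      return "replay_send";
    case "unresolved":
      // Stay terminal, or retry only the reconciliation check.
      return "hold_for_reconcile";
  }
}

console.log(actionForUnknownSend("unresolved"));
```
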
Durability policy must be explicit:
type MessageDurabilityPolicy = "required" | "best_effort" | "disabled";
required means core must fail closed when it cannot write the durable intent.
best_effort can fall through when persistence is unavailable. disabled keeps
the old direct send behavior. During migration, legacy wrappers and public
compatibility helpers default to disabled; they must not infer required from
the fact that a channel has a generic outbound adapter.
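
A minimal sketch of how the three policies could differ when the intent write fails. `beginSend`, `BeginResult`, and the `persistIntent` callback are hypothetical names for illustration, not the actual core API.

```typescript
type MessageDurabilityPolicy = "required" | "best_effort" | "disabled";

type BeginResult = { kind: "durable"; intentId: string } | { kind: "direct" };

// persistIntent stands in for the durable queue write and may throw.
function beginSend(policy: MessageDurabilityPolicy, persistIntent: () => string): BeginResult {
  if (policy === "disabled") {
    return { kind: "direct" }; // old direct-send behavior, no queue write
  }
  try {
    return { kind: "durable", intentId: persistIntent() };
  } catch (error) {
    if (policy === "required") {
      throw error; // fail closed: no send without a durable intent
    }
    return { kind: "direct" }; // best_effort falls through to a direct send
  }
}

const failingPersist = (): string => {
  throw new Error("queue unavailable");
};

console.log(beginSend("best_effort", failingPersist));
console.log(beginSend("required", () => "intent-7"));
```
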
Send contexts also own channel-local post-send effects. A migration is not safe if durable delivery bypasses local behavior that was previously attached to the channel's direct send path. Examples include self-echo suppression caches, thread participation markers, native edit anchors, model-signature rendering, and platform-specific duplicate guards. Those effects must move into the send adapter, the render adapter, or a named send-context hook before that channel can enable durable generic final delivery.
Send helpers must return receipts all the way back to their caller. Durable
wrappers cannot swallow message ids or replace a channel delivery result with
undefined; buffered dispatchers use those ids for thread anchors, later edits,
preview finalization, and duplicate suppression.
Fallback sends operate on batches, not single payloads. Silent-reply rewrites, media fallback, card fallback, and chunk projection can all produce more than one deliverable message, so a send context must either deliver the whole projected batch or explicitly document why only one payload is valid.
type RenderedMessageBatch = {
units: RenderedMessageUnit[];
atomicity: "all_or_retry_remaining" | "best_effort_parts";
idempotencyKey: string;
};
type RenderedMessageUnit = {
index: number;
kind: "text" | "media" | "voice" | "card" | "preview" | "unknown";
payload: unknown;
required: boolean;
};
When such a fallback is durable, the whole projected batch must be represented by
one durable send intent or another atomic batch plan. Recording each payload
one-by-one is not enough: a crash between payloads can leave a partial visible
fallback with no durable record for the remaining payloads. Recovery must know
which units already have receipts and either replay only missing units or mark
the batch unknown_after_send until the adapter reconciles it.
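
Replaying only the missing units could look like the sketch below. It assumes that a receipt part's `index` matches the `index` of the rendered unit it delivered; that correspondence and the `remainingUnits` helper are assumptions for this example.

```typescript
type RenderedMessageUnit = {
  index: number;
  kind: "text" | "media" | "voice" | "card" | "preview" | "unknown";
  payload: unknown;
  required: boolean;
};

// Trimmed receipt part: only the fields needed to match units.
type DeliveredPart = { index: number; platformMessageId: string };

// Units that have no receipt part yet and still need to be sent.
function remainingUnits(
  units: RenderedMessageUnit[],
  deliveredParts: DeliveredPart[],
): RenderedMessageUnit[] {
  const deliveredIndexes = new Set(deliveredParts.map((part) => part.index));
  return units.filter((unit) => !deliveredIndexes.has(unit.index));
}

const batchUnits: RenderedMessageUnit[] = [
  { index: 0, kind: "media", payload: { url: "file-a" }, required: true },
  { index: 1, kind: "text", payload: "part one", required: true },
  { index: 2, kind: "text", payload: "part two", required: true },
];

// A crash after the first unit leaves one part in the partial receipt.
const toReplay = remainingUnits(batchUnits, [{ index: 0, platformMessageId: "m-10" }]);
console.log(toReplay.map((unit) => unit.index));
```
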
Preview, edit, progress, and stream behavior should be one opt-in lifecycle.
type MessageLiveAdapter = {
begin?(ctx: MessageSendContext): Promise<LiveMessageState>;
update?(
ctx: MessageSendContext,
state: LiveMessageState,
update: LiveMessageUpdate,
): Promise<LiveMessageState>;
finalize?(
ctx: MessageSendContext,
state: LiveMessageState,
final: RenderedMessageBatch,
): Promise<MessageReceipt>;
cancel?(
ctx: MessageSendContext,
state: LiveMessageState,
reason: LiveCancelReason,
): Promise<void>;
};
Live state is durable enough to recover or suppress duplicates:
type LiveMessageState = {
mode: "partial" | "block" | "progress" | "native";
receipt?: MessageReceipt;
visibleSince?: number;
canFinalizeInPlace: boolean;
lastRenderedHash?: string;
staleAfterMs?: number;
};
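
A sketch of how final delivery might choose between editing the preview in place and sending a fresh message. Treating a preview older than `staleAfterMs` as a reason to send fresh is an assumption for this example, and `chooseFinalization` is a hypothetical helper; the state type is a trimmed copy.

```typescript
type LiveMessageState = {
  mode: "partial" | "block" | "progress" | "native";
  receipt?: { primaryPlatformMessageId?: string };
  visibleSince?: number;
  canFinalizeInPlace: boolean;
  staleAfterMs?: number;
};

type FinalizationPath = "edit_in_place" | "fresh_send";

function chooseFinalization(state: LiveMessageState | undefined, now: number): FinalizationPath {
  // Without a preview receipt there is nothing to edit.
  if (!state || !state.canFinalizeInPlace || !state.receipt) {
    return "fresh_send";
  }
  // Assumption: a preview visible for longer than staleAfterMs is not edited.
  const isStale =
    state.staleAfterMs !== undefined &&
    state.visibleSince !== undefined &&
    now - state.visibleSince > state.staleAfterMs;
  return isStale ? "fresh_send" : "edit_in_place";
}

const previewState: LiveMessageState = {
  mode: "partial",
  receipt: { primaryPlatformMessageId: "p-1" },
  visibleSince: 0,
  canFinalizeInPlace: true,
  staleAfterMs: 500,
};

console.log(chooseFinalization(previewState, 100));
console.log(chooseFinalization(previewState, 1000));
```
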
This should cover current behavior:
The public SDK target should be one subpath:
import { defineChannelMessageAdapter } from "openclaw/plugin-sdk/channel-message";
Target shape:
type ChannelMessageAdapter = {
receive?: MessageReceiveAdapter;
send: MessageSendAdapter;
live?: MessageLiveAdapter;
origin?: MessageOriginAdapter;
render?: MessageRenderAdapter;
capabilities: MessageCapabilities;
};
Send adapter:
type MessageSendAdapter = {
send(ctx: MessageSendContext, rendered: RenderedMessageBatch): Promise<MessageReceipt>;
edit?(
ctx: MessageSendContext,
receipt: MessageReceipt,
rendered: RenderedMessageBatch,
): Promise<MessageReceipt>;
delete?(ctx: MessageSendContext, receipt: MessageReceipt): Promise<void>;
classifyError?(ctx: MessageSendContext, error: unknown): DeliveryFailureKind;
reconcileUnknownSend?(ctx: MessageSendContext): Promise<MessageReceipt | null>;
afterSendSuccess?(ctx: MessageSendContext, receipt: MessageReceipt): Promise<void>;
afterCommit?(ctx: MessageSendContext, receipt: MessageReceipt): Promise<void>;
};
Receive adapter:
type MessageReceiveAdapter<TRaw = unknown> = {
normalize(raw: TRaw, ctx: MessageNormalizeContext): Promise<ChannelMessage>;
classify?(message: ChannelMessage): Promise<MessageEventClass>;
preflight?(message: ChannelMessage, event: MessageEventClass): Promise<MessagePreflightResult>;
ackPolicy?(message: ChannelMessage, event: MessageEventClass): ReceiveAckPolicy;
};
Before preflight authorization, core must run the shared OpenClaw echo predicate
whenever origin.decode returns OpenClaw-origin metadata. The receive adapter
supplies platform facts such as bot author and room shape; core owns the drop
decision and ordering so channels do not reimplement text filters.
Origin adapter:
type MessageOriginAdapter<TRaw = unknown, TNative = unknown> = {
encode?(origin: MessageOrigin): TNative | undefined;
decode?(raw: TRaw): MessageOrigin | undefined;
};
Core sets MessageOrigin. Channels only translate it to and from native
transport metadata. Slack maps this to chat.postMessage({ metadata }) and
inbound message.metadata; Matrix can map it to extra event content; channels
without native metadata can use a receipt/outbound registry when that is the
best available approximation.
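
A minimal encode/decode pair, as a sketch only. The native shape is loosely modeled on an event-type plus payload pair, similar in spirit to Slack message metadata; the `openclaw_origin` event type string and the payload field layout are hypothetical and not taken from any channel implementation.

```typescript
type OpenClawOrigin = {
  source: "openclaw";
  schemaVersion: 1;
  kind: "gateway_failure";
  code: "agent_failed_before_reply" | "missing_api_key" | "model_login_expired";
  echoPolicy: "drop_bot_room_echo";
};
type MessageOrigin = OpenClawOrigin | { source: "user" | "external_bot" | "platform" | "unknown" };

// Hypothetical native metadata shape and event type name.
type NativeMetadata = { event_type: string; event_payload: Record<string, unknown> };
const ORIGIN_EVENT_TYPE = "openclaw_origin";
const KNOWN_CODES = ["agent_failed_before_reply", "missing_api_key", "model_login_expired"] as const;

function encodeOrigin(origin: MessageOrigin): NativeMetadata | undefined {
  if (origin.source !== "openclaw") {
    return undefined; // only OpenClaw-originated output is tagged
  }
  return {
    event_type: ORIGIN_EVENT_TYPE,
    event_payload: {
      schemaVersion: origin.schemaVersion,
      kind: origin.kind,
      code: origin.code,
      echoPolicy: origin.echoPolicy,
    },
  };
}

function decodeOrigin(metadata: NativeMetadata | undefined): MessageOrigin | undefined {
  if (!metadata || metadata.event_type !== ORIGIN_EVENT_TYPE) {
    return undefined;
  }
  const payload = metadata.event_payload;
  const code = KNOWN_CODES.find((known) => known === payload.code);
  // Reject anything that does not match the known schema exactly.
  if (
    payload.schemaVersion !== 1 ||
    payload.kind !== "gateway_failure" ||
    payload.echoPolicy !== "drop_bot_room_echo" ||
    code === undefined
  ) {
    return undefined;
  }
  return { source: "openclaw", schemaVersion: 1, kind: "gateway_failure", code, echoPolicy: "drop_bot_room_echo" };
}

const encoded = encodeOrigin({
  source: "openclaw",
  schemaVersion: 1,
  kind: "gateway_failure",
  code: "model_login_expired",
  echoPolicy: "drop_bot_room_echo",
});
const decoded = decodeOrigin(encoded);
console.log(decoded);
```
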
Capabilities:
type MessageCapabilities = {
text: { maxLength?: number; chunking?: boolean };
attachments?: {
upload: boolean;
remoteUrl: boolean;
voice?: boolean;
};
threads?: {
reply: boolean;
topic?: boolean;
nativeThread?: boolean;
};
live?: {
edit: boolean;
delete: boolean;
nativeStream?: boolean;
progress?: boolean;
};
delivery?: {
idempotencyKey?: boolean;
retryAfter?: boolean;
receiptRequired?: boolean;
};
};
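
As one example of how a declared capability could drive rendering, the sketch below projects text into chunks using `text.maxLength` and `text.chunking`. `projectText` is a hypothetical helper, and the naive fixed-width split (which counts UTF-16 code units and ignores word boundaries) is a simplification.

```typescript
type TextCapability = { maxLength?: number; chunking?: boolean };

function projectText(text: string, capability: TextCapability): string[] {
  const { maxLength, chunking } = capability;
  if (maxLength === undefined || text.length <= maxLength) {
    return [text]; // fits in one message or the channel declares no limit
  }
  if (maxLength <= 0) {
    throw new RangeError("maxLength must be positive");
  }
  if (!chunking) {
    throw new Error(`text exceeds maxLength ${maxLength} and chunking is not declared`);
  }
  // Naive split on code units; a real renderer would respect words,
  // markup, and surrogate pairs.
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += maxLength) {
    chunks.push(text.slice(start, start + maxLength));
  }
  return chunks;
}

const chunks = projectText("abcdefg", { maxLength: 3, chunking: true });
console.log(chunks);
```
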
The new public surface should absorb or deprecate these conceptual areas:
- reply-runtime
- reply-dispatch-runtime
- reply-reference
- reply-chunking
- reply-payload
- inbound-reply-dispatch
- channel-reply-pipeline
- outbound-runtime

Compatibility subpaths can remain as wrappers, but new third-party plugins should not need them.
Bundled plugins may keep internal helper imports through reserved runtime
subpaths while migrating. Public docs should steer plugin authors to
plugin-sdk/channel-message once it exists.
runtime.channel.turn.* should stay during migration.
It should become a compatibility adapter:
channel.turn.run
-> messages.receive context
-> session dispatch
-> messages.send context for visible output
channel.turn.runPrepared should also remain initially:
channel-owned dispatcher
-> messages.receive record/finalize bridge
-> messages.live for preview/progress
-> messages.send for final delivery
After all bundled plugins and known third-party compatibility paths are bridged,
channel.turn can be deprecated. It should not be removed until there is a
published SDK migration path and contract tests proving old plugins still work
or fail with a clear version error.
During migration, generic durable delivery is opt-in for any channel whose existing delivery callback has side effects beyond "send this payload".
Legacy entry points are non-durable by default:
- channel.turn.run and dispatchAssembledChannelTurn use the channel's delivery callback unless that channel explicitly supplies an audited durable policy/options object.
- channel.turn.runPrepared stays channel-owned until the prepared dispatcher explicitly calls the send context.
- recordInboundSessionAndDispatchReply, dispatchInboundReplyWithBase, and direct-DM helpers never inject generic durable delivery before the caller-provided deliver or reply callback.

For migration bridge types, durable: undefined means "not durable". The durable path is enabled only by an explicit policy/options value. durable: false can remain as a compatibility spelling, but implementation should not require every unmigrated channel to add it.
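
The rule can be sketched as a single check. The shape of the policy/options value (`{ policy: ... }`) is hypothetical; the only behavior taken from the text is that undefined and false both mean "not durable" and that an explicit value is needed to opt in.

```typescript
// Hypothetical bridge option shape; only the undefined/false/explicit
// distinction comes from the migration rule above.
type DurableOption = undefined | false | { policy: "required" | "best_effort" };

function isDurableEnabled(durable: DurableOption): boolean {
  // undefined (unmigrated channel) and false (compatibility spelling)
  // both keep the channel-owned delivery path.
  return typeof durable === "object" && durable !== null;
}

console.log(isDurableEnabled(undefined));
console.log(isDurableEnabled(false));
console.log(isDurableEnabled({ policy: "required" }));
```
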
Current bridge code must keep the durability decision explicit:
- handled_visible and handled_no_send are terminal; unsupported and not_applicable may fall back to channel-owned delivery; failed propagates the send failure.
- pendingFinalDelivery* session fields can carry the intent id during the transition; the end state is a MessageSendIntent store instead of frozen reply text plus ad hoc context fields.

Do not enable the generic durable path for a channel until all of these are true:
Concrete migration hazards to preserve:
- OriginatingTo or To and skip that callback.
- allowBots authorization. Channels must not implement this with visible-text prefix filters except as a short emergency stopgap; the durable contract is structured origin metadata.

The durable queue should store message send intents, not reply payloads.
type DurableSendIntent = {
id: string;
idempotencyKey: string;
channel: string;
accountId?: string;
message: ChannelMessage;
batch?: RenderedMessageBatch;
liveState?: LiveMessageState;
status:
| "pending"
| "sending"
| "committing"
| "unknown_after_send"
| "sent"
| "failed"
| "cancelled";
attempt: number;
nextAttemptAt?: number;
receipt?: MessageReceipt;
partialReceipt?: MessageReceipt;
failure?: DeliveryFailure;
createdAt: number;
updatedAt: number;
};
Recovery loop:
load pending or sending intents
-> acquire idempotency lock
-> skip if receipt already committed
-> reconstruct send context
-> render if needed
-> reconcile unknown_after_send if needed
-> call adapter send/edit/finalize
-> commit receipt, mark unknown_after_send, or schedule retry
The queue should keep enough identity to replay through the same account, thread, target, formatting policy, and media rules after restart.
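
The per-intent decision inside the recovery loop might be sketched as follows. `planRecovery`, the step names, and the `receiptCommitted` flag are hypothetical. Sending `committing` to reconciliation is an assumption drawn from the unknown-after-send rule, and treating an interrupted `sending` intent as replayable follows the loop as written; a stricter implementation might reconcile those as well.

```typescript
type IntentStatus =
  | "pending"
  | "sending"
  | "committing"
  | "unknown_after_send"
  | "sent"
  | "failed"
  | "cancelled";

// Trimmed view of a stored intent; receiptCommitted is a hypothetical flag.
type IntentRecord = {
  id: string;
  status: IntentStatus;
  nextAttemptAt?: number;
  receiptCommitted: boolean;
};

type RecoveryStep = "skip" | "wait" | "reconcile" | "send";

function planRecovery(intent: IntentRecord, now: number): RecoveryStep {
  if (intent.receiptCommitted) {
    return "skip"; // a committed receipt means the message already exists
  }
  const due = intent.nextAttemptAt === undefined || intent.nextAttemptAt <= now;
  switch (intent.status) {
    case "sent":
    case "failed":
    case "cancelled":
      return "skip"; // terminal states are not recovered
    case "committing":
    case "unknown_after_send":
      return due ? "reconcile" : "wait"; // never replay blindly
    case "pending":
    case "sending":
      return due ? "send" : "wait";
  }
}

const pendingIntent: IntentRecord = { id: "a", status: "pending", receiptCommitted: false };
const backoffIntent: IntentRecord = { id: "b", status: "pending", nextAttemptAt: 2000, receiptCommitted: false };
const unknownIntent: IntentRecord = { id: "c", status: "unknown_after_send", receiptCommitted: false };
console.log([pendingIntent, backoffIntent, unknownIntent].map((intent) => planRecovery(intent, 1000)));
```
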
Channel adapters classify transport failures into closed categories:
type DeliveryFailureKind =
| "transient"
| "rate_limit"
| "auth"
| "permission"
| "not_found"
| "invalid_payload"
| "conflict"
| "cancelled"
| "unknown";
Core policy:
- transient and rate_limit.
- invalid_payload unless a render fallback exists.
- auth or permission until configuration changes.
- not_found, let live finalization fall back from edit to fresh send when the channel declares that safe.
- conflict, use receipt/idempotency rules to decide whether the message already exists.
- unknown_after_send unless the adapter can prove the platform operation did not happen.

| Channel | Target migration |
|---|---|
| Telegram | Receive ack policy plus durable final sends. Live adapter owns send plus edit preview, stale preview final send, topics, quote-reply preview skip, media fallback, and retry-after handling. |
| Discord | Send adapter wraps existing durable payload delivery. Live adapter owns draft edit, progress draft, media/error preview cancel, reply target preservation, and message id receipts. Audit bot-authored gateway-failure echoes in shared rooms; use an outbound registry or other native equivalent if Discord cannot carry origin metadata on normal messages. |
| Slack | Send adapter handles normal chat posts. Live adapter chooses native stream when thread shape supports it, otherwise draft preview. Receipts preserve thread timestamps. Origin adapter maps OpenClaw gateway failures to Slack chat.postMessage.metadata and drops tagged bot-room echoes before allowBots authorization. |
| WhatsApp | Send adapter owns text/media send with durable final intents. Receive adapter handles group mention and sender identity. Live can stay absent until WhatsApp has an editable transport. |
| Matrix | Live adapter owns draft event edits, finalization, redaction, encrypted media constraints, and reply-target mismatch fallback. Receive adapter owns encrypted event hydration and dedupe. Origin adapter should encode OpenClaw gateway-failure origin into Matrix event content and drop configured-bot room echoes before allowBots handling. |
| Mattermost | Live adapter owns one draft post, progress/tool folding, finalization in place, and fresh-send fallback. |
| Microsoft Teams | Live adapter owns native progress and block stream behavior. Send adapter owns activities and attachment/card receipts. |
| Feishu | Render adapter owns text/card/raw rendering. Live adapter owns streaming cards and duplicate final suppression. Send adapter owns comments, topic sessions, media, and voice suppression. |
| QQ Bot | Live adapter owns C2C streaming, accumulator timeout, and fallback final send. Render adapter owns media tags and text-as-voice. |
| Signal | Simple receive plus send adapter. No live adapter unless signal-cli adds reliable edit support. |
| iMessage | Simple receive plus send adapter. iMessage send must preserve monitor echo-cache population before durable finals can bypass monitor delivery. |
| Google Chat | Simple receive plus send adapter with thread relation mapped to spaces and thread ids. Audit allowBots=true room behavior for tagged OpenClaw gateway-failure echoes. |
| LINE | Simple receive plus send adapter with reply-token constraints modeled as target/relation capability. |
| Nextcloud Talk | SDK receive bridge plus send adapter. |
| IRC | Simple receive plus send adapter, no durable edit receipts. |
| Nostr | Receive plus send adapter for encrypted DMs; receipts are event ids. |
| QA Channel | Contract-test adapter for receive, send, live, retry, and recovery behavior. |
| Synology Chat | Simple receive plus send adapter. |
| Tlon | Send adapter must preserve model-signature rendering and participated-thread tracking before generic durable final delivery is enabled. |
| Twitch | Simple receive plus send adapter with rate-limit classification. |
| Zalo | Simple receive plus send adapter. |
| Zalo Personal | Simple receive plus send adapter. |

- src/channels/message/* types for messages, targets, relations, origins, receipts, capabilities, durable intents, receive context, send context, live context, and failure classes.
- origin?: MessageOrigin to the migration bridge payload type used by current reply delivery, then move that field to ChannelMessage and rendered message types as the refactor replaces reply payloads.
- deliverOutboundPayloads call messages.send.
- channel.turn.run and dispatchAssembledChannelTurn on top of messages.receive and messages.send.
- durable: false as a compatibility escape hatch for paths that finalize native edits and cannot replay safely yet, but do not rely on false markers to protect unmigrated channels.
- deliverDurableInboundReplyPayload with a send-context bridge.
- recordInboundSessionAndDispatchReply, direct-DM helpers, and similar public compatibility helpers behavior-preserving. They may expose an explicit send-context opt-in later, but must not automatically attempt generic durable delivery before the caller-owned delivery callback.
- messages.live with two proof adapters:
- openclaw/plugin-sdk/channel-message.
- MessageOrigin, origin encode/decode hooks, and the shared shouldDropOpenClawEcho predicate in the channel-message SDK surface.

Move all non-reply outbound producers onto messages.send:

This is where the model stops being "agent replies" and becomes "OpenClaw sends messages".

- channel.turn as a wrapper for at least one compatibility window.

Unit tests:
- unknown_after_send recovery that reconciles before replay when an adapter supports reconciliation.
- shouldDropOpenClawEcho predicate.

Integration tests:

- channel.turn.run simple adapter still records and sends.
- channel.turn.runPrepared bridge still records and finalizes.

Channel tests:

- allowBots, and untagged bot messages with the same visible text still follow normal bot authorization.
- allowBots handling.
- allowBots modes before claiming generic protection there.

Validation:

- pnpm check:changed in Testbox for the full changed surface.
- pnpm check in Testbox before landing the complete refactor or after public SDK/export changes.
- plugin-sdk/channel-message ships.
- defineChannelMessageAdapter.
- messages.send.
- messages.receive or a documented compatibility wrapper.
- messages.live for draft state and finalization.
- channel.turn is only a wrapper.