HITL (Human-in-the-Loop) follows this pattern:

1. … `long_running_tool_ids`.
2. Each ancestor propagates the interrupt upward via `ctx.interrupt_ids`.
3. … `ctx._interrupt_ids` directly (no internal event needed).
4. … `FunctionResponse` message. The Runner scans session events to find the matching `invocation_id`, then reconstructs node state from persisted events.

Resumed nodes reuse the same `run_id` as the original execution. From the node's perspective, the execution never paused: events before and after the resume share the same `run_id`. Fresh dispatches (first run, loop re-trigger) get a new `run_id`.
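
A minimal sketch of the resume lookup and the `run_id` rule above, assuming a simplified event record. The `Event` dataclass and the `find_interrupted_event` and `assign_run_id` helpers are hypothetical, not the Runner's actual API, and matching a `FunctionResponse` to its event by the function-call ID in `long_running_tool_ids` is an assumption.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical, simplified view of one persisted session event."""

    invocation_id: str
    run_id: str
    node_name: str
    # IDs of long-running function calls this event left unanswered.
    long_running_tool_ids: set[str] = field(default_factory=set)


def find_interrupted_event(events: list[Event], function_call_id: str) -> Event | None:
    """Scan events newest-first for the one that raised the given interrupt."""
    for event in reversed(events):
        if function_call_id in event.long_running_tool_ids:
            return event
    return None


def assign_run_id(prior_run_id: str | None) -> str:
    """Reuse the interrupted run's run_id on resume; mint a new one otherwise."""
    if prior_run_id is not None:
        return prior_run_id
    return uuid.uuid4().hex
```

Keeping `run_id` assignment in one place makes the invariant easy to check: only a resume passes a prior ID, so only a resume can share a `run_id` with earlier events.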

## `rerun_on_resume`

A node with multiple interrupt IDs may receive partial FRs (`FunctionResponse`s), where only some of its interrupts are resolved. The behavior depends on `rerun_on_resume`:
`rerun_on_resume=True` (Workflow, orchestration nodes):

| FRs received | Status | Behavior |
|---|---|---|
| Partial | PENDING | Re-execute immediately with partial resume_inputs. Node handles remaining interrupts internally (e.g., Workflow dispatches resolved children, keeps unresolved as WAITING). |
| All | PENDING | Re-execute with all resume_inputs. |

This is critical for Workflow: when one child's FR arrives, the Workflow re-runs immediately to dispatch that resolved child instead of waiting for every child's FR.
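
The partial-dispatch behavior can be sketched as follows. `ChildState`, `dispatch_resolved`, and the `RUNNING` status are hypothetical names, and the sketch assumes each child is a leaf that needs all of its own FRs before it can be re-dispatched.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ChildState:
    """Hypothetical bookkeeping for one interrupted child of a Workflow."""

    name: str
    interrupt_ids: set[str]
    status: str = "WAITING"


def dispatch_resolved(
    children: list[ChildState], resume_inputs: dict[str, Any]
) -> dict[str, dict[str, Any]]:
    """Re-dispatch every WAITING child whose FRs have all arrived.

    Returns {child name: that child's slice of resume_inputs}. Children with
    any FR still missing stay WAITING; nothing blocks on them.
    """
    dispatched: dict[str, dict[str, Any]] = {}
    for child in children:
        if child.status == "WAITING" and child.interrupt_ids.issubset(resume_inputs):
            child.status = "RUNNING"  # hypothetical status name
            dispatched[child.name] = {fid: resume_inputs[fid] for fid in child.interrupt_ids}
    return dispatched
```

Calling it once per arriving FR dispatches child A as soon as `fc-1` is answered, while child B remains WAITING for `fc-2`.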
`rerun_on_resume=False` (leaf nodes, simple HITL):

| FRs received | Status | Behavior |
|---|---|---|
| Partial | WAITING | Stay waiting. Need all FRs. |
| All | COMPLETED | Auto-complete. Output = aggregated resolved_responses. No re-execution. |
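
Both tables can be condensed into one decision function. This is a sketch under stated assumptions: `on_function_responses` and the `Status` enum are hypothetical, aggregating the output as a dict keyed by interrupt ID is an assumption, and a node that has received none of its FRs is assumed to stay WAITING.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class Status(Enum):
    PENDING = "PENDING"      # node will be re-executed
    WAITING = "WAITING"      # node keeps waiting for more FRs
    COMPLETED = "COMPLETED"  # node finishes without re-execution


def on_function_responses(
    interrupt_ids: set[str],
    responses: dict[str, Any],
    rerun_on_resume: bool,
) -> tuple[Status, dict[str, Any] | None]:
    """Return (status, output) for a node after some FRs have arrived."""
    resolved = {fid: responses[fid] for fid in interrupt_ids if fid in responses}
    if not resolved:
        return Status.WAITING, None  # assumption: no FR for this node yet
    if rerun_on_resume:
        # Partial or all: re-execute now with whatever resume_inputs exist.
        return Status.PENDING, None
    if set(resolved) == interrupt_ids:
        # All FRs in: auto-complete; output is the aggregated responses.
        return Status.COMPLETED, resolved
    return Status.WAITING, None  # partial: leaf nodes need every FR
```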
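
A node can produce both output and an interrupt in the same execution (e.g., a Workflow where child A completes with output and child B interrupts). On resume:

- the newly resolved FRs are passed in as `resume_inputs`;
- interrupt IDs that are still unresolved are passed as `prior_interrupt_ids`;
- output already produced before the interrupt is passed as `prior_output`.

```python
# Resume a node that previously produced output AND interrupted.
# `node`, `ctx`, `prior_run_id`, `cached_output`, `input`, and `response`
# are assumed to come from the enclosing async orchestration code.
runner = NodeRunner(
    node=node,
    parent_ctx=ctx,
    run_id=prior_run_id,           # reuse the original run_id
    prior_output=cached_output,    # output produced before the interrupt
    prior_interrupt_ids={'fc-2'},  # still unresolved
)
child_ctx = await runner.run(
    node_input=input,
    resume_inputs={'fc-1': response},  # newly resolved FR
)
```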
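
The resume arguments in the example above can be derived from a node's persisted state roughly like this. `ResumeArgs` and `build_resume_args` are hypothetical helpers, not part of `NodeRunner`; the sketch only shows how the still-unresolved set (`{'fc-2'}`) falls out of subtracting the resolved IDs from all of the node's interrupt IDs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ResumeArgs:
    """Hypothetical container mirroring the NodeRunner / run() arguments."""

    run_id: str
    prior_output: Any
    prior_interrupt_ids: set[str]
    resume_inputs: dict[str, Any]


def build_resume_args(
    prior_run_id: str,
    cached_output: Any,
    all_interrupt_ids: set[str],
    responses: dict[str, Any],
) -> ResumeArgs:
    """Split a node's interrupts into newly resolved and still-pending ones."""
    # Keep only the FRs that answer this node's own interrupts.
    resume_inputs = {fid: r for fid, r in responses.items() if fid in all_interrupt_ids}
    still_pending = all_interrupt_ids - set(resume_inputs)
    return ResumeArgs(
        run_id=prior_run_id,          # reuse: same run_id as the interrupted run
        prior_output=cached_output,   # output produced before the interrupt
        prior_interrupt_ids=still_pending,
        resume_inputs=resume_inputs,
    )
```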