.agents/skills/adk-architecture/references/architecture/observability.md
Each Context carries a _span field. Since Context forms a 1:1
parent-child tree with node executions (see Context),
span hierarchy follows naturally — no separate span management
needed.
Root Context._span (invocation)       ← Runner sets this
└── ctx[workflow]._span               ← NodeRunner creates
    ├── ctx[child_a]._span            ← NodeRunner creates
    │   ├── (call_llm span)           ← auto-parented
    │   └── (execute_tool span)       ← auto-parented
    ├── ctx[child_b]._span            ← NodeRunner creates
    │   └── ctx[grandchild]._span     ← nested
    └── ctx[child_c]._span            ← ctx.run_node()
Runner creates root_ctx and the invocation span, storing
it as root_ctx._span. This becomes the parent for all node spans.
NodeRunner creates each node's span, explicitly parented to
parent_ctx._span, stores it on child_ctx._span, and closes it
before returning (see NodeRunner for the
execution flow).
Always use ctx._span explicitly — never rely on OTel's
implicit "current span" context. In a concurrent asyncio.Task
runtime, implicit context can be unreliable across concurrent
nodes. All tracing operations (attributes, logs, child spans)
should go through ctx._span.
Span lifecycle:
1. NodeRunner.run() creates the span via tracer.start_span(),
   parented to parent_ctx._span, and stores it on ctx._span.
2. During execution, all tracing operations go through
   ctx._span explicitly.
3. NodeRunner.run() calls ctx._span.end() before returning.
4. BatchSpanProcessor buffers ended spans and exports them
   periodically.
5. OTLPSpanExporter sends each batch to the OTLP endpoint.

Interrupted nodes: The span ends immediately when NodeRunner
returns; it is not left open waiting for resume. Otherwise the span
would be invisible to the backend until resume (which could be
minutes, hours, or never). The resumed execution starts a fresh
span in a new Runner.run_async() call (same invocation_id,
different trace, possibly on a different server).
Context changes — add _span field:
class Context(ReadonlyContext):
    _span: Span | None = None
NodeRunner.run():
NodeRunner.run() lifecycle:
1. Create the span, parented to parent_ctx._span.
2. Store it on ctx._span.
3. Run the node; it may set attributes on ctx._span during
   execution (e.g., SingleAgentReactNode adds
   gen_ai.agent.name, gen_ai.request.model).
4. On interrupt: set node.interrupted = True.
5. On error: set status ERROR, record exception.
6. End the span (ctx._span.end()): always, even on interrupt.

Key points:
- Use tracer.start_span() with explicit parent context from
  parent_ctx._span; never rely on implicit OTel context in
  concurrent async code.
- The span ends before run() returns, even on interrupt.

Set at span creation (available for sampling decisions):
| Attribute | Source | Example |
|---|---|---|
| node.name | self._node.name | "call_llm" |
| node.path | ctx.node_path | "wf/child_a" |
| node.run_id | self._run_id | "child_a_abc123" |
| node.type | type(self._node).__name__ | "CallLlmNode" |
Set after execution (result attributes):
| Attribute | Source | Example |
|---|---|---|
| node.has_output | ctx.output is not None | true |
| node.interrupted | bool(ctx.interrupt_ids) | false |
| node.resumed | bool(resume_inputs) | false |
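
As an illustration of the two attribute groups, here is a stdlib-only sketch that derives both dictionaries. `CallLlmNode` and `FakeContext` are hypothetical stand-ins for the real node and Context classes; only the attribute names and source expressions come from the tables above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CallLlmNode:  # hypothetical stand-in for a real node class
    name: str


@dataclass
class FakeContext:  # hypothetical stand-in for Context
    node_path: str
    output: Any = None
    interrupt_ids: list = field(default_factory=list)


def creation_attributes(node, ctx, run_id: str) -> dict:
    # Known before the node runs, so a sampler can see them.
    return {
        "node.name": node.name,
        "node.path": ctx.node_path,
        "node.run_id": run_id,
        "node.type": type(node).__name__,
    }


def result_attributes(ctx, resume_inputs) -> dict:
    # Only known once execution has finished or been interrupted.
    return {
        "node.has_output": ctx.output is not None,
        "node.interrupted": bool(ctx.interrupt_ids),
        "node.resumed": bool(resume_inputs),
    }


node = CallLlmNode(name="call_llm")
ctx = FakeContext(node_path="wf/child_a", output={"text": "hi"})
at_start = creation_attributes(node, ctx, run_id="child_a_abc123")
at_end = result_attributes(ctx, resume_inputs=None)
```

With the OTel API, the first dictionary would typically be passed as `attributes=` to `tracer.start_span()` and the second applied with `span.set_attributes()` just before the span ends.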
GenAI semantic conventions for node spans:
- gen_ai.operation.name = "invoke_agent" for agent nodes
- gen_ai.operation.name = "execute_tool" for tool nodes
- gen_ai.agent.name, gen_ai.tool.name as appropriate
- Span kind: INTERNAL (in-process orchestration)

Use the OTel Logs API for point-in-time occurrences within a
node's span. Context provides emit_log() for better DX: it
wraps set_span_in_context(self._span) internally so callers
don't manage OTel context:
# On Context:
def emit_log(self, body: str, **attributes):
    span_ctx = set_span_in_context(self._span)
    otel_logger.emit(
        LogRecord(body=body, attributes=attributes),
        context=span_ctx,
    )
# Usage:
ctx.emit_log(
    'node.event.yielded',
    has_output=event.output is not None,
    has_message=event.content is not None,
)
Use the google_adk logger namespace:
| Level | What to log |
|---|---|
| DEBUG | Node started, node completed, event enqueued |
| INFO | Node interrupted, node resumed, dynamic node scheduled |
| WARNING | Node timeout, retry triggered |
| ERROR | Node failed, unhandled exception |
logger = logging.getLogger("google_adk." + __name__)
logger.debug(
    'Node %s started (run_id=%s, path=%s)',
    node.name, run_id, ctx.node_path,
)
Use %-style formatting (lazy evaluation) for logging, not
f-strings.
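
To make the lazy-evaluation point concrete, the stdlib-only sketch below counts how often an argument is rendered to a string when the DEBUG level is disabled. The `Expensive` class and the `google_adk.sketch` logger name are illustrative only.

```python
import logging


class Expensive:
    """Counts how many times it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "expensive-repr"


logger = logging.getLogger("google_adk.sketch")
logger.setLevel(logging.INFO)  # DEBUG records are dropped
logger.addHandler(logging.NullHandler())
logger.propagate = False

lazy = Expensive()
# %-style: the argument is only formatted if the record is emitted.
logger.debug("Node %s started", lazy)

eager = Expensive()
# f-string: the message is built before logger.debug() is even called.
logger.debug(f"Node {eager} started")
```

After this runs, `lazy.renders` is 0 and `eager.renders` is 1: the f-string paid the formatting cost for a record that was never emitted.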
| Metric | Type | Description |
|---|---|---|
| node.execution.duration | Histogram | Per node type |
| node.execution.count | Counter | Per node type and status |
| node.interrupt.count | Counter | HITL interrupts |
| node.resume.count | Counter | Resumed executions |
| workflow.active_nodes | UpDownCounter | Currently executing |