`design-docs/agent-tool-calling-flow.md`
This document explains exactly what happens after prompt messages are built and a variant starts running in the agent.
Per variant, `Agent(...).run(model, prompt_messages)` is called from:

- `backend/routes/generate_code.py` (`AgenticGenerationStage._run_variant`)

`Agent` is a thin wrapper over `AgentEngine`:

- `backend/agent/runner.py`
- `backend/agent/engine.py`

The main loop lives in:

- `backend/agent/engine.py` -> `AgentEngine._run_with_session(...)`

Loop behavior:
- The engine tracks per-tool-call streaming state in `started_tool_ids` and `streamed_lengths`.
- Each iteration calls `turn = await session.stream_turn(on_event)`.
- `on_event` handles streamed deltas:
  - `assistant_delta` -> websocket `assistant`
  - `thinking_delta` -> websocket `thinking`
  - `tool_call_delta` -> `_handle_streamed_tool_delta(...)`
- If `turn.tool_calls` is empty: finalize and return.
- Otherwise, execute the tool calls, then call `session.append_tool_results(turn, executed_tool_calls)` and loop.
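The loop above can be sketched as follows. This is a minimal, simplified rendering of the flow — `Turn`, `run_with_session`, and the event shapes are illustrative stand-ins for the real `AgentEngine` internals, not the actual code:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Stand-in for a provider turn (see ProviderTurn below)."""
    assistant_text: str
    tool_calls: list = field(default_factory=list)


async def run_with_session(session, runtime, send_message) -> str:
    """Stream model turns until the model stops requesting tools."""
    while True:
        def on_event(event_type: str, payload) -> None:
            # Forward streamed deltas to the frontend websocket.
            if event_type == "assistant_delta":
                send_message("assistant", payload)
            elif event_type == "thinking_delta":
                send_message("thinking", payload)
            elif event_type == "tool_call_delta":
                send_message("toolStart", payload)  # preview handling elided

        turn = await session.stream_turn(on_event)

        # No tool calls: this turn is final -- finalize and return.
        if not turn.tool_calls:
            return turn.assistant_text

        # Execute each requested tool, then hand results back so the
        # provider session can continue from them on the next turn.
        executed = [runtime.execute(call) for call in turn.tool_calls]
        session.append_tool_results(turn, executed)
```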
Tool runtime:

- `backend/agent/tools/runtime.py` -> `AgentToolRuntime.execute(...)`

Tool definitions:

- `backend/agent/tools/definitions.py` -> `canonical_tool_definitions(...)`

Supported tools:

- `create_file`
- `edit_file`
- `generate_images`
- `remove_background`
- `retrieve_option`
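As a rough illustration of what `canonical_tool_definitions(...)` might return, here is a hypothetical JSON-schema-style definition for `create_file`; the exact schema shape in `definitions.py` may differ:

```python
# Hypothetical sketch of one canonical tool definition; the real schema in
# backend/agent/tools/definitions.py may use different keys or wrappers.
def canonical_tool_definitions() -> list[dict]:
    return [
        {
            "name": "create_file",
            "description": "Create a file at the given path with the given content.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
        # ... edit_file, generate_images, remove_background, retrieve_option
    ]
```

Each provider adapter would translate this canonical shape into its own tool/function format.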
Execution lifecycle per tool call:

1. Emit `toolStart` (unless already emitted from streamed args).
2. For `create_file`, stream preview code chunks while args are still arriving.
3. When the result includes `updated_content`, emit `setCode`.
4. Emit `toolResult` with `{ name, output, ok }`.

`create_file` preview:

The engine parses partial tool arguments from provider deltas using `backend/agent/tools/parsing.py`:

- `extract_content_from_args(...)`
- `extract_path_from_args(...)`

Then `_handle_streamed_tool_delta(...)` in `engine.py` emits:

- `toolStart` for `create_file`
- `setCode` updates as content grows

This allows the frontend to preview code before actual tool execution completes.
Provider contract:

- `backend/agent/providers/base.py`
  - `ProviderSession`
  - `ProviderTurn`

Each provider returns a `ProviderTurn` with:

- `assistant_text`
- `tool_calls`
- `assistant_turn` (provider-native turn object needed for continuation)

After tool execution, each provider appends tool results differently.
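A sketch of the `ProviderTurn` shape, reconstructed from the fields listed above (the real class in `base.py` may carry more fields):

```python
from dataclasses import dataclass, field
from typing import Any

# Illustrative reconstruction of ProviderTurn from the documented fields;
# not copied from backend/agent/providers/base.py.

@dataclass
class ProviderTurn:
    assistant_text: str                              # final assistant text this turn
    tool_calls: list = field(default_factory=list)   # parsed tool invocations
    # Provider-native turn object, replayed verbatim into the request
    # history so the next turn can continue correctly.
    assistant_turn: Any = None
```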
OpenAI:

- `backend/agent/providers/openai.py` -> `OpenAIProviderSession.append_tool_results(...)`

Behavior:

- Appends the assistant turn (`turn.assistant_turn`) to request history.
- Appends one `function_call_output` item per tool result: `{"type":"function_call_output","call_id":...,"output": json_string}`
- The next `responses.create(...)` turn uses this updated item list.
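A simplified sketch of this behavior, with `items` standing in for the running Responses API input item list (the helper signature is illustrative, not the real method):

```python
import json

# Sketch of the OpenAI append_tool_results behavior (simplified).
# `items` stands in for the session's running Responses API input list;
# `assistant_turn_items` is the provider-native assistant turn output.

def append_tool_results(items: list, assistant_turn_items: list, results: list) -> list:
    # 1. Append the assistant turn's native items to request history.
    items.extend(assistant_turn_items)
    # 2. One function_call_output item per executed tool call.
    for result in results:
        items.append({
            "type": "function_call_output",
            "call_id": result["call_id"],
            "output": json.dumps(result["output"]),
        })
    # The next responses.create(input=items, ...) call uses this list.
    return items
```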
Anthropic:

- `backend/agent/providers/anthropic.py` -> `AnthropicProviderSession.append_tool_results(...)`

Behavior:

- Appends an assistant message replaying the model's `tool_use` blocks (`id`, `name`, `input`).
- Appends a user message with `tool_result` blocks (`tool_use_id`, serialized result content, `is_error`).
- The next `messages.stream(...)` turn continues from these blocks.
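A simplified sketch, with plain dicts standing in for the Anthropic SDK message structures (the helper signature is illustrative):

```python
import json

# Sketch of the Anthropic append_tool_results behavior (simplified).
# `messages` stands in for the session's running message history.

def append_tool_results(messages: list, tool_uses: list, results: list) -> list:
    # Replay the model's tool_use blocks in an assistant message.
    messages.append({
        "role": "assistant",
        "content": [
            {"type": "tool_use", "id": u["id"], "name": u["name"], "input": u["input"]}
            for u in tool_uses
        ],
    })
    # Matching tool_result blocks go in a user message.
    messages.append({
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": r["tool_use_id"],
                "content": json.dumps(r["output"]),
                "is_error": not r["ok"],
            }
            for r in results
        ],
    })
    # The next messages.stream(messages=messages, ...) continues from here.
    return messages
```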
Gemini:

- `backend/agent/providers/gemini.py` -> `GeminiProviderSession.append_tool_results(...)`

Behavior:

- Appends the model turn (`turn.assistant_turn`).
- Appends a `role="tool"` content with `Part.from_function_response(...)` per tool.
- This preserves the model part structure required for reliable continuation (including thought-signature-sensitive flows).
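A simplified sketch using plain dicts in place of the `google-genai` types (the real code would use `types.Content` and `types.Part.from_function_response(...)`; the helper signature is illustrative):

```python
# Sketch of the Gemini append_tool_results behavior (simplified).
# Plain dicts stand in for google-genai Content/Part objects.

def append_tool_results(contents: list, assistant_turn: dict, results: list) -> list:
    # 1. Append the model's own turn verbatim, preserving its part
    #    structure (important for thought-signature-sensitive flows).
    contents.append(assistant_turn)
    # 2. Append a role="tool" content with one function-response part
    #    per tool (stands in for Part.from_function_response(...)).
    contents.append({
        "role": "tool",
        "parts": [
            {"function_response": {"name": r["name"], "response": r["output"]}}
            for r in results
        ],
    })
    return contents
```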
Frontend websocket message types emitted during generation:

- `assistant`
- `thinking`
- `toolStart`
- `toolResult`
- `setCode`

Where they come from:

- `StreamEvent` deltas during `stream_turn(...)`
- `send_message(...)`

Typical per-turn stream sequence:

1. `toolStart`
2. `setCode` previews (for `create_file`, optional)
3. `toolResult`

Finalization: when a turn produces no tool calls, the engine finalizes and returns.
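As an illustration, a per-turn sequence for a `create_file` call might look like this on the websocket (payload values are hypothetical; only the message types come from this document):

```python
# Hypothetical per-turn websocket message sequence for a create_file call.
typical_turn = [
    {"type": "assistant", "value": "Creating index.html..."},
    {"type": "toolStart", "value": {"name": "create_file"}},
    {"type": "setCode", "value": "<!DOCTYPE html>"},        # streamed preview
    {"type": "setCode", "value": "<!DOCTYPE html><html>"},  # preview grows
    {"type": "toolResult", "value": {"name": "create_file", "ok": True}},
]
```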
Relevant files:

- `backend/agent/engine.py`
- `backend/agent/runner.py`
- `backend/agent/providers/factory.py`
- `backend/agent/providers/base.py`
- `backend/agent/providers/openai.py`
- `backend/agent/providers/anthropic.py`
- `backend/agent/providers/gemini.py`
- `backend/agent/tools/definitions.py`
- `backend/agent/tools/runtime.py`
- `backend/agent/tools/parsing.py`
- `backend/agent/tools/summaries.py`