skills/skills/langbot-testing/references/local-agent-runner-coverage.md
Use this matrix when judging whether the external langbot/local-agent plugin still behaves like the old built-in local-agent runner.
The QA target is end-to-end behavior. UI cases prove the host, SDK, plugin runtime, and WebUI work together. Unit or component tests are still needed for negative branches that are hard to trigger reliably through a live provider.
LangBot/src/langbot/pkg/agent/runner/context_builder.py builds the Protocol v1 context from the event envelope: ctx.input.text, ctx.input.contents, attachments, state, resources, and runtime metadata.LangBot/src/langbot/pkg/agent/runner/pipeline_adapter.py adapts Pipeline-only fields into ctx.adapter.extra.prompt, ctx.adapter.extra.params, and optional ctx.bootstrap.messages.LangBot/src/langbot/pkg/agent/runner/resource_builder.py authorizes models, fallback models, rerank models, tools, and knowledge bases for the current run.LangBot/src/langbot/pkg/plugin/handler.py validates run-scoped model/tool/rerank access and calls the host model provider or tool manager with the current query.langbot-local-agent/components/agent_runner/default.py selects streaming or non-streaming execution, retrieves RAG context, builds messages, invokes models with fallback, and runs tool loops.langbot-local-agent/pkg/messages.py prefers the host effective prompt from ctx.adapter.extra.prompt, uses ctx.bootstrap.messages only as a small bootstrap window, and preserves structured/multimodal input while inserting RAG context.TODO: Treat ctx.adapter.extra.prompt as a temporary Pipeline bridge for old
local-agent behavior parity. It is not the final answer for how user plugins or
host hooks should influence agent behavior after Pipeline is replaced.
These browser cases are the minimum gate for a local-agent migration check:
| Case | Path Covered | Expected Behavior |
|---|---|---|
local-agent-basic-debug-chat | Streaming LLM invocation with effective host context | Bot returns deterministic OK; backend logs streaming completion. |
local-agent-effective-prompt-debug-chat | PromptPreProcessing and host effective prompt handoff through ctx.adapter.extra.prompt | Bot returns PROMPT_PREPROCESS_OK from the fixture prompt probe. |
local-agent-context-compaction-debug-chat | Runner-owned context budgeting and old-history compaction | Automation temporarily shrinks the runner context window, sends multi-turn Debug Chat history, and the bot still recovers the older sentinel. |
local-agent-rag-debug-chat | Knowledge-base authorization, retrieval, and RAG prompt insertion | Bot returns the KB sentinel, not a generic answer. |
mcp-stdio-tool-call | MCP stdio discovery, tool detail, model function calling, and tool execution | Bot returns qa_mcp_echo:<input> and backend logs the MCP tool call. |
local-agent-plugin-tool-call-debug-chat | Plugin tool discovery, tool detail, model function calling, and tool execution | Bot returns qa-plugin-smoke:<input> and backend logs the plugin tool call. |
local-agent-steering-debug-chat | Host steering claim, runner pull at turn boundary, and follow-up injection during an active tool loop | Two user messages produce one assistant response containing the steering sentinel. |
local-agent-multimodal-debug-chat | Image upload, structured input contents, and multimodal runner consumption | UI shows uploaded image and bot returns IMAGE_OK; backend receives an image input. |
local-agent-rag-multimodal-debug-chat | RAG insertion while structured image input is present | UI shows uploaded image, bot returns the KB sentinel, and backend logs the same request with [Image]. |
local-agent-nonstreaming-debug-chat | Host non-streaming adapter path and runner non-streaming invocation | Bot returns NONSTREAM_OK; backend completes without the streaming-completed path. |
| Area | How To Cover | Pass Signal |
|---|---|---|
| Effective prompt | Use the qa-plugin-smoke prompt probe and send qa-effective-prompt. | The answer follows query.prompt.messages and returns PROMPT_PREPROCESS_OK; plugin-local fallback config prompt is not used when host prompt exists. |
| Current text input | Send a deterministic text-only Debug Chat prompt. | ctx.input.text becomes the user text and the bot answers the text request. |
| Structured input contents | Upload an image with text in Debug Chat. | User message shows the image; backend log or request payload contains image content; model can acknowledge it. |
| Multimodal plus RAG | Run local-agent-rag-multimodal-debug-chat. | RAG sentinel is still retrievable and the image is not dropped from the user message; exact image-preservation inside the model message is covered by unit tests. |
| History and context compaction | Run local-agent-context-compaction-debug-chat with a small temporary context-window-tokens budget. | The runner compacts older history into <conversation_summary> and the final answer still recovers the older sentinel from the compacted context. |
| Streaming model invocation | Enable Debug Chat streaming and ask for OK. | UI receives incremental bot output and backend logs streaming completion. |
| Non-streaming model invocation | Disable Debug Chat streaming or use a non-streaming adapter path. | UI receives a final bot message and backend logs a normal response completion. |
| Model fallback before first chunk | Configure a failing primary and working fallback, preferably with a controlled test provider. | First model failure does not fail the run; fallback model produces the final answer. |
| Failure after streaming commit | Use a controlled provider that emits one chunk and then fails. | Runner reports a terminal run failure and does not fallback after partial output. |
| No authorized model | Clear model config or configure a model not in run resources. | Runner returns runner.no_model instead of calling an unauthorized model. |
| MCP tool call | Use qa-local-stdio and qa_mcp_echo. | Bot returns the exact qa_mcp_echo:<input> result; /api/v1/tools contains qa_mcp_echo. |
| Plugin tool call | Install a fixture plugin exposing a deterministic tool and bind it to the pipeline. | Runner lists the plugin tool and can call it through the same tool loop as MCP tools. |
| Run steering | Use local-agent-steering-debug-chat with the fixture qa_plugin_sleep tool. | A follow-up sent while the sleep tool keeps the run active is claimed into the same run: two user messages, one assistant response, sentinel included. |
| Tool errors | Make the model request an unauthorized tool or invalid arguments in a controlled unit/component test. | Tool result contains an error message and the run does not bypass authorization. |
| Tool iteration limit | Use a controlled model/tool fixture that repeatedly requests more tool calls. | Runner stops with runner.tool_loop_limit at the configured limit. |
| Knowledge retrieval | Bind a KB containing a unique sentinel. | Bot returns the sentinel and backend logs LangRAG retrieval. |
Legacy knowledge-base config | Load a pipeline config using the old single-KB field. | Runner still retrieves from the KB. |
| Rerank | Configure rerank-model and rerank-top-k with a working rerank provider. | Retrieval order follows rerank output; unauthorized or failing rerank falls back to original retrieval order. |
| Remove-think | Enable output remove-think on a model that emits think tags. | Final visible output omits think content on both streaming and non-streaming paths. |
| Model extra args | Configure provider/model extra args and run Debug Chat. | Host merges persisted model extra args before provider invocation. |
| Query-aware tools | Call a tool that needs the current Query/session context. | Tool receives the active query and behaves the same as it did under the built-in runner. |
| Params filtering | Add public and secret-like variables before the run. | Public params are visible to the runner; _internal, token, key, password, and credential fields are filtered. |
| Actor/session context | Run through Debug Chat and at least one platform adapter path. | conversation, actor, subject, and state scopes contain stable IDs for the current launcher and sender. |
When reporting a local-agent QA result, separate these categories:
Passed by UI: path was verified through browser-visible behavior and backend/network evidence.Covered by unit/component tests: path is deterministic in tests but not practical as a live UI case.Not covered: path still needs a fixture or provider setup.Environment issue: provider channel, proxy, OAuth, or external marketplace/network problem outside the runner path.Do not mark the whole runner healthy based only on a single text Debug Chat response.