Back to Langbot

Local Agent Runner Coverage

skills/skills/langbot-testing/references/local-agent-runner-coverage.md

4.10.48.7 KB
Original Source

Local Agent Runner Coverage

Use this matrix when judging whether the external langbot/local-agent plugin still behaves like the old built-in local-agent runner.

The QA target is end-to-end behavior. UI cases prove the host, SDK, plugin runtime, and WebUI work together. Unit or component tests are still needed for negative branches that are hard to trigger reliably through a live provider.

Code Path Basis

  • LangBot/src/langbot/pkg/agent/runner/context_builder.py builds the Protocol v1 context from the event envelope: ctx.input.text, ctx.input.contents, attachments, state, resources, and runtime metadata.
  • LangBot/src/langbot/pkg/agent/runner/pipeline_adapter.py adapts Pipeline-only fields into ctx.adapter.extra.prompt, ctx.adapter.extra.params, and optional ctx.bootstrap.messages.
  • LangBot/src/langbot/pkg/agent/runner/resource_builder.py authorizes models, fallback models, rerank models, tools, and knowledge bases for the current run.
  • LangBot/src/langbot/pkg/plugin/handler.py validates run-scoped model/tool/rerank access and calls the host model provider or tool manager with the current query.
  • langbot-local-agent/components/agent_runner/default.py selects streaming or non-streaming execution, retrieves RAG context, builds messages, invokes models with fallback, and runs tool loops.
  • langbot-local-agent/pkg/messages.py prefers the host effective prompt from ctx.adapter.extra.prompt, uses ctx.bootstrap.messages only as a small bootstrap window, and preserves structured/multimodal input while inserting RAG context.

TODO: Treat ctx.adapter.extra.prompt as a temporary Pipeline bridge for old local-agent behavior parity. It is not the final answer for how user plugins or host hooks should influence agent behavior after Pipeline is replaced.

Minimum UI Gate

These browser cases are the minimum gate for a local-agent migration check:

CasePath CoveredExpected Behavior
local-agent-basic-debug-chatStreaming LLM invocation with effective host contextBot returns deterministic OK; backend logs streaming completion.
local-agent-effective-prompt-debug-chatPromptPreProcessing and host effective prompt handoff through ctx.adapter.extra.promptBot returns PROMPT_PREPROCESS_OK from the fixture prompt probe.
local-agent-context-compaction-debug-chatRunner-owned context budgeting and old-history compactionAutomation temporarily shrinks the runner context window, sends multi-turn Debug Chat history, and the bot still recovers the older sentinel.
local-agent-rag-debug-chatKnowledge-base authorization, retrieval, and RAG prompt insertionBot returns the KB sentinel, not a generic answer.
mcp-stdio-tool-callMCP stdio discovery, tool detail, model function calling, and tool executionBot returns qa_mcp_echo:<input> and backend logs the MCP tool call.
local-agent-plugin-tool-call-debug-chatPlugin tool discovery, tool detail, model function calling, and tool executionBot returns qa-plugin-smoke:<input> and backend logs the plugin tool call.
local-agent-steering-debug-chatHost steering claim, runner pull at turn boundary, and follow-up injection during an active tool loopTwo user messages produce one assistant response containing the steering sentinel.
local-agent-multimodal-debug-chatImage upload, structured input contents, and multimodal runner consumptionUI shows uploaded image and bot returns IMAGE_OK; backend receives an image input.
local-agent-rag-multimodal-debug-chatRAG insertion while structured image input is presentUI shows uploaded image, bot returns the KB sentinel, and backend logs the same request with [Image].
local-agent-nonstreaming-debug-chatHost non-streaming adapter path and runner non-streaming invocationBot returns NONSTREAM_OK; backend completes without the streaming-completed path.

Full Coverage Matrix

AreaHow To CoverPass Signal
Effective promptUse the qa-plugin-smoke prompt probe and send qa-effective-prompt.The answer follows query.prompt.messages and returns PROMPT_PREPROCESS_OK; plugin-local fallback config prompt is not used when host prompt exists.
Current text inputSend a deterministic text-only Debug Chat prompt.ctx.input.text becomes the user text and the bot answers the text request.
Structured input contentsUpload an image with text in Debug Chat.User message shows the image; backend log or request payload contains image content; model can acknowledge it.
Multimodal plus RAGRun local-agent-rag-multimodal-debug-chat.RAG sentinel is still retrievable and the image is not dropped from the user message; exact image-preservation inside the model message is covered by unit tests.
History and context compactionRun local-agent-context-compaction-debug-chat with a small temporary context-window-tokens budget.The runner compacts older history into <conversation_summary> and the final answer still recovers the older sentinel from the compacted context.
Streaming model invocationEnable Debug Chat streaming and ask for OK.UI receives incremental bot output and backend logs streaming completion.
Non-streaming model invocationDisable Debug Chat streaming or use a non-streaming adapter path.UI receives a final bot message and backend logs a normal response completion.
Model fallback before first chunkConfigure a failing primary and working fallback, preferably with a controlled test provider.First model failure does not fail the run; fallback model produces the final answer.
Failure after streaming commitUse a controlled provider that emits one chunk and then fails.Runner reports a terminal run failure and does not fallback after partial output.
No authorized modelClear model config or configure a model not in run resources.Runner returns runner.no_model instead of calling an unauthorized model.
MCP tool callUse qa-local-stdio and qa_mcp_echo.Bot returns the exact qa_mcp_echo:<input> result; /api/v1/tools contains qa_mcp_echo.
Plugin tool callInstall a fixture plugin exposing a deterministic tool and bind it to the pipeline.Runner lists the plugin tool and can call it through the same tool loop as MCP tools.
Run steeringUse local-agent-steering-debug-chat with the fixture qa_plugin_sleep tool.A follow-up sent while the sleep tool keeps the run active is claimed into the same run: two user messages, one assistant response, sentinel included.
Tool errorsMake the model request an unauthorized tool or invalid arguments in a controlled unit/component test.Tool result contains an error message and the run does not bypass authorization.
Tool iteration limitUse a controlled model/tool fixture that repeatedly requests more tool calls.Runner stops with runner.tool_loop_limit at the configured limit.
Knowledge retrievalBind a KB containing a unique sentinel.Bot returns the sentinel and backend logs LangRAG retrieval.
Legacy knowledge-base configLoad a pipeline config using the old single-KB field.Runner still retrieves from the KB.
RerankConfigure rerank-model and rerank-top-k with a working rerank provider.Retrieval order follows rerank output; unauthorized or failing rerank falls back to original retrieval order.
Remove-thinkEnable output remove-think on a model that emits think tags.Final visible output omits think content on both streaming and non-streaming paths.
Model extra argsConfigure provider/model extra args and run Debug Chat.Host merges persisted model extra args before provider invocation.
Query-aware toolsCall a tool that needs the current Query/session context.Tool receives the active query and behaves the same as it did under the built-in runner.
Params filteringAdd public and secret-like variables before the run.Public params are visible to the runner; _internal, token, key, password, and credential fields are filtered.
Actor/session contextRun through Debug Chat and at least one platform adapter path.conversation, actor, subject, and state scopes contain stable IDs for the current launcher and sender.

Reporting Rules

When reporting a local-agent QA result, separate these categories:

  • Passed by UI: path was verified through browser-visible behavior and backend/network evidence.
  • Covered by unit/component tests: path is deterministic in tests but not practical as a live UI case.
  • Not covered: path still needs a fixture or provider setup.
  • Environment issue: provider channel, proxy, OAuth, or external marketplace/network problem outside the runner path.

Do not mark the whole runner healthy based only on a single text Debug Chat response.