.agents/skills/llmobs-testing/SKILL.md
Before writing any test, determine the package's LlmObsCategory. Category picks the test strategy (VCR or not), the span kind, and the test structure. The wrong category produces tests that pass against the wrong contract — VCR cassettes for a workflow library produce empty recordings; pure-function tests for an HTTP-call wrapper miss the network surface entirely.
Quick check:
- LLM_CLIENT or MULTI_PROVIDER: VCR.
- ORCHESTRATION: no VCR, pure functions, mocked LLM responses returned from the orchestration node.
- INFRASTRUCTURE: mock server.

See references/category-strategies.md for the FORBIDDEN-vs-REQUIRED matrix per category.
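The quick check can be read as a lookup from category to strategy. The following is a minimal sketch of that reading, not code from the repository: the category names come from this document, while `STRATEGY_BY_CATEGORY`, `strategyFor`, and the shape of the returned object are hypothetical names chosen for illustration.

```javascript
// Hypothetical lookup table. Category names come from this document;
// the object shape and the `approach` labels are illustrative only.
const STRATEGY_BY_CATEGORY = {
  LLM_CLIENT: { vcr: true, approach: 'vcr' },
  MULTI_PROVIDER: { vcr: true, approach: 'vcr' },
  ORCHESTRATION: { vcr: false, approach: 'pure-functions' },
  INFRASTRUCTURE: { vcr: false, approach: 'mock-server' }
}

// Returns the strategy for a category, failing loudly on unknown input
// so that a typo cannot silently select the wrong test contract.
function strategyFor (category) {
  if (!Object.prototype.hasOwnProperty.call(STRATEGY_BY_CATEGORY, category)) {
    throw new Error(`Unknown LlmObsCategory: ${category}`)
  }
  return STRATEGY_BY_CATEGORY[category]
}
```

Throwing on an unknown category mirrors the point made above: picking a strategy by default is how tests end up passing against the wrong contract.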
LLMObs tests use special helpers to validate span events.
Key components:
- useLlmObs() - Initializes the LLMObs test environment
- getEvents() - Retrieves captured span events
- assertLlmObsSpanEvent() - Validates span structure with flexible matchers

Basic test flow:
1. useLlmObs({ plugin: 'name' })
2. getEvents()
3. assertLlmObsSpanEvent()

See references/test-structure.md for complete test file templates.
VCR records real API calls and replays them in tests for deterministic testing without external dependencies.
Purpose:
How it works:
- http://127.0.0.1:9126/vcr/{provider}

Cassette location: test/llmobs/plugins/{integration}/cassettes/
When to use VCR:
- Use VCR: LlmObsCategory.LLM_CLIENT (Direct API wrappers)
- Use VCR: LlmObsCategory.MULTI_PROVIDER (Multi-provider frameworks)
- Do not use VCR: LlmObsCategory.ORCHESTRATION (Pure functions, no API calls)
- Do not use VCR: LlmObsCategory.INFRASTRUCTURE (Mock servers instead)

See references/vcr-cassettes.md for recording process and troubleshooting.
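The VCR URL and cassette location above follow fixed patterns, so they can be derived from the provider and integration names. The sketch below assumes only those two patterns; `vcrBaseUrl` and `cassetteDir` are hypothetical helper names, and `'openai'` in the usage note is just an example value. How a given client is pointed at the resulting URL depends on the library and is not shown.

```javascript
const path = require('node:path')

// Address taken from this document.
const VCR_ORIGIN = 'http://127.0.0.1:9126'

// Hypothetical helper: builds the per-provider VCR URL.
function vcrBaseUrl (provider) {
  return `${VCR_ORIGIN}/vcr/${provider}`
}

// Hypothetical helper: builds the cassette directory for an integration,
// using POSIX separators so the result matches the documented layout.
function cassetteDir (integration) {
  return path.posix.join('test', 'llmobs', 'plugins', integration, 'cassettes') + '/'
}
```

For example, `vcrBaseUrl('openai')` yields `http://127.0.0.1:9126/vcr/openai`, and `cassetteDir('openai')` yields `test/llmobs/plugins/openai/cassettes/`.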
The category-determination block at the top maps category to strategy. Non-obvious bits per category:
- VCR categories: http://127.0.0.1:9126/vcr/{provider}. Span kind: 'llm'. Cassettes are recorded once with real API keys; CI replays them.
- ORCHESTRATION: span kind 'workflow' or 'agent', never 'llm'. No VCR and no real API calls, because the orchestrator itself does not make HTTP calls; it coordinates libraries that do. Mock LLM responses as plain return values from the node so the test exercises the workflow execution, not the provider API.

See references/category-strategies.md for per-category patterns.
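To illustrate what "plain return values from the node" can look like, here is a minimal sketch with a toy workflow runner. Everything in it is hypothetical: `runWorkflow`, the node functions, and the state shape stand in for whatever orchestration library is under test, and no real framework API is implied.

```javascript
// Toy orchestrator: runs nodes in order, merging each node's return value
// into the shared state. Stands in for the library under test.
function runWorkflow (nodes, initialState) {
  let state = initialState
  for (const node of nodes) {
    state = { ...state, ...node(state) }
  }
  return state
}

// The "LLM" node returns a canned value instead of calling a provider,
// so no HTTP request is made and no cassette is needed.
const fakeLlmNode = (state) => ({ answer: `echo: ${state.question}` })

// A second node that consumes the mocked LLM output.
const formatNode = (state) => ({ formatted: state.answer.toUpperCase() })

const result = runWorkflow([fakeLlmNode, formatNode], { question: 'hi' })
```

Here `result.formatted` is `'ECHO: HI'`. The test exercises how nodes are sequenced and how state flows between them, which is the surface an orchestration integration instruments.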
assertLlmObsSpanEvent(actual, expected)
Validates span structure with flexible matchers for non-deterministic values.
Available matchers:
- MOCK_STRING - Matches any non-empty string (use for output text)
- MOCK_NOT_NULLISH - Matches any value that is not null or undefined (use for token counts)
- MOCK_NUMBER - Matches any number
- MOCK_OBJECT - Matches any object (use for errors)

Assertable fields:
- spanKind (required) - Span type from LlmObsSpanKind enum
- name - Operation name
- modelName - Model identifier (for LLM spans)
- modelProvider - Provider name (for LLM spans)
- inputMessages - Input messages in [{content, role}] format
- outputMessages - Output messages in [{content, role}] format
- metrics - Token usage (input_tokens, output_tokens, total_tokens)
- metadata - Model parameters (temperature, max_tokens, etc.)
- error - Error object (if operation failed)

Partial validation: Only specified fields are checked, others ignored.
See references/assertion-helpers.md for complete API and patterns.
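The combination of matchers and partial validation can be sketched as a small recursive comparison. This is an illustrative stand-in, not the implementation of assertLlmObsSpanEvent: the `matches` function, the Symbol-based sentinels, and the sample span object are all assumptions, and only three of the matchers are modelled.

```javascript
// Stand-in sentinels. The real matchers are exported by the test utilities;
// their actual representation is not specified in this document.
const MOCK_STRING = Symbol('MOCK_STRING')
const MOCK_NUMBER = Symbol('MOCK_NUMBER')
const MOCK_OBJECT = Symbol('MOCK_OBJECT')

// Returns true when `actual` satisfies `expected`.
function matches (actual, expected) {
  if (expected === MOCK_STRING) return typeof actual === 'string' && actual.length > 0
  if (expected === MOCK_NUMBER) return typeof actual === 'number'
  if (expected === MOCK_OBJECT) return typeof actual === 'object' && actual !== null
  if (Array.isArray(expected)) {
    return Array.isArray(actual) &&
      actual.length === expected.length &&
      expected.every((item, i) => matches(actual[i], item))
  }
  if (typeof expected === 'object' && expected !== null) {
    if (typeof actual !== 'object' || actual === null) return false
    // Partial validation: only keys present in `expected` are compared.
    return Object.keys(expected).every((key) => matches(actual[key], expected[key]))
  }
  return actual === expected
}

// Hypothetical span data shaped like the assertable fields listed above.
const actualSpan = {
  spanKind: 'llm',
  name: 'example.operation',
  outputMessages: [{ content: 'Hello there', role: 'assistant' }],
  metrics: { input_tokens: 12, output_tokens: 5, total_tokens: 17 }
}

const ok = matches(actualSpan, {
  spanKind: 'llm',
  outputMessages: [{ content: MOCK_STRING, role: 'assistant' }],
  metrics: { input_tokens: MOCK_NUMBER, output_tokens: MOCK_NUMBER }
})
```

In this sketch `ok` is `true` even though `name` and `total_tokens` are never mentioned in the expectation, while `matches(actualSpan, { spanKind: 'workflow' })` is `false` because a specified field differs.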
Location: test/llmobs/plugins/{integration}/index.spec.js
Structure:
- Import helpers from '../../util'
- beforeEach() for fresh state
- Grouped describe blocks (e.g. describe('chat completions', ...))

Standard imports:
useLlmObs, assertLlmObsSpanEvent, MOCK_STRING, MOCK_NOT_NULLISH, MOCK_NUMBER, MOCK_OBJECT
See references/test-structure.md for complete template.
Test all instrumented methods with:
- Messages ({content, role} structure)

Match span kind to operation type using LlmObsSpanKind enum:
- 'llm'
- 'workflow'
- 'agent'
- 'tool'
- 'embedding'
- 'retrieval'

On errors, validate:
- [{content: '', role: ''}]
- error: MOCK_OBJECT

For detailed information, see: