BEFORE writing any test, you MUST determine the package category. The category determines EVERYTHING: whether to use VCR, pure-function tests, or mock servers, and which span kind to expect.

IF YOU USE THE WRONG CATEGORY STRATEGY, THE TEST WILL FAIL.

Categories are defined in the `LlmObsCategory` enum.
Quick check:

- `LLM_CLIENT` or `MULTI_PROVIDER`: use VCR
- `ORCHESTRATION`: NO VCR, pure functions
- `INFRASTRUCTURE`: mock servers

See references/category-strategies.md for FORBIDDEN vs REQUIRED patterns per category.

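The quick check is a lookup from category to strategy. The sketch below makes two assumptions:

- It uses a hypothetical, string-valued stand-in for `LlmObsCategory`. The real enum's values and location may differ.
- It uses a hypothetical `testStrategyFor` helper. This helper throws on an unknown category rather than guessing, because the wrong strategy makes the test fail.

```javascript
// Hypothetical stand-in for the LlmObsCategory enum; the real enum's
// values and module location may differ.
const LlmObsCategory = Object.freeze({
  LLM_CLIENT: 'llm_client',
  MULTI_PROVIDER: 'multi_provider',
  ORCHESTRATION: 'orchestration',
  INFRASTRUCTURE: 'infrastructure'
})

// Map each category to the test strategy described in this skill.
function testStrategyFor (category) {
  switch (category) {
    case LlmObsCategory.LLM_CLIENT:
    case LlmObsCategory.MULTI_PROVIDER:
      return 'vcr' // record real API calls once, replay them afterwards
    case LlmObsCategory.ORCHESTRATION:
      return 'pure-functions' // no VCR, no API calls
    case LlmObsCategory.INFRASTRUCTURE:
      return 'mock-server' // local server instead of VCR
    default:
      // Refuse to guess: the wrong strategy makes the test fail.
      throw new Error(`Unknown LlmObsCategory: ${category}`)
  }
}
```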
This skill helps you write comprehensive LLMObs tests that validate span events, messages, tokens, and metadata using category-appropriate strategies.
LLMObs tests use special helpers to validate span events.

Key components:

- `useLlmObs()` - Initializes the LLMObs test environment
- `getEvents()` - Retrieves captured span events
- `assertLlmObsSpanEvent()` - Validates span structure with flexible matchers

Basic test flow:

1. `useLlmObs({ plugin: 'name' })`
2. `getEvents()`
3. `assertLlmObsSpanEvent()`

See references/test-structure.md for complete test file templates.
VCR records real API calls and replays them in tests, so tests are deterministic and have no external dependencies.

How it works:

- Tests send API requests to the VCR proxy at `http://127.0.0.1:9126/vcr/{provider}`
- Cassette location: `test/llmobs/plugins/{integration}/cassettes/`
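The two VCR conventions above can be sketched as small helpers. Only the proxy origin and the directory layout come from this skill. The helper names `vcrBaseUrl` and `cassetteDir` are hypothetical.

```javascript
const path = require('node:path')

// Proxy origin from this skill; the helper names are hypothetical.
const VCR_PROXY_ORIGIN = 'http://127.0.0.1:9126'

// Base URL a provider's client would be pointed at during tests.
function vcrBaseUrl (provider) {
  return `${VCR_PROXY_ORIGIN}/vcr/${encodeURIComponent(provider)}`
}

// Directory holding an integration's cassettes (returned without a trailing slash).
function cassetteDir (integration) {
  return path.posix.join('test', 'llmobs', 'plugins', integration, 'cassettes')
}
```

Two details depend on the SDK and should be checked against the integration's existing tests:

- how the SDK is pointed at that URL (usually a base-URL client option);
- whether it expects extra path segments.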
Use VCR for:

- `LlmObsCategory.LLM_CLIENT` (direct API wrappers)
- `LlmObsCategory.MULTI_PROVIDER` (multi-provider frameworks)

Do NOT use VCR for:

- `LlmObsCategory.ORCHESTRATION` (pure functions, no API calls)
- `LlmObsCategory.INFRASTRUCTURE` (mock servers instead)

See references/vcr-cassettes.md for the recording process and troubleshooting.

Test strategy is determined by the LlmObsCategory enum.
LLM_CLIENT and MULTI_PROVIDER strategy: VCR with real API calls via proxy

Characteristics:

- Span kind: usually `'llm'` for chat completions

See references/category-strategies.md for detailed patterns.
ORCHESTRATION strategy: pure function tests, NO VCR, NO real API calls

Characteristics:

- Span kind: usually `'workflow'` or `'agent'`, NOT `'llm'`

Example concept:
See references/category-strategies.md for orchestration test patterns.
INFRASTRUCTURE strategy: mock server tests, NO VCR
See references/category-strategies.md for infrastructure test patterns.
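For INFRASTRUCTURE packages, a local mock server stands in for the real backend. This minimal sketch uses Node's built-in `http` module and the global `fetch` (Node 18+). The names `startMockServer` and `demo`, and the `{ status: 'ok' }` payload, are hypothetical. A real test would point the instrumented client at `url` and assert on the captured spans, not on the raw response.

```javascript
const http = require('node:http')

// Start a local server that answers every request with the same JSON body.
function startMockServer (body) {
  const server = http.createServer((req, res) => {
    res.writeHead(200, {
      'content-type': 'application/json',
      connection: 'close' // don't keep sockets open after the test
    })
    res.end(JSON.stringify(body))
  })
  return new Promise((resolve, reject) => {
    server.once('error', reject)
    // Port 0 lets the OS pick a free port, so parallel runs do not collide.
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address()
      resolve({ server, url: `http://127.0.0.1:${port}` })
    })
  })
}

// Example: call the mock server the way a client under test would.
async function demo () {
  const { server, url } = await startMockServer({ status: 'ok' })
  try {
    const res = await fetch(`${url}/any/path`)
    return await res.json()
  } finally {
    server.close()
  }
}
```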
`assertLlmObsSpanEvent(actual, expected)`

Validates span structure with flexible matchers for non-deterministic values.

Available matchers:

- `MOCK_STRING` - Matches any non-empty string (use for output text)
- `MOCK_NOT_NULLISH` - Matches any value that is not `null` or `undefined` (use for token counts)
- `MOCK_NUMBER` - Matches any number
- `MOCK_OBJECT` - Matches any object (use for errors)

Assertable fields:

- `spanKind` (required) - Span type from the `LlmObsSpanKind` enum
- `name` - Operation name
- `modelName` - Model identifier (for LLM spans)
- `modelProvider` - Provider name (for LLM spans)
- `inputMessages` - Input messages in `[{content, role}]` format
- `outputMessages` - Output messages in `[{content, role}]` format
- `metrics` - Token usage (`input_tokens`, `output_tokens`, `total_tokens`)
- `metadata` - Model parameters (`temperature`, `max_tokens`, etc.)
- `error` - Error object (if the operation failed)

Partial validation: only the specified fields are checked; all others are ignored.
See references/assertion-helpers.md for complete API and patterns.
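To make partial validation concrete, here is a self-contained sketch of how matcher sentinels can work. It is not the implementation of `assertLlmObsSpanEvent`:

- The `MOCK_*` symbols and the `matches` function are hypothetical stand-ins.
- `MOCK_NOT_NULLISH` is read as "not `null` or `undefined`", as its name suggests.

```javascript
// Hypothetical stand-ins for the MOCK_* matchers; the real helpers may differ.
const MOCK_STRING = Symbol('MOCK_STRING')
const MOCK_NOT_NULLISH = Symbol('MOCK_NOT_NULLISH')
const MOCK_NUMBER = Symbol('MOCK_NUMBER')
const MOCK_OBJECT = Symbol('MOCK_OBJECT')

// True when `actual` satisfies `expected`. Only keys present in `expected`
// are checked (partial validation); arrays must match element by element.
function matches (actual, expected) {
  if (expected === MOCK_STRING) return typeof actual === 'string' && actual.length > 0
  if (expected === MOCK_NOT_NULLISH) return actual !== null && actual !== undefined
  if (expected === MOCK_NUMBER) return typeof actual === 'number'
  if (expected === MOCK_OBJECT) return typeof actual === 'object' && actual !== null
  if (Array.isArray(expected)) {
    return Array.isArray(actual) &&
      actual.length === expected.length &&
      expected.every((item, i) => matches(actual[i], item))
  }
  if (typeof expected === 'object' && expected !== null) {
    return typeof actual === 'object' && actual !== null &&
      Object.keys(expected).every((key) => matches(actual[key], expected[key]))
  }
  return Object.is(actual, expected)
}

// Example: a made-up LLM span checked against a partial expectation.
const actualSpan = {
  spanKind: 'llm',
  modelName: 'example-model',
  outputMessages: [{ content: 'Hello!', role: 'assistant' }],
  metrics: { input_tokens: 5, output_tokens: 2, total_tokens: 7 },
  unrelatedField: 'ignored'
}
const expectedSpan = {
  spanKind: 'llm',
  outputMessages: [{ content: MOCK_STRING, role: 'assistant' }],
  metrics: { input_tokens: MOCK_NOT_NULLISH, output_tokens: MOCK_NOT_NULLISH, total_tokens: MOCK_NUMBER }
}
console.log(matches(actualSpan, expectedSpan)) // true
```

With these semantics, `MOCK_STRING` would not match the empty `content` of an errored span's output messages. Those cases need the literal `''`.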
Location: `test/llmobs/plugins/{integration}/index.spec.js`

Structure:

- Import test helpers from `'../../util'`
- Use `beforeEach()` for fresh state
- Group tests with `describe` blocks (e.g., `describe('chat completions', ...)`)

Standard imports:

- `useLlmObs`, `assertLlmObsSpanEvent`, `MOCK_STRING`, `MOCK_NOT_NULLISH`, `MOCK_NUMBER`, `MOCK_OBJECT`
See references/test-structure.md for complete template.
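Test all instrumented methods, including message validation in the `{content, role}` structure.

Match the span kind to the operation type using the `LlmObsSpanKind` enum:

- `'llm'`
- `'workflow'`
- `'agent'`
- `'tool'`
- `'embedding'`
- `'retrieval'`

On errors, validate:

- Output messages are empty: `[{content: '', role: ''}]`
- `error: MOCK_OBJECT`

For detailed information, see:

- references/category-strategies.md
- references/test-structure.md
- references/vcr-cassettes.md
- references/assertion-helpers.md
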
Always check `LlmObsCategory` to pick the test approach, and keep messages in the `{content, role}` structure.