.agents/skills/llmobs-testing/references/category-strategies.md
YOU CANNOT MIX STRATEGIES BETWEEN CATEGORIES.
Each category has FORBIDDEN and REQUIRED patterns. Violating these will cause test failure.

- **LLM_CLIENT**. FORBIDDEN: mock LLM responses. REQUIRED: VCR with real API calls.
- **MULTI_PROVIDER**. Same as LLM_CLIENT.
- **ORCHESTRATION**. FORBIDDEN: VCR, real API calls, `new Client()` classes (orchestration libraries don't have Client classes). REQUIRED: pure function tests with mock LLM responses.
- **INFRASTRUCTURE**. FORBIDDEN: VCR, real API calls. REQUIRED: mock servers.
Test strategy depends on package category:
| LlmObsCategory | VCR | Real APIs | Mock LLMs | Strategy |
|---|---|---|---|---|
| LLM_CLIENT | ✅ Yes | ✅ Yes | ❌ No | VCR with real API calls |
| MULTI_PROVIDER | ✅ Yes | ✅ Yes | ❌ No | VCR with real API calls |
| ORCHESTRATION | ❌ No | ❌ No | ✅ Yes | Pure functions, mock responses |
| INFRASTRUCTURE | ❌ No | ❌ No | ✅ Yes | Mock servers |
Enum location: `anubis_apm/workflows/analyze/models.py`
IF YOU USE THE WRONG STRATEGY, THE TEST WILL FAIL. ALWAYS CHECK THE CATEGORY FIRST.
**LLM_CLIENT / MULTI_PROVIDER.** Strategy: VCR with real API calls through the proxy.
```js
const client = new MyLLMClient({
  apiKey: 'test-key',
  baseURL: 'http://127.0.0.1:9126/vcr/provider' // VCR proxy
})

it('instruments chat completion', async () => {
  // Real API call (first run records, subsequent runs replay)
  const response = await client.chat.completions.create({
    messages: [{ role: 'user', content: 'Hello' }],
    model: 'gpt-4'
  })

  const events = getEvents()
  assertLlmObsSpanEvent(events[0], {
    spanKind: 'llm',
    modelName: 'gpt-4',
    inputMessages: [{ content: 'Hello', role: 'user' }],
    outputMessages: [{ content: MOCK_STRING, role: 'assistant' }],
    metrics: { input_tokens: MOCK_NOT_NULLISH }
  })
})
```
If the instrumented methods live in a sub-package that is a dependency of the package you load (e.g. `@openai/agents-openai` is a dependency of `@openai/agents-core`), you must require the sub-package first.
RITM patches modules on their first `require()`. If the parent package loads the sub-package transitively before your `before()` hook requires it, the module is already cached and RITM never fires; instrumentation silently does nothing.
```js
before(() => {
  // ✅ CORRECT: require the instrumented sub-package first
  const { OpenAIResponsesModel } = require('@openai/agents-openai')
  const agentsCore = require('@openai/agents-core')

  // ❌ WRONG: parent loads sub-package transitively, caching it before RITM patches it
  // const agentsCore = require('@openai/agents-core') // caches @openai/agents-openai
  // const { OpenAIResponsesModel } = require('@openai/agents-openai') // already cached, not patched
})
```
Symptom when the order is wrong: tests time out, `getEvents()` never resolves, no APM traces arrive, and only the SDK's own internal tracing output appears.
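A cheap way to catch the mistake early is to assert, before requiring anything, that the sub-package is not already in the require cache. This is a hypothetical sketch using only Node built-ins (`require.resolve` resolves the path without loading the module):

```js
const assert = require('assert')

before(() => {
  // Hypothetical guard: if the sub-package is already in the require cache,
  // RITM can no longer patch it, so fail fast instead of timing out later.
  const entry = require.resolve('@openai/agents-openai')
  assert(!require.cache[entry], '@openai/agents-openai was loaded before instrumentation')

  require('@openai/agents-openai')
  require('@openai/agents-core')
})
```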
**ORCHESTRATION.** Strategy: pure function tests, NO VCR, NO real API calls.
```js
// No VCR proxy - use the library directly
const { StateGraph, Annotation, START, END } = require('@langchain/langgraph')

it('instruments graph invoke', async () => {
  // Define graph state: a messages channel whose reducer appends updates
  const StateAnnotation = Annotation.Root({
    messages: Annotation({
      reducer: (x, y) => x.concat(y),
      default: () => []
    })
  })
  const graph = new StateGraph(StateAnnotation)

  // Add node with a mock LLM response (no real API call)
  graph.addNode('agent', async (state) => ({
    messages: [{ role: 'assistant', content: 'Mock LLM response' }]
  }))
  graph.addEdge(START, 'agent')
  graph.addEdge('agent', END)
  const compiled = graph.compile()

  // Invoke with mock data
  const result = await compiled.invoke({
    messages: [{ role: 'user', content: 'Test' }]
  })

  const events = getEvents()
  assertLlmObsSpanEvent(events[0], {
    spanKind: 'workflow', // Not 'llm'!
    name: 'langgraph.graph.invoke'
  })
})
```
Use `beforeEach()` hooks for fresh state between tests. Orchestration tools don't make HTTP calls themselves; they coordinate other libraries that do. Testing them means testing the orchestration logic, not API interactions.
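If several tests need mock nodes, a tiny factory keeps them consistent. This helper is hypothetical (not part of any library), but shows the pattern:

```js
// Hypothetical helper: builds a graph node that returns a canned assistant
// message, so orchestration tests exercise routing logic without any network.
function mockLlmNode (content) {
  return async (state) => ({
    messages: [{ role: 'assistant', content }]
  })
}

// usage: graph.addNode('agent', mockLlmNode('Mock LLM response'))
```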
**INFRASTRUCTURE.** Strategy: mock server tests.
```js
let mockServer

before(async () => {
  mockServer = new MockMCPServer()
  await mockServer.start()
})

it('instruments MCP protocol', async () => {
  const client = new MCPClient({
    serverUrl: mockServer.url
  })

  await client.connect()
  const response = await client.call('method', {})

  const events = getEvents()
  assertLlmObsSpanEvent(events[0], {
    spanKind: 'task', // Or appropriate kind
    name: 'mcp.call'
  })
})
```
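`MockMCPServer` stands in for whatever helper the test suite provides. If you need to hand-roll one, a minimal sketch using only Node's built-in `http` module might look like this (the canned JSON-RPC response is an assumption, not a real protocol implementation):

```js
const http = require('http')

// Hypothetical minimal mock MCP server: answers every JSON-RPC request with a
// canned result so tests never depend on a real backend.
class MockMCPServer {
  async start () {
    this.server = http.createServer((req, res) => {
      let body = ''
      req.on('data', (chunk) => { body += chunk })
      req.on('end', () => {
        const { id } = JSON.parse(body || '{}')
        res.setHeader('content-type', 'application/json')
        res.end(JSON.stringify({ jsonrpc: '2.0', id, result: { ok: true } }))
      })
    })
    await new Promise((resolve) => this.server.listen(0, '127.0.0.1', resolve))
    this.url = `http://127.0.0.1:${this.server.address().port}`
  }

  async stop () {
    await new Promise((resolve) => this.server.close(resolve))
  }
}
```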
Use this decision tree to choose a strategy:

1. Does the package make HTTP calls to an LLM provider itself?
   - YES → Use VCR (Category 1 or 2)
   - NO → Check the next question
2. Does it only orchestrate other libraries (pure coordination logic, no HTTP of its own)?
   - YES → Pure functions (Category 3)
   - NO → Mock servers (Category 4)
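The same decision, written as a lookup over the `LlmObsCategory` enum values (an illustrative helper, not part of the test harness):

```js
// Illustrative mapping from LlmObsCategory value to test strategy.
function chooseStrategy (category) {
  switch (category) {
    case 'LLM_CLIENT':
    case 'MULTI_PROVIDER':
      return 'vcr' // real API calls through the VCR proxy
    case 'ORCHESTRATION':
      return 'pure-functions' // mock LLM responses, no HTTP
    case 'INFRASTRUCTURE':
      return 'mock-server'
    default:
      throw new Error(`unknown LlmObsCategory: ${category}`)
  }
}
```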
Common mistakes:

```js
// ❌ WRONG - LangGraph with VCR
const client = new StateGraph({
  baseURL: 'http://127.0.0.1:9126/vcr/langgraph'
})
```

Why wrong: LangGraph doesn't make HTTP calls itself.
Fix: Use pure functions with mock responses.
```js
// ❌ WRONG - OpenAI without VCR
const client = new OpenAI({
  apiKey: 'real-key',
  baseURL: 'https://api.openai.com' // Direct to API
})
```

Why wrong: Tests will fail without an API key, hit rate limits, and be non-deterministic.
Fix: Use the VCR proxy URL.
```js
// ❌ WRONG - Real OpenAI in a LangGraph test
graph.addNode('agent', async (state) => {
  const openai = new OpenAI({ apiKey: 'real-key' })
  return await openai.chat.completions.create({ ... })
})
```

Why wrong: Orchestration tests should be pure functions.
Fix: Mock LLM responses directly.
Quick reference:

**LLM_CLIENT** (e.g. OpenAI):

```js
const openai = new OpenAI({
  apiKey: 'test',
  baseURL: 'http://127.0.0.1:9126/vcr/openai'
})
await openai.chat.completions.create({ ... })
```

**MULTI_PROVIDER** (e.g. Vercel AI SDK):

```js
const openai = createOpenAI({
  apiKey: 'test',
  baseURL: 'http://127.0.0.1:9126/vcr/openai'
})
// createOpenAI returns a provider; calling it with a model id yields the model
await generateText({ model: openai('gpt-4'), prompt: '...' })
```

**ORCHESTRATION** (e.g. LangGraph):

```js
graph.addNode('agent', async (state) => ({
  messages: [{ role: 'assistant', content: 'Mock' }]
}))
await graph.invoke({ ... })
```

**INFRASTRUCTURE** (e.g. MCP):

```js
const mockServer = new MockServer()
const client = new MCPClient({ url: mockServer.url })
await client.call('method', {})
```
Choose strategy based on what the package does, not what it's called.