Message Extraction Patterns

.agents/skills/llmobs-integration/references/message-extraction.md

Overview

Every LLM provider uses a different message format. Before implementing message extraction, you must read the provider's actual source code and existing plugin implementation to understand its specific format.

All plugins must normalize messages to the standard LLMObs format: [{ content: string, role: string }]

Common roles: 'user', 'assistant', 'system', 'tool'

What Varies Per Provider

Input formats differ in:

  • Field name for the messages array (messages, contents, prompt, etc.)
  • Whether content is a plain string or an array of typed parts
  • Role naming conventions (e.g., 'model' vs 'assistant')

Output formats differ in:

  • Response structure (choices[0].message, content[0].text, candidates[0].content.parts, etc.)
  • Token usage field names (prompt_tokens/completion_tokens vs input_tokens/output_tokens)
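
The output differences above can be handled with a small set of shape checks. The following is a minimal sketch, not the actual plugin code: `extractOutputMessage` and `extractTokenUsage` are hypothetical helper names, the returned key names (`inputTokens`, `outputTokens`, `totalTokens`) are illustrative, and joining every text block of a content array (rather than reading only `content[0].text`) is an assumption.

```javascript
'use strict'

// Hypothetical helper: reads one output message from three response shapes.
function extractOutputMessage (response) {
  // choices[0].message (OpenAI-style)
  const choiceMessage = response?.choices?.[0]?.message
  if (choiceMessage) {
    return {
      content: choiceMessage.content || '',
      role: choiceMessage.role || 'assistant'
    }
  }

  // content[0].text (Anthropic-style): content is an array of typed blocks
  if (Array.isArray(response?.content)) {
    const text = response.content
      .filter(block => block?.type === 'text')
      .map(block => block.text || '')
      .join('')
    return { content: text, role: response.role || 'assistant' }
  }

  // candidates[0].content.parts (Google-style)
  const candidateContent = response?.candidates?.[0]?.content
  if (candidateContent) {
    const text = (candidateContent.parts || [])
      .map(part => part?.text || '')
      .join('')
    const role = candidateContent.role === 'model'
      ? 'assistant'
      : (candidateContent.role || 'assistant')
    return { content: text, role }
  }

  // Unknown shape: fall back to an empty message instead of throwing
  return { content: '', role: '' }
}

// Hypothetical helper: maps both token naming schemes onto one object.
function extractTokenUsage (usage) {
  const inputTokens = usage?.prompt_tokens ?? usage?.input_tokens ?? 0
  const outputTokens = usage?.completion_tokens ?? usage?.output_tokens ?? 0
  const totalTokens = usage?.total_tokens ?? (inputTokens + outputTokens)
  return { inputTokens, outputTokens, totalTokens }
}
```

Checking shapes in a fixed order keeps the sketch short, but a real plugin usually knows which provider it wraps and only needs one branch.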

Common variations include:

  • Simple array — messages are already [{role, content}] (e.g. OpenAI)
  • Nested content blocks — content is an array of typed objects (e.g. Anthropic [{type: 'text', text: '...'}])
  • Parts format — messages use a parts array inside a contents array (e.g. Google GenAI)
  • Role normalization — provider uses different role names that must be mapped (e.g. Google's 'model' → 'assistant')
  • Streaming — content arrives as deltas that must be accumulated across chunks
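
To make the input variations above concrete, here is a minimal sketch that normalizes the first three shapes into [{ content, role }]. The function names (`normalizeRole`, `contentToString`, `normalizeInputMessages`) are hypothetical, and the assumption that the array lives under either `messages` or `contents` only covers the examples named in this document.

```javascript
'use strict'

// Map provider-specific role names onto the standard ones.
function normalizeRole (role) {
  if (role === 'model') return 'assistant'
  return role || ''
}

// Accept a plain string or an array of typed parts and return one string.
function contentToString (content) {
  if (typeof content === 'string') return content
  if (!Array.isArray(content)) return ''
  return content
    .map(part => (typeof part === 'string' ? part : part?.text || ''))
    .join('')
}

// Hypothetical helper: `input` is the argument object passed to the SDK call.
function normalizeInputMessages (input) {
  // The field name for the messages array varies per provider
  const raw = input?.messages || input?.contents || []
  if (!Array.isArray(raw)) return []

  return raw.map(message => ({
    // Simple array and nested blocks use `content`; parts format uses `parts`
    content: contentToString(message?.content ?? message?.parts),
    role: normalizeRole(message?.role)
  }))
}
```

For example, `normalizeInputMessages({ contents: [{ role: 'model', parts: [{ text: 'x' }] }] })` yields a single message with content 'x' and role 'assistant'.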

How to Research a New Provider

  1. Read the existing tracing plugin for the package (packages/datadog-plugin-<name>/src/index.js) to understand what arguments and results look like
  2. Look at the provider's SDK source or API docs to understand response shapes
  3. Check an existing LLMObs plugin for a similar provider as a reference

Reference Implementations

The best examples of message extraction for the providers we support:

Key Implementation Notes

  • Always handle null/undefined with fallback defaults (|| '' and || [])
  • Normalize 'model' role to 'assistant' for consistency (preserve 'system', 'tool', 'function')
  • For array content parts (Anthropic, Google), join text parts with ''
  • For streaming, accumulate delta content across chunks before tagging
  • Always return [{ content: '', role: '' }] on error (never omit output messages)
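
The streaming and error-handling notes can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: the chunk shape (`chunk.choices[0].delta`) follows OpenAI-style streaming, the default role of 'assistant' is an assumption, and `accumulateStreamedMessage` and `safeOutputMessages` are hypothetical names.

```javascript
'use strict'

// Accumulate delta content across chunks into one message.
// Assumes OpenAI-style chunks: chunk.choices[0].delta.{ role, content }
function accumulateStreamedMessage (chunks) {
  let content = ''
  let role = ''

  for (const chunk of chunks || []) {
    const delta = chunk?.choices?.[0]?.delta
    if (!delta) continue
    if (delta.role) role = delta.role
    if (typeof delta.content === 'string') content += delta.content
  }

  // Assumption: a streamed completion with no explicit role is the assistant
  return { content, role: role || 'assistant' }
}

// Wrap any extractor so a failure still produces output messages.
function safeOutputMessages (extract, response) {
  try {
    const messages = extract(response)
    if (Array.isArray(messages) && messages.length > 0) return messages
    return [{ content: '', role: '' }]
  } catch {
    // Never omit output messages, even when extraction throws
    return [{ content: '', role: '' }]
  }
}
```

Accumulating first and tagging once at the end avoids emitting partial messages if the stream is interrupted mid-response.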