.agents/skills/apm-integrations/SKILL.md

# APM Integrations

dd-trace-js provides automatic tracing for 100+ third-party libraries. Each integration consists of two decoupled layers communicating via Node.js diagnostic channels.

## Architecture

```text
┌──────────────────────────┐     diagnostic channels      ┌─────────────────────────┐
│     Instrumentation      │ ──────────────────────────▶  │        Plugin           │
│ datadog-instrumentations │    apm:<name>:<op>:start     │  datadog-plugin-<name>  │
│                          │    apm:<name>:<op>:finish    │                         │
│ Hooks into library       │    apm:<name>:<op>:error     │ Creates spans, sets     │
│ methods, emits events    │                              │ tags, handles errors    │
└──────────────────────────┘                              └─────────────────────────┘
```

**Instrumentation** (`packages/datadog-instrumentations/src/`): Hooks into a library's internals and publishes events with context data to named diagnostic channels. It has no knowledge of tracing; it only emits events.

**Plugin** (`packages/datadog-plugin-<name>/src/`): Subscribes to diagnostic channel events and creates APM spans with service name, resource, tags, and error metadata. It extends a base class that provides lifecycle management.

Both layers are always needed for a new integration.
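
The two layers can be sketched with nothing but Node's built-in `node:diagnostics_channel` module (Node.js 20+ assumed for `runStores`). The library `mylib`, its `query` method, and the plain-object "span" are hypothetical; real plugins get span creation from their base class. The point is the split: the instrumentation only builds and publishes `ctx`, and the plugin only subscribes.

```javascript
'use strict'

// Two-layer sketch with Node's built-in diagnostics_channel.
// "mylib", "query", and the plain-object span are hypothetical.
const dc = require('node:diagnostics_channel')

// ---- Instrumentation layer: publishes ctx, knows nothing about tracing ----
const startCh = dc.channel('apm:mylib:query:start')
const finishCh = dc.channel('apm:mylib:query:finish')
const errorCh = dc.channel('apm:mylib:query:error')

// Stand-in for a third-party library.
const mylib = {
  query (sql) {
    if (sql === '') throw new Error('empty query')
    return [{ id: 1 }]
  }
}

const originalQuery = mylib.query
mylib.query = function (...args) {
  // Fast path: no plugin is listening, so skip building ctx entirely.
  if (!startCh.hasSubscribers) return originalQuery.apply(this, args)

  const ctx = { sql: args[0] }
  // runStores() for the start event; publish() for finish/error.
  return startCh.runStores(ctx, () => {
    try {
      ctx.result = originalQuery.apply(this, args)
      return ctx.result
    } catch (err) {
      ctx.error = err
      errorCh.publish(ctx)
      throw err
    } finally {
      finishCh.publish(ctx)
    }
  })
}

// ---- Plugin layer: turns events into (fake) spans ----
const spans = []
dc.subscribe('apm:mylib:query:start', (ctx) => {
  ctx.span = { name: 'mylib.query', resource: ctx.sql, error: false }
})
dc.subscribe('apm:mylib:query:error', (ctx) => {
  ctx.span.error = true
})
dc.subscribe('apm:mylib:query:finish', (ctx) => {
  spans.push(ctx.span)
})

mylib.query('SELECT 1')
try { mylib.query('') } catch {}
console.log(spans)
```

Without the three `dc.subscribe` calls, `startCh.hasSubscribers` stays false and every call takes the fast path without allocating a `ctx`.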

## Instrumentation: Orchestrion First

Orchestrion is the required default for all new instrumentations. It is an AST rewriter that wraps methods automatically based on a JSON configuration, with correct CJS and ESM handling built in. It handles ESM far more reliably than traditional shimmer-based wrapping, which struggles with ESM's static module structure: an ES module's exports are read-only bindings for importers, so they cannot simply be reassigned after the fact.

Config lives in `packages/datadog-instrumentations/src/helpers/rewriter/instrumentations/<name>.js`. See Orchestrion Reference for the full config format and examples.

### When Shimmer Is Necessary Instead

Use shimmer (`addHook` + `shimmer.wrap`) only when orchestrion cannot handle the pattern. When using shimmer, always include a code comment explaining why orchestrion is not viable. Valid reasons:

- **Dynamic method interception**: methods created at runtime or on prototype chains that orchestrion's static analysis cannot reach
- **Factory patterns**: wrapping the return values of factory functions
- **Argument modification**: instrumentations that need to mutate arguments before the original call

If none of these apply, use orchestrion. For shimmer patterns, refer to existing shimmer-based instrumentations in the codebase (e.g., `packages/datadog-instrumentations/src/pg.js`). Always start a new integration by trying orchestrion first.
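
A minimal sketch of the factory-pattern case, including the required justification comment. `wrap` here is a simplified stand-in for `shimmer.wrap` (not its real implementation), `mylib` is hypothetical, and the instrumentation uses Node's built-in `tracingChannel` (dd-trace gets it from `dc-polyfill`), so Node.js 20+ is assumed.

```javascript
'use strict'

const dc = require('node:diagnostics_channel')

const queryCh = dc.tracingChannel('apm:mylib:query')

// Simplified stand-in for shimmer.wrap: swap obj[name] for wrapper(original).
function wrap (obj, name, wrapper) {
  const original = obj[name]
  obj[name] = wrapper(original)
  return original
}

// Stand-in library: query() is attached to each client inside the factory.
const mylib = {
  createClient (host) {
    return { host, query: (sql) => `rows for ${sql} on ${host}` }
  }
}

// Shimmer instead of orchestrion: query() is created at runtime inside
// createClient(), so there is no static class method for the AST rewriter
// to target (factory pattern). We wrap the factory's return value instead.
wrap(mylib, 'createClient', (createClient) => function (...args) {
  const client = createClient.apply(this, args)
  wrap(client, 'query', (query) => function (sql) {
    // Fast path: no plugin listening.
    if (!queryCh.start.hasSubscribers) return query.apply(this, arguments)
    return queryCh.traceSync(query, { sql, host: client.host }, this, sql)
  })
  return client
})

// Plugin-side stand-in: record finished calls.
const finished = []
dc.subscribe('tracing:apm:mylib:query:start', () => {})
dc.subscribe('tracing:apm:mylib:query:end', (ctx) => {
  finished.push({ sql: ctx.sql, result: ctx.result })
})

mylib.createClient('db.local').query('SELECT 1')
console.log(finished)
// [ { sql: 'SELECT 1', result: 'rows for SELECT 1 on db.local' } ]
```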

## Plugin Base Classes

Plugins extend a base class matching the library type. The base class provides automatic channel subscriptions, span lifecycle, and type-specific tags.

```text
Plugin
├── CompositePlugin              — Multiple sub-plugins (produce + consume)
├── LogPlugin                    — Log correlation injection (no spans)
├── WebPlugin                    — Base web plugin
│   └── RouterPlugin             — Web frameworks with middleware
└── TracingPlugin                — Base for all span-creating plugins
    ├── InboundPlugin            — Inbound calls
    │   ├── ServerPlugin         — HTTP servers
    │   └── ConsumerPlugin       — Message consumers (DSM)
    └── OutboundPlugin           — Outbound calls
        ├── ProducerPlugin       — Message producers (DSM)
        └── ClientPlugin         — HTTP/RPC clients
            └── StoragePlugin    — Storage systems
                ├── DatabasePlugin   — Database clients (DBM, db.* tags)
                └── CachePlugin      — Key-value caches
```

Choosing the wrong base class leads to complex workarounds. Always match the library type to the base class.

## Read Upstream Source First

Whenever you touch `packages/datadog-instrumentations/src/<lib>.js`, its plugin counterpart, or any orchestrion config, for any reason, read the upstream library's source first. What you remember about an SDK's contract drifts faster than the SDK itself, comments in the wrap go stale with every minor version, and cross-version diffs surface contract changes that guessing misses (a switch from lazy to eager attachment, mode-exclusive APIs, new error paths).

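Two ways to fetch the source locally:

1. Shallow clone the installed version:

   ```bash
   # Tag naming varies by repo (v1.2.3, 1.2.3, <pkg>@1.2.3); adjust --branch to match.
   git clone --depth 1 --branch v<x.y.z> https://github.com/<org>/<repo>.git /tmp/<lib>-versions/v<x.y.z>
   ```

2. Use `npm pack` when the published runtime artifact is what matters:

   ```bash
   mkdir -p /tmp/<lib>-versions/v<x.y.z>
   cd /tmp/<lib>-versions && npm pack <lib>@<x.y.z>
   # tar -C needs an existing directory; scoped packages pack to <scope>-<name>-<x.y.z>.tgz
   tar -xzf <lib>-<x.y.z>.tgz -C v<x.y.z> --strip-components=1
   ```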

Read the file the wrap hooks, the base classes the hooked methods inherit from, and the files the wrap doesn't currently touch. A public method, an internal channel, or a metadata field the current instrumentation skipped often gives a cleaner hook (e.g., kafka's `cluster.brokerPool.metadata.clusterId`, couchbase's `tracingChannel`).

## Key Concepts

### The `ctx` Object

Context flows from instrumentation to plugin:

- **Orchestrion**: automatically provides `ctx.arguments` (method args) and `ctx.self` (the instance)
- **Shimmer**: the instrumentation sets named properties (`ctx.sql`, `ctx.client`, etc.)
- **Plugin**: sets `ctx.currentStore` (span) and `ctx.parentStore` (parent span)
- **On completion**: `ctx.result` or `ctx.error`
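
The sketch below follows one `ctx` object through a traced call. The wrapper imitates the orchestrion shape (`ctx.arguments`, `ctx.self`), while Node's `tracingChannel.traceSync` fills in `ctx.result` or `ctx.error` before the end event fires. `Client`, the channel name, and the `{ span }` store are hypothetical, and the plugin side is a bare subscriber rather than a real base class (Node.js 20+ assumed).

```javascript
'use strict'

const dc = require('node:diagnostics_channel')

const name = 'orchestrion:mylib:Client_query'
const ch = dc.tracingChannel(name)

class Client {
  constructor (host) { this.host = host }
  query (sql) {
    if (!sql) throw new Error('empty query')
    return `rows for ${sql}`
  }
}

// Instrumentation side: orchestrion-style ctx carrying arguments and self.
const originalQuery = Client.prototype.query
Client.prototype.query = function (...args) {
  return ch.traceSync(originalQuery, { arguments: args, self: this }, this, ...args)
}

// Plugin side: set ctx.currentStore at start, read result/error at end.
// (A real plugin also records ctx.parentStore; omitted here.)
const events = []
dc.subscribe(`tracing:${name}:start`, (ctx) => {
  ctx.currentStore = { span: { resource: ctx.arguments[0], host: ctx.self.host } }
  events.push(['start', Object.keys(ctx)])
})
dc.subscribe(`tracing:${name}:end`, (ctx) => {
  events.push(['end', ctx.error ? ctx.error.message : ctx.result])
})

const client = new Client('db.local')
client.query('SELECT 1')
try { client.query('') } catch {}
console.log(events)
```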

### Channel Event Lifecycle

- `runStores()` for start events: establishes async context (always)
- `publish()` for finish/error events: notification only
- `hasSubscribers` guard: skip instrumentation when no plugin listens (performance fast path)
- When shimmer is necessary, prefer `tracingChannel` (from `dc-polyfill`) over manual channels; it provides `start`/`end`/`asyncStart`/`asyncEnd`/`error` events automatically
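
The difference between `runStores()` and `publish()` shows up as soon as the library's callback runs on a later tick. In this sketch (Node.js 20+, hypothetical `mylib` names), `bindStore` stands in for the plugin creating a span from `ctx`; only the `runStores()` path makes that store visible inside the callback.

```javascript
'use strict'

const dc = require('node:diagnostics_channel')
const { AsyncLocalStorage } = require('node:async_hooks')

const storage = new AsyncLocalStorage()
const startCh = dc.channel('apm:mylib:query:start')
const finishCh = dc.channel('apm:mylib:query:finish')

// Plugin side: the transform builds the store (the "span") from ctx.
startCh.bindStore(storage, (ctx) => ({ span: `mylib.query ${ctx.sql}` }))
let starts = 0
dc.subscribe('apm:mylib:query:start', () => { starts++ })

// Stand-in library call that completes on a later tick.
function query (sql, callback) {
  setImmediate(() => callback(null, `rows for ${sql}`))
}

function instrumented (useRunStores, sql, callback) {
  // Fast path: nothing bound or subscribed, so skip instrumentation.
  if (!startCh.hasSubscribers) return query(sql, callback)

  const ctx = { sql }
  const run = () => query(sql, (err, rows) => {
    ctx.result = rows
    finishCh.publish(ctx) // notification only: does not change the store
    callback(err, rows)
  })

  if (useRunStores) return startCh.runStores(ctx, run)
  startCh.publish(ctx) // subscribers hear it, but no store is entered
  return run()
}

// Resolves with the store visible inside each library callback.
function storesSeenInCallbacks () {
  const seen = (useRunStores, sql) => new Promise((resolve) => {
    instrumented(useRunStores, sql, () => resolve(storage.getStore()))
  })
  return Promise.all([seen(true, 'SELECT 1'), seen(false, 'SELECT 2')])
}

storesSeenInCallbacks().then(([withRunStores, withPublish]) => {
  console.log(withRunStores) // { span: 'mylib.query SELECT 1' }
  console.log(withPublish)   // undefined
})
```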

### Channel Prefix Patterns

- **Orchestrion**: `tracing:orchestrion:<npm-package>:<channelName>` (set via `static prefix`)
- **Shimmer + `tracingChannel` (preferred)**: `tracing:apm:<name>:<operation>` (set via `static prefix`)
- **Shimmer + manual channels (legacy)**: `apm:{id}:{operation}` (the default; no `static prefix` needed)
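
To see how these prefixes map onto concrete names, the sketch subscribes to the expected names and fires each style once. It relies on Node's `tracingChannel(name)` publishing on `tracing:<name>:<event>`; the orchestrion line assumes the rewritten code uses a tracing channel named `orchestrion:<npm-package>:<channelName>`, which the `tracing:` prefix suggests. `mylib`, `query`, and `Client_query` are hypothetical.

```javascript
'use strict'

const dc = require('node:diagnostics_channel')

const heard = []
const listen = (name) => dc.subscribe(name, () => heard.push(name))

// Shimmer + tracingChannel: tracingChannel() adds "tracing:" and the event
// name, so 'apm:mylib:query' yields 'tracing:apm:mylib:query:start', etc.
listen('tracing:apm:mylib:query:start')
listen('tracing:apm:mylib:query:end')
dc.tracingChannel('apm:mylib:query').traceSync(() => 'ok', {})

// Orchestrion (assumed): same shape, namespaced by package and channelName.
listen('tracing:orchestrion:mylib:Client_query:start')
dc.tracingChannel('orchestrion:mylib:Client_query').traceSync(() => 'ok', {})

// Legacy manual channels: plain names, one channel per event.
listen('apm:mylib:query:start')
dc.channel('apm:mylib:query:start').publish({})

console.log(heard)
```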

### `bindStart` / `bindFinish`

These are the primary plugin methods. Base classes handle most of the lifecycle; often only `bindStart` is needed to create the span and set tags.
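
A toy model of this layering, NOT dd-trace's real classes: `ToyTracingPlugin` and `ToyDatabasePlugin` are hypothetical stand-ins that own the channel subscriptions and span bookkeeping, so the concrete `MyLibPlugin` only implements `bindStart`. The real base classes also bind the store returned by `bindStart` into async context and finish real spans; the toy ignores that return value and collects plain objects. Tag names are illustrative (Node.js 20+ assumed).

```javascript
'use strict'

const dc = require('node:diagnostics_channel')

const finishedSpans = []

// Hypothetical stand-in for TracingPlugin: owns subscriptions and lifecycle.
class ToyTracingPlugin {
  static prefix = ''

  constructor () {
    const { prefix } = this.constructor
    dc.subscribe(`${prefix}:start`, (ctx) => this.bindStart(ctx))
    dc.subscribe(`${prefix}:error`, (ctx) => this.bindError(ctx))
    dc.subscribe(`${prefix}:end`, (ctx) => this.bindFinish(ctx))
  }

  startSpan (name, tags, ctx) {
    ctx.currentStore = { span: { name, tags, error: null } }
    return ctx.currentStore
  }

  bindStart (ctx) {
    return this.startSpan('unnamed', {}, ctx)
  }

  bindError (ctx) {
    ctx.currentStore.span.error = ctx.error.message
  }

  bindFinish (ctx) {
    finishedSpans.push(ctx.currentStore.span)
  }
}

// Hypothetical stand-in for DatabasePlugin: adds type-specific tags.
class ToyDatabasePlugin extends ToyTracingPlugin {
  startSpan (name, tags, ctx) {
    const dbTags = { 'span.kind': 'client', 'db.system': this.constructor.system }
    return super.startSpan(name, { ...dbTags, ...tags }, ctx)
  }
}

// Concrete integration: only bindStart is written by hand.
class MyLibPlugin extends ToyDatabasePlugin {
  static prefix = 'tracing:apm:mylib:query'
  static system = 'mylib'

  bindStart (ctx) {
    return this.startSpan('mylib.query', { 'resource.name': ctx.sql }, ctx)
  }
}

new MyLibPlugin() // subscribes on construction

// Instrumentation side: two traced calls, one failing.
const ch = dc.tracingChannel('apm:mylib:query')
ch.traceSync((sql) => `rows for ${sql}`, { sql: 'SELECT 1' }, undefined, 'SELECT 1')
try {
  ch.traceSync(() => { throw new Error('boom') }, { sql: 'SELECT 2' })
} catch {}

console.log(finishedSpans)
```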

### Subscriber Cardinality (`channel.publish` position)

When relocating a `channel.publish` call behind a dedupe gate, depth filter, cache-hit return, or any other short-circuit, the question is not "is the publish still there?" but "what cardinality does each downstream subscriber need?". Subscribers fall into two camps that look identical from inside the publish site:

- **Once per first occurrence**: tracing plugins that dedupe spans, distinct-path metrics. Safe behind a dedupe gate.
- **Once per call**: IAST taint tracking (mutates each call's args object by reference), AppSec WAF subscribers that block or log per invocation, anything that walks payload identity. These silently drop data when the publish fires less than once per call.

Before adding or moving a gate in front of a publish, grep the repo for the channel name, list its subscribers, and decide for each one whether the new position preserves the cardinality it needs. When cardinalities diverge, split the publish into a pre-gate (per-call) publish and a post-gate (per-first-occurrence) publish.
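
A sketch of that split, with hypothetical channel names and a `Set` as the dedupe gate: the per-call channel is published before the gate and the per-first-occurrence channel after it, so each subscriber receives the cardinality it needs.

```javascript
'use strict'

const dc = require('node:diagnostics_channel')

const perCallCh = dc.channel('mylib:route:call')        // e.g. IAST / AppSec
const firstSeenCh = dc.channel('mylib:route:firstSeen') // e.g. span dedupe

const seenRoutes = new Set()

function handle (route, payload) {
  // Pre-gate: every invocation, so per-call subscribers see each payload.
  if (perCallCh.hasSubscribers) perCallCh.publish({ route, payload })

  // Dedupe gate: short-circuit repeats.
  if (seenRoutes.has(route)) return 'cached'
  seenRoutes.add(route)

  // Post-gate: only the first occurrence of each route.
  if (firstSeenCh.hasSubscribers) firstSeenCh.publish({ route })
  return 'computed'
}

const perCall = []
const firstSeen = []
dc.subscribe('mylib:route:call', (msg) => perCall.push(msg.payload))
dc.subscribe('mylib:route:firstSeen', (msg) => firstSeen.push(msg.route))

handle('/users', { id: 1 })
handle('/users', { id: 2 })
handle('/orders', { id: 3 })

console.log(perCall.length, firstSeen.length) // 3 2
```

Had the single original publish been moved after the gate, the per-call subscriber would have missed the `{ id: 2 }` payload without any error.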

## Reference Integrations

Always read 1-2 references of the same type before writing or modifying code.

| Library Type | Plugin | Instrumentation | Base Class |
|---|---|---|---|
| Database | `datadog-plugin-pg` | `src/pg.js` | `DatabasePlugin` |
| Cache | `datadog-plugin-redis` | `src/redis.js` | `CachePlugin` |
| HTTP client | `datadog-plugin-fetch` | `src/fetch.js` | `HttpClientPlugin` (extends `ClientPlugin`) |
| Web framework | `datadog-plugin-express` | `src/express.js` | `RouterPlugin` |
| Message queue | `datadog-plugin-kafkajs` | `src/kafkajs.js` | `ProducerPlugin` / `ConsumerPlugin` |
| Orchestrion | `datadog-plugin-langchain` | `rewriter/instrumentations/langchain.js` | `TracingPlugin` |

For the complete list by base class, see Reference Plugins.

## Debugging

- `DD_TRACE_DEBUG=true` to see channel activity
- Log `Object.keys(ctx)` in `bindStart` to inspect the available context
- Spans missing → verify the `hasSubscribers` guard; check that channel names match between layers
- Context lost → ensure `runStores()` (not `publish()`) is used for start events
- ESM fails but CJS works → check `esmFirst: true` in `hooks.js` (or switch to orchestrion)
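
When `DD_TRACE_DEBUG` output is too noisy, a throwaway probe on the suspected channel name can confirm that events fire and show what `ctx` carries at each stage. `watchTracingChannel` is a hypothetical helper built on `node:diagnostics_channel`, not part of dd-trace; point it at the name your plugin's prefix expects, without the leading `tracing:`.

```javascript
'use strict'

const dc = require('node:diagnostics_channel')

const EVENTS = ['start', 'end', 'asyncStart', 'asyncEnd', 'error']

// Hypothetical helper: log every tracingChannel event for `name` along with
// the ctx keys present at that moment. Returns an unsubscribe function.
function watchTracingChannel (name, log = console.log) {
  const handlers = EVENTS.map((event) => {
    const channelName = `tracing:${name}:${event}`
    const handler = (ctx) => log(channelName, Object.keys(ctx))
    dc.subscribe(channelName, handler)
    return [channelName, handler]
  })
  return () => handlers.forEach(([channelName, handler]) => dc.unsubscribe(channelName, handler))
}

// Usage: watch a hypothetical channel while exercising the library.
const lines = []
const stop = watchTracingChannel('apm:mylib:query', (...args) => lines.push(args))
dc.tracingChannel('apm:mylib:query').traceSync(() => 42, { sql: 'SELECT 1' })
stop()
console.log(lines)
```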

## Implementation Workflow

Follow these steps when creating or modifying an integration:

1. **Investigate**: Read the upstream library's source (see Read Upstream Source First). Read 1-2 reference integrations of the same type (see the table above). Understand the instrumentation and plugin patterns before writing code.
2. **Implement instrumentation**: Create the instrumentation in `packages/datadog-instrumentations/src/`. Use orchestrion unless one of the shimmer exceptions applies.
3. **Implement plugin**: Create the plugin in `packages/datadog-plugin-<name>/src/`. Extend the correct base class.
4. **Register**: Add entries in `packages/dd-trace/src/plugins/index.js`, `index.d.ts`, `docs/test.ts`, `docs/API.md`, and `.github/workflows/apm-integrations.yml`.
5. **Write tests**: Add unit tests and ESM integration tests. See Testing for templates.
6. **Run tests**: Validate with:

   ```bash
   # Run plugin tests (preferred CI command; handles yarn services automatically)
   PLUGINS="<name>" npm run test:plugins:ci

   # If the plugin needs external services (databases, message brokers, etc.),
   # check docker-compose.yml for available service names, then:
   docker compose up -d <service>
   PLUGINS="<name>" npm run test:plugins:ci
   ```

7. **Verify**: Confirm all tests pass before marking work as complete.

## Reference Files

- **[New Integration Guide](references/new-integration-guide.md)** — Step-by-step guide and checklist for creating a new integration end-to-end
- **[Orchestrion Reference](references/orchestrion.md)** — JSON config format, channel naming, function kinds, plugin subscription
- **[Plugin Patterns](references/plugin-patterns.md)** — `startSpan()` API, `ctx` object details, `CompositePlugin`, channel subscriptions, code style
- **[Testing](references/testing.md)** — Unit test and ESM integration test templates
- **[Reference Plugins](references/reference-plugins.md)** — All plugins organized by base class