litellm/integrations/otel/README.md
This package produces OpenTelemetry traces for LiteLLM. It is enabled by the
LITELLM_OTEL_V2 environment variable (is_otel_v2_enabled() in
config.py); when unset, nothing in this package runs.
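As a rough illustration, a gate like this usually comes down to one environment lookup. The sketch below is hypothetical: which spellings count as "enabled" is an assumption, and the real is_otel_v2_enabled() in config.py may parse the variable differently.

```python
import os
from typing import Mapping

# ASSUMPTION: the accepted "on" spellings are illustrative; the real
# is_otel_v2_enabled() in config.py may accept a different set.
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def is_otel_v2_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when LITELLM_OTEL_V2 holds a truthy value (sketch)."""
    # An unset variable reads as "", which is not truthy, so nothing runs.
    return env.get("LITELLM_OTEL_V2", "").strip().lower() in _TRUTHY
```

Passing a plain dict as env keeps the check testable without touching the process environment.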
A traced proxy request produces one trace with two kinds of spans:
SERVER span "POST /v1/chat/completions" ← FastAPI instrumentation
├── INTERNAL span "auth /v1/chat/completions" ← auth phase ┐
│ ├── CLIENT span "postgres get_key_object" ← datastore call │
│ └── CLIENT span "postgres get_team_membership" │
├── INTERNAL span "execute_guardrail …" ← guardrail │ this package
├── CLIENT span "chat gpt-4o" ← LLM call │
└── CLIENT span "batch_write_to_db …" ← spend write ┘
The gen-ai spans are siblings under the server span. In particular the guardrail span is a sibling of the LLM call, not a child of it: pre/during/post-call guardrail hooks are part of the request lifecycle (a pre-call guardrail runs before the LLM call even starts), so they belong directly under the server span, alongside the LLM call.
Request-level spans (LLM call, guardrail) parent to the server span via an
explicit anchor — context.set_request_root_span captures the server span
once at request entry, and resolve_request_span_context reads it — rather than
to whatever span is momentarily active. Ambient-only parenting was wrong at two
boundaries: inside the live auth phase span the active span is auth (so the
span would nest under auth), and a pass-through request closes its span from a
detached asyncio.create_task where the server span is no longer active (so the
span orphaned into its own trace). The anchor — a contextvar inherited by those
child tasks — gives a stable parent in both cases. DB/service spans keep ambient
parenting so an auth DB lookup still nests under auth.
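The anchor-versus-ambient distinction can be modelled with nothing but contextvars. In this sketch spans are plain strings, and _request_root and _active are hypothetical stand-ins for the real anchor and for OTel's current-span context; set_request_root_span and resolve_request_span_context borrow their names from context.py, but the bodies are assumptions, not that module's code. It reproduces both boundaries above: inside auth the ambient parent is auth while the anchor is still the server span, and a task created with asyncio.create_task inherits a copy of the anchor even when no span is active there.

```python
from __future__ import annotations

import asyncio
import contextvars
from typing import Dict, Optional

# Hypothetical stand-ins: spans are strings; _active plays OTel's current span.
_request_root: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "request_root", default=None
)
_active: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "active_span", default=None
)


def set_request_root_span(span: str) -> None:
    """Capture the server span once, at request entry (sketch)."""
    _request_root.set(span)


def resolve_request_span_context() -> Optional[str]:
    """Parent for request-level spans: the anchor, not the ambient span."""
    return _request_root.get()


def ambient_parent() -> Optional[str]:
    """Parent for DB/service spans: whatever span is active right now."""
    return _active.get()


async def handle_request() -> Dict[str, Optional[str]]:
    server = "POST /v1/chat/completions"
    _active.set(server)
    set_request_root_span(server)
    parents: Dict[str, Optional[str]] = {}

    # Inside the auth phase span, the ambient span is auth.
    token = _active.set("auth /v1/chat/completions")
    parents["db_in_auth"] = ambient_parent()
    parents["llm_in_auth"] = resolve_request_span_context()
    _active.reset(token)

    async def close_in_detached_task() -> None:
        # A task runs in a copy of its creator's context, so the anchor
        # survives even though no span is active here any more.
        _active.set(None)
        parents["ambient_in_task"] = ambient_parent()
        parents["llm_in_task"] = resolve_request_span_context()

    # Awaited only so the example can inspect the result.
    await asyncio.create_task(close_in_detached_task())
    return parents


result = asyncio.run(handle_request())
```

Ambient lookup alone would have parented the LLM-call span to auth in the first case and to nothing in the second; the anchor gives the server span both times.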
Which service calls become spans (spans.span_role_for_service). LiteLLM's
service-logging layer instruments many internal functions, but only some are
traceable units of work:
DB_CALL (CLIENT): outbound datastore calls (redis, postgres,
batch_write_to_db), carrying the db.system.name / db.operation.name semconv
attributes.
SERVICE (INTERNAL): genuine internal work worth a span (background
budget/reset jobs, the pod-lock manager).
Excluded (no span): self (the track_llm_api_timing wrapper, which duplicates
the LLM-call span), router (duplicates the request), and proxy_pre_call (a
guardrail's real span is execute_guardrail …). These still feed
Prometheus/Datadog through their own hooks; they just never enter the trace.
auth is also excluded here because it gets a live phase span instead (see
below).
Spans are named "{service} {call_type}" (e.g. "redis set") so repeated calls
to one service stay distinguishable. Unlike request-level spans, they parent to
the ambient context, falling back to the threaded litellm_parent_otel_span only
when ambient has no live span; a background job with neither starts its own
root trace. Caller-supplied event_metadata is sanitized before it reaches a
span (primitives only, no live objects, no secrets or headers, bounded); see
payloads.sanitize_event_metadata.
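A minimal sketch of the classification, naming, and sanitizing rules above. The service-name sets, the secret-marker list, and the size bounds are illustrative assumptions (reset_budget_job and pod_lock_manager are hypothetical names, and so is service_span_name); the real span_role_for_service and payloads.sanitize_event_metadata may be structured differently.

```python
from __future__ import annotations

from enum import Enum
from typing import Any, Dict, Optional


class SpanRole(Enum):
    DB_CALL = "CLIENT"    # outbound datastore call
    SERVICE = "INTERNAL"  # internal work worth its own span


# ASSUMPTION: illustrative name sets based on the examples in the text.
_DB_SERVICES = {"redis", "postgres", "batch_write_to_db"}
_SERVICE_SPANS = {"reset_budget_job", "pod_lock_manager"}  # hypothetical names
# Metrics-only duplicates, plus auth, which gets a live phase span instead.
_EXCLUDED = {"self", "router", "proxy_pre_call", "auth"}


def span_role_for_service(service: str) -> Optional[SpanRole]:
    """Return the span role for a service call, or None for "no span"."""
    if service in _EXCLUDED:
        return None
    if service in _DB_SERVICES:
        return SpanRole.DB_CALL
    if service in _SERVICE_SPANS:
        return SpanRole.SERVICE
    return None  # assumption: unknown services stay out of the trace


def service_span_name(service: str, call_type: str) -> str:
    return f"{service} {call_type}"  # e.g. "redis set"


# ASSUMPTION: sanitizer policy knobs chosen for illustration.
_SECRET_MARKERS = ("api_key", "token", "secret", "password", "authorization", "header")
_MAX_ITEMS = 32
_MAX_STR_LEN = 256


def sanitize_event_metadata(metadata: Dict[Any, Any]) -> Dict[str, Any]:
    """Keep only bounded primitive values under non-sensitive string keys."""
    clean: Dict[str, Any] = {}
    for key, value in metadata.items():
        if len(clean) >= _MAX_ITEMS:
            break  # bound the number of attributes
        if not isinstance(key, str):
            continue
        if any(marker in key.lower() for marker in _SECRET_MARKERS):
            continue  # drop secrets and headers outright
        if isinstance(value, str):
            clean[key] = value[:_MAX_STR_LEN]
        elif isinstance(value, (bool, int, float)):
            clean[key] = value
        # Anything else (dicts, lists, live objects, None) is dropped.
    return clean
```

Dropping by key name before inspecting the value means a header dict never reaches the type check at all.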
Live phase spans. auth is wrapped in a real, active span
(logger.phase_span) for the duration of authentication, so the DB lookups it
triggers nest under it instead of flattening onto the server span. Identity
Baggage (team/key/user) is seeded once the key resolves, so every post-auth span
inherits it; auth-internal DB lookups that run before the key is known stay
unlabeled, which is correct.
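The effect of a live phase span and late-seeded Baggage can be shown with a small in-memory model. Span, record_span, phase_span, and seed_identity_baggage are hypothetical helpers (the real logger.phase_span works through the OpenTelemetry context and Baggage APIs), and the team_id/key_alias keys are made up for the example.

```python
from __future__ import annotations

import contextvars
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]
    baggage: Dict[str, str] = field(default_factory=dict)


_current = contextvars.ContextVar("current_span", default=None)
_baggage = contextvars.ContextVar("baggage", default=None)
recorded: List[Span] = []


def record_span(name: str) -> None:
    """Record a span parented to the ambient span, copying ambient Baggage."""
    recorded.append(Span(name, _current.get(), dict(_baggage.get() or {})))


@contextmanager
def phase_span(name: str) -> Iterator[None]:
    """Keep `name` active for the block so nested calls parent to it."""
    record_span(name)
    token = _current.set(name)
    try:
        yield
    finally:
        _current.reset(token)


def seed_identity_baggage(**identity: str) -> None:
    """Add identity entries; only spans recorded afterwards see them."""
    _baggage.set({**(_baggage.get() or {}), **identity})


_current.set("POST /v1/chat/completions")  # the server span
with phase_span("auth /v1/chat/completions"):
    record_span("postgres get_key_object")       # before the key resolves
    seed_identity_baggage(team_id="t-1", key_alias="k-1")
    record_span("postgres get_team_membership")  # after the key resolves
record_span("redis set")                         # a post-auth service call
```

The first lookup nests under auth but carries no identity, which matches the "unlabeled, which is correct" behaviour described above.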
Status. On success a span's status is left UNSET (the semconv default,
matching the FastAPI server span); only a genuine error sets ERROR.
Server spans come from the opentelemetry-instrumentation-fastapi package, which
stamps http.* attributes and extracts inbound traceparent headers. This package
does not create or modify server spans; request routes never touch spans.
Both kinds of span (server spans and this package's spans) share a single
TracerProvider, so they belong to the same trace
and export through the same configured exporters. FastAPI middleware can only be
added before the app starts serving, so the app is instrumented at
import time without a provider — it binds to the OTel global
ProxyTracerProvider. Once config (and the callbacks) is loaded, the proxy
publishes the chosen logger's TracerProvider as the global via
trace.set_tracer_provider(...), and the server spans delegate to it. When a
preset callback (arize, langfuse_otel, …) is configured, its provider
becomes the global, so server spans export to that backend too.
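The "bind now, delegate later" behaviour can be sketched without the SDK. The classes below are hypothetical stand-ins that mimic the idea behind OTel's ProxyTracerProvider rather than its implementation, and set_global_provider stands in for trace.set_tracer_provider(...): a tracer handed out at import time looks up the global provider each time it starts a span, so it begins exporting as soon as a real provider is published. The "arize" string is only a label.

```python
from __future__ import annotations

from typing import Optional


class RealTracer:
    def __init__(self, backend: str) -> None:
        self.backend = backend

    def start_span(self, name: str) -> str:
        return f"{name} -> {self.backend}"  # stands in for an exported span


class RealProvider:
    def __init__(self, backend: str) -> None:
        self.backend = backend

    def get_tracer(self, instrumenting_module: str) -> RealTracer:
        return RealTracer(self.backend)


_global_provider: Optional[RealProvider] = None


def set_global_provider(provider: RealProvider) -> None:
    """Hypothetical stand-in for trace.set_tracer_provider(...)."""
    global _global_provider
    _global_provider = provider


class ProxyTracer:
    """Handed out before any provider exists; resolves the global per span."""

    def __init__(self, instrumenting_module: str) -> None:
        self.instrumenting_module = instrumenting_module

    def start_span(self, name: str) -> Optional[str]:
        if _global_provider is None:
            return None  # nothing recorded until a real provider is published
        return _global_provider.get_tracer(self.instrumenting_module).start_span(name)


# Import time: the FastAPI middleware binds a tracer with no provider.
server_tracer = ProxyTracer("fastapi")
before = server_tracer.start_span("POST /v1/chat/completions")

# Startup: config is loaded and the chosen logger's provider becomes global.
set_global_provider(RealProvider("arize"))
after = server_tracer.start_span("POST /v1/chat/completions")
```

Because resolution happens per span rather than at bind time, the middleware never needs to be re-added after startup.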
At import time (proxy_server import): when the gate is on,
mount.instrument_fastapi_app(app) calls FastAPIInstrumentor.instrument_app
with no provider (the middleware stack is frozen once the app serves, so this
can't wait for startup). It binds to the OTel global ProxyTracerProvider. Noisy
non-LLM routes are excluded by default (mount._DEFAULT_EXCLUDED_ROUTES): health
checks (/health*), the Prometheus scrape (/metrics), and static UI/docs assets
(/litellm-asset-prefix, /_next, /ui, /swagger, /docs, /redoc,
/openapi.json, favicons, /.well-known) — so load-balancer polling, metric
scrapes, and asset fetches don't flood traces. Entries are substring-matched, so
/metrics also drops the /model/metrics admin-analytics spans. Set
OTEL_PYTHON_FASTAPI_EXCLUDED_URLS to override the whole set (e.g. "" to trace
everything, or your own comma-separated path list).
At startup (proxy_server.proxy_startup_event): after the config (and
callbacks) is loaded, the already-registered preset OpenTelemetryV2 logger
is reused — or a generic one reading OTEL_* envs is built when no preset is
configured — and its TracerProvider is published as the OTel global with
trace.set_tracer_provider(...). The proxy tracer then delegates to it, so
server spans and gen-ai spans share one provider and the same trace.
Per request: once the FastAPI middleware has opened the server span, the proxy
(create_litellm_proxy_request_started_span, at the auth boundary)
captures it as the request anchor (set_request_root_span), so every later
request-level span has a stable explicit parent regardless of what is active
when it emits.
LLM call: OpenTelemetryV2.log_pre_api_call
runs synchronously in the request task, just before the upstream call, and
opens the LLM-call span there, parented to the anchored server span
(resolve_request_span_context). The open span is held in a bounded cache keyed
by litellm_call_id (a primitive the callback kwargs carry at both pre_call
and close), so no live Span ever travels through a litellm_params metadata
dict. For the boundary hook to fire at all, the logger is registered into
litellm.input_callback — the list Logging.pre_call iterates. The async
success/failure callback later
closes it: it builds an LLMCallSpanData from the typed
standard_logging_object (token usage and cost are computed only by then),
stamps the attributes, sets status, and ends the span. The sync callback is a
no-op (closing is async-only). When pre_call runs off the request task — a
sync-only provider driven through a thread pool, where contextvars (and so the
anchor) don't follow — no parent is visible there, so creation is deferred
to the async callback, whose worker context was copied from the request task at
enqueue and so still carries the anchor. Pass-through endpoints call
logging_obj.pre_call in the request task too, then close from a detached
asyncio.create_task; the anchor (not the by-then-inactive server span) keeps
their LLM-call span in the request's trace. pre_call is litellm's generic
"log the attempt" hook, so it also fires for synthetic proxy-gate error logs
(auth/rate-limit rejections); those carry LITELLM_LOGGING_NO_UPSTREAM_LLM_CALL
and are skipped, so a request rejected before reaching a provider never produces
a phantom CLIENT span.
Service calls: these are routed through litellm/_service_logger.py, which
recognizes the V2 OpenTelemetryV2 logger (a plain CustomLogger, not a
subclass of the legacy OpenTelemetry). It hands every service call to the
logger — including calls with no parent span — and the V2 adapter decides the
role (DB_CALL vs SERVICE), the parent (ambient → threaded → root), and
whether the call is a traceable operation or a metrics-only ping. Guardrail
span data is built from the typed, provider-agnostic
StandardLoggingGuardrailInformation — no single provider's field shape is
assumed.
The model layer (model/, no OpenTelemetry import)
These modules define the shape of a span without depending on the OTel SDK, so
they can be imported anywhere. They live in model/ and form a closed set:
nothing here imports outside model/.
semconv.py: attribute-key constants (gen_ai.*, http.*, litellm.*), the GenAI
operation/provider enums, and the functions that map LiteLLM
provider/call-type strings onto convention values.
spans.py: the span registry, recording every span role, its OTel span kind,
its place in the hierarchy, and its name builder.
payloads.py: frozen dataclasses (LLMCallSpanData, GuardrailSpanData,
ServiceSpanData, …) built from heterogeneous logging payloads via from_*
classmethods.
config.py: OpenTelemetryV2Config, a pydantic-settings model that reads
OTEL_* / LITELLM_OTEL_* env vars, plus the feature gate. capture_span_content
gates whether prompt/response bodies may be written as span attributes; it
defaults off (no_content). The Baggage allowlists are configurable, not
hard-coded: set LITELLM_OTEL_BAGGAGE_PROMOTED_KEYS /
LITELLM_OTEL_BAGGAGE_METADATA_KEYS / LITELLM_OTEL_BAGGAGE_TEAM_METADATA_KEYS
(comma-separated) as env vars, or baggage_promoted_keys /
baggage_metadata_keys / baggage_team_metadata_keys (YAML lists) under
callback_settings.otel in config.yaml; the latter reach the config through the
logger's constructor kwargs. baggage_team_metadata_keys is empty by default, so
none of a team's free-form metadata is promoted until each sub-key is
explicitly allowlisted.
baggage.py: the single definition of which request-identity values are
promoted into Baggage (so child spans inherit them) and under which attribute
keys.
utils.py: value coercion, JSON serialization, and extractor-table
application, shared across the package.
emitter.py: SpanEmitter.emit(role, data) runs dedupe → start the span → run
the mapper chain to stamp attributes → set status → end. It owns no attribute
keys. The dedupe set (which coalesces the sync and async firing of one
request) is a bounded LRU, so it can't grow without limit.
mappers/: each mapper turns typed span data into a flat {attribute key:
value} dict. They compose: listing several mapper names in the config layers
multiple attribute vocabularies onto the same span.
genai: the canonical OpenTelemetry GenAI vocabulary, always present.
legacy: an additional vocabulary using the older semconv-ai / Traceloop
attribute key names, for backends that read those.
openinference, langfuse, weave, langtrace: vendor vocabularies.
resolve_mappers(names) turns config names into mapper instances.
The plumbing layer (plumbing/)
The OTel-SDK wiring. Everything here imports only model/ and each other:
providers.py: builds the TracerProvider, its exporters (from ExporterSpecs),
and the span processor that copies allowlisted Baggage entries onto every
span. register_exporter_factory(kind, factory) lets a preset contribute a
custom exporter kind (e.g. one that fetches an auth token lazily) without
coupling this module to any vendor.
context.py: trace-context and Baggage read/write helpers.
routing.py: TenantTracerCache. When a request carries team/key-scoped vendor
credentials, its spans are routed through a credential-keyed TracerProvider,
so one logger serves many tenants. The cache is a bounded LRU that flushes and
shuts down evicted providers, since the key derives from request-supplied
credentials and must not grow (or leak threads) without limit.
metrics.py: GenAI client metric instruments.
logger.py: OpenTelemetryV2, a CustomLogger that translates LiteLLM's logging
callbacks into typed span data and hands them to the engine. The LLM-call span
is opened at the log_pre_api_call boundary (parented to the anchored server
span via resolve_request_span_context) and closed at the async
success/failure callback; the open span is held in a bounded cache keyed by
litellm_call_id, never threaded through a metadata dict. The logger registers
itself into litellm.input_callback so Logging.pre_call fires the boundary
hook.
mount.py: instrument_fastapi_app(app), the single call site that attaches
opentelemetry-instrumentation-fastapi for SERVER spans. It owns the
route-exclusion default (OTEL_PYTHON_FASTAPI_EXCLUDED_URLS) and the
passthrough span-naming hook (PASSTHROUGH_PREFIXES), so proxy_server carries
no OTel detail. It is a safe no-op when the gate is off or the instrumentation
package is absent, and it must be called at app-creation time (the middleware
stack freezes once the app serves).
presets/: each preset reads one integration's env vars and returns an
OpenTelemetryV2Config (exporter destination + mapper vocabularies).
PRESET_BY_CALLBACK maps a callback name ("arize", "langfuse_otel", …) to its
preset. Integrations that support team/key-scoped credentials also provide a
per-request OTLP header builder (DYNAMIC_HEADERS_BY_CALLBACK). Presets do no
network I/O at build time: AgentOps, for example, mints its JWT lazily inside
a custom exporter on the first export (in the BatchSpanProcessor worker
thread), never on the event
loop.
To add a mapper vocabulary, create a module in mappers/ (a class with a
map(data) -> AttributeMap method, typically built from key -> extractor
tables) and register it in mappers/__init__._MAPPER_BY_NAME.
To add a preset, create a module in presets/ that returns an
OpenTelemetryV2Config, and register it in presets/__init__.PRESET_BY_CALLBACK.
If it supports dynamic credentials, add a header builder to
DYNAMIC_HEADERS_BY_CALLBACK.
To add a span role, add an entry in spans.py (registry entry + name builder),
a payload dataclass in payloads.py, and a branch in the relevant mapper(s).
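A minimal sketch of the mapper pattern described above: a table of key -> extractor functions applied to frozen span data, a name registry, and layered vocabularies. LLMCallSpanData here is a pared-down stand-in, TableMapper, MyVendorMapper, and apply_mappers are hypothetical, and forcing genai to the front in resolve_mappers is an assumption about where "always present" is enforced. The gen_ai.* keys are real GenAI semantic-convention names; myvendor.model_name is made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping, Optional

AttributeMap = Dict[str, Any]


@dataclass(frozen=True)
class LLMCallSpanData:
    """Pared-down, hypothetical stand-in for payloads.LLMCallSpanData."""
    model: str
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


Extractor = Callable[[LLMCallSpanData], Any]


class TableMapper:
    """Apply a key -> extractor table; None results are left off the span."""
    table: Mapping[str, Extractor] = {}

    def map(self, data: LLMCallSpanData) -> AttributeMap:
        attrs: AttributeMap = {}
        for key, extract in self.table.items():
            value = extract(data)
            if value is not None:
                attrs[key] = value
        return attrs


class GenAIMapper(TableMapper):
    table = {
        "gen_ai.request.model": lambda d: d.model,
        "gen_ai.usage.input_tokens": lambda d: d.input_tokens,
        "gen_ai.usage.output_tokens": lambda d: d.output_tokens,
    }


class MyVendorMapper(TableMapper):  # hypothetical new vocabulary
    table = {"myvendor.model_name": lambda d: d.model}


# Registration makes the new mapper selectable by name from config.
_MAPPER_BY_NAME: Dict[str, type] = {"genai": GenAIMapper, "myvendor": MyVendorMapper}


def resolve_mappers(names: List[str]) -> List[TableMapper]:
    # ASSUMPTION: genai always runs first; listed names layer on top of it.
    ordered = ["genai"] + [n for n in names if n != "genai"]
    return [_MAPPER_BY_NAME[n]() for n in ordered]  # KeyError on unknown names


def apply_mappers(data: LLMCallSpanData, mappers: List[TableMapper]) -> AttributeMap:
    attrs: AttributeMap = {}
    for mapper in mappers:
        attrs.update(mapper.map(data))  # later vocabularies win on key clashes
    return attrs
```

Keeping each vocabulary as a pure data-to-dict function is what lets several mappers stack on one span without knowing about each other.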