# Envoy AI Gateway Monitoring

docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md

Envoy AI Gateway observability via OTLP

Envoy AI Gateway is a gateway/proxy for AI/LLM API traffic (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy. It natively emits GenAI metrics and access logs via OTLP, following OpenTelemetry GenAI Semantic Conventions.

SkyWalking receives OTLP metrics and logs directly on its gRPC port (11800) — no OpenTelemetry Collector is needed between the AI Gateway and SkyWalking OAP.

## Prerequisites

## Data flow

  1. Envoy AI Gateway processes LLM API requests and records GenAI metrics (token usage, latency, TTFT, TPOT).
  2. The AI Gateway pushes metrics and access logs via OTLP gRPC to SkyWalking OAP.
  3. SkyWalking OAP parses metrics with MAL rules and access logs with LAL rules.

## Set up

The MAL rules (`envoy-ai-gateway/*`) and LAL rules (`envoy-ai-gateway`) are enabled by default in SkyWalking OAP. No OAP-side configuration is needed.

Configure the AI Gateway to push OTLP to SkyWalking by setting these environment variables:

| Env Var | Value | Purpose |
|---------|-------|---------|
| `OTEL_SERVICE_NAME` | Per-deployment gateway name (e.g., `my-ai-gateway`) | SkyWalking service name |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800` | SkyWalking OAP gRPC receiver |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport |
| `OTEL_METRICS_EXPORTER` | `otlp` | Enable OTLP metrics push |
| `OTEL_LOGS_EXPORTER` | `otlp` | Enable OTLP access log push |
| `OTEL_RESOURCE_ATTRIBUTES` | See below | Routing + instance + layer |

Required resource attributes (in OTEL_RESOURCE_ATTRIBUTES):

- `job_name=envoy-ai-gateway`: fixed routing tag for the MAL/LAL rules; identical for all AI Gateway deployments.
- `service.instance.id=<instance-id>`: instance identity. In Kubernetes, use the pod name via the Downward API.
- `service.layer=ENVOY_AI_GATEWAY`: routes access logs to the AI Gateway LAL rules.

Example:

```bash
OTEL_SERVICE_NAME=my-ai-gateway
OTEL_EXPORTER_OTLP_ENDPOINT=http://skywalking-oap:11800
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_METRICS_EXPORTER=otlp
OTEL_LOGS_EXPORTER=otlp
OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway,service.instance.id=pod-abc123,service.layer=ENVOY_AI_GATEWAY
```
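
In Kubernetes, the same variables can be set on the AI Gateway container, with the pod name injected through the Downward API as suggested above. A minimal sketch (container name, image, and service DNS name are placeholders; adjust to your deployment):

```yaml
# Fragment of an AI Gateway Deployment spec (names are illustrative).
spec:
  containers:
    - name: ai-gateway                 # placeholder container name
      env:
        - name: POD_NAME               # pod name via the Downward API
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: OTEL_SERVICE_NAME
          value: my-ai-gateway
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://skywalking-oap:11800
        - name: OTEL_EXPORTER_OTLP_PROTOCOL
          value: grpc
        - name: OTEL_METRICS_EXPORTER
          value: otlp
        - name: OTEL_LOGS_EXPORTER
          value: otlp
        - name: OTEL_RESOURCE_ATTRIBUTES
          # $(POD_NAME) expands because POD_NAME is defined earlier in this env list
          value: job_name=envoy-ai-gateway,service.instance.id=$(POD_NAME),service.layer=ENVOY_AI_GATEWAY
```

Defining `POD_NAME` before `OTEL_RESOURCE_ATTRIBUTES` matters: Kubernetes only expands `$(VAR)` references to variables declared earlier in the same `env` list.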

## Supported Metrics

SkyWalking observes the AI Gateway as a service in the `ENVOY_AI_GATEWAY` layer. Each gateway deployment is a service, and each pod is an instance. Metrics include per-provider and per-model breakdowns.

### Service Metrics

| Monitoring Panel | Unit | Metric Name | Description |
|------------------|------|-------------|-------------|
| Request CPM | calls/min | `meter_envoy_ai_gw_request_cpm` | Requests per minute |
| Request Latency Avg | ms | `meter_envoy_ai_gw_request_latency_avg` | Average request duration |
| Request Latency Percentile | ms | `meter_envoy_ai_gw_request_latency_percentile` | P50/P75/P90/P95/P99 |
| Input Token Rate | tokens/min | `meter_envoy_ai_gw_input_token_rate` | Input (prompt) tokens per minute |
| Output Token Rate | tokens/min | `meter_envoy_ai_gw_output_token_rate` | Output (completion) tokens per minute |
| TTFT Avg | ms | `meter_envoy_ai_gw_ttft_avg` | Time to First Token (streaming only) |
| TTFT Percentile | ms | `meter_envoy_ai_gw_ttft_percentile` | P50/P75/P90/P95/P99 TTFT |
| TPOT Avg | ms | `meter_envoy_ai_gw_tpot_avg` | Time Per Output Token (streaming only) |
| TPOT Percentile | ms | `meter_envoy_ai_gw_tpot_percentile` | P50/P75/P90/P95/P99 TPOT |

### Provider Breakdown Metrics

| Monitoring Panel | Unit | Metric Name | Description |
|------------------|------|-------------|-------------|
| Provider Request CPM | calls/min | `meter_envoy_ai_gw_provider_request_cpm` | Requests by provider |
| Provider Token Rate | tokens/min | `meter_envoy_ai_gw_provider_token_rate` | Token rate by provider |
| Provider Latency Avg | ms | `meter_envoy_ai_gw_provider_latency_avg` | Latency by provider |

### Model Breakdown Metrics

| Monitoring Panel | Unit | Metric Name | Description |
|------------------|------|-------------|-------------|
| Model Request CPM | calls/min | `meter_envoy_ai_gw_model_request_cpm` | Requests by model |
| Model Token Rate | tokens/min | `meter_envoy_ai_gw_model_token_rate` | Token rate by model |
| Model Latency Avg | ms | `meter_envoy_ai_gw_model_latency_avg` | Latency by model |
| Model TTFT Avg | ms | `meter_envoy_ai_gw_model_ttft_avg` | TTFT by model |
| Model TPOT Avg | ms | `meter_envoy_ai_gw_model_tpot_avg` | TPOT by model |

### Instance Metrics

All service-level metrics are also available per instance (pod) under the `meter_envoy_ai_gw_instance_` prefix (e.g., `meter_envoy_ai_gw_instance_request_cpm`), including the per-provider and per-model breakdowns.

## Access Log Sampling

The LAL rules apply a sampling policy to reduce storage:

- Error responses (HTTP status >= 400): always persisted.
- Upstream failures: always persisted.
- High token cost (>= 10,000 total tokens): persisted for cost anomaly detection.
- Normal successful responses with low token counts: dropped.

The token threshold can be adjusted in `lal/envoy-ai-gateway.yaml`.
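
The keep/drop decision above can be sketched as a simple predicate. This is a plain illustration of the policy, not the actual LAL rule; the function name and the 10,000-token default are assumptions for the sketch:

```python
# Sketch of the access-log sampling policy described above.
# Not the LAL DSL itself; just the persist/drop decision it encodes.

def should_persist(status_code: int,
                   upstream_failed: bool,
                   total_tokens: int,
                   token_threshold: int = 10_000) -> bool:
    """Return True if the access log entry should be persisted."""
    if status_code >= 400:                  # error responses: always kept
        return True
    if upstream_failed:                     # upstream failures: always kept
        return True
    if total_tokens >= token_threshold:     # costly requests: kept for anomaly detection
        return True
    return False                            # normal low-cost success: dropped

# Examples:
print(should_persist(500, False, 120))      # error -> True
print(should_persist(200, False, 25_000))   # high token cost -> True
print(should_persist(200, False, 300))      # normal success -> False
```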