(observability-guide)=
Monitor your LLM deployments with built-in metrics, dashboards, and logging.
Ray Serve LLM provides comprehensive observability with the following features:
Ray enables LLM service-level logging by default, making these statistics available through Grafana and Prometheus. For more details on configuring Grafana and Prometheus, see {ref}`collect-metrics`.
These higher-level metrics track request and token behavior across deployed models:
Ray includes a Serve LLM-specific dashboard, which is automatically available in Grafana:
The dashboard includes visualizations for:
All engine metrics, including those from vLLM, are available through the Ray metrics export endpoint and are queryable with Prometheus. See vLLM metrics for a complete list. The Serve LLM Grafana dashboard also visualizes these metrics.
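Because the metrics are queryable with Prometheus, you can also read them programmatically. The following is a minimal sketch, not part of Ray Serve LLM itself: it builds an instant-query URL for the standard Prometheus HTTP API (`/api/v1/query`) and parses the JSON response. The helper names `build_query_url` and `extract_samples` are hypothetical, and the base URL assumes a Prometheus server at its default local address.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

# Assumption: Prometheus runs locally on its default port.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the Prometheus instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def extract_samples(response_body: str) -> List[Tuple[dict, float]]:
    """Parse an instant-query response into (labels, value) pairs.

    Assumes a "vector" result, which is what a plain metric selector returns.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    return [
        # Each series carries its labels and a [timestamp, "value"] pair.
        (series["metric"], float(series["value"][1]))
        for series in payload["data"]["result"]
    ]
```

To run a query, fetch the URL with `urllib.request.urlopen` and pass the decoded body to `extract_samples`. The exact metric names depend on your engine and Ray version, so check the names listed in Prometheus or Grafana before hard-coding a PromQL expression.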
Key engine metrics include:
Engine metric logging is on by default as of Ray 2.51. To disable engine-level metric logging, set `log_engine_metrics: False` when configuring the LLM deployment:
::::{tab-set}
:::{tab-item} Python
:sync: builder

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1, max_replicas=2,
        )
    ),
    log_engine_metrics=False,  # Disable engine metrics
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```
:::
:::{tab-item} YAML
:sync: bind

```yaml
# config.yaml
applications:
- args:
    llm_configs:
      - model_loading_config:
          model_id: qwen-0.5b
          model_source: Qwen/Qwen2.5-0.5B-Instruct
        accelerator_type: A10G
        deployment_config:
          autoscaling_config:
            min_replicas: 1
            max_replicas: 2
        log_engine_metrics: false  # Disable engine metrics
  import_path: ray.serve.llm:build_openai_app
  name: llm_app
  route_prefix: "/"
```
:::
::::
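One way to confirm that the setting took effect is to scan the text output of a metrics endpoint for engine metric names. The sketch below parses the Prometheus text exposition format and checks whether any metric name contains a marker string. The function names, the `vllm` marker, and the sample metric names are all hypothetical, so adjust them to the names your deployment actually exports.

```python
from typing import List

# Hypothetical sample of Prometheus text-format output, for illustration only.
SAMPLE_OUTPUT = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{model="qwen-0.5b"} 12
# HELP example_vllm_num_requests_running Hypothetical engine gauge.
# TYPE example_vllm_num_requests_running gauge
example_vllm_num_requests_running{model="qwen-0.5b"} 3
"""


def metric_names(exposition_text: str) -> List[str]:
    """Return the distinct metric names in Prometheus text-format output."""
    names: List[str] = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value"; the name ends at "{" or a space.
        name = line.split("{", 1)[0].split()[0]
        if name not in names:
            names.append(name)
    return names


def has_engine_metrics(exposition_text: str, marker: str = "vllm") -> bool:
    """Report whether any metric name contains the marker (an assumed naming hint)."""
    return any(marker in name for name in metric_names(exposition_text))
```

With `log_engine_metrics` disabled, a check like `has_engine_metrics(text)` on the scraped output should return `False`, provided the marker matches how your engine names its metrics.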
The Ray Team collects usage data to improve Ray Serve LLM. The team collects data about the following features and attributes:
To opt out of usage data collection, see {ref}`Ray usage stats <ref-usage-stats>`.
- {ref}`collect-metrics` - Ray metrics collection guide
- {doc}`Troubleshooting <../troubleshooting>` - Common issues and solutions