Common issues and frequently asked questions for Ray Serve LLM.
You can use `runtime_env` to specify the environment variables required to access the model (such as `HF_TOKEN` for gated Hugging Face models). To get the deployment options, you can use the `get_deployment_options` method on the {class}`LLMServer <ray.serve.llm.deployment.LLMServer>` class. Each deployment class has its own `get_deployment_options` method.
```python
import os

from ray import serve
from ray.serve.llm import LLMConfig
from ray.serve.llm.builders import build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="llama-3-8b-instruct",
        model_source="meta-llama/Meta-Llama-3-8B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1,
            max_replicas=2,
        )
    ),
    # Pass the desired accelerator type (e.g., A10G, L4, etc.)
    accelerator_type="A10G",
    runtime_env=dict(
        env_vars=dict(
            HF_TOKEN=os.environ["HF_TOKEN"],
        )
    ),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```
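The `runtime_env=dict(env_vars=...)` argument is plain nested-dict data, so you can assemble it programmatically when you forward several variables. A minimal stdlib sketch, using a hypothetical `make_runtime_env` helper that is not part of the Ray API:

```python
import os

# Hypothetical helper (not part of Ray): collect the named environment
# variables from the driver process into the nested dict shape that
# runtime_env expects. Variables that aren't set are silently skipped.
def make_runtime_env(*var_names):
    env_vars = {name: os.environ[name] for name in var_names if name in os.environ}
    return {"env_vars": env_vars}

os.environ["HF_TOKEN"] = "hf_example_token"  # placeholder value for illustration
runtime_env = make_runtime_env("HF_TOKEN")
# runtime_env == {"env_vars": {"HF_TOKEN": "hf_example_token"}}
```

You can then pass the resulting dict directly as `runtime_env=make_runtime_env("HF_TOKEN")` in the `LLMConfig` above.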
If you're using Hugging Face models, you can enable fast downloads by setting the `HF_HUB_ENABLE_HF_TRANSFER` environment variable to `1` and installing the package with `pip install hf_transfer`.
```python
import os

from ray import serve
from ray.serve.llm import LLMConfig
from ray.serve.llm.builders import build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="llama-3-8b-instruct",
        model_source="meta-llama/Meta-Llama-3-8B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1,
            max_replicas=2,
        )
    ),
    # Pass the desired accelerator type (e.g., A10G, L4, etc.)
    accelerator_type="A10G",
    runtime_env=dict(
        env_vars=dict(
            HF_TOKEN=os.environ["HF_TOKEN"],
            HF_HUB_ENABLE_HF_TRANSFER="1",
        )
    ),
)

# Deploy the application
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```
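Fast downloads take effect only when both pieces are in place: the `hf_transfer` package must be installed and the environment variable must be set. A small sanity-check sketch, using a hypothetical `fast_download_ready` helper (not part of Ray or `huggingface_hub`):

```python
import importlib.util
import os

# Illustrative preflight check: returns True only when the hf_transfer
# package is importable AND the env variable is set to "1" in this
# process (for example, inside a Serve replica).
def fast_download_ready():
    has_package = importlib.util.find_spec("hf_transfer") is not None
    enabled = os.environ.get("HF_HUB_ENABLE_HF_TRANSFER") == "1"
    return has_package and enabled
```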
:::{admonition} Known issue
Ray 2.55 installs vLLM 0.18.0. Depending on the conda environment, you may encounter incompatibilities with native runtime libraries (for example, `libstdc++`, `CXXABI`, ICU).
In such cases, override just the `libstdc++` library, pointing `LD_LIBRARY_PATH` at the copy from your conda environment:
```bash
mkdir -p "${CONDA_PREFIX}/lib-overrides"
ln -sf "${CONDA_PREFIX}/lib/libstdc++.so.6" "${CONDA_PREFIX}/lib-overrides/libstdc++.so.6"
export LD_LIBRARY_PATH="${CONDA_PREFIX}/lib-overrides${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```
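To confirm whether an ABI mismatch is actually the culprit, you can inspect which `CXXABI` symbol versions a given `libstdc++` build exports. A diagnostic sketch, assuming an active conda environment (`CONDA_PREFIX` set):

```bash
# Print the newest CXXABI version exported by the conda libstdc++; a
# library that requires a newer CXXABI than this will fail to load.
LIBSTDCXX="${CONDA_PREFIX}/lib/libstdc++.so.6"
if [ -f "$LIBSTDCXX" ]; then
    strings "$LIBSTDCXX" | grep -o 'CXXABI_[0-9.]*' | sort -uV | tail -n 1
else
    echo "libstdc++.so.6 not found under ${CONDA_PREFIX}/lib" >&2
fi
```

Comparing this output against the version reported for the library named in the import error tells you whether the override above is needed.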
:::
If you encounter issues not covered in this guide, see the following resources:

- {ref}`Quickstart examples <quick-start>`
- {ref}`Examples <examples>`