docs/configuration/model_resolution.md
vLLM loads HuggingFace-compatible models by inspecting the `architectures` field in the `config.json` file of the model repository
and finding the corresponding implementation that is registered to vLLM.
Nevertheless, our model resolution may fail for the following reasons:
- The `config.json` of the model repository lacks the `architectures` field.
  To fix this, explicitly specify the model architecture by passing `config.json` overrides to the `hf_overrides` option.
  For example:
  ```python
  from vllm import LLM

  llm = LLM(
      model="cerebras/Cerebras-GPT-1.3B",
      hf_overrides={"architectures": ["GPT2LMHeadModel"]},  # GPT-2
  )
  ```
Our list of supported models shows which model architectures are recognized by vLLM.