docs/design/lora_resolver_plugins.md
This directory contains vLLM's LoRA resolver plugins built on the LoRAResolver framework.
They automatically discover and load LoRA adapters from a specified local storage path, eliminating the need for manual configuration or server restarts.
LoRA Resolver Plugins provide a flexible way to dynamically load LoRA adapters at runtime. When vLLM receives a request for a LoRA adapter that hasn't been loaded yet, the resolver plugins attempt to locate and load the adapter from their configured storage locations. This lets new adapters be served without manual registration or a server restart.
`lora_filesystem_resolver` requires a local storage path, while the built-in `hf_hub_resolver` pulls LoRA adapters from the Hugging Face Hub and otherwise proceeds identically. In general, custom resolvers can be implemented to fetch adapters from any source.

Before using LoRA Resolver Plugins, ensure the following environment variables are configured:
- `VLLM_ALLOW_RUNTIME_LORA_UPDATING`: must be set to `true` or `1` to enable dynamic LoRA loading.

    ```bash
    export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
    ```

- `VLLM_PLUGINS`: must include the desired resolver plugins (comma-separated list).

    ```bash
    export VLLM_PLUGINS=lora_filesystem_resolver
    ```

- `VLLM_LORA_RESOLVER_CACHE_DIR`: must be set to a valid directory path for the filesystem resolver.

    ```bash
    export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
    ```
Note that if `VLLM_PLUGINS` is not set, all available plugins are loaded; if it is set to an empty string, no plugins are loaded.
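As a quick pre-flight check before launching the server, a minimal sketch like the following verifies the three variables described above (the helper name `check_lora_resolver_env` is illustrative, not part of vLLM):

```python
import os

def check_lora_resolver_env() -> None:
    """Illustrative helper: verify the env vars this document describes."""
    if os.environ.get("VLLM_ALLOW_RUNTIME_LORA_UPDATING", "").lower() not in ("true", "1"):
        raise RuntimeError(
            "Set VLLM_ALLOW_RUNTIME_LORA_UPDATING=true to enable dynamic LoRA loading")
    plugins = os.environ.get("VLLM_PLUGINS", "").split(",")
    if "lora_filesystem_resolver" in plugins:
        cache_dir = os.environ.get("VLLM_LORA_RESOLVER_CACHE_DIR", "")
        if not os.path.isdir(cache_dir):
            raise RuntimeError(
                "VLLM_LORA_RESOLVER_CACHE_DIR must point to an existing directory")

check_lora_resolver_env()
```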
The filesystem resolver is installed with vLLM by default and enables loading LoRA adapters from a local directory structure. To set it up:

1. Create the LoRA adapter storage directory:

    ```bash
    mkdir -p /path/to/lora/adapters
    ```

2. Set the environment variables:

    ```bash
    export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
    export VLLM_PLUGINS=lora_filesystem_resolver
    export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
    ```

3. Start the vLLM server. Any LoRA-capable base model works here, e.g. `meta-llama/Llama-2-7b-hf`; for gated models, make sure your Hugging Face token is available in the environment (`export HF_TOKEN=<your token>`):

    ```bash
    python -m vllm.entrypoints.openai.api_server \
        --model your-base-model \
        --enable-lora
    ```
The filesystem resolver expects LoRA adapters to be organized in the following structure:
```text
/path/to/lora/adapters/
├── adapter1/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
├── adapter2/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
└── ...
```
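To see which adapters the resolver will be able to find, a small sketch like the one below can enumerate the cache directory and flag entries missing their config file (the function name `list_adapters` is illustrative):

```python
import os

def list_adapters(cache_dir: str) -> None:
    """Illustrative helper: print each adapter directory and whether it
    contains the required adapter_config.json."""
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if os.path.isdir(path):
            has_config = os.path.isfile(os.path.join(path, "adapter_config.json"))
            print(f"{name}: {'ok' if has_config else 'missing adapter_config.json'}")

list_adapters("/path/to/lora/adapters")
```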
Each adapter directory must contain:

- `adapter_config.json`: required configuration file with the following structure:
    ```json
    {
      "peft_type": "LORA",
      "base_model_name_or_path": "your-base-model-name",
      "r": 16,
      "lora_alpha": 32,
      "target_modules": ["q_proj", "v_proj"],
      "bias": "none",
      "modules_to_save": null,
      "use_rslora": false,
      "use_dora": false
    }
    ```
- `adapter_model.bin`: the LoRA adapter weights file.
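Adapters saved with the `peft` library already match this layout. The sketch below assumes a hypothetical base model name and shows only the export step; note that recent `peft` versions write `adapter_model.safetensors` instead of `adapter_model.bin` by default:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical base model name; substitute your own.
base = AutoModelForCausalLM.from_pretrained("your-base-model")
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], bias="none"
)
model = get_peft_model(base, lora_config)
# ... fine-tune `model` here ...
# Writes adapter_config.json plus the adapter weights into the cache dir.
model.save_pretrained("/path/to/lora/adapters/my_adapter")
```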
1. Prepare your LoRA adapter:

    ```bash
    # Assuming you have a LoRA adapter in /tmp/my_lora_adapter
    cp -r /tmp/my_lora_adapter /path/to/lora/adapters/my_sql_adapter
    ```

2. Verify the directory structure:

    ```bash
    ls -la /path/to/lora/adapters/my_sql_adapter/
    # Should show: adapter_config.json, adapter_model.bin, etc.
    ```

3. Make a request using the adapter:

    ```bash
    curl http://localhost:8000/v1/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "my_sql_adapter",
            "prompt": "Generate a SQL query for:",
            "max_tokens": 50,
            "temperature": 0.1
        }'
    ```
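The same request can be issued through the OpenAI Python client, since vLLM exposes an OpenAI-compatible API; the `api_key` value below is a placeholder because vLLM does not require one by default:

```python
from openai import OpenAI

# Point the client at vLLM's OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="my_sql_adapter",  # resolved on first use by the filesystem resolver
    prompt="Generate a SQL query for:",
    max_tokens=50,
    temperature=0.1,
)
print(completion.choices[0].text)
```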
When this request arrives, the filesystem resolver checks that `/path/to/lora/adapters/my_sql_adapter/` exists and contains a valid `adapter_config.json` file; if it does, the adapter is loaded and the request is served with it.

You can configure multiple resolver plugins to load adapters from different sources:
```bash
# lora_s3_resolver is an example of a custom resolver you would need to implement
export VLLM_PLUGINS=lora_filesystem_resolver,lora_s3_resolver
```
All listed resolvers are enabled; at request time, vLLM tries them in order until one succeeds.
To implement your own resolver plugin:

1. Create a new resolver class:

    ```python
    from typing import Optional

    from vllm.lora.request import LoRARequest
    from vllm.lora.resolver import LoRAResolver, LoRAResolverRegistry


    class CustomResolver(LoRAResolver):
        async def resolve_lora(
            self, base_model_name: str, lora_name: str
        ) -> Optional[LoRARequest]:
            # Your custom resolution logic here: locate the adapter for
            # `lora_name` and return a LoRARequest pointing at it, or
            # None if this resolver cannot find it.
            ...
    ```
2. Register the resolver:

    ```python
    def register_custom_resolver():
        resolver = CustomResolver()
        LoRAResolverRegistry.register_resolver("Custom Resolver", resolver)
    ```
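Putting the two steps together, here is a hedged end-to-end sketch of a resolver that treats the requested adapter name as a Hugging Face repo id and downloads it on demand. The class name, the toy ID scheme, and the synchronous `snapshot_download` call are illustrative simplifications; this is not vLLM's built-in `hf_hub_resolver`:

```python
from typing import Optional

from huggingface_hub import snapshot_download

from vllm.lora.request import LoRARequest
from vllm.lora.resolver import LoRAResolver, LoRAResolverRegistry


class HubDownloadResolver(LoRAResolver):
    """Illustrative resolver: download the requested adapter repo on first use."""

    async def resolve_lora(
        self, base_model_name: str, lora_name: str
    ) -> Optional[LoRARequest]:
        try:
            # Blocking call kept simple for the sketch; production code would
            # offload this to a thread to avoid stalling the event loop.
            local_path = snapshot_download(repo_id=lora_name)
        except Exception:
            # Returning None lets vLLM fall through to the next resolver.
            return None
        return LoRARequest(
            lora_name=lora_name,
            lora_int_id=abs(hash(lora_name)) % (2**31) or 1,  # toy ID scheme
            lora_path=local_path,
        )


def register_hub_download_resolver():
    LoRAResolverRegistry.register_resolver(
        "Hub Download Resolver", HubDownloadResolver()
    )
```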
"VLLM_LORA_RESOLVER_CACHE_DIR must be set to a valid directory"
"LoRA adapter not found"
adapter_config.json exists and is valid JSONadapter_model.bin exists in the directory"Invalid adapter configuration"
peft_type is set to "LORA"base_model_name_or_path matches your base modeltarget_modules is properly configured"LoRA rank exceeds maximum"
r value in adapter_config.json doesn't exceed max_lora_rank settingEnable debug logging:
    ```bash
    export VLLM_LOGGING_LEVEL=DEBUG
    ```
2. Verify the environment variables:

    ```bash
    echo $VLLM_ALLOW_RUNTIME_LORA_UPDATING
    echo $VLLM_PLUGINS
    echo $VLLM_LORA_RESOLVER_CACHE_DIR
    ```
3. Test the adapter configuration:

    ```bash
    python -c "
    import json
    with open('/path/to/lora/adapters/my_adapter/adapter_config.json') as f:
        config = json.load(f)
    print('Config valid:', config)
    "
    ```