docs/usage/providers/vllm.mdx
<Image alt={'Using vLLM in LobeHub'} cover src={'/blog/assets1049abec5850cebf8ce12cd50199b9c5.webp'} />
vLLM is an open-source local deployment tool for large language models (LLMs). It allows users to efficiently run LLMs on their local machines and provides an OpenAI-compatible API interface.
This guide will walk you through how to use vLLM in LobeHub:
<Steps>

### Step 1: Prerequisites

vLLM has specific hardware and software requirements. Please ensure your environment meets the following:
| Category | Requirement      |
| -------- | ---------------- |
| Hardware | GPU: NVIDIA CUDA |
| Software | OS: Linux        |
### Step 2: Install vLLM

If you're using an NVIDIA GPU, you can install vLLM directly via pip. However, we recommend using `uv`, a fast Python environment manager, to create and manage your Python environments. Follow the official guide to install `uv`. Once it is installed, you can create a new Python environment and install vLLM with the following commands:
```bash
uv venv myenv --python 3.12 --seed
source myenv/bin/activate
uv pip install vllm
```
Alternatively, you can use `uv run` with the `--with [dependency]` option to run commands such as `vllm serve` without creating a dedicated environment:
```bash
uv run --with vllm vllm --help
```
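The same one-off pattern can start a server directly. A minimal sketch, assuming the `Qwen/Qwen2.5-1.5B-Instruct` model used later in this guide:

```bash
# Pull in vLLM on the fly and serve the model without a dedicated environment
uv run --with vllm vllm serve Qwen/Qwen2.5-1.5B-Instruct
```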
You can also use conda to manage your Python environment:
```bash
conda create -n myenv python=3.12 -y
conda activate myenv
pip install vllm
```
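Whichever installation route you choose, a quick sanity check is to print the installed version (the exact version string depends on what you installed):

```bash
# Confirm vLLM imports cleanly and report its version
python -c "import vllm; print(vllm.__version__)"
```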
<Callout type={'note'}> For non-CUDA platforms, please refer to the official documentation for installation instructions. </Callout>
### Step 3: Start the vLLM Server

vLLM can be deployed as a server compatible with the OpenAI API protocol. By default, it starts at `http://localhost:8000`; you can customize the address with the `--host` and `--port` parameters. Note that the server can only run one model at a time.

The following command starts a vLLM server running the `Qwen/Qwen2.5-1.5B-Instruct` model:
```bash
vllm serve Qwen/Qwen2.5-1.5B-Instruct
```
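Once the server is up, you can verify the OpenAI-compatible endpoint with a quick request, assuming the default address and no API key:

```bash
# List the models served by the running vLLM instance
curl http://localhost:8000/v1/models
```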
To enable API key authentication, pass the `--api-key` parameter or set the `VLLM_API_KEY` environment variable. If neither is set, the server will be accessible without an API key.
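For example, to require a key on the same server (the key value here is a placeholder):

```bash
# Clients must now send this key as a Bearer token
vllm serve Qwen/Qwen2.5-1.5B-Instruct --api-key token-abc123
```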
<Callout type={'note'}> For more detailed server configuration options, refer to the official vLLM documentation. </Callout>
### Step 4: Configure vLLM in LobeHub

Open the App Settings panel in LobeHub. Under AI Providers, locate the vLLM configuration section and enter your API key.

<Image alt={'Enter vLLM API Key'} inStep src={'/blog/assets02dce7325584974cdba327fe2f996b9e.webp'} />
<Callout type={'warning'}>
* If your vLLM server is not configured with an API key, leave the API key field blank.
* If your vLLM server is running locally, make sure to enable "Client Request Mode".
</Callout>
Then add the model you are running (for example, `Qwen/Qwen2.5-1.5B-Instruct`) to the model list below.
### Step 5: Select a vLLM Model

Assign the vLLM model to your assistant to start chatting.
<Image alt={'Select vLLM Model'} inStep src={'/blog/assets8477415ecec1f37e38ab38ff1217d0a7.webp'} />
</Steps>

You're now ready to use vLLM-powered models in LobeHub for conversations.