# Using vLLM in LobeHub

<Image alt={'Using vLLM in LobeHub'} cover src={'/blog/assets1049abec5850cebf8ce12cd50199b9c5.webp'} />

vLLM is an open-source local deployment tool for large language models (LLMs). It allows users to efficiently run LLMs on their local machines and provides an OpenAI-compatible API interface.

This guide will walk you through how to use vLLM in LobeHub:

<Steps>

### Step 1: Prerequisites

vLLM has specific hardware and software requirements. Please ensure your environment meets the following:

**Hardware Requirements**

| Component | Supported Options |
| --- | --- |
| GPU | NVIDIA CUDA, AMD ROCm, Intel XPU |
| CPU | Intel/AMD x86, ARM AArch64, Apple silicon |
| Other AI Accelerators | Google TPU, Intel Gaudi, AWS Neuron, OpenVINO |

**Software Requirements**

| Requirement | Supported Options |
| --- | --- |
| OS | Linux |
| Python | 3.9 – 3.12 |
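If you plan to run vLLM on an NVIDIA GPU, a quick way to sanity-check your environment is shown below. This is only an illustrative check and assumes `nvidia-smi` and `python3` are on your PATH:

```shell
# Confirm the NVIDIA driver and CUDA runtime are visible
nvidia-smi

# Confirm the Python version falls within the supported range (3.9 – 3.12)
python3 --version
```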

### Step 2: Install vLLM

If you're using an NVIDIA GPU, you can install vLLM directly via pip. However, we recommend using uv, a fast Python environment manager, to create and manage your Python environments. Follow the official guide to install uv. Once installed, you can create a new Python environment and install vLLM with the following commands:

```shell
uv venv myenv --python 3.12 --seed
source myenv/bin/activate
uv pip install vllm
```

Alternatively, you can use uv run with the --with [dependency] option to run commands like vllm serve without creating a dedicated environment:

```shell
uv run --with vllm vllm --help
```
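The same approach works for any vLLM subcommand. For instance, as a sketch using the model featured later in this guide, you could launch the server ad hoc without a persistent environment:

```shell
# Run "vllm serve" in a throwaway environment that includes vLLM
uv run --with vllm vllm serve Qwen/Qwen2.5-1.5B-Instruct
```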

You can also use conda to manage your Python environment:

```shell
conda create -n myenv python=3.12 -y
conda activate myenv
pip install vllm
```

<Callout type={'note'}> For non-CUDA platforms, please refer to the official documentation for installation instructions. </Callout>

### Step 3: Start the Local Server

vLLM can be deployed as a server compatible with the OpenAI API protocol. By default, it starts at http://localhost:8000. You can customize the address using the --host and --port parameters. Note that the server can only run one model at a time.

The following command starts a vLLM server running the Qwen2.5-1.5B-Instruct model:

```shell
vllm serve Qwen/Qwen2.5-1.5B-Instruct
```
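Once the server reports it is ready, you can optionally confirm that the OpenAI-compatible API is responding by listing the loaded model. This assumes the default address; adjust the URL if you changed `--host` or `--port`:

```shell
# List the model(s) served by the local vLLM instance
curl http://localhost:8000/v1/models
```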

To enable API key authentication, you can pass the --api-key parameter or set the VLLM_API_KEY environment variable. If not set, the server will be accessible without an API key.
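For example, either of the following starts the same server with authentication enabled (the key value below is a placeholder):

```shell
# Option 1: pass the key as a CLI flag
vllm serve Qwen/Qwen2.5-1.5B-Instruct --api-key sk-placeholder-key

# Option 2: set the key via an environment variable
VLLM_API_KEY=sk-placeholder-key vllm serve Qwen/Qwen2.5-1.5B-Instruct
```

Clients (including LobeHub) must then send this key as a Bearer token with each request.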

<Callout type={'note'}> For more detailed server configuration options, refer to the official vLLM documentation. </Callout>
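Before moving on to LobeHub, you can optionally send a test chat completion to the OpenAI-compatible endpoint. This is a minimal sketch assuming the default address and the model above; omit the Authorization header if no API key is configured:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-placeholder-key" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```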

### Step 4: Configure vLLM in LobeHub

- Open the App Settings panel in LobeHub
- Under AI Providers, locate the vLLM configuration section

<Image alt={'Enter vLLM API Key'} inStep src={'/blog/assets02dce7325584974cdba327fe2f996b9e.webp'} />

- Enable the vLLM provider and enter the API service URL and API key

<Callout type={'warning'}> * If your vLLM server is not configured with an API key, leave the API key field blank.
* If your vLLM server is running locally, make sure to enable "Client Request Mode". </Callout>

- Add the model you are running to the model list below

- Assign the vLLM model to your assistant to start chatting

  <Image alt={'Select vLLM Model'} inStep src={'/blog/assets8477415ecec1f37e38ab38ff1217d0a7.webp'} />

</Steps>

You're now ready to use vLLM-powered models in LobeHub for conversations.