# Using vLLM in LobeHub

<Image alt={'Using vLLM in LobeHub'} cover src={'/blog/assets1049abec5850cebf8ce12cd50199b9c5.webp'} />

vLLM is an open-source local deployment tool for large language models (LLMs). It allows users to efficiently run LLMs on their local machines and provides an OpenAI-compatible API interface.

This guide will walk you through how to use vLLM in LobeHub:

<Steps>

### Step 1: Prerequisites

vLLM has specific hardware and software requirements. Please ensure your environment meets the following:

**Hardware Requirements**

| Component | Supported Options |
| --- | --- |
| GPU | NVIDIA CUDA, AMD ROCm, Intel XPU |
| CPU | Intel/AMD x86, ARM AArch64, Apple silicon |
| Other AI Accelerators | Google TPU, Intel Gaudi, AWS Neuron, OpenVINO |

**Software Requirements**

| Requirement | Supported Options |
| --- | --- |
| OS | Linux |
| Python | 3.9 – 3.12 |
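If you plan to run vLLM on an NVIDIA GPU, a quick way to sanity-check your environment is shown below. This is only an illustrative check and assumes `nvidia-smi` and `python3` are on your PATH:

```shell
# Confirm the NVIDIA driver and CUDA runtime are visible
nvidia-smi

# Confirm the Python version falls within the supported range (3.9 – 3.12)
python3 --version
```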

### Step 2: Install vLLM

If you're using an NVIDIA GPU, you can install vLLM directly via pip. However, we recommend using uv, a fast Python environment manager, to create and manage your Python environments. Follow the official guide to install uv. Once installed, you can create a new Python environment and install vLLM with the following commands:

```shell
uv venv myenv --python 3.12 --seed
source myenv/bin/activate
uv pip install vllm
```

Alternatively, you can use uv run with the --with [dependency] option to run commands like vllm serve without creating a dedicated environment:

```shell
uv run --with vllm vllm --help
```
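The same approach works for any vLLM subcommand. For instance, as a sketch using the model featured later in this guide, you could launch the server ad hoc without a persistent environment:

```shell
# Run "vllm serve" in a throwaway environment that includes vLLM
uv run --with vllm vllm serve Qwen/Qwen2.5-1.5B-Instruct
```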

You can also use conda to manage your Python environment:

```shell
conda create -n myenv python=3.12 -y
conda activate myenv
pip install vllm
```

<Callout type={'note'}> For non-CUDA platforms, please refer to the official documentation for installation instructions. </Callout>

### Step 3: Start the Local Server

vLLM can be deployed as a server compatible with the OpenAI API protocol. By default, it starts at http://localhost:8000. You can customize the address using the --host and --port parameters. Note that the server can only run one model at a time.

The following command starts a vLLM server running the Qwen2.5-1.5B-Instruct model:

```shell
vllm serve Qwen/Qwen2.5-1.5B-Instruct
```
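Once the server reports it is ready, you can optionally confirm that the OpenAI-compatible API is responding by listing the loaded model. This assumes the default address; adjust the URL if you changed `--host` or `--port`:

```shell
# List the model(s) served by the local vLLM instance
curl http://localhost:8000/v1/models
```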

To enable API key authentication, you can pass the --api-key parameter or set the VLLM_API_KEY environment variable. If not set, the server will be accessible without an API key.
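For example, either of the following starts the same server with authentication enabled (the key value below is a placeholder):

```shell
# Option 1: pass the key as a CLI flag
vllm serve Qwen/Qwen2.5-1.5B-Instruct --api-key sk-placeholder-key

# Option 2: set the key via an environment variable
VLLM_API_KEY=sk-placeholder-key vllm serve Qwen/Qwen2.5-1.5B-Instruct
```

Clients (including LobeHub) must then send this key as a Bearer token with each request.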

<Callout type={'note'}> For more detailed server configuration options, refer to the official vLLM documentation. </Callout>
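Before moving on to LobeHub, you can optionally send a test chat completion to the OpenAI-compatible endpoint. This is a minimal sketch assuming the default address and the model above; omit the Authorization header if no API key is configured:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-placeholder-key" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```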

### Step 4: Configure vLLM in LobeHub

- Open the App Settings panel in LobeHub
- Under AI Providers, locate the vLLM configuration section

<Image alt={'Enter vLLM API Key'} inStep src={'/blog/assets02dce7325584974cdba327fe2f996b9e.webp'} />

- Enable the vLLM provider and enter the API service URL and API key

<Callout type={'warning'}> * If your vLLM server is not configured with an API key, leave the API key field blank.
* If your vLLM server is running locally, make sure to enable "Client Request Mode". </Callout>

- Add the model you are running to the model list below

- Assign the vLLM model to your assistant to start chatting

  <Image alt={'Select vLLM Model'} inStep src={'/blog/assets8477415ecec1f37e38ab38ff1217d0a7.webp'} />

</Steps>

You're now ready to use vLLM-powered models in LobeHub for conversations.