examples/disaggregated/lmcache/README.md
This folder demonstrates how to use LMCache with vLLM v1 for KV cache offloading, disaggregated prefilling, and KV cache sharing.
LMCache integrates with vLLM v1 in two ways:
- `LMCacheConnectorV1`: LMCache runs inside the vLLM process and is configured through environment variables or a YAML config file (`LMCACHE_CONFIG_FILE`). This is the simplest way to add single-node CPU/disk offloading.
- `LMCacheMPConnector`: LMCache runs as a standalone server (`lmcache server`) that owns the KV cache storage; one or more vLLM instances connect to it. This is the recommended mode for distributed KV storage and for sharing KV cache across instances. See the LMCache docs for the full MP setup.

CPU offloading examples:

- `python cpu_offload_lmcache.py` - CPU offloading with `LMCacheConnectorV1` for vLLM v1.
- `bash cpu_offload_lmcache_mp.sh` - CPU offloading with `LMCacheMPConnector`, using a standalone `lmcache server`. vLLM provides a built-in shortcut for this setup via `--kv-offloading-backend lmcache` and `--kv-offloading-size <GiB>`.

The `disagg_prefill_lmcache_v1` example demonstrates how to run LMCache with disaggregated prefill using NIXL on a single node.
Install LMCache with `pip install lmcache`.

Run `cd disagg_prefill_lmcache_v1` to enter the `disagg_prefill_lmcache_v1` folder, then run `bash disagg_example_nixl.sh` to launch disaggregated prefill and benchmark its performance.
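In disaggregated prefill, a single client request is typically served in two stages: a prefill instance computes the KV cache for the prompt, and a decode instance then generates the output tokens. The sketch below illustrates one common way a proxy can split a request into these two stages, by capping the prefill request at one generated token. It is an illustration only, not the code shipped in this folder; `split_request` and `example-model` are hypothetical names, and the payload fields assume an OpenAI-style completions request.

```python
import copy
import json


def split_request(request: dict) -> tuple[dict, dict]:
    """Return (prefill_request, decode_request) for one completion request.

    Assumption: the prefill stage only needs to build the KV cache, so it is
    asked for a single token and no streaming. The decode stage receives the
    original request unchanged.
    """
    prefill = copy.deepcopy(request)
    prefill["max_tokens"] = 1   # prefill only: stop after the first token
    prefill["stream"] = False   # the prefill response is not shown to the client
    decode = copy.deepcopy(request)
    return prefill, decode


if __name__ == "__main__":
    # "example-model" is a placeholder, not a real model name.
    req = {"model": "example-model", "prompt": "Hello", "max_tokens": 64, "stream": True}
    prefill_req, decode_req = split_request(req)
    print(json.dumps(prefill_req))
    print(json.dumps(decode_req))
```

A proxy following this pattern would send the first payload to the prefill server, wait for it to finish, and then forward the second payload to the decode server and relay its response to the client.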
- `disagg_prefill_lmcache_v1/disagg_vllm_launcher.sh` - Launches the individual vLLM servers for prefill and decode, as well as the proxy server.
- `disagg_prefill_lmcache_v1/disagg_proxy_server.py` - FastAPI proxy server that coordinates between the prefiller and the decoder.
- `disagg_prefill_lmcache_v1/disagg_example_nixl.sh` - Main script to run the example.
- `disagg_prefill_lmcache_v1/configs/lmcache-prefiller-config.yaml` - Configuration for the prefiller server.
- `disagg_prefill_lmcache_v1/configs/lmcache-decoder-config.yaml` - Configuration for the decoder server.

The main script generates several log files:
- `prefiller.log` - Logs from the prefill server
- `decoder.log` - Logs from the decode server
- `proxy.log` - Logs from the proxy server

The `kv_cache_sharing_lmcache_v1.py` example demonstrates how to share KV caches between vLLM v1 instances through a centralized LMCache server.
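As a rough sketch of the environment-variable configuration path, the helper below builds the settings that each vLLM instance might use to point at the same LMCache server. The variable names (`LMCACHE_CHUNK_SIZE`, `LMCACHE_LOCAL_CPU`, `LMCACHE_REMOTE_URL`, `LMCACHE_REMOTE_SERDE`), the `lm://` URL scheme, and port `8100` are assumptions based on LMCache's `LMCACHE_*` naming convention and should be checked against the LMCache docs; `lmcache_shared_env` is a hypothetical helper, not part of LMCache or vLLM.

```python
import os


def lmcache_shared_env(server_host: str, server_port: int, chunk_size: int = 256) -> dict[str, str]:
    """Build environment settings for a vLLM instance that shares KV cache
    through a central LMCache server.

    Assumption: the variable names and the "lm://" scheme are illustrative
    and must be verified against the LMCache documentation.
    """
    return {
        "LMCACHE_CHUNK_SIZE": str(chunk_size),                       # tokens per cached chunk
        "LMCACHE_LOCAL_CPU": "False",                                # keep KV on the server, not locally
        "LMCACHE_REMOTE_URL": f"lm://{server_host}:{server_port}",   # shared server address
        "LMCACHE_REMOTE_SERDE": "naive",                             # serialization format (assumed)
    }


if __name__ == "__main__":
    # Apply the same settings in every process before it creates its vLLM engine.
    os.environ.update(lmcache_shared_env("localhost", 8100))
    print(os.environ["LMCACHE_REMOTE_URL"])
```

Because every instance receives the same server address, a KV cache stored by one instance can be looked up by another instance that processes the same prompt prefix.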