# LMCache Examples

This folder demonstrates how to use LMCache for disaggregated prefilling, CPU offloading, and KV cache sharing.
## 1. Disaggregated Prefill in vLLM v1

This example demonstrates how to run LMCache with disaggregated prefill using NIXL on a single node.

### Prerequisites

Install LMCache with `pip install lmcache`.

### Usage

Run `cd disagg_prefill_lmcache_v1` to get into the `disagg_prefill_lmcache_v1` folder, and then run `bash disagg_example_nixl.sh` to run disaggregated prefill and benchmark the performance.
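Once the example is up, you can exercise the pipeline by sending a completion request to the proxy. The port (`9000`), endpoint path, and model name below are assumptions for illustration; check `disagg_proxy_server.py` and your launch configuration for the actual values:

```python
import json
import urllib.request

# Assumed endpoint: this sketch presumes the proxy listens on localhost:9000
# and exposes an OpenAI-compatible /v1/completions route.
PROXY_URL = "http://localhost:9000/v1/completions"


def build_request(prompt: str, model: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build an OpenAI-style completion request targeting the proxy."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder model name; use the model your servers were launched with.
    req = build_request("The capital of France is", "your-model-name")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(json.loads(resp.read())["choices"][0]["text"])
    except OSError as err:
        print(f"Proxy not reachable ({err}); start disagg_example_nixl.sh first.")
```

The proxy forwards the request through the prefiller before the decoder streams the completion, so a single request like this is enough to verify the disaggregated path end to end.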
### Components

#### Server Scripts

- `disagg_prefill_lmcache_v1/disagg_vllm_launcher.sh` - Launches individual vLLM servers for prefill/decode, and also launches the proxy server.
- `disagg_prefill_lmcache_v1/disagg_proxy_server.py` - FastAPI proxy server that coordinates between prefiller and decoder
- `disagg_prefill_lmcache_v1/disagg_example_nixl.sh` - Main script to run the example

#### Configuration

- `disagg_prefill_lmcache_v1/configs/lmcache-prefiller-config.yaml` - Configuration for prefiller server
- `disagg_prefill_lmcache_v1/configs/lmcache-decoder-config.yaml` - Configuration for decoder server

#### Log Files

The main script generates several log files:
- `prefiller.log` - Logs from the prefill server
- `decoder.log` - Logs from the decode server
- `proxy.log` - Logs from the proxy server

## 2. CPU Offload Examples

- `python cpu_offload_lmcache.py -v v0` - CPU offloading implementation for vLLM v0
- `python cpu_offload_lmcache.py -v v1` - CPU offloading implementation for vLLM v1

## 3. KV Cache Sharing

The `kv_cache_sharing_lmcache_v1.py` example demonstrates how to share KV caches between vLLM v1 instances.
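The examples above are driven by LMCache's YAML configuration (the same mechanism as the `configs/` files shipped with the disaggregated prefill example). As a rough sketch only, a config enabling CPU offloading might look like the following; the key names and values here are assumptions, so consult the shipped config files and the LMCache documentation for the authoritative schema:

```yaml
# Hypothetical LMCache config sketch: offload KV cache chunks to CPU memory.
chunk_size: 256          # assumed key: tokens per KV cache chunk
local_cpu: true          # assumed key: enable CPU-memory offloading
max_local_cpu_size: 5    # assumed key: CPU memory budget (GiB) for cached KV
```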
## 4. Disaggregated Prefill in vLLM v0

The `disaggregated_prefill_lmcache_v0.py` example shows how to run disaggregated prefill in vLLM v0.