benchmarking/vllm/README.md
This directory contains benchmarks for running batch inference with vLLM.
Dependencies: `vllm`, `daft`, `ray`.

Run the `generate_data.ipynb` notebook to generate the datasets, then update `config.py` with the correct dataset path and other parameters.

Benchmark scripts:

- `naive-batch.py`: Simple batch inference using a Daft batch function and vLLM's `LLM` class.
- `naive-batch-sorted.py`: Same as `naive-batch.py`, but with the prompts sorted.
- `continuous-batch.py`: Continuous batching using the `vllm-prefix-caching` provider, with prefix routing disabled.
- `continuous-batch-sorted.py`: Same as `continuous-batch.py`, but with the prompts sorted.
- `prefix-bucketing.py`: Both continuous batching and prefix bucketing using the `vllm-prefix-caching` provider.
- `ray-data.py`: Ray Data batching using `ray.data.llm.build_llm_processor`.
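The sorted and prefix-bucketing variants both try to keep prompts that share a prefix next to each other, so vLLM's prefix cache sees more hits. A minimal pure-Python sketch of the bucketing idea (the `bucket_prompts` helper and the fixed prefix length are illustrative assumptions, not the actual logic in `prefix-bucketing.py`):

```python
from collections import defaultdict

def bucket_prompts(prompts: list[str], prefix_len: int = 16) -> list[str]:
    """Group prompts by a fixed-length prefix so prompts sharing a prefix
    are dispatched consecutively (illustrative only; the real script's
    routing may use a different bucketing scheme)."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for p in prompts:
        buckets[p[:prefix_len]].append(p)
    # Emit buckets back-to-back; within a bucket, preserve arrival order.
    return [p for key in sorted(buckets) for p in buckets[key]]

prompts = [
    "Summarize: article A",
    "Translate: bonjour",
    "Summarize: article B",
    "Translate: hola",
]
print(bucket_prompts(prompts, prefix_len=10))
# → ['Summarize: article A', 'Summarize: article B',
#    'Translate: bonjour', 'Translate: hola']
```

Sorting the prompts (as in the `*-sorted.py` variants) achieves a similar effect globally, while bucketing preserves the original order within each group.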