docs/README.md
vLLM is a fast and easy-to-use library for LLM inference and serving.
Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has grown into one of the most active open-source AI projects, built and maintained by a community of over 2,000 contributors spanning many dozens of academic institutions and companies.
Where to get started with vLLM depends on the type of user: running models, building applications on top of vLLM, and contributing to vLLM itself each have their own entry point in the documentation.
For information about the ongoing development of vLLM, see the project roadmap and release notes.
vLLM is fast with:

- State-of-the-art serving throughput
- Efficient management of attention key and value memory with PagedAttention
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graphs
- Quantization: GPTQ, AWQ, INT4, INT8, and FP8
- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
- Speculative decoding
- Chunked prefill
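PagedAttention, the technique vLLM is best known for, divides the KV cache into fixed-size blocks and gives each sequence a block table that maps logical token positions to physical blocks, much like virtual-memory paging. A toy Python sketch of that bookkeeping (an illustration of the memory-management idea only; the class names, block size, and logic here are invented for exposition and are not vLLM's actual implementation):

```python
# Toy sketch of the PagedAttention idea: the KV cache is split into
# fixed-size blocks, and each sequence maps logical block indices to
# physical block ids via a block table. Blocks are allocated lazily,
# so memory is never reserved for tokens that were not generated.

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative; vLLM uses e.g. 16)

class BlockAllocator:
    """Pool of free physical KV-cache blocks."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()

    def release(self, blocks):
        self.free.extend(blocks)

class Sequence:
    """Tracks one request's logical-to-physical block mapping."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []   # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

allocator = BlockAllocator(num_blocks=8)
seq = Sequence(allocator)
for _ in range(6):               # 6 tokens need ceil(6/4) = 2 blocks
    seq.append_token()
print(len(seq.block_table))      # 2 blocks mapped for this sequence
print(len(allocator.free))       # 6 blocks remain free for other sequences
```

Because allocation happens block by block, fragmentation is bounded by one block per sequence, which is what lets vLLM pack far more concurrent requests into the same GPU memory than contiguous per-request KV allocations would allow.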
vLLM is flexible and easy to use with:

- Seamless integration with popular HuggingFace models
- High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
- Tensor, pipeline, data, and expert parallelism support for distributed inference
- Streaming outputs
- An OpenAI-compatible API server
- Support for NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPUs, and AWS Neuron
- Prefix caching support
- Multi-LoRA support
vLLM seamlessly supports 200+ model architectures on HuggingFace, including:

- Transformer-like LLMs (e.g., Llama)
- Mixture-of-Experts LLMs (e.g., Mixtral, DeepSeek-V2 and V3)
- Embedding models (e.g., E5-Mistral)
- Multi-modal LLMs (e.g., LLaVA)
Find the full list of supported models here.
For more information, see the vLLM announcing blog post and the vLLM paper, "Efficient Memory Management for Large Language Model Serving with PagedAttention" (SOSP 2023).