docs/contributing/model/tests.md
This page explains how to write unit tests to verify the implementation of your model.
## Required Tests

These tests are necessary to get your PR merged into the vLLM library. Without them, the CI for your PR will fail.
Include an example Hugging Face repository for your model in `tests/models/registry.py`. This enables a unit test that loads dummy weights to ensure that the model can be initialized in vLLM.
!!! important
    The list of models in each section should be maintained in alphabetical order.
!!! tip
    If your model requires a development version of HF Transformers, you can set
    `min_transformers_version` to skip the test in CI until the model is released.
## Optional Tests

These tests are optional for getting your PR merged into the vLLM library. Passing them provides more confidence that your implementation is correct, and helps avoid future regressions.
These tests compare the model outputs of vLLM against HF Transformers. You can add new tests under the subdirectories of `tests/models`.
For generative models, there are two levels of correctness tests, as defined in `tests/models/utils.py`:
- `check_outputs_equal`: The text output by vLLM should exactly match the text output by HF.
- `check_logprobs_close`: The logprobs output by vLLM should be in the top-k logprobs output by HF, and vice versa.

For pooling models, we simply check the cosine similarity, as defined in `tests/models/utils.py`.
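The top-k check can be illustrated with a small standalone sketch. This is a simplification, not the actual `check_logprobs_close` implementation; the real helper in `tests/models/utils.py` also handles prompt indices, tolerances, and error reporting:

```python
def logprobs_close(vllm_steps, hf_steps):
    """Simplified sketch of the top-k logprobs check.

    At every generation step, the token sampled by one runner must
    appear among the top-k candidate tokens reported by the other,
    and vice versa. Each step is (sampled_token_id, {token_id: logprob}).
    """
    for (vllm_tok, vllm_top), (hf_tok, hf_top) in zip(vllm_steps, hf_steps):
        if vllm_tok != hf_tok:
            # Divergence is tolerated only if each side's choice was
            # still a plausible candidate for the other side.
            if vllm_tok not in hf_top or hf_tok not in vllm_top:
                return False
    return True

# Step 0 matches exactly; step 1 diverges but stays within top-k,
# so the sequences are still considered "close".
vllm_steps = [(5, {5: -0.1, 7: -2.3}), (7, {7: -0.4, 9: -1.1})]
hf_steps = [(5, {5: -0.2, 7: -2.0}), (9, {9: -0.5, 7: -0.9})]
assert logprobs_close(vllm_steps, hf_steps)
```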
Adding your model to `tests/models/multimodal/processing/test_common.py` verifies that vLLM's multi-modal processing produces the same outputs across equivalent input combinations.
You can add a new file under `tests/models/multimodal/processing` to run tests that only apply to your model.
For example, if the HF processor for your model accepts user-specified keyword arguments, you can verify that the keyword arguments are being applied correctly, such as in `tests/models/multimodal/processing/test_phi3v.py`.
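As an illustration only (the real test uses vLLM's processing machinery and the actual HF processor), such a kwargs check boils down to asserting that changing the keyword argument visibly changes the processed output. The `fake_processor` and its `num_crops` argument below are hypothetical stand-ins:

```python
def fake_processor(image_size, num_crops=4):
    """Hypothetical stand-in for an HF processor that accepts a
    user-specified keyword argument (here, an assumed `num_crops`)."""
    # More crops -> more image placeholder tokens in the prompt.
    tokens_per_crop = 16
    return {"num_image_tokens": num_crops * tokens_per_crop}

def test_kwargs_are_applied():
    base = fake_processor(image_size=336)
    more = fake_processor(image_size=336, num_crops=16)
    # The override must be reflected in the processed outputs;
    # if it were silently dropped, both results would be identical.
    assert more["num_image_tokens"] > base["num_image_tokens"]
    assert more["num_image_tokens"] == 16 * 16

test_kwargs_are_applied()
```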