docs/models/pooling_models/classify.md
Classification involves predicting which predefined category, class, or label best corresponds to a given input.
| Pooling task | Offline APIs | Online APIs |
|---|---|---|
| `classify` | `LLM.classify(...)`, `LLM.encode(..., pooling_task="classify")` | `/classify`, `/pooling` |

The key distinction between (sequence) classification and token classification lies in their output granularity: (sequence) classification produces a single result for an entire input sequence, whereas token classification yields a result for each individual token within the sequence.
Many classification models support both (sequence) classification and token classification. For further details on token classification, please refer to this page.
A classification model can be used as a scoring model (with its scoring API enabled) only when it outputs `num_labels` equal to 1. For details, please refer to this page.
The most fundamental application of classification models is to categorize input data into predefined classes.
| Architecture | Models | Example HF Models | LoRA | PP |
|---|---|---|---|---|
| `ErnieForSequenceClassification` | BERT-like Chinese ERNIE | `Forrest20231206/ernie-3.0-base-zh-cls` | | |
| `GPT2ForSequenceClassification` | GPT2 | `nie3e/sentiment-polish-gpt2-small` | | |
| `Qwen2ForSequenceClassification`<sup>C</sup> | Qwen2-based | `jason9693/Qwen2.5-1.5B-apeach` | | |
| `*Model`<sup>C</sup>, `*ForCausalLM`<sup>C</sup>, etc. | Generative models | N/A | * | * |
!!! note
    For more information about multimodal model inputs, see this page.
| Architecture | Models | Inputs | Example HF Models | LoRA | PP |
|---|---|---|---|---|---|
| `Qwen2_5_VLForSequenceClassification`<sup>C</sup> | Qwen2_5_VL-based | T + I<sup>E+</sup> + V<sup>E+</sup> | `muziyongshixin/Qwen2.5-VL-7B-for-VideoCls` | | |
| `*ForConditionalGeneration`<sup>C</sup>, `*ForCausalLM`<sup>C</sup>, etc. | Generative models | * | N/A | * | * |
<sup>C</sup> Automatically converted into a classification model via --convert classify. (details)
* Feature support is the same as that of the original model.
If your model is not in the above list, we will try to automatically convert the model using [as_seq_cls_model][vllm.model_executor.models.adapters.as_seq_cls_model]. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
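The idea behind this conversion can be illustrated with a plain-Python sketch (the helper names below are illustrative, not vLLM internals): take the hidden state of the last token, apply a linear classification head, then softmax the resulting logits into class probabilities.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def classify_from_last_token(hidden_states, weight, bias):
    """Apply a linear head to the last token's hidden state, then softmax.

    hidden_states: per-token hidden vectors (list of lists)
    weight: num_labels x hidden_size matrix; bias: num_labels vector
    """
    last = hidden_states[-1]  # pooling: keep only the final token
    logits = [
        sum(w * h for w, h in zip(row, last)) + b
        for row, b in zip(weight, bias)
    ]
    return softmax(logits)


# Toy example: hidden size 3, two labels.
probs = classify_from_last_token(
    hidden_states=[[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]],
    weight=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    bias=[0.0, 0.0],
)
print(probs)  # two probabilities summing to 1
```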
Cross-encoder (aka reranker) models are a subset of classification models that accept two prompts as input and output num_labels equal to 1. Most classification models can also be used as cross-encoder models. For more information on cross-encoder models, please refer to this page.
--8<-- "docs/models/pooling_models/scoring.md:supported-cross-encoder-models"
(Sequence) classification models can also be used as reward models. For more information, see Reward Models.
--8<-- "docs/models/pooling_models/reward.md:supported-sequence-reward-models"
The following [pooling parameters][vllm.PoolingParams] are supported.
--8<-- "vllm/pooling_params.py:common-pooling-params"
--8<-- "vllm/pooling_params.py:classify-pooling-params"
### `LLM.classify`

The [classify][vllm.LLM.classify] method outputs a probability vector for each prompt.
```python
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach", runner="pooling")
(output,) = llm.classify("Hello, my name is")

probs = output.outputs.probs
print(f"Class Probabilities: {probs!r} (size={len(probs)})")
```
A code example can be found here: examples/basic/offline_inference/classify.py
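Since `output.outputs.probs` is just a list of floats, picking the predicted class is a one-liner (a sketch using a hard-coded vector in place of real model output):

```python
probs = [0.566, 0.434]  # stand-in for output.outputs.probs
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # index of the most probable class
```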
### `LLM.encode`

The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM. Set `pooling_task="classify"` when using `LLM.encode` for classification models:
```python
from vllm import LLM

llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach", runner="pooling")
(output,) = llm.encode("Hello, my name is", pooling_task="classify")

data = output.outputs.data
print(f"Data: {data!r}")
```
### `/classify` API

The online `/classify` API is similar to `LLM.classify`.
The following Classification API parameters are supported:
??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:completion-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-params"
    ```
The following extra parameters are supported:
??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:completion-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
    ```
For chat-like input (i.e. if messages is passed), the following parameters are supported:
??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:chat-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-params"
    ```
These extra parameters are supported instead:

??? code

    ```python
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:pooling-common-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:chat-extra-params"
    --8<-- "vllm/entrypoints/pooling/base/protocol.py:classify-extra-params"
    ```
Code example: examples/pooling/classify/classification_online.py
You can classify multiple texts by passing an array of strings:
curl -v "http://127.0.0.1:8000/classify" \
-H "Content-Type: application/json" \
-d '{
"model": "jason9693/Qwen2.5-1.5B-apeach",
"input": [
"Loved the new café—coffee was great.",
"This update broke everything. Frustrating."
]
}'
??? console "Response"

    ```json
    {
      "id": "classify-7c87cac407b749a6935d8c7ce2a8fba2",
      "object": "list",
      "created": 1745383065,
      "model": "jason9693/Qwen2.5-1.5B-apeach",
      "data": [
        {
          "index": 0,
          "label": "Default",
          "probs": [
            0.565970778465271,
            0.4340292513370514
          ],
          "num_classes": 2
        },
        {
          "index": 1,
          "label": "Spoiled",
          "probs": [
            0.26448777318000793,
            0.7355121970176697
          ],
          "num_classes": 2
        }
      ],
      "usage": {
        "prompt_tokens": 20,
        "total_tokens": 20,
        "completion_tokens": 0,
        "prompt_tokens_details": null
      }
    }
    ```
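Client-side, the `probs` vector of each item can be reduced to a predicted class index with an argmax. A sketch assuming a response shaped like the one above (abbreviated here to the fields it uses):

```python
import json

# Abbreviated stand-in for the JSON body returned by /classify.
response = json.loads("""
{
  "data": [
    {"index": 0, "label": "Default", "probs": [0.566, 0.434], "num_classes": 2},
    {"index": 1, "label": "Spoiled", "probs": [0.264, 0.736], "num_classes": 2}
  ]
}
""")

preds = []
for item in response["data"]:
    probs = item["probs"]
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
    preds.append(pred)
    print(item["index"], item["label"], pred)
```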
You can also pass a string directly to the input field:
curl -v "http://127.0.0.1:8000/classify" \
-H "Content-Type: application/json" \
-d '{
"model": "jason9693/Qwen2.5-1.5B-apeach",
"input": "Loved the new café—coffee was great."
}'
??? console "Response"

    ```json
    {
      "id": "classify-9bf17f2847b046c7b2d5495f4b4f9682",
      "object": "list",
      "created": 1745383213,
      "model": "jason9693/Qwen2.5-1.5B-apeach",
      "data": [
        {
          "index": 0,
          "label": "Default",
          "probs": [
            0.565970778465271,
            0.4340292513370514
          ],
          "num_classes": 2
        }
      ],
      "usage": {
        "prompt_tokens": 10,
        "total_tokens": 10,
        "completion_tokens": 0,
        "prompt_tokens_details": null
      }
    }
    ```
More examples can be found here: examples/pooling/classify
You can enable or disable the activation function via `use_activation`.
You can modify the problem type via `problem_type` in the Hugging Face config. The supported problem types are `single_label_classification`, `multi_label_classification`, and `regression`.
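The practical difference between the first two problem types is the activation applied to the logits: `single_label_classification` uses a softmax over all classes (the probabilities sum to 1), while `multi_label_classification` applies an element-wise sigmoid (each label is scored independently). A plain-Python sketch of the distinction:

```python
import math

logits = [2.0, -1.0, 0.5]

# single_label_classification: softmax over all classes (mutually exclusive)
m = max(logits)
exps = [math.exp(x - m) for x in logits]
softmax_probs = [e / sum(exps) for e in exps]

# multi_label_classification: independent sigmoid per label
sigmoid_probs = [1 / (1 + math.exp(-x)) for x in logits]

print(sum(softmax_probs))  # 1.0 (a single class is chosen)
print(sigmoid_probs)       # each in (0, 1); need not sum to 1
```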
This aligns the behavior with the `ForSequenceClassification` loss handling in Transformers.
Affine Score Calibration, also known as Platt Scaling (Platt, 1999), is the most widely used method for calibrating classifier outputs into well-calibrated probabilities.
The calibration follows the transformation:
```text
activation((logit - logit_mean) / logit_sigma)
```
| Parameter | Default | Description |
|---|---|---|
| `logit_mean` | `None` | Mean subtracted from logits (centers scores) |
| `logit_sigma` | `None` | Standard deviation used to scale logits after mean subtraction |
The computation order is as follows:
```python
logits -= logit_mean         # subtract mean (center scores)
logits /= logit_sigma        # divide by sigma (scale)
logits = activation(logits)  # e.g. sigmoid
```
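Put together, the calibration is just a shifted and scaled activation. A minimal sketch assuming a sigmoid activation and the example values `logit_mean=4.5`, `logit_sigma=1.0`:

```python
import math


def calibrate(logit, logit_mean=4.5, logit_sigma=1.0):
    """Affine score calibration (Platt scaling): shift, scale, then activate."""
    z = (logit - logit_mean) / logit_sigma
    return 1 / (1 + math.exp(-z))  # sigmoid activation


p_mid = calibrate(4.5)   # a logit exactly at the mean maps to 0.5
p_high = calibrate(6.5)  # two sigmas above the mean
print(p_mid, p_high)
```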
Example configuration:
```bash
--pooler-config '{"use_activation": true, "logit_mean": 4.5, "logit_sigma": 1.0}'
```
The `softmax` and `activation` options have been removed from [PoolingParams][vllm.PoolingParams]. Use `use_activation` instead, since `classify` and `token_classify` may use any activation function.
### `logit_bias` and `logit_scale`

`logit_bias` and `logit_scale` are deprecated aliases for `logit_mean` and `logit_sigma`, respectively. When `logit_scale` is used, it is automatically converted via `logit_sigma = 1 / logit_scale`. These deprecated parameters will be removed in v0.21.