docs/cross_encoder/pretrained_models.md
We have released various pre-trained Cross Encoder models via our Cross Encoder Hugging Face organization. Additionally, numerous community Cross Encoder models have been publicly released on the Hugging Face Hub.
* **Original models**: `Cross Encoder Hugging Face organization <https://huggingface.co/models?library=sentence-transformers&author=cross-encoder>`_.
* **Community models**: `All Cross Encoder models on Hugging Face <https://huggingface.co/models?library=sentence-transformers&pipeline_tag=text-ranking>`_.
Each of these models can be easily downloaded and used like so:
from sentence_transformers import CrossEncoder
import torch
# Load https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", activation_fn=torch.nn.Sigmoid())
scores = model.predict([
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
])
# => array([0.9998173 , 0.01312432], dtype=float32)
Cross-Encoders require pairs as inputs and output a score (0 to 1 if the Sigmoid activation function is used). Most models work with text pairs, but some also support non-text inputs such as images (see Multimodal Rerankers). Cross-Encoders do not work for individual sentences and they don't compute embeddings for individual texts.
MS MARCO Passage Retrieval is a large dataset of real user queries from the Bing search engine, annotated with relevant text passages. Models trained on this dataset are very effective as rerankers for search systems.
.. note::
You can initialize these models with ``activation_fn=torch.nn.Sigmoid()`` to force the model to return scores between 0 and 1. Otherwise, the raw value can reasonably range between -10 and 10.
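As a rough illustration of what the Sigmoid activation does to raw scores, the sketch below applies a plain-Python logistic function to a few made-up logits in the -10 to 10 range. The `sigmoid` helper and the example values are assumptions for illustration only; they are not part of the library.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical raw scores (logits), chosen to span the typical -10..10 range.
raw_scores = [-10.0, -2.5, 0.0, 4.0, 10.0]
probabilities = [sigmoid(s) for s in raw_scores]

for raw, prob in zip(raw_scores, probabilities):
    print(f"{raw:6.1f} -> {prob:.6f}")

# Sigmoid is strictly increasing, so sorting by either value yields the same order.
order_raw = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i], reverse=True)
order_prob = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
print(order_raw == order_prob)  # True
```

Because the mapping is monotonic, the activation changes how the scores are scaled but not how candidates are ordered.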
| Model Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec |
|---|---|---|---|
| cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 |
| cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 |
| cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 |
| cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 |
| cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 |
| cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 |
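
To give a feel for the MRR@10 column above, here is a minimal sketch of how mean reciprocal rank at a cutoff is commonly computed. The `mrr_at_k` helper and the relevance flags are made up for illustration, and the evaluation script behind the table may differ in details. The values in the table appear to be scaled by 100, so the sketch does the same.

```python
def mrr_at_k(ranked_relevance: list[list[bool]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant passage within the top k."""
    if not ranked_relevance:
        return 0.0
    total = 0.0
    for flags in ranked_relevance:
        for rank, is_relevant in enumerate(flags[:k], start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # only the first relevant passage counts
    return total / len(ranked_relevance)


# Made-up relevance flags for three queries, listed in reranked order.
toy_rankings = [
    [True, False, False],   # first relevant passage at rank 1 -> 1.0
    [False, False, True],   # first relevant passage at rank 3 -> 1/3
    [False, False, False],  # no relevant passage in the list -> 0
]
print(round(100 * mrr_at_k(toy_rankings), 2))  # 44.44
```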
For details on the usage, see Retrieve & Re-Rank.
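The reranking step itself can be sketched in a few lines: build one (query, passage) pair per candidate, score the pairs, and sort by score. In the sketch below, `rerank` and `toy_overlap_scorer` are hypothetical helpers, and the word-overlap scorer is only a stand-in so that the example runs without downloading a model. In practice, the `predict` method shown earlier would be passed as `score_pairs` instead.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    passages: Sequence[str],
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_k: int = 3,
) -> list[tuple[float, str]]:
    """Score every (query, passage) pair and return the top_k passages, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(scores, passages), key=lambda item: item[0], reverse=True)
    return [(float(score), passage) for score, passage in ranked[:top_k]]


def toy_overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Stand-in scorer (NOT a Cross Encoder): fraction of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        scores.append(len(query_words & passage_words) / len(query_words))
    return scores


candidates = [
    "Berlin is well known for its museums.",
    "About 3.5 million people live in Berlin today.",
    "Paris is the capital of France.",
]
results = rerank("how many people live in berlin", candidates, toy_overlap_scorer, top_k=2)
for score, passage in results:
    print(f"{score:.2f}  {passage}")
```

Passing the scorer as a callable keeps the sorting logic independent of any particular model, so the same function works with whichever reranker from the table is chosen.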
QNLI is based on the SQuAD dataset (HF) and was introduced by the GLUE Benchmark (HF). Given a passage from Wikipedia, annotators created questions that are answerable by that passage. These models output higher scores if a passage answers a question.
| Model Name | Accuracy on QNLI dev set |
|---|---|
| cross-encoder/qnli-distilroberta-base | 90.96 |
| cross-encoder/qnli-electra-base | 93.21 |
The following models can be used like this:
from sentence_transformers import CrossEncoder
model = CrossEncoder("cross-encoder/stsb-roberta-base")
scores = model.predict([
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
])
# => array([0.60443085, 0.00240758], dtype=float32)
They return a score between 0 and 1 indicating the semantic similarity of the given sentence pair.
| Model Name | STSbenchmark Test Performance |
|---|---|
| cross-encoder/stsb-TinyBERT-L4 | 85.50 |
| cross-encoder/stsb-distilroberta-base | 87.92 |
| cross-encoder/stsb-roberta-base | 90.17 |
| cross-encoder/stsb-roberta-large | 91.47 |
These models have been trained on the Quora duplicate questions dataset. They can be used like the STSb models and return a score between 0 and 1 indicating the probability that two questions are duplicates.
| Model Name | Average Precision dev set |
|---|---|
| cross-encoder/quora-distilroberta-base | 87.48 |
| cross-encoder/quora-roberta-base | 87.80 |
| cross-encoder/quora-roberta-large | 87.91 |
.. note::
These models don't work for question similarity. The questions "How to learn Java?" and "How to learn Python?" will get a low score, as they are not duplicates. For question similarity, a :class:`~sentence_transformers.sentence_transformer.model.SentenceTransformer` trained on the Quora dataset will yield much more meaningful results.
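The Average Precision column in the table above measures how consistently true duplicate pairs are scored above non-duplicates. Below is a minimal sketch of one common way to compute it. The `average_precision` helper, the labels, and the scores are all made up for illustration, and the evaluation code behind the table may differ.

```python
def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean of precision@k over the ranks k at which a true duplicate appears."""
    ranked = sorted(zip(scores, labels), key=lambda item: item[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Made-up duplicate labels (1 = duplicate) and model scores for five question pairs.
labels = [1, 0, 1, 0, 1]
scores = [0.95, 0.80, 0.60, 0.30, 0.10]
print(round(100 * average_precision(labels, scores), 2))  # 75.56
```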
Given two sentences, do they contradict each other, does one entail the other, or are they neutral? The following models were trained on the SNLI and MultiNLI datasets.
| Model Name | Accuracy on MNLI mismatched set |
|---|---|
| cross-encoder/nli-deberta-v3-base | 90.04 |
| cross-encoder/nli-deberta-base | 88.08 |
| cross-encoder/nli-deberta-v3-xsmall | 87.77 |
| cross-encoder/nli-deberta-v3-small | 87.55 |
| cross-encoder/nli-roberta-base | 87.47 |
| cross-encoder/nli-MiniLM2-L6-H768 | 86.89 |
| cross-encoder/nli-distilroberta-base | 83.98 |
from sentence_transformers import CrossEncoder
model = CrossEncoder("cross-encoder/nli-deberta-v3-base")
scores = model.predict([
    ("A man is eating pizza", "A man eats something"),
    ("A black race car starts up in front of a crowd of people.", "A man is driving down a lonely road."),
])
# Convert scores to labels
label_mapping = ["contradiction", "entailment", "neutral"]
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
# => ['entailment', 'contradiction']
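The NLI models return one score per label for each pair, which is why the snippet above takes an argmax over the label axis. If probabilities are wanted rather than just the winning label, a softmax over each row is one option. The sketch below uses a plain-Python `softmax` helper and hypothetical logits, both of which are assumptions for illustration. The label order is the one used above.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


label_mapping = ["contradiction", "entailment", "neutral"]

# Hypothetical logits for two sentence pairs, one row per pair.
toy_logits = [
    [-3.0, 5.0, 0.5],
    [4.0, -2.0, 1.0],
]
for row in toy_logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(label_mapping[best], round(probs[best], 3))
```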
Multimodal rerankers can score pairs involving different modalities such as images, video, audio, and text. These models use the same :class:`~sentence_transformers.base.modules.Transformer` + :class:`~sentence_transformers.cross_encoder.modules.LogitScore` architecture as text-only decoder rerankers, but with a multimodal backbone that can process non-text inputs. You can check whether a model supports a given modality using :attr:`~sentence_transformers.cross_encoder.model.CrossEncoder.modalities` and :meth:`~sentence_transformers.cross_encoder.model.CrossEncoder.supports`.
Here are some community models:
See Cross Encoder > Usage for usage examples, and the training scripts in examples/cross_encoder/training/multimodal/ to fine-tune your own multimodal reranker.
Some notable models from the Community include: