# Pretrained Models

```{eval-rst}
We have released various pre-trained Cross Encoder models via our Cross Encoder Hugging Face organization. Additionally, numerous community Cross Encoder models have been publicly released on the Hugging Face Hub.

* **Original models**: `Cross Encoder Hugging Face organization <https://huggingface.co/models?library=sentence-transformers&author=cross-encoder>`_.
* **Community models**: `All Cross Encoder models on Hugging Face <https://huggingface.co/models?library=sentence-transformers&pipeline_tag=text-ranking>`_.
```

Each of these models can be easily downloaded and used like so:
```python
from sentence_transformers import CrossEncoder
import torch

# Load https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", activation_fn=torch.nn.Sigmoid())
scores = model.predict([
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
])
# => array([0.9998173 , 0.01312432], dtype=float32)
```

Cross-Encoders require pairs as inputs and output a score (between 0 and 1 if the Sigmoid activation function is used). Most models work with text pairs, but some also support non-text inputs such as images (see Multimodal Rerankers). Cross-Encoders do not work on individual sentences and do not compute embeddings for individual texts.

## MS MARCO

MS MARCO Passage Retrieval is a large dataset of real user queries from the Bing search engine, annotated with relevant text passages. Models trained on this dataset are very effective as rerankers for search systems.

```{eval-rst}
.. note::
    You can initialize these models with ``activation_fn=torch.nn.Sigmoid()`` to force the model to return scores between 0 and 1. Otherwise, the raw values typically range between -10 and 10.
```
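The mapping from raw logits to the 0-to-1 range is simply the standard logistic sigmoid; a minimal illustration in plain Python:

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw reranker logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# A strongly positive logit maps close to 1, a strongly negative one close to 0
print(sigmoid(8.0))   # ~0.99966
print(sigmoid(-8.0))  # ~0.00034
print(sigmoid(0.0))   # 0.5
```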
| Model Name | NDCG@10 (TREC DL 19) | MRR@10 (MS MARCO Dev) | Docs / Sec |
| --- | --- | --- | --- |
| cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 |
| cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 |
| cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 |
| cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 |
| cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 |
| cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 |

For details on the usage, see Retrieve & Re-Rank.
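In a retrieve-and-rerank setup, the Cross Encoder scores each (query, passage) pair and the passages are sorted by score. A minimal sketch of that reranking step, with the scoring function left pluggable — in practice it would be `model.predict`. The names `rerank`, `score_fn`, and `dummy_score_fn` are illustrative, not part of the library API:

```python
def rerank(query, passages, score_fn, top_k=3):
    """Score (query, passage) pairs and return the top_k passages sorted by score."""
    pairs = [(query, passage) for passage in passages]
    scores = score_fn(pairs)  # e.g. CrossEncoder.predict on the pairs
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Dummy scorer standing in for model.predict: favors passages mentioning "population"
def dummy_score_fn(pairs):
    return [1.0 if "population" in passage else 0.1 for _, passage in pairs]

top = rerank(
    "How many people live in Berlin?",
    ["Berlin is well known for its museums.",
     "Berlin had a population of 3,520,031 registered inhabitants."],
    dummy_score_fn,
    top_k=1,
)
print(top)  # [('Berlin had a population of 3,520,031 registered inhabitants.', 1.0)]
```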

## SQuAD (QNLI)

QNLI is based on the SQuAD dataset (HF) and was introduced by the GLUE Benchmark (HF). Given a passage from Wikipedia, annotators created questions that are answerable by that passage. These models output higher scores if a passage answers a question.

| Model Name | Accuracy on QNLI dev set |
| --- | --- |
| cross-encoder/qnli-distilroberta-base | 90.96 |
| cross-encoder/qnli-electra-base | 93.21 |

## STSbenchmark

The following models can be used like this:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-base")
scores = model.predict([
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
])
# => array([0.60443085, 0.00240758], dtype=float32)
```

They return a score between 0 and 1 indicating the semantic similarity of the given sentence pair.

| Model Name | STSbenchmark Test Performance |
| --- | --- |
| cross-encoder/stsb-TinyBERT-L4 | 85.50 |
| cross-encoder/stsb-distilroberta-base | 87.92 |
| cross-encoder/stsb-roberta-base | 90.17 |
| cross-encoder/stsb-roberta-large | 91.47 |

## Quora Duplicate Questions

These models have been trained on the Quora duplicate questions dataset. They can be used like the STSb models and give a score between 0 and 1 indicating the probability that the two questions are duplicates.
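The table below reports average precision, which summarizes how well the predicted duplicate probabilities rank true duplicates above non-duplicates. A minimal plain-Python sketch of computing it from scores and binary labels (not a library function):

```python
def average_precision(scores, labels):
    """Average precision: mean of precision@k at each rank k where a positive occurs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

# Perfect ranking: both duplicates scored above the non-duplicate
print(average_precision([0.9, 0.8, 0.1], [1, 1, 0]))  # 1.0
# A non-duplicate ranked first: precision@2 = 1/2, precision@3 = 2/3
print(average_precision([0.9, 0.8, 0.1], [0, 1, 1]))  # (0.5 + 0.6667) / 2 ≈ 0.583
```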

| Model Name | Average Precision (dev set) |
| --- | --- |
| cross-encoder/quora-distilroberta-base | 87.48 |
| cross-encoder/quora-roberta-base | 87.80 |
| cross-encoder/quora-roberta-large | 87.91 |
```{eval-rst}
.. note::
    These models do not work for question similarity. The questions "How to learn Java?" and "How to learn Python?" will get a low score, as they are not duplicates. For question similarity, a :class:`~sentence_transformers.SentenceTransformer` trained on the Quora dataset will yield much more meaningful results.
```

## NLI

Given two sentences: do they contradict each other, does one entail the other, or are they neutral? The following models were trained on the SNLI and MultiNLI datasets.

| Model Name | Accuracy on MNLI mismatched set |
| --- | --- |
| cross-encoder/nli-deberta-v3-base | 90.04 |
| cross-encoder/nli-deberta-base | 88.08 |
| cross-encoder/nli-deberta-v3-xsmall | 87.77 |
| cross-encoder/nli-deberta-v3-small | 87.55 |
| cross-encoder/nli-roberta-base | 87.47 |
| cross-encoder/nli-MiniLM2-L6-H768 | 86.89 |
| cross-encoder/nli-distilroberta-base | 83.98 |
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/nli-deberta-v3-base")
scores = model.predict([
    ("A man is eating pizza", "A man eats something"),
    ("A black race car starts up in front of a crowd of people.", "A man is driving down a lonely road."),
])

# Convert scores to labels
label_mapping = ["contradiction", "entailment", "neutral"]
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
# => ['entailment', 'contradiction']
```
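The argmax above picks the most likely of the three classes from the raw logits; to obtain class probabilities instead, one can apply a softmax per pair. A minimal plain-Python sketch — the `logits` values here are made up for illustration, not actual model output:

```python
import math

def softmax(logits):
    """Convert a list of raw logits to probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

label_mapping = ["contradiction", "entailment", "neutral"]

# Hypothetical raw logits for one pair: [contradiction, entailment, neutral]
probs = softmax([-3.2, 6.1, -1.5])
best = label_mapping[probs.index(max(probs))]
print(best)  # entailment
```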

## Multimodal Rerankers

```{eval-rst}
Multimodal rerankers can score pairs involving different modalities such as images, video, audio, and text. These models use the same :class:`~sentence_transformers.base.modules.Transformer` + :class:`~sentence_transformers.cross_encoder.modules.LogitScore` architecture as text-only decoder rerankers, but with a multimodal backbone that can process non-text inputs. You can check whether a model supports a given modality using :attr:`~sentence_transformers.cross_encoder.model.CrossEncoder.modalities` and :meth:`~sentence_transformers.cross_encoder.model.CrossEncoder.supports`.
```

Here are some community models:

See Cross Encoder > Usage for usage examples, and the training scripts in examples/cross_encoder/training/multimodal/ to fine-tune your own multimodal reranker.

## Community Models

Some notable community models include: