We provide various pre-trained Sentence Transformers models via our Sentence Transformers Hugging Face organization. Additionally, over 6,000 community Sentence Transformers models have been publicly released on the Hugging Face Hub. All models can be found here:
* **Original models**: `Sentence Transformers Hugging Face organization <https://huggingface.co/models?library=sentence-transformers&author=sentence-transformers>`_.
* **Community models**: `All Sentence Transformer models on Hugging Face <https://huggingface.co/models?library=sentence-transformers>`_.
Each of these models can be easily downloaded and used like so:
.. sidebar:: Original Models

   For the original models from the `Sentence Transformers Hugging Face organization <https://huggingface.co/models?library=sentence-transformers&author=sentence-transformers>`_, it is not necessary to include the model author or organization prefix. For example, this snippet loads `sentence-transformers/all-mpnet-base-v2 <https://huggingface.co/sentence-transformers/all-mpnet-base-v2>`_.
```python
from sentence_transformers import SentenceTransformer

# Load https://huggingface.co/sentence-transformers/all-mpnet-base-v2
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
embeddings = model.encode([
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
])
# 3x3 matrix of pairwise similarity scores between the three sentences
similarities = model.similarity(embeddings, embeddings)
```
.. note::

   Consider using the `Massive Text Embedding Benchmark leaderboard <https://huggingface.co/spaces/mteb/leaderboard>`_ as inspiration for strong Sentence Transformer models. Be wary of the following:

   - **Model sizes**: it is recommended to filter out large models that might not be feasible without excessive hardware.
   - **Experimentation is key**: models that perform well on the leaderboard do not necessarily do well on your tasks, so it is **crucial** to experiment with various promising models.
.. tip::

   Read `Sentence Transformer > Usage > Speeding up Inference <./usage/efficiency.html>`_ for tips on how to speed up inference of models by up to 2x-3x.
The following table provides an overview of a selection of our models. They have been extensively evaluated for their quality in embedding sentences (Performance Sentence Embeddings) and in embedding search queries and paragraphs (Performance Semantic Search).

The all-* models were trained on all available training data (more than 1 billion training pairs) and are designed as general-purpose models. The sentence-transformers/all-mpnet-base-v2 model provides the best quality, while sentence-transformers/all-MiniLM-L6-v2 is 5 times faster and still offers good quality. Toggle "All models" to see all evaluated original models.
<iframe src="../../../_static/html/models_en_sentence_embeddings.html" height="600" style="width:100%; border:none;" title="Iframe Example"></iframe>

The following models have been specifically trained for Semantic Search: given a question or search query, these models are able to find relevant text passages. For more details, see Usage > Semantic Search.
.. sidebar:: Documentation

   #. `sentence-transformers/multi-qa-mpnet-base-cos-v1 <https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1>`_
   #. :class:`SentenceTransformer <sentence_transformers.sentence_transformer.model.SentenceTransformer>`
   #. :meth:`SentenceTransformer.encode <sentence_transformers.sentence_transformer.model.SentenceTransformer.encode>`
   #. :meth:`SentenceTransformer.similarity <sentence_transformers.sentence_transformer.model.SentenceTransformer.similarity>`
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-cos-v1")

query_embedding = model.encode("How big is London")
passage_embeddings = model.encode([
    "London is known for its financial district",
    "London has 9,787,426 inhabitants at the 2011 census",
    "The United Kingdom is the fourth largest exporter of goods in the world",
])

similarity = model.similarity(query_embedding, passage_embeddings)
# => tensor([[0.4659, 0.6142, 0.2697]])
```
The following models have been trained on 215M question-answer pairs from various sources and domains, including StackExchange, Yahoo Answers, Google & Bing search queries and many more. These models perform well across many search tasks and domains.
These models were tuned to be used with the dot-product similarity score:
| Model | Performance Semantic Search (6 Datasets) | Queries (GPU / CPU) per sec. |
|---|---|---|
| sentence-transformers/multi-qa-mpnet-base-dot-v1 | 57.60 | 4,000 / 170 |
| sentence-transformers/multi-qa-distilbert-dot-v1 | 52.51 | 7,000 / 350 |
| sentence-transformers/multi-qa-MiniLM-L6-dot-v1 | 49.19 | 18,000 / 750 |
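To see why the choice of score matters, consider unnormalized embeddings: the dot product rewards vector length as well as direction, so dot-product and cosine rankings can disagree. The following NumPy sketch uses made-up 2-D vectors (not output from any of the models above) to illustrate this; it is not how Sentence Transformers computes scores internally.

```python
import numpy as np

# Made-up, unnormalized "embeddings" for illustration only.
query = np.array([1.0, 1.0])
docs = np.array([
    [3.0, 0.0],  # long vector, poorly aligned with the query
    [0.6, 0.5],  # short vector, closely aligned with the query
])

dot_scores = docs @ query
cos_scores = dot_scores / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

best_by_dot = int(np.argmax(dot_scores))  # 0: the long vector wins
best_by_cos = int(np.argmax(cos_scores))  # 1: the well-aligned vector wins
print(best_by_dot, best_by_cos)
```

A model tuned for the dot product was trained with the first behaviour in mind, so scoring its embeddings with cosine similarity instead can change which passage ranks first.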
These models produce normalized vectors of length 1, which can be used with dot-product, cosine similarity, and Euclidean distance as the similarity functions:
| Model | Performance Semantic Search (6 Datasets) | Queries (GPU / CPU) per sec. |
|---|---|---|
| sentence-transformers/multi-qa-mpnet-base-cos-v1 | 57.46 | 4,000 / 170 |
| sentence-transformers/multi-qa-distilbert-cos-v1 | 52.83 | 7,000 / 350 |
| sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | 51.83 | 18,000 / 750 |
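For unit-length vectors, cosine similarity equals the dot product, and the squared Euclidean distance equals 2 minus twice the cosine similarity, so all three functions produce the same ranking (highest similarity or lowest distance first). A small NumPy check, using random stand-in vectors rather than real model output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in embeddings, normalized to length 1 (hypothetical data).
query = rng.normal(size=8)
docs = rng.normal(size=(5, 8))
query /= np.linalg.norm(query)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

dot = docs @ query
cos = dot / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))  # norms are 1
euclid = np.linalg.norm(docs - query, axis=1)

# Higher dot/cosine is better; lower Euclidean distance is better.
rank_dot = np.argsort(-dot)
rank_cos = np.argsort(-cos)
rank_euclid = np.argsort(euclid)
print(rank_dot, rank_cos, rank_euclid)  # three identical orderings
```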
The following models have been trained on the MSMARCO Passage Ranking Dataset, which contains 500k real queries from Bing search together with the relevant passages from various web sources. Given the diversity of the MSMARCO dataset, models also perform well on other domains.
These models were tuned to be used with the dot-product similarity score:
| Model | MSMARCO MRR@10 dev set | Performance Semantic Search (6 Datasets) | Queries (GPU / CPU) per sec. |
|---|---|---|---|
| sentence-transformers/msmarco-bert-base-dot-v5 | 38.08 | 52.11 | 4,000 / 170 |
| sentence-transformers/msmarco-distilbert-dot-v5 | 37.25 | 49.47 | 7,000 / 350 |
| sentence-transformers/msmarco-distilbert-base-tas-b | 34.43 | 49.25 | 7,000 / 350 |
These models produce normalized vectors of length 1, which can be used with dot-product, cosine similarity, and Euclidean distance as the similarity functions:
| Model | MSMARCO MRR@10 dev set | Performance Semantic Search (6 Datasets) | Queries (GPU / CPU) per sec. |
|---|---|---|---|
| sentence-transformers/msmarco-distilbert-cos-v5 | 33.79 | 44.98 | 7,000 / 350 |
| sentence-transformers/msmarco-MiniLM-L12-cos-v5 | 32.75 | 43.89 | 11,000 / 400 |
| sentence-transformers/msmarco-MiniLM-L6-cos-v5 | 32.27 | 42.16 | 18,000 / 750 |
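The MSMARCO column reports MRR@10 (Mean Reciprocal Rank at 10): for each query, take 1/rank of the first relevant passage within the top 10 results (0 if none appears), then average over all queries. The tables appear to report this value multiplied by 100. A minimal pure-Python sketch with hypothetical rankings:

```python
def mrr_at_k(ranked_ids_per_query, relevant_ids_per_query, k=10):
    """Average of 1/rank of the first relevant passage in the top k (0 if none)."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_ids_per_query):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids_per_query)


# Hypothetical rankings for three queries (passage IDs, best first).
rankings = [
    ["p3", "p1", "p7"],        # first relevant at rank 2 -> 1/2
    ["p2", "p9"],              # first relevant at rank 1 -> 1
    ["p5", "p6", "p8", "p4"],  # no relevant passage in the top k -> 0
]
relevant = [{"p1"}, {"p2"}, {"p0"}]

print(mrr_at_k(rankings, relevant, k=10))  # (0.5 + 1 + 0) / 3 = 0.5
```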
The following models produce similar embeddings for the same texts in different languages. You do not need to specify the input language. Details are in our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. We used the following 50+ languages: ar, bg, ca, cs, da, de, el, en, es, et, fa, fi, fr, fr-ca, gl, gu, he, hi, hr, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, pt-br, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh-cn, zh-tw.
These models find semantically similar sentences within one language or across languages:
Bitext mining describes the process of finding translated sentence pairs in two languages. If this is your use-case, the following model gives the best performance:
You can extend a model to new languages by following Training Examples > Multilingual Models.
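Bitext mining with a multilingual model can be sketched as: encode the sentences of both languages, compute cosine similarities, and keep pairs that are each other's nearest neighbour and clear a score threshold. This is a simplified mutual-nearest-neighbour approach; the toy vectors stand in for ``model.encode(...)`` output, the threshold of 0.5 is an arbitrary assumption, and this is not necessarily the scoring used in the Sentence Transformers bitext mining examples.

```python
import numpy as np


def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mine_bitext(src_emb, tgt_emb, threshold=0.5):
    """Return (src_idx, tgt_idx, score) for mutual nearest neighbours above threshold."""
    scores = normalize(src_emb) @ normalize(tgt_emb).T  # cosine similarity matrix
    best_tgt = scores.argmax(axis=1)  # best target sentence for each source sentence
    best_src = scores.argmax(axis=0)  # best source sentence for each target sentence
    pairs = []
    for i, j in enumerate(best_tgt):
        if best_src[j] == i and scores[i, j] >= threshold:
            pairs.append((i, int(j), float(scores[i, j])))
    return pairs


# Hypothetical embeddings for 3 source and 2 target sentences.
src = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.3, 0.3, 0.3]])
tgt = np.array([[0.0, 0.9, 0.3], [0.9, 0.2, 0.0]])

pairs = mine_bitext(src, tgt)
print(pairs)  # source 0 <-> target 1, source 1 <-> target 0; source 2 is unmatched
```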
Sentence Transformers supports multimodal models that can embed text alongside images, audio, or video into a joint vector space. This enables cross-modal tasks like text-to-image search, image-to-image search, image clustering, and zero-shot classification.
.. tip::

   Multimodal models require additional dependencies. Install them with e.g. ``pip install -U "sentence-transformers[image]"`` for image support. See `Installation <../installation.html>`_ for all options.
You can check which modalities a model supports using the :attr:`~sentence_transformers.sentence_transformer.model.SentenceTransformer.modalities` property and :meth:`~sentence_transformers.sentence_transformer.model.SentenceTransformer.supports` method:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-VL-Embedding-2B", revision="refs/pr/23")

print(model.modalities)
# => ['text', 'image', 'video', 'message']
print(model.supports("image"))
# => True
print(model.supports("audio"))
# => False
```
Image-text models embed both images and text into the same vector space.
The original Sentence Transformers image-text models are based on CLIP. See Usage > Image Search for details on text-to-image search, image-to-image search, image clustering, and zero-shot image classification.
The following CLIP models are available, listed with their Top-1 accuracy on the ImageNet validation set in a zero-shot setting:
| Model | Top-1 Accuracy (%) |
|---|---|
| sentence-transformers/clip-ViT-L-14 | 75.4 |
| sentence-transformers/clip-ViT-B-16 | 68.1 |
| sentence-transformers/clip-ViT-B-32 | 63.3 |
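Zero-shot image classification with an image-text model such as CLIP boils down to comparing one image embedding against the embeddings of several candidate label texts and picking the most similar label. The NumPy sketch below uses hypothetical 3-D vectors in place of real CLIP embeddings; the "a photo of a ..." label phrasing is a common convention, assumed here for illustration.

```python
import numpy as np


def zero_shot_classify(image_emb, label_embs, labels):
    """Return the label whose text embedding is most cosine-similar to the image."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    label_embs = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    scores = label_embs @ image_emb
    return labels[int(np.argmax(scores))], scores


labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
# Hypothetical embeddings; with a CLIP model these would come from encoding
# the image and the label texts into the shared vector space.
label_embs = np.array([[0.9, 0.1, 0.0], [0.2, 0.9, 0.1], [0.0, 0.1, 0.95]])
image_emb = np.array([0.25, 0.85, 0.05])

prediction, scores = zero_shot_classify(image_emb, label_embs, labels)
print(prediction)  # a photo of a cat
```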
We further provide this multilingual text-image model:
Newer multimodal models use the unified :class:`~sentence_transformers.base.modules.Transformer` module, which automatically detects supported modalities from the underlying model and processor. These models typically support richer input formats, including interleaved image-text inputs via chat messages. Notable examples include:
VLM-based models support additional modalities and input formats compared to CLIP models. You can verify this with :attr:`~sentence_transformers.sentence_transformer.model.SentenceTransformer.modalities`:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-VL-Embedding-2B", revision="refs/pr/23")

print(model.modalities)
# => ['text', 'image', 'video', 'message']
```
The "message" modality means the model accepts chat-style message inputs, allowing you to combine text and images in a single input. This is how VLM-based models handle interleaved multimodal content.
The :class:`~sentence_transformers.base.modules.Transformer` module also supports audio and video modalities when the underlying model and processor provide them. Audio models accept file paths, numpy/torch arrays, dicts with ``"array"`` and ``"sampling_rate"`` keys, or ``torchcodec.AudioDecoder`` instances, while video models accept file paths, numpy/torch arrays, dicts with ``"array"`` and ``"video_metadata"`` keys, or ``torchcodec.VideoDecoder`` instances. As pretrained audio and video embedding models become available on the Hugging Face Hub, they can be loaded with :class:`~sentence_transformers.sentence_transformer.model.SentenceTransformer` just like any other model.
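As an illustration of the dict input format described above, the sketch below builds a one-second synthetic audio clip with the ``"array"`` and ``"sampling_rate"`` keys. The 16 kHz sampling rate is an assumption (use whatever rate the model's processor expects), and the ``model.encode`` call is left commented out because it requires an audio-capable model, which is hypothetical here.

```python
import numpy as np

sampling_rate = 16_000  # assumed rate; match the model's processor in practice
duration_s = 1.0

# One second of a 440 Hz sine wave as a float32 waveform.
t = np.arange(int(sampling_rate * duration_s)) / sampling_rate
waveform = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

# Dict input using the "array" and "sampling_rate" keys.
audio_input = {"array": waveform, "sampling_rate": sampling_rate}

# With a model for which model.supports("audio") is True (hypothetical here):
# embeddings = model.encode([audio_input])
print(audio_input["array"].shape, audio_input["sampling_rate"])
```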
Some INSTRUCTOR models, such as hkunlp/instructor-large, are natively supported in Sentence Transformers. These models are special, as they are trained with instructions in mind. Notably, the primary difference between normal Sentence Transformer models and Instructor models is that the latter do not include the instructions themselves in the pooling step.
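To illustrate what excluding the instruction from pooling means, here is a simplified mean-pooling sketch in NumPy: the token embeddings belonging to the instruction (prompt) are masked out before averaging. The token embeddings are made up, a single prompt length is assumed for the whole batch, and special tokens are ignored; this is not the library's actual pooling implementation.

```python
import numpy as np


def mean_pool(token_embs, attention_mask, prompt_length=0):
    """Mean-pool token embeddings, skipping the first `prompt_length` tokens.

    prompt_length=0 averages over all tokens, including any instruction;
    prompt_length>0 excludes the instruction tokens from the average.
    """
    mask = attention_mask.astype(np.float32).copy()
    mask[:, :prompt_length] = 0.0  # drop instruction tokens
    summed = (token_embs * mask[:, :, None]).sum(axis=1)
    counts = np.clip(mask.sum(axis=1, keepdims=True), 1e-9, None)  # avoid division by 0
    return summed / counts


# Hypothetical batch: 1 sequence, 5 tokens, 2-dim token embeddings.
# The first 2 tokens stand in for the instruction, the last 3 for the text.
token_embs = np.array([[[10.0, 10.0], [10.0, 10.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]])
attention_mask = np.ones((1, 5))

with_prompt = mean_pool(token_embs, attention_mask)                     # [[5.8, 6.4]]
without_prompt = mean_pool(token_embs, attention_mask, prompt_length=2)  # [[3.0, 4.0]]
print(with_prompt, without_prompt)
```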
The following models work out of the box:
You can use these models like so:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hkunlp/instructor-large")
embeddings = model.encode(
    [
        "Dynamical Scalar Degree of Freedom in Horava-Lifshitz Gravity",
        "Comparison of Atmospheric Neutrino Flux Calculations at Low Energies",
        "Fermion Bags in the Massive Gross-Neveu Model",
        "QCD corrections to Associated t-tbar-H production at the Tevatron",
    ],
    prompt="Represent the Medicine sentence for clustering: ",
)
print(embeddings.shape)
# => (4, 768)
```
For example, for information retrieval:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("hkunlp/instructor-large")

query = "where is the food stored in a yam plant"
query_instruction = (
    "Represent the Wikipedia question for retrieving supporting documents: "
)
corpus = [
    'Yams are perennial herbaceous vines native to Africa, Asia, and the Americas and cultivated for the consumption of their starchy tubers in many temperate and tropical regions. The tubers themselves, also called "yams", come in a variety of forms owing to numerous cultivars and related species.',
    "The disparate impact theory is especially controversial under the Fair Housing Act because the Act regulates many activities relating to housing, insurance, and mortgage loans—and some scholars have argued that the theory's use under the Fair Housing Act, combined with extensions of the Community Reinvestment Act, contributed to rise of sub-prime lending and the crash of the U.S. housing market and ensuing global economic recession",
    "Disparate impact in United States labor law refers to practices in employment, housing, and other areas that adversely affect one group of people of a protected characteristic more than another, even though rules applied by employers or landlords are formally neutral. Although the protected classes vary by statute, most federal civil rights laws protect based on race, color, religion, national origin, and sex as protected traits, and some laws include disability status and other traits as well.",
]
corpus_instruction = "Represent the Wikipedia document for retrieval: "

query_embedding = model.encode(query, prompt=query_instruction)
corpus_embeddings = model.encode(corpus, prompt=corpus_instruction)
similarities = cos_sim(query_embedding, corpus_embeddings)
print(similarities)
# => tensor([[0.8835, 0.7037, 0.6970]])
```
All other Instructor models either 1) will not load, because they refer to ``InstructorEmbedding`` in their ``modules.json``, or 2) require calling ``model.set_pooling_include_prompt(include_prompt=False)`` after loading.
SPECTER is a model trained on scientific citations and can be used to estimate the similarity of two publications. We can use it to find similar papers.
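Finding similar papers then amounts to a nearest-neighbour search over paper embeddings. A minimal NumPy sketch of top-k cosine-similarity search, assuming you already have one embedding per paper from a SPECTER model (the vectors below are hypothetical placeholders):

```python
import numpy as np


def top_k_similar(query_emb, corpus_embs, k=2):
    """Return (index, cosine score) pairs for the k most similar papers, best first."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]


# Hypothetical embeddings for four papers and one query paper.
corpus_embs = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.8, 0.3, 0.1],
    [0.0, 0.2, 0.9],
])
query_emb = np.array([1.0, 0.2, 0.0])

results = top_k_similar(query_emb, corpus_embs, k=2)
print(results)  # papers 0 and 2 are the closest matches
```

If the query paper is itself part of the corpus, drop its own index from the results, since it will always be its own best match.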