docs/source/en/model_doc/nomic_bert.md
This model was released on 2024-02-10 and added to Hugging Face Transformers on 2026-04-01.
NomicBERT was proposed in [Nomic Embed: Training a Reproducible Long Context Text Embedder](https://huggingface.co/papers/2402.01613) by Zach Nussbaum, John X. Morris, Brandon Duderstadt, and Andriy Mulyar. The architecture is BERT-inspired, and its most notable extension is applying Rotary Position Embeddings (RoPE) to an encoder model.
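As a rough, self-contained sketch of the rotary idea (an interleaved-pair formulation written for illustration, not NomicBERT's exact implementation), each pair of query/key dimensions is rotated by an angle proportional to the token position, so dot products between rotated queries and keys depend on relative offsets:

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate interleaved (even, odd) feature pairs by position-dependent angles."""
    seq_len, dim = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)  # (seq_len, dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]   # even / odd element of each pair
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # standard 2D rotation applied per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# queries and keys are rotated before the attention dot product
q = rotary_embed(torch.randn(128, 64))
k = rotary_embed(torch.randn(128, 64))
scores = q @ k.T
```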
The abstract from the paper is the following:
This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding model that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on the short-context MTEB benchmark and the long context LoCo benchmark. We release the training code and model weights under an Apache 2.0 license. In contrast with other open-source models, we release the full curated training data and code that allows for full replication of nomic-embed-text-v1. [...]
This model was contributed by community member Sonny Cooper. The original code for nomic-embed-text-v1.5 and nomic-embed-text-v1 can be found here.
The examples below demonstrate how to generate dense vector embeddings for different tasks with [AutoModel]. Each task requires a specific instruction prefix (`search_document:`, `search_query:`, `clustering:`, or `classification:`) so that the embedding space is optimized for that use case.
Document embeddings (for indexing) use the `search_document:` prefix:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    # average the token embeddings, ignoring padding positions
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

model_id = "nomic-ai/nomic-embed-text-v1.5"
revision = "refs/pr/57"
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModel.from_pretrained(model_id, revision=revision, device_map="auto")

sentences = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten']
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt').to(model.device)

with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
```

Search queries use the `search_query:` prefix:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

model_id = "nomic-ai/nomic-embed-text-v1.5"
revision = "refs/pr/57"
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModel.from_pretrained(model_id, revision=revision, device_map="auto")

sentences = ['search_query: Who is Laurens van Der Maaten?']
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt').to(model.device)

with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
```

Clustering uses the `clustering:` prefix:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

model_id = "nomic-ai/nomic-embed-text-v1.5"
revision = "refs/pr/57"
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModel.from_pretrained(model_id, revision=revision, device_map="auto")

sentences = ['clustering: the quick brown fox']
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt').to(model.device)

with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
```

Classification uses the `classification:` prefix:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

model_id = "nomic-ai/nomic-embed-text-v1.5"
revision = "refs/pr/57"
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModel.from_pretrained(model_id, revision=revision, device_map="auto")

sentences = ['classification: the quick brown fox']
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt').to(model.device)

with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
```

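Because the embeddings above are L2-normalized, a dot product between a `search_query:` embedding and `search_document:` embeddings gives cosine similarity. A minimal sketch that reuses the `tokenizer`, `model`, and `mean_pooling` helper defined in the snippets above:

```python
query = ['search_query: Who is Laurens van Der Maaten?']
documents = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van Der Maaten']

def embed(texts):
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors='pt').to(model.device)
    with torch.no_grad():
        output = model(**encoded)
    pooled = mean_pooling(output, encoded['attention_mask'])
    return F.normalize(pooled, p=2, dim=1)

scores = embed(query) @ embed(documents).T  # cosine similarity between query and documents
print(scores)
```
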
You can also increase the context length of the base model by passing dynamic RoPE scaling parameters; the rotary embeddings are then rescaled at inference time to cover the longer range of position_ids:
```python
from transformers import AutoModel, AutoTokenizer

model_id = "nomic-ai/nomic-embed-text-v1.5"
revision = "refs/pr/57"
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision, model_max_length=8192)

# dynamic RoPE scaling for increased context
rope_parameters = {"rope_theta": 1000.0, "rope_type": "dynamic", "factor": 2.0}
model = AutoModel.from_pretrained(model_id, revision=revision, rope_parameters=rope_parameters, device_map="auto")
```

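As a quick check of the extended window, you can embed an input longer than the model's original training context. This is a sketch that reuses the `mean_pooling` helper from the earlier snippets; the repeated sentence is just filler text:

```python
import torch
import torch.nn.functional as F

long_document = "search_document: " + "Nomic Embed handles long inputs. " * 500

encoded_input = tokenizer(long_document, truncation=True, return_tensors="pt").to(model.device)
print(encoded_input["input_ids"].shape)  # truncated at the 8192-token tokenizer limit

with torch.no_grad():
    model_output = model(**encoded_input)

embedding = mean_pooling(model_output, encoded_input["attention_mask"])
embedding = F.normalize(embedding, p=2, dim=1)
print(embedding.shape)
```
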
[[autodoc]] NomicBertConfig
[[autodoc]] NomicBertModel
    - forward

[[autodoc]] NomicBertForMaskedLM

[[autodoc]] NomicBertForSequenceClassification

[[autodoc]] NomicBertForTokenClassification
    - forward