src/sentry/data/models/README.md
This directory contains machine learning models used by Sentry.
It includes the tokenizer for the Jina AI embeddings v2 base English model.
- Model: jinaai/jina-embeddings-v2-base-en
- File: jina-embeddings-v2-base-en/tokenizer.json
- Used by: src/sentry/seer/similarity/utils.py for tokenizing stacktrace text

To update or re-download the tokenizer model, you can run:
from tokenizers import Tokenizer
import os
from sentry.constants import DATA_ROOT
# Download and save the model
tokenizer = Tokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")
model_path = os.path.join(DATA_ROOT, "models", "jina-embeddings-v2-base-en", "tokenizer.json")
os.makedirs(os.path.dirname(model_path), exist_ok=True)
tokenizer.save(model_path)
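
After re-downloading, it can be useful to confirm that the saved file is usable before committing it. The following is a minimal sketch, not part of Sentry's code: `check_tokenizer_file` and `count_tokens` are hypothetical helper names, and the check assumes that files written by `Tokenizer.save()` are JSON objects with a top-level `"model"` entry.

```python
import json
import os


def check_tokenizer_file(path: str) -> bool:
    """Return True if `path` exists and looks like a serialized tokenizer."""
    if not os.path.isfile(path):
        return False
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Unreadable file or invalid JSON (JSONDecodeError is a ValueError).
        return False
    # Assumption: a saved tokenizer is a JSON object with a "model" entry.
    return isinstance(data, dict) and "model" in data


def count_tokens(path: str, text: str) -> int:
    """Load the tokenizer from `path` and return the token count of `text`."""
    # Imported lazily so the file check above works without `tokenizers`.
    from tokenizers import Tokenizer

    tokenizer = Tokenizer.from_file(path)
    return len(tokenizer.encode(text).ids)
```

With the `model_path` from the snippet above, `check_tokenizer_file(model_path)` should return `True` after a successful download, and `count_tokens(model_path, "some stacktrace text")` gives a quick end-to-end check that the tokenizer loads and encodes text.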