# candle-nomic-bert
nomic-embed-text-v1.5 is a text embedding model based on the NomicBert architecture. It produces 768-dimensional embeddings suitable for semantic search, retrieval, and clustering.
Key architectural differences from standard BERT:

- rotary position embeddings instead of learned absolute position embeddings (sketched below),
- SwiGLU activations in the feed-forward blocks instead of GELU,
- dropout set to zero.
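Rotary embeddings encode a token's position by rotating consecutive pairs of query/key features through a position-dependent angle instead of adding a learned position vector. Below is a minimal sketch of the idea in plain Rust; the `apply_rope` helper, the interleaved-pair layout, and the base value are illustrative assumptions, not this example's actual code:

```rust
/// Rotate each consecutive feature pair (x[2i], x[2i+1]) by the angle
/// position / base^(2i / dim). Assumes an even-length slice; base is
/// conventionally 10000.0.
fn apply_rope(x: &mut [f32], position: usize, base: f32) {
    let dim = x.len() as f32;
    for i in (0..x.len()).step_by(2) {
        let theta = position as f32 / base.powf(i as f32 / dim);
        let (sin, cos) = theta.sin_cos();
        let (x0, x1) = (x[i], x[i + 1]);
        x[i] = x0 * cos - x1 * sin;
        x[i + 1] = x0 * sin + x1 * cos;
    }
}
```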
## Single prompt

Compute the embedding for a single prompt. Model weights are downloaded from the hub on the first run.

```bash
cargo run --example nomic-bert --release -- --prompt "Here is a test sentence"

> Embedding (first 10 dims):
> [0] -0.030893
> [1] 0.038772
> [2] -0.171375
> ...
```
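Under the hood, the per-token hidden states are reduced to one vector per sentence. Here is a sketch of the usual mean-pooling and L2-normalization step with candle tensors, mirroring candle's other embedding examples; the exact pooling used in this example may differ:

```rust
use candle_core::{Result, Tensor};

/// Mean-pool (batch, n_tokens, hidden) token states into (batch, hidden)
/// sentence embeddings, then L2-normalize so dot products are cosines.
fn pool_and_normalize(token_states: &Tensor) -> Result<Tensor> {
    let (_batch, n_tokens, _hidden) = token_states.dims3()?;
    let pooled = (token_states.sum(1)? / (n_tokens as f64))?;
    pooled.broadcast_div(&pooled.sqr()?.sum_keepdim(1)?.sqrt()?)
}
```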
## Sentence similarities

When run without `--prompt`, the example computes cosine similarities between a set of hardcoded sentences and reports the most similar pairs.

```bash
cargo run --example nomic-bert --release

> Top cosine similarities:
> 0.9664 'The new movie is awesome' <-> 'The new movie is so great'
> 0.7377 'The cat sits outside' <-> 'The cat plays in the garden'
> 0.5764 'I love pasta' <-> 'Do you like pizza?'
> 0.5031 'A man is playing guitar' <-> 'A woman watches TV'
> 0.4781 'A man is playing guitar' <-> 'The cat plays in the garden'
```
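Each reported score is the cosine similarity of two embedding vectors. A minimal standalone helper showing the per-pair computation:

```rust
/// Cosine similarity between two embedding vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```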
## Task prefixes

nomic-embed-text-v1.5 was trained with task prefixes. Adding them is optional but improves retrieval quality. Use `--prefix` to prepend a prefix to every input:

```bash
# For documents/passages to be searched over
cargo run --example nomic-bert --release -- \
  --prefix "search_document: " \
  --prompt "Dragonwell is a classic Chinese green tea."

# For search queries
cargo run --example nomic-bert --release -- \
  --prefix "search_query: " \
  --prompt "sweet floral white tea"
```

Available prefixes: `search_document: `, `search_query: `, `clustering: `, and `classification: `.
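For retrieval, the usual pattern is to embed documents with the `search_document: ` prefix and queries with `search_query: `, then rank by cosine similarity. A sketch, where `embed` is a hypothetical stand-in for this example's tokenize-and-forward pipeline (not an API it exports) and `cosine_similarity` is the helper shown above:

```rust
/// Rank documents against a query using matched task prefixes.
/// `embed` is a hypothetical closure producing one embedding per input string.
fn rank(embed: &dyn Fn(&str) -> Vec<f32>, query: &str, docs: &[&str]) -> Vec<(usize, f32)> {
    let q = embed(&format!("search_query: {query}"));
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .map(|doc| cosine_similarity(&q, &embed(&format!("search_document: {doc}"))))
        .enumerate()
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```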