
Open this notebook in Colab: https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/embeddings/nomic.ipynb

Nomic Embedding

Nomic has released v1.5 🪆🪆🪆 It supports variable-sized embeddings via Matryoshka representation learning, an 8192-token context window, and embedding dimensions between 64 and 768.

In this notebook, we will explore using Nomic v1.5 embeddings at different dimensions.

Installation

python
%pip install -U llama-index llama-index-embeddings-nomic

Setup API Keys

python
nomic_api_key = "<NOMIC API KEY>"
python
import nest_asyncio

# Patch the event loop so async calls work inside the notebook.
nest_asyncio.apply()

from llama_index.embeddings.nomic import NomicEmbedding

With dimensionality 128

python
embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=128,
    model_name="nomic-embed-text-v1.5",
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")
python
print(len(embedding))
python
embedding[:5]

With dimensionality 256

python
embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=256,
    model_name="nomic-embed-text-v1.5",
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")
python
print(len(embedding))
python
embedding[:5]

With dimensionality 768

python
embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=768,
    model_name="nomic-embed-text-v1.5",
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")
python
print(len(embedding))
python
embedding[:5]
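
Because v1.5 is trained with Matryoshka representation learning, the leading dimensions of a larger embedding should carry most of the information, so a 768-dim vector truncated to its first 128 dimensions should land close to the native 128-dim embedding. Here is a minimal sketch of that check, assuming the 768-dim cell above has just run; the exact similarity value will vary.

python
import numpy as np

# `embedding` above is the 768-dim vector; re-embed the same text at 128 dims.
small_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=128,
    model_name="nomic-embed-text-v1.5",
)
small = np.array(small_model.get_text_embedding("Nomic Embeddings"))

# Truncate the 768-dim vector to its leading 128 dims and renormalize.
truncated = np.array(embedding[:128])
truncated /= np.linalg.norm(truncated)
small /= np.linalg.norm(small)

# Cosine similarity; expect a value near 1 if the API truncates the
# same underlying Matryoshka vector (illustrative, not guaranteed).
print(float(truncated @ small))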

You can still use v1 Nomic embeddings

v1 has a fixed embedding dimension of 768.

python
embed_model = NomicEmbedding(
    api_key=nomic_api_key, model_name="nomic-embed-text-v1"
)

embedding = embed_model.get_text_embedding("Nomic Embeddings")
python
print(len(embedding))
python
embedding[:5]
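
Beyond get_text_embedding, LlamaIndex embedding models also expose batch and query-specific helpers; the query variant matters because Nomic models treat queries and documents as different task types. A short sketch with placeholder texts:

python
# Embed several documents in one call.
texts = ["Nomic Embeddings", "Matryoshka representation learning"]
text_embeddings = embed_model.get_text_embedding_batch(texts)
print(len(text_embeddings), len(text_embeddings[0]))

# Embed a query; the model applies its query-side task type.
query_embedding = embed_model.get_query_embedding("What are Nomic embeddings?")
print(len(query_embedding))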

Let's build an end-to-end RAG pipeline with Nomic v1.5 embeddings.

We will use OpenAI for the generation step.

Set the embedding model and LLM.

python
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

import os

os.environ["OPENAI_API_KEY"] = "<YOUR OPENAI API KEY>"

embed_model = NomicEmbedding(
    api_key=nomic_api_key,
    dimensionality=128,
    model_name="nomic-embed-text-v1.5",
)

llm = OpenAI(model="gpt-3.5-turbo")

Settings.llm = llm
Settings.embed_model = embed_model
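
Note that Settings configures the models globally. If you'd rather avoid global state, the same models can be passed per component; a minimal sketch, assuming documents has been loaded as in the cells below:

python
# Equivalent pipeline without touching global Settings:
# pass the models directly to the components that use them.
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
query_engine = index.as_query_engine(llm=llm)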

Download Data

python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Data

python
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

Index Creation

python
index = VectorStoreIndex.from_documents(documents)
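
Embedding the documents costs an API call per chunk, so you may want to persist the index and reload it later instead of re-embedding. A sketch, using ./storage as an arbitrary directory (set Settings.embed_model before reloading so queries are embedded with the same model):

python
from llama_index.core import StorageContext, load_index_from_storage

# Write the index (including stored vectors) to disk.
index.storage_context.persist(persist_dir="./storage")

# Later, rebuild the index from disk without re-embedding.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)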

Query Engine

python
query_engine = index.as_query_engine()
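
The default query engine retrieves only a couple of top-scoring chunks. To widen retrieval, or to inspect what is retrieved before the LLM sees it, here is a sketch (the names and top_k values are illustrative):

python
# Retrieve more chunks per query.
wide_query_engine = index.as_query_engine(similarity_top_k=5)

# Or run retrieval alone, without the generation step.
retriever = index.as_retriever(similarity_top_k=3)
for node_with_score in retriever.retrieve("What did the author do growing up?"):
    print(node_with_score.score, node_with_score.node.get_content()[:80])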
python
response = query_engine.query("What did the author do growing up?")
print(response)