# LlamaIndex Embeddings Integration: Ollama
The `llama-index-embeddings-ollama` package contains LlamaIndex integrations for generating embeddings using Ollama, a tool for running large language models locally.
Ollama allows you to run embedding models on your local machine, providing privacy, cost savings, and the ability to work offline. This integration enables you to use Ollama's embedding models seamlessly with LlamaIndex's vector store and retrieval systems.
## Installation

To install the `llama-index-embeddings-ollama` package, run the following command:

```bash
pip install llama-index-embeddings-ollama
```
You'll also need to have Ollama installed and running on your machine. Visit ollama.ai to download and install Ollama.
## Prerequisites

Before using this integration, ensure you have:

1. Ollama installed and running on your machine
2. The Ollama server reachable (at `http://localhost:11434` by default)
3. An embedding model pulled:

```bash
ollama pull nomic-embed-text
# or
ollama pull embeddinggemma
```
## Usage

### Basic Usage

```python
from llama_index.embeddings.ollama import OllamaEmbedding

# Initialize the embedding model
embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",  # or "embeddinggemma"
    base_url="http://localhost:11434",  # default Ollama URL
)

# Generate an embedding for a single text
text_embedding = embed_model.get_text_embedding("Hello, world!")
print(f"Embedding dimension: {len(text_embedding)}")

# Generate an embedding for a query
query_embedding = embed_model.get_query_embedding("What is AI?")
```
```python
# Generate embeddings for multiple texts at once
texts = [
    "The capital of France is Paris.",
    "Python is a programming language.",
    "Machine learning is a subset of AI.",
]
embeddings = embed_model.get_text_embedding_batch(texts)
print(f"Generated {len(embeddings)} embeddings")
```
### Integration with VectorStoreIndex

The most common use case is to integrate Ollama embeddings with LlamaIndex's vector store:
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.ollama import OllamaEmbedding

# Set the embedding model globally
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="http://localhost:11434",
)

# Load documents
documents = SimpleDirectoryReader("data").load_data()

# Create index with Ollama embeddings
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response)
```
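Re-embedding a large corpus on every run can be slow on local hardware. Here is a sketch using LlamaIndex's standard persistence API to save and reload the index; the `./storage` directory is an arbitrary choice:

```python
from llama_index.core import StorageContext, load_index_from_storage

# Save the index (and its embeddings) to disk
index.storage_context.persist(persist_dir="./storage")

# Later: reload it instead of re-embedding; Settings.embed_model
# should still point at the same Ollama model
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```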
### Combining with an LLM

You can combine Ollama embeddings with other LLMs (including Ollama LLMs):
```python
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Set both LLM and embedding model
Settings.llm = Ollama(model="llama3.1", base_url="http://localhost:11434")
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="http://localhost:11434",
)

# Your documents and indexing code here...
```
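One way to fill in that placeholder is a fully local pipeline over in-memory `Document` objects; the example texts and question below are ours:

```python
from llama_index.core import Document

documents = [
    Document(text="Ollama runs large language models on local hardware."),
    Document(text="LlamaIndex connects language models to external data."),
]

# Embeds with OllamaEmbedding, answers with the Ollama LLM set above
index = VectorStoreIndex.from_documents(documents)
response = index.as_query_engine().query("What does Ollama do?")
print(response)
```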
## Configuration Options

The `OllamaEmbedding` class supports several configuration options:
```python
embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",  # Required: Ollama model name
    base_url="http://localhost:11434",  # Optional: Ollama server URL (default: http://localhost:11434)
    embed_batch_size=10,  # Optional: Batch size for embeddings (default: 10)
    keep_alive="5m",  # Optional: How long to keep model in memory (default: "5m")
    query_instruction=None,  # Optional: Instruction to prepend to queries
    text_instruction=None,  # Optional: Instruction to prepend to text
    ollama_additional_kwargs={},  # Optional: Additional kwargs for Ollama API
    client_kwargs={},  # Optional: Additional kwargs for Ollama client
)
```
- `model_name` (required): The name of the Ollama embedding model to use (e.g., `"nomic-embed-text"`, `"embeddinggemma"`)
- `base_url` (optional): The base URL of your Ollama server. Defaults to `"http://localhost:11434"`
- `embed_batch_size` (optional): Number of texts to process in each batch. Must be between 1 and 2048. Defaults to 10
- `keep_alive` (optional): Controls how long the model stays loaded in memory after a request. Can be a duration string (e.g., `"5m"`, `"10s"`) or a number of seconds. Defaults to `"5m"`
- `query_instruction` (optional): Instruction text to prepend to query strings before embedding
- `text_instruction` (optional): Instruction text to prepend to document text before embedding
- `ollama_additional_kwargs` (optional): Additional keyword arguments to pass to the Ollama API
- `client_kwargs` (optional): Additional keyword arguments for the Ollama client (e.g., authentication headers)

### Query and Text Instructions

Some embedding models benefit from prepending instructions to queries and documents. This can improve retrieval quality:
```python
embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    query_instruction="Represent the question for retrieving supporting documents:",
    text_instruction="Represent the document for retrieval:",
)

# The instructions will be automatically prepended
query_embedding = embed_model.get_query_embedding("What is machine learning?")
# Internally processes: "Represent the question for retrieving supporting documents: What is machine learning?"

text_embedding = embed_model.get_text_embedding(
    "Machine learning is a method of data analysis."
)
# Internally processes: "Represent the document for retrieval: Machine learning is a method of data analysis."
```
### Async Support

The integration supports asynchronous operations for better performance:
```python
import asyncio

from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(model_name="nomic-embed-text")

async def main():
    # Async single embedding
    embedding = await embed_model.aget_text_embedding("Hello, world!")

    # Async batch embeddings
    embeddings = await embed_model.aget_text_embedding_batch(
        [
            "Text 1",
            "Text 2",
            "Text 3",
        ]
    )

    # Async query embedding
    query_embedding = await embed_model.aget_query_embedding("What is AI?")

asyncio.run(main())
```
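The async methods pay off most when requests are issued concurrently. A sketch (the queries are our own examples) that embeds several queries at once with `asyncio.gather`:

```python
import asyncio

from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(model_name="nomic-embed-text")

async def embed_queries(queries: list[str]) -> list[list[float]]:
    # Fire off all embedding requests concurrently instead of one by one
    return await asyncio.gather(
        *(embed_model.aget_query_embedding(q) for q in queries)
    )

queries = ["What is AI?", "What is machine learning?", "What is deep learning?"]
embeddings = asyncio.run(embed_queries(queries))
print(f"Embedded {len(embeddings)} queries")
```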
### Remote Ollama Server

If you're running Ollama on a remote server, specify the `base_url`:
```python
embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="http://your-remote-server:11434",
)
```
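If the remote server sits behind an authenticating proxy, `client_kwargs` is forwarded to the underlying Ollama client, so passing custom headers should work. Treat this as a sketch: the hostname and token are placeholders, and the exact supported keys depend on the client version:

```python
embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="https://ollama.example.com",  # placeholder hostname
    # Assumption: the ollama client forwards `headers` to its HTTP client
    client_kwargs={"headers": {"Authorization": "Bearer YOUR_TOKEN"}},
)
```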
## Available Models

Popular embedding models available in Ollama include:
- `nomic-embed-text`: General-purpose embedding model
- `embeddinggemma`: Google's Gemma-based embedding model
- `mxbai-embed-large`: Large embedding model for better quality

Pull a model using:
```bash
ollama pull nomic-embed-text
```
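Different models produce embeddings of different dimensions, and an index built with one model cannot be queried with another. A small sketch (the model names are just the ones listed above) to check what each pulled model produces:

```python
from llama_index.embeddings.ollama import OllamaEmbedding

for name in ["nomic-embed-text", "embeddinggemma", "mxbai-embed-large"]:
    try:
        dim = len(OllamaEmbedding(model_name=name).get_text_embedding("hello"))
        print(f"{name}: {dim} dimensions")
    except Exception as exc:  # e.g., the model hasn't been pulled
        print(f"{name}: unavailable ({exc})")
```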
## Additional Resources

For more detailed examples, see the Ollama Embeddings notebook in the LlamaIndex documentation.
## Troubleshooting

### Connection Errors

If you encounter connection errors, ensure:

- Ollama is running: start it with `ollama serve` or check the service status
- The `base_url` matches your Ollama server address
- The model has been pulled: `ollama pull <model-name>`
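A quick way to check connectivity from Python is Ollama's REST endpoint `GET /api/tags`, which lists locally available models; this sketch bypasses the package entirely and assumes the default local address:

```python
import httpx

try:
    resp = httpx.get("http://localhost:11434/api/tags", timeout=5.0)
    resp.raise_for_status()
    print("Models on server:", [m["name"] for m in resp.json()["models"]])
except httpx.HTTPError as exc:
    print(f"Cannot reach Ollama: {exc}")
```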
### Model Not Found

If you get a "model not found" error:

- List your local models with `ollama list`
- Pull the missing model with `ollama pull <model-name>`

## License

This package is licensed under the MIT License. See the LICENSE file for details.