docs/examples/usecases/fastapi_rag_ollama.ipynb
This example demonstrates how to build a simple Retrieval-Augmented Generation (RAG) API using LlamaIndex and FastAPI, powered by a local LLM via Ollama.
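Before the LlamaIndex implementation, the core RAG idea can be sketched with a toy retriever (a minimal illustration only, not part of the example code): turn each document and the query into a vector, pick the document closest to the query, and hand it to the LLM as context. Here term-frequency vectors and cosine similarity stand in for the dense embeddings that nomic-embed-text would produce.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    # Return the document most similar to the query; a real RAG system
    # uses dense embeddings and a vector index instead.
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

docs = [
    "Ollama runs large language models locally.",
    "FastAPI is a web framework for building APIs.",
]
context = retrieve("How do I run a model locally?", docs)
# The retrieved context would then be prepended to the prompt sent to the LLM.
```

This is what the VectorStoreIndex and query engine below do for real: embed, retrieve, then generate an answer grounded in the retrieved text.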
This example uses llama3 for text generation and nomic-embed-text for embeddings, both served via Ollama.

Pull both models with Ollama:

ollama pull llama3
ollama pull nomic-embed-text

You can confirm they are available with ollama list.
Install dependencies:
pip install -r requirements.txt
Start the API server:

uvicorn app:app --reload
Query the endpoint with curl:

curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"query": "What is this example about?"}'

Alternatively, you can test the API using FastAPI's built-in Swagger UI: open your browser and visit http://localhost:8000/docs
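The same request can also be made from Python using only the standard library (a sketch that assumes the server is running on localhost:8000; the endpoint path and JSON payload match the app below):

```python
import json
from urllib import request

def build_query_payload(question: str) -> bytes:
    # JSON body expected by the /query endpoint: {"query": "..."}
    return json.dumps({"query": question}).encode("utf-8")

def ask(question: str, url: str = "http://localhost:8000/query") -> str:
    # POST the question and return the "response" field of the JSON reply.
    req = request.Request(
        url,
        data=build_query_payload(question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:  # requires the server to be running
        return json.load(resp)["response"]

# Example (with the server up):
# print(ask("What is this example about?"))
```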
! pip install fastapi uvicorn llama-index-llms-ollama llama-index-embeddings-ollama ollama
from fastapi import FastAPI
from pydantic import BaseModel
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.settings import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
app = FastAPI(title="LlamaIndex FastAPI RAG (Ollama)")
# Configure local LLM and embedding model via Ollama
Settings.llm = Ollama(model="llama3")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
# Load documents from the ./data directory and build the index at startup
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
class QueryRequest(BaseModel):
query: str
@app.post("/query")
def query_documents(request: QueryRequest):
# Query indexed documents using a local LLM via Ollama.
response = query_engine.query(request.query)
return {"response": str(response)}