examples/rag-docling/docling-quickstart.ipynb
%%capture
! pip install transformers sentence_transformers openai
In this tutorial, we'll use Feast to inject documents into the context of an LLM (Large Language Model) to power a RAG Application (Retrieval Augmented Generation) with Milvus as the online vector database.
Feast solves several common issues in this flow:
We will:
%%capture
! pip install feast[nlp] -U -q
! echo "Please restart your runtime now (Runtime -> Restart runtime). This ensures that the correct dependencies are loaded."
Reminder: Please restart your runtime after installing Feast (Runtime -> Restart runtime). This ensures that the correct dependencies are loaded.
A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See Feature Repository for a detailed explanation of feature repositories.
The easiest way to create a new feature repository is to use the feast init command in your terminal. For this RAG demo, you do not need to initialize a feast repo. We have already provided a complete feature repository for you in the current directory (check feature_repo) with all the necessary Milvus configurations set up and ready to use.
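For reference, initializing a brand-new repository elsewhere is a single command (skip this cell for the demo, since feature_repo is already provided):

! feast init my_feature_repo  # scaffolds a new feature repository; not needed for this demo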
import feast
import warnings
warnings.filterwarnings('ignore')
Let's take a look at the demo repo itself. It breaks down into:

- data/ contains raw demo parquet data
- example_repo.py contains demo feature definitions
- feature_store.yaml contains a demo setup configuring where data sources are
- test_workflow.py showcases how to run all key Feast commands, including defining, retrieving, and pushing features

You can run the full workflow with python test_workflow.py.

%cd feature_repo/
!ls -R
Let's inspect the setup of the project in feature_store.yaml.
The key line defining the overall architecture of the feature store is the provider.
The provider value sets default offline and online stores.
Valid values for provider in feature_store.yaml are local, gcp, and aws.
Note that there are many other offline / online stores Feast works with, including Azure, Hive, Trino, and PostgreSQL via community plugins. See https://docs.feast.dev/roadmap for all supported connectors.
A custom setup can also be made by following the Customizing Feast guide.
!pygmentize feature_store.yaml
The raw feature data we have in this demo is stored in local parquet files: docling_samples.parquet contains pre-chunked, pre-embedded document text, and metadata_samples.parquet contains the corresponding raw document records.
import pandas as pd
df = pd.read_parquet("./data/docling_samples.parquet")
mdf = pd.read_parquet("./data/metadata_samples.parquet")
# Convert the numpy embedding arrays to plain Python lists for serialization
df['chunk_embedding'] = df['vector'].apply(lambda x: x.tolist())
embedding_length = len(df['vector'][0])
print(f'embedding length = {embedding_length}')

# Stamp both dataframes with an event timestamp column
df['created'] = pd.Timestamp.now()
mdf['created'] = pd.Timestamp.now()
from IPython.display import display
display(df.head())
display(mdf.head())
feast apply scans Python files in the current directory for feature/entity definitions and deploys infrastructure according to feature_store.yaml.
Let's inspect what example_repo.py looks like:
Now we run feast apply to register the feature views and entities defined in example_repo.py and to set up the online store tables. Note that feature_store.yaml configures Milvus as the online vector store for this demo.
%rm -rf .ipynb_checkpoints/
! feast apply
from datetime import datetime
from feast import FeatureStore
store = FeatureStore(repo_path=".")
write_to_online_store

We now serialize the latest values of features since the beginning of time to prepare for serving. Note, write_to_online_store serializes all new features since the last write_to_online_store call, or since the time provided minus the ttl timedelta.
df.head()
store.write_to_online_store(feature_view_name='docling_feature_view', df=df)
# Turning off transformation on writes is as simple as changing the default behavior
store.write_to_online_store(
feature_view_name='docling_transform_docs',
df=df[df['document_id']!='doc-1'],
transform_on_write=False,
)
# Now we can transform a raw PDF on the fly
store.write_to_online_store(
feature_view_name='docling_transform_docs',
df=mdf[mdf['document_id']=='doc-1'],
transform_on_write=True, # this is the default
)
Note that now there are online_store.db and registry.db, which store the materialized features and schema information, respectively.
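As a quick sanity check, we can also ask the store what feast apply registered (the exact names come from example_repo.py):

# List the objects that feast apply wrote to the registry
print([fv.name for fv in store.list_feature_views()])
print([odfv.name for odfv in store.list_on_demand_feature_views()])
print([e.name for e in store.list_entities()])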
# Peek directly into the Milvus collection behind the online store (uses Feast internals, for demo purposes only)
pymilvus_client = store._provider._online_store._connect(store.config)
COLLECTION_NAME = [c for c in pymilvus_client.list_collections() if 'docling_transform_docs' in c][0]
milvus_query_result = pymilvus_client.query(
collection_name=COLLECTION_NAME,
filter="document_id == 'doc-1'",
limit=1000,
)
pd.DataFrame(milvus_query_result).head()
Note from the above command that the online store indexes by entity_key.
Entity keys include a list of all entities needed (e.g., all relevant primary keys) to generate the feature vector. In this case, this is a serialized version of the document_id. We use this later to fetch all features for a given document at inference time.
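For standard key-based lookups (as opposed to similarity search), those same entity keys are what you would pass to get_online_features. A minimal sketch, assuming document_id is the entity's join key and reusing the field names from this demo:

# Hypothetical key-based retrieval for a single document
features = store.get_online_features(
    features=[
        "docling_feature_view:file_name",
        "docling_feature_view:raw_chunk_markdown",
    ],
    entity_rows=[{"document_id": "doc-1"}],  # assumed join key for this demo
).to_dict()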
During inference (e.g., when a user submits a chat message) we need to embed the input text. This can be thought of as a feature transformation of the input data. In this example, we'll do this with a small Sentence Transformer from Hugging Face.
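The helper we use, embed_text, lives in example_repo.py. A minimal sketch of what such a helper can look like with sentence-transformers (the model name below is illustrative; the real one is defined in example_repo.py):

from sentence_transformers import SentenceTransformer

# Illustrative model; check example_repo.py for the one actually used in this demo
_st_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def embed_text_sketch(text: str) -> list:
    # Encode a single string into a fixed-length embedding vector
    return _st_model.encode(text).tolist()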
from example_repo import embed_text
embed_text("this is an example sentence")[0:10]
At inference time, we run a vector similarity search over the document embeddings in the online feature store by calling retrieve_online_documents_v2() with the embedded query. The retrieved features can then be fed into the context of the LLM.
sample_query = df['raw_chunk_markdown'].values[0]
print(sample_query)
# Note we can enhance this special case to embed within the feature server, optionally.
query_embedding = embed_text(sample_query)
docling_feature_view FeatureView

# Retrieve top k documents
context_data = store.retrieve_online_documents_v2(
features=[
"docling_feature_view:vector",
"docling_feature_view:file_name",
"docling_feature_view:raw_chunk_markdown",
"docling_feature_view:chunk_id",
],
query=query_embedding,
top_k=3,
distance_metric='COSINE',
).to_df()
display(context_data)
docling_transform_docs FeatureView

# Retrieve top k documents from the transformed data
context_data = store.retrieve_online_documents_v2(
features=[
"docling_transform_docs:vector",
"docling_transform_docs:document_id",
"docling_transform_docs:chunk_text",
"docling_transform_docs:chunk_id",
],
query=query_embedding,
top_k=3,
distance_metric='COSINE',
).to_df()
display(context_data)
FeatureView vs. OnDemandFeatureView for Vector Search

If you look in example_repo.py you'll notice that docling_feature_view and docling_transform_docs are very similar, with the exception of docling_transform_docs having its schema defined in the @on_demand_feature_view decorator and a function (i.e., a feature transformation) implemented after the name declaration.
On the backend, Feast orchestrates the execution of this transformation within the Feature Server, so your documents can be transformed with Docling via the API and made available for vector similarity search once they are written to the online store.
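One quick way to see the difference at runtime is to fetch both objects from the store: the first is a plain FeatureView, the second an OnDemandFeatureView that carries the Docling transformation.

# Inspect the registered object types (names as registered by example_repo.py)
print(type(store.get_feature_view("docling_feature_view")).__name__)
print(type(store.get_on_demand_feature_view("docling_transform_docs")).__name__)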
def format_documents(context_df):
output_context = ""
# Remove duplicates based on 'chunk_id' (ensuring unique document chunks)
unique_documents = context_df.drop_duplicates(subset=["chunk_id"])["chunk_text"]
# Format each document
for i, document_text in enumerate(unique_documents):
output_context += f"****START DOCUMENT {i}****\n"
output_context += f"document = {{ {document_text.strip()} }}\n"
output_context += f"****END DOCUMENT {i}****\n\n"
return output_context.strip()
RAG_CONTEXT = format_documents(context_data)
print(RAG_CONTEXT)
FULL_PROMPT = f"""
You are an assistant for answering questions about a series of documents. You will be provided documentation from different documents. Provide a conversational answer.
If you don't know the answer, just say "I do not know." Don't make up an answer.
Here are the document(s) you should use when answering the user's question:
{RAG_CONTEXT}
"""
question = 'Who are the authors of the paper?'
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ.get("OPENAI_API_KEY"),
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": FULL_PROMPT},
{"role": "user", "content": question}
],
)
print('\n'.join([c.message.content for c in response.choices]))