docs/examples/docstore/AzureDocstoreDemo.ipynb
This guide shows you how to use our AzureDocumentStore and AzureIndexStore abstractions, which are backed by Azure Table Storage. Putting nodes in the docstore lets you define multiple indices over the same underlying data, instead of duplicating that data across indices.
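As a rough, library-agnostic illustration of that idea (the class and names below are invented for this sketch and are not part of LlamaIndex), a docstore holds each node exactly once while indices keep only references to node IDs:

```python
# Minimal sketch: one shared store of nodes, multiple "indices" that
# reference nodes by ID instead of copying the node contents.
class TinyDocstore:
    def __init__(self):
        self.docs = {}  # node_id -> node text

    def add_documents(self, nodes):
        for node_id, text in nodes:
            self.docs[node_id] = text


docstore = TinyDocstore()
docstore.add_documents(
    [("n1", "Paul Graham grew up..."), ("n2", "Later, at YC...")]
)

# Two "indices" over the same docstore: each is just a set of node IDs.
summary_index_ids = {"n1", "n2"}
keyword_index_ids = {"n2"}

# The node text lives in exactly one place, however many indices use it.
assert docstore.docs["n2"] == "Later, at YC..."
```

The real AzureDocumentStore works the same way conceptually, except the node payloads live in Azure Table Storage rather than an in-memory dict.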
<a href="https://colab.research.google.com/drive/1qtGtyxoIM6rnqxxrTsfixoez8fZy6T2_?usp=sharing" target="_parent">Open in Colab</a>
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index
%pip install llama-index-embeddings-azure-openai
%pip install llama-index-llms-azure-openai
%pip install llama-index-storage-kvstore-azure
%pip install llama-index-storage-docstore-azure
%pip install llama-index-storage-index-store-azure
import nest_asyncio
nest_asyncio.apply()
import logging
import sys
import os
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
logging.getLogger("azure.core.pipeline.policies.http_logging_policy").setLevel(
logging.WARNING
)
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex, SimpleKeywordTableIndex
from llama_index.core import SummaryIndex
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.core.response.notebook_utils import display_response
from llama_index.core import Settings
from llama_index.storage.kvstore.azure.base import ServiceMode
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
reader = SimpleDirectoryReader("./data/paul_graham/")
documents = reader.load_data()
from llama_index.core.node_parser import SentenceSplitter
nodes = SentenceSplitter().get_nodes_from_documents(documents)
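Under the hood, SentenceSplitter chunks each document into smaller nodes. As a very rough illustration of the idea only (this is not the actual implementation; real splitting respects token limits and chunk overlap):

```python
def naive_sentence_chunks(text, max_sentences=2):
    """Group sentences into fixed-size chunks -- a toy stand-in for a splitter."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [
        ". ".join(sentences[i : i + max_sentences]) + "."
        for i in range(0, len(sentences), max_sentences)
    ]


chunks = naive_sentence_chunks("One. Two. Three. Four. Five.")
# -> ['One. Two.', 'Three. Four.', 'Five.']
```

Each resulting chunk becomes a node with its own ID, which is what the docstore keys on.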
from llama_index.storage.docstore.azure import AzureDocumentStore
from llama_index.storage.index_store.azure import AzureIndexStore
The AzureDocumentStore and AzureIndexStore classes provide several helper constructors (from_connection_string, from_account_and_key, from_sas_token, from_aad_token, and more) to simplify connecting to your Azure Table Storage service.
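If you prefer from_connection_string, Azure Storage connection strings are semicolon-separated key=value pairs. A small sketch of the format (the account name and key below are placeholders, and the parser is just for illustration):

```python
def parse_connection_string(conn_str):
    """Split an Azure Storage connection string into its key=value parts."""
    parts = {}
    for pair in conn_str.split(";"):
        if pair:
            # partition on the FIRST '=' only: account keys are base64
            # and may themselves end in '=' padding characters.
            key, _, value = pair.partition("=")
            parts[key] = value
    return parts


conn = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=<account-name>;"
    "AccountKey=<account-key>;"
    "EndpointSuffix=core.windows.net"
)
parsed = parse_connection_string(conn)
assert parsed["AccountName"] == "<account-name>"
```

In the code below we use from_account_and_key instead, passing the account name and key directly.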
storage_context = StorageContext.from_defaults(
docstore=AzureDocumentStore.from_account_and_key(
"",
"",
service_mode=ServiceMode.STORAGE,
),
index_store=AzureIndexStore.from_account_and_key(
"",
"",
service_mode=ServiceMode.STORAGE,
),
)
storage_context.docstore.add_documents(nodes)
If we navigate to our Azure Table Storage account, we should now be able to see our documents in the table.
Staying with the Azure theme, let's define our Azure OpenAI embedding and LLM models.
Settings.embed_model = AzureOpenAIEmbedding(
model="text-embedding-ada-002",
deployment_name="text-embedding-ada-002",
api_key="",
azure_endpoint="",
api_version="2024-03-01-preview",
)
Settings.llm = AzureOpenAI(
model="gpt-4",
deployment_name="gpt-4",
api_key="",
azure_endpoint="",
api_version="2024-03-01-preview",
)
Each index uses the same underlying Nodes.
summary_index = SummaryIndex(nodes, storage_context=storage_context)
We should now be able to see our summary_index in Azure Table Storage.
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
We should now see an entry for our vector_index in Azure Table Storage.
keyword_table_index = SimpleKeywordTableIndex(
nodes, storage_context=storage_context
)
We should now see an entry for our keyword_table_index in Azure Table Storage.
# NOTE: the docstore still has the same nodes
len(storage_context.docstore.docs)
# NOTE: docstore and index_store are persisted in Azure Table Storage.
# NOTE: This call is only needed to persist the in-memory `SimpleVectorStore`, created by `VectorStoreIndex`, to disk.
storage_context.persist()
# note down index IDs
list_id = summary_index.index_id
vector_id = vector_index.index_id
keyword_id = keyword_table_index.index_id
from llama_index.core import load_index_from_storage
# re-create storage context
storage_context = StorageContext.from_defaults(
persist_dir="./storage",
docstore=AzureDocumentStore.from_account_and_key(
"",
"",
service_mode=ServiceMode.STORAGE,
),
index_store=AzureIndexStore.from_account_and_key(
"",
"",
service_mode=ServiceMode.STORAGE,
),
)
# load indices
summary_index = load_index_from_storage(
storage_context=storage_context, index_id=list_id
)
vector_index = load_index_from_storage(
storage_context=storage_context, index_id=vector_id
)
keyword_table_index = load_index_from_storage(
storage_context=storage_context, index_id=keyword_id
)
query_engine = summary_index.as_query_engine()
list_response = query_engine.query("What is a summary of this document?")
display_response(list_response)
query_engine = vector_index.as_query_engine()
vector_response = query_engine.query("What did the author do growing up?")
display_response(vector_response)
query_engine = keyword_table_index.as_query_engine()
keyword_response = query_engine.query(
"What did the author do after his time at YC?"
)
display_response(keyword_response)