Deep Lake Vector Store Quickstart

docs/examples/vector_stores/DeepLakeIndexDemo.ipynb


The Deep Lake integration for LlamaIndex can be installed using pip.

python
%pip install llama-index-vector-stores-deeplake
python
!pip install llama-index
!pip install deeplake

Next, let's import the required modules and set the needed environment variables:

python
import os
import textwrap

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document
from llama_index.vector_stores.deeplake import DeepLakeVectorStore

os.environ["OPENAI_API_KEY"] = "sk-********************************"
os.environ["ACTIVELOOP_TOKEN"] = "********************************"

We are going to embed and store one of Paul Graham's essays in a Deep Lake Vector Store stored locally. First, we download the data to a directory called data/paul_graham:

python
import urllib.request

# Make sure the target directory exists before downloading
os.makedirs("data/paul_graham", exist_ok=True)

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt",
    "data/paul_graham/paul_graham_essay.txt",
)

We can now create documents from the source data file.

python
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print(
    "Document ID:",
    documents[0].doc_id,
    "Document Hash:",
    documents[0].hash,
)
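The hash printed above is derived from the document's content, so it changes whenever the content does; this is useful for detecting updated documents between ingestion runs. As a rough illustration of the idea (a simple sha256 content hash, not LlamaIndex's exact internal scheme):

```python
import hashlib

def content_hash(text: str, metadata: str = "") -> str:
    # Any change to the text (or metadata) yields a different fingerprint.
    return hashlib.sha256((text + metadata).encode("utf-8")).hexdigest()

h1 = content_hash("What I Worked On")
h2 = content_hash("What I Worked On, revised")
print(h1 == h2)  # False: different content, different hash
```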

Finally, let's create the Deep Lake Vector Store and populate it with data. We use the default tensor configuration, which creates tensors for text (str), metadata (json), id (str, auto-populated), and embedding (float32). Learn more about tensor customizability here.

python
from llama_index.core import StorageContext

dataset_path = "./dataset/paul_graham"

# Create an index over the documents
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

Deep Lake offers highly flexible vector search and hybrid search options, discussed in detail in these tutorials. In this Quickstart, we show a simple example using the default options.
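By default, the query engine retrieves the stored nodes whose embeddings are most similar to the query embedding. As a toy illustration of that idea (hand-made 3-dimensional vectors and made-up snippets, not Deep Lake's actual implementation), cosine-similarity top-k retrieval can be sketched as:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "stored embeddings" keyed by a snippet of text
store = {
    "essays on programming": [0.9, 0.1, 0.0],
    "notes on painting": [0.1, 0.9, 0.0],
    "startup advice": [0.7, 0.2, 0.1],
}

query = [0.8, 0.15, 0.05]  # pretend embedding of the user's question

# Rank snippets by similarity to the query and keep the top 2
top_k = sorted(store, key=lambda t: cosine(store[t], query), reverse=True)[:2]
print(top_k)
```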

python
query_engine = index.as_query_engine()
response = query_engine.query(
    "What did the author learn?",
)
python
print(textwrap.fill(str(response), 100))
python
response = query_engine.query("What was a hard moment for the author?")
python
print(textwrap.fill(str(response), 100))

Deleting items from the database

To find the id of a document to delete, you can query the underlying Deep Lake dataset directly:

python
import deeplake

ds = deeplake.load(dataset_path)

idx = ds.id[0].numpy().tolist()
idx
python
index.delete(idx[0])