The `ObjectIndex` Class

The ObjectIndex class allows for the indexing of arbitrary Python objects. As such, it is quite flexible and applicable to a wide range of use cases.

To construct an ObjectIndex, we require an index as well as another abstraction, namely ObjectNodeMapping. This mapping, as its name suggests, provides the means to go from a node to its associated object, and vice versa. Alternatively, there is a from_objects() class method that can conveniently construct an ObjectIndex from a set of objects.
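Conceptually, the mapping is a lookup from a node ID (here, a hash of the object's string form) to the original live object. The following is a minimal plain-Python sketch of that round trip, purely illustrative and not the actual SimpleObjectNodeMapping implementation:

```python
# Minimal sketch of an object <-> node mapping (illustrative only,
# not the real SimpleObjectNodeMapping).

class TinyObjectNodeMapping:
    def __init__(self):
        self._objects = {}  # node_id -> original object

    def to_node(self, obj):
        """Represent an arbitrary object as a (node_id, text) pair."""
        node_id = str(hash(str(obj)))
        self._objects[node_id] = obj
        return node_id, str(obj)

    def from_node(self, node_id):
        """Recover the original object from a node id."""
        return self._objects[node_id]


mapping = TinyObjectNodeMapping()
node_id, text = mapping.to_node(["a", "b", "c"])
assert mapping.from_node(node_id) == ["a", "b", "c"]
```

Only the text representation is ever indexed; the mapping is what lets retrieval hand back the original object.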

In this notebook, we'll quickly cover how you can build an ObjectIndex using a SimpleObjectNodeMapping.

```python
from llama_index.core import Settings

Settings.embed_model = "local"
```

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex, SimpleObjectNodeMapping

# some really arbitrary objects
obj1 = {"input": "Hey, how's it going"}
obj2 = ["a", "b", "c", "d"]
obj3 = "llamaindex is an awesome library!"
arbitrary_objects = [obj1, obj2, obj3]

# (optional) object-node mapping
obj_node_mapping = SimpleObjectNodeMapping.from_objects(arbitrary_objects)
nodes = obj_node_mapping.to_nodes(arbitrary_objects)

# object index
object_index = ObjectIndex(
    index=VectorStoreIndex(nodes=nodes),
    object_node_mapping=obj_node_mapping,
)

# object index from_objects (default index_cls=VectorStoreIndex)
object_index = ObjectIndex.from_objects(
    arbitrary_objects, index_cls=VectorStoreIndex
)
```

As a retriever

With the object_index in hand, we can use it as a retriever to retrieve the indexed objects.

```python
object_retriever = object_index.as_retriever(similarity_top_k=1)
object_retriever.retrieve("llamaindex")
```

We can also add node postprocessors to an object index retriever, for convenient access to things like reranking.
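The postprocessors run on the retrieved nodes before they are mapped back to objects. As a rough mental model (plain Python, not the LlamaIndex API), a reranker takes the top-k retrieved candidates, re-scores them against the query, and keeps the best top_n. Here the re-scorer is a toy term-overlap function; the real ColbertRerank uses a learned late-interaction model:

```python
# Toy reranker: re-score retrieved candidates by term overlap with the
# query and keep the best top_n. Illustrative only -- ColbertRerank
# scores with a learned model, not term overlap.

def rerank(query, candidates, top_n=1):
    query_terms = set(query.lower().split())

    def overlap(text):
        return len(query_terms & set(text.lower().split()))

    rescored = sorted(candidates, key=overlap, reverse=True)
    return rescored[:top_n]


candidates = ["a b c d", "llamaindex is an awesome library!"]
print(rerank("a random list object", candidates))  # -> ['a b c d']
```

Note that retrieval asks for similarity_top_k=2 but the reranker keeps only top_n=1, so the retriever returns a single, re-scored object.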

```python
%pip install llama-index-postprocessor-colbert-rerank
```

```python
from llama_index.postprocessor.colbert_rerank import ColbertRerank

retriever = object_index.as_retriever(
    similarity_top_k=2, node_postprocessors=[ColbertRerank(top_n=1)]
)
retriever.retrieve("a random list object")
```

Using a Storage Integration (e.g., Chroma)

The object index supports integrations with any existing storage backend in LlamaIndex.

The following section walks through how to set that up using Chroma as an example.

```python
%pip install llama-index-vector-stores-chroma
```

```python
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart2")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

object_index = ObjectIndex.from_objects(
    arbitrary_objects,
    index_cls=VectorStoreIndex,
    storage_context=storage_context,
)
```

```python
object_retriever = object_index.as_retriever(similarity_top_k=1)
object_retriever.retrieve("llamaindex")
```

Now, let's "reload" the index.

```python
db = chromadb.PersistentClient(path="./chroma_db")
# use the same collection we wrote to above
chroma_collection = db.get_or_create_collection("quickstart2")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

object_index = ObjectIndex.from_objects_and_index(arbitrary_objects, index)
```

```python
object_retriever = object_index.as_retriever(similarity_top_k=1)
object_retriever.retrieve("llamaindex")
```

Note that when we reload the index, we still have to pass the objects, since those are not saved in the actual index/vector db.
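To see why, note that what lands in the vector database is essentially each node's ID, text, and embedding; the link back to the live Python object exists only in the in-memory mapping. A rough plain-Python illustration, using a hypothetical JSON layout rather than Chroma's actual storage format:

```python
import json

obj = {"input": "Hey, how's it going"}

# Roughly what a vector store persists for a node: an id and the text
# (plus an embedding, omitted here). Hypothetical layout, not Chroma's.
persisted = json.dumps({"id": str(hash(str(obj))), "text": str(obj)})

# Reloading recovers the node text...
node = json.loads(persisted)

# ...but it is just a string; the link back to the live dict `obj`
# exists only in the object-node mapping, which is why the objects
# must be passed again when reloading.
assert node["text"] == str(obj)
assert isinstance(node["text"], str)
```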

[Advanced] Customizing the Mapping

For specialized cases where you want full control over how objects are mapped to nodes, you can also provide to_node_fn() and from_node_fn() hooks.

This is useful when you are converting specialized objects, or want to dynamically create objects at runtime rather than keeping them in memory.

A small example is shown below.

```python
from llama_index.core.schema import TextNode

my_objects = {
    str(hash(str(obj))): obj for obj in arbitrary_objects
}


def from_node_fn(node):
    return my_objects[node.node_id]


def to_node_fn(obj):
    return TextNode(id_=str(hash(str(obj))), text=str(obj))


object_index = ObjectIndex.from_objects(
    arbitrary_objects,
    index_cls=VectorStoreIndex,
    from_node_fn=from_node_fn,
    to_node_fn=to_node_fn,
)

object_retriever = object_index.as_retriever(similarity_top_k=1)

object_retriever.retrieve("llamaindex")
```

Persisting ObjectIndex to Disk with Objects

When it comes to persisting the ObjectIndex, we have to handle both the index as well as the object-node mapping. Persisting the index is straightforward and can be handled by the usual means (e.g., see this guide). However, it's a bit of a different story when it comes to persisting the ObjectNodeMapping.

Since we're indexing arbitrary Python objects with the ObjectIndex, it may be the case (and perhaps more often than we'd like) that those objects are not serializable. In those cases, you can persist the index, but you would have to maintain a way to re-construct the ObjectNodeMapping in order to re-construct the ObjectIndex. For convenience, there are persist and from_persist_dir methods on the ObjectIndex that will attempt to persist and load a previously saved ObjectIndex, respectively.
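The serializability caveat is easy to reproduce in plain Python: simple data (dicts, lists, strings) round-trips through a format like JSON without issue, while objects that wrap live code, such as a function, do not. This is a generic illustration of the failure mode, not the library's own persistence code:

```python
import json

# Plain data serializes without issue...
json.dumps([{"input": "Hey, how's it going"}, ["a", "b", "c", "d"]])


def add(a: int, b: int) -> int:
    return a + b


# ...but an object wrapping live code, like a function tool, does not.
try:
    json.dumps({"tool": add})
except TypeError as err:
    print(f"cannot persist: {err}")
```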

Happy example

```python
# persist to disk (if no path is provided, defaults to ./storage)
object_index.persist()
```

```python
# re-load (if no path is provided, attempts to load from ./storage)
reloaded_object_index = ObjectIndex.from_persist_dir()
```

```python
reloaded_object_index._object_node_mapping.obj_node_mapping
```

```python
object_index._object_node_mapping.obj_node_mapping
```

Example of when it doesn't work

```python
from llama_index.core.tools import FunctionTool
from llama_index.core import SummaryIndex
from llama_index.core.objects import SimpleToolNodeMapping


def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b


def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b


multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)

object_mapping = SimpleToolNodeMapping.from_objects([add_tool, multiply_tool])
object_index = ObjectIndex.from_objects(
    [add_tool, multiply_tool], object_mapping
)
```

```python
# trying to persist the object_mapping directly will raise an error
object_mapping.persist()
```

```python
# trying to persist the object index here will raise a warning to the user
object_index.persist()
```

In this case, only the index has been persisted. To re-construct the ObjectIndex as mentioned above, we need to manually re-construct the ObjectNodeMapping and supply it to the ObjectIndex.from_persist_dir method.

```python
reloaded_object_index = ObjectIndex.from_persist_dir(
    object_node_mapping=object_mapping  # without this, an error will be thrown
)
```