TiDB Property Graph Index

TiDB is a distributed SQL database, it is MySQL compatible and features horizontal scalability, strong consistency, and high availability. Currently it only supports Vector Search in TiDB Cloud Serverless.

In this nodebook, we will cover how to connect to a TiDB Serverless cluster and create a property graph index.

python

%pip install llama-index llama-index-graph-stores-tidb

Prepare TiDB Serverless Cluster

Get the db connection string from the Cluster Details page, for example:

mysql+pymysql://user:password@host:4000/dbname?ssl_verify_cert=true&ssl_verify_identity=true

TiDB Serverless requires TSL connection when using public endpoint.

Env Setup

We need just a few environment setups to get started.

python

import os

os.environ["OPENAI_API_KEY"] = "sk-proj-..."

python

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

python

import nest_asyncio

nest_asyncio.apply()

python

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

Index Construction

python

from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.graph_stores.tidb import TiDBPropertyGraphStore

graph_store = TiDBPropertyGraphStore(
    db_connection_string="mysql+pymysql://user:password@host:4000/dbname?ssl_verify_cert=true&ssl_verify_identity=true",
    drop_existing_table=True,
)

# Note: it can take a while to index the documents, especially if you have a large number of documents.
# Especially if you are connecting TiDB Serverless to a public endpoint, it depends on the distance between your server location and the TiDB serverless location.
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0)
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)

Querying and Retrieval

python

retriever = index.as_retriever(
    include_text=False,  # include source text in returned nodes, default True
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

for node in nodes:
    print(node.text)

python

query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("What happened at Interleaf and Viaweb?")

print(str(response))

Loading from an existing Graph

If you have an existing graph (either created with LlamaIndex or otherwise), we can connect to and use it!

python

from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.graph_stores.tidb import TiDBPropertyGraphStore

graph_store = TiDBPropertyGraphStore(
    db_connection_string="mysql+pymysql://user:password@host:4000/dbname?ssl_verify_cert=true&ssl_verify_identity=true",
)

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
)

From here, we can still insert more documents!

python

from llama_index.core import Document

document = Document(text="LlamaIndex is great!")

index.insert(document)

python

nodes = index.as_retriever(include_text=False).retrieve("LlamaIndex")

print(nodes[0].text)

For full details on construction, retrieval, querying of a property graph, see the full docs page.