Back to Llama Index

NebulaGraph Property Graph Index

docs/examples/property_graph/property_graph_nebula.ipynb

0.14.215.4 KB
Original Source

NebulaGraph Property Graph Index

NebulaGraph is an open-source distributed graph database built for super large-scale graphs with milliseconds of latency.

If you already have an existing graph, please skip to the end of this notebook.

python
%pip install llama-index llama-index-graph-stores-nebula jupyter-nebulagraph

Docker Setup

To launch NebulaGraph locally, first ensure you have docker installed. Then, you can launch the database with the following docker command.

bash
mkdir nebula-docker-compose
cd nebula-docker-compose
curl --output docker-compose.yaml https://raw.githubusercontent.com/vesoft-inc/nebula-docker-compose/master/docker-compose-lite.yaml
docker compose up 

After this, you are ready to create your first property graph!

Other options/details for deploying NebulaGraph can be found in the docs:

python
# load NebulaGraph Jupyter extension to enable %ngql magic
%load_ext ngql
# connect to NebulaGraph service
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula
# create a graph space(think of a Database Instance) named: llamaindex_nebula_property_graph
%ngql CREATE SPACE IF NOT EXISTS llamaindex_nebula_property_graph(vid_type=FIXED_STRING(256));
python
# use the graph space, which is similar to "use database" in MySQL
# The space was created in async way, so we need to wait for a while before using it, retry it if failed
%ngql USE llamaindex_nebula_property_graph;

Env Setup

We need just a few environment setups to get started.

python
import os

os.environ["OPENAI_API_KEY"] = "sk-proj-..."
python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
python
import nest_asyncio

nest_asyncio.apply()
python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

We choose using gpt-4o and local embedding model intfloat/multilingual-e5-large . You can change to what you like, by editing the following lines:

python
%pip install llama-index-embeddings-huggingface
python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = OpenAI(model="gpt-4o", temperature=0.3)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="intfloat/multilingual-e5-large"
)
# Settings.embed_model = OpenAIEmbedding(model_name="text-embedding-3-small")

Index Construction

Prepare property graph store

python
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore

graph_store = NebulaPropertyGraphStore(
    space="llamaindex_nebula_property_graph", overwrite=True
)

And vector store:

python
from llama_index.core.vector_stores.simple import SimpleVectorStore

vec_store = SimpleVectorStore()

Finally, build the index!

python
from llama_index.core.indices.property_graph import PropertyGraphIndex
from llama_index.core.storage.storage_context import StorageContext
from llama_index.llms.openai import OpenAI

index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    vector_store=vec_store,
    show_progress=True,
)

index.storage_context.vector_store.persist("./data/nebula_vec_store.json")

Now that the graph is created, we can explore it with jupyter-nebulagraph

python
%ngql SHOW TAGS
python
%ngql SHOW EDGES
python
%ngql MATCH p=(v:Entity__)-[r]->(t:Entity__) RETURN v.Entity__.name AS src, r.label AS relation, t.Entity__.name AS dest LIMIT 15;
python
%ngql MATCH p=(v:Entity__)-[r]->(t:Entity__) RETURN p LIMIT 2;
python
%ng_draw

Querying and Retrieval

python
retriever = index.as_retriever(
    include_text=False,  # include source text in returned nodes, default True
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

for node in nodes:
    print(node.text)
python
query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("What happened at Interleaf and Viaweb?")

print(str(response))

Loading from an existing Graph

If you have an existing graph, we can connect to and use it!

python
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore

graph_store = NebulaPropertyGraphStore(
    space="llamaindex_nebula_property_graph"
)

from llama_index.core.vector_stores.simple import SimpleVectorStore

vec_store = SimpleVectorStore.from_persist_path("./data/nebula_vec_store.json")

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    vector_store=vec_store,
)

From here, we can still insert more documents!

python
from llama_index.core import Document

document = Document(text="LlamaIndex is great!")

index.insert(document)
python
nodes = index.as_retriever(include_text=False).retrieve("LlamaIndex")

print(nodes[0].text)