Kùzu is an in-process property graph database management system built for query speed and scalability.
It integrates with PyG through PyG's remote backend interface.
Kùzu's Python API exposes a torch_geometric.data.FeatureStore and a torch_geometric.data.GraphStore that plug directly into familiar PyG interfaces such as NeighborLoader, so you can train GNNs directly on graphs stored in Kùzu.
This is particularly useful when you want to train on graphs that do not fit in your machine's main memory.
You can install Kùzu as follows:
pip install kuzu
The API and design documentation of Kùzu can be found at https://kuzudb.com/docs/.
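As a rough illustration of how the two stores plug into NeighborLoader, here is a minimal sketch. It assumes an already populated Kùzu database with a `paper` node table and a `cites` relationship table. The helper names `build_loader_kwargs` and `make_loader`, the fan-out of `[12, 12]`, and the batch size are illustrative choices, not part of either library.

```python
import multiprocessing as mp


def build_loader_kwargs(feature_store, graph_store, batch_size=1024):
    """Collect NeighborLoader arguments for a Kùzu-backed graph (hypothetical helper)."""
    return dict(
        # PyG accepts a (FeatureStore, GraphStore) tuple in place of a Data object.
        data=(feature_store, graph_store),
        # Assumed schema: 'paper' nodes connected by 'cites' relationships.
        num_neighbors={("paper", "cites", "paper"): [12, 12]},
        batch_size=batch_size,
        # Fetch features in the main process rather than in loader workers.
        filter_per_worker=False,
    )


def make_loader(db_path, seed_nodes):
    """Open the database at db_path and build a loader over seed_nodes (hypothetical helper)."""
    # Imported lazily so build_loader_kwargs works without Kùzu or PyG installed.
    import kuzu
    from torch_geometric.loader import NeighborLoader

    db = kuzu.Database(db_path)
    # Returns the FeatureStore and GraphStore described above.
    feature_store, graph_store = db.get_torch_geometric_remote_backend(mp.cpu_count())
    return NeighborLoader(
        input_nodes=("paper", seed_nodes),
        shuffle=True,
        **build_loader_kwargs(feature_store, graph_store),
    )
```

Calling `make_loader` with the path to your database and a tensor of seed node indices should yield mini-batches in the usual way. Check the Kùzu documentation for the exact signature in your installed version.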
We provide the following examples to showcase the Kùzu remote backend within PyG:
The PubMed example is hosted on Google Colab. In this example, we work on a small dataset for demonstrative purposes. The PubMed dataset consists of 19,717 papers as nodes and 88,648 citation relationships between them.
papers_100M: This example shows how to use the remote backend feature of Kùzu to work with a large graph of papers and citations on a single machine.
The data used in this example is ogbn-papers100M from the Open Graph Benchmark.
The dataset contains approximately 111 million nodes and 1.6 billion edges.
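To get a feel for why a graph of this size is awkward to hold in memory, the back-of-the-envelope estimate below may help. It uses the rounded node and edge counts above and assumes 128-dimensional float32 node features and an int64 edge index with two entries per edge. Those storage details are assumptions made for illustration, so treat the result as an order-of-magnitude figure only.

```python
NUM_NODES = 111_000_000      # approximate, from the text above
NUM_EDGES = 1_600_000_000    # approximate, from the text above
FEATURE_DIM = 128            # assumption: 128 features per node
BYTES_PER_FLOAT32 = 4        # assumption: features stored as float32
BYTES_PER_INT64 = 8          # assumption: edge index stored as int64


def feature_gb(num_nodes, feature_dim):
    """Size of a dense node feature matrix in gigabytes (10^9 bytes)."""
    return num_nodes * feature_dim * BYTES_PER_FLOAT32 / 1e9


def edge_index_gb(num_edges):
    """Size of a [2, num_edges] edge index in gigabytes (10^9 bytes)."""
    return 2 * num_edges * BYTES_PER_INT64 / 1e9


features = feature_gb(NUM_NODES, FEATURE_DIM)
edges = edge_index_gb(NUM_EDGES)
print(f"features: {features:.1f} GB")        # features: 56.8 GB
print(f"edge index: {edges:.1f} GB")         # edge index: 25.6 GB
print(f"total: {features + edges:.1f} GB")   # total: 82.4 GB
```

Under these assumptions the raw tensors alone come to roughly 80 GB, before any working memory for sampling or training.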