docs/index/methods.md
GraphRAG is a platform for our research into RAG indexing methods that produce optimal context window content for language models. We have a standard indexing pipeline that uses a language model to extract the graph that our memory model is based upon. We may introduce additional indexing methods from time to time. This page documents those options.
Standard GraphRAG is the method described in the original blog post. It uses a language model for all reasoning tasks.
To run it, use `graphrag index --method standard`. This is the default method, so the `--method` parameter can be omitted on the command line.
FastGraphRAG is a method that replaces some of the language model reasoning with traditional natural language processing (NLP) methods. It is a hybrid technique that we developed as a faster and cheaper indexing alternative.
To run it, use `graphrag index --method fast`.
FastGraphRAG has a handful of NLP options built in. By default, we use NLTK plus regular expressions for noun phrase extraction, which is very fast but primarily suitable for English. We have built in two additional methods using spaCy: semantic parsing and CFG. We use the `en_core_web_md` model by default for spaCy, but you can reference any supported model that you have installed.
For FastGraphRAG, we also generally configure text chunking to produce much smaller chunks (50-100 tokens), which results in a better co-occurrence graph.
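To illustrate why chunk size matters for a co-occurrence graph, here is a minimal sketch using only the Python standard library. It is not the GraphRAG implementation: the function names are hypothetical, whitespace splitting stands in for real tokenization, and a crude "runs of capitalized words" regular expression stands in for NLTK-based noun phrase extraction. Two phrases are linked whenever they appear in the same chunk.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a crude stand-in for noun phrase extraction (runs of capitalized words).
_PHRASE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def chunk_tokens(text, chunk_size=50):
    """Split text into consecutive chunks of at most chunk_size whitespace tokens."""
    tokens = text.split()
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def extract_phrases(chunk_text):
    """Return the sorted, de-duplicated candidate phrases found in one chunk."""
    return sorted(set(_PHRASE.findall(chunk_text)))


def cooccurrence_edges(text, chunk_size=50):
    """Count how often each pair of phrases appears together in the same chunk."""
    edges = Counter()
    for tokens in chunk_tokens(text, chunk_size):
        phrases = extract_phrases(" ".join(tokens))
        for a, b in combinations(phrases, 2):
            edges[(a, b)] += 1
    return edges


text = "Alice met Bob. " + "filler " * 10 + "Carol met Dave."
small = cooccurrence_edges(text, chunk_size=8)    # phrases linked only to close neighbors
large = cooccurrence_edges(text, chunk_size=100)  # every phrase linked to every other
print(len(small), len(large))
```

With the small chunk size, only phrases that appear close together are linked. With one large chunk, every phrase is linked to every other, which adds edges that carry little signal.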
⚠️ Note on spaCy models:
This package requires spaCy models to function correctly. If the required model is not installed, the package will automatically download and install it the first time it is used.
You can also install a model manually by running `python -m spacy download <model_name>`, for example `python -m spacy download en_core_web_md`.
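If you would rather check for a model before indexing than rely on the automatic download, a small pre-flight check can help. The sketch below relies on the fact that spaCy models are installed as regular Python packages, so it only tests whether a package with that name is importable. The helper names `spacy_model_installed` and `download_command` are hypothetical and are not part of GraphRAG or spaCy.

```python
import importlib.util


def spacy_model_installed(model_name):
    """Return True if a Python package named model_name can be found."""
    return importlib.util.find_spec(model_name) is not None


def download_command(model_name):
    """Build the manual install command for a spaCy model."""
    return f"python -m spacy download {model_name}"


model = "en_core_web_md"
if not spacy_model_installed(model):
    print(f"Model not found; install it with: {download_command(model)}")
```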
Standard GraphRAG provides a rich description of real-world entities and relationships, but is more expensive than FastGraphRAG. We estimate that graph extraction constitutes roughly 75% of indexing cost. FastGraphRAG is therefore much cheaper, but the tradeoff is that the extracted graph is less directly relevant for use outside of GraphRAG, and it tends to be quite a bit noisier. If high-fidelity entities and graph exploration are important to your use case, we recommend staying with Standard GraphRAG. If your use case is primarily aimed at summary questions using global search, FastGraphRAG provides high-quality summarization at a much lower language model cost.
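As a back-of-the-envelope illustration of the cost estimate above, the sketch below computes what remains of a Standard indexing bill once the graph extraction share is removed. It rests on strong assumptions: the 75% share is the rough estimate quoted above, the NLP extraction step is treated as free unless you supply a cost, and the function name and parameters are hypothetical. Treat the result as an order-of-magnitude guide, not a measurement.

```python
def estimated_fast_cost(standard_cost, extraction_share=0.75, nlp_cost=0.0):
    """Estimate FastGraphRAG indexing cost from a Standard indexing cost.

    Assumes language model graph extraction (extraction_share of the total)
    is replaced by an NLP step costing nlp_cost.
    """
    return standard_cost * (1.0 - extraction_share) + nlp_cost


# Example: a Standard run that cost 100 units of language model spend.
print(estimated_fast_cost(100.0))
```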