This document provides a detailed overview of the RAG (Retrieval-Augmented Generation) system, its architecture, and how to use it.
This RAG system is a multimodal question-answering system for documents. It processes both the text and the visual layout of each document, and it uses a knowledge graph to capture the relationships between the entities the documents mention.
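To illustrate how a knowledge graph lets the system look up relationships between entities, here is a minimal sketch that indexes (subject, relation, object) triples by entity and renders the facts touching a given entity as plain sentences. This is the kind of lookup a graph-based retriever might perform. It is an illustration under stated assumptions, not the system's implementation: the example triples, `build_index`, and `facts_about` are hypothetical, and plain Python dictionaries stand in for whatever graph library the system actually uses.

```python
from collections import defaultdict

# A fact is a (subject, relation, object) triple. The example triples below are
# hypothetical; in the real system an LLM-based extractor would produce them.
Triple = tuple[str, str, str]


def build_index(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Map each entity to every triple it appears in, as subject or object."""
    index: dict[str, list[Triple]] = defaultdict(list)
    for triple in triples:
        subject, _, obj = triple
        index[subject].append(triple)
        if obj != subject:  # avoid listing a self-loop twice
            index[obj].append(triple)
    return dict(index)


def facts_about(index: dict[str, list[Triple]], entity: str) -> list[str]:
    """Render the facts touching `entity` as plain sentences for an LLM prompt."""
    return [f"{s} {r} {o}" for s, r, o in index.get(entity, [])]


triples = [
    ("Acme Corp", "acquired", "Beta Labs"),
    ("Beta Labs", "develops", "Widget X"),
]
index = build_index(triples)
print(facts_about(index, "Beta Labs"))
# ['Acme Corp acquired Beta Labs', 'Beta Labs develops Widget X']
```

Indexing each fact under both its subject and its object means a question about either entity surfaces the relationship. For persisting a graph in the `.gml` format, a library such as NetworkX offers `write_gml` and `read_gml`, although this document does not say which library the system relies on.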
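The system is built around an agentic workflow that breaks queries into sub-questions, retrieves context from both the vector embeddings and the knowledge graph, and checks that this context is sufficient before answering. The retrieval pipeline below describes each step.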
The system is composed of two main pipelines: an indexing pipeline and a retrieval pipeline.
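The division of labour between the two pipelines can be sketched end to end. The following is a minimal, dependency-free sketch that relies on heavy assumptions. A bag-of-words counter stands in for the embedding model. Splitting on " and " stands in for the LLM-based `QueryDecomposer`. A rule of "every sub-question found some context" stands in for the LLM `Verifier`. All function names are hypothetical, graph retrieval is omitted, and the sketch returns the verified context rather than a generated answer.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for the real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class KnowledgeBase:
    chunks: list[tuple[str, Counter]] = field(default_factory=list)


# Indexing pipeline: page text -> vectors (graph extraction omitted here).
def index_pages(pages: list[str]) -> KnowledgeBase:
    return KnowledgeBase([(page, embed(page)) for page in pages])


# Retrieval pipeline: decompose -> retrieve -> verify.
def decompose(query: str) -> list[str]:
    """Hypothetical heuristic split; the real QueryDecomposer uses an LLM."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def retrieve(kb: KnowledgeBase, question: str, k: int = 1) -> list[str]:
    """Return up to k chunks with non-zero similarity, best first."""
    q = embed(question)
    scored = sorted(((cosine(q, vec), text) for text, vec in kb.chunks), reverse=True)
    return [text for score, text in scored[:k] if score > 0]


def verify(contexts: dict[str, list[str]]) -> bool:
    """Stand-in for the LLM Verifier: every sub-question needs some context."""
    return bool(contexts) and all(contexts.values())


def answer(kb: KnowledgeBase, query: str) -> Optional[dict[str, list[str]]]:
    """Gather context per sub-question, or None if it is judged insufficient."""
    contexts = {sub: retrieve(kb, sub) for sub in decompose(query)}
    if not verify(contexts):
        return None  # assumption: the real system might retry or decline here
    return contexts


kb = index_pages([
    "The invoice total is 500 dollars.",
    "The contract was signed in March.",
])
print(answer(kb, "What is the invoice total and when was the contract signed"))
print(answer(kb, "zebra stripes"))  # None: nothing relevant was retrieved
```

Keeping the indexing output (a `KnowledgeBase`) separate from the retrieval functions mirrors the two-pipeline split: indexing runs once over the documents, while retrieval runs per query against the stored result.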
The indexing pipeline is responsible for processing the documents and building the knowledge base. It performs the following steps:
1. PyMuPDF extracts the text from each page of the PDF documents, preserving the original layout.
2. An embedding model (`Qwen/Qwen3-Embedding-0.6B`) creates numerical vector representations of the text.
3. A `GraphExtractor` uses a large language model (`qwen2.5vl:7b`) to extract entities and their relationships. This information is then used to build a knowledge graph, which is stored as a `.gml` file.

The retrieval pipeline is responsible for answering user queries. It uses an agentic workflow that includes the following steps:
1. A `QueryDecomposer` breaks the user's query into smaller, more manageable sub-questions.
2. A `MultiVectorRetriever` and a `GraphRetriever` retrieve relevant information from the knowledge base.
3. A `Verifier` uses an LLM to check whether the retrieved context is sufficient to answer the query.

The system provides the following command-line endpoints, each run as a subcommand of `rag_system/main.py`:
- `index`: runs the indexing pipeline to process the documents and build the knowledge base.
- `chat`: runs the retrieval pipeline to answer a user's query.
- `show_graph`: displays the knowledge graph in a human-readable format and also provides a visual representation of the graph.

To run the system, use the following commands:
```bash
# Activate the virtual environment
source rag_system/rag_venv/bin/activate
# Index the documents
python rag_system/main.py index
# Ask a question
python rag_system/main.py chat "Your question here"
# Show the knowledge graph
python rag_system/main.py show_graph
```
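The three endpoints map naturally onto subcommands. The following is a minimal sketch of such a command-line interface using Python's standard `argparse` module. The handler functions are hypothetical placeholders that return strings instead of running the real pipelines, and the actual `main.py` may be structured differently.

```python
import argparse
from typing import Optional


def run_index() -> str:
    # Hypothetical placeholder for the indexing pipeline.
    return "indexing documents"


def run_chat(query: str) -> str:
    # Hypothetical placeholder for the retrieval pipeline.
    return f"answering: {query}"


def run_show_graph() -> str:
    # Hypothetical placeholder for printing and drawing the knowledge graph.
    return "showing knowledge graph"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py", description="RAG system CLI (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("index", help="run the indexing pipeline")
    chat = sub.add_parser("chat", help="answer a question")
    chat.add_argument("query", help="the question, quoted as a single argument")
    sub.add_parser("show_graph", help="display the knowledge graph")
    return parser


def main(argv: Optional[list[str]] = None) -> str:
    """Parse argv (defaults to sys.argv[1:]) and dispatch to a handler."""
    args = build_parser().parse_args(argv)
    if args.command == "index":
        return run_index()
    if args.command == "chat":
        return run_chat(args.query)
    return run_show_graph()


print(main(["chat", "Your question here"]))  # answering: Your question here
```

Because `chat` takes the question as a single positional argument, the shell example quotes it so that a multi-word question arrives as one argument. To use the sketch as a script, call `print(main())` under an `if __name__ == "__main__":` guard so that it reads the real command-line arguments.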