# Multimodal RAG System
This document provides a detailed overview of the multimodal Retrieval-Augmented Generation (RAG) system implemented in this directory. The system is designed to process and understand information from PDF documents, combining both textual and visual data to answer complex queries.
This RAG system is a sophisticated pipeline that leverages state-of-the-art open-source models to provide accurate, context-aware answers from a document corpus. Unlike traditional RAG systems that only process text, this implementation is fully multimodal. It extracts and indexes both text and images from PDFs, allowing a Vision Language Model (VLM) to reason over both modalities when generating a final answer.
The core capabilities include:

- Multimodal indexing of both the text and page images extracted from PDFs.
- Vector search over separate text and image tables in a LanceDB database.
- Reranking of retrieved chunks for improved relevance.
- Answer generation that combines a Vision Language Model for fact extraction with a text generation model for synthesis.
## Project Structure

The system is composed of several key Python modules that work together to form the RAG pipeline.
- `main.py`: The main entry point for the application. It contains the configuration for all models and pipelines and orchestrates the indexing and retrieval processes.
- `rag_system/pipelines/`: High-level orchestration for indexing and retrieval.
  - `indexing_pipeline.py`: Manages the process of converting raw PDFs into indexed, searchable data.
  - `retrieval_pipeline.py`: Handles the end-to-end process of taking a user query, retrieving relevant information, and generating a final answer.
- `rag_system/indexing/`: All modules related to data processing and indexing.
  - `multimodal.py`: Extracts text and images from PDFs and generates embeddings using the configured vision model (`colqwen2-v1.0`).
  - `representations.py`: Defines the text embedding model (`Qwen2-7B-instruct`) and other data representation generators.
  - `embedders.py`: Manages the connection to the LanceDB vector database and handles the indexing of vector embeddings.
- `rag_system/retrieval/`: Modules for retrieving and ranking documents.
  - `retrievers.py`: Implements the logic for searching the vector database to find relevant text and image chunks.
  - `reranker.py`: Contains the `QwenReranker` class, which re-ranks the retrieved documents for improved relevance.
- `rag_system/agent/`: Contains the agent loop that interacts with the user and the RAG pipelines.
- `rag_system/utils/`: Utility clients, such as the `OllamaClient` for interacting with the Ollama server.

## How It Works

**Indexing:**

1. The `MultimodalProcessor` reads a PDF and splits it into pages.
2. For each page, the `QwenEmbedder` generates a vector embedding for the text, and the `LocalVisionModel` (using `colqwen2-v1.0`) generates a vector embedding for the page image.
3. The `VectorIndexer` stores these embeddings in separate tables within a LanceDB database.

**Retrieval:**

1. The user submits a query to the `Agent`.
2. The `RetrievalPipeline`'s `MultiVectorRetriever` searches both the text and image tables in LanceDB for relevant chunks.
3. The retrieved chunks are passed to the `QwenReranker`, which re-orders them based on relevance to the query.
4. The Vision Language Model (`qwen-vl`) extracts facts from the top-ranked text and image chunks.
5. The text generation model (`llama3`) synthesizes these facts into a coherent, human-readable answer.

## Models

This system relies on a suite of powerful, open-source models.
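The retrieval steps above can be sketched with toy in-memory "tables". This is an illustrative sketch only: the real system uses LanceDB and learned embeddings, and the names `search_table` and `rerank` here are hypothetical stand-ins, not the project's actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search_table(table, query_vec, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    scored = sorted(table, key=lambda row: cosine(row["vector"], query_vec), reverse=True)
    return scored[:k]

def rerank(query_vec, chunks):
    """Stand-in for QwenReranker: re-order candidates by a (toy) relevance score."""
    return sorted(chunks, key=lambda c: cosine(c["vector"], query_vec), reverse=True)

# Two separate "tables", mirroring the text/image split in LanceDB.
text_table = [
    {"id": "t1", "vector": [1.0, 0.0], "content": "Q3 revenue grew 12%."},
    {"id": "t2", "vector": [0.0, 1.0], "content": "The office moved in May."},
]
image_table = [
    {"id": "i1", "vector": [0.9, 0.1], "content": "chart: quarterly revenue"},
]

# Search both tables, merge the candidates, then rerank the combined list.
query_vec = [1.0, 0.1]
candidates = search_table(text_table, query_vec, k=1) + search_table(image_table, query_vec, k=1)
top = rerank(query_vec, candidates)
print([c["id"] for c in top])  # the revenue-related chunks rank first
```

In the real pipeline, the reranker applies a learned cross-encoder model rather than reusing the embedding similarity, which is why reranking can change the order produced by the initial vector search.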
| Component | Model | Framework | Purpose |
|---|---|---|---|
| Image Embedding | vidore/colqwen2-v1.0 | colpali | Generates vector embeddings from images. |
| Text Embedding | Qwen/Qwen2-7B-instruct | transformers | Generates vector embeddings from text. |
| Reranker | Qwen/Qwen-reranker | transformers | Re-ranks retrieved documents for relevance. |
| Vision Language Model | qwen2.5vl:7b | Ollama | Extracts facts from text and images. |
| Text Generation | llama3 | Ollama | Synthesizes the final answer. |
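The two Ollama-hosted models are invoked over Ollama's REST API. The sketch below shows how a request body for the `/api/generate` endpoint might be built; the helper function and variable names are hypothetical, not the project's `OllamaClient` API.

```python
import json

# Default Ollama server address; the real client may read this from config.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model, prompt, images=None):
    """Build the JSON body for a non-streaming /api/generate request.
    `images`, if given, is a list of base64-encoded page images
    (used with the Vision Language Model, e.g. qwen2.5vl:7b)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if images:
        payload["images"] = images
    return payload

# Example: a text-only request to the synthesis model.
payload = build_generate_payload("llama3", "Summarize the retrieved facts: ...")

# To actually send it (requires a running Ollama server):
# import urllib.request
# req = urllib.request.Request(OLLAMA_URL, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# answer = json.loads(urllib.request.urlopen(req).read())["response"]
```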
## Configuration

All system configuration is centralized in `main.py`:

- `OLLAMA_CONFIG`: Defines the models that will be run via the Ollama server, including the final text generation model and the Vision Language Model.
- `PIPELINE_CONFIGS`: Contains the configurations for both the indexing and retrieval pipelines, including retrieval parameters (e.g., `top_k`, `retrieval_k`).

To change a model, simply update the corresponding model name in this configuration file.
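As a rough illustration, the two configuration objects might look like the following. The exact keys are an assumption for this sketch; only the model names and the `top_k`/`retrieval_k` parameters come from this README.

```python
# Hypothetical shape of the configuration in main.py; real key names may differ.
OLLAMA_CONFIG = {
    "generation_model": "llama3",      # final answer synthesis
    "vlm_model": "qwen2.5vl:7b",       # fact extraction from text and images
    "host": "http://localhost:11434",  # default Ollama server address
}

PIPELINE_CONFIGS = {
    "indexing": {
        "text_embedding_model": "Qwen/Qwen2-7B-instruct",
        "image_embedding_model": "vidore/colqwen2-v1.0",
    },
    "retrieval": {
        "retrieval_k": 20,  # candidates fetched per table before reranking
        "top_k": 5,         # chunks kept after reranking
    },
}
```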
## Usage

To run the system, first ensure the required dependencies and models are available:

```shell
pip install -r requirements.txt
ollama pull llama3
ollama pull qwen2.5vl:7b
```

The `transformers` and `colpali` libraries will automatically download the required models the first time they are used, so ensure you have a stable internet connection. Then start the application:

```shell
python rag_system/main.py
```

The system indexes the PDFs in the `rag_system/documents` directory, storing their embeddings in LanceDB, and then accepts questions at an interactive prompt:

```
> What was the revenue growth in Q3?
```

To exit, type `quit`.