
<h1 align="center">Self-Updating Wiki for Your Codebases with LLM</h1>

<div align="center">

Star 🌟 CocoIndex if you like it!!

</div>

This example shows how to use instructor with Gemini to analyze multiple Python codebases and generate markdown documentation using CocoIndex v1.

What It Does

- Scans subdirectories of a root directory (each expected to be a separate Python project)
- Per-file extraction using an LLM with a unified CodebaseInfo model:
  - Public classes and functions with functionality summaries
  - CocoIndex app call relationship graphs (Mermaid format)
  - File-level summaries
- Project aggregation: combines file-level CodebaseInfo into a project-level summary
- Outputs markdown documentation to output/PROJECT_NAME.md
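
To make the "unified model" idea concrete, here is a minimal sketch of how one type can serve both the file level and the project level. It is not the example's actual code: it uses standard-library dataclasses instead of Pydantic so it runs without dependencies, and the field names, the `Symbol` type, and the `aggregate` helper are all assumptions made for illustration. In the example the project-level summary comes from the LLM; the string join below is only a placeholder for that step.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    """A public class or function plus a short summary (hypothetical shape)."""
    name: str
    summary: str


@dataclass
class CodebaseInfo:
    """One type used for a single file and for a whole project."""
    name: str
    summary: str
    public_classes: list[Symbol] = field(default_factory=list)
    public_functions: list[Symbol] = field(default_factory=list)
    mermaid_graphs: list[str] = field(default_factory=list)


def aggregate(project_name: str, files: list[CodebaseInfo]) -> CodebaseInfo:
    """Combine file-level infos into one project-level info.

    Joining the file summaries stands in for the LLM call that the
    example uses to write the project summary.
    """
    return CodebaseInfo(
        name=project_name,
        summary=" ".join(f.summary for f in files),
        public_classes=[c for f in files for c in f.public_classes],
        public_functions=[fn for f in files for fn in f.public_functions],
        mermaid_graphs=[g for f in files for g in f.mermaid_graphs],
    )


file_infos = [
    CodebaseInfo("main.py", "Entry point.",
                 public_functions=[Symbol("run", "Starts the app.")]),
    CodebaseInfo("utils.py", "Helpers.",
                 public_classes=[Symbol("Cache", "In-memory cache.")]),
]
project = aggregate("my_project_1", file_infos)
print(project.summary)  # Entry point. Helpers.
```

Because the input and output share a type, the same markdown renderer can in principle be applied to a single file or to a whole project.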

Key Features

- Instructor Integration: Uses the instructor library for structured LLM outputs with Pydantic
- Unified Data Model: Same CodebaseInfo type for both file-level and project-level extraction
- LLM-Generated Mermaid Graphs: The LLM generates Mermaid syntax directly with:
  - Bold text for @coco.fn decorated functions
  - Thick arrows (==>) for mount/use_mount calls
- Incremental Processing: CocoIndex handles caching and only re-processes changed files
- Multi-Project Support: Processes multiple codebases in parallel
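
The effect of incremental processing can be illustrated with a small content-fingerprint cache. This is a conceptual sketch only: it does not describe how CocoIndex implements caching internally, and `IncrementalRunner` and its `process` method are hypothetical names.

```python
import hashlib
from typing import Callable


class IncrementalRunner:
    """Re-runs an expensive function only when the input content changes."""

    def __init__(self, fn: Callable[[str], str]) -> None:
        self._fn = fn
        # path -> (content fingerprint, cached result)
        self._cache: dict[str, tuple[str, str]] = {}
        self.calls = 0  # how many times the expensive function actually ran

    def process(self, path: str, content: str) -> str:
        fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
        cached = self._cache.get(path)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]  # unchanged file: reuse the previous result
        self.calls += 1
        result = self._fn(content)
        self._cache[path] = (fingerprint, result)
        return result


# A cheap stand-in for the LLM extraction step.
runner = IncrementalRunner(lambda text: f"{len(text.splitlines())} lines")
runner.process("main.py", "import os\n")
runner.process("main.py", "import os\n")              # cache hit, no new call
runner.process("main.py", "import os\nimport sys\n")  # content changed, runs again
print(runner.calls)  # 2
```

With an LLM behind the function, every avoided call saves both latency and API cost, which is why re-processing only changed files matters for this example.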

Output Format

The generated markdown includes:

Overview - High-level project description
Components - Classes and functions with summaries
CocoIndex Pipeline - Mermaid diagrams (if CocoIndex is used)
File Details - Per-file summaries (for multi-file projects)
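
One way to picture this layout is a small renderer over a plain dictionary. The dictionary keys, the heading levels, and `render_markdown` itself are hypothetical; the example's own `generate_markdown` may structure its output differently.

```python
def render_markdown(info: dict) -> str:
    """Render a project-level info dict into the four sections listed above."""
    fence = "`" * 3  # built at runtime to avoid nesting code fences in this sketch
    lines = [f"# {info['name']}", "", "## Overview", "", info["summary"], ""]

    lines += ["## Components", ""]
    for comp in info["components"]:
        lines.append(f"- `{comp['name']}`: {comp['summary']}")
    lines.append("")

    # The pipeline section is only emitted when the project uses CocoIndex.
    if info.get("mermaid_graphs"):
        lines += ["## CocoIndex Pipeline", ""]
        for graph in info["mermaid_graphs"]:
            lines += [f"{fence}mermaid", graph, fence, ""]

    # Per-file details are only emitted for multi-file projects.
    files = info.get("files", [])
    if len(files) > 1:
        lines += ["## File Details", ""]
        for f in files:
            lines.append(f"- `{f['path']}`: {f['summary']}")
        lines.append("")

    return "\n".join(lines)


sample = {
    "name": "my_project_1",
    "summary": "A small CLI tool.",
    "components": [{"name": "run", "summary": "Starts the app."}],
    "mermaid_graphs": [],
    "files": [
        {"path": "main.py", "summary": "Entry point."},
        {"path": "utils.py", "summary": "Helpers."},
    ],
}
doc = render_markdown(sample)
print(doc)
```

The sample project has no CocoIndex graphs, so the pipeline section is skipped, while the two files trigger the File Details section.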

Example Mermaid Graph

mermaid

graph TD
    %% App: SampleApp
    app_main[<b>app_main</b>] ==> process_file[<b>process_file</b>]
    process_file --> helper_func[helper_func]

Legend: bold nodes are functions decorated with @coco.fn; thick arrows (==>) are mount/use_mount calls.
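
In the example the LLM writes this syntax directly. For readers who want the conventions spelled out, the following deterministic sketch applies them to a hand-written call list. `render_graph` and its input shape are hypothetical and not part of the example; using the regular arrow (`-->`) for non-mount calls is inferred from the graph above.

```python
def render_graph(app_name, coco_fns, calls):
    """Build a Mermaid graph following the conventions above.

    coco_fns: names of functions decorated with @coco.fn (rendered bold).
    calls: (caller, callee, is_mount) tuples; is_mount selects the thick arrow.
    """
    declared = set()

    def node(name):
        # Declare the label on first use; afterwards refer to the node by id.
        if name in declared:
            return name
        declared.add(name)
        label = f"<b>{name}</b>" if name in coco_fns else name
        return f"{name}[{label}]"

    lines = ["graph TD", f"    %% App: {app_name}"]
    for caller, callee, is_mount in calls:
        arrow = "==>" if is_mount else "-->"
        lines.append(f"    {node(caller)} {arrow} {node(callee)}")
    return "\n".join(lines)


graph = render_graph(
    "SampleApp",
    coco_fns={"app_main", "process_file"},
    calls=[
        ("app_main", "process_file", True),      # mount call: thick arrow
        ("process_file", "helper_func", False),  # plain call: regular arrow
    ],
)
print(graph)
```

Running it reproduces the sample graph shown above, line for line.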

Run

1. Install dependencies

pip install -e .

2. Set up environment variables

Create a .env file in the example directory:

echo "GEMINI_API_KEY=your_api_key_here" > .env

Replace your_api_key_here with your actual Gemini API key.

Optionally, set a different LLM model:

echo "LLM_MODEL=gemini/gemini-2.5-flash" >> .env
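
Inside the application, these settings would typically be read from the environment. The sketch below assumes the `.env` values have already been loaded into the process environment, assumes `gemini/gemini-2.5-flash` as the fallback model, and assumes the Gemini key is only required for `gemini/` models. None of this is confirmed by the example's source, and `load_settings` is a hypothetical helper.

```python
import os


def load_settings(env=None):
    """Return (api_key, model) from an environment-style mapping."""
    env = os.environ if env is None else env
    model = env.get("LLM_MODEL", "gemini/gemini-2.5-flash")  # assumed default
    api_key = env.get("GEMINI_API_KEY")
    # Only insist on the Gemini key when a Gemini model is selected.
    if model.startswith("gemini/") and not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env")
    return api_key, model


# Passing a plain dict keeps the demo independent of the real environment.
print(load_settings({"GEMINI_API_KEY": "demo-key"}))
```

Failing early with a clear message is usually friendlier than letting the first LLM request fail with an authentication error partway through a run.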

3. Prepare your projects

Create a projects/ directory with subdirectories for each Python project:

projects/
├── my_project_1/
│   ├── main.py
│   └── utils.py
├── my_project_2/
│   └── app.py
└── ...

4. Run the application

cocoindex update main.py

This will:

Scan all subdirectories in projects/
Extract information from all .py files (excluding .venv* directories)
Generate markdown documentation in output/
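
The scanning behaviour described above can be approximated with `pathlib`. This is a sketch of the described behaviour, not the example's code, which may rely on CocoIndex's own file-walking utilities; `discover_projects` is a hypothetical name.

```python
import pathlib
import tempfile


def discover_projects(root: pathlib.Path) -> dict[str, list[pathlib.Path]]:
    """Map each project subdirectory name to its .py files, skipping .venv* dirs."""
    projects = {}
    for project_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = []
        for f in sorted(project_dir.rglob("*.py")):
            parent_dirs = f.relative_to(project_dir).parts[:-1]
            # Skip anything under a directory named .venv, .venv311, and so on.
            if any(part.startswith(".venv") for part in parent_dirs):
                continue
            files.append(f)
        projects[project_dir.name] = files
    return projects


# Build a throwaway tree that mirrors the projects/ layout shown earlier.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for rel in [
        "my_project_1/main.py",
        "my_project_1/utils.py",
        "my_project_1/.venv/lib/site.py",  # should be ignored
        "my_project_2/app.py",
    ]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("# placeholder\n")
    found = {
        name: [f.name for f in files]
        for name, files in discover_projects(root).items()
    }

print(found)
```

Filtering on directory names rather than file names means a file that merely contains `.venv` in its own name is still picked up.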

5. Verify the output

ls -la output/
cat output/my_project_1.md

Customization

Change Input/Output Directories

Edit the app definition in main.py:

python

app = coco.App(
    app_main,
    coco.AppConfig(name="MultiCodebaseSummarization"),
    root_dir=pathlib.Path("./your_projects_dir"),
    output_dir=pathlib.Path("./your_output_dir"),
)

Use a Different LLM

Set the LLM_MODEL environment variable to any LiteLLM-supported model:

# OpenAI
export LLM_MODEL=gpt-4o

# Anthropic
export LLM_MODEL=anthropic/claude-3-5-sonnet

# Local (Ollama)
export LLM_MODEL=ollama/llama3.2

How It Works

mermaid

graph TD
    %% App: MultiCodebaseSummarization
    app_main[<b>app_main</b>] ==> process_project[<b>process_project</b>]
    process_project ==> extract_file_info[<b>extract_file_info</b>]
    process_project ==> aggregate_project_info[<b>aggregate_project_info</b>]
    process_project --> generate_markdown[generate_markdown]

app_main: Lists subdirectories, sets up the output target, and mounts process_project for each subdirectory
process_project: Extracts info from each file, aggregates, outputs markdown
extract_file_info: Uses instructor + LLM to extract CodebaseInfo from each file
aggregate_project_info: Combines file CodebaseInfo into project-level CodebaseInfo
generate_markdown: Converts CodebaseInfo to markdown and calls declare_file