Multi-Agent Database Discovery System

Overview

This document describes a multi-agent database discovery system implemented using Claude Code's autonomous agent capabilities. The system uses 4 specialized subagents that collaborate via the MCP (Model Context Protocol) catalog to perform comprehensive database analysis.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                     Main Agent (Orchestrator)                       │
│  - Launches 4 specialized subagents in parallel                     │
│  - Coordinates via MCP catalog                                      │
│  - Synthesizes final report                                         │
└────────────────┬────────────────────────────────────────────────────┘
                 │
    ┌────────────┼────────────┬────────────┬────────────┐
    │            │            │            │            │
    ▼            ▼            ▼            ▼            ▼
┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
│Struct. │  │Statist.│  │Semantic│  │Query   │  │  MCP   │
│ Agent  │  │ Agent  │  │ Agent  │  │ Agent  │  │Catalog │
└────────┘  └────────┘  └────────┘  └────────┘  └────────┘
     │            │            │            │            │
     └────────────┴────────────┴────────────┴────────────┘
                          │
                   ▼              ▼
              ┌─────────┐  ┌─────────────┐
              │ Database│  │   Catalog   │
              │ (testdb)│  │ (Shared Mem)│
              └─────────┘  └─────────────┘
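
The flow in the diagram can be approximated in plain Python. This is only a minimal sketch under stated assumptions: threads stand in for subagents, a lock-protected dictionary stands in for the MCP catalog, and the names `SharedCatalog`, `run_agent`, and `orchestrate` are hypothetical, not part of Claude Code or the MCP server.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

AGENTS = ["structural", "statistical", "semantic", "query"]


class SharedCatalog:
    """Hypothetical in-memory stand-in for the MCP catalog (shared memory)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # (kind, key) -> document

    def upsert(self, kind, key, document):
        with self._lock:
            self._entries[(kind, key)] = document

    def list_entries(self, kind):
        with self._lock:
            return {k: doc for (knd, k), doc in self._entries.items() if knd == kind}


def run_agent(name, catalog):
    """Placeholder agent: a real subagent would inspect the database here."""
    kind = f"{name}_discovery"
    catalog.upsert(kind, "summary", f"{name} agent finished")
    return kind


def orchestrate():
    catalog = SharedCatalog()
    # Launch all four agents in parallel, then wait for every one to finish.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        kinds = list(pool.map(lambda name: run_agent(name, catalog), AGENTS))
    # Synthesize the final report from whatever the agents cataloged.
    return {kind: catalog.list_entries(kind) for kind in kinds}


report = orchestrate()
```

The orchestrator never passes data between agents directly; everything flows through the catalog, which is what lets the agents run independently.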

The Four Discovery Agents

1. Structural Agent

Mission: Map tables, relationships, indexes, and constraints

Responsibilities:

Complete ERD documentation
Table schema analysis (columns, types, constraints)
Foreign key relationship mapping
Index inventory and assessment
Architectural pattern identification

Catalog Entries: structural_discovery

Key Deliverables:

Entity Relationship Diagram
Complete table definitions
Index inventory with recommendations
Relationship cardinality mapping
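
To make the structural pass concrete, here is a small sketch of a schema, foreign key, and index inventory. It assumes an in-memory SQLite database as a stand-in for the real target (the actual agents go through MCP tools such as describe_table), and the table definitions and the `map_structure` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        order_date TEXT
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")


def map_structure(conn):
    """Return columns, foreign keys, and indexes for every user table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    structure = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = [(r[3], r[2], r[4])
               for r in conn.execute(f"PRAGMA foreign_key_list({table})")]
        # PRAGMA index_list rows: (seq, name, unique, origin, partial)
        indexes = [r[1] for r in conn.execute(f"PRAGMA index_list({table})")]
        structure[table] = {"columns": columns, "foreign_keys": fks, "indexes": indexes}
    return structure


structure = map_structure(conn)
```

Each foreign key tuple reads as (local column, referenced table, referenced column), which is enough to draw the edges of an ERD.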

2. Statistical Agent

Mission: Profile data distributions, patterns, and anomalies

Responsibilities:

Table row counts and cardinality analysis
Data distribution profiling
Anomaly detection (duplicates, outliers)
Statistical summaries (min/max/avg/stddev)
Business metrics calculation

Catalog Entries: statistical_discovery

Key Deliverables:

Data quality score
Duplicate detection reports
Statistical distributions
True vs inflated metrics
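
Duplicate detection and the "true vs inflated" comparison can be sketched by counting distinct rows over the business columns only, ignoring surrogate keys. This is an illustration under assumptions: SQLite stands in for the target database, and the sample rows and the `duplicate_profile` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
# Hypothetical data: each of two customers was loaded three times.
rows = [("Ann", "ann@example.com"), ("Bob", "bob@example.com")] * 3
conn.executemany("INSERT INTO customers (name, email) VALUES (?, ?)", rows)


def duplicate_profile(conn, table, business_columns):
    """Compare the raw row count with the count of distinct business rows."""
    cols = ", ".join(business_columns)
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table})"
    ).fetchone()[0]
    redundant = (total - distinct) / total if total else 0.0
    return {"raw_rows": total, "distinct_rows": distinct, "redundant_fraction": redundant}


profile = duplicate_profile(conn, "customers", ["name", "email"])
```

Any metric computed over the raw rows (counts, revenue totals) is inflated by the same duplication, which is why the agent reports both figures.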

3. Semantic Agent

Mission: Infer business domain and entity types

Responsibilities:

Business domain identification
Entity type classification (master vs transactional)
Business rule discovery
Entity lifecycle analysis
State machine identification

Catalog Entries: semantic_discovery

Key Deliverables:

Complete domain model
Business rules documentation
Entity lifecycle definitions
Missing capabilities identification
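
As an illustration of master vs transactional classification, the sketch below uses one deliberately simple heuristic: a table whose columns reference other entities by name (columns ending in `_id`) is treated as transactional. The heuristic, the column lists, and the `classify_entity` helper are all assumptions made for this example; a real Semantic Agent would weigh more evidence, such as timestamps and status columns.

```python
# Hypothetical column lists for an e-commerce schema.
SCHEMA = {
    "customers": ["id", "name", "email"],
    "products": ["id", "name", "price"],
    "orders": ["id", "customer_id", "order_date", "status"],
    "order_items": ["id", "order_id", "product_id", "quantity"],
}


def classify_entity(columns):
    """Classify a table from its column names alone.

    Name-based inference is used because the database may declare
    no foreign key constraints at all.
    """
    references = [c for c in columns if c.lower().endswith("_id")]
    return "transactional" if references else "master"


classification = {table: classify_entity(cols) for table, cols in SCHEMA.items()}
```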

4. Query Agent

Mission: Analyze access patterns and optimization opportunities

Responsibilities:

Query pattern identification
Index usage analysis
Performance bottleneck detection
N+1 query risk assessment
Optimization recommendations

Catalog Entries: query_discovery

Key Deliverables:

Access pattern analysis
Index recommendations (prioritized)
Query optimization strategies
EXPLAIN analysis results
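
The EXPLAIN-based part of this analysis can be sketched by comparing a query plan before and after adding an index. The example assumes SQLite and its `EXPLAIN QUERY PLAN` statement as a stand-in for the target database; the table, query, and index name are hypothetical, and the exact wording of plan output varies between database engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT)"
)

QUERY = "SELECT * FROM orders WHERE order_date = ?"


def plan(conn, sql, params=()):
    """Return the human-readable detail string of each query plan step."""
    # The detail text is the last column of every EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(conn, QUERY, ("2024-01-01",))   # full table scan expected
conn.execute("CREATE INDEX idx_orders_order_date ON orders(order_date)")
after = plan(conn, QUERY, ("2024-01-01",))    # index search expected
```

A plan step that mentions a scan of the whole table on a frequently filtered column is the signal behind an index recommendation.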

Discovery Process

Round Structure

Each agent runs 4 rounds of analysis:

Round 1: Blind Exploration

Initial schema/data analysis
First observations cataloged
Initial hypotheses formed

Round 2: Pattern Recognition

Read other agents' findings from catalog
Identify patterns and anomalies
Form and test hypotheses

Round 3: Hypothesis Testing

Validate business rules against actual data
Cross-reference findings with other agents
Confirm or reject hypotheses

Round 4: Final Synthesis

Compile comprehensive findings
Generate actionable recommendations
Create final mission summary
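
The round structure above can be sketched as a loop in which peer findings only become visible from round 2 onward. This is an assumption-laden illustration: the catalog is a plain dictionary keyed by (kind, key), and `run_rounds` and the `analyze` callback are hypothetical names.

```python
ROUNDS = ["Blind Exploration", "Pattern Recognition", "Hypothesis Testing", "Final Synthesis"]


def run_rounds(agent, catalog, analyze):
    """Run the four rounds for one agent against a shared catalog dict.

    analyze(agent, round_no, peer_findings) is a hypothetical callback that
    returns the finding to store for that round.
    """
    own_kind = f"{agent}_discovery"
    for round_no, _title in enumerate(ROUNDS, start=1):
        # Round 1 is blind: other agents' entries are hidden from the analysis.
        if round_no == 1:
            peers = {}
        else:
            peers = {k: v for k, v in catalog.items() if k[0] != own_kind}
        catalog[(own_kind, f"round_{round_no}")] = analyze(agent, round_no, peers)
    return catalog


# Usage: one peer entry already exists when the structural agent starts.
catalog = {("statistical_discovery", "round_1"): "15 rows in customers"}
peer_counts = []


def fake_analyze(agent, round_no, peers):
    peer_counts.append(len(peers))
    return f"{agent} round {round_no}"


run_rounds("structural", catalog, fake_analyze)
```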

Catalog-Based Collaboration

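```python
# Agent writes findings.
# Catalog tools are target-scoped: real calls also pass target_id and run_id
# (see Target Scoping Requirement).
catalog_upsert(
    kind="structural_discovery",
    key="table_customers",
    document="...",
    tags="structural,table,schema"
)

# Agent reads other agents' findings
findings = catalog_list(kind="statistical_discovery")
```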
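
For readers who want to experiment without an MCP server, the upsert and list semantics can be approximated locally. The sketch below is a stand-in, not the server's implementation: it assumes a SQLite table keyed by (kind, key), and the table layout and the `open_catalog`, `upsert_entry`, and `list_entries` helpers are hypothetical.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a local catalog table keyed by (kind, key)."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS catalog (
            kind TEXT NOT NULL,
            key TEXT NOT NULL,
            document TEXT NOT NULL,
            tags TEXT,
            PRIMARY KEY (kind, key)
        )
    """)
    return conn


def upsert_entry(conn, kind, key, document, tags=""):
    # INSERT OR REPLACE overwrites any existing entry with the same (kind, key).
    conn.execute(
        "INSERT OR REPLACE INTO catalog (kind, key, document, tags) VALUES (?, ?, ?, ?)",
        (kind, key, document, tags),
    )
    conn.commit()


def list_entries(conn, kind):
    return conn.execute(
        "SELECT key, document, tags FROM catalog WHERE kind = ? ORDER BY key", (kind,)
    ).fetchall()


catalog = open_catalog()
upsert_entry(catalog, "structural_discovery", "table_customers", "draft", "structural,table")
upsert_entry(catalog, "structural_discovery", "table_customers", "final", "structural,table")
entries = list_entries(catalog, "structural_discovery")
```

Because writes are upserts, an agent can refine the same entry across rounds without leaving stale copies behind.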

Example Discovery Output

Database: testdb (E-commerce Order Management)

True Statistics (After Deduplication)

Metric	Current (raw, with duplicates)	Actual (after deduplication)
Customers	15	5
Products	15	5
Orders	15	5
Order Items	27	9
Revenue	$10,886.67	$3,628.85

Critical Findings

Data Quality: 5/100 (Catastrophic) - every record is stored three times, so about 67% of rows are redundant
Missing Index: orders.order_date (P0 critical)
Missing Constraints: No UNIQUE or FK constraints
Business Domain: E-commerce order management system
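
The 67% figure follows directly from the row counts in the statistics table. A short worked calculation, where the `redundancy` helper is a hypothetical name introduced for this example:

```python
# Row counts copied from the statistics table: (raw, after deduplication).
COUNTS = {
    "Customers": (15, 5),
    "Products": (15, 5),
    "Orders": (15, 5),
    "Order Items": (27, 9),
}


def redundancy(raw, deduplicated):
    """Fraction of raw rows that are redundant copies."""
    return (raw - deduplicated) / raw


# Percentage of redundant rows per table, rounded to the nearest integer.
redundant_pct = {
    name: round(redundancy(raw, dedup) * 100) for name, (raw, dedup) in COUNTS.items()
}
```

Every table comes out at the same percentage, which is consistent with the whole data set having been loaded three times.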

Launching the Discovery System

```python
# In Claude Code, launch 4 agents in parallel:
Task(
    description="Structural Discovery",
    prompt=STRUCTURAL_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Statistical Discovery",
    prompt=STATISTICAL_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Semantic Discovery",
    prompt=SEMANTIC_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Query Discovery",
    prompt=QUERY_AGENT_PROMPT,
    subagent_type="general-purpose"
)
```
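
The prompt constants used in the Task calls are not shown in this document. One possible way to build them, sketched here purely as an assumption, is a shared template filled in with each agent's mission and catalog kind as listed earlier. The template wording and the `build_prompt` helper are hypothetical.

```python
# Mission and catalog kind per agent, taken from the agent descriptions.
AGENT_SPECS = {
    "structural": ("Map tables, relationships, indexes, and constraints",
                   "structural_discovery"),
    "statistical": ("Profile data distributions, patterns, and anomalies",
                    "statistical_discovery"),
    "semantic": ("Infer business domain and entity types",
                 "semantic_discovery"),
    "query": ("Analyze access patterns and optimization opportunities",
              "query_discovery"),
}

# Hypothetical template; the real prompts may be structured differently.
PROMPT_TEMPLATE = (
    "You are the {name} discovery agent.\n"
    "Mission: {mission}\n"
    "Store findings in the catalog with kind={kind!r}.\n"
    "Run 4 rounds; from round 2 onward, read the other agents' findings first.\n"
)


def build_prompt(name):
    mission, kind = AGENT_SPECS[name]
    return PROMPT_TEMPLATE.format(name=name, mission=mission, kind=kind)


STRUCTURAL_AGENT_PROMPT = build_prompt("structural")
STATISTICAL_AGENT_PROMPT = build_prompt("statistical")
SEMANTIC_AGENT_PROMPT = build_prompt("semantic")
QUERY_AGENT_PROMPT = build_prompt("query")
```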

MCP Tools Used

The agents use these MCP tools for database analysis:

list_schemas - List all schemas (databases)
list_tables - List tables in a schema
describe_table - Get table schema
sample_rows - Get sample data from table
column_profile - Get column statistics
run_sql_readonly - Execute read-only queries
catalog_upsert - Store findings in catalog
catalog_list / catalog_get - Retrieve findings from catalog

Target Scoping Requirement

Discovery, catalog, agent, and LLM tools are target-scoped. Always pass target_id:

discovery.run_static(target_id=..., schema_filter=...)
catalog.*(target_id=..., run_id=...)
agent.run_start(target_id=..., run_id=...)
llm.*(target_id=..., run_id=...)

run_id resolution is no longer global. The same schema name can exist on multiple targets, so target_id is required to resolve the correct discovery run.
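
Why a schema name alone is ambiguous can be shown with a small lookup sketch. This is an illustration only: the `RunRegistry` class, the target names, and the run numbers are hypothetical and do not describe how the server stores runs.

```python
class RunRegistry:
    """Hypothetical registry mapping (target_id, schema) to the latest run_id."""

    def __init__(self):
        self._latest = {}

    def record(self, target_id, schema, run_id):
        self._latest[(target_id, schema)] = run_id

    def resolve(self, target_id, schema):
        try:
            return self._latest[(target_id, schema)]
        except KeyError:
            raise LookupError(
                f"no discovery run for {schema!r} on target {target_id!r}"
            ) from None


registry = RunRegistry()
# The same schema name exists on two different targets.
registry.record("prod-mysql", "testdb", run_id=7)
registry.record("staging-mysql", "testdb", run_id=3)
```

With a lookup keyed on the schema name alone, the second `record` call would silently overwrite the first; keying on the pair keeps the two runs distinct.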

Benefits of Multi-Agent Approach

Parallel Execution: All 4 agents run simultaneously
Specialized Expertise: Each agent focuses on its domain
Cross-Validation: Agents validate each other's findings
Comprehensive Coverage: All aspects of database analyzed
Knowledge Synthesis: Final report combines all perspectives

Output Format

The system produces:

40+ Catalog Entries - Detailed findings organized by agent
Comprehensive Report - Executive summary with:
- Structure & Schema (ERD, table definitions)
- Business Domain (entity model, business rules)
- Key Insights (data quality, performance)
- Data Quality Assessment (score, recommendations)

Future Enhancements

Additional specialized agents (Security, Performance, Compliance)
Automated remediation scripts
Continuous monitoring mode
Integration with CI/CD pipelines
Web-based dashboard for findings

Related Files

simple_discovery.py - Simplified demo of multi-agent pattern
mcp_catalog.db - Catalog database for storing findings

References

Claude Code Task Tool Documentation
MCP (Model Context Protocol) Specification
ProxySQL MCP Server Implementation
