# ClaudeCode_Headless
Database discovery systems for comprehensive analysis through MCP (Model Context Protocol).
This directory contains two separate discovery approaches:
| Approach | Description | When to Use |
|---|---|---|
| Two-Phase Discovery | Static harvest + LLM semantic analysis (NEW) | Quick, efficient discovery with semantic insights |
| Multi-Agent Discovery | 6-agent collaborative analysis | Deep, comprehensive analysis (legacy) |
## Two-Phase Discovery

The two-phase approach provides fast, efficient database schema discovery:

- Phase 1: Static Harvest (C++)
- Phase 2: LLM Agent Discovery (optional)
```bash
cd scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/

# Phase 1: Static harvest (no Claude Code needed)

# Option A: Using the convenience script (recommended)
./static_harvest.sh --target-id tap_mysql_default --schema test

# Option B: Using curl directly
curl -k -X POST https://localhost:6071/mcp/query \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "discovery.run_static",
      "arguments": {
        "target_id": "tap_mysql_default",
        "schema_filter": "test"
      }
    }
  }'

# Phase 2: LLM agent discovery (requires Claude Code)
cp mcp_config.example.json mcp_config.json
./two_phase_discovery.py \
  --mcp-config mcp_config.json \
  --target-id tap_mysql_default \
  --schema test \
  --dry-run  # Preview without executing
```
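The curl example above is a plain JSON-RPC 2.0 `tools/call` request. As an illustrative sketch (not part of the repo), the same payload could be assembled in Python before posting it to the MCP endpoint; the function name here is hypothetical.

```python
import json


def build_static_harvest_request(target_id: str, schema_filter: str,
                                 request_id: int = 1) -> dict:
    """Assemble a JSON-RPC tools/call request for the discovery.run_static tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "discovery.run_static",
            "arguments": {
                "target_id": target_id,
                "schema_filter": schema_filter,
            },
        },
    }


if __name__ == "__main__":
    # Same arguments as the curl example; POST this body to /mcp/query.
    print(json.dumps(build_static_harvest_request("tap_mysql_default", "test"), indent=2))
```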
| File | Purpose |
|---|---|
| `two_phase_discovery.py` | Orchestration script for Phase 2 |
| `run_real_claude_e2e.sh` | Manual real-CLI E2E runner (Phase A + Phase B) |
| `mcp_config.example.json` | Example MCP configuration for Claude Code |
| `prompts/two_phase_discovery_prompt.md` | System prompt for the LLM agent |
| `prompts/two_phase_user_prompt.md` | User prompt template |
```bash
./run_real_claude_e2e.sh \
  --target-id tap_mysql_default \
  --schema testdb \
  --mcp-config ./mcp_config.json
```
See `Two_Phase_Discovery_Implementation.md` for complete implementation details.
## Multi-Agent Discovery (Legacy)

This approach runs 6-agent collaborative database discovery in headless (non-interactive) mode using Claude Code, for deep, comprehensive analysis through MCP (Model Context Protocol).
Key Features:
```bash
# Basic discovery - discovers the first available database
python ./headless_db_discovery.py

# Discover a specific database
python ./headless_db_discovery.py --database mydb

# Specify output file
python ./headless_db_discovery.py --output my_report.md

# With verbose output
python ./headless_db_discovery.py --verbose
```
```bash
# Basic discovery
./headless_db_discovery.sh

# Discover specific database
./headless_db_discovery.sh -d mydb

# With custom timeout
./headless_db_discovery.sh -t 600
```
| Agent | Type | Focus | Key MCP Tools |
|---|---|---|---|
| STRUCTURAL | Analysis | Schemas, tables, relationships, indexes, constraints | `list_schemas`, `list_tables`, `describe_table`, `get_constraints`, `suggest_joins` |
| STATISTICAL | Analysis | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
| SEMANTIC | Analysis | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
| QUERY | Analysis | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
| SECURITY | Analysis | Sensitive data, access patterns, vulnerabilities | `sample_rows`, `sample_distinct`, `column_profile`, `run_sql_readonly` |
| META | Meta | Report quality analysis, prompt improvement suggestions | `catalog_search`, `catalog_get` (reads findings) |
- Round 1: Blind Exploration (Parallel)
- Round 2: Pattern Recognition (Collaborative, findings shared via `catalog_search`)
- Round 3: Hypothesis Testing (Validation)
- Round 4: Final Synthesis
- Round 5: Meta Analysis (META agent only)
Each agent generates a catalog of questions they can answer about the database, with step-by-step plans for how to answer each question using MCP tools. This creates a reusable knowledge base for future LLM interactions.
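One catalog entry can be pictured as a small data structure. The sketch below is purely illustrative, not a repo API; the field names mirror the catalog template shown later in this README.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogQuestion:
    """Hypothetical model of one question-catalog entry."""
    template: str            # e.g. "What sensitive data exists in table X?"
    question_type: str       # factual|analytical|comparative|predictive|recommendation
    answer_plan: List[str] = field(default_factory=list)  # ordered MCP tool steps
    answer_template: str = ""                             # skeleton for the answer


# Example entry matching the STRUCTURAL template in this README.
q = CatalogQuestion(
    template="What tables exist in the database?",
    question_type="factual",
    answer_plan=["list_tables", "describe_table"],
    answer_template="Based on the schema analysis:\n- Table 1: {columns}",
)
```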
Cross-Domain Categories (v1.3):
The generated report includes:
```markdown
# COMPREHENSIVE DATABASE DISCOVERY REPORT

## Executive Summary
- Database identity (system type, purpose, scale)
- Critical findings (top 5 - one from each agent)
- Health score: current X/10 → potential Y/10
- Top 5 recommendations (prioritized)

## 1. STRUCTURAL ANALYSIS
- Schema inventory
- Relationship diagram
- Design patterns
- Issues & recommendations

## 2. STATISTICAL ANALYSIS
- Table profiles
- Data quality score
- Distribution profiles
- Anomalies detected

## 3. SEMANTIC ANALYSIS
- Business domain identification
- Entity catalog
- Business rules inference
- Domain glossary

## 4. QUERY ANALYSIS
- Index coverage assessment
- Query pattern analysis
- Optimization opportunities
- Expected improvements

## 5. SECURITY ANALYSIS
- Sensitive data identification
- Access pattern analysis
- Vulnerability assessment
- Compliance indicators
- Security recommendations

## 6. CRITICAL FINDINGS
- Each with: description, impact quantification, root cause, remediation

## 7. RECOMMENDATIONS ROADMAP
- URGENT: [actions with impact/effort]
- HIGH: [actions]
- MODERATE: [actions]
- Expected timeline with metrics

## Appendices
- A. Table DDL
- B. Query examples with EXPLAIN
- C. Statistical distributions
- D. Business glossary
- E. Security data classification
```
Additionally, a separate META ANALYSIS document is generated with:
In addition to the analysis reports, each agent generates a Question Catalog - a knowledge base of questions the agent can answer about the database, with executable plans for how to answer each question.
A Question Catalog contains:
```markdown
# {AGENT} QUESTION CATALOG

## Metadata
- Agent: {STRUCTURAL|STATISTICAL|SEMANTIC|QUERY|SECURITY}
- Database: {database_name}
- Questions Generated: {count}

## Questions by Category

### Category 1: {Category Name}

#### Q1. {Question Template}
**Question Type:** factual|analytical|comparative|predictive|recommendation

**Example Questions:**
- "What tables exist in the database?"
- "What columns does table X have?"

**Answer Plan:**
1. Step 1: Use `list_tables` to get all tables
2. Step 2: Use `describe_table` to get column details
3. Output: Structured list with table names and column details

**Answer Template:**
Based on the schema analysis:
- Table 1: {columns}
- Table 2: {columns}
```
15+ minimum questions across 5 categories:

- Performance + Security (4 questions)
- Structure + Semantics (3 questions)
- Statistics + Query (3 questions)
- Security + Semantics (3 questions)
- All Agents (2 questions)
Question catalogs enable:
Example workflow:
```text
# User asks: "What sensitive data exists in the customers table?"
# System retrieves from SECURITY question catalog:
#   - Question template: "What sensitive data exists in table X?"
#   - Answer plan: sample_rows + column_profile on customers
#   - Answer template: Structured list with sensitivity classification
# System executes plan and returns formatted answer
```
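The retrieval step in this workflow can be sketched as simple template matching. The regex-based matcher below is an assumption for illustration; the actual system may use the MCP `catalog_search` tool instead.

```python
import re
from typing import List, Optional


def match_template(user_question: str, templates: List[str]) -> Optional[str]:
    """Return the first catalog template whose 'table X' placeholder
    fits the user's question; None if nothing matches."""
    for tpl in templates:
        # Turn the literal "table X" placeholder into a capture group
        # for a concrete table name.
        pattern = re.escape(tpl).replace(re.escape("table X"), r"table (\w+)")
        if re.fullmatch(pattern, user_question, flags=re.IGNORECASE):
            return tpl
    return None


templates = ["What sensitive data exists in table X?"]
matched = match_template("What sensitive data exists in table customers?", templates)
```

Once a template is matched, the system would execute its stored answer plan (`sample_rows` + `column_profile` in the example above) and fill in the answer template.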
| Agent | Minimum Questions | High-Complexity Target |
|---|---|---|
| STRUCTURAL | 20 | 5 |
| STATISTICAL | 20 | 5 |
| SEMANTIC | 15 | 3 |
| QUERY | 20 | 5 |
| SECURITY | 15 | 5 |
| TOTAL | 90+ | 23+ |
All question catalogs are stored in the MCP catalog for easy retrieval:
- `kind="question_catalog", key="structural_questions"`
- `kind="question_catalog", key="statistical_questions"`
- `kind="question_catalog", key="semantic_questions"`
- `kind="question_catalog", key="query_questions"`
- `kind="question_catalog", key="security_questions"`
- `kind="question_catalog", key="cross_domain_questions"`

## Command-Line Options

| Option | Short | Description | Default |
|---|---|---|---|
| `--database` | `-d` | Database name to discover | First available |
| `--schema` | `-s` | Schema name to analyze | All schemas |
| `--output` | `-o` | Output file path | `discovery_YYYYMMDD_HHMMSS.md` |
| `--timeout` | `-t` | Timeout in seconds | 300 |
| `--verbose` | `-v` | Enable verbose output | Disabled |
| `--help` | `-h` | Show help message | - |
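The option table above could be declared with `argparse` along these lines. This is an illustrative sketch, not the actual `headless_db_discovery.py` source.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line options."""
    p = argparse.ArgumentParser(description="Headless multi-agent DB discovery")
    p.add_argument("--database", "-d", default=None,
                   help="Database name to discover (default: first available)")
    p.add_argument("--schema", "-s", default=None,
                   help="Schema name to analyze (default: all schemas)")
    p.add_argument("--output", "-o", default=None,
                   help="Output file path (default: discovery_YYYYMMDD_HHMMSS.md)")
    p.add_argument("--timeout", "-t", type=int, default=300,
                   help="Timeout in seconds")
    p.add_argument("--verbose", "-v", action="store_true",
                   help="Enable verbose output")
    return p


# Example: parse the flags used in the Quick Start section.
args = make_parser().parse_args(["-d", "mydb", "-t", "600"])
```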
The discovery uses the system prompt in `prompts/multi_agent_discovery_prompt.md`:

- `prompts/multi_agent_discovery_prompt.md` - Concise system prompt for actual use
- `prompts/multi_agent_discovery_reference.md` - Comprehensive reference documentation

Example GitHub Actions integration:

```yaml
# .github/workflows/database-discovery.yml
name: Database Discovery
on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly
  workflow_dispatch:

jobs:
  discovery:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install Claude Code
        run: npm install -g @anthropics/claude-code
      - name: Run Discovery
        env:
          PROXYSQL_MCP_ENDPOINT: ${{ secrets.PROXYSQL_MCP_ENDPOINT }}
          PROXYSQL_MCP_TOKEN: ${{ secrets.PROXYSQL_MCP_TOKEN }}
        run: |
          cd scripts/mcp/DiscoveryAgent/ClaudeCode_Headless
          python ./headless_db_discovery.py \
            --database production \
            --output discovery_$(date +%Y%m%d).md
      - name: Upload Report
        uses: actions/upload-artifact@v3
        with:
          name: discovery-report
          path: discovery_*.md
```
```bash
#!/bin/bash
# weekly_discovery.sh - Run weekly and compare results
REPORT_DIR="/var/db-discovery/reports"
mkdir -p "$REPORT_DIR"

# Run discovery
python ./headless_db_discovery.py \
  --database mydb \
  --output "$REPORT_DIR/discovery_$(date +%Y%m%d).md"

# Compare with previous week
PREV=$(ls -t "$REPORT_DIR"/discovery_*.md | head -2 | tail -1)
if [ -f "$PREV" ]; then
  echo "=== Changes since last discovery ==="
  diff "$PREV" "$REPORT_DIR/discovery_$(date +%Y%m%d).md" || true
fi
```
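The same compare-with-previous-run logic can be written in Python with only the standard library; paths and naming here follow the shell example above, and the function name is illustrative.

```python
import difflib
from pathlib import Path


def diff_latest_reports(report_dir: str) -> str:
    """Unified diff between the two most recent discovery_*.md reports,
    or an empty string when fewer than two reports exist."""
    reports = sorted(Path(report_dir).glob("discovery_*.md"),
                     key=lambda p: p.stat().st_mtime, reverse=True)
    if len(reports) < 2:
        return ""
    latest, previous = reports[0], reports[1]
    return "".join(difflib.unified_diff(
        previous.read_text().splitlines(keepends=True),
        latest.read_text().splitlines(keepends=True),
        fromfile=previous.name, tofile=latest.name,
    ))
```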
```python
# Modify the prompt in the script for focused discovery
from typing import Optional

def build_discovery_prompt(database: Optional[str]) -> str:
    prompt = f"""Using the 6-agent discovery protocol, focus on:
1. Security aspects of {database}
2. Performance optimization opportunities
3. Data quality issues

Follow the standard 4-round protocol but prioritize these areas.
"""
    return prompt
```
Set the `CLAUDE_PATH` environment variable:

```bash
export CLAUDE_PATH="/path/to/claude"
python ./headless_db_discovery.py
```

Or install Claude Code:

```bash
npm install -g @anthropics/claude-code
```
Ensure MCP servers are configured in your Claude Code settings or provide MCP configuration via command line.
Increase the timeout:

```bash
python ./headless_db_discovery.py --timeout 600
```
The multi-agent prompt is designed for comprehensive output. If truncated:
```text
ClaudeCode_Headless/
├── README.md                               # This file
├── prompts/
│   ├── multi_agent_discovery_prompt.md     # Concise system prompt
│   └── multi_agent_discovery_reference.md  # Comprehensive reference
├── headless_db_discovery.py                # Python script
├── headless_db_discovery.sh                # Bash script
└── examples/
    ├── DATABASE_DISCOVERY_REPORT.md        # Example output
    └── DATABASE_QUESTION_CAPABILITIES.md   # Feature documentation
```
Same license as the proxysql-vec project.