
Multi-Agent Document Analysis with AG2 and Docling

docs/examples/ag2_multiagent_document_analysis.ipynb

| Step | Tech | Execution |
|------|------|-----------|
| Document conversion | Docling | 💻 Local |
| Multi-agent orchestration | AG2 | 🌐 Remote (LLM) |

This example demonstrates how to combine Docling for document conversion with AG2 for multi-agent analysis. Docling converts PDF, DOCX, HTML, and other formats into structured Markdown and tables. AG2 agents then collaborate to analyze the extracted content.

The pipeline:

  1. A Document Processor agent uses Docling tools to convert documents and extract tables.
  2. An Analyst agent synthesizes the extracted content into a structured summary.
  3. A UserProxy orchestrates the conversation via a GroupChat.

Setup

  • 👉 For best conversion speed, use GPU acceleration whenever available; e.g., on Colab, select a GPU-enabled runtime.
  • Requires an OpenAI API key set as the OPENAI_API_KEY environment variable.
  • First run downloads ML models (~1–2 GB). Subsequent runs use cached models.
python
%pip install -q --progress-bar off --no-warn-conflicts docling "ag2[openai]>=0.11.4,<1.0" pandas
python
import json
import os

from autogen import (
    AssistantAgent,
    GroupChat,
    GroupChatManager,
    LLMConfig,
    UserProxyAgent,
)

from docling.datamodel.base_models import ConversionStatus
from docling.document_converter import DocumentConverter

# Set your OpenAI API key (or configure via .env / Colab secrets)
# os.environ["OPENAI_API_KEY"] = "sk-..."

Document Conversion with Docling

First, let's convert a sample document and inspect the output. We use the Docling Technical Report as the demo document.

python
DOC_SOURCE = "https://arxiv.org/pdf/2408.09869"

converter = DocumentConverter()
result = converter.convert(DOC_SOURCE)

print(f"Status: {result.status}")
print(f"Pages: {len(list(result.document.pages))}")
print()

# Preview the first 2000 characters of extracted Markdown
markdown = result.document.export_to_markdown()
print(f"Markdown length: {len(markdown):,} characters")
print("---")
print(markdown[:2000])

Table Extraction

Docling automatically detects and extracts tables. Let's inspect them.

python
tables = list(result.document.tables)
print(f"Found {len(tables)} table(s)")

for i, table in enumerate(tables):
    table_df = table.export_to_dataframe(doc=result.document)
    print(f"\n### Table {i + 1} (shape: {table_df.shape})")
    print(table_df.to_markdown())
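If you want to persist the extracted tables, pandas handles that in one call. A minimal sketch, using a stand-in DataFrame and a hypothetical filename so it runs on its own (in the notebook, call `to_csv` on each `table_df` from the loop above):

```python
import pandas as pd

# Stand-in DataFrame; in the notebook, use the table_df objects
# produced by table.export_to_dataframe(doc=result.document).
table_df = pd.DataFrame({"model": ["docling"], "pages_per_sec": [2.45]})

# Write one CSV per table (hypothetical filename).
table_df.to_csv("table_1.csv", index=False)
print(pd.read_csv("table_1.csv").shape)  # → (1, 2)
```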

AG2 Multi-Agent Setup

Now we set up AG2 agents that use Docling as their document processing backend.

Architecture:

  • document_processor — calls Docling tools to convert documents and extract tables
  • analyst — analyzes the extracted content and produces a structured summary
  • user_proxy — orchestrates the conversation, executes tool calls

The agents communicate via a GroupChat managed by a GroupChatManager.

python
llm_config = LLMConfig(
    {
        "model": "gpt-4o-mini",
        "api_key": os.environ.get("OPENAI_API_KEY"),
        "api_type": "openai",
    }
)

MAX_CONTENT_CHARS = 15000  # Truncation limit to stay within LLM context


def is_termination_msg(msg):
    content = msg.get("content", "") or ""
    return "TERMINATE" in content


proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config=False,
    is_termination_msg=is_termination_msg,
)

processor = AssistantAgent(
    name="document_processor",
    system_message=(
        "You are a document processing agent. Use the convert_document tool to "
        "extract text from a document, and extract_tables to get structured table "
        "data. Always call convert_document first, then extract_tables if the user "
        "asks about tables or data."
    ),
    llm_config=llm_config,
)

analyst = AssistantAgent(
    name="analyst",
    system_message=(
        "You are a document analyst. Based on the content extracted by the "
        "document_processor, provide a clear and structured analysis including:\n"
        "- A concise summary of the document\n"
        "- Key findings or contributions\n"
        "- Notable data from any tables\n\n"
        "When your analysis is complete, end your message with TERMINATE."
    ),
    llm_config=llm_config,
)
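The termination predicate above is deliberately simple: any message whose content contains TERMINATE ends the auto-reply loop. The `or ""` guard matters because tool-call turns can carry `content=None`; the same predicate, exercised on a few edge cases:

```python
# Same predicate as above, shown on representative message shapes.
def is_termination_msg(msg):
    content = msg.get("content", "") or ""
    return "TERMINATE" in content

print(is_termination_msg({"content": "Analysis complete. TERMINATE"}))  # True
print(is_termination_msg({"content": None}))  # False: None is coerced to ""
print(is_termination_msg({}))                 # False: missing key defaults to ""
```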

Tool Registration

We register Docling operations as AG2 tools. The converter instance created earlier is reused — DocumentConverter is stateless and thread-safe.

python
@proxy.register_for_execution()
@processor.register_for_llm(
    description="Convert a document (PDF, DOCX, HTML, or URL) to markdown text"
)
def convert_document(source: str) -> str:
    """Convert a document to markdown using Docling."""
    conv_result = converter.convert(source)
    if conv_result.status == ConversionStatus.FAILURE:
        return f"Error: Document conversion failed for {source}"
    md = conv_result.document.export_to_markdown()
    if len(md) > MAX_CONTENT_CHARS:
        return (
            md[:MAX_CONTENT_CHARS]
            + f"\n\n[Truncated — showing first {MAX_CONTENT_CHARS:,} of {len(md):,} characters]"
        )
    return md


@proxy.register_for_execution()
@processor.register_for_llm(
    description="Extract tables from a document as JSON. Returns a list of tables, each as a list of row records."
)
def extract_tables(source: str) -> str:
    """Extract tables from a document using Docling."""
    conv_result = converter.convert(source)
    if conv_result.status == ConversionStatus.FAILURE:
        return f"Error: Document conversion failed for {source}"
    tables = list(conv_result.document.tables)
    if not tables:
        return "No tables found in the document."
    table_data = []
    for i, table in enumerate(tables):
        table_df = table.export_to_dataframe(doc=conv_result.document)
        table_data.append(
            {
                "table_index": i + 1,
                "rows": table_df.shape[0],
                "columns": table_df.shape[1],
                "data": table_df.to_dict(orient="records"),
            }
        )
    return json.dumps(table_data, indent=2)


print(f"Tools registered on proxy: {list(proxy._function_map.keys())}")
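For reference, the truncation applied inside `convert_document` can be exercised in isolation. A small sketch of the same logic (`truncate` is a name introduced here for illustration only):

```python
def truncate(md: str, limit: int = 15000) -> str:
    """Mirror the truncation used in convert_document above."""
    if len(md) <= limit:
        return md
    return (
        md[:limit]
        + f"\n\n[Truncated — showing first {limit:,} of {len(md):,} characters]"
    )

print(truncate("short text"))        # returned unchanged
print(truncate("x" * 20, limit=10))  # cut to 10 chars plus a truncation note
```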

Run the Multi-Agent Analysis

The user_proxy sends a task to the group chat. The document_processor will use Docling tools to extract content, and the analyst will synthesize the findings.

python
group_chat = GroupChat(
    agents=[proxy, processor, analyst],
    messages=[],
    max_round=10,
)

manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
    is_termination_msg=is_termination_msg,
)

result = proxy.run(
    manager,
    message=(
        f"Analyze the document at {DOC_SOURCE} — "
        "summarize its key findings and extract any tables."
    ),
).process()
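After the run completes, the full transcript accumulates on `group_chat.messages` as a list of dicts with `name` and `content` keys. A sketch of skimming it, shown with a stand-in list so it runs standalone (in the notebook, iterate over `group_chat.messages` instead):

```python
# Stand-in transcript; in the notebook, use group_chat.messages.
messages = [
    {"name": "user_proxy", "content": "Analyze the document ..."},
    {"name": "document_processor", "content": None},  # tool-call turns may carry no text
    {"name": "analyst", "content": "Summary of key findings ... TERMINATE"},
]

for msg in messages:
    preview = (msg.get("content") or "")[:120]
    print(f"[{msg.get('name', 'unknown')}] {preview}")
```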

Further Reading