docs/examples/ag2_multiagent_document_analysis.ipynb
<a href="https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/ag2_multiagent_document_analysis.ipynb" target="_parent">Open in Colab</a>
| Step | Tech | Execution |
|---|---|---|
| Document conversion | Docling | 💻 Local |
| Multi-agent orchestration | AG2 | 🌐 Remote (LLM) |
This example demonstrates how to combine Docling for document conversion with AG2 for multi-agent analysis. Docling converts PDF, DOCX, HTML, and other formats into structured Markdown and tables. AG2 agents then collaborate to analyze the extracted content.
The pipeline:
1. Convert the document with Docling and inspect the extracted Markdown and tables.
2. Register Docling operations as AG2 tools.
3. Run a group chat in which the agents analyze the extracted content.

This example calls the OpenAI API, so set the `OPENAI_API_KEY` environment variable before running.
%pip install -q --progress-bar off --no-warn-conflicts docling "ag2[openai]>=0.11.4,<1.0" pandas
import json
import os
from autogen import (
AssistantAgent,
GroupChat,
GroupChatManager,
LLMConfig,
UserProxyAgent,
)
from docling.datamodel.base_models import ConversionStatus
from docling.document_converter import DocumentConverter
# Set your OpenAI API key (or configure via .env / Colab secrets)
# os.environ["OPENAI_API_KEY"] = "sk-..."
First, let's convert a sample document and inspect the output. We use the Docling Technical Report as the demo document.
DOC_SOURCE = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(DOC_SOURCE)
print(f"Status: {result.status}")
print(f"Pages: {len(list(result.document.pages))}")
print()
# Preview the first 2000 characters of extracted Markdown
markdown = result.document.export_to_markdown()
print(f"Markdown length: {len(markdown):,} characters")
print("---")
print(markdown[:2000])
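Because the export preserves Markdown headings, lightweight structural processing of the text is straightforward. A minimal sketch (the sample string below is hypothetical, standing in for real exported Markdown):

```python
import re

# Hypothetical exported Markdown; real output comes from export_to_markdown()
sample_md = "## Abstract\nDocling converts documents...\n## 1 Introduction\nPDF understanding...\n"

# Collect section titles from level-2 headings
titles = re.findall(r"^## (.+)$", sample_md, flags=re.MULTILINE)
print(titles)  # ['Abstract', '1 Introduction']
```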
Docling automatically detects and extracts tables. Let's inspect them.
tables = list(result.document.tables)
print(f"Found {len(tables)} table(s)")
for i, table in enumerate(tables):
table_df = table.export_to_dataframe(doc=result.document)
print(f"\n### Table {i + 1} (shape: {table_df.shape})")
print(table_df.to_markdown())
Now we set up AG2 agents that use Docling as their document processing backend.
Architecture:
- `document_processor`: calls Docling tools to convert documents and extract tables
- `analyst`: analyzes the extracted content and produces a structured summary
- `user_proxy`: orchestrates the conversation and executes tool calls

The agents communicate via a `GroupChat` managed by a `GroupChatManager`.
llm_config = LLMConfig(
{
"model": "gpt-4o-mini",
"api_key": os.environ.get("OPENAI_API_KEY"),
"api_type": "openai",
}
)
MAX_CONTENT_CHARS = 15000 # Truncation limit to stay within LLM context
def is_termination_msg(msg):
content = msg.get("content", "") or ""
return "TERMINATE" in content
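The predicate can be sanity-checked in isolation. The `or ""` guard matters because tool-call messages may carry `content=None`:

```python
def is_termination_msg(msg):
    content = msg.get("content", "") or ""
    return "TERMINATE" in content

# Typical final message from the analyst
print(is_termination_msg({"content": "Analysis complete. TERMINATE"}))  # True
# Tool-call messages may have no textual content at all
print(is_termination_msg({"content": None}))  # False
print(is_termination_msg({}))  # False
```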
proxy = UserProxyAgent(
name="user_proxy",
human_input_mode="NEVER",
max_consecutive_auto_reply=10,
code_execution_config=False,
is_termination_msg=is_termination_msg,
)
processor = AssistantAgent(
name="document_processor",
system_message=(
"You are a document processing agent. Use the convert_document tool to "
"extract text from a document, and extract_tables to get structured table "
"data. Always call convert_document first, then extract_tables if the user "
"asks about tables or data."
),
llm_config=llm_config,
)
analyst = AssistantAgent(
name="analyst",
system_message=(
"You are a document analyst. Based on the content extracted by the "
"document_processor, provide a clear and structured analysis including:\n"
"- A concise summary of the document\n"
"- Key findings or contributions\n"
"- Notable data from any tables\n\n"
"When your analysis is complete, end your message with TERMINATE."
),
llm_config=llm_config,
)
We register Docling operations as AG2 tools. The converter instance created earlier
is reused, so its conversion pipeline is initialized only once rather than on every tool call.
@proxy.register_for_execution()
@processor.register_for_llm(
description="Convert a document (PDF, DOCX, HTML, or URL) to markdown text"
)
def convert_document(source: str) -> str:
"""Convert a document to markdown using Docling."""
conv_result = converter.convert(source)
if conv_result.status == ConversionStatus.FAILURE:
return f"Error: Document conversion failed for {source}"
md = conv_result.document.export_to_markdown()
if len(md) > MAX_CONTENT_CHARS:
return (
md[:MAX_CONTENT_CHARS]
+ f"\n\n[Truncated — showing first {MAX_CONTENT_CHARS:,} of {len(md):,} characters]"
)
return md
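The truncation rule in `convert_document` can be exercised on a synthetic string without running a conversion; the helper below simply mirrors that logic:

```python
def truncate(text: str, limit: int = 15_000) -> str:
    # Mirrors the truncation applied in convert_document
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n\n[Truncated: showing first {limit:,} of {len(text):,} characters]"

sample = "x" * 20_000
print(truncate(sample)[-55:])        # ends with the truncation notice
print(truncate("short") == "short")  # True: short text passes through unchanged
```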
@proxy.register_for_execution()
@processor.register_for_llm(
description="Extract tables from a document as JSON. Returns a list of tables, each as a list of row records."
)
def extract_tables(source: str) -> str:
"""Extract tables from a document using Docling."""
conv_result = converter.convert(source)
if conv_result.status == ConversionStatus.FAILURE:
return f"Error: Document conversion failed for {source}"
tables = list(conv_result.document.tables)
if not tables:
return "No tables found in the document."
table_data = []
for i, table in enumerate(tables):
table_df = table.export_to_dataframe(doc=conv_result.document)
table_data.append(
{
"table_index": i + 1,
"rows": table_df.shape[0],
"columns": table_df.shape[1],
"data": table_df.to_dict(orient="records"),
}
)
return json.dumps(table_data, indent=2)
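The JSON payload the tool returns has this shape. The row records below are hypothetical, standing in for what `export_to_dataframe(...).to_dict(orient="records")` would yield for a real table:

```python
import json

# Hypothetical row records, in the format DataFrame.to_dict(orient="records") produces
rows = [
    {"Backend": "CPU", "Pages/s": 0.6},
    {"Backend": "GPU", "Pages/s": 2.3},
]
payload = [{"table_index": 1, "rows": len(rows), "columns": 2, "data": rows}]
print(json.dumps(payload, indent=2))
```

Record-oriented rows keep column names attached to each value, which is easier for an LLM to read than a bare matrix of cells.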
# _function_map is an internal attribute; used here only to inspect registration
print(f"Tools registered on proxy: {list(proxy._function_map.keys())}")
The user_proxy sends a task to the group chat. The document_processor will use
Docling tools to extract content, and the analyst will synthesize the findings.
group_chat = GroupChat(
agents=[proxy, processor, analyst],
messages=[],
max_round=10,
)
manager = GroupChatManager(
groupchat=group_chat,
llm_config=llm_config,
is_termination_msg=is_termination_msg,
)
result = proxy.run(
manager,
message=(
f"Analyze the document at {DOC_SOURCE} — "
"summarize its key findings and extract any tables."
),
).process()