doc/source/ray-overview/examples/multi_agent_a2a/content/README.ipynb
This tutorial guides you through building and deploying a multi-agent system where agents communicate using the A2A (Agent-to-Agent) protocol. The system is built on Ray Serve for scalable deployment, LangChain for agent orchestration, and MCP for tool integration.
If you're new to Ray Serve and LangChain integration, see this single-agent template first:
langchain_agent_ray_serve

The multi-agent system consists of three agents: a Weather Agent, a Research Agent, and a Travel Agent that orchestrates the other two.
Each agent runs as an independent, autoscaling service with two interfaces: SSE for human-to-agent chat and A2A for agent-to-agent communication.
- Human chat (SSE): for example, POST /weather-agent/chat.
- Agent-to-agent (A2A): GET /.well-known/agent-card.json and POST /v1/message:send.

| Component | Description | Interface | Resources |
|---|---|---|---|
| LLM Service | Qwen3-4B-Instruct-2507-FP8 with tool calling support. | OpenAI-compatible API | 1x L4 GPU |
| Weather MCP | National Weather Service tools. | MCP protocol | 0.2 CPU |
| Web Search MCP | Brave search and URL fetching. | MCP protocol | 0.2 CPU |
| Weather Agent | Answers weather questions. | SSE + A2A | 1 CPU |
| Research Agent | Performs web research. | SSE + A2A | 1 CPU |
| Travel Agent | Orchestrates other agents. | SSE + A2A | 1 CPU |
The system exposes the following HTTP endpoints:

| Use case | Endpoint |
|---|---|
| LLM (OpenAI-compatible) | POST /llm/v1/chat/completions |
| Weather agent (human chat, SSE) | POST /weather-agent/chat |
| Research agent (human chat, SSE) | POST /research-agent/chat |
| Travel agent (human chat, SSE) | POST /travel-agent/chat |
| A2A discovery (any A2A agent) | GET /a2a-*/.well-known/agent-card.json |
| A2A execute (blocking, any A2A agent) | POST /a2a-*/v1/message:send |
| A2A execute (streaming, any A2A agent) | POST /a2a-*/v1/message:stream |
| A2A task/status (poll and history) | GET /a2a-*/v1/tasks/{id} |
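For example, you can consume one of the SSE chat endpoints from Python as follows. This is a minimal sketch: the "data: " framing is standard SSE, but the exact event payloads depend on the deployment in agent_runtime/serve_deployment.py.

import asyncio
import httpx

async def chat(prompt: str) -> None:
    # Stream the agent's response over SSE; each event line starts with "data: ".
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            "http://127.0.0.1:8000/travel-agent/chat",
            json={"user_request": prompt},
        ) as resp:
            async for line in resp.aiter_lines():
                if line.startswith("data: "):
                    print(line[len("data: "):])

asyncio.run(chat("Plan a trip to Seattle"))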
┌──────┐
│ User │
└──┬───┘
│ POST /travel-agent/chat
│ "Plan a trip to Seattle"
▼
┌──────────────┐
│ Travel Agent │
└──────┬───────┘
│ LLM Reasoning:
│ 1. Analyze user request
│ 2. Call a2a_research("Seattle attractions")
│ 3. Call a2a_weather("Seattle weather")
│ 4. Synthesize final itinerary
│
├────────────────────────────────────────────────┐
│ │
▼ ▼
┌───────────────────┐ ┌───────────────────┐
│ Research Agent │ │ Weather Agent │
│ (via A2A) │ │ (via A2A) │
└────────┬──────────┘ └────────┬──────────┘
│ │
▼ ▼
┌───────────────────┐ ┌───────────────────┐
│ Web Search MCP │ │ Weather MCP │
│ (Brave + Fetch) │ │ (NWS API) │
└───────────────────┘ └───────────────────┘
│ │
└──────────────────┬─────────────────────────┘
│
▼
┌───────────────┐
│ Final Result │
│ (Itinerary + │
│ Weather + │
│ Sources) │
└───────────────┘
Benefits of A2A over direct integration:
| Aspect | Direct integration | A2A protocol |
|---|---|---|
| Coupling | Tight | Loose |
| Discovery | Hard-coded | Dynamic (/.well-known/agent-card.json) |
| Versioning | Code changes | Protocol-level |
| Debugging | Complex | Traceable per-task |
| Composition | Nested code | HTTP boundaries |
Ray Serve provides the core serving capabilities, such as per-deployment autoscaling and fractional resource allocation. Deploying as an Anyscale service adds further production benefits on top.
For more information, see the Anyscale Services documentation.
multi-agent-a2a/
├── agents/ # Agent implementations
│ ├── weather_agent_with_mcp.py
│ ├── research_agent_with_web_search_mcp.py
│ └── travel_agent_with_a2a.py
├── agent_runtime/ # Shared runtime utilities
│ ├── config.py # Configuration management
│ ├── agent_builder.py # Agent factory functions
│ ├── serve_deployment.py # SSE deployment factory
│ └── a2a_deployment.py # A2A deployment factory
├── protocols/ # A2A SDK helpers (cards + client)
│ ├── a2a_card.py
│ └── a2a_client.py
├── mcps/ # MCP server implementations
│ ├── weather_mcp_server.py
│ └── web_search_mcp_server.py
├── llm/ # LLM deployment
│ └── llm_deploy_qwen.py
├── tests/ # Test suite
├── ray_serve_all_deployments.py # Unified deployment entrypoint
├── serve_multi_config.yaml # Local Ray Serve deployment config
├── anyscale_service_multi_config.yaml # Anyscale production deployment config
└── requirements.txt # Python dependencies
Get the system running first, then explore how it works.
Docker image: Use anyscale/ray-llm:2.50.1-py311-cu128 for optimal compatibility.

Compute requirements: At minimum, one L4 GPU for the LLM service plus CPU capacity for the agents and MCP servers (see the resources column in the component table above).
Install dependencies:
Install all Python dependencies from requirements.txt:
!pip install -r requirements.txt
To get a Brave Search API key, sign up at brave.com/search/api. The free tier includes 2,000 requests per month.
Set up environment variables following the Anyscale Workspace environment variables guide.
Run this in your terminal (outside the notebook) before starting services:
export BRAVE_API_KEY=<your-brave-api-key>
Start Ray Serve and deploy all services with a single command in the terminal:
serve run serve_multi_config.yaml
This command deploys all the following services:
- /llm - LLM service.
- /mcp-weather - Weather MCP server.
- /mcp-web-search - Web Search MCP server.
- /weather-agent (SSE) and /a2a-weather (A2A) - Weather Agent.
- /research-agent (SSE) and /a2a-research (A2A) - Research Agent.
- /travel-agent (SSE) and /a2a-travel (A2A) - Travel Agent.

After all services have started, verify each layer as follows:
Test services individually (with curl): Run each of the following curl commands separately and check their responses:
!curl -X POST http://127.0.0.1:8000/llm/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen3-4B-Instruct-2507-FP8", "messages": [{"role": "user", "content": "Hello!"}]}'
!curl -X POST http://127.0.0.1:8000/weather-agent/chat -H "Content-Type: application/json" -d '{"user_request": "What is the weather in San Francisco?"}'
!curl -X POST http://127.0.0.1:8000/research-agent/chat -H "Content-Type: application/json" -d '{"user_request": "What are the top attractions in Seattle? Reply with sources."}'
!curl -X POST http://127.0.0.1:8000/travel-agent/chat -H "Content-Type: application/json" -d '{"user_request": "Plan a 2-day trip to Seattle next week. Include weather details and considerations."}'
!curl http://127.0.0.1:8000/a2a-weather/.well-known/agent-card.json
!curl http://127.0.0.1:8000/a2a-research/.well-known/agent-card.json
!curl http://127.0.0.1:8000/a2a-travel/.well-known/agent-card.json
# Run all tests
!python tests/run_all.py
To deploy to production on Anyscale:
!anyscale service deploy -f anyscale_service_multi_config.yaml
Note: An Anyscale service config is a superset of a Ray Serve config. For more details, see the Anyscale service config docs.
After deploying to Anyscale, your service is available at a URL like:
https://<service-name>-<id>.cld-<cluster-id>.s.anyscaleuserdata.com
You also receive an authentication token for secure access.
To follow the same structure as the local serve run ... deployment, verify production in two steps: (1) test each service directly with curl, then (2) run the full test suite.
Set up environment variables (once):
In a notebook:
%env BASE_URL=<ANYSCALE_SERVICE_URL>
%env ANYSCALE_API_TOKEN=<AUTH_TOKEN>
Note: Don't include a trailing / at the end of BASE_URL (after .anyscaleuserdata.com).
Test services individually (with curl): Run each of the following curl commands separately and check their responses:
!curl -X POST "${BASE_URL}/llm/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer ${ANYSCALE_API_TOKEN}" -d '{"model": "Qwen/Qwen3-4B-Instruct-2507-FP8", "messages": [{"role": "user", "content": "Hello!"}]}'
!curl -X POST "${BASE_URL}/weather-agent/chat" -H "Content-Type: application/json" -H "Authorization: Bearer ${ANYSCALE_API_TOKEN}" -d '{"user_request": "What is the weather in San Francisco?"}'
!curl -X POST "${BASE_URL}/research-agent/chat" -H "Content-Type: application/json" -H "Authorization: Bearer ${ANYSCALE_API_TOKEN}" -d '{"user_request": "What are the top attractions in Seattle? Reply with sources."}'
!curl -X POST "${BASE_URL}/travel-agent/chat" -H "Content-Type: application/json" -H "Authorization: Bearer ${ANYSCALE_API_TOKEN}" -d '{"user_request": "Plan a 2-day trip to Seattle next week. Include weather details and considerations."}'
!curl "${BASE_URL}/a2a-weather/.well-known/agent-card.json" -H "Authorization: Bearer ${ANYSCALE_API_TOKEN}"
!curl "${BASE_URL}/a2a-research/.well-known/agent-card.json" -H "Authorization: Bearer ${ANYSCALE_API_TOKEN}"
!curl "${BASE_URL}/a2a-travel/.well-known/agent-card.json" -H "Authorization: Bearer ${ANYSCALE_API_TOKEN}"
You can run the full test suite against your production deployment. Set a generous timeout to avoid premature test failures:
In a notebook:
%env TEST_TIMEOUT_SECONDS=2000
!python tests/run_all.py
Explore how each service is implemented.
See the code in llm/llm_deploy_qwen.py. This file deploys Qwen as an OpenAI-compatible API with tool calling support.
Key configurations:
max_model_len=65536: Provides a 64K token context window for complex multi-turn conversations with multiple tool calls.
enable_auto_tool_choice=True: Enables the model to automatically decide when to use tools, which is essential for agent workflows.
tool_call_parser="hermes": Parses tool calls in Hermes format, which Qwen models support natively.
For detailed information on deploying and configuring LLM services, see the Anyscale LLM serving documentation and the Deploy LLM template.
MCP (Model Context Protocol) servers expose external tools that agents can discover and use dynamically.
Ray Serve supports MCP only in stateless HTTP mode. Set stateless_http=True to prevent "session not found" errors when running multiple replicas.
For more information, see the Anyscale MCP documentation and MCP Ray Serve template.
See mcps/weather_mcp_server.py:
| Tool | Description | Parameters |
|---|---|---|
| get_alerts | Fetches active weather alerts. | state: str (for example, "CA") |
| get_forecast | Gets a five-period forecast. | latitude: float, longitude: float |
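As an illustration, a stateless FastMCP server exposing a tool like get_alerts might look as follows. This is a sketch assuming the official mcp Python SDK's FastMCP and the public api.weather.gov endpoint; see mcps/weather_mcp_server.py for the real implementation.

import httpx
from mcp.server.fastmcp import FastMCP

# stateless_http=True lets any Ray Serve replica handle any request.
mcp = FastMCP("weather", stateless_http=True)

@mcp.tool()
async def get_alerts(state: str) -> str:
    """Fetch active weather alerts for a US state code such as "CA"."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"https://api.weather.gov/alerts/active?area={state}",
            headers={"User-Agent": "weather-mcp-example"},
        )
        resp.raise_for_status()
        features = resp.json().get("features", [])
        return "\n".join(f["properties"]["headline"] for f in features) or "No active alerts."

# ASGI app that a Ray Serve deployment can wrap and serve.
app = mcp.streamable_http_app()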
See mcps/web_search_mcp_server.py:
| Tool | Description | Parameters |
|---|---|---|
| brave_search | Searches the web through the Brave Search API. | query: str, num_results: int (default: 10) |
| fetch_url | Fetches and parses web pages. | url: str, max_length: int (default: 5000), start_index: int (default: 0), raw: bool (default: false), ignore_robots_txt: bool (default: false) |
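A brave_search tool in the same style could call the Brave Search API directly, as sketched below. This assumes the documented Brave web-search endpoint and the BRAVE_API_KEY environment variable from the setup section; the real server lives in mcps/web_search_mcp_server.py.

import os
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("web-search", stateless_http=True)

@mcp.tool()
async def brave_search(query: str, num_results: int = 10) -> str:
    """Search the web through the Brave Search API and return title/URL pairs."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://api.search.brave.com/res/v1/web/search",
            params={"q": query, "count": num_results},
            headers={"X-Subscription-Token": os.environ["BRAVE_API_KEY"]},
        )
        resp.raise_for_status()
        results = resp.json()["web"]["results"]
        return "\n".join(f"{r['title']} - {r['url']}" for r in results)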
The agent runtime provides a builder pattern for creating agents and deploying them with both SSE (human-to-agent) and A2A (agent-to-agent) interfaces. This shared infrastructure eliminates code duplication across agents by centralizing configuration, agent building, and deployment logic.
The agent runtime consists of four core modules:
The configuration module agent_runtime/config.py centralizes configuration loading for LLM and MCP settings from environment variables.
- Key classes: LLMConfig (LLM backend settings) and MCPEndpoint (MCP server configuration).
- Key function: load_llm_config().

The agent builder module agent_runtime/agent_builder.py provides factory functions for building LangChain agents, centralizing LLM setup, MCP tool discovery, and agent creation to eliminate boilerplate.
- Key functions: build_llm(), load_mcp_tools(), build_tool_agent(), and build_mcp_agent().
- Agents are built with MemorySaver checkpointing for conversation memory.

The SSE deployment module agent_runtime/serve_deployment.py builds the FastAPI application and Ray Serve deployment for the human-to-agent chat interface.
- Exposes POST /chat with SSE streaming support.
- Key functions: create_chat_app() and create_serve_deployment().

The A2A deployment module agent_runtime/a2a_deployment.py enables standardized agent-to-agent communication by creating Ray Serve deployments with A2A protocol compliance.
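To make the pattern concrete, here's a hedged sketch of what a builder like build_mcp_agent() does internally, using langchain-mcp-adapters and LangGraph. The function name, the MCP URL path, and the exact signatures are illustrative assumptions; the real logic lives in agent_runtime/agent_builder.py.

from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

async def build_weather_agent():
    # Discover tools dynamically from the Weather MCP server (URL path assumed).
    client = MultiServerMCPClient(
        {"weather": {"transport": "streamable_http",
                     "url": "http://127.0.0.1:8000/mcp-weather/mcp"}}
    )
    tools = await client.get_tools()
    # Point the OpenAI-compatible client at the local Qwen deployment.
    llm = ChatOpenAI(
        model="Qwen/Qwen3-4B-Instruct-2507-FP8",
        base_url="http://127.0.0.1:8000/llm/v1",
        api_key="not-needed",
    )
    # ReAct-style agent with in-memory checkpointing for conversation state.
    return create_react_agent(llm, tools, checkpointer=MemorySaver())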
Each specialized agent is a LangChain agent that combines an LLM with specific tools. The agents use the builder pattern from the agent runtime to minimize boilerplate code.
File: agents/weather_agent_with_mcp.py
This agent provides weather information using tools from the Weather MCP server. It demonstrates the MCP integration pattern where tools are dynamically discovered from the MCP server.
Implementation approach:
- Uses build_mcp_agent() from the agent runtime to create the agent.
- Discovers its tools dynamically from the Weather MCP server (get_alerts, get_forecast).

System prompt strategy:

Request flow example:
- A user asks for a forecast; the LLM resolves the location to coordinates and calls get_forecast(37.4419, -122.1430).
- The agent formats the tool output into a natural-language answer.

Configuration:
- WEATHER_MCP_BASE_URL - Base URL for the Weather MCP server.
- WEATHER_MCP_TOKEN - Optional authentication token.

File: agents/research_agent_with_web_search_mcp.py
This agent performs online research and gathers sources using the Web Search MCP server. It demonstrates how to combine multiple MCP tools (search + fetch) for comprehensive research workflows.
Implementation approach:
- Uses build_mcp_agent() from the agent runtime.
- Discovers its tools dynamically from the Web Search MCP server (brave_search, fetch_url).

System prompt strategy:
- Call brave_search first to find relevant sources.
- Use fetch_url to read primary sources and confirm details.

Request flow example:
- The agent calls brave_search("Seattle top attractions", num_results=10).
- It calls fetch_url(url) on promising results to confirm details.
- It returns an answer with cited sources.

Configuration:
- WEB_SEARCH_MCP_BASE_URL - Base URL for the Web Search MCP server.
- WEB_SEARCH_MCP_TOKEN - Optional authentication token.

File: agents/travel_agent_with_a2a.py
This agent demonstrates agent-to-agent communication using the A2A protocol. Instead of connecting to MCP servers directly, it orchestrates two downstream agents (Weather and Research) to create comprehensive travel plans.
Implementation approach:
- Uses build_tool_agent() from the agent runtime with explicit A2A tools:
  - a2a_research(query) - Calls the Research Agent through A2A.
  - a2a_weather(query) - Calls the Weather Agent through A2A.
- Both tools use the a2a_execute_text() helper for agent-to-agent communication.

System prompt strategy:

Request flow example:
- The agent calls a2a_research("Seattle attractions, restaurants, activities").
- It calls a2a_weather("Seattle weather forecast next week").
- It synthesizes both responses into a final itinerary.

Configuration:
- RESEARCH_A2A_BASE_URL - Base URL for the Research Agent A2A endpoint (default: http://127.0.0.1:8000/a2a-research).
- WEATHER_A2A_BASE_URL - Base URL for the Weather Agent A2A endpoint (default: http://127.0.0.1:8000/a2a-weather).
- A2A_TIMEOUT_S - Timeout for downstream agent calls (default: 360 seconds).

A2A tool implementation:
import os

from langchain_core.tools import tool

from protocols.a2a_client import a2a_execute_text

# Downstream endpoints and timeout, overridable via environment variables.
RESEARCH_A2A_BASE_URL = os.getenv("RESEARCH_A2A_BASE_URL", "http://127.0.0.1:8000/a2a-research")
WEATHER_A2A_BASE_URL = os.getenv("WEATHER_A2A_BASE_URL", "http://127.0.0.1:8000/a2a-weather")
A2A_TIMEOUT_S = float(os.getenv("A2A_TIMEOUT_S", "360"))

@tool
async def a2a_research(query: str) -> str:
    """Call the Research agent over A2A to gather up-to-date info and sources."""
    return await a2a_execute_text(RESEARCH_A2A_BASE_URL, query, timeout_s=A2A_TIMEOUT_S)

@tool
async def a2a_weather(query: str) -> str:
    """Call the Weather agent over A2A to get weather/forecast guidance."""
    return await a2a_execute_text(WEATHER_A2A_BASE_URL, query, timeout_s=A2A_TIMEOUT_S)
The A2A (Agent-to-Agent) protocol standardizes communication between agents. This system uses the official a2a-sdk with custom helper utilities.
A2A components:
- agent_runtime/a2a_deployment.py (deployment factory).
- protocols/a2a_card.py (discovery).
- protocols/a2a_client.py (execution).

File: protocols/a2a_card.py
This module provides utilities for creating A2A AgentCards using the official a2a-sdk types. AgentCards enable agent discovery by advertising capabilities, skills, and endpoints.
Key function:
- build_agent_card() - Creates an a2a.types.AgentCard for HTTP+JSON (REST) agents.

Usage:
from protocols.a2a_card import build_agent_card
card = build_agent_card(
name="weather-agent",
description="Weather agent that uses a Weather MCP server",
version="0.1.0",
skills=["weather", "forecast", "current_conditions"],
url="http://127.0.0.1:8000/a2a-weather"
)
AgentCard format:
AgentCards are exposed at GET /.well-known/agent-card.json for A2A discovery:
{
"name": "weather-agent",
"description": "Weather agent that uses a Weather MCP server...",
"version": "0.1.0",
"url": "http://127.0.0.1:8000/a2a-weather",
"preferred_transport": "http+json",
"capabilities": {
"streaming": true,
"push_notifications": false,
"state_transition_history": false
},
"default_input_modes": ["text/plain"],
"default_output_modes": ["text/plain"],
"skills": [
{
"id": "weather-agent-primary",
"name": "weather-agent",
"description": "Weather agent that uses a Weather MCP server...",
"tags": ["weather", "forecast", "current_conditions"]
}
]
}
File: protocols/a2a_client.py
Agents that orchestrate other agents use this client for agent-to-agent communication through the official a2a-sdk REST transport.
Key function:
- a2a_execute_text() - Sends a single text message to an A2A agent and returns the text response.

Features:
- Uses the official a2a-sdk REST transport.
- Builds the outgoing message from A2A SDK types (Role.user, TextPart).

Usage:
from protocols.a2a_client import a2a_execute_text
result = await a2a_execute_text(
base_url="http://127.0.0.1:8000/a2a-weather",
input_text="What's the forecast for Seattle?",
timeout_s=60.0,
headers={"Authorization": "Bearer <token>"} # Optional
)
print(result)
Standard A2A endpoints (exposed by all A2A agents):
| Endpoint | Description |
|---|---|
| GET /.well-known/agent-card.json | Returns the AgentCard for discovery. |
| POST /v1/message:send | Executes a message (blocking). |
| POST /v1/message:stream | Executes a message (SSE streaming). |
| GET /v1/tasks/{id} | Fetches and polls task state and history. |
Test A2A endpoints:
# Test A2A discovery
!curl http://127.0.0.1:8000/a2a-weather/.well-known/agent-card.json
# For execution, prefer the Python helper:
# python -c 'import asyncio; from protocols.a2a_client import a2a_execute_text; print(asyncio.run(a2a_execute_text("http://127.0.0.1:8000/a2a-weather","Weather in NYC?")))'
Use the a2a_execute_text helper for text-based calls:
from protocols.a2a_client import a2a_execute_text
# Call the Weather Agent via A2A
result = await a2a_execute_text(
base_url="http://127.0.0.1:8000/a2a-weather",
input_text="What's the forecast for Seattle?",
timeout_s=60.0
)
print(result)
You can also test discovery directly:
!curl http://127.0.0.1:8000/a2a-weather/.well-known/agent-card.json
See serve_multi_config.yaml for the complete deployment configuration.
Autoscaling configuration:
The system uses Ray Serve's built-in autoscaling to handle variable load. See the configuration details in serve_multi_config.yaml.
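For orientation, per-deployment autoscaling maps to options like the following sketch, shown via the Python API with assumed replica counts and target values; serve_multi_config.yaml expresses the same settings declaratively.

from ray import serve

@serve.deployment(
    # Assumed values for illustration; the real settings live in serve_multi_config.yaml.
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 4,
        "target_ongoing_requests": 8,
    },
    ray_actor_options={"num_cpus": 1},
)
class WeatherAgent:
    async def __call__(self, request):
        ...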
Build your own agents (or a multi-agent system) using the agent runtime
Integrate Langfuse (langfuse/langfuse on GitHub) for observability, prompt management, and evals