This repository provides end-to-end examples for deploying and scaling Model Context Protocol (MCP) servers using Ray Serve and Anyscale Services, covering both the streamable HTTP and stdio transport types:
- `01-Deploy_custom_mcp_in_streamable_http_with_ray_serve.ipynb`: Deploys a custom Weather MCP server in streamable HTTP mode behind FastAPI and Ray Serve, illustrating autoscaling, load balancing, and end-to-end testing on Anyscale.
- `02-Build_mcp_gateway_with_existing_ray_serve_apps.ipynb`: Shows how to stand up a single MCP gateway that multiplexes requests to multiple pre-existing Ray Serve apps under one unified `/mcp` endpoint, requiring no code changes in the underlying services.
- `03-Deploy_single_mcp_stdio_docker_image_with_ray_serve.ipynb`: Wraps a stdio-only MCP Docker image, for example Brave Search, with Ray Serve so it exposes `/tools` and `/call` HTTP endpoints and scales horizontally without rebuilding the image.
- `04-Deploy_multiple_mcp_stdio_docker_images_with_ray_serve.ipynb`: Extends the previous pattern to run several stdio-based MCP images side by side, using fractional-CPU deployments and a router to direct traffic to the right service.
- `05-(Optional)_Build_docker_image_for_mcp_server.ipynb`: Builds and pushes a lightweight Podman-based Docker image for a Weather MCP server with uv in an Anyscale workspace.

Notebooks 3 and 4 require a Brave Search API key (`BRAVE_API_KEY`).

You can run this example on your own Ray cluster or on Anyscale workspaces, which enable development without worrying about infrastructure, like working on a laptop.
Learn more about Anyscale Workspaces in the official documentation.
Note: You can run the entire tutorial for free on Anyscale; all dependencies come pre-installed, and compute autoscales automatically. To run it elsewhere, install the dependencies from the provided Dockerfiles and provision the appropriate resources.
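Notebooks 3 and 4 wrap a stdio-only MCP server so Ray Serve can front it over HTTP. Outside Ray Serve, the core of that pattern is exchanging JSON-RPC messages over a child process's stdin and stdout. The following is a minimal sketch, not the notebooks' actual code; it uses `cat` as a stand-in for a real MCP server binary, and the function name is illustrative:

```python
import json
import subprocess

def call_stdio_tool(command: list[str], request: dict) -> dict:
    """Send one JSON-RPC message to a child process's stdin and parse the reply.

    A real MCP stdio server (for example, a Brave Search container run with
    `docker run -i ...`) reads requests from stdin and writes responses to
    stdout. `cat` simply echoes the request back, which is enough to
    demonstrate the plumbing.
    """
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate(json.dumps(request) + "\n")
    return json.loads(out)

# Echo a tools/list request through `cat` acting as the server.
reply = call_stdio_tool(
    ["cat"],
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
)
print(reply["method"])
```

In the notebooks, a Ray Serve deployment manages one such subprocess per replica and exposes `/tools` and `/call` HTTP routes around it, so the stdio image scales horizontally without being rebuilt.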
Seamlessly integrate with your existing CI/CD pipelines by leveraging the Anyscale CLI or SDK to deploy highly available services and run reliable batch jobs. Developing in an environment nearly identical to production (a multi-node cluster) drastically accelerates the dev-to-prod transition. This tutorial also introduces proprietary RayTurbo features that optimize workloads for performance, fault tolerance, scale, and observability.
Abstract away infrastructure from your ML/AI developers so they can focus on their core ML development. You can also manage compute resources and costs with enterprise governance, observability, and admin capabilities: set resource quotas, prioritize different workloads, and gain visibility into utilization across your entire compute fleet. If you're running on a Kubernetes cloud (EKS, GKE, etc.), you can still access the proprietary RayTurbo optimizations demonstrated in this tutorial by deploying the Anyscale Kubernetes operator.
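The gateway in notebook 2 routes each MCP request to the right underlying app. Stripped of Ray Serve specifics, the routing itself amounts to a dispatch table from tool names to services. The sketch below uses made-up handler and tool names purely for illustration:

```python
# Minimal sketch of MCP-style tool routing: one entry point dispatches
# to per-service handlers, analogous to how a gateway can serve several
# pre-existing apps behind a single /mcp endpoint.
from typing import Any, Callable

# Hypothetical handlers standing in for pre-existing Ray Serve apps.
def weather_forecast(args: dict[str, Any]) -> str:
    return f"forecast for {args['city']}"

def translate_text(args: dict[str, Any]) -> str:
    return f"translated: {args['text']}"

TOOL_ROUTES: dict[str, Callable[[dict[str, Any]], str]] = {
    "weather.forecast": weather_forecast,
    "translator.translate": translate_text,
}

def handle_call(tool: str, args: dict[str, Any]) -> str:
    """Dispatch a tools/call request to the service that owns the tool."""
    try:
        return TOOL_ROUTES[tool](args)
    except KeyError:
        raise ValueError(f"unknown tool: {tool}") from None

print(handle_call("weather.forecast", {"city": "Paris"}))
```

In notebook 4, each underlying service runs as its own Ray Serve deployment, and fractional CPU allocations let several small MCP services share a node while the router directs traffic among them.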