docs/v2/clients/sampling.mdx
import { VersionBadge } from "/snippets/version-badge.mdx";
<VersionBadge version="2.0.0" />

MCP servers can request LLM completions from clients. The client handles these requests through a sampling handler callback.
Provide a `sampling_handler` function when creating the client:
```python
from fastmcp import Client
from fastmcp.client.sampling import (
    SamplingMessage,
    SamplingParams,
    RequestContext,
)

async def sampling_handler(
    messages: list[SamplingMessage],
    params: SamplingParams,
    context: RequestContext
) -> str:
    # Your LLM integration logic here
    # Extract text from messages and generate a response
    return "Generated response based on the messages"

client = Client(
    "my_mcp_server.py",
    sampling_handler=sampling_handler,
)
```
The sampling handler receives three parameters:
<Card icon="code" title="Sampling Handler Parameters">
<ResponseField name="SamplingMessage" type="Sampling Message Object">
  <Expandable title="attributes">
    <ResponseField name="role" type='Literal["user", "assistant"]'>
      The role of the message.
    </ResponseField>

    <ResponseField name="content" type="TextContent | ImageContent | AudioContent">
      The content of the message.
      TextContent is most common, and has a `.text` attribute.
    </ResponseField>
  </Expandable>
</ResponseField>

<ResponseField name="SamplingParams" type="Sampling Parameters Object">
  <Expandable title="attributes">
    <ResponseField name="modelPreferences" type="ModelPreferences | None">
      The server's preferences for which model to select. The client MAY ignore
      these preferences.
      <Expandable title="attributes">
        <ResponseField name="hints" type="list[ModelHint] | None">
          The hints to use for model selection.
        </ResponseField>

        <ResponseField name="costPriority" type="float | None">
          The cost priority for model selection.
        </ResponseField>

        <ResponseField name="speedPriority" type="float | None">
          The speed priority for model selection.
        </ResponseField>

        <ResponseField name="intelligencePriority" type="float | None">
          The intelligence priority for model selection.
        </ResponseField>
      </Expandable>
    </ResponseField>

    <ResponseField name="systemPrompt" type="str | None">
      An optional system prompt the server wants to use for sampling.
    </ResponseField>

    <ResponseField name="includeContext" type="IncludeContext | None">
      A request to include context from one or more MCP servers (including the caller), to
      be attached to the prompt.
    </ResponseField>

    <ResponseField name="temperature" type="float | None">
      The sampling temperature.
    </ResponseField>

    <ResponseField name="maxTokens" type="int">
      The maximum number of tokens to sample.
    </ResponseField>

    <ResponseField name="stopSequences" type="list[str] | None">
      The stop sequences to use for sampling.
    </ResponseField>

    <ResponseField name="metadata" type="dict[str, Any] | None">
      Optional metadata to pass through to the LLM provider.
    </ResponseField>

    <ResponseField name="tools" type="list[Tool] | None">
      Optional list of tools the LLM can use during sampling. See [Using the OpenAI Handler](#using-the-openai-handler).
    </ResponseField>

    <ResponseField name="toolChoice" type="ToolChoice | None">
      Optional control over tool usage behavior (`auto`, `required`, or `none`).
    </ResponseField>
  </Expandable>
</ResponseField>
</Card>
Here's a more complete handler that extracts the conversation and system prompt before generating a response:

```python
from fastmcp import Client
from fastmcp.client.sampling import SamplingMessage, SamplingParams, RequestContext

async def basic_sampling_handler(
    messages: list[SamplingMessage],
    params: SamplingParams,
    context: RequestContext
) -> str:
    # Extract message content
    conversation = []
    for message in messages:
        content = message.content.text if hasattr(message.content, 'text') else str(message.content)
        conversation.append(f"{message.role}: {content}")

    # Use the system prompt if provided
    system_prompt = params.systemPrompt or "You are a helpful assistant."

    # Here you would integrate with your preferred LLM service
    # This is just a placeholder response
    return f"Response based on conversation: {' | '.join(conversation)}"

client = Client(
    "my_mcp_server.py",
    sampling_handler=basic_sampling_handler
)
```
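Once the handler is registered, you use the client as normal; FastMCP invokes the handler automatically whenever the connected server issues a sampling request. A quick usage sketch (the `analyze_text` tool name is hypothetical and stands in for any server tool that triggers sampling):

```python
import asyncio

async def main():
    async with client:
        # If analyze_text requests a completion while it runs,
        # basic_sampling_handler is called behind the scenes.
        result = await client.call_tool("analyze_text", {"text": "Hello, world"})
        print(result)

asyncio.run(main())
```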
When you provide a `sampling_handler`, FastMCP automatically advertises full sampling capabilities to the server, including tool support. To disable tool support (for simpler handlers that don't support tools), pass `sampling_capabilities` explicitly:
```python
from mcp.types import SamplingCapability

client = Client(
    "my_mcp_server.py",
    sampling_handler=basic_sampling_handler,
    sampling_capabilities=SamplingCapability(),  # No tool support
)
```
FastMCP provides built-in sampling handlers for OpenAI and Anthropic APIs. These handlers support the full sampling API including tool use, handling message conversion and response formatting automatically.
The OpenAI handler works with OpenAI's API and any OpenAI-compatible provider:
```python
from fastmcp import Client
from fastmcp.client.sampling.handlers.openai import OpenAISamplingHandler

client = Client(
    "my_mcp_server.py",
    sampling_handler=OpenAISamplingHandler(default_model="gpt-4o"),
)
```
For OpenAI-compatible APIs (like local models), pass a custom client:
```python
from openai import AsyncOpenAI

client = Client(
    "my_mcp_server.py",
    sampling_handler=OpenAISamplingHandler(
        default_model="llama-3.1-70b",
        client=AsyncOpenAI(base_url="http://localhost:8000/v1"),
    ),
)
```
The Anthropic handler uses Claude models via the Anthropic API:
```python
from fastmcp import Client
from fastmcp.client.sampling.handlers.anthropic import AnthropicSamplingHandler

client = Client(
    "my_mcp_server.py",
    sampling_handler=AnthropicSamplingHandler(default_model="claude-sonnet-4-5"),
)
```
You can pass a custom client for advanced configuration:
```python
from anthropic import AsyncAnthropic

client = Client(
    "my_mcp_server.py",
    sampling_handler=AnthropicSamplingHandler(
        default_model="claude-sonnet-4-5",
        client=AsyncAnthropic(),  # Uses ANTHROPIC_API_KEY env var
    ),
)
```
Tool execution happens on the server side. The client's role is to pass tools to the LLM and return the LLM's response (which may include tool use requests). The server then executes the tools and may send follow-up sampling requests with tool results.
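For reference, sampling requests originate on the server side. The sketch below assumes FastMCP's server-side `Context.sample` method; the `summarize` tool and its prompt are illustrative only, not part of the client API:

```python
from fastmcp import FastMCP, Context

mcp = FastMCP("Summarizer")

@mcp.tool
async def summarize(text: str, ctx: Context) -> str:
    """Ask the connected client's LLM to summarize some text."""
    # The server sends a sampling request over the session; the client's
    # registered sampling handler (custom or built-in) produces the reply.
    result = await ctx.sample(
        messages=f"Summarize the following text:\n\n{text}",
        system_prompt="You are a concise technical summarizer.",
        max_tokens=300,
    )
    return result.text
```

From the client's perspective nothing changes: any of the handlers shown above will receive and answer this request.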
<Tip> To implement a custom sampling handler, see the [handler source code](https://github.com/PrefectHQ/fastmcp/tree/main/src/fastmcp/client/sampling/handlers) as a reference. </Tip>