models/spring-ai-anthropic/README.md
This module integrates the official Anthropic Java SDK with Spring AI, providing access to Claude models through Anthropic's API.
See the Anthropic Java SDK GitHub repository for details on the underlying SDK.
Configure your Anthropic API key either programmatically or via environment variable:
```java
AnthropicChatOptions options = AnthropicChatOptions.builder()
    .apiKey("<your-api-key>")
    .build();
```
Or using the environment variable (automatically detected):
```shell
export ANTHROPIC_API_KEY=<your-api-key>
```
This module supports synchronous and streaming chat, tool calling, extended thinking, citations, prompt caching, structured output with effort control, and per-request HTTP headers.
```java
// Create a chat model with default options
AnthropicChatModel chatModel = new AnthropicChatModel(
    AnthropicChatOptions.builder()
        .model("claude-sonnet-4-20250514")
        .maxTokens(1024)
        .build()
);

// Synchronous call
ChatResponse response = chatModel.call(new Prompt("Hello, Claude!"));

// Streaming call
Flux<ChatResponse> stream = chatModel.stream(new Prompt("Tell me a story"));
```
```java
var options = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-20250514")
    .toolCallbacks(FunctionToolCallback.builder("getWeather", new WeatherService())
        .description("Get the current weather for a location")
        .inputType(WeatherRequest.class)
        .build())
    .build();

ChatResponse response = chatModel.call(new Prompt("What's the weather in Paris?", options));
```
Enable Claude's reasoning feature to see step-by-step thinking before the final answer:
```java
var options = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-20250514")
    .temperature(1.0)          // required when thinking is enabled
    .maxTokens(16000)
    .thinkingEnabled(10000L)   // budget must be >= 1024 and < maxTokens
    .build();

ChatResponse response = chatModel.call(new Prompt("Solve this step by step...", options));
```
Three thinking modes are available via convenience builders:
- `thinkingEnabled(budgetTokens)` - Enable with a specific token budget
- `thinkingAdaptive()` - Let Claude decide whether to think
- `thinkingDisabled()` - Explicitly disable thinking

Thinking is fully supported in both synchronous and streaming modes, including signature capture for thinking block verification.
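As a sketch of the other two modes (using the builder methods named above, with illustrative token limits), switching between them looks like:

```java
// Adaptive mode: let Claude decide whether extended thinking is worthwhile
var adaptiveOptions = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-20250514")
    .maxTokens(2048)
    .thinkingAdaptive()
    .build();

// Disabled mode: skip thinking entirely for latency-sensitive requests
var disabledOptions = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-20250514")
    .maxTokens(1024)
    .thinkingDisabled()
    .build();
```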
Anthropic's Citations API allows Claude to reference specific parts of provided documents when generating responses. Three document types are supported: plain text, PDF, and custom content blocks.
```java
// Create a citation document
AnthropicCitationDocument document = AnthropicCitationDocument.builder()
    .plainText("The Eiffel Tower was completed in 1889 in Paris, France. " +
               "It stands 330 meters tall and was designed by Gustave Eiffel.")
    .title("Eiffel Tower Facts")
    .citationsEnabled(true)
    .build();

// Call the model with the document
ChatResponse response = chatModel.call(
    new Prompt(
        "When was the Eiffel Tower built?",
        AnthropicChatOptions.builder()
            .model("claude-sonnet-4-20250514")
            .maxTokens(1024)
            .citationDocuments(document)
            .build()
    )
);

// Access citations from the response metadata
List<Citation> citations = (List<Citation>) response.getMetadata().get("citations");
for (Citation citation : citations) {
    System.out.println("Document: " + citation.getDocumentTitle());
    System.out.println("Cited text: " + citation.getCitedText());
}
```
PDF and custom content block documents are also supported via the `pdfFile()`, `pdf()`, and `customContent()` builders.
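For example, a PDF document might be attached like this (a sketch only; it assumes `pdfFile()` accepts a filesystem `Path`, and the file path shown is hypothetical):

```java
import java.nio.file.Path;

// Build a citation document from a local PDF file (path is illustrative)
AnthropicCitationDocument pdfDocument = AnthropicCitationDocument.builder()
    .pdfFile(Path.of("docs/eiffel-tower.pdf"))
    .title("Eiffel Tower Facts (PDF)")
    .citationsEnabled(true)
    .build();
```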
Prompt caching reduces costs and latency by caching repeated context (system prompts, tool definitions, conversation history) across API calls. Five caching strategies are available:
| Strategy | Description |
|---|---|
| `NONE` | No caching (default) |
| `SYSTEM_ONLY` | Cache system message content |
| `TOOLS_ONLY` | Cache tool definitions |
| `SYSTEM_AND_TOOLS` | Cache both system messages and tool definitions |
| `CONVERSATION_HISTORY` | Cache system messages, tools, and conversation messages |
```java
// Cache system messages to reduce costs for repeated prompts
var options = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-20250514")
    .maxTokens(1024)
    .cacheOptions(AnthropicCacheOptions.builder()
        .strategy(AnthropicCacheStrategy.SYSTEM_AND_TOOLS)
        .build())
    .build();

ChatResponse response = chatModel.call(
    new Prompt(List.of(
        new SystemMessage("You are an expert assistant with deep domain knowledge..."),
        new UserMessage("What is the capital of France?")),
        options));

// Access cache token usage via the native SDK usage object
com.anthropic.models.messages.Usage sdkUsage =
    (com.anthropic.models.messages.Usage) response.getMetadata().getUsage().getNativeUsage();
long cacheCreation = sdkUsage.cacheCreationInputTokens().orElse(0L);
long cacheRead = sdkUsage.cacheReadInputTokens().orElse(0L);
```
You can also configure TTL (5 minutes or 1 hour), minimum content length thresholds, and multi-block system caching for static vs. dynamic system message segments:
```java
var options = AnthropicCacheOptions.builder()
    .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
    .messageTypeTtl(MessageType.SYSTEM, AnthropicCacheTtl.ONE_HOUR)
    .messageTypeMinContentLength(MessageType.SYSTEM, 100)
    .multiBlockSystemCaching(true)
    .build();
```
Structured output constrains Claude to produce responses conforming to a JSON schema. The module also supports Anthropic's effort control for tuning response quality vs. speed.
Model Requirement: Structured output and effort control require `claude-sonnet-4-6` or newer. Older models like `claude-sonnet-4-20250514` do not support these features.
```java
var options = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-6")
    .outputSchema("""
        {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "capital": {"type": "string"},
            "population": {"type": "integer"}
          },
          "required": ["name", "capital"],
          "additionalProperties": false
        }
        """)
    .build();

ChatResponse response = chatModel.call(new Prompt("Tell me about France.", options));
// The response text will be valid JSON conforming to the schema
```
Control how much compute Claude spends on its response. Lower effort means faster, cheaper responses; higher effort means more thorough reasoning.
```java
var options = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-6")
    .effort(OutputConfig.Effort.LOW) // LOW, MEDIUM, HIGH, or MAX
    .build();
```

Effort can also be combined with a structured-output schema:

```java
var options = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-6")
    .outputSchema("{\"type\":\"object\",\"properties\":{\"answer\":{\"type\":\"integer\"}},\"required\":[\"answer\"],\"additionalProperties\":false}")
    .effort(OutputConfig.Effort.HIGH)
    .build();
```
For full control, use the SDK's `OutputConfig` directly:

```java
import com.anthropic.models.messages.OutputConfig;
import com.anthropic.models.messages.JsonOutputFormat;
import com.anthropic.core.JsonValue;

var outputConfig = OutputConfig.builder()
    .effort(OutputConfig.Effort.HIGH)
    .format(JsonOutputFormat.builder()
        .schema(JsonOutputFormat.Schema.builder()
            .putAdditionalProperty("type", JsonValue.from("object"))
            .putAdditionalProperty("properties", JsonValue.from(Map.of(
                "name", Map.of("type", "string"))))
            .putAdditionalProperty("additionalProperties", JsonValue.from(false))
            .build())
        .build())
    .build();

var options = AnthropicChatOptions.builder()
    .model("claude-sonnet-4-6")
    .outputConfig(outputConfig)
    .build();
```
Add custom HTTP headers to individual API calls. Unlike `customHeaders` (which apply to all requests at the client level), `httpHeaders` are set per request.
```java
var options = AnthropicChatOptions.builder()
    .httpHeaders(Map.of(
        "X-Request-Id", "req-12345",
        "X-Custom-Tracking", "my-value"))
    .build();

ChatResponse response = chatModel.call(new Prompt("Hello", options));
```
Enable SDK logging by setting the environment variable:
```shell
export ANTHROPIC_LOG=debug
```
For comprehensive documentation, see: