# Amazon Bedrock
## Maven Dependency

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-bedrock</artifactId>
    <version>1.13.1</version>
</dependency>
```
## AWS Credentials

In order to use Amazon Bedrock models, you need to configure AWS credentials.
One option is to set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. More information can be found here. Alternatively, set the `AWS_BEARER_TOKEN_BEDROCK` environment variable locally for API key authentication. For additional API key details, refer to the docs.
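If you prefer configuring credentials in code rather than via the environment, you can build the `BedrockRuntimeClient` yourself and pass it to the model builder. A minimal sketch, assuming the standard AWS SDK v2 credential classes (the key values below are placeholders):

```java
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.bedrockruntime.BedrockRuntimeClient;

// Explicit credentials for illustration; in production prefer DefaultCredentialsProvider,
// which picks up environment variables, profiles, or instance roles automatically.
BedrockRuntimeClient client = BedrockRuntimeClient.builder()
        .region(Region.US_EAST_1)
        .credentialsProvider(StaticCredentialsProvider.create(
                AwsBasicCredentials.create("myAccessKey", "mySecretKey"))) // placeholders
        .build();
```

The resulting client can then be supplied via the `client(...)` builder method shown below.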
## BedrockChatModel

:::note
Guardrails is not supported by the current implementation.
:::

Supported models and their features can be found here.
Model IDs can be found here.

Configurable parameters:
```java
ChatModel model = BedrockChatModel.builder()
        .client(BedrockRuntimeClient)
        .region(...)
        .modelId("us.amazon.nova-lite-v1:0")
        .returnThinking(...)
        .sendThinking(...)
        .timeout(...)
        .maxRetries(...)
        .logRequests(...)
        .logResponses(...)
        .listeners(...)
        .defaultRequestParameters(BedrockChatRequestParameters.builder()
                .modelName(...)
                .temperature(...)
                .topP(...)
                .maxOutputTokens(...)
                .stopSequences(...)
                .toolSpecifications(...)
                .toolChoice(...)
                .additionalModelRequestFields(...)
                .additionalModelRequestField(...)
                .enableReasoning(...)
                .promptCaching(...)
                .build())
        .build();
```
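Once built, the model can be used like any other `ChatModel`. A minimal usage sketch:

```java
// Simple synchronous call; chat(String) is a convenience method on ChatModel
String answer = model.chat("What is the capital of Germany?");
System.out.println(answer);
```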
## BedrockStreamingChatModel

:::note
Guardrails is not supported by the current implementation.
:::

Supported models and their features can be found here.
Model IDs can be found here.

Configurable parameters:
```java
StreamingChatModel model = BedrockStreamingChatModel.builder()
        .client(BedrockRuntimeAsyncClient)
        .region(...)
        .modelId("us.amazon.nova-lite-v1:0")
        .returnThinking(...)
        .sendThinking(...)
        .timeout(...)
        .logRequests(...)
        .logResponses(...)
        .listeners(...)
        .defaultRequestParameters(BedrockChatRequestParameters.builder()
                .modelName(...)
                .temperature(...)
                .topP(...)
                .maxOutputTokens(...)
                .stopSequences(...)
                .toolSpecifications(...)
                .toolChoice(...)
                .additionalModelRequestFields(...)
                .additionalModelRequestField(...)
                .enableReasoning(...)
                .promptCaching(...)
                .build())
        .build();
```
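A streaming response is consumed through a `StreamingChatResponseHandler`. A minimal sketch:

```java
model.chat("Tell me a joke about Java", new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse); // tokens arrive incrementally
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("\nDone: " + completeResponse.finishReason());
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```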
## Additional Model Request Fields

The field `additionalModelRequestFields` in `BedrockChatRequestParameters` is a `Map<String, Object>`.
As explained here, it allows adding inference parameters for a specific model that are not covered by the common `InferenceConfiguration`.
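For example, Anthropic models on Bedrock accept a `top_k` sampling parameter that is not part of the common configuration. A minimal sketch; the exact field names accepted are model-specific, so consult the model provider's documentation:

```java
BedrockChatRequestParameters parameters = BedrockChatRequestParameters.builder()
        // model-specific field, passed through to the Bedrock Converse API as-is
        .additionalModelRequestField("top_k", 250)
        .build();

ChatModel model = BedrockChatModel.builder()
        .modelId("us.anthropic.claude-sonnet-4-20250514-v1:0")
        .defaultRequestParameters(parameters)
        .build();
```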
## Thinking

To enable Claude's thinking process, call `enableReasoning` on `BedrockChatRequestParameters` and set it via
`defaultRequestParameters` when building the model:
```java
BedrockChatRequestParameters parameters = BedrockChatRequestParameters.builder()
        .enableReasoning(1024) // token budget
        .build();

ChatModel model = BedrockChatModel.builder()
        .modelId("us.anthropic.claude-sonnet-4-20250514-v1:0")
        .defaultRequestParameters(parameters)
        .returnThinking(true)
        .sendThinking(true)
        .build();
```
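With `returnThinking(true)`, the model's reasoning can be read back from the response. A minimal sketch:

```java
ChatResponse response = model.chat(ChatRequest.builder()
        .messages(UserMessage.from("How many r's are in 'strawberry'?"))
        .build());

AiMessage aiMessage = response.aiMessage();
System.out.println(aiMessage.thinking()); // the model's reasoning, if returned
System.out.println(aiMessage.text());     // the final answer
```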
The following parameters also control thinking behaviour:

- `returnThinking`: controls whether to return thinking (if available) inside `AiMessage.thinking()`
  and whether to invoke the `StreamingChatResponseHandler.onPartialThinking()` and `TokenStream.onPartialThinking()`
  callbacks when using `BedrockStreamingChatModel`.
  Disabled by default. If enabled, thinking signatures will also be stored and returned inside `AiMessage.attributes()`.
- `sendThinking`: controls whether to send thinking and signatures stored in `AiMessage` to the LLM in follow-up requests.
  Enabled by default.

## Prompt Caching

AWS Bedrock supports prompt caching to improve performance and reduce costs when making repeated API calls with similar prompts. This feature can reduce latency by up to 85% and costs by up to 90% for cached content.
Prompt caching allows you to mark specific points in your conversation to be cached. When you make subsequent API calls with the same cached content, Bedrock can reuse the cached portion, significantly reducing processing time and costs. The cache has a 5-minute TTL (Time To Live) which resets on each cache hit.
Prompt caching is only available on a subset of Bedrock models; refer to the AWS Bedrock documentation for the current list.
To enable prompt caching, use the `promptCaching()` method on `BedrockChatRequestParameters`:

```java
import dev.langchain4j.model.bedrock.BedrockCachePointPlacement;
import dev.langchain4j.model.bedrock.BedrockChatRequestParameters;
import software.amazon.awssdk.regions.Region;

BedrockChatRequestParameters params = BedrockChatRequestParameters.builder()
        .promptCaching(BedrockCachePointPlacement.AFTER_SYSTEM)
        .temperature(0.7)
        .maxOutputTokens(500)
        .build();

ChatModel model = BedrockChatModel.builder()
        .modelId("us.amazon.nova-micro-v1:0")
        .region(Region.US_EAST_1)
        .defaultRequestParameters(params)
        .build();
```
The `BedrockCachePointPlacement` enum provides three options for where to place the cache point in your conversation:

- `AFTER_SYSTEM`: places the cache point after the system message. This is ideal when you have a consistent system prompt that you want to reuse across multiple conversations.
- `AFTER_USER_MESSAGE`: places the cache point after the user message. Useful when you have a standard user prompt or context that remains the same.
- `AFTER_TOOLS`: places the cache point after tool definitions. This is beneficial when you have a consistent set of tools that you want to cache.
```java
// Configure prompt caching to cache after the system message
BedrockChatRequestParameters params = BedrockChatRequestParameters.builder()
        .promptCaching(BedrockCachePointPlacement.AFTER_SYSTEM)
        .build();

ChatModel model = BedrockChatModel.builder()
        .modelId("us.anthropic.claude-sonnet-4-6")
        .defaultRequestParameters(params)
        .build();

// First request - establishes the cache
ChatRequest request1 = ChatRequest.builder()
        .messages(Arrays.asList(
                SystemMessage.from("You are a helpful coding assistant with expertise in Java."),
                UserMessage.from("What is dependency injection?")
        ))
        .build();
ChatResponse response1 = model.chat(request1);

// Second request - benefits from the cached system message
ChatRequest request2 = ChatRequest.builder()
        .messages(Arrays.asList(
                SystemMessage.from("You are a helpful coding assistant with expertise in Java."),
                UserMessage.from("What is the singleton pattern?")
        ))
        .build();
ChatResponse response2 = model.chat(request2); // faster response due to caching
```
Prompt caching can be combined with other Bedrock features like reasoning:
```java
BedrockChatRequestParameters params = BedrockChatRequestParameters.builder()
        .promptCaching(BedrockCachePointPlacement.AFTER_SYSTEM)
        .enableReasoning(1000) // enable reasoning with a 1000-token budget
        .temperature(0.3)
        .maxOutputTokens(2000)
        .build();

ChatModel model = BedrockChatModel.builder()
        .modelId("us.anthropic.claude-sonnet-4-6")
        .defaultRequestParameters(params)
        .build();
```
When choosing a cache point placement:

- Use `AFTER_SYSTEM` when your system prompt is consistent across conversations.
- Use `AFTER_TOOLS` when you have a stable set of tool definitions (see the sketch below).
- Use `AFTER_USER_MESSAGE` for scenarios with repeated user contexts.
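A minimal sketch of the `AFTER_TOOLS` placement, assuming a hypothetical `getWeather` tool definition:

```java
import dev.langchain4j.agent.tool.ToolSpecification;
import dev.langchain4j.model.chat.request.json.JsonObjectSchema;

// Hypothetical tool definition; a stable set of tools is a good caching candidate
ToolSpecification getWeather = ToolSpecification.builder()
        .name("getWeather")
        .description("Returns the current weather for a given city")
        .parameters(JsonObjectSchema.builder()
                .addStringProperty("city")
                .required("city")
                .build())
        .build();

BedrockChatRequestParameters params = BedrockChatRequestParameters.builder()
        .toolSpecifications(getWeather)
        .promptCaching(BedrockCachePointPlacement.AFTER_TOOLS)
        .build();

ChatModel model = BedrockChatModel.builder()
        .modelId("us.anthropic.claude-sonnet-4-6")
        .defaultRequestParameters(params)
        .build();
```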