site/docs/providers/aws-bedrock.md
The bedrock provider lets you use Amazon Bedrock in your evals. This is a common way to access Anthropic's Claude, Meta's Llama 3.3, Amazon's Nova, OpenAI's GPT-OSS models, AI21's Jamba, Alibaba's Qwen, and other models. The complete list of available models can be found in the AWS Bedrock model IDs documentation.
Model Access: Amazon Bedrock provides automatic access to serverless foundation models with no manual approval required.
Install the @aws-sdk/client-bedrock-runtime package:
npm install @aws-sdk/client-bedrock-runtime
The AWS SDK will automatically pull credentials from the following locations:
- `~/.aws/credentials`
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables

See setting Node.js credentials (AWS) in the AWS documentation for more details.
Edit your configuration file to point to the AWS Bedrock provider. Here's an example:
providers:
- id: bedrock:us.anthropic.claude-opus-4-7-v1:0
Note that the provider ID is bedrock: followed by the model ID or the ARN of the model.
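Both forms work; for example (the ARN below uses placeholder account and profile values, and inference profile ARNs also need `inferenceModelType`, covered later on this page):

```yaml
providers:
  # Plain model ID
  - id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
  # Application inference profile ARN (placeholder values)
  - id: bedrock:arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/my-profile
    config:
      inferenceModelType: 'claude' # required for inference profile ARNs
```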
Additional config parameters are passed like so:
providers:
- id: bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0
config:
accessKeyId: YOUR_ACCESS_KEY_ID
secretAccessKey: YOUR_SECRET_ACCESS_KEY
region: 'us-west-2'
max_tokens: 256
temperature: 0.7
AWS Bedrock supports Application Inference Profiles, which allow you to use a single ARN to access multiple foundation models across different regions. This helps optimize costs and availability while maintaining consistent performance.
When using an inference profile ARN, you must specify the inferenceModelType in your configuration to indicate which model family the profile is configured for:
providers:
- id: bedrock:arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/my-profile
config:
inferenceModelType: 'claude' # Required for inference profiles
region: 'us-east-1'
max_tokens: 256
temperature: 0.7
The inferenceModelType config option supports the following values:
- `claude` - For Anthropic Claude models
- `nova` - For Amazon Nova models (v1)
- `nova2` - For Amazon Nova 2 models (with reasoning support)
- `llama` - Defaults to Llama 4 (latest version)
- `llama2` - For Meta Llama 2 models
- `llama3` - For Meta Llama 3 models
- `llama3.1` or `llama3_1` - For Meta Llama 3.1 models
- `llama3.2` or `llama3_2` - For Meta Llama 3.2 models
- `llama3.3` or `llama3_3` - For Meta Llama 3.3 models
- `llama4` - For Meta Llama 4 models
- `mistral` - For Mistral models
- `cohere` - For Cohere models
- `ai21` - For AI21 models
- `titan` - For Amazon Titan models
- `deepseek` - For DeepSeek models
- `openai` - For OpenAI models
- `qwen` - For Alibaba Qwen models

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
# Claude Opus 4.7 via global inference profile
- id: bedrock:arn:aws:bedrock:us-east-2::inference-profile/global.anthropic.claude-opus-4-7
config:
inferenceModelType: 'claude'
region: 'us-east-2'
max_tokens: 1024
temperature: 0.7
# Using an inference profile that routes to Claude models
- id: bedrock:arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/claude-profile
config:
inferenceModelType: 'claude'
max_tokens: 1024
temperature: 0.7
anthropic_version: 'bedrock-2023-05-31'
# Using an inference profile for Llama models
- id: bedrock:arn:aws:bedrock:us-west-2:123456789012:application-inference-profile/llama-profile
config:
inferenceModelType: 'llama3.3'
max_gen_len: 1024
temperature: 0.7
# Using an inference profile for Nova models
- id: bedrock:arn:aws:bedrock:eu-west-1:123456789012:application-inference-profile/nova-profile
config:
inferenceModelType: 'nova'
interfaceConfig:
max_new_tokens: 1024
temperature: 0.7
:::tip
When using inference profiles, ensure the inferenceModelType matches the model family your profile is configured for, as the configuration parameters differ between model families.
:::
The Converse API provides a unified interface across all Bedrock models with native support for extended thinking (reasoning), tool calling, and guardrails. Use the bedrock:converse: prefix to access this API.
providers:
- id: bedrock:converse:anthropic.claude-3-5-sonnet-20241022-v2:0
config:
region: us-east-1
maxTokens: 4096
temperature: 0.7
Enable Claude's extended thinking capabilities for complex reasoning tasks:
providers:
- id: bedrock:converse:us.anthropic.claude-sonnet-4-5-20250929-v1:0
config:
region: us-west-2
maxTokens: 20000
thinking:
type: enabled
budget_tokens: 16000
showThinking: true # Include thinking content in output
The thinking configuration controls Claude's reasoning behavior:
- `type: enabled` - Activates extended thinking
- `budget_tokens` - Maximum tokens allocated for thinking (minimum 1024)

Use `showThinking: true` to include the model's reasoning process in the output, or `false` to only show the final response.
:::warning
Do not set temperature, topP, or topK when using extended thinking. These sampling parameters are incompatible with reasoning mode.
:::
| Option | Description |
| --- | --- |
| `maxTokens` | Maximum output tokens |
| `temperature` | Sampling temperature (0-1) |
| `topP` | Nucleus sampling parameter |
| `stopSequences` | Array of stop sequences |
| `thinking` | Extended thinking configuration (Claude models) |
| `reasoningConfig` | Reasoning configuration (Amazon Nova 2 models) |
| `showThinking` | Include thinking in output (default: false) |
| `performanceConfig` | Performance settings (`latency: optimized`) |
| `serviceTier` | Service tier (`priority`, `default`, or `flex`) |
| `guardrailIdentifier` | Guardrail ID for content filtering |
| `guardrailVersion` | Guardrail version (default: DRAFT) |
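As an illustrative sketch combining several of these options (the guardrail identifier is a placeholder for one you have created):

```yaml
providers:
  - id: bedrock:converse:anthropic.claude-3-5-sonnet-20241022-v2:0
    config:
      region: us-east-1
      maxTokens: 1024
      stopSequences: ['END']
      guardrailIdentifier: 'your-guardrail-id' # placeholder
      guardrailVersion: 'DRAFT'
```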
Optimize for latency or cost:
providers:
- id: bedrock:converse:anthropic.claude-3-5-sonnet-20241022-v2:0
config:
performanceConfig:
latency: optimized # or 'standard'
serviceTier:
type: priority # or 'default', 'flex'
The Converse API works with all Bedrock models that support the Converse operation.
Amazon Bedrock supports multiple authentication methods, including API key authentication for simplified access. Credentials are resolved in the following priority order:

1. Explicit credentials in the provider config (`accessKeyId`, `secretAccessKey`)
2. API key (`apiKey`)
3. Named profile (`profile`)
4. Default AWS credential chain (environment variables, `~/.aws/credentials`)

The first available credential method is used automatically.
Specify AWS access keys directly in your configuration. For security, use environment variables instead of hardcoding credentials:
providers:
- id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
config:
accessKeyId: '{{env.AWS_ACCESS_KEY_ID}}'
secretAccessKey: '{{env.AWS_SECRET_ACCESS_KEY}}'
sessionToken: '{{env.AWS_SESSION_TOKEN}}' # Optional, for temporary credentials
region: 'us-east-1' # Optional, defaults to us-east-1
Environment variables:
export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
export AWS_SESSION_TOKEN="your_session_token" # Optional
:::warning Security Best Practice
Do not commit credentials to version control. Use environment variables or a dedicated secrets management system to handle sensitive keys.
:::
This method overrides all other credential sources, including EC2 instance roles and SSO profiles.
Amazon Bedrock API keys provide simplified authentication without managing AWS IAM credentials.
Using environment variables:
Set the AWS_BEARER_TOKEN_BEDROCK environment variable:
export AWS_BEARER_TOKEN_BEDROCK="your-api-key-here"
providers:
- id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
config:
region: 'us-east-1' # Optional, defaults to us-east-1
Using config file:
Specify the API key directly in your configuration:
providers:
- id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
config:
apiKey: 'your-api-key-here'
region: 'us-east-1' # Optional, defaults to us-east-1
:::note
API keys are limited to Amazon Bedrock and Amazon Bedrock Runtime actions; they cannot be used with other AWS services or features that require full IAM credentials. For those, use traditional AWS IAM credentials instead.
:::
Use a named profile from your AWS configuration for AWS SSO setups or managing multiple AWS accounts:
providers:
- id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
config:
profile: 'YOUR_SSO_PROFILE'
region: 'us-east-1' # Optional, defaults to us-east-1
Prerequisites for SSO profiles:
Install AWS CLI v2: Ensure AWS CLI v2 is installed and on your PATH.
Configure AWS SSO: Set up AWS SSO using the AWS CLI:
aws configure sso
Profile configuration: Your ~/.aws/config should contain the profile:
[profile YOUR_SSO_PROFILE]
sso_start_url = https://your-sso-portal.awsapps.com/start
sso_region = us-east-1
sso_account_id = 123456789012
sso_role_name = YourRoleName
region = us-east-1
Active SSO session: Ensure you have an active SSO session:
aws sso login --profile YOUR_SSO_PROFILE
Use SSO profiles for AWS IAM Identity Center (SSO) setups or when working across multiple AWS accounts.
Use the AWS SDK's standard credential chain:
providers:
- id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
config:
region: 'us-east-1' # Only region specified
The AWS SDK checks these sources in order:

1. Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`)
2. Shared credentials file (`~/.aws/credentials`, from `aws configure`)

Use default credentials when you have already configured the AWS CLI locally (`aws configure`) or your environment provides credentials automatically.

Quick setup for local development:
# Option 1: Using AWS CLI
aws configure
# Option 2: Using environment variables
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export AWS_DEFAULT_REGION="us-east-1"
See GitHub for full examples of Claude, Nova, AI21, Llama 3.3, and Titan model usage.
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
- 'Write a tweet about {{topic}}'
providers:
# Using inference profiles (requires inferenceModelType)
- id: bedrock:arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/my-claude-profile
config:
inferenceModelType: 'claude'
region: 'us-east-1'
temperature: 0.7
max_tokens: 256
# Using regular model IDs
- id: bedrock:meta.llama3-1-405b-instruct-v1:0
config:
region: 'us-east-1'
temperature: 0.7
max_tokens: 256
- id: bedrock:us.meta.llama3-3-70b-instruct-v1:0
config:
max_gen_len: 256
- id: bedrock:amazon.nova-lite-v1:0
config:
region: 'us-east-1'
interfaceConfig:
temperature: 0.7
max_new_tokens: 256
- id: bedrock:us.amazon.nova-premier-v1:0
config:
region: 'us-east-1'
interfaceConfig:
temperature: 0.7
max_new_tokens: 256
- id: bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0
config:
region: 'us-east-1'
temperature: 0.7
max_tokens: 256
- id: bedrock:us.anthropic.claude-opus-4-1-20250805-v1:0
config:
region: 'us-east-1'
temperature: 0.7
max_tokens: 256
- id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
config:
region: 'us-east-1'
temperature: 0.7
max_tokens: 256
- id: bedrock:us.anthropic.claude-3-5-haiku-20241022-v1:0
config:
region: 'us-east-1'
temperature: 0.7
max_tokens: 256
- id: bedrock:us.anthropic.claude-3-opus-20240229-v1:0
config:
region: 'us-east-1'
temperature: 0.7
max_tokens: 256
- id: bedrock:openai.gpt-oss-120b-1:0
config:
region: 'us-west-2'
temperature: 0.7
max_completion_tokens: 256
reasoning_effort: 'medium'
- id: bedrock:openai.gpt-oss-20b-1:0
config:
region: 'us-west-2'
temperature: 0.7
max_completion_tokens: 256
reasoning_effort: 'low'
- id: bedrock:qwen.qwen3-coder-480b-a35b-v1:0
config:
region: 'us-west-2'
temperature: 0.7
max_tokens: 256
showThinking: true
- id: bedrock:qwen.qwen3-32b-v1:0
config:
region: 'us-east-1'
temperature: 0.7
max_tokens: 256
tests:
- vars:
topic: Our eco-friendly packaging
- vars:
topic: A sneak peek at our secret menu item
- vars:
topic: Behind-the-scenes at our latest photoshoot
Different models may support different configuration options. Here are some model-specific parameters:
- `inferenceModelType`: (Required for inference profiles) Specifies the model family when using application inference profiles. Options include: `claude`, `nova`, `nova2`, `llama`, `llama2`, `llama3`, `llama3.1`, `llama3.2`, `llama3.3`, `llama4`, `mistral`, `cohere`, `ai21`, `titan`, `deepseek`, `openai`, `qwen`

Amazon Nova models (e.g., amazon.nova-lite-v1:0, amazon.nova-pro-v1:0, amazon.nova-micro-v1:0, amazon.nova-premier-v1:0) support advanced features like tool use and structured outputs. You can configure them with the following options:
providers:
- id: bedrock:amazon.nova-lite-v1:0
config:
interfaceConfig:
max_new_tokens: 256 # Maximum number of tokens to generate
temperature: 0.7 # Controls randomness (0.0 to 1.0)
top_p: 0.9 # Nucleus sampling parameter
top_k: 50 # Top-k sampling parameter
stopSequences: ['END'] # Optional stop sequences
toolConfig: # Optional tool configuration
tools:
- toolSpec:
name: 'calculator'
description: 'A basic calculator for arithmetic operations'
inputSchema:
json:
type: 'object'
properties:
expression:
description: 'The arithmetic expression to evaluate'
type: 'string'
required: ['expression']
toolChoice: # Optional tool selection
tool:
name: 'calculator'
:::note
Nova models use a slightly different configuration structure compared to other Bedrock models, with separate interfaceConfig and toolConfig sections.
:::
Amazon Nova 2 models introduce extended thinking capabilities with configurable reasoning levels. Nova 2 Lite (amazon.nova-2-lite-v1:0) supports step-by-step reasoning and task decomposition with a 1 million token context window.
providers:
# Use cross-region model ID (us.) for on-demand access
- id: bedrock:us.amazon.nova-2-lite-v1:0
config:
interfaceConfig:
max_new_tokens: 4096
reasoningConfig:
type: enabled # Enable extended thinking
maxReasoningEffort: medium # low, medium, or high
Reasoning Configuration:
- `type`: Set to `enabled` to activate extended thinking, or `disabled` for fast responses (default)
- `maxReasoningEffort`: Controls thinking depth: `low`, `medium`, or `high`

When extended thinking is enabled, the model's reasoning process is captured in the response output with `<thinking>` tags, similar to other reasoning models.
:::warning
When using `reasoningConfig` with `type: enabled`:

- Do not set `temperature`, `top_p`, or `top_k`; these parameters are incompatible with reasoning mode
- With `maxReasoningEffort: high`, also do not set `max_new_tokens`; the model manages output length automatically

:::
Regional Model IDs:
Nova 2 models require cross-region inference profiles for on-demand access:
- `us.amazon.nova-2-lite-v1:0` - US region (recommended)
- `eu.amazon.nova-2-lite-v1:0` - EU region
- `apac.amazon.nova-2-lite-v1:0` - Asia Pacific region
- `global.amazon.nova-2-lite-v1:0` - Global cross-region inference

Using Nova 2 with the Converse API:
Nova 2 reasoning is also supported via the Converse API, which provides a unified interface across Bedrock models:
providers:
- id: bedrock:converse:us.amazon.nova-2-lite-v1:0
config:
maxTokens: 4096
reasoningConfig:
type: enabled
maxReasoningEffort: medium
The same parameter constraints apply when using the Converse API.
The Amazon Nova Sonic model (amazon.nova-sonic-v1:0) is a multimodal model that supports audio input and text/audio output with tool-using capabilities. It has a different configuration structure compared to other Nova models:
providers:
- id: bedrock:amazon.nova-sonic-v1:0
config:
inferenceConfiguration:
maxTokens: 1024 # Maximum number of tokens to generate
temperature: 0.7 # Controls randomness (0.0 to 1.0)
topP: 0.95 # Nucleus sampling parameter
textOutputConfiguration:
mediaType: text/plain
toolConfiguration: # Optional tool configuration
tools:
- toolSpec:
name: 'getDateTool'
description: 'Get information about the current date'
inputSchema:
json: '{"$schema":"http://json-schema.org/draft-07/schema#","type":"object","properties":{},"required":[]}'
toolUseOutputConfiguration:
mediaType: application/json
# Optional audio output configuration
audioOutputConfiguration:
mediaType: audio/lpcm
sampleRateHertz: 24000
sampleSizeBits: 16
channelCount: 1
voiceId: matthew
encoding: base64
audioType: SPEECH
Note: Nova Sonic has advanced multimodal capabilities including audio input/output, but audio input requires base64 encoded data which may be better handled through the API directly rather than in the configuration file.
Amazon Nova Reel (amazon.nova-reel-v1:1) generates studio-quality videos from text prompts. Videos are generated in 6-second increments up to 2 minutes.
:::note Prerequisites
Nova Reel requires an Amazon S3 bucket for video output. Your AWS credentials must have:
- `bedrock:InvokeModel` and `bedrock:StartAsyncInvoke` permissions
- `s3:PutObject` permission on the output bucket
- `s3:GetObject` permission for downloading generated videos

:::
Nova Reel is available in the us-east-1 region.
providers:
- id: bedrock:video:amazon.nova-reel-v1:1
config:
region: us-east-1
s3OutputUri: s3://my-bucket/videos # Required
durationSeconds: 6 # Default: 6
seed: 42 # Optional: for reproducibility
Nova Reel supports three task types:
TEXT_VIDEO (default) - Generate a 6-second video from a text prompt:
providers:
- id: bedrock:video:amazon.nova-reel-v1:1
config:
s3OutputUri: s3://my-bucket/videos
taskType: TEXT_VIDEO
durationSeconds: 6
MULTI_SHOT_AUTOMATED - Generate longer videos (12-120 seconds) from a single prompt:
providers:
- id: bedrock:video:amazon.nova-reel-v1:1
config:
s3OutputUri: s3://my-bucket/videos
taskType: MULTI_SHOT_AUTOMATED
durationSeconds: 18 # Must be multiple of 6
MULTI_SHOT_MANUAL - Define individual shots with separate prompts:
providers:
- id: bedrock:video:amazon.nova-reel-v1:1
config:
s3OutputUri: s3://my-bucket/videos
taskType: MULTI_SHOT_MANUAL
durationSeconds: 12
shots:
- text: 'Drone footage of a forest from high altitude'
- text: 'Camera arcs around vehicles in a forest'
Use an image as the starting frame (must be 1280x720):
providers:
- id: bedrock:video:amazon.nova-reel-v1:1
config:
s3OutputUri: s3://my-bucket/videos
image: file://path/to/image.png
| Option | Description | Default |
| --- | --- | --- |
| `s3OutputUri` | S3 bucket URI for output (required) | - |
| `taskType` | `TEXT_VIDEO`, `MULTI_SHOT_AUTOMATED`, or `MULTI_SHOT_MANUAL` | `TEXT_VIDEO` |
| `durationSeconds` | Video duration (6, or 12-120 in multiples of 6) | 6 |
| `seed` | Random seed (0-2,147,483,646) | - |
| `image` | Starting frame image (`file://` path or base64) | - |
| `shots` | Shot definitions for `MULTI_SHOT_MANUAL` | - |
| `pollIntervalMs` | Polling interval in ms | 10000 |
| `maxPollTimeMs` | Maximum polling time in ms | 900000 |
| `downloadFromS3` | Download video to local blob storage | true |
Generated videos are 1280x720 resolution at 24 FPS in MP4 format.
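If your generations need more headroom than the defaults allow, the polling options from the table above can be tuned; a brief sketch with assumed values:

```yaml
providers:
  - id: bedrock:video:amazon.nova-reel-v1:1
    config:
      s3OutputUri: s3://my-bucket/videos # placeholder bucket
      pollIntervalMs: 15000 # check for completion every 15 seconds
      maxPollTimeMs: 1800000 # wait up to 30 minutes
      downloadFromS3: false # leave the MP4 in S3 rather than downloading it
```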
:::warning Generation Time
Video generation is asynchronous and can take several minutes, with longer multi-shot videos taking proportionally longer. The provider polls for completion automatically.
:::
For AI21 models (e.g., ai21.jamba-1-5-mini-v1:0, ai21.jamba-1-5-large-v1:0), you can use the following configuration options:
config:
max_tokens: 256
temperature: 0.7
top_p: 0.9
frequency_penalty: 0.5
presence_penalty: 0.3
For Claude models (e.g., anthropic.claude-sonnet-4-6, anthropic.claude-sonnet-4-5-20250929-v1:0, anthropic.claude-haiku-4-5-20251001-v1:0, anthropic.claude-sonnet-4-20250514-v1:0, us.anthropic.claude-3-5-sonnet-20241022-v2:0), you can use the following configuration options:
Note: Claude Opus 4.7 (anthropic.claude-opus-4-7) is available via cross-region inference profiles (us., eu., jp., global.) and — at launch — through the base foundation model ID in select regions (US East/N. Virginia, Europe/Ireland, Europe/Stockholm, Asia Pacific/Tokyo). Claude Opus 4.6 (anthropic.claude-opus-4-6-v1) and Claude Opus 4.5 (anthropic.claude-opus-4-5-20251101-v1:0) still require an inference profile ARN and cannot be used as a direct model ID. See the Application Inference Profiles section for setup.
config:
max_tokens: 256
temperature: 0.7
anthropic_version: 'bedrock-2023-05-31'
tools: [...] # Optional: Specify available tools
tool_choice: { ... } # Optional: Specify tool choice
thinking: { ... } # Optional: Enable Claude's extended thinking capability
showThinking: true # Optional: Control whether thinking content is included in output
When using Claude's extended thinking capability, you can configure it like this:
config:
max_tokens: 20000
thinking:
type: 'enabled'
budget_tokens: 16000 # Must be ≥1024 and less than max_tokens
showThinking: true # Whether to include thinking content in the output (default: true)
:::tip
The showThinking parameter controls whether thinking content is included in the response output:
- When `true` (default), thinking content will be included in the output
- When `false`, thinking content will be excluded from the output

This is useful when you want to use thinking for better reasoning but don't want to expose the thinking process to end users.
:::
For Titan models (e.g., amazon.titan-text-express-v1), you can use the following configuration options:
config:
maxTokenCount: 256
temperature: 0.7
topP: 0.9
stopSequences: ['END']
For Llama models (e.g., meta.llama3-1-70b-instruct-v1:0, meta.llama3-2-90b-instruct-v1:0, meta.llama3-3-70b-instruct-v1:0, meta.llama4-scout-17b-instruct-v1:0, meta.llama4-maverick-17b-instruct-v1:0), you can use the following configuration options:
config:
max_gen_len: 256
temperature: 0.7
top_p: 0.9
Llama 3.2 Vision models (us.meta.llama3-2-11b-instruct-v1:0, us.meta.llama3-2-90b-instruct-v1:0) support image inputs. You can use them with either the legacy InvokeModel API or the Converse API:
Using InvokeModel API (legacy):
providers:
- id: bedrock:us.meta.llama3-2-11b-instruct-v1:0
config:
region: us-east-1
max_gen_len: 256
prompts:
- file://llama_vision_prompt.json
tests:
- vars:
image: file://path/to/image.jpg
Where llama_vision_prompt.json contains:

[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/jpeg",
"data": "{{image}}"
}
},
{
"type": "text",
"text": "What is in this image?"
}
]
}
]
Using Converse API:
providers:
- id: bedrock:converse:us.meta.llama3-2-11b-instruct-v1:0
config:
region: us-east-1
maxTokens: 256
The Converse API uses the same prompt format as Nova vision (see the multimodal capabilities section below).
For Cohere models (e.g., cohere.command-text-v14), you can use the following configuration options:
config:
max_tokens: 256
temperature: 0.7
p: 0.9
k: 0
stop_sequences: ['END']
For Mistral models (e.g., mistral.mistral-7b-instruct-v0:2), you can use the following configuration options:
config:
max_tokens: 256
temperature: 0.7
top_p: 0.9
top_k: 50
For DeepSeek models, you can use the following configuration options:
config:
# Deepseek params
max_tokens: 256
temperature: 0.7
top_p: 0.9
# Promptfoo control params
showThinking: true # Optional: Control whether thinking content is included in output
DeepSeek models support an extended thinking capability. The showThinking parameter controls whether thinking content is included in the response output:
- When `true` (default), thinking content will be included in the output
- When `false`, thinking content will be excluded from the output

This allows you to access the model's reasoning process during generation while having the option to present only the final response to end users.
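A minimal provider entry might look like the following (the model ID is an assumption; verify which DeepSeek models are available in your region):

```yaml
providers:
  - id: bedrock:us.deepseek.r1-v1:0 # assumed model ID; confirm availability
    config:
      max_tokens: 512
      temperature: 0.7
      showThinking: false # return only the final answer
```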
OpenAI's open-weight models are available through AWS Bedrock with full support for their reasoning capabilities and parameters. The available models include:
- `openai.gpt-oss-120b-1:0`: 120 billion parameter model with strong reasoning capabilities
- `openai.gpt-oss-20b-1:0`: 20 billion parameter model, more cost-effective

config:
max_completion_tokens: 1024 # Maximum tokens for response (OpenAI-style parameter)
temperature: 0.7 # Controls randomness (0.0 to 1.0)
top_p: 0.9 # Nucleus sampling parameter
frequency_penalty: 0.1 # Reduces repetition of frequent tokens
presence_penalty: 0.1 # Reduces repetition of any tokens
stop: ['END', 'STOP'] # Stop sequences
reasoning_effort: 'medium' # Controls reasoning depth: 'low', 'medium', 'high'
OpenAI models support adjustable reasoning effort through the reasoning_effort parameter:
- `low`: Faster responses with basic reasoning
- `medium`: Balanced performance and reasoning depth
- `high`: Thorough reasoning, slower but more accurate responses

The reasoning effort is implemented via system prompt instructions, allowing the model to adjust its cognitive processing depth.
providers:
- id: bedrock:openai.gpt-oss-120b-1:0
config:
region: 'us-west-2'
max_completion_tokens: 2048
temperature: 0.3
top_p: 0.95
reasoning_effort: 'high'
- id: bedrock:openai.gpt-oss-20b-1:0
config:
region: 'us-west-2'
max_completion_tokens: 1024
temperature: 0.5
reasoning_effort: 'medium'
stop: ['END', 'FINAL']
:::note
OpenAI models use max_completion_tokens instead of max_tokens like other Bedrock models. This aligns with OpenAI's API specification and allows for more precise control over response length.
:::
Alibaba's Qwen models (e.g., qwen.qwen3-coder-480b-a35b-v1:0, qwen.qwen3-coder-30b-a3b-v1:0, qwen.qwen3-235b-a22b-2507-v1:0, qwen.qwen3-32b-v1:0) support advanced features including hybrid thinking modes, tool calling, and extended context understanding.
Regional Availability: Check the AWS Bedrock console or use aws bedrock list-foundation-models to verify which Qwen models are available in your target region, as availability varies by model and region.
You can configure them with the following options:
config:
max_tokens: 2048 # Maximum number of tokens to generate
temperature: 0.7 # Controls randomness (0.0 to 1.0)
top_p: 0.9 # Nucleus sampling parameter
frequency_penalty: 0.1 # Reduces repetition of frequent tokens
presence_penalty: 0.1 # Reduces repetition of any tokens
stop: ['END', 'STOP'] # Stop sequences
showThinking: true # Control whether thinking content is included in output
tools: [...] # Tool calling configuration (optional)
tool_choice: 'auto' # Tool selection strategy (optional)
Qwen models support hybrid thinking modes where the model can apply step-by-step reasoning before delivering the final answer. The showThinking parameter controls whether thinking content is included in the response output:
- When `true` (default), thinking content will be included in the output
- When `false`, thinking content will be excluded from the output

This allows you to access the model's reasoning process during generation while having the option to present only the final response to end users.
Qwen models support tool calling with OpenAI-compatible function definitions.
config:
tools:
- type: function
function:
name: calculate
description: Perform arithmetic calculations
parameters:
type: object
properties:
expression:
type: string
description: The mathematical expression to evaluate
required: ['expression']
tool_choice: auto # 'auto', 'none', or specific function name
providers:
- id: bedrock:qwen.qwen3-coder-480b-a35b-v1:0
config:
region: us-west-2
max_tokens: 2048
temperature: 0.7
top_p: 0.9
showThinking: true
tools:
- type: function
function:
name: code_analyzer
description: Analyze code for potential issues
parameters:
type: object
properties:
code:
type: string
description: The code to analyze
required: ['code']
tool_choice: auto
You can use Bedrock models to grade outputs. By default, model-graded tests use gpt-5 and require the OPENAI_API_KEY environment variable to be set. However, when using AWS Bedrock, you have the option of overriding the grader for model-graded assertions to point to AWS Bedrock or other providers.
You can use either regular model IDs or application inference profiles for grading:
:::warning
Because of how model-graded evals are implemented, the LLM grading models must support chat-formatted prompts (except for embedding or classification models).
:::
To set this for all your test cases, add the defaultTest property to your config:
defaultTest:
options:
provider:
# Using a regular model ID
id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
config:
temperature: 0
# Other provider config options
# Or using an inference profile
# id: bedrock:arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/grading-profile
# config:
# inferenceModelType: 'claude'
# temperature: 0
You can also do this for individual assertions:
# ...
assert:
- type: llm-rubric
value: Do not mention that you are an AI or chat assistant
provider:
text:
id: provider:chat:modelname
config:
region: us-east-1
temperature: 0
# Other provider config options...
Or for individual tests:
# ...
tests:
- vars:
# ...
options:
provider:
id: provider:chat:modelname
config:
temperature: 0
# Other provider config options
assert:
- type: llm-rubric
value: Do not mention that you are an AI or chat assistant
Several Bedrock models support multimodal inputs that combine images and text.
To use these capabilities, structure your prompts to include both image data and text content.
Amazon Nova supports comprehensive vision understanding for both images and videos.
Here's an example configuration for running multimodal evaluations:
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: 'Bedrock Nova Eval with Images'
prompts:
- file://nova_multimodal_prompt.json
providers:
- id: bedrock:amazon.nova-pro-v1:0
config:
region: 'us-east-1'
interfaceConfig:
temperature: 0.7
max_new_tokens: 256
tests:
- vars:
image: file://path/to/image.jpg
The prompt file (nova_multimodal_prompt.json) should be structured to include both image and text content. This format will depend on the specific model you're using:
[
{
"role": "user",
"content": [
{
"image": {
"format": "jpg",
"source": { "bytes": "{{image}}" }
}
},
{
"text": "What is this a picture of?"
}
]
}
]
See GitHub for a runnable example.
When loading image files as variables, Promptfoo automatically converts them to the appropriate format for the model. Commonly supported image formats include JPEG, PNG, GIF, and WebP.
To override the embeddings provider for all assertions that require embeddings (such as similarity), use defaultTest:
defaultTest:
options:
provider:
embedding:
id: bedrock:embeddings:amazon.titan-embed-text-v2:0
config:
region: us-east-1
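With that override in place, similarity assertions are scored against the Titan embedding model; a brief sketch:

```yaml
tests:
  - vars:
      topic: our eco-friendly packaging
    assert:
      - type: similar
        value: 'A tweet promoting sustainable packaging'
        threshold: 0.8
```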
To use guardrails, set the guardrailIdentifier and guardrailVersion in the provider config.
For example:
providers:
- id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
config:
guardrailIdentifier: 'test-guardrail'
guardrailVersion: 1 # The version number for the guardrail. The value can also be DRAFT.
The following environment variables can be used to configure the Bedrock provider:
Authentication:
- `AWS_BEARER_TOKEN_BEDROCK`: Bedrock API key for simplified authentication

Configuration:

- `AWS_BEDROCK_REGION`: Default region for Bedrock API calls
- `AWS_BEDROCK_MAX_TOKENS`: Default maximum number of tokens to generate
- `AWS_BEDROCK_TEMPERATURE`: Default temperature for generation
- `AWS_BEDROCK_TOP_P`: Default top_p value for generation
- `AWS_BEDROCK_FREQUENCY_PENALTY`: Default frequency penalty (for supported models)
- `AWS_BEDROCK_PRESENCE_PENALTY`: Default presence penalty (for supported models)
- `AWS_BEDROCK_STOP`: Default stop sequences (as a JSON string)
- `AWS_BEDROCK_MAX_RETRIES`: Number of retry attempts for failed API calls (default: 10)

Model-specific environment variables:

- `MISTRAL_MAX_TOKENS`, `MISTRAL_TEMPERATURE`, `MISTRAL_TOP_P`, `MISTRAL_TOP_K`: For Mistral models
- `COHERE_TEMPERATURE`, `COHERE_P`, `COHERE_K`, `COHERE_MAX_TOKENS`: For Cohere models

These environment variables can be overridden by the configuration specified in the YAML file.
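For example, with `AWS_BEDROCK_TEMPERATURE=0.2` exported, a provider-level setting still takes precedence for that provider; a short sketch:

```yaml
providers:
  - id: bedrock:mistral.mistral-7b-instruct-v0:2
    config:
      temperature: 0.9 # overrides the AWS_BEDROCK_TEMPERATURE default
      max_tokens: 256
```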
Error: Unable to locate credentials. You can configure credentials by running "aws configure".
Solutions:
- Run `aws configure list` to see active credentials
- Re-authenticate SSO sessions: `aws sso login --profile YOUR_PROFILE`
- Verify that `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are set

For permission errors, ensure your IAM identity has the `bedrock:InvokeModel` permission.

"SSO session has expired":

- Run `aws sso login --profile YOUR_PROFILE`

"Profile not found":

- Verify that `~/.aws/config` contains the profile

Enable debug logging to see which credentials are being used:
export AWS_SDK_JS_LOG=1
npx promptfoo eval
This will show detailed AWS SDK logs including credential resolution.
If you see this error when using an inference profile ARN:
Error: Inference profile requires inferenceModelType to be specified in config. Options: claude, nova, llama (defaults to v4), llama2, llama3, llama3.1, llama3.2, llama3.3, llama4, mistral, cohere, ai21, titan, deepseek, openai, qwen
This means you're using an application inference profile ARN but haven't specified which model family it's configured for. Add the inferenceModelType to your configuration:
providers:
# Incorrect - missing inferenceModelType
- id: bedrock:arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/my-profile
# Correct - includes inferenceModelType
- id: bedrock:arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/my-profile
config:
inferenceModelType: 'claude' # Specify the model family
If you see this error:
ValidationException: Invocation of model ID anthropic.claude-3-5-sonnet-20241022-v2:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.
This usually means you need to use the region-specific model ID. Update your provider configuration to include the regional prefix:
providers:
# Instead of this:
- id: bedrock:anthropic.claude-sonnet-4-5-20250929-v1:0
# Use this:
- id: bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0 # US region
# or
- id: bedrock:eu.anthropic.claude-sonnet-4-5-20250929-v1:0 # EU region
# or
- id: bedrock:apac.anthropic.claude-sonnet-4-5-20250929-v1:0 # APAC region
Make sure to use the correct regional prefix (`us.`, `eu.`, or `apac.`) based on your AWS region.

If you see this error, the cause depends on which model provider you're using:
For most serverless models (Amazon, DeepSeek, Mistral, Meta, Qwen, OpenAI):
- Ensure your IAM credentials include the `bedrock:InvokeModel` permission

For Anthropic models (Claude):

- Complete the one-time model access request in the AWS Bedrock console if you haven't already

For AWS Marketplace models:

- Ensure your credentials include the `aws-marketplace:Subscribe` permission

AWS Bedrock Knowledge Bases provide Retrieval Augmented Generation (RAG) functionality, allowing you to query a knowledge base with natural language and get responses based on your data.
To use the Knowledge Base provider, you need:
- An existing Knowledge Base created in AWS Bedrock
- The `@aws-sdk/client-bedrock-agent-runtime` package installed:
npm install @aws-sdk/client-bedrock-agent-runtime
Configure the Knowledge Base provider by specifying kb in your provider ID. Note that the model ID needs to include the regional prefix (us., eu., or apac.):
providers:
- id: bedrock:kb:us.anthropic.claude-3-7-sonnet-20250219-v1:0
config:
region: 'us-east-2'
knowledgeBaseId: 'YOUR_KNOWLEDGE_BASE_ID'
temperature: 0.0
max_tokens: 1000
numberOfResults: 5 # Optional: number of chunks to retrieve (AWS default when not specified)
The provider ID follows this pattern: bedrock:kb:[REGIONAL_MODEL_ID]
For example:
- `bedrock:kb:us.anthropic.claude-3-5-sonnet-20241022-v2:0` (US region)
- `bedrock:kb:eu.anthropic.claude-3-5-sonnet-20241022-v2:0` (EU region)

Configuration options include:
- `knowledgeBaseId` (required): The ID of your AWS Bedrock Knowledge Base
- `region`: AWS region where your Knowledge Base is deployed (e.g., 'us-east-1', 'us-east-2', 'eu-west-1')
- `temperature`: Controls randomness in response generation (default: 0.0)
- `max_tokens`: Maximum number of tokens in the generated response
- `numberOfResults`: Number of chunks to retrieve from the knowledge base (optional, uses AWS default when not specified)
- `accessKeyId`, `secretAccessKey`, `sessionToken`: AWS credentials (if not using environment variables or IAM roles)
- `profile`: AWS profile name for SSO authentication

Here's a complete example to test your Knowledge Base with a few questions:
prompts:
- 'What is the capital of France?'
- 'Tell me about quantum computing.'
providers:
- id: bedrock:kb:us.anthropic.claude-3-7-sonnet-20250219-v1:0
config:
region: 'us-east-2'
knowledgeBaseId: 'YOUR_KNOWLEDGE_BASE_ID'
temperature: 0.0
max_tokens: 1000
numberOfResults: 10
# Regular Claude model for comparison
- id: bedrock:us.anthropic.claude-3-5-sonnet-20241022-v2:0
config:
region: 'us-east-2'
temperature: 0.0
max_tokens: 1000
tests:
- description: 'Basic factual questions from the knowledge base'
The Knowledge Base provider returns both the generated response and citations from the source documents. These citations are included in the evaluation results and can be used to verify the accuracy of the responses.
:::info
When viewing evaluation results in the UI, citations appear in a separate section within the details view of each response. You can click on the source links to visit the original documents or copy citation content for reference.
:::
When using the Knowledge Base provider, the response will include:
- `retrievedReferences`: References to source documents that informed the response
- `generatedResponsePart`: Parts of the response that correspond to specific citations

The Knowledge Base provider supports extracting context from citations for evaluation using the contextTransform feature:
tests:
- vars:
query: 'What is promptfoo?'
assert:
# Extract context from all citations
- type: context-faithfulness
contextTransform: |
if (!metadata?.citations) return '';
return metadata.citations
.flatMap(citation => citation.retrievedReferences || [])
.map(ref => ref.content?.text || '')
.filter(text => text.length > 0)
.join('\n\n');
threshold: 0.7
# Extract context from first citation only
- type: context-relevance
contextTransform: 'metadata?.citations?.[0]?.retrievedReferences?.[0]?.content?.text || ""'
threshold: 0.6
This approach lets you ground context-based assertions, such as context-faithfulness and context-relevance, in the actual passages the Knowledge Base retrieved.
See the Knowledge Base contextTransform example for complete configuration examples.
Amazon Bedrock Agents use the reasoning of foundation models (FMs), APIs, and data to break down user requests, gather relevant information, and efficiently complete tasks, freeing teams to focus on high-value work. For detailed information on testing and evaluating deployed agents, see the AWS Bedrock Agents Provider documentation.
Quick example:
providers:
- id: bedrock-agent:YOUR_AGENT_ID
config:
agentAliasId: PROD_ALIAS
region: us-east-1
enableTrace: true
AWS Bedrock supports video generation through asynchronous invoke APIs. Videos are generated in the cloud and output to an S3 bucket that you specify.
Generate videos using Luma Ray 2, which produces high-quality videos from text prompts or images.
Provider ID: bedrock:video:luma.ray-v2:0
:::note
Luma Ray 2 is currently available in us-west-2 region only.
:::
providers:
- id: bedrock:video:luma.ray-v2:0
config:
region: us-west-2
s3OutputUri: s3://my-bucket/luma-outputs/
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `s3OutputUri` | string | - | Required. S3 bucket for video output |
| `duration` | string | "5s" | Video duration: "5s" or "9s" |
| `resolution` | string | "720p" | Output resolution: "540p" or "720p" |
| `aspectRatio` | string | "16:9" | Aspect ratio (see supported ratios below) |
| `loop` | boolean | false | Whether video should seamlessly loop |
| `startImage` | string | - | Start frame image (`file://` path or base64) |
| `endImage` | string | - | End frame image (`file://` path or base64) |
| `pollIntervalMs` | number | 10000 | Polling interval in milliseconds |
| `maxPollTimeMs` | number | 600000 | Maximum wait time (10 min default) |
| `downloadFromS3` | boolean | true | Download video from S3 after generation |
- `1:1` - Square
- `16:9` - Widescreen (default)
- `9:16` - Vertical/Portrait
- `4:3` - Standard
- `3:4` - Portrait standard
- `21:9` - Ultrawide
- `9:21` - Ultra-tall

providers:
- id: bedrock:video:luma.ray-v2:0
config:
region: us-west-2
s3OutputUri: s3://my-bucket/videos/
duration: '5s'
resolution: '720p'
aspectRatio: '16:9'
prompts:
- 'A majestic eagle soaring through clouds at golden hour'
tests:
- vars: {}
Animate images by providing start and/or end frames:
providers:
- id: bedrock:video:luma.ray-v2:0
config:
region: us-west-2
s3OutputUri: s3://my-bucket/videos/
startImage: file://./start-frame.jpg
endImage: file://./end-frame.jpg
duration: '5s'
prompts:
- 'Smooth transition with camera movement'
tests:
- vars: {}
Your AWS credentials need these IAM permissions:
{
  "Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["bedrock:InvokeModel", "bedrock:GetAsyncInvoke", "bedrock:StartAsyncInvoke"],
"Resource": "arn:aws:bedrock:*:*:model/luma.ray-v2:0"
},
{
"Effect": "Allow",
"Action": ["s3:PutObject", "s3:GetObject"],
"Resource": "arn:aws:s3:::my-bucket/*"
}
]
}