The azure provider enables you to use Azure OpenAI Service models with Promptfoo. It shares configuration settings with the OpenAI provider.
There are three ways to authenticate with Azure OpenAI:
Set the AZURE_API_KEY environment variable and configure your deployment:
providers:
- id: azure:chat:deploymentNameHere
config:
apiHost: 'xxxxxxxx.openai.azure.com'
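For example, set the key in your shell before running the eval (placeholder value):

```sh
export AZURE_API_KEY="your-api-key"
```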
Use an Azure Entra ID (formerly Azure AD) Service Principal instead of an API key. This is the recommended approach for production environments, CI/CD pipelines, and any scenario where you want to avoid managing API keys directly.
You'll need three values from your Service Principal's app registration in the Azure Portal: the application (client) ID, a client secret value, and the directory (tenant) ID.
Set them as environment variables:
export AZURE_CLIENT_ID="your-application-client-id"
export AZURE_CLIENT_SECRET="your-client-secret-value"
export AZURE_TENANT_ID="your-directory-tenant-id"
Or set them in the provider config (see full example below):
- `azureClientId`
- `azureClientSecret`
- `azureTenantId`

Optionally, you can also set:
- `AZURE_AUTHORITY_HOST` / `azureAuthorityHost` (defaults to `https://login.microsoftonline.com`)
- `AZURE_TOKEN_SCOPE` / `azureTokenScope` (defaults to `https://cognitiveservices.azure.com/.default`)

Then configure your deployment:
providers:
- id: azure:chat:deploymentNameHere
config:
apiHost: 'xxxxxxxx.openai.azure.com'
:::tip
The Service Principal must have the Cognitive Services OpenAI User role (or equivalent) assigned on your Azure OpenAI resource. You can assign this in the Azure Portal under your resource's Access control (IAM) blade.
:::
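If you prefer the CLI to the Portal, the equivalent role assignment looks roughly like this (all IDs and names are placeholders):

```sh
az role assignment create \
  --assignee "<service-principal-client-id>" \
  --role "Cognitive Services OpenAI User" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<resource-name>"
```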
Authenticate with the Azure CLI using `az login` before running promptfoo. This is the fallback option: it's used when neither an API key nor client credentials are provided.
Optionally, you can also set:
- `AZURE_TOKEN_SCOPE` / `azureTokenScope` (defaults to `https://cognitiveservices.azure.com/.default`)

Then configure your deployment:
providers:
- id: azure:chat:deploymentNameHere
config:
apiHost: 'xxxxxxxx.openai.azure.com'
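Before running the eval, confirm you have an active CLI session:

```sh
az login
az account show # verify the subscription your session is bound to
```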
The following provider types are available:

- `azure:chat:<deployment name>` - For chat endpoints (e.g., gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5, gpt-4o)
- `azure:completion:<deployment name>` - For completion endpoints (e.g., gpt-35-turbo-instruct)
- `azure:embedding:<deployment name>` - For embedding models (e.g., text-embedding-3-small, text-embedding-3-large)
- `azure:responses:<deployment name>` - For the Responses API (e.g., gpt-4.1, gpt-5.1)
- `azure:assistant:<assistant id>` - For Azure OpenAI Assistants (using the Azure OpenAI API)
- `azure:foundry-agent:<agent name or id>` - For Azure AI Foundry Agents (using the Azure AI Projects SDK)
- `azure:video:<deployment name>` - For video generation (Sora)

Vision-capable GPT-5, GPT-4o, and GPT-4.1 deployments use the standard `azure:chat:` provider type.
Azure deployment availability changes frequently and varies by region. Check the Azure OpenAI model availability page for the current list of supported models and regions before creating new deployments.
Azure provides access to OpenAI models as well as third-party models through Azure AI Foundry (Microsoft Foundry).
| Category | Models |
|---|---|
| GPT-5 Series | gpt-5.4, gpt-5.4-pro, gpt-5.4-mini, gpt-5.4-nano, gpt-5, gpt-5-pro, gpt-5-mini, gpt-5-nano, gpt-5.1, gpt-5.1-chat, gpt-5.1-codex |
| GPT-4.1 Series | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano |
| GPT-4o Series | gpt-4o, gpt-4o-mini, gpt-4o-realtime |
| Reasoning Models | o1, o1-mini, o1-pro, o3, o3-mini, o3-pro, o4-mini |
| Specialized | computer-use-preview, gpt-image-1, codex-mini-latest |
| Deep Research | o3-deep-research, o4-mini-deep-research |
| Embeddings | text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002 |
Azure AI Foundry provides access to models from multiple providers:
| Provider | Models |
|---|---|
| Anthropic Claude | claude-opus-4-7, claude-opus-4-6-20260205, claude-sonnet-4-6, claude-opus-4-5-20251101, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022 — see Using Claude Models for deployment and config details |
| Meta Llama | Llama-4-Scout-17B-16E-Instruct, Llama-4-Maverick-17B-128E-Instruct-FP8, Llama-3.3-70B-Instruct, Meta-Llama-3.1-405B-Instruct, Meta-Llama-3.1-70B-Instruct, Meta-Llama-3.1-8B-Instruct |
| DeepSeek | DeepSeek-R1 (reasoning), DeepSeek-V3, DeepSeek-R1-Distill-Llama-70B, DeepSeek-R1-Distill-Qwen-32B |
| Mistral | Mistral-Large-2411, Pixtral-Large-2411, Ministral-3B-2410, Mistral-Nemo-2407 |
| Cohere | Cohere-command-a-03-2025, command-r-plus-08-2024, command-r-08-2024 |
| Microsoft Phi | Phi-4, Phi-4-mini-instruct, Phi-4-reasoning, Phi-4-mini-reasoning |
| xAI Grok | grok-3, grok-3-mini, grok-3-reasoning, grok-3-mini-reasoning, grok-2-vision-1212 |
| AI21 | AI21-Jamba-1.5-Large, AI21-Jamba-1.5-Mini |
| Core42 | JAIS-70b-chat, Falcon3-7B-Instruct |
For the complete list of 200+ models with pricing, see the Azure model catalog.
The Azure OpenAI Responses API is a stateful API that combines the best capabilities of the Chat Completions and Assistants APIs in a single unified interface. It provides advanced features like MCP servers, code interpreter, and background tasks.
To use the Azure Responses API with promptfoo, use the azure:responses provider type:
providers:
# Using the azure:responses alias (recommended)
# Note: deployment name must match your Azure deployment, not the model name
- id: azure:responses:my-gpt-4-1-deployment
config:
temperature: 0.7
instructions: 'You are a helpful assistant.'
response_format: file://./response-schema.json
# For newer v1 API, use 'preview' (default)
# For legacy API, use specific version like '2025-04-01-preview'
apiVersion: 'preview'
# Or using openai:responses with Azure configuration (legacy method)
- id: openai:responses:gpt-4.1
config:
apiHost: 'your-resource.openai.azure.com'
apiKey: '{{ env.AZURE_API_KEY }}' # or set OPENAI_API_KEY env var
temperature: 0.7
instructions: 'You are a helpful assistant.'
The Responses API supports Azure deployments backed by current Azure OpenAI responses-capable models. Common examples include:
- gpt-5.4, gpt-5.4-pro, gpt-5.4-mini, gpt-5.4-nano, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.1
- gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
- o1, o1-mini, o1-pro, o3, o3-mini, o3-pro, o4-mini
- computer-use-preview, gpt-image-1, codex-mini-latest
- o3-deep-research, o4-mini-deep-research

Use your Azure deployment name in promptfoo, even if it differs from the underlying model ID.
Load complex JSON schemas from external files for better organization:
providers:
- id: openai:responses:gpt-4.1
config:
apiHost: 'your-resource.openai.azure.com'
response_format: file://./schemas/response-schema.json
Example response-schema.json:
{
"type": "json_schema",
"name": "structured_output",
"schema": {
"type": "object",
"properties": {
"result": { "type": "string" },
"confidence": { "type": "number" }
},
"required": ["result", "confidence"],
"additionalProperties": false
}
}
You can also use nested file references for the schema itself:
{
"type": "json_schema",
"name": "structured_output",
"schema": "file://./schemas/output-schema.json"
}
Variable rendering is supported in file paths:
config:
response_format: file://./schemas/{{ schema_name }}.json
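For instance, each test case could pick its own schema file (the schema names below are hypothetical):

```yaml
providers:
  - id: openai:responses:gpt-4.1
    config:
      apiHost: 'your-resource.openai.azure.com'
      response_format: file://./schemas/{{ schema_name }}.json

tests:
  - vars:
      schema_name: summary # resolves to ./schemas/summary.json
  - vars:
      schema_name: detailed # resolves to ./schemas/detailed.json
```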
Instructions: Provide system-level instructions to guide model behavior:
config:
instructions: 'You are a helpful assistant specializing in technical documentation.'
Background Tasks: Enable asynchronous processing for long-running tasks:
config:
background: true
store: true
Chaining Responses: Chain multiple responses together for multi-turn conversations:
config:
previous_response_id: '{{previous_id}}'
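For example, a test could supply an ID captured from an earlier response (the `resp_abc123` value below is a placeholder):

```yaml
tests:
  - vars:
      previous_id: resp_abc123 # placeholder ID from a prior response
```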
MCP Servers: Connect to remote MCP servers for extended tool capabilities:
config:
tools:
- type: mcp
server_label: github
server_url: https://example.com/mcp-server
require_approval: never
headers:
Authorization: 'Bearer {{ env.MCP_API_KEY }}'
Code Interpreter: Enable code execution capabilities:
config:
tools:
- type: code_interpreter
container:
type: auto
Web Search: Enable web search capabilities:
config:
tools:
- type: web_search_preview
Image Generation: Use image generation with supported models:
config:
tools:
- type: image_generation
partial_images: 2 # For streaming partial images
Here's a comprehensive example using multiple Azure Responses API features:
# promptfooconfig.yaml
description: Azure Responses API evaluation
providers:
# Using the new azure:responses alias (recommended)
- id: azure:responses:gpt-4.1-deployment
label: azure-gpt-4.1
config:
temperature: 0.7
max_output_tokens: 2000
instructions: 'You are a helpful AI assistant.'
response_format: file://./response-format.json
tools:
- type: code_interpreter
container:
type: auto
- type: web_search_preview
metadata:
session: 'eval-001'
user: 'test-user'
store: true
# Reasoning model example
- id: azure:responses:o3-mini-deployment
label: azure-reasoning
config:
reasoning_effort: medium
max_completion_tokens: 4000
prompts:
- 'Analyze this data and provide insights: {{data}}'
- 'Write a Python function to solve: {{problem}}'
tests:
- vars:
data: 'Sales increased by 25% in Q3 compared to Q2'
assert:
- type: contains
value: 'growth'
- type: contains
value: '25%'
- vars:
problem: 'Calculate fibonacci sequence up to n terms'
assert:
- type: javascript
value: 'output.includes("def fibonacci") || output.includes("function fibonacci")'
- type: contains
value: 'recursive'
Streaming: Enable streaming for real-time output:
config:
stream: true
Parallel Tool Calls: Allow multiple tool calls in parallel:
config:
parallel_tool_calls: true
max_tool_calls: 5
Truncation: Configure how input is truncated when it exceeds limits:
config:
truncation: auto # or 'disabled'
Webhook URL: Set a webhook for async notifications:
config:
webhook_url: 'https://your-webhook.com/callback'
Known limitations:

- `purpose: user_data` requires a workaround (use `purpose: assistants`)
- `store: true`

The Azure OpenAI provider supports the following environment variables:
| Environment Variable | Config Key | Description | Required |
|---|---|---|---|
| `AZURE_API_KEY` | `apiKey` | Your Azure OpenAI API key | No\* |
| `AZURE_API_HOST` | `apiHost` | API host | No |
| `AZURE_API_BASE_URL` | `apiBaseUrl` | API base URL | No |
| `AZURE_BASE_URL` | `apiBaseUrl` | Alternative API base URL | No |
| `AZURE_DEPLOYMENT_NAME` | - | Default deployment name | Yes |
| `AZURE_CLIENT_ID` | `azureClientId` | Azure AD application client ID | No\* |
| `AZURE_CLIENT_SECRET` | `azureClientSecret` | Azure AD application client secret | No\* |
| `AZURE_TENANT_ID` | `azureTenantId` | Azure AD tenant ID | No\* |
| `AZURE_AUTHORITY_HOST` | `azureAuthorityHost` | Azure AD authority host | No |
| `AZURE_TOKEN_SCOPE` | `azureTokenScope` | Azure AD token scope | No |

\* Either `AZURE_API_KEY` or the combination of `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, and `AZURE_TENANT_ID` must be provided.
Note: For API URLs, you only need to set one of AZURE_API_HOST, AZURE_API_BASE_URL, or AZURE_BASE_URL. If multiple are set, the provider will use them in that order of preference.
If AZURE_DEPLOYMENT_NAME is set, it will be automatically used as the default deployment when no other provider is configured. This makes Azure OpenAI the default provider when:
- No OpenAI provider is configured (`OPENAI_API_KEY` is not set)
- `AZURE_DEPLOYMENT_NAME` is set

For example, if you have these environment variables set:
AZURE_DEPLOYMENT_NAME=gpt-4o
AZURE_API_KEY=your-api-key
AZURE_API_HOST=your-host.openai.azure.com
Or these client credential environment variables:
AZURE_DEPLOYMENT_NAME=gpt-4o
AZURE_CLIENT_ID=your-client-id
AZURE_CLIENT_SECRET=your-client-secret
AZURE_TENANT_ID=your-tenant-id
AZURE_API_HOST=your-host.openai.azure.com
Then Azure OpenAI will be used as the default provider for all operations, including model-graded assertions.
Embedding models are distinct from text-generation models, so to set a default embedding provider you must specify `AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME`.
Set this environment variable to the deployment name of your embedding model:
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME=text-embedding-3-small
This deployment will automatically be used whenever embeddings are required, such as for similarity comparisons or dataset generation. You can also override the embedding provider in your configuration:
defaultTest:
options:
provider:
embedding:
id: azure:embedding:text-embedding-3-small-deployment
config:
apiHost: 'your-resource.openai.azure.com'
Note that any moderation tasks will still use the OpenAI API.
The YAML configuration can override environment variables and set additional parameters:
providers:
- id: azure:chat:deploymentNameHere
config:
apiHost: 'xxxxxxxx.openai.azure.com'
# Authentication (Option 1: API Key)
apiKey: 'your-api-key'
# Authentication (Option 2: Client Credentials)
azureClientId: 'your-azure-client-id'
azureClientSecret: 'your-azure-client-secret'
azureTenantId: 'your-azure-tenant-id'
azureAuthorityHost: 'https://login.microsoftonline.com' # Optional
azureTokenScope: 'https://cognitiveservices.azure.com/.default' # Optional
# OpenAI parameters
temperature: 0.5
max_tokens: 1024
:::tip
All other OpenAI provider environment variables and configuration properties are supported.
:::
If you want to authenticate with a Service Principal (SPN) instead of an API key, follow these steps.
1. Assign the Cognitive Services OpenAI User role (or Cognitive Services Contributor) to the Service Principal on your Azure OpenAI resource. Go to your resource's Access control (IAM) > Add role assignment.
2. Install the `@azure/identity` package — promptfoo uses it to obtain tokens from Azure Entra ID:

```sh
npm install @azure/identity
```
You can provide the Service Principal credentials via environment variables or directly in the YAML config.
Using environment variables (recommended for CI/CD and production):
export AZURE_CLIENT_ID="00000000-0000-0000-0000-000000000000" # Application (client) ID
export AZURE_CLIENT_SECRET="your-client-secret-value" # Client secret
export AZURE_TENANT_ID="00000000-0000-0000-0000-000000000000" # Directory (tenant) ID
providers:
- id: azure:chat:my-gpt-4o-deployment
config:
apiHost: 'your-resource.openai.azure.com'
Using inline config (useful for local testing):
providers:
- id: azure:chat:my-gpt-4o-deployment
config:
apiHost: 'your-resource.openai.azure.com'
azureClientId: '00000000-0000-0000-0000-000000000000'
azureClientSecret: 'your-client-secret-value'
azureTenantId: '00000000-0000-0000-0000-000000000000'
azureAuthorityHost: 'https://login.microsoftonline.com' # Optional
azureTokenScope: 'https://cognitiveservices.azure.com/.default' # Optional
When client credentials are provided, promptfoo uses the @azure/identity library to create a ClientSecretCredential and requests an access token scoped to Azure Cognitive Services (https://cognitiveservices.azure.com/.default). The token is then sent as a Bearer token in the Authorization header instead of an API key.
If neither an API key nor client credentials are provided, promptfoo falls back to AzureCliCredential (i.e., your az login session) — see Option 3.
The azureAuthorityHost defaults to https://login.microsoftonline.com if not specified. The azureTokenScope defaults to https://cognitiveservices.azure.com/.default, the scope required to authenticate with Azure Cognitive Services. You typically don't need to change these unless you're working with a sovereign cloud (e.g., Azure Government or Azure China).
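For reference, the token flow is roughly equivalent to this sketch with `@azure/identity` (defaults shown explicitly; this is not promptfoo's exact code):

```javascript
import { ClientSecretCredential } from '@azure/identity';

const credential = new ClientSecretCredential(
  process.env.AZURE_TENANT_ID, // azureTenantId
  process.env.AZURE_CLIENT_ID, // azureClientId
  process.env.AZURE_CLIENT_SECRET, // azureClientSecret
  { authorityHost: 'https://login.microsoftonline.com' }, // azureAuthorityHost
);

// Request a token for the Cognitive Services scope (azureTokenScope)
const { token } = await credential.getToken('https://cognitiveservices.azure.com/.default');

// promptfoo then sends: Authorization: Bearer <token>
```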
Model-graded assertions such as factuality or llm-rubric use gpt-5 by default. When AZURE_DEPLOYMENT_NAME is set (and OPENAI_API_KEY is not), promptfoo automatically uses the specified Azure deployment for grading. You can also explicitly override the grader as shown below.
The easiest way to do this for all your test cases is to add the defaultTest property to your config:
defaultTest:
options:
provider:
id: azure:chat:gpt-4o-deployment
config:
apiHost: 'xxxxxxx.openai.azure.com'
However, you can also do this for individual assertions:
# ...
assert:
- type: llm-rubric
value: Do not mention that you are an AI or chat assistant
provider:
id: azure:chat:xxxx
config:
apiHost: 'xxxxxxx.openai.azure.com'
Or individual tests:
# ...
tests:
- vars:
# ...
options:
provider:
id: azure:chat:xxxx
config:
apiHost: 'xxxxxxx.openai.azure.com'
assert:
- type: llm-rubric
value: Do not mention that you are an AI or chat assistant
When you have tests that use both text-based assertions (like llm-rubric, answer-relevance) and embedding-based assertions (like similar), you can configure different Azure deployments for each type using the provider type map pattern:
defaultTest:
options:
provider:
# Text provider for llm-rubric, answer-relevance, factuality, etc.
text:
id: azure:chat:o4-mini-deployment
config:
apiHost: 'text-models.openai.azure.com'
# Embedding provider for similarity assertions
embedding:
id: azure:embedding:text-embedding-3-large
config:
apiHost: 'embedding-models.openai.azure.com'
The similar assertion type requires an embedding model such as text-embedding-3-large or text-embedding-3-small. Be sure to specify a deployment with an embedding model, not a chat model, when overriding the grader.
For example, override the embedding deployment in your config:
defaultTest:
options:
provider:
embedding:
id: azure:embedding:text-embedding-3-small-deployment
config:
apiHost: 'your-resource.openai.azure.com'
You may also specify data_sources to integrate with the Azure AI Search API.
providers:
- id: azure:chat:deploymentNameHere
config:
apiHost: 'xxxxxxxx.openai.azure.com'
deployment_id: 'abc123'
data_sources:
- type: azure_search
parameters:
endpoint: https://xxxxxxxx.search.windows.net
index_name: index123
authentication:
type: api_key
key: ''
:::note
For legacy Azure OpenAI API versions before 2024-02-15-preview, you can instead specify `deployment_id` and `dataSources` (note the camelCase) to integrate with the Azure AI Search API:
providers:
- id: azure:chat:deploymentNameHere
config:
apiHost: 'xxxxxxxx.openai.azure.com'
deployment_id: 'abc123'
dataSources:
- type: AzureCognitiveSearch
parameters:
endpoint: '...'
key: '...'
indexName: '...'
:::
These properties can be set under the provider config key:
| Name | Description |
|---|---|
| apiHost | API host (e.g., yourresource.openai.azure.com) |
| apiBaseUrl | Base URL of the API (used instead of host) |
| apiKey | API key for authentication |
| apiVersion | API version. Use 2024-10-21 or newer for vision support |
| Name | Description |
|---|---|
| azureClientId | Azure identity client ID |
| azureClientSecret | Azure identity client secret |
| azureTenantId | Azure identity tenant ID |
| azureAuthorityHost | Azure identity authority host |
| azureTokenScope | Azure identity token scope |
| deployment_id | Azure cognitive services deployment ID |
| dataSources | Azure cognitive services parameter for specifying data sources |
| Name | Description |
|---|---|
| o1 | Set to true if your Azure deployment uses an o1 model. (Deprecated, use isReasoningModel instead) |
| isReasoningModel | Set to true if your Azure deployment uses a reasoning model (o1, o3, o3-mini, o4-mini). Required for reasoning models |
| max_completion_tokens | Maximum tokens to generate for reasoning models. Only used when isReasoningModel is true |
| reasoning_effort | Controls reasoning depth: 'low', 'medium', or 'high'. Only used when isReasoningModel is true |
| temperature | Controls randomness (0-2). Not supported for reasoning models |
| max_tokens | Maximum tokens to generate. Not supported for reasoning models |
| top_p | Controls nucleus sampling (0-1) |
| frequency_penalty | Penalizes repeated tokens (-2 to 2) |
| presence_penalty | Penalizes new tokens based on presence (-2 to 2) |
| omitDefaults | Omits hardcoded defaults unless values are explicitly set via config or environment variables. Supported by azure:chat and azure:responses. |
| best_of | Generates multiple outputs and returns the best |
| functions | Array of functions available for the model to call |
| function_call | Controls how the model calls functions |
| response_format | Specifies output format (e.g., { type: "json_object" }) |
| stop | Array of sequences where the model will stop generating |
| passthrough | Additional parameters to send with the request |
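For example, `passthrough` can forward request fields that promptfoo doesn't model directly (the `logit_bias` value below is purely illustrative):

```yaml
providers:
  - id: azure:chat:deploymentNameHere
    config:
      apiHost: 'xxxxxxxx.openai.azure.com'
      passthrough:
        logit_bias: { '50256': -100 } # illustrative extra request field
```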
Azure OpenAI supports reasoning models like o1, o3, o3-mini, and o4-mini. These models behave differently from standard models and have specific requirements:

- Use `max_completion_tokens` instead of `max_tokens`
- Don't set `temperature` (it's ignored)
- Optionally set the `reasoning_effort` parameter (`'low'`, `'medium'`, or `'high'`)

Since Azure allows custom deployment names that don't necessarily reflect the underlying model type, you must explicitly set the `isReasoningModel` flag to `true` in your configuration when using reasoning models. This works with both chat and completion endpoints:
# For chat endpoints
providers:
- id: azure:chat:my-o4-mini-deployment
config:
apiHost: 'xxxxxxxx.openai.azure.com'
# Set this flag to true for reasoning models (o1, o3, o3-mini, o4-mini)
isReasoningModel: true
# Use max_completion_tokens instead of max_tokens
max_completion_tokens: 25000
# Optional: Set reasoning effort (default is 'medium' unless omitDefaults is true)
reasoning_effort: 'medium'
# For completion endpoints
providers:
- id: azure:completion:my-o3-deployment
config:
apiHost: 'xxxxxxxx.openai.azure.com'
isReasoningModel: true
max_completion_tokens: 25000
reasoning_effort: 'high'
> Note: The `o1` flag is still supported for backward compatibility, but `isReasoningModel` is preferred as it more clearly indicates its purpose.
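For reference, a config still using the deprecated flag might look like this (equivalent to setting `isReasoningModel: true`):

```yaml
providers:
  - id: azure:chat:my-o1-deployment
    config:
      apiHost: 'xxxxxxxx.openai.azure.com'
      o1: true # deprecated; prefer isReasoningModel: true
      max_completion_tokens: 25000
```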
You can use variables in your configuration to dynamically adjust the reasoning effort based on your test cases:
# Configure different reasoning efforts based on test variables
prompts:
- 'Solve this complex math problem: {{problem}}'
providers:
- id: azure:chat:my-o4-mini-deployment
config:
apiHost: 'xxxxxxxx.openai.azure.com'
isReasoningModel: true
max_completion_tokens: 25000
# This will be populated from the test case variables
reasoning_effort: '{{effort_level}}'
tests:
- vars:
problem: 'What is the integral of x²?'
effort_level: 'low'
- vars:
problem: 'Prove the Riemann hypothesis'
effort_level: 'high'
If you encounter this error when using reasoning models:
API response error: unsupported_parameter Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.
This means you're using a reasoning model without setting the isReasoningModel flag. Update your config as shown above.
Azure OpenAI supports vision-capable models like GPT-5.1, GPT-4o, and GPT-4.1 for image analysis.
providers:
- id: azure:chat:gpt-4o
config:
apiHost: 'your-resource-name.openai.azure.com'
apiVersion: '2024-10-21' # or newer for vision support
Vision models require a specific message format. Images can be provided as:
- HTTPS URLs
- Local `file://` paths (automatically converted to base64)
- Base64 data URLs: `data:image/jpeg;base64,YOUR_DATA`

prompts:
- |
[
{
"role": "user",
"content": [
{
"type": "text",
"text": "What do you see in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "{{image_url}}"
}
}
]
}
]
tests:
- vars:
image_url: https://example.com/image.jpg # URL
- vars:
image_url: file://assets/image.jpg # Local file (auto base64)
- vars:
image_url: data:image/jpeg;base64,/9j/4A... # Base64
See the Azure OpenAI example for a complete working example with image analysis. Use promptfooconfig.vision.yaml for vision-specific features.
Azure AI Foundry exposes Claude through two endpoint families. Pick the one that matches how you want to manage the model.
Per Anthropic's own Foundry integration, every Claude deployment publishes a native Messages endpoint at https://<resource>.services.ai.azure.com/anthropic/v1/messages. Point promptfoo's anthropic:messages provider at that base URL and you get the full Anthropic provider feature set — adaptive thinking, xhigh effort, automatic Opus 4.7 temperature suppression, and consistent pricing across Anthropic/Bedrock/Vertex:
providers:
- id: anthropic:messages:claude-opus-4-7
config:
apiBaseUrl: 'https://<resource>.services.ai.azure.com/anthropic'
apiKey: '{{env.AZURE_FOUNDRY_API_KEY}}'
max_tokens: 1024
Promptfoo appends /v1/messages to the base URL automatically, so set apiBaseUrl to the https://…/anthropic prefix shown above.
The same deployment also accepts OpenAI-style chat completion requests. Use this if you want a single provider type across Azure Claude and Azure OpenAI deployments:
providers:
- id: azure:chat:claude-opus-4-7
config:
apiHost: 'your-deployment.services.ai.azure.com'
apiVersion: '2025-04-01-preview'
max_tokens: 4096
Opus 4.7 deployments automatically omit temperature from the request body on this path too.
Available Claude deployments on Azure AI Foundry:
| Model | Description |
|---|---|
| `claude-opus-4-7` | Claude Opus 4.7 |
| `claude-opus-4-6-20260205` | Claude Opus 4.6 |
| `claude-sonnet-4-6` | Claude Sonnet 4.6 |
| `claude-opus-4-5-20251101` | Claude Opus 4.5 |
| `claude-sonnet-4-5-20250929` | Claude Sonnet 4.5 |
| `claude-haiku-4-5-20251001` | Claude Haiku 4.5 |
| `claude-3-5-sonnet-20241022` | Claude 3.5 Sonnet |
| `claude-3-5-haiku-20241022` | Claude 3.5 Haiku |
Here's a complete example:

description: Azure Claude evaluation
providers:
- id: anthropic:messages:claude-opus-4-7
label: claude-opus-4-7
config:
apiBaseUrl: 'https://<resource>.services.ai.azure.com/anthropic'
apiKey: '{{env.AZURE_FOUNDRY_API_KEY}}'
max_tokens: 1024
effort: xhigh
prompts:
- 'Explain {{concept}} in simple terms.'
tests:
- vars:
concept: quantum computing
assert:
- type: contains-any
value: ['qubit', 'superposition']
Azure AI Foundry provides access to Meta's Llama models, including Llama 4:
providers:
- id: azure:chat:Llama-4-Maverick-17B-128E-Instruct-FP8
config:
apiHost: 'your-deployment.services.ai.azure.com'
apiVersion: '2025-04-01-preview'
max_tokens: 4096
Available Llama models include:
Llama-4-Maverick-17B-128E-Instruct-FP8 - Llama 4 Maverick (128 experts)Llama-4-Scout-17B-16E-Instruct - Llama 4 Scout (16 experts)Llama-3.3-70B-Instruct - Llama 3.3 70BMeta-Llama-3.1-405B-Instruct - Llama 3.1 405BMeta-Llama-3.1-70B-Instruct - Llama 3.1 70BMeta-Llama-3.1-8B-Instruct - Llama 3.1 8BAzure AI supports DeepSeek models such as DeepSeek-R1. Like other reasoning models, these require specific configuration:
- Set `isReasoningModel: true`
- Use `max_completion_tokens` instead of `max_tokens`

providers:
- id: azure:chat:DeepSeek-R1
config:
apiHost: 'your-deployment-name.services.ai.azure.com'
apiVersion: '2025-04-01-preview'
isReasoningModel: true
max_completion_tokens: 2048
reasoning_effort: 'medium' # Options: low, medium, high
For model-graded assertions, you can configure your defaultTest to use the same provider:
defaultTest:
options:
provider:
id: azure:chat:DeepSeek-R1
config:
apiHost: 'your-deployment-name.services.ai.azure.com'
apiVersion: '2025-04-01-preview'
isReasoningModel: true
max_completion_tokens: 2048
Adjust reasoning_effort to control response quality vs. speed: low for faster responses, medium for balanced performance (default), or high for more thorough reasoning on complex tasks.
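One way to measure that trade-off is to run the same deployment at two effort levels side by side and compare results:

```yaml
providers:
  - id: azure:chat:DeepSeek-R1
    label: deepseek-low-effort
    config:
      apiHost: 'your-deployment-name.services.ai.azure.com'
      apiVersion: '2025-04-01-preview'
      isReasoningModel: true
      max_completion_tokens: 2048
      reasoning_effort: 'low'
  - id: azure:chat:DeepSeek-R1
    label: deepseek-high-effort
    config:
      apiHost: 'your-deployment-name.services.ai.azure.com'
      apiVersion: '2025-04-01-preview'
      isReasoningModel: true
      max_completion_tokens: 2048
      reasoning_effort: 'high'
```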
To evaluate an OpenAI assistant on Azure:
First, install the `@azure/openai-assistants` package:

```sh
npm i @azure/openai-assistants
```

Then point the provider at your assistant:
providers:
- id: azure:assistant:asst_E4GyOBYKlnAzMi19SZF2Sn8I
config:
apiHost: yourdeploymentname.openai.azure.com
Replace the assistant ID and deployment name with your actual values.
Azure OpenAI Assistants support tool calling. Define tool schemas via tools and provide callback implementations via functionToolCallbacks to handle invocations.
providers:
- id: azure:assistant:your_assistant_id
config:
apiHost: your-resource-name.openai.azure.com
# Load function tool definition
tools: file://tools/weather-function.json
# Define function callback inline
functionToolCallbacks:
# Use an external file
get_weather: file://callbacks/weather.js:getWeather
# Or use an inline function
get_weather: |
async function(args) {
try {
const parsedArgs = JSON.parse(args);
const location = parsedArgs.location;
const unit = parsedArgs.unit || 'celsius';
// Function implementation...
return JSON.stringify({
location,
temperature: 22,
unit,
condition: 'sunny'
});
} catch (error) {
return JSON.stringify({ error: String(error) });
}
}
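The external file referenced above might look like this sketch (a hypothetical `callbacks/weather.js`; the `file://...:getWeather` syntax expects a named export):

```javascript
// callbacks/weather.js (hypothetical)
async function getWeather(args) {
  try {
    const { location, unit = 'celsius' } = JSON.parse(args);
    // Replace this stub with a real weather lookup
    return JSON.stringify({ location, temperature: 22, unit, condition: 'sunny' });
  } catch (error) {
    return JSON.stringify({ error: String(error) });
  }
}

module.exports = { getWeather };
```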
Azure OpenAI Assistants support vector stores for enhanced file search capabilities. To use a vector store:
providers:
- id: azure:assistant:your_assistant_id
config:
apiHost: your-resource-name.openai.azure.com
# Add tools for file search
tools:
- type: file_search
# Configure vector store IDs
tool_resources:
file_search:
vector_store_ids:
- 'your_vector_store_id'
# Optional parameters
temperature: 1
top_p: 1
apiVersion: '2025-04-01-preview'
Key requirements:
- Add a tool with `type: file_search`
- Provide the `tool_resources.file_search.vector_store_ids` array with your vector store IDs
- Set a recent `apiVersion` (recommended: 2025-04-01-preview or later)

Here's an example of a simple full assistant eval:
prompts:
- 'Write a tweet about {{topic}}'
providers:
- id: azure:assistant:your_assistant_id
config:
apiHost: your-resource-name.openai.azure.com
tests:
- vars:
topic: bananas
For complete working examples of Azure OpenAI Assistants with various tool configurations, check out the Azure Assistant example directory.
See the guide on How to evaluate OpenAI assistants for more information on how to compare different models, instructions, and more.
Azure AI Foundry Agents let promptfoo run an existing Foundry agent through the Azure AI Projects SDK (@azure/ai-projects) and the v2 agent runtime. Promptfoo resolves the agent from your Azure AI Foundry project, then calls the Responses API with an agent_reference.
| Feature | Azure Assistant | Azure Foundry Agent |
|---|---|---|
| API Type | Direct HTTP calls to Azure OpenAI API | Azure AI Projects SDK (@azure/ai-projects) + Responses API agent runtime |
| Authentication | API key or Azure credentials | DefaultAzureCredential (Azure CLI, environment variables, managed identity) |
| Endpoint | Azure OpenAI endpoint (*.openai.azure.com) | Azure AI Project URL (*.services.ai.azure.com/api/projects/*) |
| Provider Format | azure:assistant:<assistant_id> | azure:foundry-agent:<agent-name-or-id> |
| Execution Model | Threads/messages/runs | responses.create(..., { body: { agent: { name, type: "agent_reference" } } }) |
Install the required packages:

```sh
npm install @azure/ai-projects @azure/identity
```
Authenticate using one of these methods:
- Azure CLI: run `az login`
- Service principal environment variables (`AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID`)
- Managed identity

Set your Azure AI Project URL:
export AZURE_AI_PROJECT_URL="https://your-project.services.ai.azure.com/api/projects/your-project-id"
Alternatively, you can provide the projectUrl in your configuration file.
The provider uses the azure:foundry-agent:<agent-name-or-id> format. Agent names are preferred. Legacy IDs still work as a fallback lookup if the agent exists in the project.
providers:
- id: azure:foundry-agent:my-foundry-agent
config:
projectUrl: 'https://your-project.services.ai.azure.com/api/projects/your-project-id'
temperature: 0.7
max_tokens: 150
instructions: 'You are a helpful assistant that provides clear and concise answers.'
This provider references an existing Foundry agent. Some settings can still be sent per request through the Responses API, while agent-definition settings need to be configured on the agent itself.
Supported per-request settings:
| Parameter | Description |
|---|---|
| `projectUrl` | Azure AI Project URL (required; can also use the `AZURE_AI_PROJECT_URL` env var) |
| `instructions` | Additional per-request instructions |
| `temperature` | Controls randomness |
| `top_p` | Nucleus sampling parameter |
| `max_tokens` | Mapped to `max_output_tokens` for the Responses API |
| `max_completion_tokens` | Also mapped to `max_output_tokens` |
| `response_format` | Output format (`json_object` or `json_schema`) |
| `tools` | Tool definitions loaded into the request |
| `tool_choice` | Tool selection strategy |
| `functionToolCallbacks` | Callback implementations for `function_call` outputs |
| `modelName` | Optional per-request model override |
| `reasoning_effort` | Sent as `reasoning.effort` |
| `verbosity` | Passed through to the Responses text config |
| `metadata` | Request metadata |
| `passthrough` | Additional raw Responses API fields |
| `maxPollTimeMs` | Maximum time to keep resolving callback loops before timing out (default: 300000) |
Ignored per-request settings:
- `tool_resources`
- `frequency_penalty`
- `presence_penalty`
- `seed`
- `stop`
- `timeoutMs`
- `retryOptions`

Configure those on the Foundry agent definition itself instead of on the eval request.
Promptfoo can handle Responses API function_call outputs for Foundry agents. If every requested function has a configured callback, promptfoo executes the callbacks locally and sends function_call_output items back with previous_response_id.
You can define callbacks at the provider level or override them per prompt:
providers:
- id: azure:foundry-agent:my-foundry-agent
config:
projectUrl: 'https://your-project.services.ai.azure.com/api/projects/your-project-id'
tools: file://tools/weather-function.json
functionToolCallbacks:
get_current_weather: file://callbacks/weather.js:getCurrentWeather
get_forecast: |
async function(args) {
try {
const parsedArgs = JSON.parse(args);
const location = parsedArgs.location;
const days = parsedArgs.days || 7;
// Your implementation here
return JSON.stringify({
location,
forecast: [
{ day: 'Monday', temperature: 72, condition: 'sunny' },
{ day: 'Tuesday', temperature: 68, condition: 'cloudy' }
]
});
} catch (error) {
return JSON.stringify({ error: String(error) });
}
}
The function callbacks receive two parameters:
- `args`: JSON-encoded function arguments
- `context`: `{ threadId, runId, assistantId, provider }`

If a callback is missing, promptfoo returns the unresolved function call in the model output instead of trying to fake a tool result.
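A callback that uses both parameters might look like this sketch (illustrative only):

```javascript
// Hypothetical callback: args is a JSON string, context describes the run
async function getForecast(args, context) {
  const { location, days = 7 } = JSON.parse(args);
  console.log(`Resolving tool call for run ${context.runId} via ${context.provider}`);
  return JSON.stringify({ location, days, forecast: 'sunny' });
}

module.exports = { getForecast };
```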
Foundry agent tools such as file search and vector stores should be configured on the agent in Azure AI Foundry. The v2 runtime does not apply tool_resources from the eval request.
providers:
- id: azure:foundry-agent:my-foundry-agent
config:
projectUrl: 'https://your-project.services.ai.azure.com/api/projects/your-project-id'
tools:
- type: file_search
temperature: 1
top_p: 1
In that example, the request tells the runtime that file search is available, but the actual vector store bindings still need to live on the Foundry agent definition.
| Variable | Description |
|---|---|
| `AZURE_AI_PROJECT_URL` | Your Azure AI Project URL (can be overridden in config) |
| `AZURE_CLIENT_ID` | Azure service principal client ID (for service principal auth) |
| `AZURE_CLIENT_SECRET` | Azure service principal secret (for service principal auth) |
| `AZURE_TENANT_ID` | Azure tenant ID (for service principal auth) |
Here's a complete example configuration:
description: 'Azure Foundry Agent evaluation'
providers:
- id: azure:foundry-agent:my-foundry-agent
config:
projectUrl: 'https://my-project.services.ai.azure.com/api/projects/my-project-id'
temperature: 0.7
max_tokens: 150
instructions: 'You are a helpful assistant that provides clear and concise answers.'
prompts:
- '{{question}}'
tests:
- vars:
question: 'What is the capital of France?'
assert:
- type: contains
value: 'Paris'
- vars:
question: 'Explain what photosynthesis is in simple terms.'
assert:
- type: contains
value: 'plants'
- type: contains
value: 'sunlight'
The Azure Foundry Agent provider includes error handling for common failure modes; long-running callback loops time out after `maxPollTimeMs` (default: 300000 ms).

The provider supports caching to improve performance and reduce API calls. Caching is enabled by default. To configure it explicitly:
evaluateOptions:
cache: true
providers:
- id: azure:foundry-agent:my-foundry-agent
config:
projectUrl: 'https://your-project.services.ai.azure.com/api/projects/your-project-id'
Use Azure Foundry Agents when:

- Your agent lives in an Azure AI Foundry project and runs through the Azure AI Projects SDK
- You authenticate with Azure credentials (`DefaultAzureCredential`) rather than an API key

Use standard Azure Assistants when:

- Your assistant is hosted on an Azure OpenAI endpoint (`*.openai.azure.com`)
- You want direct Azure OpenAI API calls with API key authentication
For complete working examples, check out the Azure Foundry Agent example directory.
Azure AI Foundry provides access to OpenAI's Sora video generation model for text-to-video and image-to-video generation.
You'll need a Sora deployment in a supported region (e.g., `eastus2` or `swedencentral`).

providers:
- id: azure:video:sora
config:
apiBaseUrl: https://your-resource.cognitiveservices.azure.com
# Authentication (choose one):
apiKey: ${AZURE_API_KEY} # Or use AZURE_API_KEY env var
# Or use Entra ID (DefaultAzureCredential)
# Video parameters
width: 1280 # 480, 720, 854, 1080, 1280, 1920
height: 720 # 480, 720, 1080
n_seconds: 5 # 5, 10, 15, 20
# Polling
poll_interval_ms: 10000
max_poll_time_ms: 600000
| Size | Aspect Ratio |
|---|---|
| 480x480 | 1:1 (Square) |
| 720x720 | 1:1 (Square) |
| 1080x1080 | 1:1 (Square) |
| 854x480 | 16:9 (Landscape) |
| 1280x720 | 16:9 (Landscape) |
| 1920x1080 | 16:9 (Landscape) |
providers:
- azure:video:sora
prompts:
- 'A serene Japanese garden with koi fish swimming in a pond'
tests:
- vars: {}
assert:
- type: is-video
| Variable | Description |
|---|---|
| `AZURE_API_KEY` | Azure API key |
| `AZURE_API_BASE_URL` | Resource endpoint URL |
| `AZURE_CLIENT_ID` | Entra ID client ID (for service principal auth) |
| `AZURE_CLIENT_SECRET` | Entra ID client secret (for service principal auth) |
| `AZURE_TENANT_ID` | Entra ID tenant ID (for service principal auth) |