examples/azure-mai/README.md
You can run this example with:
npx promptfoo@latest init --example azure-mai
cd azure-mai
Evaluate Microsoft's first-party MAI models with promptfoo. This example focuses on image generation with MAI-Image-2.5 (a Preview Foundry Model sold by Azure) via the dedicated azure:image provider, and shows how to wire reasoning/chat MAI models via azure:chat — though those have limited availability today (see Notes).
This example requires:
AZURE_API_HOST — your Foundry resource endpoint, e.g. your-resource.services.ai.azure.comAZURE_API_KEY — a resource key (or authenticate with az login for Microsoft Entra ID)# 1. Deploy an MAI image model to a Microsoft Foundry (AIServices) resource
az cognitiveservices account deployment create \
--name <RESOURCE> --resource-group <RG> \
--deployment-name mai-image-2-5 \
--model-name MAI-Image-2.5 --model-format Microsoft \
--model-version 2026-06-02 --sku-name GlobalStandard --sku-capacity 1
# 2. Point promptfoo at the resource
export AZURE_API_HOST=<RESOURCE>.services.ai.azure.com
export AZURE_API_KEY=<key>
# 3. Run the eval (images cost ~$0.03 each, so disable caching for fresh runs)
promptfoo eval --no-cache
# 4. View the generated images
promptfoo view
promptfooconfig.yaml — generates images with MAI-Image-2.5 (Preview) through the Microsoft azure:image provider, reporting per-image token usage and cost from the API's token countspromptfooconfig.vision-judge.yaml — uses a vision LLM as a judge on the generated images (see below)azure:chat (these are deprecated/private-preview today — see Notes)azure:image)providers:
- id: azure:image:mai-image-2-5
config:
model: MAI-Image-2.5 # cost-reporting id (deployment names can't contain dots)
width: 1024 # min 768; width * height <= 1,048,576
height: 1024
The image is returned as a base64 PNG. promptfoo stores large base64 media as a blob reference (promptfoo://blob/...), so assertions should accept either the blob ref or an inline data:image/... URL.
azure:chat)MAI-Thinking-1 and MAI-DS-R1 are reasoning models (auto-detected by name — promptfoo sends max_completion_tokens and drops temperature). They aren't deployable on most subscriptions today (MAI-DS-R1 is deprecated; MAI-Thinking-1 / MAI-Code-1-Flash are private preview and not in the public CLI catalog), so the chat provider in promptfooconfig.yaml is commented out. Uncomment it once your subscription can deploy one:
providers:
- id: azure:chat:mai-thinking-1
config:
max_completion_tokens: 2048
# omitDefaults: true # if the deployment rejects top_p / penalties
promptfooconfig.vision-judge.yaml grades each generated image with a vision-capable LLM. It uses a custom rubricPrompt that passes the image to the grader as an image_url block, so the judge evaluates the actual picture rather than a text description.
Run it with inline media so {{output}} is a base64 data URL the grader can read:
PROMPTFOO_INLINE_MEDIA=true promptfoo eval -c promptfooconfig.vision-judge.yaml --no-cache
Why inline media? With promptfoo's default media handling, an image output is stored as a
promptfoo://blob/...reference, which a hosted grader's API can't fetch.PROMPTFOO_INLINE_MEDIA=truekeeps the output as an inline data URL the vision model can read directly.
The MAI image models are currently Preview. The MAI text models have limited availability: MAI-DS-R1 is deprecated in the Azure catalog, and MAI-Thinking-1 / MAI-Code-1-Flash are in private preview and aren't yet in the public CLI catalog. Run az cognitiveservices model list --location <region> to see what your subscription can actually deploy. See the Azure provider docs for details.