This guide demonstrates how to set up promptfoo with Azure Pipelines to run evaluations as part of your CI pipeline.
Create a new file named `azure-pipelines.yml` in the root of your repository with the following configuration:
```yaml
trigger:
  - main
  - master # Include if you use master as your main branch

pool:
  vmImage: 'ubuntu-latest'

variables:
  npm_config_cache: $(Pipeline.Workspace)/.npm

steps:
  - task: NodeTool@0
    inputs:
      versionSpec: '20.x'
    displayName: 'Install Node.js'

  - task: Cache@2
    inputs:
      key: 'npm | "$(Agent.OS)" | package-lock.json'
      restoreKeys: |
        npm | "$(Agent.OS)"
      path: $(npm_config_cache)
    displayName: 'Cache npm packages'

  - script: |
      npm ci
      npm install -g promptfoo
    displayName: 'Install dependencies'

  - script: |
      npx promptfoo eval
    displayName: 'Run promptfoo evaluations'
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      ANTHROPIC_API_KEY: $(ANTHROPIC_API_KEY)
      # Add other API keys as needed

  - task: PublishTestResults@2
    inputs:
      testResultsFormat: 'JUnit'
      testResultsFiles: 'promptfoo-results.xml'
      mergeTestResults: true
      testRunTitle: 'Promptfoo Evaluation Results'
    condition: succeededOrFailed()
    displayName: 'Publish test results'

  - task: PublishBuildArtifacts@1
    inputs:
      pathtoPublish: 'promptfoo-results.json'
      artifactName: 'promptfoo-results'
    condition: succeededOrFailed()
    displayName: 'Publish evaluation results'
```
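As an alternative to setting each key individually, API keys can be kept in an Azure DevOps variable group and referenced from the pipeline YAML (a sketch; `llm-api-keys` is a hypothetical group name, and the list form requires `name`/`value` pairs for any other variables):

```yaml
variables:
  - group: llm-api-keys # contains OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.
  - name: npm_config_cache
    value: $(Pipeline.Workspace)/.npm
```

Secret variables are not exposed to scripts automatically, so the `env:` mappings shown above are still required.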
Store your LLM provider API keys (for example, `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`) as secret pipeline variables in Azure DevOps.

You can configure the pipeline to fail when promptfoo assertions don't pass by modifying the script step:
```yaml
- script: |
    npx promptfoo eval --fail-on-error
  displayName: 'Run promptfoo evaluations'
  env:
    OPENAI_API_KEY: $(OPENAI_API_KEY)
```
If you want to customize where results are stored:
```yaml
- script: |
    npx promptfoo eval --output-path $(Build.ArtifactStagingDirectory)/promptfoo-results.json
  displayName: 'Run promptfoo evaluations'
```
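If you move the results file, update the publish step to match (a sketch based on the artifact task shown earlier):

```yaml
- task: PublishBuildArtifacts@1
  inputs:
    pathtoPublish: '$(Build.ArtifactStagingDirectory)/promptfoo-results.json'
    artifactName: 'promptfoo-results'
  condition: succeededOrFailed()
  displayName: 'Publish evaluation results'
```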
To run evaluations on pull requests, add a PR trigger:
```yaml
trigger:
  - main
  - master

pr:
  - main
  - master

# Rest of pipeline configuration
```
Run promptfoo only when certain conditions are met — for example, on pull requests, or on `main` when the commit message contains `[run-eval]`:
```yaml
steps:
  - task: NodeTool@0
    inputs:
      versionSpec: '20.x'
    displayName: 'Install Node.js'

  - script: |
      npm ci
      npm install -g promptfoo
    displayName: 'Install dependencies'

  - script: |
      npx promptfoo eval
    displayName: 'Run promptfoo evaluations'
    condition: |
      and(
        succeeded(),
        or(
          eq(variables['Build.SourceBranch'], 'refs/heads/main'),
          eq(variables['Build.Reason'], 'PullRequest')
        ),
        or(
          eq(variables['Build.Reason'], 'PullRequest'),
          contains(variables['Build.SourceVersionMessage'], '[run-eval]')
        )
      )
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
```
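If you want evaluations to run only when certain files change, a path filter on the trigger keeps the pipeline from running at all for unrelated commits (a sketch; the paths below are placeholders for your own prompt and config files):

```yaml
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - prompts/**
      - promptfooconfig.yaml
```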
Test across multiple configurations or models in parallel:
```yaml
strategy:
  matrix:
    gpt:
      MODEL: 'gpt-5.1'
    claude:
      MODEL: 'claude-sonnet-4-5-20250929'

steps:
  - script: |
      npx promptfoo eval --providers.0.config.model=$(MODEL)
    displayName: 'Test with $(MODEL)'
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      ANTHROPIC_API_KEY: $(ANTHROPIC_API_KEY)
```
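When matrix jobs run in parallel, publishing results under a per-model artifact name avoids collisions (a sketch combining the matrix variable with the artifact task shown earlier):

```yaml
- task: PublishBuildArtifacts@1
  inputs:
    pathtoPublish: 'promptfoo-results.json'
    artifactName: 'promptfoo-results-$(MODEL)'
  condition: succeededOrFailed()
  displayName: 'Publish results for $(MODEL)'
```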
If you encounter issues with your Azure Pipelines integration, timeouts during evaluations are a likely culprit: adjust the pipeline's timeout settings, or consider a self-hosted agent for better stability with long-running evaluations.
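To raise the limit, set `timeoutInMinutes` on the job (a sketch; Microsoft-hosted agents default to a 60-minute job timeout for private projects):

```yaml
jobs:
  - job: evaluate
    timeoutInMinutes: 120
    steps:
      - script: npx promptfoo eval
        displayName: 'Run promptfoo evaluations'
```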