site/docs/providers/xai.md
The xai provider supports xAI's Grok models through an API interface compatible with OpenAI's format, including text, vision, image generation, video generation, and voice workflows.
To use xAI's API, set the `XAI_API_KEY` environment variable or specify `apiKey` in the configuration file.
```sh
export XAI_API_KEY=your_api_key_here
```
When xAI is the selected fallback provider family, Promptfoo can use xAI defaults for grading, suggestions, synthesis, and web search. xAI does not currently expose a public embeddings or moderation API, so those defaults fall back to OpenAI when xAI is selected. Explicit provider IDs in your config still take precedence.
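If you want grading to use a specific model rather than the family defaults, promptfoo's `defaultTest` block can pin the grader explicitly. A minimal sketch (the model choice here is illustrative):

```yaml
defaultTest:
  options:
    provider: xai:grok-4.3 # Used for llm-rubric and other model-graded asserts
```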
The xAI provider includes support for the following model formats. xAI's public model catalog currently recommends grok-4.3 for general chat and coding workloads; consult the catalog when choosing a new default for a long-lived integration.
- `xai:grok-4.3` - General-purpose reasoning model
- `xai:grok-4.3-latest` - Alias for the Grok 4.3 family
- `xai:grok-4.20-0309-reasoning` - Reasoning model
- `xai:grok-4.20` - Alias for the Grok 4.20 reasoning family
- `xai:grok-4.20-reasoning` - Alias for the Grok 4.20 reasoning family
- `xai:grok-4.20-reasoning-latest` - Alias for the Grok 4.20 reasoning family
- `xai:grok-4.20-0309-non-reasoning` - Non-reasoning variant
- `xai:grok-4.20-non-reasoning` - Alias for the Grok 4.20 non-reasoning family
- `xai:grok-4.20-non-reasoning-latest` - Alias for the Grok 4.20 non-reasoning family
- `xai:grok-4.20-multi-agent-0309` - Multi-agent variant
- `xai:grok-4.20-multi-agent` - Alias for the Grok 4.20 multi-agent family
- `xai:grok-4.20-multi-agent-latest` - Alias for the Grok 4.20 multi-agent family
- `xai:grok-4-1-fast-reasoning` - Frontier model optimized for agentic tool calling with reasoning (2M context)
- `xai:grok-4-1-fast-non-reasoning` - Fast variant for instant responses without reasoning (2M context)
- `xai:grok-4-1-fast` - Alias for grok-4-1-fast-reasoning
- `xai:grok-4-1-fast-reasoning-latest` - Alias for grok-4-1-fast-reasoning
- `xai:grok-4-1-fast-non-reasoning-latest` - Alias for grok-4-1-fast-non-reasoning
- `xai:grok-code-fast-1` - Speedy and economical reasoning model optimized for agentic coding (256K context)
- `xai:grok-code-fast` - Alias for grok-code-fast-1
- `xai:grok-code-fast-1-0825` - Specific version of the code-fast model (256K context)
- `xai:grok-4-fast-reasoning` - Fast reasoning model with 2M context window
- `xai:grok-4-fast-non-reasoning` - Fast non-reasoning model for instant responses (2M context)
- `xai:grok-4-fast` - Alias for grok-4-fast-reasoning
- `xai:grok-4-fast-reasoning-latest` - Alias for grok-4-fast-reasoning
- `xai:grok-4-fast-non-reasoning-latest` - Alias for grok-4-fast-non-reasoning
- `xai:grok-4-0709` - Flagship reasoning model (256K context)
- `xai:grok-4` - Alias for latest Grok-4 model
- `xai:grok-4-latest` - Alias for latest Grok-4 model
- `xai:grok-3-beta` - Latest flagship model for enterprise tasks (131K context)
- `xai:grok-3-fast-beta` - Fastest flagship model (131K context)
- `xai:grok-3-mini-beta` - Smaller model for basic tasks, supports reasoning effort (32K context)
- `xai:grok-3-mini-fast-beta` - Faster mini model, supports reasoning effort (32K context)
- `xai:grok-3` - Alias for grok-3-beta
- `xai:grok-3-latest` - Alias for grok-3-beta
- `xai:grok-3-fast` - Alias for grok-3-fast-beta
- `xai:grok-3-fast-latest` - Alias for grok-3-fast-beta
- `xai:grok-3-mini` - Alias for grok-3-mini-beta
- `xai:grok-3-mini-latest` - Alias for grok-3-mini-beta
- `xai:grok-3-mini-fast` - Alias for grok-3-mini-fast-beta
- `xai:grok-3-mini-fast-latest` - Alias for grok-3-mini-fast-beta
- `xai:grok-2-latest` - Latest Grok-2 model (131K context)
- `xai:grok-2-vision-latest` - Latest Grok-2 vision model (32K context)
- `xai:grok-2-vision-1212`
- `xai:grok-2-1212`
- `xai:grok-beta` - Beta version (131K context)
- `xai:grok-vision-beta` - Vision beta version (8K context)

You can also use specific versioned models:

- `xai:grok-2-1212`
- `xai:grok-2-vision-1212`

The provider supports all OpenAI provider configuration options plus Grok-specific options. Example usage:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-3-mini-beta
    config:
      temperature: 0.7
      reasoning_effort: 'high' # Only for grok-3-mini models
      apiKey: your_api_key_here # Alternative to XAI_API_KEY
```
Multiple Grok models support reasoning capabilities:
**Grok 4.3**: General-purpose reasoning model recommended by xAI's public model catalog. It reasons automatically and does not support `reasoning_effort`.

**Grok Code Fast models**: The `grok-code-fast-1` family are reasoning models optimized for agentic coding workflows. They support `search_parameters`.

Grok 4.3 is the best starting point for general text workflows:

- Use `xai:responses:grok-4.3` for server-side tools, multi-turn state, and newer xAI capabilities
- No `reasoning_effort` parameter is required or supported
- Legacy sampling parameters are not supported (`presence_penalty`, `frequency_penalty`, `stop`, `reasoning_effort`)

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-4.3
    config:
      temperature: 0.7
      max_completion_tokens: 4096
```
**Grok-3 models**: The `grok-3-mini-beta` and `grok-3-mini-fast-beta` models support reasoning through the `reasoning_effort` parameter:

- `reasoning_effort: "low"` - Minimal thinking time, using fewer tokens for quick responses
- `reasoning_effort: "high"` - Maximum thinking time, leveraging more tokens for complex problems

:::info
For Grok-3, reasoning is only available for the mini variants. The standard `grok-3-beta` and `grok-3-fast-beta` models do not support reasoning.
:::
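To compare reasoning effort levels side by side, you can register the same model twice with different labels, a standard promptfoo pattern (the labels here are illustrative):

```yaml
providers:
  - id: xai:grok-3-mini-beta
    label: grok-3-mini-low
    config:
      reasoning_effort: 'low'
  - id: xai:grok-3-mini-beta
    label: grok-3-mini-high
    config:
      reasoning_effort: 'high'
```

Running an eval against both providers lets you see how reasoning effort affects quality, latency, and token usage for your prompts.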
Grok 4.1 Fast is xAI's frontier model specifically optimized for agentic tool calling:
- `grok-4-1-fast-reasoning` for maximum intelligence
- `grok-4-1-fast-non-reasoning` for instant responses

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-4-1-fast-reasoning
    config:
      temperature: 0.7
      max_completion_tokens: 4096
```
Grok-4 Fast models offer the same capabilities as Grok-4 but with faster inference and lower cost:
- `grok-4-fast-reasoning` for reasoning tasks
- `grok-4-fast-non-reasoning` for instant responses

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-4-fast-reasoning
    config:
      temperature: 0.7
      max_completion_tokens: 4096
```
Grok-4 introduces significant changes compared to previous Grok models:
- **No `reasoning_effort` parameter**: Unlike Grok-3 mini models, Grok-4 does not support the `reasoning_effort` parameter
- **Unsupported parameters**: `presencePenalty` / `presence_penalty`, `frequencyPenalty` / `frequency_penalty`, and `stop` are not supported
- **`max_completion_tokens`**: As a reasoning model, Grok-4 uses `max_completion_tokens` instead of `max_tokens`

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-4
    config:
      temperature: 0.7
      max_completion_tokens: 4096
```
The Grok Code Fast models are optimized for agentic coding workflows and offer several key features:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-code-fast-1
    # or use the alias:
    # - id: xai:grok-code-fast
    config:
      temperature: 0.1 # Lower temperature often preferred for coding tasks
      max_completion_tokens: 4096
      search_parameters:
        mode: auto # Enable web search for coding assistance
```
You can specify a region to use a region-specific API endpoint:
```yaml
providers:
  - id: xai:grok-4.3
    config:
      region: eu-west-1 # Will use https://eu-west-1.api.x.ai/v1
```
This is equivalent to setting base_url="https://eu-west-1.api.x.ai/v1" in the Python client. The same region option is also accepted by the xAI image, video, Responses, and realtime voice providers.
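If you need full control over the endpoint (for example, a corporate proxy), a base URL can be set directly instead of `region`. This sketch assumes the xAI provider accepts the OpenAI-compatible `apiBaseUrl` option; the URL below is a placeholder:

```yaml
providers:
  - id: xai:grok-4.3
    config:
      # Assumption: apiBaseUrl (OpenAI-compatible option) is honored by the xAI provider
      apiBaseUrl: https://eu-west-1.api.x.ai/v1
```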
xAI's public regional docs say the global endpoint automatically routes requests and gives access to every model available to your team. The current public model catalog shows xAI's language, Grok Imagine image, and Grok Imagine video models in both us-east-1 and eu-west-1. Regional endpoints are useful for data-residency requirements, but xAI warns that not every model is guaranteed in every region over time; for the latest region-by-region availability, use the xAI Console or the model pages on xAI's site.
:::warning
xAI's current documentation recommends the Responses API for server-side tools. Promptfoo still passes legacy search_parameters through for older configs, but new search configs should use the Agent Tools API.
:::
Legacy configs can still pass a search_parameters object. The mode field controls how search is used:
- `off` – Disable search
- `auto` – Model decides when to search (default)
- `on` – Always perform live search

Additional fields like `sources`, `from_date`, `to_date`, and `return_citations` may also be provided.
```yaml
providers:
  - id: xai:grok-3-beta
    config:
      search_parameters:
        mode: auto
        return_citations: true
        sources:
          - type: web
```
For a full list of options see the xAI documentation.
Use the xai:responses:<model> provider to access xAI's Agent Tools API, which enables autonomous server-side tool execution for web search, X search, code execution, collections search, and remote MCP tools.
```yaml
providers:
  - id: xai:responses:grok-4.3
    config:
      temperature: 0.7
      max_output_tokens: 4096
      tools:
        - type: web_search
        - type: x_search
```
| Tool | Description |
|---|---|
| `web_search` | Search the web and browse pages |
| `x_search` | Search X posts, users, and threads |
| `code_execution` / `code_interpreter` | Execute Python code in a sandbox |
| `collections_search` / `file_search` | Search uploaded knowledge bases |
| `mcp` | Connect to remote MCP servers |
```yaml
tools:
  - type: web_search
    filters:
      allowed_domains:
        - example.com
        - news.com
      # OR excluded_domains (cannot use both)
    enable_image_understanding: true
```
```yaml
tools:
  - type: x_search
    from_date: '2025-01-01' # ISO8601 format
    to_date: '2025-11-27'
    allowed_x_handles:
      - elonmusk
    enable_image_understanding: true
    enable_video_understanding: true
```
```yaml
tools:
  - type: code_interpreter
    container:
      pip_packages:
        - numpy
        - pandas
```
```yaml
providers:
  - id: xai:responses:grok-4.3
    config:
      temperature: 0.7
      tools:
        - type: web_search
          enable_image_understanding: true
        - type: x_search
          from_date: '2025-01-01'
        - type: code_interpreter
          container:
            pip_packages:
              - numpy
      tool_choice: auto # auto, required, or none
      parallel_tool_calls: true

tests:
  - vars:
      question: What's the latest AI news? Search the web and X.
    assert:
      - type: contains
        value: AI
```
| Parameter | Type | Description |
|---|---|---|
| `temperature` | number | Sampling temperature (0-2) |
| `max_output_tokens` | number | Maximum tokens to generate |
| `max_tool_calls` | number | Maximum tool calls for one request |
| `top_p` | number | Nucleus sampling parameter |
| `tools` | array | Agent tools to enable |
| `tool_choice` | string | Tool selection mode: `auto`, `required`, `none` |
| `parallel_tool_calls` | boolean | Allow parallel tool execution |
| `stream` | boolean | Request streamed response deltas |
| `instructions` | string | System-level instructions |
| `previous_response_id` | string | For multi-turn conversations |
| `store` | boolean | Store response for later retrieval |
| `include` | array | Additional response data to return |
| `reasoning` | object | Multi-agent configuration where supported |
| `response_format` | object | JSON schema for structured output |
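As a sketch of how the multi-turn parameters fit together, a follow-up request can reference a stored response from an earlier turn. The response ID below is a placeholder, not a real value:

```yaml
providers:
  - id: xai:responses:grok-4.3
    config:
      store: true # Persist the response so later turns can reference it
      # Placeholder: substitute the ID returned by the previous request
      previous_response_id: resp_previous_turn_id
      instructions: 'Answer concisely and cite sources when tools are used.'
```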
The Responses API works with current Grok models, including:
- `grok-4.3`
- `grok-4.20-reasoning`
- `grok-4.20-non-reasoning`
- `grok-4.20-multi-agent`
- `grok-4-1-fast-reasoning` (recommended for agentic workflows)
- `grok-4-1-fast-non-reasoning`
- `grok-4-fast-reasoning`
- `grok-4-fast-non-reasoning`
- `grok-4`

If you're using Live Search via `search_parameters`, migrate to the Responses API:
Before (Live Search - deprecated):
```yaml
providers:
  - id: xai:grok-4-1-fast-reasoning
    config:
      search_parameters:
        mode: auto
        sources:
          - type: web
          - type: x
```
After (Responses API):
```yaml
providers:
  - id: xai:responses:grok-4.3
    config:
      tools:
        - type: web_search
        - type: x_search
```
:::info Not Yet Supported
xAI offers Deferred Chat Completions for long-running requests that can be retrieved asynchronously via a request_id. This feature is not yet supported in promptfoo. For async workflows, use the xAI Python SDK directly.
:::
xAI supports standard OpenAI-compatible function calling for client-side tool execution:
```yaml
providers:
  - id: xai:grok-4-1-fast-reasoning
    config:
      tools:
        - type: function
          function:
            name: get_weather
            description: Get the current weather for a location
            parameters:
              type: object
              properties:
                location:
                  type: string
                  description: City and state
              required:
                - location
```
xAI supports structured outputs via JSON schema:
```yaml
providers:
  - id: xai:grok-4
    config:
      response_format:
        type: json_schema
        json_schema:
          name: analysis_result
          strict: true
          schema:
            type: object
            properties:
              summary:
                type: string
              confidence:
                type: number
            required:
              - summary
              - confidence
            additionalProperties: false
```
You can also load schemas from external files:
```yaml
config:
  response_format: file://./schemas/analysis-schema.json
```
Nested file references and variable rendering are supported (see OpenAI documentation for details).
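For illustration, the external file could mirror the inline `response_format` object shown earlier; the exact shape expected in the file is an assumption here, so verify against your promptfoo version:

```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "analysis_result",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "summary": { "type": "string" },
        "confidence": { "type": "number" }
      },
      "required": ["summary", "confidence"],
      "additionalProperties": false
    }
  }
}
```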
For models with vision capabilities, you can include images in your prompts using the same format as OpenAI. Create a prompt.yaml file:
```yaml
- role: user
  content:
    - type: image_url
      image_url:
        url: '{{image_url}}'
        detail: 'high'
    - type: text
      text: '{{question}}'
```
Then reference it in your promptfoo config:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
  - file://prompt.yaml
providers:
  - id: xai:grok-2-vision-latest
tests:
  - vars:
      image_url: 'https://example.com/image.jpg'
      question: "What's in this image?"
```
xAI does not currently expose a public embeddings API. Use the OpenAI provider (or another embedding provider) for similarity assertions.
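For example, a `similar` assertion can name an embedding provider explicitly. This sketch assumes `OPENAI_API_KEY` is set and uses promptfoo's per-assertion provider override:

```yaml
tests:
  - vars:
      question: Summarize the benefits of unit testing
    assert:
      - type: similar
        value: Unit tests catch regressions early and document intended behavior
        threshold: 0.8
        provider: openai:embedding:text-embedding-3-small
```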
xAI also supports image generation through Grok Imagine:
```yaml
providers:
  - xai:image:grok-imagine-image
```
Current Grok Imagine image model IDs include:
- `xai:image:grok-imagine-image`
- `xai:image:grok-imagine-image-pro`

Example configuration for image generation:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
  - 'A {{style}} painting of {{subject}}'
providers:
  - id: xai:image:grok-imagine-image
    config:
      n: 1 # Number of images to generate (1-10)
      response_format: 'url' # 'url' or 'b64_json'
      aspect_ratio: '16:9'
      resolution: '2k'
tests:
  - vars:
      style: 'impressionist'
      subject: 'sunset over mountains'
```
Use the same provider with image, images, or mask inputs to call xAI's image-editing endpoint:
```yaml
providers:
  - id: xai:image:grok-imagine-image
    config:
      image:
        url: 'https://example.com/source.png'
      mask:
        url: 'https://example.com/mask.png'
      quality: 'high'
prompts:
  - 'Render this as a pencil sketch with detailed shading'
```
xAI supports video generation through the Grok Imagine API using the xai:video:grok-imagine-video provider:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
  - 'Generate a video of: {{scene}}'
providers:
  - id: xai:video:grok-imagine-video
    config:
      duration: 5 # 1-15 seconds
      aspect_ratio: '16:9'
      resolution: '720p'
tests:
  - vars:
      scene: a cat playing with yarn
    assert:
      - type: cost
        threshold: 1.0
```
| Option | Type | Default | Description |
|---|---|---|---|
| `duration` | number | 8 | Video length in seconds (1-15) |
| `aspect_ratio` | string | `16:9` | Aspect ratio: `16:9`, `4:3`, `1:1`, `9:16`, `3:4`, `3:2`, `2:3` |
| `resolution` | string | `720p` | Output resolution: `720p`, `480p` |
| `reference_images` | array | - | Reference images for reference-to-video mode |
| `poll_interval_ms` | number | 10000 | Polling interval in milliseconds |
| `max_poll_time_ms` | number | 600000 | Maximum wait time (10 minutes) |
Animate a static image by providing an image URL:
```yaml
providers:
  - id: xai:video:grok-imagine-video
    config:
      image:
        url: 'https://example.com/image.jpg'
      duration: 5
```
Edit an existing video with text instructions:
```yaml
providers:
  - id: xai:video:grok-imagine-video
    config:
      video:
        url: 'https://example.com/source-video.mp4'
prompts:
  - 'Make the colors more vibrant and add slow motion'
```
:::note
Video editing skips duration, aspect ratio, and resolution validation since these are determined by the source video.
:::
Guide generation with up to seven reference images:
```yaml
providers:
  - id: xai:video:grok-imagine-video
    config:
      reference_images:
        - url: 'https://example.com/person.jpg'
        - url: 'https://example.com/shirt.jpg'
      duration: 10
```
Reference-to-video requires a non-empty prompt, cannot be combined with image or video, and is limited to 10 seconds.
Promptfoo uses the exact `usage.cost_in_usd_ticks` value returned by xAI when available, and falls back to the legacy estimate only when the API omits usage.
The xAI Voice Agent API enables real-time voice conversations with Grok models via WebSocket. Use the xai:voice:<model> provider format.
```yaml
providers:
  - xai:voice:grok-voice-think-fast-1.0
```
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:voice:grok-voice-think-fast-1.0
    config:
      voice: 'Ara' # Ara, Rex, Sal, Eve, or Leo
      instructions: 'You are a helpful voice assistant.'
      modalities: ['text', 'audio']
      turn_detection:
        type: server_vad
        threshold: 0.85
        silence_duration_ms: 500
        prefix_padding_ms: 333
      websocketTimeout: 60000 # Connection timeout in ms
      tools:
        - type: web_search
        - type: x_search
```
| Voice | Description |
|---|---|
| Ara | Female voice |
| Rex | Male voice |
| Sal | Male voice |
| Eve | Female voice |
| Leo | Male voice |
Use turn_detection to tune server-side voice activity detection:
| Option | Type | Description |
|---|---|---|
| `type` | string | `server_vad` for automatic detection |
| `threshold` | number | Activation threshold from 0.1 to 0.9 |
| `silence_duration_ms` | number | Silence required before ending the turn |
| `prefix_padding_ms` | number | Audio kept before detected speech to avoid clipping |
The Voice API includes server-side tools that execute automatically:
| Tool | Description |
|---|---|
| `web_search` | Search the web for information |
| `x_search` | Search posts on X (Twitter) |
| `file_search` | Search uploaded files in vector stores |
```yaml
tools:
  - type: web_search
  - type: x_search
    allowed_x_handles:
      - elonmusk
      - xai
  - type: file_search
    vector_store_ids:
      - vs-123
    max_num_results: 10
```
You can define custom function tools inline or load them from external files:
```yaml
providers:
  - id: xai:voice:grok-voice-think-fast-1.0
    config:
      # Inline tool definition
      tools:
        - type: function
          name: set_volume
          description: Set the device volume level
          parameters:
            type: object
            properties:
              level:
                type: number
                description: Volume level from 0 to 100
            required:
              - level
      # Or load from external file (YAML or JSON)
      # tools: file://tools.yaml

tests:
  - vars:
      question: 'Set the volume to 50 percent'
    assert:
      # Check that the correct function was called with correct arguments
      - type: javascript
        value: |
          const calls = output.functionCalls || [];
          return calls.some(c => c.name === 'set_volume' && c.arguments?.level === 50);
      # Or use tool-call-f1 for function name matching
      - type: tool-call-f1
        value: ['set_volume']
        threshold: 1.0
```
External tools file example:
```yaml
- type: function
  name: get_weather
  description: Get the current weather for a location
  parameters:
    type: object
    properties:
      location:
        type: string
    required:
      - location
- type: function
  name: set_reminder
  description: Set a reminder for the user
  parameters:
    type: object
    properties:
      message:
        type: string
      time:
        type: string
    required:
      - message
      - time
```
When function tools are used, the provider output includes a functionCalls array with:
- `name`: The function name that was called
- `arguments`: The parsed arguments object
- `result`: The result returned by your function handler (if provided)

You can configure a custom WebSocket endpoint for the Voice API, useful for proxies or regional endpoints:
```yaml
providers:
  - id: xai:voice:grok-voice-think-fast-1.0
    config:
      # Option 1: Full base URL (transforms https:// to wss://)
      apiBaseUrl: 'https://my-proxy.example.com/v1'
      # Option 2: Host only (builds https://{host}/v1)
      # apiHost: 'my-proxy.example.com'
```
You can also use the XAI_API_BASE_URL environment variable:
```sh
export XAI_API_BASE_URL=https://my-proxy.example.com/v1
```
URL transformation: The provider automatically converts HTTP URLs to WebSocket URLs (`https://` → `wss://`, `http://` → `ws://`) and appends `/realtime` to reach the Voice API endpoint.
For advanced use cases like local testing, custom proxies, or endpoints requiring query parameters, you can provide a complete WebSocket URL that will be used exactly as specified without any transformation:
```yaml
providers:
  - id: xai:voice:grok-voice-think-fast-1.0
    config:
      # Use this URL exactly as-is (no transformation applied)
      websocketUrl: 'wss://custom-endpoint.example.com/path?token=xyz&session=abc'
```
This is useful for local testing, custom proxies, and endpoints that require query parameters or non-standard paths.
Configure input/output audio formats:
```yaml
config:
  audio:
    input:
      format:
        type: audio/pcm
        rate: 24000
    output:
      format:
        type: audio/pcm
        rate: 24000
```
Supported formats: `audio/pcm`, `audio/pcmu`, `audio/pcma`

Supported sample rates: 8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
  - file://input.json
providers:
  - id: xai:voice:grok-voice-think-fast-1.0
    config:
      voice: 'Ara'
      instructions: 'You are a helpful voice assistant.'
      modalities: ['text', 'audio']
      tools:
        - type: web_search
tests:
  - vars:
      question: 'What are the latest AI developments?'
    assert:
      - type: llm-rubric
        value: Provides information about recent AI news
```
The Voice Agent API is billed at $0.05 per minute of connection time.
For more information on the available models and API usage, refer to the xAI documentation.
For examples demonstrating text generation, image creation, and web search, see the xai example.
```sh
npx promptfoo@latest init --example xai/chat
```
For real-time voice conversations with Grok, see the xai-voice example.
```sh
npx promptfoo@latest init --example xai/voice
```
If you encounter 502 Bad Gateway errors when using the xAI provider, this typically indicates a transient problem with the xAI API or an issue with your request.
The xAI provider will provide helpful error messages to guide you in resolving these issues.
For authentication errors, verify that your `XAI_API_KEY` environment variable is set correctly. You can obtain an API key from https://x.ai/.
If you're experiencing timeouts or want to control retry behavior:
- Set `PROMPTFOO_RETRY_5XX=false` to disable automatic retries on 5xx errors
- Set `PROMPTFOO_REQUEST_BACKOFF_MS=1000` to adjust the retry backoff (in milliseconds)
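For example, to disable 5xx retries and lengthen the backoff window before running an eval (the backoff value here is illustrative):

```sh
export PROMPTFOO_RETRY_5XX=false
export PROMPTFOO_REQUEST_BACKOFF_MS=2000
npx promptfoo@latest eval
```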