docs/providers/xai.md
OpenClaw ships a bundled xai provider plugin for Grok models.
```bash
openclaw onboard --auth-choice xai-api-key
```
OpenClaw includes these xAI model families out of the box:
| Family | Model ids |
|---|---|
| Grok 3 | grok-3, grok-3-fast, grok-3-mini, grok-3-mini-fast |
| Grok 4.3 | grok-4.3 |
| Grok 4 | grok-4, grok-4-0709 |
| Grok 4 Fast | grok-4-fast, grok-4-fast-non-reasoning |
| Grok 4.1 Fast | grok-4-1-fast, grok-4-1-fast-non-reasoning |
| Grok 4.20 Beta | grok-4.20-beta-latest-reasoning, grok-4.20-beta-latest-non-reasoning |
| Grok Code | grok-code-fast-1 |
The plugin also forward-resolves newer grok-4* and grok-code-fast* ids when
they follow the same API shape.
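For example, a newly shipped id can be referenced through the usual `xai/<model>` form without waiting for a catalog update. The id below is hypothetical, used purely for illustration:

```json5
{
  agents: {
    defaults: {
      models: {
        // "grok-4-next" is a hypothetical future id; it forward-resolves
        // because it matches the grok-4* pattern the plugin recognizes.
        "xai/grok-4-next": {},
      },
    },
  },
}
```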
The bundled plugin maps xAI's current public API surface onto OpenClaw's shared provider and tool contracts. Capabilities that don't fit the shared contract (for example streaming TTS and realtime voice) are not exposed — see the table below.
| xAI capability | OpenClaw surface | Status |
|---|---|---|
| Chat / Responses | xai/<model> model provider | Yes |
| Server-side web search | web_search provider grok | Yes |
| Server-side X search | x_search tool | Yes |
| Server-side code execution | code_execution tool | Yes |
| Images | image_generate | Yes |
| Videos | video_generate | Yes |
| Batch text-to-speech | messages.tts.provider: "xai" / tts | Yes |
| Streaming TTS | — | Not exposed; OpenClaw's TTS contract returns complete audio buffers |
| Batch speech-to-text | tools.media.audio / media understanding | Yes |
| Streaming speech-to-text | Voice Call streaming.provider: "xai" | Yes |
| Realtime voice | — | Not exposed yet; different session/WebSocket contract |
| Files / batches | Generic model API compatibility only | Not a first-class OpenClaw tool |
Turning on `/fast on`, or setting `agents.defaults.models["xai/<model>"].params.fastMode: true`, rewrites native xAI requests as follows:
| Source model | Fast-mode target |
|---|---|
| grok-3 | grok-3-fast |
| grok-3-mini | grok-3-mini-fast |
| grok-4 | grok-4-fast |
| grok-4-0709 | grok-4-fast |
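Expressed in config, fast mode for a single model looks like this (a sketch using `grok-4` as the source model; the path mirrors the `fastMode` setting above):

```json5
{
  agents: {
    defaults: {
      models: {
        "xai/grok-4": {
          params: {
            fastMode: true, // native requests to grok-4 are rewritten to grok-4-fast
          },
        },
      },
    },
  },
}
```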
Legacy aliases still normalize to the canonical bundled ids:
| Legacy alias | Canonical id |
|---|---|
| grok-4-fast-reasoning | grok-4-fast |
| grok-4-1-fast-reasoning | grok-4-1-fast |
| grok-4.20-reasoning | grok-4.20-beta-latest-reasoning |
| grok-4.20-non-reasoning | grok-4.20-beta-latest-non-reasoning |
To route the shared web search tool through xAI's server-side Grok search:
```bash
openclaw config set tools.web.search.provider grok
```
- Default video model: `xai/grok-imagine-video`
- Modes: text-to-video, image-to-video, reference-image generation, remote
video edit, and remote video extension
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3`
- Resolutions: `480P`, `720P`
- Duration: 1-15 seconds for generation/image-to-video, 1-10 seconds when
using `reference_image` roles, 2-10 seconds for extension
- Reference-image generation: set `imageRoles` to `reference_image` for
every supplied image; xAI accepts up to 7 such images
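As an illustrative sketch of a reference-image request (field names other than `imageRoles` are assumptions for illustration, not the documented tool schema):

```json5
{
  prompt: "The two characters wave at the camera",
  images: [
    "https://example.com/character-a.png",
    "https://example.com/character-b.png",
  ],
  // every supplied image must use the reference_image role (up to 7 images)
  imageRoles: ["reference_image", "reference_image"],
  aspectRatio: "16:9",  // assumed key name
  durationSeconds: 8,   // the 1-10 second limit applies with reference_image roles
}
```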
<Warning>
Local video buffers are not accepted. Use remote `http(s)` URLs for
video edit/extend inputs. Image-to-video accepts local image buffers because
OpenClaw can encode those as data URLs for xAI.
</Warning>
To use xAI as the default video provider:
```json5
{
agents: {
defaults: {
videoGenerationModel: {
primary: "xai/grok-imagine-video",
},
},
},
}
```
<Note>
See [Video Generation](/tools/video-generation) for shared tool parameters,
provider selection, and failover behavior.
</Note>
- Default image model: `xai/grok-imagine-image`
- Additional model: `xai/grok-imagine-image-pro`
- Modes: text-to-image and reference-image edit
- Reference inputs: one `image` or up to five `images`
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2`
- Resolutions: `1K`, `2K`
- Count: up to 4 images
OpenClaw asks xAI for `b64_json` image responses so generated media can be
stored and delivered through the normal channel attachment path. Local
reference images are converted to data URLs; remote `http(s)` references are
passed through.
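The reference-handling rule above can be sketched as follows. This is a minimal illustration, not OpenClaw's actual internal helper, and it assumes a PNG mime type for simplicity:

```typescript
import { readFileSync } from "node:fs";

// Remote http(s) references pass through; local paths are inlined as data URLs.
function toImageRef(ref: string): string {
  if (/^https?:\/\//i.test(ref)) {
    return ref; // remote reference: forwarded to xAI as-is
  }
  const bytes = readFileSync(ref); // local file: read and base64-encode
  return `data:image/png;base64,${bytes.toString("base64")}`;
}
```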
To use xAI as the default image provider:
```json5
{
agents: {
defaults: {
imageGenerationModel: {
primary: "xai/grok-imagine-image",
},
},
},
}
```
<Note>
xAI also documents `quality`, `mask`, `user`, and additional native ratios
such as `1:2`, `2:1`, `9:20`, and `20:9`. OpenClaw forwards only the
shared cross-provider image controls today; unsupported native-only knobs
are intentionally not exposed through `image_generate`.
</Note>
- Voices: `eve`, `ara`, `rex`, `sal`, `leo`, `una`
- Default voice: `eve`
- Formats: `mp3`, `wav`, `pcm`, `mulaw`, `alaw`
- Language: BCP-47 code or `auto`
- Speed: provider-native speed override
- Native Opus voice-note format is not supported
To use xAI as the default TTS provider:
```json5
{
messages: {
tts: {
provider: "xai",
providers: {
xai: {
voiceId: "eve",
},
},
},
},
}
```
<Note>
OpenClaw uses xAI's batch `/v1/tts` endpoint. xAI also offers streaming TTS
over WebSocket, but the OpenClaw speech provider contract currently expects
a complete audio buffer before reply delivery.
</Note>
- Default model: `grok-stt`
- Endpoint: xAI REST `/v1/stt`
- Input path: multipart audio file upload
- Supported by OpenClaw wherever inbound audio transcription uses
`tools.media.audio`, including Discord voice-channel segments and
channel audio attachments
To force xAI for inbound audio transcription:
```json5
{
tools: {
media: {
audio: {
models: [
{
type: "provider",
provider: "xai",
model: "grok-stt",
},
],
},
},
},
}
```
Language can be supplied through the shared audio media config or per-call
transcription request. Prompt hints are accepted by the shared OpenClaw
surface, but the xAI REST STT integration only forwards file, model, and
language because those map cleanly to the current public xAI endpoint.
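For example, a language hint might sit alongside the model entry like this (the `language` key name on the shared audio config is an assumption for illustration; check the media tool reference for the exact field):

```json5
{
  tools: {
    media: {
      audio: {
        language: "en", // assumed key name; forwarded to xAI STT as the language hint
        models: [
          {
            type: "provider",
            provider: "xai",
            model: "grok-stt",
          },
        ],
      },
    },
  },
}
```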
For streaming transcription (the Voice Call path, as opposed to the batch endpoint above):
- Endpoint: xAI WebSocket `wss://api.x.ai/v1/stt`
- Default encoding: `mulaw`
- Default sample rate: `8000`
- Default endpointing: `800ms`
- Interim transcripts: enabled by default
Voice Call's Twilio media stream sends G.711 µ-law audio frames, so the
xAI provider can forward those frames directly without transcoding:
```json5
{
plugins: {
entries: {
"voice-call": {
config: {
streaming: {
enabled: true,
provider: "xai",
providers: {
xai: {
apiKey: "${XAI_API_KEY}",
endpointingMs: 800,
language: "en",
},
},
},
},
},
},
},
}
```
Provider-owned config lives under
`plugins.entries.voice-call.config.streaming.providers.xai`. Supported
keys are `apiKey`, `baseUrl`, `sampleRate`, `encoding` (`pcm`, `mulaw`, or
`alaw`), `interimResults`, `endpointingMs`, and `language`.
<Note>
This streaming provider is for Voice Call's realtime transcription path.
Discord voice currently records short segments and uses the batch
`tools.media.audio` transcription path instead.
</Note>
The server-side `x_search` tool is configured at `plugins.entries.xai.config.xSearch`:
| Key | Type | Default | Description |
| ------------------ | ------- | ------------------ | ------------------------------------ |
| `enabled` | boolean | — | Enable or disable x_search |
| `model` | string | `grok-4-1-fast` | Model used for x_search requests |
| `baseUrl` | string | — | xAI Responses base URL override |
| `inlineCitations` | boolean | — | Include inline citations in results |
| `maxTurns` | number | — | Maximum conversation turns |
| `timeoutSeconds` | number | — | Request timeout in seconds |
| `cacheTtlMinutes` | number | — | Cache time-to-live in minutes |
```json5
{
plugins: {
entries: {
xai: {
config: {
xSearch: {
enabled: true,
model: "grok-4-1-fast",
baseUrl: "https://api.x.ai/v1",
inlineCitations: true,
},
},
},
},
},
}
```
Server-side code execution is configured at `plugins.entries.xai.config.codeExecution`:
| Key | Type | Default | Description |
| ----------------- | ------- | ------------------ | ---------------------------------------- |
| `enabled` | boolean | `true` (if key available) | Enable or disable code execution |
| `model` | string | `grok-4-1-fast` | Model used for code execution requests |
| `maxTurns` | number | — | Maximum conversation turns |
| `timeoutSeconds` | number | — | Request timeout in seconds |
<Note>
This is remote xAI sandbox execution, not local [`exec`](/tools/exec).
</Note>
```json5
{
plugins: {
entries: {
xai: {
config: {
codeExecution: {
enabled: true,
model: "grok-4-1-fast",
},
},
},
},
},
}
```
The xAI media paths are covered by unit tests and opt-in live suites. The live commands load secrets from your login shell (including `~/.profile`) before probing `XAI_API_KEY`.
```bash
pnpm test extensions/xai
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=1 pnpm test:live -- extensions/xai/xai.live.test.ts
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=1 OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS=xai pnpm test:live -- test/image-generation.runtime.live.test.ts
```
The provider-specific live file exercises normal TTS, telephony-friendly PCM TTS, batch STT transcription, streaming the same PCM through realtime STT, text-to-image generation, and a reference-image edit. The shared image live file verifies the same xAI provider through OpenClaw's runtime selection, fallback, normalization, and media attachment path.