content/providers/01-ai-sdk-providers/03-openai.mdx
The OpenAI provider contains language model support for the OpenAI responses, chat, and completion APIs, as well as embedding model support for the OpenAI embeddings API.
The OpenAI provider is available in the @ai-sdk/openai module. You can install it with
<Tabs items={['pnpm', 'npm', 'yarn', 'bun']}> <Tab> <Snippet text="pnpm add @ai-sdk/openai" dark /> </Tab> <Tab> <Snippet text="npm install @ai-sdk/openai" dark /> </Tab> <Tab> <Snippet text="yarn add @ai-sdk/openai" dark /> </Tab>
<Tab> <Snippet text="bun add @ai-sdk/openai" dark /> </Tab> </Tabs>

You can import the default provider instance openai from @ai-sdk/openai:
import { openai } from '@ai-sdk/openai';
If you need a customized setup, you can import createOpenAI from @ai-sdk/openai and create a provider instance with your settings:
import { createOpenAI } from '@ai-sdk/openai';
const openai = createOpenAI({
// custom settings, e.g.
headers: {
'header-name': 'header-value',
},
});
You can use the following optional settings to customize the OpenAI provider instance:
baseURL string
Use a different URL prefix for API calls, e.g. to use proxy servers.
The default prefix is https://api.openai.com/v1.
apiKey string
API key that is being sent using the Authorization header.
It defaults to the OPENAI_API_KEY environment variable.
name string
The provider name. You can set this when using OpenAI compatible providers
to change the model provider property. Defaults to openai.
organization string
OpenAI Organization.
project string
OpenAI project.
headers Record<string,string>
Custom headers to include in the requests.
fetch (input: RequestInfo, init?: RequestInit) => Promise<Response>
Custom fetch implementation.
Defaults to the global fetch function.
You can use it as a middleware to intercept requests,
or to provide a custom fetch implementation for e.g. testing.
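For example, a setup sketch that points the provider at a proxy (the URL, key variable, and header are illustrative):

```ts
import { createOpenAI } from '@ai-sdk/openai';

const openai = createOpenAI({
  baseURL: 'https://my-openai-proxy.example.com/v1', // illustrative proxy URL
  apiKey: process.env.MY_OPENAI_API_KEY, // defaults to OPENAI_API_KEY when not set
  headers: { 'x-trace-id': 'abc123' }, // illustrative custom header
});
```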
The OpenAI provider instance is a function that you can invoke to create a language model:
const model = openai('gpt-5');
It automatically selects the correct API based on the model id. You can also pass additional settings in the second argument:
const model = openai('gpt-5', {
// additional settings
});
The available options depend on the API that's automatically chosen for the model (see below).
If you want to explicitly select a specific model API, you can use .responses, .chat, or .completion.
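For example:

```ts
import { openai } from '@ai-sdk/openai';

const responsesModel = openai.responses('gpt-5'); // Responses API
const chatModel = openai.chat('gpt-4o'); // Chat Completions API
const completionModel = openai.completion('gpt-3.5-turbo-instruct'); // Completions API
```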
You can use OpenAI language models to generate text with the generateText function:
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
const { text } = await generateText({
model: openai('gpt-5'),
prompt: 'Write a vegetarian lasagna recipe for 4 people.',
});
OpenAI language models can also be used in the streamText function
and support structured data generation with Output
(see AI SDK Core).
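For example, a minimal streaming sketch:

```ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const result = streamText({
  model: openai('gpt-5'),
  prompt: 'Write a vegetarian lasagna recipe for 4 people.',
});

for await (const textPart of result.textStream) {
  process.stdout.write(textPart);
}
```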
You can use the OpenAI responses API with the openai(modelId) or openai.responses(modelId) factory methods. It is the default API that is used by the OpenAI provider (since AI SDK 5).
const model = openai('gpt-5');
Further configuration can be done using OpenAI provider options.
You can validate the provider options using the OpenAILanguageModelResponsesOptions type.
import { openai, OpenAILanguageModelResponsesOptions } from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-5'), // or openai.responses('gpt-5')
providerOptions: {
openai: {
parallelToolCalls: false,
store: false,
user: 'user_123',
// ...
} satisfies OpenAILanguageModelResponsesOptions,
},
// ...
});
The following provider options are available:
parallelToolCalls boolean
Whether to use parallel tool calls. Defaults to true.
store boolean
Whether to store the generation. Defaults to true.
maxToolCalls integer The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
metadata Record<string, string> Additional metadata to store with the generation.
conversation string
The ID of the OpenAI Conversation to continue.
You must create a conversation first via the OpenAI API.
Cannot be used in conjunction with previousResponseId.
Defaults to undefined.
previousResponseId string
The ID of the previous response. You can use it to continue a conversation. Defaults to undefined.
instructions string
Instructions for the model.
They can be used to change the system or developer message when continuing a conversation using the previousResponseId option.
Defaults to undefined.
logprobs boolean | number
Return the log probabilities of the tokens. Including logprobs will increase the response size and can slow down response times. However, it can be useful to better understand how the model is behaving. Setting to true returns the log probabilities of the tokens that were generated. Setting to a number (1-20) returns the log probabilities of the top n tokens that were generated.
user string
A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. Defaults to undefined.
reasoningEffort 'none' | 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
Reasoning effort for reasoning models. Defaults to medium. Setting reasoningEffort via providerOptions takes precedence over the corresponding model setting.
reasoningSummary 'auto' | 'detailed'
Controls whether the model returns its reasoning process. Set to 'auto' for a condensed summary, 'detailed' for more comprehensive reasoning. Defaults to undefined (no reasoning summaries). When enabled, reasoning summaries appear in the stream as events with type 'reasoning' and in non-streaming responses within the reasoning field.
strictJsonSchema boolean
Whether to use strict JSON schema validation. Defaults to true.
serviceTier 'auto' | 'flex' | 'priority' | 'default' Service tier for the request. Set to 'flex' for 50% cheaper processing at the cost of increased latency (available for o3, o4-mini, and gpt-5 models). Set to 'priority' for faster processing with Enterprise access (available for gpt-4, gpt-5, gpt-5-mini, o3, o4-mini; gpt-5-nano is not supported).
Defaults to 'auto'.
textVerbosity 'low' | 'medium' | 'high'
Controls the verbosity of the model's response. Lower values result in more concise responses,
while higher values result in more verbose responses. Defaults to 'medium'.
include Array<string>
Specifies additional content to include in the response. Supported values:
['file_search_call.results'] for including file search results in responses.
['message.output_text.logprobs'] for logprobs.
Defaults to undefined.
truncation string The truncation strategy to use for the model response.
promptCacheKey string A cache key for manual prompt caching control. Used by OpenAI to cache responses for similar requests to optimize your cache hit rates.
promptCacheRetention 'in_memory' | '24h'
The retention policy for the prompt cache. Set to '24h' to enable extended prompt caching, which keeps cached prefixes active for up to 24 hours. Defaults to 'in_memory' for standard prompt caching. Note: '24h' is currently only available for the 5.1 series of models.
safetyIdentifier string A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The IDs should be a string that uniquely identifies each user.
systemMessageMode 'system' | 'developer' | 'remove'
Controls the role of the system message when making requests. By default (when omitted), for models that support reasoning the system message is automatically converted to a developer message. Setting systemMessageMode to system passes the system message as a system-level instruction; developer passes it as a developer message; remove omits the system message from the request.
forceReasoning boolean
Force treating this model as a reasoning model. This is useful for "stealth" reasoning models (e.g. via a custom baseURL) where the model ID is not recognized by the SDK's allowlist. When enabled, the SDK applies reasoning-model parameter compatibility rules and defaults systemMessageMode to developer unless overridden.
contextManagement Array<object> Enable server-side context management (compaction). When configured, the server automatically compresses conversation context when token usage crosses a specified threshold. Each object in the array should have:
- type: 'compaction'
- compactThreshold: number — the token count at which compaction is triggered

The OpenAI responses provider also returns provider-specific metadata:
For Responses models, you can type this metadata using OpenaiResponsesProviderMetadata:
import { openai, type OpenaiResponsesProviderMetadata } from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-5'),
});
const providerMetadata = result.providerMetadata as
| OpenaiResponsesProviderMetadata
| undefined;
const { responseId, logprobs, serviceTier } = providerMetadata?.openai ?? {};
// responseId can be used to continue a conversation (previousResponseId).
console.log(responseId);
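For example, a sketch of continuing a conversation with previousResponseId (the prompts are illustrative):

```ts
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const first = await generateText({
  model: openai('gpt-5'),
  prompt: 'Suggest a name for a hiking club.',
});

const { responseId } = first.providerMetadata?.openai ?? {};

// Continue the conversation by referencing the previous response:
const followUp = await generateText({
  model: openai('gpt-5'),
  prompt: 'Now suggest a matching slogan.',
  providerOptions: {
    openai: { previousResponseId: responseId as string },
  },
});

console.log(followUp.text);
```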
OpenAI-specific metadata that may be returned includes responseId, logprobs, and serviceTier, as shown above.
For reasoning models like gpt-5, you can enable reasoning summaries to see the model's thought process. Different models support different summarizers—for example, o4-mini supports detailed summaries. Set reasoningSummary: "auto" to automatically receive the richest level available.
import {
openai,
type OpenAILanguageModelResponsesOptions,
} from '@ai-sdk/openai';
import { streamText } from 'ai';
const result = streamText({
model: openai('gpt-5'),
prompt: 'Tell me about the Mission burrito debate in San Francisco.',
providerOptions: {
openai: {
reasoningSummary: 'detailed', // 'auto' for condensed or 'detailed' for comprehensive
} satisfies OpenAILanguageModelResponsesOptions,
},
});
for await (const part of result.fullStream) {
if (part.type === 'reasoning-delta') {
console.log(`Reasoning: ${part.text}`);
} else if (part.type === 'text-delta') {
process.stdout.write(part.text);
}
}
For non-streaming calls with generateText, the reasoning summaries are available in the reasoning field of the response:
import {
openai,
type OpenAILanguageModelResponsesOptions,
} from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-5'),
prompt: 'Tell me about the Mission burrito debate in San Francisco.',
providerOptions: {
openai: {
reasoningSummary: 'auto',
} satisfies OpenAILanguageModelResponsesOptions,
},
});
console.log('Reasoning:', result.reasoning);
Learn more about reasoning summaries in the OpenAI documentation.
OpenAI's WebSocket API keeps a persistent connection open, which can significantly reduce Time-to-First-Byte (TTFB) in agentic workflows with many tool calls. After the initial connection, subsequent requests skip TCP/TLS/HTTP negotiation entirely.
The ai-sdk-openai-websocket-fetch
package provides a drop-in fetch replacement that routes streaming requests
through a persistent WebSocket connection.
<Tabs items={['pnpm', 'npm', 'yarn', 'bun']}> <Tab> <Snippet text="pnpm add ai-sdk-openai-websocket-fetch" dark /> </Tab> <Tab> <Snippet text="npm install ai-sdk-openai-websocket-fetch" dark /> </Tab> <Tab> <Snippet text="yarn add ai-sdk-openai-websocket-fetch" dark /> </Tab> <Tab> <Snippet text="bun add ai-sdk-openai-websocket-fetch" dark /> </Tab> </Tabs>
Pass the WebSocket fetch to createOpenAI via the fetch option:
import { createOpenAI } from '@ai-sdk/openai';
import { createWebSocketFetch } from 'ai-sdk-openai-websocket-fetch';
import { streamText } from 'ai';
// Create a WebSocket-backed fetch instance
const wsFetch = createWebSocketFetch();
const openai = createOpenAI({ fetch: wsFetch });
const result = streamText({
model: openai('gpt-4.1-mini'),
prompt: 'Hello!',
tools: {
// ...
},
onFinish: () => wsFetch.close(), // close the WebSocket when done
});
The first request will be slower because it must establish the WebSocket connection (DNS + TCP + TLS + WebSocket upgrade). After that, subsequent steps in a multi-step tool-calling loop reuse the open connection, resulting in lower TTFB per step.
<Note> The WebSocket transport only routes streaming requests to the OpenAI Responses API (`POST /responses` with `stream: true`) through the WebSocket. All other requests (non-streaming, embeddings, etc.) fall through to the standard `fetch` implementation. </Note>

You can see a live side-by-side comparison of HTTP vs WebSocket streaming performance in the demo app.
You can control the length and detail of model responses using the textVerbosity parameter:
import {
openai,
type OpenAILanguageModelResponsesOptions,
} from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-5-mini'),
prompt: 'Write a poem about a boy and his first pet dog.',
providerOptions: {
openai: {
textVerbosity: 'low', // 'low' for concise, 'medium' (default), or 'high' for verbose
} satisfies OpenAILanguageModelResponsesOptions,
},
});
The textVerbosity parameter scales output length without changing the underlying prompt:
- 'low': Produces terse, minimal responses
- 'medium': Balanced detail (default)
- 'high': Verbose responses with comprehensive detail

The OpenAI responses API supports web search through the openai.tools.webSearch tool.
const result = await generateText({
model: openai('gpt-5'),
prompt: 'What happened in San Francisco last week?',
tools: {
web_search: openai.tools.webSearch({
// optional configuration:
externalWebAccess: true,
searchContextSize: 'high',
userLocation: {
type: 'approximate',
city: 'San Francisco',
region: 'California',
},
filters: {
allowedDomains: ['sfchronicle.com', 'sfgate.com'],
},
}),
},
// Force web search tool (optional):
toolChoice: { type: 'tool', toolName: 'web_search' },
});
// URL sources directly from `results`
const sources = result.sources;
// Or access sources from tool results
for (const toolResult of result.toolResults) {
if (toolResult.toolName === 'web_search') {
console.log('Query:', toolResult.output.action.query);
console.log('Sources:', toolResult.output.sources);
// `sources` is an array of object: { type: 'url', url: string }
}
}
The web search tool supports the following configuration options:
- externalWebAccess boolean — whether the tool may access live web content. Defaults to true.
- searchContextSize 'low' | 'medium' | 'high' — how much context the search retrieves.
- userLocation object — approximate user location with type (always 'approximate'), country, city, region, and timezone.
- filters object — domain filters such as allowedDomains.

For detailed information on configuration options, see the OpenAI Web Search Tool documentation.
The OpenAI responses API supports file search through the openai.tools.fileSearch tool.
You can force the use of the file search tool by setting the toolChoice parameter to { type: 'tool', toolName: 'file_search' }.
const result = await generateText({
model: openai('gpt-5'),
prompt: 'What does the document say about user authentication?',
tools: {
file_search: openai.tools.fileSearch({
vectorStoreIds: ['vs_123'],
// configuration below is optional:
maxNumResults: 5,
filters: {
key: 'author',
type: 'eq',
value: 'Jane Smith',
},
ranking: {
ranker: 'auto',
scoreThreshold: 0.5,
},
}),
},
providerOptions: {
openai: {
// optional: include results
include: ['file_search_call.results'],
} satisfies OpenAILanguageModelResponsesOptions,
},
});
The file search tool supports filtering with both comparison and compound filters:
Comparison filters - Filter by a single attribute:
- eq - Equal to
- ne - Not equal to
- gt - Greater than
- gte - Greater than or equal to
- lt - Less than
- lte - Less than or equal to
- in - Value is in array
- nin - Value is not in array

// Single comparison filter
filters: { key: 'year', type: 'gte', value: 2023 }
// Filter with array values
filters: { key: 'status', type: 'in', value: ['published', 'reviewed'] }
Compound filters - Combine multiple filters with and or or:
// Compound filter with AND
filters: {
type: 'and',
filters: [
{ key: 'author', type: 'eq', value: 'Jane Smith' },
{ key: 'year', type: 'gte', value: 2023 },
],
}
// Compound filter with OR
filters: {
type: 'or',
filters: [
{ key: 'department', type: 'eq', value: 'Engineering' },
{ key: 'department', type: 'eq', value: 'Research' },
],
}
OpenAI's Responses API supports multi-modal image generation as a provider-defined tool.
Availability is restricted to specific models (for example, gpt-5 variants).
You can use the image tool with either generateText or streamText:
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-5'),
prompt:
'Generate an image of an echidna swimming across the Mozambique channel.',
tools: {
image_generation: openai.tools.imageGeneration({ outputFormat: 'webp' }),
},
});
for (const toolResult of result.staticToolResults) {
if (toolResult.toolName === 'image_generation') {
const base64Image = toolResult.output.result;
}
}
You can also stream the generated image with streamText:

import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
const result = streamText({
model: openai('gpt-5'),
prompt:
'Generate an image of an echidna swimming across the Mozambique channel.',
tools: {
image_generation: openai.tools.imageGeneration({
outputFormat: 'webp',
quality: 'low',
}),
},
});
for await (const part of result.fullStream) {
if (part.type === 'tool-result' && !part.dynamic) {
const base64Image = part.output.result;
}
}
For complete details on model availability, image quality controls, supported sizes, and tool-specific parameters, refer to the OpenAI documentation.
The OpenAI responses API supports the code interpreter tool through the openai.tools.codeInterpreter tool.
This allows models to write and execute Python code.
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-5'),
prompt: 'Write and run Python code to calculate the factorial of 10',
tools: {
code_interpreter: openai.tools.codeInterpreter({
// optional configuration:
container: {
fileIds: ['file-123', 'file-456'], // optional file IDs to make available
},
}),
},
});
The code interpreter tool can be configured with:
- container — optionally pass fileIds to specify uploaded files that should be available to the code interpreter.

The OpenAI responses API supports connecting to Model Context Protocol (MCP) servers through the openai.tools.mcp tool. This allows models to call tools exposed by remote MCP servers or service connectors.
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-5'),
prompt: 'Search the web for the latest news about AI developments',
tools: {
mcp: openai.tools.mcp({
serverLabel: 'web-search',
serverUrl: 'https://mcp.exa.ai/mcp',
serverDescription: 'A web-search API for AI agents',
}),
},
});
The MCP tool can be configured with:
serverLabel string (required)
A label to identify the MCP server. This label is used in tool calls to distinguish between multiple MCP servers.
serverUrl string (required if connectorId is not provided)
The URL for the MCP server. Either serverUrl or connectorId must be provided.
connectorId string (required if serverUrl is not provided)
Identifier for a service connector. Either serverUrl or connectorId must be provided.
serverDescription string (optional)
Optional description of the MCP server that helps the model understand its purpose.
allowedTools string[] | object (optional)
Controls which tools from the MCP server are available. Can be:
- An array of tool names: ['tool1', 'tool2']
- An object filter:
{
readOnly: true, // Only allow read-only tools
toolNames: ['tool1', 'tool2'] // Specific tool names
}
authorization string (optional)
OAuth access token for authenticating with the MCP server or connector.
headers Record<string, string> (optional)
Optional HTTP headers to include in requests to the MCP server.
requireApproval 'always' | 'never' | object (optional)
Controls which MCP tool calls require user approval before execution. Can be:
- 'always': All MCP tool calls require approval
- 'never': No MCP tool calls require approval (default)
- An object for per-tool control:
{
never: {
toolNames: ['safe_tool', 'another_safe_tool'], // Skip approval for these tools
},
}
When approval is required, the model will return a tool-approval-request content part that you can use to prompt the user for approval. See Human in the Loop for more details on implementing approval workflows.
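For example, a configuration sketch combining these options (the server values and tool name are illustrative):

```ts
import { openai } from '@ai-sdk/openai';

const mcpTool = openai.tools.mcp({
  serverLabel: 'web-search',
  serverUrl: 'https://mcp.exa.ai/mcp',
  allowedTools: { readOnly: true }, // only expose read-only tools
  requireApproval: {
    never: { toolNames: ['web_search'] }, // skip approval for this tool
  },
});
```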
The OpenAI responses API supports the local shell tool for Codex models through the openai.tools.localShell tool.
Local shell is a tool that allows agents to run shell commands locally on a machine you or the user provides.
import { openai } from '@ai-sdk/openai';
import { generateText, stepCountIs } from 'ai';
const result = await generateText({
model: openai.responses('gpt-5-codex'),
tools: {
local_shell: openai.tools.localShell({
execute: async ({ action }) => {
// ... your implementation, e.g. sandbox access ...
return { output: stdout };
},
}),
},
prompt: 'List the files in my home directory.',
stopWhen: stepCountIs(2),
});
The OpenAI Responses API supports the shell tool through the openai.tools.shell tool.
The shell tool allows running bash commands and interacting with a command line.
The model proposes shell commands; your integration executes them and returns the outputs.
The shell tool supports three environment modes that control where commands are executed:
When no environment is specified (or type: 'local' is used), commands are executed locally via your execute callback:
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-5.2'),
tools: {
shell: openai.tools.shell({
execute: async ({ action }) => {
// ... your implementation, e.g. sandbox access ...
return { output: results };
},
}),
},
prompt: 'List the files in the current directory and show disk usage.',
});
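As a rough sketch of a local execute callback (the action.commands field and the per-command result shape are assumptions based on the output format described under local execution below; check the tool's TypeScript types):

```ts
import { execSync } from 'node:child_process';
import { openai } from '@ai-sdk/openai';

const shellTool = openai.tools.shell({
  execute: async ({ action }) => {
    // Assumption: `action.commands` holds the proposed shell commands.
    const output = (action as { commands: string[] }).commands.map(command => {
      try {
        const stdout = execSync(command, { encoding: 'utf8' });
        return { stdout, outcome: { type: 'exit' as const, exitCode: 0 } };
      } catch (error: any) {
        return {
          stdout: String(error.stdout ?? ''),
          outcome: { type: 'exit' as const, exitCode: error.status ?? 1 },
        };
      }
    });
    return { output };
  },
});
```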
Set environment.type to 'containerAuto' to run commands in an OpenAI-hosted container. No execute callback is needed — OpenAI handles execution server-side:
const result = await generateText({
model: openai('gpt-5.2'),
tools: {
shell: openai.tools.shell({
environment: {
type: 'containerAuto',
// optional configuration:
memoryLimit: '4g',
fileIds: ['file-abc123'],
networkPolicy: {
type: 'allowlist',
allowedDomains: ['example.com'],
},
},
}),
},
prompt: 'Install numpy and compute the eigenvalues of a 3x3 matrix.',
});
The containerAuto environment supports:
- memoryLimit — memory limit for the container (e.g. '4g')
- fileIds — IDs of uploaded files to make available in the container
- networkPolicy — { type: 'disabled' } for no network access, or { type: 'allowlist', allowedDomains: string[], domainSecrets?: Array<{ domain, name, value }> } to allow specific domains with optional secrets

Set environment.type to 'containerReference' to use an existing container by ID:
const result = await generateText({
model: openai('gpt-5.2'),
tools: {
shell: openai.tools.shell({
environment: {
type: 'containerReference',
containerId: 'cntr_abc123',
},
}),
},
prompt: 'Check the status of running processes.',
});
For local execution (default or type: 'local'), your execute function must return an output array with results for each command:
Each result's outcome is either { type: 'timeout' } or { type: 'exit', exitCode: number }.

Skills are versioned bundles of files with a SKILL.md manifest that extend the shell tool's capabilities. They can be attached to both containerAuto and local environments.
Container skills support two formats — by reference (for skills uploaded to OpenAI) or inline (as a base64-encoded zip):
const result = await generateText({
model: openai('gpt-5.2'),
tools: {
shell: openai.tools.shell({
environment: {
type: 'containerAuto',
skills: [
// By reference:
{ type: 'skillReference', skillId: 'skill_abc123' },
// Or inline:
{
type: 'inline',
name: 'my-skill',
description: 'What this skill does',
source: {
type: 'base64',
mediaType: 'application/zip',
data: readFileSync('./my-skill.zip').toString('base64'),
},
},
],
},
}),
},
prompt: 'Use the skill to solve this problem.',
});
Local skills point to a directory on disk containing a SKILL.md file:
const result = await generateText({
model: openai('gpt-5.2'),
tools: {
shell: openai.tools.shell({
execute: async ({ action }) => {
// ... your local execution implementation ...
return { output: results };
},
environment: {
type: 'local',
skills: [
{
name: 'my-skill',
description: 'What this skill does',
path: resolve('path/to/skill-directory'),
},
],
},
}),
},
prompt: 'Use the skill to solve this problem.',
stopWhen: stepCountIs(5),
});
For more details on creating skills, see the OpenAI Skills documentation.
The OpenAI Responses API supports the apply patch tool for GPT-5.1 models through the openai.tools.applyPatch tool.
The apply patch tool lets the model create, update, and delete files in your codebase using structured diffs.
Instead of just suggesting edits, the model emits patch operations that your application applies and reports back on,
enabling iterative, multi-step code editing workflows.
import { openai } from '@ai-sdk/openai';
import { generateText, stepCountIs } from 'ai';
const result = await generateText({
model: openai('gpt-5.1'),
tools: {
apply_patch: openai.tools.applyPatch({
execute: async ({ callId, operation }) => {
// ... your implementation for applying the diffs.
},
}),
},
prompt: 'Create a python file that calculates the factorial of a number',
stopWhen: stepCountIs(5),
});
Your execute function must return the result of applying the patch operation; see the OpenAI apply patch documentation for the expected shape.
Tool search allows the model to dynamically search for and load tools into context as needed,
rather than loading all tool definitions up front. This can reduce token usage, cost, and latency
when you have many tools. Mark the tools you want to make searchable with deferLoading: true
in their providerOptions.
There are two execution modes:
- Hosted (server-executed): the model searches the deferred tools and loads the matching ones server-side within a single response.
- Client-executed: the model emits a tool_search_call, your application performs the lookup, and you return the matching tools via the execute callback.

Use hosted tool search when the candidate tools are already known at request time.
Add openai.tools.toolSearch() with no arguments and mark your tools with deferLoading: true:
import { openai } from '@ai-sdk/openai';
import { generateText, tool, stepCountIs } from 'ai';
import { z } from 'zod';
const result = await generateText({
model: openai.responses('gpt-5.4'),
prompt: 'What is the weather in San Francisco?',
stopWhen: stepCountIs(10),
tools: {
toolSearch: openai.tools.toolSearch(),
get_weather: tool({
description: 'Get the current weather at a specific location',
inputSchema: z.object({
location: z.string(),
unit: z.enum(['celsius', 'fahrenheit']),
}),
execute: async ({ location, unit }) => ({
location,
temperature: unit === 'celsius' ? 18 : 64,
}),
providerOptions: {
openai: { deferLoading: true },
},
}),
search_files: tool({
description: 'Search through files in the workspace',
inputSchema: z.object({ query: z.string() }),
execute: async ({ query }) => ({
results: [`Found 3 files matching "${query}"`],
}),
providerOptions: {
openai: { deferLoading: true },
},
}),
},
});
In hosted mode, the model internally searches the deferred tools, loads the relevant ones, and
proceeds to call them — all within a single response. The tool_search_call and
tool_search_output items appear in the response with execution: 'server' and call_id: null.
Use client-executed tool search when tool discovery depends on runtime state — for example,
tools that vary per tenant, project, or external system. Pass execution: 'client' along with
a description, parameters schema, and an execute callback:
import { openai } from '@ai-sdk/openai';
import { generateText, tool, stepCountIs } from 'ai';
import { z } from 'zod';
const result = await generateText({
model: openai.responses('gpt-5.4'),
prompt: 'What is the weather in San Francisco?',
stopWhen: stepCountIs(10),
tools: {
toolSearch: openai.tools.toolSearch({
execution: 'client',
description: 'Search for available tools based on what the user needs.',
parameters: {
type: 'object',
properties: {
goal: {
type: 'string',
description: 'What the user is trying to accomplish',
},
},
required: ['goal'],
additionalProperties: false,
},
execute: async ({ arguments: args }) => {
// Your custom tool discovery logic here.
// Return the tools that match the search goal.
return {
tools: [
{
type: 'function',
name: 'get_weather',
description: 'Get the current weather at a specific location',
deferLoading: true,
parameters: {
type: 'object',
properties: {
location: { type: 'string' },
},
required: ['location'],
additionalProperties: false,
},
},
],
};
},
}),
get_weather: tool({
description: 'Get the current weather at a specific location',
inputSchema: z.object({ location: z.string() }),
execute: async ({ location }) => ({
location,
temperature: 64,
condition: 'Partly cloudy',
}),
providerOptions: {
openai: { deferLoading: true },
},
}),
},
});
In client mode, the flow spans two steps:
1. The model emits a tool_search_call with execution: 'client' and a non-null call_id. The SDK calls your execute callback with the search arguments. Your callback returns the discovered tools.
2. The SDK sends the tool_search_output (with the matching call_id) back to the model. The model can now call the loaded tools as normal function calls.

For more details, see the OpenAI Tool Search documentation.
The OpenAI Responses API supports
custom tools
through the openai.tools.customTool tool.
Custom tools return a raw string instead of JSON, optionally constrained to a grammar
(regex or Lark syntax). This makes them useful for generating structured text like
SQL queries, code snippets, or any output that must match a specific pattern.
import { openai } from '@ai-sdk/openai';
import { generateText, stepCountIs } from 'ai';
const result = await generateText({
model: openai.responses('gpt-5.2-codex'),
tools: {
write_sql: openai.tools.customTool({
description: 'Write a SQL SELECT query to answer the user question.',
format: {
type: 'grammar',
syntax: 'regex',
definition: 'SELECT .+',
},
execute: async input => {
// input is a raw string matching the grammar, e.g. "SELECT * FROM users WHERE age > 25"
const rows = await db.query(input);
return JSON.stringify(rows);
},
}),
},
toolChoice: 'required',
prompt: 'Write a SQL query to get all users older than 25.',
stopWhen: stepCountIs(3),
});
Custom tools also work with streamText:
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
const result = streamText({
model: openai.responses('gpt-5.2-codex'),
tools: {
write_sql: openai.tools.customTool({
description: 'Write a SQL SELECT query to answer the user question.',
format: {
type: 'grammar',
syntax: 'regex',
definition: 'SELECT .+',
},
}),
},
toolChoice: 'required',
prompt: 'Write a SQL query to get all users older than 25.',
});
for await (const chunk of result.fullStream) {
if (chunk.type === 'tool-call') {
console.log(`Tool: ${chunk.toolName}`);
console.log(`Input: ${chunk.input}`);
}
}
The custom tool can be configured with:
- format.type: 'grammar' for constrained output, or 'text' for explicit unconstrained text.
- format.syntax (for grammars): 'regex' for regular expression patterns, or 'lark' for Lark parser grammar.

The OpenAI Responses API supports image inputs for appropriate models. You can pass image files as part of the message content using the 'image' type:
const result = await generateText({
model: openai('gpt-5'),
messages: [
{
role: 'user',
content: [
{
type: 'text',
text: 'Please describe the image.',
},
{
type: 'image',
image: readFileSync('./data/image.png'),
},
],
},
],
});
The model will have access to the image and will respond to questions about it.
The image should be passed using the image field.
You can also pass a file-id from the OpenAI Files API.
{
type: 'image',
image: 'file-8EFBcWHsQxZV7YGezBC1fq'
}
You can also pass the URL of an image.
{
type: 'image',
image: 'https://sample.edu/image.png',
}
The OpenAI Responses API supports reading PDF files.
You can pass PDF files as part of the message content using the file type:
const result = await generateText({
model: openai('gpt-5'),
messages: [
{
role: 'user',
content: [
{
type: 'text',
text: 'What is an embedding model?',
},
{
type: 'file',
data: readFileSync('./data/ai.pdf'),
mediaType: 'application/pdf',
filename: 'ai.pdf', // optional
},
],
},
],
});
You can also pass a file-id from the OpenAI Files API.
{
type: 'file',
data: 'file-8EFBcWHsQxZV7YGezBC1fq',
mediaType: 'application/pdf',
}
You can also pass the URL of a pdf.
{
type: 'file',
data: 'https://sample.edu/example.pdf',
mediaType: 'application/pdf',
filename: 'ai.pdf', // optional
}
The model will have access to the contents of the PDF file and
respond to questions about it.
The PDF file should be passed using the data field,
and the mediaType should be set to 'application/pdf'.
The OpenAI Responses API supports structured outputs. You can use generateText or streamText with Output to enforce structured outputs.
import { openai } from '@ai-sdk/openai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const result = await generateText({
model: openai('gpt-4.1'),
output: Output.object({
schema: z.object({
recipe: z.object({
name: z.string(),
ingredients: z.array(
z.object({
name: z.string(),
amount: z.string(),
}),
),
steps: z.array(z.string()),
}),
}),
}),
prompt: 'Generate a lasagna recipe.',
});
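The generated object is available on the result's output property:

```ts
console.log(result.output.recipe.name);
```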
When using the OpenAI Responses API, the SDK attaches OpenAI-specific metadata to output parts via providerMetadata.
This metadata can be used on the client side for tasks such as rendering citations or downloading files generated by the Code Interpreter. To enable type-safe handling of this metadata, the AI SDK exports dedicated TypeScript types.
For text parts, when part.type === 'text', the providerMetadata is provided in the form of OpenaiResponsesTextProviderMetadata.
This metadata includes the following fields:
itemId
The ID of the output item in the Responses API.
annotations (optional)
An array of annotation objects generated by the model.
If no annotations are present, this property itself may be omitted (undefined).
Each element in annotations is a discriminated union with a required type field. Supported types include, for example:
- url_citation
- file_citation
- container_file_citation
- file_path

These annotations directly correspond to the annotation objects defined by the Responses API and can be used for inline reference rendering or output analysis. For details, see the official OpenAI documentation: Responses API – output text annotations.
import {
openai,
type OpenaiResponsesTextProviderMetadata,
} from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-4.1-mini'),
prompt:
'Create a program that generates five random numbers between 1 and 100 with two decimal places, and show me the execution results. Also save the result to a file.',
tools: {
code_interpreter: openai.tools.codeInterpreter(),
web_search: openai.tools.webSearch(),
file_search: openai.tools.fileSearch({ vectorStoreIds: ['vs_1234'] }), // requires a configured vector store
},
});
for (const part of result.content) {
if (part.type === 'text') {
const providerMetadata = part.providerMetadata as
| OpenaiResponsesTextProviderMetadata
| undefined;
if (!providerMetadata) continue;
const { itemId: _itemId, annotations } = providerMetadata.openai;
if (!annotations) continue;
for (const annotation of annotations) {
switch (annotation.type) {
case 'url_citation':
// url_citation is returned from web_search and provides:
// properties: type, url, title, start_index and end_index
break;
case 'file_citation':
// file_citation is returned from file_search and provides:
// properties: type, file_id, filename and index
break;
case 'container_file_citation':
// container_file_citation is returned from code_interpreter and provides:
// properties: type, container_id, file_id, filename, start_index and end_index
break;
case 'file_path':
// file_path provides:
// properties: type, file_id and index
break;
default: {
const _exhaustiveCheck: never = annotation;
throw new Error(
`Unhandled annotation: ${JSON.stringify(_exhaustiveCheck)}`,
);
}
}
}
}
}
When using the OpenAI Responses API, reasoning output parts can include provider metadata.
To handle this metadata in a type-safe way, use OpenaiResponsesReasoningProviderMetadata.
For reasoning parts, when part.type === 'reasoning', the providerMetadata is provided in the form of OpenaiResponsesReasoningProviderMetadata.
This metadata includes the following fields:
- itemId — the ID of the reasoning output item in the Responses API.
- reasoningEncryptedContent (optional) — the encrypted reasoning content, returned when requested via include: ['reasoning.encrypted_content'].

import {
openai,
type OpenaiResponsesReasoningProviderMetadata,
type OpenAILanguageModelResponsesOptions,
} from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-5'),
prompt: 'How many "r"s are in the word "strawberry"?',
providerOptions: {
openai: {
store: false,
include: ['reasoning.encrypted_content'],
} satisfies OpenAILanguageModelResponsesOptions,
},
});
for (const part of result.content) {
if (part.type === 'reasoning') {
const providerMetadata = part.providerMetadata as
| OpenaiResponsesReasoningProviderMetadata
| undefined;
const { itemId, reasoningEncryptedContent } =
providerMetadata?.openai ?? {};
console.log(itemId, reasoningEncryptedContent);
}
}
For source document parts, when part.type === 'source' and sourceType === 'document', the providerMetadata is provided as OpenaiResponsesSourceDocumentProviderMetadata.
This metadata is also a discriminated union with a required type field. Supported types include:
- file_citation
- container_file_citation
- file_path

Each type includes the identifiers required to work with the referenced resource, such as fileId and containerId.
import {
openai,
type OpenaiResponsesSourceDocumentProviderMetadata,
} from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai('gpt-4.1-mini'),
prompt:
'Create a program that generates five random numbers between 1 and 100 with two decimal places, and show me the execution results. Also save the result to a file.',
tools: {
code_interpreter: openai.tools.codeInterpreter(),
web_search: openai.tools.webSearch(),
file_search: openai.tools.fileSearch({ vectorStoreIds: ['vs_1234'] }), // requires a configured vector store
},
});
for (const part of result.content) {
if (part.type === 'source') {
if (part.sourceType === 'document') {
const providerMetadata = part.providerMetadata as
| OpenaiResponsesSourceDocumentProviderMetadata
| undefined;
if (!providerMetadata) continue;
const annotation = providerMetadata.openai;
switch (annotation.type) {
case 'file_citation':
// file_citation is returned from file_search and provides:
// properties: type, fileId and index
// The filename can be accessed via part.filename.
break;
case 'container_file_citation':
// container_file_citation is returned from code_interpreter and provides:
// properties: type, containerId and fileId
// The filename can be accessed via part.filename.
break;
case 'file_path':
// file_path provides:
// properties: type, fileId and index
break;
default: {
const _exhaustiveCheck: never = annotation;
throw new Error(
`Unhandled annotation: ${JSON.stringify(_exhaustiveCheck)}`,
);
}
}
}
}
}
The OpenAI Responses API supports server-side context compaction. When enabled, the server automatically compresses conversation context when token usage crosses a configured threshold. This is useful for long-running conversations or agent loops where you want to stay within token limits without manually managing context.
The compaction item returned by the server is opaque and encrypted — it carries forward key prior state and reasoning into the next turn using fewer tokens. The AI SDK handles this automatically: compaction items are returned as text parts with special providerMetadata, and when passed back in subsequent requests they are sent as compaction input items.
import {
openai,
type OpenAILanguageModelResponsesOptions,
} from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai.responses('gpt-5.2'),
messages: conversationHistory,
providerOptions: {
openai: {
store: false,
contextManagement: [{ type: 'compaction', compactThreshold: 50000 }],
} satisfies OpenAILanguageModelResponsesOptions,
},
});
Configuration:
- type — must be 'compaction'.
- compactThreshold — the token count at which compaction is triggered.

When using streamText, you can detect compaction by checking the providerMetadata on text-start and text-end events:
import {
openai,
type OpenAILanguageModelResponsesOptions,
} from '@ai-sdk/openai';
import { streamText } from 'ai';
const result = streamText({
model: openai.responses('gpt-5.2'),
messages: conversationHistory,
providerOptions: {
openai: {
store: false,
contextManagement: [{ type: 'compaction', compactThreshold: 50000 }],
} satisfies OpenAILanguageModelResponsesOptions,
},
});
for await (const part of result.fullStream) {
switch (part.type) {
case 'text-start': {
const isCompaction = part.providerMetadata?.openai?.type === 'compaction';
if (isCompaction) {
// ... your logic
}
break;
}
case 'text-end': {
const isCompaction = part.providerMetadata?.openai?.type === 'compaction';
if (isCompaction) {
// ... your logic
}
break;
}
case 'text-delta': {
process.stdout.write(part.text);
break;
}
}
}
When using useChat or other UI hooks, compaction items appear as text parts with providerMetadata. You can detect and style them differently in your UI:
{
message.parts.map((part, index) => {
if (part.type === 'text') {
const isCompaction =
(part.providerMetadata?.openai as { type?: string } | undefined)
?.type === 'compaction';
if (isCompaction) {
return (
<div
key={index}
className="bg-yellow-100 border-l-4 border-yellow-500 p-2"
>
<span className="font-bold">[Context Compacted]</span>
<p className="text-sm text-yellow-700">
The server compressed the conversation context to reduce token
usage.
</p>
</div>
);
}
return <div key={index}>{part.text}</div>;
}
});
}
The metadata includes the following fields:
- type — set to 'compaction' for compaction text parts.

You can create models that call the OpenAI chat API using the .chat() factory method.
The first argument is the model id, e.g. gpt-4.
The OpenAI chat models support tool calls and some have multi-modal capabilities.
const model = openai.chat('gpt-5');
OpenAI chat models also support some model-specific provider options that are not part of the standard call settings.
You can pass them in the providerOptions argument:
import { openai, type OpenAILanguageModelChatOptions } from '@ai-sdk/openai';
const model = openai.chat('gpt-5');
await generateText({
model,
providerOptions: {
openai: {
logitBias: {
// optional likelihood for specific tokens
'50256': -100,
},
user: 'test-user', // optional unique user identifier
} satisfies OpenAILanguageModelChatOptions,
},
});
The following optional provider options are available for OpenAI chat models:
logitBias Record<number, number>
Modifies the likelihood of specified tokens appearing in the completion.
Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
As an example, you can pass {"50256": -100} to prevent the token from being generated.
logprobs boolean | number
Return the log probabilities of the tokens. Including logprobs will increase the response size and can slow down response times. However, it can be useful to better understand how the model is behaving.
Setting to true will return the log probabilities of the tokens that were generated.
Setting to a number will return the log probabilities of the top n tokens that were generated.
parallelToolCalls boolean
Whether to enable parallel function calling during tool use. Defaults to true.
user string
A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. Learn more.
reasoningEffort 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
Reasoning effort for reasoning models. Defaults to medium. Setting reasoningEffort via providerOptions takes precedence over the corresponding model setting.
maxCompletionTokens number
Maximum number of completion tokens to generate. Useful for reasoning models.
store boolean
Whether to enable persistence in Responses API.
metadata Record<string, string>
Metadata to associate with the request.
prediction Record<string, any>
Parameters for prediction mode.
serviceTier 'auto' | 'flex' | 'priority' | 'default'
Service tier for the request. Set to 'flex' for 50% cheaper processing at the cost of increased latency (available for o3, o4-mini, and gpt-5 models). Set to 'priority' for faster processing with Enterprise access (available for gpt-4, gpt-5, gpt-5-mini, o3, o4-mini; gpt-5-nano is not supported).
Defaults to 'auto'.
strictJsonSchema boolean
Whether to use strict JSON schema validation.
Defaults to true.
textVerbosity 'low' | 'medium' | 'high'
Controls the verbosity of the model's responses. Lower values will result in more concise responses, while higher values will result in more verbose responses.
promptCacheKey string
A cache key for manual prompt caching control. Used by OpenAI to cache responses for similar requests to optimize your cache hit rates.
promptCacheRetention 'in_memory' | '24h'
The retention policy for the prompt cache. Set to '24h' to enable extended prompt caching, which keeps cached prefixes active for up to 24 hours. Defaults to 'in_memory' for standard prompt caching. Note: '24h' is currently only available for the 5.1 series of models.
safetyIdentifier string
A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The IDs should be a string that uniquely identifies each user.
systemMessageMode 'system' | 'developer' | 'remove'
Override the system message mode for this model. If not specified, the mode is automatically determined based on the model. system uses the 'system' role for system messages (default for most models); developer uses the 'developer' role (used by reasoning models); remove removes system messages entirely.
forceReasoning boolean
Force treating this model as a reasoning model. This is useful for "stealth" reasoning models (e.g. via a custom baseURL) where the model ID is not recognized by the SDK's allowlist. When enabled, the SDK applies reasoning-model parameter compatibility rules and defaults systemMessageMode to developer unless overridden.
OpenAI has introduced the o1, o3, and o4 series of reasoning models.
Currently, o4-mini, o3, o3-mini, and o1 are available via both the chat and responses APIs. The model gpt-5.1-codex-mini is available only via the responses API.
Reasoning models currently only generate text, have several limitations, and are only supported using generateText and streamText.
They support additional settings and response metadata:
- You can use providerOptions to set the reasoningEffort option (or alternatively the reasoningEffort model setting), which determines the amount of reasoning the model performs.
- You can use the response providerMetadata to access the number of reasoning tokens that the model generated.
import { openai, type OpenAILanguageModelChatOptions } from '@ai-sdk/openai';
import { generateText } from 'ai';
const { text, usage, providerMetadata } = await generateText({
model: openai.chat('gpt-5'),
prompt: 'Invent a new holiday and describe its traditions.',
providerOptions: {
openai: {
reasoningEffort: 'low',
} satisfies OpenAILanguageModelChatOptions,
},
});
console.log(text);
console.log('Usage:', {
...usage,
reasoningTokens: providerMetadata?.openai?.reasoningTokens,
});
You can control how system messages are handled by providerOptions systemMessageMode:
- developer: treat the prompt as a developer message (default for reasoning models).
- system: keep the system message as a system-level instruction.
- remove: remove the system message from the messages.

import { openai, type OpenAILanguageModelChatOptions } from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai.chat('gpt-5'),
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Tell me a joke.' },
],
providerOptions: {
openai: {
systemMessageMode: 'system',
} satisfies OpenAILanguageModelChatOptions,
},
});
Strict structured outputs are enabled by default.
You can disable them by setting the strictJsonSchema option to false.
import { openai, OpenAILanguageModelChatOptions } from '@ai-sdk/openai';
import { generateText, Output } from 'ai';
import { z } from 'zod';
const result = await generateText({
model: openai.chat('gpt-4o-2024-08-06'),
providerOptions: {
openai: {
strictJsonSchema: false,
} satisfies OpenAILanguageModelChatOptions,
},
output: Output.object({
schema: z.object({
name: z.string(),
ingredients: z.array(
z.object({
name: z.string(),
amount: z.string(),
}),
),
steps: z.array(z.string()),
}),
schemaName: 'recipe',
schemaDescription: 'A recipe for lasagna.',
}),
prompt: 'Generate a lasagna recipe.',
});
console.log(JSON.stringify(result.output, null, 2));
Note that strict mode restricts the supported schema features; for example, optional schema properties are not supported.
You need to change Zod .nullish() and .optional() to .nullable().
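For example, a schema sketch adapting an optional property for strict mode:

```ts
import { z } from 'zod';

// Not supported with strict structured outputs:
const loose = z.object({ nickname: z.string().optional() });

// Strict-mode compatible equivalent:
const strict = z.object({ nickname: z.string().nullable() });
```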
OpenAI provides logprobs information for completion/chat models.
You can access it in the providerMetadata object.
import { openai, type OpenAILanguageModelChatOptions } from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai.chat('gpt-5'),
prompt: 'Write a vegetarian lasagna recipe for 4 people.',
providerOptions: {
openai: {
// this can also be a number,
// refer to logprobs provider options section for more
logprobs: true,
} satisfies OpenAILanguageModelChatOptions,
},
});
const openaiMetadata = (await result.providerMetadata)?.openai;
const logprobs = openaiMetadata?.logprobs;
The OpenAI Chat API supports image inputs for appropriate models. You can pass image files as part of the message content using the 'image' type:
const result = await generateText({
model: openai.chat('gpt-5'),
messages: [
{
role: 'user',
content: [
{
type: 'text',
text: 'Please describe the image.',
},
{
type: 'image',
image: readFileSync('./data/image.png'),
},
],
},
],
});
The model will have access to the image and will respond to questions about it.
The image should be passed using the image field.
You can also pass the URL of an image.
{
type: 'image',
image: 'https://sample.edu/image.png',
}
The OpenAI Chat API supports reading PDF files.
You can pass PDF files as part of the message content using the file type:
const result = await generateText({
model: openai.chat('gpt-5'),
messages: [
{
role: 'user',
content: [
{
type: 'text',
text: 'What is an embedding model?',
},
{
type: 'file',
data: readFileSync('./data/ai.pdf'),
mediaType: 'application/pdf',
filename: 'ai.pdf', // optional
},
],
},
],
});
The model will have access to the contents of the PDF file and
respond to questions about it.
The PDF file should be passed using the data field,
and the mediaType should be set to 'application/pdf'.
You can also pass a file-id from the OpenAI Files API.
{
type: 'file',
data: 'file-8EFBcWHsQxZV7YGezBC1fq',
mediaType: 'application/pdf',
}
You can also pass the URL of a PDF.
{
type: 'file',
data: 'https://sample.edu/example.pdf',
mediaType: 'application/pdf',
filename: 'ai.pdf', // optional
}
OpenAI supports predicted outputs for gpt-4o and gpt-4o-mini.
Predicted outputs help you reduce latency by allowing you to specify a base text that the model should modify.
You can enable predicted outputs by adding the prediction option to the providerOptions.openai object:
const result = streamText({
model: openai.chat('gpt-5'),
messages: [
{
role: 'user',
content: 'Replace the Username property with an Email property.',
},
{
role: 'user',
content: existingCode,
},
],
providerOptions: {
openai: {
prediction: {
type: 'content',
content: existingCode,
},
} satisfies OpenAILanguageModelChatOptions,
},
});
OpenAI provides usage information for predicted outputs (acceptedPredictionTokens and rejectedPredictionTokens).
You can access it in the providerMetadata object.
const openaiMetadata = (await result.providerMetadata)?.openai;
const acceptedPredictionTokens = openaiMetadata?.acceptedPredictionTokens;
const rejectedPredictionTokens = openaiMetadata?.rejectedPredictionTokens;
You can use the openai provider option to set the image input detail to high, low, or auto:
const result = await generateText({
model: openai.chat('gpt-5'),
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'Describe the image in detail.' },
{
type: 'image',
image:
'https://github.com/vercel/ai/blob/main/examples/ai-functions/data/comic-cat.png?raw=true',
// OpenAI specific options - image detail:
providerOptions: {
openai: { imageDetail: 'low' },
},
},
],
},
],
});
OpenAI supports model distillation for some models.
If you want to store a generation for use in the distillation process, you can add the store option to the providerOptions.openai object.
This will save the generation to the OpenAI platform for later use in distillation.
import { openai, type OpenAILanguageModelChatOptions } from '@ai-sdk/openai';
import { generateText } from 'ai';
import 'dotenv/config';
async function main() {
const { text, usage } = await generateText({
model: openai.chat('gpt-4o-mini'),
prompt: 'Who worked on the original macintosh?',
providerOptions: {
openai: {
store: true,
metadata: {
custom: 'value',
},
} satisfies OpenAILanguageModelChatOptions,
},
});
console.log(text);
console.log();
console.log('Usage:', usage);
}
main().catch(console.error);
OpenAI has introduced Prompt Caching for supported models
including gpt-4o and gpt-4o-mini.
You can use the response providerMetadata to access the number of prompt tokens that were a cache hit:

import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
const { text, usage, providerMetadata } = await generateText({
model: openai.chat('gpt-4o-mini'),
prompt: `A 1024-token or longer prompt...`,
});
console.log(`usage:`, {
...usage,
cachedPromptTokens: providerMetadata?.openai?.cachedPromptTokens,
});
To improve cache hit rates, you can manually control caching using the promptCacheKey option:
import { openai, type OpenAILanguageModelChatOptions } from '@ai-sdk/openai';
import { generateText } from 'ai';
const { text, usage, providerMetadata } = await generateText({
model: openai.chat('gpt-5'),
prompt: `A 1024-token or longer prompt...`,
providerOptions: {
openai: {
promptCacheKey: 'my-custom-cache-key-123',
} satisfies OpenAILanguageModelChatOptions,
},
});
console.log(`usage:`, {
...usage,
cachedPromptTokens: providerMetadata?.openai?.cachedPromptTokens,
});
For GPT-5.1 models, you can enable extended prompt caching that keeps cached prefixes active for up to 24 hours:
import { openai, type OpenAILanguageModelChatOptions } from '@ai-sdk/openai';
import { generateText } from 'ai';
const { text, usage, providerMetadata } = await generateText({
model: openai.chat('gpt-5.1'),
prompt: `A 1024-token or longer prompt...`,
providerOptions: {
openai: {
promptCacheKey: 'my-custom-cache-key-123',
promptCacheRetention: '24h', // Extended caching for GPT-5.1
} satisfies OpenAILanguageModelChatOptions,
},
});
console.log(`usage:`, {
...usage,
cachedPromptTokens: providerMetadata?.openai?.cachedPromptTokens,
});
With the gpt-4o-audio-preview model, you can pass audio files to the model.
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
const result = await generateText({
model: openai.chat('gpt-4o-audio-preview'),
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'What is the audio saying?' },
{
type: 'file',
mediaType: 'audio/mpeg',
data: readFileSync('./data/galileo.mp3'),
},
],
},
],
});
You can create models that call the OpenAI completions API using the .completion() factory method.
The first argument is the model id.
Currently only gpt-3.5-turbo-instruct is supported.
const model = openai.completion('gpt-3.5-turbo-instruct');
OpenAI completion models also support some model-specific settings that are not part of the standard call settings. You can pass them as an options argument:
const model = openai.completion('gpt-3.5-turbo-instruct');
await model.doGenerate({
providerOptions: {
openai: {
echo: true, // optional, echo the prompt in addition to the completion
logitBias: {
// optional likelihood for specific tokens
'50256': -100,
},
suffix: 'some text', // optional suffix that comes after a completion of inserted text
user: 'test-user', // optional unique user identifier
} satisfies OpenAILanguageModelCompletionOptions,
},
});
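In typical application code you would pass the model to generateText rather than calling doGenerate directly; a minimal sketch:

```ts
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const { text } = await generateText({
  model: openai.completion('gpt-3.5-turbo-instruct'),
  prompt: 'Write a haiku about autumn.',
});
```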
The following optional provider options are available for OpenAI completion models:
echo: boolean
Echo back the prompt in addition to the completion.
logitBias Record<number, number>
Modifies the likelihood of specified tokens appearing in the completion.
Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
As an example, you can pass {"50256": -100} to prevent the <|endoftext|>
token from being generated.
logprobs boolean | number
Return the log probabilities of the tokens. Including logprobs will increase the response size and can slow down response times. However, it can be useful to better understand how the model is behaving.
Setting to true will return the log probabilities of the tokens that were generated.
Setting to a number will return the log probabilities of the top n tokens that were generated.
suffix string
The suffix that comes after a completion of inserted text.
user string
A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. Learn more.
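These options can also be passed through a standard generateText call via providerOptions. A minimal sketch (the prompt text is illustrative):
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
const { text } = await generateText({
  model: openai.completion('gpt-3.5-turbo-instruct'),
  prompt: 'Write a haiku about autumn.',
  providerOptions: {
    openai: {
      logitBias: { '50256': -100 }, // effectively ban the <|endoftext|> token
      logprobs: 3, // request log probabilities for the top 3 tokens
    },
  },
});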
| Model | Image Input | Audio Input | Object Generation | Tool Usage |
|---|---|---|---|---|
| gpt-5.4-pro | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5.4 | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5.3-chat-latest | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5.2-pro | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5.2-chat-latest | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5.2 | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5.1-codex-mini | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5.1-codex | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5.1-chat-latest | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5.1 | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5-pro | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5 | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5-mini | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5-nano | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5-codex | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-5-chat-latest | <Check size={18} /> | <Cross size={18} /> | <Cross size={18} /> | <Cross size={18} /> |
| gpt-4.1 | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-4.1-mini | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-4.1-nano | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-4o | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-4o-mini | <Check size={18} /> | <Cross size={18} /> | <Check size={18} /> | <Check size={18} /> |
You can create models that call the OpenAI embeddings API
using the .embedding() factory method.
const model = openai.embedding('text-embedding-3-large');
OpenAI embedding models support several additional provider options. You can pass them as an options argument:
import { openai, type OpenAIEmbeddingModelOptions } from '@ai-sdk/openai';
import { embed } from 'ai';
const { embedding } = await embed({
model: openai.embedding('text-embedding-3-large'),
value: 'sunny day at the beach',
providerOptions: {
openai: {
dimensions: 512, // optional, number of dimensions for the embedding
user: 'test-user', // optional unique user identifier
} satisfies OpenAIEmbeddingModelOptions,
},
});
The following optional provider options are available for OpenAI embedding models:
dimensions number
The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
user string
A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. Learn more.
| Model | Default Dimensions | Custom Dimensions |
|---|---|---|
| text-embedding-3-large | 3072 | <Check size={18} /> |
| text-embedding-3-small | 1536 | <Check size={18} /> |
| text-embedding-ada-002 | 1536 | <Cross size={18} /> |
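The same provider options work when embedding many values at once with the embedMany function. A minimal sketch, assuming a text-embedding-3 model (which supports custom dimensions):
import { openai, type OpenAIEmbeddingModelOptions } from '@ai-sdk/openai';
import { embedMany } from 'ai';
const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: ['sunny day at the beach', 'rainy afternoon in the city'],
  providerOptions: {
    openai: {
      dimensions: 512, // only supported by text-embedding-3 and later models
    } satisfies OpenAIEmbeddingModelOptions,
  },
});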
You can create models that call the OpenAI image generation API
using the .image() factory method.
const model = openai.image('dall-e-3');
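You can then generate images with the generateImage function (a minimal sketch; the prompt is illustrative):
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';
const { image } = await generateImage({
  model: openai.image('dall-e-3'),
  prompt: 'A watercolor heron standing in a misty pond at dawn',
});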
OpenAI's gpt-image-1 model supports powerful image editing capabilities. Pass input images via prompt.images to transform, combine, or edit existing images.
Transform an existing image using text prompts:
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFileSync } from 'node:fs';
const imageBuffer = readFileSync('./input-image.png');
const { images } = await generateImage({
model: openai.image('gpt-image-1'),
prompt: {
text: 'Turn the cat into a dog but retain the style of the original image',
images: [imageBuffer],
},
});
Edit specific parts of an image using a mask. Transparent areas in the mask indicate where the image should be edited:
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFileSync } from 'node:fs';
const image = readFileSync('./input-image.png');
const mask = readFileSync('./mask.png'); // Transparent areas = edit regions
const { images } = await generateImage({
model: openai.image('gpt-image-1'),
prompt: {
text: 'A sunlit indoor lounge area with a pool containing a flamingo',
images: [image],
mask: mask,
},
});
Remove the background from an image by setting background to transparent:
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFileSync } from 'node:fs';
const imageBuffer = readFileSync('./input-image.png');
const { images } = await generateImage({
model: openai.image('gpt-image-1'),
prompt: {
text: 'do not change anything',
images: [imageBuffer],
},
providerOptions: {
openai: {
background: 'transparent',
output_format: 'png',
},
},
});
Combine multiple reference images into a single output. gpt-image-1 supports up to 16 input images:
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFileSync } from 'node:fs';
const cat = readFileSync('./cat.png');
const dog = readFileSync('./dog.png');
const owl = readFileSync('./owl.png');
const bear = readFileSync('./bear.png');
const { images } = await generateImage({
model: openai.image('gpt-image-1'),
prompt: {
text: 'Combine these animals into a group photo, retaining the original style',
images: [cat, dog, owl, bear],
},
});
| Model | Sizes |
|---|---|
| gpt-image-1.5 | 1024x1024, 1536x1024, 1024x1536 |
| gpt-image-1-mini | 1024x1024, 1536x1024, 1024x1536 |
| gpt-image-1 | 1024x1024, 1536x1024, 1024x1536 |
| dall-e-3 | 1024x1024, 1792x1024, 1024x1792 |
| dall-e-2 | 256x256, 512x512, 1024x1024 |
You can pass optional providerOptions to the image model. These are subject to change by OpenAI and are model-dependent. For example, gpt-image-1 models support the quality option:
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';
const { image, providerMetadata } = await generateImage({
model: openai.image('gpt-image-1.5'),
prompt: 'A salamander at sunrise in a forest pond in the Seychelles.',
providerOptions: {
openai: { quality: 'high' },
},
});
For more on generateImage() see Image Generation.
OpenAI's image models return additional metadata in the response that can be
accessed via providerMetadata.openai. The following OpenAI-specific metadata
is available:
images Array<object>
Array of image-specific metadata. Each image object may contain:
revisedPrompt string - The revised prompt that was actually used to generate the image (OpenAI may modify your prompt for safety or clarity)
created number - The Unix timestamp (in seconds) of when the image was created
size string - The size of the generated image. One of 1024x1024, 1024x1536, or 1536x1024
quality string - The quality of the generated image. One of low, medium, or high
background string - The background parameter used for the image generation. Either transparent or opaque
outputFormat string - The output format of the generated image. One of png, webp, or jpeg
For more information on the available OpenAI image model options, see the OpenAI API reference.
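As a sketch, this metadata can be read from the generation result like this (which fields are present depends on the model and request, and the cast below assumes the field shapes listed above):
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';
const { image, providerMetadata } = await generateImage({
  model: openai.image('gpt-image-1'),
  prompt: 'An origami fox on a wooden desk',
});
// Sketch: cast the loosely-typed metadata to the documented shape.
const images = providerMetadata?.openai?.images as
  | Array<{ revisedPrompt?: string; size?: string }>
  | undefined;
console.log(images?.[0]?.revisedPrompt); // prompt actually used, if revised
console.log(images?.[0]?.size); // e.g. '1024x1024'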
You can create models that call the OpenAI transcription API
using the .transcription() factory method.
The first argument is the model id e.g. whisper-1.
const model = openai.transcription('whisper-1');
You can also pass additional provider-specific options using the providerOptions argument. For example, supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency.
import { experimental_transcribe as transcribe } from 'ai';
import { openai, type OpenAITranscriptionModelOptions } from '@ai-sdk/openai';
const result = await transcribe({
model: openai.transcription('whisper-1'),
audio: new Uint8Array([1, 2, 3, 4]),
providerOptions: {
openai: { language: 'en' } satisfies OpenAITranscriptionModelOptions,
},
});
To get word-level timestamps, set the granularity to 'word':
import { experimental_transcribe as transcribe } from 'ai';
import { openai, type OpenAITranscriptionModelOptions } from '@ai-sdk/openai';
const result = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: new Uint8Array([1, 2, 3, 4]),
  providerOptions: {
    openai: {
      timestampGranularities: ['word'],
    } satisfies OpenAITranscriptionModelOptions,
  },
});
// Access word-level timestamps
console.log(result.segments); // Array of segments with startSecond/endSecond
The following provider options are available:
timestampGranularities string[]
The granularity of the timestamps in the transcription.
Defaults to ['segment'].
Possible values are ['word'], ['segment'], and ['word', 'segment'].
Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
language string The language of the input audio. Supplying the input language in ISO-639-1 format (e.g. 'en') will improve accuracy and latency. Optional.
prompt string An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. Optional.
temperature number The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. Defaults to 0. Optional.
include string[] Additional information to include in the transcription response.
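As a sketch, several of these options can be combined in a single call (the audio bytes and prompt text are placeholders):
import { experimental_transcribe as transcribe } from 'ai';
import { openai, type OpenAITranscriptionModelOptions } from '@ai-sdk/openai';
const result = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: new Uint8Array([1, 2, 3, 4]), // placeholder audio bytes
  providerOptions: {
    openai: {
      language: 'en', // ISO-639-1 code for the input language
      prompt: 'A lecture about the Galileo mission.', // placeholder style prompt
      temperature: 0.2,
    } satisfies OpenAITranscriptionModelOptions,
  },
});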
| Model | Transcription | Duration | Segments | Language |
|---|---|---|---|---|
| whisper-1 | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
| gpt-4o-mini-transcribe | <Check size={18} /> | <Cross size={18} /> | <Cross size={18} /> | <Cross size={18} /> |
| gpt-4o-transcribe | <Check size={18} /> | <Cross size={18} /> | <Cross size={18} /> | <Cross size={18} /> |
You can create models that call the OpenAI speech API
using the .speech() factory method.
The first argument is the model id e.g. tts-1.
const model = openai.speech('tts-1');
The voice argument can be set to one of OpenAI's available voices: alloy, ash, coral, echo, fable, onyx, nova, sage, or shimmer.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
model: openai.speech('tts-1'),
text: 'Hello, world!',
voice: 'alloy', // OpenAI voice ID
});
You can also pass additional provider-specific options using the providerOptions argument:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai, type OpenAISpeechModelOptions } from '@ai-sdk/openai';
const result = await generateSpeech({
model: openai.speech('tts-1'),
text: 'Hello, world!',
voice: 'alloy',
providerOptions: {
openai: {
speed: 1.2,
} satisfies OpenAISpeechModelOptions,
},
});
instructions string
Control the voice of your generated audio with additional instructions e.g. "Speak in a slow and steady tone".
Does not work with tts-1 or tts-1-hd.
Optional.
speed number The speed of the generated audio. Select a value from 0.25 to 4.0. Defaults to 1.0. Optional.
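For example, with a model that supports instructions (see the capability table below), you can steer the delivery of the generated audio. A minimal sketch:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai, type OpenAISpeechModelOptions } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('gpt-4o-mini-tts'),
  text: 'Hello, world!',
  voice: 'coral',
  providerOptions: {
    openai: {
      instructions: 'Speak in a slow and steady tone.',
    } satisfies OpenAISpeechModelOptions,
  },
});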
| Model | Instructions |
|---|---|
| tts-1 | <Cross size={18} /> |
| tts-1-hd | <Cross size={18} /> |
| gpt-4o-mini-tts | <Check size={18} /> |