packages/ai/README.md
Unified LLM API with provider collections, automatic auth resolution, token and cost tracking, and simple context persistence and hand-off to other models mid-session.
Note: This library only includes models that support tool calling (function calling), as this is essential for agentic workflows.
npm install @earendil-works/pi-ai
TypeBox exports are re-exported from @earendil-works/pi-ai: Type, Static, and TSchema.
You build a Models collection of providers and stream through it. The quickest start registers every built-in provider; apps that care about bundle size register individual providers instead (see Provider Factories and Bundling and Tree Shaking).
import { Type, type Context, type Tool } from '@earendil-works/pi-ai';
import { builtinModels } from '@earendil-works/pi-ai/providers/all';
// A Models collection with every built-in provider registered
const models = builtinModels();
// Sync lookup against the collection
const model = models.getModel('openai', 'gpt-4o-mini')!;
// Define tools with TypeBox schemas for type safety and validation
const tools: Tool[] = [{
name: 'get_time',
description: 'Get the current time',
parameters: Type.Object({
timezone: Type.Optional(Type.String({ description: 'Optional timezone (e.g., America/New_York)' }))
})
}];
// Build a conversation context (easily serializable and transferable between models)
const context: Context = {
systemPrompt: 'You are a helpful assistant.',
messages: [{ role: 'user', content: 'What time is it?', timestamp: Date.now() }],
tools
};
// Option 1: Streaming with all event types.
// Auth resolves through the provider (OPENAI_API_KEY from the environment here).
const s = models.stream(model, context);
for await (const event of s) {
switch (event.type) {
case 'start':
console.log(`Starting with ${event.partial.model}`);
break;
case 'text_start':
console.log('\n[Text started]');
break;
case 'text_delta':
process.stdout.write(event.delta);
break;
case 'text_end':
console.log('\n[Text ended]');
break;
case 'thinking_start':
console.log('[Model is thinking...]');
break;
case 'thinking_delta':
process.stdout.write(event.delta);
break;
case 'thinking_end':
console.log('[Thinking complete]');
break;
case 'toolcall_start':
console.log(`\n[Tool call started: index ${event.contentIndex}]`);
break;
case 'toolcall_delta': {
// Partial tool arguments are being streamed.
// The braces scope the const declaration to this case.
const partialCall = event.partial.content[event.contentIndex];
if (partialCall.type === 'toolCall') {
console.log(`[Streaming args for ${partialCall.name}]`);
}
break;
}
case 'toolcall_end':
console.log(`\nTool called: ${event.toolCall.name}`);
console.log(`Arguments: ${JSON.stringify(event.toolCall.arguments)}`);
break;
case 'done':
console.log(`\nFinished: ${event.reason}`);
break;
case 'error':
console.error(`Error: ${event.error.errorMessage}`);
break;
}
}
// Get the final message after streaming, add it to the context
const finalMessage = await s.result();
context.messages.push(finalMessage);
// Handle tool calls if any
const toolCalls = finalMessage.content.filter(b => b.type === 'toolCall');
for (const call of toolCalls) {
const result = call.name === 'get_time'
? new Date().toLocaleString('en-US', {
timeZone: call.arguments.timezone || 'UTC',
dateStyle: 'full',
timeStyle: 'long'
})
: 'Unknown tool';
// Add tool result to context (supports text and images)
context.messages.push({
role: 'toolResult',
toolCallId: call.id,
toolName: call.name,
content: [{ type: 'text', text: result }],
isError: false,
timestamp: Date.now()
});
}
// Continue if there were tool calls
if (toolCalls.length > 0) {
const continuation = await models.complete(model, context);
context.messages.push(continuation);
console.log('After tool execution:', continuation.content);
}
console.log(`Total tokens: ${finalMessage.usage.input} in, ${finalMessage.usage.output} out`);
console.log(`Cost: $${finalMessage.usage.cost.total.toFixed(4)}`);
// Option 2: Get complete response without streaming
const response = await models.complete(model, context);
for (const block of response.content) {
if (block.type === 'text') {
console.log(block.text);
} else if (block.type === 'toolCall') {
console.log(`Tool: ${block.name}(${JSON.stringify(block.arguments)})`);
}
}
Snippets in the rest of this README assume a models collection set up like this (with the relevant providers registered).
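The Quick Start prints usage for a single message. For a whole session, the same fields can be summed over every assistant message in the context. The following is a minimal, self-contained sketch that does not import pi-ai: `Usage`, `SketchMessage`, and `totalUsage` are hypothetical names, and only the `usage.input`, `usage.output`, and `usage.cost.total` fields are taken from the Quick Start.

```typescript
// Assumed shapes for illustration; only usage.input, usage.output, and
// usage.cost.total are taken from the Quick Start above.
interface Usage {
  input: number;
  output: number;
  cost: { total: number };
}

type SketchMessage =
  | { role: "assistant"; usage: Usage }
  | { role: "user" | "toolResult" };

// Sum token counts and cost over every assistant message in a conversation.
function totalUsage(messages: SketchMessage[]): Usage {
  const total: Usage = { input: 0, output: 0, cost: { total: 0 } };
  for (const message of messages) {
    if (message.role !== "assistant") continue; // only assistant messages carry usage
    total.input += message.usage.input;
    total.output += message.usage.output;
    total.cost.total += message.usage.cost.total;
  }
  return total;
}

const sessionTotal = totalUsage([
  { role: "user" },
  { role: "assistant", usage: { input: 120, output: 30, cost: { total: 0.25 } } },
  { role: "toolResult" },
  { role: "assistant", usage: { input: 200, output: 50, cost: { total: 0.5 } } },
]);
console.log(
  `${sessionTotal.input} in, ${sessionTotal.output} out, $${sessionTotal.cost.total.toFixed(4)}`
);
```

With real messages, the same loop would run over `context.messages` after each turn.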
A provider is the runtime unit: it owns its model catalog, its auth (API key resolution, OAuth flows), and its stream behavior. A Models collection holds providers and routes every request to the provider that owns the model.
Providers internally share API implementations (the wire protocols): Anthropic models use anthropic-messages, OpenAI uses openai-responses, while xAI, Groq, Cerebras, OpenRouter, and most others share openai-completions. Mixed-API providers (GitHub Copilot, OpenCode Zen) dispatch per model.
For apps that only need specific providers, there is one factory per built-in provider, each a subpath import that pulls only that provider's catalog:
import { createModels } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
import { openaiProvider } from '@earendil-works/pi-ai/providers/openai';
import { openrouterProvider } from '@earendil-works/pi-ai/providers/openrouter';
import { amazonBedrockProvider } from '@earendil-works/pi-ai/providers/amazon-bedrock';
// ...one module per provider in the Supported Providers list
const models = createModels();
models.setProvider(anthropicProvider());
models.setProvider(openrouterProvider());
Provider factories import their model catalog and a lazy API wrapper. They do not import other providers. With bundler code splitting, SDK implementations (@anthropic-ai/sdk, openai, @google/genai, etc.) stay in lazy chunks loaded on the first request to a model of that API.
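To illustrate the lazy-loading idea, here is a minimal sketch of a wrapper that defers an expensive load until the first request and then reuses it. It is not the library's implementation: `lazy`, `getSdk`, and `firstRequest` are hypothetical names, and a counter stands in for a real SDK import.

```typescript
// Hypothetical helper: wraps an async loader so the heavy module loads once, on first use.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) cached = load(); // concurrent callers share the same in-flight load
    return cached;
  };
}

// In a real bundle the loader would be a dynamic import, e.g. () => import("some-sdk").
// Here a counter stands in for the SDK so the sketch runs on its own.
let loadCount = 0;
const getSdk = lazy(async () => {
  loadCount += 1;
  return { send: (prompt: string) => `echo: ${prompt}` };
});

async function firstRequest(prompt: string): Promise<string> {
  const sdk = await getSdk(); // the load happens here, not at import time
  return sdk.send(prompt);
}
```

Because the loader is only referenced inside a function, a bundler with code splitting can place the imported module in its own chunk. Note that this sketch also caches a failed load; a production wrapper might reset the cache on rejection.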
For apps that want everything (as in Quick Start):
import { builtinModels } from '@earendil-works/pi-ai/providers/all';
const models = builtinModels(); // a Models collection with every built-in provider registered
This imports all catalogs and every built-in provider factory. It is the heavy, explicit entrypoint. builtinModels() accepts the same options as createModels() (credentials, authContext); builtinProviders() returns the provider array if you want to register them on your own collection.
Reads are synchronous and return the last-known lists:
const providers = models.getProviders(); // registered Provider objects
const provider = models.getProvider('anthropic'); // one provider
const all = models.getModels(); // every model across providers
const anthropicModels = models.getModels('anthropic');
const model = models.getModel('anthropic', 'claude-sonnet-4-5');
for (const m of anthropicModels) {
console.log(`${m.id}: ${m.name}`);
console.log(` API: ${m.api}`);
console.log(` Context: ${m.contextWindow} tokens`);
console.log(` Vision: ${m.input.includes('image')}`);
console.log(` Reasoning: ${m.reasoning}`);
}
Dynamically listed models are typed Model<Api>. Narrow with the hasApi() guard when you need API-specific option typing:
import { hasApi } from '@earendil-works/pi-ai';
const m = models.getModel('anthropic', 'claude-sonnet-4-5');
if (m && hasApi(m, 'anthropic-messages')) {
// m: Model<'anthropic-messages'> — stream options fully typed
models.stream(m, context, { thinkingEnabled: true, thinkingBudgetTokens: 2048 });
}
For tooling that wants the generated built-in catalog with full literal typing (provider and model IDs auto-complete), independent of any collection:
import { getBuiltinModel, getBuiltinModels, getBuiltinProviders } from '@earendil-works/pi-ai/providers/all';
const model = getBuiltinModel('openai', 'gpt-4o-mini'); // typed Model<'openai-responses'>
const providers = getBuiltinProviders();
const anthropic = getBuiltinModels('anthropic');
Providers may have dynamic model lists (a llama.cpp server, a live OpenRouter listing). Reads stay sync; fetching is an explicit async verb:
// getModels() returns the last-known list (empty before the first refresh)
await models.refresh('llamacpp'); // fetch one provider's list; rejects on failure
await models.refresh(); // refresh all providers concurrently, best-effort
const fresh = models.getModel('llamacpp', 'qwen3-30b');
refresh() is a no-op for static built-in providers. See createProvider() for building a dynamic provider.
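The split between sync reads and an explicit async refresh can be sketched without the library. The following is a simplified illustration under stated assumptions, not pi-ai's implementation: `MiniCollection`, `ModelInfo`, `DynamicProvider`, and `fetchList` are hypothetical names.

```typescript
// Hypothetical stand-in types for illustration only; not the pi-ai API.
interface ModelInfo { id: string }
interface DynamicProvider {
  id: string;
  // Static providers omit fetchList, which makes refresh a no-op for them.
  fetchList?: () => Promise<ModelInfo[]>;
}

class MiniCollection {
  private providers = new Map<string, DynamicProvider>();
  private lastKnown = new Map<string, ModelInfo[]>();

  setProvider(provider: DynamicProvider, initial: ModelInfo[] = []): void {
    this.providers.set(provider.id, provider);
    this.lastKnown.set(provider.id, initial);
  }

  // Sync read: returns the last-known list and never triggers a fetch.
  getModels(providerId: string): ModelInfo[] {
    return this.lastKnown.get(providerId) ?? [];
  }

  // One provider: rejects on failure. All providers: concurrent and best-effort.
  async refresh(providerId?: string): Promise<void> {
    if (providerId !== undefined) {
      await this.refreshOne(providerId);
      return;
    }
    const ids = Array.from(this.providers.keys());
    // Swallow individual failures so one broken provider does not block the rest.
    await Promise.all(ids.map((id) => this.refreshOne(id).catch(() => undefined)));
  }

  private async refreshOne(providerId: string): Promise<void> {
    const provider = this.providers.get(providerId);
    if (!provider) throw new Error(`Unknown provider: ${providerId}`);
    if (!provider.fetchList) return; // static catalog: nothing to fetch
    this.lastKnown.set(providerId, await provider.fetchList());
  }
}
```

Keeping reads synchronous means UI code can render the last-known list immediately and re-render once a refresh settles.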
Every provider owns its auth: how API keys resolve (stored credentials, environment variables, ambient sources like AWS profiles or gcloud ADC) and, where supported, OAuth login/refresh flows.
When you call models.stream(), the collection resolves auth through the owning provider and merges it into the request. Explicit per-request values always win:
// Resolved through the provider (env var, stored credential, OAuth token):
await models.complete(model, context);
// Explicit key wins over anything the provider would resolve:
await models.complete(model, context, { apiKey: 'sk-explicit' });
You can inspect resolution without making a request — useful for status UIs:
const auth = await models.getAuth(model);
if (auth) {
console.log(`configured via ${auth.source}`); // e.g. "ANTHROPIC_API_KEY", "OAuth", "stored credential"
} else {
console.log('not configured');
}
getAuth() resolves undefined for unconfigured providers and rejects with ModelsError when something is actually broken ("oauth": token refresh failed, credential preserved for re-login; "auth": key resolution or credential store failure). Request paths surface the same failures as stream errors.
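A status UI therefore has three outcomes to render: configured, not configured, and broken. The sketch below shows one way to map them to a label. It takes the lookup as a parameter so it runs on its own; `describeAuth` and `AuthInfo` are hypothetical names, and only the `source` field is taken from the snippet above.

```typescript
// Assumed minimal shape: only `source` is taken from the getAuth() example above.
type AuthInfo = { source: string };

// Map the three getAuth() outcomes to a status line:
// resolved value, resolved undefined, or rejection.
async function describeAuth(
  getAuth: () => Promise<AuthInfo | undefined>
): Promise<string> {
  try {
    const auth = await getAuth();
    return auth ? `configured via ${auth.source}` : "not configured";
  } catch (error) {
    // A rejection means something is actually broken (e.g. a failed token refresh).
    const message = error instanceof Error ? error.message : String(error);
    return `broken: ${message}`;
  }
}
```

With a real collection, the call would look like `describeAuth(() => models.getAuth(model))`.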
Stored credentials (API keys entered interactively, OAuth tokens) live in a CredentialStore — one type-tagged credential per provider. pi-ai ships an in-memory default; apps inject persistent storage:
import { createModels, type CredentialStore } from '@earendil-works/pi-ai';
const models = createModels({ credentials: myFileBackedStore });
// builtinModels() takes the same options:
// const models = builtinModels({ credentials: myFileBackedStore });
The contract is small: read(providerId), modify(providerId, fn) (the only write path — a serialized read-modify-write), and delete(providerId). OAuth token refresh runs inside modify, so concurrent requests and processes cannot double-refresh a rotated token. A stored credential owns its provider: environment variables are only consulted when nothing is stored, and a failed refresh never silently falls back to an env key.
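To make the serialized read-modify-write concrete, here is a minimal in-memory sketch. The method names follow the contract above, but the exact signatures, the `Credential` shape, and the names `SerializedMemoryStore` and `refreshIfExpired` are assumptions made for illustration. It only serializes within one process; a persistent store shared across processes would also need something like a file lock.

```typescript
// Assumed credential shape and method signatures; the real CredentialStore type may differ.
type Credential = { type: string; [key: string]: unknown };
type ModifyFn = (
  current: Credential | undefined
) => Promise<Credential | undefined> | Credential | undefined;

class SerializedMemoryStore {
  private data = new Map<string, Credential>();
  // Tail of the pending write chain per provider; new writes queue behind it.
  private tails = new Map<string, Promise<unknown>>();

  async read(providerId: string): Promise<Credential | undefined> {
    return this.data.get(providerId);
  }

  // The only write path: fn sees the latest value and runs after earlier modify calls finish.
  modify(providerId: string, fn: ModifyFn): Promise<Credential | undefined> {
    const previous = this.tails.get(providerId) ?? Promise.resolve();
    const run = previous.then(async () => {
      const next = await fn(this.data.get(providerId));
      if (next === undefined) this.data.delete(providerId);
      else this.data.set(providerId, next);
      return next;
    });
    // Keep the chain usable even if fn throws, so later writes still run.
    this.tails.set(providerId, run.catch(() => undefined));
    return run;
  }

  async delete(providerId: string): Promise<void> {
    await this.modify(providerId, () => undefined);
  }
}

// Usage sketch: refresh only when the stored token is still the expired one.
let refreshCalls = 0;
function refreshIfExpired(store: SerializedMemoryStore): Promise<Credential | undefined> {
  return store.modify("example", async (current) => {
    if (current && current.expired === false) return current; // already rotated by an earlier call
    refreshCalls += 1; // stands in for a network call to a token endpoint
    return { type: "oauth", token: `token-${refreshCalls}`, expired: false };
  });
}
```

Because the second caller's function runs only after the first has stored the rotated token, two concurrent refresh attempts result in a single refresh.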
API-key credentials use the same discriminator as pi's auth.json and can carry provider-scoped env/config values:
const credential = {
type: 'api_key',
key: '...',
env: {
CLOUDFLARE_ACCOUNT_ID: 'account-id',
CLOUDFLARE_GATEWAY_ID: 'gateway-id'
}
} as const;
Built-in providers resolve these env vars (Node.js; in browsers pass apiKey explicitly):
| Provider | Environment Variable(s) |
|---|---|
| OpenAI | OPENAI_API_KEY |
| Ant Ling | ANT_LING_API_KEY |
| Azure OpenAI | AZURE_OPENAI_API_KEY + AZURE_OPENAI_BASE_URL (e.g. https://{resource}.openai.azure.com) or AZURE_OPENAI_RESOURCE_NAME. Supports *.openai.azure.com and *.cognitiveservices.azure.com; root endpoints auto-normalize to /openai/v1. Optional: AZURE_OPENAI_API_VERSION (default v1), AZURE_OPENAI_DEPLOYMENT_NAME_MAP. |
| Anthropic | ANTHROPIC_API_KEY or ANTHROPIC_OAUTH_TOKEN |
| DeepSeek | DEEPSEEK_API_KEY |
| NVIDIA NIM | NVIDIA_API_KEY |
| Google | GEMINI_API_KEY |
| Vertex AI | GOOGLE_CLOUD_API_KEY or GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT) + GOOGLE_CLOUD_LOCATION + ADC |
| Mistral | MISTRAL_API_KEY |
| Groq | GROQ_API_KEY |
| Cerebras | CEREBRAS_API_KEY |
| Cloudflare AI Gateway | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_GATEWAY_ID |
| Cloudflare Workers AI | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID |
| xAI | XAI_API_KEY |
| Fireworks | FIREWORKS_API_KEY |
| Together AI | TOGETHER_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| Vercel AI Gateway | AI_GATEWAY_API_KEY |
| ZAI Coding Plan (Global) | ZAI_API_KEY |
| ZAI Coding Plan (China) | ZAI_CODING_CN_API_KEY |
| MiniMax (Global) | MINIMAX_API_KEY |
| MiniMax (China) | MINIMAX_CN_API_KEY |
| Moonshot AI / Moonshot AI (China) | MOONSHOT_API_KEY |
| Hugging Face | HF_TOKEN |
| OpenCode Zen / OpenCode Go | OPENCODE_API_KEY |
| Kimi For Coding | KIMI_API_KEY |
| Xiaomi MiMo (API billing) | XIAOMI_API_KEY |
| Xiaomi MiMo Token Plan (China) | XIAOMI_TOKEN_PLAN_CN_API_KEY |
| Xiaomi MiMo Token Plan (Amsterdam) | XIAOMI_TOKEN_PLAN_AMS_API_KEY |
| Xiaomi MiMo Token Plan (Singapore) | XIAOMI_TOKEN_PLAN_SGP_API_KEY |
| GitHub Copilot | COPILOT_GITHUB_TOKEN |
Amazon Bedrock resolves ambient AWS credentials (AWS_PROFILE, access key pairs, AWS_BEARER_TOKEN_BEDROCK, ECS task roles, web identity tokens). Vertex AI resolves either an explicit key or gcloud Application Default Credentials plus project/location.
Tools enable LLMs to interact with external systems. This library uses TypeBox schemas for type-safe tool definitions with automatic validation using TypeBox's built-in validator and value conversion utilities. TypeBox schemas can be serialized and deserialized as plain JSON, making them ideal for distributed systems.
import { Type, type Tool, StringEnum } from '@earendil-works/pi-ai';
// Define tool parameters with TypeBox
const weatherTool: Tool = {
name: 'get_weather',
description: 'Get current weather for a location',
parameters: Type.Object({
location: Type.String({ description: 'City name or coordinates' }),
units: StringEnum(['celsius', 'fahrenheit'], { default: 'celsius' })
})
};
// Note: For Google API compatibility, use StringEnum helper instead of Type.Enum
// Type.Enum generates anyOf/const patterns that Google doesn't support
const bookMeetingTool: Tool = {
name: 'book_meeting',
description: 'Schedule a meeting',
parameters: Type.Object({
title: Type.String({ minLength: 1 }),
startTime: Type.String({ format: 'date-time' }),
endTime: Type.String({ format: 'date-time' }),
attendees: Type.Array(Type.String({ format: 'email' }), { minItems: 1 })
})
};
Tool results use content blocks and can include both text and images:
import { readFileSync } from 'fs';
import type { Context } from '@earendil-works/pi-ai';
const context: Context = {
messages: [{ role: 'user', content: 'What is the weather in London?', timestamp: Date.now() }],
tools: [weatherTool]
};
const response = await models.complete(model, context);
// Check for tool calls in the response
for (const block of response.content) {
if (block.type === 'toolCall') {
// Execute your tool with the arguments
// See "Validating Tool Arguments" section for validation
const result = await executeWeatherApi(block.arguments);
// Add tool result with text content
context.messages.push({
role: 'toolResult',
toolCallId: block.id,
toolName: block.name,
content: [{ type: 'text', text: JSON.stringify(result) }],
isError: false,
timestamp: Date.now()
});
}
}
// Tool results can also include images (for vision-capable models)
const imageBuffer = readFileSync('chart.png');
context.messages.push({
role: 'toolResult',
toolCallId: 'tool_xyz',
toolName: 'generate_chart',
content: [
{ type: 'text', text: 'Generated chart showing temperature trends' },
{ type: 'image', data: imageBuffer.toString('base64'), mimeType: 'image/png' }
],
isError: false,
timestamp: Date.now()
});
During streaming, tool call arguments are progressively parsed as they arrive. This enables real-time UI updates before the complete arguments are available:
const s = models.stream(model, context);
for await (const event of s) {
if (event.type === 'toolcall_delta') {
const toolCall = event.partial.content[event.contentIndex];
// toolCall.arguments contains partially parsed JSON during streaming
// This allows for progressive UI updates
if (toolCall.type === 'toolCall' && toolCall.arguments) {
// BE DEFENSIVE: arguments may be incomplete
// Example: Show file path being written even before content is complete
if (toolCall.name === 'write_file' && toolCall.arguments.path) {
console.log(`Writing to: ${toolCall.arguments.path}`);
// Content might be partial or missing
if (toolCall.arguments.content) {
console.log(`Content preview: ${toolCall.arguments.content.substring(0, 100)}...`);
}
}
}
}
if (event.type === 'toolcall_end') {
// Here toolCall.arguments is complete (but not yet validated)
const toolCall = event.toolCall;
console.log(`Tool completed: ${toolCall.name}`, toolCall.arguments);
}
}
Important notes about partial tool arguments:
- During toolcall_delta events, arguments contains the best-effort parse of partial JSON
- At minimum, arguments will be an empty object {}, never undefined
- Some providers may send a single toolcall_delta event with the full arguments.

When implementing your own tool execution loop, use validateToolCall to validate arguments before passing them to your tools:
import { validateToolCall, type Tool } from '@earendil-works/pi-ai';
const tools: Tool[] = [weatherTool, calculatorTool];
// Stream with the same context that later receives the tool results
const s = models.stream(model, { ...context, tools });
for await (const event of s) {
if (event.type === 'toolcall_end') {
const toolCall = event.toolCall;
try {
// Validate arguments against the tool's schema (throws on invalid args)
const validatedArgs = validateToolCall(tools, toolCall);
const result = await executeMyTool(toolCall.name, validatedArgs);
// ... add tool result to context
} catch (error) {
// Validation failed - return error as tool result so model can retry
context.messages.push({
role: 'toolResult',
toolCallId: toolCall.id,
toolName: toolCall.name,
content: [{ type: 'text', text: error instanceof Error ? error.message : String(error) }],
isError: true,
timestamp: Date.now()
});
}
}
}
All streaming events emitted during assistant message generation:
| Event Type | Description | Key Properties |
|---|---|---|
| start | Stream begins | partial: Initial assistant message structure |
| text_start | Text block starts | contentIndex: Position in content array |
| text_delta | Text chunk received | delta: New text, contentIndex: Position |
| text_end | Text block complete | content: Full text, contentIndex: Position |
| thinking_start | Thinking block starts | contentIndex: Position in content array |
| thinking_delta | Thinking chunk received | delta: New text, contentIndex: Position |
| thinking_end | Thinking block complete | content: Full thinking, contentIndex: Position |
| toolcall_start | Tool call begins | contentIndex: Position in content array |
| toolcall_delta | Tool arguments streaming | delta: JSON chunk, partial.content[contentIndex].arguments: Partial parsed args |
| toolcall_end | Tool call complete | toolCall: Complete validated tool call with id, name, arguments |
| done | Stream complete | reason: Stop reason ("stop", "length", "toolUse"), message: Final assistant message |
| error | Error occurred | reason: Error type ("error" or "aborted"), error: AssistantMessage with partial content |
Streaming events for different content blocks are not guaranteed to be contiguous. Providers may emit deltas for text, thinking, and tool calls in the same upstream chunk, and pi may surface corresponding events interleaved, for example text_start, text_delta, toolcall_start, text_delta, toolcall_delta. Consumers must use contentIndex to associate each delta/end event with its block and must not assume that a block's *_start/*_delta/*_end sequence is uninterrupted by events for other blocks.
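The following sketch shows one way to accumulate interleaved deltas by contentIndex rather than by arrival order. It is self-contained and uses trimmed-down, hypothetical event shapes (`SketchEvent`, `collectBlocks`) modeled on the table above, not the library's actual types.

```typescript
// Hypothetical, trimmed-down event shapes modeled on the event table above.
type SketchEvent =
  | { type: "text_start" | "thinking_start" | "toolcall_start"; contentIndex: number }
  | { type: "text_delta" | "thinking_delta" | "toolcall_delta"; contentIndex: number; delta: string }
  | { type: "text_end" | "thinking_end" | "toolcall_end"; contentIndex: number };

// Accumulate deltas per block, keyed by contentIndex rather than arrival order.
function collectBlocks(events: SketchEvent[]): Map<number, string> {
  const blocks = new Map<number, string>();
  for (const event of events) {
    if ("delta" in event) {
      blocks.set(event.contentIndex, (blocks.get(event.contentIndex) ?? "") + event.delta);
    } else if (!blocks.has(event.contentIndex)) {
      blocks.set(event.contentIndex, ""); // register the block on its start event
    }
  }
  return blocks;
}

// A text block (index 0) interrupted by a tool call (index 1).
const interleaved: SketchEvent[] = [
  { type: "text_start", contentIndex: 0 },
  { type: "text_delta", contentIndex: 0, delta: "Let me " },
  { type: "toolcall_start", contentIndex: 1 },
  { type: "text_delta", contentIndex: 0, delta: "check." },
  { type: "toolcall_delta", contentIndex: 1, delta: '{"city":' },
  { type: "toolcall_delta", contentIndex: 1, delta: '"Paris"}' },
  { type: "text_end", contentIndex: 0 },
  { type: "toolcall_end", contentIndex: 1 },
];

const collected = collectBlocks(interleaved);
console.log(collected.get(0)); // Let me check.
console.log(collected.get(1)); // {"city":"Paris"}
```

A consumer that instead appended every delta to "the current block" would have mixed the tool call JSON into the text here.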
Models with vision capabilities can process images. You can check if a model supports images via the input property. If you pass images to a non-vision model, they are silently ignored.
import { readFileSync } from 'fs';
const model = models.getModel('openai', 'gpt-4o-mini')!;
// Check if model supports images
if (model.input.includes('image')) {
console.log('Model supports vision');
}
const imageBuffer = readFileSync('image.png');
const base64Image = imageBuffer.toString('base64');
const response = await models.complete(model, {
messages: [{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{ type: 'image', data: base64Image, mimeType: 'image/png' }
],
timestamp: Date.now()
}]
});
// Access the response
for (const block of response.content) {
if (block.type === 'text') {
console.log(block.text);
}
}
Image generation uses a separate API surface from text/chat generation, mirroring the chat-side design: an ImagesModels collection holds ImagesProviders, reads are sync, and auth resolves through the owning provider. Image generation is a one-shot API: generateImages() waits for the provider response and returns the final AssistantImages result — do not use the chat/stream APIs for it.
import { builtinImagesModels } from '@earendil-works/pi-ai/providers/all';
// Every built-in image-generation provider; accepts the same options as createModels()
const imagesModels = builtinImagesModels();
const model = imagesModels.getModel('openrouter', 'google/gemini-2.5-flash-image')!;
// Auth resolves through the provider (OPENROUTER_API_KEY here); explicit apiKey wins
const result = await imagesModels.generateImages(model, {
input: [{ type: 'text', text: 'Generate a red circle on a plain white background.' }]
});
for (const block of result.output) {
if (block.type === 'text') {
console.log(block.text);
} else if (block.type === 'image') {
console.log(block.mimeType);
console.log(block.data.substring(0, 32));
}
}
Like the chat side, you can build the collection from parts: createImagesModels({ credentials?, authContext? }), the openrouterImagesProvider() factory from @earendil-works/pi-ai/providers/openrouter-images, and createImagesProvider({ id, auth, models, refreshModels?, api }) for custom image providers (with imagesModels.refresh(provider?) for dynamic lists). Failures never reject — they return an AssistantImages with stopReason: "error". The collection's getAuth(model) works exactly like the chat-side one.
The old global API (getImageModel() / getImageModels() / getImageProviders() / generateImages()) remains available on the compat entrypoint:
import { getImageModel, generateImages } from '@earendil-works/pi-ai/compat';
const model = getImageModel('openrouter', 'google/gemini-2.5-flash-image');
const result = await generateImages(model, {
input: [{ type: 'text', text: 'Generate a red circle on a plain white background.' }]
}, {
apiKey: process.env.OPENROUTER_API_KEY
});
Some models also support image input:
import { readFileSync } from 'fs';
const imageBuffer = readFileSync('input.png');
const result = await imagesModels.generateImages(model, {
input: [
{ type: 'text', text: 'Create a variation of this image with a blue background.' },
{ type: 'image', data: imageBuffer.toString('base64'), mimeType: 'image/png' }
]
});
Check capabilities on the model metadata:
console.log(model.input); // ['text', 'image']
console.log(model.output); // ['image'] or ['image', 'text']
- Image-generation models live in ImagesModels collections, chat models in Models collections; the two are separate surfaces.
- Use generateImages(), not the chat/stream APIs.
- Results are returned in AssistantImages.output and can include both base64-encoded ImageContent blocks and TextContent blocks.
- Check output capabilities via model.output.
- Check input capabilities via model.input.
- Options include apiKey, signal, headers, onPayload, and onResponse, and results may include stopReason, responseId, and usage.

Many models support thinking/reasoning capabilities where they can show their internal thought process. You can check if a model supports reasoning via the reasoning property. If you pass reasoning options to a non-reasoning model, they are silently ignored.
// Many models across providers support thinking/reasoning
const model = models.getModel('anthropic', 'claude-sonnet-4-5')!;
// or models.getModel('openai', 'gpt-5-mini');
// or models.getModel('google', 'gemini-2.5-flash');
// or models.getModel('xai', 'grok-code-fast-1');
// Check if model supports reasoning
if (model.reasoning) {
console.log('Model supports reasoning/thinking');
}
// Use the simplified reasoning option
const response = await models.completeSimple(model, {
messages: [{ role: 'user', content: 'Solve: 2x + 5 = 13', timestamp: Date.now() }]
}, {
reasoning: 'medium' // 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
});
// Access thinking and text blocks
for (const block of response.content) {
if (block.type === 'thinking') {
console.log('Thinking:', block.thinking);
} else if (block.type === 'text') {
console.log('Response:', block.text);
}
}
models.stream()/complete() accept the owning API's full option set. Use hasApi() to narrow a dynamically looked-up model to its API for full option typing:
import { hasApi } from '@earendil-works/pi-ai';
// OpenAI Reasoning (o1, o3, gpt-5)
const openaiModel = models.getModel('openai', 'gpt-5-mini')!;
if (hasApi(openaiModel, 'openai-responses')) {
await models.complete(openaiModel, context, {
reasoningEffort: 'medium',
reasoningSummary: 'detailed' // OpenAI Responses API only
});
}
// Anthropic Thinking
const anthropicModel = models.getModel('anthropic', 'claude-sonnet-4-5')!;
if (hasApi(anthropicModel, 'anthropic-messages')) {
await models.complete(anthropicModel, context, {
thinkingEnabled: true,
thinkingBudgetTokens: 8192 // Optional token limit
});
}
// Google Gemini Thinking
const googleModel = models.getModel('google', 'gemini-2.5-flash')!;
if (hasApi(googleModel, 'google-generative-ai')) {
await models.complete(googleModel, context, {
thinking: {
enabled: true,
budgetTokens: 8192 // -1 for dynamic, 0 to disable
}
});
}
When streaming, thinking content is delivered through specific events:
const s = models.streamSimple(model, context, { reasoning: 'high' });
for await (const event of s) {
switch (event.type) {
case 'thinking_start':
console.log('[Model started thinking]');
break;
case 'thinking_delta':
process.stdout.write(event.delta); // Stream thinking content
break;
case 'thinking_end':
console.log('\n[Thinking complete]');
break;
}
}
Every AssistantMessage includes a stopReason field that indicates how the generation ended:
- "stop" - Normal completion, the model finished its response
- "length" - Output hit the maximum token limit
- "toolUse" - Model is calling tools and expects tool results
- "error" - An error occurred during generation
- "aborted" - Request was cancelled via abort signal

AssistantMessage may also include responseId, a provider-specific upstream response or message identifier when the underlying API exposes one. Do not assume it is always present across providers.
Request failures never throw out of the stream functions: when a request ends with an error (including aborts and tool call validation errors), the streaming API emits an error event and the final message carries the details:
// In streaming
for await (const event of s) {
if (event.type === 'error') {
// event.reason is either "error" or "aborted"
// event.error is the AssistantMessage with partial content
console.error(`Error (${event.reason}):`, event.error.errorMessage);
console.log('Partial content:', event.error.content);
}
}
// The final message will have the error details
const message = await s.result();
if (message.stopReason === 'error' || message.stopReason === 'aborted') {
console.error('Request failed:', message.errorMessage);
// message.content contains any partial content received before the error
// message.usage contains partial token counts and costs
}
Auth failures (no key configured, OAuth refresh failed, unknown provider) surface the same way: as a stream error with stopReason: "error".
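Since failures arrive as a stopReason rather than an exception, an agent loop usually ends up branching on that field. The sketch below shows one possible exhaustive mapping. The five stop reasons come from this README; `NextStep`, `nextStep`, and the chosen follow-up actions are hypothetical and only illustrate the pattern.

```typescript
// The five stop reasons documented in this README.
type StopReason = "stop" | "length" | "toolUse" | "error" | "aborted";

// Hypothetical follow-up actions for an agent loop.
type NextStep = "finish" | "ask-to-continue" | "run-tools" | "report-error" | "keep-partial";

function nextStep(reason: StopReason): NextStep {
  switch (reason) {
    case "stop":
      return "finish"; // normal completion
    case "length":
      return "ask-to-continue"; // output was cut off at the token limit
    case "toolUse":
      return "run-tools"; // execute tool calls, then send tool results back
    case "error":
      return "report-error"; // details are in errorMessage
    case "aborted":
      return "keep-partial"; // partial content can be kept and continued later
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a reason is added.
      const unreachable: never = reason;
      throw new Error(`Unhandled stop reason: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch makes the compiler flag any stop reason that is added later but not handled.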
The abort signal allows you to cancel in-progress requests. Aborted requests have stopReason === 'aborted':
const controller = new AbortController();
// Abort after 2 seconds
setTimeout(() => controller.abort(), 2000);
const s = models.stream(model, {
messages: [{ role: 'user', content: 'Write a long story', timestamp: Date.now() }]
}, {
signal: controller.signal
});
for await (const event of s) {
if (event.type === 'text_delta') {
process.stdout.write(event.delta);
} else if (event.type === 'error') {
// event.reason tells you if it was "error" or "aborted"
console.log(`${event.reason === 'aborted' ? 'Aborted' : 'Error'}:`, event.error.errorMessage);
}
}
// Get results (may be partial if aborted)
const response = await s.result();
if (response.stopReason === 'aborted') {
console.log('Request was aborted:', response.errorMessage);
console.log('Partial content received:', response.content);
console.log('Tokens used:', response.usage);
}
Aborted messages can be added to the conversation context and continued in subsequent requests:
// Annotate as Context so assistant messages can be pushed onto messages later
const context: Context = {
messages: [
{ role: 'user', content: 'Explain quantum computing in detail', timestamp: Date.now() }
]
};
// First request gets aborted after 2 seconds
const controller1 = new AbortController();
setTimeout(() => controller1.abort(), 2000);
const partial = await models.complete(model, context, { signal: controller1.signal });
// Add the partial response to context
context.messages.push(partial);
context.messages.push({ role: 'user', content: 'Please continue', timestamp: Date.now() });
// Continue the conversation
const continuation = await models.complete(model, context);
Use the onPayload callback to inspect the request payload sent to the provider. This is useful for debugging request formatting issues or provider validation errors.
const response = await models.complete(model, context, {
onPayload: (payload) => {
console.log('Provider payload:', JSON.stringify(payload, null, 2));
}
});
The callback is supported by stream, complete, streamSimple, and completeSimple.
createProvider() builds a provider from parts: identity, auth, a model list, and an API implementation. Use it for local inference servers, proxies, or any OpenAI/Anthropic-compatible endpoint:
import { createModels, createProvider, envApiKeyAuth, type Model } from '@earendil-works/pi-ai';
import { openAICompletionsApi } from '@earendil-works/pi-ai/api/openai-completions.lazy';
const ollamaModel: Model<'openai-completions'> = {
id: 'llama-3.1-8b',
name: 'Llama 3.1 8B (Ollama)',
api: 'openai-completions',
provider: 'ollama',
baseUrl: 'http://localhost:11434/v1',
reasoning: false,
input: ['text'],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 32000
};
const ollama = createProvider({
id: 'ollama',
name: 'Ollama',
baseUrl: 'http://localhost:11434/v1',
// Every provider declares auth; keyless local servers resolve as configured with no key.
auth: { apiKey: { name: 'Ollama', resolve: async () => ({ auth: {} }) } },
models: [ollamaModel],
api: openAICompletionsApi(),
});
const models = createModels();
models.setProvider(ollama);
await models.complete(models.getModel('ollama', 'llama-3.1-8b')!, context);
For providers with real keys, envApiKeyAuth(displayName, envVars) gives the standard behavior (stored credential wins, then the first set env var):
const proxy = createProvider({
id: 'my-proxy',
auth: { apiKey: envApiKeyAuth('My proxy API key', ['MY_PROXY_API_KEY']) },
models: [/* ... */],
api: openAICompletionsApi(),
});
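The precedence described in this README (an explicit per-request key first, then a stored credential, then the first set environment variable) can be sketched as a plain function. This is an illustration under stated assumptions, not the body of envApiKeyAuth: `resolveApiKey`, `ResolvedAuth`, and the "explicit" source label are hypothetical, and the environment is passed in as an object so the sketch runs anywhere.

```typescript
// Hypothetical result shape; `source` mirrors the idea of auth.source shown earlier.
interface ResolvedAuth {
  apiKey: string;
  source: string;
}

// Resolution order: explicit key, then stored credential, then the first set env var.
function resolveApiKey(
  explicitKey: string | undefined,
  storedKey: string | undefined,
  envVars: string[],
  env: Record<string, string | undefined>
): ResolvedAuth | undefined {
  if (explicitKey) return { apiKey: explicitKey, source: "explicit" };
  // A stored credential owns the provider: env vars are not consulted when one exists.
  if (storedKey) return { apiKey: storedKey, source: "stored credential" };
  for (const name of envVars) {
    const value = env[name];
    if (value) return { apiKey: value, source: name };
  }
  return undefined; // not configured
}

const resolved = resolveApiKey(undefined, undefined, ["MY_PROXY_API_KEY"], {
  MY_PROXY_API_KEY: "sk-from-env",
});
console.log(resolved?.source); // MY_PROXY_API_KEY
```

In Node.js the last argument would typically be `process.env`.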
Mixed-API providers pass a map keyed by model.api; each model dispatches to its API's implementation:
import { anthropicMessagesApi } from '@earendil-works/pi-ai/api/anthropic-messages.lazy';
import { openAIResponsesApi } from '@earendil-works/pi-ai/api/openai-responses.lazy';
const gateway = createProvider({
id: 'my-gateway',
auth: { apiKey: envApiKeyAuth('Gateway key', ['GATEWAY_API_KEY']) },
models: [/* models with api: 'anthropic-messages' or 'openai-responses' */],
api: {
'anthropic-messages': anthropicMessagesApi(),
'openai-responses': openAIResponsesApi(),
},
});
Dynamic model lists use refreshModels; the provider's model list stays empty until the first models.refresh():
const llamacpp = createProvider({
id: 'llamacpp',
auth: { apiKey: { name: 'llama.cpp', resolve: async () => ({ auth: {} }) } },
models: [],
refreshModels: async () => fetchModelsFromServer('http://localhost:8080'),
api: openAICompletionsApi(),
});
models.setProvider(llamacpp);
await models.refresh('llamacpp');
Custom models can carry headers (e.g. proxies behind bot detection) and compat flags — see OpenAI Compatibility Settings.
Some OpenAI-compatible servers do not understand the developer role used for reasoning-capable models. For those providers, set compat.supportsDeveloperRole to false so the system prompt is sent as a system message instead. If the server also does not support reasoning_effort, set compat.supportsReasoningEffort to false too. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.
Use model-level thinkingLevelMap to describe model-specific thinking controls. Keys are pi thinking levels (off, minimal, low, medium, high, xhigh). Missing keys use provider defaults, string values are sent to the provider, and null marks a level unsupported.
const ollamaReasoningModel: Model<'openai-completions'> = {
id: 'gpt-oss:20b',
name: 'GPT-OSS 20B (Ollama)',
api: 'openai-completions',
provider: 'ollama',
baseUrl: 'http://localhost:11434/v1',
reasoning: true,
input: ['text'],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 131072,
maxTokens: 32000,
thinkingLevelMap: {
minimal: null,
low: null,
medium: null,
high: 'high',
xhigh: null,
},
compat: {
supportsDeveloperRole: false,
supportsReasoningEffort: false,
}
};
The API implementations are importable on their own. Each module exports exactly stream and streamSimple with that API's full option typing. Direct calls bypass provider auth — pass apiKey explicitly:
import { stream } from '@earendil-works/pi-ai/api/anthropic-messages';
const s = stream(claudeModel, context, {
apiKey: process.env.ANTHROPIC_API_KEY,
thinkingEnabled: true,
thinkingBudgetTokens: 2048,
});
Built-in API implementations live under ./api/<api-id>:
| API id | Options type |
|---|---|
anthropic-messages | AnthropicOptions |
openai-completions | OpenAICompletionsOptions |
openai-responses | OpenAIResponsesOptions |
openai-codex-responses | OpenAICodexResponsesOptions |
azure-openai-responses | AzureOpenAIResponsesOptions |
google-generative-ai | GoogleOptions |
google-vertex | GoogleVertexOptions |
mistral-conversations | MistralOptions |
bedrock-converse-stream | BedrockOptions |
Importing an implementation module loads its SDK. The ./api/<id>.lazy wrappers (used by the provider factories) defer that load to the first request when the runtime or bundler supports dynamic import chunking. Legacy raw API subpaths from older releases (./anthropic, ./google, ./mistral, ./openai-completions, ...) were removed; use @earendil-works/pi-ai/api/<api-id>.
The openai-completions API is implemented by many providers with minor differences. By default, the library auto-detects compatibility settings based on baseUrl for a small set of known OpenAI-compatible providers (Cerebras, xAI, Chutes, DeepSeek, NVIDIA NIM, Together AI, zAi, OpenCode, Cloudflare Workers AI, etc.). For custom proxies or unknown endpoints, you can override these settings via the compat field. For openai-responses models, the compat field supports Responses-specific flags.
interface OpenAICompletionsCompat {
supportsStore?: boolean; // Whether provider supports the `store` field (default: true)
supportsDeveloperRole?: boolean; // Whether provider supports `developer` role vs `system` (default: true)
supportsReasoningEffort?: boolean; // Whether provider supports `reasoning_effort` (default: true)
supportsUsageInStreaming?: boolean; // Whether provider supports `stream_options: { include_usage: true }` (default: true)
supportsStrictMode?: boolean; // Whether provider supports `strict` in tool definitions (default: true)
sendSessionAffinityHeaders?: boolean; // Whether to send `session_id`, `x-client-request-id`, and `x-session-affinity` from `sessionId` when caching is enabled (default: false)
maxTokensField?: 'max_completion_tokens' | 'max_tokens'; // Which field name to use (default: max_completion_tokens)
requiresToolResultName?: boolean; // Whether tool results require the `name` field (default: false)
requiresAssistantAfterToolResult?: boolean; // Whether tool results must be followed by an assistant message (default: false)
requiresThinkingAsText?: boolean; // Whether thinking blocks must be converted to text (default: false)
requiresReasoningContentOnAssistantMessages?: boolean; // Whether all replayed assistant messages must include empty reasoning_content when reasoning is enabled (default: auto-detected for DeepSeek)
thinkingFormat?: 'openai' | 'openrouter' | 'deepseek' | 'together' | 'zai' | 'qwen' | 'chat-template' | 'qwen-chat-template' | 'string-thinking' | 'ant-ling'; // Format for reasoning param: 'openai' uses reasoning_effort, 'openrouter' uses reasoning: { effort }, 'deepseek' uses thinking: { type } plus reasoning_effort when supported, 'together' uses reasoning: { enabled } plus reasoning_effort when supported, 'zai' uses thinking: { type }, 'qwen' uses enable_thinking, 'chat-template' uses configurable chat_template_kwargs, 'qwen-chat-template' uses chat_template_kwargs.enable_thinking and preserve_thinking, 'string-thinking' uses top-level thinking, 'ant-ling' uses reasoning: { effort } only for mapped efforts (default: openai)
chatTemplateKwargs?: Record<string, string | number | boolean | null | { '$var': 'thinking.enabled' | 'thinking.effort'; omitWhenOff?: boolean }>; // chat_template_kwargs values; use $var for pi-controlled thinking values
cacheControlFormat?: 'anthropic'; // Anthropic-style cache_control on system prompt, last tool, and last user/assistant text content
openRouterRouting?: OpenRouterRouting; // OpenRouter routing preferences (default: {})
vercelGatewayRouting?: VercelGatewayRouting; // Vercel AI Gateway routing preferences (default: {})
}
interface OpenAIResponsesCompat {
supportsDeveloperRole?: boolean; // Whether provider supports `developer` role vs `system` (default: true)
sendSessionIdHeader?: boolean; // Whether to send `session_id` from `sessionId` when caching is enabled (default: true)
supportsLongCacheRetention?: boolean; // Whether provider supports `prompt_cache_retention: "24h"` (default: true)
}
If compat is not set, the library falls back to URL-based detection. If compat is partially set, unspecified fields use the detected defaults. This is useful for:
store fieldfauxProvider() builds an in-memory provider with scripted responses for tests and demos:
import {
createModels,
fauxAssistantMessage,
fauxProvider,
fauxText,
fauxThinking,
fauxToolCall,
} from '@earendil-works/pi-ai';
const faux = fauxProvider({
tokensPerSecond: 50 // optional
});
const models = createModels();
models.setProvider(faux.provider);
const model = faux.getModel();
const context = {
messages: [{ role: 'user', content: 'Summarize package.json and then call echo', timestamp: Date.now() }]
};
faux.setResponses([
fauxAssistantMessage([
fauxThinking('Need to inspect package metadata first.'),
fauxToolCall('echo', { text: 'package.json' })
], { stopReason: 'toolUse' })
]);
const first = await models.complete(model, context, {
sessionId: 'session-1',
cacheRetention: 'short'
});
context.messages.push(first);
context.messages.push({
role: 'toolResult',
toolCallId: first.content.find((block) => block.type === 'toolCall')!.id,
toolName: 'echo',
content: [{ type: 'text', text: 'package.json contents here' }],
isError: false,
timestamp: Date.now()
});
faux.setResponses([
fauxAssistantMessage([
fauxThinking('Now I can summarize the tool output.'),
fauxText('Here is the summary.')
])
]);
const s = models.stream(model, context);
for await (const event of s) {
console.log(event.type);
}
// Optional: multiple faux models for model-switching tests
const multiModel = fauxProvider({
provider: 'faux-multi',
models: [
{ id: 'faux-fast', reasoning: false },
{ id: 'faux-thinker', reasoning: true }
]
});
models.setProvider(multiModel.provider);
const thinker = multiModel.getModel('faux-thinker');
console.log(thinker?.reasoning);
console.log(faux.getPendingResponseCount());
console.log(faux.state.callCount);
Notes:
errorMessage: "No more faux responses queued".faux.setResponses([...]) to replace the remaining queue and faux.appendResponses([...]) to add more responses.faux.models exposes all faux models. faux.getModel() returns the first one, and faux.getModel(id) returns a specific one.fauxAssistantMessage(...) for scripted assistant replies. Use fauxText(...), fauxThinking(...), and fauxToolCall(...) to build content blocks without filling in low-level fields manually.sessionId is present and cacheRetention is not "none", prompt cache reads and writes are simulated automatically.toolcall_delta chunks.tokensPerSecond to pace chunk delivery in real time.provider ids.The library supports seamless handoffs between different LLM providers within the same conversation. This allows you to switch models mid-conversation while preserving context, including thinking blocks, tool calls, and tool results.
When messages from one provider are sent to a different provider, the library automatically transforms them for compatibility:
<thinking> tagsimport { createModels, type Context } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
import { openaiProvider } from '@earendil-works/pi-ai/providers/openai';
import { googleProvider } from '@earendil-works/pi-ai/providers/google';
const models = createModels();
models.setProvider(anthropicProvider());
models.setProvider(openaiProvider());
models.setProvider(googleProvider());
const context: Context = { messages: [] };
// Start with Claude
const claude = models.getModel('anthropic', 'claude-sonnet-4-5')!;
context.messages.push({ role: 'user', content: 'What is 25 * 18?', timestamp: Date.now() });
context.messages.push(await models.completeSimple(claude, context, { reasoning: 'medium' }));
// Switch to GPT-5 - it will see Claude's thinking as <thinking> tagged text
const gpt5 = models.getModel('openai', 'gpt-5-mini')!;
context.messages.push({ role: 'user', content: 'Is that calculation correct?', timestamp: Date.now() });
context.messages.push(await models.complete(gpt5, context));
// Switch to Gemini
const gemini = models.getModel('google', 'gemini-2.5-flash')!;
context.messages.push({ role: 'user', content: 'What was the original question?', timestamp: Date.now() });
const geminiResponse = await models.complete(gemini, context);
All providers can handle messages from other providers — text, tool calls and results (including images), thinking blocks (transformed to tagged text), and aborted messages with partial content. This enables flexible workflows: start with a fast model, switch to a more capable one for complex reasoning, or maintain continuity across provider outages.
The Context object can be easily serialized and deserialized using standard JSON methods, making it simple to persist conversations, implement chat history, or transfer contexts between services:
const context: Context = {
systemPrompt: 'You are a helpful assistant.',
messages: [
{ role: 'user', content: 'What is TypeScript?', timestamp: Date.now() }
]
};
const model = models.getModel('openai', 'gpt-4o-mini')!;
const response = await models.complete(model, context);
context.messages.push(response);
// Serialize the entire context
const serialized = JSON.stringify(context);
// Save to database, localStorage, file, etc.
localStorage.setItem('conversation', serialized);
// Later: deserialize and continue the conversation
const restored: Context = JSON.parse(localStorage.getItem('conversation')!);
restored.messages.push({ role: 'user', content: 'Tell me more about its type system', timestamp: Date.now() });
// Continue with any model
const newModel = models.getModel('anthropic', 'claude-3-5-haiku-20241022')!;
const continuation = await models.complete(newModel, restored);
Models are plain serializable data too — no functions or implementations attached — so persisting "which model was this conversation using" is a JSON.stringify away.
Note: If the context contains images (encoded as base64 as shown in the Image Input section), those will also be serialized.
The library supports browser environments. The core entrypoint and provider factories are side-effect free and bundle cleanly. Environment variables are not available in browsers, so pass API keys explicitly — or inject a CredentialStore (e.g. localStorage-backed) and let provider auth resolve from stored credentials:
import { createModels } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
const models = createModels();
models.setProvider(anthropicProvider());
const model = models.getModel('anthropic', 'claude-3-5-haiku-20241022')!;
const response = await models.complete(model, {
messages: [{ role: 'user', content: 'Hello!', timestamp: Date.now() }]
}, {
apiKey: 'your-api-key'
});
Security Warning: Exposing API keys in frontend code is dangerous. Anyone can extract and abuse your keys. Only use this approach for internal tools or demos. For production applications, use a backend proxy that keeps your API keys secure.
Browser compatibility notes:
bedrock-converse-stream) is not supported in browser environments. It can still appear in model lists; calls fail at runtime.For small bundles, import only the providers you need:
import { createModels } from '@earendil-works/pi-ai';
import { openaiProvider } from '@earendil-works/pi-ai/providers/openai';
const models = createModels();
models.setProvider(openaiProvider());
Rules:
@earendil-works/pi-ai is the core entrypoint and does not import built-in catalogs, provider factories, or SDK implementations.@earendil-works/pi-ai/providers/<provider> imports that provider's catalog and lazy API wrapper only.@earendil-works/pi-ai/providers/all imports every built-in provider factory and all catalogs. Use it only when you want the full built-in set.providers/all includes all statically visible SDKs. Bedrock is the exception: its AWS SDK implementation is loaded through a bundler-opaque Node-only import.@earendil-works/pi-ai/api/<api-id> directly loads that API implementation and its SDK immediately.Avoid @earendil-works/pi-ai/compat in new bundled apps; it preserves the old global API and imports the full built-in catalog surface.
For single-file Node ESM bundles, some SDK dependencies may still use dynamic CommonJS require() internally. If you see errors such as Dynamic require of "child_process" is not supported, add a Node require shim to the bundle. With esbuild:
esbuild app.js --bundle --platform=node --format=esm \
--banner:js='import { createRequire } from "module";const require = createRequire(import.meta.url);' \
--outfile=app.bundle.js
This is only for Node bundles; it is not a browser or Cloudflare Workers workaround.
Bedrock is Node-only. Add it like any other provider:
import { createModels } from '@earendil-works/pi-ai';
import { amazonBedrockProvider } from '@earendil-works/pi-ai/providers/amazon-bedrock';
const models = createModels();
models.setProvider(amazonBedrockProvider());
In normal Node package usage and code-split bundles, Bedrock loads its AWS SDK implementation lazily. For a standalone single-file bundle that must include Bedrock support, register the implementation module explicitly:
import { setBedrockProviderModule } from '@earendil-works/pi-ai/api/bedrock-converse-stream.lazy';
import { bedrockProviderModule } from '@earendil-works/pi-ai/bedrock-provider';
setBedrockProviderModule(bedrockProviderModule);
That explicit override bundles the AWS SDK. Without it, Bedrock's opaque runtime import expects the package's Bedrock implementation file to be available at runtime.
Pass env in stream options to scope provider configuration to a request. Values in env are used before process environment variables for provider auth and configuration such as Cloudflare account IDs, Azure OpenAI settings, Vertex project/location, Bedrock settings, PI_CACHE_RETENTION, and HTTP_PROXY/HTTPS_PROXY.
const models = builtinModels();
const model = models.getModel('cloudflare-ai-gateway', 'workers-ai/@cf/moonshotai/kimi-k2.6')!;
const response = await models.complete(model, context, {
env: {
CLOUDFLARE_API_KEY: '...',
CLOUDFLARE_ACCOUNT_ID: 'account-id',
CLOUDFLARE_GATEWAY_ID: 'gateway-id'
}
});
Use this when one process needs different provider settings per request, or when ambient environment variables should not leak into a provider call.
Several providers support OAuth authentication instead of static API keys:
Each of these providers carries an OAuthAuth on provider.auth.oauth with three operations: login(callbacks) runs the interactive flow and returns a credential, refresh(credential) exchanges the refresh token, and toAuth(credential) derives request auth (GitHub Copilot's per-account base URL comes from here). Refresh is automatic: models.getAuth() and the request paths refresh expired tokens under a credential-store lock, so concurrent requests and processes cannot double-refresh.
import { createModels } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
const models = createModels({ credentials: myStore }); // persistent CredentialStore
models.setProvider(anthropicProvider());
// Login: drive the flow with prompt()/notify() callbacks, persist the credential
const provider = models.getProvider('anthropic')!;
const credential = await provider.auth.oauth!.login({
prompt: async (p) => {
// p.type: 'text' | 'secret' | 'select' | 'manual_code'
// manual_code prompts race a local callback server; p.signal aborts them when the server wins
return await askUser(p.message);
},
notify: (event) => {
// event.type: 'auth_url' | 'device_code' | 'progress'
if (event.type === 'auth_url') console.log(`Open: ${event.url}`);
if (event.type === 'device_code') console.log(`Code: ${event.userCode} at ${event.verificationUri}`);
if (event.type === 'progress') console.log(event.message);
},
});
await myStore.modify('anthropic', async () => credential);
// From here on, requests resolve and refresh the token automatically
const model = models.getModel('anthropic', 'claude-sonnet-4-5')!;
await models.complete(model, context);
// Logout
await myStore.delete('anthropic');
Vertex AI models support either a Google Cloud API key or Application Default Credentials (ADC):
GOOGLE_CLOUD_API_KEY or pass apiKey in the call options.gcloud auth application-default loginGOOGLE_APPLICATION_CREDENTIALS to point to a service account JSON key fileWhen using ADC, also set GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT) and GOOGLE_CLOUD_LOCATION. You can also pass project/location in the call options. When using GOOGLE_CLOUD_API_KEY, project and location are not required.
# Local (uses your user credentials)
gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT="my-project"
export GOOGLE_CLOUD_LOCATION="us-central1"
# CI/Production (service account key file)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
Official docs: Application Default Credentials
The quickest way to authenticate:
npx @earendil-works/pi-ai login # interactive provider selection
npx @earendil-works/pi-ai login anthropic # login to specific provider
npx @earendil-works/pi-ai list # list available providers
Credentials are saved to auth.json in the current directory.
The legacy flow functions remain available via the @earendil-works/pi-ai/oauth entry point (loginAnthropic, loginOpenAICodex, loginGitHubCopilot, refreshOAuthToken, getOAuthApiKey); credential storage is the caller's responsibility there. New code should prefer the provider-owned OAuthAuth shown above — it composes with the credential store and gets locked auto-refresh for free.
Provider notes:
OpenAI Codex: Requires a ChatGPT Plus or Pro subscription. Provides access to GPT-5.x Codex models with extended context windows and reasoning capabilities. The library automatically handles session-based prompt caching when sessionId is provided in stream options. You can set transport in stream options to "sse", "websocket", or "auto" for Codex Responses transport selection. When using WebSocket with a sessionId, connections are reused per session and expire after 5 minutes of inactivity.
Azure OpenAI (Responses): Uses the Responses API only. Set AZURE_OPENAI_API_KEY and either AZURE_OPENAI_BASE_URL or AZURE_OPENAI_RESOURCE_NAME. AZURE_OPENAI_BASE_URL supports both https://<resource>.openai.azure.com and https://<resource>.cognitiveservices.azure.com; root endpoints are normalized to .../openai/v1 automatically. Use AZURE_OPENAI_API_VERSION (defaults to v1) to override the API version if needed. Deployment names are treated as model IDs by default, override with azureDeploymentName or AZURE_OPENAI_DEPLOYMENT_NAME_MAP using comma-separated model-id=deployment pairs (for example gpt-4o-mini=my-deployment,gpt-4o=prod). Legacy deployment-based URLs are intentionally unsupported.
GitHub Copilot: If you get "The requested model is not supported" error, enable the model manually in VS Code: open Copilot Chat, click the model selector, select the model (warning icon), and click "Enable".
Older versions exposed a global API: stream()/complete() dispatching on model.api via a global registry, sync getModel()/getModels()/getProviders() catalog reads, registerApiProvider(), getEnvApiKey(), and per-API lazy stream functions. That surface lives unchanged on the compat entrypoint:
// Before
import { getModel, complete } from '@earendil-works/pi-ai';
// After (verbatim behavior, one import-path change)
import { getModel, complete } from '@earendil-works/pi-ai/compat';
Compat is a strict superset of the root entrypoint, so a file can switch its import path wholesale. It will be removed in a future release; migrate to createModels() + provider factories:
| Old | New |
|---|---|
getModel('openai', 'gpt-4o-mini') | models.getModel('openai', 'gpt-4o-mini') or getBuiltinModel() from providers/all |
getModels('anthropic') / getProviders() | models.getModels('anthropic') / models.getProviders() or getBuiltin* |
stream(model, ctx, opts) (env-key injection) | models.stream(model, ctx, opts) (provider auth resolution) |
registerApiProvider({ api, stream, streamSimple }) | createProvider({ id, auth, models, api }) + models.setProvider() |
getEnvApiKey('openai') | await models.getAuth(model) |
streamAnthropic(model, ctx, opts) | stream from @earendil-works/pi-ai/api/anthropic-messages, or a provider in a collection |
registerFauxProvider() | fauxProvider() + models.setProvider() |
Adding a new LLM provider requires changes across multiple files. The layered layout: API implementations live in src/api/, provider factories in src/providers/, generated catalogs in src/providers/<id>.models.ts. This checklist covers all necessary steps:
src/types.ts)KnownApi (for example "bedrock-converse-stream"), if it is a new APIKnownProvider (for example "amazon-bedrock")ApiOptionsMapsrc/api/<api-id>.ts, only for a new API)Create a new API implementation file (for example bedrock-converse-stream.ts) that exports exactly stream and streamSimple, plus:
StreamOptions (for example BedrockOptions)Context to provider formattext, tool_call, thinking, usage, stop)Add a lazy wrapper src/api/<api-id>.lazy.ts (<name>Api() via lazyApi()) so providers can reference the implementation without importing its SDK. Add any root-level export type re-exports in src/index.ts that should remain available from @earendil-works/pi-ai.
scripts/generate-models.ts, scripts/generate-image-models.ts)Model interface via scripts/generate-models.ts; regeneration emits src/providers/<id>.models.ts and the aggregatorImagesModel interface via scripts/generate-image-models.tssrc/providers/<id>.ts)createProvider() wiring catalog + auth + the lazy API wrapperenvApiKeyAuth for standard key providers, a custom ApiKeyAuth for ambient auth (AWS profiles, ADC), lazyOAuth where an OAuth flow existssrc/providers/all.tssrc/compat.ts and add the package subpath export in package.jsontest/)Create or update test files to cover the new provider:
stream.test.ts - Basic streaming and tool usetokens.test.ts - Token usage reportingabort.test.ts - Request cancellationempty.test.ts - Empty message handlingcontext-overflow.test.ts - Context limit errorsimage-limits.test.ts - Image support (if applicable)unicode-surrogate.test.ts - Unicode handlingtool-call-without-result.test.ts - Orphaned tool callsimage-tool-result.test.ts - Images in tool resultstotal-tokens.test.ts - Token counting accuracycross-provider-handoff.test.ts - Cross-provider context replayproviders.test.ts - Provider listing and auth resolutionFor cross-provider-handoff.test.ts, add at least one provider/model pair. If the provider exposes multiple model families (for example GPT and Claude), add at least one pair per family.
For providers with non-standard auth (AWS, Google Vertex), create a utility like bedrock-utils.ts with credential detection helpers.
../coding-agent/)Update src/core/model-resolver.ts:
DEFAULT_MODELSUpdate src/cli/args.ts:
Update README.md:
Update packages/ai/README.md:
Add an entry to packages/ai/CHANGELOG.md under ## [Unreleased]:
### Added
- Added support for [Provider Name] provider ([#PR](link) by [@author](link))
MIT