packages/llm/DESIGN.md
Discussion draft. This document describes the intended replacement for the current private
@opencode-ai/llmAPI. Names and exact TypeScript signatures are illustrative until implementation, but the domain boundaries and defaults are deliberate.
@opencode-ai/aiLLMThe package name leaves room for future domains such as embeddings, images, and speech. Those domains are not part of this design and should not be forced into the LLM run/turn model.
The API has four layers:
LLM.generate or LLM.stream.LLM.generateTurn or LLM.streamTurn.Normal documentation should teach only the first layer initially.
Provider definitions, configured providers, models, protocols, tools, and hooks are immutable values. Importing a provider does not register anything globally.
Requests, messages, tool definitions, events, usage, and result projections are plain immutable data with schemas. Configured models, executable tools, hooks, and provider definitions may contain functions and Effect requirements and are not serializable.
Defaults should make common calls correct without hiding where behavior comes from. Overrides compose in a documented order and never require patching installed dependencies.
An immutable, declarative description of a provider integration. It owns model selection, option schemas, catalog corrections, protocols, and provider-wide hooks. It is an experimental provider-authoring API.
A provider definition bound to deployment concerns such as credentials, endpoint, transport, and provider headers.
configure(...) is intentionally deployment-only. It does not establish hidden
generation defaults.
A process-local executable model value selected from a configured provider. It contains identity, capabilities, pricing metadata, provider-specific option types, reusable request-behavior defaults, and hidden execution behavior.
Normal users do not need to learn the current Route composite. Protocol,
endpoint, auth, transport, and hooks are bound behind Model.
Portable, model-independent input for a model call. It may contain system instructions, messages, tool definitions, generation controls, output intent, cache policy, and metadata. It does not contain a configured model, executable tool handlers, or hooks.
Exactly one request to a model provider and its normalized response. It does not execute local tools or continue the conversation.
A complete interaction consisting of one or more provider turns. A run executes local tools, appends their results, and continues until the model completes or a stopping condition matches.
The result of exactly one provider turn.
The result of a complete model run. It preserves every turn, tool activity, aggregate usage, and estimated cost while exposing shortcuts to the final output.
The provider-wire contract that lowers portable requests into provider-native bodies and raises provider-native stream events into normalized turn events. Protocols are public, reusable, fully inspectable, and immutably patchable, but the entire protocol-authoring API is experimental.
import { Effect } from "effect"
import { LLM } from "@opencode-ai/ai"
import { OpenAI } from "@opencode-ai/ai/providers/openai"
// Environment-based credentials are a provider default. No LLMClient layer is
// required: the Effect exposes standard runtime dependencies directly.
const model = OpenAI.model("gpt-4.1-mini")
const program = Effect.gen(function* () {
const result = yield* LLM.generate({
model,
system: "You are concise.",
prompt: "Explain Effect in one sentence.",
})
// `generate` always returns GenerateResult, even when the run has one turn.
console.log(result.text)
console.log(result.turns.length) // 1
console.log(result.usage)
console.log(result.cost) // Estimated cost, or undefined if any turn is unpriced.
})
The required Effect environment should contain standard services plus services
required by tools and hooks. It should not contain an LLMClient wrapper service.
The current README appears similarly small but omits the package-specific service and layer required at runtime:
// Current API: this request contains an executable model/route value.
const request = LLM.request({
model: OpenAI.configure({ apiKey }).responses("gpt-4o-mini"),
prompt: "Say hello.",
})
// Current API: this performs one provider turn, despite the broad name.
const response = yield * LLM.generate(request)
// Current API: execution also needs LLMClient.layer and RequestExecutor services.
The proposal removes mandatory request construction, removes package-specific
runtime provisioning, and makes generate mean a complete run.
import { OpenAI } from "@opencode-ai/ai/providers/openai"
// Open strings receive autocomplete for IDs from the generated models.dev
// snapshot but continue to accept newly released and fine-tuned model IDs.
const model = OpenAI.model("gpt-4.1-mini")
const openai = OpenAI.configure({
apiKey,
baseURL: "https://gateway.example.com/openai/v1",
headers: {
"x-tenant": "acme",
},
})
const model = openai.model("gpt-4.1-mini")
configure(...) owns deployment concerns only:
It does not own temperature, maximum output tokens, cache policy, retry policy, tools, output schema, or system instructions.
const model = OpenAI.model("gpt-4.1-mini", {
generation: {
temperature: 0.2,
maxTokens: 2_000,
},
cache: "auto",
provider: {
store: false,
},
})
The second argument may default request behavior but not prompt/history or executable tools. Call-level values override model defaults.
Provider-specific options are inferred from the concrete model:
yield *
LLM.generate({
model: OpenAI.model("gpt-4.1-mini"),
prompt: "Hello",
provider: {
store: false,
// OpenAI-specific autocomplete here; no `{ openai: ... }` nesting.
},
})
Code choosing between providers dynamically must narrow the model before using provider-specific options. Portable generation controls remain available without narrowing.
// Current API mixes deployment configuration and reusable request behavior.
const model = OpenAI.configure({
apiKey,
generation: { maxTokens: 160 },
providerOptions: {
openai: { store: false },
},
}).model("gpt-4o-mini")
The proposal separates deployment configuration from selected-model behavior and removes provider-keyed option bags when a concrete model already identifies the provider.
const result =
yield *
LLM.generate({
model,
system: "You are concise.",
prompt: "Summarize this pull request.",
generation: { maxTokens: 500 },
})
const request = LLM.request({
system: "You are concise.",
prompt: "Summarize this pull request.",
generation: { maxTokens: 500 },
})
// Bind process-local execution behavior only when running.
const result = yield * LLM.generate({ model, request })
LLM.request(...) returns a plain immutable object. Use ordinary object spread
to derive another request:
const longer = {
...request,
generation: {
...request.generation,
maxTokens: 1_000,
},
}
There is no LLM.updateRequest(...) helper and no request Schema class.
import { Message } from "@opencode-ai/ai"
const request = LLM.request({
system: "You are concise.",
messages: [
Message.user("What is Effect?"),
Message.assistant("A TypeScript library for typed functional effects."),
Message.user("Why would I use it?"),
],
})
Message helpers return plain immutable data. Object literals remain valid when they satisfy the same input type.
system stays separate from chronological messages because it is the initial
privileged instruction. A chronological system message represents an instruction
change at a specific point in history.
import { Effect, Schema } from "effect"
import { LLM, Tool } from "@opencode-ai/ai"
const tools = {
getWeather: Tool.make({
description: "Get current weather for a city.",
parameters: Schema.Struct({ city: Schema.String }),
success: Schema.Struct({ forecast: Schema.String }),
// Tool service requirements and typed errors flow into LLM.generate's
// Effect environment/error model instead of being erased.
execute: ({ city }) => Weather.get(city),
// Expected domain failures need an explicit model-visible representation.
formatError: (error) => ({
type: "text",
text: `Weather lookup failed: ${error.message}`,
}),
}),
}
const result =
yield *
LLM.generate({
model,
prompt: "What is the weather in London?",
tools,
})
// The runtime advertises definitions, dispatches calls, records results, and
// continues provider turns automatically.
console.log(result.text)
console.log(result.turns)
console.log(result.toolExecutions)
The default stopping condition is equivalent to:
stopWhen: StopWhen.turnCount(20)
This matches the Vercel AI SDK ToolLoopAgent default. Reaching the limit is a
successful result with stopReason: "max-turns", not an Effect failure.
const result =
yield *
LLM.generate({
model,
prompt,
tools,
stopWhen: StopWhen.any(StopWhen.turnCount(8), StopWhen.hasToolCall("finalize")),
})
stopWhen accepts one predicate. Composition is explicit through combinators
such as StopWhen.any, StopWhen.all, and StopWhen.not.
Successful run stop reasons are closed:
type RunStopReason = "completed" | "max-turns" | "stop-condition"
Independent tool calls emitted in one turn run concurrently with a bounded, configurable concurrency limit. Results are appended in deterministic emitted order. The runtime does not infer dependencies between tool calls; the model must request dependent calls in separate turns.
Tools may declare an optional timeout. The overall run timeout still applies.
Today callers must manually bridge every layer:
const request = LLM.request({
model,
prompt,
tools: Tool.toDefinitions(tools),
})
const events = yield * LLM.stream(request).pipe(Stream.runCollect)
const call = Array.from(events).find(LLMEvent.is.toolCall)
if (call && !call.providerExecuted) {
const dispatched = yield * ToolRuntime.dispatch(tools, call)
const followUp = LLM.updateRequest(request, {
messages: [...request.messages, Message.assistant([call]), Message.tool({ ...call, result: dispatched.result })],
})
// Caller must invoke the provider again and repeat the loop.
}
That explicit flow remains possible through turn APIs, but it is no longer the only tool experience.
OpenCode and other durable runtimes need to own persistence, tool settlement, and continuation. They use the explicit turn API:
const result =
yield *
LLM.generateTurn({
model,
request,
// Definitions only. generateTurn never dispatches local handlers.
tools: {
getWeather: Tool.definition({
description: "Get current weather for a city.",
parameters: WeatherInput,
}),
},
})
// Persist the TurnResult and settle calls durably before the next turn.
for (const call of result.toolCalls) {
// Application-owned dispatch and persistence.
}
generateTurn and streamTurn make exactly one provider request. They never
execute a local tool and never continue automatically.
This separation is load-bearing:
generate / stream: complete Model RungenerateTurn / streamTurn: one Provider TurnA portable request may declare serializable definitions, but executable handlers are bound at run time:
const request = LLM.request({
prompt: "What is the weather in London?",
tools: {
getWeather: Tool.definition({
description: "Get current weather for a city.",
parameters: WeatherInput,
}),
},
})
const result =
yield *
LLM.generate({
model,
request,
tools: {
getWeather: Tool.make({
description: "Get current weather for a city.",
parameters: WeatherInput,
success: WeatherOutput,
execute: getWeather,
formatError,
}),
},
})
Definitions and handlers match by record key. Before the first provider call, the runtime validates that every local definition has a compatible executable binding. Missing or incompatible bindings fail with a typed tool-binding error.
Provider-hosted tools are distinct typed values:
const result =
yield *
LLM.generate({
model: OpenAI.model("gpt-4.1"),
prompt: "Find today's relevant announcements.",
tools: {
search: OpenAI.tool.webSearch({ searchContextSize: "medium" }),
},
})
Hosted tools do not pretend to have local handlers, and callers do not inspect a
providerExecuted boolean to decide whether dispatch is safe.
LLM.stream returns an Effect Stream<RunEvent, LLMError, Requirements>.
Run events explicitly expose orchestration boundaries:
const program = LLM.stream({ model, prompt, tools }).pipe(
Stream.tap((event) =>
Effect.sync(() => {
switch (event.type) {
case "run-start":
break
case "turn-start":
break
case "turn-event":
// Normalized text, reasoning, tool-call, usage, and finish events.
if (event.event.type === "text-delta") {
process.stdout.write(event.event.text)
}
break
case "tool-start":
break
case "tool-finish":
break
case "turn-finish":
break
case "run-finish":
// Contains the same full GenerateResult returned by LLM.generate.
console.log(event.result.usage)
break
}
}),
),
Stream.runDrain,
)
Exact event tag spelling remains an implementation detail to finalize, but the algebra is settled:
RunEvent union for run, turn, and tool lifecycleTurnEvent union for normalized provider outputstreamTurn emits only TurnEventGenerateResultExternal cancellation remains Effect interruption. It does not fabricate a
successful result with an interrupted stop reason.
Structured output is an option on generate, not a separate operation:
const Weather = Schema.Struct({
city: Schema.String,
forecast: Schema.String,
highCelsius: Schema.Number,
})
const result =
yield *
LLM.generate({
model,
prompt: "Give me today's weather for London.",
output: Weather,
})
// Inferred from Weather.
result.output.city
The model declaration and protocol select the best reliable strategy:
Advanced callers may override the strategy when exact provider semantics matter.
// Current API is a separate operation and always forces a synthetic tool.
const result =
yield *
LLM.generateObject({
model,
prompt,
schema: Weather,
})
The proposal unifies generation and lets capabilities choose the strategy rather than permanently encoding one cross-provider workaround.
models.dev is the release-time source for:
The package ships a generated, versioned snapshot. Normal execution performs no catalog network requests.
Provider definitions may correct generated metadata where protocol-specific knowledge is more accurate. Precedence is:
models.dev snapshot
< provider-definition correction
< provider configuration override
< model-selection override
< call override
Unknown model IDs inherit only capabilities guaranteed by the selected protocol. Unsupported request capabilities fail before network execution unless the caller explicitly overrides the model declaration.
GenerateResult aggregates normalized usage across every turn, including cache
read/write usage where providers report it.
It also exposes estimated cost using the generated models.dev pricing snapshot:
result.usage.inputTokens
result.usage.outputTokens
result.usage.cacheReadInputTokens
result.usage.cacheWriteInputTokens
result.cost?.total
result.cost?.currency // e.g. "USD"
Cost is an estimate, not a billing guarantee. If reliable pricing is unavailable for any turn, aggregate run cost is unavailable rather than partial or silently zero. Per-turn metadata should retain the catalog/pricing identity used so an estimate can be explained.
Prompt caching remains "auto" by default. The library places protocol-aware
cache boundaries where explicit caching is supported and does nothing on the wire
where providers cache implicitly.
yield *
LLM.generate({
model,
prompt,
cache: "none", // Explicit opt-out.
})
Granular cache policy remains available as an advanced request option.
The default retry policy is deliberately conservative:
Retry configuration is call-scoped only. Provider and model configuration do not silently inherit custom retry policies.
yield *
LLM.generate({
model,
prompt,
timeout: "2 minutes", // Entire run, including tools.
turnTimeout: "30 seconds", // Each provider turn.
tools,
})
Exact Duration input spelling follows Effect conventions. Individual tools may also declare optional timeouts.
AbortSignal, rejecting with a recognizable abort errorStable high-level hooks exist at five named stages:
Hooks are Effectful. They may transform the stage value or fail with a typed error. They may not secretly short-circuit execution, synthesize a response, retry, or redirect control flow.
const model = OpenAI.model("gpt-4.1", {
hooks: {
request: (request) =>
Effect.succeed({
...request,
metadata: { ...request.metadata, tenant: "acme" },
}),
body: (body, context) => auditBody(body, context),
transport: (request) => signInternalGatewayRequest(request),
event: (event) => redactProviderMetadata(event),
error: (error) => classifyInternalError(error),
},
})
Hook scopes compose in this order:
provider-definition hooks -> model hooks -> call hooks
Each hook sees the prior hook's output. Replacement requires an explicit definition-level patch, not accidental last-writer-wins semantics.
Provider-definition hooks are authored by provider integrations. They are not
passed through Provider.configure(...), which remains deployment-only.
The request customization ladder is:
provider optionsyield *
LLM.generate({
model,
prompt,
http: {
headers: { "x-experimental": "1" },
query: { debug: "true" },
body: { newlyReleasedProviderField: true },
},
})
Raw overlays are intentional last-resort support for provider features that ship before the library has a typed option.
Normalized message/content/event unions remain closed and exhaustive. Unknown or
provider-required round-trip data lives in caller-writable providerMetadata.
const assistant = Message.assistant([
{
type: "reasoning",
text: "...",
providerMetadata: {
openai: {
// Opaque provider data needed for replay or continuation.
},
},
},
])
Protocols validate metadata they consume. The field is an escape hatch, not a portable semantic guarantee.
The Effect error channel is a tagged domain union rather than one LLMError
wrapper with nested reasons. Illustrative categories:
type LLMError =
| AuthenticationError
| InvalidRequestError
| UnsupportedCapabilityError
| ToolBindingError
| TransportError
| ProviderResponseError
| InvalidProviderOutputError
| HookError
Each error retains relevant provider/model/turn/stage context and its underlying cause where available.
Expected tool errors keep their own typed error channel. Tool.make requires an
explicit mapping before such errors become model-visible tool results. Expected
mapped failures let the model recover; defects and interruption fail the run.
The core library emits Effect-native spans and metrics for:
Default telemetry records metadata only:
Prompts, model output, tool arguments, and tool results are never recorded by default. Explicit hooks or telemetry configuration may opt into content capture.
Promise wrappers live at a separate subpath so the root remains unambiguously Effect-first:
import { LLM } from "@opencode-ai/ai/promise"
import { OpenAI } from "@opencode-ai/ai/providers/openai"
const result = await LLM.generate({
model: OpenAI.model("gpt-4.1-mini"),
prompt: "Explain Effect in one sentence.",
signal: abortController.signal,
})
Streaming returns an AsyncIterable<RunEvent>:
for await (const event of LLM.stream({ model, prompt, signal })) {
if (event.type === "turn-event" && event.event.type === "text-delta") {
process.stdout.write(event.event.text)
}
}
Top-level Promise functions use a default runtime for built-in services. Custom Effect service requirements use a configured client:
const client = LLM.makeClient({
layer: Layer.mergeAll(WeatherLive, AuditLive),
})
const result = await client.generate({ model, prompt, tools })
The Promise API mirrors Effect semantics. It does not invent different run, error, stopping, or cancellation behavior.
Schemas live in a dedicated namespace/subpath instead of flooding root exports:
import { LLMSchema } from "@opencode-ai/ai/schema"
const request = yield * Schema.decodeUnknown(LLMSchema.Request)(input)
Schemas cover only serializable domain values:
Configured models, executable tools, hooks, provider definitions, and protocols are process-local behavior and do not receive fake serialization schemas.
Provider authoring is public but experimental.
import { Provider, Protocol } from "@opencode-ai/ai/provider"
export const ExampleAI = Provider.define({
id: "example",
options: ExampleProviderOptions,
configure: configureExampleDeployment,
protocols: {
responses: ExampleResponses,
},
models: ({ deployment, catalog }) => ({
model: (id, defaults) =>
Provider.model({
id,
deployment,
protocol: ExampleResponses,
metadata: catalog.model(id),
defaults,
}),
}),
catalog: generatedExampleCatalog,
corrections: exampleCatalogCorrections,
hooks: exampleProviderHooks,
})
The exact builder fields need implementation design, but it must remain one
declarative immutable object, infer provider option types, and support .with(...)
patching. It must not register globally.
Built-ins export their immutable definition for advanced forking:
import { OpenAI } from "@opencode-ai/ai/providers/openai"
const PatchedOpenAI = OpenAI.definition.with({
protocols: {
responses: OpenAI.protocols.responses.with({
// Explicit immutable stage patch.
body: {
fromRequest: patchResponsesBody,
},
}),
},
})
A protocol exposes all native types and stages:
Every stage is immutably patchable. This is deliberately more open than the AI SDK integrations that motivated this package.
const PatchedResponses = OpenAIResponses.with({
body: {
fromRequest: (request) =>
OpenAIResponses.body.fromRequest(request).pipe(Effect.map((body) => ({ ...body, custom_field: true }))),
},
stream: {
step: patchResponsesStep,
},
})
Protocol body, frame, native event, and parser-state types are exported. Because provider wire formats change often, these types and patch APIs are explicitly experimental and do not receive the high-level API's compatibility promise.
Illustrative export layout:
@opencode-ai/ai
LLM
Message
Tool
StopWhen
stable domain types
@opencode-ai/ai/promise
Promise/AsyncIterable LLM facade
@opencode-ai/ai/schema
serializable domain schemas
@opencode-ai/ai/provider
experimental Provider and Protocol authoring APIs
@opencode-ai/ai/providers/openai
@opencode-ai/ai/providers/anthropic
@opencode-ai/ai/providers/google
...
Providers are imported through individual subpaths. The root does not export all providers, and there is no preferred all-providers barrel.
| Concern | Default |
|---|---|
LLM.generate semantics | Complete Model Run |
LLM.generateTurn semantics | Exactly one Provider Turn |
| Maximum turns | 20 |
| Turn-limit outcome | Successful max-turns result |
| Tool execution | Automatic in runs |
| Tool concurrency | Concurrent, bounded, deterministic result order |
| Prompt caching | auto |
| Retries | Conservative, pre-output transient failures only |
| Structured output | Capability-selected native or tool strategy |
| Capability mismatch | Typed failure before network execution |
| Unknown model capability | Conservative protocol baseline |
| Telemetry content | Metadata only |
| Cost | Estimated aggregate or unavailable |
| Cancellation | Interruption/rejection, never successful completion |
The redesign intentionally removes or changes these current concepts:
| Current | Proposed |
|---|---|
@opencode-ai/llm | @opencode-ai/ai |
Mandatory LLM.request({ model, ... }) | Inline calls or model-free portable requests |
LLM.generate means one turn | LLM.generate means complete run |
LLMClient.generate/stream | LLM.generateTurn/streamTurn for one turn |
LLMClient.layer requirement | Standard Effect requirements exposed directly |
Public Route mental model | Hidden behind executable Model |
Provider.make structural helper | Experimental declarative Provider.define |
| Schema classes as canonical values | Plain immutable values plus schema subpath |
LLM.updateRequest | Object spread |
Tool.toDefinitions in normal calls | Named executable tool records |
Manual ToolRuntime.dispatch loop | Automatic run dispatch; explicit turn API for orchestration |
providerOptions: { openai: ... } | Model-typed provider: ... |
generateObject | Typed output option on generate |
| One event union for provider output | Separate TurnEvent and RunEvent unions |
providerExecuted dispatch check | Distinct hosted-tool constructors |
One wrapped LLMError | Tagged domain error union |
OpenCode should migrate to generateTurn / streamTurn, preserving its durable
prompt admission, persistence, permission, tool settlement, and continuation
boundaries. It should not use the automatic run API for Session orchestration.
These do not reopen the main design:
RunEvent and TurnEvent tag names and payloadsGenerateResult shortcut fields for text, reasoning, output, and messages.with(...) patch syntax and replacement semanticsThese should be resolved with call-site sketches and implementation spikes rather than by changing the domain boundaries above.