sdk/packages/llms/src/catalog/README.md
The generated catalog is the SDK's normalized copy of provider and model
metadata. Most built-in catalog data comes from models.dev
through catalog-live.ts and is written to catalog.generated.ts by the model
generation scripts.
This file documents the intended meaning of the token-limit fields and the boundary between catalog metadata and runtime request policy.
models.dev exposes model limits under limit:
limit.context maximum context budget reported for the model
limit.input optional stricter prompt/input-token budget
limit.output maximum output tokens reported for generation
limit.input is not present for every model. When it is present, it should be
treated as the best available prompt/input ceiling. When it is absent, the
catalog falls back to limit.context for the prompt/input ceiling.
The catalog maps those source fields into ModelInfo:
contextWindow provider-reported context budget
maxInputTokens prompt/input-token budget used by compaction and diagnostics
maxTokens provider-reported output-token budget
These fields are not additive guarantees. In particular, this is valid catalog metadata:
contextWindow: 200000
maxInputTokens: 200000
maxTokens: 128000
That means:
the prompt may be allowed to approach 200000 tokens
the model may be capable of generating up to 128000 tokens
an individual request still needs prompt + output to fit the provider's rules
Do not infer that maxInputTokens + maxTokens must be less than or equal to
contextWindow. Many provider catalogs expose separate maxima that share an
underlying context budget.
The catalog should describe reported model capabilities. It should avoid baking in Cline request defaults, product-level safety limits, or provider-specific workarounds unless there is no better place to represent a stable fact.
The code that sends a model request is responsible for deciding how many output tokens to ask for on that specific turn. That decision can depend on:
request.options.maxTokensFor models where prompt and output tokens both have to fit into the same context window, the request limit should be based on the current prompt, not invented while generating the catalog. Conceptually:
safeOutputTokens = min(
modelReportedMaxOutput,
contextWindow - estimatedPromptTokens - reserveTokens,
userConfiguredOutputCap,
)
The SDK gateway only sends an output token limit when the caller provides
request.options.maxTokens or an equivalent host configuration. Catalog
metadata does not become a request parameter by itself.
The exact request-limit policy belongs in the provider/gateway/core request path, not in generated catalog data.
The catalog generator should not turn ambiguous provider metadata into a new Cline output limit.
For example, if a provider reports:
limit.context = 202800
limit.output = 202800
the generated catalog should preserve that reported output limit:
contextWindow: 202800
maxInputTokens: 202800
maxTokens: 202800
Cline may still ask for fewer output tokens on a given request. That belongs in the request path because only the request path knows the current prompt size and the user's configured output cap.
The generated catalog is produced from the live normalizer:
models.dev/api.json
|
v
src/catalog/catalog-live.ts
|
v
scripts/generate-models.ts
|
v
src/catalog/catalog.generated.ts
Use the package script when regenerating:
bun -F @cline/llms generate:models
Catalog changes should usually include tests in catalog-live.test.ts that
document how source limit fields map to ModelInfo.
Provider token semantics are not uniform:
prompt + requestedOutput > contextWindow.Because of those differences, catalog generation should preserve source metadata as much as possible, while runtime request policy should be conservative and observable.
catalog-live.ts: normalizes live models.dev data.catalog-live.test.ts: tests catalog normalization behavior.catalog.generated.ts: checked-in generated provider/model catalog.../../scripts/generate-models.ts: writes generated catalog output.../providers/ai-sdk.ts: passes maxOutputTokens into AI SDK.../providers/gateway.ts: resolves per-request/default maxTokens.