docs/src/content/docs/guides/customize/sampling.md
Sampling parameters control how the engine selects the next token from the model's probability distribution.
Temperature scales the logit distribution before sampling. Higher temperature flattens it; lower temperature sharpens it.
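To make the flattening and sharpening effect concrete, here is a minimal sketch of temperature scaling over a plain list of logits. The function name `apply_temperature` and the handling of `0.0` as an argmax special case are assumptions for illustration, not the engine's actual code.

```python
import math


def apply_temperature(logits, temperature):
    """Return sampling probabilities for raw logits at a given temperature."""
    if temperature == 0.0:
        # Assumed special case: treat 0.0 as greedy and put all mass on the argmax.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    # Divide logits by the temperature, then apply a softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])
```

Running it prints one distribution per temperature; the top token's share shrinks as the temperature rises.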
- `temperature: 0.0`: greedy. Always picks the most likely token.
- `temperature: 1.0`: matches the model's training distribution.

Top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability exceeds p, then renormalizes.

- `top_p: 1.0`: disables nucleus sampling.

Top-k caps the candidate count at k.

- `top_k: 1`: equivalent to greedy.
- `top_k: 0`: disabled (default when unset).

Min-p scales with the most likely token's probability. The threshold is min_p times the top-token probability; everything below is dropped.
When the model is confident, min-p filters more tokens. When uncertain, it filters fewer.
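The three truncation filters can be sketched on a plain probability list as follows. This is an illustrative sketch, not the engine's implementation: the function names are hypothetical, and the exact boundary behavior (for example, whether top-p stops when the cumulative mass reaches p or strictly exceeds it) is an assumption.

```python
def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]


def top_k_filter(probs, k):
    """Keep the k most likely tokens; k <= 0 is treated as disabled."""
    if k <= 0 or k >= len(probs):
        return list(probs)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose mass reaches p."""
    if p >= 1.0:
        return list(probs)  # 1.0 disables nucleus sampling
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:  # assumed boundary: stop once p is reached
            break
    return renormalize([q if i in keep else 0.0 for i, q in enumerate(probs)])


def min_p_filter(probs, min_p):
    """Drop tokens below min_p times the top-token probability."""
    threshold = min_p * max(probs)
    return renormalize([p if p >= threshold else 0.0 for p in probs])


confident = [0.9, 0.05, 0.03, 0.02]
uncertain = [0.3, 0.25, 0.25, 0.2]
for name, dist in (("confident", confident), ("uncertain", uncertain)):
    survivors = sum(1 for p in min_p_filter(dist, 0.1) if p > 0)
    print(name, "tokens kept by min_p=0.1:", survivors)
```

The usage loop at the end compares a peaked and a flat distribution under the same min_p value, which is the adaptive behavior described above.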
DRY (Don't Repeat Yourself) penalizes tokens that would extend a sequence already present in the preceding text.
Parameters:
- `dry_multiplier`: penalty strength.
- `dry_base`: exponent base for penalty scaling.
- `dry_allowed_length`: match length before the penalty applies.
- `dry_sequence_breakers`: tokens that reset matching.

Off by default.
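As a rough sketch of how these four parameters could interact, the code below assumes the commonly described DRY form, where the penalty is `multiplier * base ** (match_length - allowed_length)` once the match is at least `allowed_length` long. That formula, the helper names `longest_repeat` and `dry_penalty`, and the example values are assumptions; the engine's exact behavior may differ.

```python
def longest_repeat(context, candidate, breakers=frozenset()):
    """Length of the longest context suffix that, followed by `candidate`,
    already occurs earlier in `context`. Breaker tokens stop the match."""
    best = 0
    for i, token in enumerate(context):
        if token != candidate:
            continue
        n = 0
        # Walk backwards from position i and from the end of the context.
        while (
            n < i
            and context[i - 1 - n] == context[-1 - n]
            and context[-1 - n] not in breakers
        ):
            n += 1
        best = max(best, n)
    return best


def dry_penalty(match_length, multiplier, base, allowed_length):
    """Assumed penalty form: zero below allowed_length, exponential above."""
    if multiplier == 0 or match_length < allowed_length:
        return 0.0
    return multiplier * base ** (match_length - allowed_length)


context = ["the", "cat", "sat", "on", "the", "cat"]
for candidate in ("sat", "on"):
    n = longest_repeat(context, candidate)
    # Illustrative parameter values, not documented defaults.
    print(candidate, n, dry_penalty(n, multiplier=0.8, base=1.75, allowed_length=2))
```

In this sketch the returned penalty would be subtracted from the candidate token's logit before sampling; a multiplier of 0 leaves logits untouched, which corresponds to the sampler being off.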
- `presence_penalty`: flat penalty on tokens that appeared at all.
- `frequency_penalty`: penalty proportional to occurrence count.

Both are OpenAI-compatible.
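The difference between the two penalties is easiest to see side by side. The sketch below follows the adjustment described in OpenAI's API documentation, where each seen token's logit is reduced by the presence penalty once plus the frequency penalty times its count. The function name `apply_penalties` and the choice of which tokens count as "previous" are assumptions.

```python
from collections import Counter


def apply_penalties(logits, previous_ids, presence_penalty=0.0, frequency_penalty=0.0):
    """Return a copy of `logits` with presence and frequency penalties applied.

    `previous_ids` is the list of token ids seen so far (assumed scope).
    """
    counts = Counter(previous_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        # Flat penalty for appearing at all, plus a per-occurrence penalty.
        adjusted[token_id] -= presence_penalty + frequency_penalty * count
    return adjusted


# Token 0 appeared twice, token 2 once, token 1 never.
print(apply_penalties([1.0, 1.0, 1.0], [0, 0, 2],
                      presence_penalty=0.5, frequency_penalty=0.25))
```

Tokens that never appeared keep their original logit, so only repeated material is discouraged.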
When multiple filters are active, application order is:
All parameters work on the HTTP API, in SDK request types, and in interactive-mode slash commands (/temperature, /topk, /topp). Slash-command values persist between requests; per-request API values override them.
For deployment-wide defaults, the CLI TOML config has a [sampling] section; its values apply to any request that does not specify the corresponding parameter.
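The layering described above can be pictured as a simple dictionary merge. This is a hypothetical sketch: `resolve_sampling` is not part of any SDK, and the text does not state how slash-command values rank against config defaults, so the order used here (config defaults lowest, then slash-command values, then per-request values) is an assumption.

```python
def resolve_sampling(request, session, config_defaults):
    """Merge sampling settings; later layers override earlier ones."""
    resolved = dict(config_defaults)   # [sampling] defaults from the config
    resolved.update(session)           # values set via slash commands (assumed rank)
    # Per-request values win, but only for parameters the request actually sets.
    resolved.update({k: v for k, v in request.items() if v is not None})
    return resolved


config_defaults = {"temperature": 0.7, "top_p": 0.95, "top_k": 40}
session = {"temperature": 0.2}              # e.g. set earlier with /temperature
request = {"top_k": 10, "top_p": None}      # request leaves top_p unspecified

print(resolve_sampling(request, session, config_defaults))
```

The numeric values are placeholders chosen for the example, not documented defaults.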
seed in the request controls randomness. Identical seeds with identical prompts produce identical output.
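Seeded reproducibility can be illustrated with Python's standard `random` module standing in for the engine's random number generator. The `sample_tokens` helper is hypothetical; it only shows why a fixed seed with identical inputs yields an identical sequence of draws.

```python
import random


def sample_tokens(probs, n, seed):
    """Draw n token ids from `probs` using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.choices(range(len(probs)), weights=probs, k=n)


probs = [0.5, 0.3, 0.15, 0.05]
first = sample_tokens(probs, 8, seed=1234)
second = sample_tokens(probs, 8, seed=1234)
print(first == second)  # same seed and same distribution give the same draws
```

Changing the seed, the prompt, or any sampling parameter changes the distribution or the draw sequence, so the outputs are no longer guaranteed to match.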