docs/SAMPLING.md
mistral.rs supports a comprehensive set of sampling and penalty techniques to control text generation. These can be configured via the HTTP API, Python SDK, or Rust SDK.
temperature: Controls the randomness of token selection. Lower values make output more deterministic; higher values make it more varied and random.
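As an illustration of the effect described above, here is a minimal sketch of temperature scaling applied to raw logits before a softmax. The function name `softmax_with_temperature` is hypothetical and is not part of mistral.rs; it only shows the general technique.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With the same logits, a temperature below 1 concentrates probability on the most likely token, while a temperature above 1 flattens the distribution.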
top_k: Limits token selection to the K most likely tokens.
top_p: Limits token selection to the smallest set of tokens whose cumulative probability exceeds P.
min_p: Filters out tokens with probability less than min_p * max_probability, where max_probability is the probability of the most likely token.
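The three filters above can be sketched on a plain list of probabilities. This is an illustrative sketch only: the helper names are hypothetical, and renormalizing after each filter is an assumption rather than a description of how mistral.rs orders or combines these steps.

```python
def _renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]


def _ranked_indices(probs):
    """Token indices ordered from most to least likely."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_filter(probs, k):
    """Keep only the k most likely tokens (assumes k >= 1)."""
    keep = set(_ranked_indices(probs)[:k])
    return _renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability exceeds top_p."""
    keep, cumulative = set(), 0.0
    for i in _ranked_indices(probs):
        keep.add(i)
        cumulative += probs[i]
        if cumulative > top_p:
            break
    return _renormalize([p if i in keep else 0.0 for i, p in enumerate(probs)])


def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p * max(probs)."""
    threshold = min_p * max(probs)
    return _renormalize([p if p >= threshold else 0.0 for p in probs])
```

Note that min_p adapts to the model's confidence: when one token dominates, the threshold rises and fewer alternatives survive.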
stop (stop_seqs in the Python SDK): Strings that, when generated, cause generation to stop immediately.
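To make the stopping behavior concrete, the sketch below truncates a piece of generated text at the earliest stop string. The helper `truncate_at_stop` is hypothetical, and excluding the stop string itself from the output is an assumption (it follows the common OpenAI-style convention), not a confirmed detail of mistral.rs.

```python
def truncate_at_stop(text, stop):
    """Return `text` cut at the first occurrence of any stop string."""
    cut = len(text)
    for s in stop:
        if not s:
            continue  # ignore empty strings, which would match everywhere
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    # Assumption: the stop string itself is not included in the output.
    return text[:cut]
```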
repetition_penalty: Applies a multiplicative penalty to tokens that have already appeared in the context.
frequency_penalty: Penalizes tokens in proportion to how many times they have appeared in the generated text so far.
presence_penalty: Applies a fixed penalty to tokens that have appeared at least once in the generated text.
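The sketch below shows one common way these three penalties are applied to logits. The exact formulas are assumptions based on widely used conventions (dividing positive logits and multiplying negative ones for the repetition penalty, and subtracting `count * frequency_penalty + presence_penalty` for the other two); they are not taken from the mistral.rs source. The function name `apply_penalties` is hypothetical.

```python
from collections import Counter


def apply_penalties(logits, prompt_tokens, generated_tokens,
                    repetition_penalty=1.0,
                    frequency_penalty=0.0,
                    presence_penalty=0.0):
    """Return a penalized copy of `logits`, indexed by token id."""
    out = list(logits)

    # Repetition penalty: multiplicative, over every token seen in the context.
    for tok in set(prompt_tokens) | set(generated_tokens):
        if out[tok] > 0:
            out[tok] /= repetition_penalty
        else:
            out[tok] *= repetition_penalty

    # Frequency penalty scales with the count; presence penalty is a flat
    # amount applied once a token has been generated at all.
    for tok, count in Counter(generated_tokens).items():
        out[tok] -= count * frequency_penalty + presence_penalty

    return out
```

For example, with logits `[2.0, -1.0, 0.5, 3.0]`, prompt tokens `[1]`, generated tokens `[0, 0]`, and penalties of 2.0, 0.5, and 0.25, only tokens 0 and 1 change, because tokens 2 and 3 have not appeared yet.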
An advanced anti-repetition technique that detects and penalizes repeated sequences of tokens, not just individual tokens. See the original implementation for details.
dry_multiplier: Controls the strength of the penalty. Higher values more strongly discourage repetition.
dry_base: Base value for the exponential penalty calculation.
dry_allowed_length: Minimum sequence length before the penalty applies. Sequences shorter than this are not penalized.
dry_sequence_breakers: Array of tokens (like newlines, punctuation) that reset the sequence tracking. When these tokens appear, the DRY penalty starts fresh.

{
  "dry_multiplier": 0.8,
  "dry_base": 1.75,
  "dry_allowed_length": 2,
  "dry_sequence_breakers": ["\n", ".", "!", "?", ";"]
}
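To show how the four DRY parameters interact, here is a simplified sketch of the penalty calculation. It assumes the commonly described DRY formula, `dry_multiplier * dry_base ** (match_length - dry_allowed_length)`, applied to any token that would extend a repeated sequence; the function name `dry_penalties` is hypothetical, and the actual mistral.rs implementation may differ in details and efficiency.

```python
def dry_penalties(tokens, multiplier=0.8, base=1.75,
                  allowed_length=2, breakers=frozenset()):
    """Map each candidate next token to the amount to subtract from its logit."""
    penalties = {}
    n = len(tokens)
    for i in range(1, n):
        candidate = tokens[i]  # token that followed an earlier sequence
        if candidate in breakers:
            continue
        # Count how many tokens before position i match the end of the context.
        match_len = 0
        while (match_len < i
               and tokens[i - 1 - match_len] == tokens[n - 1 - match_len]
               and tokens[n - 1 - match_len] not in breakers):
            match_len += 1
        if match_len >= allowed_length:
            penalty = multiplier * base ** (match_len - allowed_length)
            penalties[candidate] = max(penalties.get(candidate, 0.0), penalty)
    return penalties
```

For the context `[1, 2, 3, 4, 1, 2, 3]`, the last three tokens repeat the opening `1, 2, 3`, so token `4` (which would continue the repetition) is the only one penalized. If `2` is listed as a sequence breaker, the match is cut short and no penalty applies.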
All sampling parameters can be set in API requests:
{
  "model": "default",
  "messages": [...],
  "temperature": 0.7,
  "top_p": 0.9,
  "top_k": 40,
  "min_p": 0.05,
  "repetition_penalty": 1.1,
  "frequency_penalty": 0.5,
  "presence_penalty": 0.5,
  "stop": ["END", "\n\n"],
  "dry_multiplier": 0.8,
  "dry_base": 1.75,
  "dry_allowed_length": 2,
  "dry_sequence_breakers": ["\n"]
}
# Python SDK: the same parameters, with `stop` passed as `stop_seqs`.
from mistralrs import ChatCompletionRequest

# `runner` is an already-constructed mistralrs Runner.
response = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[...],
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        min_p=0.05,
        repetition_penalty=1.1,
        frequency_penalty=0.5,
        presence_penalty=0.5,
        stop_seqs=["END", "\n\n"],
        dry_multiplier=0.8,
        dry_base=1.75,
        dry_allowed_length=2,
        dry_sequence_breakers=["\n"],
    )
)
Please suggest more sampling techniques by raising an issue!