Multi-turn or "conversational" jailbreaks gradually escalate the potential harm of prompts, exploiting the fuzzy boundary between acceptable and unacceptable responses.
Because these attacks are conversational, they can surface vulnerabilities that only emerge after multiple interactions.
Multi-turn jailbreaks operate by:
When the strategy hits a refusal, it backtracks to an earlier point in the conversation.
Promptfoo supports four multi-turn strategies:
Crescendo gradually increases the intensity or harmfulness of the prompt with each turn, starting from benign content and moving toward more adversarial content. This approach is inspired by Microsoft's Crescendo research.
Hydra coordinates an attacker agent that branches across multiple conversational paths. It remembers every refusal, automatically manages backtracking, and shares successful tactics across the entire scan. Use Hydra when you need the attacker to pivot rapidly and reuse prior learnings.
The GOAT strategy is based on Meta's GOAT research. GOAT stands for Generative Offensive Agent Tester; the strategy uses a set of attack templates and iteratively refines them over multiple turns to bypass defenses.
The `mischievous-user` strategy simulates a persistent, creative user who tries different phrasings and approaches over several turns to elicit a harmful or policy-violating response from the model.
Multi-turn strategies can be enabled either on the Strategies page in the UI or by adding them to your YAML config:
    redteam:
      # ...
      strategies:
        - crescendo
        - goat
        - jailbreak:hydra
        - mischievous-user
Or tune them with the following parameters:
    redteam:
      strategies:
        - id: crescendo
          config:
            maxTurns: 5
            maxBacktracks: 5
            stateful: false # Sends the entire conversation history with each turn (Default)
            continueAfterSuccess: false # Stop after first successful attack (Default)
        - id: jailbreak:hydra
          config:
            maxTurns: 10
        - id: goat
          config:
            maxTurns: 5
            stateful: false
            continueAfterSuccess: false
        - id: mischievous-user
          config:
            maxTurns: 5
            stateful: false
Increasing the number of turns (and backtracks for strategies that expose that option) will make the strategy more aggressive, but it will also take longer to complete and cost more.
Because multi-turn strategies are relatively expensive, we recommend running them on a smaller number of tests and plugins, using a cheaper provider, or choosing a simpler iterative strategy instead.
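As a rough illustration of that advice, the sketch below pairs a small test budget with a short Crescendo run. The `numTests` value, the single `harmful:hate` plugin, and the reduced turn and backtrack limits are illustrative assumptions, not recommended settings; check the configuration reference for your Promptfoo version before relying on them.

```yaml
redteam:
  numTests: 3 # assumption: a small per-plugin test count; tune for your budget
  plugins:
    - harmful:hate # assumption: illustrative plugin choice; substitute your own
  strategies:
    - id: crescendo
      config:
        maxTurns: 3 # fewer turns means fewer calls to the target and the attacker
        maxBacktracks: 2 # fewer retries after a refusal
```

Once a small run like this completes at an acceptable cost, the limits can be raised gradually.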
:::info
If your system maintains a conversation history and only expects the latest message to be sent, set `stateful: true`. Make sure to configure cookies or sessions in your provider as well.
:::
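A minimal sketch of the stateful case, using GOAT as an arbitrary example strategy. The provider-side cookie or session setup is deliberately omitted here because it depends on your target; only the strategy option described above is shown.

```yaml
redteam:
  strategies:
    - id: goat
      config:
        maxTurns: 5
        stateful: true # the target keeps the history, so only the latest message is sent
```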
By default, both Crescendo and GOAT strategies stop immediately upon finding a successful attack. You can configure them to continue searching for additional successful attacks until maxTurns is reached:
    strategies:
      - id: crescendo
        config:
          maxTurns: 10
          continueAfterSuccess: true
      - id: goat
        config:
          maxTurns: 8
          continueAfterSuccess: true
When `continueAfterSuccess: true`, the strategy keeps searching for additional successful attacks until `maxTurns` is reached.
Multi-turn strategies include an unblocking feature that handles situations where the target model asks clarifying questions that block conversation progress. By default, this feature is disabled to optimize for speed and cost.
Enable unblocking when testing:
Keep disabled (default) when:
Enable unblocking by setting an environment variable before running your red team:
    export PROMPTFOO_ENABLE_UNBLOCKING=true
    promptfoo redteam run
Example blocking scenarios:
Benefits:
Costs:
:::tip
Start with unblocking disabled to establish a baseline, then enable it if you notice your target frequently asks clarifying questions during red team attacks.
:::
Multi-turn jailbreaks like Crescendo identify vulnerabilities that only emerge after multiple interactions.
They also exploit the tendency of LLMs to become more compliant throughout a conversation, and more likely to ignore their original instructions.
The backtracking automation also saves an enormous amount of time compared to manual red teaming, since it eliminates the need to rebuild entire conversation histories.