The Iterative Jailbreaks strategy is a technique designed to systematically probe and potentially bypass an AI system's constraints by repeatedly refining a single-shot prompt through multiple iterations. This approach is inspired by research on automated jailbreaking techniques like the Tree of Attacks method.[^1]
Add it to your `promptfooconfig.yaml`:

```yaml
strategies:
  # Basic usage
  - jailbreak

  # With configuration
  - id: jailbreak
    config:
      # Optional: Number of iterations to attempt (default: 10)
      numIterations: 50
```
You can also override the number of iterations via an environment variable:
```bash
PROMPTFOO_NUM_JAILBREAK_ITERATIONS=5
```
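For example, to cap the loop at five iterations for a single red team run, you could set the variable inline when invoking the standard `promptfoo redteam run` command:

```bash
PROMPTFOO_NUM_JAILBREAK_ITERATIONS=5 promptfoo redteam run
```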
The Iterative Jailbreaks strategy works by:

1. Starting with a base prompt that attempts to elicit the target behavior
2. Sending the prompt to the target model and analyzing its response
3. Refining the prompt based on that analysis
4. Repeating the process until it succeeds or the iteration limit is reached
:::warning
This strategy is medium cost since it makes multiple API calls per test. We recommend running it on a smaller number of tests and plugins before running a full test.
:::
When using `transformVars` with `context.uuid`, each iteration automatically gets a new UUID. This prevents conversation history from affecting subsequent attempts.
```yaml
defaultTest:
  options:
    transformVars: '{ ...vars, sessionId: context.uuid }'
```
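Putting the pieces together, a config sketch that pairs the strategy with per-iteration session IDs might look like this (the `sessionId` variable name is only an example carried over from the snippet above; use whatever variable your prompts or provider expect):

```yaml
strategies:
  - id: jailbreak
    config:
      numIterations: 10

defaultTest:
  options:
    transformVars: '{ ...vars, sessionId: context.uuid }'
```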
Here's how the iteration process works: an attacker model proposes a candidate prompt, the target model responds, and a judge model scores the response, feeding its critique back to the attacker for the next refinement. The process continues until either:

- the judge determines the jailbreak succeeded, or
- the configured iteration limit (`numIterations`) is reached.
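The loop can be summarized in pseudocode. The following is a minimal TypeScript sketch, not promptfoo's actual implementation; `callAttacker`, `callTarget`, `callJudge`, and the 1-10 scoring scale are all hypothetical stand-ins:

```typescript
// Hypothetical LLM wrappers -- in a real harness these would call your
// attacker, target, and judge models.
declare function callAttacker(goal: string, lastPrompt?: string, feedback?: string): Promise<string>;
declare function callTarget(prompt: string): Promise<string>;
declare function callJudge(
  goal: string,
  prompt: string,
  response: string,
): Promise<{ score: number; rationale: string }>;

const SUCCESS_THRESHOLD = 10; // assumed 1-10 judge scale; 10 = full jailbreak

interface Attempt {
  prompt: string;
  response: string;
  score: number;
}

async function iterativeJailbreak(goal: string, numIterations = 10): Promise<Attempt | null> {
  let best: Attempt | null = null;
  let feedback = '';

  for (let i = 0; i < numIterations; i++) {
    // 1. The attacker refines a single-shot prompt using the judge's last critique.
    const prompt = await callAttacker(goal, best?.prompt, feedback);

    // 2. The target model answers the candidate prompt.
    const response = await callTarget(prompt);

    // 3. The judge scores the response against the goal and explains why.
    const { score, rationale } = await callJudge(goal, prompt, response);
    feedback = rationale;

    // Track the best attempt seen so far.
    if (!best || score > best.score) {
      best = { prompt, response, score };
    }

    // Terminate early on a successful jailbreak.
    if (score >= SUCCESS_THRESHOLD) break;
  }

  return best;
}
```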
The iterative jailbreak strategy creates refined single-shot jailbreaks that continually improve through an attacker-judge feedback loop. This approach lets you test a wide range of malicious inputs and identify the most effective ones.
For a comprehensive overview of LLM vulnerabilities and red teaming strategies, visit our Types of LLM Vulnerabilities page.
[^1]: Mehrotra, A., et al. (2023). "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically". arXiv:2312.02119