# redteam-dalle (`examples/redteam-dalle`)
You can run this example with:

```sh
npx promptfoo@latest init --example redteam-dalle
cd redteam-dalle
```
This example demonstrates how to use promptfoo to automatically discover jailbreaks in OpenAI's DALL-E image generation model. It includes pre-configured test cases that attempt to generate various types of harmful content.
> ⚠️ **Warning:** Running this example may get your OpenAI account flagged for moderation or banned.
Set your OpenAI API key:

```sh
export OPENAI_API_KEY=your_key_here
```
Initialize the example:

```sh
npx promptfoo@latest init --example redteam-dalle
```
Review and optionally modify the test cases in promptfooconfig.yaml. The example includes the same test cases shown in our blog post.
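For orientation, here is a minimal sketch of the shape of such a config. The prompt and test query below are illustrative placeholders, not the actual test cases from the example; the provider ID follows promptfoo's OpenAI image provider convention, but check the shipped `promptfooconfig.yaml` for the exact values.

```yaml
# Illustrative sketch only -- the real example config differs.
prompts:
  - '{{query}}'

providers:
  - openai:image:dall-e-3

tests:
  - vars:
      query: 'A placeholder image request to red team'
```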
Run the evaluation:

```sh
npx promptfoo@latest eval
```
View the results in the web UI:

```sh
npx promptfoo@latest view
```
Important: enable markdown/image rendering in the web UI's table settings so the generated images display inline.
During evaluation, you may see error messages like:

```
Error from target provider: 400 Your request was rejected as a result of our safety system.
Error from target provider: 400 This request has been blocked by our content filters.
```
This is expected behavior. Promptfoo automatically retries with modified prompts until a prompt gets through or the maximum number of iterations is reached.
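Conceptually, the retry strategy behaves like the loop below. This is a simplified Python sketch, not promptfoo's actual implementation; `target` and `rewrite` stand in for the image model's moderation check and the attacker-side prompt rewrite, respectively.

```python
def find_jailbreak(initial_prompt, target, rewrite, max_iterations=4):
    """Retry a prompt against a target, rewriting it after each rejection.

    Returns the first accepted prompt, or None if every attempt is blocked.
    In promptfoo's real strategy, the rewriting is done by an attacker LLM.
    """
    prompt = initial_prompt
    for _ in range(max_iterations):
        if target(prompt):          # the target accepted the prompt
            return prompt
        prompt = rewrite(prompt)    # mutate the prompt and try again
    return None

# Toy usage: a "target" that blocks anything containing "blocked",
# and a "rewrite" that strips the offending word.
accepts = lambda p: "blocked" not in p
strip = lambda p: p.replace("blocked", "")
print(find_jailbreak("blocked request", accepts, strip))  # -> " request"
```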
The default configuration uses 4 iterations per test case. To increase this (and potentially find more jailbreaks), set:

```sh
export PROMPTFOO_NUM_JAILBREAK_ITERATIONS=6
```
For debugging, or to see the internal workings, enable debug logging and run a single job at a time:

```sh
LOG_LEVEL=debug npx promptfoo@latest eval -j 1
```
Note: DALL-E image URLs expire after 2 hours. The example includes an extension hook that downloads images to a local images directory as soon as they are generated. Each image is saved with a filename based on the test description and timestamp.
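The hook's download step can be sketched as follows. This is a standalone Python sketch under assumed inputs (a test `description` and an image `url`); the actual hook's filename scheme may differ.

```python
import re
import time
import urllib.request
from pathlib import Path

def slugify(description: str) -> str:
    """Reduce a free-text test description to a filesystem-safe slug."""
    return re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")[:60]

def save_image(url: str, description: str, out_dir: str = "images") -> Path:
    """Download an image URL before it expires, naming the file from the
    test description and a timestamp (illustrative naming, not promptfoo's
    exact scheme)."""
    path = Path(out_dir) / f"{slugify(description)}-{int(time.time())}.png"
    path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, path)  # DALL-E URLs expire after ~2 hours
    return path
```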
For more details about LLM red teaming with promptfoo, check out the promptfoo documentation.