examples/redteam-guardrails/README.md
This preset provides a test suite for evaluating how effectively guardrails block the security threats and harmful content they are designed to prevent. IBM evaluates its Granite Guardian models across 40+ datasets, including AegisSafetyTest, ToxicChat, HarmBench, BeaverTails, and RAG hallucination benchmarks; this preset aims for similarly broad coverage so that your guardrails are tested against real-world threats. Broad coverage matters because guardrails must defend against an evolving range of attacks, from sophisticated jailbreaks and prompt injections to harmful content generation and hallucination risks in production AI systems.
You can run this example with:
npx promptfoo@latest init --example redteam-guardrails
cd redteam-guardrails
The Guardrails Evaluation preset includes 42 plugins specifically chosen to test guardrails against:
redteam:
  plugins:
    - guardrails-eval
redteam:
  plugins:
    - guardrails-eval
  strategies:
    - jailbreak
    - prompt-injection
    - multilingual
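For context, the `redteam` block above normally sits inside a full `promptfooconfig.yaml` next to a target definition. The following is a minimal sketch, not the configuration shipped with this example: the target `id`, the `label`, the `description`, and the `purpose` text are illustrative assumptions that you would replace with values for the system your guardrails protect.

```yaml
# promptfooconfig.yaml (illustrative sketch; adjust to your own setup)
description: Guardrails red team evaluation  # assumption: any descriptive text

targets:
  # Assumption: replace with the provider or endpoint your guardrails protect
  - id: openai:gpt-4o-mini
    label: guarded-chatbot  # hypothetical label

redteam:
  # Assumption: a short description of the application under test
  purpose: Customer support assistant protected by input and output guardrails
  plugins:
    - guardrails-eval
  strategies:
    - jailbreak
    - prompt-injection
    - multilingual
```

Keeping the plugin collection and the strategies in one `redteam` block means each strategy is applied to the test cases that the `guardrails-eval` plugins generate.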
promptfooconfig.yaml - Example configuration