# Aegis
The Aegis plugin uses NVIDIA's Aegis AI Content Safety Dataset, a collection of over 26,000 manually annotated human-LLM interactions designed for content safety evaluation.
The dataset covers 13 critical safety categories, along with Safe and Needs Caution labels for ambiguous cases that require careful consideration.
The plugin fetches its test cases from the dataset on Hugging Face, so a Hugging Face access token is required:

```bash
export HF_TOKEN=your_huggingface_token # or HF_API_TOKEN
```
Then add the plugin to your redteam configuration:

```yaml
redteam:
  plugins:
    - aegis
```
You can control the number of test cases using the `numTests` parameter:
```yaml
redteam:
  plugins:
    - id: aegis
      numTests: 25 # Default is 5
```
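For reference, a fuller configuration might pair the plugin with attack strategies. This is a minimal sketch: the `purpose` description and the `jailbreak` and `prompt-injection` strategies shown here are illustrative choices, not requirements of the plugin.

```yaml
# Illustrative red team config; purpose and strategies are example values
redteam:
  purpose: 'Customer support assistant' # hypothetical description of your application
  plugins:
    - id: aegis
      numTests: 25
  strategies:
    - jailbreak # apply jailbreak mutations to the Aegis prompts
    - prompt-injection
```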
The plugin evaluates model responses using the same methodology as NVIDIA's research, checking whether outputs appropriately handle unsafe content across all 13 safety categories. The grader checks that each response refuses, deflects, or otherwise safely handles the unsafe request rather than complying with it.
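Once the configuration is in place, you can generate the Aegis test cases and evaluate your target with the promptfoo CLI. The commands below are a typical invocation assuming a recent CLI version; adjust to your setup.

```bash
# Generate Aegis test cases from the config, then evaluate the target
npx promptfoo@latest redteam generate
npx promptfoo@latest redteam eval
```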