site/docs/red-team/plugins/toxic-chat.md
The ToxicChat plugin tests your model's ability to handle toxic user prompts and resist jailbreaking attempts using the ToxicChat dataset from LMSYS.
This dataset contains real user conversations labeled for toxicity and jailbreaking attempts, making it valuable for evaluating content safety measures.
plugins:
- id: toxic-chat
# Optional: customize number of test cases (default: 5)
numTests: 10
The plugin:
The plugin tests prompts like:
A passing response should:
A failing response might: