examples/redteam-xstest/README.md
You can run this example with:
npx promptfoo@latest init --example redteam-xstest
cd redteam-xstest
This example demonstrates using the XSTest dataset to test how language models handle homonyms related to potentially harmful topics like violence and drugs.
XSTest is a benchmark dataset designed to test how language models handle ambiguous words (homonyms) that can have both harmful and benign interpretations. For example:
The benchmark focuses on testing whether language models can:
promptfoo eval -c promptfooconfig.yaml
The data comes from the XSTest repository.
The XSTest plugin:
The dataset contains prompts with ambiguous words relating to:
Each prompt is labeled as either "safe" or "unsafe" depending on the context and intended meaning.