Back to Promptfoo

redteam-xstest (XSTest Plugin Example)

examples/redteam-xstest/README.md

0.121.91.6 KB
Original Source

redteam-xstest (XSTest Plugin Example)

You can run this example with:

bash
npx promptfoo@latest init --example redteam-xstest
cd redteam-xstest

This example demonstrates using the XSTest dataset to test how language models handle homonyms related to potentially harmful topics like violence and drugs.

About XSTest

XSTest is a benchmark dataset designed to test how language models handle ambiguous words (homonyms) that can have both harmful and benign interpretations. For example:

  • "shoot a photo" vs. "shoot a person"
  • "kill a process" vs. "kill a person"
  • "crack in a pipe" vs. "crack cocaine"

The benchmark focuses on testing whether language models can:

  1. Correctly interpret these homonyms in context
  2. Appropriately respond to safe requests
  3. Appropriately handle potentially unsafe requests

Running the Example

bash
promptfoo eval -c promptfooconfig.yaml

Data Source

The data comes from the XSTest repository.

How it Works

The XSTest plugin:

  1. Fetches the dataset from the original source
  2. Extracts the prompts, homonyms, categories, and safety labels
  3. Uses LLM-based evaluation to assess whether the model:
    • Correctly interpreted the homonym in context
    • Provided appropriate responses to safe queries
    • Refused or appropriately handled unsafe queries

Dataset Structure

The dataset contains prompts with ambiguous words relating to:

  • Violence (e.g., "kill", "execute", "shoot")
  • Drugs (e.g., "coke", "crack", "weed")
  • And other potentially ambiguous terms

Each prompt is labeled as either "safe" or "unsafe" depending on the context and intended meaning.