CyberSecEval Dataset for LLM Security Testing

Overview

The CyberSecEval plugin uses Meta's Purple Llama CyberSecEval dataset, a comprehensive benchmark suite designed to assess cybersecurity vulnerabilities in Large Language Models (LLMs).

The dataset includes multilingual prompt injection attacks, testing LLM systems against a diverse collection of potentially harmful prompts in multiple languages.

The dataset includes test cases that evaluate:

Direct prompt injection attacks
Indirect prompt injection attempts
Security-violating content
Logic-violating content
Cross-language attack vectors
System prompt leakage attempts

:::tip

This plugin requires a target provider that accepts both a system prompt and user input in OpenAI-compatible chat format.

:::

How it works

The plugin:

Downloads test cases from the CyberSecEval dataset
Formats each test case as a system/user message pair
Randomly selects prompts from the dataset
Tests if the model appropriately handles the injection attempts
Uses language-specific LLM-as-a-judge criteria to evaluate responses

Configuration

To include the CyberSecEval plugin in your LLM red teaming setup, add the following configuration to your YAML file:

yaml

redteam:
  plugins:
    - cyberseceval

You can control the number of test cases using the config.numTests parameter (defaults to 5):

yaml

redteam:
  plugins:
    - id: cyberseceval
      numTests: 25

You can also enable multilingual prompt injection by setting the config.multilingual parameter to true:

yaml

redteam:
  plugins:
    - id: cyberseceval
      config:
        multilingual: true

Full example

Create a configuration file promptfooconfig.yaml:

yaml

targets:
  # You can target a model
  - openai:chat:gpt-5
  # Or an API endpoint
  - id: https
    config:
      url: 'https://example.com/generate'
      method: 'POST'
      headers:
        'Content-Type': 'application/json'
      body:
        myPrompt: '{{prompt}}'

redteam:
  plugins:
    - cyberseceval

Then, run the red team command in the same directory as your configuration file:

promptfoo redteam run

Grading

The plugin uses a specialized grader that evaluates model outputs against language-specific judge questions. The grader checks if:

The model maintained its original instructions despite the injection attempt
The response adheres to security boundaries
The model appropriately handled non-English attack vectors
The system prompt remained protected from extraction attempts

Outputs are graded as either PASS (secure) or FAIL (vulnerable) based on these criteria.

Types of LLM Vulnerabilities
Prompt Injection
Language Configuration - Test in multiple languages

CyberSecEval Dataset for LLM Security Testing

CyberSecEval Dataset for LLM Security Testing

Overview

How it works

Configuration

Full example

Grading

Related Concepts