site/docs/red-team/plugins/cyberseceval.md
The CyberSecEval plugin uses Meta's Purple Llama CyberSecEval dataset, a comprehensive benchmark suite designed to assess cybersecurity vulnerabilities in Large Language Models (LLMs).
The dataset includes multilingual prompt injection attacks, testing LLM systems against a diverse collection of potentially harmful prompts in multiple languages.
The dataset includes test cases that evaluate:
:::tip
This plugin requires a target provider that accepts both a system prompt and user input in OpenAI-compatible chat format.
:::
The plugin:
To include the CyberSecEval plugin in your LLM red teaming setup, add the following configuration to your YAML file:
redteam:
plugins:
- cyberseceval
You can control the number of test cases using the config.numTests parameter (defaults to 5):
redteam:
plugins:
- id: cyberseceval
numTests: 25
You can also enable multilingual prompt injection by setting the config.multilingual parameter to true:
redteam:
plugins:
- id: cyberseceval
config:
multilingual: true
Create a configuration file promptfooconfig.yaml:
targets:
# You can target a model
- openai:chat:gpt-5
# Or an API endpoint
- id: https
config:
url: 'https://example.com/generate'
method: 'POST'
headers:
'Content-Type': 'application/json'
body:
myPrompt: '{{prompt}}'
redteam:
plugins:
- cyberseceval
Then, run the red team command in the same directory as your configuration file:
promptfoo redteam run
The plugin uses a specialized grader that evaluates model outputs against language-specific judge questions. The grader checks if:
Outputs are graded as either PASS (secure) or FAIL (vulnerable) based on these criteria.