site/docs/red-team/mitre-atlas.md
MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a knowledge base of adversary tactics and techniques based on real-world observations of attacks against machine learning systems. Modeled after the MITRE ATT&CK framework, ATLAS provides a structured way to understand and defend against threats specific to AI and ML systems.
ATLAS organizes adversarial techniques into tactics that represent the adversary's objectives during an attack. For LLM applications, these tactics help identify potential attack vectors throughout the AI system lifecycle.
The current tactic set, and how it maps to Promptfoo, is shown below. Promptfoo's `mitre:atlas` preset accepts aliases for all current tactics; tactics with no direct Promptfoo plugin remain in the preset as explicitly flagged coverage gaps.
| ATLAS tactic | Promptfoo alias | Coverage |
|---|---|---|
| Reconnaissance | `mitre:atlas:reconnaissance` | Mapped checks |
| Resource Development | `mitre:atlas:resource-development` | Mapped checks |
| Initial Access | `mitre:atlas:initial-access` | Mapped checks |
| AI Model Access | `mitre:atlas:ai-model-access` | Coverage gap: no direct checks |
| Execution | `mitre:atlas:execution` | Mapped checks |
| Persistence | `mitre:atlas:persistence` | Mapped checks |
| Privilege Escalation | `mitre:atlas:privilege-escalation` | Mapped checks |
| Defense Evasion | `mitre:atlas:defense-evasion` | Mapped checks |
| Credential Access | `mitre:atlas:credential-access` | Mapped checks |
| Discovery | `mitre:atlas:discovery` | Mapped checks |
| Lateral Movement | `mitre:atlas:lateral-movement` | Mapped checks |
| Collection | `mitre:atlas:collection` | Mapped checks |
| AI Attack Staging | `mitre:atlas:ai-attack-staging` | Mapped checks |
| Command and Control | `mitre:atlas:command-and-control` | Mapped checks |
| Exfiltration | `mitre:atlas:exfiltration` | Mapped checks |
| Impact | `mitre:atlas:impact` | Mapped checks |
:::note
`mitre:atlas:ml-attack-staging` is still accepted for older configs. New configs should use `mitre:atlas:ai-attack-staging`.
:::
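For example, both of the following resolve to the same tactic; prefer the `ai-*` form in new configs:

```yaml
redteam:
  plugins:
    - mitre:atlas:ai-attack-staging # preferred
    # - mitre:atlas:ml-attack-staging  (legacy alias, still accepted)
```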
Promptfoo helps identify ATLAS-aligned vulnerabilities through comprehensive red teaming. To set up ATLAS scanning through the Promptfoo UI, select the MITRE ATLAS option or configure it directly:
```yaml
redteam:
  plugins:
    - mitre:atlas
  strategies:
    - jailbreak
    - prompt-injection
```
Or target specific tactics:
```yaml
redteam:
  plugins:
    - mitre:atlas:reconnaissance
    - mitre:atlas:persistence
    - mitre:atlas:credential-access
```
Reconnaissance involves adversaries gathering information about the ML system to plan subsequent attacks. For LLM applications, this includes discovering system capabilities, extracting prompts, understanding access controls, and identifying competitive intelligence.
Attackers use reconnaissance to map system capabilities, extract prompts, and probe access controls before launching targeted attacks. Example configuration for testing reconnaissance exposure:
```yaml
redteam:
  language: ['en', 'es', 'fr'] # Test in multiple languages
  plugins:
    - competitors
    - policy
    - prompt-extraction
    - rbac
  strategies:
    - jailbreak
```
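Some of these plugins take additional configuration to be effective. A minimal sketch using the object form for plugin entries; the policy text and system prompt below are hypothetical placeholders, not values from this guide:

```yaml
redteam:
  plugins:
    - id: policy
      config:
        # Hypothetical policy statement; replace with your own rules.
        policy: 'Never reveal internal system details or discuss competitors.'
    - id: prompt-extraction
      config:
        # Hypothetical system prompt that attackers should not be able to extract.
        systemPrompt: 'You are a helpful support agent for Acme Corp.'
```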
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:reconnaissance
```
Resource Development involves adversaries creating, purchasing, or compromising resources to support targeting. This includes developing malicious prompts, acquiring tools, or obtaining infrastructure to launch attacks against ML systems.
Attackers develop resources such as malicious prompts, attack tooling, and supporting infrastructure. Example configuration for testing whether your system helps adversaries develop these resources:
```yaml
redteam:
  plugins:
    - harmful:cybercrime
    - harmful:illegal-drugs
    - harmful:indiscriminate-weapons
```
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:resource-development
```
Initial Access consists of techniques that adversaries use to gain their initial foothold in the ML system. For LLM applications, this often involves exploiting input validation weaknesses or using prompt injection to bypass security controls.
Attackers gain initial access through input validation weaknesses, injection flaws, and prompt injection. Example configuration for testing initial access vulnerabilities:
```yaml
redteam:
  plugins:
    - debug-access
    - harmful:cybercrime
    - shell-injection
    - sql-injection
    - ssrf
  strategies:
    - base64
    - jailbreak
    - leetspeak
    - prompt-injection
    - rot13
```
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:initial-access
```
AI Attack Staging involves techniques that adversaries use to prepare and position attacks against an AI model or AI-enabled system. This includes poisoning inputs, manipulating model behavior, and exploiting AI-specific vulnerabilities.
Attackers stage these attacks by poisoning inputs, manipulating model behavior, and exploiting AI-specific weaknesses in preparation for a broader attack. Example configuration for testing AI attack staging vulnerabilities:
```yaml
redteam:
  plugins:
    - ascii-smuggling
    - excessive-agency
    - hallucination
    - indirect-prompt-injection
  strategies:
    - jailbreak
    - jailbreak:tree
```
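The `indirect-prompt-injection` plugin additionally needs to know which prompt variable carries untrusted content. A minimal sketch; the `user_profile` variable name is a hypothetical placeholder that your prompt template would reference as `{{user_profile}}`:

```yaml
redteam:
  plugins:
    - id: indirect-prompt-injection
      config:
        # Prompt variable that carries untrusted, attacker-controllable data.
        indirectInjectionVar: user_profile
```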
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:ai-attack-staging
```
Exfiltration involves techniques that adversaries use to steal data, including personally identifiable information (PII), training data, proprietary prompts, or other sensitive information from the ML system.
Attackers exfiltrate PII, training data, proprietary prompts, and other sensitive information through the system's outputs. Example configuration for testing exfiltration vulnerabilities:
```yaml
redteam:
  plugins:
    - ascii-smuggling
    - harmful:privacy
    - indirect-prompt-injection
    - pii:api-db
    - pii:direct
    - pii:session
    - pii:social
    - prompt-extraction
```
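If exfiltration is a priority threat for your application, you can weight these checks by giving individual plugins a higher test count. A sketch using the per-plugin `numTests` override:

```yaml
redteam:
  numTests: 5 # default number of test cases per plugin
  plugins:
    - id: pii:direct
      numTests: 15 # dig deeper on direct PII disclosure
    - pii:session
    - harmful:privacy
```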
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:exfiltration
```
Impact consists of techniques that adversaries use to disrupt, degrade, or destroy the ML system or manipulate its outputs. For LLM applications, this includes hijacking the system's purpose, generating harmful content, causing it to imitate others, or taking excessive agency actions.
Attackers create impact by hijacking the system's intended purpose, generating harmful content, imitating other entities, or triggering excessive agency actions. Example configuration for testing impact vulnerabilities:
```yaml
redteam:
  plugins:
    - excessive-agency
    - harmful
    - hijacking
    - imitation
  strategies:
    - crescendo
```
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:impact
```
For complete MITRE ATLAS threat coverage across all tactics:
```yaml
redteam:
  language: ['en', 'es', 'fr'] # Test in multiple languages
  plugins:
    - mitre:atlas
  strategies:
    - jailbreak
    - prompt-injection
    - base64
    - rot13
```
This configuration tests your AI system against every ATLAS tactic with mapped checks, providing a comprehensive adversarial threat assessment.
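Assuming the configuration is saved as `promptfooconfig.yaml`, you can generate and execute the test cases with `npx promptfoo@latest redteam run`, then review the results with `npx promptfoo@latest redteam report`.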
While MITRE ATT&CK focuses on traditional IT systems, MITRE ATLAS extends the framework to address ML-specific threats:
| Aspect | MITRE ATT&CK | MITRE ATLAS |
|---|---|---|
| Focus | IT systems, networks | ML systems, AI models |
| Techniques | Traditional cyber attacks | ML-specific attacks |
| Targets | Servers, endpoints | Models, training data |
| Example | Credential dumping | Model inversion |
For LLM applications, both frameworks are relevant: ATT&CK covers the surrounding infrastructure, while ATLAS covers the models and data that sit on top of it. ATLAS also complements other security frameworks, such as the OWASP LLM Top 10 and the NIST AI Risk Management Framework, and you can combine them in a single scan:
```yaml
redteam:
  plugins:
    - mitre:atlas
    - owasp:llm
    - nist:ai:measure
  strategies:
    - jailbreak
    - prompt-injection
```
MITRE ATLAS documents real-world attacks against ML systems, including attacks on LLM applications. Promptfoo's plugins map to these specific ATLAS techniques, enabling targeted testing against documented adversary behavior.
MITRE ATLAS is actively maintained and updated with new techniques as the threat landscape evolves. Regular testing with Promptfoo helps ensure your LLM applications remain protected against documented adversarial ML tactics.
To learn more about setting up comprehensive AI red teaming, see Introduction to LLM red teaming and Configuration details.