site/docs/red-team/mitre-atlas.md
MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a knowledge base of adversary tactics and techniques based on real-world observations of attacks against machine learning systems. Modeled after the MITRE ATT&CK framework, ATLAS provides a structured way to understand and defend against threats specific to AI and ML systems.
ATLAS organizes adversarial techniques into tactics that represent the adversary's objectives during an attack. For LLM applications, these tactics help identify potential attack vectors throughout the AI system lifecycle.
The current tactic set, and how it maps to Promptfoo, is shown below. Promptfoo's `mitre:atlas` preset accepts aliases for all current tactics; tactics with no direct Promptfoo plugin remain in the preset as explicitly flagged coverage gaps.
| ATLAS tactic | Promptfoo alias | Coverage |
|---|---|---|
| Reconnaissance | `mitre:atlas:reconnaissance` | Mapped checks |
| Resource Development | `mitre:atlas:resource-development` | Mapped checks |
| Initial Access | `mitre:atlas:initial-access` | Mapped checks |
| AI Model Access | `mitre:atlas:ai-model-access` | Coverage gap: no direct checks |
| Execution | `mitre:atlas:execution` | Mapped checks |
| Persistence | `mitre:atlas:persistence` | Mapped checks |
| Privilege Escalation | `mitre:atlas:privilege-escalation` | Mapped checks |
| Defense Evasion | `mitre:atlas:defense-evasion` | Mapped checks |
| Credential Access | `mitre:atlas:credential-access` | Mapped checks |
| Discovery | `mitre:atlas:discovery` | Mapped checks |
| Lateral Movement | `mitre:atlas:lateral-movement` | Mapped checks |
| Collection | `mitre:atlas:collection` | Mapped checks |
| AI Attack Staging | `mitre:atlas:ai-attack-staging` | Mapped checks |
| Command and Control | `mitre:atlas:command-and-control` | Mapped checks |
| Exfiltration | `mitre:atlas:exfiltration` | Mapped checks |
| Impact | `mitre:atlas:impact` | Mapped checks |
:::note
`mitre:atlas:ml-attack-staging` is still accepted for older configs. New configs should use `mitre:atlas:ai-attack-staging`.
:::
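For example, both of the following resolve to the same tactic; prefer the `ai-*` form in new configs:

```yaml
redteam:
  plugins:
    - mitre:atlas:ai-attack-staging # preferred
    # - mitre:atlas:ml-attack-staging  (legacy alias, still accepted)
```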
Promptfoo helps identify ATLAS-aligned vulnerabilities through comprehensive red teaming. To set up ATLAS scanning through the Promptfoo UI, select the MITRE ATLAS option or configure it directly:
```yaml
redteam:
  plugins:
    - mitre:atlas
  strategies:
    - jailbreak
    - prompt-injection
```
Or target specific tactics:
```yaml
redteam:
  plugins:
    - mitre:atlas:reconnaissance
    - mitre:atlas:persistence
    - mitre:atlas:credential-access
```
Reconnaissance involves adversaries gathering information about the ML system to plan subsequent attacks. For LLM applications, this includes discovering system capabilities, extracting prompts, understanding access controls, and identifying competitive intelligence.
Attackers use reconnaissance to map system capabilities, extract prompts, and probe access controls before launching targeted attacks. Example configuration for testing reconnaissance exposure:
```yaml
redteam:
  language: ['en', 'es', 'fr'] # Test in multiple languages
  plugins:
    - competitors
    - policy
    - prompt-extraction
    - rbac
  strategies:
    - jailbreak
```
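Some of these plugins take additional configuration to be effective. A minimal sketch using the object form for plugin entries; the policy text and system prompt below are hypothetical placeholders, not values from this guide:

```yaml
redteam:
  plugins:
    - id: policy
      config:
        # Hypothetical policy statement; replace with your own rules.
        policy: 'Never reveal internal system details or discuss competitors.'
    - id: prompt-extraction
      config:
        # Hypothetical system prompt that attackers should not be able to extract.
        systemPrompt: 'You are a helpful support agent for Acme Corp.'
```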
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:reconnaissance
```
Resource Development involves adversaries creating, purchasing, or compromising resources to support targeting. This includes developing malicious prompts, acquiring tools, or obtaining infrastructure to launch attacks against ML systems.
Attackers develop resources such as malicious prompts, attack tooling, and supporting infrastructure. Example configuration for testing whether your system helps adversaries develop these resources:
```yaml
redteam:
  plugins:
    - harmful:cybercrime
    - harmful:illegal-drugs
    - harmful:indiscriminate-weapons
```
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:resource-development
```
Initial Access consists of techniques that adversaries use to gain their initial foothold in the ML system. For LLM applications, this often involves exploiting input validation weaknesses or using prompt injection to bypass security controls.
Attackers gain initial access through input validation weaknesses, injection flaws, and prompt injection. Example configuration for testing initial access vulnerabilities:
```yaml
redteam:
  plugins:
    - debug-access
    - harmful:cybercrime
    - shell-injection
    - sql-injection
    - ssrf
  strategies:
    - base64
    - jailbreak
    - leetspeak
    - prompt-injection
    - rot13
```
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:initial-access
```
AI Attack Staging involves techniques that adversaries use to prepare and position attacks against an AI model or AI-enabled system. This includes poisoning inputs, manipulating model behavior, and exploiting AI-specific vulnerabilities.
Attackers stage these attacks by poisoning inputs, manipulating model behavior, and exploiting AI-specific weaknesses in preparation for a broader attack. Example configuration for testing AI attack staging vulnerabilities:
```yaml
redteam:
  plugins:
    - ascii-smuggling
    - excessive-agency
    - hallucination
    - indirect-prompt-injection
  strategies:
    - jailbreak
    - jailbreak:tree
```
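The `indirect-prompt-injection` plugin additionally needs to know which prompt variable carries untrusted content. A minimal sketch; the `user_profile` variable name is a hypothetical placeholder that your prompt template would reference as `{{user_profile}}`:

```yaml
redteam:
  plugins:
    - id: indirect-prompt-injection
      config:
        # Prompt variable that carries untrusted, attacker-controllable data.
        indirectInjectionVar: user_profile
```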
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:ai-attack-staging
```
Exfiltration involves techniques that adversaries use to steal data, including personally identifiable information (PII), training data, proprietary prompts, or other sensitive information from the ML system.
Attackers exfiltrate PII, training data, proprietary prompts, and other sensitive information through the system's outputs. Example configuration for testing exfiltration vulnerabilities:
```yaml
redteam:
  plugins:
    - ascii-smuggling
    - harmful:privacy
    - indirect-prompt-injection
    - pii:api-db
    - pii:direct
    - pii:session
    - pii:social
    - prompt-extraction
```
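If exfiltration is a priority threat for your application, you can weight these checks by giving individual plugins a higher test count. A sketch using the per-plugin `numTests` override:

```yaml
redteam:
  numTests: 5 # default number of test cases per plugin
  plugins:
    - id: pii:direct
      numTests: 15 # dig deeper on direct PII disclosure
    - pii:session
    - harmful:privacy
```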
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:exfiltration
```
Impact consists of techniques that adversaries use to disrupt, degrade, or destroy the ML system or manipulate its outputs. For LLM applications, this includes hijacking the system's purpose, generating harmful content, causing it to imitate others, or taking excessive agency actions.
Attackers create impact by hijacking the system's intended purpose, generating harmful content, imitating other entities, or triggering excessive agency actions. Example configuration for testing impact vulnerabilities:
```yaml
redteam:
  plugins:
    - excessive-agency
    - harmful
    - hijacking
    - imitation
  strategies:
    - crescendo
```
Or use the ATLAS shorthand:
```yaml
redteam:
  plugins:
    - mitre:atlas:impact
```
For complete MITRE ATLAS threat coverage across all tactics:
```yaml
redteam:
  language: ['en', 'es', 'fr'] # Test in multiple languages
  plugins:
    - mitre:atlas
  strategies:
    - jailbreak
    - prompt-injection
    - base64
    - rot13
```
This configuration tests your AI system against every ATLAS tactic with mapped checks, providing a comprehensive adversarial threat assessment.
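Assuming the configuration is saved as `promptfooconfig.yaml`, you can generate and execute the test cases with `npx promptfoo@latest redteam run`, then review the results with `npx promptfoo@latest redteam report`.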
While MITRE ATT&CK focuses on traditional IT systems, MITRE ATLAS extends the framework to address ML-specific threats:
| Aspect | MITRE ATT&CK | MITRE ATLAS |
|---|---|---|
| Focus | IT systems, networks | ML systems, AI models |
| Techniques | Traditional cyber attacks | ML-specific attacks |
| Targets | Servers, endpoints | Models, training data |
| Example | Credential dumping | Model inversion |
For LLM applications, both frameworks are relevant: ATT&CK covers the surrounding infrastructure, while ATLAS covers the models and data that sit on top of it. ATLAS also complements other security frameworks, such as the OWASP LLM Top 10 and the NIST AI Risk Management Framework, and you can combine them in a single scan:
```yaml
redteam:
  plugins:
    - mitre:atlas
    - owasp:llm
    - nist:ai:measure
  strategies:
    - jailbreak
    - prompt-injection
```
MITRE ATLAS documents real-world attacks against ML systems, including attacks on LLM applications. Promptfoo's plugins map to these specific ATLAS techniques, enabling targeted testing against documented adversary behavior.
MITRE ATLAS is actively maintained and updated with new techniques as the threat landscape evolves. Regular testing with Promptfoo helps ensure your LLM applications remain protected against documented adversarial ML tactics.
To learn more about setting up comprehensive AI red teaming, see Introduction to LLM red teaming and Configuration details.