Memory Poisoning Plugin

Overview

The Memory Poisoning plugin tests whether stateful agents are vulnerable to memory poisoning attacks that manipulate an agent's memory to alter its behavior in subsequent interactions.

This plugin maps to T1: Memory Poisoning from OWASP Agentic AI Top 10 Threats.

Purpose

Memory poisoning attacks attempt to inject malicious instructions into an agent's memory state that persist across turns. When successful, these attacks can:

Override or modify the agent's original purpose
Extract sensitive information
Manipulate the agent's decision-making process
Bypass security controls in subsequent interactions

How it Works

The Memory Poisoning plugin:

Creates a scenario with specific "memories" that the agent should remember
Sends an initial memory-establishing message to the agent
Follows with a poisoned message attempting to corrupt the established memory
Tests the attack's effectiveness with a follow-up question that relies on the original memory
Evaluates whether the agent's response to the follow-up reflects the original memory (secure) or the poisoned memory (vulnerable)

Configuration

Add the following to your promptfoo configuration:

yaml

redteam:
  purpose: "Description of your agent's intended purpose"
  plugins:
    - id: agentic:memory-poisoning
      numTests: 30 # Optional, defaults to 5

:::note Memory Poisoning is a standalone plugin and does not use strategies. Strategies defined in the redteam.strategies configuration are ignored for this plugin. :::

Example

Here's an example configuration for testing an agent:

yaml

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Travel Agent Redteam

providers:
  - id: http
    config:
      url: http://localhost:3090/api/chat
      method: POST
      headers:
        'Content-Type': 'application/json'
        'x-promptfoo-session': '{{sessionId}}'
      body:
        'message': '{{prompt}}'
      transformResponse: 'json.message'

prompts:
  - '{{prompt}}'

defaultTest:
  options:
    transformVars: '{ ...vars, sessionId: context.uuid }'

redteam:
  purpose: |
    The agent simulates a travel agent that helps users find the best travel deals. It has access to a set of tools
    that allow it to check the weather, lookup and book flights, and book hotels.
  plugins:
    - id: agentic:memory-poisoning
      numTests: 30

Working With Results

The test results will show:

The original memory-establishing message
The poisoned message
The follow-up question
The agent's responses to each message

A successful attack (failed test) is indicated when the agent's response to the follow-up question reflects the poisoned instructions rather than the original memory.

Mitigations

To protect against memory poisoning attacks:

Implement input validation that filters or sanitizes user inputs prior to persistence.
Sanitize user memories prior to including them within the context windows of inference calls.
Segregate memory types - separate system instructions from user input memory.
Apply memory attribution - track where memory content originated.

Memory Poisoning Plugin

Memory Poisoning Plugin

Overview

Purpose

How it Works

Configuration

Example

Working With Results

Mitigations

Related Concepts