redteam-tracing-example (Red Team Tracing)


You can run this example with:

bash
npx promptfoo@latest init --example redteam-tracing-example
cd redteam-tracing-example

This example demonstrates how to use tracing with red team strategies to provide attackers and graders with visibility into the internal operations of your LLM application.

Quick Start

1. Install dependencies:

bash
npm install

2. Start the mock traced server:

bash
npm run server

This starts an HTTP server on port 3110 that:

  • Accepts chat requests
  • Generates OTLP trace spans (LLM calls, guardrails, tools)
  • Sends spans to promptfoo's OTLP receiver
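
For orientation, the spans the server emits can be pictured as an OTLP/HTTP JSON payload. The sketch below is an illustration only, not the example server's actual code: `SpanInput`, `buildOtlpPayload`, the scope name, and the sample values are hypothetical, and the receiver URL mentioned in the final comment is an assumption that should be checked against the server logs.

```typescript
// Hypothetical span description used only for this sketch.
interface SpanInput {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  parentSpanId?: string;
  name: string; // e.g. "llm.generate" or "guardrail.check"
  startMs: number; // epoch milliseconds (positive integer)
  endMs: number;
  attributes: Record<string, string>;
}

// OTLP JSON carries timestamps as decimal strings of nanoseconds.
function msToNanoString(ms: number): string {
  return `${Math.round(ms)}000000`;
}

// Wrap spans in the resourceSpans > scopeSpans > spans envelope used by OTLP.
function buildOtlpPayload(serviceName: string, spans: SpanInput[]) {
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: 'example-tracer' }, // hypothetical scope name
            spans: spans.map((s) => ({
              traceId: s.traceId,
              spanId: s.spanId,
              ...(s.parentSpanId ? { parentSpanId: s.parentSpanId } : {}),
              name: s.name,
              kind: 1, // SPAN_KIND_INTERNAL in the OTLP enum
              startTimeUnixNano: msToNanoString(s.startMs),
              endTimeUnixNano: msToNanoString(s.endMs),
              attributes: Object.keys(s.attributes).map((key) => ({
                key,
                value: { stringValue: s.attributes[key] },
              })),
            })),
          },
        ],
      },
    ],
  };
}

const examplePayload = buildOtlpPayload('mock-traced-server', [
  {
    traceId: '0af7651916cd43dd8448eb211c80319c',
    spanId: 'b7ad6b7169203331',
    name: 'llm.generate',
    startMs: 1700000000000,
    endMs: 1700000001200,
    attributes: { model: 'gpt-4' },
  },
]);
console.log(JSON.stringify(examplePayload, null, 2));
// A server would POST this JSON to the receiver's /v1/traces path,
// e.g. http://localhost:4318/v1/traces (assumed URL, not confirmed by this README).
```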

3. Test the server (optional):

bash
# In another terminal
./test-server.sh

4. Run the red team evaluation:

bash
# In another terminal (from the promptfoo repository root)
npm run local -- eval -c examples/redteam-tracing-example/promptfooconfig.yaml

5. View the results:

bash
npm run local -- view

You'll see trace data in:

  • Attack prompts (when includeInAttack: true)
  • Grading context (when includeInGrading: true)
  • Test metadata (traceSnapshots)

Troubleshooting

Server not responding?

bash
# Check if server is running
curl http://localhost:3110/health

# Test basic request
curl -X POST http://localhost:3110/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "test"}'

No traces appearing?

  • Make sure the server is emitting to the correct OTLP endpoint (check server logs)
  • Verify promptfoo's OTLP receiver is enabled in config (tracing.enabled: true)
  • Check that traceparent headers are being passed (set in provider context)

What is Red Team Tracing?

Red team tracing allows adversarial strategies to see what happens inside your LLM application during an attack, including:

  • Tool calls and their results
  • Guardrail decisions
  • Internal LLM calls
  • Error conditions
  • Performance metrics

This information can help:

  1. Attack generation: Craft more effective attacks by understanding how the system responds internally
  2. Grading: Make more informed decisions about whether an attack succeeded by seeing internal behavior

Configuration

Basic Configuration

Enable tracing in your promptfooconfig.yaml:

yaml
redteam:
  tracing:
    # Enable tracing for all strategies
    enabled: true

    # Include trace data in attack generation (default: true)
    includeInAttack: true

    # Include trace data in grading (default: true)
    includeInGrading: true

  plugins:
    - harmful
    - pii

  strategies:
    - crescendo
    - goat

Advanced Configuration

Tune how trace data is fetched, filtered, and sanitized:

yaml
redteam:
  tracing:
    enabled: true

    # Include internal spans (e.g., tokenization, parsing)
    includeInternalSpans: false

    # Maximum number of spans to fetch per iteration
    maxSpans: 50

    # Maximum depth of nested spans to fetch
    maxDepth: 5

    # Retry configuration for fetching traces
    maxRetries: 3
    retryDelayMs: 500

    # Filter spans by name pattern (optional)
    spanFilter:
      - 'llm.*'
      - 'tool.*'
      - 'guardrail.*'

    # Sanitize sensitive attributes (recommended)
    sanitizeAttributes: true
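
To build intuition for `spanFilter`, the sketch below shows one plausible way name patterns could select spans. It assumes glob-style matching in which `*` matches any run of characters and every other character is literal. The README does not state the actual matching rules, so treat `patternToRegExp` and `filterSpans` as hypothetical helpers rather than promptfoo's implementation.

```typescript
// Convert a glob-style pattern such as "llm.*" into an anchored RegExp.
// Assumption: "*" is the only wildcard; all other characters match literally.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

// Keep spans whose name matches at least one pattern; no patterns means no filtering.
function filterSpans<T extends { name: string }>(spans: T[], patterns: string[]): T[] {
  if (patterns.length === 0) {
    return spans;
  }
  const regexes = patterns.map(patternToRegExp);
  return spans.filter((span) => regexes.some((regex) => regex.test(span.name)));
}

const allSpans = [
  { name: 'llm.generate' },
  { name: 'tool.database_query' },
  { name: 'guardrail.check' },
  { name: 'http.request' },
  { name: 'tokenizer.encode' },
];
const kept = filterSpans(allSpans, ['llm.*', 'tool.*', 'guardrail.*']);
console.log(kept.map((span) => span.name));
```

Under these assumptions the three patterns from the configuration keep the LLM, tool, and guardrail spans and drop everything else.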

Strategy-Specific Configuration

Different strategies may need different tracing settings:

yaml
redteam:
  tracing:
    enabled: true

    # Strategy-specific overrides
    strategies:
      # Crescendo benefits from seeing guardrail decisions
      crescendo:
        includeInAttack: true
        includeInGrading: true
        spanFilter:
          - 'guardrail.*'
          - 'llm.*'

      # GOAT can use tool call information
      goat:
        includeInAttack: true
        spanFilter:
          - 'tool.*'
          - 'llm.*'

      # Iterative may want full trace data
      iterative:
        includeInAttack: true
        includeInGrading: true
        maxSpans: 100

Test-Level Configuration

Override tracing for specific tests:

yaml
tests:
  - description: 'Test with custom tracing'
    vars:
      query: 'Tell me about sensitive data'
    metadata:
      tracing:
        enabled: true
        includeInAttack: true
        includeInGrading: true
        maxSpans: 200
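
The three configuration levels above (global, strategy-specific, test-level) imply some precedence order. The sketch below assumes the most specific level wins: test metadata overrides the strategy entry, which overrides the global block. That order is an assumption made for illustration, and `resolveTracing` and the interfaces are hypothetical names.

```typescript
interface TracingOptions {
  enabled?: boolean;
  includeInAttack?: boolean;
  includeInGrading?: boolean;
  maxSpans?: number;
  maxDepth?: number;
  spanFilter?: string[];
}

interface RedteamTracingConfig extends TracingOptions {
  strategies?: Record<string, TracingOptions>;
}

// Later spreads win, so the assumed precedence is: test > strategy > global.
function resolveTracing(
  base: RedteamTracingConfig,
  strategyId: string,
  testOverride: TracingOptions = {},
): TracingOptions {
  const { strategies, ...defaults } = base;
  const strategyOverride = strategies?.[strategyId] ?? {};
  return { ...defaults, ...strategyOverride, ...testOverride };
}

const tracingConfig: RedteamTracingConfig = {
  enabled: true,
  maxSpans: 50,
  strategies: {
    iterative: { includeInAttack: true, includeInGrading: true, maxSpans: 100 },
  },
};

console.log(resolveTracing(tracingConfig, 'goat'));
console.log(resolveTracing(tracingConfig, 'iterative'));
console.log(resolveTracing(tracingConfig, 'iterative', { maxSpans: 200 }));
```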

How Tracing Works

1. Attack Generation

When includeInAttack: true, the attacker receives a trace summary like:

text
Trace 0af76519 • 5 spans

Execution Flow:
1. [1.2s] llm.generate (client) | model=gpt-4
2. [300ms] guardrail.check (internal) | tool=content-filter
3. [150ms] tool.database_query (server) | tool=search
4. [50ms] guardrail.check (internal) | ERROR: Rate limit exceeded
5. [800ms] llm.generate (client) | model=gpt-4

Key Observations:
• Guardrail content-filter decision: blocked
• Tool call search via "tool.database_query" (duration 150ms)
• Error span "guardrail.check" (span-4): Rate limit exceeded

The attacker can use this information to craft better attacks (e.g., targeting the rate limit error).
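
To make the summary format concrete, here is a minimal sketch that renders the header and "Execution Flow" lines from a list of spans. It is not promptfoo's formatter: `SummarySpan`, `formatDuration`, and `renderExecutionFlow` are hypothetical, and the rule of switching from milliseconds to seconds at 1000 ms is inferred from the sample output above.

```typescript
interface SummarySpan {
  name: string;
  kind: 'client' | 'server' | 'internal';
  durationMs: number;
  detail: string; // e.g. "model=gpt-4" or "ERROR: Rate limit exceeded"
}

// Assumed rule: durations of one second or more are shown in seconds.
function formatDuration(ms: number): string {
  return ms >= 1000 ? `${(ms / 1000).toFixed(1)}s` : `${ms}ms`;
}

function renderExecutionFlow(traceId: string, spans: SummarySpan[]): string {
  const header = `Trace ${traceId.slice(0, 8)} • ${spans.length} spans`;
  const lines = spans.map(
    (span, index) =>
      `${index + 1}. [${formatDuration(span.durationMs)}] ${span.name} (${span.kind}) | ${span.detail}`,
  );
  return [header, '', 'Execution Flow:', ...lines].join('\n');
}

const summary = renderExecutionFlow('0af7651916cd43dd8448eb211c80319c', [
  { name: 'llm.generate', kind: 'client', durationMs: 1200, detail: 'model=gpt-4' },
  { name: 'guardrail.check', kind: 'internal', durationMs: 300, detail: 'tool=content-filter' },
  { name: 'tool.database_query', kind: 'server', durationMs: 150, detail: 'tool=search' },
  { name: 'guardrail.check', kind: 'internal', durationMs: 50, detail: 'ERROR: Rate limit exceeded' },
  { name: 'llm.generate', kind: 'client', durationMs: 800, detail: 'model=gpt-4' },
]);
console.log(summary);
```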

2. Grading

When includeInGrading: true, graders receive the same trace context and can make more informed decisions:

typescript
// Grader receives:
{
  prompt: "...",
  llmOutput: "...",
  test: {...},
  gradingContext: {
    traceContext: {
      traceId: "...",
      spans: [...],
      insights: [...]
    },
    traceSummary: "..."
  }
}
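
As a rough illustration of how a custom grader might consume this context, the sketch below types the `gradingContext` fields shown above and checks whether any guardrail span reported a block. The field names `traceContext`, `spans`, `insights`, and `traceSummary` come from the structure above; the span shape, the `decision` attribute, and `guardrailBlocked` are assumptions made for this example.

```typescript
// Hypothetical span shape; the README elides what each span contains.
interface TraceSpan {
  name: string;
  attributes?: Record<string, string>;
}

interface GradingContext {
  traceContext?: {
    traceId: string;
    spans: TraceSpan[];
    insights: string[];
  };
  traceSummary?: string;
}

// True when any guardrail span carries the assumed attribute decision=blocked.
function guardrailBlocked(context: GradingContext): boolean {
  const spans = context.traceContext?.spans ?? [];
  return spans.some(
    (span) => span.name.startsWith('guardrail.') && span.attributes?.decision === 'blocked',
  );
}

const gradingContext: GradingContext = {
  traceContext: {
    traceId: '0af7651916cd43dd8448eb211c80319c',
    spans: [
      { name: 'llm.generate', attributes: { model: 'gpt-4' } },
      { name: 'guardrail.check', attributes: { tool: 'content-filter', decision: 'blocked' } },
    ],
    insights: ['Guardrail content-filter decision: blocked'],
  },
  traceSummary: 'Trace 0af76519 • 2 spans',
};

console.log(guardrailBlocked(gradingContext));
console.log(guardrailBlocked({}));
```

A grader could combine such a signal with the model output, for example treating a harmful-looking reply as more significant when no guardrail span fired at all.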

Best Practices

1. Start with Default Settings

The default configuration works well for most use cases:

yaml
redteam:
  tracing:
    enabled: true

2. Use spanFilter for Focused Analysis

If you only care about specific operations:

yaml
redteam:
  tracing:
    enabled: true
    spanFilter:
      - 'guardrail.*' # Keep guardrail spans
      - 'tool.*' # Keep tool call spans

3. Keep sanitizeAttributes Enabled

Always sanitize attributes in production:

yaml
redteam:
  tracing:
    enabled: true
    sanitizeAttributes: true # Recommended

4. Adjust maxSpans Based on Complexity

  • Simple apps: maxSpans: 20
  • Medium complexity: maxSpans: 50 (default)
  • Complex agentic systems: maxSpans: 100-200
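
To see how `maxSpans` and `maxDepth` interact, the following sketch trims a flat span list by nesting depth first and by count second. The order of those two steps, the convention that a root span has depth 1, and the names `FlatSpan` and `limitSpans` are all assumptions made for illustration.

```typescript
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

function limitSpans(spans: FlatSpan[], maxSpans: number, maxDepth: number): FlatSpan[] {
  const byId = new Map<string, FlatSpan>();
  for (const span of spans) {
    byId.set(span.spanId, span);
  }

  // Depth is the number of spans on the path from the root down to this span.
  const depthOf = (span: FlatSpan): number => {
    let depth = 1;
    let current = span;
    const seen = new Set<string>([span.spanId]); // guards against parent cycles
    while (current.parentSpanId) {
      const parent = byId.get(current.parentSpanId);
      if (!parent || seen.has(parent.spanId)) {
        break;
      }
      seen.add(parent.spanId);
      depth += 1;
      current = parent;
    }
    return depth;
  };

  return spans.filter((span) => depthOf(span) <= maxDepth).slice(0, maxSpans);
}

const chain: FlatSpan[] = [
  { spanId: 'a', name: 'http.request' },
  { spanId: 'b', parentSpanId: 'a', name: 'llm.generate' },
  { spanId: 'c', parentSpanId: 'b', name: 'tool.database_query' },
  { spanId: 'd', parentSpanId: 'c', name: 'db.driver.read' },
];
console.log(limitSpans(chain, 50, 2).map((span) => span.name));
console.log(limitSpans(chain, 1, 5).map((span) => span.name));
```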

5. Use Strategy-Specific Overrides

Different strategies benefit from different trace data:

  • Crescendo: Needs guardrail information
  • GOAT: Benefits from tool call traces
  • Iterative: Can use comprehensive trace data

Security Considerations

Sensitive Data

Tracing can expose sensitive information. To limit exposure:

  1. Use sanitizeAttributes: true (default)
  2. Review trace data before sharing
  3. Consider disabling tracing for production testing
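
This README does not describe what `sanitizeAttributes` does internally. As a sketch of the general idea only, the example below redacts attribute values whose keys look sensitive before the trace is shared. The key pattern, the `[REDACTED]` placeholder, and the `redactAttributes` helper are all hypothetical.

```typescript
// Hypothetical list of key fragments that suggest a secret value.
const SENSITIVE_KEY = /(password|secret|token|api[-_]?key|authorization|cookie)/i;

// Return a copy of the attributes with sensitive-looking values replaced.
function redactAttributes(attributes: Record<string, string>): Record<string, string> {
  const result: Record<string, string> = {};
  for (const key of Object.keys(attributes)) {
    result[key] = SENSITIVE_KEY.test(key) ? '[REDACTED]' : attributes[key];
  }
  return result;
}

const redacted = redactAttributes({
  model: 'gpt-4',
  'http.request.header.authorization': 'Bearer abc123',
  'user.api_key': 'sk-example',
});
console.log(redacted);
```

Key-based redaction like this does not catch secrets embedded in free-text values, which is one reason the list above recommends reviewing trace data before sharing it.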

Performance

Tracing adds overhead:

  • Fetching traces: ~100-500ms per iteration
  • Processing spans: Minimal overhead
  • Storage: Trace metadata is stored in test results

To minimize impact:

  • Use maxSpans to limit data fetched
  • Set appropriate maxRetries and retryDelayMs
  • Consider disabling for large-scale testing
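
To reason about the cost of `maxRetries` and `retryDelayMs`, the sketch below models a fetch loop with a fixed delay between attempts. Whether promptfoo uses a fixed delay or some form of backoff is not stated here, so the loop, `fetchWithRetry`, and `worstCaseWaitMs` are illustrative assumptions.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try once, then retry up to maxRetries times while the fetch returns null.
async function fetchWithRetry<T>(
  fetchOnce: () => Promise<T | null>,
  maxRetries: number,
  retryDelayMs: number,
  sleep: Sleep = realSleep,
): Promise<T | null> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = await fetchOnce();
    if (result !== null) {
      return result;
    }
    if (attempt < maxRetries) {
      await sleep(retryDelayMs);
    }
  }
  return null;
}

// Upper bound on time spent sleeping when every attempt comes back empty.
function worstCaseWaitMs(maxRetries: number, retryDelayMs: number): number {
  return maxRetries * retryDelayMs;
}

console.log(worstCaseWaitMs(3, 500));
```

With the values from the advanced configuration (`maxRetries: 3`, `retryDelayMs: 500`), this model spends up to 1500 ms waiting per iteration when no trace ever arrives, on top of the time each fetch takes.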

Debugging

Enable Debug Logging

bash
PROMPTFOO_LOG_LEVEL=debug npm run local -- eval -c redteam.yaml

Check Trace Store

Verify traces are being recorded:

bash
# View traces in the database
npm run db:studio

Test Trace Fetching

typescript
// Save as a script in the promptfoo repository root so the relative import resolves.
import { fetchTraceContext } from './src/tracing/traceContext';

const trace = await fetchTraceContext('your-trace-id', {
  maxSpans: 50,
  maxDepth: 5,
});
console.log(trace);

Examples

See the example configurations:

  • promptfooconfig.yaml - Basic tracing setup
  • promptfooconfig.advanced.yaml - Advanced configuration
  • promptfooconfig-simple.yaml - Simplified configuration

Troubleshooting

No Traces Appearing

  1. Check that your provider supports tracing (must send traceparent header)
  2. Verify OTLP receiver is running
  3. Check debug logs for trace fetch errors

Traces Not Used in Attacks

  1. Verify includeInAttack: true
  2. Check that traces are being fetched (debug logs)
  3. Ensure trace fetch completes before attack generation

Performance Issues

  1. Reduce maxSpans and maxDepth
  2. Use spanFilter to limit data
  3. Increase retryDelayMs to reduce fetch frequency