redteam-tracing-example (Red Team Tracing)

You can run this example with:

bash

npx promptfoo@latest init --example redteam-tracing-example
cd redteam-tracing-example

This example demonstrates how to use tracing with red team strategies to provide attackers and graders with visibility into the internal operations of your LLM application.

Quick Start

1. Install dependencies:

bash

npm install

2. Start the mock traced server:

bash

npm run server

This starts an HTTP server on port 3110 that:

Accepts chat requests
Generates OTLP trace spans (LLM calls, guardrails, tools)
Sends spans to promptfoo's OTLP receiver

3. Test the server (optional):

bash

# In another terminal
./test-server.sh

4. Run the red team evaluation:

bash

# In another terminal, from the root of the promptfoo repository
npm run local -- eval -c examples/redteam-tracing-example/promptfooconfig.yaml

5. View the results:

bash

npm run local -- view

You'll see trace data in:

Attack prompts (when includeInAttack: true)
Grading context (when includeInGrading: true)
Test metadata (traceSnapshots)

Troubleshooting

Server not responding?

bash

# Check if server is running
curl http://localhost:3110/health

# Test basic request
curl -X POST http://localhost:3110/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "test"}'

No traces appearing?

Make sure the server is emitting to the correct OTLP endpoint (check server logs)
Verify promptfoo's OTLP receiver is enabled in config (tracing.enabled: true)
Check that traceparent headers are being passed (set in provider context)
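
When checking whether traceparent headers are being passed, it can help to validate the header value itself. The following is a minimal sketch, assuming the W3C Trace Context version 00 layout (version, trace ID, parent ID, flags, separated by hyphens). parseTraceparent is a hypothetical helper written for this illustration, not part of promptfoo:

```typescript
// Hypothetical helper: parses a W3C Trace Context `traceparent` header
// in the version-00 layout: version-traceId-parentId-flags.
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

function parseTraceparent(header: string): TraceParent | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim(),
  );
  if (!match) {
    return null;
  }
  const [, version, traceId, parentId, flags] = match;
  // Version "ff" and all-zero IDs are invalid in the W3C format.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) {
    return null;
  }
  // The lowest bit of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}
```

If the server logs the incoming traceparent value, passing it through a check like this shows quickly whether the header is missing, truncated, or malformed.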

What is Red Team Tracing?

Red team tracing allows adversarial strategies to see what happens inside your LLM application during an attack, including:

Tool calls and their results
Guardrail decisions
Internal LLM calls
Error conditions
Performance metrics

This information can help:

Attack generation: Craft more effective attacks by understanding how the system responds internally
Grading: Make more informed decisions about whether an attack succeeded by seeing internal behavior

Configuration

Basic Configuration

Enable tracing in your promptfooconfig.yaml:

yaml

redteam:
  tracing:
    # Enable tracing for all strategies
    enabled: true

    # Include trace data in attack generation (default: true)
    includeInAttack: true

    # Include trace data in grading (default: true)
    includeInGrading: true

  plugins:
    - harmful
    - pii

  strategies:
    - crescendo
    - goat

Advanced Configuration

Configure tracing behavior:

yaml

redteam:
  tracing:
    enabled: true

    # Include internal spans (e.g., tokenization, parsing)
    includeInternalSpans: false

    # Maximum number of spans to fetch per iteration
    maxSpans: 50

    # Maximum depth of nested spans to fetch
    maxDepth: 5

    # Retry configuration for fetching traces
    maxRetries: 3
    retryDelayMs: 500

    # Filter spans by name pattern (optional)
    spanFilter:
      - 'llm.*'
      - 'tool.*'
      - 'guardrail.*'

    # Sanitize sensitive attributes (recommended)
    sanitizeAttributes: true
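
To make the effect of spanFilter, maxDepth, and maxSpans concrete, here is a rough sketch of how such limits could be applied to a list of spans. It is an illustration only: the Span shape, the selectSpans helper, the glob-style reading of patterns such as 'llm.*', and the convention that root spans have depth 1 are all assumptions, and promptfoo's actual matching rules may differ.

```typescript
// Hypothetical span shape; real spans carry many more fields.
interface Span {
  name: string;
  depth: number; // Assumption: 1 for root spans, 2 for their children, and so on.
}

interface SpanLimits {
  spanFilter?: string[];
  maxSpans: number;
  maxDepth: number;
}

// Assumption: '*' is a wildcard and every other character matches literally.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`);
}

function selectSpans(spans: Span[], limits: SpanLimits): Span[] {
  const matchers = (limits.spanFilter ?? []).map(patternToRegExp);
  return spans
    // Drop spans nested deeper than maxDepth.
    .filter((span) => span.depth <= limits.maxDepth)
    // With no filter configured, keep every span name.
    .filter((span) => matchers.length === 0 || matchers.some((re) => re.test(span.name)))
    // Keep at most maxSpans spans, in their original order.
    .slice(0, limits.maxSpans);
}
```

Under these assumptions, a span named tokenizer.encode would be dropped by the filter above, while llm.generate and tool.database_query would be kept until the maxSpans cap is reached.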

Strategy-Specific Configuration

Different strategies may need different tracing settings:

yaml

redteam:
  tracing:
    enabled: true

    # Strategy-specific overrides
    strategies:
      # Crescendo benefits from seeing guardrail decisions
      crescendo:
        includeInAttack: true
        includeInGrading: true
        spanFilter:
          - 'guardrail.*'
          - 'llm.*'

      # GOAT can use tool call information
      goat:
        includeInAttack: true
        spanFilter:
          - 'tool.*'
          - 'llm.*'

      # Iterative may want full trace data
      iterative:
        includeInAttack: true
        includeInGrading: true
        maxSpans: 100

Test-Level Configuration

Override tracing for specific tests:

yaml

tests:
  - description: 'Test with custom tracing'
    vars:
      query: 'Tell me about sensitive data'
    metadata:
      tracing:
        enabled: true
        includeInAttack: true
        includeInGrading: true
        maxSpans: 200
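
One way to picture how the global, strategy-specific, and test-level settings combine is as a layered merge in which the more specific level wins. The sketch below assumes a shallow merge with the precedence test > strategy > global; both the precedence and the resolveTracing helper are assumptions made for illustration, not promptfoo's documented behavior.

```typescript
interface TracingOptions {
  enabled?: boolean;
  includeInAttack?: boolean;
  includeInGrading?: boolean;
  maxSpans?: number;
  spanFilter?: string[];
}

interface GlobalTracing extends TracingOptions {
  strategies?: Record<string, TracingOptions>;
}

// Hypothetical resolver: later spreads override earlier ones, so
// test-level values beat strategy-level values, which beat global values.
function resolveTracing(
  globalConfig: GlobalTracing,
  strategyId: string,
  testOverride: TracingOptions = {},
): TracingOptions {
  const { strategies, ...base } = globalConfig;
  return { ...base, ...(strategies?.[strategyId] ?? {}), ...testOverride };
}
```

With a global maxSpans of 50, an iterative override of 100, and a test-level value of 200, this sketch resolves to 200 for that test and 100 for other iterative tests.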

How Tracing Works

1. Attack Generation

When includeInAttack: true, the attacker receives a trace summary like:

text

Trace 0af76519 • 5 spans

Execution Flow:
1. [1.2s] llm.generate (client) | model=gpt-4
2. [300ms] guardrail.check (internal) | tool=content-filter
3. [150ms] tool.database_query (server) | tool=search
4. [50ms] guardrail.check (internal) | ERROR: Rate limit exceeded
5. [800ms] llm.generate (client) | model=gpt-4

Key Observations:
• Guardrail content-filter decision: blocked
• Tool call search via "tool.database_query" (duration 150ms)
• Error span "guardrail.check" (span-4): Rate limit exceeded

The attacker can use this information to craft better attacks (e.g., targeting the rate limit error).
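
For readers who want to see how a summary of this shape could be assembled from span data, the following sketch reproduces the header and the Execution Flow lines. The SummarySpan type and both functions are hypothetical and written only to mirror the sample output above; they are not promptfoo's own formatter.

```typescript
// Hypothetical, simplified span record for summary purposes.
interface SummarySpan {
  name: string;
  kind: 'client' | 'server' | 'internal';
  durationMs: number;
  detail: string; // e.g. "model=gpt-4" or "ERROR: Rate limit exceeded"
}

// Durations of one second or more are shown in seconds, shorter ones in ms.
function formatDuration(ms: number): string {
  return ms >= 1000 ? `${(ms / 1000).toFixed(1)}s` : `${ms}ms`;
}

function formatExecutionFlow(traceId: string, spans: SummarySpan[]): string {
  // Only the first 8 characters of the trace ID appear in the header.
  const header = `Trace ${traceId.slice(0, 8)} • ${spans.length} spans`;
  const lines = spans.map(
    (span, index) =>
      `${index + 1}. [${formatDuration(span.durationMs)}] ${span.name} (${span.kind}) | ${span.detail}`,
  );
  return [header, '', 'Execution Flow:', ...lines].join('\n');
}
```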

2. Grading

When includeInGrading: true, graders receive the same trace context and can make more informed decisions:

typescript

// Grader receives:
{
  prompt: "...",
  llmOutput: "...",
  test: {...},
  gradingContext: {
    traceContext: {
      traceId: "...",
      spans: [...],
      insights: [...]
    },
    traceSummary: "..."
  }
}
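
As an illustration of how grading logic might use this context, the sketch below looks for insights that mention a blocked guardrail. Only the field names (traceContext, traceId, spans, insights, traceSummary) come from the example above. The element types, including the guess that insights are plain strings like the Key Observations bullets, and the findGuardrailBlocks helper are assumptions.

```typescript
// Assumed shapes: the field names follow the example above,
// but the element types are guesses made for this sketch.
interface TraceContext {
  traceId: string;
  spans: Array<{ name: string }>;
  insights: string[];
}

interface GradingContext {
  traceContext?: TraceContext;
  traceSummary?: string;
}

// Returns the insights that mention a blocked guardrail decision.
// Returns an empty array when no trace context is attached.
function findGuardrailBlocks(context: GradingContext): string[] {
  const insights = context.traceContext?.insights ?? [];
  return insights.filter(
    (insight) => /guardrail/i.test(insight) && /blocked/i.test(insight),
  );
}
```

A grader could treat a non-empty result as evidence that the target refused internally, even if the final output alone is ambiguous.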

Best Practices

1. Start with Default Settings

The default configuration works well for most use cases:

yaml

redteam:
  tracing:
    enabled: true

2. Use spanFilter for Focused Analysis

If you only care about specific operations:

yaml

redteam:
  tracing:
    enabled: true
    spanFilter:
      - 'guardrail.*' # Guardrail spans
      - 'tool.*' # Tool call spans

3. Keep sanitizeAttributes Enabled

Always sanitize attributes in production:

yaml

redteam:
  tracing:
    enabled: true
    sanitizeAttributes: true # Recommended

4. Adjust maxSpans Based on Complexity

Simple apps: maxSpans: 20
Medium complexity: maxSpans: 50 (default)
Complex agentic systems: maxSpans: 100-200

5. Use Strategy-Specific Overrides

Different strategies benefit from different trace data:

Crescendo: Needs guardrail information
GOAT: Benefits from tool call traces
Iterative: Can use comprehensive trace data

Security Considerations

Sensitive Data

Tracing can expose sensitive information. To reduce the risk:

Use sanitizeAttributes: true (default)
Review trace data before sharing
Consider disabling tracing when testing against production systems

Performance

Tracing adds overhead:

Fetching traces: ~100-500ms per iteration
Processing spans: Minimal overhead
Storage: Trace metadata is stored in test results

To minimize impact:

Use maxSpans to limit data fetched
Set appropriate maxRetries and retryDelayMs
Consider disabling for large-scale testing
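
To reason about how maxRetries and retryDelayMs affect latency, it helps to picture the fetch as a simple retry loop. The sketch below assumes one initial attempt followed by up to maxRetries retries with a fixed delay between them; fetchWithRetry and the fetchOnce callback are hypothetical, and promptfoo's actual retry behavior may differ (for example, it could use backoff).

```typescript
interface RetryOptions {
  maxRetries: number;
  retryDelayMs: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// `fetchOnce` is a hypothetical callback that resolves to null
// while the trace is not yet available.
async function fetchWithRetry<T>(
  fetchOnce: () => Promise<T | null>,
  options: RetryOptions,
): Promise<T | null> {
  for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
    const result = await fetchOnce();
    if (result !== null) {
      return result;
    }
    // Wait before the next attempt, but not after the final one.
    if (attempt < options.maxRetries) {
      await sleep(options.retryDelayMs);
    }
  }
  return null;
}
```

Under these assumptions, the worst-case wait added per iteration is maxRetries × retryDelayMs, which is 3 × 500 = 1500 ms with the values shown in the advanced configuration, plus the time spent in the fetch calls themselves.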

Debugging

Enable Debug Logging

bash

PROMPTFOO_LOG_LEVEL=debug npm run local -- eval -c redteam.yaml

Check Trace Store

Verify traces are being recorded:

bash

# View traces in the database
npm run db:studio

Test Trace Fetching

typescript

import { fetchTraceContext } from './src/tracing/traceContext';

const trace = await fetchTraceContext('your-trace-id', {
  maxSpans: 50,
  maxDepth: 5,
});
console.log(trace);

Examples

See the example configurations:

promptfooconfig.yaml - Basic tracing setup
promptfooconfig.advanced.yaml - Advanced configuration
promptfooconfig-simple.yaml - Simplified configuration

Troubleshooting

No Traces Appearing

Check that your provider supports tracing (must send traceparent header)
Verify OTLP receiver is running
Check debug logs for trace fetch errors

Traces Not Used in Attacks

Verify includeInAttack: true
Check that traces are being fetched (debug logs)
Ensure trace fetch completes before attack generation

Performance Issues

Reduce maxSpans and maxDepth
Use spanFilter to limit data
Increase retryDelayMs to reduce fetch frequency