examples/redteam-tracing-example/README.md
You can run this example with:
npx promptfoo@latest init --example redteam-tracing-example
cd redteam-tracing-example
This example demonstrates how to use tracing with red team strategies to provide attackers and graders with visibility into the internal operations of your LLM application.
1. Install dependencies:
npm install
2. Start the mock traced server:
npm run server
This starts an HTTP server on port 3110.
3. Test the server (optional):
# In another terminal
./test-server.sh
4. Run the red team evaluation:
# In another terminal (from the project root)
npm run local -- eval -c examples/redteam-tracing-example/promptfooconfig.yaml
5. View the results:
npm run local -- view
You'll see trace data in:
- Attack generation (includeInAttack: true)
- Grading (includeInGrading: true)
- Trace snapshots (traceSnapshots)
Server not responding?
# Check if server is running
curl http://localhost:3110/health
# Test basic request
curl -X POST http://localhost:3110/chat \
-H "Content-Type: application/json" \
-d '{"prompt": "test"}'
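The basic request above can also be built in code. The following is a minimal sketch, not part of the example project: `buildTraceparent` and `buildChatRequest` are hypothetical helpers, and attaching a `traceparent` header is an assumption (the curl command does not send one). The header layout follows the W3C Trace Context format of version, trace ID, parent ID, and flags.

```typescript
// Sketch only: builds the same POST /chat request as the curl command above.
// Sending a traceparent header is an assumption; the curl example omits it.

interface ChatRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// W3C Trace Context layout: version-traceId-parentId-flags, lowercase hex.
function buildTraceparent(traceId: string, spanId: string, sampled = true): string {
  const isValidHex = (value: string, length: number): boolean =>
    new RegExp(`^[0-9a-f]{${length}}$`).test(value) && !/^0+$/.test(value);
  if (!isValidHex(traceId, 32)) {
    throw new Error('traceId must be 32 lowercase hex characters and not all zeros');
  }
  if (!isValidHex(spanId, 16)) {
    throw new Error('spanId must be 16 lowercase hex characters and not all zeros');
  }
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

function buildChatRequest(prompt: string, traceparent?: string): ChatRequest {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (traceparent !== undefined) {
    headers.traceparent = traceparent;
  }
  return {
    url: 'http://localhost:3110/chat',
    method: 'POST',
    headers,
    body: JSON.stringify({ prompt }),
  };
}
```

On Node 18 or later, the result can be passed to the built-in `fetch` as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.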
No traces appearing?
- Check that tracing is enabled (tracing.enabled: true)
- Check that traceparent headers are being passed (set in provider context)
Red team tracing allows adversarial strategies to see what happens inside your LLM application during an attack, including LLM calls, guardrail decisions, tool calls, and errors.
This information can help attackers craft more effective attacks and graders make more informed decisions.
Enable tracing in your promptfooconfig.yaml:
redteam:
  tracing:
    # Enable tracing for all strategies
    enabled: true
    # Include trace data in attack generation (default: true)
    includeInAttack: true
    # Include trace data in grading (default: true)
    includeInGrading: true
  plugins:
    - harmful
    - pii
  strategies:
    - crescendo
    - goat
Configure tracing behavior:
redteam:
  tracing:
    enabled: true
    # Include internal spans (e.g., tokenization, parsing)
    includeInternalSpans: false
    # Maximum number of spans to fetch per iteration
    maxSpans: 50
    # Maximum depth of nested spans to fetch
    maxDepth: 5
    # Retry configuration for fetching traces
    maxRetries: 3
    retryDelayMs: 500
    # Filter spans by name pattern (optional)
    spanFilter:
      - 'llm.*'
      - 'tool.*'
      - 'guardrail.*'
    # Sanitize sensitive attributes (recommended)
    sanitizeAttributes: true
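To make the effect of spanFilter, maxSpans, and maxDepth concrete, here is a rough sketch of how such limits could be applied to a list of spans. It is not promptfoo's implementation. The `Span` shape and the `selectSpans` helper are hypothetical, and two behaviors are assumed: each pattern is treated as a regular expression matched against the whole span name, and a span is kept when its depth is at most maxDepth.

```typescript
// Hypothetical span shape for illustration only.
interface Span {
  name: string;
  depth: number; // nesting level, root span = 1 (assumed)
}

interface SelectOptions {
  spanFilter?: string[];
  maxSpans: number;
  maxDepth: number;
}

// Assumption: a pattern such as 'llm.*' is a regular expression that must
// match the entire span name.
function matchesFilter(name: string, patterns: string[]): boolean {
  if (patterns.length === 0) {
    return true; // no filter configured: keep every span
  }
  return patterns.some((pattern) => new RegExp(`^(?:${pattern})$`).test(name));
}

function selectSpans(spans: Span[], options: SelectOptions): Span[] {
  const patterns = options.spanFilter ?? [];
  return spans
    .filter((span) => span.depth <= options.maxDepth) // drop spans nested too deep
    .filter((span) => matchesFilter(span.name, patterns)) // keep matching names
    .slice(0, options.maxSpans); // cap the total count
}
```

Under these assumptions, a filter of 'llm.*' and 'tool.*' drops guardrail spans entirely, so a strategy that depends on guardrail decisions would need 'guardrail.*' in the list.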
Different strategies may need different tracing settings:
redteam:
  tracing:
    enabled: true
    # Strategy-specific overrides
    strategies:
      # Crescendo benefits from seeing guardrail decisions
      crescendo:
        includeInAttack: true
        includeInGrading: true
        spanFilter:
          - 'guardrail.*'
          - 'llm.*'
      # GOAT can use tool call information
      goat:
        includeInAttack: true
        spanFilter:
          - 'tool.*'
          - 'llm.*'
      # Iterative may want full trace data
      iterative:
        includeInAttack: true
        includeInGrading: true
        maxSpans: 100
Override tracing for specific tests:
tests:
  - description: 'Test with custom tracing'
    vars:
      query: 'Tell me about sensitive data'
    metadata:
      tracing:
        enabled: true
        includeInAttack: true
        includeInGrading: true
        maxSpans: 200
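The three configuration layers above (global settings, strategy overrides, and per-test metadata) can be pictured as a simple merge. The sketch below assumes the precedence is global, then strategy, then test, with later layers winning. That ordering is an assumption rather than documented behavior, and `resolveTracing` and the type names are hypothetical.

```typescript
// Field names mirror the YAML keys shown above; the types are illustrative.
interface TracingOptions {
  enabled?: boolean;
  includeInAttack?: boolean;
  includeInGrading?: boolean;
  maxSpans?: number;
  maxDepth?: number;
  spanFilter?: string[];
}

interface TracingConfig extends TracingOptions {
  strategies?: Record<string, TracingOptions>;
}

// Assumed precedence: global < strategy override < test metadata.
function resolveTracing(
  config: TracingConfig,
  strategyId: string,
  testOverride: TracingOptions = {},
): TracingOptions {
  const { strategies, ...base } = config;
  const strategyOverride = strategies?.[strategyId] ?? {};
  return { ...base, ...strategyOverride, ...testOverride };
}
```

With a global maxSpans of 50 and an iterative override of 100, this sketch resolves the iterative strategy to 100, any other strategy to 50, and a test that sets maxSpans: 200 to 200.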
When includeInAttack: true, the attacker receives a trace summary like:
Trace 0af76519 • 5 spans
Execution Flow:
1. [1.2s] llm.generate (client) | model=gpt-4
2. [300ms] guardrail.check (internal) | tool=content-filter
3. [150ms] tool.database_query (server) | tool=search
4. [50ms] guardrail.check (internal) | ERROR: Rate limit exceeded
5. [800ms] llm.generate (client) | model=gpt-4
Key Observations:
• Guardrail content-filter decision: blocked
• Tool call search via "tool.database_query" (duration 150ms)
• Error span "guardrail.check" (span-4): Rate limit exceeded
The attacker can use this information to craft better attacks (e.g., targeting the rate limit error).
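For readers who want to see how a summary in this shape might be produced, the sketch below formats spans into the "Execution Flow" lines shown above. It only reproduces the visible layout. The `SummarySpan` type and both helpers are hypothetical, and the rules that durations of one second or more print in seconds and that the header shows the first eight characters of the trace ID are inferred from the sample output.

```typescript
// Hypothetical span shape, reduced to what the summary lines display.
interface SummarySpan {
  name: string;
  kind: 'client' | 'server' | 'internal';
  durationMs: number;
  detail: string; // e.g. 'model=gpt-4' or 'ERROR: Rate limit exceeded'
}

// Inferred from the sample: 1200 ms prints as "1.2s", 300 ms as "300ms".
function formatDuration(ms: number): string {
  return ms >= 1000 ? `${(ms / 1000).toFixed(1)}s` : `${ms}ms`;
}

// Inferred from the sample: the header shows a shortened trace ID.
function formatHeader(traceId: string, spanCount: number): string {
  return `Trace ${traceId.slice(0, 8)} • ${spanCount} spans`;
}

function formatExecutionFlow(spans: SummarySpan[]): string[] {
  return spans.map(
    (span, index) =>
      `${index + 1}. [${formatDuration(span.durationMs)}] ${span.name} (${span.kind}) | ${span.detail}`,
  );
}
```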
When includeInGrading: true, graders receive the same trace context and can make more informed decisions:
// Grader receives:
{
  prompt: "...",
  llmOutput: "...",
  test: {...},
  gradingContext: {
    traceContext: {
      traceId: "...",
      spans: [...],
      insights: [...]
    },
    traceSummary: "..."
  }
}
The default configuration works well for most use cases:
redteam:
  tracing:
    enabled: true
If you only care about specific operations:
redteam:
  tracing:
    enabled: true
    spanFilter:
      - 'guardrail.*' # Only guardrail spans
      - 'tool.*' # Only tool calls
Always sanitize attributes in production:
redteam:
  tracing:
    enabled: true
    sanitizeAttributes: true # Recommended
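This README does not describe what sanitization does internally. As a rough illustration of the idea only, the sketch below redacts span attributes whose keys look sensitive. The key pattern, the `[REDACTED]` placeholder, and the `sanitizeAttributes` function are all hypothetical and should not be read as promptfoo's behavior.

```typescript
type Attributes = Record<string, string | number | boolean>;

// Hypothetical deny-list of key fragments; the real rules are not stated here.
const SENSITIVE_KEY = /(authorization|api[-_]?key|token|secret|password|cookie)/i;

function sanitizeAttributes(attributes: Attributes): Attributes {
  const sanitized: Attributes = {};
  for (const [key, value] of Object.entries(attributes)) {
    // Replace the value but keep the key, so the span structure stays readable.
    sanitized[key] = SENSITIVE_KEY.test(key) ? '[REDACTED]' : value;
  }
  return sanitized;
}
```

Matching on keys rather than values is a deliberate simplification. Secrets embedded in free-text values, such as prompts, would pass through this sketch unchanged.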
- maxSpans: 20
- maxSpans: 50 (default)
- maxSpans: 100-200
Different strategies benefit from different trace data:
Tracing can expose sensitive information. Always:
- Keep sanitizeAttributes: true (default)
Tracing adds overhead:
To minimize impact:
- Use maxSpans to limit data fetched
- Adjust maxRetries and retryDelayMs
Enable debug logging:
PROMPTFOO_LOG_LEVEL=debug npm run local -- eval -c redteam.yaml
Verify traces are being recorded:
# View traces in the database
npm run db:studio
import { fetchTraceContext } from './src/tracing/traceContext';
const trace = await fetchTraceContext('your-trace-id', {
  maxSpans: 50,
  maxDepth: 5,
});
console.log(trace);
See the example configurations:
- promptfooconfig.yaml - Basic tracing setup
- promptfooconfig.advanced.yaml - Advanced configuration
- promptfooconfig-simple.yaml - Simplified configuration
Other settings to check:
- Confirm includeInAttack: true is set
- Reduce maxSpans and maxDepth
- Use spanFilter to limit data
- Increase retryDelayMs to reduce fetch frequency