examples/redteam-grok-4-political-bias/README.md
This example measures the political bias of Grok 4 compared with other major AI models, using a dataset of 2,500 political opinion questions that includes questions designed specifically to detect corporate bias in AI responses.
Read the full analysis: Grok 4 Goes Red? Yes, But Not How You Think
You can run this example with:
npx promptfoo@latest init --example redteam-grok-4-political-bias
cd redteam-grok-4-political-bias
This example requires the following environment variables:
XAI_API_KEY - Your xAI API key for Grok 4
GOOGLE_API_KEY - Your Google API key for Gemini 2.5 Pro
OPENAI_API_KEY - Your OpenAI API key for GPT-4.1
ANTHROPIC_API_KEY - Your Anthropic API key for Claude Opus 4

You can set these in a .env file or directly in your environment:
export XAI_API_KEY="your_xai_api_key"
export GOOGLE_API_KEY="your_google_api_key"
export OPENAI_API_KEY="your_openai_api_key"
export ANTHROPIC_API_KEY="your_anthropic_api_key"
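Before a long run, it can help to confirm that all four keys are actually exported. The following is a minimal POSIX sh sketch; `check_api_keys` is a hypothetical helper, not part of this example. It only inspects the exported environment, so a key that lives only in a .env file will be reported as missing even though promptfoo may still load it.

```shell
# Hypothetical helper (not shipped with this example): report any of the
# four required keys that is unset or empty in the exported environment.
# It does NOT read a .env file.
check_api_keys() {
  missing=0
  for var in XAI_API_KEY GOOGLE_API_KEY OPENAI_API_KEY ANTHROPIC_API_KEY; do
    # printenv prints nothing for an unset variable, so -z catches unset and empty
    if [ -z "$(printenv "$var")" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 only if every key is present
}
```

Usage: `check_api_keys && npx promptfoo@latest eval -c promptfooconfig.yaml --output results.json` starts the evaluation only when every key is set.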
# Full evaluation with all models
npx promptfoo@latest eval -c promptfooconfig.yaml --output results.json
# Multi-judge analysis (4 models × 4 judges)
npx promptfoo@latest eval -c promptfooconfig-multi-judge.yaml --output results-multi-judge.json
# View results in the web UI
npx promptfoo@latest view
# Generate analysis charts
python analyze_results_multi_judge.py
python generate_political_spectrum_chart.py
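For a rough sense of request volume (useful when choosing a `--max-concurrency` value), assume the multi-judge config runs all 2,500 questions and that every judge grades every model's response. Under that assumption, with $Q$ questions, $M$ models and $J$ judges:

$$
\text{responses} = M \cdot Q = 4 \times 2{,}500 = 10{,}000,
\qquad
\text{grading calls} = M \cdot J \cdot Q = 4 \times 4 \times 2{,}500 = 40{,}000
$$

Under the same assumption, a 100-question test file would need 400 responses and 1,600 grading calls.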
The experiment reveals:
political-questions.csv - 2,500 political questions covering:
promptfooconfig.yaml - Main configuration for basic evaluation
political-bias-rubric.yaml - 7-point Likert scale rubric for political scoring
political-questions.csv - Question bank covering economic, social, and corporate topics

Each model response is scored on a 0-1 scale:
The analysis includes:
Running the full experiment:
For testing with smaller samples:
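# Test with 100 questions (header row + first 100 data rows)
head -n 101 political-questions.csv > test-100.csv
# Test economic questions only (keep the header row so the columns still have names)
(head -n 1 political-questions.csv; grep ",economic$" political-questions.csv) > economic-only.csv
# Test social questions only
(head -n 1 political-questions.csv; grep ",social$" political-questions.csv) > social-only.csv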
# Use rate limiting
npx promptfoo@latest eval -c promptfooconfig.yaml --max-concurrency 5
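The `head` command above takes the first 100 rows, which is not a representative sample if the CSV is grouped by category. Below is a minimal sketch of a seeded random sample that keeps the header row. `sample_questions` and `SEED` are hypothetical names, not part of this example, and the sketch assumes one question per line (no newlines inside quoted CSV fields).

```shell
# Hypothetical helper (not shipped with this example): copy the CSV header,
# then append N data rows chosen at random. The seed comes from $SEED
# (default 42), so the same seed and input give the same sample.
# Assumes one question per line, i.e. no newlines inside quoted fields.
sample_questions() {
  src=$1   # input CSV, e.g. political-questions.csv
  n=$2     # number of data rows to keep
  out=$3   # output CSV
  head -n 1 "$src" > "$out"
  # Prefix each data row with a random key, sort on the key,
  # strip the key again, and keep the first N rows.
  tail -n +2 "$src" \
    | awk -v seed="${SEED:-42}" 'BEGIN { srand(seed) } { printf "%.12f\t%s\n", rand(), $0 }' \
    | LC_ALL=C sort \
    | cut -f 2- \
    | head -n "$n" >> "$out"
}
```

Usage: `sample_questions political-questions.csv 100 sample-100.csv`, then reference `sample-100.csv` from your config in place of `political-questions.csv` (assuming the config loads its tests from that file). Sorting on a random key rather than relying on `shuf` keeps the sketch portable to systems without GNU coreutils.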
Edit configuration files to: