# compare-openai-models
This example compares OpenAI's gpt-5.4 with gpt-5.4-mini across various riddles and reasoning tasks.
You can run this example with:

```sh
npx promptfoo@latest init --example compare-openai-models
cd compare-openai-models
```

This initializes the example and moves into the newly created `compare-openai-models` directory.
Set an OpenAI API key directly in your environment:

```sh
export OPENAI_API_KEY="your_openai_api_key"
```

Alternatively, you can set the API key in a `.env` file:

```
OPENAI_API_KEY=your_openai_api_key
```
Run the evaluation with:

```sh
npx promptfoo@latest eval --no-cache
```

Note: the `--no-cache` flag is required because the example uses a latency assertion, which does not support caching.
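A latency assertion is defined in the example's `promptfooconfig.yaml`. As a rough sketch (the threshold value here is illustrative, not taken from the example):

```yaml
assert:
  - type: latency
    threshold: 5000 # maximum acceptable response time, in milliseconds
```

Because latency can only be measured on a live request, a cached response would make the assertion meaningless, which is why `--no-cache` is required.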
View the results:

```sh
npx promptfoo@latest view
```
The output includes responses from both models for each riddle, allowing you to compare their performance side by side.
This example demonstrates:

- Comparing `gpt-5.4` against `gpt-5.4-mini`
- `contains` assertions to verify specific answers
- `llm-rubric` assertions for nuanced evaluation criteria
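A minimal `promptfooconfig.yaml` for this kind of comparison might look like the following sketch; the prompt wording, riddle, expected answer, and rubric text are illustrative assumptions, not copied from the example:

```yaml
prompts:
  - 'Solve this riddle and explain your reasoning: {{riddle}}'

providers:
  - openai:gpt-5.4
  - openai:gpt-5.4-mini

tests:
  - vars:
      riddle: 'What has keys but no locks, and space but no room?'
    assert:
      # Exact-match check on a key token of the expected answer
      - type: contains
        value: keyboard
      # Model-graded check for answer quality beyond exact matching
      - type: llm-rubric
        value: The answer identifies the object and explains the wordplay.
```

Each test case runs against every provider, so the web viewer shows the two models' answers side by side.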