compare-openai-models (OpenAI Model Comparison)

This example compares OpenAI's gpt-5.4 with gpt-5.4-mini across various riddles and reasoning tasks.

You can run this example with:

```bash
npx promptfoo@latest init --example compare-openai-models
cd compare-openai-models
```

Quick Start

Initialize this example by running:

```bash
npx promptfoo@latest init --example compare-openai-models
```

Navigate to the newly created compare-openai-models directory:
```bash
cd compare-openai-models
```
Set an OpenAI API key directly in your environment:
```bash
export OPENAI_API_KEY="your_openai_api_key"
```
Alternatively, you can set the API key in a .env file:
```bash
OPENAI_API_KEY=your_openai_api_key
```
Run the evaluation with:
```bash
npx promptfoo@latest eval --no-cache
```
Note: The --no-cache flag is required because this example uses a latency assertion, which does not support cached responses.
View the results:
```bash
npx promptfoo@latest view
```
The results show each model's response to every riddle side by side, so you can compare their performance directly.

The example demonstrates:

Model comparison: Side-by-side evaluation of gpt-5.4 vs gpt-5.4-mini
Cost and latency assertions: Ensuring responses meet performance thresholds
Content validation: Using contains assertions to verify specific answers
LLM-based grading: Using llm-rubric assertions for nuanced evaluation criteria
Diverse test cases: A variety of riddles testing different reasoning capabilities
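
As a rough illustration of how the features above might fit together, here is a minimal sketch of a promptfooconfig.yaml. It is not the configuration shipped with this example: the prompt text, the riddle, the rubric wording, and both threshold values are assumptions chosen for illustration, so check the generated file for the real ones.

```yaml
# Hypothetical sketch, not the actual config created by `init --example`.
description: Compare gpt-5.4 and gpt-5.4-mini on riddles

prompts:
  # Assumed prompt wording; {{riddle}} is filled in from each test's vars.
  - 'Solve this riddle: {{riddle}}'

providers:
  # Both models receive every prompt so results can be viewed side by side.
  - openai:gpt-5.4
  - openai:gpt-5.4-mini

defaultTest:
  assert:
    # Applied to every test case. Threshold values are illustrative guesses.
    - type: cost
      threshold: 0.002 # maximum cost per response, in US dollars
    - type: latency
      threshold: 3000 # maximum response time, in milliseconds

tests:
  - vars:
      # Assumed riddle, used here only as a placeholder.
      riddle: 'I speak without a mouth and hear without ears. What am I?'
    assert:
      # Content validation: the response must contain this substring.
      - type: contains
        value: echo
      # LLM-based grading: a grader model judges the response against this rubric.
      - type: llm-rubric
        value: Identifies the answer clearly and does not hedge between several guesses
```

Placing the cost and latency assertions under defaultTest applies them to every test case, while the contains and llm-rubric assertions stay with the individual riddle they check.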