examples/compare-claude-vs-gpt-image/README.md
You can run this example with:
npx promptfoo@latest init --example compare-claude-vs-gpt-image
cd compare-claude-vs-gpt-image
This example compares an image analysis task using:
GPT-4.1, Claude, and Gemini have different prompt formats. We use custom provider functions in Python and JavaScript to dynamically format the prompt based on context about the provider. The responses are scored using llm-rubric with a vision-capable OpenAI model.
To get started, set your environment variables:
OPENAI_API_KEYANTHROPIC_API_KEYGEMINI_API_KEYAWS_ACCESS_KEY_IDAWS_SECRET_ACCESS_KEYIf you do not have access to all of these providers, simply comment out the providers you do not have access to in promptfooconfig.yaml.
Then run:
npx promptfoo@latest eval
Afterwards, you can view the results by running:
npx promptfoo@latest view