# Helicone AI Gateway
Helicone AI Gateway is an open-source, self-hosted AI gateway that provides a unified OpenAI-compatible interface for 100+ LLM providers. The Helicone provider in promptfoo allows you to route requests through a locally running Helicone AI Gateway instance.
## Setup

First, start a local Helicone AI Gateway instance:
```bash
# Set your provider API keys
export OPENAI_API_KEY=your_openai_key
export ANTHROPIC_API_KEY=your_anthropic_key
export GROQ_API_KEY=your_groq_key

# Start the gateway
npx @helicone/ai-gateway@latest
```

The gateway starts on `http://localhost:8080` by default.
No additional dependencies are required. The Helicone provider is built into promptfoo and works with any running Helicone AI Gateway instance.
## Basic usage

To route requests through your local Helicone AI Gateway:

```yaml
providers:
  - helicone:openai/gpt-5-mini
  - helicone:anthropic/claude-3-5-sonnet
  - helicone:groq/llama-3.1-8b-instant
```

The model format is `provider/model`, as supported by the Helicone AI Gateway.
## Advanced configuration

For more advanced configuration:

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      # Gateway configuration
      baseUrl: http://localhost:8080 # Custom gateway URL
      router: production # Use a specific router

      # Standard OpenAI options
      temperature: 0.7
      max_tokens: 1500
      headers:
        Custom-Header: 'custom-value'
```
### Custom routers

If your Helicone AI Gateway is configured with custom routers:

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      router: production
  - id: helicone:openai/gpt-3.5-turbo
    config:
      router: development
```
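To switch routers per environment without editing the config, a minimal sketch using promptfoo's `{{ env.* }}` templating (assuming your promptfoo version supports it in provider config; the `HELICONE_ROUTER` variable name is illustrative):

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      # Resolved from the environment at eval time, e.g.:
      #   HELICONE_ROUTER=development promptfoo eval
      router: '{{ env.HELICONE_ROUTER }}'
```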
## Model format

The Helicone provider uses the format: `helicone:provider/model`

Examples:

- `helicone:openai/gpt-4o`
- `helicone:anthropic/claude-3-5-sonnet`
- `helicone:groq/llama-3.1-8b-instant`

## Supported models

The Helicone AI Gateway supports 100+ models from various providers. Some popular examples:
| Provider  | Example Models                                                    |
| --------- | ----------------------------------------------------------------- |
| OpenAI    | `openai/gpt-4o`, `openai/gpt-5-mini`, `openai/o1-preview`         |
| Anthropic | `anthropic/claude-3-5-sonnet`, `anthropic/claude-3-haiku`         |
| Groq      | `groq/llama-3.1-8b-instant`, `groq/llama-3.1-70b-versatile`       |
| Meta      | `meta-llama/Llama-3-8b-chat-hf`, `meta-llama/Llama-3-70b-chat-hf` |
| Google    | `google/gemma-7b-it`, `google/gemma-2b-it`                        |
For a complete list, see the Helicone AI Gateway documentation.
## Configuration options

- `baseUrl` (string): Helicone AI Gateway URL (defaults to `http://localhost:8080`)
- `router` (string): Custom router name (optional; uses the `/ai` endpoint if not specified)
- `model` (string): Override the model name from the provider specification
- `apiKey` (string): Custom API key (defaults to `placeholder-api-key`)
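As a quick illustration of the `model` override, a minimal sketch (the overridden model name is only an example):

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      # Sends this model name to the gateway instead of the one in the
      # provider ID (example value)
      model: openai/gpt-4o-mini
```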
Since the provider extends OpenAI's chat completion provider, all standard OpenAI options are supported:

- `temperature`: Controls randomness (0.0 to 1.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Penalizes frequent tokens
- `presence_penalty`: Penalizes new tokens based on presence
- `stop`: Stop sequences
- `headers`: Additional HTTP headers

## Examples

### Basic evaluation

```yaml
providers:
  - helicone:openai/gpt-5-mini

prompts:
  - "Translate '{{text}}' to French"

tests:
  - vars:
      text: 'Hello world'
    assert:
      - type: contains
        value: 'Bonjour'
```
### Multiple providers

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      tags: ['openai', 'gpt4']
      properties:
        model_family: 'gpt-4'
  - id: helicone:anthropic/claude-3-5-sonnet-20241022
    config:
      tags: ['anthropic', 'claude']
      properties:
        model_family: 'claude-3'

prompts:
  - 'Write a creative story about {{topic}}'

tests:
  - vars:
      topic: 'a robot learning to paint'
```
### Custom gateway configuration

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      baseUrl: https://custom-gateway.example.com:8080
      router: production
      apiKey: your_custom_api_key
      temperature: 0.7
      max_tokens: 1000
      headers:
        Authorization: Bearer your_target_provider_api_key
        Custom-Header: custom-value

prompts:
  - 'Answer the following question: {{question}}'

tests:
  - vars:
      question: 'What is artificial intelligence?'
```
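The example above hardcodes the key; you can instead resolve it from the environment with the same `{{ env.* }}` templating mentioned earlier (the `HELICONE_GATEWAY_KEY` name is illustrative):

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      baseUrl: https://custom-gateway.example.com:8080
      # Pulled from the shell environment at eval time (illustrative variable name)
      apiKey: '{{ env.HELICONE_GATEWAY_KEY }}'
```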
### Performance testing with caching

```yaml
providers:
  - id: helicone:openai/gpt-3.5-turbo
    config:
      cache: true
      properties:
        cache_strategy: 'aggressive'
        use_case: 'batch_processing'

prompts:
  - 'Summarize: {{text}}'

tests:
  - vars:
      text: 'Large text content to summarize...'
    assert:
      - type: latency
        threshold: 2000 # Should be faster due to caching
```
## Helicone features

Routing through the Helicone AI Gateway also gives you:

- **Request logging**: all requests routed through Helicone are automatically logged
- **Cost tracking**: track costs across different providers and models
- **Caching**: intelligent response caching
- **Rate limiting**: built-in rate limiting
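Cost tracking pairs well with promptfoo's built-in `cost` assertion, so an eval can fail when a routed response becomes too expensive; a minimal sketch (the threshold is illustrative):

```yaml
tests:
  - vars:
      text: 'Hello world'
    assert:
      - type: cost
        threshold: 0.002 # fail if a single response costs more than $0.002
```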
## Troubleshooting

- Verify that `HELICONE_API_KEY` is set correctly
- Check the `targetUrl` configuration

Enable debug logging to see detailed request/response information:

```bash
LOG_LEVEL=debug promptfoo eval
```