# HuggingFace Chat (`huggingface/chat`)

This example demonstrates how to use HuggingFace's OpenAI-compatible chat completions API with promptfoo.
## Prerequisites

Set your HuggingFace token:

```sh
export HF_TOKEN=your_huggingface_token
```
Get your token from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). Note that HuggingFace's router may incur usage costs depending on your plan and the model used.
## Quick Start

```sh
npx promptfoo@latest init --example huggingface/chat
npx promptfoo@latest eval
```
## Provider Configuration

Use the `huggingface:chat` provider format:

```yaml
providers:
  - id: huggingface:chat:meta-llama/Llama-3.3-70B-Instruct
    config:
      temperature: 0.1
      max_new_tokens: 100
```
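Putting it together, a minimal `promptfooconfig.yaml` might look like the sketch below. The prompt, test variable, and assertion are illustrative placeholders, not part of this example:

```yaml
# Minimal sketch of a promptfoo config using the HuggingFace chat provider.
# The prompt, test case, and assertion are illustrative placeholders.
prompts:
  - 'Summarize the following text in one sentence: {{text}}'

providers:
  - id: huggingface:chat:meta-llama/Llama-3.3-70B-Instruct
    config:
      temperature: 0.1
      max_new_tokens: 100

tests:
  - vars:
      text: 'Promptfoo is a tool for testing and evaluating LLM prompts.'
    assert:
      - type: icontains
        value: promptfoo
```

Running `npx promptfoo@latest eval` in the same directory picks up this file by default.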
## Available Models

You can use any model available on HuggingFace's Inference Providers that supports chat completions, for example:

- `deepseek-ai/DeepSeek-R1`
- `openai/gpt-oss-120b`
- `zai-org/GLM-4.5`
- `Qwen/Qwen2.5-Coder-32B-Instruct`
- `meta-llama/Llama-3.3-70B-Instruct`
- `google/gemma-3-27b-it`

Browse models at [huggingface.co/models?other=conversational](https://huggingface.co/models?other=conversational).
## Inference Providers

Some models require routing to a specific Inference Provider. Use a `:provider` suffix or the `inferenceProvider` config option:

```yaml
providers:
  # Provider suffix
  - id: huggingface:chat:Qwen/QwQ-32B:featherless-ai

  # Or config option
  - id: huggingface:chat:Qwen/QwQ-32B
    config:
      inferenceProvider: featherless-ai
```
## Configuration Options

| Parameter           | Description                                      |
| ------------------- | ------------------------------------------------ |
| `temperature`       | Controls randomness (0.0-2.0)                    |
| `max_new_tokens`    | Maximum number of tokens to generate             |
| `top_p`             | Nucleus sampling parameter                       |
| `inferenceProvider` | Route to a specific Inference Provider           |
| `apiKey`            | HuggingFace token (or set via environment variable) |
| `apiBaseUrl`        | Custom API endpoint (optional)                   |
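As a sketch, several of these options can be combined in one provider entry. The token value and endpoint URL below are placeholders, not values this example ships with:

```yaml
providers:
  - id: huggingface:chat:meta-llama/Llama-3.3-70B-Instruct
    config:
      temperature: 0.1
      top_p: 0.9
      max_new_tokens: 100
      # Pass the token explicitly instead of relying on the environment (placeholder value)
      apiKey: your_huggingface_token
      # Optional: point at a custom OpenAI-compatible endpoint (placeholder URL)
      apiBaseUrl: https://your-endpoint.example.com/v1
```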
See the promptfoo HuggingFace provider documentation for full configuration options.