huggingface/chat (HuggingFace Chat Completions)

examples/huggingface/chat/README.md

This example demonstrates how to use HuggingFace's OpenAI-compatible chat completions API with promptfoo.

Setup

Set your HuggingFace token:

```bash
export HF_TOKEN=your_huggingface_token
```

Get your token from huggingface.co/settings/tokens. HuggingFace's router may incur usage costs depending on your plan and the model used.

Usage

```bash
npx promptfoo@latest init --example huggingface/chat
npx promptfoo@latest eval
```
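
After `init`, the example directory contains a `promptfooconfig.yaml`. A minimal configuration along these lines would work with the provider format described below; the prompt, test variable, and assertion here are illustrative placeholders, not the example's actual contents:

```yaml
# promptfooconfig.yaml (sketch; prompt and test case are illustrative)
prompts:
  - 'Answer concisely: {{question}}'

providers:
  - id: huggingface:chat:meta-llama/Llama-3.3-70B-Instruct
    config:
      temperature: 0.1
      max_new_tokens: 100

tests:
  - vars:
      question: What is the capital of France?
    assert:
      - type: icontains
        value: Paris
```

Running `npx promptfoo@latest eval` against this config sends the rendered prompt to the model once per test case and checks each assertion.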

Provider format

Use the huggingface:chat provider format:

```yaml
providers:
  - id: huggingface:chat:meta-llama/Llama-3.3-70B-Instruct
    config:
      temperature: 0.1
      max_new_tokens: 100
```

Supported models

Any model available on HuggingFace's Inference Providers that supports chat completions:

- deepseek-ai/DeepSeek-R1
- openai/gpt-oss-120b
- zai-org/GLM-4.5
- Qwen/Qwen2.5-Coder-32B-Instruct
- meta-llama/Llama-3.3-70B-Instruct
- google/gemma-3-27b-it
- And many more...

Browse models at huggingface.co/models?other=conversational.

Inference Provider routing

Some models require routing to a specific Inference Provider. Use a :provider suffix or the inferenceProvider config option:

```yaml
providers:
  # Provider suffix
  - id: huggingface:chat:Qwen/QwQ-32B:featherless-ai

  # Or config option
  - id: huggingface:chat:Qwen/QwQ-32B
    config:
      inferenceProvider: featherless-ai
```

Configuration options

| Parameter           | Description                                                     |
| ------------------- | --------------------------------------------------------------- |
| `temperature`       | Controls randomness (0.0-2.0)                                   |
| `max_new_tokens`    | Maximum tokens to generate                                      |
| `top_p`             | Nucleus sampling parameter                                      |
| `inferenceProvider` | Route to a specific Inference Provider                          |
| `apiKey`            | HuggingFace token (or use the `HF_TOKEN` environment variable)  |
| `apiBaseUrl`        | Custom API endpoint (optional)                                  |
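
These options can be combined in a single provider entry. A sketch, with illustrative values and a placeholder API key:

```yaml
providers:
  - id: huggingface:chat:Qwen/Qwen2.5-Coder-32B-Instruct
    config:
      temperature: 0.7
      top_p: 0.9
      max_new_tokens: 256
      inferenceProvider: featherless-ai  # optional; omit to let the router choose
      apiKey: hf_xxxxxxxx                # placeholder; prefer the HF_TOKEN env var
```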

See HuggingFace provider docs for full configuration options.