docs/providers/huggingface.md
Hugging Face Inference Providers offer OpenAI-compatible chat completions through a single router API. You get access to many models (DeepSeek, Llama, and more) with one token. OpenClaw uses the OpenAI-compatible endpoint (chat completions only); for text-to-image, embeddings, or speech use the HF inference clients directly.
huggingfaceHUGGINGFACE_HUB_TOKEN or HF_TOKEN (fine-grained token with Make calls to Inference Providers)https://router.huggingface.co/v1)<Warning>
The token must have the **Make calls to Inference Providers** permission enabled or API requests will be rejected.
</Warning>
```bash
openclaw onboard --auth-choice huggingface-api-key
```
You can also set or change the default model later in config:
```json5
{
agents: {
defaults: {
model: { primary: "huggingface/deepseek-ai/DeepSeek-R1" },
},
},
}
```
openclaw onboard --non-interactive \
--mode local \
--auth-choice huggingface-api-key \
--huggingface-api-key "$HF_TOKEN"
This will set huggingface/deepseek-ai/DeepSeek-R1 as the default model.
Model refs use the form huggingface/<org>/<model> (Hub-style IDs). The list below is from GET https://router.huggingface.co/v1/models; your catalog may include more.
| Model | Ref (prefix with huggingface/) |
|---|---|
| DeepSeek R1 | deepseek-ai/DeepSeek-R1 |
| DeepSeek V3.2 | deepseek-ai/DeepSeek-V3.2 |
| Qwen3 8B | Qwen/Qwen3-8B |
| Qwen2.5 7B Instruct | Qwen/Qwen2.5-7B-Instruct |
| Qwen3 32B | Qwen/Qwen3-32B |
| Llama 3.3 70B Instruct | meta-llama/Llama-3.3-70B-Instruct |
| Llama 3.1 8B Instruct | meta-llama/Llama-3.1-8B-Instruct |
| GPT-OSS 120B | openai/gpt-oss-120b |
| GLM 4.7 | zai-org/GLM-4.7 |
| Kimi K2.5 | moonshotai/Kimi-K2.5 |
```bash
GET https://router.huggingface.co/v1/models
```
(Optional: send `Authorization: Bearer $HUGGINGFACE_HUB_TOKEN` or `$HF_TOKEN` for the full list; some endpoints return a subset without auth.) The response is OpenAI-style `{ "object": "list", "data": [ { "id": "Qwen/Qwen3-8B", "owned_by": "Qwen", ... }, ... ] }`.
When you configure a Hugging Face API key (via onboarding, `HUGGINGFACE_HUB_TOKEN`, or `HF_TOKEN`), OpenClaw uses this GET to discover available chat-completion models. During **interactive setup**, after you enter your token you see a **Default Hugging Face model** dropdown populated from that list (or the built-in catalog if the request fails). At runtime (e.g. Gateway startup), when a key is present, OpenClaw again calls **GET** `https://router.huggingface.co/v1/models` to refresh the catalog. The list is merged with a built-in catalog (for metadata like context window and cost). If the request fails or no key is set, only the built-in catalog is used.
```json5
{
agents: {
defaults: {
models: {
"huggingface/deepseek-ai/DeepSeek-R1": { alias: "DeepSeek R1 (fast)" },
"huggingface/deepseek-ai/DeepSeek-R1:cheapest": { alias: "DeepSeek R1 (cheap)" },
},
},
},
}
```
- **Policy suffixes:** OpenClaw's bundled Hugging Face docs and helpers currently treat these two suffixes as the built-in policy variants:
- **`:fastest`** — highest throughput.
- **`:cheapest`** — lowest cost per output token.
You can add these as separate entries in `models.providers.huggingface.models` or set `model.primary` with the suffix. You can also set your default provider order in [Inference Provider settings](https://hf.co/settings/inference-providers) (no suffix = use that order).
- **Config merge:** Existing entries in `models.providers.huggingface.models` (e.g. in `models.json`) are kept when config is merged. So any custom `name`, `alias`, or model options you set there are preserved.
<Note>
OpenClaw accepts both `HUGGINGFACE_HUB_TOKEN` and `HF_TOKEN` as env var aliases. Either one works; if both are set, `HUGGINGFACE_HUB_TOKEN` takes precedence.
</Note>