site/docs/providers/fireworks.md
Fireworks AI serves a broad catalogue of open models — Llama, Qwen, DeepSeek, Kimi, GLM, GPT-OSS, and more — through an API that is fully compatible with the OpenAI interface.
The Fireworks AI provider supports all options available in the OpenAI provider.
Create an API key from the Fireworks dashboard (Settings → API Keys) and expose it as an environment variable:
export FIREWORKS_API_KEY=your_api_key_here
The provider keeps Fireworks credentials isolated from OpenAI's: it reads FIREWORKS_API_KEY (never OPENAI_API_KEY) and never inherits OPENAI_API_HOST / OPENAI_API_BASE_URL / OPENAI_ORGANIZATION, so a stray OpenAI variable in your environment can't leak onto or reroute Fireworks requests.
fireworks:<model> — chat completions, e.g. fireworks:accounts/fireworks/models/gpt-oss-120bfireworks:embedding:<model> — embeddings, e.g. fireworks:embedding:accounts/fireworks/models/qwen3-embedding-8bModel identifiers use Fireworks's account-scoped path (accounts/fireworks/models/<model>). Browse the serverless catalogue for currently available ids — the serverless tier rotates, so a model that returns a 404 has likely been retired.
providers:
- id: fireworks:accounts/fireworks/models/gpt-oss-120b
config:
temperature: 0.2
max_tokens: 1024
apiKey: ... # optional; overrides FIREWORKS_API_KEY
:::note
Many of Fireworks's flagship models are reasoning models that emit hidden reasoning tokens before the visible answer. Set max_tokens high enough to leave room for both — otherwise the response can be truncated to empty output.
:::
Run the bundled example end-to-end:
npx promptfoo@latest init --example provider-fireworks
Fireworks serves embedding models on the same key via the fireworks:embedding: prefix. For example, to grade a similar assertion with a Fireworks embedding model:
defaultTest:
options:
provider:
embedding:
id: fireworks:embedding:accounts/fireworks/models/qwen3-embedding-8b
Because the provider extends the OpenAI provider, all OpenAI configuration parameters apply. The most common options:
| Option | Description |
|---|---|
apiKey | Fireworks API key (overrides the FIREWORKS_API_KEY environment variable). |
apiBaseUrl | Base URL override. Can also be set with the FIREWORKS_API_BASE_URL environment variable. |
apiHost | Host override for a proxy or gateway; resolves to https://<apiHost>/v1. |
temperature, max_tokens, top_p, top_k, ... | Standard OpenAI-compatible sampling parameters. |
cost, inputCost, outputCost | Override promptfoo's cost estimate (USD per token). Use inputCost and outputCost for asymmetric pricing; cost is the shared fallback. |
cacheReadInputCost | Per-token rate for Fireworks server-side prompt-cache hits. Defaults to the full inputCost (no discount is assumed, since the discount varies by model). |
| Environment variable | Description |
|---|---|
FIREWORKS_API_KEY | Your Fireworks API key. |
FIREWORKS_API_BASE_URL | Override the base URL (defaults to the public Fireworks endpoint). |
Fireworks prices each model differently, so promptfoo can't infer a per-token rate. Supply inputCost and outputCost to surface spend estimates in your eval results:
providers:
- id: fireworks:accounts/fireworks/models/gpt-oss-120b
config:
inputCost: 0.00000015 # $0.15 / 1M input tokens
outputCost: 0.0000006 # $0.60 / 1M output tokens
If you rely on Fireworks's server-side prompt caching, set cacheReadInputCost to the discounted cached-input rate; otherwise cached prompt tokens are billed at the full inputCost.
https://api.fireworks.ai/inference/v1/chat/completions, /embeddings)