Fireworks AI

Fireworks AI serves a broad catalogue of open models — Llama, Qwen, DeepSeek, Kimi, GLM, GPT-OSS, and more — through an API that is fully compatible with the OpenAI interface.

The Fireworks AI provider supports all options available in the OpenAI provider.

Setup

Create an API key from the Fireworks dashboard (Settings → API Keys) and expose it as an environment variable:

```sh
export FIREWORKS_API_KEY=your_api_key_here
```

The provider keeps Fireworks credentials isolated from OpenAI's: it reads FIREWORKS_API_KEY (never OPENAI_API_KEY) and never inherits OPENAI_API_HOST / OPENAI_API_BASE_URL / OPENAI_ORGANIZATION, so a stray OpenAI variable in your environment can't leak onto or reroute Fireworks requests.

Provider format

  • fireworks:<model> — chat completions, e.g. fireworks:accounts/fireworks/models/gpt-oss-120b
  • fireworks:embedding:<model> — embeddings, e.g. fireworks:embedding:accounts/fireworks/models/qwen3-embedding-8b

Model identifiers use Fireworks's account-scoped path (accounts/fireworks/models/<model>). Browse the serverless catalogue for currently available ids — the serverless tier rotates, so a model that returns a 404 has likely been retired.

Example Usage

```yaml
providers:
  - id: fireworks:accounts/fireworks/models/gpt-oss-120b
    config:
      temperature: 0.2
      max_tokens: 1024
      apiKey: ... # optional; overrides FIREWORKS_API_KEY
```

:::note
Many of Fireworks's flagship models are reasoning models that emit hidden reasoning tokens before the visible answer. Set max_tokens high enough to leave room for both; otherwise the response can be truncated to empty output.
:::
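As a rough illustration of the note above, the sketch below raises max_tokens so that hidden reasoning tokens and the visible answer both fit. The value 4096 is an assumption chosen for illustration, not a Fireworks or promptfoo recommendation; tune it to the model and prompts you actually run.

```yaml
providers:
  - id: fireworks:accounts/fireworks/models/gpt-oss-120b
    config:
      temperature: 0.2
      # Illustrative budget only: must cover hidden reasoning tokens
      # plus the visible answer. Adjust for your model and prompts.
      max_tokens: 4096
```

If outputs still come back empty, the budget is probably being spent on reasoning before any visible text is produced, so raising the limit further is a reasonable first thing to try.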

Run the bundled example end-to-end:

```sh
npx promptfoo@latest init --example provider-fireworks
```

Embeddings

Fireworks serves embedding models on the same key via the fireworks:embedding: prefix. For example, to grade `similar` (embedding-similarity) assertions with a Fireworks embedding model:

```yaml
defaultTest:
  options:
    provider:
      embedding:
        id: fireworks:embedding:accounts/fireworks/models/qwen3-embedding-8b
```
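To show where that embedding provider takes effect, here is a minimal sketch of a test that uses promptfoo's `similar` assertion. The `question` variable, the expected sentence, and the 0.8 threshold are all hypothetical values for illustration, and the sketch assumes your prompts reference `{{question}}` elsewhere in the config.

```yaml
defaultTest:
  options:
    provider:
      embedding:
        id: fireworks:embedding:accounts/fireworks/models/qwen3-embedding-8b

tests:
  - vars:
      question: What is the capital of France? # hypothetical variable
    assert:
      # Scored by embedding similarity using the provider configured above.
      - type: similar
        value: Paris is the capital of France. # hypothetical expected answer
        threshold: 0.8 # illustrative cutoff; tune for your data
```

Setting the embedding provider under defaultTest applies it to every test, so individual assertions do not need to repeat it.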

Configuration

Because the provider extends the OpenAI provider, all OpenAI configuration parameters apply. The most common options:

| Option | Description |
| --- | --- |
| `apiKey` | Fireworks API key (overrides the `FIREWORKS_API_KEY` environment variable). |
| `apiBaseUrl` | Base URL override. Can also be set with the `FIREWORKS_API_BASE_URL` environment variable. |
| `apiHost` | Host override for a proxy or gateway; resolves to `https://<apiHost>/v1`. |
| `temperature`, `max_tokens`, `top_p`, `top_k`, ... | Standard OpenAI-compatible sampling parameters. |
| `cost`, `inputCost`, `outputCost` | Override promptfoo's cost estimate (USD per token). Use `inputCost` and `outputCost` for asymmetric pricing; `cost` is the shared fallback. |
| `cacheReadInputCost` | Per-token rate for Fireworks server-side prompt-cache hits. Defaults to the full `inputCost` (no discount is assumed, since the discount varies by model). |

| Environment variable | Description |
| --- | --- |
| `FIREWORKS_API_KEY` | Your Fireworks API key. |
| `FIREWORKS_API_BASE_URL` | Override the base URL (defaults to the public Fireworks endpoint). |
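As a sketch of the host override, the config below routes requests through a gateway. The hostname `llm-gateway.example.com` is a placeholder, not a real endpoint; going by the apiHost description above, requests would then resolve to `https://llm-gateway.example.com/v1`.

```yaml
providers:
  - id: fireworks:accounts/fireworks/models/gpt-oss-120b
    config:
      # Placeholder host for illustration; replace with your own proxy or gateway.
      apiHost: llm-gateway.example.com
```

When the gateway needs a full URL with a different path, apiBaseUrl (or FIREWORKS_API_BASE_URL) is the option to reach for instead.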

Cost tracking

Fireworks prices each model differently, so promptfoo can't infer a per-token rate. Supply inputCost and outputCost to surface spend estimates in your eval results:

```yaml
providers:
  - id: fireworks:accounts/fireworks/models/gpt-oss-120b
    config:
      inputCost: 0.00000015 # $0.15 / 1M input tokens
      outputCost: 0.0000006 # $0.60 / 1M output tokens
```

If you rely on Fireworks's server-side prompt caching, set cacheReadInputCost to the discounted cached-input rate; otherwise promptfoo's estimate counts cached prompt tokens at the full inputCost.
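A minimal sketch of a config with a cached-input rate follows. The cacheReadInputCost value assumes a 50% discount on cached input purely for illustration; the real discount varies by model, so take the figure from Fireworks's current pricing for the model you use.

```yaml
providers:
  - id: fireworks:accounts/fireworks/models/gpt-oss-120b
    config:
      inputCost: 0.00000015 # $0.15 / 1M input tokens
      outputCost: 0.0000006 # $0.60 / 1M output tokens
      # Hypothetical rate: half of inputCost ($0.075 / 1M cached input tokens).
      cacheReadInputCost: 0.000000075
```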

API Details