docs/provider-config/fireworks.mdx
Fireworks AI is an infrastructure platform for generative AI that focuses on optimized inference performance. It advertises inference speeds up to 4x faster than alternative platforms, supports over 40 AI models, and removes much of the operational complexity of running AI models at scale.
Website: https://fireworks.ai/
Cline supports the following Fireworks AI models:
- `accounts/fireworks/models/kimi-k2p5` (Default) - Kimi K2.5 flagship agentic model with multimodal support (262K context, prompt caching, $0.60/$3.00 per 1M tokens)
- `accounts/fireworks/models/qwen3-vl-30b-a3b-thinking` - Qwen3-VL reasoning model with image support (262K context, prompt caching, $0.15/$0.60 per 1M tokens)
- `accounts/fireworks/models/qwen3-vl-30b-a3b-instruct` - Qwen3-VL instruct model with image support (262K context, $0.15/$0.60 per 1M tokens)
- `accounts/fireworks/models/deepseek-v3p2` - DeepSeek V3.2 model (164K context, prompt caching, $0.56/$1.68 per 1M tokens)
- `accounts/fireworks/models/glm-4p7` - GLM-4.7 model (203K context, prompt caching, $0.60/$2.20 per 1M tokens)
- `accounts/fireworks/models/glm-5` - GLM-5 model (203K context, prompt caching, $1.00/$3.20 per 1M tokens)
- `accounts/fireworks/models/minimax-m2p5` - MiniMax M2.5 model (197K context, prompt caching, $0.30/$1.20 per 1M tokens)
- `accounts/fireworks/models/minimax-m2p1` - MiniMax M2.1 model (197K context, prompt caching, $0.30/$1.20 per 1M tokens)
- `accounts/fireworks/models/gpt-oss-120b` - OpenAI gpt-oss-120b model (131K context, prompt caching, $0.15/$0.60 per 1M tokens)

Fireworks AI's competitive advantages center on performance optimization and developer experience.
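The per-model prices in the list above can be turned into a rough per-request cost estimate. The following is a minimal sketch, not part of Cline or the Fireworks SDK: it assumes the first figure in each price pair is the input price and the second is the output price, and it ignores any prompt-caching discount. The `PRICES` table and `estimate_cost` function are hypothetical names used only for illustration.

```python
# USD prices per 1M tokens, copied from the model list.
# Assumption: the pair is (input price, output price).
PRICES = {
    "accounts/fireworks/models/kimi-k2p5": (0.60, 3.00),
    "accounts/fireworks/models/deepseek-v3p2": (0.56, 1.68),
    "accounts/fireworks/models/gpt-oss-120b": (0.15, 0.60),
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, ignoring prompt-cache discounts."""
    input_price, output_price = PRICES[model_id]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 50K input tokens and 2K output tokens on the default model.
cost = estimate_cost("accounts/fireworks/models/kimi-k2p5", 50_000, 2_000)
print(f"${cost:.4f}")  # prints $0.0360
```

Because agentic coding sessions resend large contexts on every turn, input tokens usually dominate the total, which is why the prompt-caching flag in the list matters in practice.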
Fireworks AI uses a usage-based pricing model with competitive rates:
| Parameter Count | Price per 1M Input Tokens |
|---|---|
| Less than 4B parameters | $0.10 |
| 4B - 16B parameters | $0.20 |
| More than 16B parameters | $0.90 |
| MoE 0B - 56B parameters | $0.50 |
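
The tiers in the table above amount to a simple lookup by model size. As a hedged sketch (the function name is hypothetical, and the handling of tier boundaries is an assumption, since the table does not say which tier a model of exactly 4B or 16B parameters falls into):

```python
from typing import Optional


def input_price_per_1m(params_billions: float, is_moe: bool = False) -> Optional[float]:
    """Look up the USD price per 1M input tokens from the parameter-count tiers.

    Assumptions: exactly 4B and exactly 16B both fall in the "4B - 16B" tier,
    and None is returned for MoE models above 56B because the table lists
    no price for them.
    """
    if is_moe:
        return 0.50 if params_billions <= 56 else None
    if params_billions < 4:
        return 0.10
    if params_billions <= 16:
        return 0.20
    return 0.90


print(input_price_per_1m(8))                # 0.2
print(input_price_per_1m(70))               # 0.9
print(input_price_per_1m(47, is_moe=True))  # 0.5
```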

Fine-tuning is billed per 1M training tokens, by base model size:

| Base Model Size | Price per 1M Training Tokens |
|---|---|
| Up to 16B parameters | $0.50 |
| 16.1B - 80B parameters | $3.00 |
| DeepSeek R1 / V3 | $10.00 |

GPU time is billed per hour, by GPU type:

| GPU Type | Price per Hour |
|---|---|
| A100 80GB | $2.90 |
| H100 80GB | $5.80 |
| H200 141GB | $6.99 |
| B200 180GB | $11.99 |
| AMD MI300X | $4.99 |
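
One way to compare hourly GPU pricing with per-token pricing is to compute a break-even throughput: the number of tokens per hour at which one GPU-hour costs the same as paying per token. The sketch below is illustrative arithmetic only. It assumes a single GPU, a flat per-token price, and that the model can actually sustain the computed throughput on that GPU; the function name is hypothetical.

```python
# USD per GPU-hour, copied from the table above.
GPU_PRICE_PER_HOUR = {
    "A100 80GB": 2.90,
    "H100 80GB": 5.80,
    "H200 141GB": 6.99,
    "B200 180GB": 11.99,
    "AMD MI300X": 4.99,
}


def break_even_tokens_per_hour(gpu: str, price_per_1m_tokens: float) -> float:
    """Tokens per hour at which one GPU-hour costs the same as per-token billing."""
    return GPU_PRICE_PER_HOUR[gpu] / price_per_1m_tokens * 1_000_000


# H100 versus the $0.90 per 1M tokens tier for models above 16B parameters.
tokens = break_even_tokens_per_hour("H100 80GB", 0.90)
print(f"{tokens:,.0f} tokens/hour")  # prints 6,444,444 tokens/hour
```

Below that sustained volume, per-token billing is cheaper under these assumptions; above it, hourly GPU billing starts to pay off.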
Fireworks offers fine-tuning through a CLI and accepts JSON-formatted training data, including data sourced from databases such as MongoDB Atlas. Inference on fine-tuned models costs the same as on base models.
Fireworks also provides advanced support for reasoning models, including `<think>` tag processing and reasoning content extraction, which makes complex multi-step reasoning practical for real-time applications.
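To illustrate what `<think>` tag processing involves, here is a minimal sketch that separates reasoning content from the final answer in a complete response string. It is not the Fireworks or Cline implementation: `split_reasoning` is a hypothetical helper, and it assumes well-formed, closed tags in a non-streamed response.

```python
import re
from typing import Tuple

# Non-greedy match so several <think> blocks are captured separately;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split model output into (reasoning, answer).

    Assumes every <think> tag is closed; streamed or truncated output
    would need incremental parsing instead.
    """
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


raw = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # prints 2 + 2 is 4.
print(answer)     # prints The answer is 4.
```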
Fireworks AI's optimization delivers measurable improvements: