docs/provider-config/groq.mdx
Groq provides ultra-fast AI inference through its custom LPU™ (Language Processing Unit) architecture, which is purpose-built for inference rather than adapted from training hardware. Groq hosts open-weight models from various providers, including OpenAI, Meta, DeepSeek, Moonshot AI, and others.
Website: https://groq.com/
Cline supports the following Groq models (pricing shown as input/output cost per 1M tokens):
- `moonshotai/kimi-k2-instruct-0905` (Default) - Kimi K2 September update with 262K context and prompt caching ($0.60/$2.50 per 1M tokens)
- `moonshotai/kimi-k2-instruct` - Kimi K2 1T parameter model with prompt caching ($1.00/$3.00 per 1M tokens)
- `openai/gpt-oss-120b` - OpenAI's 120B open-weight MoE model ($0.15/$0.75 per 1M tokens)
- `openai/gpt-oss-20b` - OpenAI's compact 20B open-weight model ($0.10/$0.50 per 1M tokens)
- `compound-beta` - Hybrid architecture using Llama 4 Scout + Llama 3.3 70B for routing and tool use (free)
- `compound-beta-mini` - Lightweight compound model for faster inference (free)
- `meta-llama/llama-4-maverick-17b-128e-instruct` - Llama 4 Maverick with 128 experts and vision support ($0.20/$0.60 per 1M tokens)
- `meta-llama/llama-4-scout-17b-16e-instruct` - Llama 4 Scout with 16 experts and vision support ($0.11/$0.34 per 1M tokens)
- `llama-3.3-70b-versatile` - Balanced performance with 131K context ($0.59/$0.79 per 1M tokens)
- `llama-3.1-8b-instant` - Fast inference with 131K context ($0.05/$0.08 per 1M tokens)
- `deepseek-r1-distill-llama-70b` - DeepSeek R1 reasoning distilled into Llama 70B ($0.75/$0.99 per 1M tokens)
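If you want to confirm that your API key and one of the model IDs above work outside of Cline, you can call Groq's OpenAI-compatible endpoint directly. The sketch below is a minimal example, assuming the `openai` npm package, a `GROQ_API_KEY` environment variable, and Groq's documented base URL; Cline issues these requests for you, so this is only for verification.

```typescript
import OpenAI from "openai";

// Groq exposes an OpenAI-compatible API, so the standard OpenAI SDK works
// when pointed at Groq's base URL. GROQ_API_KEY is assumed to hold your key.
const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "moonshotai/kimi-k2-instruct-0905", // Cline's default Groq model
    messages: [{ role: "user", content: "Say hello in one sentence." }],
  });
  console.log(completion.choices[0].message.content);
}

main().catch(console.error);
```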
Groq's LPU architecture delivers several key advantages over traditional GPU-based inference. Unlike GPUs, which are adapted from training workloads, the LPU is purpose-built for inference, eliminating the architectural bottlenecks that create latency in traditional systems.
Learn more about Groq's technology in their LPU architecture blog post.
The Kimi K2 models support prompt caching, which can significantly reduce cost and latency when requests share a long, repeated prompt.
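As a rough illustration, assuming the caching keys on a repeated prompt prefix (an assumption here; check Groq's documentation for the exact mechanism), the main thing your code needs to do is keep the long, stable portion of the prompt byte-for-byte identical across requests:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

// A long, unchanging system prompt. Re-sending it identically on every request
// is what gives a prefix-based cache something to reuse (assumption: caching is
// automatic for repeated prefixes; no extra flag is set here).
const SYSTEM_PROMPT = "You are a coding assistant. <long, stable instructions here>";

async function ask(question: string) {
  const res = await client.chat.completions.create({
    model: "moonshotai/kimi-k2-instruct-0905",
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // identical prefix each call
      { role: "user", content: question },        // only this part changes
    ],
  });
  return res.choices[0].message.content;
}
```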
Select models accept image inputs. Check the model details in the Groq Console for each model's specific capabilities.
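For the vision-capable models listed above (the Llama 4 variants), images are typically passed using the OpenAI-style multimodal message format. The sketch below assumes Groq accepts the standard `image_url` content part and uses a placeholder image URL.

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

async function describeImage(imageUrl: string) {
  const res = await client.chat.completions.create({
    model: "meta-llama/llama-4-scout-17b-16e-instruct", // vision-capable model from the list
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe what is in this image." },
          { type: "image_url", image_url: { url: imageUrl } }, // public URL or data: URI
        ],
      },
    ],
  });
  return res.choices[0].message.content;
}

describeImage("https://example.com/screenshot.png").then(console.log);
```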
Some models, such as the DeepSeek R1 distillations, offer enhanced reasoning with explicit step-by-step thought processes.
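A call to the DeepSeek R1 distill looks like any other chat completion; the difference is in the output, where the model typically emits its chain of thought before the final answer. The sketch below assumes the trace arrives wrapped in `<think>...</think>` tags and strips it before displaying the result.

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

async function solve(problem: string) {
  const res = await client.chat.completions.create({
    model: "deepseek-r1-distill-llama-70b",
    messages: [{ role: "user", content: problem }],
  });
  const text = res.choices[0].message.content ?? "";
  // R1-style models often interleave a <think>...</think> trace with the answer;
  // removing it keeps only the final response (assumption about the output format).
  return text.replace(/<think>[\s\S]*?<\/think>/, "").trim();
}

solve("What is 17 * 24? Explain briefly.").then(console.log);
```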