docs/provider-config/groq.mdx
Groq provides ultra-fast AI inference through its custom LPU™ (Language Processing Unit) architecture, which is purpose-built for inference rather than adapted from training hardware. Groq hosts open-weight models from various providers, including OpenAI, Meta, DeepSeek, Moonshot AI, and others.
Website: https://groq.com/
Cline supports the following Groq models (pricing shown as input/output cost per 1M tokens):
- `moonshotai/kimi-k2-instruct-0905` (Default) - Kimi K2 September update with 262K context and prompt caching ($0.60/$2.50 per 1M tokens)
- `moonshotai/kimi-k2-instruct` - Kimi K2 1T parameter model with prompt caching ($1.00/$3.00 per 1M tokens)
- `openai/gpt-oss-120b` - OpenAI's 120B open-weight MoE model ($0.15/$0.75 per 1M tokens)
- `openai/gpt-oss-20b` - OpenAI's compact 20B open-weight model ($0.10/$0.50 per 1M tokens)
- `compound-beta` - Hybrid architecture using Llama 4 Scout + Llama 3.3 70B for routing and tool use (free)
- `compound-beta-mini` - Lightweight compound model for faster inference (free)
- `meta-llama/llama-4-maverick-17b-128e-instruct` - Llama 4 Maverick with 128 experts and vision support ($0.20/$0.60 per 1M tokens)
- `meta-llama/llama-4-scout-17b-16e-instruct` - Llama 4 Scout with 16 experts and vision support ($0.11/$0.34 per 1M tokens)
- `llama-3.3-70b-versatile` - Balanced performance with 131K context ($0.59/$0.79 per 1M tokens)
- `llama-3.1-8b-instant` - Fast inference with 131K context ($0.05/$0.08 per 1M tokens)
- `deepseek-r1-distill-llama-70b` - DeepSeek R1 reasoning distilled into Llama 70B ($0.75/$0.99 per 1M tokens)
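If you want to confirm that your API key and one of the model IDs above work outside of Cline, you can call Groq's OpenAI-compatible endpoint directly. The sketch below is a minimal example, assuming the `openai` npm package, a `GROQ_API_KEY` environment variable, and Groq's documented base URL; Cline issues these requests for you, so this is only for verification.

```typescript
import OpenAI from "openai";

// Groq exposes an OpenAI-compatible API, so the standard OpenAI SDK works
// when pointed at Groq's base URL. GROQ_API_KEY is assumed to hold your key.
const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "moonshotai/kimi-k2-instruct-0905", // Cline's default Groq model
    messages: [{ role: "user", content: "Say hello in one sentence." }],
  });
  console.log(completion.choices[0].message.content);
}

main().catch(console.error);
```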
Groq's LPU architecture delivers several key advantages over traditional GPU-based inference. Unlike GPUs, which are adapted from training workloads, the LPU is purpose-built for inference, eliminating the architectural bottlenecks that create latency in traditional systems.
Learn more about Groq's technology in their LPU architecture blog post.
The Kimi K2 models support prompt caching, which can significantly reduce cost and latency when requests share a long, repeated prompt.
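As a rough illustration, assuming the caching keys on a repeated prompt prefix (an assumption here; check Groq's documentation for the exact mechanism), the main thing your code needs to do is keep the long, stable portion of the prompt byte-for-byte identical across requests:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

// A long, unchanging system prompt. Re-sending it identically on every request
// is what gives a prefix-based cache something to reuse (assumption: caching is
// automatic for repeated prefixes; no extra flag is set here).
const SYSTEM_PROMPT = "You are a coding assistant. <long, stable instructions here>";

async function ask(question: string) {
  const res = await client.chat.completions.create({
    model: "moonshotai/kimi-k2-instruct-0905",
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // identical prefix each call
      { role: "user", content: question },        // only this part changes
    ],
  });
  return res.choices[0].message.content;
}
```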
Select models accept image inputs. Check the model details in the Groq Console for each model's specific capabilities.
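For the vision-capable models listed above (the Llama 4 variants), images are typically passed using the OpenAI-style multimodal message format. The sketch below assumes Groq accepts the standard `image_url` content part and uses a placeholder image URL.

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

async function describeImage(imageUrl: string) {
  const res = await client.chat.completions.create({
    model: "meta-llama/llama-4-scout-17b-16e-instruct", // vision-capable model from the list
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe what is in this image." },
          { type: "image_url", image_url: { url: imageUrl } }, // public URL or data: URI
        ],
      },
    ],
  });
  return res.choices[0].message.content;
}

describeImage("https://example.com/screenshot.png").then(console.log);
```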
Some models, such as the DeepSeek R1 distillations, offer enhanced reasoning with explicit step-by-step thought processes.
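A call to the DeepSeek R1 distill looks like any other chat completion; the difference is in the output, where the model typically emits its chain of thought before the final answer. The sketch below assumes the trace arrives wrapped in `<think>...</think>` tags and strips it before displaying the result.

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

async function solve(problem: string) {
  const res = await client.chat.completions.create({
    model: "deepseek-r1-distill-llama-70b",
    messages: [{ role: "user", content: problem }],
  });
  const text = res.choices[0].message.content ?? "";
  // R1-style models often interleave a <think>...</think> trace with the answer;
  // removing it keeps only the final response (assumption about the output format).
  return text.replace(/<think>[\s\S]*?<\/think>/, "").trim();
}

solve("What is 17 * 24? Explain briefly.").then(console.log);
```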