Run Cline completely offline with genuinely capable models on your own hardware. No API costs, no data leaving your machine, no internet dependency.
Local models have reached a turning point where they're now practical for real development work. This guide covers everything you need to know about running Cline with local models.
Your RAM determines which models you can run effectively:
| RAM | Recommended Model | Quantization | Performance Level |
|---|---|---|---|
| 32GB | Qwen3 Coder 30B | 4-bit | Entry-level local coding |
| 64GB | Qwen3 Coder 30B | 8-bit | Full Cline features |
| 128GB+ | GLM-4.5-Air | 4-bit | Cloud-competitive performance |
After extensive testing, Qwen3 Coder 30B is the most reliable model under 70B parameters for Cline.

Download sizes depend on quantization: a 30B model is roughly 15–20 GB at 4-bit and roughly 30–33 GB at 8-bit.
Most smaller models (7B–20B) fail with Cline because they struggle to follow its long system prompt, produce reliable tool calls, and handle the large context Cline requires.
Configure the context window in both your local server and in Cline:

In Cline: select the LM Studio or Ollama provider in the API settings, and make sure the context window configured there matches your server.

In LM Studio:

- Context Length: 262144 (maximum)
- KV Cache Quantization: OFF (critical for proper function)
- Flash Attention: ON (if available on your hardware)

In Ollama: set `num_ctx` to 262144.

Quantization reduces model precision to fit on consumer hardware:
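The Ollama `num_ctx` setting can be baked into a custom model via a Modelfile so you don't have to set it each session. A minimal sketch — the `qwen3-coder:30b` tag and the derived model name are assumptions; substitute whatever tag you actually pulled:

```shell
# Create a variant of the model with the context window preconfigured.
cat > Modelfile <<'EOF'
FROM qwen3-coder:30b
PARAMETER num_ctx 262144
EOF

ollama create qwen3-coder-256k -f Modelfile
# Point Cline at "qwen3-coder-256k" instead of the base tag.
```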
| Type | Size Reduction | Quality | Use Case |
|---|---|---|---|
| 4-bit | ~75% | Good | Most coding tasks, limited RAM |
| 8-bit | ~50% | Better | Professional work, more nuance |
| 16-bit | None | Best | Maximum quality, requires high RAM |
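The size reductions in the table can be sanity-checked with back-of-envelope arithmetic: memory is roughly parameters × bits ÷ 8, plus runtime overhead (the 10% overhead figure below is an assumption, not a measured value):

```python
def model_size_gb(params_billion: float, bits: int, overhead: float = 0.10) -> float:
    """Rough memory footprint of a quantized model, in GB (decimal)."""
    bytes_total = params_billion * 1e9 * bits / 8
    return round(bytes_total * (1 + overhead) / 1e9, 1)

print(model_size_gb(30, 4))   # 4-bit 30B model
print(model_size_gb(30, 8))   # 8-bit
print(model_size_gb(30, 16))  # 16-bit
```

This matches the table's intuition: 4-bit is about a quarter the size of 16-bit, which is why a 30B model fits in 32GB of RAM at 4-bit but needs far more at full precision.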
Two model formats are common for local inference:

- **GGUF (Universal)**: runs on any platform via llama.cpp-based tools such as LM Studio and Ollama.
- **MLX (Mac only)**: optimized for Apple Silicon and often faster than GGUF on Macs.
✅ Local models are perfect for:

- Privacy-sensitive work where code can never leave your machine
- Offline development with no internet dependency
- Zero ongoing API costs

☁️ Cloud models are better for:

- Maximum speed and responsiveness
- The largest, most complex tasks that need frontier-level capability
Common issues and first things to check:

- **"Shell integration unavailable"**: update VS Code and Cline, and set a supported default shell (bash, zsh, or PowerShell).
- **"No connection could be made"**: confirm the local server is actually running and that Cline's base URL and port match it (LM Studio defaults to 1234, Ollama to 11434).
- **Slow or incomplete responses**: reduce the context window, try a smaller quantization, or free up RAM by closing other applications.
- **Model confusion or errors**: verify you're running the recommended model and that long conversations aren't overflowing the context window.
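To rule out connection problems quickly, you can probe the server endpoints directly. A minimal sketch, assuming the default ports (LM Studio 1234, Ollama 11434) — this is not a Cline diagnostic, just a reachability check:

```python
from urllib.request import urlopen
from urllib.error import URLError

# Default local endpoints (assumed; adjust if you changed ports).
ENDPOINTS = {
    "LM Studio": "http://localhost:1234/v1/models",
    "Ollama": "http://localhost:11434/api/tags",
}

def check(name: str, url: str) -> bool:
    """Return True if the server answers within 3 seconds."""
    try:
        with urlopen(url, timeout=3) as resp:
            print(f"{name}: reachable (HTTP {resp.status})")
            return True
    except (URLError, OSError) as exc:
        print(f"{name}: not reachable ({exc})")
        return False

for name, url in ENDPOINTS.items():
    check(name, url)
```

If the probe fails, start the server first; if it succeeds but Cline still can't connect, compare the base URL in Cline's provider settings against the one that worked here.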
For faster inference: prefer 4-bit quantization, enable Flash Attention where available, and use MLX builds on Apple Silicon.

For better quality: move to 8-bit quantization if your RAM allows, and keep the full context window.
If you have multiple GPUs, you can control how many model layers are offloaded by setting Ollama's `num_gpu` parameter.

While Qwen3 Coder 30B is recommended, you can experiment with other models, such as GLM-4.5-Air (see the hardware table above) or other recent 30B+ coding models.
Note: These may require additional configuration and testing.
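As a sketch of the `num_gpu` approach — the layer count and model tag below are assumptions you should tune for your hardware:

```shell
# Persist a GPU layer-offload setting in a custom model variant.
cat > Modelfile <<'EOF'
FROM qwen3-coder:30b
PARAMETER num_gpu 48
EOF

ollama create qwen3-coder-gpu -f Modelfile
```

If generation slows down or VRAM overflows, lower the value; layers that don't fit on the GPUs fall back to CPU.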
Ready to get started? Choose your path:
<CardGroup cols={2}>
  <Card title="LM Studio Setup" icon="desktop" href="/running-models-locally/lm-studio">
    User-friendly GUI approach with detailed configuration guide
  </Card>
  <Card title="Ollama Setup" icon="terminal" href="/running-models-locally/ollama">
    Command-line setup for power users and automation
  </Card>
</CardGroup>

Local models with Cline are now genuinely practical. While they won't match top-tier cloud APIs in speed, they offer complete privacy, zero costs, and offline capability. With proper configuration and the right hardware, Qwen3 Coder 30B can handle most coding tasks effectively.
The key is proper setup: adequate RAM, correct configuration, and realistic expectations. Follow this guide, and you'll have a capable coding assistant running entirely on your hardware.