import { Callout } from 'nextra/components'
import CTABlog from '@/components/Blog/CTA'
Running AI locally is fastest when you take these 3 actions in order. This walkthrough gets you from zero to a working offline AI on your computer.
The rest of this guide explains each step and answers common questions.
Download Jan from jan.ai - it's free and open source.
Jan helps you pick the right AI model for your computer.
That's all it takes to run your first AI model locally!
Start chatting with local AI models using Jan.
Keep reading to learn the key terms of local AI and what you should know before running AI models locally.
Jan works on all major operating systems with the same features:
Windows (10, 11)
Download the .exe installer from jan.ai. Works on Windows 10 and 11 with no additional setup.
macOS (Intel and Apple Silicon)
Download the .dmg file. Supports both Intel Macs and Apple Silicon (M1, M2, M3) natively.
Linux (Ubuntu, Debian, Fedora)
Download the .AppImage or .deb package. Works on most modern Linux distributions.
All platforms get the same models and features. The rest of this guide applies to all operating systems.
No. Jan handles installation, GGUF downloads, and updates. You point and click, then start chatting.
Yes. Jan is open source, local AI models are free, and offline AI replies cost nothing to run on your computer.
Only during inference. Close big apps or pause the model if you need the CPU or GPU for other work.
You only need it to download Jan and your first model. After that, you can run AI locally offline whenever you want.
Everything stays on-device unless you choose to share it. No prompts are sent to Jan’s servers by default.
With the basics and beginner FAQs out of the way, here's what is happening under the hood when you run AI on your computer.
Before diving into the details, let's understand how AI runs on your computer:
<Callout>
**Why do we need special tools for local AI?**

Think of AI models like compressed files - they need to be "unpacked" to work on your computer. Tools like llama.cpp do this job:

- They make AI models run efficiently on regular computers
- They convert complex AI math into something your computer understands
- They help run large AI models even with limited resources
</Callout>

llama.cpp helps millions of people run AI locally on their computers.
<Callout>
**What is GGUF and why do we need it?**

Original AI models are huge and complex - like trying to read a book in a language your computer doesn't understand. Here's where GGUF comes in:
Problem it solves:
How GGUF helps:
When browsing models, you'll see "GGUF" in the name (like "DeepSeek-R1-GGUF"). Don't worry about finding them - Jan automatically shows you the right GGUF versions for your computer. </Callout>
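If you ever want to confirm that a downloaded file really is a GGUF model, the following minimal sketch may help. It relies on the fact that GGUF files begin with the four ASCII bytes `GGUF` followed by a 32-bit version number, and it assumes the common little-endian layout. The function name `read_gguf_version` is made up for this example and is not part of Jan or llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF format version, or None if the file is not GGUF.

    Assumption: the version is stored little-endian, which is the usual case.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # The magic is followed by an unsigned 32-bit version number.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_version` on a file that is not a GGUF model (for example a `.zip` or a partial download of another format) returns `None`. Jan performs its own checks, so this is only useful when you inspect files by hand.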
Think of AI models like apps on your computer - some are light and quick to use, while others are bigger but can do more things. When you're choosing an AI model to run on your computer, you'll see names like "Llama-3-8B" or "Mistral-7B". Let's break down what this means in simple terms.
<Callout>
The "B" in model names (like 7B) stands for "billion" - it tells you how many parameters the model has, which is a measure of its size. Just as some apps take up more space on your computer, bigger AI models need more storage and memory.
</Callout>

Jan Hub makes it easy to understand different model sizes and versions.
Running local AI models becomes easier once you understand how size affects speed; next you'll see what you can do after the install.
Good news: Jan helps you pick the right model size for your computer automatically! You don't need to worry about the technical details - just choose a model that matches what Jan recommends for your computer.
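To make the naming convention concrete, here is a small sketch that reads the size tag out of a model name such as "Llama-3-8B" or "Mistral-7B". The helper `parameter_count` and its regular expression are assumptions written for this illustration; real model names vary, and Jan does not require you to parse them yourself.

```python
import re

# Matches a size tag such as "7B", "8B", or "1.5B" that is set off by
# "-", "_", a space, or the start/end of the name.
_SIZE_TAG = re.compile(r"(?:^|[-_ ])(\d+(?:\.\d+)?)[Bb](?=$|[-_ .])")


def parameter_count(model_name):
    """Return the parameter count implied by the name, or None if no tag is found."""
    match = _SIZE_TAG.search(model_name)
    if match is None:
        return None
    # "B" means billion, so "7B" is 7 * 1_000_000_000 parameters.
    return round(float(match.group(1)) * 1_000_000_000)
```

A name without a size tag, such as "DeepSeek-R1-GGUF", yields `None`, which is a reminder that the name alone does not always tell you the size.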
Most modern computers can run AI locally. Here's what you need:
Minimum requirements (works on most laptops):
| Computer type | Model size | What to expect |
| --- | --- | --- |
| Regular Laptop | 3B-7B models | Good for chatting and writing. Like having a helpful assistant. |
| Gaming Laptop | 7B-13B models | More capable. Better at complex tasks like coding and analysis. |
| Powerful Desktop | 13B+ models | Better performance. Great for professional work and advanced tasks. |
When browsing models in Jan, you'll see terms like "Q4", "Q6", or "Q8". Here's what that means in simple terms:
<Callout>
These are different versions of the same AI model, just packaged differently to work better on different computers.

Pro tip: Start with Q4 versions - they work great for most people and run smoothly on regular computers!
</Callout>
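As a rough way to see why Q4 files are smaller than Q8 files, the sketch below estimates file size from the parameter count and the quantization label. It assumes that the number after "Q" is approximately the number of bits stored per weight. That is a simplification: real GGUF files are usually somewhat larger because of metadata and mixed-precision layers. The function `estimated_size_gb` is hypothetical and written only for this example.

```python
import re


def estimated_size_gb(parameters, quant):
    """Rough file size in GB, assuming the Q-number is the bits per weight."""
    match = re.match(r"[Qq](\d+)", quant)
    if match is None:
        raise ValueError(f"unrecognized quantization label: {quant!r}")
    bits_per_weight = int(match.group(1))
    total_bytes = parameters * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1e9


# Compare the three labels for a 7B model.
for label in ("Q4", "Q6", "Q8"):
    print(label, round(estimated_size_gb(7_000_000_000, label), 2), "GB")
```

Under these assumptions, a 7B model comes out at about half the size in Q4 as in Q8, which is why Q4 is the comfortable starting point on a regular laptop.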
You'll often see links to "Hugging Face" when downloading AI models. Think of Hugging Face as the "GitHub for AI" - it's where the AI community shares their models. This sounds technical, but Jan makes it super easy to use:
Download Jan from jan.ai - it sets everything up for you.
You can get models two ways:

- Use Jan Hub to download AI models.
- Import a GGUF model from Hugging Face:
  1. Find and copy a GGUF model link from Hugging Face. Look for models with "GGUF" in their name.
  2. Launch Jan and go to the Models tab.
  3. Paste your Hugging Face link into Jan.
  4. Select your quantization and start the download.
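Before pasting a link, you can sanity-check it with a few lines of code. The sketch below only tests that the link points at huggingface.co and that "GGUF" appears somewhere in its path. The function name and the example URL are hypothetical, and the check is a heuristic, not how Jan validates links.

```python
from urllib.parse import urlparse


def looks_like_gguf_link(url):
    """Heuristic: is this a Hugging Face URL with 'GGUF' in its path?"""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in ("huggingface.co", "www.huggingface.co"):
        return False
    return "gguf" in parsed.path.lower()


# Hypothetical repository URL, used only to illustrate the check.
print(looks_like_gguf_link("https://huggingface.co/example-org/Example-7B-GGUF"))
```

A link that fails this check may still be valid, for example when the repository name omits "GGUF" but the files inside are GGUF, so treat a `False` result as a prompt to look more closely.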
Yes. CPU-only inference works fine for 3B-7B models. Expect slower responses, so keep prompts short and close other heavy apps.
Pick any Jan-recommended 7B-8B GGUF model like DeepSeek-R1 7B Q4 or Llama-3.1 8B Q4. They balance accuracy, speed, and memory use for most laptops.
Reserve about 5 GB of storage per model, plus free RAM equal to roughly 2× the model file size. Example: a 4 GB Q4 file needs roughly 8 GB of RAM to run smoothly.
Move up to Q6 or Q8 quantization or 13B+ models if you have a desktop GPU. Jan shows real-time VRAM and RAM requirements before download.
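The storage and RAM rule of thumb from the sizing answer above can be written as a short calculation. This is a sketch of that rule only (5 GB of storage per model, RAM of 2× the model file size). The function names are invented for the example, and Jan's own requirement estimates may differ.

```python
STORAGE_PER_MODEL_GB = 5  # rule of thumb from this guide
RAM_MULTIPLIER = 2        # free RAM should be about 2x the model file size


def resources_needed(model_file_gb, model_count=1):
    """Return the storage and RAM suggested by the rule of thumb."""
    return {
        "storage_gb": STORAGE_PER_MODEL_GB * model_count,
        "ram_gb": RAM_MULTIPLIER * model_file_gb,
    }


def fits_in_ram(model_file_gb, free_ram_gb):
    """True if the free RAM meets the 2x rule for this model file."""
    return free_ram_gb >= resources_needed(model_file_gb)["ram_gb"]


# The guide's example: a 4 GB Q4 file.
print(resources_needed(4))
```

For the 4 GB example this gives 8 GB of RAM, matching the figure in the answer. `fits_in_ram(4, 16)` would report that a machine with 16 GB free has room to spare.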