README.md
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
A small core, not a bundle. Each backend wraps a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX...) in its own image, pulled only when a model needs it. You install nothing you don't use.
Created by Ettore Di Giacinto and maintained by the LocalAI team.
:book: Documentation | :speech_balloon: Discord | 💻 Quickstart | 🖼️ Models | ❓FAQ
https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18
<details> <summary> Click to see more! </summary>https://github.com/user-attachments/assets/228fa9ad-81a3-4d43-bfb9-31557e14a36c
https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a
https://github.com/user-attachments/assets/cbb03379-23b4-4e3d-bd26-d152f057007f
https://github.com/user-attachments/assets/5ba4ace9-d3df-4795-b7d4-b0b404ea71ee
https://github.com/user-attachments/assets/ed88e34c-fed3-4b83-8a67-4716a9feeb7b
</details>Note: The DMG is not signed by Apple. After installing, run:
sudo xattr -d com.apple.quarantine /Applications/LocalAI.app. See #6268 for details.
Already ran LocalAI before? Use
docker start -i local-aito restart an existing container.
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# From the Ollama OCI registry
local-ai run ollama://gemma:2b
# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest
To test a running LocalAI server from the terminal, open an interactive chat session from another shell. Inside the prompt, /models lists installed models and /model <name> switches between them.
# Terminal 1
local-ai run llama-3.2-1b-instruct:q4_k_m
# Terminal 2
local-ai chat --model llama-3.2-1b-instruct:q4_k_m
Automatic Backend Detection: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see GPU Acceleration.
For more details, see the Getting Started guide.
insightface and speaker-recognition backends (PR #10441).llama.cpp prompt cache on by default (repeated system prompts collapse from minutes to seconds), keyless cosign signing of backend OCI images, per-API-key + per-user usage attribution, Distributed v3 with per-request replica routing. Release notesFor older news and full release notes, see GitHub Releases and the News page.
llama.cpp, transformers, vllm ... and more)LocalAI supports 60+ backends including llama.cpp, vLLM, SGLang, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for NVIDIA (CUDA 12/13), AMD (ROCm), Intel (oneAPI/SYCL), Apple Silicon (Metal), Vulkan, and NVIDIA Jetson (L4T). All backends can be installed on-the-fly from the Backend Gallery.
See the full Backend & Model Compatibility Table and GPU Acceleration guide.
Most backends wrap a best-in-class upstream engine. A handful of them are native C/C++/GGML engines (no Python at inference) developed and maintained by the LocalAI project itself:
| Backend | What it does |
|---|---|
| parakeet.cpp | C++/GGML port of NVIDIA NeMo Parakeet ASR (tdt/ctc/rnnt/hybrid), with cache-aware streaming transcription |
| ced.cpp | C++/GGML port of the CED audio-tagging models: sound-event classification (527-class AudioSet) over REST and the realtime API for live recognition |
| voxtral.c | Voxtral Realtime 4B speech-to-text in pure C |
| vibevoice.cpp | Native port of Microsoft VibeVoice for TTS (voice cloning) and long-form ASR with speaker diarization |
| rf-detr.cpp | Native RF-DETR object detection and instance segmentation |
| locate-anything.cpp | Open-vocabulary object detection and visual grounding (LocateAnything-3B) |
| depth-anything.cpp | Depth Anything 3 monocular metric depth + camera pose estimation |
| privacy-filter.cpp | Standalone GGML PII/NER token-classification engine powering LocalAI's PII redaction tier |
| LocalVQE | Joint acoustic echo cancellation, noise suppression, and dereverberation |
| local-store | Local-first vector database for embeddings (shipped in-tree) |
We also maintain apex-quant, a per-tensor, per-layer quantization recipe for Mixture-of-Experts models that exploits their structural sparsity to produce GGUFs matching or beating Q8_0 quality - and they run out of the box on stock llama.cpp.
LocalAI is maintained by a small team of humans, together with the wider community of contributors.
A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in Discord — LocalAI is a community-driven project and wouldn't exist without you. See the full contributors list.
If you utilize this repository, data in a downstream project, please consider citing it with:
@misc{localai,
author = {Ettore Di Giacinto},
title = {LocalAI: The free, Open source OpenAI alternative},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/go-skynet/LocalAI}},
Do you find LocalAI useful?
Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.
A huge thank you to our generous sponsors who support this project covering CI expenses, and our Sponsor list:
<p align="center"> <a href="https://www.spectrocloud.com/" target="blank"> </a> </p> <details> <summary> Past sponsors </summary> <p align="center"> <a href="https://www.premai.io/" target="blank"> </a> </p> </details>A special thanks to individual sponsors, a full list is on GitHub and buymeacoffee. Special shout out to drikster80 for being generous. Thank you everyone!
LocalAI is a community-driven project created by Ettore Di Giacinto and maintained by the LocalAI team.
MIT - Author Ettore Di Giacinto [email protected]
LocalAI couldn't have been built without the help of great software already available from the community. Thank you!
This is a community project, a special thanks to our contributors! <a href="https://github.com/go-skynet/LocalAI/graphs/contributors">
</a>