+++
disableToc = false
title = "Model compatibility table"
weight = 24
url = "/model-compatibility/"
+++

Besides llama-based models, LocalAI is also compatible with other architectures. The tables below list the available backends, with a short description of each and the hardware acceleration it supports.

{{% notice note %}}
LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.
{{% /notice %}}

Text generation and language model backends:

| Backend | Description | Capability | Embeddings | Streaming | Acceleration |
|---|---|---|---|---|---|
| llama.cpp | LLM inference in C/C++. Supports LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | GPT, Functions | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| vLLM | Fast LLM serving with PagedAttention | GPT | no | no | CUDA 12, ROCm, Intel |
| vLLM Omni | Unified multimodal generation (text, image, video, audio) | Multimodal GPT | no | no | CUDA 12, ROCm |
| transformers | HuggingFace Transformers framework | GPT, Embeddings, Multimodal | yes | yes* | CPU, CUDA 12/13, ROCm, Intel, Metal |
| MLX | Apple Silicon LLM inference | GPT | no | no | Metal |
| MLX-VLM | Vision-Language Models on Apple Silicon | Multimodal GPT | no | no | Metal |
| MLX Distributed | Distributed LLM inference across multiple Apple Silicon Macs | GPT | no | no | Metal |

Speech-to-text backends:

| Backend | Description | Acceleration |
|---|---|---|
| whisper.cpp | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| faster-whisper | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
| WhisperX | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
| moonshine | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
| voxtral | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
| Qwen3-ASR | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| NeMo | NVIDIA NeMo ASR toolkit | CPU, CUDA 12/13, ROCm, Intel, Metal |

Text-to-speech backends:

| Backend | Description | Acceleration |
|---|---|---|
| piper | Fast neural TTS | CPU |
| Coqui TTS | TTS with 1100+ languages and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal |
| Kokoro | Lightweight TTS (82M params) | CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| Chatterbox | Production-grade TTS with emotion control | CPU, CUDA 12/13, Metal, Jetson L4T |
| VibeVoice | Real-time TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| Qwen3-TTS | TTS with custom voice, voice design, and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| fish-speech | High-quality TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| Pocket TTS | Lightweight CPU-efficient TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| OuteTTS | TTS with custom speaker voices | CPU, CUDA 12 |
| faster-qwen3-tts | Real-time Qwen3-TTS with CUDA graph capture | CUDA 12/13, Jetson L4T |
| NeuTTS Air | Instant voice cloning TTS | CPU, CUDA 12, ROCm |
| VoxCPM | Expressive end-to-end TTS | CPU, CUDA 12/13, ROCm, Intel, Metal |
| Kitten TTS | Kitten TTS model | CPU, Metal |
| MLX-Audio | Audio models on Apple Silicon | Metal, CPU, CUDA 12/13, Jetson L4T |

Music generation backends:

| Backend | Description | Acceleration |
|---|---|---|
| ACE-Step | Music generation from text descriptions, lyrics, or audio | CPU, CUDA 12/13, ROCm, Intel, Metal |
| acestep.cpp | ACE-Step 1.5 C++ backend using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |

Image and video generation backends:

| Backend | Description | Acceleration |
|---|---|---|
| stable-diffusion.cpp | Stable Diffusion, Flux, PhotoMaker in C/C++ | CPU, CUDA 12/13, Intel SYCL, Vulkan, Metal, Jetson L4T |
| diffusers | HuggingFace diffusion models (image and video generation) | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |

Backends for specialized tasks:

| Backend | Description | Acceleration |
|---|---|---|
| RF-DETR | Real-time transformer-based object detection | CPU, CUDA 12/13, Intel, Metal, Jetson L4T |
| rerankers | Document reranking for RAG | CUDA 12/13, ROCm, Intel, Metal |
| local-store | Local vector database for embeddings | CPU, Metal |
| Silero VAD | Voice Activity Detection | CPU |
| TRL | Fine-tuning (SFT, DPO, GRPO, RLOO, KTO, ORPO) | CPU, CUDA 12/13 |
| llama.cpp quantization | HuggingFace → GGUF model conversion and quantization | CPU, Metal |
| Opus | Audio codec for WebRTC / Realtime API | CPU, Metal |

Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})).
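
As a minimal sketch of the note above, a model configuration file that pins a backend might look like the following. The model name `my-model`, the file name `my-model.gguf`, and the identifier `llama-cpp` are illustrative assumptions. The identifier your LocalAI version accepts may differ from the display name shown in the tables, so check the advanced section before relying on it.

```yaml
# Hypothetical model configuration; all values are placeholders.
# Name used to reference the model in API requests (assumed).
name: my-model
# Backend that should load the model, instead of automatic detection.
# Assumed identifier for the llama.cpp backend listed in the table.
backend: llama-cpp
parameters:
  # Model file, relative to the models directory (assumed file name).
  model: my-model.gguf
```

If the `backend` field is left out, LocalAI falls back to the automatic loading behaviour described in the notice at the top of this page.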