+++
disableToc = false
title = "Model compatibility table"
weight = 24
url = "/model-compatibility/"
+++

In addition to llama-based models, LocalAI is compatible with many other architectures. The tables below list the available backends, what each one supports, and the acceleration variants published for it.

{{% notice note %}}

LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.

All backends listed here can be installed on demand from the [Backend Gallery]({{%relref "features/backends" %}}). The exact set of acceleration variants published for each backend is defined in backend/index.yaml.

{{% /notice %}}
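
A minimal sketch of pinning a backend explicitly, rather than relying on automatic loading, might look like the following. The model name, the file names, and the exact `backend` identifier are illustrative assumptions, not values taken from this page. The identifier a backend is installed under may differ from its display name in the tables below, so check the Backend Gallery for the real value.

```yaml
# models/my-model.yaml (hypothetical file name)
name: my-model            # hypothetical name that clients would request
backend: llama-cpp        # assumed identifier for the llama.cpp backend
parameters:
  model: my-model.gguf    # hypothetical model file in the models directory
```

With a file like this in place, requests for `my-model` should be served by the named backend instead of going through automatic detection.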

## Text Generation & Language Models

| Backend | Description | Capability | Embeddings | Streaming | Acceleration |
|---------|-------------|------------|------------|-----------|--------------|
| llama.cpp | LLM inference in C/C++. Supports LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | GPT, Functions | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| ik_llama.cpp | Hard fork of llama.cpp optimized for CPU/hybrid CPU+GPU with IQK quants, custom quant mixes, and MLA for DeepSeek | GPT | yes | yes | CPU (AVX2+) |
| turboquant | llama.cpp fork adding the TurboQuant KV-cache quantization scheme | GPT | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Jetson L4T |
| ds4 | DeepSeek V4 Flash single-model inference engine, optimized for Metal and CUDA | GPT | no | yes | CPU, CUDA 12/13, Metal, Jetson L4T |
| vLLM | Fast LLM serving with PagedAttention; GPTQ/AWQ/FP8 quantization | GPT, Functions, Multimodal | no | yes | CUDA 12/13, ROCm, Intel SYCL, Jetson L4T |
| vLLM Omni | Unified multimodal generation (text, image, video, audio) on top of vLLM | Multimodal GPT, Functions | no | yes | CUDA 12/13, ROCm, Jetson L4T |
| SGLang | Fast serving framework for LLMs and vision-language models with speculative decoding | GPT, Functions, Multimodal | no | yes | CUDA 12/13, ROCm, Intel SYCL, Jetson L4T |
| transformers | HuggingFace Transformers framework | GPT, Embeddings, Multimodal | yes | yes* | CUDA 12/13, ROCm, Intel SYCL, Metal |
| MLX | Apple Silicon LLM inference | GPT, Functions | no | yes | CPU, CUDA 12/13, Metal, Jetson L4T |
| MLX-VLM | Vision-Language Models on Apple Silicon | Multimodal GPT, Functions | no | yes | CPU, CUDA 12/13, Metal, Jetson L4T |
| MLX Distributed | Distributed LLM inference across multiple Apple Silicon Macs | GPT | no | no | CPU, CUDA 12/13, Metal, Jetson L4T |
| tinygrad | Minimalist deep-learning framework with zero runtime dependencies | GPT, Embeddings, Multimodal | yes | yes | CPU |

## Speech-to-Text

| Backend | Description | Acceleration |
|---------|-------------|--------------|
| whisper.cpp | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| faster-whisper | Fast Whisper with CTranslate2 | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| WhisperX | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal, Jetson L4T |
| moonshine | Ultra-fast transcription for low-end devices (ONNX) | CPU, CUDA 12/13, Metal |
| parakeet.cpp | C++/GGML port of NVIDIA NeMo Parakeet (tdt/ctc/rnnt/hybrid), with cache-aware streaming | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| CrispASR | Unified speech engine (whisper.cpp fork) supporting Parakeet, Canary, and many ASR architectures, plus TTS | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| voxtral | Voxtral Realtime 4B speech-to-text in pure C | CPU, Metal |
| Qwen3-ASR | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| NeMo | NVIDIA NeMo ASR toolkit | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal |
| sherpa-onnx | Sherpa-ONNX ASR (Whisper, Paraformer, SenseVoice) and TTS | CPU, CUDA 12, Metal |

## Text-to-Speech

| Backend | Description | Acceleration |
|---------|-------------|--------------|
| piper | Fast neural TTS | CPU, Metal |
| Coqui TTS | TTS with 1100+ languages and voice cloning | CUDA 12, ROCm, Intel SYCL, Metal |
| Kokoro | Lightweight TTS (82M params) | CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| Kokoros | Pure Rust Kokoro TTS via ONNX | CPU |
| Chatterbox | Production-grade TTS with emotion control | CPU, CUDA 12/13, Metal, Jetson L4T |
| VibeVoice | Real-time TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| vibevoice.cpp | Native C++/GGML port of VibeVoice for TTS (voice cloning) and long-form ASR with diarization | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| Qwen3-TTS | TTS with custom voice, voice design, and voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| qwentts.cpp | Native C++/GGML Qwen3-TTS with streaming, named speakers, and voice design | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| OmniVoice | Native C++/GGML TTS with voice cloning, voice design, and streaming | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| fish-speech | High-quality TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| Pocket TTS | Lightweight CPU-efficient TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| OuteTTS | TTS with custom speaker voices | CPU, CUDA 12 |
| faster-qwen3-tts | Real-time Qwen3-TTS with CUDA graph capture | CPU, CUDA 12/13, Jetson L4T |
| NeuTTS Air | Instant voice cloning, on-device TTS | CPU, CUDA 12, ROCm |
| VoxCPM | Expressive end-to-end TTS | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal |
| Kitten TTS | Kitten TTS model | CPU, Metal |
| Supertonic | Lightning-fast on-device multilingual TTS via ONNX | CPU |
| MLX-Audio | Audio models on Apple Silicon | CPU, CUDA 12/13, Metal, Jetson L4T |
| liquid-audio | LFM2 end-to-end speech-to-speech, ASR, and TTS | CPU, CUDA 12/13, ROCm, Intel SYCL, Jetson L4T |

## Music & Sound Generation

| Backend | Description | Acceleration |
|---------|-------------|--------------|
| ACE-Step | Music generation from text descriptions, lyrics, or audio | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal |
| acestep.cpp | ACE-Step 1.5 C++ backend using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |

## Image & Video Generation

| Backend | Description | Acceleration |
|---------|-------------|--------------|
| stable-diffusion.cpp | Stable Diffusion, Flux, PhotoMaker, Ideogram in C/C++ | CPU, CUDA 12/13, Intel SYCL, Vulkan, Metal, Jetson L4T |
| diffusers | HuggingFace diffusion models (image and video generation) | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| vLLM Omni | Multimodal generation including text-to-image and text-to-video | CUDA 12/13, ROCm, Jetson L4T |

## Vision, Detection & Recognition

| Backend | Description | Acceleration |
|---------|-------------|--------------|
| RF-DETR | Real-time transformer-based object detection (Python) | CPU, CUDA 12/13, Intel SYCL, Metal, Jetson L4T |
| rf-detr.cpp | Native RF-DETR object detection and instance segmentation in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| locate-anything.cpp | Open-vocabulary object detection and visual grounding (LocateAnything-3B) in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| depth-anything.cpp | Depth Anything 3 monocular metric depth + camera pose in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| sam3.cpp | Segment Anything (SAM 3/2/EdgeTAM) with text/point/box prompts in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| insightface | Face verification, embedding, and anti-spoofing liveness (ONNX Runtime) | CPU, CUDA 12 |
| speaker-recognition | Speaker (voice) recognition via SpeechBrain ECAPA-TDNN | CPU, CUDA 12, Metal |

## Audio Processing

| Backend | Description | Acceleration |
|---------|-------------|--------------|
| Silero VAD | Voice Activity Detection | CPU, Metal |
| LocalVQE | Joint acoustic echo cancellation, noise suppression, and dereverberation in C/C++ using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Jetson L4T |
| Opus | Audio codec for WebRTC / Realtime API | CPU, Metal |

## Utilities & Other

| Backend | Description | Acceleration |
|---------|-------------|--------------|
| rerankers | Document reranking for RAG | CUDA 12, ROCm, Intel SYCL, Metal |
| privacy-filter.cpp | Standalone GGML engine for the openai-privacy-filter PII/NER token-classification model family (powers LocalAI's PII redaction tier) | CPU, CUDA 13, Vulkan |
| local-store | Local-first vector database for embeddings | CPU, Metal |
| TRL | Fine-tuning (SFT, DPO, GRPO, RLOO, KTO, ORPO) | CPU, CUDA 12/13 |
| llama.cpp quantization | HuggingFace → GGUF model conversion and quantization | CPU, Metal |

## Acceleration Support Summary

### GPU Acceleration

- NVIDIA CUDA: CUDA 12.0, CUDA 13.0 support across most backends
- AMD ROCm: HIP-based acceleration for AMD GPUs
- Intel oneAPI: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- Vulkan: Cross-platform GPU acceleration
- Metal: Apple Silicon GPU acceleration (M1/M2/M3+)

### Specialized Hardware

- NVIDIA Jetson (L4T CUDA 12): ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- NVIDIA Jetson (L4T CUDA 13): ARM64 support for embedded AI (DGX Spark)
- Apple Silicon: Native Metal acceleration for Mac M1/M2/M3+
- Darwin x86: Intel Mac support

### CPU Optimization

- AVX/AVX2/AVX512: Advanced vector extensions for x86
- Quantization: 4-bit, 5-bit, 8-bit integer quantization support
- Mixed Precision: F16/F32 mixed precision support

Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})).

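- \* Streaming for the transformers backend (marked `yes*` in the Text Generation & Language Models table) is available only with CUDA and OpenVINO CPU/XPU acceleration.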