+++
disableToc = false
title = "Model compatibility table"
weight = 24
url = "/model-compatibility/"
+++

Besides llama-based models, LocalAI is also compatible with other model architectures. The tables below list all the backends, the model families they support, and the available acceleration options.

{{% notice note %}}

LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.

{{% /notice %}}
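As a minimal sketch of such a YAML file (the model name and file are placeholders, and the exact `backend` string must match one of the backend names listed in the tables below):

```yaml
# Hypothetical model configuration; names and paths are illustrative.
name: my-llama
backend: llama-cpp
parameters:
  model: my-model.Q4_K_M.gguf
```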

## Text Generation & Language Models

| Backend | Description | Capability | Embeddings | Streaming | Acceleration |
|---|---|---|---|---|---|
| llama.cpp | LLM inference in C/C++. Supports LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | GPT, Functions | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| vLLM | Fast LLM serving with PagedAttention | GPT | no | no | CUDA 12, ROCm, Intel |
| vLLM Omni | Unified multimodal generation (text, image, video, audio) | Multimodal GPT | no | no | CUDA 12, ROCm |
| transformers | HuggingFace Transformers framework | GPT, Embeddings, Multimodal | yes | yes* | CPU, CUDA 12/13, ROCm, Intel, Metal |
| MLX | Apple Silicon LLM inference | GPT | no | no | Metal |
| MLX-VLM | Vision-Language Models on Apple Silicon | Multimodal GPT | no | no | Metal |
| MLX Distributed | Distributed LLM inference across multiple Apple Silicon Macs | GPT | no | no | Metal |

## Speech-to-Text

| Backend | Description | Acceleration |
|---|---|---|
| whisper.cpp | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| faster-whisper | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
| WhisperX | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
| moonshine | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
| voxtral | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
| Qwen3-ASR | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| NeMo | NVIDIA NeMo ASR toolkit | CPU, CUDA 12/13, ROCm, Intel, Metal |

## Text-to-Speech

| Backend | Description | Acceleration |
|---|---|---|
| piper | Fast neural TTS | CPU |
| Coqui TTS | TTS with 1100+ languages and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal |
| Kokoro | Lightweight TTS (82M params) | CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| Chatterbox | Production-grade TTS with emotion control | CPU, CUDA 12/13, Metal, Jetson L4T |
| VibeVoice | Real-time TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| Qwen3-TTS | TTS with custom voice, voice design, and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| fish-speech | High-quality TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| Pocket TTS | Lightweight CPU-efficient TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| OuteTTS | TTS with custom speaker voices | CPU, CUDA 12 |
| faster-qwen3-tts | Real-time Qwen3-TTS with CUDA graph capture | CUDA 12/13, Jetson L4T |
| NeuTTS Air | Instant voice cloning TTS | CPU, CUDA 12, ROCm |
| VoxCPM | Expressive end-to-end TTS | CPU, CUDA 12/13, ROCm, Intel, Metal |
| Kitten TTS | Kitten TTS model | CPU, Metal |
| MLX-Audio | Audio models on Apple Silicon | Metal, CPU, CUDA 12/13, Jetson L4T |

## Music Generation

| Backend | Description | Acceleration |
|---|---|---|
| ACE-Step | Music generation from text descriptions, lyrics, or audio | CPU, CUDA 12/13, ROCm, Intel, Metal |
| acestep.cpp | ACE-Step 1.5 C++ backend using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |

## Image & Video Generation

| Backend | Description | Acceleration |
|---|---|---|
| stable-diffusion.cpp | Stable Diffusion, Flux, PhotoMaker in C/C++ | CPU, CUDA 12/13, Intel SYCL, Vulkan, Metal, Jetson L4T |
| diffusers | HuggingFace diffusion models (image and video generation) | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |

## Specialized Tasks

| Backend | Description | Acceleration |
|---|---|---|
| RF-DETR | Real-time transformer-based object detection | CPU, CUDA 12/13, Intel, Metal, Jetson L4T |
| rerankers | Document reranking for RAG | CUDA 12/13, ROCm, Intel, Metal |
| local-store | Local vector database for embeddings | CPU, Metal |
| Silero VAD | Voice Activity Detection | CPU |
| TRL | Fine-tuning (SFT, DPO, GRPO, RLOO, KTO, ORPO) | CPU, CUDA 12/13 |
| llama.cpp quantization | HuggingFace → GGUF model conversion and quantization | CPU, Metal |
| Opus | Audio codec for WebRTC / Realtime API | CPU, Metal |

## Acceleration Support Summary

### GPU Acceleration

  • NVIDIA CUDA: CUDA 12.0, CUDA 13.0 support across most backends
  • AMD ROCm: HIP-based acceleration for AMD GPUs
  • Intel oneAPI: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
  • Vulkan: Cross-platform GPU acceleration
  • Metal: Apple Silicon GPU acceleration (M1/M2/M3+)

### Specialized Hardware

  • NVIDIA Jetson (L4T CUDA 12): ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
  • NVIDIA Jetson (L4T CUDA 13): ARM64 support for embedded AI (DGX Spark)
  • Apple Silicon: Native Metal acceleration for Mac M1/M2/M3+
  • Darwin x86: Intel Mac support

### CPU Optimization

  • AVX/AVX2/AVX512: Advanced vector extensions for x86
  • Quantization: 4-bit, 5-bit, 8-bit integer quantization support
  • Mixed Precision: F16/F32 mixed precision support

Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})).

  • * Only for CUDA and OpenVINO CPU/XPU acceleration.
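For instance, a sketch of a configuration selecting a speech-to-text backend from the tables above (the model name and file are illustrative placeholders):

```yaml
# Illustrative only: verify the backend name and model file
# against your LocalAI installation before using.
name: whisper-base
backend: whisper
parameters:
  model: ggml-whisper-base.bin
```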