# LocalAI Backend Infrastructure
This directory contains the core backend infrastructure for LocalAI, including the gRPC protocol definition, multi-language Dockerfiles, and language-specific backend implementations.
LocalAI uses a unified gRPC-based architecture that allows different programming languages to implement AI backends while maintaining consistent interfaces and capabilities. The backend system supports multiple hardware acceleration targets and provides a standardized way to integrate various AI models and frameworks.
## Protocol Definition (`backend.proto`)

The `backend.proto` file defines the gRPC service interface that all backends must implement. This ensures consistency across the different language implementations and provides a contract for communication between the LocalAI core and backend services.
### Core Services

- `Predict`, `PredictStream`: LLM inference
- `Embedding`: text vectorization
- `GenerateImage`: Stable Diffusion and image models
- `AudioTranscription`, `TTS`, `SoundGeneration`: audio processing
- `GenerateVideo`: video synthesis
- `Detect`: computer vision tasks
- `StoresSet`, `StoresGet`, `StoresFind`: RAG operations
- `Rerank`: document relevance scoring
- `VAD`: audio segmentation

### Key Message Types

- `PredictOptions`: comprehensive configuration for text generation
- `ModelOptions`: model loading and configuration parameters
- `Result`: standardized response format
- `StatusResponse`: backend health and memory usage information
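To make the contract concrete, here is a minimal client-side sketch in Python, assuming stubs generated from `backend.proto` with `grpcio-tools`. The module names (`backend_pb2`, `backend_pb2_grpc`), the `BackendStub` class, the field names `Model` and `Prompt`, and the address `localhost:50051` are illustrative assumptions, not verified API:

```python
# Hedged sketch: driving a backend through stubs generated from backend.proto.
# Generated module, stub, and field names are assumptions for illustration.
import grpc

import backend_pb2        # messages generated from backend.proto
import backend_pb2_grpc   # service stubs generated from backend.proto

channel = grpc.insecure_channel("localhost:50051")
stub = backend_pb2_grpc.BackendStub(channel)

# Load and initialize a model first (ModelOptions)...
stub.LoadModel(backend_pb2.ModelOptions(Model="ggml-model.bin"))

# ...then run inference (PredictOptions -> reply message).
reply = stub.Predict(backend_pb2.PredictOptions(Prompt="Hello, world"))
print(reply)
```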
## Language-Specific Dockerfiles

The backend system provides language-specific Dockerfiles that handle the build environment and dependencies for each supported language:

- `Dockerfile.python`: builds Python backends
- `Dockerfile.golang`: builds Go backends
- `Dockerfile.llama-cpp`: builds C++ (llama.cpp) backends

Backend implementations live in the corresponding `python/`, `go/`, and `cpp/` directories.

## Backend Registry (`index.yaml`)

The `index.yaml` file serves as a central registry for all available backends.
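Since the registry is plain YAML, it can be inspected programmatically. A small sketch follows; it assumes only that the file parses as YAML, while the entry schema itself is defined by the repository:

```python
# Sketch: inspecting the backend registry. Assumes only that index.yaml is
# valid YAML; the entry schema is defined by the repository itself.
import pprint

import yaml  # requires PyYAML

with open("backend/index.yaml") as f:
    index = yaml.safe_load(f)

pprint.pprint(index)
```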
## Building Backends

Example build commands with Docker:
```bash
# Build Python backend
docker build -f backend/Dockerfile.python \
  --build-arg BACKEND=transformers \
  --build-arg BUILD_TYPE=cublas12 \
  --build-arg CUDA_MAJOR_VERSION=12 \
  --build-arg CUDA_MINOR_VERSION=0 \
  -t localai-backend-transformers .

# Build Go backend
docker build -f backend/Dockerfile.golang \
  --build-arg BACKEND=whisper \
  --build-arg BUILD_TYPE=cpu \
  -t localai-backend-whisper .

# Build C++ backend
docker build -f backend/Dockerfile.llama-cpp \
  --build-arg BACKEND=llama-cpp \
  --build-arg BUILD_TYPE=cublas12 \
  -t localai-backend-llama-cpp .
```
For ARM64 and macOS builds, Docker cannot be used; use the Makefile in the respective backend directory instead.
## Supported Build Types

- `cpu`: CPU-only optimization
- `cublas12`, `cublas13`: CUDA 12.x / 13.x with cuBLAS
- `hipblas`: ROCm with rocBLAS
- `intel`: Intel oneAPI optimization
- `vulkan`: Vulkan-based acceleration
- `metal`: Apple Metal optimization

## Adding a New Backend

New backends must implement the gRPC interface defined in `backend.proto`, be registered in `index.yaml`, and follow the standard directory layout:

```
backend-name/
├── backend.py/go/cpp    # Main implementation
├── requirements.txt     # Dependencies
├── Dockerfile           # Build configuration
├── install.sh           # Installation script
├── run.sh               # Execution script
├── test.sh              # Test script
└── README.md            # Backend documentation
```
## Required Interface

At minimum, backends must implement:
- `Health()`: service health check
- `LoadModel()`: model loading and initialization
- `Predict()`: main inference endpoint
- `Status()`: backend status and metrics
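A minimal Python skeleton of these four methods might look like the following. This is a sketch, not LocalAI's actual implementation: the generated modules `backend_pb2`/`backend_pb2_grpc`, the `BackendServicer` base class, and the reply field names are assumptions based on standard gRPC codegen:

```python
# Hedged sketch of a minimal backend server; names of generated modules,
# base classes, and message fields are assumptions, not verified API.
from concurrent import futures

import grpc

import backend_pb2
import backend_pb2_grpc


class MyBackend(backend_pb2_grpc.BackendServicer):
    def Health(self, request, context):
        # Report that the service is alive.
        return backend_pb2.Reply(message=b"OK")

    def LoadModel(self, request, context):
        # request carries ModelOptions; load and initialize the model here.
        return backend_pb2.Result(message="model loaded", success=True)

    def Predict(self, request, context):
        # request carries PredictOptions; run inference and return the output.
        return backend_pb2.Reply(message=b"generated text")

    def Status(self, request, context):
        # Report backend state and memory usage via StatusResponse.
        return backend_pb2.StatusResponse()


if __name__ == "__main__":
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    backend_pb2_grpc.add_BackendServicer_to_server(MyBackend(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```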
## Communication Flow

Backends communicate with the LocalAI core through gRPC: the core calls `LoadModel` to load and initialize a model, then routes inference requests to `Predict` or the appropriate specialized endpoint, as in the client sketch earlier.

## Contributing

When contributing to the backend system, follow the directory layout and interface requirements described above.