docs/content/reference/nvidia-l4t.md
+++
disableToc = false
title = "Running on Nvidia ARM64"
weight = 27
+++
LocalAI can be run on Nvidia ARM64 devices, such as the Jetson Nano, Jetson Xavier NX, Jetson AGX Orin, and Nvidia DGX Spark. The following instructions will guide you through building and using the LocalAI container for Nvidia ARM64 devices.
Pre-built images are available on quay.io and Docker Hub. Tags ending in -cuda-13 are the CUDA 13 builds:
docker pull quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64
# or
docker pull localai/localai:latest-nvidia-l4t-arm64
docker pull quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-cuda-13
# or
docker pull localai/localai:latest-nvidia-l4t-arm64-cuda-13
To build the image yourself, use one of the following command sets. The first builds on the L4T JetPack base image; the second builds the CUDA 13 variant on Ubuntu 24.04:
git clone https://github.com/mudler/LocalAI
cd LocalAI
docker build --build-arg SKIP_DRIVERS=true --build-arg BUILD_TYPE=cublas --build-arg BASE_IMAGE=nvcr.io/nvidia/l4t-jetpack:r36.4.0 --build-arg IMAGE_TYPE=core -t quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core .
git clone https://github.com/mudler/LocalAI
cd LocalAI
docker build --build-arg SKIP_DRIVERS=false --build-arg BUILD_TYPE=cublas --build-arg CUDA_MAJOR_VERSION=13 --build-arg CUDA_MINOR_VERSION=0 --build-arg BASE_IMAGE=ubuntu:24.04 --build-arg IMAGE_TYPE=core -t quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-cuda-13-core .
Run the LocalAI container on an Nvidia ARM64 device with one of the following commands, depending on which image you pulled. Here /data/models is the host directory containing the models:
docker run -e DEBUG=true -p 8080:8080 -v /data/models:/models -ti --restart=always --name local-ai --runtime nvidia --gpus all quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64
docker run -e DEBUG=true -p 8080:8080 -v /data/models:/models -ti --restart=always --name local-ai --runtime nvidia --gpus all quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-cuda-13
Note: replace /data/models with the path to the directory that holds your models.
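Once the container is started, it can help to wait until the API answers before sending requests. The following is a minimal sketch, assuming the server is published on port 8080 as in the commands above and answers on a `/readyz` readiness endpoint. `wait_for_localai` is a hypothetical helper name used only for illustration; it is not part of LocalAI.

```shell
# Sketch: poll a readiness URL until it responds or the attempts run out.
# Both the default URL and the helper name are assumptions for illustration.
wait_for_localai() {
  url="${1:-http://localhost:8080/readyz}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors; the body is discarded.
    if curl -fsS -o /dev/null "$url"; then
      echo "LocalAI is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "LocalAI did not become ready" >&2
  return 1
}
```

For example, `wait_for_localai && curl http://localhost:8080/v1/models` would query the OpenAI-compatible model list once the server reports ready.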
If you run a worker on a Jetson, DGX Spark (GB10), or Thor and the Nodes page in the frontend shows the node as fully used, check two things:
1. `NVIDIA_DRIVER_CAPABILITIES` must include `utility` so that `nvidia-smi` / NVML work inside the container. With `--gpus all` alone (or `--runtime nvidia` without extra flags), only `compute` is wired in on some driver versions. Add `-e NVIDIA_DRIVER_CAPABILITIES=compute,utility` to your `docker run`, or `capabilities: [gpu, utility]` in compose / Kubernetes device reservations.
2. Add `--init` to `docker run` (or `init: true` in compose) so the container has a proper PID 1 reaper. Otherwise, short-lived child processes like `nvidia-smi` can intermittently fail with `waitid: no child processes`.

On unified-memory devices, LocalAI auto-detects the SoC via `/sys/devices/soc0/{family,soc_id}` and reports system RAM as VRAM, so `nvidia-smi` is not strictly required for VRAM metrics. See
[Distributed Mode → NVIDIA GPU support]({{% relref "/features/distributed-mode#nvidia-gpu-support" %}})
for full context.
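The two checks above can be folded into the run command from earlier on this page. The following is a sketch under stated assumptions: `IMAGE`, `MODELS_DIR`, and `DRY_RUN` are illustrative shell variables, not settings that LocalAI reads, and `localai_run` is a hypothetical wrapper name. By default it only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: IMAGE, MODELS_DIR and DRY_RUN are illustrative variables.
IMAGE="${IMAGE:-quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64}"
MODELS_DIR="${MODELS_DIR:-/data/models}"

localai_run() {
  # With DRY_RUN=1 (the default here) the command is printed, not executed.
  runner=""
  if [ "${DRY_RUN:-1}" = 1 ]; then
    runner="echo"
  fi
  # --init adds a PID 1 reaper; the "utility" capability exposes
  # nvidia-smi / NVML inside the container.
  $runner docker run --init \
    -e DEBUG=true \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    -p 8080:8080 \
    -v "$MODELS_DIR:/models" \
    -ti --restart=always --name local-ai \
    --runtime nvidia --gpus all \
    "$IMAGE"
}

localai_run
```

Setting `DRY_RUN=0` before calling `localai_run` would start the container instead of printing the command. Setting `IMAGE` to the `-cuda-13` tag selects the CUDA 13 image.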