docs/content/getting-started/models.md
+++
disableToc = false
title = "Setting Up Models"
weight = 2
icon = "hub"
description = "Learn how to install, configure, and manage models in LocalAI"
+++
This section covers everything you need to know about installing and configuring models in LocalAI. You'll learn multiple methods to get models running.
The Model Gallery is the simplest way to install models. It provides pre-configured models ready to use.
You can browse and install gallery models from the WebUI at http://localhost:8080. For more details, refer to the [Gallery Documentation]({{% relref "features/model-gallery" %}}).
```bash
# List available models
local-ai models list

# Install a specific model
local-ai models install llama-3.2-1b-instruct:q4_k_m

# Start LocalAI with a model from the gallery
local-ai run llama-3.2-1b-instruct:q4_k_m
```
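If the LocalAI server is already running, gallery models can also be installed through the HTTP API. This is only a sketch: it assumes the `/models/apply` endpoint and request shape described in the Gallery documentation, so check that page for the exact payload format:

```bash
# Ask a running LocalAI instance to install a gallery model
# (endpoint and payload per the Gallery documentation; adjust the id as needed)
curl http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"id": "llama-3.2-1b-instruct:q4_k_m"}'
```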
To run models available in the LocalAI gallery, you can use the model name as the URI. For example, to run LocalAI with the Hermes model, execute:
```bash
local-ai run hermes-2-theta-llama-3-8b
```
To install only the model, use:
```bash
local-ai models install hermes-2-theta-llama-3-8b
```
Note: The galleries available in LocalAI can be customized to point to a different URL or a local directory. For more information on how to setup your own gallery, see the [Gallery Documentation]({{% relref "features/model-gallery" %}}).
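As a minimal sketch of that customization, a custom gallery index can be supplied via the `GALLERIES` environment variable; the JSON format and the index URL below are assumptions, so refer to the Gallery documentation for the authoritative syntax:

```bash
# Point LocalAI at a custom gallery index (URL and JSON format are illustrative)
GALLERIES='[{"name":"my-gallery","url":"https://example.com/my-gallery/index.yaml"}]' \
  local-ai run
```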
Visit [models.localai.io](https://models.localai.io) to browse all available models in your browser.
The WebUI provides a powerful model import interface that supports both simple and advanced configuration:
Open the WebUI at http://localhost:8080 and paste a model URI to import it (for example, https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct-GGUF). For full control over the model configuration, use the advanced editor, which is especially useful when you need to customize settings beyond the simple defaults.
LocalAI can directly install models from Hugging Face:
```bash
# Install and run a model from Hugging Face
local-ai run huggingface://TheBloke/phi-2-GGUF
```

The URI format is `huggingface://<repository>/<model-file>`, where `<model-file>` is optional:

```bash
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
```
Models can also be pulled from Ollama or from OCI registries:

```bash
local-ai run ollama://gemma:2b
local-ai run oci://localai/phi-2:latest
```
To run models via URI, specify a URI to a model file or a configuration file when starting LocalAI. Valid syntax includes:
- `file://path/to/model` (absolute path to a file within your models directory)
- `huggingface://repository_id/model_file` (e.g., `huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf`)
- `oci://container_image:tag`
- `ollama://model_id:tag`
- `https://gist.githubusercontent.com/.../phi-2.yaml` (a remote configuration file)

{{% notice note %}}
When using file:// URLs, the path must point to a file within your models directory (specified by MODELS_PATH). Files outside this directory are rejected for security reasons.
{{% /notice %}}
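For example, assuming the container setup used later on this page (models mounted at `/models`, which is also the `MODELS_PATH`), a `file://` URI inside that directory is accepted while one outside is rejected:

```bash
# Accepted: the file lives inside the models directory (/models in this setup)
local-ai run file:///models/phi-2.Q4_K_M.gguf

# Rejected: the file lives outside MODELS_PATH
local-ai run file:///tmp/phi-2.Q4_K_M.gguf
```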
Configuration files can be used to customize the model defaults and settings. For advanced configurations, refer to the [Customize Models section]({{% relref "getting-started/customize-model" %}}).
For example:

```bash
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
local-ai run ollama://gemma:2b
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
local-ai run oci://localai/phi-2:latest
```
For full control, you can manually download and configure models.
Download a GGUF model file; Hugging Face is the most common source. For example:
```bash
mkdir -p models

wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf \
  -O models/phi-2.Q4_K_M.gguf
```
Create a YAML file to configure the model:
```yaml
# models/phi-2.yaml
name: phi-2
parameters:
  model: phi-2.Q4_K_M.gguf
  temperature: 0.7
context_size: 2048
threads: 4
backend: llama-cpp
```
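API requests then reference the `name` from the YAML file (here `phi-2`), not the GGUF file name. As a quick check, assuming LocalAI is already running against this models directory:

```bash
# The model is addressed by the name set in phi-2.yaml, not by the file name
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "phi-2",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
```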
Customize model defaults and settings with a configuration file. For advanced configurations, refer to the [Advanced Documentation]({{% relref "advanced" %}}).
Choose one of the following methods to run LocalAI:
{{< tabs >}} {{% tab title="Docker" %}}
```bash
mkdir models

cp your-model.gguf models/

docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4
```
```bash
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
  "model": "your-model.gguf",
  "prompt": "A long time ago in a galaxy far, far away",
  "temperature": 0.7
}'
```
{{% notice tip %}} Other Docker Images:
For other Docker images, please refer to the table in [the container images section]({{% relref "getting-started/container-images" %}}). {{% /notice %}}
For a complete example, download the luna-ai-llama2 model, give it a prompt template (the `prompt-templates` directory ships with the LocalAI repository), and start the container:

```bash
mkdir models

# Download the model and copy a matching prompt template
wget https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf -O models/luna-ai-llama2
cp -rf prompt-templates/getting_started.tmpl models/luna-ai-llama2.tmpl

docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4
```
```bash
# Check that the model is listed
curl http://localhost:8080/v1/models

# Chat with the model
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "luna-ai-llama2",
  "messages": [{"role": "user", "content": "How are you?"}],
  "temperature": 0.9
}'
```
{{% /tab %}} {{% tab title="Docker Compose" %}}
```bash
git clone https://github.com/go-skynet/LocalAI
cd LocalAI

cp your-model.gguf models/

docker compose up -d --pull always
```
```bash
curl http://localhost:8080/v1/models

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
  "model": "your-model.gguf",
  "prompt": "A long time ago in a galaxy far, far away",
  "temperature": 0.7
}'
```
{{% notice tip %}} Other Docker Images:
For other Docker images, please refer to the table in Getting Started. {{% /notice %}}
Note: If you are on Windows, ensure the project is on the Linux filesystem to avoid slow model loading. For more information, see the Microsoft Docs.
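For example, inside WSL keep the checkout in your Linux home directory rather than under `/mnt/c`:

```bash
# Clone onto the Linux filesystem (e.g. the WSL home directory), not /mnt/c
cd ~
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
```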
{{% /tab %}} {{% tab title="Kubernetes" %}}
For Kubernetes deployment, see the [Kubernetes installation guide]({{% relref "installation/kubernetes" %}}).
{{% /tab %}} {{% tab title="From Binary" %}}
LocalAI binary releases are available on GitHub.
```bash
# With binary
local-ai --models-path ./models
```
{{% notice tip %}} If installing on macOS, you might encounter a message saying:
"local-ai-git-Darwin-arm64" (or the name you gave the binary) can't be opened because Apple cannot check it for malicious software.
Hit OK, then go to Settings > Privacy & Security > Security and look for the message:
"local-ai-git-Darwin-arm64" was blocked from use because it is not from an identified developer.
Press "Allow Anyway." {{% /notice %}}
{{% /tab %}} {{% tab title="From Source" %}}
For instructions on building LocalAI from source, see the [Build from Source guide]({{% relref "installation/build" %}}).
{{% /tab %}} {{< /tabs >}}
For instructions on GPU acceleration, visit the [GPU Acceleration]({{% relref "features/gpu-acceleration" %}}) page.
For more model configurations, visit the Examples Section.
Models come in different quantization levels (quality vs. size trade-off):
| Quantization | Size | Quality | Use Case |
|---|---|---|---|
| Q8_0 | Largest | Highest | Best quality, requires more RAM |
| Q6_K | Large | Very High | High quality |
| Q4_K_M | Medium | High | Balanced (recommended) |
| Q4_K_S | Small | Medium | Lower RAM usage |
| Q2_K | Smallest | Lower | Minimal RAM, lower quality |
When choosing a quantization level, consider how much RAM you have available and how much quality your use case needs; Q4_K_M is a reasonable default for most setups.
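To use a specific quantization, name the exact file when installing or running a model, for example with the `huggingface://` syntax shown earlier:

```bash
# Select the balanced Q4_K_M variant explicitly by naming the file
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q4_K_M.gguf
```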
Create a YAML file in your models directory:
```yaml
name: my-model
parameters:
  model: model.gguf
  temperature: 0.7
  top_p: 0.9
context_size: 2048
threads: 4
backend: llama-cpp
```
See the [Model Configuration]({{% relref "advanced/model-configuration" %}}) guide for all available options.
To list the models that are currently installed:

```bash
# Via API
curl http://localhost:8080/v1/models

# Via CLI
local-ai models list
```
Simply delete the model file and configuration from your models directory:
```bash
rm models/model-name.gguf
rm models/model-name.yaml  # if exists
```
If a model fails to load or respond, work through the following checks.

Check backend: Ensure the required backend is installed:

```bash
local-ai backends list
local-ai backends install llama-cpp  # if needed
```
Check logs: Enable debug mode:

```bash
DEBUG=true local-ai
```
Verify file: Ensure the model file is not corrupted.
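For instance, you can compute a checksum of the downloaded file and compare it with the value published on the model page (the file name matches the earlier phi-2 example):

```bash
# Compute the SHA256 of the local file and compare it against the checksum
# listed on the model's Hugging Face page
sha256sum models/phi-2.Q4_K_M.gguf
```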
You can also adjust `context_size` in the model configuration if the defaults do not fit your hardware. Finally, check the [Compatibility Table]({{% relref "reference/compatibility-table" %}}) to ensure you're using the correct backend for your model.