content/manuals/ai/compose/models-and-compose.md
{{< summary-bar feature_name="Compose models" >}}
Compose lets you define AI models as core components of your application, so you can declare model dependencies alongside services and run the application on any platform that supports the Compose Specification.
Compose models are a standardized way to define AI model dependencies in your application. By using the `models` top-level element in your Compose file, you can:

- Declare the AI models your application depends on, alongside its services
- Configure model parameters such as context size and runtime flags
- Run the same application on any platform that supports the Compose Specification
## Basic model definition

To define models in your Compose application, use the `models` top-level element:
```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
```
This example defines:

- A service named `chat-app` that uses a model called `llm`
- A model definition for `llm` that references the `ai/smollm2` model image

## Model configuration options

Models support various configuration options:
```yaml
models:
  llm:
    model: ai/smollm2
    context_size: 1024
    runtime_flags:
      - "--a-flag"
      - "--another-flag=42"
```
Common configuration options include:

- `model` (required): The OCI artifact identifier for the model. This is what Compose pulls and runs via the model runner.
- `context_size`: Defines the maximum token context size for the model.

  > [!NOTE]
  > Each model has its own maximum context size. When increasing the context length, consider your hardware constraints. In general, try to keep the context size as small as feasible for your specific needs.

- `runtime_flags`: A list of raw command-line flags passed to the inference engine when the model is started.
See Configuration options for commonly used parameters and examples.
Platform-specific options may also be available via extension attributes (`x-*`).

> [!TIP]
> See more examples in the Common runtime configurations section.
## Service model binding

Services can reference models in two ways: short syntax and long syntax.

### Short syntax

The short syntax is the simplest way to bind a model to a service:
```yaml
services:
  app:
    image: my-app
    models:
      - llm
      - embedding-model

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
With short syntax, the platform automatically generates environment variables based on the model name:

- `LLM_URL` - URL to access the LLM model
- `LLM_MODEL` - Model identifier for the LLM model
- `EMBEDDING_MODEL_URL` - URL to access the `embedding-model`
- `EMBEDDING_MODEL_MODEL` - Model identifier for the `embedding-model`
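As a rough sketch of how a service might consume these variables, the snippet below reads `LLM_URL` and `LLM_MODEL` and sends a chat completion request. It assumes the platform exposes an OpenAI-compatible API at the injected URL (as Docker Model Runner does) and that the `openai` Python package is available in the service image; adapt it to your own client and framework.

```python
import os

from openai import OpenAI

# LLM_URL and LLM_MODEL are injected by Compose for the short-syntax binding above.
# Assumption: the injected URL points at an OpenAI-compatible API root, so the
# standard client can talk to it. A local model runner needs no real API key.
client = OpenAI(base_url=os.environ["LLM_URL"], api_key="not-required")

response = client.chat.completions.create(
    model=os.environ["LLM_MODEL"],
    messages=[{"role": "user", "content": "Say hello from Compose!"}],
)
print(response.choices[0].message.content)
```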
### Long syntax

The long syntax allows you to customize the environment variable names:

```yaml
services:
  app:
    image: my-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME
      embedding-model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_NAME

models:
  llm:
    model: ai/smollm2
  embedding-model:
    model: ai/all-minilm
```
With this configuration, your service receives:
- `AI_MODEL_URL` and `AI_MODEL_NAME` for the LLM model
- `EMBEDDING_URL` and `EMBEDDING_NAME` for the embedding model
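For illustration only, here is a minimal sketch that uses the customized variable names with nothing but the Python standard library. It assumes the injected URL points at the root of an OpenAI-compatible API, so chat completions are served at `<URL>/chat/completions`; the exact path depends on the platform that injects the variable.

```python
import json
import os
import urllib.request

# Variable names follow the long-syntax mapping above (endpoint_var / model_var).
# Assumption: AI_MODEL_URL is an OpenAI-compatible API root, so the chat
# completions endpoint lives at "<AI_MODEL_URL>/chat/completions".
url = os.environ["AI_MODEL_URL"].rstrip("/") + "/chat/completions"
payload = {
    "model": os.environ["AI_MODEL_NAME"],
    "messages": [{"role": "user", "content": "Summarize what Compose models are."}],
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    body = json.load(response)
print(body["choices"][0]["message"]["content"])
```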
## Platform portability

One of the key benefits of using Compose models is portability across different platforms that support the Compose Specification.

### Docker Model Runner

When Docker Model Runner is enabled:

```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      llm:
        endpoint_var: AI_MODEL_URL
        model_var: AI_MODEL_NAME

models:
  llm:
    model: ai/smollm2
    context_size: 4096
    runtime_flags:
      - "--no-prefill-assistant"
```
Docker Model Runner will:

- Pull the specified model and run it locally
- Provide an endpoint URL for accessing the model
- Inject the configured environment variables into the service
### Cloud providers

The same Compose file can run on cloud providers that support Compose models:

```yaml
services:
  chat-app:
    image: my-chat-app
    models:
      - llm

models:
  llm:
    model: ai/smollm2
    # Cloud-specific configurations
    x-cloud-options:
      - "cloud.instance-type=gpu-small"
      - "cloud.region=us-west-2"
```
Cloud providers might:

- Run the model on managed AI infrastructure instead of locally
- Use hints such as `x-cloud-options` (for example, instance type and region) to provision appropriate resources
- Apply provider-specific scaling, monitoring, and logging
## Common runtime configurations

Below are some example configurations for various use cases.

### Development and debugging

Verbose logging for development and debugging:
```yaml
services:
  app:
    image: app
    models:
      dev_model:
        endpoint_var: DEV_URL
        model_var: DEV_MODEL

models:
  dev_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--verbose"          # Set verbosity level to infinity
      - "--verbose-prompt"   # Print a verbose prompt before generation
      - "--log-prefix"       # Enable prefix in log messages
      - "--log-timestamps"   # Enable timestamps in log messages
      - "--log-colors"       # Enable colored logging
```
### Conservative inference

Low temperature and narrow sampling for predictable, focused output:

```yaml
services:
  app:
    image: app
    models:
      conservative_model:
        endpoint_var: CONSERVATIVE_URL
        model_var: CONSERVATIVE_MODEL

models:
  conservative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"               # Temperature
      - "0.1"
      - "--top-k"              # Top-k sampling
      - "1"
      - "--reasoning-budget"   # Disable reasoning
      - "0"
```
### Creative generation

Higher temperature and nucleus sampling for more varied output:

```yaml
services:
  app:
    image: app
    models:
      creative_model:
        endpoint_var: CREATIVE_URL
        model_var: CREATIVE_MODEL

models:
  creative_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"    # Temperature
      - "1"
      - "--top-p"   # Top-p sampling
      - "0.9"
```
### Deterministic output

Zero temperature and a top-k of 1 for reproducible output:

```yaml
services:
  app:
    image: app
    models:
      deterministic_model:
        endpoint_var: DET_URL
        model_var: DET_MODEL

models:
  deterministic_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"    # Temperature
      - "0"
      - "--top-k"   # Top-k sampling
      - "1"
```
### Concurrent processing

More threads and locked memory for higher-throughput generation:

```yaml
services:
  app:
    image: app
    models:
      concurrent_model:
        endpoint_var: CONCURRENT_URL
        model_var: CONCURRENT_MODEL

models:
  concurrent_model:
    model: ai/model
    context_size: 2048
    runtime_flags:
      - "--threads"   # Number of threads to use during generation
      - "8"
      - "--mlock"     # Lock memory to prevent swapping
```
### Rich vocabulary

Low temperature combined with nucleus sampling for controlled but varied vocabulary:

```yaml
services:
  app:
    image: app
    models:
      rich_vocab_model:
        endpoint_var: RICH_VOCAB_URL
        model_var: RICH_VOCAB_MODEL

models:
  rich_vocab_model:
    model: ai/model
    context_size: 4096
    runtime_flags:
      - "--temp"    # Temperature
      - "0.1"
      - "--top-p"   # Top-p sampling
      - "0.9"
```
### Embedding models

When using embedding models with the `/v1/embeddings` endpoint, you must include the `--embeddings` runtime flag for the model to be properly configured:
```yaml
services:
  app:
    image: app
    models:
      embedding_model:
        endpoint_var: EMBEDDING_URL
        model_var: EMBEDDING_MODEL

models:
  embedding_model:
    model: ai/all-minilm
    context_size: 2048
    runtime_flags:
      - "--embeddings"   # Required for embedding models
```
## Alternative configuration with provider services

> [!IMPORTANT]
> This approach is deprecated. Use the `models` top-level element instead.
You can also use the `provider` service type, which allows you to declare platform capabilities required by your application. For AI models, use the `model` type to declare model dependencies.

To define a model provider:
```yaml
services:
  chat:
    image: my-chat-app
    depends_on:
      - ai_runner

  ai_runner:
    provider:
      type: model
      options:
        model: ai/smollm2
        context-size: 1024
        runtime-flags: "--no-prefill-assistant"
```
## Reference

- `models` top-level element
- `models` attribute