docs/src/content/docs/reference/environment-variables.md
User-facing environment variables read by mistralrs or its build scripts. Standard Cargo build variables such as OUT_DIR and TARGET are omitted.
| Variable | Purpose |
|---|---|
HF_HOME | Root of the Hugging Face cache. Default ~/.cache/huggingface. |
HF_HUB_CACHE | Hugging Face hub cache location. |
HF_TOKEN | Auth token. Overrides any token saved by mistralrs login at $HF_HOME/token. |
HF_HUB_TOKEN | Auth token fallback when HF_TOKEN is not set. |
HF_HUB_OFFLINE | Set to 1/true/yes/on to disable all Hugging Face Hub network calls. Files and listings are then served only from $HF_HUB_CACHE/$HF_HOME/hub, and a missing file errors out. Also skips the mistralrs doctor connectivity check. |
If --token-source env:NAME is used, mistral.rs reads the environment variable named by NAME as the token source.
For the offline workflow (pre-downloading models, local paths), see run any model.
| Variable | Purpose |
|---|---|
RUST_LOG | Override the tracing log filter. Examples: mistralrs_core=debug,tower_http=info, trace. CLI users can usually use -v or -vv instead. |
MISTRALRS_DEBUG | MISTRALRS_DEBUG=1 enables extra debug-level engine tracing. |
| Variable | Purpose |
|---|---|
MISTRALRS_NO_MMAP | MISTRALRS_NO_MMAP=1 loads safetensors without mmap. |
MISTRALRS_ISQ_SINGLETHREAD | If set, runs ISQ (in-situ quantization) single-threaded. |
| Variable | Purpose |
|---|---|
MISTRALRS_SANDBOX | auto, on, or off. Overrides the sandbox only when the resolved mode is auto; on and off in CLI/TOML win. See sandbox reference. |
| Variable | Purpose |
|---|---|
MCP_CONFIG_PATH | MCP (Model Context Protocol) client configuration path used when --mcp-config is not passed. |
KEEP_ALIVE_INTERVAL | SSE (Server-Sent Events) keep-alive interval in milliseconds. Falls back to the default if missing or invalid. |
XDG_CACHE_HOME | Base cache directory for web UI state. The UI uses $XDG_CACHE_HOME/mistralrs. |
HOME | Fallback for web UI cache path when XDG_CACHE_HOME is not set. |
| Variable | Purpose |
|---|---|
MISTRALRS_CUDA_GRAPHS | CUDA decode graph capture and replay is enabled by default for supported paged-attention decode steps. Set to 0, false, no, or off to disable. See CUDA graphs. |
MISTRALRS_FLASHINFER_DECODE | Set to 0, false, no, or off to disable the FlashInfer (paged-attention kernel library) paged decode/cache layout and use the generic paged KV-cache layout instead. Defaults to enabled on CUDA when compatible. |
MISTRALRS_NO_MLA | MISTRALRS_NO_MLA=1 disables the MLA (Multi-head Latent Attention) path for DeepSeek V2/V3. Generic attention is used instead. |
MISTRALRS_MOE_BACKEND | Forces the MoE (Mixture of Experts) expert backend: cutile, cutlass, fused (also wmma, native, legacy), or fast. Default is automatic selection. See MoE expert backends. |
CUTILE_TILEIRAS_PATH | Path to a specific tileiras binary for the cuTile JIT instead of resolving it from PATH. |
| Variable | Purpose |
|---|---|
MISTRALRS_NO_NCCL | MISTRALRS_NO_NCCL=1 disables NCCL at runtime; single-machine CUDA multi-GPU then falls back to layer mapping. When using the ring backend on a binary also built with nccl, set this so the ring backend is selected. |
MISTRALRS_MN_GLOBAL_WORLD_SIZE | Total NCCL tensor-parallel world size across nodes. Presence of this variable enables multi-node NCCL mode. |
MISTRALRS_MN_LOCAL_WORLD_SIZE | Local NCCL tensor-parallel size contributed by each node. |
MISTRALRS_MN_HEAD_NUM_WORKERS | Set on the head node: number of worker nodes. |
MISTRALRS_MN_HEAD_PORT | Set on the head node: listening port for worker connections. |
MISTRALRS_MN_WORKER_SERVER_ADDR | Set on worker nodes: address of the head node. |
MISTRALRS_MN_WORKER_ID | Set on worker nodes: worker index (0-based). |
RING_CONFIG | Path to the ring backend JSON config. Setting it selects the ring backend when built with the ring feature. If the binary also has nccl, set MISTRALRS_NO_NCCL=1 as well. |
See the distributed inference guide for use.
| Variable | Purpose |
|---|---|
MISTRALRS_IGPU_MEMORY_FRACTION | Fraction of integrated GPU memory usable on CUDA systems with iGPUs. Default 0.75. |
These are read by build scripts, not at runtime.
| Variable | Purpose |
|---|---|
MISTRALRS_METAL_PRECOMPILE | MISTRALRS_METAL_PRECOMPILE=0 skips Metal kernel precompilation at build time; kernels are compiled at runtime on first use. |
CUDA_NVCC_FLAGS | Extra compiler options passed to CUDA builds. |
MISTRALRS_INSTALL_TAG | Pins the installers to a specific release tag (e.g. v0.8.6): the prebuilt is downloaded from that release, and a source build checks out that git tag. Default is the latest stable release (prebuilt) or latest master (source). |
MISTRALRS_INSTALL_FROM_SOURCE | MISTRALRS_INSTALL_FROM_SOURCE=1 makes the shell and PowerShell installers skip the prebuilt download and build from the latest master (bleeding edge) instead of the latest stable release. |
MISTRALRS_INSTALL_NCCL | MISTRALRS_INSTALL_NCCL=1 forces the shell and PowerShell installers to add the nccl feature for CUDA builds even if NCCL is not detected. |
MISTRALRS_INSTALL_NO_NCCL | MISTRALRS_INSTALL_NO_NCCL=1 makes the shell and PowerShell installers skip the nccl feature. |
MISTRALRS_INSTALL_YES | MISTRALRS_INSTALL_YES=1 auto-confirms every installer prompt (non-interactive installs for CI/containers; used by mistralrs update). |
MISTRALRS_INSTALL_IGNORE_FFMPEG | MISTRALRS_INSTALL_IGNORE_FFMPEG=1 skips the installer's FFmpeg step, leaving any existing FFmpeg untouched. |
MISTRALRS_BUILD_NCCL | MISTRALRS_BUILD_NCCL=1 forces scripts/build_wheels.py to add nccl to Linux CUDA wheels. |
MISTRALRS_BUILD_NO_NCCL | MISTRALRS_BUILD_NO_NCCL=1 makes scripts/build_wheels.py skip nccl for CUDA wheels. |
MISTRALRS_GIT_REVISION | Git revision embedded in the binary by the build script. |
Not intended for direct use.
| Variable | Purpose |
|---|---|
__MISTRALRS_DAEMON_INTERNAL | Set by the engine on spawned worker processes. |