Environment variables

docs/src/content/docs/reference/environment-variables.md

User-facing environment variables read by mistralrs or its build scripts. Standard Cargo build variables such as OUT_DIR and TARGET are omitted.

Hugging Face

| Variable | Purpose |
| --- | --- |
| `HF_HOME` | Root of the Hugging Face cache. Default `~/.cache/huggingface`. |
| `HF_HUB_CACHE` | Hugging Face hub cache location. |
| `HF_TOKEN` | Auth token. Overrides any token saved by `mistralrs login` at `$HF_HOME/token`. |
| `HF_HUB_TOKEN` | Auth token fallback when `HF_TOKEN` is not set. |
| `HF_HUB_OFFLINE` | Set to `1`, `true`, `yes`, or `on` to disable all Hugging Face Hub network calls. Files and listings are then served only from `$HF_HUB_CACHE` or `$HF_HOME/hub`, and a missing file is an error. Also skips the `mistralrs doctor` connectivity check. |

If --token-source env:NAME is used, mistral.rs reads the environment variable named by NAME as the token source.

For the offline workflow (pre-downloading models, local paths), see run any model.
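
The lookup rules in this section can be sketched in a few lines of Python. This is an illustration, not the engine's implementation: the function names are hypothetical, the case-insensitive matching of `HF_HUB_OFFLINE` is an assumption, and so is the ordering that puts the saved token file after both environment variables and `HF_HUB_CACHE` ahead of `$HF_HOME/hub`.

```python
from pathlib import Path
from typing import Mapping, Optional

# Values documented as enabling offline mode.
TRUTHY = {"1", "true", "yes", "on"}


def is_offline(env: Mapping[str, str]) -> bool:
    """True when HF_HUB_OFFLINE holds one of the documented truthy values."""
    # Trimming and lower-casing are assumptions; the table lists lowercase values.
    return env.get("HF_HUB_OFFLINE", "").strip().lower() in TRUTHY


def hf_home(env: Mapping[str, str]) -> Path:
    """HF_HOME, or the documented default ~/.cache/huggingface."""
    value = env.get("HF_HOME")
    return Path(value) if value else Path.home() / ".cache" / "huggingface"


def hub_cache(env: Mapping[str, str]) -> Path:
    """HF_HUB_CACHE if set, otherwise $HF_HOME/hub (assumed order)."""
    value = env.get("HF_HUB_CACHE")
    return Path(value) if value else hf_home(env) / "hub"


def resolve_token(env: Mapping[str, str]) -> Optional[str]:
    """HF_TOKEN, then HF_HUB_TOKEN, then the saved token file (assumed last)."""
    for name in ("HF_TOKEN", "HF_HUB_TOKEN"):
        if env.get(name):
            return env[name]
    token_file = hf_home(env) / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None
```

Passing the environment as a mapping (for example `resolve_token(os.environ)`) keeps the functions easy to exercise with a plain dictionary.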

Logging

| Variable | Purpose |
| --- | --- |
| `RUST_LOG` | Override the tracing log filter. Examples: `mistralrs_core=debug,tower_http=info`, `trace`. CLI users can usually use `-v` or `-vv` instead. |
| `MISTRALRS_DEBUG` | `MISTRALRS_DEBUG=1` enables extra debug-level engine tracing. |

Quantization and loading

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_NO_MMAP` | `MISTRALRS_NO_MMAP=1` loads safetensors without mmap. |
| `MISTRALRS_ISQ_SINGLETHREAD` | If set, runs ISQ (in-situ quantization) single-threaded. |

Sandbox

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_SANDBOX` | `auto`, `on`, or `off`. Overrides the sandbox only when the mode resolved from CLI/TOML is `auto`; an explicit `on` or `off` in CLI/TOML wins. See sandbox reference. |
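
The precedence rule for `MISTRALRS_SANDBOX` can be illustrated with a small sketch. The function name is hypothetical, and ignoring an unrecognized value in the variable is an assumption; the source only states that an explicit `on` or `off` from CLI/TOML wins.

```python
from typing import Mapping

VALID_MODES = ("auto", "on", "off")


def effective_sandbox_mode(configured: str, env: Mapping[str, str]) -> str:
    """Combine the CLI/TOML mode with MISTRALRS_SANDBOX."""
    if configured not in VALID_MODES:
        raise ValueError(f"unknown sandbox mode: {configured!r}")
    # An explicit on/off from CLI or TOML is final.
    if configured != "auto":
        return configured
    # Only an 'auto' configuration consults the environment.
    override = env.get("MISTRALRS_SANDBOX")
    # Assumption: anything other than auto/on/off is ignored.
    return override if override in VALID_MODES else "auto"
```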

Server and UI

| Variable | Purpose |
| --- | --- |
| `MCP_CONFIG_PATH` | MCP (Model Context Protocol) client configuration path used when `--mcp-config` is not passed. |
| `KEEP_ALIVE_INTERVAL` | SSE (Server-Sent Events) keep-alive interval in milliseconds. Falls back to the default if missing or invalid. |
| `XDG_CACHE_HOME` | Base cache directory for web UI state. The UI uses `$XDG_CACHE_HOME/mistralrs`. |
| `HOME` | Fallback for web UI cache path when `XDG_CACHE_HOME` is not set. |
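
A minimal sketch of the two fallback rules in this table follows. Several details are assumptions rather than documented behavior: the default interval is not stated in this page, so it is a caller-supplied parameter; non-positive values are treated as invalid; and the `HOME` fallback is assumed to follow the usual XDG convention of `$HOME/.cache/mistralrs`. The function names are hypothetical.

```python
from pathlib import Path
from typing import Mapping, Optional


def keep_alive_ms(env: Mapping[str, str], default_ms: int) -> int:
    """KEEP_ALIVE_INTERVAL in milliseconds, or default_ms if missing or invalid."""
    raw = env.get("KEEP_ALIVE_INTERVAL")
    try:
        value = int(raw)  # int(None) raises TypeError, bad text raises ValueError
    except (TypeError, ValueError):
        return default_ms
    # Assumption: zero and negative intervals count as invalid.
    return value if value > 0 else default_ms


def ui_cache_dir(env: Mapping[str, str]) -> Optional[Path]:
    """Web UI cache directory derived from XDG_CACHE_HOME, then HOME."""
    xdg = env.get("XDG_CACHE_HOME")
    if xdg:
        return Path(xdg) / "mistralrs"
    home = env.get("HOME")
    if home:
        # Assumption: the fallback mirrors the XDG default of ~/.cache.
        return Path(home) / ".cache" / "mistralrs"
    return None
```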

CUDA and attention kernels

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_CUDA_GRAPHS` | CUDA decode graph capture and replay is enabled by default for supported paged-attention decode steps. Set to `0`, `false`, `no`, or `off` to disable. See CUDA graphs. |
| `MISTRALRS_FLASHINFER_DECODE` | Set to `0`, `false`, `no`, or `off` to disable the FlashInfer (paged-attention kernel library) paged decode/cache layout and use the generic paged KV-cache layout instead. Defaults to enabled on CUDA when compatible. |
| `MISTRALRS_NO_MLA` | `MISTRALRS_NO_MLA=1` disables the MLA (Multi-head Latent Attention) path for DeepSeek V2/V3. Generic attention is used instead. |
| `MISTRALRS_MOE_BACKEND` | Forces the MoE (Mixture of Experts) expert backend: `cutile`, `cutlass`, `fused` (also `wmma`, `native`, `legacy`), or `fast`. Default is automatic selection. See MoE expert backends. |
| `CUTILE_TILEIRAS_PATH` | Path to a specific `tileiras` binary for the cuTile JIT instead of resolving it from `PATH`. |
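
The opt-out flags above share one pattern: the feature stays on unless the variable holds a documented "off" value. The sketch below shows that pattern plus a pre-flight check for `MISTRALRS_MOE_BACKEND`. It is hypothetical helper code, not the engine's parser. Treating `wmma`, `native`, and `legacy` as aliases of `fused` is one reading of the table, and rejecting unknown names with an error is an assumption.

```python
from typing import Mapping, Optional

# Values documented as disabling a default-enabled feature.
FALSY = {"0", "false", "no", "off"}

# Documented backend names; the alias mapping is an interpretation of the table.
MOE_BACKENDS = {
    "cutile": "cutile",
    "cutlass": "cutlass",
    "fused": "fused",
    "wmma": "fused",
    "native": "fused",
    "legacy": "fused",
    "fast": "fast",
}


def enabled_by_default(env: Mapping[str, str], name: str) -> bool:
    """False only when the variable is set to one of the documented off values."""
    # Trimming and lower-casing are assumptions.
    return env.get(name, "").strip().lower() not in FALSY


def moe_backend(env: Mapping[str, str]) -> Optional[str]:
    """Canonical backend name, or None for automatic selection."""
    raw = env.get("MISTRALRS_MOE_BACKEND")
    if raw is None:
        return None
    try:
        return MOE_BACKENDS[raw]
    except KeyError:
        # Assumption: unknown names are reported rather than silently ignored.
        raise ValueError(f"unknown MoE backend: {raw!r}") from None
```

Note that hardware compatibility still applies: an enabled flag here only means the variable did not disable the feature.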

Multi-GPU and multi-node

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_NO_NCCL` | `MISTRALRS_NO_NCCL=1` disables NCCL at runtime; single-machine CUDA multi-GPU then falls back to layer mapping. When using the ring backend on a binary also built with `nccl`, set this so the ring backend is selected. |
| `MISTRALRS_MN_GLOBAL_WORLD_SIZE` | Total NCCL tensor-parallel world size across nodes. Presence of this variable enables multi-node NCCL mode. |
| `MISTRALRS_MN_LOCAL_WORLD_SIZE` | Local NCCL tensor-parallel size contributed by each node. |
| `MISTRALRS_MN_HEAD_NUM_WORKERS` | Set on the head node: number of worker nodes. |
| `MISTRALRS_MN_HEAD_PORT` | Set on the head node: listening port for worker connections. |
| `MISTRALRS_MN_WORKER_SERVER_ADDR` | Set on worker nodes: address of the head node. |
| `MISTRALRS_MN_WORKER_ID` | Set on worker nodes: worker index (0-based). |
| `RING_CONFIG` | Path to the ring backend JSON config. Setting it selects the ring backend when built with the `ring` feature. If the binary also has `nccl`, set `MISTRALRS_NO_NCCL=1` as well. |

See the distributed inference guide for usage.
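
To show which variables belong on which node, here is a hedged sketch that builds the environment for a head node, a worker node, and a ring-backend launch. The helper names and example values are hypothetical. Setting both world-size variables on every node is an assumption, and the head address is passed through as an opaque string because its exact format is not specified here.

```python
from typing import Dict


def _world_env(global_world_size: int, local_world_size: int) -> Dict[str, str]:
    # Assumption: both world-size variables are set on every node.
    return {
        "MISTRALRS_MN_GLOBAL_WORLD_SIZE": str(global_world_size),
        "MISTRALRS_MN_LOCAL_WORLD_SIZE": str(local_world_size),
    }


def head_env(global_world_size: int, local_world_size: int,
             num_workers: int, port: int) -> Dict[str, str]:
    """Variables for the head node."""
    env = _world_env(global_world_size, local_world_size)
    env["MISTRALRS_MN_HEAD_NUM_WORKERS"] = str(num_workers)
    env["MISTRALRS_MN_HEAD_PORT"] = str(port)
    return env


def worker_env(global_world_size: int, local_world_size: int,
               head_addr: str, worker_id: int) -> Dict[str, str]:
    """Variables for one worker node; worker_id is 0-based."""
    if worker_id < 0:
        raise ValueError("worker_id is 0-based and cannot be negative")
    env = _world_env(global_world_size, local_world_size)
    env["MISTRALRS_MN_WORKER_SERVER_ADDR"] = head_addr
    env["MISTRALRS_MN_WORKER_ID"] = str(worker_id)
    return env


def ring_env(config_path: str, binary_has_nccl: bool) -> Dict[str, str]:
    """Variables that select the ring backend."""
    env = {"RING_CONFIG": config_path}
    if binary_has_nccl:
        # Documented: disable NCCL so the ring backend is selected.
        env["MISTRALRS_NO_NCCL"] = "1"
    return env
```

The resulting dictionary can be merged into a child process environment, for example `subprocess.run(cmd, env={**os.environ, **head_env(4, 2, 1, 29500)})`, where `cmd` is whatever launch command you already use and the numbers are placeholders.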

GPU memory

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_IGPU_MEMORY_FRACTION` | Fraction of integrated GPU memory usable on CUDA systems with iGPUs. Default `0.75`. |
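
As a worked example of what the fraction means: with the default of 0.75, an iGPU sharing 16 GiB of memory would leave 12 GiB usable. The sketch below does that arithmetic. The function names are hypothetical, and falling back to the default for unparsable or out-of-range values is an assumption, not documented behavior.

```python
from typing import Mapping

DEFAULT_FRACTION = 0.75  # documented default


def igpu_fraction(env: Mapping[str, str]) -> float:
    """MISTRALRS_IGPU_MEMORY_FRACTION as a float, or the default."""
    raw = env.get("MISTRALRS_IGPU_MEMORY_FRACTION")
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return DEFAULT_FRACTION
    # Assumption: only values in (0, 1] are meaningful; NaN fails this check too.
    return value if 0.0 < value <= 1.0 else DEFAULT_FRACTION


def usable_bytes(total_bytes: int, env: Mapping[str, str]) -> int:
    """Memory budget implied by the fraction."""
    return int(total_bytes * igpu_fraction(env))
```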

Build-time

These are read by build and install scripts, not by the engine at runtime.

| Variable | Purpose |
| --- | --- |
| `MISTRALRS_METAL_PRECOMPILE` | `MISTRALRS_METAL_PRECOMPILE=0` skips Metal kernel precompilation at build time; kernels are compiled at runtime on first use. |
| `CUDA_NVCC_FLAGS` | Extra compiler options passed to CUDA builds. |
| `MISTRALRS_INSTALL_TAG` | Pins the installers to a specific release tag (e.g. `v0.8.6`): the prebuilt binary is downloaded from that release, and a source build checks out that git tag. Default is the latest stable release (prebuilt) or latest master (source). |
| `MISTRALRS_INSTALL_FROM_SOURCE` | `MISTRALRS_INSTALL_FROM_SOURCE=1` makes the shell and PowerShell installers skip the prebuilt download and build from the latest master (bleeding edge) instead of the latest stable release. |
| `MISTRALRS_INSTALL_NCCL` | `MISTRALRS_INSTALL_NCCL=1` forces the shell and PowerShell installers to add the `nccl` feature for CUDA builds even if NCCL is not detected. |
| `MISTRALRS_INSTALL_NO_NCCL` | `MISTRALRS_INSTALL_NO_NCCL=1` makes the shell and PowerShell installers skip the `nccl` feature. |
| `MISTRALRS_INSTALL_YES` | `MISTRALRS_INSTALL_YES=1` auto-confirms every installer prompt (non-interactive installs for CI/containers; used by `mistralrs update`). |
| `MISTRALRS_INSTALL_IGNORE_FFMPEG` | `MISTRALRS_INSTALL_IGNORE_FFMPEG=1` skips the installer's FFmpeg step, leaving any existing FFmpeg untouched. |
| `MISTRALRS_BUILD_NCCL` | `MISTRALRS_BUILD_NCCL=1` forces `scripts/build_wheels.py` to add `nccl` to Linux CUDA wheels. |
| `MISTRALRS_BUILD_NO_NCCL` | `MISTRALRS_BUILD_NO_NCCL=1` makes `scripts/build_wheels.py` skip `nccl` for CUDA wheels. |
| `MISTRALRS_GIT_REVISION` | Git revision embedded in the binary by the build script. |
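
For CI or container builds, the installer variables are typically set together. The following sketch assembles them into a dictionary that can be merged into the environment of whatever installer command you run; the helper and its parameters are hypothetical. Modelling NCCL as a three-way choice (force, skip, or leave to detection) is a design choice of this sketch so that the two opposing variables are never set at once.

```python
from typing import Dict, Optional


def installer_env(tag: Optional[str] = None,
                  from_source: bool = False,
                  nccl: Optional[bool] = None,
                  assume_yes: bool = True,
                  ignore_ffmpeg: bool = False) -> Dict[str, str]:
    """Environment overrides for a non-interactive install."""
    env: Dict[str, str] = {}
    if tag:
        env["MISTRALRS_INSTALL_TAG"] = tag  # e.g. a release tag such as v0.8.6
    if from_source:
        env["MISTRALRS_INSTALL_FROM_SOURCE"] = "1"
    if nccl is True:
        env["MISTRALRS_INSTALL_NCCL"] = "1"
    elif nccl is False:
        env["MISTRALRS_INSTALL_NO_NCCL"] = "1"
    # nccl=None leaves the choice to the installer's own detection.
    if assume_yes:
        env["MISTRALRS_INSTALL_YES"] = "1"
    if ignore_ffmpeg:
        env["MISTRALRS_INSTALL_IGNORE_FFMPEG"] = "1"
    return env
```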

Internal

Not intended for direct use.

| Variable | Purpose |
| --- | --- |
| `__MISTRALRS_DAEMON_INTERNAL` | Set by the engine on spawned worker processes. |