Environment variables - Mistral Rs

User-facing environment variables read by mistralrs or its build scripts. Standard Cargo build variables such as OUT_DIR and TARGET are omitted.

Hugging Face

Variable	Purpose
`HF_HOME`	Root of the Hugging Face cache. Default `~/.cache/huggingface`.
`HF_HUB_CACHE`	Hugging Face hub cache location.
`HF_TOKEN`	Auth token. Overrides any token saved by `mistralrs login` at `$HF_HOME/token`.
`HF_HUB_TOKEN`	Auth token fallback when `HF_TOKEN` is not set.
`HF_HUB_OFFLINE`	Set to `1`/`true`/`yes`/`on` to disable all Hugging Face Hub network calls. Files and listings are then served only from `$HF_HUB_CACHE`/`$HF_HOME/hub`, and a missing file errors out. Also skips the `mistralrs doctor` connectivity check.

If --token-source env:NAME is used, mistral.rs reads the environment variable named by NAME as the token source.

For the offline workflow (pre-downloading models, local paths), see run any model.

Logging

Variable	Purpose
`RUST_LOG`	Override the `tracing` log filter. Examples: `mistralrs_core=debug,tower_http=info`, `trace`. CLI users can usually use `-v` or `-vv` instead.
`MISTRALRS_DEBUG`	`MISTRALRS_DEBUG=1` enables extra debug-level engine tracing.

Quantization and loading

Variable	Purpose
`MISTRALRS_NO_MMAP`	`MISTRALRS_NO_MMAP=1` loads safetensors without mmap.
`MISTRALRS_ISQ_SINGLETHREAD`	If set, runs ISQ (in-situ quantization) single-threaded.

Sandbox

Variable	Purpose
`MISTRALRS_SANDBOX`	`auto`, `on`, or `off`. Overrides the sandbox only when the resolved mode is `auto`; `on` and `off` in CLI/TOML win. See sandbox reference.

Server and UI

Variable	Purpose
`MCP_CONFIG_PATH`	MCP (Model Context Protocol) client configuration path used when `--mcp-config` is not passed.
`KEEP_ALIVE_INTERVAL`	SSE (Server-Sent Events) keep-alive interval in milliseconds. Falls back to the default if missing or invalid.
`XDG_CACHE_HOME`	Base cache directory for web UI state. The UI uses `$XDG_CACHE_HOME/mistralrs`.
`HOME`	Fallback for web UI cache path when `XDG_CACHE_HOME` is not set.

CUDA and attention kernels

Variable	Purpose
`MISTRALRS_CUDA_GRAPHS`	CUDA decode graph capture and replay is enabled by default for supported paged-attention decode steps. Set to `0`, `false`, `no`, or `off` to disable. See CUDA graphs.
`MISTRALRS_FLASHINFER_DECODE`	Set to `0`, `false`, `no`, or `off` to disable the FlashInfer (paged-attention kernel library) paged decode/cache layout and use the generic paged KV-cache layout instead. Defaults to enabled on CUDA when compatible.
`MISTRALRS_NO_MLA`	`MISTRALRS_NO_MLA=1` disables the MLA (Multi-head Latent Attention) path for DeepSeek V2/V3. Generic attention is used instead.
`MISTRALRS_MOE_BACKEND`	Forces the MoE (Mixture of Experts) expert backend: `cutile`, `cutlass`, `fused` (also `wmma`, `native`, `legacy`), or `fast`. Default is automatic selection. See MoE expert backends.
`CUTILE_TILEIRAS_PATH`	Path to a specific `tileiras` binary for the cuTile JIT instead of resolving it from `PATH`.

Multi-GPU and multi-node

Variable	Purpose
`MISTRALRS_NO_NCCL`	`MISTRALRS_NO_NCCL=1` disables NCCL at runtime; single-machine CUDA multi-GPU then falls back to layer mapping. When using the ring backend on a binary also built with `nccl`, set this so the ring backend is selected.
`MISTRALRS_MN_GLOBAL_WORLD_SIZE`	Total NCCL tensor-parallel world size across nodes. Presence of this variable enables multi-node NCCL mode.
`MISTRALRS_MN_LOCAL_WORLD_SIZE`	Local NCCL tensor-parallel size contributed by each node.
`MISTRALRS_MN_HEAD_NUM_WORKERS`	Set on the head node: number of worker nodes.
`MISTRALRS_MN_HEAD_PORT`	Set on the head node: listening port for worker connections.
`MISTRALRS_MN_WORKER_SERVER_ADDR`	Set on worker nodes: address of the head node.
`MISTRALRS_MN_WORKER_ID`	Set on worker nodes: worker index (0-based).
`RING_CONFIG`	Path to the ring backend JSON config. Setting it selects the ring backend when built with the `ring` feature. If the binary also has `nccl`, set `MISTRALRS_NO_NCCL=1` as well.

See the distributed inference guide for use.

GPU memory

Variable	Purpose
`MISTRALRS_IGPU_MEMORY_FRACTION`	Fraction of integrated GPU memory usable on CUDA systems with iGPUs. Default 0.75.

Build-time

These are read by build scripts, not at runtime.

Variable	Purpose
`MISTRALRS_METAL_PRECOMPILE`	`MISTRALRS_METAL_PRECOMPILE=0` skips Metal kernel precompilation at build time; kernels are compiled at runtime on first use.
`CUDA_NVCC_FLAGS`	Extra compiler options passed to CUDA builds.
`MISTRALRS_INSTALL_TAG`	Pins the installers to a specific release tag (e.g. `v0.8.6`): the prebuilt is downloaded from that release, and a source build checks out that git tag. Default is the latest stable release (prebuilt) or latest `master` (source).
`MISTRALRS_INSTALL_FROM_SOURCE`	`MISTRALRS_INSTALL_FROM_SOURCE=1` makes the shell and PowerShell installers skip the prebuilt download and build from the latest `master` (bleeding edge) instead of the latest stable release.
`MISTRALRS_INSTALL_NCCL`	`MISTRALRS_INSTALL_NCCL=1` forces the shell and PowerShell installers to add the `nccl` feature for CUDA builds even if NCCL is not detected.
`MISTRALRS_INSTALL_NO_NCCL`	`MISTRALRS_INSTALL_NO_NCCL=1` makes the shell and PowerShell installers skip the `nccl` feature.
`MISTRALRS_INSTALL_YES`	`MISTRALRS_INSTALL_YES=1` auto-confirms every installer prompt (non-interactive installs for CI/containers; used by `mistralrs update`).
`MISTRALRS_INSTALL_IGNORE_FFMPEG`	`MISTRALRS_INSTALL_IGNORE_FFMPEG=1` skips the installer's FFmpeg step, leaving any existing FFmpeg untouched.
`MISTRALRS_BUILD_NCCL`	`MISTRALRS_BUILD_NCCL=1` forces `scripts/build_wheels.py` to add `nccl` to Linux CUDA wheels.
`MISTRALRS_BUILD_NO_NCCL`	`MISTRALRS_BUILD_NO_NCCL=1` makes `scripts/build_wheels.py` skip `nccl` for CUDA wheels.
`MISTRALRS_GIT_REVISION`	Git revision embedded in the binary by the build script.

Internal

Not intended for direct use.

Variable	Purpose
`__MISTRALRS_DAEMON_INTERNAL`	Set by the engine on spawned worker processes.