Cargo features

docs/src/content/docs/reference/cargo-features.md

mistral.rs uses Cargo features to gate platform-specific and optional functionality.

Accelerator features

| Feature | Crates | Purpose |
| --- | --- | --- |
| cuda | mistralrs-cli, mistralrs, mistralrs-core, mistralrs-server-core | NVIDIA GPU support via CUDA, including CUDA paged attention and FlashInfer (paged-attention kernel library) paged kernels. |
| cudnn | as above | cuDNN-accelerated kernels. |
| flash-attn | as above | Flash attention v2 (Ampere+, requires cuda). |
| flash-attn-v3 | mistralrs-cli, mistralrs-core, mistralrs-server-core | Flash attention v3 (Hopper, requires cuda). Not exposed by the top-level mistralrs crate. |
| cutile | mistralrs-cli, mistralrs-core | cuTile JIT MoE (Mixture of Experts) kernels (fastest MoE backend on Ampere and Blackwell). Requires CUDA >= 13.1 to build and the tileiras assembler at runtime. Without it, MoE models fall back to the built-in CUTLASS (NVIDIA GEMM template library) kernels. See MoE expert backends. Not exposed by the top-level mistralrs crate. |
| metal | as above | Apple Silicon GPU support via Metal. |
| accelerate | as above | Apple Accelerate framework for CPU math. |
| mkl | as above | Intel MKL for CPU math. |
| nccl | mistralrs-cli, mistralrs, mistralrs-core, mistralrs-server-core | NCCL single-machine CUDA multi-GPU support. Requires the NCCL runtime library at build and runtime. |

Typical combinations:

  • NVIDIA Hopper: cuda flash-attn flash-attn-v3 cudnn
  • NVIDIA Ampere or Ada: cuda flash-attn cudnn (add cutile for the fastest MoE backend; needs CUDA >= 13.1)
  • NVIDIA Blackwell with CUDA >= 13.1: cuda flash-attn cudnn cutile
  • NVIDIA older: cuda cudnn
  • Apple Silicon: metal
  • Intel CPU with MKL: mkl

For Linux CUDA multi-GPU, add nccl when NCCL is installed. The Linux installer and CUDA wheel builder add it automatically when they detect libnccl.
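The combinations above can be captured in a small shell helper so that install scripts do not repeat the feature strings. This is a minimal sketch: the function name features_for and the hardware labels (hopper, ampere, ada, blackwell, nvidia-older, apple-silicon, intel-mkl) are hypothetical and not part of mistral.rs. Only the feature names come from the list above.

```bash
# Hypothetical helper (not part of mistral.rs): print the feature list
# suggested above for a hardware label, optionally appending nccl.
features_for() {
  case "$1" in
    hopper)        feats="cuda flash-attn flash-attn-v3 cudnn" ;;
    ampere|ada)    feats="cuda flash-attn cudnn" ;;
    blackwell)     feats="cuda flash-attn cudnn cutile" ;;  # assumes CUDA >= 13.1
    nvidia-older)  feats="cuda cudnn" ;;
    apple-silicon) feats="metal" ;;
    intel-mkl)     feats="mkl" ;;
    *) echo "unknown hardware label: $1" >&2; return 1 ;;
  esac

  # A second argument of "nccl" adds multi-GPU support; in this sketch it
  # is only accepted for CUDA feature sets.
  if [ "${2:-}" = "nccl" ]; then
    case "$feats" in
      cuda*) feats="$feats nccl" ;;
      *) echo "nccl only applies to CUDA builds" >&2; return 1 ;;
    esac
  fi

  echo "$feats"
}

# Example (not run here):
#   cargo install mistralrs-cli --features "$(features_for hopper nccl)"
```

Cargo does not care about the order of feature names, so appending nccl at the end is equivalent to listing it anywhere else in the string.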

Functional features

| Feature | Crates | Purpose |
| --- | --- | --- |
| code-execution | mistralrs-cli, mistralrs, mistralrs-core, mistralrs-server-core | Python code execution tool. In mistralrs-cli defaults. |
| ring | as above | Multi-machine ring distributed inference. |
| swagger-ui | mistralrs-server-core | Mounts Swagger UI on the HTTP server. On by default in mistralrs-server-core. |

Enabling features

From cargo install:

```bash
cargo install mistralrs-cli --features "cuda nccl flash-attn cudnn"
```

From a source checkout:

```bash
cargo install --path mistralrs-cli --features "cuda nccl flash-attn cudnn"
```

In a consumer crate depending on mistralrs:

```toml
[dependencies]
mistralrs = { version = "0.8", features = ["cuda", "nccl", "flash-attn", "cudnn"] }
```

Default features

mistralrs-cli's default feature is code-execution. mistralrs-server-core's default feature is swagger-ui. To exclude defaults, use --no-default-features.

No crate enables an accelerator feature by default. Opt in to the accelerator matching your hardware.
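As a sketch of how the two rules above combine, the wrapper below installs mistralrs-cli with its defaults turned off and only the listed features turned on. The function name install_without_defaults is hypothetical, and the feature strings in the comments are examples rather than recommendations. --no-default-features and --features are standard cargo install flags.

```bash
# Hypothetical wrapper: install mistralrs-cli without its default features,
# enabling only the features passed as the first argument.
install_without_defaults() {
  cargo install mistralrs-cli --no-default-features --features "$1"
}

# Example (not run here): CUDA build that drops code-execution.
#   install_without_defaults "cuda cudnn"
#
# Example (not run here): same build, explicitly keeping code-execution.
#   install_without_defaults "cuda cudnn code-execution"
```

In a Cargo.toml dependency entry, the equivalent of --no-default-features is the default-features = false key.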

Feature verification

mistralrs doctor prints a Build features: line listing the compiled-in accelerator features (cuda, metal, cudnn, flash-attn, flash-attn-v3, accelerate, mkl). Other features such as cutile, nccl, ring, code-execution, and swagger-ui are not shown on that line.
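A script can check that line before relying on an accelerator. The helper below is a rough sketch, not part of mistral.rs: the name has_build_feature is hypothetical, and the exact layout of the doctor output is an assumption. It assumes the feature names follow the Build features: label on the same line, separated by commas, spaces, or other punctuation.

```bash
# Hypothetical helper: read text on stdin and succeed if feature "$1"
# appears on the first "Build features:" line.
# Assumption: feature names follow the label on the same line.
has_build_feature() {
  want="$1"
  # Keep the text after the label on the first matching line, then replace
  # every character that cannot be part of a feature name with a space.
  tokens="$(sed -n 's/^.*Build features:[[:space:]]*//p' | head -n 1 | tr -c 'a-z0-9-' ' ')"
  # Compare whole tokens so that flash-attn does not match flash-attn-v3.
  for tok in $tokens; do
    [ "$tok" = "$want" ] && return 0
  done
  return 1
}

# Example (not run here):
#   mistralrs doctor | has_build_feature cuda && echo "cuda is compiled in"
```

Because cutile, nccl, ring, code-execution, and swagger-ui are not shown on that line, a failed check for one of those names says nothing about whether the feature was compiled in.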