mistralrs — Blazing-Fast LLM Inference in Rust

The Rust SDK for mistral.rs, a high-performance LLM inference engine supporting text, multimodal, speech, image generation, and embedding models.

API Docs | GitHub | Examples | Discord

Quick Start

```rust
use mistralrs::{IsqBits, ModelBuilder};

#[tokio::main]
async fn main() -> mistralrs::error::Result<()> {
    // Build the model with automatic 4-bit in-situ quantization (ISQ).
    let model = ModelBuilder::new("Qwen/Qwen3-4B")
        .with_auto_isq(IsqBits::Four)
        .build()
        .await?;

    // Send a single-turn chat prompt and print the reply.
    let response = model.chat("What is Rust's ownership model?").await?;
    println!("{response}");
    Ok(())
}
```

Capabilities

| Capability | Builder | Example |
| --- | --- | --- |
| Any model (auto-detect) | `ModelBuilder` | `examples/getting_started/text_generation/` |
| Text generation | `TextModelBuilder` | `examples/getting_started/text_generation/` |
| Multimodal (image+text) | `MultimodalModelBuilder` | `examples/getting_started/multimodal/` |
| GGUF quantized models | `GgufModelBuilder` | `examples/getting_started/gguf/` |
| Image generation | `DiffusionModelBuilder` | `examples/models/diffusion/` |
| Speech synthesis | `SpeechModelBuilder` | `examples/models/speech/` |
| Embeddings | `EmbeddingModelBuilder` | `examples/getting_started/embedding/` |
| Structured output | `Model::generate_structured` | `examples/advanced/json_schema/` |
| Tool calling | `Tool`, `ToolChoice` | `examples/advanced/tools/` |
| Agents | `AgentBuilder` | `examples/advanced/agent/` |
| LoRA / X-LoRA | `LoraModelBuilder`, `XLoraModelBuilder` | `examples/advanced/lora/` |
| AnyMoE | `AnyMoeModelBuilder` | `examples/advanced/anymoe/` |
| MCP client | `McpClientConfig` | `examples/advanced/mcp_client/` |

Choosing a Request Type

| Type | Use When | Sampling |
| --- | --- | --- |
| `TextMessages` | Simple text-only chat | Deterministic |
| `MultimodalMessages` | Prompt includes images or audio | Deterministic |
| `RequestBuilder` | Tools, logprobs, custom sampling, constraints, adapters, or web search | Configurable |

TextMessages and MultimodalMessages convert into RequestBuilder via Into<RequestBuilder>.
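
The conversion described above follows Rust's standard `From`/`Into` pattern. The sketch below illustrates that pattern with simplified stand-in types; the struct fields, the `with_temperature` method, and the `prepare` function are hypothetical and are not the real mistralrs definitions. It also assumes, for illustration only, that "deterministic" can be modelled as having no sampling temperature set.

```rust
// Hypothetical stand-ins that mirror the names in the table above.
// These are NOT the real mistralrs types; fields and methods are assumed.
#[derive(Debug, Clone, PartialEq)]
struct RequestBuilder {
    messages: Vec<(String, String)>, // (role, content) pairs
    temperature: Option<f64>,        // None models "deterministic" in this sketch
}

impl RequestBuilder {
    // Hypothetical knob standing in for "custom sampling".
    fn with_temperature(mut self, t: f64) -> Self {
        self.temperature = Some(t);
        self
    }
}

#[derive(Debug, Default)]
struct TextMessages {
    messages: Vec<(String, String)>,
}

impl TextMessages {
    fn new() -> Self {
        Self::default()
    }

    fn add_message(mut self, role: &str, content: &str) -> Self {
        self.messages.push((role.to_string(), content.to_string()));
        self
    }
}

// Implementing `From` provides `Into<RequestBuilder>` for free.
impl From<TextMessages> for RequestBuilder {
    fn from(m: TextMessages) -> Self {
        RequestBuilder { messages: m.messages, temperature: None }
    }
}

// A function bounded on `Into<RequestBuilder>` accepts either type.
fn prepare(request: impl Into<RequestBuilder>) -> RequestBuilder {
    request.into()
}

fn main() {
    // Simple case: pass the message list directly.
    let simple = prepare(TextMessages::new().add_message("user", "Hello"));
    println!("{simple:?}");

    // Custom case: convert first, then adjust sampling.
    let tuned = RequestBuilder::from(TextMessages::new().add_message("user", "Hello"))
        .with_temperature(0.7);
    println!("{:?}", prepare(tuned));
}
```

The design point is that a caller can start with the simple message type and switch to the builder only when extra options are needed, without changing the function that receives the request.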

Feature Flags

| Flag | Effect |
| --- | --- |
| `cuda` | CUDA GPU support |
| `flash-attn` | Flash Attention 2 kernels (requires `cuda`) |
| `cudnn` | cuDNN acceleration (requires `cuda`) |
| `nccl` | Multi-GPU via NCCL (requires `cuda`) |
| `metal` | Apple Metal GPU support |
| `accelerate` | Apple Accelerate framework |
| `mkl` | Intel MKL acceleration |

The default feature set (no flags) builds with pure Rust — no C compiler or system libraries required.
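
As a hedged illustration of the flags above, a downstream `Cargo.toml` might enable CUDA together with Flash Attention as sketched below. The version numbers and the `tokio` feature set are assumptions for illustration; use the releases and runtime features you actually target.

```toml
[dependencies]
# Version is an assumption; pin the release you target.
# `flash-attn` requires `cuda`, so both features are listed.
mistralrs = { version = "0.8", features = ["cuda", "flash-attn"] }
# The Quick Start uses #[tokio::main], which needs a Tokio runtime (feature set assumed).
tokio = { version = "1", features = ["full"] }
```

Omitting the `features` list gives the default build described above.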

License

MIT