docs/src/content/docs/guides/rust/embed-in-axum.md
To add mistral.rs to an existing Axum app, mount the mistralrs router under a sub-path. The pattern uses two builders from mistralrs-server-core:
MistralRsForServerBuilder constructs the engine state (SharedMistralRsState = Arc<MistralRs>, used later for custom handlers).MistralRsServerRouterBuilder produces an Axum Router from that state.[dependencies]
anyhow = "1"
mistralrs-core = "0.8"
mistralrs-server-core = "0.8"
axum = "0.8"
tokio = { version = "1", features = ["full"] }
The high-level mistralrs crate is not needed here; the server builders take a ModelSelected from mistralrs-core directly.
use axum::{Router, routing::get};
use mistralrs_core::{AutoDeviceMapParams, ModelDType, ModelSelected};
use mistralrs_server_core::{
mistralrs_for_server_builder::MistralRsForServerBuilder,
mistralrs_server_router_builder::MistralRsServerRouterBuilder,
};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let model = ModelSelected::Plain {
model_id: "Qwen/Qwen3-4B".into(),
tokenizer_json: None,
arch: None,
dtype: ModelDType::Auto,
topology: None,
organization: None,
write_uqff: None,
from_uqff: None,
imatrix: None,
calibration_file: None,
max_seq_len: AutoDeviceMapParams::DEFAULT_MAX_SEQ_LEN,
max_batch_size: AutoDeviceMapParams::DEFAULT_MAX_BATCH_SIZE,
hf_cache_path: None,
matformer_config_path: None,
matformer_slice_name: None,
};
let shared_mistralrs = MistralRsForServerBuilder::new()
.with_model(model)
.with_in_situ_quant("4".to_string())
.build()
.await?;
let mistralrs_router = MistralRsServerRouterBuilder::new()
.with_mistralrs(shared_mistralrs)
.build()
.await?;
let app = Router::new()
.route("/", get(|| async { "My app" }))
.nest("/ai", mistralrs_router);
let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
axum::serve(listener, app).await?;
Ok(())
}
POST /ai/v1/chat/completions then behaves identically to the standalone server, as do the other routes.
with_in_situ_quant("4") applies ISQ (in-situ quantization) to 4-bit; omit it to run the model unquantized.
ModelSelected names every field, so this literal will not compile when new fields are added. For the current field list, see the docs.rs ModelSelected entry or the mistralrs-server-core crate-level docs.
MistralRsServerRouterBuilder exposes:
with_include_swagger_routes(bool)with_base_path(&str)with_allowed_origins(Vec<String>)with_max_body_limit(usize)with_max_tool_rounds(usize)with_tool_dispatch_url(String)with_agent_permission(AgentPermission) and with_code_execution_permission(CodeExecutionPermission)MistralRsForServerBuilder exposes engine-level options (with_model, with_in_situ_quant, set_paged_attn, with_seed, multi-model via add_model, etc.).
For custom request shapes, share the SharedMistralRsState directly with Axum handlers and use the lower-level helpers exposed by mistralrs-server-core (chat_completion::parse_request, handler_core::send_request, ...).
A complete example with custom OpenAPI integration is in the mistralrs-server-core crate-level documentation.