import { Tabs, TabItem } from '@astrojs/starlight/components';
LoRA (Low-Rank Adaptation) adapters add task-specific fine-tuning on top of a base model without modifying the base weights. X-LoRA loads several adapters at once and lets the model select among them per request.
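To make "without modifying the base weights" concrete, here is a minimal sketch of the standard LoRA formulation in plain Python: the adapter contributes a low-rank product `B @ A`, scaled by `alpha / r`, on top of the frozen weight `W`. The function names and toy matrices are illustrative assumptions; this is not how mistral.rs implements adapters internally.

```python
def matvec(m, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha):
    """Return W x + (alpha / r) * B (A x); W itself is never modified."""
    r = len(a)  # rank = number of rows in A
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y + scale * d for y, d in zip(base, delta)]


# Frozen 2x2 base weight and a rank-1 adapter (A is 1x2, B is 2x1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.5], [-0.5]]

x = [2.0, 4.0]
y = lora_forward(W, A, B, x, alpha=1.0)
print(y)  # [5.0, 1.0]
```

Because the adapter only stores `A` and `B`, swapping adapters leaves the base model untouched.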
mistral.rs reads the targeted modules and rank from `adapter_config.json` in the LoRA repo. On the CLI, pass multiple adapters as a semicolon-separated list.
```bash
mistralrs run -m <base-model> --lora <lora-repo>
```

Multiple adapters:

```bash
mistralrs run -m <base-model> --lora "<lora-repo-1>;<lora-repo-2>"
```
```python
from mistralrs import Runner, Which

runner = Runner(
    which=Which.Lora(
        model_id="<base-model>",
        adapter_model_ids=["<lora-repo-1>", "<lora-repo-2>"],
    )
)
```
```rust
use mistralrs::{LoraModelBuilder, TextModelBuilder};

let model = LoraModelBuilder::from_text_model_builder(
    TextModelBuilder::new("<base-model>"),
    vec!["<lora-repo-1>", "<lora-repo-2>"],
)
.build()
.await?;
```
Full examples: lora-zephyr (Python), lora (Rust).
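As a rough illustration of the two inputs described in this section, the sketch below splits a semicolon-separated adapter list and pulls the rank and targeted modules out of an adapter config. It assumes a PEFT-style `adapter_config.json` with `r` and `target_modules` keys; the helper names are hypothetical and are not part of the mistral.rs API.

```python
import json


def split_adapter_list(spec):
    """Split a semicolon-separated adapter list, dropping empty entries."""
    return [part.strip() for part in spec.split(";") if part.strip()]


def summarize_adapter_config(config_text):
    """Extract rank and targeted modules from adapter_config.json text.

    Assumes PEFT-style keys: "r" (rank) and "target_modules".
    """
    cfg = json.loads(config_text)
    return {"rank": cfg["r"], "target_modules": sorted(cfg["target_modules"])}


adapters = split_adapter_list("<lora-repo-1>;<lora-repo-2>")
summary = summarize_adapter_config(
    '{"r": 8, "target_modules": ["v_proj", "q_proj"]}'
)
```

Checking the rank and module list this way before loading can help confirm that an adapter was trained against the same architecture as the base model.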
X-LoRA loads multiple adapters with a learned scaling head that selects per-token weighting. The ordering file maps adapters to the scaler's output positions.
<Tabs>
<TabItem label="CLI">

```bash
mistralrs run \
  -m <base-model> \
  --xlora <xlora-repo> \
  --xlora-order <ordering-file.json>
```

Flag rules:

- `--xlora` conflicts with `--lora`.
- `--xlora-order` and `--tgt-non-granular-index` are only valid alongside `--xlora`.
- `--tgt-non-granular-index <n>` controls how often the X-LoRA scaler recomputes. Without it, the scaler recomputes every token.

</TabItem>
<TabItem label="Python">

```python
from mistralrs import Runner, Which

runner = Runner(
    which=Which.XLora(
        model_id="<base-model>",
        xlora_model_id="<xlora-repo>",
        order="<ordering-file.json>",
        # tgt_non_granular_index=...,
    )
)
```

</TabItem>
<TabItem label="Rust">

```rust
use std::fs::File;

use mistralrs::{XLoraModelBuilder, TextModelBuilder};

let model = XLoraModelBuilder::from_text_model_builder(
    TextModelBuilder::new("<base-model>"),
    "<xlora-repo>",
    serde_json::from_reader(File::open("<ordering-file.json>")?)?,
)
.build()
.await?;
```

</TabItem>
</Tabs>
Full examples: xlora-zephyr (Python), xlora (Rust).
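The role of the ordering in X-LoRA can be sketched in a few lines: position `i` of the scaler's output is applied to the adapter listed at position `i`. The example below is a simplified illustration that assumes softmax-normalized scalings and a single token; the function names are hypothetical, and the real scaling head and ordering file format in mistral.rs may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_adapters(order, scaler_logits, adapter_outputs):
    """Blend per-adapter outputs for one token.

    order: adapter names; position i corresponds to scaler output i.
    scaler_logits: one raw score per adapter from the scaling head.
    adapter_outputs: mapping of adapter name -> output vector for this token.
    """
    weights = softmax(scaler_logits)
    dim = len(adapter_outputs[order[0]])
    mixed = [0.0] * dim
    for name, w in zip(order, weights):
        for i, v in enumerate(adapter_outputs[name]):
            mixed[i] += w * v
    return dict(zip(order, weights)), mixed


order = ["adapter_a", "adapter_b"]  # hypothetical adapter names
outputs = {"adapter_a": [1.0, 0.0], "adapter_b": [0.0, 1.0]}
weights, mixed = mix_adapters(order, [0.0, 0.0], outputs)
```

Reordering `order` without retraining the scaler would pair each weight with the wrong adapter, which is why the ordering has to match the one used when the scaling head was trained.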
AnyMoE goes a step further than adapters: it composes several fine-tunes of the same base model into a MoE (Mixture of Experts) configuration at inference time, training only a small per-layer router.
AnyMoE is available through the Rust SDK (`AnyMoeModelBuilder`) and the Python SDK (`AnyMoeConfig`, `AnyMoeExpertType`); it is not configurable via the CLI.

The `AnyMoeConfig` docstrings in the AnyMoE Python reference cover finding the `prefix`/`mlp` values from a model's `model.safetensors.index.json`.

Full examples: anymoe (Python), anymoe and anymoe-lora (Rust).
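One way to look for candidate `prefix`/`mlp` values is to scan the tensor names in the index file's `weight_map`. The sketch below assumes tensor names shaped like `model.layers.0.mlp.gate_proj.weight` and treats any path segment containing `mlp` as the MLP module; both are assumptions for illustration, and the `AnyMoeConfig` docstrings remain the authoritative guide.

```python
import json
import re

# Matches "<prefix>.<layer index>.<segment containing 'mlp'>." at the start
# of a tensor name, e.g. "model.layers.0.mlp.gate_proj.weight".
_LAYER_MLP = re.compile(r"^(?P<prefix>.+)\.\d+\.(?P<mlp>[^.]*mlp[^.]*)\.")


def find_prefix_and_mlp(index_text):
    """Return sorted (prefix, mlp) candidates from safetensors index JSON text."""
    weight_map = json.loads(index_text)["weight_map"]
    found = set()
    for tensor_name in weight_map:
        match = _LAYER_MLP.match(tensor_name)
        if match:
            found.add((match.group("prefix"), match.group("mlp")))
    return sorted(found)


# Tiny stand-in for a real model.safetensors.index.json.
example_index = json.dumps({
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
        "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
        "model.layers.1.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    }
})
candidates = find_prefix_and_mlp(example_index)
print(candidates)  # [('model.layers', 'mlp')]
```

If the scan returns more than one pair, or none, the model likely uses a different naming scheme and the names should be checked by hand.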