docs/QWEN3_5.md
The Qwen 3.5 models are vision-language models using a hybrid Gated Delta Network (GDN) + full attention architecture. Both dense and MoE (Mixture of Experts) variants are supported:
Dense: Qwen/Qwen3.5-27B
MoE: Qwen/Qwen3.5-35B-A3B, Qwen/Qwen3.5-397B-A17B
Mistral.rs supports the Qwen 3.5 multimodal model family with examples in the Rust, Python, and HTTP APIs. ISQ (in-situ quantization) is supported, which lets the model run with lower memory requirements. MoE variants also support MoQE via the --organization moqe flag.
UQFF quantizations are also available.
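To give a rough sense of why quantization reduces memory requirements, the sketch below estimates the memory taken by the weights alone for the 27B dense model. The function name `weight_memory_gb` is hypothetical, the parameter count is the nominal figure implied by the model name, and the figure of about 4.5 bits per weight for Q4K is an assumption; activations, caches, and any layers left unquantized are not counted.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_27B = 27e9  # nominal parameter count implied by the model name (assumption)

bf16_gb = weight_memory_gb(PARAMS_27B, 16)   # unquantized 16-bit weights
q4k_gb = weight_memory_gb(PARAMS_27B, 4.5)   # assumed ~4.5 bits per weight for Q4K

print(f"16-bit: {bf16_gb:.1f} GB, Q4K (approx.): {q4k_gb:.1f} GB")
```

Under these assumptions the weights shrink from roughly 54 GB to roughly 15 GB; real usage will be higher once runtime buffers are included.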
The Python and HTTP APIs support sending images as a URL, a path to a local image, or a base64 encoded string.
The Rust SDK takes an image from the image crate.
Note: When using device mapping or model topology, only the text model and its layers are managed, because the text model contains most of the model parameters.
Mistral.rs supports interactive mode for multimodal models! It is an easy way to interact with the model.
Start up interactive mode with the Qwen 3.5 model (dense):
mistralrs run multimodal -m Qwen/Qwen3.5-27B
Or with the MoE variant:
mistralrs run multimodal -m Qwen/Qwen3.5-35B-A3B
You can find this example here.
We support an OpenAI-compatible HTTP API for multimodal models. This example demonstrates sending a chat completion request with an image.
Note: The image_url may be a path, a URL, or a base64 encoded string.
mistralrs serve multimodal -p 1234 -m Qwen/Qwen3.5-27B
from openai import OpenAI

client = OpenAI(api_key="foobar", base_url="http://localhost:1234/v1/")

completion = client.chat.completions.create(
    model="default",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.garden-treasures.com/cdn/shop/products/IMG_6245.jpg"
                    },
                },
                {
                    "type": "text",
                    "text": "What type of flower is this? Give some fun facts.",
                },
            ],
        },
    ],
    max_tokens=256,
    frequency_penalty=1.0,
    top_p=0.1,
    temperature=0,
)
resp = completion.choices[0].message.content
print(resp)
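The note above says the image_url may also be a base64 encoded string. A minimal sketch of preparing such a payload from a local file follows. The helper names `image_to_data_url` and `build_image_message` are hypothetical, and wrapping the base64 text in a `data:` URL is an assumption about what the server accepts, so it is worth checking against your mistral.rs version.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Read a local image and return it as a base64 data URL (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"  # fallback when the extension is unknown
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_image_message(image_url: str, prompt: str) -> dict:
    """Build one user message in the same shape as the request above."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ],
    }
```

The result could then be passed as `messages=[build_image_message(image_to_data_url("flower.jpg"), "What type of flower is this?")]`, where `flower.jpg` stands in for any local image.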
You can find this example here.
use anyhow::Result;
use mistralrs::{IsqType, MultimodalMessages, MultimodalModelBuilder, TextMessageRole};

#[tokio::main]
async fn main() -> Result<()> {
    // Load the dense model and apply ISQ with the Q4K type.
    let model = MultimodalModelBuilder::new("Qwen/Qwen3.5-27B")
        .with_isq(IsqType::Q4K)
        .with_logging()
        .build()
        .await?;

    // Download the image with the async client; `reqwest::blocking` is not
    // meant to be called from inside the Tokio runtime.
    let bytes = reqwest::get(
        "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg",
    )
    .await?
    .bytes()
    .await?
    .to_vec();
    let image = image::load_from_memory(&bytes)?;

    // Attach the image to a single user message.
    let messages = MultimodalMessages::new().add_image_message(
        TextMessageRole::User,
        "What is this?",
        vec![image],
    );

    let response = model.send_chat_request(messages).await?;
    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    dbg!(
        response.usage.avg_prompt_tok_per_sec,
        response.usage.avg_compl_tok_per_sec
    );
    Ok(())
}
You can find this example here.
This example demonstrates loading the model and sending a chat completion request with an image.
Note: The image_url may be a path, a URL, or a base64 encoded string.
from mistralrs import Runner, Which, ChatCompletionRequest, MultimodalArchitecture

# Dense variant
MODEL_ID = "Qwen/Qwen3.5-27B"
runner = Runner(
    which=Which.MultimodalPlain(
        model_id=MODEL_ID,
        arch=MultimodalArchitecture.Qwen3_5,
    ),
)

# For MoE variant, use:
# MODEL_ID = "Qwen/Qwen3.5-35B-A3B"
# runner = Runner(
#     which=Which.MultimodalPlain(
#         model_id=MODEL_ID,
#         arch=MultimodalArchitecture.Qwen3_5Moe,
#     ),
# )

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://www.garden-treasures.com/cdn/shop/products/IMG_6245.jpg"
                        },
                    },
                    {
                        "type": "text",
                        "text": "What type of flower is this? Give some fun facts.",
                    },
                ],
            }
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)
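The dense and MoE variants above differ only in the model ID and the architecture value. As a small convenience, the sketch below picks the architecture name from the model ID. The helper `arch_name_for` is hypothetical, and it assumes that MoE model IDs end with an active-parameter suffix such as `-A3B`, which holds for the IDs listed in this document but is not a documented rule.

```python
import re


def arch_name_for(model_id: str) -> str:
    """Return the architecture attribute name for a Qwen 3.5 model ID.

    Assumes MoE IDs end with an active-parameter suffix such as "-A3B".
    """
    is_moe = re.search(r"-A\d+B$", model_id) is not None
    return "Qwen3_5Moe" if is_moe else "Qwen3_5"
```

With `mistralrs` installed, the result could be resolved with `getattr(MultimodalArchitecture, arch_name_for(MODEL_ID))` and passed as `arch`.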