docs/source/en/model_doc/ministral3.md
This model was released on 2025-12-01 and added to Hugging Face Transformers on 2025-12-01.
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.
The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.
Key features:
import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend
model_id = "mistralai/Ministral-3-3B-Instruct-2512"
tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
model_id, device_map="auto"
)
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
messages = [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
},
{"type": "image_url", "image_url": {"url": image_url}},
],
},
]
tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to(model.device)
tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
tokenized["pixel_values"] = tokenized["pixel_values"].to(dtype=torch.bfloat16, device="cuda")
image_sizes = [tokenized["pixel_values"].shape[-2:]]
output = model.generate(
**tokenized,
image_sizes=image_sizes,
max_new_tokens=512,
)[0]
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
[[autodoc]] Ministral3Config
[[autodoc]] Ministral3PreTrainedModel - forward
[[autodoc]] Ministral3Model - forward
[[autodoc]] Ministral3ForCausalLM
[[autodoc]] Ministral3ForSequenceClassification
[[autodoc]] Ministral3ForTokenClassification
[[autodoc]] Ministral3ForQuestionAnswering