# Qwen3MoE
This model was released on 2025-04-29 and added to Hugging Face Transformers on 2025-03-31.
Qwen3MoE is the mixture-of-experts (MoE) variant of the Qwen3 family. The Qwen3-30B-A3B checkpoint has 30.5B total parameters with 3.3B activated per token, using 128 routed experts (8 activated per token) across 48 layers, and supports context lengths up to 131K tokens with YaRN scaling. For the dense variant, see [Qwen3](./qwen3).
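These routing hyperparameters can be read directly from the checkpoint configuration. The snippet below is a minimal sketch; the attribute names (`num_experts`, `num_experts_per_tok`) are assumed from [`Qwen3MoeConfig`] and should be checked against the config reference at the bottom of this page.

```py
from transformers import Qwen3MoeConfig

# Load only the configuration (no weights) and print the MoE hyperparameters.
# Attribute names are assumptions based on Qwen3MoeConfig; see the API
# reference below for the authoritative fields.
config = Qwen3MoeConfig.from_pretrained("Qwen/Qwen3-30B-A3B")
print(config.num_hidden_layers)        # decoder layers
print(config.num_experts)              # routed experts per MoE layer
print(config.num_experts_per_tok)      # experts activated per token
print(config.max_position_embeddings)  # native context length
```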
The examples below demonstrate how to generate text with the [`Pipeline`] or the [`AutoModelForCausalLM`] class.
```py
from transformers import pipeline

# Build a text-generation pipeline from the Qwen3-30B-A3B MoE checkpoint.
pipe = pipeline(
    task="text-generation",
    model="Qwen/Qwen3-30B-A3B",
    device_map="auto",
)
pipe("The key to effective reasoning is")
```
```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",
    device_map="auto",
)

# Tokenize the prompt, move it to the model's device, and generate a completion.
input_ids = tokenizer("The key to effective reasoning is", return_tensors="pt").to(model.device)
output = model.generate(**input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
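Reaching the 131K context mentioned above requires enabling YaRN rope scaling; the checkpoint's native window is 32K tokens. The sketch below is one way to do this, assuming the YaRN settings recommended in the Qwen3 model card (scaling factor 4.0 over the native 32,768-token window) and overriding `rope_scaling` at load time.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

# Override the rope scaling configuration at load time to enable YaRN.
# The factor and original window are assumptions taken from the Qwen3 model
# card's recommended settings; adjust them to your actual input lengths.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",
    device_map="auto",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)

long_prompt = "..."  # a prompt longer than the native 32K-token window
inputs = tokenizer(long_prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```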
[[autodoc]] Qwen3MoeConfig

[[autodoc]] Qwen3MoeModel
    - forward

[[autodoc]] Qwen3MoeForCausalLM
    - forward

[[autodoc]] Qwen3MoeForSequenceClassification
    - forward

[[autodoc]] Qwen3MoeForTokenClassification
    - forward

[[autodoc]] Qwen3MoeForQuestionAnswering
    - forward