docs/source/en/model_doc/zaya.md
This model was contributed to Hugging Face Transformers on 2026-07-01.
ZAYA1 is a mixture-of-experts (MoE) language model trained by Zyphra, with 760M active parameters out of 8.4B total. It combines Compressed Convolutional Attention (CCA), a nonlinear ZAYA1 router, and residual scaling.
ZAYA1 uses the Gemma 3 tokenizer. For more details, see the ZAYA1 model card and Zyphra's technical reports.
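
The gap between active and total parameters comes from MoE routing: every token passes through the shared weights, but only through the experts the router selects for it. The sketch below is a toy calculation under made-up sizes. `ToyMoEConfig` and `count_params` are hypothetical names for illustration, and none of the numbers are ZAYA1's real configuration.

```python
from dataclasses import dataclass


@dataclass
class ToyMoEConfig:
    """Hypothetical sizes for illustration; not ZAYA1's real configuration."""

    shared_params: int      # weights every token uses (attention, embeddings, norms, router)
    params_per_expert: int  # weights of one expert, summed over all layers
    num_experts: int        # experts available to the router
    experts_per_token: int  # experts the router selects for each token


def count_params(cfg: ToyMoEConfig) -> tuple[int, int]:
    """Return (active, total) parameter counts for the toy model."""
    total = cfg.shared_params + cfg.num_experts * cfg.params_per_expert
    active = cfg.shared_params + cfg.experts_per_token * cfg.params_per_expert
    return active, total


cfg = ToyMoEConfig(
    shared_params=200_000_000,
    params_per_expert=500_000_000,
    num_experts=16,
    experts_per_token=1,
)
active, total = count_params(cfg)
print(f"active: {active / 1e9:.2f}B, total: {total / 1e9:.2f}B")
```

Per-token compute scales with the active count, while memory use scales with the total count, since all experts must be loaded.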
This model was contributed by JJJYmmm.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/ZAYA1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build the chat prompt; return_dict=True yields a mapping that can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a haiku about recursion in programming."}],
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
    return_dict=True,
)
inputs = inputs.to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
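
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so decoding `outputs[0]` directly repeats the prompt. A minimal sketch of trimming the prompt before decoding is shown below. `strip_prompt` is a hypothetical helper, not part of Transformers, and the token IDs are stand-ins rather than real tokenizer output.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens generated after the prompt.

    Works on anything sliceable along its first axis,
    such as a Python list or a 1-D tensor.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Stand-in token IDs. With the example above, these would be
# outputs[0] and inputs["input_ids"].shape[-1].
prompt_ids = [2, 105, 2364]
output_ids = prompt_ids + [9259, 236764, 1902]

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)
```

The trimmed IDs can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` to print only the model's reply.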
[[autodoc]] ZayaConfig

[[autodoc]] ZayaModel
    - forward

[[autodoc]] ZayaForCausalLM
    - forward