This model was contributed to Hugging Face Transformers on 2026-07-01.

ZAYA

Overview

ZAYA1 is a Mixture-of-Experts (MoE) language model trained by Zyphra, with 760M active parameters out of 8.4B total. It combines Compressed Convolutional Attention (CCA), a nonlinear ZAYA1 router, and residual scaling.
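To put the two parameter counts in perspective, the following minimal sketch computes the share of parameters that are active and a rough weight-only memory estimate. It uses the rounded figures quoted above; the helper names are hypothetical, and the 2 bytes per parameter (bf16/fp16) is an assumption, not a statement about how the checkpoint is stored. Activations and the KV cache are not included.

```python
# Rounded parameter counts quoted in the overview (not exact checkpoint sizes).
ACTIVE_PARAMS = 760e6  # parameters counted as active in the MoE
TOTAL_PARAMS = 8.4e9   # all parameters, including every expert


def active_fraction(active: float, total: float) -> float:
    """Share of all parameters that are active."""
    return active / total


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in decimal GB (assumes 2 bytes, i.e. bf16/fp16)."""
    return num_params * bytes_per_param / 1e9


print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
print(f"weights at 2 bytes/param: {weight_memory_gb(TOTAL_PARAMS):.1f} GB")
```

All experts must still be held in memory, so the memory estimate is driven by the total count, while per-token compute scales with the active count.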

ZAYA1 uses the Gemma 3 tokenizer. For more details, see the ZAYA1 model card and Zyphra's technical reports.

This model was contributed by JJJYmmm.

Usage examples

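```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/ZAYA1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build the chat prompt; return_dict=True yields a mapping (input_ids,
# attention_mask) that can be unpacked into generate() below.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a haiku about recursion in programming."}],
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# The decoded text contains the prompt followed by the generated reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```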
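For a decoder-only model, `generate` returns the prompt ids followed by the newly generated ids, so the decoded string above repeats the prompt. The sketch below shows one way to keep only the new tokens. The helper `new_token_ids` is hypothetical (not part of Transformers) and runs on plain lists of toy ids so that it works without downloading the model.

```python
from typing import Sequence


def new_token_ids(output_ids: Sequence[int], prompt_length: int) -> list[int]:
    """Drop the first `prompt_length` ids, which echo the prompt."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy ids standing in for a real generation: 3 prompt ids + 2 generated ids.
prompt = [101, 7, 42]
generated = prompt + [9, 13]
print(new_token_ids(generated, len(prompt)))
```

With the objects from the usage example, the same slice would be `outputs[0][inputs["input_ids"].shape[-1]:]`, passed to `tokenizer.decode` in place of `outputs[0]`.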

ZayaConfig

[[autodoc]] ZayaConfig

ZayaModel

[[autodoc]] ZayaModel
    - forward

ZayaForCausalLM

[[autodoc]] ZayaForCausalLM
    - forward