docs/source/en/api/models/ace_step_transformer.md
A 1D Diffusion Transformer for music generation from ACE-Step 1.5. The model operates on the 25 Hz stereo latents produced by [AutoencoderOobleck] using flow matching, and is trained with a Qwen3-derived backbone (grouped-query attention, rotary position embedding, RMSNorm, AdaLN-Zero timestep conditioning) plus cross-attention to the text / lyric / timbre conditions built by AceStepConditionEncoder.
[[autodoc]] AceStepTransformer1DModel