
An Attention Free Transformer

This is a PyTorch implementation of the paper An Attention Free Transformer.

This paper replaces the self-attention layer with a new efficient operation that has memory complexity of $\mathcal{O}(Td)$, where $T$ is the sequence length and $d$ is the dimensionality of embeddings.
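Concretely, for each target position $t$ the paper's operation computes a weighted average of the values, reweighted by a learned pair-wise position bias $w_{t,t'}$ and gated by the query:

$$Y_t = \sigma(Q_t) \odot \frac{\sum_{t'=1}^{T} \exp\left(K_{t'} + w_{t,t'}\right) \odot V_{t'}}{\sum_{t'=1}^{T} \exp\left(K_{t'} + w_{t,t'}\right)}$$

where $\sigma$ is the sigmoid function and $\odot$ is element-wise multiplication, so no $QK^\top$ attention map over pairs of positions is ever formed.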

The paper introduces AFT along with AFT-local and AFT-conv. Here we have implemented AFT-local, which restricts the learned position biases to nearby tokens, in an autoregressive model; a sketch follows below.
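Below is a minimal, hypothetical sketch of the AFT-local operation in PyTorch. The class name, shapes, and hyperparameters are illustrative, and the annotated implementation in this repository differs in detail; this is just to show the shape of the computation, not a definitive implementation.

```python
import torch
import torch.nn as nn


class AFTLocalSketch(nn.Module):
    """Illustrative AFT-local layer for an autoregressive model."""

    def __init__(self, d_model: int, seq_len: int, window: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        # Learned pair-wise position biases w_{t,t'}
        self.pos_bias = nn.Parameter(torch.zeros(seq_len, seq_len))
        pos = torch.arange(seq_len)
        # AFT-local: keep the bias only for nearby tokens (|t - t'| < window),
        # zero elsewhere (more distant past tokens still contribute, with zero bias)
        self.register_buffer('local', ((pos[:, None] - pos[None, :]).abs() < window).float())
        # Autoregressive mask: position t may only look at positions t' <= t
        self.register_buffer('causal', pos[None, :] <= pos[:, None])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, d_model]
        T = x.size(1)
        q, k, v = self.query(x), self.key(x), self.value(x)
        w = self.pos_bias[:T, :T] * self.local[:T, :T]          # zero bias outside the window
        w = w.masked_fill(~self.causal[:T, :T], float('-inf'))  # exp(-inf) = 0 removes the future
        # weight[b, t, t', d] = exp(K_{t'} + w_{t,t'})
        # (a real implementation would subtract a max for numerical stability)
        weight = torch.exp(k[:, None, :, :] + w[None, :, :, None])
        num = (weight * v[:, None, :, :]).sum(dim=2)  # weighted sum of values over t'
        den = weight.sum(dim=2)                       # normalizing term over t'
        return torch.sigmoid(q) * num / den           # query acts as an element-wise gate


# Usage (illustrative shapes):
aft = AFTLocalSketch(d_model=64, seq_len=128, window=32)
y = aft(torch.randn(4, 128, 64))  # -> [4, 128, 64]
```

Note that this naive form materializes a full $T \times T$ grid of weights for clarity; the $\mathcal{O}(Td)$ memory figure in the paper comes from evaluating the sums without storing that grid.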