xFormers is a modular library for flexibly generating transformer architectures with interoperable and optimized building blocks. The current integration allows FairSeq users to use attention variants available in the xFormers repository.
To enable xFormers, all that needs to be passed in is a string representing an xFormers attention config.
The various attention variants, including sparse attention and blocksparse attention, can be found in the xFormers repository under `xformers/components/attention`.
For example, you could pass in the following args:

```python
decoder_xformers_att_config = '{"name": "scaled_dot_product"}'
encoder_xformers_att_config = '{"name": "linformer", "seq_len": 256}'
```
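Since the config is a plain JSON string, you can sanity-check it before handing it to FairSeq. A minimal sketch using only the standard library (the variable name is illustrative):

```python
import json

# The attention config is a JSON string: the "name" key selects the
# xFormers attention variant, and the remaining keys are passed to it.
encoder_xformers_att_config = '{"name": "linformer", "seq_len": 256}'

config = json.loads(encoder_xformers_att_config)
print(config["name"])     # linformer
print(config["seq_len"])  # 256
```

A malformed config string will raise `json.JSONDecodeError` here rather than failing later inside model construction.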
In order to use blocksparse attention, you additionally have to pass in a blocksparse layout and block size. For example:

```python
import torch

from fairseq.modules import MultiheadAttention

# seq_len, embedding, num_heads, and add_zero_attn are assumed to be
# defined by the surrounding code.
xformers_att_config = '{"name": "scaled_dot_product"}'
xformers_blocksparse_blocksize = 16
xformers_blocksparse_layout = torch.ones(
    seq_len // xformers_blocksparse_blocksize,
    seq_len // xformers_blocksparse_blocksize,
)

xf_blocksparse_mha = MultiheadAttention(
    embedding,
    num_heads,
    dropout=0.0,
    add_zero_attn=add_zero_attn,
    xformers_att_config=xformers_att_config,
    xformers_blocksparse_layout=xformers_blocksparse_layout,
    xformers_blocksparse_blocksize=xformers_blocksparse_blocksize,
)
```
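To make the layout's shape concrete, here is a plain-Python sketch (no torch) of the dense block mask built above, assuming an example sequence length of 256 and block size of 16; an all-ones layout means every (query-block, key-block) pair is kept:

```python
# Hypothetical example values; in practice these come from your model config.
seq_len = 256
blocksize = 16

# One entry per (query-block, key-block) pair: 1 keeps the block,
# 0 would prune it from the blocksparse attention pattern.
num_blocks = seq_len // blocksize
layout = [[1] * num_blocks for _ in range(num_blocks)]

print(len(layout), len(layout[0]))  # 16 16
```

Sparser patterns (e.g. local or strided attention) would zero out most off-diagonal blocks instead of keeping all of them.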
The xFormers repository currently has benchmarks on the runtime and memory usage of the various attention variants.