xFormers is a modular library for flexibly generating transformer architectures with interoperable and optimized building blocks. The current integration allows FairSeq users to use attention variants available in the xFormers repository.
To enable xFormers, all that needs to be passed in is a string representing an xFormers attention config.
The various attention variants, including sparse attention and blocksparse attention, can be found in the xFormers repository under `xformers/components/attention`.
For example, you could pass in the following args:

```python
decoder_xformers_att_config = '{"name": "scaled_dot_product"}'
encoder_xformers_att_config = '{"name": "linformer", "seq_len": 256}'
```
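Since the config is a plain JSON string, you can sanity-check it before handing it to FairSeq. A minimal sketch using only the standard library (the variable name is illustrative):

```python
import json

# The attention config is a JSON string: the "name" key selects the
# xFormers attention variant, and the remaining keys are passed to it.
encoder_xformers_att_config = '{"name": "linformer", "seq_len": 256}'

config = json.loads(encoder_xformers_att_config)
print(config["name"])     # linformer
print(config["seq_len"])  # 256
```

A malformed config string will raise `json.JSONDecodeError` here rather than failing later inside model construction.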
In order to use blocksparse attention, you additionally have to pass in a blocksparse layout and block size. For example:

```python
import torch

from fairseq.modules import MultiheadAttention

# seq_len, embedding, num_heads, and add_zero_attn are assumed to be
# defined by the surrounding code.
xformers_att_config = '{"name": "scaled_dot_product"}'
xformers_blocksparse_blocksize = 16
xformers_blocksparse_layout = torch.ones(
    seq_len // xformers_blocksparse_blocksize,
    seq_len // xformers_blocksparse_blocksize,
)

xf_blocksparse_mha = MultiheadAttention(
    embedding,
    num_heads,
    dropout=0.0,
    add_zero_attn=add_zero_attn,
    xformers_att_config=xformers_att_config,
    xformers_blocksparse_layout=xformers_blocksparse_layout,
    xformers_blocksparse_blocksize=xformers_blocksparse_blocksize,
)
```
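To make the layout's shape concrete, here is a plain-Python sketch (no torch) of the dense block mask built above, assuming an example sequence length of 256 and block size of 16; an all-ones layout means every (query-block, key-block) pair is kept:

```python
# Hypothetical example values; in practice these come from your model config.
seq_len = 256
blocksize = 16

# One entry per (query-block, key-block) pair: 1 keeps the block,
# 0 would prune it from the blocksparse attention pattern.
num_blocks = seq_len // blocksize
layout = [[1] * num_blocks for _ in range(num_blocks)]

print(len(layout), len(layout[0]))  # 16 16
```

Sparser patterns (e.g. local or strided attention) would zero out most off-diagonal blocks instead of keeping all of them.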
The xFormers repository currently has benchmarks on the runtime and memory usage of the various attention variants.