templates/adding_a_new_model/README.md
FasterTransformer welcomes community contributions. This document describes how to add a new model, add a new feature, or optimize existing kernels.
## Adding a new model

If a contributor has a transformer-based model that FasterTransformer does not support yet, they can follow this guide to add the model to FasterTransformer. Here, we use Longformer as an example.
1. Create a `longformer` folder in `src/fastertransformer/models/`.
2. Add the new layers the model requires in `src/fastertransformer/layers`. The file name can be `LongformerAttentionLayer`.
3. Reuse the existing layers and kernels as much as possible. For example, the differences between `Encoder.cc` and `Bert.cc` are the positions of layer normalization. We should reuse the attention layer, feed forward network layer, and layer normalization kernel to create a new class `Encoder`, but not modify the `Bert` class to fit the `Encoder`.
4. Implement the model in `src/fastertransformer/models/longformer`. The file name can be `Longformer` (a skeleton of steps 2 and 4 is sketched after this list).
5. Provide example code. A simple example like `tensorflow/bert/bert_example.py` is OK. A task example like `tensorflow/bert/run_squad_wrap.py` is better. The example code can be cpp, TensorFlow, or PyTorch (put it in `examples/cpp/longformer`, `examples/tensorflow/longformer`, and `examples/pytorch/longformer` respectively). The requirement is that other users can use this example code to check the correctness.
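The sketch below is a minimal, self-contained C++ skeleton of steps 2 and 4. It is only an illustration under assumptions: the `Tensor` struct, the `forward` signatures, and the member layout here are invented for the sketch, not FasterTransformer's real interface, and real layers also take cuBLAS handles, CUDA streams, and allocators.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a tensor type (an assumption for this sketch; the real
// code passes device pointers plus handles, streams and allocators).
struct Tensor {
    std::vector<size_t> shape;
    float*              data = nullptr;
};

// Step 2: a new layer, placed in src/fastertransformer/layers.
class LongformerAttentionLayer {
public:
    void forward(Tensor& out, const Tensor& in, const Tensor& mask)
    {
        // The real layer would launch the attention kernels here;
        // omitted in this sketch.
        (void)out; (void)in; (void)mask;
    }
};

// Step 4: the model class, placed in src/fastertransformer/models/longformer.
// It composes the new attention layer with reused FFN / layernorm building
// blocks instead of modifying an existing model class.
class Longformer {
public:
    explicit Longformer(int num_layers): num_layers_(num_layers) {}

    void forward(Tensor& out, const Tensor& in, const Tensor& mask)
    {
        Tensor hidden = in;  // shallow copy of the descriptor in this sketch
        for (int l = 0; l < num_layers_; ++l) {
            attention_.forward(out, hidden, mask);
            // ... reuse the existing feed forward network and layer
            //     normalization here ...
            hidden = out;
        }
    }

private:
    int                      num_layers_;
    LongformerAttentionLayer attention_;
};
```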
## Adding a new kernel

Assume we have a new layer normalization kernel that provides better performance than the current layer normalization kernel, `invokeLayerNorm`.

1. If a related file already exists, add the new kernel there (like `src/fastertransformer/kernels/layernorm_kernels.cu`). Otherwise, create a new file in `src/fastertransformer/kernels/`.
2. Choose a name that distinguishes the new kernel from the current one. The function can be `invokeLayerNormV2` (the simplest way to distinguish it from the current kernel) or `invokeLayerNormWithoutBlockReduction`, where block reduction is a method to accelerate the kernel and the suffix distinguishes the new kernel from the current one (a sketch of such a launcher follows this list).
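As a rough illustration of this naming convention, the sketch below shows what a launcher named `invokeLayerNormWithoutBlockReduction` might look like. This is a hypothetical CUDA example, not FasterTransformer's actual kernel: the one-warp-per-row strategy, the signature, and the parameter names are all assumptions.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: one warp normalizes one row, so the reduction
// needs only warp shuffles and no block-wide (shared-memory) reduction.
__global__ void layerNormWithoutBlockReduction(float*       out,
                                               const float* in,
                                               const float* gamma,
                                               const float* beta,
                                               int          n)  // hidden size
{
    const int   row  = blockIdx.x;
    const int   lane = threadIdx.x;  // blockDim.x == 32
    const float eps  = 1e-6f;

    // Per-lane partial sums over the row.
    float sum = 0.f, sq_sum = 0.f;
    for (int i = lane; i < n; i += 32) {
        float v = in[row * n + i];
        sum += v;
        sq_sum += v * v;
    }
    // Warp-level reduction with shuffles (no shared memory needed).
    for (int mask = 16; mask > 0; mask >>= 1) {
        sum += __shfl_xor_sync(0xffffffff, sum, mask);
        sq_sum += __shfl_xor_sync(0xffffffff, sq_sum, mask);
    }

    const float mean    = sum / n;
    const float var     = sq_sum / n - mean * mean;
    const float rstddev = rsqrtf(var + eps);

    for (int i = lane; i < n; i += 32) {
        float v = in[row * n + i];
        out[row * n + i] = (v - mean) * rstddev * gamma[i] + beta[i];
    }
}

// Launcher named after the convention in step 2; m is the number of rows
// (tokens), n is the hidden size.
void invokeLayerNormWithoutBlockReduction(float*       out,
                                          const float* in,
                                          const float* gamma,
                                          const float* beta,
                                          int          m,
                                          int          n,
                                          cudaStream_t stream)
{
    layerNormWithoutBlockReduction<<<m, 32, 0, stream>>>(out, in, gamma, beta, n);
}
```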
Finally, show the performance improvement of the new kernel. For example:

| Batch_size | Seq_len | Precision | FT old layernorm Latency (ms) | FT new layernorm Latency (ms) | Speedup |
|:----------:|:-------:|:---------:|:-----------------------------:|:-----------------------------:|:-------:|
|     1      |   32    |   FP16    |             2.57              |             1.87              |  1.30   |
|     1      |   128   |   FP16    |             5.37              |             4.70              |  2.10   |
|     1      |   384   |   FP16    |             7.39              |             6.61              |  0.81   |
|     8      |   32    |   FP16    |             5.26              |             4.59              |  1.13   |
|     8      |   128   |   FP16    |             13.29             |             12.54             |  1.89   |
|     8      |   384   |   FP16    |             38.07             |             36.66             |  1.71   |
|     32     |   32    |   FP16    |             13.78             |             13.24             |  1.79   |
|     32     |   128   |   FP16    |             45.90             |             45.02             |  1.86   |
|     32     |   384   |   FP16    |            150.26             |            143.41             |  1.78   |

Contributors only need to show the performance on some cases. We will review and test on other frameworks/GPUs if the modification makes sense.
## Coding style

- The file name of a class is the same as the class name and only contains that class. For example, `BertLayer.cc` only contains the `BertLayer` class.
- Other file names are lowercase words connected by `_`, like `cuda_utils.h`.
- Function names are camelCase, like `invokeLayerNorm`.
- Variable names are lowercase words connected by `_`, like `batch_size`.
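For illustration only, a hypothetical header that follows these conventions (none of these names exist in FasterTransformer):

```cpp
// my_utils.h -- a non-class file: lowercase words connected by "_".
#pragma once

// Function name in camelCase, like the kernel launchers.
void invokeExampleKernel(float* out, const float* in, int batch_size);

// Variable names are lowercase words connected by "_".
inline int computePaddedLength(int seq_len, int pad_to)
{
    int padded_len = ((seq_len + pad_to - 1) / pad_to) * pad_to;
    return padded_len;
}
```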