# Primer EZ Experiment

This is an annotated PyTorch experiment to train a Primer EZ transformer.

It is based on our vanilla transformer experiment; we use the same setup and add the Primer EZ modifications: squared ReLU in the feed-forward network and Multi-DConv-Head Attention.

```python
from labml import experiment
from labml.configs import option
from labml_nn.transformers import TransformerConfigs
from labml_nn.transformers.basic.autoregressive_experiment import Configs
from labml_nn.transformers.configs import FeedForwardConfigs
from labml_nn.transformers.primer_ez import SquaredReLU
```

Add the squared ReLU activation option to the configurable feed-forward module.

```python
@option(FeedForwardConfigs.activation, 'SquaredReLU')
def _squared_relu():
    return SquaredReLU()
```
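
For reference, squared ReLU is simply the square of a standard ReLU, $y = \max(0, x)^2$. A minimal sketch of what the imported `SquaredReLU` module computes (the actual implementation lives in `labml_nn.transformers.primer_ez`):

```python
import torch
from torch import nn


class SquaredReLU(nn.Module):
    """Squared ReLU: y = max(0, x)^2, applied element-wise."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(x)
        return x * x
```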

Add the Multi-DConv-Head Attention option to the configurable transformer.

```python
@option(TransformerConfigs.encoder_attn, 'MultiDConvHeadAttention')
def _d_conv_mha(c: TransformerConfigs):
    from labml_nn.transformers.primer_ez import MultiDConvHeadAttention
    return MultiDConvHeadAttention(c.n_heads, c.d_model, dropout_prob=c.dropout)
```
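
Multi-DConv-Head Attention follows standard multi-head attention, but applies a 3-wide depth-wise convolution along the sequence dimension to the queries, keys and values after their linear projections. A minimal sketch of such a spatial depth-wise convolution, assuming a `[seq_len, batch_size, heads, d_k]` tensor layout (the real module is imported from `labml_nn.transformers.primer_ez` inside the option function):

```python
import torch
from torch import nn


class SpatialDepthWiseConvolution(nn.Module):
    """Depth-wise 1D convolution along the sequence dimension,
    applied per head to queries, keys and values."""

    def __init__(self, d_k: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # groups=d_k gives one filter per channel (depth-wise);
        # padding plus trimming below keeps the convolution causal.
        self.conv = nn.Conv1d(in_channels=d_k, out_channels=d_k,
                              kernel_size=kernel_size,
                              padding=kernel_size - 1, groups=d_k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape [seq_len, batch_size, heads, d_k]
        seq_len, batch_size, heads, d_k = x.shape
        x = x.permute(1, 2, 3, 0).reshape(batch_size * heads, d_k, seq_len)
        x = self.conv(x)
        # Trim the trailing positions introduced by padding, so each output
        # position only depends on current and earlier inputs
        x = x[:, :, :-(self.kernel_size - 1)]
        x = x.view(batch_size, heads, d_k, seq_len).permute(3, 0, 1, 2)
        return x
```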

Add the Multi Depth-wise Shared Conv Head Attention option to the configurable transformer.

šŸ“ This is a variation we tried.

```python
@option(TransformerConfigs.encoder_attn, 'MultiDSharedConvHeadAttention')
def _d_shared_conv_mha(c: TransformerConfigs):
    from labml_nn.transformers.primer_ez.variations import MultiDSharedConvHeadAttention
    return MultiDSharedConvHeadAttention(c.n_heads, c.d_model, dropout_prob=c.dropout)
```

Add the Multi Depth-wise Per-Head Conv Head Attention option to the configurable transformer.

šŸ“ This is a variation we tried.

```python
@option(TransformerConfigs.encoder_attn, 'MultiDPHConvHeadAttention')
def _d_per_head_conv_mha(c: TransformerConfigs):
    from labml_nn.transformers.primer_ez.variations import MultiDPHConvHeadAttention
    return MultiDPHConvHeadAttention(c.n_heads, c.d_model, dropout_prob=c.dropout)
```
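
Since each variation is registered under a name on `TransformerConfigs.encoder_attn`, switching between them only requires changing one configuration string. For example, a hypothetical override mirroring `main()` below:

```python
from labml import experiment
from labml_nn.transformers.basic.autoregressive_experiment import Configs

# Hypothetical: select the shared-kernel variation instead of the
# default Primer EZ Multi-DConv-Head Attention.
conf = Configs()
experiment.create(name="primer_ez_variation")  # hypothetical experiment name
experiment.configs(conf, {
    'transformer.encoder_attn': 'MultiDSharedConvHeadAttention',
})
```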

The `main` function puts everything together:

```python
def main():
    # Create experiment
    experiment.create(name="primer_ez")
    # Create configs
    conf = Configs()
    # Override configurations
    experiment.configs(conf, {
        # Use character level tokenizer
        'tokenizer': 'character',
        # Prompt separator is blank
        'prompt_separator': '',
        # Starting prompt for sampling
        'prompt': 'It is ',
        # Use Tiny Shakespeare dataset
        'text': 'tiny_shakespeare',

        # Use a context size of 256
        'seq_len': 256,
        # Train for 128 epochs
        'epochs': 128,
        # Batch size 32
        'batch_size': 32,
        # Switch between training and validation 10 times per epoch
        'inner_iterations': 10,

        # Model size
        'd_model': 512,
        'transformer.ffn.d_ff': 2048,

        # Use Adam optimizer
        'optimizer.optimizer': 'Adam',
        'optimizer.learning_rate': 2.5e-4,

        # ⭐️ Use squared ReLU activation in the feed-forward network.
        # Replace this with 'ReLU' for standard ReLU.
        'transformer.ffn.activation': 'SquaredReLU',

        # ⭐️ Use Multi-DConv-Head Attention for encoder attention.
        # Replace this with 'mha' for the original multi-head attention.
        'transformer.encoder_attn': 'MultiDConvHeadAttention',
    })

    # Set models for saving and loading
    experiment.add_pytorch_models({'model': conf.model})

    # Start the experiment
    with experiment.start():
        # Run training
        conf.run()


if __name__ == '__main__':
    main()
```
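
To try this experiment yourself, you can run the module directly, e.g. `python -m labml_nn.transformers.primer_ez.experiment` (assuming the `labml_nn` package and its dependencies are installed).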
