
Pay Attention to MLPs (gMLP) Experiment

This is an annotated PyTorch experiment to train a gMLP model. The paper also applies Stochastic Depth regularization, where some layers are randomly dropped during training; we have not implemented that here.
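For reference, the idea behind Stochastic Depth is easy to sketch: during training, each residual branch is dropped for a batch with some probability, and at evaluation every layer is kept. The wrapper below is only a minimal illustration of that idea; the class name and drop probability are made up for this sketch, and it is not how the paper or labml_nn implements the regularization.

import torch
from torch import nn


class StochasticDepthResidual(nn.Module):
    """Minimal sketch: a residual connection whose branch is randomly dropped during training."""

    def __init__(self, fn: nn.Module, drop_prob: float = 0.1):
        super().__init__()
        self.fn = fn  # the residual branch, e.g. the body of a gMLP block
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Skip the branch for this whole batch with probability `drop_prob`
            if torch.rand(1).item() < self.drop_prob:
                return x
            # Scale the kept branch so its expected value matches evaluation,
            # where the branch always runs
            return x + self.fn(x) / (1. - self.drop_prob)
        # At evaluation every layer is kept
        return x + self.fn(x)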

This is based on the training loop and configurations for a simple transformer auto-regressive NLP task.

from labml import experiment
from labml.configs import option
from labml_nn.transformers import TransformerConfigs
from labml_nn.transformers.basic.autoregressive_experiment import Configs as BasicAutoRegressionConfigs
from labml_nn.transformers.gmlp import GMLPBlock

Configurations

This inherits the training loop and configurations from the simple transformer auto-regressive NLP task.

class Configs(BasicAutoRegressionConfigs):

Transformer

    transformer: TransformerConfigs = 'gMLP'

gMLP Block

    gmlp: GMLPBlock

d_ffn for the gMLP projection layer

    d_ffn: int = 2048

Create a gMLP block

@option(Configs.gmlp, 'gMLP')
def _gmlp_configs(c: Configs):

    return GMLPBlock(c.d_model, c.d_ffn, c.seq_len)
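For intuition, here is a minimal, self-contained sketch of what a gMLP block with a Spatial Gating Unit computes: normalize, project the channels up to d_ffn, apply GELU, gate one half of the channels with a learned projection of the other half across the sequence dimension, project back to d_model, and add a residual connection. This is a conceptual illustration of the block from the paper, not the labml_nn GMLPBlock used above, whose details (tensor layout, masking, initialization) differ.

import torch
from torch import nn


class SpatialGatingUnit(nn.Module):
    """Split channels in half and gate one half with a projection of the other across positions."""

    def __init__(self, d_z: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_z // 2)
        # Learned mixing across token positions; this is why the sequence length is fixed.
        # (The paper initializes this projection near zero with bias 1 so the block starts close to identity.)
        self.spatial_proj = nn.Linear(seq_len, seq_len)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: [batch, seq_len, d_z]
        z1, z2 = z.chunk(2, dim=-1)
        z2 = self.norm(z2)
        # Apply the learned projection along the sequence dimension
        z2 = self.spatial_proj(z2.transpose(1, 2)).transpose(1, 2)
        return z1 * z2


class GMLPBlockSketch(nn.Module):
    """Conceptual gMLP block: norm -> channel projection -> GELU -> SGU -> channel projection -> residual."""

    def __init__(self, d_model: int, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_ffn)
        self.activation = nn.GELU()
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.proj_out = nn.Linear(d_ffn // 2, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, d_model]
        shortcut = x
        z = self.activation(self.proj_in(self.norm(x)))
        z = self.sgu(z)
        return shortcut + self.proj_out(z)

For example, GMLPBlockSketch(512, 2048, 256) matches the sizes configured in this experiment.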

Transformer configurations

@option(Configs.transformer, 'gMLP')
def _transformer_configs(c: Configs):

We use our configurable transformer implementation

    conf = TransformerConfigs()

Set the vocabulary sizes for embeddings and generating logits

    conf.n_src_vocab = c.n_tokens
    conf.n_tgt_vocab = c.n_tokens

Set model size

    conf.d_model = c.d_model

Replace the encoder layer with a gMLP layer

    conf.encoder_layer = c.gmlp

    return conf

def main():

Create experiment

    experiment.create(name="gMLP")

Create configs

    conf = Configs()

Override configurations

    experiment.configs(conf, {

Use character level tokenizer

        'tokenizer': 'character',

Prompt separator is blank

        'prompt_separator': '',

Starting prompt for sampling

        'prompt': 'It is ',

Use Tiny Shakespeare dataset

        'text': 'tiny_shakespeare',

Use a context size of 256. gMLP works with a fixed sequence length, since the spatial gating unit learns a projection across token positions; this is why seq_len is passed to GMLPBlock above.

        'seq_len': 256,

Train for 128 epochs

        'epochs': 128,

Batch size 32

        'batch_size': 32,

Switch between training and validation 10 times per epoch

        'inner_iterations': 10,

Model size

        'd_model': 512,
        'd_ffn': 2048,

Use Noam optimizer

        'optimizer.optimizer': 'Noam',
        'optimizer.learning_rate': 1.,
    })
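As an aside, the Noam schedule from "Attention Is All You Need" warms the learning rate up linearly and then decays it with the inverse square root of the step count, scaled by d_model^-0.5; setting optimizer.learning_rate to 1.0 presumably leaves the configured rate as a pure multiplier on that factor. Below is a small illustrative sketch of the factor; the 4,000-step warmup is the value from the original transformer paper, used here only for illustration, not necessarily what labml's optimizer configurations default to.

def noam_lr_factor(step: int, d_model: int = 512, warmup: int = 4000) -> float:
    """Noam learning-rate factor: linear warmup, then inverse square-root decay."""
    step = max(step, 1)  # avoid division by zero on the very first step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# With d_model=512 the factor peaks at the end of warmup:
# noam_lr_factor(4000) ≈ 7e-4, then decays as 1 / sqrt(step)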

Set models for saving and loading

    experiment.add_pytorch_models({'model': conf.model})

Start the experiment

    with experiment.start():

Run training

        conf.run()

if __name__ == '__main__':
    main()
