
Gated Linear Units and Variants


View code on Github


This trains a simple transformer model for auto-regression. We try different variants for the position-wise feedforward network. The reusable & configurable FFN components are defined in configs.py.

import torch
from labml import experiment
from labml.configs import option
from labml.utils.pytorch import get_modules
from labml_nn.experiments.nlp_autoregression import NLPAutoRegressionConfigs
from labml_nn.transformers import Encoder, Generator, TransformerConfigs
from labml_nn.transformers.utils import subsequent_mask
from torch import nn
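For orientation, here is a condensed sketch of the gated position-wise FFN family these variants come from (Shazeer, "GLU Variants Improve Transformer", 2020). The class name GatedFFNSketch and its exact structure are illustrative assumptions, not the repo's FFN implementation (that lives in the configurable FFN in configs.py):

import torch
from torch import nn


class GatedFFNSketch(nn.Module):
    # Gated position-wise FFN: output = W2( activation(x W1) * (x V) ).
    # The choice of `activation` gives the variant:
    #   sigmoid -> GLU, identity -> Bilinear, relu -> ReGLU,
    #   gelu -> GEGLU, silu (swish) -> SwiGLU.
    def __init__(self, d_model: int, d_ff: int, activation=torch.sigmoid):
        super().__init__()
        self.activation = activation
        self.w1 = nn.Linear(d_model, d_ff)   # gate projection
        self.v = nn.Linear(d_model, d_ff)    # value projection
        self.w2 = nn.Linear(d_ff, d_model)   # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(self.activation(self.w1(x)) * self.v(x))

The experiment below picks the 'Bilinear' variant through the 'transformer.ffn.glu_variant' override; swapping in, say, 'SwiGLU' only changes that string.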

Auto-regressive model

class AutoregressiveModel(nn.Module):

    def __init__(self, src_embed: nn.Module, encoder: Encoder, generator: Generator):
        super().__init__()

Token embedding module

        self.src_embed = src_embed

Transformer based encoder

        self.encoder = encoder

Next token generation layer; this gives logits of the next token

        self.generator = generator

This will be initialized on the first call

        self.src_mask = None

    def forward(self, src: torch.Tensor):

Create subsequent mask, so that the transformer can only pay attention to past tokens.

        if self.src_mask is None or self.src_mask.size(0) != len(src):
            self.src_mask = subsequent_mask(len(src)).to(src.device)
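To make the masking concrete, here is a rough stand-in for what subsequent_mask produces; the real helper lives in labml_nn.transformers.utils and may use a different shape or dtype, so this tril version is only an illustrative assumption:

import torch

def causal_mask_sketch(seq_len: int) -> torch.Tensor:
    # Lower-triangular boolean matrix: entry [i, j] is True when position i
    # is allowed to attend to position j, i.e. j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# causal_mask_sketch(4):
# [[ True, False, False, False],
#  [ True,  True, False, False],
#  [ True,  True,  True, False],
#  [ True,  True,  True,  True]]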

Embed the tokens (src) and run them through the transformer

        res = self.encoder(self.src_embed(src), self.src_mask)

Generate logits of the next token

        return self.generator(res), None
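The training loop itself comes from NLPAutoRegressionConfigs; conceptually it minimizes a next-token cross-entropy over these logits. A repo-independent sketch of that objective with made-up sizes (a sequence-first layout is assumed, matching how forward treats len(src) as the sequence length):

import torch
import torch.nn.functional as F

seq_len, batch_size, n_tokens = 8, 2, 65                      # hypothetical sizes
logits = torch.randn(seq_len, batch_size, n_tokens)           # stand-in for generator output
targets = torch.randint(0, n_tokens, (seq_len, batch_size))   # tokens shifted one step ahead

loss = F.cross_entropy(logits.reshape(-1, n_tokens), targets.reshape(-1))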

Configurations

The default configurations can and will be overridden when we start the experiment.

class Configs(NLPAutoRegressionConfigs):

    transformer: TransformerConfigs
    model: AutoregressiveModel

Initialize the auto-regressive model

@option(Configs.model)
def autoregressive_model(c: Configs):

    m = AutoregressiveModel(c.transformer.src_embed, c.transformer.encoder, c.transformer.generator)
    return m.to(c.device)

Initialize the configurable transformer encoder for our autoregressive model.

@option(Configs.transformer)
def transformer_c(c: Configs):

    tc = TransformerConfigs()
    tc.n_src_vocab = c.n_tokens
    tc.n_tgt_vocab = c.n_tokens

    return tc

def main():

Create experiment

    experiment.create(name="glu_variants")

Create configs

    conf = Configs()

Load configurations

    experiment.configs(conf,

A dictionary of configurations to override. The 'Noam' optimizer chosen here uses a warm-up learning-rate schedule; a sketch of it follows the configuration call below.

                       {'tokenizer': 'character',
                        'prompt_separator': '',
                        'prompt': 'It is ',
                        'text': 'tiny_shakespeare',

                        'optimizer.optimizer': 'Noam',
                        'optimizer.learning_rate': 1.,
                        'optimizer.d_model': 256,

                        'seq_len': 1024,
                        'epochs': 128,
                        'batch_size': 6,
                        'inner_iterations': 10,

GLU Variant, one of GLU, Bilinear, ReGLU, GEGLU, SwiGLU

These are defined in the configurable FFN implementation

                        'transformer.ffn.glu_variant': 'Bilinear',

Transformer configurations

                        'transformer.d_model': 256,
                        'transformer.ffn.d_ff': 1024,
                        'transformer.n_heads': 8,
                        'transformer.n_layers': 6})
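The 'Noam' entries above select the warm-up learning-rate schedule from "Attention Is All You Need". A rough sketch of it, assuming learning_rate acts as a constant scale factor and using a hypothetical warm-up step count (not necessarily the configs' default):

def noam_lr_sketch(step: int, d_model: int = 256, factor: float = 1.0, warmup: int = 4000) -> float:
    # lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    step = max(step, 1)
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)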

This is needed to initialize models

    conf.n_tokens = conf.text.n_tokens

Set models for saving and loading

    experiment.add_pytorch_models(get_modules(conf))

Start the experiment

    with experiment.start():

Run the training loop; this calls TrainValidConfigs.run

        conf.run()


if __name__ == '__main__':
    main()
