docs/transformers/glu_variants/experiment.html
This trains a simple transformer model for auto-regressive language modeling. We try different GLU variants for the position-wise feedforward network. The reusable and configurable modules are defined in configs.py.
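As background, the GLU variants from Shazeer's paper "GLU Variants Improve Transformer" replace the FFN's first linear transformation and activation with a gated pair of projections. The following is a minimal sketch of that idea, not the configurable implementation in configs.py; the class name GLUVariantFFN and its exact layer layout are illustrative assumptions.

import torch
from torch import nn

class GLUVariantFFN(nn.Module):
    # FFN_variant(x) = (act(x W1) * (x V)) W2; the paper omits biases
    def __init__(self, d_model: int, d_ff: int, activation: nn.Module):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.v = nn.Linear(d_model, d_ff, bias=False)   # value projection
        self.w2 = nn.Linear(d_ff, d_model, bias=False)  # output projection
        self.activation = activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(self.activation(self.w1(x)) * self.v(x))

Under this parameterization, Bilinear corresponds to activation=nn.Identity(), ReGLU to nn.ReLU(), GEGLU to nn.GELU(), and SwiGLU to nn.SiLU(); the original GLU gates with a sigmoid.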
import torch
from labml import experiment
from labml.configs import option
from labml.utils.pytorch import get_modules
from labml_nn.experiments.nlp_autoregression import NLPAutoRegressionConfigs
from labml_nn.transformers import Encoder, Generator, TransformerConfigs
from labml_nn.transformers.utils import subsequent_mask
from torch import nn
class AutoregressiveModel(nn.Module):
    def __init__(self, src_embed: nn.Module, encoder: Encoder, generator: Generator):
        super().__init__()
Token embedding module
        self.src_embed = src_embed
Transformer-based encoder
        self.encoder = encoder
Next-token generation layer; this gives the logits of the next token
        self.generator = generator
This will be initialized on the first call
        self.src_mask = None
    def forward(self, src: torch.Tensor):
Create a subsequent mask, so that the transformer can only pay attention to past tokens (a standalone sketch of such a mask follows the class).
        if self.src_mask is None or self.src_mask.size(0) != len(src):
            self.src_mask = subsequent_mask(len(src)).to(src.device)
Embed the tokens (src) and run them through the transformer
        res = self.encoder(self.src_embed(src), self.src_mask)
Generate the logits of the next token. The None is a placeholder for state (used by recurrent models), which this model does not need.
        return self.generator(res), None
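As an aside, the subsequent mask is a standard causal (lower-triangular) mask. Below is a minimal standalone sketch, assuming a plain [seq_len, seq_len] boolean layout; the actual subsequent_mask helper may use a different shape or dtype.

import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # True where a query position may attend to a key position (key <= query)
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(causal_mask(4))
# tensor([[ True, False, False, False],
#         [ True,  True, False, False],
#         [ True,  True,  True, False],
#         [ True,  True,  True,  True]])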
The default configs can and will be overridden when we start the experiment.
class Configs(NLPAutoRegressionConfigs):
    transformer: TransformerConfigs
    model: AutoregressiveModel
Initialize the auto-regressive model
@option(Configs.model)
def autoregressive_model(c: Configs):
    m = AutoregressiveModel(c.transformer.src_embed, c.transformer.encoder, c.transformer.generator)
    return m.to(c.device)
Initialize the configurable transformer encoder for our autoregressive model.
@option(Configs.transformer)
def transformer_c(c: Configs):
    tc = TransformerConfigs()
    tc.n_src_vocab = c.n_tokens
    tc.n_tgt_vocab = c.n_tokens

    return tc
def main():
Create experiment
    experiment.create(name="glu_variants")
Create configs
    conf = Configs()
Load configurations
    experiment.configs(conf,
A dictionary of configurations to override
                       {'tokenizer': 'character',
                        'prompt_separator': '',
                        'prompt': 'It is ',
                        'text': 'tiny_shakespeare',

                        'optimizer.optimizer': 'Noam',
                        'optimizer.learning_rate': 1.,
                        'optimizer.d_model': 256,

                        'seq_len': 1024,
                        'epochs': 128,
                        'batch_size': 6,
                        'inner_iterations': 10,
GLU variant: one of GLU, Bilinear, ReGLU, GEGLU, or SwiGLU.
These are defined in the configurable FFN implementation
                        'transformer.ffn.glu_variant': 'Bilinear',
Transformer configurations
                        'transformer.d_model': 256,
                        'transformer.ffn.d_ff': 1024,
                        'transformer.n_heads': 8,
                        'transformer.n_layers': 6})
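The Noam optimizer chosen in these overrides uses the learning-rate schedule from "Attention Is All You Need", scaled by optimizer.d_model. Below is a minimal sketch of that schedule, assuming the configured learning rate of 1.0 acts as an overall factor and an illustrative warmup of 4,000 steps (the optimizer configs define the actual default).

def noam_lr(step: int, d_model: int = 256, warmup: int = 4000, factor: float = 1.0) -> float:
    # lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    # Rises linearly for `warmup` steps, then decays as step^-0.5.
    step = max(step, 1)
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)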
This is needed to initialize models
    conf.n_tokens = conf.text.n_tokens
Set models for saving and loading
    experiment.add_pytorch_models(get_modules(conf))
Start the experiment
    with experiment.start():
Run the training loop (TrainValidConfigs.run)
        conf.run()


if __name__ == '__main__':
    main()