docs/transformers/alibi/experiment.html
This is an annotated PyTorch experiment to train an ALiBi model.
This is based on our GPT model.
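Unlike GPT's learned positional embeddings, ALiBi (Attention with Linear Biases) carries position information by adding a head-specific linear penalty, proportional to the query–key distance, to the attention scores before the softmax. A minimal sketch of the idea (for illustration only, not the module used in this experiment; the slope formula is the one from the ALiBi paper for head counts that are powers of two):

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # Relative position j - i; equals minus the distance for keys in the past
    rel = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]
    # Bias of shape [n_heads, seq_len, seq_len], added to attention scores
    # before the softmax; causal masking still removes positions j > i
    return slopes[:, None, None] * rel[None, :, :]
```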
```python
import torch
from torch.utils.data import DataLoader

from labml import experiment, tracker
from labml.configs import option, calculate
from labml_nn.helpers.datasets import SequentialUnBatchedDataset
from labml_nn.transformers.alibi import AlibiMultiHeadAttention
from labml_nn.experiments.nlp_autoregression import transpose_batch
from labml_nn.transformers import TransformerConfigs
from labml_nn.transformers.gpt import Configs as GPTConfigs
```
We extend GPT configurations and change the attention mechanism.
```python
class Configs(GPTConfigs):
```
ALiBi-based transformer (defined below)
```python
    transformer: TransformerConfigs = 'GPT_ALiBi'
```
Longer validation sequence length, to test how well ALiBi extrapolates to sequences longer than those seen during training
```python
    valid_seq_len: int = 128
    valid_loader = 'shuffled_longer_valid_loader'
```
Log losses at the initial and final tokens
```python
    def other_metrics(self, output: torch.Tensor, target: torch.Tensor):
```
If there are more tokens than the training sequence length (during validation),
```python
        if self.seq_len < output.shape[0]:
```
Log the loss at training sequence length
```python
            tracker.add(f'loss.{self.seq_len - 1}.', self.loss_func(output[self.seq_len - 1], target[self.seq_len - 1]))
```
Log the loss at the first token
```python
        tracker.add(f'loss.0.', self.loss_func(output[0], target[0]))
```
Log the loss at the final token
```python
        tracker.add(f'loss.{int(output.shape[0]) - 1}.', self.loss_func(output[-1], target[-1]))
```
Create an ALiBi attention module
```python
def _alibi_mha(c: TransformerConfigs):
    return AlibiMultiHeadAttention(c.n_heads, c.d_model, dropout_prob=c.dropout)
```
Set all attention mechanisms to ALiBi
```python
calculate(TransformerConfigs.encoder_attn, 'alibi_mha', _alibi_mha)
calculate(TransformerConfigs.decoder_attn, 'alibi_mha', _alibi_mha)
calculate(TransformerConfigs.decoder_mem_attn, 'alibi_mha', _alibi_mha)
```
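With these registrations in place, an attention implementation can be picked later simply by assigning its option name to the corresponding configuration, which is what the `GPT_ALiBi` configuration below does. Roughly, assigning the name defers construction to the registered function:

```python
conf = TransformerConfigs()
# Selecting the option by name; the config system calls _alibi_mha(conf)
# to build the module when the encoder attention is actually needed
conf.encoder_attn = 'alibi_mha'
```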
Shuffled validation data loader with valid_seq_len sequence length
```python
@option(Configs.valid_loader)
def shuffled_longer_valid_loader(c: Configs):
    return DataLoader(SequentialUnBatchedDataset(text=c.text.valid,
                                                 dataset=c.text,
                                                 seq_len=c.valid_seq_len),
                      batch_size=c.batch_size,
                      collate_fn=transpose_batch,
                      shuffle=True)
```
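The `collate_fn=transpose_batch` keeps the experiment's sequence-first convention, so batches come out with shape `[seq_len, batch_size]` rather than `[batch_size, seq_len]`. A rough sketch of such a collate function (an assumption for illustration; the actual `transpose_batch` lives in `labml_nn.experiments.nlp_autoregression`):

```python
import torch

def transpose_collate(batch):
    # `batch` is a list of (input, target) pairs, each a tensor of shape [seq_len]
    inputs, targets = zip(*batch)
    # Stack along dim=1 so the tensors come out as [seq_len, batch_size]
    return torch.stack(inputs, dim=1), torch.stack(targets, dim=1)
```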
```python
@option(Configs.transformer, 'GPT_ALiBi')
def _transformer_configs(c: Configs):
```
We use our configurable transformer implementation
```python
    conf = TransformerConfigs()
```
Set the vocabulary sizes for embeddings and generating logits
```python
    conf.n_src_vocab = c.n_tokens
    conf.n_tgt_vocab = c.n_tokens
```
GPT uses GELU activation for the position-wise feedforward network
```python
    conf.ffn.activation = 'GELU'
```
ALiBi doesn't use positional embeddings
```python
    conf.src_embed = 'no_pos'
    conf.tgt_embed = 'no_pos'
```
Set all attention mechanisms to ALiBi
```python
    conf.encoder_attn = 'alibi_mha'
    conf.decoder_attn = 'alibi_mha'
    conf.decoder_mem_attn = 'alibi_mha'
```
```python
    return conf
```
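Since ALiBi injects all positional information through the attention biases, the `no_pos` embeddings selected above only need to look up token embeddings. A minimal sketch of what such an embedding layer could look like (an illustrative assumption, not the actual labml_nn module):

```python
import math
import torch
from torch import nn

class TokenOnlyEmbeddings(nn.Module):
    """Token embeddings with no positional encodings added."""
    def __init__(self, d_model: int, n_vocab: int):
        super().__init__()
        self.emb = nn.Embedding(n_vocab, d_model)
        self.d_model = d_model

    def forward(self, x: torch.Tensor):
        # Scale by sqrt(d_model), as in the standard transformer embedding
        return self.emb(x) * math.sqrt(self.d_model)
```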
```python
def main():
```
Create experiment
```python
    experiment.create(name="gpt_alibi")
```
Create configs
```python
    conf = Configs()
```
Override configurations
```python
    experiment.configs(conf, {
```
Use a character-level tokenizer
```python
        'tokenizer': 'character',
```
Prompt separator is blank
```python
        'prompt_separator': '',
```
Starting prompt for sampling
```python
        'prompt': 'It is ',
```
Use Tiny Shakespeare dataset
```python
        'text': 'tiny_shakespeare',
        # 'text': 'tiny_shakespeare_no_split',
```
Use a training context size of 64
```python
        'seq_len': 64,
```
Use a longer context size of 80 for validation
```python
        'valid_seq_len': 80,
```
Train for 128 epochs
```python
        'epochs': 128,
```
Batch size 128
```python
        'batch_size': 128,
```
Switch between training and validation 10 times per epoch
```python
        'inner_iterations': 10,
```
Transformer configurations
```python
        'transformer.d_model': 128,
        'transformer.ffn.d_ff': 512,
        'transformer.n_heads': 8,
        'transformer.n_layers': 4,
        'transformer.dropout': 0.1,
    })
```
Set models for saving and loading
```python
    experiment.add_pytorch_models({'model': conf.model})
```
Start the experiment
```python
    with experiment.start():
```
Run training
```python
        conf.run()
```
```python
if __name__ == '__main__':
    main()
```