docs/transformers/rope/value_pe/experiment.html
This is an annotated PyTorch experiment that trains a transformer model with Rotary Positional Embeddings (RoPE) applied to the value vectors as well as to the queries and keys.
```python
from labml import experiment
from labml.configs import calculate
from labml_nn.transformers import TransformerConfigs
from labml_nn.transformers.rope.experiment import Configs as RoPEConfigs
```
```python
# Reuse the configurations from the RoPE experiment.
class Configs(RoPEConfigs):  # , ArithmeticAutoregression):
    pass
```
```python
def _rotary_value_pe_mha(c: TransformerConfigs):
    from labml_nn.transformers.rope.value_pe import RotaryValuePEMultiHeadAttention
    # The two trailing arguments are the rotary fractions for queries/keys and
    # for values; 1. rotates all of the features.
    return RotaryValuePEMultiHeadAttention(c.n_heads, c.d_model, 1., 1.)
```
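For intuition, here is a minimal, self-contained sketch of the rotary transform itself. This is an illustration only, not labml_nn's implementation, and it uses the interleaved feature pairing from the RoPE paper; implementations differ in how they pair features:

```python
import torch


def rotary_rotate(x: torch.Tensor, base: float = 10_000.) -> torch.Tensor:
    """Rotate feature pairs of x (shape [seq_len, d], d even) by position-dependent angles."""
    seq_len, d = x.shape
    # One angle scale per feature pair: theta_i = base^(-2i/d)
    theta = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    # The angle for position m and pair i is m * theta_i
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * theta[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    # Apply the 2-D rotation to each feature pair
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Rotating a query at position m and a key at position n makes their dot product depend only on m - n. The value-PE variant extends the same idea to the value path: roughly, the values are rotated before the attention-weighted sum and the result is reverse-rotated, so the output also depends on relative distances.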
Configuration options
```python
# Register 'rotary_value' as an option for encoder, decoder, and decoder memory attention.
calculate(TransformerConfigs.encoder_attn, 'rotary_value', _rotary_value_pe_mha)
calculate(TransformerConfigs.decoder_attn, 'rotary_value', _rotary_value_pe_mha)
calculate(TransformerConfigs.decoder_mem_attn, 'rotary_value', _rotary_value_pe_mha)
```
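These `calculate` calls register `'rotary_value'` as a named option, which is why plain strings are enough to select modules in the configuration dictionary below. Conceptually, the mechanism is a registry keyed by (config field, option name); a simplified sketch, not labml's actual code:

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: (config field, option name) -> builder function.
_OPTIONS: Dict[Tuple[str, str], Callable[[Any], Any]] = {}


def register_option(field: str, name: str, builder: Callable[[Any], Any]) -> None:
    """Register a builder, analogous to calculate(TransformerConfigs.encoder_attn, 'rotary_value', ...)."""
    _OPTIONS[(field, name)] = builder


def build_option(field: str, name: str, configs: Any) -> Any:
    """Look up the builder for a string-valued config and call it with the configs object."""
    return _OPTIONS[(field, name)](configs)
```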
```python
def main():
```
Create experiment
```python
    experiment.create(name="rotary_shakespeare", comment="rotary value", writers={'screen', 'labml'})
```
Create configs
```python
    conf = Configs()
```
Override configurations
```python
    experiment.configs(conf, {
```
No fixed positional embeddings; RoPE injects position information inside the attention layers
```python
        'transformer.src_embed': 'no_pos',
        'transformer.tgt_embed': 'no_pos',
```
Encoder attention with rotary value PE; the commented-out line switches back to plain RoPE
```python
        'transformer.encoder_attn': 'rotary_value',
        # 'transformer.encoder_attn': 'rotary',
```
```python
        'model': 'rotary_pe_transformer',
```
Use a character-level tokenizer (see the tokenization sketch after this configuration block)
```python
        'tokenizer': 'character',
```
Prompt separator is blank
```python
        'prompt_separator': '',
```
Starting prompt for sampling
```python
        'prompt': 'It is ',
```
Use the Tiny Shakespeare dataset
```python
        'text': 'tiny_shakespeare',
```
Use a context size of 512 (see the batching sketch after this configuration block)
```python
        'seq_len': 512,
```
Train for 24 epochs
```python
        'epochs': 24,
```
Batch size of 16
```python
        'batch_size': 16,
```
Switch between training and validation 4 times per epoch
```python
        'inner_iterations': 4,
```
Model size
```python
        'd_model': 128,
        'transformer.ffn.d_ff': 512,
        'transformer.n_heads': 4,
        'transformer.dropout': 0.0,
```
Use the Adam optimizer
```python
        'optimizer.optimizer': 'Adam',
        'optimizer.learning_rate': 2.5e-4,

        # Shuffle the training data by sampling with replacement.
        'dataloader_shuffle_with_replacement': True
    })
```
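As referenced above, a character-level tokenizer simply maps each distinct character to an integer id. A minimal illustration (labml_nn ships its own tokenizer, so everything here is hypothetical):

```python
# Minimal character-level tokenization sketch (illustration only).
text = "ROMEO: It is the east, and Juliet is the sun."
vocab = sorted(set(text))                      # one entry per distinct character
stoi = {ch: i for i, ch in enumerate(vocab)}   # char -> id
ids = [stoi[ch] for ch in text]                # encode
assert ''.join(vocab[i] for i in ids) == text  # decoding round-trips
```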
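Likewise, the context size determines how the encoded character stream is cut into fixed-length training examples. A rough sketch of one common scheme (not labml_nn's dataloader), where targets are the inputs shifted by one character:

```python
import torch


def make_contexts(ids: torch.Tensor, seq_len: int):
    """Split a 1-D token stream into (input, target) pairs of length seq_len."""
    n = (len(ids) - 1) // seq_len                   # number of full contexts
    x = ids[: n * seq_len].view(n, seq_len)         # inputs
    y = ids[1 : n * seq_len + 1].view(n, seq_len)   # targets, shifted by one
    return x, y
```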
Set models for saving and loading
```python
    experiment.add_pytorch_models({'model': conf.model})
```
Start the experiment
```python
    with experiment.start():
```
Run training
```python
        conf.run()
```
```python
if __name__ == '__main__':
    main()
```