Rotary Positional Embeddings (RoPE) Experiment

This is an annotated PyTorch experiment that trains a transformer model with Rotary Positional Embeddings (RoPE) applied to the values as well as to the queries and keys.

```python
from labml import experiment
from labml.configs import calculate
from labml_nn.transformers import TransformerConfigs
from labml_nn.transformers.rope.experiment import Configs as RoPEConfigs
```
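For orientation, here is a minimal sketch (not part of the library code) of the rotation RoPE applies: the features at each position are grouped into pairs, and pair i at position m is rotated by the angle m * theta_i, where theta_i = 10000^(-2 i / d). It assumes the half-split pairing convention used by labml_nn; the real module caches the cos/sin tables instead of recomputing them.

```python
import torch

def rope_rotate(x: torch.Tensor, base: float = 10_000., reverse: bool = False) -> torch.Tensor:
    """Rotate feature pairs of `x` (shape `[seq_len, d]`, `d` even) by position-dependent angles."""
    seq_len, d = x.shape
    # Pair frequencies: theta_i = base^(-2i/d)
    theta = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    # Angle of pair i at position m is m * theta_i; `reverse=True` undoes a rotation
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * theta[None, :]
    if reverse:
        angles = -angles
    cos, sin = angles.cos(), angles.sin()
    # Half-split pairing convention: feature i is paired with feature i + d/2
    x1, x2 = x[:, :d // 2], x[:, d // 2:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

The dot product of a rotated query at position m with a rotated key at position n then depends only on the offset m - n, which is what makes RoPE a relative positional encoding.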


Rotary PE attention

```python
class Configs(RoPEConfigs):  # , ArithmeticAutoregression):
    pass
```

```python
def _rotary_value_pe_mha(c: TransformerConfigs):
    from labml_nn.transformers.rope.value_pe import RotaryValuePEMultiHeadAttention
    # The two trailing arguments set the fraction of features that get rotated,
    # for queries/keys and for values respectively; 1. rotates all features.
    return RotaryValuePEMultiHeadAttention(c.n_heads, c.d_model, 1., 1.)
```
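As I read the value-PE variant, it also rotates the values before the attention-weighted sum and then reverse-rotates the result, so the output too depends only on relative offsets. A single-head, unbatched sketch of that idea, reusing `rope_rotate` from above (illustrative only, not the library implementation):

```python
import torch.nn.functional as F

def rotary_value_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """`q`, `k`, `v`: `[seq_len, d]`. Single head, no masking, for illustration."""
    q_r, k_r = rope_rotate(q), rope_rotate(k)  # standard RoPE on queries and keys
    v_r = rope_rotate(v)                       # value PE: rotate the values too
    attn = F.softmax(q_r @ k_r.T / q.shape[-1] ** 0.5, dim=-1)
    # Reverse-rotate the weighted sum so only relative positions remain
    return rope_rotate(attn @ v_r, reverse=True)
```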


Configuration options

```python
calculate(TransformerConfigs.encoder_attn, 'rotary_value', _rotary_value_pe_mha)
calculate(TransformerConfigs.decoder_attn, 'rotary_value', _rotary_value_pe_mha)
calculate(TransformerConfigs.decoder_mem_attn, 'rotary_value', _rotary_value_pe_mha)
```
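`calculate` registers a named factory for a config item: when an override sets `transformer.encoder_attn` to the string `'rotary_value'` (as `main` does below), labml calls `_rotary_value_pe_mha` to construct the module. As a hypothetical illustration, registering one more option might look like this (the name `'my_rotary'` and the factory are made up; `RotaryPEMultiHeadAttention` is the attention class from the base RoPE implementation):

```python
def _plain_rotary_mha(c: TransformerConfigs):
    # Hypothetical factory: plain RoPE on queries and keys only, all features rotated
    from labml_nn.transformers.rope import RotaryPEMultiHeadAttention
    return RotaryPEMultiHeadAttention(c.n_heads, c.d_model, 1.)

# After this, {'transformer.encoder_attn': 'my_rotary'} would select the factory above
calculate(TransformerConfigs.encoder_attn, 'my_rotary', _plain_rotary_mha)
```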

```python
def main():
```

Create experiment

```python
    experiment.create(name="rotary_shakespeare", comment="rotary value", writers={'screen', 'labml'})
```

Create configs

```python
    conf = Configs()
```

Override configurations

```python
    experiment.configs(conf, {
```

No fixed positional embeddings

```python
        'transformer.src_embed': 'no_pos',
        'transformer.tgt_embed': 'no_pos',
```

Encoder attention with rotary positional embeddings applied to the values as well

```python
        'transformer.encoder_attn': 'rotary_value',
        # Use 'rotary' instead to rotate only the queries and keys:
        # 'transformer.encoder_attn': 'rotary',
```

Use the rotary PE transformer model

```python
        'model': 'rotary_pe_transformer',
```

Use character level tokenizer

```python
        'tokenizer': 'character',
```

Prompt separator is blank

```python
        'prompt_separator': '',
```

Starting prompt for sampling

```python
        'prompt': 'It is ',
```

Use Tiny Shakespeare dataset

```python
        'text': 'tiny_shakespeare',
```

Use a context size of 512

```python
        'seq_len': 512,
```

Train for 24 epochs

```python
        'epochs': 24,
```

Batch size of 16

```python
        'batch_size': 16,
```

Switch between training and validation 4 times per epoch

```python
        'inner_iterations': 4,
```

Model size

```python
        'd_model': 128,
        'transformer.ffn.d_ff': 512,
        'transformer.n_heads': 4,
        'transformer.dropout': 0.0,
```

Use the Adam optimizer with a learning rate of 2.5e-4, and shuffle the training data with replacement

```python
        'optimizer.optimizer': 'Adam',
        'optimizer.learning_rate': 2.5e-4,

        'dataloader_shuffle_with_replacement': True
    })
```

Set models for saving and loading

```python
    experiment.add_pytorch_models({'model': conf.model})
```

Start the experiment

```python
    with experiment.start():
```

Run training

```python
        conf.run()
```

```python
if __name__ == '__main__':
    main()
```
