docs/transformers/positional_encoding.html
The positional encoding encodes the position of a token along the sequence into a vector of size $d_{model}$.
$$PE_{p,2i} = \sin\left(\frac{p}{10000^{\frac{2i}{d_{model}}}}\right)$$
$$PE_{p,2i+1} = \cos\left(\frac{p}{10000^{\frac{2i}{d_{model}}}}\right)$$
where $1 \leq 2i, 2i+1 \leq d_{model}$ are the feature indexes in the encoding, and $p$ is the position.
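For concreteness, here is a small worked example with a toy $d_{model} = 4$ (an illustrative value, not one used in the code below). For position $p = 1$:

$$PE_{1,0} = \sin\left(\frac{1}{10000^{0/4}}\right) = \sin(1) \approx 0.841 \qquad PE_{1,1} = \cos(1) \approx 0.540$$
$$PE_{1,2} = \sin\left(\frac{1}{10000^{2/4}}\right) = \sin(0.01) \approx 0.010 \qquad PE_{1,3} = \cos(0.01) \approx 1.000$$

Lower feature indexes oscillate quickly as the position grows, while higher indexes change slowly.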
import math

import numpy as np
import torch
import torch.nn as nn
class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, dropout_prob: float, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(dropout_prob)

        # Register the encodings as a non-persistent buffer (not saved in the state dict)
        self.register_buffer('positional_encodings', get_positional_encoding(d_model, max_len), False)
    def forward(self, x: torch.Tensor):
        # Slice the pre-computed encodings to the sequence length of x
        pe = self.positional_encodings[:x.shape[0]].detach().requires_grad_(False)
        x = x + pe
        x = self.dropout(x)
        return x
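A minimal usage sketch (the sizes below are illustrative assumptions; it relies on the imports and the class defined above). Since the encodings keep their batch dimension at index 1, the module expects embeddings of shape [seq_len, batch_size, d_model]:

# Usage sketch with hypothetical sizes
pe_layer = PositionalEncoding(d_model=16, dropout_prob=0.1, max_len=100)
x = torch.zeros(10, 2, 16)  # dummy embeddings of shape [seq_len, batch_size, d_model]
y = pe_layer(x)
print(y.shape)  # torch.Size([10, 2, 16])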
def get_positional_encoding(d_model: int, max_len: int = 5000):
Empty encoding vectors
    encodings = torch.zeros(max_len, d_model)
Position indexes
    position = torch.arange(0, max_len, dtype=torch.float32).unsqueeze(1)
$2i$
    two_i = torch.arange(0, d_model, 2, dtype=torch.float32)
$\frac{1}{10000^{\frac{2i}{d_{model}}}}$, computed as $\exp\left(-\frac{2i}{d_{model}} \ln 10000\right)$
    div_term = torch.exp(two_i * -(math.log(10000.0) / d_model))
$$PE_{p,2i} = \sin\left(\frac{p}{10000^{\frac{2i}{d_{model}}}}\right)$$
    encodings[:, 0::2] = torch.sin(position * div_term)
$$PE_{p,2i+1} = \cos\left(\frac{p}{10000^{\frac{2i}{d_{model}}}}\right)$$
    encodings[:, 1::2] = torch.cos(position * div_term)
Add batch dimension
    encodings = encodings.unsqueeze(1).requires_grad_(False)

    return encodings
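As a quick sanity check (a small sketch with assumed sizes, not part of the original code): the returned tensor has shape [max_len, 1, d_model], and an even feature can be compared against the sine formula directly.

# Sanity check with hypothetical sizes (uses math and torch imported above)
pe = get_positional_encoding(d_model=20, max_len=100)
print(pe.shape)  # torch.Size([100, 1, 20])

# Compare one entry with PE[p, 2i] = sin(p / 10000^(2i / d_model))
p, two_i = 5, 4
expected = math.sin(p / (10000 ** (two_i / 20)))
print(abs(pe[p, 0, two_i].item() - expected) < 1e-5)  # True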
def _test_positional_encoding():
    import matplotlib.pyplot as plt

    plt.figure(figsize=(15, 5))
    pe = get_positional_encoding(20, 100)
    plt.plot(np.arange(100), pe[:, 0, 4:8].numpy())
    plt.legend(["dim %d" % p for p in [4, 5, 6, 7]])
    plt.title("Positional encoding")
    plt.show()


if __name__ == '__main__':
    _test_positional_encoding()