[View code on GitHub](https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/activations/fta/__init__.py)
This is a PyTorch implementation/tutorial of Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online.
Fuzzy tiling activations are a form of sparse activations based on binning.
Binning classifies a scalar value into one of several bins defined by intervals. One problem with binning is that it gives zero gradients for most values (all except those at bin boundaries). The other is that binning loses precision if the bin intervals are large.
FTA overcomes these disadvantages. Instead of the hard boundaries used in tiling activations, FTA uses soft boundaries between bins. This gives non-zero gradients for all or a wide range of values. It also preserves precision, because the position of a value near a bin boundary is captured in partial activation values.
$c$ is the tiling vector,
$$c = (l, l + \delta, l + 2\delta, \dots, u - 2\delta, u - \delta)$$
where $[l, u]$ is the input range, $\delta$ is the bin size, and $u - l$ is divisible by $\delta$.
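As a minimal sketch of this definition, the tiling vector can be built in plain Python. The helper name `tiling_vector` is hypothetical, and the range from -10 to 10 with bin size 2 is an illustrative choice that matches the test values used later on this page.

```python
def tiling_vector(lower, upper, delta):
    """Return the left edges of the bins covering [lower, upper]."""
    # Number of bins; assumes (upper - lower) is divisible by delta
    n_bins = round((upper - lower) / delta)
    return [lower + i * delta for i in range(n_bins)]


c = tiling_vector(-10.0, 10.0, 2.0)
print(c)       # [-10.0, -8.0, -6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0]
print(len(c))  # 10
```

Note that the upper limit itself is not an element of the vector: the last entry is the left edge of the final bin.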
The tiling activation is,
$$\phi(z) = 1 - I_+\big(\max(c - z, 0) + \max(z - \delta - c, 0)\big)$$
where $I_+(\cdot)$ is the indicator function, which gives $1$ if the input is positive and $0$ otherwise.
Note that tiling activation gives zero gradients because it has hard boundaries.
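The zero-gradient behaviour can be checked numerically. The following is a dependency-free sketch, not the implementation from this page: `tiling_activation` is a hypothetical helper that clips both terms at zero, and the derivative is estimated with a finite difference instead of autograd. The bin layout and the input value are illustrative assumptions.

```python
def tiling_activation(z, c, delta):
    """Hard tiling activation: 1 for the bin containing z, 0 elsewhere."""
    out = []
    for c_i in c:
        # How far z lies outside the interval [c_i, c_i + delta]; 0 when inside
        x = max(c_i - z, 0.0) + max(z - delta - c_i, 0.0)
        # Indicator I_+(x): 1 if x is positive, 0 otherwise
        out.append(1.0 - (1.0 if x > 0 else 0.0))
    return out


# Illustrative values (assumptions): range [-10, 10], bin size 2
c = [-10.0 + 2.0 * i for i in range(10)]
z, h = 1.1, 1e-3

phi = tiling_activation(z, c, delta=2.0)
phi_shifted = tiling_activation(z + h, c, delta=2.0)
# Finite-difference estimate of the derivative of every output with respect to z
grads = [(b - a) / h for a, b in zip(phi, phi_shifted)]

print(phi)    # one-hot: only the bin starting at 0 is active
print(grads)  # all zeros, because z is away from a bin boundary
```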
The fuzzy indicator function is,
$$I_{\eta,+}(x) = I_+(\eta - x)\,x + I_+(x - \eta)$$
which equals $x$, increasing linearly from $0$, when $0 \le x < \eta$ and is equal to $1$ for $x > \eta$. $\eta$ is a hyper-parameter.
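Evaluating the definition at a few points may make its shape clearer. This is a small sketch in plain Python; the value of `eta` and the sample points are illustrative assumptions. The case `x == eta` is put on the linear branch, following the `x <= self.eta` comparison used in the implementation on this page.

```python
def fuzzy_i_plus(x, eta):
    """Fuzzy indicator for x >= 0: x itself up to eta, and 1 beyond it."""
    return x if x <= eta else 1.0


eta = 0.5  # illustrative value (assumption)
values = [fuzzy_i_plus(x, eta) for x in (0.0, 0.25, 0.5, 0.75, 2.0)]
print(values)  # [0.0, 0.25, 0.5, 1.0, 1.0]
```

Following the formula as written, the output below the threshold is the input itself, so for a threshold smaller than 1 the function steps up to 1 once the input exceeds the threshold.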
FTA uses this to create soft boundaries between bins.
$$\phi_\eta(z) = 1 - I_{\eta,+}\big(\max(c - z, 0) + \max(z - \delta - c, 0)\big)$$
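A worked example may help here. The sketch below evaluates the formula in plain Python for a single scalar input that sits just below a bin boundary. The helper names, the bin layout, and the input value 1.8 are illustrative assumptions; the derivative is a finite-difference estimate, not autograd.

```python
def fuzzy_i_plus(x, eta):
    """Fuzzy indicator for x >= 0: x itself up to eta, and 1 beyond it."""
    return x if x <= eta else 1.0


def fta(z, c, delta, eta):
    """Fuzzy tiling activation of a scalar z, one output per bin."""
    out = []
    for c_i in c:
        # How far z lies outside the interval [c_i, c_i + delta]; 0 when inside
        x = max(c_i - z, 0.0) + max(z - delta - c_i, 0.0)
        out.append(1.0 - fuzzy_i_plus(x, eta))
    return out


# Illustrative values (assumptions): range [-10, 10], bin size 2, eta 0.5
c = [-10.0 + 2.0 * i for i in range(10)]
z, h = 1.8, 1e-6

phi = fta(z, c, delta=2.0, eta=0.5)
phi_shifted = fta(z + h, c, delta=2.0, eta=0.5)
grads = [(b - a) / h for a, b in zip(phi, phi_shifted)]

# The bin starting at 0 is fully active; the neighbouring bin starting at 2
# is partially active (about 0.8) because z is within eta of its left edge.
print([round(p, 3) for p in phi])
# The partially active bin has a derivative of about 1 with respect to z.
print([round(g, 3) for g in grads])
```

Unlike the hard tiling activation, the partially active bin both records how close the input is to the boundary and passes a non-zero gradient.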
Here's a simple experiment that uses FTA in a transformer.
    import torch
    from torch import nn


    class FTA(nn.Module):
* `lower_limit` is the lower limit $l$
* `upper_limit` is the upper limit $u$
* `delta` is the bin size $\delta$
* `eta` is the parameter $\eta$ that determines the softness of the boundaries.

        def __init__(self, lower_limit: float, upper_limit: float, delta: float, eta: float):
            super().__init__()
Initialize tiling vector $c = (l, l + \delta, l + 2\delta, \dots, u - 2\delta, u - \delta)$

            self.c = nn.Parameter(torch.arange(lower_limit, upper_limit, delta), requires_grad=False)

The input vector expands by a factor equal to the number of bins $\frac{u - l}{\delta}$

            self.expansion_factor = len(self.c)

$\delta$

            self.delta = delta

$\eta$

            self.eta = eta
$I_{\eta,+}(x) = I_+(\eta - x)\,x + I_+(x - \eta)$

        def fuzzy_i_plus(self, x: torch.Tensor):
            return (x <= self.eta) * x + (x > self.eta)
        def forward(self, z: torch.Tensor):

Add another dimension of size 1. We will expand this into bins.

            z = z.view(*z.shape, 1)

$\phi_\eta(z) = 1 - I_{\eta,+}\big(\max(c - z, 0) + \max(z - \delta - c, 0)\big)$

            z = 1. - self.fuzzy_i_plus(torch.clip(self.c - z, min=0.) + torch.clip(z - self.delta - self.c, min=0.))

Reshape back to the original number of dimensions. The last dimension size gets expanded by the number of bins, $\frac{u - l}{\delta}$.

            return z.view(*z.shape[:-2], -1)
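Because the last dimension grows by the number of bins, any layer that consumes FTA output needs a correspondingly wider input. The sketch below shows one way this might look. To keep it self-contained it uses a hypothetical functional helper `fta` that mirrors the forward pass above, not the `FTA` module itself, and the layer sizes are arbitrary assumptions.

```python
import torch
from torch import nn


def fta(z: torch.Tensor, c: torch.Tensor, delta: float, eta: float) -> torch.Tensor:
    """Functional sketch of the FTA forward pass (hypothetical helper)."""
    # Add a trailing dimension of size 1 so that z broadcasts against the bins
    z = z.unsqueeze(-1)
    x = torch.clip(c - z, min=0.) + torch.clip(z - delta - c, min=0.)
    # Fuzzy indicator: x itself up to eta, and 1 beyond it
    fuzzy = (x <= eta) * x + (x > eta)
    # Merge the feature and bin dimensions into one
    return (1. - fuzzy).flatten(start_dim=-2)


c = torch.arange(-10., 10., 2.)      # 10 bins
hidden = nn.Linear(4, 8)             # hypothetical layer sizes
readout = nn.Linear(8 * len(c), 3)   # input width multiplied by the number of bins

x = torch.randn(5, 4)
h = fta(hidden(x), c, delta=2., eta=0.5)
y = readout(h)
print(h.shape, y.shape)  # torch.Size([5, 80]) torch.Size([5, 3])
```

The activations are sparse: each of the 8 hidden features contributes 10 outputs, of which only the bins near the feature's value are non-zero.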
    def _test():
        from labml.logger import inspect

Initialize

        a = FTA(-10, 10, 2., 0.5)

Print $c$

        inspect(a.c)

Print number of bins $\frac{u - l}{\delta}$

        inspect(a.expansion_factor)

Input $z$

        z = torch.tensor([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9., 10., 11.])

Print $z$

        inspect(z)

Print $\phi_\eta(z)$

        inspect(a(z))


    if __name__ == '__main__':
        _test()