docs/normalization/group_norm/index.html
[View code on Github](https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/normalization/group_norm/__init__.py)
This is a PyTorch implementation of the Group Normalization paper.
Batch Normalization works well for large enough batch sizes, but not for small batch sizes, because it normalizes over the batch. Training large models with large batch sizes is often not possible due to the memory capacity of the devices, which forces small per-device batch sizes.
This paper introduces Group Normalization, which normalizes a set of features together as a group. This is based on the observation that classical features such as SIFT and HOG are group-wise features. The paper proposes dividing feature channels into groups and then separately normalizing all channels within each group.
All normalization layers can be defined by the following computation.
$$\hat{x}_i = \frac{1}{\sigma_i} \left( x_i - \mu_i \right)$$
where $x$ is the tensor representing the batch, and $i$ is the index of a single value. For instance, for 2D images, $i = (i_N, i_C, i_H, i_W)$ is a 4-d vector indexing the image within the batch, the feature channel, and the vertical and horizontal coordinates. $\mu_i$ and $\sigma_i$ are the mean and standard deviation.
$$\mu_i = \frac{1}{m} \sum_{k \in S_i} x_k \qquad \sigma_i = \sqrt{\frac{1}{m} \sum_{k \in S_i} (x_k - \mu_i)^2 + \epsilon}$$
$S_i$ is the set of indexes across which the mean and standard deviation are calculated for index $i$. $m$ is the size of the set $S_i$, which is the same for all $i$.
The definition of Si is different for Batch normalization, Layer normalization, and Instance normalization.
For batch normalization,

$$S_i = \{k \mid k_C = i_C\}$$
The values that share the same feature channel are normalized together.
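As a concrete check of these definitions, the sketch below (with an arbitrary made-up tensor) computes $\mu_i$ and $\sigma_i$ for the batch-normalization case, collecting every value that shares feature channel 0:

```python
import torch

# A tiny made-up batch: 2 samples, 2 channels, 2x2 spatial
x = torch.arange(16, dtype=torch.float32).view(2, 2, 2, 2)

# For batch normalization, S_i is every value with the same channel index,
# so for channel 0 the set spans the batch and both spatial dimensions
s_i = x[:, 0, :, :].flatten()
m = s_i.numel()  # m = 2 * 2 * 2 = 8

mu = s_i.sum() / m
sigma = torch.sqrt(((s_i - mu) ** 2).sum() / m + 1e-5)
```

This agrees with what `x.mean(dim=(0, 2, 3))` would give per channel.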
For layer normalization,

$$S_i = \{k \mid k_N = i_N\}$$
The values from the same sample in the batch are normalized together.
For instance normalization,

$$S_i = \{k \mid k_N = i_N, k_C = i_C\}$$
The values from the same sample and same feature channel are normalized together.
Group normalization uses

$$S_i = \left\{ k \;\middle|\; k_N = i_N, \left\lfloor \frac{k_C}{C / G} \right\rfloor = \left\lfloor \frac{i_C}{C / G} \right\rfloor \right\}$$
where G is the number of groups and C is the number of channels.
Group normalization normalizes values of the same sample and the same group of channels together.
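To relate the four index sets to tensor operations, here is a hedged sketch (not from the paper's code) expressing each variant as a mean/variance reduction over different dimensions of an `[N, C, H, W]` tensor; the group case first reshapes to `[N, G, C // G, H, W]`:

```python
import torch

N, C, H, W = 2, 6, 4, 4
G = 2  # number of groups
x = torch.randn(N, C, H, W)

def normalize(t, dims):
    # Normalize over the given dims with the biased variance, as in the definitions above
    mean = t.mean(dim=dims, keepdim=True)
    var = t.var(dim=dims, unbiased=False, keepdim=True)
    return (t - mean) / torch.sqrt(var + 1e-5)

batch_norm = normalize(x, (0, 2, 3))     # reduce over batch and spatial
layer_norm = normalize(x, (1, 2, 3))     # reduce over channels and spatial
instance_norm = normalize(x, (2, 3))     # reduce over spatial only
group_norm = normalize(x.view(N, G, C // G, H, W), (2, 3, 4)).view(N, C, H, W)
```

PyTorch's built-in `BatchNorm2d`, `LayerNorm`, `InstanceNorm2d`, and `GroupNorm` layers implement these same reductions, plus running statistics and affine parameters where applicable.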
Here's a CIFAR 10 classification model that uses group normalization.
```python
import torch
from torch import nn
```
```python
class GroupNorm(nn.Module):
```
`groups` is the number of groups the features are divided into. `channels` is the number of features in the input. `eps` is $\epsilon$, used in $\sqrt{\mathrm{Var}[x^{(k)}] + \epsilon}$ for numerical stability. `affine` is whether to scale and shift the normalized value.

```python
    def __init__(self, groups: int, channels: int, *,
                 eps: float = 1e-5, affine: bool = True):
```
```python
        super().__init__()

        assert channels % groups == 0, "Number of channels should be evenly divisible by the number of groups"
        self.groups = groups
        self.channels = channels

        self.eps = eps
        self.affine = affine
```
Create parameters for $\gamma$ and $\beta$ for scale and shift
```python
        if self.affine:
            self.scale = nn.Parameter(torch.ones(channels))
            self.shift = nn.Parameter(torch.zeros(channels))
```
`x` is a tensor of shape `[batch_size, channels, *]`. `*` denotes any number of (possibly 0) dimensions. For example, in an image (2D) convolution this will be `[batch_size, channels, height, width]`.
```python
    def forward(self, x: torch.Tensor):
```
Keep the original shape
```python
        x_shape = x.shape
```
Get the batch size
```python
        batch_size = x_shape[0]
```
Sanity check to make sure the number of features is the same
```python
        assert self.channels == x.shape[1]
```
Reshape into [batch_size, groups, n]
```python
        x = x.view(batch_size, self.groups, -1)
```
Calculate the mean across the last dimension; i.e. the mean for each sample and channel group $\mathbb{E}[x_{(i_N, i_G)}]$
```python
        mean = x.mean(dim=[-1], keepdim=True)
```
Calculate the mean of the squares across the last dimension; i.e. $\mathbb{E}[x^2_{(i_N, i_G)}]$ for each sample and channel group
```python
        mean_x2 = (x ** 2).mean(dim=[-1], keepdim=True)
```
Variance for each sample and feature group $\mathrm{Var}[x_{(i_N, i_G)}] = \mathbb{E}[x^2_{(i_N, i_G)}] - \mathbb{E}[x_{(i_N, i_G)}]^2$
```python
        var = mean_x2 - mean ** 2
```
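This step uses the identity $\mathrm{Var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2$, i.e. the biased (population) estimator. A quick standalone sketch confirming it matches PyTorch's `var(unbiased=False)`:

```python
import torch

x = torch.randn(2, 3, 8)

mean = x.mean(dim=-1, keepdim=True)
mean_x2 = (x ** 2).mean(dim=-1, keepdim=True)
var = mean_x2 - mean ** 2  # E[x^2] - E[x]^2

# Agrees with the biased (population) variance
reference = x.var(dim=-1, unbiased=False, keepdim=True)
```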
Normalize

$$\hat{x}_{(i_N, i_G)} = \frac{x_{(i_N, i_G)} - \mathbb{E}[x_{(i_N, i_G)}]}{\sqrt{\mathrm{Var}[x_{(i_N, i_G)}] + \epsilon}}$$
```python
        x_norm = (x - mean) / torch.sqrt(var + self.eps)
```
Scale and shift channel-wise $y_{i_C} = \gamma_{i_C} \hat{x}_{i_C} + \beta_{i_C}$
```python
        if self.affine:
            x_norm = x_norm.view(batch_size, self.channels, -1)
            x_norm = self.scale.view(1, -1, 1) * x_norm + self.shift.view(1, -1, 1)
```
Reshape to original and return
```python
        return x_norm.view(x_shape)
```
Simple test
```python
def _test():
```
```python
    from labml.logger import inspect

    x = torch.zeros([2, 6, 2, 4])
    inspect(x.shape)
    bn = GroupNorm(2, 6)

    x = bn(x)
    inspect(x.shape)
```
```python
if __name__ == '__main__':
    _test()
```
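As an extra sanity check (not part of the original code), the manual computation above can be compared against PyTorch's built-in `torch.nn.GroupNorm` with `affine=False`, so only the normalization itself is compared:

```python
import torch

N, C, H, W, G = 2, 6, 2, 4, 2
x = torch.randn(N, C, H, W)

# Manual group normalization, mirroring the forward pass above
xg = x.view(N, G, -1)
mean = xg.mean(dim=-1, keepdim=True)
var = (xg ** 2).mean(dim=-1, keepdim=True) - mean ** 2
manual = ((xg - mean) / torch.sqrt(var + 1e-5)).view(N, C, H, W)

# PyTorch's built-in layer, without the learned scale/shift
builtin = torch.nn.GroupNorm(G, C, eps=1e-5, affine=False)(x)
```

The two should agree up to floating-point error, since both normalize each (sample, group) slice with the biased variance.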