docs/basic/batchsize_decay_en.md
Following TensorFlow, Angel implements a variety of learning rate decay schedules from which users can choose freely. Before describing the decay schedules in detail, let's first look at how decay is introduced in Angel and when the learning rate is actually decayed.
Decay is introduced into the Graph in the GraphLearner class. The following code is executed during initialization:

```scala
val ssScheduler: StepSizeScheduler = StepSizeScheduler(SharedConf.getStepSizeScheduler, lr0)
```
StepSizeScheduler is the base class of all decay schedules, and the object of the same name is the factory for all decay instances. SharedConf.getStepSizeScheduler obtains the configured decay type by reading the value of ml.opt.decay.class.name (StandardDecay by default).
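The scheduler abstraction can be sketched as follows. This is a simplified assumption for illustration only: the real Angel trait and factory live in the angel-ml core module and cover every decay class, not just the one case shown here.

```scala
// Hedged sketch of the scheduler abstraction; member names follow the
// WarmRestarts code shown later in this page.
trait StepSizeScheduler {
  def next(): Double              // learning rate to use for the next update
  def isIntervalBoundary: Boolean // true exactly at a cycle boundary
}

object StepSizeScheduler {
  // Dispatch on the name read from ml.opt.decay.class.name.
  // Only one case is sketched; the real factory covers all decay classes.
  def apply(name: String, lr0: Double): StepSizeScheduler = name match {
    case "ConstantLearningRate" => new StepSizeScheduler {
      override def next(): Double = lr0
      override def isIntervalBoundary: Boolean = false
    }
    case other => throw new IllegalArgumentException(s"unknown decay class: $other")
  }
}
```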
For the second question, i.e. when the learning rate is decayed, Angel provides two options controlled by the ml.opt.decay.on.batch parameter: if it is true, the learning rate is decayed after every mini-batch; if it is false, the rate is decayed once per epoch.
The detailed code can be found in the train and trainOneEpoch methods of the GraphLearner class.
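The two call sites can be sketched as below. The real logic lives in GraphLearner.train and GraphLearner.trainOneEpoch; the loop shape and names here are simplified assumptions, and the stand-in scheduler just counts how often the rate is decayed.

```scala
// Hedged sketch of the two call sites controlled by ml.opt.decay.on.batch.
val decayOnBatch = true        // value of ml.opt.decay.on.batch
var lr = 0.001                 // current learning rate, starting at lr0
var decaySteps = 0             // how many times the scheduler was consulted

def nextLR(): Double = { decaySteps += 1; lr } // stand-in for ssScheduler.next()

val numEpochs = 2
val batchesPerEpoch = 3

for (epoch <- 0 until numEpochs) {
  for (batch <- 0 until batchesPerEpoch) {
    if (decayOnBatch) lr = nextLR() // true: decay once per mini-batch
    // ... run one mini-batch of training with lr ...
  }
  if (!decayOnBatch) lr = nextLR()  // false: decay once per epoch
}

// with per-batch decay the scheduler advances once for every mini-batch
assert(decaySteps == numEpochs * batchesPerEpoch)
```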
ConstantLearningRate is the simplest decay: the learning rate stays unchanged throughout training, with no decay at all.
Configuration example:
```
ml.opt.decay.class.name=ConstantLearningRate
```
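The behavior is trivial to sketch. The class shape below is an assumption (in Angel it extends StepSizeScheduler): next() simply echoes the initial rate forever.

```scala
// Minimal standalone sketch of the assumed ConstantLearningRate shape.
class ConstantLearningRate(lr0: Double) {
  def next(): Double = lr0             // the rate never changes
  def isIntervalBoundary: Boolean = false
}

val clr = new ConstantLearningRate(0.001)
assert((1 to 5).map(_ => clr.next()).forall(_ == 0.001)) // identical at every step
```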
StandardDecay is the standard decay schedule. The formula is:

$$\eta_t = \frac{\eta_0}{\sqrt{1 + \alpha \cdot t}}$$

where $\eta_0$ is the initial learning rate, $t$ is the decay step, and $\alpha$ is the decay rate configured via ml.opt.decay.alpha.
Configuration example:
```
ml.opt.decay.class.name=StandardDecay
ml.opt.decay.alpha=0.001
```
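Assuming the inverse-square-root form above (which matches the way etaMax is shrunk in the WarmRestarts code later on this page), the schedule can be written as a pure function; standardDecay is a name chosen for this sketch, not an Angel API.

```scala
// Sketch of the assumed StandardDecay rule: eta_t = eta0 / sqrt(1 + alpha * t).
def standardDecay(eta0: Double, alpha: Double, t: Int): Double =
  eta0 / math.sqrt(1.0 + alpha * t)

// with alpha = 0.001 the rate halves after 3000 decay steps,
// since sqrt(1 + 0.001 * 3000) = sqrt(4) = 2
assert(math.abs(standardDecay(0.001, 0.001, 3000) - 0.0005) < 1e-12)
```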
CorrectionDecay is explicitly designed for Momentum. Please do not use it with other optimizers such as Adam. The formula is:

$$\eta_t = \frac{\eta_0}{\sqrt{1 + \alpha \cdot t}} \cdot \frac{1-\beta}{1-\beta^{t}}$$
The first factor of the product is exactly StandardDecay; the second factor is the correction designed for Momentum, which is the reciprocal of the geometric sum of the momentum coefficients. The $\beta$ must be equal to the momentum used in the optimizer and can generally be set to 0.9.
There are two things to note when using CorrectionDecay:
- it can only be used with the Momentum optimizer, not with other optimizers such as Adam;
- ml.opt.decay.beta must be equal to the momentum coefficient used in the optimizer.
Configuration example:
```
ml.opt.decay.class.name=CorrectionDecay
ml.opt.decay.alpha=0.001
ml.opt.decay.beta=0.9
```
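Combining the two factors gives the sketch below; the closed form assumed here follows the description above, and correctionDecay is a name chosen for the sketch, not an Angel API.

```scala
// Sketch of the assumed CorrectionDecay rule:
// eta_t = eta0 / sqrt(1 + alpha * t) * (1 - beta) / (1 - beta^t)
def correctionDecay(eta0: Double, alpha: Double, beta: Double, t: Int): Double = {
  val standard = eta0 / math.sqrt(1.0 + alpha * t)
  // reciprocal of the geometric sum 1 + beta + beta^2 + ... + beta^(t-1)
  val correction = (1.0 - beta) / (1.0 - math.pow(beta, t))
  standard * correction
}

// at t = 1 the correction factor is 1, so the two schedules start out identical;
// as t grows it approaches 1 - beta, scaling the rate down 10x for beta = 0.9
assert(math.abs(correctionDecay(0.001, 0.001, 0.9, 1) - 0.001 / math.sqrt(1.001)) < 1e-15)
```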
WarmRestarts is a more advanced decay schedule and is representative of cyclical decay. The standard calculation formula is as follows:

$$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max}-\eta_{min})\left(1+\cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right)$$

where $T_{cur}$ is the number of steps since the last restart and $T_i$ is the length of the current cycle.
We make the following improvements to the standard formula:
- after each restart the cycle length $T_i$ is doubled, so later cycles last longer;
- after the $n$-th restart, $\eta_{max}$ is divided by $\sqrt{1 + n \cdot \alpha}$, so each cycle restarts from a lower peak learning rate.
Configuration example:
```
ml.opt.decay.class.name=WarmRestarts
ml.opt.decay.alpha=0.001
```
The initial cycle length is configured via ml.opt.decay.intervals (100 by default); the implementation is:
```scala
class WarmRestarts(var etaMax: Double, etaMin: Double, alpha: Double) extends StepSizeScheduler {
  var current: Double = 0   // steps since the last restart
  var numRestart: Int = 0   // how many restarts have happened so far
  var interval: Int = SharedConf.get().getInt(MLConf.ML_OPT_DECAY_INTERVALS, 100)

  override def next(): Double = {
    current += 1
    // cosine annealing from etaMax down to etaMin within the current cycle
    val value = etaMin + 0.5 * (etaMax - etaMin) * (1 + math.cos(current / interval * math.Pi))

    if (current == interval) { // end of the cycle: restart
      current = 0
      interval *= 2            // the next cycle is twice as long
      numRestart += 1
      // shrink the peak rate with the standard decay factor
      etaMax = etaMax / math.sqrt(1.0 + numRestart * alpha)
    }

    value
  }

  override def isIntervalBoundary: Boolean = {
    current == 0
  }
}
```
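To see the restart behavior concretely, here is a self-contained variant of the class above: the initial interval is passed as a constructor parameter instead of being read from SharedConf, an assumption made only so the sketch runs without Angel on the classpath.

```scala
// Self-contained variant of WarmRestarts for demonstration purposes.
class WarmRestartsSketch(var etaMax: Double, etaMin: Double, alpha: Double,
                         var interval: Int) {
  var current: Double = 0
  var numRestart: Int = 0

  def next(): Double = {
    current += 1
    val value = etaMin + 0.5 * (etaMax - etaMin) *
      (1 + math.cos(current / interval * math.Pi))
    if (current == interval) {
      current = 0
      interval *= 2
      numRestart += 1
      etaMax = etaMax / math.sqrt(1.0 + numRestart * alpha)
    }
    value
  }

  def isIntervalBoundary: Boolean = current == 0
}

val ws = new WarmRestartsSketch(etaMax = 0.001, etaMin = 0.0, alpha = 0.001, interval = 4)
val firstCycle = (1 to 4).map(_ => ws.next())

assert(math.abs(firstCycle.last) < 1e-15) // the cycle ends at etaMin
assert(ws.isIntervalBoundary)             // we are exactly at a restart boundary
assert(ws.interval == 8)                  // the next cycle is twice as long
```

After the first restart the peak rate etaMax has also been divided by $\sqrt{1 + 1 \cdot \alpha}$, so the second cycle starts slightly lower than the first.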