
Chapter 5 - Back-Propagation

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

5.4 Common Modifications

5.4.1 Momentum

A common modification of the basic weight update rule is the addition of a momentum term. The idea is to stabilize the weight trajectory by making the weight change a combination of the gradient descent term of equation 5.23 plus a fraction of the previous weight change. The modified weight change formula is

Δw(t) = -η ∂E/∂w(t) + α Δw(t-1)     (5.25)

That is, the weight change Δw(t) is a combination of a step down the negative gradient, -η ∂E/∂w(t), plus a fraction 0 ≤ α < 1 of the previous weight change. Typical values are 0 ≤ α < 0.9.

This gives the system a certain amount of inertia since the weight vector will tend to continue moving in the same direction unless opposed by the gradient term. Effects of momentum are considered in more detail in section 6.2. Briefly, momentum tends to damp oscillations in the weight trajectory and accelerate learning in regions where ∂E/∂w is small.
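
As a concrete illustration, the following is a minimal sketch of the momentum update of equation 5.25 written with NumPy. The function and parameter names (descend_with_momentum, grad_fn, eta, alpha, steps) are illustrative choices, not notation from the text.

    import numpy as np

    def descend_with_momentum(grad_fn, w0, eta=0.1, alpha=0.9, steps=100):
        w = np.asarray(w0, dtype=float)
        delta_w = np.zeros_like(w)          # previous weight change, Δw(t-1)
        for _ in range(steps):
            # Δw(t) = -η ∂E/∂w(t) + α Δw(t-1)
            delta_w = -eta * grad_fn(w) + alpha * delta_w
            w = w + delta_w
        return w

    # Example: E(w) = ||w||^2 / 2, whose gradient is simply w.
    w_final = descend_with_momentum(lambda w: w, w0=[3.0, -2.0])

Because the previous change Δw(t-1) is folded into each new step, successive steps in the same direction reinforce one another, which is the inertia effect described above.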

5.4.2 Weight Decay

Another common modification of the weight update rule is the addition of a weight decay term. Weight decay is sometimes used to help adjust the complexity of the network to the difficulty of the problem. The idea is that if the network is overly complex, then it should be possible to delete many weights without increasing the error significantly. One way to do this is to give the weights a tendency to drift to zero by reducing their magnitudes slightly at each iteration. The update rule with weight decay is then

Δw_i(t) = -η ∂E/∂w_i - ρ w_i(t)     (5.26)

where 0 < ρ << 1 is the weight decay parameter. If ∂E/∂w_i = 0 for some w_i, then w_i will decay to zero exponentially. Otherwise, if the weight really is necessary, then ∂E/∂w_i will be nonzero and the two terms will balance at some point, preventing the weight from decaying to zero. Weight decay is considered in more detail in sections 6.2.4 and 16.5 and in chapter 13.
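
A similarly minimal sketch of the weight decay update of equation 5.26 follows; again the names (descend_with_weight_decay, grad_fn, rho) are illustrative. When the gradient term is zero, each iteration multiplies a weight by (1 - ηρ/η) in effect, i.e. shrinks it by the small factor ρ, so unused weights decay geometrically toward zero.

    import numpy as np

    def descend_with_weight_decay(grad_fn, w0, eta=0.1, rho=1e-3, steps=100):
        w = np.asarray(w0, dtype=float)
        for _ in range(steps):
            # Δw_i(t) = -η ∂E/∂w_i - ρ w_i(t): gradient step plus a small shrink
            w = w - eta * grad_fn(w) - rho * w
        return w

    # With a zero gradient, every weight decays toward zero geometrically.
    w_final = descend_with_weight_decay(lambda w: np.zeros_like(w), w0=[1.0, -1.0])

For weights that the error surface actually needs, the gradient term opposes the decay term and the weight settles where the two balance, as described above.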