Chapter 9 - Faster Variations of Back-Propagation


9.2 Vogl's Method (Bold Driver)

Vogl et al. [380] describe an adaptive learning rate method where the global learning rate η(t) at time t is updated according to

\[
\eta(t+1) =
\begin{cases}
\phi\,\eta(t), & E(t) < E(t-1)\\
\beta\,\eta(t), & E(t) > E(t-1)\\
\eta(t), & \text{otherwise}
\end{cases}
\tag{9.1}
\]

where φ > 1 and β < 1 are constants. Suggested values are φ = 1.05 and β = 0.7. The name "bold driver" comes from Battiti [27]; there the value β = 0.5 is suggested based on the idea that an increase in E indicates that a minimum has been overstepped and that, on average, it is reasonable to guess the minimum lies halfway between the current and previous weights.
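
A minimal sketch of the rate update in (9.1), written here in Python for illustration; the function name update_learning_rate and the use of a plain comparison between successive errors (rather than a threshold for a "significant" increase) are assumptions of this sketch, not the authors' code:

```python
def update_learning_rate(eta, error, prev_error, phi=1.05, beta=0.7):
    """Bold-driver style update of the global learning rate (cf. eq. 9.1).

    eta        -- current learning rate eta(t)
    error      -- error E(t) after the most recent weight update
    prev_error -- error E(t-1) before that update
    phi, beta  -- growth and shrink factors, phi > 1 and beta < 1
    """
    if error < prev_error:      # error decreased: be bolder
        return eta * phi
    if error > prev_error:      # error increased: back off
        return eta * beta
    return eta                  # error unchanged: keep the rate
```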

In addition to decreasing the learning rate when the error increases significantly, the previous weight change is also retracted and the momentum parameter is reset to α = 0 for the next step. The justification for clearing α is that α > 0 makes the current weight change similar to previous weight changes, while the increase in the error indicates the need for a change of direction. α is restored to its normal value after a successful step is taken.
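
The following sketch combines the rate update with the retraction and momentum-reset logic just described. The names (bold_driver_step, grad_fn, error_fn, the state dictionary) are assumptions made for this example, and a "significant" error increase is simplified to any increase; it is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def bold_driver_step(w, prev_dw, grad_fn, error_fn, state,
                     phi=1.05, beta=0.7):
    """One gradient-descent step with Vogl-style adaptation (sketch).

    state holds 'eta' (learning rate), 'alpha' (current momentum),
    'alpha_normal' (nominal momentum), and 'prev_error' (last accepted E).
    """
    # Proposed update: steepest descent plus momentum on the previous step.
    dw = -state["eta"] * grad_fn(w) + state["alpha"] * prev_dw
    w_new = w + dw
    err = error_fn(w_new)

    if err > state["prev_error"]:
        # Error increased: retract the step, shrink eta, clear momentum.
        w_new = w
        dw = np.zeros_like(w)
        state["eta"] *= beta
        state["alpha"] = 0.0
    else:
        # Successful step: grow eta on a decrease, restore normal momentum.
        if err < state["prev_error"]:
            state["eta"] *= phi
        state["alpha"] = state["alpha_normal"]
        state["prev_error"] = err

    return w_new, dw
```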

In [380], learning speed increased by factors of about 2.5 and 30 on two test problems. A similar method without momentum compared unfavorably with conjugate gradient training on parity problems of various sizes in [27]; there it appears to give results similar to normal back-propagation with an optimally tuned fixed learning rate, but without the need to search for that rate.

The method was empirically compared to a number of other methods on a single test problem by Alpsan et al. [9]. In one case, learning was stopped as soon as all patterns were correctly classified (all outputs on all patterns correct within a tolerance of 0.1 of the target values). With high momentum, it had about the same speed as optimally tuned back-propagation, but generalization was not as good. Generalization was better without momentum, but then learning was much slower than regular back-propagation. In a second case, where the convergence criteria required the outputs to essentially match the target values, the method converged whereas plain back-propagation did not, but it was not among the fastest methods. In an earlier test by the same authors, it was said to be somewhat unstable and no easier to tune than plain back-propagation.