Chapter 9 - Faster Variations of Back-Propagation

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

9.5 SuperSAB

SuperSAB [372] is another adaptive learning rate method based on the delta-bar-delta heuristics. It builds on an earlier method called SAB, which stands for "self-adapting back propagation." Like the method of Vogl et al. (section 9.4), it both increases and decreases the learning rate multiplicatively.

Parameters include the initial learning rate η_start, an increase factor η+ > 1, and a decrease factor 0 < η- < 1. Each weight w_ij has its own learning rate η_ij(t), which changes with time t. The algorithm is:

  1. Initialize all learning rates η_ij(0) = η_start.

  2. Do a back-propagation step with momentum.

  3. For each weight w_ij

    • if the sign of its derivative is unchanged, increase the learning rate, η_ij(t + 1) = η+ η_ij(t);

    • otherwise (the sign changed), retract the step, w_ij(t + 1) = w_ij(t) - Δw_ij(t), decrease the learning rate, η_ij(t + 1) = η- η_ij(t), and set Δw_ij(t + 1) = 0 so momentum has no effect in the next cycle.

  4. Go to 2.
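
The steps above can be sketched in a few lines of Python. This is only an illustrative reading of the algorithm, not the code of [372]. The names make_state and supersab_step, the momentum value, and the toy error function are all hypothetical. Two details are assumptions: the step is treated as "sign unchanged, rate unchanged" when there is no previous derivative to compare against, and the stored derivative is cleared after a retraction so that the retried step is not penalized a second time.

```python
def make_state(n_weights, eta_start):
    """Per-weight bookkeeping: learning rate, previous gradient, previous step."""
    return {
        "eta": [eta_start] * n_weights,
        "prev_grad": [0.0] * n_weights,
        "prev_dw": [0.0] * n_weights,
    }


def supersab_step(w, grad, state, eta_up=1.2, eta_down=0.5, momentum=0.9):
    """Apply one SuperSAB-style update in place and return the weight list."""
    for i in range(len(w)):
        agreement = grad[i] * state["prev_grad"][i]
        if agreement < 0:
            # Sign changed: undo the last step, shrink the rate, clear momentum.
            w[i] -= state["prev_dw"][i]
            state["eta"][i] *= eta_down
            state["prev_dw"][i] = 0.0
            # Assumption: forget the gradient so the retry is not penalized again.
            state["prev_grad"][i] = 0.0
            continue
        if agreement > 0:
            # Sign unchanged: grow this weight's learning rate.
            state["eta"][i] *= eta_up
        # Gradient step with momentum (also taken when agreement == 0).
        dw = -state["eta"][i] * grad[i] + momentum * state["prev_dw"][i]
        w[i] += dw
        state["prev_dw"][i] = dw
        state["prev_grad"][i] = grad[i]
    return w


if __name__ == "__main__":
    # Toy problem (hypothetical): minimize E(w) = 0.5 * (w0^2 + 10 * w1^2).
    w = [1.0, 1.0]
    state = make_state(len(w), eta_start=0.05)
    for _ in range(50):
        grad = [w[0], 10.0 * w[1]]
        supersab_step(w, grad, state, momentum=0.5)
    print(w, state["eta"])
```

In a real network the gradient list would come from a back-propagation pass over the training set, so the sign comparison is meaningful mainly for batch-mode gradients.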

Typical suggested values are η+ = 1.2 and η- = 0.5. (There appear to be typographical errors in [372]. The description given here follows the explanation accompanying the formula.)

Reported results have been inconsistent. In some cases SuperSAB is among the fastest methods [9]; others have reported it to be very unstable [8]. The possibility of instability, especially when momentum is high, is noted in the original paper. This shows itself as a sudden large increase in the error. Sometimes the error will correct itself in subsequent steps; otherwise a restart may be necessary. Because η increases multiplicatively and can become large quickly, it is reasonable to set limiting values on both η and the maximum allowed weight magnitude. Because of the instability problems and because it does not appear to have major speed advantages, other methods may be preferable in general.
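
To make the point about multiplicative growth concrete, the following sketch counts how many consecutive increases are needed for a learning rate to pass a ceiling, and shows one possible way to clamp the rates and weights. The function names and the limits eta_min, eta_max, and w_max are hypothetical values chosen for illustration; the text does not prescribe specific bounds.

```python
def consecutive_increases_to_exceed(eta_start, eta_max, eta_up=1.2):
    """Count the multiplicative increases needed before eta exceeds eta_max."""
    eta, n = eta_start, 0
    while eta <= eta_max:
        eta *= eta_up
        n += 1
    return n


def apply_limits(etas, weights, eta_min=1e-6, eta_max=10.0, w_max=50.0):
    """Return clamped copies of the per-weight rates and the weights.

    The bounds are illustrative assumptions, not recommended settings.
    """
    clamped_etas = [min(eta_max, max(eta_min, e)) for e in etas]
    clamped_weights = [min(w_max, max(-w_max, w)) for w in weights]
    return clamped_etas, clamped_weights


if __name__ == "__main__":
    # With eta_up = 1.2, a rate of 0.1 grows a hundredfold in a few dozen steps.
    print(consecutive_increases_to_exceed(0.1, 10.0))
    print(apply_limits([1e-9, 0.5, 1000.0], [-80.0, 3.0, 120.0]))
```

Calling apply_limits after each pass over the weights is one simple placement. It bounds how far a runaway learning rate can move a weight, but it does not remove the underlying instability.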