Chapter 8 - The Error Surface

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

8.2 The Gradient is the Sum of Single-Pattern Gradients

With an SSE or MSE cost function, the E(w) surface is the sum (or average) of the individual surfaces for each pattern, and the total gradient is the sum (or average) of the single-pattern gradients. In other words, the error surface is shaped by the interaction of the weights with each of the individual training patterns. Figure 8.5 shows single-pattern gradients for a simple two-weight problem. These are the vectors that would be used for weight updates in on-line learning. On a "hillside" (a), most of the vectors point in a dominant direction. On a "ridge" (b) or at the bottom of a "valley" (c), there are often two bundles of vectors pointing in opposite directions across the valley. In on-line learning, the weights are updated from just one pattern at a time and thus tend to oscillate across the valley. At a local minimum (d) the vectors sum to zero; they may be large and distributed in all directions, or they may all go to zero. If they simply cancel without going to zero, the minimum will be unstable with on-line learning: the weight vector will move off the minimum if placed there. Point (e) shows a relatively "flat spot". These examples aren't universal since similar E(w) features could be created in many ways, but they are common. Other cost functions may yield different behavior.
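The additivity of the gradient can be checked numerically. The sketch below is not from the book: it assumes a hypothetical two-weight linear model with per-pattern error E_p = 0.5 (t_p - w·x_p)^2 and synthetic data, and the function names are chosen only for illustration. It compares the sum of the single-pattern gradients with a finite-difference gradient of the total SSE, then evaluates the single-pattern gradients at the least-squares minimum, where they cancel without individually vanishing.

```python
import numpy as np

def single_pattern_gradients(w, X, t):
    """One gradient row per pattern for E_p = 0.5 * (t_p - w.x_p)**2."""
    residuals = t - X @ w               # shape (P,)
    return -residuals[:, None] * X      # shape (P, 2)

def total_sse(w, X, t):
    """Total error: the sum of the single-pattern errors."""
    return 0.5 * np.sum((t - X @ w) ** 2)

def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of the gradient of f at w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

# Synthetic two-weight problem (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
t = X @ np.array([1.5, -0.5]) + 0.3 * rng.normal(size=20)

# On a "hillside": the summed single-pattern gradients match the
# gradient of the total error surface.
w = np.array([0.2, 0.8])
per_pattern = single_pattern_gradients(w, X, t)
summed = per_pattern.sum(axis=0)
numeric = numerical_gradient(lambda v: total_sse(v, X, t), w)
print("sum of single-pattern gradients:", summed)
print("finite-difference gradient:     ", numeric)

# At the minimum: the vectors cancel, but because the targets are noisy
# the individual vectors do not go to zero.
w_min, *_ = np.linalg.lstsq(X, t, rcond=None)
at_min = single_pattern_gradients(w_min, X, t)
print("sum at minimum:", at_min.sum(axis=0))
print("largest single-pattern gradient norm at minimum:",
      np.linalg.norm(at_min, axis=1).max())
```

A linear model is used only because its minimum can be found in closed form; the same additivity holds for any network trained with a summed cost.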

Figure 8.5: Single-pattern weight update vectors. With an SSE cost function, the E(w) surface is the sum of individual surfaces for each pattern and the total gradient is the sum of the single-pattern gradients. On a "hillside" (a), most of the vectors point in a dominant direction. On a "ridge" (b) or at the bottom of a "valley" (c), there are often two bundles of vectors pointing in opposite directions. At a local minimum (d) the vectors sum to zero; they may be large and evenly distributed in all directions, or they may all go to zero. Point (e) shows a relatively flat spot.
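The instability of such a minimum under on-line learning can be illustrated with a small experiment. The following is a minimal sketch under assumed conditions, not the authors' procedure: a hypothetical two-weight linear model, synthetic noisy targets, and an arbitrarily chosen learning rate `eta`. The weight vector is placed exactly at the SSE minimum; a batch step leaves it there because the single-pattern gradients cancel, while single-pattern updates move it away.

```python
import numpy as np

# Synthetic two-weight problem (assumed data, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
t = X @ np.array([1.0, -2.0]) + 0.5 * rng.normal(size=30)

def pattern_gradient(w, x, target):
    """Gradient of E_p = 0.5 * (target - w.x)**2 for one pattern."""
    return -(target - x @ w) * x

# Place the weight vector exactly at the minimum of the total SSE.
w_min, *_ = np.linalg.lstsq(X, t, rcond=None)
eta = 0.05  # learning rate, chosen arbitrarily for this sketch

# Batch learning: the single-pattern gradients sum to zero at the
# minimum, so the update leaves the weights where they are.
batch_grad = sum(pattern_gradient(w_min, x, y) for x, y in zip(X, t))
w_batch = w_min - eta * batch_grad
batch_drift = np.linalg.norm(w_batch - w_min)

# On-line learning: each update uses one pattern's gradient, which is
# generally nonzero, so the weights are pushed off the minimum.
w_online = w_min.copy()
max_drift = 0.0
for x, y in zip(X, t):
    w_online = w_online - eta * pattern_gradient(w_online, x, y)
    max_drift = max(max_drift, np.linalg.norm(w_online - w_min))

print("drift after one batch step:      ", batch_drift)
print("largest drift during on-line pass:", max_drift)
```

With noise-free targets every single-pattern gradient would be zero at the minimum, which corresponds to the other case in the caption where the vectors all go to zero and the on-line updates would not move the weights.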