Overview
Because the network output is a function of its weights w, the error E is also
a function of w. In general, E(w) is a multidimensional function
and impossible to visualize. If it could be plotted as a function of w, however, E might look like
a landscape with hills and valleys whose height at each point w is the error E(w). Back-propagation, as an approximation to
gradient descent, could then be viewed as placing a marble at some random point
on the landscape and letting it roll downhill. If the surface were shaped like
a smooth bowl, the marble (the weight state) would always roll to the lowest point;
back-propagation would always find the best solution and local minima would never
be a problem. Usually, of course, the surface is not so simple. Because the shape
of the error surface has a fundamental effect on the learning process, it is useful
to examine some of its properties. Many of the figures that follow are adapted
from Hush, Horne, and Salas [183], [181].
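The marble analogy can be made concrete with a small numerical experiment. The following is a minimal Python sketch, not taken from the text. It runs plain gradient descent (the exact gradient, not back-propagation's approximation of it) on two hypothetical one-weight error functions. The first is a smooth bowl, E(w) = (w - 2)^2. The second is a bumpy quartic, E(w) = w^4 - 3w^2 + w, which has a global minimum near w = -1.30 and a shallower local minimum near w = 1.13. The names `gradient_descent`, `bowl`, and `bumpy`, the learning rate, and the step count are all illustrative assumptions.

```python
# Minimal sketch: plain gradient descent on two hypothetical 1-D "error surfaces".
# Each function stands in for E(w) with a single weight w; both are illustrative only.

def gradient_descent(grad, w0, lr=0.01, steps=1000):
    """Roll the 'marble' downhill: repeatedly step against the gradient."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Smooth bowl: E(w) = (w - 2)^2 has a single minimum at w = 2.
def bowl(w):
    return (w - 2.0) ** 2

def bowl_grad(w):
    return 2.0 * (w - 2.0)

# Bumpy surface: E(w) = w^4 - 3w^2 + w has a global minimum near w = -1.30
# and a shallower local minimum near w = 1.13.
def bumpy(w):
    return w**4 - 3 * w**2 + w

def bumpy_grad(w):
    return 4 * w**3 - 6 * w + 1

if __name__ == "__main__":
    # On the bowl, every starting point reaches the same (best) minimum.
    for start in (-5.0, 0.0, 10.0):
        print(f"bowl  start={start:5.1f} -> w={gradient_descent(bowl_grad, start):.4f}")
    # On the bumpy surface, the end point depends on where the marble starts.
    for start in (-2.0, 2.0):
        w = gradient_descent(bumpy_grad, start)
        print(f"bumpy start={start:5.1f} -> w={w:.4f}, E={bumpy(w):.4f}")
```

Both runs on the bumpy surface follow the downhill direction faithfully. The run started at w = 2, however, settles in the shallower valley, which is a local minimum with a higher error than the one found from w = -2. The learning rate of 0.01 is an arbitrary choice that happens to be small enough to keep both descents stable on these particular functions.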