Overview
Terms like training and learning are often used in an
artificial neural network context to describe the process of adapting a set of
parameters (the weights) to achieve a specific technical result (minimization of
the error on a training set). As such, learning can be viewed as a general
optimization problem that may be addressed by numerous techniques.
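To make the optimization view concrete, consider the common case of a sum-of-squared-errors cost (the notation here is illustrative rather than taken from the text): with weight vector w, training patterns indexed by p, targets t_{pk}, and network outputs y_{pk}(w), training seeks

    E(w) = \tfrac{1}{2} \sum_{p} \sum_{k} \bigl( t_{pk} - y_{pk}(w) \bigr)^{2},
    \qquad
    w^{*} = \arg\min_{w} E(w).

Any algorithm capable of minimizing such a function of the weights is, in principle, a candidate training method.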
Back-propagation is by far the most commonly used method for
training MLP neural networks. Batch-mode back-propagation with a small learning
rate (and no momentum) is a specialization of gradient descent to MLPs, while
on-line back-propagation is related to stochastic gradient descent. Gradient
descent, however, is not highly regarded in the optimization community, mainly
because of its slow rate of convergence. This is a particular problem when the
Hessian matrix is poorly conditioned, that is, when the gradient changes quickly
in some directions and slowly in others, as it does in so-called ravines of the
error surface.
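As a sketch of this relationship, assume the sum-of-squared-errors cost E(w) above, written as a sum of per-pattern terms E_p(w), and a learning rate \eta (again, illustrative symbols rather than the chapter's own notation). Batch-mode back-propagation accumulates the gradient over the entire training set before updating the weights, whereas on-line back-propagation updates after every pattern:

    \Delta w = -\eta \, \nabla E(w) = -\eta \sum_{p} \nabla E_{p}(w)
    \quad \text{(batch, i.e. gradient descent)},

    \Delta w = -\eta \, \nabla E_{p}(w)
    \quad \text{(on-line, i.e. stochastic gradient descent)}.

Near a minimum, where E is approximately quadratic, the convergence rate of such updates is governed by the condition number of the Hessian, the ratio of its largest to smallest eigenvalue: a step size small enough to remain stable along the steep directions of a ravine makes negligible progress along the shallow ones.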
Optimization is a mature field and many algorithms, some
quite sophisticated, have been developed over the years. In appropriate
circumstances, they may be better alternatives to back-propagation. Many
converge much faster than gradient descent in certain situations, while others
promise a higher probability of convergence to global minima. This chapter
reviews some of the standard methods. Aside from the performance improvements they
might provide, familiarity with these techniques is useful for understanding
back-propagation and its variants, many of which draw on ideas from the standard
methods.