Overview
One of the common complaints about back-propagation is that
it can be very slow. A typical training session may require thousands of
iterations. Large networks with large training sets might take days or weeks to
train. This chapter reviews a number of relatively simple variations of the
basic algorithm that are intended to speed up learning.
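As a concrete illustration of the kind of variation the chapter has in mind, the sketch below contrasts the plain gradient-descent weight update of basic back-propagation with a momentum-augmented update, one widely used acceleration. This is a minimal sketch; the function names and the particular learning-rate and momentum values are illustrative assumptions, not notation taken from the text.

```python
import numpy as np

def plain_step(w, grad, lr=0.1):
    # Basic back-propagation update: move each weight a small step
    # down the error gradient.
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.1, mu=0.9):
    # A common speed-up: accumulate a velocity from past gradients so that
    # consistent descent directions build up speed while oscillations damp out.
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

# Example: a few updates on a simple quadratic error E(w) = 0.5 * ||w||^2,
# whose gradient with respect to w is just w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(5):
    w, v = momentum_step(w, w, v)
```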
It should be noted that factors such as the network structure, the
input-output representation, the choice of error function, and so on often have
a much stronger effect on learning time (possibly by orders of magnitude) than
variations in the optimization method. By the time training begins, however, these
choices have already been made; the goal of the methods described next is to
accelerate learning in a given network with the given data.
Many variations of the basic algorithm have been proposed, and new
ones continue to appear regularly; we will not attempt to summarize them all.
Many are heuristic and somewhat ad hoc; others are grounded in principled
theory. Some are specialized to particular problem types, classification for
example, and do not always work well on other sorts of problems. Some
draw on general optimization techniques specialized to neural network
applications. To appreciate these, it is worth reviewing the classic
optimization techniques (chapter 10).
The sections that follow describe a few methods that have stood up to testing
and seem to work reasonably well on a wide range of problems, along with
some well-known methods that deserve mention, if only to inform the reader who
has heard of them and wonders what is involved.