One of the common complaints about back-propagation is that it can be very slow. A typical training session may require thousands of iterations. Large networks with large training sets might take days or weeks to train. This chapter reviews a number of relatively simple variations of the basic algorithm that are intended to speed up learning.
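To make the "thousands of iterations" concrete, the following is a minimal sketch (not code from this text) of plain batch back-propagation on the XOR problem with a small sigmoid network. The network size (2-4-1), learning rate, iteration count, and random seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # XOR inputs
T = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets

# Small 2-4-1 sigmoid network; sizes, learning rate, and seed are
# illustrative assumptions, not values taken from the text.
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for step in range(20000):          # thousands of sweeps over the training set
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((y - T) ** 2)))
    # Backward pass: delta terms for mean-squared error with sigmoid units.
    dy = (y - T) * y * (1.0 - y)
    dh = (dy @ W2.T) * h * (1.0 - h)
    # Plain gradient-descent weight updates (the basic algorithm).
    W2 -= lr * h.T @ dy; b2 -= lr * dy.sum(axis=0)
    W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(axis=0)

print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.5f}")
```

Even on this four-pattern toy problem, error decreases only gradually over many sweeps; the acceleration methods discussed in this chapter aim to reduce that iteration count.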
It should be noted that choices such as the network structure, the input-output representation, and the error function often have much stronger effects on learning time (possibly by orders of magnitude) than the choice of optimization method. By the time training begins, however, these choices have already been made; the goal of the methods described next is to accelerate learning in a given network with the given data.
Many variations of the basic algorithm have been proposed and new ones continue to appear regularly. We will not attempt to summarize them all. Many methods are heuristic and somewhat ad hoc; others are founded on principled theory. Some are specialized to certain problem types, for example, classification, and do not always work well on other sorts of problems. Some draw on general optimization techniques specialized to neural network applications. To appreciate these, it is worth reviewing the classic optimization techniques (chapter 10).
Next, a few methods are listed that have stood up to testing and seem to work reasonably well on a wide range of problems. Also listed are some well-known methods that deserve mention if only to inform readers who have heard of them and wonder what is involved.