6.3 Remarks
The preceding sections illustrate some of the effects that learning rate
and momentum have on training and list some hints for recognizing when parameter
changes are needed. It should be emphasized that the illustrations are based on
a single small classification problem. The actual curves are unique to the
example; different curves will be obtained from different networks, training
sets, and initialization conditions. The examples are intended to be
representative in that many problems will show qualitative similarities, but
quantitative differences should be expected.
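For concreteness, the way these two parameters enter the weight update can be sketched on a toy one-dimensional problem. The sketch below is illustrative only; the names eta (learning rate) and mu (momentum) are conventional, and the quadratic loss and constants are assumptions chosen to make the behavior easy to see, not taken from the examples above.

```python
# Gradient descent with momentum on a 1-D quadratic loss
# E(w) = 0.5 * w**2, whose gradient is simply w.

def train(eta=0.1, mu=0.9, steps=300):
    w, v = 5.0, 0.0              # initial weight and velocity
    for _ in range(steps):
        grad = w                 # dE/dw for E(w) = 0.5 * w**2
        v = mu * v - eta * grad  # momentum smooths successive updates
        w += v                   # learning rate scales the step size
    return w

# A moderate learning rate drives w toward the minimum at w = 0;
# a learning rate well past the stability limit makes |w| grow instead.
```

Even on this toy problem the qualitative behavior described above appears: a reasonable eta converges, while an overly large one diverges, and momentum shifts where that boundary lies.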
The main benefit of good parameter choices is faster training and the
avoidance of needlessly poor solutions. As in the more general case
of tuning an optimization method, serious training difficulty may indicate
problems more basic than the optimization parameters. That is, if
careful tuning of the learning rate and momentum is needed to obtain good
solutions, the effort might be better spent reconsidering more fundamental
choices such as the data representation or the network architecture.
The difficulty of choosing a good learning rate a priori is one reason adaptive
learning rate methods are useful and popular. Most adjust their parameters
dynamically as training conditions change, and many assign separate rates to
different parts of the network. A few such methods are summarized in chapter 9.
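To give a flavor of such methods before chapter 9, here is a minimal sketch of one common per-parameter adaptation heuristic: each weight keeps its own rate, which is increased when successive gradients agree in sign and decreased when they disagree. This is a generic sign-based scheme, not a specific method from this book, and all constants (growth and shrink factors, bounds) are illustrative assumptions.

```python
# Per-parameter adaptive learning rates, sign-based heuristic:
# consistent gradient sign -> grow the rate; a sign flip suggests
# overshooting -> shrink it.

def adaptive_descent(grad_fn, w, steps=200,
                     eta0=0.1, up=1.2, down=0.5,
                     eta_min=1e-6, eta_max=1.0):
    n = len(w)
    eta = [eta0] * n          # one learning rate per parameter
    prev = [0.0] * n          # previous gradient, for sign comparison
    for _ in range(steps):
        g = grad_fn(w)
        for i in range(n):
            if g[i] * prev[i] > 0:        # same sign: speed up
                eta[i] = min(eta[i] * up, eta_max)
            elif g[i] * prev[i] < 0:      # sign flip: slow down
                eta[i] = max(eta[i] * down, eta_min)
            w[i] -= eta[i] * g[i]
            prev[i] = g[i]
    return w

# Example: a quadratic bowl with very different curvatures per dimension,
# E(w) = 0.5 * (w[0]**2 + 100 * w[1]**2). Separate rates let the shallow
# direction keep a large step while the steep direction settles on a
# small, stable one.
def grad(w):
    return [w[0], 100.0 * w[1]]
```

The example illustrates why per-parameter rates help: a single global rate small enough for the steep direction would be needlessly slow in the shallow one.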