
Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology

Chapter 10: Classical Optimization Techniques

Overview

Terms like training and learning are often used in an artificial neural network context to describe the process of adapting a set of parameters (the weights) to achieve a specific technical result (minimization of the error on a training set). As such, learning can be viewed as a general optimization problem that may be addressed by numerous techniques.
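To make this view explicit, the quantity being minimized can be written as a function of the weights alone. A common choice, used here purely as an illustration, is the sum-of-squares error over the training set

    E(\mathbf{w}) \;=\; \frac{1}{2} \sum_{p} \sum_{k} \bigl( t_{pk} - y_{pk}(\mathbf{w}) \bigr)^{2}

where p indexes training patterns, k indexes output nodes, t_pk is the desired output for pattern p, and y_pk(w) is the corresponding network output produced with weights w. Training then amounts to searching weight space for a minimizer of E, and any general-purpose optimization method is, in principle, a candidate for the job.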

Back-propagation is by far the most commonly used method for training MLP neural networks. Batch-mode back-propagation with a small learning rate (and no momentum) is a specialization of gradient descent to MLPs, while on-line back-propagation is related to stochastic gradient descent. Gradient descent is not highly regarded in the optimization community, however, mainly because of its slow rate of convergence. This is a particular problem when the Hessian matrix is poorly conditioned, that is, when the gradient changes quickly in some directions and slowly in others, as it does in so-called ravines of the error surface.
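The sketch below is not from the book; the toy linear least-squares model, data, and learning rates are illustrative assumptions chosen to make the correspondence concrete. The batch routine accumulates the gradient over every training pattern before taking a step, as batch-mode back-propagation does, while the on-line routine updates the weights after each pattern, as stochastic gradient descent does. The unequal input scales also make the Hessian ill-conditioned, so the stable batch learning rate is set by the steep direction and progress along the shallow "ravine" direction is slow.

    # Toy sketch: batch vs. on-line gradient descent on a linear least-squares
    # problem.  All names and values here are illustrative, not from the book.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2)) * np.array([10.0, 1.0])  # unequal input scales
    w_true = np.array([0.5, -2.0])                          # "teacher" weights
    t = X @ w_true                                          # training targets

    def grad_pattern(w, x, t_p):
        # Gradient of the single-pattern error 0.5 * (t_p - x.w)^2
        return (x @ w - t_p) * x

    def batch_descent(w, lr=0.01, epochs=100):
        # Batch mode: average the gradient over all patterns, then step once.
        for _ in range(epochs):
            g = sum(grad_pattern(w, x, t_p) for x, t_p in zip(X, t)) / len(X)
            w = w - lr * g
        return w

    def online_descent(w, lr=0.001, epochs=100):
        # On-line mode: update after every pattern (stochastic gradient descent).
        for _ in range(epochs):
            for x, t_p in zip(X, t):
                w = w - lr * grad_pattern(w, x, t_p)
        return w

    w0 = np.zeros(2)
    # With unequal input scales the error surface is a "ravine": the stable
    # batch learning rate is dictated by the steep direction, so the weight
    # along the shallow direction approaches its target value only slowly.
    print("batch  :", batch_descent(w0))
    print("on-line:", online_descent(w0))

Scaling the first input column by ten makes one curvature direction roughly a hundred times steeper than the other, which is exactly the poor conditioning described above: the batch estimate of the first weight settles almost immediately, while the second creeps toward its target over many epochs.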

Optimization is a mature field and many algorithms, some quite sophisticated, have been developed over the years. In appropriate circumstances they may be better alternatives to back-propagation: many converge much faster than gradient descent, while others promise a higher probability of convergence to a global minimum. This chapter reviews some of the standard methods. Aside from the performance improvements they might provide, familiarity with these techniques is useful for understanding back-propagation and its variants, many of which draw on ideas from the standard methods.