Chapter 10 - Classical Optimization Techniques

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

10.3 Line Search

In many methods, the choice of a search direction is treated separately from the problem of how far to move along that direction. A number of methods, in fact, are defined entirely by the way they choose the search direction; the existence of a perfect line search is simply assumed. Line search routines are basically one-dimensional optimization methods that find a minimum in a given interval. In general, they are a subcomponent of a larger overall algorithm, which applies them to find the minimum along a given line in a higher-dimensional space.
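
To make the idea concrete, the following sketch (in Python, with an illustrative restrict_to_line helper and a made-up quadratic objective; none of these names come from the text) shows how a multidimensional objective f is reduced to a one-dimensional function phi(alpha) = f(x + alpha d), which a line search routine can then minimize over alpha.

    import numpy as np

    # Hypothetical helper: the objective f restricted to the line through x in direction d.
    def restrict_to_line(f, x, d):
        def phi(alpha):
            return f(x + alpha * d)
        return phi

    # Illustrative quadratic bowl in two dimensions.
    f = lambda p: (p[0] - 1.0) ** 2 + 4.0 * (p[1] + 2.0) ** 2
    x = np.array([0.0, 0.0])      # current point
    d = np.array([1.0, -2.0])     # a descent direction at x (chosen by hand here)
    phi = restrict_to_line(f, x, d)
    print(phi(0.0), phi(0.5), phi(1.0))   # samples of f along the line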

The efficiency of the line search routine can be critical since, in many cases, the calculations needed to compute the search direction are relatively minor and most of the computation time will be spent in the line search. In general, there is a trade-off between the efficiency of the algorithm (the number of function and gradient evaluations required) and the precision with which the minimum can be located; more precision calls for more evaluations. General multidimensional algorithms that can tolerate some inexactness in the line search routine are preferred for this reason. If a lot is known about the function, then evaluation at just a few points may be enough to locate the minimum exactly; for a quadratic function, for example, three points determine the parabola and thus its minimum. Such cases are rare, however, because most functions are not so simple. In a typical case, a line search may require 10 to 20 function (and/or gradient) evaluations, depending on the nonlinearity of the function and the precision demanded. In some cases, many more evaluations may be needed.
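
As an illustration of the quadratic case, the sketch below (the parabola_min helper and the test function are purely illustrative) computes the vertex of the parabola passing through three sampled points; for a genuinely quadratic function this single interpolation step yields the exact minimizer.

    def parabola_min(a, b, c, fa, fb, fc):
        # Vertex of the parabola through (a, fa), (b, fb), (c, fc).
        # For a truly quadratic function this is the exact minimizer.
        num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
        den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        return b - 0.5 * num / den   # caller must guard against den == 0

    # Quadratic example: f(t) = (t - 2)^2 has its minimum at t = 2.
    f = lambda t: (t - 2.0) ** 2
    a, b, c = 0.0, 1.0, 3.0
    print(parabola_min(a, b, c, f(a), f(b), f(c)))   # -> 2.0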

There are many different line search methods varying in efficiency and robustness. We will not describe them here since they are covered well in many optimization texts. As in the general case, there are specialized methods that may be very efficient in situations where they are appropriate, but may not work well in other cases. Methods using gradient information are usually more efficient than evaluation-only methods if the gradient calculation is inexpensive (as it is in neural network simulations). Whether the gradient calculation is worth the savings in function evaluations can be problem-dependent, however. If gradient information is used in the line search routine, it certainly pays to also use it in the computation of the search direction.
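
One simple way to exploit gradient information in a line search, sketched below under the assumption that the gradient grad_f is cheap to evaluate (the function and variable names are illustrative, not from the text), is to bisect on the sign of the directional derivative phi'(alpha) = d · ∇f(x + alpha d), which vanishes at a minimum along the line.

    import numpy as np

    def directional_derivative(grad_f, x, d, alpha):
        # phi'(alpha) = d . grad f(x + alpha d): slope of f along the line.
        return np.dot(d, grad_f(x + alpha * d))

    def bisection_line_search(grad_f, x, d, lo, hi, tol=1e-6):
        # Locate a zero of the directional derivative by bisection.
        # Assumes phi'(lo) < 0 and phi'(hi) > 0, so a minimum lies in [lo, hi].
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if directional_derivative(grad_f, x, d, mid) > 0.0:
                hi = mid          # minimum lies to the left of mid
            else:
                lo = mid          # minimum lies to the right of mid
        return 0.5 * (lo + hi)

    # Gradient of the quadratic bowl f(p) = (p0 - 1)^2 + 4 (p1 + 2)^2, in closed form.
    grad_f = lambda p: np.array([2.0 * (p[0] - 1.0), 8.0 * (p[1] + 2.0)])
    x = np.array([0.0, 0.0])
    d = np.array([1.0, -2.0])                     # a descent direction at x
    print(bisection_line_search(grad_f, x, d, 0.0, 2.0))   # -> ~1.0, the minimum along the line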

In most methods, the first task is to find an interval that brackets a minimum. This calls for three points in which the interior point is lower than either end point; unless one is lucky, finding such a bracket will usually require more than three function evaluations. Given a bracketing interval, the next task is to locate the minimum. One approach is to iteratively subdivide the interval until the uncertainty is tolerable; each step reduces the uncertainty by roughly half on average. The other main approach is to approximate the function on the interval (e.g., with a parabola) and then estimate the minimum location analytically. The first method is generally more robust, but the second can be much faster. Some practical algorithms start with the first method and then switch to the second; Brent's method [302] is a common choice.
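
The sketch below illustrates the interval-subdivision approach using golden-section steps once a bracketing interval is in hand; the function name, test function, and tolerance are illustrative, and a careful implementation would reuse one of the two interior evaluations per iteration rather than recomputing both.

    import math

    def golden_section(phi, a, c, tol=1e-5):
        # Shrink a bracketing interval [a, c] around a minimum of phi.
        # Each iteration discards a fixed fraction of the interval, so the
        # uncertainty decreases geometrically, regardless of the shape of phi.
        invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1 / golden ratio, about 0.618
        while c - a > tol:
            x1 = c - invphi * (c - a)           # interior probe nearer a
            x2 = a + invphi * (c - a)           # interior probe nearer c
            if phi(x1) < phi(x2):
                c = x2                           # minimum lies in [a, x2]
            else:
                a = x1                           # minimum lies in [x1, c]
        return 0.5 * (a + c)

    # The minimum of (t - 2)^2 on [0, 5] is at t = 2.
    print(golden_section(lambda t: (t - 2.0) ** 2, 0.0, 5.0))

Implementations of Brent's method, which combines subdivision with parabolic interpolation, are available in standard numerical libraries; in SciPy, for instance, scipy.optimize.minimize_scalar(phi, method='brent') performs such a one-dimensional minimization.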