10.3 Line Search
In many methods, the choice of a search direction is treated
separately from the problem of how far to move along that direction. A number of
methods, in fact, are defined by the way they choose the search direction, with
the existence of a perfect line search simply assumed. Line search routines are
essentially one-dimensional optimization methods that find the minimum of a
function within a given interval. In general, they are a subcomponent of a
larger overall algorithm, which applies them to find the minimum along a given
line in a higher dimensional space.
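To make this concrete, the following sketch (in Python) restricts a multidimensional function to the line through the current point along the search direction; the names f, x, d, and phi are illustrative, not part of any particular method.

    import numpy as np

    def restrict_to_line(f, x, d):
        # Reduce f to the one-dimensional function phi(alpha) = f(x + alpha * d)
        # along the search direction d from the current point x.
        def phi(alpha):
            return f(x + alpha * d)
        return phi

    # Illustrative use: a quadratic bowl and the steepest-descent direction.
    f = lambda x: 0.5 * x @ x
    x0 = np.array([3.0, -2.0])
    d = -x0                            # steepest-descent direction at x0
    phi = restrict_to_line(f, x0, d)
    print(phi(0.0), phi(1.0))          # phi(1.0) = f(0) is the exact minimum here

The line search then minimizes phi over alpha, and the outer algorithm takes the step x + alpha * d.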
The efficiency of the line search routine can be critical since,
in many cases, the calculations needed to compute the search direction are
relatively minor and most of the computation time will be spent in the line
search. In general, there is a trade-off between the efficiency of the algorithm
(the number of function and gradient evaluations required) and the precision
with which the minimum can be located; more precision calls for more
evaluations. General multidimensional algorithms that can tolerate some
inexactness in the line search routine are preferred for this reason. If a lot
is known about the function, evaluation at just a few points may be enough to
locate the minimum exactly; for example, three points suffice for a quadratic
function. Functions are rarely so simple, however. In a typical case, a line
search may require 10 to 20 function (and/or gradient) evaluations, depending
on the nonlinearity of the function and the precision demanded; in some cases,
many more may be needed.
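The quadratic case can be made explicit: the vertex of the parabola through three points has a closed form, so a single interpolation step recovers the minimizer exactly. The sketch below uses the standard three-point formula; the names are illustrative.

    def parabolic_minimum(a, fa, b, fb, c, fc):
        # Abscissa of the vertex of the parabola through (a, fa), (b, fb),
        # (c, fc); exact when the underlying function is quadratic.
        num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
        den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        return b - 0.5 * num / den

    # For phi(alpha) = (alpha - 2)^2 + 1, any three distinct points
    # recover the minimizer alpha = 2 exactly.
    phi = lambda alpha: (alpha - 2.0) ** 2 + 1.0
    print(parabolic_minimum(0.0, phi(0.0), 1.0, phi(1.0), 3.0, phi(3.0)))   # 2.0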
There are many different line search methods varying in efficiency
and robustness. We will not describe them here since they are covered well in
many optimization texts. As in the general case, there are specialized methods
that can be very efficient where their assumptions hold, but may perform poorly
otherwise. Methods using gradient information are usually more
efficient than evaluation-only methods if the gradient calculation is
inexpensive (as it is in neural network simulations). Whether the gradient
calculation is worth the savings in function evaluations can be
problem-dependent, however. If gradient information is used in the line search
routine, it certainly pays to also use it in the computation of the search
direction.
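As a concrete instance of a gradient-based line search, one can bisect on the sign of the directional derivative phi'(alpha). The sketch below assumes a bracket [a, b] over which the slope changes from negative to positive; the names dphi and tol are illustrative.

    def bisect_on_slope(dphi, a, b, tol=1e-6):
        # Assumes dphi(a) < 0 < dphi(b), so a minimum of phi lies in [a, b].
        # Each derivative evaluation halves the interval of uncertainty.
        while b - a > tol:
            m = 0.5 * (a + b)
            if dphi(m) < 0.0:
                a = m
            else:
                b = m
        return 0.5 * (a + b)

    # For phi(alpha) = (alpha - 2)^2 + 1, the slope is dphi(alpha) = 2 (alpha - 2).
    dphi = lambda alpha: 2.0 * (alpha - 2.0)
    print(bisect_on_slope(dphi, 0.0, 5.0))   # approximately 2.0

Here dphi(alpha) is the dot product of the gradient at x + alpha * d with the direction d, so in a neural network setting each evaluation costs one backward pass.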
In most methods, the first task is to find an interval that
brackets a minimum. This calls for three points in which the interior point is
lower than either end point; unless you are lucky, finding such a triple will
usually require more than three function evaluations. Given a bracketing
interval, the next
task is to locate the minimum. One approach is to iteratively subdivide the
interval until the uncertainty is tolerable; each step reduces the uncertainty
by roughly half on average. The other main approach is to approximate the
function on the interval (e.g., with a parabola) and then estimate the minimum
location analytically. The first method is generally more robust, but the second
method can be much faster. Some practical algorithms start with the first method
and then switch to the second. Brent's method [302] is a common
choice.
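As an illustration of the subdivision approach, here is a minimal golden-section search (a sketch, not Brent's method itself); each evaluation shrinks the bracketing interval by a constant factor of about 0.618, and the tolerance and test function are arbitrary.

    import math

    def golden_section(phi, a, b, tol=1e-6):
        # Locate a minimum of phi inside [a, b] by golden-section search.
        invphi = (math.sqrt(5.0) - 1.0) / 2.0    # reciprocal of the golden ratio
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        fc, fd = phi(c), phi(d)
        while b - a > tol:
            if fc < fd:                  # minimum lies in [a, d]
                b, d, fd = d, c, fc
                c = b - invphi * (b - a)
                fc = phi(c)
            else:                        # minimum lies in [c, b]
                a, c, fc = c, d, fd
                d = a + invphi * (b - a)
                fd = phi(d)
        return 0.5 * (a + b)

    phi = lambda alpha: (alpha - 2.0) ** 2 + 1.0
    print(golden_section(phi, 0.0, 5.0))    # approximately 2.0

Library routines such as scipy.optimize.minimize_scalar(phi, method='brent') combine bracketing, golden-section steps, and parabolic interpolation in this switching style.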