A.3 The
LMS Algorithm
The Widrow-Hoff learning rule [402], also called the LMS (least mean squares) algorithm or
the delta rule is basically an iterative on-line implementation of linear
regression. Like gradient descent, it avoids storing R, but
the LMS method reduces storage requirements even further. Gradient descent still
requires O(N) storage to accumulate error terms for the entire
training set in order to approximate the true gradient before making a weight
change. The LMS method avoids this may making weight changes as soon as errors
occur. In the limit of very small learning rates, the result is the same. Like
linear regression, it also minimizes the mean squared error of a linear fit so
it shares many properties with linear regression and succeeds or fails in the
same situations. An extensive summary of the LMS algorithm is provided by Widrow
and Stearns [405].
As before, the half mean squared error is E = ½E [e2] and the gradient is
The LMS
algorithm does steepest descent using an estimate of the gradient
based only on the error on the current pattern
(A.23) |
 |
(A.24) |
 |
That is, it does on-line rather than batch learning. The update
rule is then
(A.25) |
 |
(A.26) |
 |
where 0 < η < 2/λmax for stability. Here, subscripts
index pattern presentations rather than vector elements.
Because λmax is
unknown unless R is analyzed, an estimate must be used. R is nonnegative-definite, so all eigenvalues are nonnegative
λi ≥ 0 and
λmax can be bounded by
(A.27) |
 |
where trace(R) = ∑irii
= ∑i E
[x2i]. Because this places a upper bound on
λmax that may too high, the
resulting η value may be smaller than necessary. Adaptive learning rate methods,
perhaps initialized this way, may be able to improve training speed by adjusting
the rate based on observed training performance. As an aside, this provides
justification for centering input variables because E [x2i] = σ2i + μ2i where σ2i and μi are the variance and mean of input
xi; zero-mean inputs will produce
lower estimates of λmax and allow
larger step sizes to be used.