A.1 Newton's Method
This gives the update rule
$$w_{k+1} = w_k - \eta\, H^{-1} \nabla E(w_k) \tag{A.10}$$

which, since the gradient of the quadratic surface is $\nabla E(w_k) = H(w_k - w^*)$, simplifies to

$$w_{k+1} = w_k + \eta\, (w^* - w_k) \tag{A.11}$$
(where the subscripts index time rather than vector elements). At each step, w is changed by a fraction η of the difference (w* − w_k). Because the error surface is quadratic, the solution can be obtained in a single step by setting η = 1. For nonlinear optimization tasks such as most neural network problems, however, the linear approximation is only locally valid, so one-step convergence is not possible; iteration with a smaller step size is necessary to avoid straying too far from the region of validity. In the linear case, the iteration still reaches the same solution eventually: w∞ = w*.
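As a concrete illustration, here is a minimal Python sketch of the damped update (A.10)/(A.11) on a small quadratic surface. The specific matrix H, the minimum w_star, and the helper names grad_E and newton_step are invented for this example, not taken from the text; the point is only that η = 1 reaches the minimum in one step, while η < 1 closes a fraction η of the remaining gap per iteration.

```python
import numpy as np

# Illustrative quadratic error surface E(w) = 1/2 (w - w*)^T H (w - w*).
# H and w_star are arbitrary example values, not from the text.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # Hessian (symmetric positive definite)
w_star = np.array([1.0, -2.0])      # minimum w*

def grad_E(w):
    """Gradient of the quadratic surface: grad E(w) = H (w - w*)."""
    return H @ (w - w_star)

def newton_step(w, eta):
    """Damped Newton update (A.10): w <- w - eta * H^{-1} grad E(w)."""
    return w - eta * np.linalg.solve(H, grad_E(w))

# eta = 1: on a quadratic surface, convergence in a single step, w_1 = w*.
w = np.zeros(2)                      # starting point w_0
print(newton_step(w, eta=1.0))       # -> [ 1. -2.]

# eta < 1: each step closes a fraction eta of the gap (A.11), so the
# remaining error shrinks geometrically as (1 - eta)^k toward w*.
w = np.zeros(2)
for k in range(20):
    w = newton_step(w, eta=0.5)
print(w)                             # -> approximately [ 1. -2.]
```

Note that the sketch solves the linear system H x = ∇E rather than forming H⁻¹ explicitly, the usual numerically preferable way to apply a step of the form (A.10).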