The following calculations are used in chapter 17.
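For small noise amplitudes, the network output y(x + n) can be approximated by a second-order Taylor expansion about x:

$$
y(\mathbf{x}+\mathbf{n}) \approx y(\mathbf{x}) + \nabla y(\mathbf{x})^{T}\mathbf{n} + \tfrac{1}{2}\,\mathbf{n}^{T} H\,\mathbf{n}
$$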
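where $H$ is the Hessian matrix with elements $h_{ij} = \partial^2 y/(\partial x_i \partial x_j)$. Assume the noise distribution is even, so that $\langle n^k\rangle = 0$ for odd $k$. Assume also that it has independent components of variance $\sigma^2$. Then the expected squared error at a training point with target $t$ can be written as

$$
\langle (t - y(\mathbf{x}+\mathbf{n}))^2 \rangle \approx (t-y)^2 - \sigma^2 (t-y)\,\mathrm{Tr}(H) + \sigma^2 \|\nabla y\|^2 + \tfrac{1}{4}\Big[(m_4 - 3\sigma^4)\sum_i h_{ii}^2 + \sigma^4\Big((\mathrm{Tr}\,H)^2 + 2\sum_{i,j} h_{ij}^2\Big)\Big]
$$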
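where $m_4$ is the fourth moment $\langle n^4\rangle$, and $y$, $\nabla y$, and $H$ are evaluated at $\mathbf{x}$. Dropping all terms of higher than second order in $\sigma$ gives

$$
\langle (t - y(\mathbf{x}+\mathbf{n}))^2 \rangle \approx (t-y)^2 - \sigma^2 (t-y)\,\mathrm{Tr}(H) + \sigma^2 \|\nabla y\|^2 \tag{C.2}
$$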
and when $H$ is assumed to be zero, this reduces to (17.15). The Laplacian term, $\mathrm{Tr}(H) = \nabla^2 y$, is omitted in (17.15). It can be interpreted as an approximate measure of the difference between the average value of the function over the surrounding points and its exact value at the point itself [100]. The third term in (C.2), the gradient penalty $\sigma^2\|\nabla y\|^2$, is the first-order regularization term in (17.15).
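As a rough numerical check of (C.2), the sketch below compares two quantities. The first is a Monte Carlo estimate of the jittered squared error ⟨(t − y(x + n))²⟩. The second is the second-order approximation (t − y)² − σ²(t − y)∇²y + σ²‖∇y‖², computed both with and without the Laplacian term. The sketch also compares the local average ⟨y(x + n)⟩ with y(x) + (σ²/2)∇²y. Several choices here are hypothetical stand-ins rather than anything taken from the text:

- the function y(x) = sin(x₀) + x₀x₁², which plays the role of a trained network;
- the input point, the target, and σ;
- the helper names;
- the assumption that the jitter is Gaussian with independent components.

```python
import numpy as np

# Hypothetical smooth test function standing in for a trained network:
# y(x) = sin(x0) + x0 * x1**2 (works on a single point or a batch of points).
def y(x):
    return np.sin(x[..., 0]) + x[..., 0] * x[..., 1] ** 2

def grad_y(x):
    # Analytic gradient of the test function at a single point x = (x0, x1).
    return np.array([np.cos(x[0]) + x[1] ** 2, 2.0 * x[0] * x[1]])

def laplacian_y(x):
    # Tr(H) = d2y/dx0^2 + d2y/dx1^2 = -sin(x0) + 2*x0.
    return -np.sin(x[0]) + 2.0 * x[0]

def jittered_stats(x, t, sigma, n_pairs=1_000_000, seed=0):
    """Monte Carlo estimates of <(t - y(x+n))^2> and <y(x+n)> for n ~ N(0, sigma^2 I).

    Each noise vector is used together with its negative (antithetic pairs),
    which cancels odd-order terms exactly and reduces the sampling error.
    """
    rng = np.random.default_rng(seed)
    n = rng.normal(0.0, sigma, size=(n_pairs, x.size))
    y_plus, y_minus = y(x + n), y(x - n)
    sq_err = 0.5 * ((t - y_plus) ** 2 + (t - y_minus) ** 2).mean()
    mean_y = 0.5 * (y_plus + y_minus).mean()
    return sq_err, mean_y

def second_order_error(x, t, sigma, include_laplacian=True):
    # (t - y)^2 - sigma^2 (t - y) Tr(H) + sigma^2 |grad y|^2, all evaluated at x.
    e = t - y(x)
    g = grad_y(x)
    approx = e ** 2 + sigma ** 2 * (g @ g)
    if include_laplacian:
        approx -= sigma ** 2 * e * laplacian_y(x)
    return approx

x = np.array([0.3, -0.7])   # hypothetical training input
t = 1.0                     # hypothetical target
sigma = 0.05                # hypothetical jitter standard deviation

mc_err, mc_mean_y = jittered_stats(x, t, sigma)
print("Monte Carlo jittered error:   ", mc_err)
print("second order, with Laplacian: ", second_order_error(x, t, sigma))
print("second order, H = 0:          ", second_order_error(x, t, sigma, include_laplacian=False))
print("local average minus y(x):     ", mc_mean_y - y(x))
print("sigma^2/2 * Laplacian:        ", 0.5 * sigma ** 2 * laplacian_y(x))
```

Drawing the noise in antithetic pairs (n, −n) cancels the odd-order terms of the expansion exactly, just as the even-noise assumption does analytically. As a result, a million pairs are enough to resolve the σ²(t − y)∇²y term, which at this point is roughly an order of magnitude smaller than the gradient term. The remaining mismatch with (C.2) should be of order σ⁴.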
Training with nonjittered data simply minimizes the error at the training points and puts no constraints on the function at other points. In contrast, training with jitter minimizes the error while also forcing the approximating function to have small derivatives and a local average that approaches the target in the vicinity of each training point.
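The local-average interpretation can be made explicit with a short second-order calculation. This is a sketch that keeps the assumptions used above: an even noise distribution with independent components, so that $\langle n_i n_j\rangle = \sigma^2\delta_{ij}$, and $y$ expanded to second order about $\mathbf{x}$. Averaging the expansion over the noise gives the first line. Completing the square in the first two terms of (C.2) gives the second:

$$
\bar{y} \equiv y + \tfrac{1}{2}\langle \mathbf{n}^{T}H\mathbf{n}\rangle = y + \tfrac{\sigma^{2}}{2}\,\nabla^{2}y \;\approx\; \langle y(\mathbf{x}+\mathbf{n})\rangle
$$

$$
(t-y)^2 - \sigma^2(t-y)\,\nabla^2 y + \sigma^2\|\nabla y\|^2 = (t-\bar{y})^2 + \sigma^2\|\nabla y\|^2 - \tfrac{\sigma^4}{4}(\nabla^2 y)^2
$$

Dropping the $\sigma^4$ term again, the jittered error is approximately $(t-\bar{y})^2 + \sigma^2\|\nabla y\|^2$. The first part penalizes the gap between the target and the local average $\bar{y}$; the second penalizes the slope. The first line also shows in what sense $\nabla^2 y$ measures how far the neighborhood average departs from the value at the point.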