Appendix C: Jitter Calculations

The following calculations are used in Chapter 17.

C.1 Jitter: Small-Perturbation Approximation

For small noise amplitudes, the network output y(x + n) can be approximated by

y(x + n) ≈ y(x) + (∂y/∂x)ᵀ n + ½ nᵀ H n    (C.1)

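where H is the Hessian matrix with elements h_ij = ∂²y/(∂x_i∂x_j). Taking the error on a single pattern with target t to be E = (t − y(x + n))², and assuming independent noise components with variance σ² and an even distribution so that <n^k> = 0 for k odd, one can write

<E> ≈ (t − y)² − σ²(t − y) Tr(H) + σ² ‖∂y/∂x‖² + ¼ [ m₄ Σ_i h_ii² + σ⁴ Σ_{i≠j} (h_ii h_jj + 2 h_ij²) ]

where m₄ is the fourth moment <n⁴>. Dropping all terms higher than second order in σ gives

<E> ≈ (t − y)² − σ²(t − y) Tr(H) + σ² ‖∂y/∂x‖²    (C.2)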

and when H is assumed to be zero, this reduces to (17.15). The Laplacian term, Tr(H) = ∇²y, which is omitted in (17.15), can be described as an approximate measure of the difference between the average of the surrounding values and the value of the field at the point itself [100]. The third term in (C.2) is the first-order regularization term in (17.15).
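The expectation behind (C.2) can be checked term by term. The following is a sketch of that calculation, not a transcription of the original derivation. It assumes a squared error on a single pattern, independent zero-mean noise components with common variance σ² and fourth moment m₄, and a symmetric Hessian. The shorthand e = t − y(x) and g = ∂y/∂x is introduced here only for compactness.

```latex
\begin{align*}
E &= \bigl(t - y(x+n)\bigr)^2
   \approx \bigl(e - g^{T} n - \tfrac{1}{2}\, n^{T} H n\bigr)^2 ,
   \qquad e = t - y(x), \quad g = \frac{\partial y}{\partial x} \\[4pt]
\langle E \rangle
  &\approx e^2 - 2e\,\langle g^{T} n\rangle - e\,\langle n^{T} H n\rangle
   + \langle (g^{T} n)^2 \rangle
   + \langle (g^{T} n)(n^{T} H n) \rangle
   + \tfrac{1}{4}\,\langle (n^{T} H n)^2 \rangle \\[4pt]
\langle g^{T} n\rangle &= 0, \qquad
\langle (g^{T} n)(n^{T} H n) \rangle = 0
   \quad \text{(odd moments vanish)} \\
\langle n^{T} H n\rangle &= \sum_{i,j} h_{ij}\,\langle n_i n_j\rangle
   = \sigma^2\, \mathrm{Tr}(H) \\
\langle (g^{T} n)^2 \rangle &= \sum_{i,j} g_i g_j\,\langle n_i n_j\rangle
   = \sigma^2\, \lVert g \rVert^2 \\
\langle (n^{T} H n)^2 \rangle
  &= m_4 \sum_i h_{ii}^2
   + \sigma^4 \sum_{i \neq j} \bigl(h_{ii} h_{jj} + 2 h_{ij}^2\bigr) \\[4pt]
\langle E \rangle
  &\approx e^2 - \sigma^2 e\, \mathrm{Tr}(H) + \sigma^2 \lVert g \rVert^2
   + O(\sigma^4)
\end{align*}
```

Only the last expectation carries fourth-order moments, so it is the only contribution dropped when terms beyond second order in σ are discarded.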

Training with nonjittered data simply minimizes the error at the training points and puts no constraints on the function at other points. In contrast, training with jitter minimizes the error while also forcing the approximating function to have small derivatives and a local average that approaches the target in the vicinity of each training point.
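The second-order approximation can also be checked numerically. The sketch below is illustrative only and rests on several assumptions that do not come from the text: the "network" is a single tanh unit, the error is the squared error against a target t, and each noise component takes the values +σ or −σ with equal probability. That noise is an even distribution with variance σ², and it lets the average be computed exactly by enumeration. All function names and the values of x, w, and t are hypothetical.

```python
import itertools
import math


def y(x, w):
    """Toy stand-in for the network: a single tanh unit, y = tanh(w . x)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def jitter_error_exact(x, w, t, sigma):
    """Exact mean of (t - y(x + n))^2 over all sign patterns n_i = +/- sigma."""
    patterns = list(itertools.product((-1.0, 1.0), repeat=len(x)))
    total = 0.0
    for signs in patterns:
        x_jittered = [xi + s * sigma for xi, s in zip(x, signs)]
        total += (t - y(x_jittered, w)) ** 2
    return total / len(patterns)


def jitter_error_approx(x, w, t, sigma):
    """Second-order form: (t-y)^2 - sigma^2 (t-y) Tr(H) + sigma^2 ||dy/dx||^2."""
    y0 = y(x, w)
    w_sq = sum(wi * wi for wi in w)
    # For y = tanh(w . x): dy/dx_i = (1 - y^2) w_i
    grad_sq = (1.0 - y0 ** 2) ** 2 * w_sq
    # and h_ij = -2 y (1 - y^2) w_i w_j, so Tr(H) = -2 y (1 - y^2) ||w||^2
    trace_h = -2.0 * y0 * (1.0 - y0 ** 2) * w_sq
    return (t - y0) ** 2 - sigma ** 2 * (t - y0) * trace_h + sigma ** 2 * grad_sq


# Hypothetical input, weights, and target chosen only for illustration.
x, w, t = [0.3, 0.2], [0.8, -0.5], 1.0

for sigma in (0.1, 0.05):
    exact = jitter_error_exact(x, w, t, sigma)
    approx = jitter_error_approx(x, w, t, sigma)
    print(f"sigma={sigma}: exact={exact:.8f} approx={approx:.8f} "
          f"gap={abs(exact - approx):.2e}")
```

Halving σ should shrink the gap by roughly a factor of 16, which is what one would expect if the discarded terms are fourth order in σ.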
