Noise is usually considered undesirable, something to be eliminated where possible. Many studies, however, have noted that adding small amounts of noise to input patterns during training often results in better generalization and fault tolerance.
A short explanation for these results is that the noise blurs the data. When random noise is added every time a pattern is presented, the network never sees exactly that same input twice, even when the same training pattern is selected, so it cannot simply "memorize" the training data. Averaging over the noise effectively smooths the target function and prevents the network from overfitting a limited set of training data. This turns out to be helpful for generalization because many of the functions that interest people tend to be smooth.
The following sections examine these ideas in more detail. The term jitter is used to refer to noise intentionally added to the inputs in contrast to undesired, uncontrolled noise from other sources.
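As a concrete illustration of the idea above, the following is a minimal sketch of training with input jitter, written in Python with NumPy. All names, the toy data set, the noise level, and the simple polynomial model are illustrative assumptions, not part of the original text. The key point is that fresh Gaussian noise is drawn each time a pattern is presented, so the network never sees exactly the same input twice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 20 points sampled from a smooth 1-D target function.
X = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y = np.sin(np.pi * X).ravel()

def jittered_batch(X, sigma, rng):
    """Return a copy of X with fresh Gaussian jitter added to every input.

    Because new noise is drawn on each call, no exact input repeats
    across presentations, which discourages memorization.
    """
    return X + rng.normal(scale=sigma, size=X.shape)

def features(X):
    # A small fixed feature expansion standing in for a network's hidden layer.
    return np.hstack([X, X**2, X**3, np.ones_like(X)])

# Train a linear model on the features by gradient descent on squared error,
# jittering the inputs anew at every presentation of the training set.
w = np.zeros(features(X).shape[1])
lr = 0.2
for epoch in range(2000):
    Xj = jittered_batch(X, sigma=0.05, rng=rng)  # new noise each epoch
    Phi = features(Xj)
    grad = Phi.T @ (Phi @ w - y) / len(y)
    w -= lr * grad
```

Averaged over many presentations, the jittered inputs make the effective target a locally smoothed version of the original mapping, which is the smoothing effect described above.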