Overview
Given that data are limited and may be sampled in nonrandom
ways, and that little is known about the "complexity" of the target function,
the problem is to produce a system that fits the data as accurately as possible.
One of the first tasks is to choose a network architecture. Even when
generalization is not explicitly mentioned, the intent is usually to find a
network that is powerful enough to solve the problem but simple enough to train
easily and generalize well. Generalization criteria usually favor choosing the
smallest network that will do the job, but in small networks an algorithm such as
back-propagation may be more likely to become trapped in local minima and may be
more sensitive to initial conditions. If the algorithm cannot find a solution
that does well on the training set, the solution it does find is not likely to
do well on the test set either, and generalization will be poor. Given the
limitations of the learning algorithm, a network that learns the problem reliably
may be more complex than absolutely necessary and may not generalize as well as
possible. Thus, additional techniques are often needed to aid
generalization.
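As a minimal sketch of this trade-off (not taken from the text), one might train networks of several sizes on the same data and compare training and held-out performance: the smallest network that fits the training set well is usually preferred, while very small networks may fail to fit reliably and very large ones may fit the training set but generalize worse. The use of scikit-learn's MLPRegressor, the synthetic sine data, and the particular sizes and hyperparameters below are illustrative assumptions, not prescriptions from the text.

```python
# Illustrative sketch: compare networks of different hidden-layer sizes on
# noisy samples of an unknown target function (here, arbitrarily, sin).
# All data and settings are assumptions for demonstration only.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

for hidden in (2, 5, 20, 100):
    train_scores, val_scores = [], []
    # Several random restarts per size: smaller networks tend to be more
    # sensitive to initial weights and may fail to fit the training set.
    for seed in range(5):
        net = MLPRegressor(hidden_layer_sizes=(hidden,),
                           max_iter=5000, random_state=seed)
        net.fit(X_train, y_train)
        train_scores.append(net.score(X_train, y_train))
        val_scores.append(net.score(X_val, y_val))
    print(f"hidden={hidden:3d}  "
          f"train R^2={np.mean(train_scores):.3f}  "
          f"val R^2={np.mean(val_scores):.3f}")
```

In practice the held-out score, not the training score, guides the choice of size; the techniques discussed in the following sections are ways to improve that held-out performance beyond what architecture selection alone provides.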
The following sections discuss some specific techniques that
have been suggested as ways to improve generalization. Some are based on
theoretical principles, while others are more heuristic. Purely numerical
techniques are considered first, followed by techniques using domain-dependent
prior information.