Chapter 16: Heuristics for Improving Generalization

Overview

Given that data are limited and may be sampled in nonrandom ways, and that little is known about the "complexity" of the target function, the problem is to produce a system that fits the data as accurately as possible. One of the first tasks is to choose a network architecture. Even when generalization is not explicitly mentioned, the intent is usually to find a network that is powerful enough to solve the problem but simple enough to train easily and generalize well. Generalization criteria usually favor choosing the smallest network that will do the job, but in small networks, back-propagation (for example) may be more likely to become trapped in local minima and may be more sensitive to initial conditions. If the algorithm cannot find a solution that does well on the training set, the solution it does find is unlikely to do well on the test set either, and generalization will be poor. Given these limitations of the learning algorithm, a network that learns the problem reliably may be more complex than absolutely necessary and may not generalize as well as possible. Thus, additional techniques are often needed to aid generalization.

The following sections discuss some specific techniques that have been suggested as ways to improve generalization. Some are based on theoretical principles while others are more heuristic. Purely numerical techniques are considered first, followed by techniques that use domain-dependent prior information.
