Chapter 14 - Factors Influencing Generalization

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

14.3 Network Complexity versus Target Complexity

To generalize well, a system needs to be sufficiently powerful to approximate the target function. If it is too simple to fit even the training data, then generalization to new data is also likely to be poor. (The true error may not be much worse than the training error, however, depending on how well the training data represent the target function.) If the network is powerful enough, then good generalization is at least possible, if not limited by other factors. In contrast to the rule of thumb that simpler is better, a larger network may thus generalize better than a smaller one because it is more powerful and better able to approximate the true target function. An overly complex system, however, can approximate the data in many different ways that give similar errors, and it is unlikely to choose the one that generalizes best unless other constraints are imposed.

Figure 14.2 illustrates possible under- and overfitting. The fitting function is a linear combination of M evenly spaced Gaussian basis functions whose widths are inversely proportional to M. At M = 3, the approximation is too simple and the error is large. At M = 5, the errors are smaller. At M = 30, the approximation may be overfitting the data.
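
To make the setup concrete, the following is a minimal sketch in Python/NumPy of such a minimum-MSE fit. It places M evenly spaced Gaussian basis functions on [0, 1], gives them widths proportional to 1/M, and solves for the weights by linear least squares. The interval, the width constant sigma_scale, and the noisy sine target used in the demonstration are illustrative assumptions, not the exact settings behind figure 14.2.

    import numpy as np

    def gaussian_design(x, M, sigma_scale=1.0):
        # Design matrix of M evenly spaced Gaussian basis functions on [0, 1].
        # Widths are inversely proportional to M; sigma_scale is an assumed constant.
        centers = np.linspace(0.0, 1.0, M)
        sigma = sigma_scale / M
        return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / sigma) ** 2)

    def fit_min_mse(x, y, M):
        # Minimum-MSE linear combination of the basis functions, via least squares.
        Phi = gaussian_design(x, M)
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return w

    def predict(x, w):
        return gaussian_design(x, len(w)) @ w

    # Vary M to move from underfitting toward overfitting, as in figure 14.2.
    # The noisy sine below is a stand-in for the book's (unspecified) target.
    rng = np.random.default_rng(0)
    x_train = np.sort(rng.uniform(0.0, 1.0, 20))
    y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(20)
    for M in (3, 5, 30):
        w = fit_min_mse(x_train, y_train, M)
        mse = np.mean((predict(x_train, w) - y_train) ** 2)
        print(f"M = {M:2d}  training MSE = {mse:.4f}")

Once M exceeds the number of training points, the least-squares fit can drive the training error to essentially zero, which is exactly the regime where overfitting becomes a concern.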

Whether a given network overfits or underfits the data depends in part on the size of the training set. Figure 14.3 shows generalization error versus complexity curves for a slightly more complex function fitted by the same system of Gaussian basis functions. In general, the curve for a particular sample size N has a minimum at some intermediate complexity value M. Below a certain threshold, the approximation is too simple and all systems have large errors. At high values of M, the system begins to overfit and the error increases.

Figure 14.2: Underfitting and overfitting by a minimum-MSE approximation using M evenly spaced Gaussian basis functions with widths inversely proportional to M: (a) underfitting (M = 3), the approximation is too simple, (b) perhaps a reasonable fit (M = 5), and (c) possible overfitting (M = 30).
Figure 14.3: Generalization error versus network complexity for a minimum-MSE fit using M evenly spaced Gaussian basis functions for various sample sizes N. For intermediate sample sizes, the curve has a minimum at some intermediate value. (Each point is the average of 200 trials. Training samples are evenly spaced with some jitter but no additive noise. The target function is shown in the inset.)
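
The experiment summarized in figure 14.3 can be sketched along the same lines: for each complexity M and sample size N, fit many jittered, noise-free training sets and average the squared error against the true function on a dense test grid. The sketch below reuses fit_min_mse and predict from above; the target function, jitter scale, and trial count (50 here rather than the 200 used in the figure) are assumed for illustration.

    # Continues the sketch above (uses gaussian_design, fit_min_mse, predict).
    import numpy as np

    def avg_generalization_error(target, N, M, trials=50, jitter=0.25, seed=0):
        # Average test MSE of the minimum-MSE Gaussian-basis fit over many
        # training sets: evenly spaced samples with jitter, no additive noise.
        rng = np.random.default_rng(seed)
        x_test = np.linspace(0.0, 1.0, 500)
        y_test = target(x_test)
        errs = []
        for _ in range(trials):
            x = np.linspace(0.0, 1.0, N) + (jitter / N) * rng.uniform(-1.0, 1.0, N)
            x = np.clip(x, 0.0, 1.0)
            w = fit_min_mse(x, target(x), M)
            errs.append(np.mean((predict(x_test, w) - y_test) ** 2))
        return float(np.mean(errs))

    def target(x):
        # Assumed stand-in for the target shown in the inset of figure 14.3.
        return np.sin(2 * np.pi * x) + 0.5 * np.sin(5 * np.pi * x)

    # Sweep M for a few sample sizes N. Each curve should dip at some
    # intermediate M and rise again as the fit begins to overfit.
    for N in (10, 20, 40):
        errs = [avg_generalization_error(target, N, M) for M in range(2, 21)]
        best_M = 2 + int(np.argmin(errs))
        print(f"N = {N:2d}  best M = {best_M}  min avg error = {min(errs):.4f}")

In general the minimizing M depends on N; with more data, a more complex approximation can be supported before overfitting sets in.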

Unfortunately, if the target function is completely unknown, there is no way to determine a priori whether the network is complex enough. Figure 14.2c may be overfitting if the data are noisy and the target function has a form similar to that in figure 14.2b, but the data could instead be clean and the actual function a complex deterministic one, in which case figure 14.2b may be underfitting. Additional information is needed.