In order to generalize well, a system needs to be sufficiently powerful to approximate the target function. If it is too simple to fit even the training data, then generalization to new data is also likely to be poor. (The true error may not be much worse than the training error, however, depending on how well the training data represents the target function.) If the network is powerful enough, good generalization is at least possible, provided it is not limited by other factors. Contrary to the rule of thumb that simpler is better, a larger network may generalize better because it is more powerful and better able to approximate the true target function. An overly complex system, however, may be able to approximate the data in many different ways that give similar training errors, and it is unlikely to choose the one that will generalize best unless other constraints are imposed.
Figure 14.2 illustrates possible under- and overfitting. The fitting function is a linear combination of M evenly spaced Gaussian basis functions with width inversely proportional to M. At M = 3, the approximation is too simple and the error is large. At M = 5, the errors are smaller. At M = 30, the approximation may be overfitting the data.
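The kind of fit described above can be sketched in a few lines. This is a minimal illustration, not the code used to produce figure 14.2: the sine target, the noise level, the interval [0, 1], the exact width rule (interval length divided by M), and the helper names `gaussian_design`, `fit_weights`, and `mse` are all assumptions made for the example. The weights of the linear combination are found by ordinary least squares.

```python
import numpy as np

def gaussian_design(x, M, lo=0.0, hi=1.0):
    """Design matrix with one column per Gaussian basis function.

    The M centers are evenly spaced on [lo, hi]. The width is taken as
    (hi - lo) / M, an assumed way of making it inversely proportional to M.
    """
    centers = np.linspace(lo, hi, M)
    width = (hi - lo) / M
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def fit_weights(x, y, M):
    """Least-squares weights for the linear combination of M basis functions."""
    w, *_ = np.linalg.lstsq(gaussian_design(x, M), y, rcond=None)
    return w

def mse(x, y, w):
    """Mean squared error of the fitted model on the points (x, y)."""
    return float(np.mean((gaussian_design(x, w.size) @ w - y) ** 2))

# Hypothetical training data: a noisy sine wave sampled at 40 points.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 40)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(x_train.size)

train_err = {}
for M in (3, 5, 30):
    w = fit_weights(x_train, y_train, M)
    train_err[M] = mse(x_train, y_train, w)
    print(f"M = {M:2d}  training MSE = {train_err[M]:.4f}")
```

A low training error at M = 30 does not by itself indicate a good fit; with 30 free weights and 40 noisy samples, much of the reduction may come from fitting the noise.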
Whether a given network overfits or underfits the data depends in part on the size of the training set. Figure 14.3 shows generalization error versus complexity curves for a slightly more complex function fitted by the same system of Gaussian basis functions. In general, the curve for a particular sample size N has a minimum at some intermediate complexity value M. Below a certain threshold, the approximation is too simple and all systems have large errors. At high values of M, the system begins to overfit and the error increases.
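Curves of this kind can be estimated numerically. The sketch below is only illustrative and does not reproduce figure 14.3: the target function, the noise level, the sample sizes, the list of M values, and the helper names are assumptions. Generalization error is approximated by the mean squared error against the noise-free target on a dense grid, and each point comes from a single random training set, so the curves will be noisy.

```python
import numpy as np

def gaussian_design(x, M):
    """M evenly spaced Gaussian basis functions on [0, 1], width 1/M (assumed)."""
    centers = np.linspace(0.0, 1.0, M)
    width = 1.0 / M
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def target(x):
    """Hypothetical target, slightly more complex than a single sine."""
    return np.sin(2 * np.pi * x) + 0.5 * np.sin(6 * np.pi * x)

def generalization_error(N, M, rng, noise=0.2):
    """Fit on N noisy samples, then measure error against the clean target."""
    x = rng.uniform(0.0, 1.0, N)
    y = target(x) + noise * rng.standard_normal(N)
    w, *_ = np.linalg.lstsq(gaussian_design(x, M), y, rcond=None)
    x_test = np.linspace(0.0, 1.0, 500)
    residual = gaussian_design(x_test, M) @ w - target(x_test)
    return float(np.mean(residual ** 2))

rng = np.random.default_rng(1)
complexities = (3, 5, 8, 12, 20, 30)
curves = {
    N: {M: generalization_error(N, M, rng) for M in complexities}
    for N in (20, 50, 200)
}

for N, curve in curves.items():
    best_M = min(curve, key=curve.get)
    print(f"N = {N:3d}  lowest estimated error at M = {best_M}")
```

Averaging each point over many random training sets would give smoother curves; a single draw is used here to keep the example short.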
Unfortunately, if the target function is completely unknown, there is no way to determine a priori whether the network is complex enough. Figure 14.2c may be overfitting if the data are noisy and the target function has a form similar to figure 14.2b, but it could be that the data are clean and the actual function is a complex deterministic function, in which case figure 14.2b may be underfitting. Additional information is needed.