Generalization is influenced by many factors. Items considered in this chapter include the following:
The samples alone are insufficient to choose a good generalizer; more information must be provided. This is usually done by biasing the learning procedure in some way.
The biases of the learning procedure must fit reality.
The network must be powerful enough to fit the target.
The data must be representative of the target function; there must be sufficient data to illustrate the target.
The distribution of the training and test data should be similar.
More data will generally be needed to achieve the same accuracy when the data are noisy.
There must be enough data to constrain the net (or the net must be chosen so that it is constrained by the data and biases of the training procedure).
Weaknesses of the learning procedure may lead to poor solutions.
Dynamics of the learning procedure cause generalization to vary with training time; excessive training may lead to overfitting.
The same error measure should be used for training and testing.
Input variables should supply necessary information.
An appropriate data representation should be chosen.
Independent subproblems should be assigned to independent networks, rather than combined in a single system.
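Several of the points above, that the network must be constrained by the data, that noisy data demand more samples, and that an unconstrained learner can fit the noise, can be seen in a small sketch. The code below (pure Python; the target function, noise level, and sample sizes are all invented for illustration) compares a constrained model, a least-squares line through the origin, against an unconstrained "memorizing" model that simply returns the output of the nearest training point. The memorizer achieves zero error on the noisy training set but generalizes worse to fresh data drawn from the same distribution.

```python
import random

random.seed(0)

def target(x):
    # Underlying target function (chosen arbitrarily for this sketch).
    return 2.0 * x

def noisy_sample(n, noise=0.5):
    xs = [random.uniform(0.0, 1.0) for _ in range(n)]
    ys = [target(x) + random.gauss(0.0, noise) for x in xs]
    return xs, ys

train_x, train_y = noisy_sample(20)
test_x, test_y = noisy_sample(200)

# Constrained model: least-squares fit of y = w * x.
w = (sum(x * y for x, y in zip(train_x, train_y))
     / sum(x * x for x in train_x))
linear = lambda x: w * x

# Unconstrained model: memorize the training set (1-nearest neighbor).
def memorize(x):
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print("train error, memorizer:", mse(memorize, train_x, train_y))  # zero
print("train error, linear:   ", mse(linear, train_x, train_y))
print("test error, memorizer: ", mse(memorize, test_x, test_y))
print("test error, linear:    ", mse(linear, test_x, test_y))
```

The memorizer illustrates why training error alone cannot select a good generalizer: without a bias toward simple functions, fitting the samples exactly just reproduces the noise.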
Remarks

The sections in this chapter list many factors that affect generalization, which may give the impression that useful approximation is almost impossible because so many things could go wrong. The intent, however, is to examine factors that need to be considered and may be encountered at one time or another in different problems. In most problems, many of these factors will not be critical.
Neural networks are often used to solve problems with hundreds of variables in spite of the curse of dimensionality that could make such problems very hard. Many problems turn out to be easier than expected. It is not clear why this happens. Some possible reasons are suggested in [79]:
application studies use clever preprocessing or data encodings to simplify the learning task;
many input variables are interdependent so the effective dimensionality is small;
the high-dimensional data have a nonuniform distribution or form clusters that favor local representation methods; and
getting the right bias (net structure, parameters, learning algorithm, etc.) may be much more important than learning from the data. An appropriate bias reduces the need for data.
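The second reason, that interdependent inputs keep the effective dimensionality small, can be illustrated with a toy construction (all dimensions, coefficients, and the target are invented for this sketch). Ten input variables are all driven by a single latent variable, so a fit using only one coordinate captures the target exactly; the nominal dimensionality of ten never matters.

```python
import random

random.seed(1)

DIM = 10
coeffs = [random.uniform(0.5, 2.0) for _ in range(DIM)]

def make_point():
    t = random.uniform(0.0, 1.0)     # single latent variable
    x = [c * t for c in coeffs]      # ten correlated inputs, all driven by t
    y = 3.0 * t                      # target also depends only on t
    return x, y

data = [make_point() for _ in range(50)]

# Least-squares fit of y = w * x[0], using only the first input coordinate.
w = (sum(x[0] * y for x, y in data)
     / sum(x[0] ** 2 for x, _ in data))

err = sum((w * x[0] - y) ** 2 for x, y in data) / len(data)
print("error using one of ten inputs:", err)  # essentially zero
```

Here the effective dimensionality is one, so a model with the right structure needs very little data. This is also a small instance of the last point above: choosing a bias that matches the problem (a linear fit on one coordinate) removes the need to learn anything about the other nine inputs from data.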