The representational capability of a network can be defined as the range of mappings it can implement as the weights are varied. A particular mapping is within the representational capability of a net if there is a set of weights for which the net performs the mapping. Theories about representational capability answer questions like: Given a set of inputs and associated target outputs, is there a set of weights that allows the network to perform the desired mapping? Is the network capable of generating the desired outputs from the given inputs? In some cases an exact solution may be required; in others an approximation may be acceptable. Such results say whether a particular problem is solvable by a particular network without necessarily producing the weights that generate the solution. If the answer is negative, we know the network cannot solve the problem, and this saves us the expense of a futile search. If the answer is positive, we know a solution exists, but these results do not guarantee that it will be easy to find or that a chosen optimization method will be able to find it. Results of this kind provide broad guidelines for selecting an architecture for a problem. One well-known representational result, for example, is that single-layer networks can represent only linearly separable functions.
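To make that last result concrete, here is a minimal sketch in plain NumPy (the helper name perceptron_fits and the training settings are illustrative, not from the text): perceptron learning finds weights for AND, which is linearly separable, while no setting of a single layer's weights can reproduce XOR, so training never succeeds.

```python
import numpy as np

def perceptron_fits(X, y, epochs=100, lr=0.1):
    """Train a single-layer threshold unit; report whether it fits y exactly."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, ti in zip(X, y):
            out = 1 if xi @ w + b > 0 else 0
            w += lr * (ti - out) * xi        # perceptron learning rule
            b += lr * (ti - out)
    return np.array_equal((X @ w + b > 0).astype(int), y)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(perceptron_fits(X, np.array([0, 0, 0, 1])))  # AND, separable -> True
print(perceptron_fits(X, np.array([0, 1, 1, 0])))  # XOR, not separable -> False
```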
Two Hidden Layers Are Sufficient

When designing a layered network, an obvious first question is how many layers to use. Lippmann [247] shows that two hidden layers are sufficient to create classification regions of any desired shape. In figure 4.2, linear threshold units in the first hidden layer divide the input space into half-spaces with hyperplanes, units in the second hidden layer AND (form intersections of) these half-spaces to produce convex regions, and the output units OR (form unions of) the convex regions into arbitrary, possibly unconnected, shapes. Given a sufficient number of units, a network can be formed that divides the input space in any way desired, producing 0 when the input is in one region and 1 when it is in the other. The boundaries are piecewise linear, but any smooth boundary can be approximated with enough units.
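The construction can be wired by hand. The sketch below (a toy example with made-up weights, not from the text) builds a three-layer threshold net over a 2-D input: the first hidden layer cuts eight half-planes, the second ANDs four of them into each of two unit squares, and the output unit ORs the squares into a single, disconnected decision region.

```python
import numpy as np

step = lambda z: (z > 0).astype(float)  # linear threshold unit

def region_net(x):
    """Hand-wired threshold net in the spirit of Lippmann's construction."""
    # Layer 1: eight half-planes, four bounding each unit square.
    # Square A: [0,1] x [0,1];  Square B: [2,3] x [0,1].
    W1 = np.array([[ 1, 0], [-1, 0], [0,  1], [0, -1],
                   [ 1, 0], [-1, 0], [0,  1], [0, -1]], float)
    b1 = np.array([0, 1, 0, 1, -2, 3, 0, 1], float)
    h1 = step(W1 @ x + b1)
    # Layer 2: AND of the four half-planes around each square
    # (fires only when all four inputs fire: threshold just below 4).
    W2 = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1, 1, 1, 1]], float)
    h2 = step(W2 @ h1 - 3.5)
    # Output: OR of the two convex cells (fires if either fires).
    return step(np.array([1.0, 1.0]) @ h2 - 0.5)

print(region_net(np.array([0.5, 0.5])))  # inside square A -> 1.0
print(region_net(np.array([2.5, 0.5])))  # inside square B -> 1.0
print(region_net(np.array([1.5, 0.5])))  # between squares -> 0.0
```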
To approximate continuous functions, one can add and subtract (rather than logically OR) convex regions with appropriate weighting factors so two hidden layers are also sufficient to approximate any desired bounded continuous function [234].
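A one-dimensional sketch of this add-and-subtract idea (illustrative names and grid size; the "convex regions" here are simply intervals): the first layer makes half-line detectors, the second combines adjacent pairs into interval indicators, and the output weights each indicator by the target function's value at the interval's center, yielding a piecewise-constant approximation that tightens as n grows.

```python
import numpy as np

step = lambda z: (z > 0).astype(float)  # linear threshold unit

def bump_approx(x, f, lo, hi, n=50):
    """Two-hidden-layer threshold-net sketch of 1-D function approximation."""
    edges = np.linspace(lo, hi, n + 1)
    # Layer 1: half-line detectors; unit j fires iff x > edges[j].
    h1 = step(x - edges)
    # Layer 2: interval indicators; unit k (weights +1, -1, bias -0.5)
    # fires iff x lies between edges[k] and edges[k+1].
    h2 = step(h1[:-1] - h1[1:] - 0.5)
    # Output: add/subtract the indicators, weighted by f at each center.
    centers = 0.5 * (edges[:-1] + edges[1:])
    return f(centers) @ h2

for x in np.linspace(0.1, 2 * np.pi - 0.1, 5):
    print(f"x={x:.2f}  net={bump_approx(x, np.sin, 0, 2 * np.pi):+.3f}  "
          f"sin={np.sin(x):+.3f}")
```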
One-hidden-layer Nets Can Represent Nonconvex, Disjoint Regions

Units in the second hidden layer of figure 4.2 respond to convex regions in the input space. Because these would be the outputs in a single-hidden-layer network, it is sometimes mistakenly assumed that single-hidden-layer networks can recognize only convex decision regions.
Counterexamples are shown in figures 4.3 through 4.5. Figures 4.3 and 4.4 illustrate nonconvex regions recognized by single-hidden-layer (three-layer) nets. (Figure 4.4 is derived from Weiland and Leighton [407].) Figure 4.5 shows examples of disjoint (unconnected) regions recognizable by single-hidden-layer networks. Other examples can be found in [180], [255], [249].
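A minimal one-dimensional sketch in the same spirit (not a reproduction of the figures' networks): two threshold units in a single hidden layer, ORed by the output unit, recognize the disconnected, hence nonconvex, set where x < 0 or x > 1.

```python
step = lambda z: float(z > 0)  # linear threshold unit

def disjoint_region_net(x):
    """Single-hidden-layer net recognizing the set where x < 0 or x > 1."""
    h1 = step(-x)        # fires left of 0
    h2 = step(x - 1.0)   # fires right of 1
    return step(h1 + h2 - 0.5)  # output unit ORs the two half-lines

for x in (-0.5, 0.5, 1.5):
    print(x, disjoint_region_net(x))  # -> 1.0, 0.0, 1.0
```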