
Chapter 3 - Single-Layer Networks

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

3.2 Linear Separability

An important limitation of single-layer perceptrons was pointed out by Minsky and Papert [268]: a single-layer perceptron can correctly classify only data sets which are linearly separable. Classes A and B are linearly separable if they can be separated by a hyperplane, i.e., if a plane exists such that classes A and B lie on opposite sides (figure 3.6). In two dimensions, the hyperplane reduces to a line, and the classes are linearly separable if a line can be drawn between them. A later section discusses the probability that a random set of patterns is linearly separable. The exclusive-OR function (figure 3.7) is a well-known example of a simple function that is not linearly separable and thus not computable by single-layer perceptrons.

Figure 3.6: (a) Linearly separable classes can be separated by a hyperplane. In two dimensions, the hyperplane is a line. (b) Classes that are not linearly separable cannot be separated by a hyperplane.
Figure 3.7: The two-input exclusive-OR function is a simple binary function that is not linearly separable and thus not learnable by a single-layer perceptron. In more than two dimensions, it generalizes to the parity function, which is 1 when the number of active inputs is odd and 0 when it is even.
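The failure on exclusive-OR is easy to verify directly. The following sketch (not from the book; it uses the standard perceptron learning rule, and names such as train_perceptron are assumptions for illustration) trains a single threshold unit on AND and on XOR. The rule quickly finds a separating line for AND, but for XOR it keeps making errors no matter how many epochs are allowed, since no separating line exists.

```python
# Minimal sketch (not from the book): the perceptron learning rule applied
# to AND and to XOR. It converges for AND but cannot converge for XOR,
# which is not linearly separable.

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def train_perceptron(targets, lr=0.1, max_epochs=1000):
    """Run the perceptron rule; return (weights, bias, converged)."""
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), t in zip(INPUTS, targets):
            y = 1 if w1 * x1 + w2 * x2 + b > 0 else 0   # threshold unit
            if y != t:
                errors += 1
                w1 += lr * (t - y) * x1                  # perceptron update
                w2 += lr * (t - y) * x2
                b += lr * (t - y)
        if errors == 0:                                  # separating line found
            return (w1, w2), b, True
    return (w1, w2), b, False

for name, targets in [("AND", (0, 0, 0, 1)), ("XOR", (0, 1, 1, 0))]:
    _, _, ok = train_perceptron(targets)
    print(f"{name}: {'converged' if ok else 'no convergence (not separable)'}")
```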

This result is significant because many classification problems are not linearly separable. For example, there are 2^(2^d) Boolean functions of d Boolean input variables, but only O(2^(d^2)) of them are linearly separable [278]. When d is large, the fraction of Boolean functions that are linearly separable, and thus computable by a single-layer net, becomes very small. For d = 2, 14 of the 16 Boolean functions are linearly separable. (Exclusive-OR and its complement are the exceptions.) But for d = 4, only 1,882 out of 65,536 are linearly separable [278], [273].
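To make the d = 2 count concrete, a brute-force check along the following lines (again a sketch, not code from the book; the helper name separable is assumed) enumerates all 16 two-input Boolean functions and searches a coarse grid of weights and thresholds for a separating line. It reports 14 successes, with exclusive-OR and its complement as the only failures.

```python
# Minimal sketch (not from the book): count how many of the 16 two-input
# Boolean functions a single threshold unit can realize.

import itertools

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def separable(targets, steps=21, lo=-2.0, hi=2.0):
    """True if some (w1, w2, b) on a coarse grid reproduces the targets.
    A grid search can only confirm separability, but it finds a separator
    for every separable two-input function."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return any(
        [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for (x1, x2) in INPUTS]
        == list(targets)
        for w1, w2, b in itertools.product(grid, repeat=3)
    )

count = sum(separable(t) for t in itertools.product((0, 1), repeat=4))
print(count, "of 16 two-input Boolean functions are linearly separable")
# Expected output: 14 (XOR and XNOR are the exceptions)
```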

A large part of the appeal of neural networks is that they "learn" from examples, apparently imitating human abilities. The letdown when it was demonstrated that single-layer networks could not compute functions even as simple as exclusive-OR is said to have nearly stopped research in neural networks. Although it was known that multilayer nets are more powerful than single-layer nets, effective multilayer learning procedures were not well known at the time. In the 1980s, learning algorithms for multilayer networks became widely known and research interest revived.

In spite of their limitations, single-layer networks remain significant for reasons beyond historical interest. Many signal processing applications can be handled by single-layer networks, and an understanding of them is also necessary for insight into multilayer networks, which are, after all, simply cascades of single-layer networks.
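As a simple illustration of this point, a cascade of threshold units computes exclusive-OR even though no single unit can. In the sketch below (hand-picked weights for illustration, not taken from the book), one hidden unit computes OR, another computes NAND, and an output unit takes their AND.

```python
# Minimal sketch (not from the book): XOR as a cascade of single-layer
# threshold units. Each unit computes a linearly separable function;
# XOR = AND(OR(x1, x2), NAND(x1, x2)).

def step(x):
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)        # OR(x1, x2)
    h_nand = step(-1.0 * x1 - 1.0 * x2 + 1.5)     # NAND(x1, x2)
    return step(1.0 * h_or + 1.0 * h_nand - 1.5)  # AND of the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))      # prints the XOR truth table
```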