3.2 Linear
Separability
An important limitation of single-layer perceptrons was
pointed out by Minsky and Papert [268]: a single-layer perceptron can correctly
classify only data sets which are linearly
separable. Classes A and B are linearly separable if they can be separated
by a hyperplane, i.e., if a plane exists such that classes A and B lie on
opposite sides (figure
3.6). In two dimensions, the hyperplane reduces to a line and classes are
linearly separable if a line can be drawn between them. Section
3.2 discusses the probability that a random set of patterns is linearly
separable. The exclusive-OR function (figure
3.7) is a well-known example of a simple function that is not linearly
separable and thus not computable by single-layer perceptrons.
This result is significant because many classification problems
are not linearly separable. There are 22d Boolean functions of d Boolean input variables, for example, only O(2d2) of which are linearly
separable [278]. When d is large, the
fraction of Boolean functions that are linearly separable and thus computable by
a single-layer net becomes very small. For d
= 2, a total of 14 of the 16 Boolean functions are linearly separable.
(Exclusive-OR and its complement are the exceptions.) But for d = 4, only 1,882 out of 65536 are linearly
separable [278], [273].
A large part of the appeal of neural networks is that they "learn"
from examples, apparently imitating human abilities. The letdown when it was
demonstrated that single-layer networks could not compute functions even as
simple as exclusive-OR is said to have nearly stopped research in neural
networks. Although it was known that multilayer nets are more powerful than
single-layer nets, effective multilayer learning procedures were not well known
at the time. In the 1980s, learning algorithms for multilayer networks became
widely known and research interest revived.
In spite of their limitations, single-layer networks are
significant for reasons beyond historical interest. There are significant signal
processing applications that can be handled by single-layer networks.
Understanding of single-layer networks is also useful and necessary for insight
into multilayer networks, which are, after all, simply cascades of single-layer
networks.