|
Equations 3.1 and 3.2 are fundamental to most of the networks considered later, so it is useful to examine them more closely. The locus of points x with a constant sum u = wTx defines a hyperplane perpendicular to the vector w. The Euclidean vector norm ||x|| = √(Σi xi²) measures vector length. Because wTx = ||w|| ||x|| cos φ, where φ is the angle between w and x, u is proportional to the projection ||x|| cos φ of x onto w, and all points with equal projections produce equal outputs (figure 3.3). The locus of points with equal projections onto w is a hyperplane orthogonal to w, so the output y is a function of the distance from x to the hyperplane defined by w; that is, the constant-output surfaces of (3.2) are hyperplanes perpendicular to w.
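The projection argument can be checked numerically. The sketch below (using NumPy, with an arbitrary weight vector chosen for illustration) confirms that two inputs differing only by a component orthogonal to w have equal weighted sums, and that u equals ||w|| times the projection of x onto w:

```python
import numpy as np

# Illustrative weight vector (not from the text); ||w|| = 5.
w = np.array([3.0, 4.0])

x1 = np.array([2.0, 1.0])
orth = np.array([-4.0, 3.0]) / 5.0       # unit vector orthogonal to w
x2 = x1 + 2.0 * orth                     # same projection onto w as x1

u1, u2 = w @ x1, w @ x2
proj = (w @ x1) / np.linalg.norm(w)      # ||x1|| cos(phi), the projection onto w

assert np.isclose(u1, np.linalg.norm(w) * proj)  # u = ||w|| * projection
assert np.isclose(u1, u2)                # equal projections -> equal outputs
```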
Orientation The orientation of the node hyperplane is determined by the direction of w. This depends on the relative sizes of the weights wi but not on the overall magnitude of w. Let ei be the unit vector aligned with the ith coordinate axis, for example, e1 = (1, 0, 0, …, 0) (wi will still be used to refer to the ith component of a vector, however). The angle φi between the hyperplane normal and the ith coordinate axis is then

cos φi = wTei / (||w|| ||ei||) = wi/||w||.
The orientation of the plane is independent of the magnitude of w because the ratios wi/||w|| remain constant when w is multiplied by a constant.
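A small numerical check of this invariance, with an arbitrary weight vector chosen for illustration:

```python
import numpy as np

w = np.array([1.0, 2.0, 2.0])                 # ||w|| = 3
cosines = w / np.linalg.norm(w)               # cos(phi_i) = w_i / ||w||

scaled = 10.0 * w                             # multiply w by a constant
cosines_scaled = scaled / np.linalg.norm(scaled)

# The direction cosines, and hence the hyperplane orientation, are unchanged.
assert np.allclose(cosines, cosines_scaled)
```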
Distance from the Origin As noted previously, the constant-output surfaces of (3.2) are hyperplanes perpendicular to w. More specifically, the weighted sum Σjwjxj = 0 defines a hyperplane through the origin. Inclusion of a threshold, or bias, term θ,

u = Σjwjxj − θ, (3.3)

shifts the hyperplane along w to a distance d = θ/||w|| from the origin. To see this, let v be the vector from the origin to the closest point on the plane. It must be normal to the plane, and thus parallel to w, so v = dw/||w||. The node hyperplane is the locus of points where u = 0, that is, where wTx = θ. Substituting v gives wTv = d wTw/||w|| = d||w|| = θ, so d = θ/||w||.
Figure 3.4 illustrates the utility of the bias term. Without bias, the decision surface must pass through the origin and so will be unable to separate some data sets. Addition of a bias allows the surface to be shifted away from the origin to obtain better classification. To simplify analyses, the threshold is usually absorbed into the weight vector by assuming that one of the inputs is held constant, xbias = 1, so that its weight plays the role of −θ. The constant input is called the bias node.
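The geometry above can be verified numerically. The sketch below assumes the threshold enters the weighted sum as u = wTx − θ (consistent with the distance d = θ/||w||); the weight vector and input are arbitrary illustrations:

```python
import numpy as np

w = np.array([3.0, 4.0])                 # ||w|| = 5
theta = 10.0

d = theta / np.linalg.norm(w)            # distance of hyperplane from origin
v = d * w / np.linalg.norm(w)            # closest point on the plane to origin
assert np.isclose(w @ v, theta)          # v satisfies w.x = theta, so u = 0

# Absorbing the threshold: constant input x_bias = 1 with weight -theta.
x = np.array([2.0, 1.0])
u = w @ x - theta
u_absorbed = np.append(w, -theta) @ np.append(x, 1.0)
assert np.isclose(u, u_absorbed)
```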
Gradation The node nonlinearity ƒ in (3.2) controls how the output varies as the distance from x to the node hyperplane changes. As noted, ƒ is usually chosen to be a bounded monotonic function. When ƒ is a binary hard-limiting function as in a linear threshold unit, the node divides the input space with a hyperplane, producing 0 for inputs on one side of the plane and 1 for inputs on the other side. With a softer nonlinearity such as the sigmoid, the transition from 0 to 1 is smoother but other properties are similar.
The magnitude of w in equation 3.3 plays the role of a scaling parameter that can be varied to obtain transitions of varying steepness. The slope of the transition is ||∂y/∂x|| = f′(u)||w||, which is proportional to the magnitude of the weight vector. For large ||w||, the slope is steep and the sigmoid approximates a step function. For small ||w||, the slope is shallow and y(x) is nearly linear over a wide range of inputs. Figure 3.5 illustrates functions with various degrees of gradation. In every case, the output is solely a function of the distance of the input from the hyperplane.
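The effect of ||w|| on steepness can be seen directly with a logistic sigmoid; the weight vectors and test point below are arbitrary illustrations:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

w = np.array([1.0, 0.0])
x = np.array([0.1, 0.0])            # a point slightly off the hyperplane w.x = 0

y_small = sigmoid((1.0 * w) @ x)    # small ||w||: nearly linear response
y_large = sigmoid((100.0 * w) @ x)  # large ||w||: near-step transition

assert 0.5 < y_small < 0.55         # output barely above 0.5 close to the plane
assert y_large > 0.99               # output already saturated at the same point
```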