
Chapter 12 - Constructive Methods

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

12.6 Meiosis Networks

Hanson [153] describes meiosis networks, which work by splitting nodes. (In biology, meiosis refers to a process of cell division.) The algorithm varies the sizes of layers in a given network but does not add new layers. The description in [153] assumes a single-hidden-layer net, but other forms might also be used. In principle, the target function can be either continuous or discrete; the description in [153] presents results for several classification problems.

The optimization procedure is stochastic in that the network weights have noisy values, which change randomly from one instant to the next. The mean and variance for each weight are adjusted during training. The specificity or certainty of a node is estimated by the variance of its weights relative to their means. Nodes with high relative variances are candidates for splitting.

Weight values change randomly from one instant to the next according to a probability distribution such as

p(w_{ij}) = \frac{1}{\sigma_{ij}} \, \phi\!\left( \frac{w_{ij} - \mu_{ij}}{\sigma_{ij}} \right)    (12.8)

where φ() is the N(0,1) Gaussian density function and μij and σij are, respectively, the mean and standard deviation of the fluctuations of weight wij. Because of this variability, successive presentations of the same pattern can result in different outputs.
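As a concrete illustration, the sketch below draws one noisy realization of a node's weights from the per-weight means and standard deviations. This is a minimal NumPy sketch, not code from [153]; the names (sample_weights, mu, sigma) are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_weights(mu, sigma):
        """Draw w_ij = mu_ij + sigma_ij * z_ij with z_ij ~ N(0, 1)."""
        return mu + sigma * rng.standard_normal(mu.shape)

    # Example: input weights of a single hidden node with three inputs.
    mu = np.array([0.5, -1.2, 0.3])
    sigma = np.array([0.05, 0.40, 0.10])
    print(sample_weights(mu, sigma))  # a different realization on every call

Calling sample_weights repeatedly with the same means and standard deviations yields different weight vectors, which is why repeated presentations of the same pattern can yield different outputs.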

The initial network contains one hidden unit whose weights are initialized with random means and variances. The mean of each weight is adjusted by gradient descent

\Delta\mu_{ij} = -\alpha \, \frac{\partial E}{\partial w_{ij}}    (12.9)

with a learning rate parameter α. The standard deviation changes depending on the magnitude of the gradient

\Delta\sigma_{ij} = \beta \left| \frac{\partial E}{\partial w_{ij}} \right|    (12.10)

β is a learning rate parameter. Values 0.1 < β < 0.5 are suggested in [153]. This update mechanism can only increase σij. Decreases occur by a decay process

\sigma_{ij}(t+1) = \zeta \, \sigma_{ij}(t), \qquad \zeta < 1    (12.11)

As errors approach zero during training, the standard deviations decay to zero and the network becomes deterministic. Low values of ζ, for example, < 0.7, produce little node splitting; large values, for example, > 0.99, produce continual node splitting. A value of 0.98 was used in simulations.
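The following sketch combines the three updates (12.9)-(12.11) into a single training step, assuming the decay is applied on every step after the gradient-based increase. Here grad stands for ∂E/∂wij evaluated at the sampled weights; the function name and the default values of alpha are illustrative rather than taken from [153], while beta and zeta follow the ranges quoted above.

    import numpy as np

    def update_weight_statistics(mu, sigma, grad, alpha=0.1, beta=0.3, zeta=0.98):
        """One training step for the per-weight means and standard deviations."""
        mu = mu - alpha * grad               # Eq. 12.9: gradient descent on the means
        sigma = sigma + beta * np.abs(grad)  # Eq. 12.10: |gradient| only ever increases sigma
        sigma = zeta * sigma                 # Eq. 12.11: multiplicative decay (zeta < 1)
        return mu, sigma

As the gradients shrink near a solution, the increase term in Eq. 12.10 vanishes and the decay of Eq. 12.11 drives the standard deviations toward zero, which is how the network becomes deterministic.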

The standard deviation of a weight is considered a measure of its certainty or prediction value; large variances tend to mean low prediction value. The process above tends to assign small variances to weights that converge quickly and high variances to weights that converge slowly. Presumably, quick convergence indicates that the weights are clearly necessary and adequate, while slow convergence indicates a delicate balance between opposing forces that the net is unable to resolve quickly. That is, a high variance reflects uncertainty in the proper weight value.

Nodes with many uncertain weights are candidates for splitting. Nodes split when the standard deviation becomes large relative to the mean for both the input and output weight vectors

\frac{\sum_i \sigma_{ij}}{\sum_i \mu_{ij}} > 1 \qquad \text{(input weights of node } j\text{)}    (12.12)

and

\frac{\sum_k \sigma_{jk}}{\sum_k \mu_{jk}} > 1 \qquad \text{(output weights of node } j\text{)}    (12.13)

(It may be preferable to use the sum of absolute mean values here.) Child node weights are initialized with the same mean as the parent node and half the variance.
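A small sketch of the splitting test and the child initialization follows, under the same assumptions as before: illustrative names, with NumPy arrays holding the per-weight means and standard deviations of the node's fan-in and fan-out.

    import numpy as np

    def should_split(mu_in, sigma_in, mu_out, sigma_out):
        """Splitting test of Eqs. 12.12-12.13 for one hidden node."""
        # Split only if total std. dev. exceeds total mean on BOTH fan-in and fan-out.
        # (As noted in the text, sums of |mean| might be a safer denominator.)
        return sigma_in.sum() > mu_in.sum() and sigma_out.sum() > mu_out.sum()

    def split_node(mu, sigma):
        """Return weight statistics for the two children of a split node."""
        # Children keep the parent's means; the variance is halved, so the
        # child standard deviations are sigma / sqrt(2).
        child_sigma = sigma / np.sqrt(2.0)
        return (mu.copy(), child_sigma.copy()), (mu.copy(), child_sigma.copy())

Halving the variance corresponds to dividing the standard deviation by the square root of two, so each child starts out less noisy than its parent and its standard deviations can continue to shrink as training proceeds.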

One problem with this splitting criterion is that nodes whose weights have small mean values are more likely to be split than other nodes. A completely unnecessary node whose mean weights are all zero would be split many times.