Chapter 13 - Pruning Algorithms

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology

13.1 Pruning Algorithms

A brute-force pruning method is: for every weight, set the weight to zero and evaluate the change in the error; if the error increases too much, restore the weight; otherwise, remove it. On a serial computer, each forward propagation takes O(W) time, where W is the number of weights. This is repeated for each of the W weights and each of the M training patterns, resulting in O(MW²) time for each pruning pass. A number of passes are usually required. An even more cautious method would evaluate the change in error for all weights and patterns and then delete just the one weight with the least effect. This would be repeated until the least change in error reaches some threshold and could take O(MW³) time. This would be very slow for large networks, so most of the methods described in the following sections take a less direct approach.
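The brute-force loop is simple to state directly. The following is a minimal sketch, assuming a flat weight vector, an error_fn callable that sums the error over the M training patterns, and a tolerated error increase threshold (all names are hypothetical):

    def brute_force_prune(weights, error_fn, patterns, threshold):
        # weights   : list of connection weights (modified in place)
        # error_fn  : error_fn(weights, patterns) -> total error over all patterns
        # patterns  : the M training patterns
        # threshold : largest tolerated increase in error
        base_error = error_fn(weights, patterns)       # one evaluation costs O(MW)
        for i in range(len(weights)):                  # repeated W times: O(MW^2) per pass
            if weights[i] == 0.0:
                continue                               # already pruned
            saved = weights[i]
            weights[i] = 0.0                           # tentatively delete the weight
            new_error = error_fn(weights, patterns)
            if new_error - base_error > threshold:
                weights[i] = saved                     # too damaging; restore it
            else:
                base_error = new_error                 # accept the deletion
        return weights

The more cautious variant instead scores every remaining weight, deletes only the single one with the least effect on the error, and repeats until the least change reaches the threshold; each deletion then costs a full O(MW²) sweep, which is the source of the O(MW³) figure above.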

Many of the algorithms can be put into two broad groups. One group estimates the sensitivity of the error function to the removal of elements and deletes those with the least effect. The other group adds terms to the objective function that penalize complex solutions. Most of these can be viewed as forms of regularization. Many penalize large connection weights and are similar to weight-decay rules. A term proportional to the sum of all weight magnitudes, for example, favors solutions with small weights; weights that are nearly zero are unlikely to influence the output much and can be eliminated. There is some overlap between the two groups because the objective function could include sensitivity terms.
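As a sketch of the penalty-term idea, the fragment below adds a term proportional to the sum of weight magnitudes to the data error; lam is a hypothetical penalty strength, and the gradient contribution shows why minimization shrinks every nonzero weight toward zero at a constant rate:

    import numpy as np

    def penalized_error(weights, data_error, lam):
        # E = E_data + lam * sum_i |w_i|
        return data_error + lam * np.sum(np.abs(weights))

    def penalty_gradient(weights, lam):
        # The penalty contributes lam * sign(w_i) to dE/dw_i
        # (np.sign(0) = 0, so weights already at zero are left alone);
        # added to the backpropagated gradient, it pulls each nonzero
        # weight toward zero at a constant rate.
        return lam * np.sign(weights)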

In general, the sensitivity methods operate on a trained network. That is, the network is trained, sensitivities are estimated, and then weights or nodes are removed. The penalty-term methods, on the other hand, modify the cost function so that minimization drives unnecessary weights to zero and, in effect, removes them during training. Even if the weights are not actually removed, the network acts like a smaller system.
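A minimal sketch of that final step, assuming a magnitude penalty has already driven the unnecessary weights near zero during training; epsilon is an arbitrary illustrative cutoff:

    import numpy as np

    def remove_small_weights(weights, epsilon=1e-3):
        # Weights the penalty has pushed (near) zero no longer
        # influence the output much and can be deleted outright.
        mask = np.abs(weights) >= epsilon              # True = keep
        pruned = np.where(mask, weights, 0.0)
        return pruned, mask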