Chapter 16 - Heuristics for Improving Generalization

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

16.6 Information Minimization

A heuristic for improving generalization based on the idea of information minimization is described by Kamimura, Takagi, and Nakanishi [205]. The uncertainty of a sigmoidal node is taken to be maximum when its activation is 0.5. A pseudo-entropy of the network for a particular set of patterns is defined as

H = -\sum_{k=1}^{K} \sum_{i=1}^{M} \left[ v_{ki} \log v_{ki} + (1 - v_{ki}) \log (1 - v_{ki}) \right]        (16.5)

where K is the number of input patterns, M is the number of hidden units, and v_{ki} is the activation of hidden unit i for pattern k. The information in the network is given as

I = H_{max} - H,

the amount by which the entropy falls below its maximum possible value H_{max}, which is attained when every hidden activation equals 0.5.
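As a concrete illustration (not from the original text), the pseudo-entropy and the information can be computed directly from the matrix of hidden-unit activations; the function names, the natural-logarithm convention, and the value H_max = KM log 2 in this sketch are assumptions.

import numpy as np

def pseudo_entropy(V, eps=1e-12):
    # V has shape (K, M): sigmoidal hidden activations v_ki for
    # K patterns and M hidden units (eq. 16.5).
    V = np.clip(V, eps, 1.0 - eps)          # guard against log(0)
    return -np.sum(V * np.log(V) + (1.0 - V) * np.log(1.0 - V))

def information(V):
    # I = H_max - H, with H_max = K*M*log(2) reached when every
    # hidden activation equals 0.5 (natural logs assumed).
    K, M = V.shape
    return K * M * np.log(2.0) - pseudo_entropy(V)

# Saturated activations carry more information (lower entropy)
# than activations sitting at 0.5.
V_saturated = np.array([[0.99, 0.01], [0.02, 0.98]])
V_uncertain = np.full((2, 2), 0.5)
print(information(V_saturated))   # large
print(information(V_uncertain))   # approximately 0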
The entropy is used as a penalty function to minimize the information contained in the network, so the augmented error function is

E' = E_o - \lambda H

where E_o is the standard sum of squared errors and \lambda > 0 controls the strength of the penalty. For a weight w_{ij} from input j to hidden unit i, minimizing E' adds the term

\eta \lambda \frac{\partial H}{\partial w_{ij}} = -\eta \lambda \sum_{k=1}^{K} \log\!\left(\frac{v_{ki}}{1 - v_{ki}}\right) v_{ki} (1 - v_{ki})\, x_{kj}

to the weight adjustment rule, giving

\Delta w_{ij} = -\eta \sum_{k=1}^{K} \left[ \delta_{ki} + \lambda \log\!\left(\frac{v_{ki}}{1 - v_{ki}}\right) v_{ki} (1 - v_{ki}) \right] x_{kj}

where x_{kj} is the jth input to unit i for pattern k and \eta is the learning rate.
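A minimal NumPy sketch of this modified batch update for the hidden-layer weights follows; it assumes a single hidden layer of logistic units, and the names sigmoid, hidden_weight_update, eta, and lam are illustrative rather than taken from the original.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hidden_weight_update(W, X, delta, eta=0.1, lam=0.01, eps=1e-12):
    # One batch step for the hidden-layer weights W (M x N) under
    # the augmented error E' = E_o - lam * H.
    #   X     : (K, N) inputs x_kj
    #   delta : (K, M) back-propagation deltas dE_k/da_ki for the
    #           hidden units, from the ordinary error term E_o
    V = sigmoid(X @ W.T)                        # (K, M) activations v_ki
    Vc = np.clip(V, eps, 1.0 - eps)             # guard against log(0)
    # Penalty contribution: lam * log(v/(1-v)) * v * (1-v)
    penalty = lam * np.log(Vc / (1.0 - Vc)) * V * (1.0 - V)
    # Delta w_ij = -eta * sum_k [delta_ki + penalty_ki] * x_kj
    return -eta * (delta + penalty).T @ X       # (M, N) weight change

Setting lam = 0 recovers the ordinary back-propagation step for these weights.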
In the update rule above, \delta_{ki} = \partial E_k / \partial a_{ki} is the back-propagation delta term calculated in section 5.2. The use of H as a penalty term makes this an example of a regularization method. It also has effects similar to weight decay because (i) the entropy of a sigmoidal node is maximum when its output is 0.5, (ii) the output is 0.5 when the net input is 0, and (iii) the net input is 0 when the incoming weights are 0; that is, shrinking the weights tends to maximize H and so to minimize -H.
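The pull toward small weights can be checked numerically: the per-node penalty gradient log(v/(1-v)) v(1-v) appearing above vanishes at v = 0.5 and otherwise drives the net input, and hence the incoming weights, back toward zero. The helper name in this check is hypothetical.

import numpy as np

def node_penalty_grad(v):
    # Derivative of the per-node negative entropy -h with respect to
    # the net input a, where v = sigmoid(a) and
    # h = -(v log v + (1 - v) log(1 - v)).
    return np.log(v / (1.0 - v)) * v * (1.0 - v)

for v in (0.1, 0.5, 0.9):
    print(v, node_penalty_grad(v))
# v = 0.1: gradient < 0, so a descent step raises the (negative) net input toward 0
# v = 0.5: gradient = 0, the node entropy is already maximal
# v = 0.9: gradient > 0, so a descent step lowers the (positive) net input toward 0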