
Chapter 12 - Constructive Methods

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

12.4 The Tiling Algorithm

Mézard and Nadal [265] describe the tiling algorithm (figure 12.5) for constructing multilayer networks of linear threshold units to solve binary mapping problems. It should also be suitable for mappings from continuous inputs to binary outputs.

Units are added a layer at a time from the inputs upward. The first unit in each hidden layer is called the master unit. It attempts to classify as many of the training patterns as possible based on the input from the previous layer. It is always possible to create a new master unit on the current layer that will make at least one less error than the master unit of the preceding layer. Thus, if enough layers are created, the final master unit will not make any errors. Convergence is guaranteed because the number of layers required is limited by the number of patterns to be learned.

Figure 12.5: The tiling algorithm [265] adds a new hidden layer at each iteration. A master unit in each layer attempts to classify as many patterns as possible based on input from the preceding layer. If it cannot classify all patterns correctly, then 'ancillary' units are added until the layer produces a 'faithful' representation such that any two input patterns with different targets produce a different pattern of activity on the layer. This then serves as input for the next layer. It is always possible to create a master unit which makes fewer errors than the master unit in the previous layer so convergence is guaranteed if enough layers are added.

If the newly created master unit still makes some errors, then additional "ancillary" units are added to the layer until it produces a "faithful representation" of the training patterns, such that any two input patterns with distinct targets produce different patterns of activity on the layer.
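Faithfulness is straightforward to test: collect the layer's activity vector for every training pattern and check that no activity vector is shared by patterns with conflicting targets. The following is a minimal sketch of such a check, assuming ±1 activities and targets stored in NumPy arrays; the function name is ours, not from the text.

```python
import numpy as np

def is_faithful(activities, targets):
    """Return True if the layer's representation is 'faithful': no two
    patterns with different targets share the same activity vector.

    activities: (P, K) array of +/-1 unit outputs, one row per pattern
    targets:    (P,)   array of +/-1 desired outputs
    """
    seen = {}
    for act, tgt in zip(map(tuple, activities), targets):
        if act in seen and seen[act] != tgt:
            return False   # same prototype, conflicting targets -> unfaithful
        seen[act] = tgt
    return True
```

The tiling algorithm keeps adding ancillary units to a layer until a check of this kind succeeds.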

The Algorithm

  1. Create a master unit for the new hidden layer and train it to separate as many of the input patterns as possible.

  2. If the new unit produces the correct responses for all patterns, then it is the final output unit. Stop.

  3. Otherwise, add ancillary units to create a faithful representation on the current layer.

  4. Go to 1.

These steps are explained in more detail in the following.
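The loop structure implied by these steps might look like the sketch below. It is only schematic: `train_master` and `grow_faithful_layer` stand in for the master-unit and ancillary-unit procedures described in the following subsections (possible versions are sketched there), and the explicit bias handling is our own assumption.

```python
import numpy as np

def add_bias(acts):
    """Prepend a constant +1 'bias node' (index 0) to each activity vector."""
    return np.column_stack([np.ones(len(acts)), acts])

def tiling_network(X, targets, train_master, grow_faithful_layer):
    """Schematic top-level loop of the tiling algorithm.

    X:        (P, N) array of +/-1 input patterns (layer 0)
    targets:  (P,)   array of +/-1 desired outputs
    train_master, grow_faithful_layer: hypothetical helpers returning
        weight vectors for the new layer's units.
    """
    layers = []
    activities = add_bias(X)
    while True:
        # Step 1: the master unit tries to classify as many patterns as possible.
        w_master = train_master(activities, targets)
        outputs = np.sign(activities @ w_master)
        if np.array_equal(outputs, targets):
            # Step 2: the master unit makes no errors -- it is the output unit.
            layers.append([w_master])
            return layers
        # Step 3: add ancillary units until the layer's representation is faithful.
        units = grow_faithful_layer(activities, targets, [w_master])
        layers.append(units)
        # Step 4: the new layer's activities become the next layer's input.
        activities = add_bias(np.column_stack(
            [np.sign(activities @ w) for w in units]))
```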

Generating the Master Unit   Assume there are $p_0$ patterns to be learned and that the preceding layer $L-1$ has been established. Layer $L = 0$ is taken to be the input layer. At layer $L$, let $\tau^\mu = (\tau^\mu_j)$ be the vector of activity patterns, or "prototypes," generated in the preceding layer, where $\mu = 1, \ldots, p_{L-1}$ indexes activity patterns and $j = 0, \ldots, N_{L-1}$ indexes nodes in the previous layer ($j = 0$ is the index of a bias node and $j = 1$ is the index of the master unit in the preceding layer). A number $V_\mu$ of different input patterns may be represented by the same prototype $\tau^\mu$, with $\sum_\mu V_\mu = p_0$. Let $\mu_0$ be the index of one of the patterns for which the master unit of the preceding layer makes an error, that is, $\tau^{\mu_0}_1 = -s^{\mu_0}$, where $s^{\mu_0} = \pm 1$ is the desired target.

The set of weights $w_1 = 1$ and $w_{j \ne 1} = \lambda s^{\mu_0} \tau^{\mu_0}_j$, with $1/N_{L-1} < \lambda < 1/(N_{L-1}-2)$, ensures that this master unit will make at least one fewer error than the master unit in the preceding layer. Let $\lambda = 1/(N_{L-1}-1)$. When pattern $\mu_0$ is again presented, the unit output will be

m^{\mu_0} = \operatorname{sgn}\!\Bigl(\tau^{\mu_0}_1 + \lambda s^{\mu_0}\sum_{j \ne 1}\bigl(\tau^{\mu_0}_j\bigr)^2\Bigr) = \operatorname{sgn}\!\bigl((\lambda N_{L-1} - 1)\,s^{\mu_0}\bigr) = s^{\mu_0}     (12.4)

so the pattern $\mu_0$ is stabilized. When another pattern $\mu$ is presented for which $\tau^\mu_1 = s^\mu$, the unit output will be

m^{\mu} = \operatorname{sgn}\!\Bigl(\tau^{\mu}_1 + \lambda s^{\mu_0}\sum_{j \ne 1}\tau^{\mu_0}_j\,\tau^{\mu}_j\Bigr)     (12.5)

If $\mu \ne \mu_0$, then the sum is less than or equal to $N_{L-1}-2$ in magnitude, so if $\lambda < 1/(N_{L-1}-2)$ then $m^\mu = \operatorname{sgn}(\tau^\mu_1)$ and the classification of pattern $\mu$ is preserved. Thus, with this set of weights, the current layer will also stabilize prototype $\mu_0$ in addition to all the prototypes $\mu$ stabilized in layer $L-1$.

If the unit is initialized with this set of weights and trained by the pocket algorithm [133] then the final set of weights will be at least as good. If training converges to zero error, then the new unit is the desired output. This happens when the targets are linearly separable in terms of the preceding layer activities. Otherwise ancillary units must be created.
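As a concrete illustration, the sketch below builds the initial weight vector $w_1 = 1$, $w_{j \ne 1} = \lambda s^{\mu_0}\tau^{\mu_0}_j$ with $\lambda = 1/(N_{L-1}-1)$ and then refines it with a pocket-style perceptron. This is only our reading of the construction, not the authors' code; `prev_errors` (the indices of prototypes the previous master unit misclassifies), the epoch count, and the random seed are assumptions.

```python
import numpy as np

def init_master_weights(prototypes, targets, prev_errors):
    """Initialize the new master unit as in the text: w_1 = 1 and
    w_j = lambda * s^{mu0} * tau_j^{mu0} for j != 1, lambda = 1/(N-1).

    prototypes:  (P, N+1) +/-1 activities of layer L-1 (column 0 = bias,
                 column 1 = previous master unit)
    targets:     (P,) +/-1 desired outputs
    prev_errors: indices of prototypes the previous master unit gets wrong
    """
    N = prototypes.shape[1] - 1          # N_{L-1}
    mu0 = prev_errors[0]                 # any misclassified prototype will do
    lam = 1.0 / (N - 1)
    w = lam * targets[mu0] * prototypes[mu0].astype(float)
    w[1] = 1.0
    return w

def pocket_train(prototypes, targets, w, epochs=200, seed=0):
    """Pocket-algorithm sketch: run perceptron updates but keep ('pocket')
    the best weight vector seen, so the result is at least as good as w."""
    rng = np.random.default_rng(seed)
    best_w = w.copy()
    best_err = np.sum(np.sign(prototypes @ w) != targets)
    for _ in range(epochs):
        mu = rng.integers(len(targets))
        if np.sign(prototypes[mu] @ w) != targets[mu]:
            w = w + targets[mu] * prototypes[mu]      # perceptron update
        err = np.sum(np.sign(prototypes @ w) != targets)
        if err < best_err:
            best_w, best_err = w.copy(), err
    return best_w
```

Because the pocketed weights never misclassify more prototypes than the initial ones, the guarantee of at least one fewer error than the previous master unit is preserved.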

Creating Ancillary Units   If the master unit still makes errors, then ancillary units must be created so that the layer generates a unique pattern of activity for all input patterns with different targets. Assume the preceding layer generates $p = p_{L-1}$ distinct prototypes $\tau^\mu = (\tau^\mu_i)$, where $i = 0, \ldots, N$ and $N = N_{L-1}$. The $p$ prototypes are a faithful representation by construction. Each $\tau^\mu$ is the prototype of one faithful class of the $(L-1)$th layer. The current layer must produce a mapping from these $p$ patterns.

Suppose $1 + N'$ units have already been created and they produce $p'$ distinct representations. In general, $p' < p$. If the $p'$ patterns are not a faithful representation, then at least one activity pattern on this layer does not have a unique target. One of the unfaithful classes is selected and the next unit is trained to produce the mapping $\tau^\mu \mapsto s^\mu$ for patterns $\mu$ belonging only to the unfaithful class. In the best case, the mapping will be learned perfectly and the unfaithful class will be broken into two faithful classes. Often, however, the mapping will not be linearly separable. In such cases it is possible to break the unfaithful class into two classes, one faithful and one unfaithful. In the worst case, the faithful class may consist of just one prototype.

This is repeated until the layer generates a faithful representation. In practice, if more than one unfaithful class exists, the smallest is selected first. If the new unit is able to separate this class successfully, then the next largest unfaithful class is also attempted with the same unit. As a result, each new unit breaks at least one class into two classes and at most p units are needed to create a faithful representation.
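A sketch of this loop, under the same assumptions as the earlier fragments (±1 NumPy arrays, a hypothetical `train_unit` helper such as the pocket-trained perceptron above), might read:

```python
import numpy as np

def unfaithful_classes(layer_acts, targets):
    """Group patterns by their activity vector on the current layer; return
    the index lists of classes whose members do not all share one target."""
    groups = {}
    for idx, act in enumerate(map(tuple, layer_acts)):
        groups.setdefault(act, []).append(idx)
    return [g for g in groups.values() if len({targets[i] for i in g}) > 1]

def grow_faithful_layer(prev_acts, targets, layer_units, train_unit):
    """Add ancillary units until the layer is faithful (schematic sketch).

    prev_acts:   (P, N+1) +/-1 activities of the preceding layer (with bias)
    layer_units: weight vectors already in the layer (at least the master unit)
    train_unit:  hypothetical helper, e.g. a pocket-trained perceptron,
                 returning a weight vector fit to a subset of the patterns
    """
    while True:
        layer_acts = np.column_stack([np.sign(prev_acts @ w) for w in layer_units])
        bad = unfaithful_classes(layer_acts, targets)
        if not bad:
            return layer_units              # representation is faithful
        # Select the smallest unfaithful class and train a unit on it alone.
        # (The text guarantees a unit can always split off at least one
        # prototype, so the loop makes progress if train_unit does so.)
        cls = min(bad, key=len)
        w = train_unit(prev_acts[cls], targets[cls])
        layer_units.append(w)
```

The refinement mentioned in the text, in which a unit that separates its class is also tried on the next unfaithful class before a new unit is added, is omitted here for brevity.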

Other Notes

  • For multiple output problems, a master unit can be created in each layer for each of the outputs.

  • Results of simulations are described by Mézard and Nadal [265] for N-bit parity (N ≤ 10) and random Boolean functions (N ≤ 8). Random Boolean functions with N = 8 required an average of 7 hidden layers and about 55 hidden neurons. In comparison, a single-hidden-layer AND-OR network would require on the order of 2^8 hidden units to compute random functions; a typical function would require about 128 units.

  • Simulations show that, in general, the number of units per layer decreases with each successive layer.