
Chapter 12 - Constructive Methods

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

12.3 The Upstart Algorithm

Frean [128] describes the upstart algorithm for learning binary mappings with layered networks of linear threshold units. It will converge to zero error for any Boolean mapping, including problems that require hidden units. The networks that result are often smaller than those of the tiling algorithm (see later discussion).

Figure 12.4: The upstart algorithm [128] starts with a single output unit Z. If errors remain after training, daughter units X and Y are inserted to correct Z. Ideally, X corrects Z when it is wrongly ON and Y corrects Z when it is wrongly OFF. If either X or Y cannot correct all the errors assigned to it, additional subunits are introduced to correct their errors, and so on. The result is a tree structure for which there is an equivalent single-hidden-layer network.

The idea is that a unit Z can make two types of errors: "wrongly ON" and "wrongly OFF." A wrongly ON error can be corrected by adding a strong negative connection from a unit X, which is ON only when Z is wrongly ON. Likewise, a wrongly OFF error can be corrected by adding a strong positive connection from a unit Y, which is ON only when Z is wrongly OFF.

The algorithm starts with a single output unit Z with weights chosen to separate as many of the training points as possible. If errors remain, daughter units X and Y are created to correct Z when it is wrong. Ideally, X corrects all the wrongly ON errors and Y corrects all the wrongly OFF errors. If either cannot correct all the errors assigned to it, additional subunits are introduced to correct their errors, and so forth. Each new unit can always correct at least one error, so the number of errors decreases at each step and the process eventually terminates when all patterns are classified correctly. The result is a tree structure (figure 12.4) for which there is an equivalent single-hidden-layer network.
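
The two error types are simple Boolean conditions on a unit's output and target, and they are worth stating explicitly because the daughters' training sets are built from them. The following is a minimal sketch in Python/NumPy (the function name error_masks is ours, not the book's):

```python
import numpy as np

def error_masks(outputs, targets):
    """Split a unit's mistakes into upstart's two error types.

    outputs, targets: 0/1 arrays over the training patterns.
    Returns boolean masks: wrongly_on  (output 1, target 0)
                           wrongly_off (output 0, target 1).
    """
    o = np.asarray(outputs, dtype=bool)
    t = np.asarray(targets, dtype=bool)
    wrongly_on = o & ~t     # the unit fired but should not have
    wrongly_off = ~o & t    # the unit stayed OFF but should have fired
    return wrongly_on, wrongly_off
```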

The Algorithm. The weights from the inputs to the output Z are trained to minimize the error and then frozen. Perceptron learning [325], [284] can be used, but it will not converge to a stable solution if the patterns are not linearly separable; the "pocket" algorithm [133] can also be used. The following steps are then repeated recursively, first for Z, then for each of the daughter units X and Y, then for their daughters, and so on.
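
As an illustration of this single-unit training step, here is a simplified pocket-style trainer for one linear threshold unit: it runs plain perceptron updates and keeps ("pockets") the weights with the best training accuracy seen so far. The function name, parameters, and the per-epoch pocket check are our own simplifications, not Gallant's exact procedure [133].

```python
import numpy as np

def train_threshold_unit(X, targets, epochs=100, seed=0):
    """Pocket-style perceptron training of one linear threshold unit.

    X: (n_patterns, n_inputs) array of 0/1 inputs; targets: 0/1 array.
    Returns (weights, bias) with the best training accuracy observed.
    """
    rng = np.random.default_rng(seed)
    targets = np.asarray(targets)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    best_w, best_b, best_correct = w.copy(), b, -1
    for _ in range(epochs):
        for i in rng.permutation(n):          # plain perceptron updates
            pred = int(X[i] @ w + b > 0)
            if pred != targets[i]:
                step = 1.0 if targets[i] == 1 else -1.0
                w += step * X[i]
                b += step
        correct = int(np.sum((X @ w + b > 0) == (targets == 1)))
        if correct > best_correct:            # keep the best weights seen
            best_w, best_b, best_correct = w.copy(), b, correct
    return best_w, best_b
```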

  1. If Z makes any wrongly ON mistakes, create a new unit X. The targets for X are {o^μ_Z ∧ ¬t^μ_Z}, where o^μ_Z and t^μ_Z are the output and target for node Z on pattern μ. (The symbols ∧ and ¬ indicate the logical AND and NOT operations.) That is, X is designed to turn ON when Z is wrongly ON. When Z is ON and its target value is OFF, the target for X is ON; otherwise the target for X is OFF. The patterns for which both Z and the target are OFF can be eliminated from X's training set, although this is not necessary.

     Similarly, if Z makes any wrongly OFF mistakes, create a new unit Y with targets {¬o^μ_Z ∧ t^μ_Z}. The patterns for which Z and the target are both ON can be eliminated from Y's training set.

  2. Connect the outputs of X and Y to Z. The weight from X is large and negative, and the weight from Y is large and positive. The magnitude of the X (Y) weight needs to exceed the sum of Z's positive (negative) input weights. These weights can be set explicitly, or by an appropriate training procedure (a short sketch of steps 1 and 2 follows this list).

  3. Go to 1 and repeat recursively. That is, correct the errors of X and Y by generating two daughters for each.
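
To make steps 1 and 2 concrete, the sketch below builds the daughters' targets and chooses connection weights large enough to override the parent. It continues the hypothetical notation used above and assumes 0/1 inputs, so the sums of Z's positive and negative weights bound its net input.

```python
import numpy as np

def daughter_targets(z_outputs, z_targets):
    """Step 1: training targets for daughters X and Y of unit Z.

    X should fire exactly when Z is wrongly ON  (o_Z AND NOT t_Z);
    Y should fire exactly when Z is wrongly OFF (NOT o_Z AND t_Z).
    """
    o = np.asarray(z_outputs, dtype=bool)
    t = np.asarray(z_targets, dtype=bool)
    return (o & ~t).astype(int), (~o & t).astype(int)

def daughter_weights(z_in_weights):
    """Step 2: weights of the connections from X and Y into Z.

    z_in_weights: Z's existing input weights (bias included).
    The inhibitory weight from X must exceed, in magnitude, the sum of
    Z's positive weights; the excitatory weight from Y must exceed the
    magnitude of the sum of Z's negative weights.  Assumes 0/1 inputs.
    """
    pos_sum = z_in_weights[z_in_weights > 0].sum()
    neg_sum = -z_in_weights[z_in_weights < 0].sum()
    w_from_x = -(pos_sum + 1.0)   # large negative: vetoes a wrong ON
    w_from_y = neg_sum + 1.0      # large positive: rescues a wrong OFF
    return w_from_x, w_from_y
```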

The algorithm builds a binary tree of units from the output down to the inputs. The daughter nodes X and Y have an easier problem to solve than does Z. Each new unit can separate at least one of the incorrect patterns, so it will always make fewer errors than its parent and, when connected by appropriate weights, will reduce the number of errors the parent makes. Daughter units are created only if the parent makes errors. The number of errors decreases with every branching, so at some stage none of the daughters will make any errors; their parents will then make none either, and so on up to the top of the tree. As noted, some of the patterns for X and Y can be eliminated. This is not necessary, but it speeds up training by a factor of about two [128].
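
Putting the pieces together, the recursion can be sketched as below, using the hypothetical helpers defined earlier (train_threshold_unit, error_masks, daughter_targets, daughter_weights). Each call trains a unit, builds daughters only where it still errs, and returns a closure giving the unit's corrected outputs. Note that the termination argument assumes each daughter corrects at least one error, which this finite-budget trainer does not strictly enforce.

```python
def upstart(X, targets, epochs=100):
    """Recursively build an upstart tree for the mapping X -> targets.

    Returns a function node(P) giving the node's corrected 0/1 outputs
    for a pattern matrix P.
    """
    w, b = train_threshold_unit(X, targets, epochs)   # train, then freeze
    outputs = (X @ w + b > 0).astype(int)
    wrongly_on, wrongly_off = error_masks(outputs, targets)
    x_targets, y_targets = daughter_targets(outputs, targets)

    # Daughters are created only where the parent still makes errors.
    daughter_x = upstart(X, x_targets, epochs) if wrongly_on.any() else None
    daughter_y = upstart(X, y_targets, epochs) if wrongly_off.any() else None
    w_from_x, w_from_y = daughter_weights(np.append(w, b))

    def node(P):
        net = P @ w + b
        if daughter_x is not None:
            net = net + w_from_x * daughter_x(P)   # veto wrongly ON patterns
        if daughter_y is not None:
            net = net + w_from_y * daughter_y(P)   # turn ON wrongly OFF patterns
        return (net > 0).astype(int)

    return node
```

On a small Boolean problem such as two-bit parity, a call like upstart(X, y) would typically build Z plus one or two daughters and classify all four patterns correctly.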

Frean [128] shows that the resulting tree can be converted to an equivalent single-hidden-layer network if the unnecessary patterns are not eliminated. The original Z unit and all the hidden units are placed in a single layer with the connections between them eliminated, and a new output unit is created. The weights from the hidden units to the new output unit can be found by the perceptron algorithm, or by inspection of the tree structure.
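
In the same hypothetical notation, the conversion can be sketched as follows: every unit of the tree (including the original Z) becomes a hidden unit driven by the inputs alone, and a fresh output unit is trained on their responses with the same pocket-style rule. Here tree_units is an assumed list of (weights, bias) pairs collected while the tree was built; the inter-unit connections are simply dropped.

```python
import numpy as np

def flatten_to_single_layer(X, targets, tree_units, epochs=100):
    """Build the equivalent single-hidden-layer network.

    tree_units: list of (weights, bias) pairs, one per tree unit,
    each using only that unit's direct input weights.
    Returns (weights, bias) of the new output unit over the hidden layer.
    """
    # Hidden-layer responses: each tree unit applied to the raw inputs.
    hidden = np.column_stack([(X @ w + b > 0).astype(int)
                              for w, b in tree_units])
    # New output unit trained on the hidden responses (perceptron/pocket).
    return train_threshold_unit(hidden, targets, epochs)
```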