
Chapter 11 - Genetic Algorithms and Neural Networks

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

11.1 The Basic Algorithm

The basic operations are (1) selection based on fitness, (2) recombination of genetic material by crossover, and (3) mutation. The algorithm operates on a population of many units. Each unit has a bit string, its "genetic code," which encodes its solution to the given problem. The user supplies a problem-specific function that decodes the bit string, evaluates the solution, and returns a value which is translated to a fitness score. The fitness score determines which units are selected for mating. High scoring solutions tend to be selected more often than low scoring solutions and thus pass their characteristics to the next generation at a higher rate.

The basic algorithm starts with an initial population of N units with random parameters encoded in a binary bit string. Larger population sizes generally increase the chance of finding a good solution but, of course, require more processing time. The following steps are repeated until a solution is found or patience is exhausted.

Evaluation Evaluate each unit and assign it a non-negative score (higher = better). Normalize by dividing by the sum of all scores to obtain fitness scores fi in the range 0 to 1. If any unit satisfies the goal criteria, discard the other units and stop.

Selection On N trials, select an individual i with probability fi and copy it to the mating population. Units can be selected with probability fi by assigning each unit a segment of the 0 to 1 interval proportional to its fitness and choosing a uniform random number; if the number falls in the interval assigned to the kth unit then select unit k. In the case of three units with fitness scores 0.1, 0.6, and 0.3, for example, the intervals would be 0 to 0.1, 0.1 to 0.7, and 0.7 to 1.0. Figure 11.1 illustrates this by analogy to a roulette wheel where each unit has a number of slots proportional to its fitness.
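The roulette-wheel draw described above can be sketched in a few lines. This is an illustrative implementation, not the book's own code; the function name is invented, and the fitness scores are assumed to be already normalized to sum to 1.

```python
import random

def roulette_select(population, fitness):
    """Select one unit with probability proportional to its fitness.

    `fitness` holds normalized scores f_i that sum to 1.  A uniform random
    number in [0, 1) falls into the sub-interval assigned to exactly one
    unit; that unit is returned.
    """
    r = random.random()
    cumulative = 0.0
    for unit, f in zip(population, fitness):
        cumulative += f
        if r < cumulative:
            return unit
    return population[-1]  # guard against floating-point round-off
```

With the three-unit example from the text (fitness 0.1, 0.6, and 0.3), repeated draws select the second unit roughly six times as often as the first.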

Figure 11.1: Selection. The genetic algorithm selects units for reproduction with probability proportional to their fitness. Units with higher fitness (corresponding to more slots on the wheel) are more likely to be chosen than units with lower fitness, but all units have some chance of being chosen.
Figure 11.2: Crossover. Crossover mixes genetic codes inherited from two parents by crossing the bit strings at one or two random points. The bit strings encode characteristics of the parents so the offspring receives traits of both parents, but is not identical to either. The crossing point is random, so the mix of characteristics transferred varies with each mating.

Because of the element of chance, the number of times a unit reproduces will not be exactly proportional to its fitness, but, on average, if unit i has twice the fitness of unit j, then it will usually have about twice the offspring. Units with very low fitness ratings will rarely reproduce and face extinction.

Crossover Divide the mating population into pairs and mix their genetic information by crossing the bit strings at one or two random points. Figure 11.2 illustrates the operation. If units A and B have parameter strings 110100111001 and 011011100101, for example, then they would produce offspring 011000111001 and 110111100101 if the crossing point is after position 4. The probability that crossover will occur is set by a parameter pc. For pc < 1, there is some chance that the parents simply survive in the next generation unaltered by crossover. This helps to preserve good solutions since some copies are likely to survive unchanged. Typical values are pc = 0.6 to 0.9.
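Single-point crossover on bit strings can be sketched as follows. This is an illustrative version, not the book's code; the optional `point` argument (not part of the description above) is added only so the fixed-point example from the text can be reproduced.

```python
import random

def single_point_crossover(a, b, pc=0.7, point=None):
    """Cross two equal-length bit strings at one random point.

    With probability 1 - pc the parents survive unchanged.  If `point` is
    given, the strings are cut after that many leading bits.
    """
    assert len(a) == len(b)
    if random.random() >= pc:
        return a, b                       # parents pass through unaltered
    if point is None:
        point = random.randint(1, len(a) - 1)  # cut strictly inside
    return a[:point] + b[point:], b[:point] + a[point:]
```

For parents 110100111001 and 011011100101 with the crossing point after position 4, this yields the offspring 110111100101 and 011000111001, matching the example in the text.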

Mutation For each unit in the new set, flip each bit with some small probability; for example, pm = 0.001. The number of mutations should be small to prevent deterioration of the algorithm into random search. The main purpose of mutation is to maintain population diversity. In general, pm should be chosen so that mutations occur in only a small percentage of the population, in only one or two units for moderately sized populations.
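Bit-flip mutation is a one-liner. A minimal sketch (the function name is illustrative):

```python
import random

def mutate(bits, pm=0.001):
    """Flip each bit of the string independently with probability pm."""
    return "".join(b if random.random() >= pm else ("1" if b == "0" else "0")
                   for b in bits)
```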

Replacement Copy the newly created units to the working population. In some variations, new units replace their parents. In others, they replace the least fit units.
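The five steps above can be assembled into a minimal generational loop. This is a sketch under several assumptions: the function name, default parameter values, and the elitist retention of the best-ever string are illustrative choices, not prescriptions from the text.

```python
import random

def run_ga(raw_score, n_bits=16, pop_size=30, pc=0.7, pm=0.01, generations=50):
    """Minimal generational GA following the steps in the text.

    raw_score maps a bit string to a non-negative number (higher = better).
    The best string ever seen is retained, so the answer never degrades.
    """
    pop = ["".join(random.choice("01") for _ in range(n_bits))
           for _ in range(pop_size)]
    best = max(pop, key=raw_score)
    for _ in range(generations):
        # Evaluation: normalize raw scores to fitness values summing to 1.
        scores = [raw_score(u) for u in pop]
        total = sum(scores) or 1.0
        fitness = [s / total for s in scores]

        # Selection: roulette-wheel draw on the cumulative fitness.
        def select():
            r, c = random.random(), 0.0
            for u, f in zip(pop, fitness):
                c += f
                if r < c:
                    return u
            return pop[-1]

        # Crossover: pair selected parents, cross with probability pc.
        next_pop = []
        while len(next_pop) < pop_size:
            a, b = select(), select()
            if random.random() < pc:
                pt = random.randint(1, n_bits - 1)
                a, b = a[:pt] + b[pt:], b[:pt] + a[pt:]
            next_pop.extend([a, b])

        # Mutation: flip each bit with small probability pm.
        pop = ["".join(bit if random.random() >= pm else "01"[bit == "0"]
                       for bit in u)
               for u in next_pop[:pop_size]]

        # Replacement: the new units become the working population.
        best = max(pop + [best], key=raw_score)
    return best
```

On a toy problem such as maximizing the number of 1 bits, the loop reliably improves on the random initial population within a few dozen generations.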

Many variations of the algorithm have been proposed. In the basic algorithm, all units have a chance to reproduce and large portions of parameter strings are exchanged during reproduction so it is possible for good solutions to be lost. One remedy is to allow only the most successful fraction of the population to mate, with their offspring replacing the less successful part of the population. Since the offspring do not replace their parents, this helps to preserve good solutions.

Other variations extend the biological analogy further by incorporating features such as paired chromosomes, dominance, inversion, and niche specialization. Some versions are Lamarckian, allowing adaptations made in the lifetime of a parent to be passed on to the offspring. Some vary the number of units that reproduce at each cycle. Some allow the population size to fluctuate and some maintain several subpopulations with only limited mixing. Goldberg [141] reviews many of these cases.

11.1.1 Effects of Crossover

By some accounts, crossover is responsible for most of the adaptive power of the algorithm. Crossover selects parameters from two good solutions and mixes them to create new combinations. The parent solutions were successful enough to be selected for reproduction so they presumably contain good parameter sets. Ideally, the offspring will inherit the best parameters from both parents and produce a new combination which is better than either.

Crossover is different from random search in that with crossover the offspring are in some sense intermediate between the two parents; they inherit some attributes from parent A and some from parent B but are identical to neither. This tends to confine the search to new combinations of parameters that have already proven useful. Random search, in contrast, is unguided and might create new units anywhere in the parameter space.

A schemata theory [141] has been developed to study how parameter strings evolve. A particular template of 1s, 0s, and *s (don't cares) in a bit string, for example, 011****01***, is called a schema (plural: schemata). Each bit in the string is simultaneously a component of many different schemata and each string simultaneously contains many overlapping schemata. Likewise, a single schema may be present in many strings in the population. The core idea is that a string containing a bit combination that is strongly correlated with good solutions is likely to be reproduced in the next generation. The defining length of a schema is the distance between its most separated defining bits. The distance between the leading 0 and the final 1 of 011****01***, for example, is 8 bits. Schemata with long defining lengths contain widely separated significant bits and are more likely to be broken during crossover and thus less likely to survive than shorter schemata. Depending on how parameters are encoded in the bit string, this tends to make the algorithm favor low-order, less complex solutions over high-order ones, usually a desirable feature for a learning algorithm.
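The two schema properties used above, the order (number of fixed bits) and the defining length, are easy to compute. A small illustrative sketch (function names are invented):

```python
def schema_order(schema):
    """Order of a schema: the number of fixed (non-*) positions."""
    return sum(c != "*" for c in schema)

def defining_length(schema):
    """Distance between the outermost fixed positions of a schema."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0
```

For the example schema 011****01***, the order is 5 and the defining length is 8, matching the distance given in the text.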

11.1.2 Effects of Mutation

Mutation plays a rather small part in the standard algorithm. If the mutation rate is too large, the algorithm tends to degenerate into an inefficient random search. When all the units are very similar, however, as in the final stages of convergence, crossover creates few new solutions and mutation becomes more important. Because all defining bits of a schema must survive mutation for the schema to survive, schemata with fewer defining bits are more likely to survive mutation than those with many defining bits. If simple solutions have representations in terms of small numbers of parameters (few bits), then this favors simpler and presumably more robust solutions. Overall, the combination of fitness selection, crossover, and mutation favors schemata with above average fitness, short defining length, and low order.

11.1.3 Fitness Scaling

Because the user-supplied evaluation function can be chosen arbitrarily, it is useful to scale the raw scores to obtain normalized fitness scores. If all units receive raw scores in the range from 1000 to 1005, for example, the best solutions would have very little advantage over the worst and the search would be essentially random. This might occur in late stages of the algorithm when most units are clustered around a good solution. Similarly, in early generations most units may have low raw scores and a unit that makes a significant (but not decisive) improvement may get a much higher score, allowing it to dominate the next generation and cause premature loss of population diversity. This effect is more important when populations are small.

Scaling of the raw scores helps prevent these problems. A linear transformation is often used to map the raw scores f to fitness values f':

f' = af + b.

In choosing a and b it is desirable that f'avg = favg so that one expects each average unit to produce one offspring. The number of offspring for the best unit is controlled by ensuring f'max = Cmult favg, where Cmult is the desired number of offspring for the best unit. For small populations (n = 50 to 100), values of Cmult = 1.2 to 2 are suggested [141]. If this scaling results in negative scores, set them to 0. Other methods of fitness scaling are discussed by Goldberg [141].
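Solving the two constraints f'avg = favg and f'max = Cmult favg for a and b gives a = (Cmult - 1) favg / (fmax - favg) and b = favg (1 - a). A minimal sketch of the scaling (the function name is illustrative; the clipping of negative values follows the text):

```python
def linear_scale(scores, c_mult=1.5):
    """Linear fitness scaling f' = a*f + b chosen so that the average
    scaled score equals the average raw score and the best unit's scaled
    score is c_mult times the average.  Negative results are set to 0.
    """
    f_avg = sum(scores) / len(scores)
    f_max = max(scores)
    if f_max == f_avg:                     # all scores equal: nothing to do
        return list(scores)
    a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    return [max(0.0, a * f + b) for f in scores]
```

Applied to the clustered raw scores 1000 to 1005 mentioned earlier, this spreads the best unit out to 1.5 times the average, restoring meaningful selection pressure.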