Skip to Book Content
Book cover image

Chapter 11 - Genetic Algorithms and Neural Networks

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
Russell D. Reed and Robert J. Marks II
Copyright © 1999 Massachusetts Institute of Technology
 

Chapter 11: Genetic Algorithms and Neural Networks

Overview

One of the basic tasks in network design is to choose an architecture and weights appropriate for the given problem. The genetic algorithm (GA) [173], [141] is a general optimization method that has been applied to many problems including neural network training. It is appropriate for neural network training. It is appropriate for neural because it scales well to large nonlinear problems with multiple local minima.

As the name implies, the genetic algorithm is based on an analogy to natural evolutionary mechanisms. Many variations have been investigated, but the basic idea is competition between alternative solutions and "survival of the fittest." In this case, fitness is measured by a predefined objective function. Individuals in a large population have varying traits that affect their reproductive success in the given environment. Successful individuals live long enough to mate and pass their traits to their offspring. Offspring inherit traits from successful parents so they also have a good chance of being successful. Over many generations, the population adapts to its environment; disadvantageous traits become rare and the average fitness tends to increase over time.

One of the main advantages of the algorithm is that it requires very little problem-specific information. To apply the method to a specific problem, all that is needed is a fitness function that evaluates individual solutions and returns a rating of their quality, or "fitness." The algorithm itself operates on bit strings containing the "genetic code," that is, the parameters specifying a particular solution. Aside from the problem-specific evaluation function, all problems look the same to the algorithm, differing only in the length of the bit string and the number of units.

Because the algorithm needs so little problem-specific information, it is useful for complex problems that are difficult to analyze correctly. In particular, it does not need gradient information and so can be used on discontinuous functions and functions that are described empirically rather than analytically. It can also be used for temporal learning problems in which evaluation comes at the end of a long sequence of actions with no intermediate target values. It is not a simple hill-climbing method so it is not particularly bothered by local maxima. It will also tolerate a certain amount of noise in the evaluation function.

The algorithm has some of the flavor of simulated annealing in that many alternative solutions are examined and the search contains an element of randomness that helps prevent convergence to local maxima. It differs in that many candidate solutions are maintained rather than just one and elements of the better solutions are combined to generate new candidates. Like simulated annealing, it is a general optimization method that has applications beyond neural networks.

The main disadvantage of the method is the amount of computation needed to evaluate and store a large population of candidate solutions and converge to an optimum. Other techniques often converge faster when they can be used.