Evolutionary Computation: A Perpetual Motion
Machine for Design Information?
By Robert J. Marks II
Evolutionary computing, modeled after Darwinian
evolution, is a useful engineering tool. It can create
unexpected, insightful and clever results. Consequently,
an image is often painted of evolutionary computation as
a free source of intelligence and information. The
design of a program to perform evolutionary computation,
however, requires infusion of implicit information
concerning the goal of the program. This information
fine tunes the performance of the evolutionary search
and is mandatory for a successful search.
Computational Intelligence
Fifty years ago, W. Ross Ashby asked "Can a Chess Machine Outplay Its Maker?" (BRITISH JOURNAL FOR THE PHILOSOPHY OF SCIENCE, 1952). We know today that it can. A more relevant question is "Can a computer program generate more information than it is given?"
Evolutionary computing, on the surface, seems to be a
candidate paradigm. As with all "something for nothing"
claims, this is not the case.
Pioneers of evolutionary computing in the 1960s
proposed that computer emulation of evolution overcame
the difficulty of demonstrating Darwinian evolution in
the biology lab. Proof of Darwinian evolution "has from
the beginning been handicapped by the fact that no
proper test has been found to decide whether such
evolution was possible and how it would develop under
controlled conditions." (N.A. BARRICELLI, ACTA
BIOTHEORETICA, 1962.) "In general, it is usually
impossible or impracticable to test hypotheses about
evolution in a particular species by the deliberate
setting up of controlled experiments with living
organisms of that species. We can attempt partially to get around this difficulty by constructing models
representing the evolutionary system we wish to study,
and use these to test at least the theoretical validity
of our ideas." (J.L. CROSBY, "COMPUTERS IN THE STUDY OF
EVOLUTION", SCI. PROG. OXF., 1967.)
Engineering Design
Evolutionary computation is used today largely in
engineering design and problem solving. Design begins
with establishing a goal or design objective. From a
favorite list of paradigms, a viable model is chosen.
Design consists of identification of parameter values
within the chosen model. Design has been defined as "the
judicious manipulation of mid-range values" within the
confines of a model (RANDALL JEAN, 2005). Search
algorithms do this with the aid of a computer.
Consider the simple example of designing a recipe for
boiling an egg. Our questions include the following.
1. Do we place the eggs in cold water and bring to a boil, or place the eggs in boiling water? (two choices)
2. How long do we boil the eggs?
3. Do we remove the pan from the heat and let the water cool, place the eggs on a dish to cool, or immediately place the eggs in cold water? (three choices)
At step (1) there are two choices, and at step (3),
three choices. For the duration of boiling in step (2),
let's assume there are choices in fifteen second
intervals from 30 seconds to three minutes: 0:30, 0:45,
1:00, …, 3:00. That's eleven choices of time intervals.
The total number of possible recipes is therefore 2 × 11 × 3 = 66. We have defined a
search space, but have not yet defined what our
design criterion is, namely, what is the optimal recipe?
Suppose I taste the egg and rate it from one to 100 in
terms of taste. This measure, assigned to each of the 66
recipes, is the fitness of the recipe. Anything
above a 90 will meet the design criterion. The design
goal is identification of a recipe that meets the design
criterion.
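As a rough illustration, the recipe space and its fitness evaluation can be sketched in a few lines of Python. The taste function below is invented purely for illustration; the real fitness numbers would come from tasting the eggs.

```python
from itertools import product

# The three recipe variables: 2 x 11 x 3 = 66 possible recipes.
starts = ["start in cold water", "start in boiling water"]
boil_seconds = [30 + 15 * k for k in range(11)]        # 0:30, 0:45, ..., 3:00
cool_downs = ["cool in pan", "cool on dish", "cool in cold water"]

def fitness(start, seconds, cool):
    """Hypothetical taste score from 1 to 100, standing in for the taster."""
    score = 100 - abs(seconds - 150) / 2               # pretend ~2:30 is ideal
    if start == "start in boiling water":
        score += 1                                     # pretend this helps slightly
    if cool == "cool in cold water":
        score += 5                                     # pretend quenching helps
    return min(score, 100)

recipes = list(product(starts, boil_seconds, cool_downs))
print(len(recipes))                                    # 66

# The design criterion: any recipe scoring above 90 is acceptable.
acceptable = [r for r in recipes if fitness(*r) > 90]
print(f"{len(acceptable)} of {len(recipes)} recipes meet the criterion")
```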
Assume you have never cooked and have absolutely no
idea which recipe is best. We apply Bernoulli's
principle of insufficient reason which states that,
in the absence of any prior knowledge, we must assume
that all the recipes have an equal probability of being
best. One recipe must be assumed as good as another. To
find the optimal recipe, all 66 would need to be tried.
One approach to find a decent recipe is trial and error.
If trial and error could be done on a computer, the tests
could be done quickly. Suppose we can emulate the
boiling of the egg and the fitness of the result on a
computer. Then we could determine the optimal recipe
quickly by evaluating all 66 recipes. Looking at all
possible solutions is called exhaustive search.
Unfortunately, search problems scale poorly and this is
not possible for even reasonably sized problems. If we
have, instead of three, 100 variables and each variable
has ten possible outcomes, the number of elements in the
search space becomes 10^100 (a 1 followed by 100 zeros), which is a larger number than there
are atoms in the universe. Exhaustive search is not
possible in such cases.
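The scaling claim is easy to check. The atom count used below is the rough figure of 10^78 quoted later in this article.

```python
# 100 design variables, each with 10 possible values.
search_space_size = 10 ** 100

# Rough number of atoms in the universe, as quoted later in this article.
atoms_in_universe = 10 ** 78

print(search_space_size > atoms_in_universe)      # True
print(search_space_size // atoms_in_universe)     # 10^22 candidate designs per atom
```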
We can remove Bernoulli's principle of insufficient
reason from the search problem only through infusion of
information into the search process. The information can
be explicit. For the egg example, knowledge of chemistry
tells us that placing the boiled eggs in cold water
retards the chemical reaction which will ultimately make
the eggs smell like sulfur. Assuming a sulfur smell will
detract from the fitness, we can eliminate one of the three cool-down choices and reduce the search to 2 × 11 × 2 = 44 recipes.
Alternatively, the information can be implicit. You may
know, for example, that of ten famous egg boilers, two
place the raw eggs in cold water and eight in boiling
water. This information can guide your search of recipes
initially to those with a greater chance of meeting the
design criterion.
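Such implicit information can be folded into a search as a sampling bias. Here is a minimal sketch, assuming the 2-out-of-10 and 8-out-of-10 split above is used to weight the first trials.

```python
import random

random.seed(0)  # reproducible illustration

# Implicit prior from the ten famous egg boilers: 2 start cold, 8 start boiling.
starts = ["start in cold water", "start in boiling water"]
weights = [2 / 10, 8 / 10]

# Draw the first candidates for trial and error according to this prior
# instead of Bernoulli's uniform assumption.
first_trials = random.choices(starts, weights=weights, k=5)
print(first_trials)   # mostly boiling-water starts
```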
The Need for Implicit
Information
Purely theoretical considerations suggest that, given
a fast enough computer and sufficient time, a space can
be successfully searched to find the optimal solution.
But this is the myth of the "monkeys at a typewriter." The story, theoretically plausible, says that if enough monkeys pound out random letters for a long enough time, all of the great texts in history, such as Moby Dick (1,170,200 characters), Grimms' Tales (1,435,800 characters) and the King James Bible (3,556,480 letters not including spaces), will eventually result. The
finiteness of the closed universe, however, prohibits
this.
Looking for a single solution in a large unstructured
search space is dubbed a "needle in a haystack" problem.
In moderately large cases, it simply can't be done.
Choosing randomly from a 26-letter alphabet, the odds against writing the KJB are 26^3,556,480 ≈ 3.8 × 10^5,032,323 to one. This is a number so large it defies description. If all the matter in the universe (10^58 kg) were converted to energy (E = mc^2) ten billion times per second since the big bang (20 billion years) and all this energy were used to generate text at the minimum irreversible bit level (i.e., ln(2) kT ≈ 2.9 × 10^-21 joules per bit), then about 10^88 messages as long as the KJB could be generated. If we multiply this by the number of atoms in the universe (about 10^78 atoms), we have 10^166 messages, still dwarfed by the required 10^5,032,323.
Let's try a more modest problem: the phrase IN*THE*BEGINNING*GOD*CREATED, where "*" denotes a space. (We could complete the phrase with "the heaven and the earth," but the numbers grow too large.) Here there are 27 possible characters (26 letters and a space) and
the string is 28 characters long. The odds against the monkeys typing this exact phrase are 27^28 ≈ 1.20 × 10^40 to one. This number isn't so big
that we can't wrap our minds around it. The chance of a
monkey typing 28 letters and typing these specific
words is the same as choosing a single atom from
over one trillion short tons of iron. [Using Avogadro's number, we compute 27^28 atoms × (1 mole per 6.022 × 10^23 atoms) × (55.845 grams per mole) × (1 short ton per 907,185 grams) ≈ 1.22 × 10^12 short tons.]
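The bracketed conversion is easy to reproduce with Python's arbitrary-precision integers:

```python
# Number of equally likely 28-character strings over a 27-symbol alphabet.
outcomes = 27 ** 28
print(f"{outcomes:.3e}")                      # about 1.197e+40

# Convert that many iron atoms to short tons via Avogadro's number.
avogadro = 6.022e23          # atoms per mole
molar_mass = 55.845          # grams of iron per mole
grams_per_short_ton = 907_185

tons = outcomes / avogadro * molar_mass / grams_per_short_ton
print(f"{tons:.3e} short tons")               # about 1.22e+12
```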
Quantum computers would help by reducing the equivalent search size to its square root (HO et al., IEEE
TRANS. AUT. CONT., MAY 2003, P.783), but the problem
remains beyond the resources of the closed universe.
Information must be infused into the search process.
Searching a space without imposing structure on it is computationally prohibitive for even small problems. Early structure requirements
included gradient availability, the dependence of the
optimal solution on the second derivative of the
fitness, convexity, and unimodal fitness functions.
(BREMERMANN et al., "GLOBAL PROPERTIES OF EVOLUTION PROCESSES", 1966; NASH, "SUMT (REVISITED)", OPERATIONS
RESEARCH, 1998.) More recently, the need for implicit
information imposed by design heuristics has been
emphasized by the no free lunch theorems
(WOLPERT, ET AL., IEEE TRANS. EVOLUTIONARY COMPUTATION,
1997.) which have shown, "unless you can make prior
assumptions about the ... [problems] you are working on,
then no search strategy, no matter how sophisticated,
can be expected to perform better than any other" (HO
op. cit.) No free lunch theorems "indicate the
importance of incorporating problem-specific knowledge
into the behavior of the [optimization or search]
algorithm." (WOLPERT, op. cit.)
Sources of Information
A common structure in evolutionary search is an
imposed fitness function, wherein the merit of a design
for each set of parameters is assigned a number. The
bigger the fitness, the better. The optimization problem
is to maximize the fitness function. Penalty
functions are similar, but are to be minimized. In
the early days of computing, an engineer colleague of
mine described his role in conducting searches as a
penalty function artist. He took pride in using
his domain expertise to craft penalty functions. The
structured search model developed by the design engineer
must be, in some sense, a good model. Exploring
through the parameters of a poor model, no matter how
thoroughly, will not result in a viable design. Conversely, a cleverly conceived model can result in better solutions in less time.
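A minimal sketch of such a fitness-driven search is below: a toy mutation-and-selection loop that evolves a random string toward a fixed target phrase. It is only an illustration, not any particular production algorithm, and note where the information resides: the character-match fitness function itself carries the knowledge of the target.

```python
import random

random.seed(1)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TARGET = "IN THE BEGINNING GOD CREATED"

def fitness(candidate):
    """Imposed fitness: number of characters matching the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate, rate=0.05):
    """Flip each character to a random one with a small probability."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in candidate)

# Start from a random string and keep the better of parent and child.
parent = "".join(random.choice(ALPHABET) for _ in TARGET)
generation = 0
while fitness(parent) < len(TARGET):
    child = mutate(parent)
    if fitness(child) >= fitness(parent):
        parent = child
    generation += 1

print(f"found target in {generation} generations")
```

The search succeeds quickly precisely because the fitness function has been loaded with the answer.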
Here is a simple example of structure in a search.
Instead of choosing each letter at random, let's choose
more commonly used letters more frequently. If we choose
characters at random then each character has a chance of
1/27 = 3.7 percent of being chosen. In English, the
letter "e" is used about 10 percent of the time. A blank
occurs 20 percent of the time. If we choose letters in accordance with their frequency of occurrence, then the odds against choosing IN*THE*BEGINNING*GOD*CREATED nose-dive to about five millionths (0.0005 percent) of their original size, from 1.2 × 10^40 to 5.35 × 10^34. This is still a large number: the trillion tons of iron has been reduced to about five and a half million tons. If we use the frequency of digraphs,
we can reduce it further. (Digraphs are letter pairs
that occur frequently; for instance, the digraph "e_"
where "_" is a space is the most common pair of
characters in English.) Trigraph frequency will reduce
the odds more.
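The size of the improvement can be estimated by replacing the uniform 1/27 probability of each character with its frequency of occurrence. The frequency table below is an assumed, rounded one covering only the characters in the phrase, so the resulting figure lands in the same ballpark as, but not identical to, the numbers quoted above.

```python
from math import prod

PHRASE = "IN THE BEGINNING GOD CREATED"   # "*" in the article denotes a space

# Assumed, rounded English character frequencies (space ~20%, "e" ~10%);
# only the characters appearing in the phrase are listed here.
freq = {" ": 0.20, "E": 0.10, "T": 0.07, "A": 0.06, "O": 0.06, "I": 0.06,
        "N": 0.06, "R": 0.05, "H": 0.05, "D": 0.03, "C": 0.02, "G": 0.02,
        "B": 0.01}

uniform_odds = 27 ** len(PHRASE)                       # about 1.2e40
weighted_odds = 1 / prod(freq[c] for c in PHRASE)      # reciprocal probability

print(f"uniform:  {uniform_odds:.2e}")
print(f"weighted: {weighted_odds:.2e}")
print(f"reduction factor: {uniform_odds / weighted_odds:.1e}")
```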
The Fine Tuning of the Search Space
The more implicit structure is imposed on the search space, the easier the search
becomes. Even more interesting is that, for moderately
long messages, if the target message does not match the
search space structuring, the message won't be found.
(PAPOULIS, PROBABILITY, RANDOM VARIABLES AND STOCHASTIC
PROCESSES, 1991.)
The search space fine-tuning
theorem. Let a search space be structured
with a predisposition to generate a type of message. If a
target does not match this predisposition, it will be
found with probability zero.
This theorem, long known in information theory in a
different context, is a direct consequence of the
law of large numbers. If, for example, we structure
the search space to give an "e" 10 percent of the time,
then the number of "e's" in a message of length 10,000
will be very close to 1000. The curious book
Gadsby, containing no "e's", would be found
with a vanishingly small probability.
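How vanishingly small? If the structure assigns "e" a probability of 10 percent, the chance that a 10,000-character passage contains no "e" at all can be computed in log space:

```python
from math import log10

n = 10_000          # characters in the passage
p_e = 0.10          # assumed probability of "e" under the imposed structure

# Expected number of e's, and the log-probability of seeing none at all.
expected_es = n * p_e
log10_prob_no_e = n * log10(1 - p_e)

print(f"expected e's: {expected_es:.0f}")
print(f"P(no e at all) = 10^{log10_prob_no_e:.0f}")   # about 10^-458
```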
Structuring the search space also reduces its
effective size. The search space consists of all
possible sequences. For a structured space, let's dub
the set of all probable sequences that are predisposed
to the structure the search space subset. For
frequency of occurrence structuring of the alphabet, all
of the great novels we seek, except for Gadsby,
lie in or close to this subset.
The more structure is added to a search space, the
more added information there is. Trigraphs, for example,
add more information than digraphs.
Diminishing subset theorem. As the length of a sequence increases and the added structure information increases, the percentage of the search space occupied by the search subset goes to zero.
Structuring of a search space therefore not only
confines solutions to obey the structure of the space;
the number of solutions becomes a diminishingly small
percentage of the search space as the message length
increases. (IBID.)
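An information-theoretic sketch of why the subset shrinks: if the imposed structure behaves like a source with entropy H bits per character, the search space subset holds roughly 2^(nH) of the 27^n possible strings of length n. The entropy value below is an assumed round figure for English-like text, used only to show the trend.

```python
from math import log2

ALPHABET_SIZE = 27
H = 4.1   # assumed entropy of the structured source, bits per character

for n in (10, 100, 1_000, 10_000):
    # log2 of (size of structured subset) minus log2 of (whole search space)
    log2_fraction = n * H - n * log2(ALPHABET_SIZE)
    print(f"n = {n:>6}: subset is about 2^{log2_fraction:.0f} of the space")
```

The fraction falls exponentially with the sequence length, which is the content of the theorem.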
Final Thoughts
Search spaces require structuring for search
algorithms to be viable. This includes evolutionary
search for a targeted design goal. The added structure
information needs to be implicitly infused into the
search space and is used to guide the process to a
desired result. The target can be specific, as is the
case with a precisely identified phrase; or it can be
general, such as meaningful phrases that will pass, say,
a spelling and grammar check. In any case, evolutionary computation provides no perpetual motion machine for design information.
BIOSKETCH: Robert J. Marks II is
Distinguished Professor of Engineering and Graduate
Director in the Department of Engineering at Baylor
University. He is a Fellow of both the IEEE and the Optical Society of America. Professor Marks has received the
IEEE Centennial Medal. He has served as Distinguished
Lecturer for the IEEE Neural Networks Society and the
IEEE Computational Intelligence Society. Dr. Marks
served as the first President of the IEEE Neural
Networks Council (now a Society). He has over 300
publications. Some of them are very good. Eight of Dr.
Marks' papers have been reproduced in volumes of
collections of outstanding papers. He has three US
patents in the field of artificial neural networks and
signal
processing.