Darwin Bicentenary Part 11: William Dembski’s Active Information
The information, I, of an outcome is given by the equation:
I = -log(p1), where p1 is the probability of the outcome. (Equation 1)
If the probability p1 is a result of a product of probabilities this expression becomes a sum of logs. For example, if we have a sequence of bits where the probabilities associated with each bit are independent, then the total information contained in the sequence is found by adding together the information calculated for each bit. Hence, the definition of information conveniently transforms the awkward exponentials of combinatorial systems into a linear additive property.
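To see this additivity concretely, here is a minimal sketch (in Python, my own illustration rather than anything from Dembski's paper) showing that the information of a sequence of independent bits equals the sum of the per-bit information:

```python
import math

def info_bits(p):
    """Information, in bits, of an outcome with probability p: I = -log2(p)."""
    return -math.log2(p)

# Three independent bits, each occurring with probability 0.5.
p_each = 0.5
p_sequence = p_each ** 3              # joint probability is a product...
total = info_bits(p_sequence)         # ...so its information is a sum
summed = 3 * info_bits(p_each)

assert math.isclose(total, summed)    # the log turns the product into a sum
print(total)  # 3.0 bits
```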
It is clear from the above equation that smaller values of p1 imply greater values of I. This inverse relationship is intuitively agreeable because the occurrence of an unexpected outcome (that is, an outcome with a low p1) increases the information of the observer simply because he learns something he didn’t know. But in the case of an expected outcome (that is, an outcome with a high p1) I is low, because should it occur it adds little to what the observer already knows.
Now, imagine that an observer learns something about the dynamics of the system before the outcome that is initially the subject of p1 occurs. The observer's increased knowledge about the system may have the effect of raising p1 to a higher value, p2. Substituting p2 in place of p1 in equation 1, we see that this increase in information about the system has the effect of lowering the information value of the outcome that was originally the subject of p1. This is intuitively agreeable: the outcome is now less informative about the system because the observer already has information about the system to hand, and thus its subsequent behavior adds less to what he already knows.
In one sense information, as defined by equation 1, is conserved, because the extra information gained through knowing something about the dynamics of the system is compensated for by a corresponding decrease in the informative value of actual outcomes. In the case just considered, where an outcome initially has a probability of p1 that increases to p2 as a result of learning about the system, the information we have gained through our new knowledge is equal to -log(p1) + log(p2), that is, log(p2/p1). This increase in information is referred to in Dembski’s paper as the active information.
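As a toy illustration of this bookkeeping (the probabilities here are hypothetical numbers of my own, not taken from Dembski's paper): suppose an outcome's probability rises from p1 = 1/52 to p2 = 1/4 once the observer learns about the system. The active information is then -log(p1) + log(p2) = log(p2/p1):

```python
import math

def info(p):
    """I = -log2(p), as in equation 1."""
    return -math.log2(p)

p1 = 1 / 52   # probability before learning about the system (hypothetical)
p2 = 1 / 4    # probability after learning about the system (hypothetical)

# Active information: the drop in outcome information is the observer's gain.
active = info(p1) - info(p2)          # = -log(p1) + log(p2) = log2(p2/p1)
assert math.isclose(active, math.log2(p2 / p1))
print(active)  # ≈ 3.7 bits gained by the observer
```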
To exemplify the foregoing abstractions Dembski first considers the case of a well-shuffled pack of cards lying face down on a table. From this pack cards are successively removed from the top by an observer and turned face up. Dynamically this system has available to it all possible arrangements of the deck. For the uninitiated observer coming to the pack, the probabilities of finding particular cards therefore have maximal displacement away from certainty. Hence, actual outcomes have maximum informative value about the arrangement of the pack. However, for an observer who has watched the pack being shuffled face up, the respective probabilities are likely to be closer to certainty, because this observer may have become aware of the arrangement of the cards in the pack; he will already know something about its state. Hence, for such an observer the action of removing a card and turning it face up is going to be less informative, because the information he already possesses means that he has less to learn from the action of removing a card from the top of the pack.
The second observer clearly has more active information than the first. But notice that the active information in this case has nothing whatsoever to do with the dynamics of the system of cards itself; the difference in information between the observers is down to differences in their knowledge, not to any difference or change in the system. In this case the active information is extrinsic, in that it has nothing to do with the dynamics of the system but rather with the observer’s knowledge of the system.
In a further exemplification Dembski stipulates that another pack of cards is such that cards next to one another in the pack “differ at most only by two. Thus, one would know that cards next to an uncovered jack of diamonds are only 9s, 10s, Jacks, Queens, and Kings”. This second pack is clearly dynamically different to the first, because a mathematical constraint has been applied to it, eliminating a large range of possible arrangements; it is far more ordered than the shuffled pack. This intrinsic difference in the system means that, for an observer who knows about the constraint, the probabilities will be displaced away from maximum uncertainty toward certainty in comparison with the well-shuffled pack. The observer who knows about the dynamical constraint therefore has greater information, but the information contained in outcomes is correspondingly reduced because he has less to learn from them.
In this second system it is important to note that the “loss” of information from the system outcomes in favour of observer knowledge has come about not just because the observer has learnt something about the system but also because the system is in a more ordered state than that of a well-shuffled pack of cards. Let me repeat that: this shift in information is not just about an observer gaining some information; it is also about a system whose intrinsically more ordered dynamic makes it more knowable. Hence in this case the active information reflects not just the observer’s information but also something about the order of the system itself.
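The point can be made numerically with a toy calculation (my own sketch, which assumes, purely for illustration, that each permitted rank is equally likely): in the shuffled pack any of the 13 ranks may come next, whereas next to an uncovered jack the constraint leaves only 5 possible ranks, so outcomes in the constrained pack carry less information:

```python
import math

def info(p):
    return -math.log2(p)

# Shuffled pack: any of the 13 ranks is equally likely to be next.
i_shuffled = info(1 / 13)

# Constrained pack: next to a jack only 9, 10, J, Q, K can appear (5 ranks).
# Uniformity over those 5 ranks is a toy assumption for illustration.
i_constrained = info(1 / 5)

print(i_shuffled)     # ≈ 3.70 bits per uncovered card
print(i_constrained)  # ≈ 2.32 bits: the ordered pack is more knowable
```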
These two systems (the shuffled pack of cards and the pack subject to a constraint on its arrangement) show us that there are two sorts of active information: 1. extrinsic active information, which comes about as a result of an observer learning about a system that is otherwise of maximum disorder, and 2. intrinsic active information, which also comes about as a result of an observer learning something, but in this case learning about the constraints of a more ordered system. Unfortunately Dembski doesn’t bring this distinction out in his paper, and I have a feeling that I will be commenting on this omission in future posts.
Dembski goes on to apply the active information concept to other systems: in particular the “partitioned” search of Richard Dawkins’ infamous and trivial “Me Thinks It is Like a Weasel” program. This program randomly shuffles letters in a sentence: when by chance one of the required letters makes an appearance in the right position it is locked into place. The object of the exercise was to show that by this means otherwise very improbable arrangements can be found relatively quickly. The program has been criticized (rightly so) for “front loading” the required configuration into the program from the outset and thus being completely vacuous.
The “partitioning” dynamics of Dawkins’ program is basically the idea of having a ratchet built into the search: if the system happens to move in the direction of certain classes of outcome it has a high probability of not moving back; hence there is a progressive drift toward an “end result”. If one of the required letters pops up by chance, it is locked into place and this event is prevented from being undone. Clearly, as Dembski shows quantitatively, the ratcheted probability of getting the required configuration of letters is much greater than the probability of “Me thinks it is like a weasel…” appearing spontaneously. As a rule, spontaneous probabilities associated with large configurations are negligibly different from zero.
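Here is a sketch of such a ratcheted search (my own reconstruction of the idea, not Dawkins’ or Dembski’s actual code; the 27-character alphabet is an assumption):

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # 26 letters plus space

def partitioned_search(target, rng):
    """Ratchet search: a position that matches the target is locked in
    place and never shuffled again, so progress cannot be undone."""
    current = [rng.choice(ALPHABET) for _ in target]
    locked = [c == t for c, t in zip(current, target)]
    steps = 0
    while not all(locked):
        for i, t in enumerate(target):
            if not locked[i]:
                current[i] = rng.choice(ALPHABET)
                if current[i] == t:
                    locked[i] = True   # the ratchet: this is never undone
        steps += 1
    return steps

steps = partitioned_search(TARGET, random.Random(0))
print(steps)  # on the order of a hundred steps, not the ~27**28 spontaneous odds
```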
Leaving behind Dawkins’ “Me thinks” program, Dembski goes on to consider the far more subtle Avida program, which searches for arrangements of NAND gate components in order to produce the XNOR logical function, a function which can only be constructed from a complex arrangement of NAND components. The XNOR logical function can be broken down into configurations of components, sub-components, and sub-sub-components. That is, the XNOR function is a hierarchical structure of sub-components that are themselves composed of components, and so on down to the NAND gate, which is effectively the given “atomic” element out of which the whole XNOR gate is constructed.
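For the curious, the claim that XNOR is buildable purely from NAND gates can be checked directly; the five-NAND construction below is a standard one (my sketch, not the Avida encoding):

```python
def nand(a: int, b: int) -> int:
    """The 'atomic' NAND gate."""
    return 0 if (a and b) else 1

def xnor(a: int, b: int) -> int:
    """XNOR built from five NAND gates: four form XOR, a fifth inverts it."""
    t = nand(a, b)
    x = nand(nand(a, t), nand(b, t))  # this is XOR(a, b)
    return nand(x, x)                 # NOT(XOR) = XNOR

for a in (0, 1):
    for b in (0, 1):
        assert xnor(a, b) == (1 if a == b else 0)
print("XNOR truth table verified")
```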
Avida works by ‘rewarding’ the random formation of components in the hierarchy with a score, should they perchance arise as a result of random shufflings. This scoring policy has the effect of consolidating the appearance of the required hierarchy components, increasing their probability of sticking around, thus giving the whole thing a directional drift up the structural hierarchy by way of a logical ratchet. As Dembski points out, and as the Avida creators admit, the whole thing wouldn’t work without this ratchet. In Dembski’s words, the “Hard wired stair step active information is therefore essential in order for Avida to produce results”.
In both the “Me thinks” and the Avida programs the “active information” is intrinsic to their dynamics. The dynamics of these systems have applied constraints that make their behavior ordered. The logical ratchet built into these systems means that spontaneous probabilities for certain classes of configuration are no longer valid. Dembski is right in pointing out that this active information must somehow be built into the system. Without this active information, as Dembski says, there is “..no metric to determine nearness, the search landscape for such searches can be binary – either success or failure. There are no sloped hills to climb”.
The Avida program is a kind of basic prototype evolutionary model that will serve as a specific example introducing general concepts that I will employ to draw conclusions in my next posts.
STOP PRESS 18/3/2009
William Dembski is revisiting the "Weasel" program here.