04 April 2010

Behe and probability: one more try

Almost two years ago, I reviewed Michael Behe's latest book, The Edge of Evolution, here on the blog. I was unimpressed, to say the least, and remain of the opinion that Behe should not be considered a serious scientific thinker given his failure in that ludicrous book.

Since then, my posts have been referenced occasionally in the blogosphere, typically by people trying to explain Behe's surprisingly crude mishandling of probability in the context of genetics. One particular point has been singled out as a mistake on my part, and some ID defenders are hoping that this mistake will rescue Behe's argument. Let me describe the so-called mistake, then explain why I'm right.

Behe's misuse of probability is an error on a basic point in evolutionary genetics, and I devoted a separate post to the topic. Here is a key excerpt from that post:
Here is one of many places in EoE where Behe makes a very basic mistake in the presentation of probabilities:
Recall that the odds against getting two necessary, independent mutations are the multiplied odds for getting each mutation individually. What if a problem arose during the course of life on earth that required a cluster of mutations that was twice as complex as a CCC? (Let's call it a double CCC.) For example, what if instead of the several amino acid changes needed for chloroquine resistance in malaria, twice that number were needed? In that case the odds would be that for a CCC times itself. Instead of 10^20 cells to solve the evolutionary problem, we would need 10^40 cells. (pp. 62-63)
What Behe is saying is this: if event A has probability a, and event B has probability b, then the probability of both events happening is a times b. But that is only true if the events must happen simultaneously. That's the only time you multiply two probabilities.
Now note that Behe refers to the probability of "two necessary, independent mutations." What does he mean by that? By "independent" he means that each mutation event is separate from the other, such that one does not facilitate or discourage the other. That's a basic concept in probability, and it's the source of the confusion I'm addressing in this post. By "necessary" he means "required for the effect in question," I think, although he never makes that clear. I doubt it matters, especially since positive selection for either of the individual mutations would make his argument even weaker than it already is.
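To make Behe's arithmetic explicit, here is a minimal sketch that simply squares his figure of roughly one CCC per 10^20 cells, exactly as the quoted passage does. The variable names are illustrative; the point is that this multiplication treats both clusters as arising together in one cell.

```python
from fractions import Fraction

# Figure from the quoted passage: roughly one CCC per 10^20 cells.
p_ccc = Fraction(1, 10**20)

# Behe's step: multiply the probability by itself, which amounts to
# requiring both clusters to arise together in the same cell.
p_double_ccc = p_ccc * p_ccc

# Average number of cells needed for one such event.
cells_needed = 1 / p_double_ccc

print(cells_needed == 10**40)  # True
```

Exact fractions are used only to keep the powers of ten readable; floating point would give the same conclusion.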

Let's think about what Behe is claiming. He's saying that the probability of any two mutations appearing in the same genome is the product of the probabilities of the individual mutations, because the two events are independent. This sounds right at first, and both Behe and some pro-ID commentators elsewhere seem to think that it's obvious. My criticism of the claim is that Behe is in fact assuming that the mutations must occur simultaneously. Behe's defenders think I'm mistaken, because: 1) Behe didn't say "simultaneous," he said "independent," and 2) the correct way to calculate the probability that two independent events both occur is to multiply the individual probabilities. Moreover, Behe's defenders claim that sequential events, as long as they are independent, are probabilistically the same as simultaneous events.

Sorry, folks. I'm right. Here's why.

Behe multiplied two probabilities: the probability that a particular mutation would happen in a single individual, and the probability that a second mutation would happen in a single individual. That means he calculated the probability that two particular mutations would occur in the same individual. This, biologically speaking, is the probability that the two mutations happen simultaneously. (In case it's not obvious, by "simultaneously" I mean "in the same cell.") What else could it be?

Please stop and think about this. It's really important. Behe's calculations refer to the probability of two mutations occurring in the same organism.

"Ah, but they could be sequential," Behe's defenders claim. Oh? How? Here's how it could go: the first mutation occurs in one individual, and the second mutation occurs in that individual's offspring. Well, sure, but now the second probability that Behe used in his calculation is wrong, unless the first individual has exactly one descendant bearing the first mutation. Why do I say that? Because he used the probability of a mutation occurring in a single individual. What if that individual has 10 offspring? Or, more to the point, what if that mutation becomes fixed in the population for whatever reason, so that it is now present in the vast majority of individuals? Do you see how fantastically dumb it is to do the calculation Behe's way? Excluding the bizarre scenario of a single descendant being available for the second mutation, Behe is calculating the probability of simultaneous mutations, whether you or he recognizes it or not.
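The effect of the number of carriers can be sketched in a few lines. This assumes each carrier of the first mutation independently acquires the second one with the same per-individual probability; the rate `mu2 = 1e-8` and the helper `p_second_in_carriers` are hypothetical, chosen only for illustration and not taken from Behe.

```python
import math

def p_second_in_carriers(mu: float, carriers: int) -> float:
    """Probability that a mutation with per-individual probability `mu`
    arises in at least one of `carriers` individuals in one generation,
    assuming each individual mutates independently."""
    # 1 - (1 - mu)**carriers, computed with log1p/expm1 so tiny mu stays accurate
    return -math.expm1(carriers * math.log1p(-mu))

mu2 = 1e-8  # hypothetical per-individual probability of the second mutation

# One carrier (Behe's implicit case), ten offspring, and a first mutation
# that has spread to a million individuals.
for carriers in (1, 10, 1_000_000):
    print(carriers, p_second_in_carriers(mu2, carriers))
```

With one carrier the result is just `mu2`, which is the only situation in which Behe's second factor is correct; as the first mutation spreads, the chance of the second one rises roughly in proportion to the number of carriers.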

Now, at this point I struggle to find words to describe Behe's error. It's understandable for, say, a computer scientist or lawyer with a layperson's understanding of genetics to make the mistake. And I concede that a first-pass consideration of the problem suggests a simple multiplication due to statistical independence. But we're talking about biology here. And the mistake is so utterly basic that it defies charitable description. If you're an admirer of Mike Behe, you need to be warned: you cannot trust him to tell you the truth about basic population genetics. Whether he's ignorant or dishonest, I don't know and I don't care. He can't be trusted.

Finally, maybe you'd like to know how a knowledgeable biologist (or moderately educated layperson, for that matter) does think about such probabilities. After all, Behe and his defenders are correct in asserting that the odds against two particular mutations occurring at the same time are beyond vast.

To start with, the biologist would consider the population size. So, if the odds of a particular mutation are 1 in 100 million per individual and there are a million reproducing individuals, then the odds that the mutation occurs somewhere in the population in a single generation are about 1 in 100. The biologist would also consider pre-existing genetic variation, asking whether the mutation of interest might already exist in the population. That would lead her to consider whether the mutation has a fitness cost or benefit, or is neutral. She would further wonder whether the mutation is likely to be maintained in the population regardless of its own fitness effect, perhaps because it is linked to other genetic features that increase fitness. Such information would help her decide whether the two mutations are likely to arise in a particular order. Finally, she would want to know whether the two mutations are physically close together in the genome; if they are separated to any reasonable extent, they can be brought together by recombination.
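The population-size arithmetic in the paragraph above can be checked with a short back-of-the-envelope sketch. It uses only the two numbers given in the text (odds of 1 in 100 million per individual, one million reproducing individuals) and assumes every individual mutates independently; selection, drift, linkage, and pre-existing variation are deliberately ignored.

```python
import math

mu = 1 / 100_000_000  # odds of the mutation per individual, from the text
N = 1_000_000         # number of reproducing individuals, from the text

# Expected number of new copies of the mutation per generation.
expected_mutants = N * mu

# Probability that it appears at least once in a generation: 1 - (1 - mu)**N,
# computed with log1p/expm1 to avoid rounding problems for tiny mu.
p_at_least_one = -math.expm1(N * math.log1p(-mu))

# Mean number of generations until the first occurrence, treating each
# generation as an independent trial (geometric distribution).
mean_wait = 1 / p_at_least_one

print(f"expected mutants per generation: {expected_mutants:.4f}")
print(f"P(at least one per generation):  {p_at_least_one:.4f}")
print(f"mean wait in generations:        {mean_wait:.1f}")
```

Both the expected count and the exact probability come out at roughly 1 in 100, so a mutation that looks hopeless in a single individual is expected within about a hundred generations in a modest population.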

That's pretty basic stuff. I explained it all before. Even biochemists should understand it, especially if they are inclined to write books about it. (To the well-read biochemists: I mean no disrespect, really.)

So, while I think I understand the confusion of Behe's defenders, I urge them to think about the probabilities within the context of biology. Behe has no excuse, and neither will they if they persist in granting any credibility at all to his calculations regarding double mutations.