04 April 2010

Behe and probability: one more try

Almost two years ago, I reviewed Michael Behe's latest book, The Edge of Evolution, here on the blog. I was unimpressed, to say the least, and remain of the opinion that Behe should not be considered a serious scientific thinker given his failure in that ludicrous book.

Since then, my posts have been referenced occasionally in the blogosphere, typically by people trying to explain Behe's surprisingly crude mishandling of probability in the context of genetics. One particular point has been singled out as a mistake on my part, and some ID defenders want that mistake to rescue Behe's argument. Let me describe the so-called mistake, then explain why I'm right.

Behe's misuse of probability is an error on a basic point in evolutionary genetics, and I devoted a separate post to the topic. Here is a key excerpt from that post:
Here is one of many places in EoE where Behe makes a very basic mistake in the presentation of probabilities:
Recall that the odds against getting two necessary, independent mutations are the multiplied odds for getting each mutation individually. What if a problem arose during the course of life on earth that required a cluster of mutations that was twice as complex as a CCC? (Let's call it a double CCC.) For example, what if instead of the several amino acid changes needed for chloroquine resistance in malaria, twice that number were needed? In that case the odds would be that for a CCC times itself. Instead of 10^20 cells to solve the evolutionary problem, we would need 10^40 cells. (pp. 62-63)
What Behe is saying is this: if event A has probability a, and event B has probability b, then the probability of both events happening is a times b. But that is only true if the events must happen simultaneously. That's the only time you multiply two probabilities.
Now note that Behe refers to the probability of "two necessary, independent mutations." What does he mean by that? By "independent" he means that each mutation event is separate from the other, such that one does not facilitate or discourage the other. That's a basic concept in probability, and it's the cause for the confusion I'm addressing in this post. By "necessary" he means "required for the effect in question," I think, although he never makes that clear. I doubt it matters at all, especially since positive selection for either of the individual mutations makes his argument even weaker than it already is.

Let's think about what Behe is claiming. He's saying that the probability of any two mutations appearing in the same genome is the product of the probability of each individual mutation, because the two events are independent. This sounds right at first, and both Behe and some pro-ID commentators elsewhere seem to think that it's obvious. My criticism of the claim is that Behe is in fact assuming that the mutations must occur simultaneously. Behe's defenders think I'm mistaken, because: 1) Behe didn't say "simultaneous," he said "independent," and 2) the correct way to calculate the probability of two independent events is to multiply the individual probabilities. Moreover, Behe's defenders claim that sequential events, as long as they are independent, are probabilistically the same as simultaneous events.

Sorry, folks. I'm right. Here's why.

Behe multiplied two probabilities: the probability that a particular mutation would happen in a single individual, and the probability that a second mutation would happen in a single individual. That means he calculated the probability that two particular mutations would occur in the same individual. This, biologically speaking, is the probability that the two mutations happen simultaneously. (In case it's not obvious, by "simultaneously" I mean "in the same cell.") What else could it be?

Please stop and think about this. It's really important. Behe's calculations refer to the probability of two mutations occurring in the same organism.

"Ah, but they could be sequential," Behe's defenders claim. Oh? How? Here's how it could go: the first mutation occurs in one individual, and the second mutation occurs in that individual's offspring. Well, sure, but now the second probability that Behe used in his calculation is wrong, unless the first individual has exactly one descendant bearing the first mutation. Why do I say that? Because he used the probability of a mutation occurring in a single individual. What if that individual has 10 offspring? Or more to the point, what if that mutation becomes fixed in the population for whatever reason, and now it occurs in the vast majority of the individuals? Do you see how fantastically dumb it is to do the calculation Behe's way? Excluding the bizarre scenario of a single descendant being available for the second mutation, we see that Behe is calculating the probability of simultaneous mutations, whether you or he recognize it or not.
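To put rough numbers on the difference, here is a minimal sketch. Everything in it is an illustrative assumption rather than anything taken from Behe's book: the per-individual mutation probability of 1 in 100 million, the function name `prob_in_at_least_one`, and the carrier counts are all hypothetical. It compares the product of the two probabilities (both mutations in one specified individual) with the chance that the second mutation arises, in one generation, in at least one of the individuals that already carry the first.

```python
import math


def prob_in_at_least_one(p: float, carriers: int) -> float:
    """Probability that a mutation with per-individual probability p
    arises in at least one of `carriers` individuals in one generation.

    Computed as 1 - (1 - p)**carriers, using log1p/expm1 so that very
    small values of p do not lose precision.
    """
    return -math.expm1(carriers * math.log1p(-p))


# Hypothetical per-individual probabilities, chosen only for illustration.
p_first = 1e-8
p_second = 1e-8

# The simple product: both mutations in one specified individual.
simultaneous = p_first * p_second
print(f"both at once in one individual: {simultaneous:.3g}")

# Sequential case: the first mutation is already carried by some number
# of descendants (1, 10, or a hypothetical fixed population of a million).
for carriers in (1, 10, 1_000_000):
    chance = prob_in_at_least_one(p_second, carriers)
    print(f"second mutation, {carriers:>9,} carriers: {chance:.3g}")
```

Under these assumed numbers, the only sequential case that matches Behe's second factor is the single-descendant one. With more carriers the chance per generation rises roughly in proportion to their number, and that is before counting the many generations available.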

Now, at this point I struggle to find words to describe Behe's error. It's understandable for, say, a computer scientist or lawyer with a layperson's understanding of genetics to make the mistake. And I concede that a first-pass consideration of the problem suggests a simple multiplication due to statistical independence. But we're talking about biology here. And the mistake is so utterly basic that it defies charitable description. If you're an admirer of Mike Behe, you need to be warned: you cannot trust him to tell you the truth about basic population genetics. Whether he's ignorant or dishonest, I don't know and I don't care. He can't be trusted.

Finally, maybe you'd like to know how a knowledgeable biologist (or moderately educated layperson, for that matter) does think about such probabilities. After all, Behe and his defenders are correct in asserting that the odds against two particular mutations occurring at the same time are beyond vast.

To start with, the biologist would consider the population size. So, if the odds are 1 in 100 million and there are a million reproducing individuals, then the odds of occurrence of a particular mutation in one generation are 1 in 100. And the biologist would consider the pre-existing genetic variation, wondering whether the mutation of interest might already exist in the population. That would cause her to wonder about whether a particular mutation has a fitness cost or benefit, or if it's a neutral mutation. She would further wonder whether the mutation is likely to be maintained in the population, regardless of its fitness, based perhaps on its linkage to other genetic features that increase fitness. Such information would help her decide whether the two mutations are likely to proceed in a particular order. Finally, she would want to know if the two mutations are physically close together in the genome; if they're separated to any reasonable extent, then they can be brought together by recombination.
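The population-size arithmetic in that first step can be checked with a short sketch. It assumes, as a simplification, that each of the million individuals is an independent trial with the same 1-in-100-million chance; the function name `prob_at_least_one` is hypothetical.

```python
import math


def prob_at_least_one(per_individual_prob: float, population_size: int) -> float:
    """Chance that a mutation appears in at least one individual in a
    single generation, treating individuals as independent trials.

    Equivalent to 1 - (1 - p)**N, written with log1p/expm1 for accuracy
    when p is tiny.
    """
    return -math.expm1(population_size * math.log1p(-per_individual_prob))


per_individual = 1 / 100_000_000   # odds of 1 in 100 million
population = 1_000_000             # a million reproducing individuals

chance = prob_at_least_one(per_individual, population)
print(f"chance per generation: {chance:.5f}")   # close to 0.01
print(f"roughly 1 in {1 / chance:.0f}")
```

The exact figure comes out slightly under 1 in 100, because a few generations would see the mutation arise more than once, but the rounded number in the text holds.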

That's pretty basic stuff. I explained it all before. Even biochemists should understand it, especially if they are inclined to write books about it. (To the well-read biochemists: I mean no disrespect, really.)

So, while I think I understand the confusion of Behe's defenders, I urge them to think about the probabilities within the context of biology. Behe has no excuse, and neither will they if they persist in granting any credibility at all to his calculations regarding double mutations.

20 comments:

Bilbo said...

Hi Steve,

If I understand Behe, I think he's referring to simultaneous mutations. In which case it sounds like you would agree that such an occurrence would be "beyond vast."

The real question, then, is whether any biological features have ever required such an event. I take it that most biologists would say never, while Behe would say very often.

I think that's why Behe approaches the question from a different angle in chapter 7, "The Two Binding-Sites Rule," though I suppose most biologists would say his argument there is weak as well.

Bilbo said...

I guess what's bothering me is your quickness to call Behe dishonest. He wrote a book where not only does he say he accepts common descent, he even argues for it, despite the fact that he knew that many of the leaders in the ID movement reject it. And then he even devotes a chapter to showing examples of what Darwinian evolution can accomplish.

This is not what I would expect from a dishonest person.

However, repeatedly calling Behe dishonest is what I expect from someone who has a real axe to grind. Which is why I will ask you a second time if your Reformed theology -- which I (perhaps mistakenly) believe rejects Natural theology -- is playing a role in your outlook.

Stephen Matheson said...

Bilbo--

First, if Behe believes that simultaneous mutations must happen "very often," then his arguments are aimed at one of the lamer strawmen that I've seen in my many years of encountering opposition to evolutionary theory. You can rest assured that if you're right about his argument, he's not even attempting a critique of modern evolutionary theory.

Second, I offered you two choices regarding Behe's errors on basic population genetics. Dishonesty was only one, and I did not commit myself to that choice. What I did say is that you can't trust him, and that is clearly the case. You can't trust a third-grader to enlighten you on evolutionary genetics, either, and that sure doesn't make the child dishonest.

As to my reasons for not preferring the type of teleological outlook that you prefer, that'll be a post in the next week or two. But here's a teaser. I think you see my view as ateleological. It's not. It's pan-teleological. That's why I don't care for attempts to identify intelligent guidance in arcane statistical arguments or in the deep past. Our God reigns, then and now and in the future, and that means that if you have to look for his "guidance" in the remotest reaches of the universe, then you probably don't know what it looks like at all.

Bilbo said...

Hi Steve,

1) I think it's safe to assume that Behe is not being dishonest.

2) Behe is not arguing against a strawman. He is not saying, "Modern evolutionary theory claims that double CCCs happen all the time." Rather, he is claiming that double CCCs and even more improbable events must have happened. You inspired me to pick up EoE again, and I started re-reading chap. 5, "What Darwinism Can't Do." He very much wants to argue that something like the IFT took many simultaneous mutations.

3) I'm looking forward to your theological/teleological perspective.

Gabe Moothart said...

Steve/Bilbo,
Allow me to jump in. I think a more careful read of chapter 3 of Edge will help.

Behe defines a CCC as a beneficial mutation that is roughly as likely to occur as chloroquine resistance in malaria. The estimate of 1 in 10^20 that he gives is an experimental result for which he cites a source (bottom of page 57). It is *not* a number derived by haphazardly multiplying probabilities of independent mutations. It is how hard it really was for malaria to actually acquire this resistance.

Behe further clarifies in the top half of page 62, concluding "So a CCC isn't just the odds of a particular protein getting the right mutations; it's the probability of an effective cluster of mutations arising in an entire organism."

A double-CCC, then, would be a beneficial mutation that is only expected to occur once in every 10^40 individuals in a real population. It has nothing to do with multiplying together the probabilities of independent mutations. Behe does suggest, as an example of a double-CCC, a hypothetical change that takes twice the number of mutations as a CCC (top of pg. 63), but this is a rough example and does not mean that he has suddenly changed his definition.

I would think that his Edge criteria should be rather uncontroversial (though not how he applies it later in the book). Surely no one expects evolution to have produced many (any?) mutations that can be expected to arise only in one in every 10^40 individuals in a population.

A better critique is that Behe does not suggest any rigorous way of determining if a particular mutation is a double-CCC. Mostly he describes a lot of complexity requiring lots of mutations, and hand-waves that all of this must be at least as hard as a double-CCC. This is even less rigorous than Darwin's Black Box. He describes IFT as beyond the edge, for example, but doesn't even bother to argue that it is irreducibly complex. His discussion of binding sites, which seem to me like just the sort of thing which *could* be improved gradually by Darwinian processes, just assumes that their evolution must be an all-or-nothing affair. If this is the case he needs to cite research which supports his conclusion.

This is incredibly important because the project that Behe lays out is *the* task for ID in the biological sciences. Dembski et al. can lay out, and have laid out, a convincing mathematical/algorithmic case for the limits of Darwinian processes, but biologists are not mathematicians and quite frankly will not care. Experimental evidence of the edge of evolution is what is needed, and as Steve says Behe does not even try to provide it in this book.

Bilbo said...

Hi Gabe,

1) I don't think Behe was just handwaving on the IFT. There were a number of premises to his argument: a) Examples we have of Darwinian evolution are either of breaking genes or crude, sloppy structures like antifreeze proteins in fish. No coherent machinery. b) Despite 10,000 years, malaria hasn't overcome the sickle cell, or even cool temperatures. c) IC systems such as the cilium remain even conceptually -- as far as their evolution -- unexplained. d) Now we know that the cilium depends upon the IFT for its construction, which also remains evolutionarily unexplained. e) And now we are finding out this construction must be done in a certain sequence, which also remains evolutionarily unexplained. f) And this all resembles bottom up-top down construction projects.

2) Behe's argument on binding-sites depends on a number of premises, including that 50% binding must occur in order to be selectable, so it's not clear how gradual evolution of a new protein could be. I think his weak point is having to rely on point mutation calculations.

3) Until recently, I don't think it was clear how to go about proving or disproving Behe's claims. I think Thornton's group at Oregon State may have found the answer. They discovered that based on their specific structures, we may be able to tell if one homologous protein can evolve into another one. I'm wondering if this might provide a way to settle the dispute.

lee_merrill said...

It should be noted that Behe's numbers are based not so much on random variables, but on a random process, where events per unit time are considered, instead of the probability of an event per replication.

Now I believe Behe's numbers are generous, assuming in effect that all individuals get any mutation of interest, so then when they all have occurred somewhere, they all occur everywhere, in every individual.

So we look at events per unit time, based on what evolution actually did, and this would still give a probability of 1 in 10^40 for 4 simultaneous mutations, required for a new protein-protein interaction.

May I quote Larry Moran?

"Evolutionary biologists have no intention of refuting the 'tripleCCC' hypothesis. It's based on evolutionary biology and it is correct. If evolution requires the simultaneous occurrence of three separate mutations then it's never going to happen."

Some might say this is quote-mining; I would say "is correct" means it's in fact correct.

- Lee

Anonymous said...

" It is how hard it really was for malaria to actually acquire this resistance."

Providing, I suppose, the only way to generate this resistance was for a CCC mutation to occur all at once.

I've not read Behe's EoE, nor do I intend to - I read DBB and was thoroughly underwhelmed by it all, so I can't see volume 2 being much better.

NickM said...

A number of Behe's/Bilbo's claims are just poorly researched and empirically wrong. E.g. IFT is not actually always required for cilium formation. That and a number of similar problems are here:
http://pandasthumb.org/archives/2007/10/full-text-of-th.html

Gabe Moothart said...

Bilbo,
I actually found Behe's arguments fairly convincing. But it seems to me he is resorting to arguments based on certain assumptions when data is what he needs. In this respect Edge is similar to evolutionary story-telling, and I'm sure you are as suspicious of that as I am. Do I think that his points about IFT are likely correct? Yes. But pointing out that IFT is really complicated and no one knows how it evolved does not constitute a scientific argument for design.

The only way to know if Behe's assumptions are correct is to test them. Take protein binding sites: does Behe cite a source for his 50% figure? I missed it if he did. That could be easily tested experimentally by degrading a handful of known binding sites and observing the results.

Or his two criteria in the Benchmarks chapter: steps and coherence. He could have run a few computer simulations to give some formal teeth to his argument that these seriously inhibit darwinian processes.

In nearly all his probability calculations, he assumes that all the necessary parts must have come together at once, which is exactly the point in dispute.

Behe presents a scientific, testable, hypothesis in Edge, but does not do much by way of establishing the truth of his hypothesis or even make many specific predictions, and that is where I find it lacking. I hope you're right that this sort of data is coming down the pipe.

Bilbo said...

Hi Gabe,

I agree that Behe's arguments seem convincing. On the other hand, Mike Gene doesn't think so. And thanks to that troublemaker Nick Matzke, Mike now thinks the bacterial flagellum -- the poster child of the ID movement -- evolved by Darwinian processes.

I'm not sure if or how Behe can do more to strengthen his arguments. And it may be a long time before Thornton's methods can tell us much. So the current value of his arguments may just be to serve as a challenge to the scientific community.

Yes, Behe gives a citation for the 50% in the binding-sites chapter, though I didn't understand it.

Hi Nick, I'll look at your link tomorrow, you troublemaker. ;)

John Farrell said...

I'm not sure if or how Behe can do more to strengthen his arguments.

He could actually get a paper past a peer review and into Science or Nature.

Bill said...

Behe's misuse of probability has been shown to him time and time again. He has no response to this criticism because he is wrong and he knows it.

The fact that Behe neither acknowledges this criticism nor addresses it is why he is marginalized in scientific circles.

Tell you what. Calculate the probability of a specific molecule of water in the Gulf of Mexico landing on my head next week. The result is so improbable as to be impossible, yet next week I'll get wet. According to Behe, I should not take my umbrella to work.

lee_merrill said...

> Lee: "... required for a new protein-protein interaction."

I should have said "two new protein-protein interactions".

> Gabe: "Or his two criteria in the Benchmarks chapter: steps and coherence. He could have run a few computer simulations to give some formal teeth to his argument that these seriously inhibit darwinian processes."

Lenski's work with Avida would seem applicable here, where he took out selection for intermediate steps, and lo, the EQU function did not evolve as before. Behe's proposal is a straightforward idea--more required independent, unselected mutations makes a structure less likely, exponentially less likely.

Hardly something that would seem so very disputable!

> Gabe: "In nearly all his probability calculations, he assumes that all the necessary parts must have come together at once, which is exactly the point in dispute."

I believe there was an instance where he was considering deleterious mutations--in that case yes, they need to occur together. But in general, with mutations that are not deleterious, they need not occur all at once.

Bilbo said...

Hi Bill,

Given the specification, Make Bill wet, then any combination of drops from the Gulf of Mexico will do.

But given the specification, Provide Bill with a mouth full of freshwater, then I doubt that any combination of drops from the Gulf of Mexico will do.

So how was the fishing?

Bill said...

It rained.

Stephen Matheson said...

Gabe--

I think we agree on much of what is wrong with Behe's analysis. I repeatedly stated that his claim (that many or most mutations are non-random) is both testable and unrefuted, and I proposed a collaborative effort to test the idea.

But as I've told Bilbo, his arguments are very crude, and amount to either nonsense or to the thrashing of strawmen. There is no dispute about whether the occurrence of specific multiple mutations is spectacularly unlikely. The question is whether Behe has demonstrated that such a thing is necessary for evolutionary change, or is predicted by evolutionary theory. He hasn't.

And both you and Bilbo should pay a lot more attention to the fact that Behe is utterly ignored by the people who actually know evolutionary genetics. If you can say, as a layperson, that you find Behe convincing, when no one who knows anything about the topic gives him any credit at all, then you should at least admit that you are asserting something far more dramatic than a particular opinion on the pace of microevolution.

Lee--

It is apparent that you don't understand the discussion. This is probably because, as the quote from Larry Moran clearly demonstrates, you haven't bothered to read what I wrote. Best wishes.

Bilbo--

Bill is raising a much more serious challenge to you and Behe than you seem to acknowledge. Mere improbability of one particular outcome, as you know, is insufficient to establish "design" or anything like it. And "specification" doesn't do the work either. What is required is a demonstration that the outcome could not have been otherwise. This is one very important reason why extrapolating from a single case study in an obligate parasite adapting to a targeted attack is so very stupid. There are almost certainly very few ways to escape death by chloroquine. But there are vastly more ways to build cells and signaling systems. I hinted at this in my posts on Notch signaling and I'm outlining a book on the topic right now. Suffice it to say that it is exceedingly unwise for design enthusiasts to ignore the simple criticism that Bill offered. It's a huge problem for your explanation, and I don't believe you'll ever overcome it.

Bill said...

Thank you, Steve!

I could get even more weird with bogus probability calculations. Just now I inhaled a molecule of carbon dioxide, part of the atmosphere. The particular carbon atom I inhaled was formed in a supernova that exploded about 6 billion years ago and out of which our sun and solar system formed.

Calculate the probability of that particular carbon atom blasting out from that supernova and ending up in my nose. Well, basically, it's stupid to bother but suffice it to say the number would be "astronomical!" However, up my nose it just went.

The point is this. Once a hand of cards, or an atom up the nose, is dealt the probability collapses to 1, that is, it happened.

Behe, who is not a moron but who is, in my opinion, intellectually dishonest, manipulates the common idea of "probability" to cast doubt on what has actually happened in order to imply that an event is impossible.

To be sure, the probability of an atom of carbon floating around for 6 billion years and ending up my nose while fishing is pretty improbable, but it just happened. And again. And again.

Stephen Meyer uses the same ruse in "Signature" to say, in essence, "how improbable is that!"

I always get blown off by creationists with my examples of improbable events that nevertheless occur every second of every day. However, it's a major flaw to the theses presented by Behe, Meyer and others and one they evade addressing in public.

In Behe's case, germane to this thread, he committed two errors. First, he used an estimate, merely a footnote not even related to his argument, the infamous 10^20. Second, he compounded the error by asserting that two mutations had to occur simultaneously. He is wrong on both counts and to make matters worse he knew he was wrong when he wrote the book. He misrepresented both the probability estimate and the science on purpose. Seriously, what's the probability that a moron would make BOTH mistakes in a single example?

Behe's thesis in his little book of fiction has been taken apart by real scientists who actually do laboratory work and, scientifically speaking, Behe is less than irrelevant. He may be a hit among the church crowd, but that's about it.

Buh-bye, Behe!

gingoro said...

Well said Steve! I must have missed your original post but not anymore as I see them in my RSS reader so I don't have to navigate to your site to find any new posts. I find it hard to understand why the ID folks don't even seem to get other ID folks to review their books and fix such errors. Dembski as a statistician must understand this issue, after all he talked about replication resources in his books and that is where Behe's error is.
Dave W

Anonymous said...

Behe's calculations may not be valid, but do any of you know of better, valid calculations on the same subject?

Behe has at least tried to calculate something. It is maybe the first attempt. But has anyone tried to get the calculation right, and to calculate the right numbers?