Recall that Fowler et al. set out to design a system in which one can study a protein's function (its “fitness”) as it varies in sequence. The idea is to look at all (or at least nearly all) of the variants of a particular protein to see how well each one works, and then to map this measure of fitness onto the sequence space of the protein. Such a map would be a form of fitness landscape. Fowler and colleagues (henceforth called the UW group) used a previously-known technique (protein display) to link each variant of the protein to its gene sequence, then used next-generation gene-sequencing technology to rapidly determine the gene sequences of millions of variants.
Last October, a group at the Fred Hutchinson Cancer Research Center in Seattle reported the results of a somewhat similar experimental effort. Jason Pitt and Adrian Ferré-D'Amaré co-authored the paper in Science, and their title identified their research objective: “Rapid Construction of Empirical RNA Fitness Landscapes.” The first couple of sentences of the abstract should sound familiar by now. (The genotype is the gene sequence. The phenotype is the function.)
Evolution is an adaptive walk through a hypothetical fitness landscape, which depicts the relationship between genotypes and the fitness of each corresponding phenotype. We constructed an empirical fitness landscape for a catalytic RNA by combining next-generation sequencing, computational analysis, and “serial depletion,” an in vitro selection protocol.

And they identify two major challenges, both of which we have already discussed:
First, even for macromolecules of modest length, the sequence space is vast; a 20-mer RNA or protein has ~10^12 or ~10^26 possible sequences, respectively. Second, to characterize the landscape, the phenotypic fitness of each individual genotype needs to be measured, or an indirect measure of fitness needs to be validated.

The authors tackled the challenges using a strategy very similar to that of the UW group: first they designed a functional screen, a way to subject an enormous population of variants to a gauntlet of selection, so that the population would be altered in structure after each round of selection. Think of it as evolution in a test tube. But the UW group had a problem that Pitt and Ferré-D'Amaré didn't have to worry about: the linkage of protein function with the underlying gene sequence. Why the difference? Pitt and Ferré-D'Amaré didn't study protein. They studied RNA – specifically, they analyzed the function of a ribozyme, which is a molecule of RNA that is capable of catalyzing chemical reactions the way protein enzymes do. This means that there was no translation problem for them, since the gene sequence (the base sequence of the RNA) also comprises the structure of the molecule that is being functionally assessed.
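The sizes quoted above for sequence space follow from simple counting: an alphabet of k letters gives k^n possible sequences of length n. Here is a quick check in Python, assuming 4 bases for RNA and the 20 standard amino acids for protein (the function name is just for illustration):

```python
def sequence_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


rna_20mer = sequence_space_size(4, 20)       # A, C, G, U
protein_20mer = sequence_space_size(20, 20)  # 20 standard amino acids

# Print in scientific notation to compare with the ~10^12 and ~10^26 figures.
print(f"20-mer RNA:     {rna_20mer:.2e}")
print(f"20-mer protein: {protein_20mer:.2e}")
```

Both numbers land on the orders of magnitude the authors cite, which is why measuring every genotype directly is out of reach even for short molecules.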
Each dot is a single sequence. (Actually, each dot is a whole set of sequences that have the same similarity to the reference sequence. In Figures 1C and 1E the authors introduce another dimension to show the spread that each dot represents.) The green dots show enrichment of sequences after 1 minute of competition; the reference sequence is on the far right, such that the steeply-sloping peak on the far right represents sequences that are similar to that reference sequence. As we might expect, the more similar a sequence is to the reference sequence, the more “fit” it is (in general). Fitness is indicated by extent of enrichment, which the authors term “fecundity.” The magenta dots represent not enrichment, but depletion; in a reciprocal experiment, the investigators removed the most fit molecules from the pool by subtracting the best-binding population from the pool. Notice that the depletion landscape is basically a mirror image of the enrichment landscape, as we would expect if the process is truly selecting based on binding activity.
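To make the idea of enrichment concrete, here is a rough sketch in Python. It is not the authors' pipeline: it assumes enrichment is the ratio of a sequence's frequency after selection to its frequency before, and it uses Hamming distance (number of mismatched positions) as the measure of similarity to the reference. The sequences and read counts are invented toy data.

```python
from collections import Counter, defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def enrichment(before: Counter, after: Counter) -> dict:
    """Frequency after selection divided by frequency before (assumed metric)."""
    n_before = sum(before.values())
    n_after = sum(after.values())
    return {
        seq: (after[seq] / n_after) / (count / n_before)
        for seq, count in before.items()
        if count > 0
    }


def mean_enrichment_by_distance(enrich: dict, reference: str) -> dict:
    """Average enrichment of all sequences at each distance from the reference."""
    groups = defaultdict(list)
    for seq, value in enrich.items():
        groups[hamming(seq, reference)].append(value)
    return {d: sum(v) / len(v) for d, v in sorted(groups.items())}


# Toy read counts (hypothetical), before and after one round of selection.
reference = "GGAUCC"
before = Counter({"GGAUCC": 10, "GGAUCA": 10, "GGUUCA": 10, "CCUAGG": 10})
after = Counter({"GGAUCC": 40, "GGAUCA": 20, "GGUUCA": 15, "CCUAGG": 5})

by_distance = mean_enrichment_by_distance(enrichment(before, after), reference)
for distance, value in by_distance.items():
    print(distance, value)
```

In this toy example the average enrichment falls as sequences get further from the reference, which is the general shape described for the green dots.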
There's a lot of data in that graph. Here's how the authors describe the result:
...the fecundity of any individual sequence provides a metric of its fitness, and we can create an experimental fitness landscape composed of ~10^7 different RNA genotypes in a single experiment.

And yet the picture is a vast oversimplification of that huge data set. For one thing, the graph provides no specific sequence information even though the sequence of every one of those 10 million variants is known. Pitt and Ferré-D'Amaré write:
The empirical fitness landscape we generated is a high-dimensional object. We visualized it by computing the information content per residue of the master sequence, in essence projecting the landscape onto the ribozyme sequence.

The result may not be the kind of landscape that Sewall Wright sketched, but it's a fitness landscape nonetheless, made possible by the creativity of Pitt and Ferré-D'Amaré and by the power of next-generation sequencing.
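For readers wondering what “information content per residue” might look like in practice, here is a minimal sketch using the standard Shannon measure: at each position, the maximum possible entropy (2 bits for a four-letter RNA alphabet) minus the observed entropy among the selected sequences. This is an assumption about the general approach, not a reproduction of the authors' calculation, which may weight sequences differently; the sequences are invented.

```python
import math
from collections import Counter


def information_per_position(seqs: list, alphabet: str = "ACGU") -> list:
    """Bits of information at each position of a set of aligned sequences."""
    max_bits = math.log2(len(alphabet))
    info = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts.values())
        # Shannon entropy of the observed base frequencies at this position.
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        info.append(max_bits - entropy)
    return info


# Hypothetical pool of selected 4-mers, already aligned.
selected = ["GAUC", "GAUA", "GCUG", "GUUU"]
for position, bits in enumerate(information_per_position(selected), start=1):
    print(position, round(bits, 2))
```

Positions where every surviving sequence has the same base score the full 2 bits, and positions where any base will do score close to zero. High-scoring positions are the ones selection cares about.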
So, we've looked at three significant articles in the last year or so on fitness landscapes, in which talented scientists explored the relationships between genotype and phenotype, on scales barely imaginable just a decade ago. All three studies were carried out in Seattle, Washington, within just a few miles of Biologic Institute, where the scientists of the intelligent design movement work on questions of the same ilk. If those scientists really want to be taken seriously, if they really seek to understand how structure and function and evolution are related, they'll have to understand fitness landscapes and their experimental applications. Fortunately, they can find some of the world's experts on that very subject right in their own backyard. Whether that amounts to tragic irony or a golden opportunity is a choice for the intelligent design apologists of the Seattle area. May they choose wisely.
[Cross-posted at Panda's Thumb.]
Pitt, J., & Ferré-D'Amaré, A. (2010). Rapid Construction of Empirical RNA Fitness Landscapes. Science, 330(6002), 376-379. DOI: 10.1126/science.1192001