Quintessence of Dust: Harmful genes, and sneaky, too: Genetic hitchhiking in the human genome

Genetic hitchhiking is thought to be an inevitable result of strong positive selection in a population. The basic idea is that if a particular gene is strongly selected for (as opposed to selected against), then the chunk of the genome that carries that gene will become very common in the population. The result is a local loss of genetic diversity: all (or nearly all) of the individuals in the population will have that same chunk of genetic information, whereas before the selection process acted, there might have been a lot of variation in that chunk throughout the population. And this means that areas of the human genome that are less variable between people are suspected sites of recent positive selection. Within that chunk, there are potentially many genes and genetic elements that became more common in the population by virtue of their placement near the gene that was actually selected for. Those other genes are the hitchhikers. And it's likely that some hitchhikers are bad news – they're harmful mutations that would normally become rare or extinct in the population, but instead have become common by hitchhiking.

In the last few years, large amounts of genetic information have become available that have enabled biologists to look for evidence of such phenomena in the human genome. Specifically, two major projects have collected genetic data for the purpose of analyzing genetic variation among humans. One project, the International HapMap Project, mapped and quantified sites in the human genome that are known to vary among humans by a single genetic letter. These sites are called single nucleotide polymorphisms, or SNPs (pronounced "snips"). The project has mapped millions of these sites in a group of 270 humans representing various lineages. Another project that has made the news recently is the 1000 Genomes Project, which also seeks to provide a picture of human genetic variation using more people (more than 1000 at present) and slightly different technology. Efforts like these have taken analysis of the human genome to a new level. No longer do we merely wonder what "the" human genome is like – we can begin to learn about how genetic differences give rise to biological differences such as susceptibility to particular diseases.

And we can begin to look for evidence of positive selection in the human genome. One goal is to identify those genetic changes that might underlie recent evolutionary changes in our species, whether they are changes that affected the whole species or changes that are specific to particular human subpopulations. The HapMap Project spawned a prominent article in Nature in 2007, and more recent work has expanded on those initial findings. A different goal is to look for evidence that harmful genes can indeed hitchhike into our genome, riding the coattails of the good stuff that natural selection picked out. A recent paper in PLoS Genetics ("Evidence for Hitchhiking of Deleterious Mutations within the Human Genome" by Sung Chun and Justin Fay) reports on a very interesting attempt to address that question, using data from the 1000 Genomes Project.

Before we tackle the authors' data, let's carefully frame their question and their assumptions. Picture a random chunk of the genome, large enough to contain a number of genes. Like any chunk of human DNA, it should contain SNPs; in other words, there should be variation in that chunk that would be apparent when it was compared among numerous people. Now, most of the variation will be neutral, meaning that it has no detectable effect on fitness. I have an A at that site, you have a G, and we're both fine. But some of the variation might not be neutral. There might be a harmful SNP in that chunk. I have an A at that site, you have a G, but my A makes me more likely to die before I reproduce. Now, all things being equal, the neutral SNPs should be a lot more common than the harmful ones, because natural selection will weed out the harmful variants but have no effect on the frequency of the neutral SNPs. And this leads to a prediction: hitchhiking should favor the harmful variants compared to the neutral variants, because hitchhiking shields the harmful SNPs from natural selection. Such shielding doesn't benefit the neutral variants at all, because they were never subject to selection in the first place. And so, hitchhiking should increase the frequency of harmful SNPs compared to neutral SNPs, specifically in the chunks of the genome that have been subjected to recent positive selection. It's important to understand this prediction, because the purpose of the authors' analysis was to test the prediction as it applies to humans.
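To make the shielding idea concrete, here is a minimal sketch in Python of a deterministic toy model. It is not the authors' analysis: it assumes a haploid, effectively infinite population with no recombination, and the function name, starting frequency, and selection coefficients are all hypothetical values chosen for illustration.

```python
def haplotype_trajectory(p0, relative_fitness, generations):
    """Frequency of a haplotype over time under simple haploid selection.

    Toy assumptions (not from the paper): infinite population, no
    recombination, and the competing haplotype has fitness 1.0.
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        mean_fitness = p * relative_fitness + (1 - p) * 1.0
        p = p * relative_fitness / mean_fitness
        trajectory.append(p)
    return trajectory


# Hypothetical values: benefit of the selected variant, cost of the harmful one.
BENEFIT = 0.05
COST = 0.01

# Harmful variant on its own: fitness below 1, so selection removes it.
alone = haplotype_trajectory(0.01, 1 - COST, 500)

# Same harmful variant sitting next to a beneficial one: net fitness is
# above 1, so the whole chunk, harmful passenger included, spreads.
hitchhiking = haplotype_trajectory(0.01, (1 + BENEFIT) * (1 - COST), 500)

print(f"alone:       {alone[0]:.4f} -> {alone[-1]:.6f}")
print(f"hitchhiking: {hitchhiking[0]:.4f} -> {hitchhiking[-1]:.6f}")
```

In this sketch a neutral passenger on the same chunk would get the same ride. The difference is that without the ride the harmful variant would have been purged, while the neutral one would not, which is why hitchhiking should shift the balance toward harmful SNPs.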

In order to perform their analysis, the authors needed a way to identify whether a given SNP is neutral or harmful. Ideally, this would be done empirically or experimentally. But we know so very little about the effects of particular genetic variants in humans. So Chun and Fay classified SNPs as neutral or harmful by doing a comparison across 32 species using a procedure they had previously developed. Their reasoning was basically as follows: if a particular change is also found in other animals, it's likely to be neutral. While there's no way to precisely know the accuracy of the classification test, it's reassuring to learn that 72% of known human disease genes were classified as harmful using their method.
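The reasoning behind that kind of classification can be sketched in a few lines of Python. This is only a toy illustration of the rule of thumb stated above; the authors' actual procedure is not reproduced here, and the function name, the species, and the amino acids in the alignment column are all made up.

```python
def classify_variant(human_variant, residues_in_other_species):
    """Toy rule of thumb: a variant also seen in other species is likely neutral.

    This is an illustrative assumption, not the authors' classification method.
    """
    if human_variant in residues_in_other_species:
        return "likely neutral"
    return "possibly harmful"


# Hypothetical alignment column: the amino acid found at one protein
# position in a handful of other species.
column = {"chimp": "A", "mouse": "A", "dog": "V", "chicken": "A"}

print(classify_variant("V", column.values()))  # V also occurs in dog
print(classify_variant("D", column.values()))  # D occurs in no other species
```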

Let's focus on three of the figures in their paper to see how they reached their conclusions.

Their most basic prediction was that in regions showing evidence of hitchhiking (i.e., regions with reduced variation among humans), harmful SNPs would be increased in frequency compared to neutral SNPs. Figure 3D (below) shows that this is indeed the case, and it applies no matter how the hitchhiking is detected. (There are at least 9 different ways to identify genomic regions that are likely to have experienced hitchhiking.) In the graph, a ratio of 1 indicates that the region of interest is no different from non-hitchhiking regions. A ratio greater than 1 indicates that there are more harmful SNPs per neutral SNP than in non-hitchhiking regions. Note that the size of the increase varies depending on the method employed, but that the ratio is statistically different from that of non-hitchhiking regions no matter which method is used.

A related prediction is this one: the ratio of harmful SNPs to neutral SNPs should be highest closest to the center of the hitchhiking region, and should decline with distance. This is because the shielding effect of hitchhiking depends on proximity to the positively-selected gene; hitchhikers that are farther away are more likely to be separated from their ride by recombination. And this is again the case, as illustrated in Figure 4 (below). Distance away from the epicenter (green dot) of the hitchhiking region increases from left to right, and the harmful-to-neutral ratio declines with distance. The two panels depict results using two different methods of identifying hitchhiking regions, and the authors explain why the method on the right shows a less-dramatic decline (see the last paragraph on page 3 of the PDF).
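Why distance matters can be sketched with a back-of-the-envelope calculation in Python. The model below is an assumption for illustration, not something taken from the paper: it treats recombination as uniform along the chromosome, uses a rate of 1e-8 per base pair per generation (roughly 1 centimorgan per megabase, a commonly quoted rough average for humans), and picks 1000 generations arbitrarily.

```python
def chance_still_linked(distance_bp, generations, rate_per_bp=1e-8):
    """Probability that no recombination has separated two sites.

    Assumptions (not from the paper): a uniform recombination rate per base
    pair per generation, and independent generations.
    """
    # Recombination fraction between the two sites, capped at 0.5.
    r = min(0.5, distance_bp * rate_per_bp)
    return (1 - r) ** generations


# Hypothetical distances from the selected gene, over 1000 generations.
for distance in (10_000, 100_000, 1_000_000):
    print(f"{distance:>9} bp: {chance_still_linked(distance, 1000):.3f}")
```

The farther a passenger sits from the selected gene, the smaller its chance of staying attached long enough to be carried to high frequency, which is the pattern the prediction relies on.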

The preceding data came from analysis of many likely hitchhiking regions, identified in various ways. But we know of some specific sites of likely recent positive selection in the human genome, sites where we even know which gene is involved and can infer the reason why that variant is beneficial. So Chun and Fay zoomed in on 10 of those regions, and looked for that telltale sign of harmful freeloaders. Sure enough, harmful SNPs were strongly enriched right around those positively-selected genes. Have a look at Table 2 below (relevant numbers are circled in red); the harmful-to-neutral ratios are 1.83 (11 to 6) inside the target gene (the positively-selected gene) and 0.69 (24 to 35) in the neighborhood immediately surrounding it. The ratio in non-hitchhiking regions is 0.41.
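The arithmetic behind those ratios is easy to check. The short Python sketch below uses only the counts and the background ratio quoted above; the variable and function names are mine, and expressing each ratio as a multiple of the background is my own way of summarizing the numbers, not a statistic taken from the paper.

```python
BACKGROUND_RATIO = 0.41  # harmful-to-neutral ratio in non-hitchhiking regions

# (harmful, neutral) SNP counts quoted from Table 2
counts = {
    "inside target gene": (11, 6),
    "surrounding neighborhood": (24, 35),
}


def harmful_to_neutral(harmful, neutral):
    """Number of harmful SNPs per neutral SNP."""
    return harmful / neutral


for region, (harmful, neutral) in counts.items():
    ratio = harmful_to_neutral(harmful, neutral)
    fold = ratio / BACKGROUND_RATIO
    print(f"{region}: {ratio:.2f} ({fold:.1f}x the background ratio)")
```

Run as written, this gives 1.83 and 0.69, or about 4.5 and 1.7 times the background ratio of 0.41.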

One further finding that is worth mentioning: Chun and Fay showed that known disease-causing mutations are enriched in hitchhiking regions compared to non-hitchhiking regions.

Now for some closing comments.

By providing evidence for enrichment of harmful variants in hitchhiking regions, Chun and Fay have also provided further evidence that such regions are indicative of the influence of positive selection.

The authors suggest that "positive selection has had a significant impact on deleterious polymorphism and may be partly responsible for the high frequency of certain human disease alleles." And they propose a nice way to further test their hypothesis: perform similar analyses of the genomes of domesticated animals, where there has been strong recent positive selection, typically on known traits.

Hitchhiking is just one way that harmful genetic variants can be maintained in a population, and its existence is an expected result in sexually reproducing organisms like us. There are many reasons to distrust highly simplified depictions of evolutionary genetics, and I think this is one of the most important.

The paper is worth spending some time to read. Some sections are technically demanding, but the Discussion section is approachable, I think, and it is a thorough and cautious overview of the findings and their significance and context.

Chun, S. and Fay, J.C. (2011). Evidence for Hitchhiking of Deleterious Mutations within the Human Genome. PLoS Genetics, 7 (8) DOI: 10.1371/journal.pgen.1002240

23 September 2011
