29 October 2017

An overnight recipe for a new gene: change the frame

Can a new protein-coding gene be born overnight? That's the theme of this series. The answer, remarkably, is yes, and the ARHGAP11B gene is the recent case I'm considering. After surveying the ways that this could happen, I narrowed the possible mechanisms to three:
"There are really just three kinds of change that can get it done. All three are mutations that change how the DNA sequence that's already there gets decoded into protein. They are: 1) tiny mutations that shift the reading frame; 2) tiny mutations that change splicing; and 3) large-ish rearrangements that create new combinations of code."
Before looking at the details, let's take note of the fact that the genomes of animals and plants typically have gigantic amounts of DNA that does not code for protein. Humans are merely typical in this regard: at least 95% of the human genome is non-coding DNA, but there are organisms with a lot more and some with a lot less. The point here is not about "junk" or function; it's more basic. An animal genome contains vast amounts of DNA that could code for a protein, but doesn't. A new gene doesn't have to be magicked into a genome, by a demon or by a virus. A new gene can enter the gene library simply by becoming a new way of reading a pre-existing text. This is almost certainly how the vast majority of new genes have arisen in animals and plants for at least half a billion years. And the basic mechanism applies to all living things, for 3-ish billion years: any DNA sequence can become a protein-coding sequence, and sequences that already code for protein can be straightforwardly modified to make completely different proteins.
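To make "a new way of reading a pre-existing text" concrete, here is a minimal Python sketch of the first mechanism, a reading-frame shift. The short DNA string is a made-up toy sequence, not any part of a real gene, and the helper name translate is my own. The codon table is the standard genetic code; everything else is an illustrative assumption.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read DNA three letters at a time from position 0 until a stop codon.

    Hypothetical helper for illustration; trailing bases that do not
    fill a whole codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequence (an assumption, not real ARHGAP11B sequence):
# ATG GCT GAA ACC TTG GGA TAA
original = "ATGGCTGAAACCTTGGGATAA"

# A "tiny mutation": delete the single base at index 5.
# Every codon downstream of the deletion is now read in a different frame.
shifted = original[:5] + original[6:]

print(translate(original))  # MAETLG
print(translate(shifted))   # MAKPWD
```

One missing letter leaves the first two amino acids alone and changes every one after it. In this toy case the original stop codon also falls out of frame, so a longer sequence would keep being read past the point where the old protein ended.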