09 August 2023

The known unknowns of biology: welcome to the unknome

'Genome' is now a pretty standard word in our social vocabulary. We have to put up with overloaded metaphors like "blueprint" and reverent talk like "language of God," but it does seem to me that the word is reasonably well understood by laypeople: not as jargony as "gene expression" or as inscrutable as "chromatin." The word was born in 1920 when the botanist Hans Winkler blended 'gene' with 'chromosome'. (The -some in 'chromosome' is from a root that means 'body' as in 'somatic' or 'psychosomatic'.)

So, a genome is a "body" of genetic material, and specifically the whole body of genetic material in an organism (or a cell or a species). For decades now, science has been regularly adding more -omes. The proteome is the full body of proteins. The transcriptome is the full body of transcripts. It gets a bit weirder: the phenome is the full body of phenotypes. There's even the spliceome, the full body of splice variants. Many of those are pretty jargony; the point is that -ome is a suffix that's being used a lot like -gate is used in the news (to indicate a kind of scandal, as in Watergate or Gamergate). Among the -omes (let's call the whole collection the omeome, ha ha!), the best by far is the unknome: the set of all genes of unknown function.

How big is the unknome? In other words, how many of the 20,000 or so genes in the human genome are unknown (in function)? Is there a gradient of unknown-ness? A new paper in PLOS Biology by Rocha and colleagues introduces us to the unknome and then adds some good stuff, which is the least the authors can do after telling us how little we know about human gene function. (They created a database where you can surf the unknome and explore our ignorance.) The title and abstract:

Functional unknomics: Systematic screening of conserved genes of unknown function

The human genome encodes approximately 20,000 proteins, many still uncharacterised. It has become clear that scientific research tends to focus on well-studied proteins, leading to a concern that poorly understood genes are unjustifiably neglected. To address this, we have developed a publicly available and customisable “Unknome database” that ranks proteins based on how little is known about them. We applied RNA interference (RNAi) in Drosophila to 260 unknown genes that are conserved between flies and humans. Knockdown of some genes resulted in loss of viability, and functional screening of the rest revealed hits for fertility, development, locomotion, protein quality control, and resilience to stress. CRISPR/Cas9 gene disruption validated a component of Notch signalling and 2 genes contributing to male fertility. Our work illustrates the importance of poorly understood genes, provides a resource to accelerate future research, and highlights a need to support database curation to ensure that misannotation does not erode our awareness of our own ignorance.

One nice feature of the database is an index of known-ness. (The authors call it 'knowness' which just looks wrong to me.) So you can browse/search by knowness and by family of genes and so on. I like that, but then the authors took a big cool step: they generated a list of 260 unknown genes (in human) that seem to be represented in the fruit fly. Using a method called RNA interference that lends itself to large-scale experiments, they interfered with the function of those genes in flies and found a range of effects that are listed in the abstract.
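To make the "browse by knownness, then pick the conserved ones" idea concrete, here is a toy sketch in Python. It is not the Unknome database's actual scoring or interface: the gene names are placeholders, the scores are invented, and the cutoff of 1.0 is an arbitrary assumption. It only illustrates the general shape of the selection step, which is to keep genes with a low knownness score that also have a fly counterpart, and list the least known first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str               # placeholder identifier, not a real gene symbol
    knownness: float        # hypothetical score: higher means better characterised
    has_fly_ortholog: bool  # assumed flag: is there a counterpart in the fruit fly?


def unknown_conserved(genes, max_knownness=1.0):
    """Return poorly known genes that have a fly ortholog, least known first.

    The default threshold is an illustrative assumption, not a value from the paper.
    """
    candidates = [
        g for g in genes
        if g.has_fly_ortholog and g.knownness <= max_knownness
    ]
    # Sort by score, breaking ties by name so the order is deterministic.
    return sorted(candidates, key=lambda g: (g.knownness, g.name))


# Invented toy data, purely for illustration.
toy_genes = [
    Gene("GENE_A", 0.0, True),
    Gene("GENE_B", 7.5, True),   # well studied, so excluded
    Gene("GENE_C", 0.5, False),  # poorly known but no fly ortholog, so excluded
    Gene("GENE_D", 1.0, True),
]

shortlist = [g.name for g in unknown_conserved(toy_genes)]
print(shortlist)
```

With this made-up data the shortlist contains GENE_A and GENE_D. The real list of 260 genes came out of the authors' own database and criteria, which this sketch does not attempt to reproduce.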

Their conclusion (from the Discussion of the paper):

We have developed an approach to tackle directly the huge but under-discussed issue of the large number of well-conserved genes that have no reliably known function, despite the likelihood that they participate in major and even possibly completely new areas of biological function. We hope that our work will inspire others to define and characterise further the unknome and also to seek to ensure that gene annotation has the support and technology to preserve and recognise true ignorance.

"Recognise true ignorance" is, I think, their way of pointing to the importance of knowing what we don't know.

There's another theme here, about how we sometimes seem to prefer to talk about what we don't know and to draw conclusions from that ignorance. The origin of life and the so-called Cambrian Explosion are two examples of topics that highlight how much we don't know, and this ignorance can tempt us to think that we can't know how these things work, even in principle. Let's pay attention to the difference between that kind of ignorance and the "true ignorance" that Rocha et al. point to.


Image credit: "The Key to unknowne knowledge. Or, a shop of five windowes ..." 1599. From Wellcome Images, CC-BY 4.0 license.


