I should preface this blog by stating that I am a nucleic acids gal. My years in the lab were spent with tubes of DNA and RNA. In fact, my one and only tentative foray into the field of proteins resulted in a Western blot so ugly that those who witnessed it have been sworn to secrecy. Given all of this, the mapping of the human proteome might seem like an odd topic for me to write about. Except that it isn’t really, because the mapping of the proteome offers answers to some of the questions that the sequencing of the genome didn’t.
First, let’s start with what a proteome is: A proteome is all the proteins expressed at a certain time point. It can be as limited as the proteome of a single cell or as all-encompassing as the proteome of an entire organism. However, unlike the genome, which is the genetic information encoded in an organism’s DNA or RNA, the makeup of a proteome can vary dramatically as a result of expression patterns, alternative splicing events and post-translational modifications.
The genome is a constant: what you see today is what will still be there tomorrow. The proteome, on the other hand, is a constantly changing landscape. Upregulation or downregulation of a gene can mean more or less protein is present. Alternative splicing and post-translational modifications can result in fundamental changes to the protein itself.
In other words, if the genome is a beautiful, pristine Ansel Adams print, then the proteome is that same scene as interpreted by Andy Warhol—in Technicolor and 3D.
Earlier this year, two independent teams published first drafts of the human proteome. The teams took different approaches. One group, led by Akhilesh Pandey from Johns Hopkins University, isolated protein from 30 different tissue types from a single source. The team was able to catalog proteins encoded by about 84% of the human genes predicted to code for proteins, and determined the relative abundance of each protein using mass spectrometry (1).
The second group, led by German researcher Bernhard Küster from the Technische Universität München, used a different but complementary approach. Küster’s team created a searchable public database, ProteomicsDB, using existing data from the proteomics community. To fill gaps in the public data, the team generated its own data using over 60 human tissues, 13 body fluids and 147 cancer cell lines. In total, ProteomicsDB catalogs about 92 percent of human proteins (estimated to be 19,629) (2).
Scientists are still in the early stages of analyzing the results from these two studies, but already some interesting information has come to light. For one, some parts of the genome that we thought were non-coding aren’t, as evidenced by the identification of new proteins from some of these regions. In total, more than 400 translated long intergenic non-coding RNAs (lincRNAs) were identified, as well as 193 new proteins.
Another, possibly paradigm-shifting, result involved the translation rate of mRNA. The Küster group compared the expression profiles of mRNA and proteins and found that although the levels of mRNA and protein vary dramatically between tissue types, the ratio of mRNA to protein was surprisingly conserved for a given protein. This suggests that, at least at steady state, once the ratio for an mRNA/protein pair has been calculated, protein levels can be determined just from the levels of that specific mRNA. That would mean the translation rate of a particular mRNA is somehow coded into the transcript itself. If this holds true, it could mean that protein expression levels in a cell are largely controlled by regulating mRNA levels.
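For the computationally inclined, the arithmetic behind that idea can be sketched in a few lines of Python. This is only a toy illustration, not the method used in the paper: the tissue names, the abundance values and the helper functions `fit_ratio` and `predict_protein` are all made up for the example, and the ratio is expressed as protein per unit of mRNA for convenience.

```python
from statistics import median

def fit_ratio(mrna_levels, protein_levels):
    """Estimate one gene's protein-per-mRNA ratio as the median across tissues."""
    return median(p / m for m, p in zip(mrna_levels, protein_levels) if m > 0)

def predict_protein(ratio, mrna_level):
    """Predict steady-state protein abundance from an mRNA level and a fitted ratio."""
    return ratio * mrna_level

# Invented numbers for a single hypothetical gene (arbitrary units).
# Absolute levels differ a lot between tissues, but the ratio stays the same.
mrna_by_tissue = {"kidney": 40.0, "liver": 10.0, "lung": 5.0}
protein_by_tissue = {"kidney": 800.0, "liver": 200.0, "lung": 100.0}

tissues = sorted(mrna_by_tissue)
ratio = fit_ratio(
    [mrna_by_tissue[t] for t in tissues],
    [protein_by_tissue[t] for t in tissues],
)

# A new tissue where only the mRNA level (25.0) has been measured.
predicted = predict_protein(ratio, 25.0)
print(f"ratio = {ratio}, predicted protein = {predicted}")
```

In this made-up case every tissue gives the same ratio, so the prediction is exact. With real measurements the ratios would scatter around a typical value, which is why the sketch takes a median rather than relying on a single tissue.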
All of this new data, and the promise of the answers it holds, is almost enough to make me want to go back into the lab and try my hand at the protein side of things again. Almost.
- Kim, M-S. et al. (2014) A draft of the human proteome. Nature 509, 575–81.
- Wilhelm, M. et al. (2014) Mass-spectrometry-based draft of the human proteome. Nature 509, 582–7.