September 5th saw the simultaneous publication of more than 30 papers in Nature, Genome Research and Genome Biology detailing the findings of the ENCODE project (Encyclopedia of DNA Elements), an international collaborative research effort involving the work of more than 400 scientists in 32 groups over the last eight years. Building on the work of the Human Genome Project, the goal of the ENCODE project was to catalog and describe all the functional elements in the human genome.
The Human Genome Project revealed the surprising fact that only 1% of our genome encodes proteins, a mere 20,000 genes. The function of the other 99% of our DNA remained a mystery. The ENCODE project was established in 2003 to survey and catalog these unknown DNA regions in a systematic manner.
Sixty years ago, identifying the vehicle of inheritance (DNA) was the big mystery. Now, 11 years after publication of the sequence of the entire human genome, the challenge is still to unravel the complexity of the DNA code and to understand how the blueprint of our 20,000 genes is interpreted and executed to create specialized cells and unique individuals. The ENCODE project has provided a vast body of data for researchers seeking to understand how genes are regulated, how specific genes are turned on and off in different cell types, and how that gene expression affects health and disease. The results reveal a complex array of regulatory mechanisms of known and unknown function—a jungle of interconnected and interdependent components that work together to interpret the code stored in our DNA.
The approaches taken by the ENCODE researchers included isolating and sequencing transcribed RNA, mapping regions of transcription factor association, identifying regions of DNA methylation, and analyzing histone modifications. The undertaking was made possible by collaborative effort, next-generation sequencing technologies, and 21st century computing power for data analysis. The published findings state that a biochemical function has now been ascribed to 80% of the genome.
Gene Definition Implications
The richness and complexity of the regulatory elements described is astounding: 70,000 promoters, 400,000 enhancers, non-coding RNAs of known and unknown function, pseudogenes that are transcribed, “forests” dense with transcription factor binding sites…the list goes on. One paper, “What is a gene, post-ENCODE?”, published in Genome Research, summarizes the work and puts it in historical perspective. Starting with Mendel, the paper traces the history of our exploration of the gene and describes the various challenges to the one gene:one polypeptide definition raised by recent developments. The authors then propose an expanded (or new) definition based on the findings of the ENCODE project:
“A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.”
This definition still supports the one gene:one polypeptide scenario, but expands it to also include genomic sequences encoding more than one overlapping functional product via mechanisms such as alternative splicing, and (significantly) also expands the definition of functional gene products to include RNA transcripts as well as proteins.
Disease Research Implications
Another outcome of the ENCODE project with immediate application is the mapping of certain disease-associated mutations (SNPs) identified in genome-wide association studies. For example, mutations associated with susceptibility to Systemic Lupus Erythematosus (SLE) have been identified that do not map to any known gene, but instead appear to lie in non-coding regions of unknown function. Results from the ENCODE project showed that these mutations are associated with regulatory regions active in immune cells, illustrating one hopeful and practical application of the ENCODE work.
The idea of redefining the gene, the disease implications, and the fact that 80% of the genome, much of it once considered junk DNA, has now been ascribed a putative biochemical function are three findings of the ENCODE project that attracted my attention. More on these topics and the rest of the ENCODE work is available from Nature, where it is eloquently summarized and supported by excellent video and commentary. The site is well worth a visit.
Here are the papers summarizing the work and discussing the redefinition of the gene:
ENCODE Project Consortium, Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, & Snyder M (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489 (7414), 57-74. PMID: 22955616
Gerstein MB, Bruce C, Rozowsky JS, Zheng D, Du J, Korbel JO, Emanuelsson O, Zhang ZD, Weissman S, & Snyder M (2007). What is a gene, post-ENCODE? History and updated definition. Genome Research, 17 (6), 669-81. PMID: 17567988