The ENCODE project (for ENCyclopedia Of DNA Elements) is a scientific reconnaissance mission aimed at discovering all regions of the human genome that are crucial to biological function. Scientists have focused on finding the genes, or protein-coding regions, in DNA sequences, but these account for only about 1.5% of the genetic material of humans and other mammals. While compelling evidence exists that other parts of the genome must have important functions, at present we have only very limited information about how they work. The ENCODE project is developing a comprehensive ‘parts list’ of the human genome by identifying and precisely locating all functional elements in our DNA sequence.
This project, sponsored by the National Human Genome Research Institute (NHGRI), involves an international consortium of scientists from government, industry, and academia.
The UCSC group served as the data collection center for the entire ENCODE project from its inception through 2012, providing the database and web interface for all of the project's sequence-related data. This role included mapping experimental data to specific human sequence coordinates, integrating the data into specialized tracks in the UCSC Human Genome Browser (where the data continues to reside), and providing more in-depth information on detail pages. UCSC also develops, performs, and presents computational and comparative analyses to glean further genomic and functional information from the collective data.