Center for Biomolecular Science & Engineering
Baskin School of Engineering
UCSC and the Human Genome Project

The International Human Genome Project (IHGP) came to UC Santa Cruz in December 1999 when Eric Lander, the director of the Whitehead sequencing center (Whitehead Institute/MIT Center for Genome Research), invited David Haussler to help annotate the human genome. In particular, Lander wanted help in discovering the locations of the genes, which make up only approximately 1.5% of the sequence.

Haussler had previously applied a mathematical technique known as hidden Markov models (HMMs) to the task of computer gene-finding. This application of HMMs had quickly become the dominant gene-finding methodology and was used successfully on the Drosophila melanogaster (fruit fly) genome.

Haussler enlisted Jim Kent, then a graduate student in UCSC’s Department of Molecular, Cell, & Developmental Biology, along with systems engineer Patrick Gavin and graduate students Terrence Furey and David Kulp (who had led the gene-finding effort on the Drosophila genome). This was the birth of the UCSC Genome Bioinformatics Group.

The First Working Draft

It was a crucial time for the international project. The private company Celera Genomics had announced its intention to assemble the human genome sequence well in advance of the public effort, raising the fear that the sequence would be protected by patents and thus not be freely available to scientists. At this point, a number of groups within the IHGP were trying to assemble the genome sequence, which turned out to be like an extremely difficult jigsaw puzzle having many similar-looking, noncontiguous, overlapping pieces. The progress was slow and arduous.

Motivated to prevent Celera and its clients from locking up significant portions of the human genome in patents, Kent dropped his other work in May of 2000 to focus on the assembly problem. Within four weeks, he developed a 10,000-line computer program that assembled the working draft of the human genome. The program, called GigAssembler, finished the job on June 22, 2000, just days before Celera completed its first assembly.

On July 7, 2000, after further examination by the principal scientists of the public genome project, the UCSC Genome Bioinformatics Group released this first working draft on the web at http://genome.ucsc.edu. The scientific community downloaded one-half trillion bytes of information from the UCSC genome server in the first 24 hours of free and unrestricted access to the assembled blueprint of our human species.

With the genome assembly 90% complete, the assembled sequence was published along with the findings of hundreds of researchers worldwide in the February 15, 2001 issue of Nature, which was largely devoted to the human genome.

The Finished Sequence

The initial assembled human genome sequence was referred to as a working draft because there remained gaps where DNA sequence was missing, due either to a lack of raw sequence data or to ambiguities in the positions of the fragments. In the months following the release of the working draft, the UCSC team worked with other researchers worldwide to fill in the gaps. The resulting finished sequence made its debut in April of 2003. It encompasses 99% of the gene-containing regions of the human genome and is 99.99% accurate.

How Are Genomes Sequenced?

There isn't a laboratory system available to read along the entire length of a DNA strand to determine the order of nucleotide bases (A, G, C, and T for adenine, guanine, cytosine, and thymine). DNA sequences are determined by a variety of methods, some automated. They all involve breaking DNA into fragments by some chemical method, such as the use of enzymes, and then determining the order of the nucleotides in the fragments.
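The fragment-then-read idea can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the genome is a short made-up string, fragments are cut at uniformly random positions, every read has the same length, and reads contain no errors. The function name `shotgun_reads` and all parameter values are hypothetical.

```python
import random

def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Return n_reads substrings of length read_len cut at random start positions."""
    rng = random.Random(seed)                 # fixed seed so the toy run is repeatable
    last_start = len(genome) - read_len       # latest position a full-length read can start
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

toy_genome = "ATGCGTACGTTAGCCTAGGCTAACGT"     # made-up sequence, not real data
reads = shotgun_reads(toy_genome, read_len=8, n_reads=12)
```

Each read is a short, correctly ordered piece of the original string, but the information about where it came from is lost, which is what makes reassembly a puzzle.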

The task is further complicated by the fact that to get an accurate map, you need considerable redundancy in the sequenced segments. As a result, the sequenced segments together contain several times the number of bases in the genome being studied. A supercomputer (such as UCSC’s PitaKluster) tackling this task will spit out a series of longer assembled segments, each of which is contiguous and represents a non-overlapping portion of the genome. These are called contigs. To join the contigs together, researchers must go back to the wet lab and get sequences of the gaps between the contigs. They home in on the missing sequences using the ends of the existing ones.
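The redundancy and the merging of overlapping segments into contigs can be sketched in a few lines. The code below is a toy illustration, not GigAssembler or any production assembler: it assumes error-free reads from a single strand and uses a simple greedy rule (always merge the pair with the longest suffix-prefix overlap). The names `coverage`, `overlap`, and `greedy_assemble` and the example reads are hypothetical.

```python
def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of times each base is sequenced (redundancy)."""
    return n_reads * read_len / genome_len

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the best-overlapping pair; what remains are the contigs."""
    frags = list(dict.fromkeys(reads))                        # drop exact duplicates
    frags = [f for f in frags
             if not any(f != g and f in g for g in frags)]    # drop reads contained in others
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:                                       # nothing left to join
            return frags
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j) and f not in merged] + [merged]

example_reads = ["ATGCGT", "GCGTAC", "TACGGA", "TTTCCC", "CCCAAA"]
contigs = sorted(greedy_assemble(example_reads))
depth = coverage(n_reads=12, read_len=8, genome_len=24)       # 4.0-fold redundancy
```

In this example the five reads collapse into two contigs. No read spans the region between them, so the gap would have to be closed with additional sequencing, as described above.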

The shotgun approach, in which the genome is broken into random fragments that are sequenced and then reassembled by computer, can be more effective if it is informed by other knowledge of the genome that is already available. The human genome resides on 23 pairs of chromosomes. The locations of many genes on these chromosomes are already known, so this allows some sequences to be placed on the map. Then the genome can be pieced together from these fixed segments. This is a bit like solving a jigsaw puzzle using the picture on the cover of the box as a guide.
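Using known locations as a guide can be sketched as follows. This is a minimal illustration that assumes each known gene or marker is represented by a short exact sequence with a chromosome and map position. Real mapping data is far messier, and the function `place_contigs`, the marker table, and the coordinates are all hypothetical.

```python
def place_contigs(contigs, markers):
    """Anchor contigs to a map using known markers.

    markers maps a marker sequence to (chromosome, map_position).
    Returns (placed, unplaced), where placed is a list of
    (chromosome, estimated_contig_start, contig) sorted by map order.
    """
    placed, unplaced = [], []
    for contig in contigs:
        for seq, (chrom, pos) in markers.items():
            idx = contig.find(seq)
            if idx != -1:
                # The marker sits idx bases into the contig, so the contig starts at pos - idx.
                placed.append((chrom, pos - idx, contig))
                break
        else:
            unplaced.append(contig)           # no known marker found in this contig
    placed.sort(key=lambda p: (p[0], p[1]))
    return placed, unplaced

known_markers = {"GTACG": ("chr1", 104), "CCCA": ("chr1", 203)}   # made-up markers
placed, unplaced = place_contigs(["TTTCCCAAA", "ATGCGTACGGA", "GGGGTTTT"], known_markers)
```

Contigs that contain a known marker get a fixed place on the map, like puzzle pieces matched to the box picture. The rest must be positioned by other evidence.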

Center for Biomolecular Science & Engineering • 1156 High St, Mail Stop CBSE/ITI, Santa Cruz, CA 95064
© 2013 CBSE. All rights reserved. • Last Modified On April 27, 2009 At 12:39 PM