From San Diego to Sao Paulo, from Copenhagen to Canberra, biologists are shepherding animals two by two.
From one male and one female, the scientists will scrape tissue samples the size of a penny to freeze in liquid nitrogen at about -320°F. More than a million copies of DNA will be saved from each chosen animal of the land, sky, and sea. Even extinct animals will be on board.
There's no flood on the way, although there are reasons to rush. Scientists are attempting to decode the DNA of more than 10,000 vertebrates to build a "genome ark."
"This will be our first opportunity to explore the evolution of the leading characters in the drama of life on this planet at the level of the DNA," says bioinformaticist David Haussler of UC Santa Cruz, who is a driving force behind the project called Genome 10K.
Following the giddy success of unraveling the human genome in 2000, biologists in Genome 10K are now beginning the same process for more than 10,000 individual species--about one per vertebrate genus. Recently extinct species will be immortalized in computer bits, perhaps awaiting the day stem cell technology can bring them back. Scientists will gather genetic clues to help protect endangered populations from spiraling out of existence. And if the team can use the record of evolution to figure out how the genome turns genes on and off, the project's databases could guide future doctors in targeting therapies for individual patients.
The zoo's long list includes 60 types of whales, 19 types of penguins, 26 hamsters, 21 kangaroo rats, 89 monkeys, and 3 platypuses. Feather or fur, blubbery or brainy, there's one thing that connects each animal in the genome zoo: They all have the same ancestral grandmother, and she was a fish.
The point of this massive undertaking is to understand how this common aquatic ancestor, a fish with a backbone, three-part brain, intricate organs and gills, set the stage for all the large creatures on Earth. Tracing the paths from fish to crocodile, dolphin, or swan will help scientists decipher the still-mysterious sections of the human genome.
"That our universe is capable of producing this variety of beautiful entities," begins Haussler, leaning back in his office chair, pausing to pluck the right words from above his head, "is stunning to me.
"And that it does it all by itself, naturally!" He laughs. "This is absolutely stunning."
Epicenter of genetics
Ten years ago, proposing an enormous species zoo would have been ludicrous. Bankrolled with $3 billion from the government, hundreds of scientists scurried through the 1990s to unravel the DNA of just one species: Homo sapiens.
It was the first time the human genome's lengthy alphabet would be sequenced, or transcribed letter by letter. DNA contains four distinct chemical units that biologists denote with the letters A, T, C, and G. Each human carries a unique arrangement of 3.2 billion of them. Sequencing involves breaking the string into millions of fragments and chemically coaxing each one to reveal its letters.
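To make the fragmenting step concrete, here is a toy simulation, assuming a genome can be treated as a Python string over the letters A, C, G, and T. Real sequencing breaks DNA chemically at random positions and reads each piece with a machine; this sketch only mimics that by copying random, overlapping substrings of a made-up sequence. The function name `shotgun_fragments` and its parameters are hypothetical.

```python
import random

def shotgun_fragments(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Simulate sequencing by copying random, overlapping substrings ("reads")
    of a genome string. Hypothetical helper for illustration only."""
    if read_len > len(genome):
        raise ValueError("reads cannot be longer than the genome")
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)  # random cut point
        reads.append(genome[start:start + read_len])
    return reads

toy_genome = "ATGCGTACGTTAGCCGATCGATCGGATCCTAG"  # made-up 32-letter sequence
reads = shotgun_fragments(toy_genome, read_len=8, n_reads=12)
print(reads)
```

Because the cuts land at random, neighboring reads tend to overlap, and those overlaps are what an assembler later uses to stitch the pieces back together.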
Joining the small army of pipette-wielding, white-coated biologists was a relatively anonymous genomics lab at UC Santa Cruz, led by Haussler. He's tan, well over 6 feet tall, and notorious for his Hawaiian shirt and khaki cargo shorts. A former painter, a mathematician by training, and a computer scientist by trade, Haussler was one of the few scientists who applied sophisticated algorithms to scan scraps of DNA for genes.
The lab jumped into the Human Genome Project in 1999, just after a private company called Celera boasted it would sequence the genome faster and for one-tenth the cost of the public project. Finishing first would entitle Celera to the gene patents, squashing the vision of a free, publicly accessible sequence of mankind.
The public project was in bad shape. Snippets of DNA were scattered in unorganized computer files without an overarching map.
In the spring of 2000, Haussler and his colleagues worked days, nights, and weekends to program an assembler that would pick up the minuscule pieces of DNA and connect them into something readable. In May, when it seemed imminent that Celera would finish first, Haussler told his bright, computer-savvy graduate student Jim Kent that the project was looking grim.
For the next four weeks, Kent furiously coded GigAssembler, which joined together larger pieces of DNA. He wrote 20,000 lines of code, icing his wrists at night.
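GigAssembler's actual method is not described here, and it drew on far more information than raw letter overlaps. As a rough illustration of the basic job of an assembler, joining fragments whose ends overlap, here is the textbook greedy-overlap approach run on made-up data. The function names are hypothetical, and real assemblers must also cope with sequencing errors, repeated regions, and fragments read from the opposite DNA strand, none of which this sketch handles.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap is at least `min_len` letters long."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.
    Returns the remaining contigs (a single string if everything joined up)."""
    # Remove duplicates and fragments wholly contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags

pieces = ["ATGCGTAC", "GTACGTTAG", "TTAGCCGA"]
print(greedy_assemble(pieces))  # ['ATGCGTACGTTAGCCGA']
```

Comparing every pair of fragments is fine for three pieces but hopeless for millions, which is one reason genome-scale assembly demanded so much engineering.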
"It was an excruciating, nail-biting time," said Haussler.
In June 2000, Celera CEO Craig Venter joined Francis Collins, leader of the Human Genome Project, at the White House to announce that both groups had sequenced the genome, in effect declaring a tie. Celera scientists had finished their assembly only the night before. But Kent's had been done for four days.
A week later, Haussler and Kent published the genome online, free to the public, with tools to search through it.
"It was just not right to have our genome locked up in a commercial enterprise, and only available to those who could pay for it," said Haussler.
Over the next ten years, Haussler's lab published more than 100 papers. Some described novel algorithms to speed up the process of analyzing DNA; others overturned the belief that large sections of DNA are just "junk."
Now, the UCSC genome browser includes about three dozen more species, and it gets hundreds of thousands of visitors each week. It's a world-famous reference table among biologists. And it's why Santa Cruz was the logical place to launch the next massive genome project.
Luck favors the prepared
As of early 2010, sequencing an animal's genome is too expensive to justify doing it 10,000 times.
Currently, sequencing an entire genome sets you back at least $30,000, explains Haussler. It's usually closer to $100,000 for a high-quality analysis. The cost per genome must shrink by a factor of 10 to 30 before Genome 10K can begin in earnest.
Luckily for the project, that will happen soon, predicts Haussler. Sequencers are growing cheaper at an unprecedented rate--even faster than Moore's Law, the doubling of the number of transistors that can fit on a computer chip every two years. In the last ten years, the cost of sequencing a base of DNA has plummeted by a factor of 10,000. By Moore's Law, the number of transistors on a chip would have grown only by a factor of 32, or five doublings, in the same period.
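The arithmetic behind that comparison is short enough to check directly. This sketch assumes a clean ten-year window and a steady rate of decline, which the real cost curve did not follow exactly; the variable names are illustrative.

```python
import math

years = 10
moore_doubling_years = 2                               # Moore's Law as stated in the text
moore_factor = 2 ** (years / moore_doubling_years)     # five doublings -> 32x

sequencing_factor = 10_000                             # cost drop per base over the decade, from the text

# How often would sequencing cost have to halve to fall 10,000-fold in 10 years?
halvings = math.log2(sequencing_factor)                # about 13.3 halvings
halving_time_months = years * 12 / halvings            # about 9 months per halving

print(f"Moore's Law over {years} years: {moore_factor:.0f}x")
print(f"Sequencing: {halvings:.1f} halvings, one every {halving_time_months:.1f} months")
```

Put another way, a 10,000-fold drop in a decade means the cost of reading a base halved roughly every nine months, versus every 24 months for Moore's Law.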
To stay within its $50 million budget, the project can't spend more than $3,000 per species to accurately sequence an animal's DNA.
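The budget figures in this section fit together as follows. The numbers come from the text; the remainder is simply what is left of the $50 million, and the article does not say how it would be spent.

```python
budget = 50_000_000          # total project budget in dollars, from the text
species = 10_000             # target number of vertebrate genomes
cap_per_species = 3_000      # sequencing cost ceiling per species, from the text

sequencing_total = species * cap_per_species     # $30 million for sequencing
left_over = budget - sequencing_total            # remainder, not itemized in the article

# Required price drop from the per-genome costs quoted earlier
for current_cost in (30_000, 100_000):
    factor = current_cost / cap_per_species
    print(f"${current_cost:,} -> ${cap_per_species:,}: {factor:.0f}x cheaper")
print(f"Sequencing: ${sequencing_total:,}; remainder: ${left_over:,}")
```

The required drop works out to roughly 10x to 33x, in line with Haussler's "factor of 10 to 30."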
"This will be possible. It will happen," says Haussler. "It will happen in the next couple of years."
When that happens, an endeavor that once cost the gross domestic product of Fiji will be done for the price of a used Honda Civic.
Haussler is also optimistic about raising the money, selling a scientific project that he calls "an incredible bargain." He's just begun to pitch the idea to philanthropists, who will have to take it on faith that the cost of sequencing will fall.
Haussler and his colleague Stephen O'Brien, a geneticist at the National Cancer Institute, were chatting a year and a half ago about the economics of sequencing when the idea for Genome 10K was born. Both had served in the last decade on a panel advising the National Institutes of Health on which species to prioritize for sequencing. The recommended species included popular animals, such as the cat, the dog, and the cow, as well as more obscure ones, such as the mysterious platypus. They knew that transporting frozen tissue samples, tapping species experts, and navigating paperwork were the tricky and time-consuming parts of the process.
"We thought, why don't we take five years and see if we can gather up specimens?" says O'Brien. They recruited Oliver Ryder, a geneticist who oversees the San Diego Zoo's large collection of animal samples, and held a meeting at Santa Cruz last spring with 75 biologists who have large tissue collections.
"We put them all in the same place and asked them, how would you like all of your species sequenced?" says O'Brien. In November 2009, Haussler, O'Brien, and Ryder announced their plan, including the 55 institutions that had committed their animal collections, in the Journal of Heredity.
"The scientific community has rallied behind it," O'Brien says. "It's basically something most people think is a good idea." Some people grumble that it's taking away money from "real science," he notes, "but we're just trying to complement real biology."
The list of museums and universities with substantial amounts of animal tissue includes places in Canada, Brazil, Ireland, Portugal, France, Germany, Denmark, China, and Australia. Researchers must identify each species precisely, distinguishing Cranwell's horned frog from the Suriname horned frog. Only 20 micrograms of DNA are needed from each species, an amount that fits easily in a frozen tissue sample shipped to one of 20 sequencing centers around the world.
Many of these frozen animals have been hibernating for decades.
A cool field trip
The best way to spot animals in a frozen zoo is to huff, puff, and blow away the white haze of nitrogen gas.
Milky vapor at -270°F pours over the top of a metal barrel, wide as a refrigerator. Geneticist Oliver Ryder, clad in blue gloves like oversized oven mitts, holds up the lid.
Under the fog is a metropolis. Five blue towers are crowded like high-rise apartments, each 13 stories tall. On each floor are 100 neatly spaced vials, half-filled with liquid hues of pink, green, or orange. These are the animals.
"Everyone has an apartment house name," says Ryder.
Six such barrels are the urban living quarters of more than 8,600 species, many endangered, and a few about to be extinct. It's the largest frozen repository of animals in the world, except for its own backup. The entire collection is replicated in an undisclosed location should an earthquake destroy the three-story Beckman Center for Conservation Research in Escondido, California. After all, it's taken 35 years to amass the collection.
"We knew that someday we'd want to study these animals," said Ryder. "But if we don't collect them now, we'll never have another chance."
These samples will form a major part of the Genome 10K collection. Most were snipped from animals at the nearby San Diego Zoo. To avoid stressing out the animals, the researchers only cut skin samples when an animal is already being handled for medical reasons. The sample is chopped up into little pieces and incubated for several weeks until the cells have multiplied many times over. Then, they're taken to their permanent exhibit in the liquid nitrogen room, where they're checked on after a few weeks to make sure the cells are still alive.
The animal samples hibernate for years, or decades, until a researcher dusts off a weathered card catalog--or, more recently, checks a computer database--and thumbs to the apartment number of the desired specimen before extending a gloved hand into its icy resting place. Just before entering the sequencer, the animals picked for Genome 10K will have 30 minutes to thaw.
DNA scraps to computer bytes
Before the genomes become alphabet strings on Haussler's computer screen, they are tissue samples in DNA sequencing labs like that of Nader Pourmand, a colleague of Haussler's at UCSC. There, local favorites like the dolphin, elephant seal, and banana slug, the school mascot, are chilled in giant freezers.
Pourmand and his lab of engineers are trying to sequence on the cheap--they're inventing fast methods of translating twisted strands of DNA into an electronic sequence by tinkering with the current generation of sequencing machines. Compiling a library of DNA slices, for example, takes two or three days on a standard sequencer, explains Pourmand's colleague Akram Sial.
"We now have a robot that does it in two and a half hours," says Sial.
The many steps of sequencing DNA begin with tiny amounts of it, about one millionth the mass of a paper clip. The tissue that provides the DNA runs through solutions that chew through the cell and nuclear membranes. Because the full genome, at 3.2 billion letters of DNA, is too long for the sequencer to process, machines chop the DNA into tiny bits. Whirring sequencers as big as refrigerators can "read" these fragments one letter at a time in just a fraction of a second. The newest sequencers can read thousands of letters at a time.
By enlisting robots to do the grunt work, the lab saves money on time and labor. How fast these "next generation" sequencers increase their output will determine when Genome 10K can actually begin uncovering DNA en masse.
From DNA to drug
Animals are full of enviable medical secrets--sharks don't get cancer and penguins don't contract HIV. As an animal population encounters predators, rough weather, and rampant disease, Darwinian selection demands that only the fittest survive. The more the animals in a population differ from one another, the more likely it is that some will carry a genetic shield against the harsh environment, survive, and pass that shield on to their offspring.
"Animals don't have hospitals, doctors, or HMOs to protect them," says O'Brien. "Just genetic variation."
Studying the genomes of animals like aardvarks, sloths, and ferrets, which somehow avoided diseases like diabetes and cancer, is like watching Mother Nature's medical experiments over millions of years.
"We happen to stumble on most drugs by chance, and they seem to work, but we don't really understand the mechanism," says Haussler.
Both Haussler's and O'Brien's labs devote part of their work to understanding the genetic basis of cancer, using DNA from human tumors as well as DNA from the three dozen animals sequenced in the last decade. In a carpeted computer room with two rows of Dell computers, UCSC graduate student Zack Sanborn points to a section of A's, T's, G's, and C's on his laptop screen. When he zooms out, the alphabet string turns out to be part of a human gene. Under the strip representing the human section of the genome are thirty more strips, with labels like "cow" and "panda."
The strips of DNA, from fish to human, line up remarkably well, like two similar barcodes would at a grocery store. The gene Sanborn is interested in today, called ACYL3, is a single black line among many.
He traces that line from platypus to gorilla with no break--for several hundred million years, mammals carried that particular gene. But at the chimpanzee, the line vanishes. He says the gene fell out of favor about 5 million years ago, shortly before humans and chimpanzees parted ways, and was jettisoned at that time.
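The reasoning Sanborn follows can be sketched in a few lines: walk from the human end of the species list toward more distant relatives and stop at the first species that still has the gene. The species list and presence/absence values below are made-up placeholders shaped like the example in the text, not real alignment data, and a linear list is a simplification of the branching evolutionary tree a real analysis would use. The function name `loss_point` is hypothetical.

```python
# Species listed roughly from most distant to closest relative of humans.
# Presence/absence values are illustrative placeholders, not real data.
species_order = ["platypus", "cow", "panda", "mouse", "macaque",
                 "gorilla", "chimpanzee", "human"]
gene_present = {
    "platypus": True, "cow": True, "panda": True, "mouse": True,
    "macaque": True, "gorilla": True, "chimpanzee": False, "human": False,
}

def loss_point(order, present):
    """Walk back from the human end and return (species lacking the gene,
    closest relative that still has it). One unbroken run of absences at
    the human end fits a single loss in that group's common ancestor."""
    missing = []
    for sp in reversed(order):
        if present[sp]:
            return list(reversed(missing)), sp
        missing.append(sp)
    return list(reversed(missing)), None  # gene absent everywhere listed

print(loss_point(species_order, gene_present))
# (['chimpanzee', 'human'], 'gorilla')
```

Because the gene is missing in both chimpanzee and human but present in the gorilla, the simplest explanation is one loss on the lineage leading to their shared ancestor.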
No one knows why ACYL3 was lost, but a similar loss in more recent human history, a mutation that disables the CCR5 gene, may have provided resistance to infectious diseases. The disabled version turns up in roughly 10 percent of Europeans' CCR5 genes, and people who inherit it from both parents are highly resistant to HIV.
Part of the mystery of the genome is that just 1.5 percent of it actually instructs the body to assemble proteins, the building blocks of cells. These are the "genes." Through careful evolutionary analysis, scientists now know that the other 98.5 percent of the genome, often called "junk DNA," isn't simply the garbage they once thought it was. Parts of it orchestrate the entire building operation, turning genes on and off.
Until the regulatory mechanisms hidden in the junk are deduced--no small feat--drug makers will be working largely blind when it comes to how genes are regulated, for example, how the division of cancerous cells is switched on and off.
Advances in computer algorithms will speed this along. Comparing two entire genomes side by side is far too big a job for the naked eye, but it's a flood of information that Haussler's students harness with a thousand computers and a dash of cleverness.
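One simple form of side-by-side comparison scores how similar two sequences are in sliding windows, so that highly conserved stretches, including conserved "junk," stand out. This sketch assumes the two sequences have already been aligned letter for letter, which is itself the hard part; the sequences are made up and the function name `window_identity` is hypothetical, not the lab's actual pipeline. Because each window is scored independently, a genome-length alignment can be cut into chunks and handed to many computers at once.

```python
def window_identity(seq_a: str, seq_b: str, window: int, step: int) -> list[tuple[int, float]]:
    """Compare two already-aligned, equal-length sequences in sliding windows.
    Returns (start position, fraction of matching letters) for each window.
    Gap characters ('-') never count as matches."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        scores.append((start, matches / window))
    return scores

human = "ATGCGTACGTTAGCCGATCG"   # made-up aligned sequences
other = "ATGCGTACGTTAGCTGAACG"
for start, ident in window_identity(human, other, window=10, step=5):
    print(start, ident)
```

Windows that stay near 100 percent identity across many species are candidates for regions that evolution has been careful to preserve.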
Just a few years ago, running an analysis of a tumor genome could take Sanborn eight hours. He used to start his program late at night, go back to his apartment to sleep, and hope it was done by morning. "Now I just have enough time to get coffee," he says.
Save the whales
It's no accident that one in five of the animals on the ark is endangered or threatened. The team hopes Genome 10K will help embattled species avoid extinction.
Tasmanian devil populations, for instance, have plummeted since the 1990s. Their nemesis is a rare genetic facial tumor. Scientists have tried isolating affected devils so they won't breed, but they can't solve the basic problem: Small populations have less genetic diversity and are much more susceptible to rare diseases because closely related animals have similar genes. If one carries a rarely expressed genetic defect on a recessive gene, the chances are high its mate does too.
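The general point about closely related mates can be put in numbers with a standard population-genetics formula that the article itself does not give: if a recessive defect has allele frequency q, the chance that an offspring inherits two copies is q²(1 − F) + qF, where F is the inbreeding coefficient (0 for unrelated parents, 1/16 for first cousins, 1/4 for full siblings). The allele frequency below is an arbitrary illustrative value, not a measurement from any species.

```python
def p_affected(q: float, f: float) -> float:
    """Probability an offspring is homozygous for a recessive allele.

    q: frequency of the recessive (defective) allele in the population
    f: inbreeding coefficient, the chance the offspring's two copies are
       identical by descent (0 for unrelated parents)
    """
    return q * q * (1 - f) + q * f

q = 0.01  # illustrative allele frequency only
for label, f in [("unrelated parents", 0.0), ("first cousins", 1 / 16), ("full siblings", 0.25)]:
    print(f"{label:18s} {p_affected(q, f):.5f}")
```

With these numbers, a full-sibling mating is about 25 times as likely as a random one to produce an affected offspring, which is why small, inbred populations are so vulnerable.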
Avoiding this critical "bottleneck," the point when many scientists believe it's too late to salvage an inbred population, is just one way the genome ark could save animals. How?
One approach would be to selectively breed disease-free animals. O'Brien helped save the Florida panther from extinction by breeding the panther with the Texas puma. It was just enough to weed out the genetic problems and get the species back on track.
"For purists, it's a little bit of a loss. There's a little Texas puma in there," says Haussler. "But it saved the species!"
In the future, scientists could identify the genetic vulnerabilities in a small population and selectively breed the animals that are disease-free to avoid perpetuating the problem. Or, in an even more sophisticated move, scientists could wipe out the disease altogether by altering the DNA in animal eggs.
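The selective-breeding idea reduces, in its simplest form, to screening animals for a known recessive defect and ruling out matings between two carriers. The animal names and carrier results below are hypothetical, and a real breeding program would also weigh kinship, overall genetic diversity, and many genes at once; this sketch covers only a single defect.

```python
# Hypothetical genotype screen: True means the animal carries one copy
# of a recessive defect. Names and results are made up for illustration.
carrier = {
    "F1": False, "F2": True, "F3": False,   # females
    "M1": True, "M2": False,                # males
}
females = [a for a in carrier if a.startswith("F")]
males = [a for a in carrier if a.startswith("M")]

def safe_pairs(females, males, carrier):
    """All female-male pairings in which at most one parent is a carrier,
    so (barring new mutations) no offspring can inherit two copies."""
    return [(f, m) for f in females for m in males
            if not (carrier[f] and carrier[m])]

print(safe_pairs(females, males, carrier))
```

Only the F2 and M1 pairing is excluded here; carriers can still breed, as long as their mates are clear of the defect.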
Raising the dead?
Oliver Ryder always got the same question when he showcased his Frozen Zoo: Will you be able to recreate animals with your preserved cells?
"We said no," says Ryder. "But then Dolly [the cloned sheep] was born."
Cloning is no longer the stuff of science fiction, but resurrecting extinct animals is still as Hollywood as Jurassic Park. The brains behind Genome 10K are quick to clarify the project won't raise animals from the dead, at least not anytime soon.
"I think it'll probably happen," says O'Brien. "Anything's possible, if you go far enough into the future." A hundred years ago, no one thought gene therapy was possible, he says. No one knew what DNA was.
To be fair, sequencing ancient animals is enormously more complicated than sequencing the recently deceased. The longer an animal is dead, the more its DNA degrades and gets contaminated. Sequencing a Neanderthal or a woolly mammoth is a hundred times more difficult than getting a sequence from a living snake, says O'Brien: "It's very expensive and very interesting, but it's not easy."
Transcribing the human genome was like Gutenberg printing the Bible as the first tome off the printing press, says O'Brien. "Now we can have a library full of such books. What people read, and do with them, is up to our descendants."
If the day comes that biologists can rebuild an animal from its genetic alphabet alone, Genome 10K will be the go-to encyclopedia, frozen in the 21st century.