The first concerted effort to understand all the inner workings of the DNA molecule is overturning a host of long-held assumptions about the nature of genes and their role in human health and evolution, scientists reported yesterday.
The new perspective reveals DNA to be not just a string of biological code but a dauntingly complex operating system that processes many more kinds of information than previously appreciated.
The findings, from a project involving hundreds of scientists in 11 countries and detailed in 29 papers being published today, confirm growing suspicions that the stretches of "junk DNA" flanking hardworking genes are not junk at all. But the study goes further, indicating for the first time that the vast majority of the 3 billion "letters" of the human genetic code are busily toiling at an array of previously invisible tasks.
The new work also overturns the conventional notion that genes are discrete packets of information arranged like beads on a thread of DNA. Instead, many genes overlap one another and share stretches of molecular code. As with phone lines that carry many voices at once, that arrangement has prompted the evolution of complex switching, splicing and silencing mechanisms -- mostly located between genes -- to sort out the interwoven messages.
The new picture of the inner workings of DNA will probably require some rethinking in the search for genetic patterns that predispose people to diseases such as diabetes, cancer and heart disease, the scientists said. Ultimately, though, the findings are likely to speed the development of ways to prevent and treat a variety of illnesses.
One implication is that many, and perhaps most, genetic diseases come from errors in the DNA between genes rather than within the genes, which have been the focus of molecular medicine.
Complicating the picture, it turns out that genes and the DNA sequences that regulate their activity are often far apart along the six-foot-long strands of DNA intricately packaged inside each cell. How they communicate is still largely a mystery.
Altogether, the new project shows that the simple sequence of DNA letters revealed to great fanfare by the $3 billion Human Genome Project in 2003 was but a skeletal version of the human construction manual. It is the alphabet, but not much more, for a syntactically complicated language of life that scientists are just now beginning to learn.
"There's a lot more going on than we thought," said Francis Collins, director of the National Human Genome Research Institute, the part of the National Institutes of Health that financed most of the $42 million project.
"It's like trying to read and understand a very complicated Chinese novel," said Eric Green, the institute's scientific director. "The take-home message is, 'Oh, my gosh, this is really complicated.' "
The findings come from the Encyclopedia of DNA Elements project, nicknamed Encode. While much of the decades-long effort to understand DNA's role in health and disease has been driven by scientists' interest in particular genes, Encode focused on a representative 1 percent of the genome. Using a variety of experimental and computational approaches, the researchers sought to catalogue everything going on there.
The 3 1/2-year effort was designed as a pilot project to see whether it would be practical to study the entire genome in such depth and to hasten the development of cheaper tools to do so. Encode was so successful, Collins said, that the remaining 99 percent of the genome is expected to be studied the same way for $100 million.
The teams targeted 44 areas along the genome, half of them already of interest and half chosen at random to include gene-dense "urban" areas and expanses of seemingly inactive genetic "desert."
Perhaps most surprising was how much of the human genome is at work at any given time, the scientists said.
Researchers have long known that only about 2 percent of human DNA is involved in making proteins, the molecular workhorses inside cells. That involves a two-step process in which a stretch of DNA -- a gene -- serves as a template to produce a strand of RNA, which is then used as a template to produce a protein.
Recent studies had shown that some snippets of DNA between genes also are transcribed into RNA even though they do not go on to make proteins. Surprisingly, though, the new work shows that most of a cell's DNA gets transcribed, raising questions about what all that RNA is doing.
Some of it may be doing nothing. "It may be like clutter in the attic," Collins said, noting that clutter could be useful when conditions change and evolution needs new material to work with.
But much of it seems to be playing crucial roles: regulating genes, keeping chromosomes properly packaged or helping to control the spectacularly complicated process of cell division, which is key to life and also is at the root of cancer.
"We are increasingly being forced to pay attention to our non-gene DNA sequences," John M. Greally of the Albert Einstein College of Medicine in New York wrote in a commentary in today's issue of the journal Nature, where one of the new reports is being published. The 28 other papers appear in today's issue of Genome Research.
Greally noted that several recent studies have found that people are more likely to have Type 2 diabetes and other diseases if they have small mutations in non-gene parts of their DNA that were thought to be medically irrelevant.
Another aspect of Encode had researchers looking at the equivalent 1 percent of the genomes of more than 20 other mammals, and those results are forcing them to rethink the interplay between genetics and evolution.
The expectation was that many of the most active DNA sequences in humans would be prevalent in other mammals, too, because evolution tends to save and reuse what works best. But more than half were not found in other creatures, which suggests they may not be that important in people, either, said Ewan Birney of the European Bioinformatics Institute in Cambridge, England, a coordinator of the Encode effort.
"I think of them as gate-crashers at a party," Birney said. "They appeared by chance over evolutionary time . . . neither to the organism's benefit nor to its hindrance. That is quite an interesting shift in perspective for many biologists."
Although the new view of the genome may at first complicate efforts to identify DNA stretches of prime medical interest, Encode is sure to help in the long run, said Michael Snyder of Yale University, another coordinator.
"Defining the functional elements helps us zoom in to look for differences in sequence that might relate to disease," he said.