What Is ENCODE, and Why Does It Matter?

Visualization of the DNA double helix. (Image credit: National Cancer Institute)

A giant leap has just been taken in humanity's understanding of itself. That leap is called ENCODE. Here's what you need to know.

Eleven years ago, scientists sequenced the human genome. That is, they unraveled the spirals of DNA packed inside the nucleus of each of our cells and figured out the ordering of its 3.3 billion chemical "base pairs," or the molecular letters, of sorts, that spell out instructions for the cells to follow.

But although the Human Genome Project (as the endeavor was called) established the order of the base pairs, most of the code that these letters spelled out remained encrypted.

Scientists could see that roughly 23,000 sections of the genome, made up of about 1,000 base pairs each, coded for proteins. In other words, these sections, called genes, were structured in such a way that cells could read them off to build protein molecules, which then performed cellular functions. But the genes made up less than 2 percent of the total human genome. What did the rest of the endless spirals of DNA base pairs mean? Many scientists thought most of it was useless gobbledygook left over from our evolutionary past. They called it "junk DNA."
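For readers who want to check the "less than 2 percent" figure, here is a rough back-of-envelope sketch using only the rounded numbers quoted above. The variable names are illustrative, and treating every gene as exactly 1,000 base pairs is a simplifying assumption, not a claim about how the researchers measured it.

```python
# Back-of-envelope check using the article's rounded figures.
# Assumption: every gene is treated as exactly 1,000 base pairs long.
GENOME_BASE_PAIRS = 3_300_000_000   # total base pairs quoted in the article
GENE_COUNT = 23_000                 # protein-coding sections ("genes")
BASE_PAIRS_PER_GENE = 1_000         # approximate length of each section

# Total base pairs that code for proteins under this assumption.
coding_base_pairs = GENE_COUNT * BASE_PAIRS_PER_GENE

# Share of the whole genome that those base pairs represent.
coding_fraction = coding_base_pairs / GENOME_BASE_PAIRS

print(f"{coding_base_pairs:,} coding base pairs")
print(f"{coding_fraction:.2%} of the genome")
```

With these inputs the share works out to roughly 0.7 percent, comfortably under the 2 percent ceiling the article cites.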

Now, an international collaboration of 442 scientists has unveiled the Encyclopedia of DNA Elements, nicknamed ENCODE. In more than two dozen articles published in Nature, Science and other journals, the scientists present nine years of research showing that genes are just one element of a long "parts list" that makes up the human genome. Rather than being mostly junk, 80 percent of DNA has a function, and ENCODE is the encyclopedia that describes what all of it does.

Half or more of human DNA acts as "gene switches." These portions of code control when genes turn on and off, affecting how many proteins get built both throughout the day and over the course of a lifetime. There's a gene switch that tells an undifferentiated cell in an embryo to develop into a liver cell, for example; there's another switch that directs a cell in the pancreas to rev up its insulin production after a meal; and there's another that tells a skin cell it's time to bud off, notes Time Magazine.

"What we learned from ENCODE is how complicated the human genome is, and the incredible choreography that is going on with the immense number of switches that are choreographing how genes are used," Eric Green, director of the National Human Genome Research Institute (which ran the nine-year-long ENCODE project), told reporters during a teleconference.

So, why does it matter that we now have an encyclopedia of human DNA?

For one, knowing what so much more of the genetic code actually does will help pinpoint what makes us human; evolutionary biologists can study how the gene switches, as well as the genes, of Homo sapiens diverged from those of other animals.

More importantly, scientists say the new encyclopedia of DNA will tremendously accelerate our understanding of why diseases occur and how to prevent them. That's because, more often than not, diseases stem from changes that occur in regions of the genetic code formerly labeled "junk."

"Most of the changes that affect disease don't lie in the genes themselves; they lie in the switches," Michael Snyder, an ENCODE researcher based at Stanford University, told The New York Times.

Take cancer. It turns out that most of the changes to DNA that make cells turn cancerous do not occur in genes, but in the portions of DNA that exert control over genes: the switches. Knowing what these switches do, researchers say they can begin to develop drugs that target the control circuitry, rather than targeting the genes themselves, which, in many cases, are impervious to direct attack.

The ENCODE project "will definitely have an impact on our medical research on cancer," Dr. Mark Rubin, a prostate cancer genomics researcher at Weill Cornell Medical College, told the Times.

Scientists have already found changes to gene switches that appear to usher in the development of multiple sclerosis, arthritis, Crohn's disease, lupus and celiac disease. Other common diseases such as diabetes, heart disease, hypertension and depression also fit the profile of conditions that likely result from changes to the way genes get turned on or off, rather than changes to genes themselves.

"By and large, we believe rare diseases may be caused by mutations in the protein [or gene-]coding region" of DNA, Green told reporters, while the "more common, complicated diseases may be traced to genetic changes in the switches."

Common diseases: we're coming for you.


Live Science Staff