Scientists have finally mapped an entire human genome, nearly two decades after researchers first announced that they had sequenced the majority of the roughly 3 billion letters contained in human DNA.
Though the Human Genome Project was hailed worldwide when it was completed in 2003, at the time, many sections of the genome still couldn't be placed. The new work — achieved by a consortium of scientists led by the National Human Genome Research Institute, the University of California, Santa Cruz and the University of Washington in Seattle — finally fills in the last 8% of DNA letters, or base pairs, that had no home in the sequence before.
The new genome paves the way to a better understanding of how people's DNA can differ and how genetic mutations can contribute to disease. The scientists published their findings March 31 in the journal Science.
In 2003, scientists at the Human Genome Project and the biotech company Celera Genomics solved the biggest chunk of the puzzle. But technological limitations meant that they couldn't fit 15% of the human DNA sequence into the picture. Most of the unmapped regions were concentrated around telomeres (the caps on the ends of chromosomes) and centromeres (the chromosomes' densely packed middle sections). In 2013, researchers narrowed this gap to just 8%, but they still couldn't place 200 million base pairs — the equivalent of an entire chromosome.
"Ever since we had the first draft human genome sequence, determining the exact sequence of complex genomic regions has been challenging," study co-author Evan Eichler, a researcher at the University of Washington School of Medicine, said in a statement. "I am thrilled that we got the job done. The complete blueprint is going to revolutionize the way we think about human genomic variation, disease and evolution."
DNA is made of tiny molecules called nucleotides, each of which contains a phosphate group, a sugar molecule and a nitrogenous base. The four types of bases (adenine, thymine, guanine and cytosine) pair up, adenine with thymine and guanine with cytosine, to form the rungs of the DNA double helix that encodes our genetic identity. Each chromosome contains one very long double helix, and humans have 23 pairs of chromosomes, with one chromosome in each pair inherited from each parent. DNA sequencing is the process of determining the order of these bases in a stretch of DNA.
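Because each base has exactly one partner, either strand of the helix fully determines the other. The short Python sketch below illustrates that pairing rule. The function names and the example sequences are made up for illustration, and the sketch adds one detail not covered above: the two strands run in opposite directions, so the partner strand is conventionally written as the reverse of the base-by-base complement.

```python
# Pairing rules for the four bases: A with T, G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_pairs(strand: str) -> list[tuple[str, str]]:
    """Hypothetical helper: list the 'rungs' as (base, partner) tuples."""
    return [(base, PAIRS[base]) for base in strand.upper()]


def complement_strand(strand: str) -> str:
    """Hypothetical helper: return the partner strand.

    The strands are antiparallel, so the complement is written in reverse
    order to read in the same conventional direction as the input.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


print(base_pairs("GAT"))             # → [('G', 'C'), ('A', 'T'), ('T', 'A')]
print(complement_strand("GATTACA"))  # → TGTAATC
```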
To complete the Human Genome Project, researchers relied on short-read technologies, which read only several hundred base pairs at a time, producing DNA snippets that were tiny compared with the whole genome. This made the project akin to assembling a 10-million-piece puzzle of blue sky, and it left behind a lot of gaps. The job was also difficult because the two chromosomes in each pair came from different people (one from each parent). That made it hard to tell whether two slightly different snippets were variants of the same stretch of the genome, one from each parent, or pieces from different locations.
To get around these difficulties, the new study's researchers turned to a weird type of human tissue called a complete hydatidiform mole, which forms when a sperm fertilizes an egg that lacks a nucleus and the sperm's chromosomes are typically duplicated. The fertilized egg cannot develop into a viable embryo; instead, it attaches to the uterus and grows as a "mole" with all of its chromosomes from the father and none from the mother.
From this mole, the scientists made a cell line (a population of cells that can be grown in the lab) whose 23 chromosome pairs all came from one person, so the two copies of each chromosome were essentially identical. To sequence the hydatidiform mole DNA, the scientists used two new long-read sequencing techniques (PacBio HiFi and Oxford Nanopore ultra-long sequencing), which turned the project into a puzzle with tens of thousands of pieces instead of millions. These techniques read 20,000 to 1 million base pairs at a time, creating much larger puzzle pieces and, therefore, fewer gaps than before.
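A back-of-the-envelope division shows roughly where those puzzle sizes come from. It assumes, for illustration only, that the roughly 3 billion base pairs are cut into non-overlapping pieces of one fixed length: 300 base pairs standing in for "several hundred" short-read letters, and 100,000 base pairs as an assumed typical long read within the 20,000 to 1 million range. Real sequencing reads overlap and cover each position many times, so actual read counts are far higher.

```latex
\[
\frac{3 \times 10^{9}\ \text{bp}}{300\ \text{bp per piece}} = 10^{7}\ \text{pieces},
\qquad
\frac{3 \times 10^{9}\ \text{bp}}{10^{5}\ \text{bp per piece}} = 3 \times 10^{4}\ \text{pieces}.
\]
```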
The long-read methods enabled the team to piece together some of the genome's most difficult and repetitive sections. The result: They identified 115 new genes predicted to code for proteins, bringing the genome's total to 19,969.
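Why read length matters for repetitive DNA can be shown with a toy example. The sketch below is a deliberate simplification that ignores sequencing errors and real assembly algorithms; the genome, the repeated segment and the function names are hypothetical, invented for illustration. It checks where each read matches exactly: a read shorter than a repeated segment fits in more than one place, while a read long enough to extend past the repeat into unique flanking sequence fits in only one.

```python
def placements(genome: str, read: str) -> list[int]:
    """Every start position where `read` matches `genome` exactly (overlaps allowed)."""
    return [i for i in range(len(genome) - len(read) + 1) if genome.startswith(read, i)]


def unique_fraction(genome: str, read_length: int) -> float:
    """Fraction of all reads of one length that match exactly one place in the genome."""
    reads = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]
    unique = sum(1 for read in reads if len(placements(genome, read)) == 1)
    return unique / len(reads)


# Hypothetical toy genome: the 7-letter segment GATTACA appears twice,
# surrounded by different flanking letters each time.
REPEAT = "GATTACA"
genome = "CCC" + REPEAT + "TTT" + REPEAT + "GGG"

print(placements(genome, "ATTAC"))          # short read inside the repeat → [4, 14]
print(placements(genome, "CCCGATTACATTT"))  # long read spanning the repeat → [0]

# Share of reads that can be placed unambiguously, by read length.
# Prints "5 0.68", then "8 1.0", then "13 1.0": once reads are longer
# than the 7-letter repeat, every read fits in exactly one place.
for k in (5, 8, 13):
    print(k, round(unique_fraction(genome, k), 2))
```

The same principle applies to real genomes: a repeated stretch can be placed with confidence only by a read that runs past it into unique sequence, which is why longer reads helped with repetitive regions such as those around centromeres and telomeres.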
The creation of the first gapless sequence won't be the end of the researchers' efforts, however. They estimate that around 0.3% of the genome could contain errors, and researchers will need better methods of quality control to verify these hard-to-sequence regions.
Additionally, the sperm that fertilized the egg in the sequenced hydatidiform mole carried an X chromosome rather than a Y, so the researchers will need to separately sequence a Y chromosome, which triggers an embryo to develop as biologically male. They will also need to take on a more ambitious task: sequencing a genome with chromosomes from both parents.
The scientists believe that the more complete map of the human genome will help future researchers better understand how DNA varies across individuals and communities, and will give them a better reference point for studying mutations that can cause harmful diseases.
The researchers have also teamed up with the Human Pangenome Reference Consortium, a group that aims to sequence more than 300 human genomes from around the world. This initiative will not only give scientists a better look at which parts of the genome differ among individuals but also help them better understand how different genetic illnesses emerge and how to best treat them.
"In the future, when someone has their genome sequenced, we will be able to identify all of the variants in their DNA and use that information to better guide their healthcare," Adam Phillippy, a senior investigator at the National Human Genome Research Institute, said in the statement. "Truly finishing the human genome sequence was like putting on a new pair of glasses. Now that we can clearly see everything, we are one step closer to understanding what it all means."
Originally published on Live Science.