When the notorious "Golden State Killer" — known for a series of rapes and murders in California in the 1970s and 1980s — was caught last April, it spurred a collective sigh of relief. But the way authorities found the killer — using data from a genealogy website — left people with unsettling feelings about the power of genetic testing.
That's because the Golden State Killer was identified through his DNA: police matched it to that of a third cousin who had uploaded genetic data to a genealogy database. Since then, debate has swirled around the ethics of using genealogy websites to aid in forensic investigations.
And now, a new study demonstrates just how wide-reaching these genealogy websites really are. Researchers found that around 60 percent of people in a database of over 1.2 million people could be matched with at least one other person in the database who was a third cousin or an even closer relation.
Indeed, a genetic database needs to cover only 2 percent of a target population to find at least a third-cousin match to nearly any person, they wrote in the study, published yesterday (Oct. 11) in the journal Science.
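To get a feel for why such a small slice of the population can be enough, here is a rough back-of-the-envelope sketch in Python. It is not the model used in the study. It assumes each relative lands in the database independently with probability equal to the database's coverage, and the figure of 200 relatives at the third-cousin level or closer is a hypothetical number chosen purely for illustration.

```python
def match_probability(coverage: float, n_relatives: int) -> float:
    """Chance that at least one of a person's relatives is in the database.

    Simplifying assumption: each relative is in the database independently
    with probability `coverage` (the fraction of the population covered).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    # P(at least one match) = 1 - P(no relative is in the database)
    return 1.0 - (1.0 - coverage) ** n_relatives


if __name__ == "__main__":
    # Hypothetical count of relatives who are third cousins or closer.
    assumed_relatives = 200
    for coverage in (0.001, 0.005, 0.01, 0.02):
        p = match_probability(coverage, assumed_relatives)
        print(f"coverage {coverage:.1%}: match probability {p:.1%}")
```

Under these assumptions, the chance of finding at least one match climbs steeply with coverage, because it only takes a single relative in the database to produce a hit.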
The group analyzed data from 1.28 million anonymous people on a genealogy website called MyHeritage. (The lead study author, Yaniv Erlich, is the website's chief scientific officer.) By comparing what are called identity-by-descent (IBD) segments in people's DNA, the service can locate even distant relatives such as second or third cousins. The greater the amount of IBD shared between two people, the closer their relation is.
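The link between shared IBD and closeness of relation can be illustrated with a short Python sketch. It uses the standard expectation that full k-th cousins share, on average, a fraction 2 × (1/2)^(2k+2) of their DNA: two common ancestors, each connected through 2(k+1) generations. The function names are hypothetical, and the sketch is not the matching method of any genealogy site. Real tools work with the lengths of individual segments, and actual sharing between distant cousins varies widely around these averages.

```python
import math


def expected_shared_fraction(cousin_degree: int) -> float:
    """Average fraction of DNA shared by full cousins of the given degree.

    Two common ancestors (a couple), each linked to the pair of cousins
    by a path of 2 * (degree + 1) parent-child steps.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return 2 * 0.5 ** (2 * (cousin_degree + 1))


def closest_cousin_degree(observed_fraction: float, max_degree: int = 4) -> int:
    """Pick the cousin degree whose expected sharing is nearest on a log scale.

    Illustrative only: it ignores every relationship type other than
    full cousins and ignores the natural spread in real sharing.
    """
    if observed_fraction <= 0:
        raise ValueError("observed_fraction must be positive")
    return min(
        range(1, max_degree + 1),
        key=lambda k: abs(
            math.log2(observed_fraction) - math.log2(expected_shared_fraction(k))
        ),
    )


if __name__ == "__main__":
    for k in range(1, 5):
        print(f"cousin degree {k}: {expected_shared_fraction(k):.4%} shared on average")
```

Each step out in cousin degree cuts the expected sharing by a factor of four, which is why distant relatives are harder to detect than close ones.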
The researchers targeted shared IBD segments that would correspond to second, third or fourth cousins. They found that 60 percent of their searches returned a match — most of them were a third cousin or closer. The researchers then did a similar, but smaller, search on GEDmatch (the database that was used to catch the Golden State Killer) and found that 76 percent of their 30 random searches matched up with a third cousin or closer.
Further, they found that people with Northern European ancestry were easiest to link up. Around 75 percent of the people in the database were from Northern Europe, and they were 30 percent more likely to have a match than individuals with a genetic background from sub-Saharan Africa.
The team found that once those relatives were located, the identity of the anonymous person could be easily figured out through examining family lineages and demographic information, such as the age of the person or where they live. They showed this by discovering the identity of an anonymous woman after finding her distant relatives.
Indeed, between April and August of this year, at least 13 cold cases in the U.S. (including that of the Golden State Killer) were solved by such searches, according to the study. What makes them so powerful is that while forensic database searches — which are tightly regulated — can only find close relatives to the first or second degree, genetic database searches can find more distant ones.
"While policymakers and the general public may be in favor of such enhanced forensic capabilities for solving crimes, it relies on databases and services that are open to everyone," the authors wrote. "Thus, the same technique could also be exploited for harmful purposes, such as re-identification of research subjects from their genetic data."
The researchers propose that policies should be put in place to protect people's genetic data. They also recommend that genealogy sites begin to protect raw genetic data files with a secure digital signature to make it more difficult to access that data.
Originally published on Live Science.
Yasemin is a staff writer at Live Science, covering health, neuroscience and biology. Her work has appeared in Scientific American, Science and the San Jose Mercury News. She has a bachelor's degree in biomedical engineering from the University of Connecticut and a graduate certificate in science communication from the University of California, Santa Cruz.