Scientist recovers coronavirus gene sequences secretly deleted last year in Wuhan

The SARS-CoV-2 virus invades human cells by attaching to ACE2 receptors on the surfaces of those cells.

(Image credit: Shutterstock)

Efforts to find the origin story for SARS-CoV-2, the coronavirus responsible for nearly 3.9 million deaths worldwide, have been hampered largely by a lack of access to information from China, where cases first popped up.

Now, a researcher in Seattle has dug up deleted files from Google Cloud that reveal 13 partial genetic sequences for some of the earliest cases of COVID-19 in Wuhan, Carl Zimmer reported for The New York Times.

To determine exactly how and where the virus originated, scientists need to find the so-called progenitor virus, the one from which all other strains descended. Until now, the earliest sequences have been primarily those sampled from cases at the Huanan Seafood Market in Wuhan, which was initially thought to be where the novel coronavirus first emerged at the end of December 2019. However, cases from early December and as far back as November 2019 had no ties to the market, suggesting early in the pandemic that the virus had emerged somewhere else.

There was one nagging issue with those first genetic sequences. Those from cases found at the market include three mutations that are missing in virus samples from cases that popped up weeks later outside of the market. The viruses missing those three mutations matched more closely with the coronaviruses found in horseshoe bats. Scientists are relatively certain that the novel coronavirus somehow emerged from bats, so it's logical to assume the progenitor would also be missing those mutations.

Jesse Bloom, the Seattle researcher, noticed the missing sequences when he came across a spreadsheet in a study published in May 2020 in the journal PeerJ, in which the authors listed 241 genetic sequences of SARS-CoV-2 through the end of March 2020. The sequences were part of a Wuhan University project called PRJNA612766 and were supposedly uploaded to the Sequence Read Archive. When he searched the archive database for the sequences, he got the message "No items found," Bloom wrote in the bioRxiv paper, which has not been peer-reviewed.

"There is no plausible scientific reason for the deletion: the sequences are perfectly concordant with the samples described in Wang et al. (2020a,b)," Bloom wrote in bioRxiv. "There are no corrections to the paper, the paper states human subjects approval was obtained, and the sequencing shows no evidence of plasmid or sample-to-sample contamination. It therefore seems likely the sequences were deleted to obscure their existence."

Jeanna Bryner is managing editor of Scientific American. Previously she was editor in chief of Live Science and, prior to that, an editor at Scholastic's Science World magazine. Bryner has an English degree from Salisbury University, a master's degree in biogeochemistry and environmental sciences from the University of Maryland and a graduate science journalism degree from New York University. She has worked as a biologist in Florida, where she monitored wetlands and did field surveys for endangered species, including the gorgeous Florida Scrub Jay. She also received an ocean sciences journalism fellowship from the Woods Hole Oceanographic Institution. She is a firm believer that science is for everyone and that just about everything can be viewed through the lens of science.