Coronavirus mutations: what we've learned so far


In early January, the first genome sequence of Sars-CoV-2 — the virus that causes COVID-19 — was released under the moniker "Wuhan-1". This string of 30,000 letters (the A, T, C and Gs of the genetic code) marked day one in the race to understand the genetics of this newly discovered coronavirus. Now, a further 100,000 coronavirus genomes sampled from COVID-19 patients in over 100 countries have joined Wuhan-1. Geneticists around the world are mining the data for answers. Where did Sars-CoV-2 come from? When did it start infecting humans? How is the virus mutating — and does it matter? Sars-CoV-2 genomics, much like the virus itself, went big and went global.

The term mutation tends to conjure up images of dangerous new viruses with enhanced abilities sweeping across the planet. And while mutations constantly emerge and sometimes sweep — early mutations in Sars-CoV-2 have made their way around the world as the virus spread almost unnoticed — mutations are a perfectly natural part of any organism, including viruses. The vast majority have no impact on a virus's ability to transmit or cause disease.

A mutation just means a difference; a letter change in the genome. While the Sars-CoV-2 population was genetically essentially invariant when it jumped into its first human host in late 2019, over 13,000 of these changes are now found in the 100,000 Sars-CoV-2 genomes sequenced to date. Yet any two viruses from any two patients anywhere in the world differ on average by only ten letters. This is a tiny fraction of the total 30,000 characters in the virus's genetic code and means that all Sars-CoV-2 in circulation can be considered part of a single clonal lineage.
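To make the "ten letters out of 30,000" figure concrete, here is a minimal Python sketch of how an average pairwise difference could be computed. The three short sequences are made-up toy data, not real Sars-CoV-2 genomes, and the function names are illustrative only; real analyses work on full aligned genomes with dedicated tools.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_differences(seqs):
    """Average number of differing letters over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Toy, invented sequences (assumption: already aligned).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(f"{mean_pairwise_differences(toy):.2f}")  # 1.33

# Figures quoted in the article: about 10 differences in about 30,000 letters.
GENOME_LENGTH = 30_000
MEAN_DIFFERENCES = 10
print(f"{MEAN_DIFFERENCES / GENOME_LENGTH:.4%}")  # 0.0333%
```

On the article's numbers, two typical viruses differ at roughly 0.03% of positions.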

Slowly mutating

It will take some time for the virus to acquire substantial genetic diversity. Sars-CoV-2 mutates fairly slowly for a virus, with any lineage acquiring a couple of changes every month; two to six-fold lower than the number of mutations acquired by influenza viruses over the same period.
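As a rough back-of-the-envelope check, the rate above can be turned into a per-site figure. This sketch assumes "a couple" means exactly two changes per month and uses the article's rounded genome length of 30,000 letters; both are simplifications, and the influenza range is simply the article's "two to six-fold" applied to that assumption.

```python
GENOME_LENGTH = 30_000   # rounded figure from the article
CHANGES_PER_MONTH = 2    # assumption: "a couple" taken as exactly two

# Changes a single lineage would accumulate over a year.
changes_per_year = CHANGES_PER_MONTH * 12

# Expressed per letter of the genome, per year.
rate_per_site_per_year = changes_per_year / GENOME_LENGTH

# Implied influenza range if Sars-CoV-2 is two to six-fold lower.
flu_low_per_month = CHANGES_PER_MONTH * 2
flu_high_per_month = CHANGES_PER_MONTH * 6

print(changes_per_year)                 # 24
print(f"{rate_per_site_per_year:.1e}")  # 8.0e-04
print(flu_low_per_month, flu_high_per_month)  # 4 12
```

Under these assumptions a lineage changes at fewer than one in a thousand positions per year.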

Still, mutations are the bedrock on which natural selection can act. Most commonly mutations will render a virus non-functional or have no effect whatsoever. Yet the potential for mutations to affect transmissibility of Sars-CoV-2 in its new human hosts exists. As a result, there have been intense efforts to determine which, if any, of the mutations identifiable since the first Sars-CoV-2 genome was sequenced in Wuhan may significantly alter viral function.

An infamous mutation in this context is an amino acid change in the Sars-CoV-2 spike protein, the protein that gives coronaviruses their characteristic crown-like projections and allows them to attach to host cells. This single character change in the viral genome, termed D614G, has been shown to increase virus infectivity in cells grown in the lab, though with no measurable impact on disease severity. This mutation is also almost always found together with three other mutations, and all four are now found in about 80% of sequenced Sars-CoV-2 genomes, making this the most frequent set of mutations in circulation.

The challenge with D614G, as with other mutations, is disentangling whether they have risen in frequency because they happened to be present in viruses responsible for seeding early successful outbreaks, or whether they truly confer an advantage to their carriers. While genomics work on a UK dataset suggests a subtle role of D614G in increasing the growth rate of lineages carrying it, our own work could find no measurable impact on transmission.
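The difficulty described above can be illustrated with a toy simulation. The sketch below is not the method used in any of the studies mentioned; it is a generic Wright-Fisher style resampling model, with population size, number of generations and starting frequency all chosen arbitrarily for illustration. It shows that a mutation with no advantage at all can still drift to high frequency in some runs purely by chance.

```python
import random


def simulate_neutral_frequency(start_freq, population_size, generations, rng):
    """Track the frequency of a neutral mutation under random resampling.

    Each generation, every new infection inherits the mutation with
    probability equal to its current frequency (no fitness advantage).
    """
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        carriers = sum(rng.random() < freq for _ in range(population_size))
        freq = carriers / population_size
        history.append(freq)
    return history


# Hypothetical parameters, chosen only to keep the example small.
rng = random.Random(42)  # fixed seed so the run is repeatable
runs = [simulate_neutral_frequency(0.5, 50, 30, rng) for _ in range(200)]
final_freqs = [history[-1] for history in runs]

# How many of the 200 neutral runs ended at 80% frequency or higher?
rose_above_80 = sum(f >= 0.8 for f in final_freqs)
print(f"{rose_above_80} of {len(runs)} neutral runs ended at >= 80%")
```

High frequency alone therefore cannot show that a mutation confers an advantage, which is why studies compare many independent lineages.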

Simply carried along

D614G is not the only mutation found at high frequency. A string of three mutations in the protein shell of Sars-CoV-2 is also increasingly appearing in sequencing data and is now found in a third of viruses. A single change at position 57 of the Orf3a protein, a known immunogenic region, occurs in a quarter. Other mutations exist in the spike protein, while myriad others seem to be induced by the activity of our own immune response. At the same time, there remains no consensus that these, or any others, are significantly changing virus transmissibility or virulence. Most mutations are simply carried along as Sars-CoV-2 continues to successfully spread.

But replacements are not the only small edits that may affect Sars-CoV-2. Deletions in the Sars-CoV-2 accessory genes Orf7b/Orf8 have been shown to reduce the virulence of Sars-CoV-2, potentially eliciting milder infections in patients. A similar deletion may have behaved in the same way in Sars-CoV-1, the related coronavirus responsible for the Sars outbreak in 2002-04. Progression towards a less virulent Sars-CoV-2 would be welcome news, though deletions in Orf8 have been present from the early days of the pandemic and do not seem to be increasing in frequency.

While adaptive changes may yet occur, all the available data at this stage suggest we're facing the same virus as at the start of the pandemic. Chris Whitty, chief medical officer for England, was right to pour cold water on the idea that the virus has mutated into something milder than the one that caused the UK to impose a lockdown in March. Possible decreases in symptom severity seen over the summer are probably a result of younger people being infected, containment measures (such as social distancing) and improved treatment rather than changes in the virus itself. However, while Sars-CoV-2 has not significantly changed to date, we continue to expand our tools to track and trace its evolution, ready to keep pace.

This article was originally published at The Conversation. The publication contributed the article to Live Science's Expert Voices: Op-Ed & Insights.

Lucy van Dorp
Senior Research Fellow, Microbial Genomics, UCL

Lucy van Dorp is a senior research fellow in microbial genomics working at University College London’s Genetics Institute in the United Kingdom. Lucy’s research aims to contribute to the post-genomic revolution in biology and medicine through the use of computational methods applied to whole genome sequencing data to determine the factors giving rise to the patterns of genetic diversity observed in human-associated pathogens. Lucy received a doctorate from University College London in 2017.

  • Chem721
    Actually the genome of this virus uses the bases A, C, U and G. This is an RNA virus and U (uracil) substitutes for T (thymine) used in DNA-based genomes. (Once the virus is incorporated into the host DNA genome, this U position changes to T.)

    In any event, 13,000 mutations is quite a few for a genome of only 30,000 bases. Even more curious, all of these genomes must have produced viable viruses, one suspects, or they would not have been isolated. There are likely many more of those mutations than 13,000. Only viable mutations survive and propagate, and only 100,000 samples have been sequenced. There are millions of more infections not yet sequenced that would likely increase the number of mutations substantially. We are watching evolution of a virus in "real time".

    It was very good to read that the rate of SARS-CoV-2 mutations is "two to six-fold lower than the number of mutations acquired by influenza viruses over the same period." This suggests that immunity and a vaccine may last longer than previously suggested, as long as the mutation rate stays low. All the more reason to prevent its spread and limit the number of mutations. Internal mutations within a strain of virus that impact prior immunity are known as "antigenic drift" (1).

    While most of these mutations seem "harmless", some are a bit worrying. The D614G mutation in the spike, which has substantial global spread, does not appear to be involved in antibody binding, which means it should not impact a vaccine developed against a form without the mutation.

    Another mutation at position 57 in the spike is more worrisome because it could be involved in what is known as an "antigenic determinant", a region in the spike that could elicit an antibody response. Any mutations in these regions are cause for concern about any present vaccines, and those immune by infection to the previous "strain". It all depends on what impact the mutation has on antibody binding affinity at that site. Unfortunately they do not specify what amino acid was changed.

    It should be noted that a single change in the amino acid of a protein can have a major impact on its behavior. Sickle cell anemia is caused by a change in one amino acid in hemoglobin that leads to its precipitation in red blood cells, causing severe symptoms.

    The best news out of this report is that the rate of mutation is substantially less than that seen with influenza viruses. This provides good reason to believe that acquired immunity, or any obtained by a vaccine, may last for several years, and possibly longer, especially if we can stop the spread of the virus and limit these mutations.

  • Chem721
    Anyone interested in an article on the nature of these mutations and how they could play out may consider reading the one below. It is an examination of mutations of concern published on-line from NIH (J. Lab Physicians).
  • M&I65
    The U (uracil) substitutes for the T (thymine) in RNA, not the C (cytosine). Thus, U and A form a base pair.
  • Chem721
    M&I65 said:
    The U (uracil) substitutes for the T (thymine) in RNA, not the C (cytosine). Thus, U and A form a base pair.

    Quite right, have not thought of the coding between RNA and DNA in quite a while. Thanks for the correction!