Coronavirus
Structural Task Force

Exoribonuclease: Making the most when mistakes are made

The building plan

Storing the building plans for a virus in its genome is much like how we store ideas in language. This may sound strange but, as an example, typos in spelling, grammar, or word usage, can lead to the meaning of a sentence either changing dramatically, remaining virtually unchanged, or becoming complete nonsense. The SARS-CoV-2 genome consists of RNA. Transcription of this RNA runs into a similar problem: errors can lead to the loss of function, a gain of function, or be completely inconsequential to the resulting protein (Figure 1). Large changes may break the virus, but smaller changes may provide an advantage and are essential for evolution.

Figure 1. What can happen when mistakes are made. A. Errors can cause a freeze in transcription. B. Errors can cause a copy to lose meaning, and the error carries over into subsequent copies. C. Errors can be deleted and corrected as information is copied.
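To make the "typo" analogy concrete, here is a minimal sketch in Python. It uses a small excerpt of the standard genetic code (only the handful of codons this toy example needs) and a hypothetical helper, `classify_mutation`, to show how a single-letter change can be inconsequential, change the meaning, or cut the protein short.

```python
# Excerpt of the standard genetic code: only the codons this toy example needs.
CODON_TABLE = {
    "GAU": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
    "UAU": "Tyr", "UAC": "Tyr",
    "UAA": "Stop", "UAG": "Stop",
}


def classify_mutation(original: str, mutated: str) -> str:
    """Classify a single-codon change by its effect on the protein.

    Hypothetical helper for illustration; real analyses look at whole genes.
    """
    before = CODON_TABLE[original]
    after = CODON_TABLE[mutated]
    if before == after:
        return "silent"    # same amino acid: the "sentence" keeps its meaning
    if after == "Stop":
        return "nonsense"  # premature stop: the protein is cut short
    return "missense"      # a different amino acid: the meaning may change


print(classify_mutation("GAU", "GAC"))  # silent
print(classify_mutation("GAU", "GAA"))  # missense
print(classify_mutation("UAU", "UAA"))  # nonsense
```

Whether a missense change is harmful, helpful, or neutral depends on where in the protein it lands, which this sketch does not attempt to model.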

Targeting the copy machine

In a previous article we spoke about the copy machinery of the virus, including the RNA-dependent RNA polymerase (RdRp), and drugs targeting it, such as Remdesivir. The goal of these drugs is to jam the enzyme and halt RNA production - or to cause more errors than are sustainable, with the end result being a less infectious virus. The reason the development of drugs targeting the copy machinery of RNA is worthwhile is that humans don’t have machinery to reproduce RNA from RNA. This means drugs targeting this machinery are less likely to interfere with normal processes in people. What if the virus could quickly repair these errors before the new genome is packed into a hull and kicked out the door? That would make finding a therapeutic much more difficult…

Correctional facilities

Unfortunately, SARS-CoV-2 has a way to repair the mistakes. When errors are introduced in transcription through environmental mutagenesis, or even through mutations caused by nucleotide analogs like Ribavirin [1–3], the non-structural protein 14 (nsp14) has the ability to remove them. This multifunctional protein removes errors with the exoribonuclease (ExoN) activity of its N-terminal domain, while the C-terminal domain has the unrelated function of methylating the end cap of the viral RNA [3,4].

However, this ExoN does not work alone. There is a replication complex made up of proteins performing many roles in the production of new RNA with high fidelity. Nsp12 is the main hub that makes a new RNA chain to complement the template. Nsp7 and nsp8 have a “processivity” role to enable nsp12 to function efficiently. In addition to these proteins there is a two-component proofreading system of Helicase (nsp13) and the ExoN domain of nsp14. Helicase can detect misshapen RNA helices caused by errors made by the copy machinery [5]. It then unwinds these double strands of RNA and feeds the strand containing the error into the ExoN domain of nsp14, where the error is chopped out. Nsp12 can then continue RNA replication where it left off.
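The copy, detect, excise, and resume cycle described above can be sketched as a toy simulation. This is a deliberately simplified model and not the real biochemistry: the function name, the `error_positions` parameter, and the rule for picking a wrong base are all assumptions made for illustration, and strand orientation is ignored.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def copy_with_proofreading(template, error_positions, proofread=True):
    """Copy an RNA template base by base, optionally with proofreading.

    error_positions: hypothetical set of template indices at which the
    polymerase first inserts a wrong base.
    Returns (new_strand, number_of_excisions).
    """
    new_strand = []
    excisions = 0
    for i, base in enumerate(template):
        correct = PAIR[base]
        if i in error_positions:
            # Polymerase slips: insert any base that is not the correct partner.
            wrong = next(b for b in "ACGU" if b != correct)
            new_strand.append(wrong)
            if proofread:
                new_strand.pop()            # ExoN-like step: remove the mismatch
                excisions += 1
                new_strand.append(correct)  # polymerase resumes where it left off
        else:
            new_strand.append(correct)
    return "".join(new_strand), excisions


print(copy_with_proofreading("AUGC", {1}, proofread=False))  # error stays in the copy
print(copy_with_proofreading("AUGC", {1}, proofread=True))   # error is removed
```

Without proofreading the mismatch is carried into the finished copy; with it, the copy matches the error-free complement at the cost of one excision step.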

Exoribonuclease or no exoribonuclease

Figure 2. Presence of Exoribonuclease (ExoN) is associated with large viral genomes. Viral genomes containing an exoribonuclease proofreading gene are highlighted in red. Figure modified from Smith and Denison 2012 [6].

The proofreading ability of Helicase and nsp14 ExoN allows SARS-CoV-2 to have a huge genome compared to other viruses [6] (Figure 2). The large 29.9 kb genome of SARS-CoV-2 requires much more physical space to accommodate the genetic information necessary for reproduction than other RNA viruses, such as Rhinovirus, which has a genome between 7.2 kb and 8.5 kb in size (Figure 3). When no ExoN proofreading is present, genomes cannot expand beyond 20 kb in size [6] (Figure 2). By removing the exoribonuclease activity, it may therefore be possible to cause irreversible damage to the genome of SARS-CoV-2.
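One way to see why proofreading and genome size go together is a back-of-the-envelope calculation: at a fixed error rate per nucleotide, a longer genome collects more errors per copy. The sketch below uses the genome sizes quoted above, but the error rate is a made-up illustrative value, not a measured one, and the function names are hypothetical.

```python
# ASSUMPTION: illustrative per-nucleotide error rate, not a measured value.
ASSUMED_ERROR_RATE = 1e-4

GENOME_SIZES_NT = {
    "Rhinovirus (lower bound)": 7_200,
    "Rhinovirus (upper bound)": 8_500,
    "SARS-CoV-2": 29_900,
}


def expected_errors(genome_length_nt, error_rate_per_nt):
    """Average number of errors in one full copy of the genome."""
    return genome_length_nt * error_rate_per_nt


def prob_error_free_copy(genome_length_nt, error_rate_per_nt):
    """Chance that a copy contains no error at all (independent errors)."""
    return (1.0 - error_rate_per_nt) ** genome_length_nt


for name, length in GENOME_SIZES_NT.items():
    errors = expected_errors(length, ASSUMED_ERROR_RATE)
    clean = prob_error_free_copy(length, ASSUMED_ERROR_RATE)
    print(f"{name}: {errors:.2f} expected errors, {clean:.1%} error-free copies")
```

Under this assumed rate, roughly half of the short genomes would be copied without any error, but only about one in twenty of the 29.9 kb genomes would. The exact numbers depend entirely on the assumed rate; the trend is the point.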

Figure 3. A high detail 3D printed model of SARS-CoV-2 alongside Rhinovirus. Scaled at 1 to 1,000,000 (1 mm represents 1 nm).

Nsp14 Structure

In order to understand how nsp14 can do this, we need to find out its atomic structure; this may also allow us to develop a drug which hinders its function. To date, no structure of nsp14 from SARS-CoV-2 has been solved. However, structures have been solved of nsp14 in complex with another viral protein, nsp10, both from SARS-CoV (PDB entries 5nfy, 5c8s, 5c8t, 5c8u) [2,7]. As the protein sequences are very similar between SARS-CoV and SARS-CoV-2 (nsp14 is 95% identical and nsp10 is 97% identical), it can be assumed that the structure and function of the SARS-CoV-2 proteins are very similar to those of SARS-CoV. The active site of the ExoN domain of nsp14 from SARS-CoV-2 has a DEEDh motif (named for the one-letter codes of the amino acids involved) containing a histidine as well as two aspartates and two glutamates [2,3,7,8].
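A figure such as "95% identical" comes from lining up two protein sequences and counting the positions where the amino acids match. The sketch below shows one common way to compute this for two already-aligned sequences. The sequences are invented toy strings (not real nsp14 fragments), `percent_identity` is a hypothetical helper, and other conventions for the denominator exist.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped; this is one
    of several conventions in use.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gap column: not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Invented toy sequences that differ at a single position.
print(percent_identity("MKTAYIAKQR", "MKTAYIAKQK"))  # 90.0
```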

Figure 4. Structure (PDB ID: 5c8s) of SARS-CoV nsp14 bound to nsp10. The orange domain of nsp14 is responsible for the exoribonuclease activity with the active site residues highlighted in yellow. The green domain has methyltransferase activity. The dark grey region joining the two domains is flexible. The nsp10-interacting region is shown in pink and finally, nsp10 in blue.

Nsp14 interacts with nsp10

The N-terminus of nsp14 interacts with nsp10 (pink and blue, respectively, in Figure 4). The following domain (orange) has been shown to have exoribonuclease activity on double-stranded RNA in a 3’ to 5’ direction [9]. When nsp10 is interacting with nsp14 there is a 35-fold increase in exoribonuclease activity, which is thought to occur due to conformational changes caused by formation of the complex [2,9]. The ExoN domain of nsp14 (orange) is connected to the methyltransferase domain (green) by a flexible hinge (dark grey) [7,10]. This flexible region opens up the methyltransferase active site to allow methylation of the N7 of the 5’ guanosine triphosphate of RNA [10]. There are three zinc finger motifs in nsp14, with two found in the ExoN domain and one in the methyltransferase domain [2,7]. In combination with the two further zinc sites in nsp10, these zinc fingers hold loops of the proteins together and are involved with nucleotide interaction [2,7].

Nsp14 has also been demonstrated to form complexes with the copy machinery (nsp12, nsp7, and nsp8), although this interaction is independent of nsp10 [2,11,12].

Exoribonuclease active site and potential drug development

Figure 5. Active site of Exoribonuclease domain from SARS-CoV (PDB entry 5c8s). A. Electrostatic surface with the negatively charged pocket in red. B. Low energy conformation of multiple overlaid ligands from an in silico screen in the DEEDh active site (taken from Khater S. et al 2020).

Scientists are searching for drugs that could be used to target nsp14 in order to find a cure for COVID-19. The active site of the ExoN domain of nsp14 has five residues that are essential for activity and form a negatively charged pocket (Figure 5A) [7]. Currently researchers are using the nsp14 structure from SARS-CoV to model a SARS-CoV-2 structure which can be used to identify compounds that could bind to the active site (Figure 5). These in silico screens start with existing antiviral drugs, such as the nucleotide analogs Remdesivir and Ribavirin or the protease inhibitor Ritonavir, that are currently used as treatments for other viruses [13–15]. These starting compounds are then modified to achieve better binding to nsp14’s active site in order to block it (Figure 5B).

As the ExoN is essential to support the huge 29.9 kb genome of SARS-CoV-2, targeting nsp14 could lead to an effective treatment for COVID-19. Although drugs that target just nsp14 could be effective at increasing the error rate in RNA production by the virus, a more effective treatment will require inhibition of the RdRp of the copy machinery at the same time!

Available structures

If you would like to look at the currently available structures for nsp14 (currently only available from SARS-CoV), they are available from our database; we provide information on the quality of measurement data and models as well as improved structures. The highest resolution structure of nsp14 is PDB entry 5c8t at 3.2 Å. This has a bound S-Adenosyl methionine ligand as well as zinc atoms present. Alongside this, another structure of nsp14 bound to S-Adenosyl homocysteine and a guanosine-triphosphate-adenosine ligand as well as zinc has been published at 3.33 Å resolution (PDB entry 5c8s). Additionally, two structures with zinc atoms but no ligands are available (PDB entries 5c8u at 3.4 Å and 5nfy at 3.34 Å). Both PDB entries 5c8t and 5nfy have improved structures re-refined by our group.
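For readers who want to work with these entries programmatically, here is a small sketch that stores the resolutions quoted above and picks the entry with the best (numerically lowest) resolution. The download URL pattern points at the public RCSB file server and is an assumption on our part, not something specific to our database; the helper names are hypothetical.

```python
# Resolutions in Å, as quoted in the paragraph above.
NSP14_STRUCTURES = {
    "5c8t": 3.2,
    "5c8s": 3.33,
    "5nfy": 3.34,
    "5c8u": 3.4,
}


def best_resolution(structures):
    """Return the PDB ID with the lowest resolution value (finest detail)."""
    return min(structures, key=structures.get)


def download_url(pdb_id):
    """Build a download link. ASSUMPTION: RCSB file-server URL pattern."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


best = best_resolution(NSP14_STRUCTURES)
print(best, NSP14_STRUCTURES[best], download_url(best))
```

Resolution alone does not tell you which model is most trustworthy, which is why the quality information in our database is worth checking as well.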

Sources

  1. Zuo Y. Exoribonuclease superfamilies: structural analysis and phylogenetic distribution. Nucleic Acids Research. Published online March 1, 2001:1017-1026. doi:10.1093/nar/29.5.1017
  2. Ferron F, Subissi L, Silveira De Morais AT, et al. Structural and molecular basis of mismatch correction and ribavirin excision from coronavirus RNA. Proc Natl Acad Sci USA. Published online December 26, 2017:E162-E171. doi:10.1073/pnas.1718806115
  3. Barnes MH, Spacciapoli P, Li DH, Brown NC. The 3′–5′ exonuclease site of DNA polymerase III from Gram-positive bacteria: definition of a novel motif structure. Gene. Published online January 1995:45-50. doi:10.1016/0378-1119(95)00530-j
  4. Chen Y, Cai H, Pan J, et al. Functional screen reveals SARS coronavirus nonstructural protein nsp14 as a novel cap N7 methyltransferase. Proceedings of the National Academy of Sciences. Published online February 10, 2009:3484-3489. doi:10.1073/pnas.0808790106
  5. Chen J, Malone B, Llewellyn E, et al. Structural basis for helicase-polymerase coupling in the SARS-CoV-2 replication-transcription complex. Published online July 8, 2020. doi:10.1101/2020.07.08.194084
  6. Smith EC, Denison MR. Implications of altered replication fidelity on the evolution and pathogenesis of coronaviruses. Current Opinion in Virology. Published online October 2012:519-524. doi:10.1016/j.coviro.2012.07.005
  7. Ma Y, Wu L, Shaw N, et al. Structural basis and functional analysis of the SARS coronavirus nsp14–nsp10 complex. Proc Natl Acad Sci USA. Published online July 9, 2015:9436-9441. doi:10.1073/pnas.1508686112
  8. Eckerle LD, Becker MM, Halpin RA, et al. Infidelity of SARS-CoV Nsp14-Exonuclease Mutant Virus Replication Is Revealed by Complete Genome Sequencing. Emerman M, ed. PLoS Pathog. Published online May 6, 2010:e1000896. doi:10.1371/journal.ppat.1000896
  9. Bouvet M, Imbert I, Subissi L, Gluais L, Canard B, Decroly E. RNA 3’-end mismatch excision by the severe acute respiratory syndrome coronavirus nonstructural protein nsp10/nsp14 exoribonuclease complex. Proceedings of the National Academy of Sciences. Published online May 25, 2012:9372-9377. doi:10.1073/pnas.1201130109
  10. Ogando NS, Ferron F, Decroly E, Canard B, Posthuma CC, Snijder EJ. The Curious Case of the Nidovirus Exoribonuclease: Its Role in RNA Synthesis and Replication Fidelity. Front Microbiol. Published online August 7, 2019. doi:10.3389/fmicb.2019.01813
  11. Subissi L, Posthuma CC, Collet A, et al. One severe acute respiratory syndrome coronavirus protein complex integrates processive RNA polymerase and exonuclease activities. Proc Natl Acad Sci USA. Published online September 2, 2014:E3900-E3909. doi:10.1073/pnas.1323705111
  12. Subissi L, Imbert I, Ferron F, et al. SARS-CoV ORF1b-encoded nonstructural proteins 12–16: Replicative enzymes as antiviral targets. Antiviral Research. Published online January 2014:122-130. doi:10.1016/j.antiviral.2013.11.006
  13. Khater S, Dasgupta N, Das G. Combining SARS-CoV-2 proofreading exonuclease and RNA-dependent RNA polymerase inhibitors as a strategy to combat COVID-19: a high-throughput in silico screen. Published online June 24, 2020. doi:10.31219/osf.io/7x5ek
  14. Shannon A, Le NT-T, Selisko B, et al. Remdesivir and SARS-CoV-2: Structural requirements at both nsp12 RdRp and nsp14 Exonuclease active-sites. Antiviral Research. Published online June 2020:104793. doi:10.1016/j.antiviral.2020.104793
  15. Narayanan N, Nair DT. Ritonavir May Inhibit Exoribonuclease Activity of Nsp14 from the SARS-CoV-2 Virus and Potentiate the Activity of Chain Terminating Drugs. chemrxiv.org. Published May 13, 2020. https://chemrxiv.org/articles/Ritonavir_May_Inhibit_Exoribonuclease_Activity_of_Nsp14_from_the_SARS-CoV-2_Virus_and_Potentiate_the_Activity_of_Chain_Terminating_Drugs/12280043

Introduction

Have you heard that the coronavirus “mutates”? Or that there are “several strains” of it around the world? Sounds scary, right? However, the reality is that everything “mutates”. All organisms, over time, acquire differences in their genes, from bacteria to humans. You might be aware that this can happen when your DNA (Deoxyribonucleic Acid) is exposed to UV light (like from the sun!), but this can also happen during DNA replication. This is when a cell uses the template of one of the two DNA strands to make a new complementary copy of the other strand. Mutation is common to all living organisms (and viruses) and a driver of evolution. This is the first post in a series that will explore coronavirus replication with a focus on the proteins involved.

How does the coronavirus make more of itself?

SARS-CoV-2 uses single-strand Ribonucleic acid (RNA) to encode its genome, not DNA, and hence belongs to a class of “single-strand RNA viruses”. For this reason, the virus needs a different way to copy its genome than “normal” cells have. The viral protein that copies the RNA is called an “RNA-dependent RNA polymerase” (RdRp). This protein uses the viral RNA as a template to make a new copy of viral RNA, by stringing single ribonucleotides together like beads on a string. This process is called polymerization.
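The "beads on a string" picture can be written down in a few lines of Python. This is a minimal sketch of template-directed copying using the standard RNA base-pairing rules (A with U, G with C); the function name `polymerize` is hypothetical, and we assume the template is given in the order in which the polymerase reads it.

```python
# Standard RNA base pairing: A pairs with U, G pairs with C.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def polymerize(template):
    """Build the complementary RNA strand one nucleotide at a time.

    ASSUMPTION: the template is listed in the order the polymerase reads it,
    so the new strand comes out in the order it is synthesized.
    """
    new_strand = ""
    for base in template:
        new_strand += PAIR[base]  # add the matching "bead" to the string
    return new_strand


print(polymerize("UACG"))  # AUGC
```

Copying the copy gives back the original sequence, which is how the virus ends up with new genomes rather than just their mirror images.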

A study by the Morse lab at Texas A&M University showed that SARS-CoV-2 RNA polymerase has a remarkable similarity to the RNA polymerase of SARS-CoV (>95%) as well as MERS-CoV [1], the virus which causes Middle East Respiratory Syndrome. This means that research performed in response to the SARS and MERS epidemics can inform our response to SARS-CoV-2. Unfortunately, a lack of consistent pandemic-preparedness funding means that we didn’t learn as much about RdRp in time as we could have. Still, RNA polymerase might be a viable drug target for halting the spread and reducing the fatality rate of COVID-19.

Structure of the RNA-Dependent RNA Polymerase

By determining the structure of RdRp, and deeply understanding how it works, we can optimize a drug to specifically target it and hinder its function. To this end, in the last few months, several structures of SARS-CoV-2 RNA polymerase have been published. 

One interesting structure shows RNA polymerase in action, in the process of elongating an RNA strand (see Figure 1).[2] This structure clearly shows the polymerase in complex with smaller proteins, non-structural proteins 7 and 8 (nsp7 and nsp8). These proteins improve how well the RNA polymerase binds the template RNA and also how long it stays bound before dissociating – a feature called “processivity”.[3]

Figure 1. Front and back views of the structure of elongating RdRp with RNA and two cofactors, nsp7 and nsp8 (PDB ID: 6yyt). Two copies of nsp8 (grey) form sliding poles that help stabilize the RNA (orange ball-and-stick model). One copy of nsp8 binds to the polymerase (blue) directly, but the other copy uses nsp7 (pink) to anchor to a second position on the polymerase.

In the center of the protein is the area where the main action happens, called the “active site”. The amino acids of the polymerase that form the active site have a particular shape and chemical properties, which enable the polymerization reaction to occur very rapidly. In fact, the polymerase can string together as many as 100 nucleotides per second! [3] New nucleotides can enter the active site through a little window to be added to the growing RNA chain. It is here that the antiviral drugs make their move!
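To get a feeling for what 100 nucleotides per second means, here is a quick calculation. It combines that top speed with the roughly 29.9 kb SARS-CoV-2 genome size and assumes, unrealistically, that the polymerase runs at full speed without ever pausing or falling off, so the result is a lower bound rather than a measured copy time.

```python
GENOME_LENGTH_NT = 29_900   # approximate SARS-CoV-2 genome size (29.9 kb)
MAX_RATE_NT_PER_S = 100     # upper end of the polymerase speed quoted above


def min_copy_time_seconds(length_nt, rate_nt_per_s):
    """Shortest possible copy time, assuming constant top speed and no pauses."""
    return length_nt / rate_nt_per_s


seconds = min_copy_time_seconds(GENOME_LENGTH_NT, MAX_RATE_NT_PER_S)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} min")  # 299 s, about 5.0 min
```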

Figure 2. The third view shows the window into the active site through which new nucleotides must enter!

How do antiviral drugs attack RNA-dependent RNA polymerase?

First, let’s talk about Gilead’s FDA-approved drug, Remdesivir, which has taken the spotlight in the search for COVID-19 cures. Remdesivir (which has a fancy chemistry ID, GS-5734, and is sold under the brand name Veklury), is a “nucleotide analog”, which means that it mimics the shape and chemistry of the nucleotides that make up RNA and DNA (see figure). 

Remdesivir was developed originally as a general antiviral drug and was later shown to protect cells (in a test tube) and monkeys (not in a test tube) from the Ebola Virus [4]. However, this was recent enough, and science is slow enough that, until the COVID-19 pandemic, large-scale clinical trials of Remdesivir hadn’t been done yet. So scientists and doctors have been rushing to test the drug in COVID-19 patients. In fact, the US and Japan both approved the drug for “Emergency Use Authorization” for severe COVID-19 patients as early as May [5], [6]. And, in July, the European Medicines Agency gave Remdesivir a “conditional marketing authorization” (used for drugs that meet an unmet medical need but have insufficient data for normal approval). This allows the use of Remdesivir in severe COVID-19 patients through the next year [7]. So, how the heck does a drug for Ebola, Influenza, or some other viruses also work against COVID-19? I was concerned by this when the news about all the drug trials was coming out – and I’m sure I wasn’t the only one...

The simple answer to that is all these viruses need to do the same thing - copy their RNA genome from an RNA template. And in order to do that, they all end up using basically the same tool, an RNA-Dependent RNA polymerase. And all drugs that are nucleotide analogs use the very same trick: they dress up like ribonucleotides (the "beads on a string" from before) and fool the RNA polymerase into letting them into the active site. Once inside, they get “stuck” in the active site, jamming the polymerase machine. Since this trick should work for any viral RNA polymerase, we can use these drugs for any RNA virus, and call them ‘general antivirals’. Of course, in practice, this doesn't always work, because there are differences between the different RNA polymerases. However, it is a great place to start! In the future, if we have general antivirals for SARS-CoV-2 all ready-to-go, we may be better equipped to deal with another coronavirus outbreak!

Figure 3. We all see what we want to see, I guess.

The Chemistry of Remdesivir

Remdesivir resembles the nucleotide adenine in structure, although it has some fancy chemical add-ons which help make it a better drug (thank you, medicinal chemistry!). When Remdesivir is injected into a vein, it travels through the bloodstream and enters into our cells, which recognize it as a foreign substance and try to digest it. However, what ends up happening is that the cells remove just the fancy chemical add-ons, and then confuse it for a normal adenine nucleotide. In infected cells, the viral RNA-dependent RNA polymerase then starts grabbing these molecules and inserting them into the new viral RNA strand in place of adenine molecules. Remdesivir, now attached to the RNA, jams the polymerase, rendering the virus unable to make more copies of its genome. Ultimately, this halts viral replication and helps the patient fight off the virus.
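The "jamming" idea can be illustrated with a toy model. In the sketch below the polymerase copies a template until it needs an adenine; if the analog is around, it grabs the analog instead (written as `R`) and then stalls. This is a strong simplification made for illustration only: in a real cell the analog competes with normal ATP, and the details of when and how the polymerase stalls are not modelled here. The function name and the `R` symbol are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def copy_with_analog(template, analog_available):
    """Copy a template; 'R' marks an incorporated adenine look-alike.

    SIMPLIFICATION: whenever an A is needed and the analog is present, the
    analog is always chosen and copying stops right after it is added.
    """
    new_strand = []
    for base in template:
        partner = PAIR[base]
        if partner == "A" and analog_available:
            new_strand.append("R")  # analog slips in where adenine belongs
            break                   # polymerase is jammed: no further copying
        new_strand.append(partner)
    return "".join(new_strand)


print(copy_with_analog("GCUAG", analog_available=False))  # CGAUC
print(copy_with_analog("GCUAG", analog_available=True))   # CGR
```

The second call produces a truncated strand, which is the outcome the drug is aiming for: an unfinished, unusable copy of the viral genome.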

Figure 4. (A) The red part of Remdesivir makes it a better drug by helping it get from the blood stream into human cells, but it isn’t necessary for jamming the polymerase. It was designed on purpose so that when it gets inside human cells, the cells try to digest it. When they do, they cleave off the red bits, causing it to get confused for an adenine nucleotide. (B) This causes the cell to add two more phosphates to the molecule, making it the ‘tri’-phosphate form. This is the active form of the molecule, which mimics ATP (C), and is incorporated into the growing RNA chain in the place of ATP. The extra bit sticking off the side (in blue) is called a 1’-cyano group, and makes the RNA get stuck inside the polymerase, jamming it.

Figure 5. Structure of Remdesivir (cyan) in the active site of RNA-dependent RNA polymerase. The window through which new nucleotides enter is to the bottom left of the image. The RNA (orange ball-and-stick model) template strand enters from the bottom right. Remdesivir makes base-pair hydrogen bonds with the opposite uracil base.

Another drug that inhibits the RNA polymerase activity is Favipiravir, sold under the brand names Avigan, Abigan, and FabiFlu. Favipiravir was discovered by Toyama Chemical Co., Ltd. in Japan and has a similar mechanism to Remdesivir, except that it mimics a guanosine nucleoside instead of an adenine nucleotide [8]. This drug was approved in Japan back in 2014 for use in resistant cases of Influenza A and B, but still remains unapproved in the US (still in Phase II and Phase III clinical trials) and the UK [9]. This drug is also being tested for use against Ebola virus, Lassa virus, and currently SARS-CoV-2 in 43 countries. The approval of Favipiravir for COVID-19 has been much faster in China (Mar 15, 2020), Russia (Jun 3, 2020), and India (Jun 20, 2020) [10], [11]. Other countries, including Japan, are in various stages of clinical trials, and the results are anticipated to be out by the end of July [10].

So...do we have a cure for SARS-CoV-2?

Sadly, not yet. While the speed at which Remdesivir has gone through clinical trials is unprecedented, more work needs to be done to make sure it is safe and effective. Since (in the big scope of things) not a lot of people have taken Remdesivir, we aren’t really sure what all the side effects are, although there is emerging evidence for liver and kidney damage [12, 13]. The most common side effects are nausea (10% and 9% of patients), indigestion (7%) and increase of transaminases (6% and 8%). In one study, 3.6% of patients in a 10-day trial needed to stop taking therapy due to the latter. However, serious viral infections can also cause liver damage, so separating the two causes is a challenge! Remdesivir is not a cure-all, either. In one study it improved the recovery time from 15 days to 11 days, but it showed no effect for patients with mild to moderate disease, and no difference in median recovery time for patients who were already on a ventilator [14]. Since the drug has to be given by infusion over several days, there is a pretty small window in which Remdesivir can actually help.

Likewise, Favipiravir has its own side effects such as liver damage, elevated uric acid levels, kidney damage, skin allergies, etc. [15]. These effects restrict its use in patients with severe diabetes or heart disease. On top of that, it is not suitable for pregnant women because it can potentially cause fetal deaths and deformities. It has been shown that Favipiravir works only during the earlier stages of SARS-CoV-2 infection when the body’s immune system isn’t totally drained, whereas it can result in a cytokine storm (when your immune system really freaks out) in severely ill patients. But, unfortunately, the virus doesn’t differentiate between humans while attacking, so a universal drug for COVID-19 has to be safe for use by all people.

However, these drugs are better than nothing, and by understanding the mechanisms involved, scientists can continue to improve upon the existing drugs for the benefit of all. While most of the ‘general antivirals’ that target RNA Polymerase have failed with SARS-CoV-2, Remdesivir has been relatively successful. Scientists think that this is actually because of a proofreading protein in SARS-CoV-2 called exonuclease. Immediately after the RNA polymerase makes new RNA, exonuclease checks to make sure the new RNA is correct. In one study, another drug that mimics RNA, called Ribavirin, was shown to be removed from newly synthesized RNA by exonuclease [16]. Thankfully, Remdesivir is not excised, which is likely why it has been more successful than the other options [17], [18]. To read more about how nsp14 maintains the integrity and virulence of SARS-CoV-2, tune in to a future blog entry!

Figure 6. Hey, we've all been there.

Recommended Structures

For those interested in reviewing the structures further, they are available in our GitHub repo, along with information about validation and, where relevant, improved structures. For a high-resolution comparison of the active site with and without Remdesivir, 7BV2 and 7BV1 (respectively) were published together at 2.5 and 2.8 Å. The elongating structure of the complex shown above (6YYT) has the polymerase as well as the cofactors and RNA very well resolved, with little "missing" density and a resolution of 2.9 Å. It is likely preferable to 6M71 and 7BTF, which were published with a similar resolution but with less of the complex resolved, and no RNA. For those interested, 7C2K and 7BZF (at 2.93 Å and 3.26 Å) show the complex bound to RNA in a pre- and post-translocation state.

Sources

[1] J. S. Morse, T. Lalonde, S. Xu, and W. R. Liu, “Learning from the Past: Possible Urgent Prevention and Treatment Options for Severe Acute Respiratory Infections Caused by 2019-nCoV,” ChemBioChem, vol. 21, no. 5, pp. 730–738, Mar. 2020, doi: 10.1002/cbic.202000047.

[2] H. S. Hillen, G. Kokic, L. Farnung, C. Dienemann, D. Tegunov, and P. Cramer, “Structure of replicating SARS-CoV-2 polymerase,” Nature, May 2020, doi: 10.1038/s41586-020-2368-8.

[3] W. Yin et al., “Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir,” Science, p. eabc1560, May 2020, doi: 10.1126/science.abc1560.

[4] R. T. Eastman et al., “Remdesivir: A Review of Its Discovery and Development Leading to Emergency Use Authorization for Treatment of COVID-19,” ACS Cent. Sci., May 2020, doi: 10.1021/acscentsci.0c00489.

[5] O. of the Commissioner, “Coronavirus (COVID-19) Update: FDA Issues Emergency Use Authorization for Potential COVID-19 Treatment,” FDA, May 04, 2020. https://www.fda.gov/news-events/press-announcements/coronavirus-covid-19-update-fda-issues-emergency-use-authorization-potential-covid-19-treatment (accessed Jul. 08, 2020).

[6] A. Sternlicht, “Japan Approves Remdesivir For Use On Severe COVID-19 Patients,” Forbes. https://www.forbes.com/sites/alexandrasternlicht/2020/05/07/japan-approves-remdesivir-for-use-on-severe-covid-19-patients/ (accessed Jul. 08, 2020).

[7] D. CZARSKA-THORLEY, “First COVID-19 treatment recommended for EU authorisation,” European Medicines Agency, Jun. 25, 2020. https://www.ema.europa.eu/en/news/first-covid-19-treatment-recommended-eu-authorisation (accessed Jul. 10, 2020).

[8] E. De Clercq, “New Nucleoside Analogues for the Treatment of Hemorrhagic Fever Virus Infections,” Chem. Asian J., vol. 14, no. 22, pp. 3962–3968, Nov. 2019, doi: 10.1002/asia.201900841.

[9] K. Shiraki and T. Daikoku, “Favipiravir, an anti-influenza drug against life-threatening RNA virus infections,” Pharmacol. Ther., vol. 209, p. 107512, May 2020, doi: 10.1016/j.pharmthera.2020.107512.

[10] T. Hornyak, “Japan sending Fujifilm’s flu drug favipiravir to over 40 countries for Covid-19 trials,” CNBC, May 04, 2020. https://www.cnbc.com/2020/05/04/fujifilms-flu-drug-favipiravir-sent-to-43-nations-for-covid-19-trials.html (accessed Jul. 14, 2020).

[11] G. P. Ltd, “Glenmark Becomes the First Pharmaceutical Company in India to Receive Regulatory Approval for Oral Antiviral Favipiravir, for the Treatment of Mild to Moderate COVID-19.” https://www.prnewswire.com/in/news-releases/glenmark-becomes-the-first-pharmaceutical-company-in-india-to-receive-regulatory-approval-for-oral-antiviral-favipiravir-for-the-treatment-of-mild-to-moderate-covid-19-855346546.html (accessed Jul. 14, 2020).

[12] Goldman, J. D. et al. Remdesivir for 5 or 10 Days in Patients with Severe Covid-19. N. Engl. J. Med. (2020) doi:10.1056/NEJMoa2015301

[13] Remdesivir Safety Forecast: Watch the Liver, Kidneys | MedPage Today. https://www.medpagetoday.com/infectiousdisease/covid19/86582

[14] J. H. Beigel et al., “Remdesivir for the Treatment of Covid-19 — Preliminary Report,” N. Engl. J. Med., vol. 0, no. 0, p. null, May 2020, doi: 10.1056/NEJMoa2007764.

[15] Sandhya Ramesh, “Favipiravir, Japanese drug that’s the new Covid treatment hope your chemist will soon stock,” ThePrint, Jun. 25, 2020. https://theprint.in/health/favipiravir-japanese-drug-thats-the-new-covid-treatment-hope-your-chemist-will-soon-stock/447987/ (accessed Jul. 14, 2020).

[16] F. Ferron et al., “Structural and molecular basis of mismatch correction and ribavirin excision from coronavirus RNA,” Proc. Natl. Acad. Sci., vol. 115, no. 2, pp. E162–E171, Jan. 2018, doi: 10.1073/pnas.1718806115.

[17] C. J. Gordon, E. P. Tchesnokov, J. Y. Feng, D. P. Porter, and M. Gotte, “The antiviral compound remdesivir potently inhibits RNA-dependent RNA polymerase from Middle East respiratory syndrome coronavirus,” J. Biol. Chem., Feb. 2020, doi: 10.1074/jbc.AC120.013056.

[18] L. Zhang et al., “Role of 1’-Ribose Cyano Substitution for Remdesivir to Effectively Inhibit both Nucleotide Addition and Proofreading in SARS-CoV-2 Viral RNA Replication,” bioRxiv, p. 2020.04.27.063859, Apr. 2020, doi: 10.1101/2020.04.27.063859.

The coronavirus cannot be seen with the naked eye; it is invisible. That is a huge problem. Imagine if your house were on fire: you would react immediately, leave the house, call the fire brigade and warn the neighbours. The threat would be clearly visible. This is, however, not true for the coronavirus. SARS-CoV-2 cannot be seen or touched. The time between infection and tangible illness is a number of days, with an additional 16 days until the worst-case scenario, death (2). This makes the threat much harder to recognize. Imagine a catastrophe killing 32,362 people in New York City! Yet that is the number of deaths caused by COVID-19 in NY in recent months, or 1 in 250 people (3). The invisibility of the threat makes it hard for people to wear masks and keep their distance from each other. It is very difficult to address a danger you can’t see, and even harder to believe in something you don’t understand.
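The "1 in 250" figure is simple arithmetic on the death count and population given in reference (3), and it can be checked in a few lines. The function name below is a hypothetical helper for this illustration.

```python
DEATHS = 32_362          # COVID-19 deaths in New York City, from reference (3)
POPULATION = 8_400_000   # inhabitants of New York City, from reference (3)


def death_share(deaths, population):
    """Return (percent of population, 'one in N' figure)."""
    percent = 100 * deaths / population
    one_in = population / deaths
    return percent, one_in


percent, one_in = death_share(DEATHS, POPULATION)
print(f"{percent:.2f} % of inhabitants, or about 1 in {one_in:.0f}")
```

The exact ratio comes out slightly above 250, so "1 in 250" is a rounded way of stating it.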

This is why we want people to see the virus, or even better, to touch it. Make it tangible. And this is why we have designed a scientifically accurate model for 3D print. And the best part: As we are funded by taxpayer money, we have made the files available online for free! (If you use them, please acknowledge the Coronavirus Structural Task Force.)

3D printed corona virus model
The printed and painted corona virus model and an antibody in the scale 1:1,000,000. Photo by Judith Flurer / RVZ.

Now that is a bit different from what you have seen in the media, isn’t it? Why? Well, first of all, any kind of spikey ball passes as a corona virus these days. Our virus also differs from the illustration by medical illustrators Alissa Eckert and Dan Higgins from the CDC (4). These are the main differences between their red-and-grey depiction and ours:

  • The virus is not exactly spherical, and rather wobbly; which can be seen clearly in our 3D print.
  • The numbers of envelope proteins (orange), membrane proteins (yellow) and spikes (green) are in accordance with the latest findings (2).
  • The spikes are longer, as we now know much more about their structure, and the virus hull is smaller in proportion to them. However, the exact size of the virus varies. The ratio we show is an average.
  • The spikes are glycosylated, making them more irregular (and slimier). The glycosylation is shown in grey. However, it would actually look like this, and could be represented by cotton wool stuck to the spike protein.
  • We are showing the E protein in the pentameric “pore” conformation. Whether that is a correct assumption for the virion remains to be seen. If you want to know more about this, look here.
  • As the virus lives embedded in slimy, wet conditions, we chose colours to represent that instead of the red and grey which the CDC used to represent the threat the virus poses (1).
Merge of two virus representations
Comparison between the Taskforce depiction (by Thomas Splettstößer / SciStyle.com) and the common depiction in the media (CDC).

We have also added a rhinovirus and an antibody in the same 1:1,000,000 scale. The antibody binds specifically to the spike; this is how the immune system can recognize the virus. The rhinovirus shows you how large SARS-CoV-2 is. (Spoiler: it is very big for a virus!)

One should also say that the "virus" shown here is actually a virion, the transport form of a virus, which can be spread. Once a host cell is infected, the virus rolls out its RNA genome and makes a whole lot more proteins to hijack the host cell and make more copies of itself. These proteins are also part of the virus, and a main subject of our research.

If you cannot wait to print your own: You can find the files here and the instructions on how to print the virus here!

References:
(1) Rothan Hussin A., and Byrareddy Siddappa N. "The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak." Journal of autoimmunity (2020): 102433. pmid:32113704
(2) https://wwwnc.cdc.gov/eid/article/26/7/20-0282_article
(3) 32,362 deaths by 17th of July 2020, as reported by the New York Times, among 8.4 million inhabitants (US Census Bureau) gives 0.39%, roughly equalling 1 in 250.
(4) https://www.nytimes.com/2020/04/01/health/coronavirus-illustration-cdc.html

The instructions and files below will allow you to create your own model of the virus! All you need is some spare time and a 3D printer. In addition, those without access to a 3D printer can still use the STL files to request printing from external services and then follow the instructions on painting and assembling the same way. We do hope that this model will make the virus more tangible, and that the model will not only be printed as a private project, but also be used for outreach activities and in educational institutions.

Our design is based on the best scientific evidence available. Not only are the shapes of the various proteins as true as we can make them, but their numbers as well as the overall size of the virion match experimental results on a scale of 1:1,000,000. If you want to know more about it, please look here. Once you have built a model from our design you will have a good representation of what one of these virions is expected to look like, after being scaled up by a factor of 1,000,000. Therefore 1 mm on the model represents 1 nm (10 Å). (By the way, this would make the RNA that is inside the virus hull 10 metres long and 1 mm thick, and the nucleocapsid around which the RNA is coiled would be about 1 metre long and 1 cm in diameter.)
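The scale arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the genome size of about 30,000 nucleotides and the length of about 0.34 nm per nucleotide are our own round-number assumptions for illustration, not values taken from the model files.

```python
# Scale of the printed model, as stated in the text: 1 : 1,000,000
SCALE = 1_000_000
NM_PER_MM = 1_000_000  # 1 mm = 1,000,000 nm


def model_length_mm(real_length_nm: float) -> float:
    """Convert a real length in nanometres to its length on the model in millimetres."""
    return real_length_nm * SCALE / NM_PER_MM


# Assumptions for illustration only (not from the model files):
GENOME_NUCLEOTIDES = 30_000      # approximate size of the SARS-CoV-2 RNA genome
LENGTH_PER_NUCLEOTIDE_NM = 0.34  # rough length contributed by each nucleotide

rna_real_nm = GENOME_NUCLEOTIDES * LENGTH_PER_NUCLEOTIDE_NM
rna_model_m = model_length_mm(rna_real_nm) / 1000  # mm -> m

print(f"1 nm in reality is {model_length_mm(1):.0f} mm on the model")
print(f"Stretched-out RNA on the model: about {rna_model_m:.0f} m")
```

Under these assumptions the stretched-out genome comes to roughly 10 metres on the model, in line with the figure quoted above.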

We have also designed a scale model of the human antibody that binds to the spike protein. This is available alongside the virus model and can be attached to the spike protein as desired. For easier printing, painting, and assembly, the virus structure has been broken down into 4 unique components.

To date the structures have been printed successfully on several Fused Deposition Modelling (FDM) printers (Rostock MAX v2 and Prusa i3 MK3), and we anticipate that even higher-quality structures will be feasible with alternative methods, such as stereolithography (watch this space). If you try one, let us know in the comments! Each of the parts is available in STL format and should be printable through any suitable slicer software. Personal discretion is advised when setting up the prints, as the exact details may differ depending on conditions and equipment. The procedure outlined below will serve as a good starting point.

Printing of the component parts

The first step is to print the individual components. For the virion parts this is very straightforward, as the flat surface negates the need for supports. The virion objects can be printed with the minimum infill for support, though an infill of 10% is recommended for rigidity.

The other parts (spike proteins and antibodies) are more challenging to print. The spike protein must be printed 95 times to complete the model, and users can arrange these individually or use 4 prints of the 25x STL file. It is recommended that the spike protein is printed with the crown facing towards the print bed, to maximize the contact with the bed and eliminate the need to remove supports from the thin, delicate stem.
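As a quick sanity check on the numbers above, the following sketch works out how many plates of 25 spikes are needed for 95 spikes and how many spares that leaves. The function name is ours and purely illustrative.

```python
import math

SPIKES_NEEDED = 95     # spikes on the finished model, as stated in the text
SPIKES_PER_PLATE = 25  # spikes in one print of the 25x STL file


def plates_needed(total: int, per_plate: int) -> int:
    """Number of full plates that must be printed to get at least `total` parts."""
    return math.ceil(total / per_plate)


plates = plates_needed(SPIKES_NEEDED, SPIKES_PER_PLATE)
spares = plates * SPIKES_PER_PLATE - SPIKES_NEEDED

print(f"{plates} plates, leaving {spares} spare spikes")
```

The handful of spares is useful in practice, since a few stems tend to snap during support removal or sanding.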

A dual extruder printer would be ideal for spike printing as it would allow supports to be printed in a water soluble plastic, speeding up post-processing. In either case, printing individual or at least fewer spikes with greater spacing generally produces nicer objects that are easier to work with at the price of longer printing time. Indeed, there is a general trade-off between the convenience of the print set-up and the amount of post-processing and tidying needed for all 3D printing tasks, and one must find a compromise which satisfies them.

As stated above, we used FDM printing and ubiquitous poly-lactic acid (PLA), which made the post-processing easier.

Post-processing

Regardless of the approach taken for printing, some amount of tidying will typically be needed to get the objects ready for assembly. Removing the supports can be done with a pair of pliers, while the smaller artifacts and issues will need brushing off or sanding. A dental pick can be quite useful.

Fig. 1. Virion and Spike object surfaces after printing with layer lines and artifacts, such as plastic webbing, clearly visible across surfaces. On the right: Virion after fusing top and bottom and rubbing surfaces with ethyl acetate. Pictures by Ferdinand Kirsten, Matt Reeves.

For PLA, we found that the best way to clean and smooth the surfaces (after support removal) is ethyl acetate. Ethyl acetate dissolves the plastic, breaking down the small extrusion artifacts on the surfaces. It can be used in several ways. We found it best to leave the parts in a sealed ethyl acetate vapour environment, such as a stainless steel pot, which should be cleaned carefully afterwards. This technique gives the most even and clean results, though it can take up to a few days to fully smooth each object. The faster method is to simply submerge the small objects in ethyl acetate for 10-30 seconds, then remove them and leave them to dry on a surface. For the larger virion parts, the surface can be smoothed by rubbing it down with a cloth dampened with ethyl acetate. Ethyl acetate was also used to “weld” the two virion parts together. A small amount was dropped onto the flat surface of each section, before the two were pressed together until the plastic fused to become a single object. The seam was then smoothed down using the same process as before. Where one cannot get ethyl acetate from a lab or pharmacy, acetone-free nail-polish remover offers a commercially accessible alternative. You should wear safety glasses and suitable (!) gloves when handling ethyl acetate, ventilate the room well and, if there was skin contact, use a skin cream after washing your hands.

Fig. 2 Spike proteins fresh from printing (left) and after treating with ethyl acetate (right). Picture by Ferdinand Kirsten.

It is worth noting that for the other common 3D printing material, acrylonitrile butadiene styrene (ABS), acetone may produce the same results.

Painting and Gluing

Fig. 3 Computer rendered image of corona virus by Thomas Splettstoesser (left), and finished 3D print by Thorn Lab (right).

As with printing, painting methods and colours are down to personal preference. Here we outline our attempt, which followed the illustration by Thomas Splettstoesser as closely as possible (see Fig. 3).

The parts were first treated with a primer to help the paint stick to the model. This also acts as a nice even basecoat. When working with either primer or, as discussed later, an airbrush, one should consider safety: try to do as much as you can in a ventilated space, wearing safety goggles, gloves and a mask. Paint spraying produces a great number of fine particles which you don’t want to breathe in.

For us, the painting process was performed largely with an airbrush, and we highly recommend using one where available, due to the amount of painting required and the complexity of the surface. Where one is not available, the painting can of course be done with a simple brush, though this will take more time and requires a higher skill level.

All layer colours, medium thinner, base colours, primer and varnish we used were from Citadel painting. Here is an outline of the specific Citadel colours and materials we used for the model in the figures:

  • Lime: “Moot Green”
  • Yellow: “Yriel Yellow”
  • Grey: “Dawnstone”
  • Wheat: “Baneblade Brown”
  • Chocolate: “Doombull Brown”
  • Aqua: “Gauss Blaster Green”
  • Teal: “Kabalite Green”
Fig. 4 Sorting the spike proteins (top left). Spike protein after basecoat (left) and spike protein after highlight with lime green (right). Pictures by Kristopher Nolte.

The spikes were sorted into four sets in order to produce a graded lighting effect, with those on top brighter than those lower down. If you do not plan to use a base and do not have a fixed top and bottom, you can skip this part.

We highlighted each Spike Protein with a brighter lime green to achieve more contrast to create depth, which makes the surface topology easier to distinguish. Finally, the highlighting of each spike was intensified by dry-brushing the protein with the “Aqua” colour.

Fig 5. Virion sphere with a zenithal highlight (top right). Virion with features painted by brush (bottom right). Final version on the left. Pictures by Kristopher Nolte.

After painting was complete the spikes and virion were sealed with gloss varnish and matte finish, respectively. This step is optional; however, the varnish protects the paints against damage and wear when being handled.

Finally, the 3D model was assembled. If highlighting was used in the painting step, one should ensure the spikes are placed so that brighter spikes go on top while darker ones go at the bottom. Standard modeling glue was used to hold the spikes in place, though superglue or ethyl acetate would also work fine. Because we are planning on mounting this on a stand, we have left one hole at the bottom empty, where the rod of our base will go in.

Figure 6. Assembly of virus with spikes individually glued into virion holes using modeling glue. Pictures by Kristopher Nolte.

We hope that our adventure in 3D printing the corona virus inspires you to give it a try! The process we described was completed in a little over a week. The printing jobs were completed in just over two days, the cleaning and post-processing took another two days, while the painting was done over the course of a weekend. This article describes our technique and should provide enough detail for you to create a similar result with the tools outlined. The files are available through Thingiverse and are distributed under a Creative Commons BY-NC license: you may remix, adapt, and build upon this work non-commercially, as long as you acknowledge the "Coronavirus Structural Task Force" as original author.

Figure 7. 3D print illustration by Thomas Splettstösser. Finished corona virus model by Dale Tronrud in Oregon (center) and by the Thorn Lab in Würzburg (right).

As with every 3D printed model, there are many different ways this could be tackled and achieved, and we look forward to seeing the many creative ways explored by others in this endeavor. Please do share experiences and results with us, either through the comments, on Thingiverse, or on Twitter (you can tag us @thornlab or #insidecorona).
For a sense of perspective, we have also produced a model of the highly common rhinovirus, which is available in .stl format at the same scale as the corona virus objects. This is available at: https://www.thingiverse.com/thing:4556845.

Authors

We want to emphasize that the writing of this blog entry was a collaboration of several people:
Dale Tronrud and Thomas Splettstoesser worked together to create the STL files for the 3D model. Dale was the person to suggest it first (with Andrea Thorn picking up on the idea). Thomas then selected the experimental models and placed all the parts to form a realistic representation. Dale provided the knowledge about the limitations imposed by the nature of 3D printing and broke up Thomas' model into printable parts that can be assembled without too much difficulty. He printed and assembled the first virion from this design.
Matt Reeves was responsible for improving the non-spherical virion model and the printing of the Würzburg model. He also determined the most suitable post-print processing techniques for this project and, along with Dale and others on the team, contributed to many general technical discussions on how the model can be altered or improved further in the future.
Kristopher Nolte took part in the preprocessing and refining of the model together with Ferdinand Kirsten. Kristopher was also responsible for planning and carrying out the assembly and painting process of the Würzburg model.

COVID-19 is caused by the new coronavirus SARS-CoV-2. This virus has a characteristic virus hull featuring surface proteins which are commonly called “spikes”. Protruding from the viral hull like “spikes of a crown”, they give the coronavirus its name (corona = crown).  These proteins make the first contact with human cells and are akin to keys that use a human receptor called “angiotensin-converting enzyme2” (ACE2) as a backdoor to gain access to and infect the cell.

SARS-CoV-2 animated picture. Realistic surface and spike proteins with glycosylation. Image: Thomas Splettstoesser; www.scistyle.com
Fig. 1. SARS-CoV-2 animated picture. Numerous spike proteins, coloured in green, protrude from the virus hull which is coloured in brown. Spikes enable the coronavirus to invade human epithelial cells. Image: Thomas Splettstoesser; www.scistyle.com

1. Function of ACE2

ACE2 is a membrane protein which is anchored in the cell membrane of human epithelial cells. This type of cell can be found on the surface of lung, intestine, heart and kidney tissue. ACE2 is a type I membrane protein whose primary function is to take part in the maturation of angiotensin, a peptide hormone which controls vasoconstriction and blood pressure. ACE2 can be compared to a lock which can be unlocked by the coronavirus spike protein. The virus can then enter the cell and hijack its functions to reproduce itself, thus causing the COVID-19 infection which poses a serious danger to humanity, especially for older people and people with pre-existing conditions. For this reason, one approach to combating SARS-CoV-2 is to target and inhibit the spike to prevent infection. In order to do so, knowledge of the structural features of the spike and its interaction processes with ACE2 is indispensable. (Further information about how macromolecular structures are visualized can be found on our homepage: https://insidecorona.net/visualizing-macromolecular-structures/)

2. Spike: Structure and Fusion Mechanism

Fig. 2. Image of a spike protein (green) protruding out of the viral envelope (brown). This image shows the structure of a spike protein divided into several subdomains. Each subdomain has a specific function necessary for binding and fusion. The transmembrane domain anchors the spike protein in the virus membrane. Heptad repeats 1 and 2 and the fusion peptide play key roles in mediating the fusion process, and with the RBD the virus makes contact with human cells. Note that only “stumps” of carbohydrate chains are shown. Image: Thomas Splettstoesser; www.scistyle.com

The spike protein has a trimeric shape comprising three identical monomeric structural elements. Each of these monomers can fold out, akin to a modern car key with a fold-out key element that has specific teeth on its surface. This fold-out key element is the so-called “receptor binding domain” (RBD). The spike can only interact with ACE2 when its RBD is in a folded-out position, exposing its teeth, the “receptor binding motif” (RBM). As the name suggests, it comprises a motif of different amino acids which can then bind and unlock the ACE2 receptor. This key-lock mechanism triggers a cascade of events initiating fusion with the host cell. First, protein scissors are recruited to the binding site. These scissors (furin and transmembrane serine protease 2) cleave the spike protein for subsequent activation. The active spike molecule then rearranges itself to form a long structural “hook” (formed of HR1/HR2 and FP, see Fig. 2) that brings the epithelial cell membrane and the viral membrane into close proximity for fusion. Once the fusion is completed, the path is clear for the virus to transfer its genome, encoded in ribonucleic acid (RNA), into the host cell. This successful transfer then enables the virus to multiply and finally spread from cell to cell, causing COVID-19 in its wake.

Fig. 3. This image shows a spike protein in complex with the human ACE2 receptor (PDB: 6vsb/6lzg). Left: The structure of a spike protein coloured in orange in complex with the human ACE2 receptor coloured in light orange. The white box shows the interaction site which is shown enlarged in the image on the right. Right: The interaction site between spike and ACE2. Spike's "receptor binding domain (RBD)" includes a "receptor binding motif (RBM)" whose amino acids interact with those of the human receptor through hydrophilic interactions. These amino acids are shown as sticks protruding from the RBM and ACE2. Image: Sabrina Stäb

3. Evading the Immune System with Carbohydrate Chains

The human immune system normally recognizes the surface proteins of foreign organisms such as viruses or bacteria and reacts with an immune response to combat them. Spike proteins are such surface proteins, but because of structural peculiarities the coronavirus evades both the innate and the adaptive human immune system. The secret of these structural peculiarities is the N-glycans. These are long carbohydrate chains which sit on the spike’s surface. Each spike carries 66 N-glycans, forming a protective shield around the protein. Hence the human immune system has problems recognizing spikes and identifying the coronavirus as an enemy.

Fig. 4. Ribbon diagrams of a spike trimer with N-glycans on its surface coloured in cyan (PDB: 6vxx). In image a, the spike protein is shown from the side, and in b, the trimer is seen from above. Unfortunately, neither X-ray crystallography nor cryo-EM can resolve long carbohydrate chains, so the chains shown here contain a maximum of three sugar monomers, while in most cases the carbohydrate chains are much longer, covering most of the contact surfaces of the upper spike protein. Image: Sabrina Stäb

The COVID-19 pandemic has a massive impact on our lives, our health and the global economy. Scientists around the world are trying to develop new drugs to combat the virus. Since the spike plays a critical role in the infection process, it is a prime target for drug development against the pandemic. One approach to inhibiting the interaction between spike and the ACE2 receptor is to cap the spike protein using antibodies. Antibodies are proteins normally produced by the human immune system to fight viruses. The idea is to treat patients with antibodies that cap the RBD of spike, thus preventing interactions with ACE2. This would lead to a nonfunctional spike, blocking the coronavirus from entering the cell (the key would no longer fit the lock). Another approach is the development of small molecules that target and inactivate the protein scissor transmembrane serine protease 2 (see section 2), as the spike’s functionality depends on being cleaved by it. Since the spike protein decorates the virus hull, it could even be part of a potential vaccine. For this reason, the spike protein could also become the key in the molecular fight against COVID-19.

Introduction

The short answer to this question is “almost certainly not”. However, we live in an unprecedented time, where people are tired of experts while simultaneously believing that having read a meme on social media makes one an expert. So, what do I even mean by “almost certainly”? Between the politicians and the scientists on TV you’re probably tired of not getting a straight answer. I can’t speak for the politicians, but there is a reason for this from scientists. Scientists don’t like to work in absolutes. Not because we want to hide something, but because uncertainty is our home ground. Science works by minimising our uncertainty to the point where we can identify the simplest [1] and most likely outcome based on our observations.

So, was SARS-CoV-2 made in a lab? You can’t help but think: “It could have happened though, right?” This guy, Professor Nikolai Petrovsky from Flinders University, certainly thinks it’s possible. He also said it could be “…a chance transmission of a virus from an as yet unidentified animal to human”, but that’s not as interesting a headline. You can watch his full interview on the topic here.


[1] This is called the law of parsimony, or Occam’s razor, which states that the simplest solution is most likely the right one.


Figure 1: Professor Nikolai Petrovsky of Flinders University being interviewed on Sky News Australia

What will be discussed here?

Before I get into the science, I want to clarify the sort of claims I am addressing. As scientists, we can only address a claim where the data are available and verifiable. Many of the arguments for the virus being created in the lab start with something along the lines of President Trump’s statement on April 30th.

“We have people looking at it very, very strongly. Scientific people, intelligence people, and others. We’re going to put it all together. I think we will have a very good answer eventually. And China might even tell us.”

President Donald Trump, April 2020

Ominous, but obviously lacking any real data. When pressed for evidence to prove his claims he retorted with

“I can’t tell you that. I’m not allowed to tell you that.”

President Donald Trump, April 2020

State secrets aside, this does not cut it in this battle ground. It is impossible to make a valid argument from secret data. This would be like me submitting a paper to a scientific journal and replying to reviewer’s comments with

“Just trust me, I have the data to back up my claim that alpacas breathe fire when we’re not looking, but I’m not allowed to show you it because big wool is stopping me”

Dr Sam Horrell, June 2020.

Hence, we will only deal with claims for which there are valid data (See Figure 2).

Figure 2: Fire breathing alpacas caught* on tape in the wild. *Disclaimer: This might have been faked

The other argument typically sounds something like

“Of course it looks like the virus evolved naturally, these are very clever people that know how to cover their tracks”

Karen on Facebook, 2020.

This puts you in the unfortunate position of trying to prove a negative with your counter argument, which is, sadly, not possible. If we’re arguing this in a scientific manner, we must adhere to the burden of proof and provide positive evidence which allows us to find the simplest and most likely conclusion from our observations. A classic fallacy along the “very clever people” line is the creationist argument for a young Earth that God has made look old to trick the non-believers. Any science you try and throw at this is credited to God and a lack of faith, so you can’t argue this logically, but it does run into the problem of infinitely increasing complexity. You can see how we’re rocketing away from the simplest and most likely answer here.

On Natural Selection

Evolution and natural selection are central to this discussion. If you are of the opinion that evolution does not exist, then the rest of this article is not going to convince you and I hope you enjoyed the fire breathing alpaca picture. Natural selection works like this: each time a species reproduces there is a chance a mutation will occur in its genome. If this change grants an advantage (e.g. long necked giraffes), this increases the rate of survival and the chance that the giraffe has offspring, allowing the change to persist in the population. If a change results in a considerable disadvantage (e.g. stumpy necked giraffes), it is less likely to be passed on to the next generation and will be selected out. Then there are some mutations which are innocuous and will persist in the genome. Although they are not useful to the species, they are very useful to evolutionary biologists when tracing a species’ genetic lineage. Viruses and bacteria have a considerable advantage when it comes to natural selection, as they reproduce at a much faster rate than us mammals. For example, E. coli cells can divide every 30 minutes, so they can go through up to 48 generations over the course of a single day, which means a greater chance of stumbling onto a favourable mutation! Ever wonder why antibiotic resistance is such a big problem? Because of speedy evolution.
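To get a feeling for how quickly generations pile up, here is a small back-of-the-envelope sketch. It takes the 30-minute division time from the text and assumes idealised, uninterrupted doubling with no cell death, which real populations never manage for long.

```python
MINUTES_PER_DAY = 24 * 60


def generations_per_day(division_time_min: float) -> int:
    """Whole generations that fit into one day for a given division time."""
    return int(MINUTES_PER_DAY // division_time_min)


def descendants(generations: int) -> int:
    """Cells descended from a single cell after n doublings (idealised, no deaths)."""
    return 2 ** generations


gens = generations_per_day(30)  # 30 minutes, as in the text
print(f"Generations in one day: {gens}")
print(f"Idealised descendants of one cell: {descendants(gens):.2e}")
```

Every one of those divisions is another chance for a mutation, which is the point being made above: more generations per day means more rolls of the evolutionary dice.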

The Coronavirus Origin Story

The new coronavirus SARS-CoV-2 was first identified after a pneumonia outbreak on the 12th of December 2019. Its genome was sequenced, and it showed 79.6% sequence identity to the virus causing Severe Acute Respiratory Syndrome (SARS) from 2002, and 96% sequence identity to a bat coronavirus (RaTG13-CoV) which was recently reported by a lab in Wuhan. Since then all manner of conspiracy theories have popped up suggesting that this coronavirus was produced in a lab in Wuhan, was intentionally or accidentally released, and had been specifically designed to target humans. And why not? 96% sounds too high to be a coincidence, right? Releasing this bat virus must be the cause of COVID-19! However, if we compare humans to one of their closest relatives, the chimpanzee, we can see that we also share 96% of our genomes. And as you can see from Figure 3, there are a fair few differences between us. Bringing it back to coronaviruses, that 96% identity still leaves about 1,100 differences between these viruses. If we line up the sequences, we see a random distribution of mutations across the genome, which follows the natural evolution typical of coronaviruses. We also have the benefit of previous data from the SARS-CoV outbreak in 2002. Human SARS-CoV was found to share 99.8% sequence identity with a palm civet coronavirus, with only 202 differences between the viruses. If this is the level of similarity that has been observed historically, it follows that a 96% identical virus is not likely to be the immediate source of a species-jumping global pandemic. Even if it were the immediate source, this would only prove that the virus came from a bat, a species not known for its molecular biology expertise.
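For readers who want to see what “sequence identity” means in practice, the following is a minimal sketch. It compares two already aligned sequences position by position; the toy sequences and the round genome length of 30,000 nucleotides are our own illustrative assumptions, and real comparisons use proper alignment software rather than a simple loop like this.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two aligned, equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def expected_differences(genome_length: int, identity_percent: float) -> int:
    """Number of differing positions implied by a given percent identity."""
    return round(genome_length * (100.0 - identity_percent) / 100.0)


# Toy sequences, invented for illustration: 25 positions, one mismatch at the end
toy_a = "AUGGCUAGCUAGGCUAACGUAGCUA"
toy_b = "AUGGCUAGCUAGGCUAACGUAGCUC"

print(f"Toy identity: {percent_identity(toy_a, toy_b):.1f}%")
print(f"Differences at 96% over 30,000 nt: {expected_differences(30_000, 96.0)}")
```

With these round numbers, 96% identity leaves on the order of a thousand differing positions, the same order of magnitude as the roughly 1,100 differences quoted above.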


Figure 3: An accurate comparison of 96% identical species, Homo sapiens (left) and Pan troglodytes (right). Picture by Thomas Splettstößer.

Super Villain Interlude

If I was a super villain that had released bat corona virus aiming to shut down the world with a pandemic, I’d effectively be spinning an evolutionary roulette wheel and hoping it landed on unprecedented global health crisis. Not so much maniacal as just lucky. So, it’s highly unlikely (there’s that word again) that SARS-CoV-2 came directly from the bat coronavirus being released from the lab in Wuhan. If we stop for a moment and think about it, the bat corona virus already existed in the world, so what would releasing it from a lab without extensive modification really achieve? It is much more likely that there is an animal intermediate we’re currently missing in the natural evolution of Coronavirus, most likely the result of having animals in close proximity to other animals as well as humans at the animal market in Wuhan. But as of the writing of this blog this route has not been proven.

Still not convinced that the virus did not come from a lab? OK, let’s keep going. How do we even go about making a virus? At this point we are going to have to dig into some molecular biology, so hold on to your butts!  


Homemade Viruses

We start with everyone’s favourite helical molecule, DNA, and a process called transcription. In transcription, DNA is partially unwound and a single stranded complementary (opposite) copy of the DNA sequence is produced, which we call RNA. RNA is then translated into proteins. When a virus infects a cell, it releases its genetic material (DNA or RNA) and uses our own cellular machinery to produce more viruses. If we were so inclined *cough super villain cough*, we could isolate this genetic material and, using an enzyme called reverse transcriptase, make a copy of the viral genome for our own nefarious purposes (or try and make a vaccine). This is called complementary DNA (cDNA) and can be used to produce an infectious virus in a host which we can manipulate according to our wishes. In fact, this technique has been used already to study caliciviruses, alphaviruses, flaviviruses, arteriviruses, and *drum roll* coronaviruses! This paragraph makes this sound easy but don’t be fooled, this is certainly not the case. Making a zoonotic virus, an animal virus that can infect humans, is a significant undertaking, but not as significant as making a zoonotic virus that can be spread between humans.
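The base-pairing logic behind transcription and reverse transcription can be sketched in a few lines. This is only a toy illustration of complementarity on an invented nine-letter sequence; it says nothing about the actual lab work, which is far more involved.

```python
# Base-pairing rules: DNA template -> RNA, and RNA -> complementary DNA (cDNA)
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")
RNA_TO_DNA = str.maketrans("ACGU", "TGCA")


def transcribe(template_dna: str) -> str:
    """RNA complementary to a DNA template strand, written 5'->3'."""
    return template_dna.upper().translate(DNA_TO_RNA)[::-1]


def reverse_transcribe(rna: str) -> str:
    """cDNA complementary to an RNA strand, written 5'->3' (what reverse transcriptase makes)."""
    return rna.upper().translate(RNA_TO_DNA)[::-1]


template = "TACGGATTC"  # invented toy sequence, not from any real genome
rna = transcribe(template)
cdna = reverse_transcribe(rna)

print(f"DNA template: {template}")
print(f"RNA copy:     {rna}")
print(f"cDNA:         {cdna}")
```

Because each step produces the complementary strand, reverse transcribing the RNA brings us back to the DNA sequence we started from, which is why cDNA can stand in for an RNA genome in the lab.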

So how do we know this is not where our SARS-CoV-2 comes from? To start, we are going to investigate the genome of SARS-CoV-2 and compare it with other notable coronaviruses. A recent paper published in Nature Medicine by Andersen and colleagues has identified two notable features in SARS-CoV-2’s genome that can help us answer this question. The first is that SARS-CoV-2 interacts well with a human protein called ACE2 because of five mutations on the spike protein (the bits poking out of the virus in Figure 4; for more information on the spike protein see here). The second is that SARS-CoV-2’s spike protein has an additional twelve bases in its RNA sequence which make it particularly infectious and able to jump between host species. At face value, this sounds like a convincing argument for SARS-CoV-2 being made in a lab. Just add a little change to the genome and release it on an unsuspecting populace. Basic super villain stuff. However, as we dig a little deeper into the science behind this, this begins to seem much less likely.


Figure 4: Illustration of SARS-CoV-2 and its spike protein by Thomas Splettstößer.

SARS-CoV-2 and ACE2 Binding

Let’s start with the optimised binding to human Angiotensin-Converting Enzyme 2, or ACE2 for short. ACE2 is a human enzyme that decorates the outer surface of a variety of cells throughout the human body, including the lungs. On a normal day, ACE2 plays an important role in cardiovascular (heart) and renal (kidney) function by producing vasodilators, key molecules that open blood vessels to increase blood flow and lower blood pressure. On an abnormal day an invasive virus (SARS-CoV-2) can bind to ACE2, enter our cells, and hijack our cell’s machinery to produce more viruses. If we compare the receptor binding domains of the spike protein from SARS-CoV (SARS-CoV-2’s 2002 predecessor), the bat coronavirus and SARS-CoV-2, we can see five key differences which improve SARS-CoV-2’s interaction with human ACE2. However, computational simulations show the interaction is far from perfect, and the binding differs from previously predicted binding modes. Furthermore, computational modelling suggests the spike protein is capable of recognising ACE2 in a number of animal species, with the exception of mice and rats. If these five key mutations were the only differences, it would be more indicative of deliberate manipulation; however, the presence of 1,095 other mutations distributed across the genome is much more suggestive of evolution through an animal intermediate.


Super Villain Interlude II: Electric Boogaloo

If I don my super villain costume again, to cover my tracks and make this look convincing I need to identify and isolate the bat coronavirus, produce cDNA from that virus, develop a system to produce and study my new virus in a lab separate from current published methods, perform extensive computational modelling to identify a previously unreported binding mode for the spike protein, and then add in thousands of innocuous mutations without impairing the virus. Is all this possible? Of course, we have the technology as I explained earlier. But is it likely? Not really. This would take a large team of world leading experts from several different fields working for years in complete secrecy at the cutting edge of molecular biology. At this point we’re entering that rocky ground from earlier where the justification for the conspiracy theory is getting complex to the point of near impossibility.


Adding Sugar to a Virus Makes it Worse?

Next up, a polybasic furin cleavage site and O-linked glycans! Or, in English, some other stuff that makes SARS-CoV-2 more infectious. Part of SARS-CoV-2’s spike protein has a sequence made up of two different amino acids (RRAR) which is recognised and cut by a protease (a protein cutting enzyme) called furin. Cutting this sequence is predicted to be a key factor in the virus binding to and gaining entry to cells. Such sites are also a signature of highly infectious avian influenza viruses, where they affect the pathogenicity of the virus and the hosts it can infect. Natural selection of these sites can allow a virus to jump between species and turn a low-level pathogen into a highly pathogenic, ‘we-should-all-be-worried, “it’s over 9000”’-level pathogen.
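Spotting such a site in a protein sequence is, at its simplest, a pattern search. The sketch below is only an illustration: it assumes the commonly quoted minimal furin motif R-X-X-R (arginine, any two residues, arginine), of which RRAR is one instance, and it runs on a made-up toy fragment rather than the real spike sequence. The name `find_furin_like_sites` is hypothetical.

```python
import re

# Assumption for illustration: minimal furin-like motif R-X-X-R.
# The lookahead lets overlapping candidate sites be reported as well.
FURIN_LIKE_MOTIF = re.compile(r"(?=(R..R))")


def find_furin_like_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in FURIN_LIKE_MOTIF.finditer(protein)]


# Toy, made-up fragment in one-letter amino acid code (NOT the real spike sequence).
toy_fragment = "GGSPRRARSVGG"
print(find_furin_like_sites(toy_fragment))
```

A motif match only says that a site looks furin-like on paper. Whether furin actually cuts there depends on the surrounding structure, which is exactly why placing such a site "by design" is hard.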

What does that have to do with glycans? When furin cleaves the spike protein it makes two new sites either side of the cut, which scientists have predicted to be targets for O-linked glycosylation (attachment of a type of sugar to oxygen atoms on a protein). But what do these glycans even do? Well, we don’t exactly know yet for SARS-CoV-2. But we do know from experience that O-linked glycosylation can be used by viruses to avoid the immune system.

So, what does this cleavage site tell us about the possibility of making coronavirus in a lab? The development of the furin cleavage site and the prediction of glycans also help us put this conspiracy theory to rest. Such cleavage sites are typically the result of a low-pathogenicity virus interacting with an immune system over many generations. Of course, we have the technology to add the RRAR sequence into our hypothetical cDNA virus genome no problem, but accurately predicting where to put that site is a wholly different challenge. Natural selection in viruses can manage this by rolling the dice many millions of times until a random change, or more likely changes, grant such a significant advantage that a dominant version of the virus is selected out; a process that has been observed previously with influenza and furin cleavage sites. If you want a cleavage site for your new lab-made virus, your best bet is to isolate a genetically similar virus and expose it repeatedly to animals with ACE2 receptors akin to human ACE2. Cell culture wouldn’t cut it as interaction with an immune system is the driving factor in these changes, and we’ve already seen that rats and mice aren’t a viable system from the computational modelling. A piece of work on this scale represents a considerable time sink and monetary investment in an inefficient process which relies on a roll of the dice to provide the desired results. As we have observed this evolutionary behaviour before in nature, it stands to reason that the furin cleavage site is the result of natural selection and not deliberate manipulation.

Conclusion

We’ve covered a lot of ground from abductive reasoning and a young Earth to molecular biology and furin cleavage sites in our quest to unpick this conspiracy theory. As more studies are published the specifics of this may change, but, barring a colossal government coverup being unmasked, the involvement of deliberate manipulation in a lab appears unlikely. The evidence suggests the virus originated in bats, but it is highly unlikely the bat virus (RaTG13-CoV) is the direct precursor to SARS-CoV-2. Our best candidate for an intermediate species comes from a pangolin coronavirus which has been found to share the five mutations in the spike protein that facilitate ACE2 binding, but not the furin cleavage site11. We have shown that it is indeed possible to make our own viruses in a lab, but SARS-CoV-2’s backbone doesn’t match up with any of the currently available reverse genetic systems so this is unlikely to be a factor. And finally, looking deeper into the genome of SARS-CoV-2 we see ample evidence of natural selection across the whole viral genome, not just in the spike protein’s binding region, and the appearance of a furin cleavage site; a well-documented naturally selected phenomenon observed in viruses previously. Based on the available evidence, discounting any secret data that may be being held hostage in a secret lair hidden in a volcano, we come to the most logical and simple answer. SARS-CoV-2 was most likely not made in a lab but evolved naturally from bat coronavirus via an animal intermediate, possibly pangolins.

Acknowledgements

I would like to thank a number of people for help with the writing of this post: Harri Webb for acting as a fire breathing alpaca wrangler, Mary Cruise for proof reading and suggestions, Thomas Splettstößer for the figures that look professionally made, and the members of the Coronavirus Structural Task Force, particularly Alex Payne, Dale Tronrud, and Andrea Thorn for all their help and suggestions.

Proteins are complex and fickle molecules. Experimental structure determination can teach us a lot about their function, but this is not the easiest thing to do. It’s not as simple as looking through a microscope, focussing, and taking a picture of the protein. It’s more like when you have a broken arm and the doctor uses X-rays to see what is inside, with the minor caveat that you first need to remove your arm from the rest of your body, crystallise your arm with thousands of other identical copies, and THEN shoot the high energy X-rays at the crystal. After successfully navigating a whole lot of maths, you can finally get a protein structure.

Different Methods

Experimentally determined coronavirus protein structures can come from three sources: X-ray crystallography, electron cryo-microscopy (cryo-EM) and solution nuclear magnetic resonance (NMR). Each experimental technique has its own advantages and disadvantages. These structural techniques can also be complemented with a number of other techniques such as mass spectrometry, chemical cross-linking, fluorescence resonance energy transfer, and genetics, to fill in the finer details around structure and function.

Cryo-EM

If you are looking to find the structure of a “big” molecule (although it's still quite small, really), electron microscopy can help you. Unlike in other techniques, the biomolecule of interest is imaged directly using a beam of electrons and a system of lenses. The complicated bit is turning these 2D pictures into 3D objects. This can be achieved by imaging thousands of copies of the same kind of biomolecule in different orientations, so that we can reconstruct it in 3D. Although cryo-EM was historically considered a lower resolution technique, recent technological advances have brought about a resolution revolution, and some structures are almost as detailed as those from X-ray crystallography. As a result, cryo-EM can now show us amino acid sidechains, surface water molecules, and non-covalently bound ligands, which were previously the purview of X-ray crystallography alone.
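Why image thousands of copies? Each individual picture is extremely noisy, but noise averages out while the signal stays. The following Python sketch shows only that one idea, under strong simplifying assumptions: the "particle" is a made-up one-dimensional row of pixels, all copies are already perfectly aligned, and the noise is Gaussian. Real cryo-EM processing also has to sort, align and orient the particles, which this toy ignores entirely. All names here are our own.

```python
import random


def noisy_copy(signal, noise_sd, rng):
    """Return the signal with independent Gaussian noise added to each pixel."""
    return [value + rng.gauss(0.0, noise_sd) for value in signal]


def average_images(images):
    """Average a stack of equally sized images pixel by pixel."""
    count = len(images)
    return [sum(pixels) / count for pixels in zip(*images)]


def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5


rng = random.Random(0)  # fixed seed so the toy run is repeatable
true_signal = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]  # made-up 1D "particle"

single = noisy_copy(true_signal, 2.0, rng)
stack = [noisy_copy(true_signal, 2.0, rng) for _ in range(2000)]
averaged = average_images(stack)

print("error of a single image:", rms_error(single, true_signal))
print("error after averaging:  ", rms_error(averaged, true_signal))
```

With independent noise, the error of the average shrinks roughly with the square root of the number of images, which is why so many particles are needed.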

Formation of a Cryo-EM structure. (A): Picture of a thousand different angles of the structure, (B): Averages of the 16 most populated classes out of 118,556 selected particles, (C): 3.3 Å map of entire complex.
Original pictures from: Matthies, D., Bae, C., Toombes, G.E., Fox, T., Bartesaghi, A., Subramaniam, S., Swartz, K.J. (2018) eLife 2018;7:e37558, edited by Ferdinand Kirsten, License: CC BY-ND 2.0

NMR-spectroscopy

An important step in NMR-spectroscopy is the so-called isotope enrichment. While the typical MRI at your doctor's just measures the basic whereabouts of the atomic nuclei in a certain tissue, this method can help identify the distribution of carbon atoms in a structure. However, this requires that some carbons differ from others. Different isotopes of carbon, with different numbers of neutrons in their nuclei, are therefore incorporated into the protein while it is being produced. After purification, the protein solution is placed in a strong magnetic field and probed with radio waves. The distinctive resonance of each isotope is then analysed, yielding information about the whereabouts of the different carbon nuclei and revealing the distances and possible connections between them. Using the knowledge about these distances, scientists can solve the Sudoku-like puzzle to generate an atomic model of the protein. This method only works for small to medium sized proteins, as larger structures cause problems with overlapping peaks in the resonance spectra. On the other hand, NMR-spectroscopy has a major advantage in its ability to measure flexible proteins in solution instead of in the solid state, which may hinder molecular movement.

Visualisation of a data set obtained through NMR-spectroscopy. Some of the restraints used to solve the structure of a small monomeric hemoglobin are shown here, using software from the BioMagResBank. The protein (PDB: 1vre and 1vrf) is shown in green, and restraints are shown in yellow.
Picture courtesy of PDB101.rcsb.org (https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/methods-for-determining-structure)

X-ray crystallography

Of the methods discussed here, X-ray crystallography has produced the most structures to date, totalling 145 252 in the PDB compared to 12 965 from NMR and only 4 926 from Cryo-EM. However, X-ray crystallography has a major drawback: the need for a protein crystal. While this common method can provide very detailed atomic information, showing every atom of each amino acid and even of ligands, inhibitors, ions and other molecules included in the structure, the process of crystallization is difficult and might limit which types of protein can be studied. Purifying a protein for crystallisation has become much more straightforward in recent years but is still a non-trivial task. After purification, the production of a protein crystal can take months to years before structure-worthy data can be measured from it. In particular, flexible proteins are much harder to crystallise. As enzymes or receptors, a lot of proteins rely on movable parts and different conformations to fully operate, so unfortunately, interesting proteins are often flexible! Once produced, the crystal is cooled in liquid nitrogen and subjected to an intense X-ray beam. You can compare this to a crystal that you hold to the light to observe its reflections on a wall. In this case, the X-rays hit the protein and get diffracted in a specific pattern. The diffracted rays do not give a direct picture of the crystal, but need to be interpreted with a structural model. The distribution of electrons can be calculated from this pattern, resulting in an electron density map with the estimated location of each atom.
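To put the three numbers quoted above into proportion, here is a small Python sketch that turns them into percentages. It uses only the counts given in the text (a snapshot from when the post was written) and counts only these three methods, so the shares are relative to their sum rather than to the whole PDB.

```python
# Structure counts per method, exactly as quoted in the text above.
pdb_counts = {
    "X-ray crystallography": 145_252,
    "NMR": 12_965,
    "Cryo-EM": 4_926,
}

total = sum(pdb_counts.values())
shares = {method: 100 * count / total for method, count in pdb_counts.items()}

for method, share in shares.items():
    print(f"{method}: {share:.1f} % of the structures from these three methods")
```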

Basic workflow of X-ray crystallography, from crystal to atomic model. The protein crystal is exposed to X-rays, which produces a diffraction pattern. The pattern is then interpreted and solved into an electron density map with mathematical algorithms. An atomic model can be estimated and refined based on this map.
Crystal by Andrea Thorn, Diffraction pattern by Sabrina Stäb, image by Ferdinand Kirsten.

Prospect

The molecular models obtained by these methods open up numerous possibilities: structure-based drug design, computational dynamics simulations and answers to biological questions. But how can we interpret and refine those models to extract every last biological detail? This will be discussed in the next blog entry.

Pharmaceutical drugs can be found by chance, but today, most so-called active pharmaceutical ingredients (APIs) are developed through a long, iterative process of designing and testing them.
How?

Targets and active ingredients

Most medicinal drugs are small molecules with up to 70 atoms, which bind in the body to larger molecules, or macromolecules. These so-called targets are typically proteins (long chains of amino acids), RNA, DNA (long chains of nucleotides) or carbohydrates (long chains of sugars). However, most targets are proteins.

Proteins are the workhorses of all living organisms: fungi, animals, plants, bacteria and even viruses (sic!) utilize them. They are the tools that allow us to digest what we eat, enable cell division, and form muscles and hair. One reason why our diet needs to contain proteins is that they are disassembled into amino acids from which new proteins can be made. Some amino acids we can’t make ourselves, so we HAVE to get them from our diet - these are the “essential” amino acids. To properly function, our body needs to be able to use a large variety of proteins - our genes encode at least 20,000 of them​1​!

An example: How Aspirin works

Our bodies require the proper function of proteins to live, but not all proteins are good for you: for chronic illnesses or metabolic disorders, diminishing the activity of certain proteins might be advantageous. A good example is cyclooxygenase-II, or COX-2 for short. This protein is formed when cells are injured or during inflammation and it catalyzes an important step in the production of pain mediators, called prostaglandins​2​. Without cyclooxygenase-II, no pain mediators can be produced.

Acetylsalicylic acid (also known as ASA or Aspirin) binds to cyclooxygenase-II and stops it from working (see image below). As a consequence, your body stops producing those pain mediators, and your pain is relieved*. In this case, cyclooxygenase-II is the target and ASA the active pharmaceutical ingredient.

Inhibition of COX by aspirin
Left: cyclooxygenase-II, a protein that produces pain mediating prostaglandins. Right: acetylation (4 red atoms) by acetylsalicylic acid (aspirin) blocks the channel into the catalytic center, so that cyclooxygenase-II can no longer produce pain mediators. Its effect only ends when the body has produced new cyclooxygenase-II (a few hours). Image by Andrea Thorn.

Rational drug design

There are two major methods in rational drug design:

Indirect or ligand-based drug design utilizes molecules which are similar to known active pharmaceutical ingredients in their shape and charge. These molecules are then tested for binding and/or inhibition of the target, or, just as often, a resulting change in some biological parameter of interest, such as the killing of a virus!

Direct or structure-based drug design utilizes knowledge about the target. The potential API is chosen to bind to the target - for example, in the case of cyclooxygenase-II, to fit in the channel (see image). In fragment-based drug design, several smaller molecules are bound to sites in the target and with this knowledge, an active pharmaceutical ingredient is designed that combines their properties.

For both of these methods a pre-selection of potential molecules can be done by computer-aided drug design. However, for structure-based drug design, the macromolecular structure of the target must be known.

The Coronavirus Structural Task Force supports the search for a drug against COVID-19 by validating and, where possible, improving the macromolecular structures of potential coronavirus targets. We also offer drug designers information about different SARS-CoV-2 and SARS-CoV macromolecules. With this, we hope to do our part in the fight against COVID-19.

(blog header image: aspirin tablets by Ragesoss, Wikimedia Commons / license: CC 4.0)

* Unfortunately, instead of prostaglandins, the body then produces more leukotrienes, which can cause asthma attacks - hence, asthma patients should only take aspirin after consulting their GP.

  1. Ponomarenko EA, Poverennaya EV, Ilgisonis EV, et al. The Size of the Human Proteome: The Width and Depth. International Journal of Analytical Chemistry. Published online 2016:1-6. doi:10.1155/2016/7436849
  2. Ricciotti E, FitzGerald GA. Prostaglandins and Inflammation. Arterioscler Thromb Vasc Biol. Published online May 2011:986-1000. doi:10.1161/atvbaha.110.207449

SARS-CoV-2: Not new, but different

The novel Coronavirus (2019‐nCoV) is classified as a large positive-sense single-stranded RNA virus from the genus Betacoronavirus. It shows high genetic similarity to SARS‐CoV and MERS‐CoV and is even more closely related to the bat SARS-like coronavirus, from which it most likely evolved. Even though it shows a lot of similarities to its ancestors, further insights into the infection mechanism and the structure of its proteins reveal significant differences.

But what is in it?

Like most RNA viruses, the virus has a lipidic hull, with envelope and other proteins integrated in it. This viral shell is responsible for the interaction with host cells and the protection of the inner parts, most importantly: the viral RNA. This RNA acts as a direct template for the translation of two polyproteins named pp1a and pp1ab, which contain the 16 non-structural proteins (nsps) of the replication‐transcription complex (RTC). Those 16 nsps, encoded by about two thirds of the genome (in terms of length), are cleaved from the polyproteins by the chymotrypsin‐like protease (3CLpro, also called the main protease) and one or two papain‐like proteases to generate the functional single proteins. The RTC then synthesizes a variety of subgenomic RNAs (sgRNAs) by discontinuous transcription, which serve as templates to produce subgenomic mRNA. Other open reading frames of the genome encode at least four structural proteins that are necessary for the assembly of the virions, the hull and the infection of cells (called S-, M-, E- and N-protein for spike, membrane, envelope and nucleocapsid).

Visualisation of the SARS-CoV-2 structure. The envelope protein E, membrane protein M and spike protein S are bound to the viral envelope, with the nucleocapsid protein N and single-stranded RNA inside. Image: Thomas Splettstoesser; www.scistyle.com

Making contact

The majority of infected cells are ACE2 (Angiotensin-converting enzyme 2)-bearing cells of the respiratory system. The viral RNA is introduced through endocytosis, mediated by the spike glycoprotein of the coronavirus. What does this mean? The S or spike protein, which forms the "corona" around the virus, binds with its receptor-binding domain (RBD) to the ACE2 receptor, which is located on the surface of the host cells. Afterwards, the virus can enter the cell through a complicated mechanism called endocytosis. Once infected, these cells act as multipliers for the virus, which provokes a strong reaction of the immune system. The most common symptoms include cough, fever, fatigue, loss of taste, headache, diarrhoea, dyspnoea, and lymphopenia or pneumonia; severe cases can end in the death of the patient.

Crystal structure of spike protein receptor-binding domain (RBD) from SARS coronavirus epidemic strain (2002-2003) (magenta) complexed with human-civet chimeric receptor ACE2 (green). The green bit is usually bound to the host cell and the magenta bit is at the top of the spike on the outside of the virus. PDB: 3SCL, picture by Ferdinand Kirsten

The structure of the virus, its infection mechanism and multiplication offer numerous possibilities for drug targeting, such as the inhibition of the main protease or the polymerases, the disturbance of the assembly of shell and entry proteins or the replication‐transcription complex and direct mRNA antiviral methods. However, none of them has been proven effective in clinical studies to this point.

Form follows function

Proteins are big molecules, ranging from 400 to 20 000 atoms. They are the workhorses of the living world – they break down what you eat, build your muscles, organise cell division, make up hair and skin. They are formed from amino acids as a long chain that then cross-links and folds into the functional molecule. The sequence of the amino acids – there are 20 different ones – determines the fold, but in many cases we cannot (yet) predict it because it is too complicated, so we have to determine the molecule’s shape experimentally.

Molecular model of Penicillin by Dorothy Hodgkin with electron density, ca. 1945. Picture courtesy of https://proteopedia.org/wiki/index.php/Molecular_sculpture

If we learn the structure of a protein, we can understand how it works and what it does. The coronavirus encodes its own proteins, which are made by human cells when infected. These proteins interact with human proteins, and hence, understanding them is crucial: disabling the viral proteins, or disrupting their interactions with the human host, can stop the infection and permit us to fight the virus and gain the upper hand.

But how do you even measure and visualize something so small? Good question! After the difficult task of making a crystal from such large (and somewhat floppy) molecules and shooting it with X-rays like a madman, the data get interpreted with a model of the structure.

Growth of known protein structures
Growth in the number and complexity of structures in the Protein Data Bank (PDB; courtesy of the RCSB Protein Data Bank http://www.pdb.org/pdb/home)

But even if you have a model, it is really difficult to see anything. There are too many atoms. The increase in the size of the molecular structures we can determine experimentally, together with their growing number, necessitated a better way to visualize them. A big step towards solving that problem was taken by Jane Richardson of Duke University when she created the Ribbon Diagram in 1980.

The Ribbon Diagram

"I don't see how you could possibly describe a protein structure in a thousand words, but you can come a lot closer with one picture."

- Jane Richardson

These three-dimensional, schematic representations are the most common visualization of protein structures. The ribbon shows the backbone (= amino acid chain) of the protein. Depending on the local fold, which is determined by so-called hydrogen bonds, each stretch of the peptide chain falls into one of three categories of so-called secondary structure: α-helices, β-sheets and loops. α-Helices are shown as coiled ribbons and the strands of β-sheets as arrows. The arrows indicate whether a β-sheet is parallel (all strands run in the same direction) or anti-parallel (the strands alternate in direction). Loops are shown as a tube with a smaller diameter than α-helices or β-sheets. Any additional features can then be added, as well as labelling.
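The mapping from secondary structure to drawing style can be sketched in a few lines of Python. This is only an illustration under our own assumptions: each residue is labelled with a single letter (H for helix, E for strand, L for loop), a simplified code we chose for this example, and the input string is made up. Real visualization programs also need the 3D coordinates to actually draw anything.

```python
from itertools import groupby

# Hypothetical one-letter labels chosen for this sketch:
# H = alpha-helix, E = beta-strand, L = loop.
DRAWING_STYLE = {"H": "coiled ribbon", "E": "arrow", "L": "thin tube"}


def ribbon_segments(secondary_structure: str) -> list[tuple[str, int, int]]:
    """Split a per-residue label string into (style, first residue, last residue)."""
    segments = []
    position = 1  # residues are numbered from 1
    for label, run in groupby(secondary_structure):
        length = len(list(run))
        segments.append((DRAWING_STYLE[label], position, position + length - 1))
        position += length
    return segments


# Toy, made-up assignment: a helix followed by a two-stranded sheet.
toy_assignment = "LLHHHHHHLLEEEELLEEEEL"
for style, first, last in ribbon_segments(toy_assignment):
    print(f"residues {first}-{last}: draw as {style}")
```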

Different visualizations of an alpha-helix: The backbone shown as sticks and simplified as a ribbon or cartoon. Picture by Ferdinand Kirsten.
Different visualisations of an anti-parallel beta-sheet with a loop connecting its two strands. The backbone shown as sticks and simplified as a ribbon or cartoon. Picture by Ferdinand Kirsten.

Looking at macromolecular structures with visualizations like the ribbon diagram can provide a good overall view of the protein’s inner conformation, its symmetry and possible interaction and binding sites. Different structural features promote different functions in proteins. A uniform presentation of gathered data is the key to better understanding and comparing these small machines, which work in and around us at every second.

Three views of PDB 6vxs
The ADP-ribose-phosphatase of NSP3 from SARS-CoV-2 portrayed from different angles (Protein Data Bank entry 6vxs). Picture by Ferdinand Kirsten.

Regarding the current problem child, SARS-Coronavirus-2 (or hCoV2019), ribbon diagrams of viral proteins reveal numerous insights into the structure and function of different parts of the virus and its infection of host cells. While making the atomic model to begin with is a difficult task in its own right, the contributions of Jane Richardson and others shape how we, and every new generation of molecular biologists, perceive and think of macromolecules and the molecular basis of life.

Interested? Read more about this at:

https://iubmb.onlinelibrary.wiley.com/doi/full/10.1002/bmb.2002.494030010005

https://research.duke.edu/ribbon-diagrams

https://blogs.sciencemag.org/pipeline/archives/2018/11/05/hail-to-the-ribbon
