Coronavirus
Structural Task Force

Exoribonuclease: Making the most when mistakes are made

The building plan

Storing the building plans for a virus in its genome is much like how we store ideas in language. This may sound strange but, as an example, typos in spelling, grammar, or word usage, can lead to the meaning of a sentence either changing dramatically, remaining virtually unchanged, or becoming complete nonsense. The SARS-CoV-2 genome consists of RNA. Transcription of this RNA runs into a similar problem: errors can lead to the loss of function, a gain of function, or be completely inconsequential to the resulting protein (Figure 1). Large changes may break the virus, but smaller changes may provide an advantage and are essential for evolution.

Figure 1. What can happen when mistakes are made. A. Errors can cause a freeze in transcription. B. Errors can cause a copy to lose meaning, and the loss carries over into subsequent copies. C. Errors can be deleted and corrected as information is copied.

Targeting the copy machine

In a previous article we spoke about the copy machinery of the virus, including the RNA-dependent RNA polymerase (RdRp), and drugs targeting it, such as Remdesivir. The goal of these drugs is to jam the enzyme and halt RNA production - or to cause more errors than are sustainable, with the end result being a less infectious virus. The reason the development of drugs targeting the copy machinery of RNA is worthwhile is that humans don’t have machinery to reproduce RNA from RNA. This means drugs targeting this machinery are less likely to interfere with normal processes in people. What if the virus could quickly repair these errors before the new genome is packed into a hull and kicked out the door? That would make finding a therapeutic much more difficult…

Correctional facilities

Unfortunately, SARS-CoV-2 has a way to repair the mistakes. When errors are introduced in transcription through environmental mutagenesis or even mutations caused by nucleotide analogs like Ribavirin1–3, the non-structural protein 14 (nsp14) has the ability to remove them. This multifunctional protein removes errors with the exoribonuclease (ExoN) activity of its N-terminal domain, while the C-terminal domain has the unrelated function of methylating the end cap of the viral RNA3,4.

However, this ExoN does not work alone. There is a replication complex made up of proteins performing many roles in the production of new RNA with high fidelity. Nsp12 is the main hub that makes a new RNA chain to complement the template. Nsp7 and nsp8 have a “processivity” role to enable nsp12 to function efficiently. In addition to these proteins there is a two-component proofreading system of Helicase (nsp13) and the ExoN domain of nsp14. Helicase can detect misshapen RNA helices caused by errors made by the copy machinery5. It then unwinds these double strands of RNA and feeds the strand containing the error into the ExoN domain of nsp14, where the error is chopped out. This allows nsp12 to continue RNA replication where it left off.

Exoribonuclease or no exoribonuclease

Figure 2. Presence of Exoribonuclease (ExoN) is associated with large viral genomes. Viral genomes containing an exoribonuclease proofreading gene are highlighted in red. Figure modified from Smith, Denison 20126.

The proofreading ability from Helicase and nsp14 ExoN allows SARS-CoV-2 to have a huge genome as compared to other viruses​6​(Figure 2). The large 29.9 kb genome of SARS-CoV-2 requires much more physical space to accommodate the necessary genetic information for reproduction when compared to other RNA viruses, such as Rhinovirus that has a genome between 7.2 kb and 8.5 kb in size (Figure 3). When no ExoN proofreading is present genomes cannot expand beyond 20 kb in size​6​(Figure 2). Maybe by removing the exoribonuclease activity, irreversible damage could be caused to the genome of SARS-CoV-2.
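To get a feeling for why genome size and copying accuracy are linked, the following minimal Python sketch estimates how many mistakes to expect per genome copy. The genome lengths come from the text above; the per-base error rate is a purely hypothetical number chosen for illustration, not a measured value for any virus, and the model assumes every base is copied independently.

```python
# Illustrative sketch only: ASSUMED_ERROR_RATE is a made-up value,
# not a measurement for SARS-CoV-2 or Rhinovirus.

def expected_errors(genome_length_nt: int, per_base_error_rate: float) -> float:
    """Average number of copying errors in one full genome copy."""
    return genome_length_nt * per_base_error_rate


def prob_error_free_copy(genome_length_nt: int, per_base_error_rate: float) -> float:
    """Chance that a full copy contains no error at all,
    assuming each base is copied independently."""
    return (1.0 - per_base_error_rate) ** genome_length_nt


ASSUMED_ERROR_RATE = 1e-4  # hypothetical errors per copied base

# Genome lengths as quoted in the text (lower bound used for Rhinovirus).
genomes = {"Rhinovirus (7.2 kb)": 7_200, "SARS-CoV-2 (29.9 kb)": 29_900}

for name, length in genomes.items():
    errors = expected_errors(length, ASSUMED_ERROR_RATE)
    clean = prob_error_free_copy(length, ASSUMED_ERROR_RATE)
    print(f"{name}: about {errors:.2f} errors per copy, "
          f"{clean:.1%} chance of an error-free copy")
```

At any fixed error rate, the longer genome collects more errors per copy, which is the intuition behind proofreading being associated with large genomes.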

Figure 3. A high detail 3D printed model of SARS-CoV-2 alongside Rhinovirus. Scaled at 1 to 1,000,000 (1 mm represents 1 nm).

Nsp14 Structure

In order to understand how nsp14 can do this, we need to find out its atomic structure; this may also allow us to develop a drug which hinders its function. To date, no structure of nsp14 from SARS-CoV-2 has been solved. However, structures have been solved of nsp14 in complex with another viral protein, nsp10, both from SARS-CoV (PDB entries 5nfy, 5c8s, 5c8t, 5c8u)2,7. As the protein sequences are very similar between SARS-CoV and SARS-CoV-2 (nsp14 is 95%, and nsp10 is 97% identical), it can be assumed that the SARS-CoV-2 structure as well as its functionality are very similar to SARS-CoV. The active site of the ExoN domain of nsp14 from SARS-CoV-2 has a DEEDh motif (named for the one-letter codes of the amino acids involved) containing a histidine as well as two aspartates and two glutamates2,3,7,8.

Figure 4. Structure (PDB ID: 5c8s) of SARS-CoV nsp14 bound to nsp10. The orange domain of nsp14 is responsible for the exoribonuclease activity with the active site residues highlighted in yellow. The green domain has methyltransferase activity. The dark grey region joining the two domains is flexible. The nsp10-interacting region is shown in pink and finally, nsp10 in blue.

Nsp14 interacts with nsp10

The N-terminus of nsp14 interacts with nsp10 (pink and blue, respectively, in Figure 4). The following domain (orange) has been shown to have exoribonuclease activity on double-stranded RNA in a 3’ to 5’ direction9. When nsp10 is interacting with nsp14 there is a 35-fold increase in exoribonuclease activity, which is thought to occur due to conformational changes caused by formation of the complex2,9. The ExoN domain of nsp14 (orange) is connected to the methyltransferase domain (green) by a flexible hinge (black)7,10. This flexible region opens up the methyltransferase active site to allow methylation of the N7 of the 5’ guanosine triphosphate of RNA10. There are three zinc finger motifs in nsp14, with two found in the ExoN domain and one in the methyltransferase domain2,7. In combination with the two further zinc sites in nsp10, these zinc fingers hold loops of the proteins together and are involved with nucleotide interaction2,7.

Nsp14 has also been demonstrated to form complexes with the copy machinery, nsp12, nsp7, and nsp8, although this interaction is independent of nsp102,11,12.

Exoribonuclease active site and potential drug development

Figure 5. Active site of Exoribonuclease domain from SARS-CoV (PDB entry 5c8s). A. Electrostatic surface with the negatively charged pocket in red. B. Low energy conformation of multiple overlaid ligands from an in silico screen in the DEEDh active site (taken from Khater S. et al 2020).

Scientists are searching for drugs that could be used to target nsp14 in order to find a cure for COVID-19. The active site of the ExoN domain of nsp14 has five residues that are essential for activity and that form a negatively charged pocket (Figure 5A)7. Currently, researchers are using the nsp14 structure from SARS-CoV to model a SARS-CoV-2 structure, which can be used to identify compounds that could bind to the active site (Figure 5). These in silico screens start with drugs that are currently used as antiviral treatments for other viruses, such as the nucleotide analogs Remdesivir and Ribavirin, or the protease inhibitor Ritonavir13–15. These compounds are then modified to achieve better binding to nsp14’s active site in order to block it (Figure 5B).

As the ExoN is essential to support the huge 29.9 kb genome of SARS-CoV-2, targeting nsp14 could lead to an effective treatment for COVID-19. Although drugs that target just nsp14 could be effective at increasing the error rate in RNA production by the virus, a more effective treatment will require inhibition of the RdRp of the copy machinery at the same time!

Available structures

If you would like to look at the currently available structures for nsp14 (currently only available from SARS-CoV), they are available from our database; we provide information on the quality of measurement data and models as well as improved structures. The highest resolution structure of nsp14 is PDB entry 5c8t at 3.2 Å. This has a bound S-Adenosyl methionine ligand as well as zinc atoms present. Alongside this, another structure of nsp14 bound to S-Adenosyl homocysteine and a guanosine-triphosphate-adenosine ligand as well as zinc at 3.33 Å resolution has been published (PDB: 5c8s). Additionally, two structures with zinc atoms but no ligands are available (PDB 5c8u at 3.4 Å and 5nfy at 3.34 Å). Both PDB entry 5c8t and 5nfy have improved structures re-refined by our group.

Sources

1. Zuo Y. Exoribonuclease superfamilies: structural analysis and phylogenetic distribution. Nucleic Acids Research. Published online March 1, 2001:1017-1026. doi:10.1093/nar/29.5.1017
2. Ferron F, Subissi L, Silveira De Morais AT, et al. Structural and molecular basis of mismatch correction and ribavirin excision from coronavirus RNA. Proc Natl Acad Sci USA. Published online December 26, 2017:E162-E171. doi:10.1073/pnas.1718806115
3. Barnes MH, Spacciapoli P, Li DH, Brown NC. The 3′–5′ exonuclease site of DNA polymerase III from Gram-positive bacteria: definition of a novel motif structure. Gene. Published online January 1995:45-50. doi:10.1016/0378-1119(95)00530-j
4. Chen Y, Cai H, Pan J, et al. Functional screen reveals SARS coronavirus nonstructural protein nsp14 as a novel cap N7 methyltransferase. Proceedings of the National Academy of Sciences. Published online February 10, 2009:3484-3489. doi:10.1073/pnas.0808790106
5. Chen J, Malone B, Llewellyn E, et al. Structural basis for helicase-polymerase coupling in the SARS-CoV-2 replication-transcription complex. Published online July 8, 2020. doi:10.1101/2020.07.08.194084
6. Smith EC, Denison MR. Implications of altered replication fidelity on the evolution and pathogenesis of coronaviruses. Current Opinion in Virology. Published online October 2012:519-524. doi:10.1016/j.coviro.2012.07.005
7. Ma Y, Wu L, Shaw N, et al. Structural basis and functional analysis of the SARS coronavirus nsp14–nsp10 complex. Proc Natl Acad Sci USA. Published online July 9, 2015:9436-9441. doi:10.1073/pnas.1508686112
8. Eckerle LD, Becker MM, Halpin RA, et al. Infidelity of SARS-CoV Nsp14-Exonuclease Mutant Virus Replication Is Revealed by Complete Genome Sequencing. Emerman M, ed. PLoS Pathog. Published online May 6, 2010:e1000896. doi:10.1371/journal.ppat.1000896
9. Bouvet M, Imbert I, Subissi L, Gluais L, Canard B, Decroly E. RNA 3’-end mismatch excision by the severe acute respiratory syndrome coronavirus nonstructural protein nsp10/nsp14 exoribonuclease complex. Proceedings of the National Academy of Sciences. Published online May 25, 2012:9372-9377. doi:10.1073/pnas.1201130109
10. Ogando NS, Ferron F, Decroly E, Canard B, Posthuma CC, Snijder EJ. The Curious Case of the Nidovirus Exoribonuclease: Its Role in RNA Synthesis and Replication Fidelity. Front Microbiol. Published online August 7, 2019. doi:10.3389/fmicb.2019.01813
11. Subissi L, Posthuma CC, Collet A, et al. One severe acute respiratory syndrome coronavirus protein complex integrates processive RNA polymerase and exonuclease activities. Proc Natl Acad Sci USA. Published online September 2, 2014:E3900-E3909. doi:10.1073/pnas.1323705111
12. Subissi L, Imbert I, Ferron F, et al. SARS-CoV ORF1b-encoded nonstructural proteins 12–16: Replicative enzymes as antiviral targets. Antiviral Research. Published online January 2014:122-130. doi:10.1016/j.antiviral.2013.11.006
13. Khater S, Dasgupta N, Das G. Combining SARS-CoV-2 proofreading exonuclease and RNA-dependent RNA polymerase inhibitors as a strategy to combat COVID-19: a high-throughput in silico screen. Published online June 24, 2020. doi:10.31219/osf.io/7x5ek
14. Shannon A, Le NT-T, Selisko B, et al. Remdesivir and SARS-CoV-2: Structural requirements at both nsp12 RdRp and nsp14 Exonuclease active-sites. Antiviral Research. Published online June 2020:104793. doi:10.1016/j.antiviral.2020.104793
15. Narayanan N, Nair DT. Ritonavir May Inhibit Exoribonuclease Activity of Nsp14 from the SARS-CoV-2 Virus and Potentiate the Activity of Chain Terminating Drugs. chemrxiv.org. Published May 13, 2020. https://chemrxiv.org/articles/Ritonavir_May_Inhibit_Exoribonuclease_Activity_of_Nsp14_from_the_SARS-CoV-2_Virus_and_Potentiate_the_Activity_of_Chain_Terminating_Drugs/12280043

Introduction

Have you heard that the coronavirus “mutates”? Or that there are “several strains” of it around the world? Sounds scary, right? However, the reality is that everything “mutates”. All organisms, over time, acquire differences in their genes, from bacteria to humans. You might be aware that this can happen when your DNA (Deoxyribonucleic Acid) is exposed to UV light (like from the sun!), but this can also happen during DNA replication. This is when a cell uses the template of one of the two DNA strands to make a new complementary copy of the other strand. Mutation is common to all living organisms (and viruses) and a driver of evolution. This is the first post in a series that will explore coronavirus replication with a focus on the proteins involved.

How does the coronavirus make more of itself?

SARS-CoV-2 uses single-strand Ribonucleic acid (RNA) to encode its genome, not DNA, and hence belongs to a class of “single-strand RNA viruses”. For this reason, the virus needs a different way to copy its genome than “normal” cells have. The viral protein that copies the RNA is called an “RNA-dependent RNA polymerase” (RdRp). This protein uses the viral RNA as a template to make a new copy of viral RNA, by stringing single ribonucleotides together like beads on a string. This process is called polymerization.

A study by the Morse lab at Texas A&M University showed that SARS-CoV-2 RNA polymerase has a remarkable similarity to the RNA polymerase of SARS-CoV (>95%) as well as MERS-CoV [1], the virus which causes Middle East Respiratory Syndrome. This means that research performed in response to the SARS and MERS epidemics can inform our response to SARS-CoV-2. Unfortunately, a lack of consistent pandemic-preparedness funding means that we didn’t learn as much about RdRp in time as we could have. Still, RNA polymerase might be a viable drug target for halting the spread and reducing the fatality rate of COVID-19.

Structure of the RNA-Dependent RNA Polymerase

By determining the structure of RdRp, and deeply understanding how it works, we can optimize a drug to specifically target it and hinder its function. To this end, in the last few months, several structures of SARS-CoV-2 RNA polymerase have been published. 

One interesting structure shows RNA polymerase in action, in the process of elongating an RNA strand (see Figure 1) [2]. This structure clearly shows the polymerase in complex with smaller proteins, non-structural proteins 7 and 8 (nsp7 and nsp8). These proteins improve how well the RNA polymerase binds the template RNA and also how long it stays bound before dissociating – a feature called “processivity” [3].

Figure 1. Front and back views of the structure of elongating RdRp with RNA and two cofactors, nsp7 and nsp8 (PDB ID: 6yyt). Two copies of nsp8 (grey) form sliding poles that help stabilize the RNA (orange ball-and-stick model). One copy of nsp8 binds to the polymerase (blue) directly, but the other copy uses nsp7 (pink) to anchor to a second position on the polymerase.

In the center of the protein is the area where the main action happens, called the “active site”. The amino acids of the polymerase that form the active site have a particular shape and chemical properties, which enable the polymerization reaction to occur very rapidly. In fact, the polymerase can string together as many as 100 nucleotides per second! [3] New nucleotides can enter the active site through a little window to be added to the growing RNA chain. It is here that the antiviral drugs make their move!

Figure 2. The third view shows the window into the active site through which new nucleotides must enter!
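To put the speed of the polymerase into perspective, here is a small back-of-the-envelope calculation in Python. It combines the "as many as 100 nucleotides per second" figure quoted above with the roughly 29.9 kb length of the SARS-CoV-2 genome. It assumes, unrealistically, that the polymerase runs at top speed without pausing or falling off, so the result is only a lower bound, and the function name is made up for this sketch.

```python
# Back-of-the-envelope sketch: a lower bound on the time one polymerase
# needs to copy a genome, assuming it always runs at its top speed.

def min_copy_time_seconds(genome_length_nt: int,
                          max_rate_nt_per_s: float = 100.0) -> float:
    """Shortest possible copy time at the given maximum rate."""
    if max_rate_nt_per_s <= 0:
        raise ValueError("rate must be positive")
    return genome_length_nt / max_rate_nt_per_s


SARS_COV_2_GENOME_NT = 29_900  # about 29.9 kb

seconds = min_copy_time_seconds(SARS_COV_2_GENOME_NT)
print(f"At least {seconds:.0f} s (about {seconds / 60:.1f} min) per genome copy")
```

Even under these idealized assumptions, copying one genome takes on the order of minutes, which is why staying bound to the template (processivity) matters so much.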

How do antiviral drugs attack RNA-dependent RNA polymerase?

First, let’s talk about Gilead’s FDA-approved drug, Remdesivir, which has taken the spotlight in the search for COVID-19 cures. Remdesivir (which has a fancy chemistry ID, GS-5734, and is sold under the brand name Veklury), is a “nucleotide analog”, which means that it mimics the shape and chemistry of the nucleotides that make up RNA and DNA (see figure). 

Remdesivir was developed originally as a general antiviral drug and was later shown to protect cells (in a test tube) and monkeys (not in a test tube) from the Ebola Virus [4]. However, this was recent enough, and science is slow enough that, until the COVID-19 pandemic, large-scale clinical trials of Remdesivir hadn’t been done yet. So scientists and doctors have been rushing to test the drug in COVID-19 patients. In fact, the US and Japan both approved the drug for “Emergency Use Authorization” for severe COVID-19 patients as early as May [5], [6]. And, in July, the European Medicines Agency gave Remdesivir a “conditional marketing authorization” (used for drugs that meet an unmet medical need but have insufficient data for normal approval). This allows the use of Remdesivir in severe COVID-19 patients through the next year [7]. So, how the heck does a drug for Ebola, Influenza, or some other viruses also work against COVID-19? I was concerned by this when the news about all the drug trials was coming out – and I’m sure I wasn’t the only one...

The simple answer to that is all these viruses need to do the same thing - copy their RNA genome from an RNA template. And in order to do that, they all end up using basically the same tool, an RNA-Dependent RNA polymerase. And all drugs that are nucleotide analogs use the very same trick: they dress up like ribonucleotides (the "beads on a string" from before) and fool the RNA polymerase into letting them into the active site. Once inside, they get “stuck” in the active site, jamming the polymerase machine. Since this trick should work for any viral RNA polymerase, we can use these drugs for any RNA virus, and call them ‘general antivirals’. Of course, in practice, this doesn't always work, because there are differences between the different RNA polymerases. However, it is a great place to start! In the future, if we have general antivirals for SARS-CoV-2 all ready-to-go, we may be better equipped to deal with another coronavirus outbreak!

Figure 3. We all see what we want to see, I guess.

The Chemistry of Remdesivir

Remdesivir resembles an adenosine nucleotide in structure, although it has some fancy chemical add-ons which help make it a better drug (thank you, medicinal chemistry!). When Remdesivir is injected into a vein, it travels through the bloodstream and enters our cells, which recognize it as a foreign substance and try to digest it. However, what ends up happening is that the cells remove just the fancy chemical add-ons, and then confuse it for a normal adenosine nucleotide. In infected cells, the viral RNA-dependent RNA polymerase then starts grabbing these molecules and inserting them into the new viral RNA strand in place of adenosine nucleotides. Remdesivir, now attached to the RNA, jams the polymerase, rendering the virus unable to make more copies of its genome. Ultimately, this halts viral replication and helps the patient fight off the virus.

Figure 4. (A) The red part of Remdesivir makes it a better drug by helping it get from the blood stream into human cells, but it isn’t necessary for jamming the polymerase. It was designed on purpose so that when it gets inside human cells, the cells try to digest it. When they do, they cleave off the red bits, causing it to get confused for an adenine nucleotide. (B) This causes the cell to add two more phosphates to the molecule, making it the ‘tri’-phosphate form. This is the active form of the molecule, which mimics ATP (C), and is incorporated into the growing RNA chain in the place of ATP. The extra bit sticking off the side (in blue) is called a 1’-cyano group, and makes the RNA get stuck inside the polymerase, jamming it.

Figure 5. Structure of Remdesivir (cyan) in the active site of RNA-dependent RNA polymerase. The window through which new nucleotides enter is to the bottom left of the image. The RNA (orange ball-and-stick model) template strand enters from the bottom right. Remdesivir makes base-pair hydrogen bonds with the opposite uracil base.

Another drug that inhibits the RNA polymerase activity is Favipiravir, sold under the brand names Avigan, Abigan, and FabiFlu. Favipiravir was discovered by Toyama Chemical Co., Ltd. in Japan and it has a similar mechanism to Remdesivir, except that it mimics a guanosine nucleoside instead of an adenine nucleotide [8]. This drug was approved in Japan back in 2014 for use in resistant cases of Influenza A and B, but still remains unapproved in the US (still in Phase II and Phase III clinical trials) and the UK [9]. This drug is also being tested for use against Ebola virus, Lassa virus, and currently SARS-CoV-2 in 43 countries. The approval of Favipiravir for COVID-19 has been much faster in China (Mar 15, 2020), Russia (Jun 3, 2020), and India (Jun 20, 2020) [10], [11]. Other countries, including Japan, are still in various stages of clinical trials, and the results are anticipated to be out by the end of July [10].

So...do we have a cure for SARS-CoV-2?

Sadly, not yet. While the speed at which Remdesivir has gone through clinical trials is unprecedented, more work needs to be done to make sure it is safe and effective. Since (in the big scope of things) not a lot of people have taken Remdesivir, we aren’t really sure what all the side effects are, although there is emerging evidence for liver and kidney damage [12], [13]. The most common side effects are nausea (10% and 9% of patients), indigestion (7%) and increase of transaminases (6% and 8%). In one study, 3.6% of patients in a 10-day trial needed to stop taking therapy due to the latter. However, serious viral infections can also cause liver damage, so separating the two causes is a challenge! Remdesivir is not a cure-all, either. In one study it improved the recovery time from 15 days to 11 days, but it showed no effect for patients with mild to moderate disease, and no difference in median recovery time for patients who were already on a ventilator [14]. Since the drug has to be given by infusion over several days, there is a pretty small window in which Remdesivir can actually help.

Likewise, Favipiravir has its own side effects such as liver damage, elevated uric acid levels, kidney damage, skin allergies, etc. [15]. These effects restrict its use in patients with severe diabetes or heart disease. On top of that, it is not suitable for pregnant women because it can potentially cause fetal death and deformities. It has been shown that Favipiravir works only during the earlier stages of SARS-CoV-2 infection when the body’s immune system isn’t totally drained, whereas it can result in a cytokine storm (when your immune system really freaks out) in severely ill patients. But, unfortunately, the virus doesn’t differentiate between humans while attacking, so a universal drug for COVID-19 has to be safe for use by all people.

However, these drugs are better than nothing, and by understanding the mechanisms involved, scientists can continue to improve upon the existing drugs for the benefit of all. While most of the ‘general antivirals’ that target RNA Polymerase have failed with SARS-CoV-2, Remdesivir has been relatively successful. Scientists think that this is actually because of a proofreading protein in SARS-CoV-2 called exonuclease. Immediately after the RNA polymerase makes new RNA, exonuclease checks to make sure the new RNA is correct. In one study, another drug that mimics RNA, called Ribavirin, was shown to be removed from newly synthesized RNA by exonuclease [16]. Thankfully, Remdesivir is not excised, which is likely why it has been more successful than the other options [17], [18]. To read more about how nsp14 maintains the integrity and virulence of SARS-CoV-2, tune in to a future blog entry!

Figure 6. Hey, we've all been there.

Recommended Structures

For those interested in reviewing the structures further, they are available in our GitHub repo, along with information about validation and, where relevant, improved structures. For a high-resolution comparison of the active site with and without Remdesivir, 7BV2 and 7BV1 (respectively) were published together at 2.5 and 2.8 Å. The elongating structure of the complex shown above (6YYT) has the polymerase as well as the cofactors and RNA very well resolved, with little "missing" density and a resolution of 2.9 Å. It is likely preferable to 6M71 and 7BTF, which were published with a similar resolution but with less of the complex resolved, and no RNA. For those interested, 7C2K and 7BZF (at 2.93 Å and 3.26 Å) show the complex bound to RNA in a pre- and post-translocation state.

Sources

[1] J. S. Morse, T. Lalonde, S. Xu, and W. R. Liu, “Learning from the Past: Possible Urgent Prevention and Treatment Options for Severe Acute Respiratory Infections Caused by 2019-nCoV,” ChemBioChem, vol. 21, no. 5, pp. 730–738, Mar. 2020, doi: 10.1002/cbic.202000047.

[2] H. S. Hillen, G. Kokic, L. Farnung, C. Dienemann, D. Tegunov, and P. Cramer, “Structure of replicating SARS-CoV-2 polymerase,” Nature, May 2020, doi: 10.1038/s41586-020-2368-8.

[3] W. Yin et al., “Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir,” Science, p. eabc1560, May 2020, doi: 10.1126/science.abc1560.

[4] R. T. Eastman et al., “Remdesivir: A Review of Its Discovery and Development Leading to Emergency Use Authorization for Treatment of COVID-19,” ACS Cent. Sci., May 2020, doi: 10.1021/acscentsci.0c00489.

[5] O. of the Commissioner, “Coronavirus (COVID-19) Update: FDA Issues Emergency Use Authorization for Potential COVID-19 Treatment,” FDA, May 04, 2020. https://www.fda.gov/news-events/press-announcements/coronavirus-covid-19-update-fda-issues-emergency-use-authorization-potential-covid-19-treatment (accessed Jul. 08, 2020).

[6] A. Sternlicht, “Japan Approves Remdesivir For Use On Severe COVID-19 Patients,” Forbes. https://www.forbes.com/sites/alexandrasternlicht/2020/05/07/japan-approves-remdesivir-for-use-on-severe-covid-19-patients/ (accessed Jul. 08, 2020).

[7] D. CZARSKA-THORLEY, “First COVID-19 treatment recommended for EU authorisation,” European Medicines Agency, Jun. 25, 2020. https://www.ema.europa.eu/en/news/first-covid-19-treatment-recommended-eu-authorisation (accessed Jul. 10, 2020).

[8] E. De Clercq, “New Nucleoside Analogues for the Treatment of Hemorrhagic Fever Virus Infections,” Chem. Asian J., vol. 14, no. 22, pp. 3962–3968, Nov. 2019, doi: 10.1002/asia.201900841.

[9] K. Shiraki and T. Daikoku, “Favipiravir, an anti-influenza drug against life-threatening RNA virus infections,” Pharmacol. Ther., vol. 209, p. 107512, May 2020, doi: 10.1016/j.pharmthera.2020.107512.

[10] T. Hornyak, “Japan sending Fujifilm’s flu drug favipiravir to over 40 countries for Covid-19 trials,” CNBC, May 04, 2020. https://www.cnbc.com/2020/05/04/fujifilms-flu-drug-favipiravir-sent-to-43-nations-for-covid-19-trials.html (accessed Jul. 14, 2020).

[11] G. P. Ltd, “Glenmark Becomes the First Pharmaceutical Company in India to Receive Regulatory Approval for Oral Antiviral Favipiravir, for the Treatment of Mild to Moderate COVID-19.” https://www.prnewswire.com/in/news-releases/glenmark-becomes-the-first-pharmaceutical-company-in-india-to-receive-regulatory-approval-for-oral-antiviral-favipiravir-for-the-treatment-of-mild-to-moderate-covid-19-855346546.html (accessed Jul. 14, 2020).

[12] Goldman, J. D. et al. Remdesivir for 5 or 10 Days in Patients with Severe Covid-19. N. Engl. J. Med. (2020) doi:10.1056/NEJMoa2015301

[13] Remdesivir Safety Forecast: Watch the Liver, Kidneys | MedPage Today. https://www.medpagetoday.com/infectiousdisease/covid19/86582

[14] J. H. Beigel et al., “Remdesivir for the Treatment of Covid-19 — Preliminary Report,” N. Engl. J. Med., vol. 0, no. 0, p. null, May 2020, doi: 10.1056/NEJMoa2007764.

[15] Sandhya Ramesh, “Favipiravir, Japanese drug that’s the new Covid treatment hope your chemist will soon stock,” ThePrint, Jun. 25, 2020. https://theprint.in/health/favipiravir-japanese-drug-thats-the-new-covid-treatment-hope-your-chemist-will-soon-stock/447987/ (accessed Jul. 14, 2020).

[16] F. Ferron et al., “Structural and molecular basis of mismatch correction and ribavirin excision from coronavirus RNA,” Proc. Natl. Acad. Sci., vol. 115, no. 2, pp. E162–E171, Jan. 2018, doi: 10.1073/pnas.1718806115.

[17] C. J. Gordon, E. P. Tchesnokov, J. Y. Feng, D. P. Porter, and M. Gotte, “The antiviral compound remdesivir potently inhibits RNA-dependent RNA polymerase from Middle East respiratory syndrome coronavirus,” J. Biol. Chem., Feb. 2020, doi: 10.1074/jbc.AC120.013056.

[18] L. Zhang et al., “Role of 1’-Ribose Cyano Substitution for Remdesivir to Effectively Inhibit both Nucleotide Addition and Proofreading in SARS-CoV-2 Viral RNA Replication,” bioRxiv, p. 2020.04.27.063859, Apr. 2020, doi: 10.1101/2020.04.27.063859.

The instructions and files below will allow you to create your own model of the virus! All you need is some spare time and a 3D printer. In addition, those without access to a 3D printer can still use the STL files to request printing from external services and then follow the instructions on painting and assembling the same way. We do hope that this model will make the virus more tangible, and that the model will not only be printed as a private project, but also be used for outreach activities and in educational institutions.

Our design is based on the best scientific evidence available. Not only are the shapes of the various proteins as true as we can make them, but their numbers as well as the overall size of the virion match experimental results on a scale of 1:1,000,000. If you want to know more about it, please look here. Once you have built a model from our design you will have a good representation of what one of these virions is expected to look like, after being scaled up by a factor of 1,000,000. Therefore 1 mm on the model represents 1 nm (10 Å). (By the way, this would make the RNA that is inside the virus hull 10 metres long and 1 mm thick, and the nucleocapsid around which the RNA is coiled would be about 1 metre long and 1 cm in diameter.)
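If you would like to check the scale for yourself, the conversion is simple enough to sketch in a few lines of Python. The scale factor of 1,000,000 is the one stated above; the function names are made up for this example and are not part of any of our files.

```python
# Minimal sketch of the model scale: 1 mm on the print = 1 nm on the virus.

NM_PER_MM = 1_000_000  # there are 1,000,000 nm in 1 mm
SCALE = 1_000_000      # model is enlarged by this factor (1:1,000,000)


def model_size_mm(real_size_nm: float, scale: int = SCALE) -> float:
    """Size on the printed model (mm) for a real size given in nm."""
    return real_size_nm * scale / NM_PER_MM


def real_size_nm(model_mm: float, scale: int = SCALE) -> float:
    """Real size (nm) for a length measured on the printed model (mm)."""
    return model_mm * NM_PER_MM / scale


print(model_size_mm(1))      # 1 nm on the virus is 1.0 mm on the model
print(real_size_nm(10_000))  # 10 m (10,000 mm) on the model is 10000.0 nm
```

The second line shows that the 10 metre "model RNA" mentioned above corresponds to about 10,000 nm, or 10 micrometres, of real RNA.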

We have also designed a scale model of the human antibody that binds to the spike protein. This is available alongside the virus model and can be attached to the spike protein as desired. For easier printing, painting, and assembly, the virus structure has been broken down into 4 unique components:

To date the structures have been printed successfully on several Fused Deposition Modelling (FDM) printers (Rostock MAX v2 & Prusa i3 MK3 printers), and we anticipate that even higher quality prints will be feasible with alternative methods, such as stereolithography (watch this space). Let us know in the comments! Each of the parts is available in STL format and should be printable through any suitable slicer software. Personal discretion is advised when setting up the prints, as the exact details may differ depending on conditions and equipment. The procedure outlined below will serve as a good starting point.

Printing of the component parts

The first step is to print the individual components. For the virion parts this is very straightforward, as the flat surface negates the need for supports. The virion objects can be printed with the minimum infill for support, though an infill of 10% is recommended for rigidity.

The other parts (spike proteins and antibodies) are more challenging to print. The spike protein must be printed 95 times to complete the model, and users can arrange these individually or use 4 prints of the 25x STL file. It is recommended that the spike protein is printed with the crown facing towards the print bed, to maximize the support from the bed and eliminate the need to remove supports from the thin, delicate stem.
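As a quick sanity check on the numbers above, the plate count can be worked out as follows. This is a small sketch using only the figures given in the text (95 spikes, 25 per plate); the function name is our own.

```python
import math
from typing import Tuple

SPIKES_NEEDED = 95     # spikes required for one complete model
SPIKES_PER_PLATE = 25  # spikes contained in the 25x STL file


def plates_required(needed: int, per_plate: int) -> Tuple[int, int]:
    """Return (number of plates to print, number of spare parts left over)."""
    plates = math.ceil(needed / per_plate)
    spares = plates * per_plate - needed
    return plates, spares


print(plates_required(SPIKES_NEEDED, SPIKES_PER_PLATE))  # (4, 5)
```

Four plates therefore leave five spare spikes, which is a useful margin given how easily the thin stems break during clean-up.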

A dual extruder printer would be ideal for spike printing, as it would allow supports to be printed in a water-soluble plastic, speeding up post-processing. In either case, printing individual or at least fewer spikes with greater spacing generally produces nicer objects that are easier to work with, at the price of longer printing time. Indeed, for all 3D printing tasks there is a general trade-off between the convenience of the print set-up and the amount of post-processing and tidying needed, and each user must find the compromise that suits them.

As stated above, we used FDM printing and ubiquitous poly-lactic acid (PLA), which made the post-processing easier.

Post-processing

Regardless of the approach taken for printing, some amount of tidying will typically be needed to get the objects ready for assembly. Removing the supports can be done with a pair of pliers, while the smaller artifacts and issues will need brushing off or sanding. A dental pick can be quite useful.

Fig. 1. Virion and Spike object surfaces after printing with layer lines and artifacts, such as plastic webbing, clearly visible across surfaces. On the right: Virion after fusing top and bottom and rubbing surfaces with ethyl acetate. Pictures by Ferdinand Kirsten, Matt Reeves.

For PLA, we found that the best thing to clean and smooth the surfaces (after support removal) is ethyl acetate. Ethyl acetate dissolves the plastic, breaking down the small extrusion artifacts on the surfaces. It can be used in several ways. We found it best to leave the parts in a sealed ethyl acetate vapour environment, such as a stainless steel pot, which should be cleaned carefully afterwards. This technique gives the most even and clean results, though it can take up to a few days to fully smooth each object. The faster method is to simply submerge the small objects in ethyl acetate for 10-30 seconds, then remove each object and leave it to dry on a surface. For the larger virion parts, the surface can be smoothed by rubbing it down with a cloth dampened with ethyl acetate.

Ethyl acetate was also used to “weld” the two virion parts together. A small amount was dropped onto the flat surfaces of each section, before the two were pressed together until the plastic fused to become a single object. The seam was then smoothed down using the same process as before. Where one cannot get ethyl acetate from a lab or pharmacy, acetone-free nail-polish remover offers a commercially accessible alternative. You should wear safety glasses and suitable (!) gloves when handling ethyl acetate, ventilate the room well and, if there was skin contact, use a skin cream after washing your hands.

Fig. 2 Spike proteins fresh from printing (left) and after treating with ethyl acetate (right). Picture by Ferdinand Kirsten.

It is worth noting that for the other common 3D printing material, acrylonitrile butadiene styrene (ABS), acetone may produce the same results.

Painting and Gluing

Fig. 3 Computer rendered image of corona virus by Thomas Splettstoesser (left), and finished 3D print by Thorn Lab (right).

As with printing, painting methods and colours are down to personal preference. Here we outline our attempt, which followed the illustration by Thomas Splettstoesser as closely as possible (see Fig. 3).

The parts were first treated with a primer to help the paint stick to the model. This also acts as a nice, even base coat. When working with either primer or, as discussed later, an airbrush, one should consider safety: try to do as much as you can in a ventilated space, wearing safety goggles, gloves and a mask. Paint spraying produces a great number of fine particles which you don't want to breathe in.

For us, the painting process was performed largely with an airbrush, and we highly recommend using one where available, due to the amount of painting required and the complexity of the surfaces. Where one is not available, the painting can of course be done with just a simple brush, though this will take more time and a higher skill level.

All layer colours, medium thinner, base colours, primer and varnish we used were from Citadel painting. Here is an outline of the specific Citadel colours and materials we used for the model in the figures:

  • Lime: “Moot green”
  • Yellow: “Yriel Yellow”
  • Grey: “Dawnstone”
  • Wheat: “Baneblade Brown”
  • Chocolate: “Doombull Brown”
  • Aqua: “Gauss Blaster Green”
  • Teal: “Kabalite Green”

Fig. 4 Sorting the spike proteins (top left). Spike protein after basecoat (left) and spike protein after highlight with lime green (right). Pictures by Kristopher Nolte.

The spikes were sorted into four sets in order to produce a graded lighting effect, with those on top brighter than those lower down. If you do not plan to use a base and do not have a fixed top and bottom, you can skip this part.

We highlighted each spike protein with a brighter lime green to add contrast and create depth, which makes the surface topology easier to distinguish. Finally, the highlighting of each spike was intensified by dry-brushing the protein with the “Aqua” colour.

Fig 5. Virion sphere with a zenithal highlight (top right). Virion with features painted by brush (bottom right). Final version on the left. Pictures by Kristopher Nolte.

After painting was complete the spikes and virion were sealed with gloss varnish and matte finish, respectively. This step is optional; however, the varnish protects the paints against damage and wear when being handled.

Finally, the 3D model was assembled. If highlighting was used in the painting step, one should ensure the spikes are placed so that the brighter spikes go on top and the darker ones at the bottom. Standard modeling glue was used to hold the spikes in place, though superglue or ethyl acetate would also work fine. Because we are planning on mounting this on a stand, we have left one hole at the bottom empty for the rod of our base.

Figure 6. Assembly of virus with spikes individually glued into virion holes using modeling glue. Pictures by Kristopher Nolte.

We hope that our adventure in 3D printing the coronavirus inspires you to give it a try! The process we described was completed in a little over a week. The printing jobs were completed in just over two days, the cleaning and post-processing took another two days, and the painting was done over the course of a weekend. This article describes our technique and should give enough detail for you to create a similar result with the tools outlined. The files are available on Thingiverse under a Creative Commons BY-NC license: you may remix, adapt, and build upon this work non-commercially, provided you acknowledge the "Coronavirus Structural Task Force" as the original author.

Figure 7. 3D print illustration by Thomas Splettstösser. Finished corona virus model by Dale Tronrud in Oregon (center) and by the Thorn Lab in Würzburg (right).

As with every 3D printed model, there are many different ways this could be tackled and achieved, and we look forward to seeing the many creative approaches explored by others in this endeavor. Please do share experiences and results with us, either through the comments, on Thingiverse, or on Twitter (you can tag us @thornlab or #insidecorona).
For a sense of perspective, we have also produced a model of the highly common rhinovirus, which is available in .stl format at the same scale as the corona virus objects. This is available at: https://www.thingiverse.com/thing:4556845.

Authors

We want to emphasize that the writing of this blog entry was a collaboration of several people:
Dale Tronrud and Thomas Splettstoesser worked together to create the STL files for the 3D model. Dale was the person to suggest it first (with Andrea Thorn picking up on the idea). Thomas then selected the experimental models and placed all the parts to form a realistic representation. Dale provided the knowledge about the limitations imposed by the nature of 3D printing and broke up Thomas' model into printable parts that can be assembled without too much difficulty. He printed and assembled the first virion from this design.
Matt Reeves was responsible for improving the non-spherical virion model and the printing of the Würzburg model. He also determined the most suitable post-print processing techniques for this project and, along with Dale and others on the team, contributed to many general technical discussions on how the model can be altered or improved further in the future.
Kristopher Nolte took part in the preprocessing and refining of the model together with Ferdinand Kirsten. Kristopher was also responsible for planning and carrying out the assembly and painting process of the Würzburg model.

Introduction

Before I started writing this article, the first thing I did was to google the name of my protein “NendoU” and was greeted by Figure 1. Needless to say, this is not what I was expecting. So, if you’re an anime fan looking for Riki Nendou, a dutiful yet dull-witted boy who likes helping people, particularly prioritising the weak, from The Disastrous Life of Saiki K: I’m afraid you have come to the wrong place. However, now that you’re here, maybe you’d like to learn about an interesting protein involved in SARS-CoV-2 viral replication? It can bind to and process six RNA molecules at a time! Six!

Figure 1: Not the NendoU you were looking for

After that interlude, I should get this blog post back on track! So… viruses and proteins. SARS-CoV-2 is an enveloped coronavirus with a non-segmented positive-sense RNA genome. In plain English, this means the RNA genome in SARS-CoV-2 can be used “as is” to make viral proteins without prior modification. SARS-CoV-2 has one of the largest RNA genomes among RNA viruses, made up of a replicase gene encoding non-structural proteins (nsps), as well as various structural and accessory genes. During viral replication, depending on the starting point (a.k.a. a ribosomal frameshift), the replicase gene can produce one of two polyprotein chains, which are then cleaved to produce 15-16 individual viral nsps. These nsps then form a large membrane-bound replicase complex with multiple enzymatic activities, like a tiny viral Voltron.

What’s in a Name?

This blog post will focus on SARS-CoV-2 Nsp15, a nidoviral RNA uridylate‐specific endoribonuclease (NendoU). That is a very long and complicated name which conveys a lot of information, so let’s break it down into its individual parts, like when Voltron separates to become several small robots. It’s possible I’ve watched too many cartoons during lockdown:

  • Nidoviral – An order of RNA viruses which infect vertebrates and invertebrates.
  • RNA – Genetic material used to produce proteins
  • Uridylate-specific – Cuts Uridine (U) in RNA, not Cytosine (C), Adenine (A) or Guanine (G)
  • Endo – A Greek word meaning inside or within
  • Ribonuclease – An enzyme that cuts RNA into smaller pieces.

So, what’s in a name? Well, Nsp15 is a viral enzyme that likes to cut at uridine (a building block of RNA) in the middle of an RNA sequence. Quite a lot really. The final bit of the name “NendoU” goes into even more specifics on our protein, as it defines a common family of proteins which share certain traits. The first is that when Nsp15 cuts RNA, it gives a 2′‐3′ cyclic phosphodiester and a 5′‐hydroxyl terminus. If we look at Figure 2, you’ll see a purple RNA chain made of two bases linked by an orange phosphate in the middle. When RNA is cleaved by Nsp15, a 2′‐3′ cyclic phosphodiester is made: in the two resulting molecules, the phosphate has been incorporated into a 5-membered ring on one half (orange), and the other half of the RNA has a 5′‐hydroxyl, or an OH group, on another 5-membered ring (green). The second thing that being a member of the NendoU family tells us is that the catalytic domain of the protein (the business end) is found at the C-terminal end of the protein (the latter half), as this is a shared trait within the NendoU family.

Figure 2: RNA Cleavage to give a 2′‐3′ cyclic phosphodiester and 5′‐hydroxyl terminus. Image generated in PyMOL using molecules made with Coot’s Ligand builder by Sam Horrell.
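To make the “cuts at uridine, inside the chain” idea concrete, here is a toy Python sketch. It treats RNA as a plain string and assumes the cut falls on the 3′ side of each U, so the upstream fragment ends in the U that would carry the 2′‐3′ cyclic phosphate and the downstream fragment starts with the 5′‐hydroxyl. It also cuts at every U, which a real enzyme would not necessarily do; the function name and the example sequence are invented for illustration.

```python
from typing import List


def cleave_after_uridine(rna: str) -> List[str]:
    """Split an RNA string after every internal U (toy model of an endoribonuclease).

    A U at the very 3' end is left alone, because there is no bond after it to cut.
    """
    fragments = []
    start = 0
    for i, base in enumerate(rna):
        if base == "U" and i < len(rna) - 1:
            fragments.append(rna[start:i + 1])  # fragment ends with the U
            start = i + 1                       # next fragment begins after the cut
    fragments.append(rna[start:])
    return fragments


# Hypothetical 8-nucleotide sequence, written 5' to 3'
print(cleave_after_uridine("GAUCCUAG"))  # ['GAU', 'CCU', 'AG']
```

Every fragment except the last ends in U, which is the string-level equivalent of the cyclic phosphate end shown in orange in Figure 2.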

Domains

One Nsp15 monomer is made up of three distinct domains: the N-terminal oligomerisation domain (green), a middle domain in… well, the middle (orange), and the catalytic NendoU domain at the C-terminus (purple, Figure 3b). Overall, SARS-CoV-2 Nsp15 shows high sequence identity with SARS-CoV Nsp15 (88%) and somewhat lower identity with MERS-CoV Nsp15 (51%) (Youngchang 2020), but the overall structural similarity is very high between the three viruses. For a more detailed breakdown of the secondary structure that makes up individual Nsp15 domains, check out our Proteopedia entry!

Figure 3: Nsp15 monomer coloured by domain. Image generated in PyMOL using PDB 6X4I by Sam Horrell. 
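For anyone wondering what a number like “88% sequence identity” means in practice, the sketch below counts matching positions between two already-aligned sequences. It is a simplified illustration: the two short sequences are made up, and published identity values depend on the alignment method and on how gaps are counted, so this will not reproduce the figures quoted above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over aligned, gap-free columns.

    Both inputs must come from the same alignment (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Two hypothetical 8-residue stretches differing at one position
print(percent_identity("MSLENVAF", "MSLEDVAF"))  # 87.5
```

Ignoring gap columns is only one of several common conventions, which is one reason identity values for the same pair of proteins can differ slightly between papers.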

Tertiary Structure

Nsp15 forms a double-ring hexamer made up of a dimer of trimers stabilised by an N-terminal oligomerisation domain. So, three monomers form a trimer, which then binds another trimer of monomers. However, if you open a crystal structure this can be confusing, as you might not be presented with the whole complex. A crystal is composed of an infinite array of identical (or near enough) molecules related to each other by symmetry. To eliminate the need to store an infinite number of atoms on your computer, the PDB file gives you just enough of the crystal to define the unique part. You are then expected to remember that the rest is generated by symmetry. This subset is called the asymmetric unit. Should you want to generate the whole crystal you can try, but your computer will likely grind to a halt on its way to infinity (and beyond).

For most structures the asymmetric unit is the interesting part. Often, when the biologically relevant complex has symmetry itself, like Nsp15 does, only part of the complex will be present in the file from the PDB. In the case of the PDB model 6X4I the molecules of each trimer obey the crystal’s three-fold symmetry. The file you download contains two molecules, one monomer from each trimer, and you must generate the symmetry related molecules (shown in green and orange in figure 3) to build the entire complex. These six monomers all come together to form the active enzyme, a 100 Å long and 10-15 Å wide channel, open to solvent from the top, bottom, and three separate side openings in the middle of the hexamer (Figure 4). Formation of the hexamer has been shown to be essential for enzymatic activity, making the oligomerisation interfaces a potential target for structure-based drug design. I’m not sure if I should be proud or disappointed that I didn’t mention Voltron once back there.

Figure 4: The Structure of the Nsp15 hexamer showing a side on view generated by crystallographic symmetry (a) and a top down view (b) looking down the 10-15 Å wide channel. Image generated in PyMOL using PDB 6X4I by Sam Horrell. 
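The idea of building six monomers from the two in the file can be sketched with a little geometry. The example below is a toy: it assumes the three-fold axis runs along z through the origin and uses two invented monomer centre positions, whereas real symmetry operators come from the crystal's space group and are applied for you by molecular graphics software.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_z(point: Point, angle_deg: float) -> Point:
    """Rotate a point about the z axis by the given angle in degrees."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def expand_threefold(asymmetric_unit: List[Tuple[str, Point]]) -> List[Tuple[str, Point]]:
    """Apply the 0, 120 and 240 degree rotations to every monomer position."""
    copies = []
    for name, centre in asymmetric_unit:
        for k in range(3):
            copies.append((f"{name}_copy{k + 1}", rotate_z(centre, 120.0 * k)))
    return copies


# Hypothetical monomer centres (arbitrary coordinates in Å), one from each trimer
asymmetric_unit = [("upper", (30.0, 0.0, 25.0)), ("lower", (30.0, 0.0, -25.0))]
hexamer = expand_threefold(asymmetric_unit)
print(len(hexamer))  # 6
```

Two stored monomers and one three-fold axis are enough to describe all six positions, which is exactly why the PDB file can get away with storing so little.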

The Active Site

SARS-CoV-2 Nsp15 is a Mn2+-dependent endoribonuclease, meaning it relies on the coordination of manganese to perform the transesterification reaction (cutting RNA). Unfortunately, the structure of SARS-CoV-2 Nsp15 has not been solved with manganese present, but we do have a structure with 3’ uridine monophosphate in the active site (PDB ID: 6X4I). It has been proposed that the presence of manganese helps stabilise the active site and substrate, but this is yet to be seen. Based on sequence alignment against related enzymes from other viruses, we know the active site is made up of six conserved residues that sit in a shallow groove between two β-sheets (His235, His250, Lys290, Thr341, Tyr343, and Ser294), as shown in Figure 5. His235, His250, and Lys290 are predicted to act as a catalytic triad, with His235 as a general acid and His250 as a base, and with Lys290 governing U specificity.

Figure 5: SARS-CoV-2 Nsp15 active site conserved residues without (top) and with (bottom) 3’ uridine monophosphate. β-sheets are coloured purple, α-helices in orange, loops and ligands in green and waters in red. Image by Sam Horrell generated in PyMOL using PDB 6X4I.

But What Does it do?

After all that we have a pretty clear picture of what Nsp15 NendoU looks like, but what does it actually do? The fact that it cuts RNA would immediately suggest a role in viral replication, but Nsp15-deficient coronaviruses are still able to replicate. So maybe not; at least, it's not essential for replication. Another suggestion is that Nsp15 is involved in interfering with the host's innate immune response, but other studies suggest this is independent of Nsp15 activity. Finally, it has been suggested that Nsp15 degrades viral RNA as a means of hiding viral infection from the host immune system. So why does coronavirus bother with Nsp15? I’m afraid we don’t exactly know yet, but we’re working on it.

With that I’m going to leave you with one final Voltron reference for making it to the end. Good job, you earned this.

Figure 6: A perfectly good use of my time. Nsp15 coloured as Voltron featuring the arm monomers (forest and firebrick), leg monomers (skyblue and yelloworange), chest/back monomers (aquamarine and grey70), all loops (black), waters (white), and bound ligands (cyan). Image by Sam Horrell generated in PyMOL using PDB 6X4I.

Introduction

The novel Coronavirus SARS-CoV-2 incorporates various structural proteins in its protective coat. In order to find a potential drug target against the spreading pandemic, a lot of scientific research focusses on the characteristic spike glycoprotein as a therapeutic target. But apart from the spikes, several other structural proteins were found to decorate the virus hull of which the envelope protein (E protein) is the smallest one, consisting of only 75 amino acids. Even though it is an integral membrane protein, the envelope protein is also localized in the host ER, Golgi, and ERGIC (ER-Golgi intermediate compartment) [1], where it is essential for virus formation.

Interestingly, research on this protein can shed light on the origin of the novel coronavirus, which currently dominates everyday life all over the globe. Sequence comparisons of several different envelope proteins strengthen the assumption that SARS-CoV-2 may originate from Bat-CoV or Pangolin-CoV due to a high sequence homology [2]. The E protein of SARS-CoV-2's "older brother" SARS-CoV exhibits a nearly identical sequence with 91% homology [2] as well and has been structurally determined based on nuclear magnetic resonance (NMR) data. Yet, until now, solving the 3D structure of SARS-CoV-2 E protein has turned out to be quite challenging, and hence no experimental structure is available for the new coronavirus [3].

Structure comparison with SARS-CoV E protein

We, as structural biologists, aim to uncover and refine the structures of as many of the novel virus's proteins as possible. But as long as no structures of SARS-CoV-2 E protein have been solved, this can only be approached by comparing it to the existing structures of SARS-CoV envelope protein.

Topology and structural features

The topology of SARS-CoV E protein is mainly separated into three domains: a short hydrophilic N-terminus, which has an identical sequence in SARS-CoV-2 [3] and works as a Golgi-targeting signal; a long, mainly hydrophobic transmembrane domain (TMD); and a long hydrophilic C-terminal domain. Studies on the question of whether the C- and the N-terminus are luminal or cytoplasmic have had different results, suggesting that the E protein’s topology could differ depending on its multiple functions [3].

Domains of the E protein in SARS-CoV
Figure 1: The topology of the SARS-CoV E protein is colored to indicate the different parts. The N-terminus is displayed in red, the Transmembrane domain (TMD) in bright orange and the C-terminus in cyan. (A) The topology of E as an oligomer. (B) The topology of E as a monomer.
Image by Luise Kandler

The E protein of SARS-CoV comprises several interesting structural features: A long α-helix with amphipathic parts forms the Transmembrane domain (TMD). The C-terminus, however, incorporates a short α-helix which is believed to be in a dynamic equilibrium with a less abundant β-coil-β-motif. Both helices are connected by a turn [4]. The β-coil-β-motif with a conserved proline residue (Pro-54) has been proposed to function as a Golgi-targeting signal, and to switch its conformation in order to alter the E protein's function in the host cell [4]. Furthermore, the C-terminus contains a PDZ-binding motif (PBM) at residues 73-76 [3]. This PBM domain differs slightly between coronaviruses, but a DLLV motif is conserved in the E proteins of SARS-CoV, Bat-CoV, and SARS-CoV-2 [2]. Unfortunately, there are no PDB structures available that exhibit either the β-coil-β-motif or the PBM domain.

Structural features of E protein monomer.
Figure 2: Front view of the structural features of SARS-CoV E protein. The hydrophilic residues of the amphipathic α-helix are displayed in magenta, the hydrophobic rest in bright orange. The short C-terminal α-helix is colored in cyan and slate blue. Slate blue indicates residues that are in dynamic equilibrium with the β-coil-β-motif.
Image by Luise Kandler

Structural variants - Oligomerization and posttranslational modifications

The E protein comes in two different forms. Apart from a monomeric structure, the protein also oligomerizes to form a pentameric viroporin in the host cell's Golgi membrane. Whether the E proteins that are embedded in the viral hull are pentamers or monomers is not yet clear. Oligomerization is induced by the amphipathic α-helix of the TMD [3] and is proposed to be mainly mediated by residue Val-25, with residue Asn-15 being slightly involved [3]. Both residues are conserved in SARS-CoV-2 as well. To anchor the pore in the Golgi membrane, the hydrophobic amino acids of the TMD orientate towards the phospholipids. Additionally, basic, positively charged residues interact with the negatively charged phospholipids via electrostatic interactions [3].

Top view of Ion channel pore and positively charged residues of the C-terminus.
Figure 3: (A) Top view of the SARS-CoV viroporin surface displaying the ion channel built up in the center. (B) Positively charged residues, which can interact with the negatively charged membrane lipids to anchor the pore.
Image by Luise Kandler

Other structural variants are obtained by posttranslational modifications, which have been detected in the E protein of SARS-CoV and other coronaviruses. Palmitoylation is the addition of palmitic fatty acid to cysteine residues which increases the protein's hydrophobicity. Hence, the palmitoylation of E protein assists in membrane anchoring and probably aids Golgi targeting. Ubiquitination of the E protein might function as negative regulation of E protein levels [3]. It has been shown that the optimal amount of E protein present in the host cell is important for a successful production of new viruses. Another modification, namely glycosylation, adds oligosaccharide fragments to asparagine residues in a certain motif (Asn-X-Ser/Thr) which is also conserved in E protein. In SARS-CoV, residue Asn-66 embedded in the motif Asn-Ser-Ser was proven to be glycosylated. This may help to recruit chaperone proteins of the host cell to aid in the correct folding of newly synthesized viral proteins as well as in defense against the host immune system. Experimental data suggest that glycosylation of Asn-66 might also promote E protein's monomeric functions as it prevents oligomerization [3].
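The Asn-X-Ser/Thr glycosylation motif mentioned above is easy to search for in a sequence. The following Python sketch scans a string of one-letter amino acid codes for it. Two caveats: the exclusion of proline at position X is the commonly used definition of the motif and is our addition here, and the example sequence is invented, not the E protein sequence.

```python
import re
from typing import List

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping motifs be reported as well.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str) -> List[int]:
    """Return the 1-based positions of Asn residues that start an N-X-S/T motif."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


# Hypothetical 15-residue peptide: N3 and N13 fit the motif, N8 is followed by Pro
toy_sequence = "MKNSSAPNPTQLNGT"
print(find_sequons(toy_sequence))  # [3, 13]
```

A match only shows that a site could be glycosylated; whether it actually is, as for Asn-66, has to be demonstrated experimentally.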

Connecting function and structure

To understand a molecule's biological function is the main goal of experimental structure determination. A viral protein can be targeted by drugs best if the atomic structure is known. The envelope protein has various structural conformations and thus multiple functions, both as a monomer and as a pentamer.

Monomeric functions: Golgi-targeting and viral assembly

The E protein comprises a Golgi-targeting signal in the β-coil-β motif of the C-terminus and another one in the N-terminal domain. Additionally, palmitoylation is believed to be involved in this function. Accordingly, after being translated at the ER, the E protein is localized to the Golgi membrane. From there, the virus acquires the membrane for a new viral envelope [3]. Once the protein is localized to the Golgi, one of its main functions as a monomer is in viral assembly, which means the process of gathering all the viral macromolecules (proteins and the RNA genome) to form a virus-like particle. During this assembly, the virus-like particle buds into the lumen of the ERGIC and makes its way through the host cell's secretory pathway. Several experiments confirm the involvement of the envelope protein, together with the membrane protein (M), in this process. It has been proposed that the E protein induces membrane curvature and scission, whereas the M protein may coordinate viral assembly. Nevertheless, SARS-CoV-infected cells still produce virus-like particles in the absence of E protein, but virus trafficking to the cell surface and viral secretion are hampered, resulting in a lower number of mature virions, an atypical morphology and a higher rate of propagation-incompetent virions [3]. Further investigation will be necessary to analyze the exact mechanism behind the membrane formation of virions.
After finding its way through the secretory pathway, the mature virion is released from the host cell. The process of detaching from the host membrane is known as scission and is either coordinated by the virus's own scission proteins or by the host cell's scission machinery (called ESCRT). Which one is the case for SARS-CoV-2 is still unclear. Infected cells lacking the scission machinery exhibit a “beads-on-a-string” morphology, with the virions being stuck to the host membrane in an elongated shape. This morphology was found in influenza-infected cells lacking the M2 protein, which proves that M2 is involved in this scission process. Given that SARS-CoV E protein is suggested to be functionally equivalent to M2, due to similar structural features, the E protein is proposed to be involved in the scission process as well [3].

SARS-CoV-2 life cycle.
Figure 4: (A) The life cycle of SARS-CoV-2. The envelope protein is colored in yellow. After its translation at the ER (5) the E protein is transported to the Golgi, where it is involved in viral assembly (6) and the release of the virus (7). (B) Zoomed-in image of the E protein’s involvement in the life cycle.
Image by Ann (Hui) Liu, https://animationlab.utah.edu/ [5]

Pentameric functions: Ion-channel activity

While located at the Golgi, some of the SARS-CoV E proteins oligomerize and form a pentameric viroporin. These pores of SARS-CoV E protein function as ion channels. They mainly favor the transport of Na+ and K+, but were also found to be permeable to Ca2+ ions and possibly to H+ ions. Even though the primary purpose of transporting cations is not yet clear, Ca2+ is proposed to trigger the inflammatory response seen in acute respiratory distress syndrome [7].
Residue Asn-15 has been suggested to act as a "filter" for this ion selectivity [6], which can further be affected by the charge of the membrane's lipid head group. Deleting the envelope protein's pentameric form demonstrates that ion channel activity is not essential for viral replication, although its loss attenuates virulence [8].

This illustration shows the E protein as a pentameric ion channel embedded in a membrane.
Figure 5: Illustration of the E protein ion channel anchored in a membrane.
Image by Thomas Splettstoesser, www.scistyle.com

Pathogenesis and the E protein as a potential drug target

Interactions of viral proteins with host cell proteins de-regulate many physiological processes. In patients suffering from SARS-CoV infections, these de-regulating protein-protein interactions greatly contribute to pathogenesis. Some of the observed symptoms are also present in SARS-CoV-2-infected patients.
Interactions of the envelope protein with proteins of the host cell are mediated by its PDZ-binding motif (PBM) at the very end of the C-terminus. The motif binds to the PDZ domain of adaptor proteins, which are subsequently bound by other cellular proteins, activating a signaling cascade that may result in pathogenesis. Some of these interactions were proposed or even proven to induce symptoms like lymphopenia [9], changes in fluid volume, blood pressure, and water homeostasis, as well as tissue damage, edema and acute respiratory distress syndrome (ARDS) [10], due to an overexpression of inflammatory cytokines (which are also regulated by the leader protein nsp1). Another protein-protein interaction was found to disrupt tight junctions of pulmonary epithelial cells in the lungs. This eventually results in an epithelial barrier failure and virions breaking through the alveolar wall causing a systemic infection [11]. O. Wittekindt writes [12]: "The breakdown of the epithelial barrier is a hallmark in respiratory distress syndromes (...)" Furthermore, the ion channel activity of the E protein activates the inflammatory pathway by channeling Ca2+ resulting in lung damage in infected mice [7]. Inhibition of the viroporin by hexamethylene amiloride (HMA) [8] reduces the activation of the inflammasome, which makes the ion channel of E protein a potential therapeutic target. Additionally, as a part of the host cell's viral defense, the ER stress response is activated, once the protein folding capacity of the ER is overloaded by additional expression of viral proteins. This can lead to apoptosis of the host cell. However, experiments confirm that the E protein contributes to pathogenesis by suppressing the ER stress response to maintain the survival of the host cell [3].
As a potential target for drug treatment, protein-protein interactions of the E protein are quite promising. Its PBM domain can bind cellular proteins that are involved in pathogenesis. Experimental truncation of this domain shows that it may be possible to find a live vaccine with a mutated but intact PBM and thus attenuated pathogenicity. Identifying more interacting partners could provide a more targeted therapy, though. The absence of E protein furthermore leads to reduced viral titers, crippled viral maturation, and propagation-defective progeny [3], making E protein-deficient virions also a potential vaccine candidate.

In conclusion, one could say that the E protein of SARS-CoV-2 is another valuable drug target. While the protein's "older brother", the SARS-CoV E protein, gives us much insight into its function, an experimental structure determination of SARS-CoV-2 E protein would be highly desirable. Until then, the envelope protein of SARS-CoV-2 remains a small but mysterious structure.

Best PDB structures available

  1. 2MM4: This NMR structure is a monomer of SARS-CoV envelope protein (E). It covers the transmembrane domain completely and the C- and N-terminus partly. The structure is involved in membrane curvature, membrane scission, and viral assembly.
  2. 5X29: This NMR structure is an oligomer of SARS-CoV envelope protein (E). The structure is a pentamer of five identical monomers. It covers the transmembrane domain completely and the C- and N-terminus partly. This structure functions as a membrane-anchored ion channel.

Literature

[1] J. Nieto-Torres, M. DeDiego, E. Álvarez, J. Jiménez-Guardeño, J. Regla-Nava, M. Llorente, et al.: Subcellular location and topology of severe acute respiratory syndrome coronavirus envelope protein, Virology, 2011

[2] M. Bianchi, D. Benvenuto, M. Giovanetti, S. Angeletti, M. Ciccozzi, S. Pascarella: Sars-CoV-2 Envelope and Membrane proteins: differences from closely related proteins linked to cross-species transmission, Preprint, 2020

[3] D. Schoeman, B. Fielding: Coronavirus envelope protein: current knowledge, Virology Journal, 2019

[4] Y. Li, W. Surya, S. Claudine, J. Torres: Structure of a Conserved Golgi Complex-targeting Signal in Coronavirus Envelope Proteins, The Journal Of Biological Chemistry, 2014

[5] Ann (Hui) Liu, in https://animationlab.utah.edu/

[6] K. Pervushin, E. Tan, K. Parthasarathy, X. Lin, F. Jiang, D. Yu, A. Vararattanavech, T. Soong, D. Liu, J. Torres: Structure and Inhibition of the SARS Coronavirus Envelope Protein Ion Channel, PLoS Pathogens, 2009

[7] J. Nieto-Torres, C. Verdiá-Báguena, J. Jimenez-Guardeño, J. Regla-Nava, C. Castaño-Rodriguez, R. Fernandez-Delgado, et al.: Severe acute respiratory syndrome coronavirus E protein transports calcium ions and activates the NLRP3 inflammasome, Virology, 2015

[8] J. Nieto-Torres, M. DeDiego, C. Verdiá-Báguena, J. Jimenez-Guardeño, J. Regla-Nava, R. Fernandez-Delgado, et al.: Severe acute respiratory syndrome coronavirus envelope protein ion channel activity promotes virus fitness and pathogenesis, PLoS Pathogens, 2014

[9] Y. Yang, Z. Xiong, S. Zhang, Y. Yan, J. Nguyen, B. Ng, et al.: Bcl-xL inhibits T-cell apoptosis induced by expression of SARS coronavirus E protein in the absence of growth factors, Biochemical Journal, 2005

[10] J. Jimenez-Guardeño, J. Nieto-Torres, M. DeDiego, J. Regla-Nava, R. Fernandez-Delgado, C. Castaño-Rodriguez, et al.: The PDZ-binding motif of severe acute respiratory syndrome coronavirus envelope protein is a determinant of viral pathogenesis, PLoS Pathogens, 2014

[11] K. Teoh, Y. Siu, W. Chan, M. Schlüter, C. Liu, J. Peiris, et al.: The SARS coronavirus E protein interacts with PALS1 and alters tight junction formation and epithelial morphogenesis, Mol Biol Cell, 2010

[12] O. Wittekindt: Tight junctions in pulmonary epithelia during lung inflammation, Springer Verlag, 2016

An important drug target

In the first part of this series we compared the protein nsp3 from SARS-CoV and SARS-CoV-2 by sequence. Now we delve deeper into the differences between these two proteins and follow through by analyzing the structure of one domain of nsp3 in particular: papain-like protease. This domain is a very relevant drug target because of its ability not only to cleave the polyprotein, but also to remove some of the post-translational modifications our cells use to fight these viruses. Without papain-like protease, the virus would be unable to replicate and spread.

Like the entire nsp3 protein, the papain-like-protease (Pl2pro) domain is localized close to the endoplasmic reticulum’s (ER) membranes. The transmembrane domains hold it in place while the majority of the protein protrudes out of the ER membrane into the cytoplasm.[1]

SARS-CoV genome
Fig 1: Position of the nsp3 gene on the SARS-CoV-1 genome. Nsp3 is separated into 12 domains. Picture by Thomas Splettstoesser, scistyle.com.

Ubiquitin-like-domain 2

We cannot discuss the Pl2pro domain without its little neighbor, which has been speculated to influence protease domain functionality.

Ubiquitin-like domain 2 (Ubl2) is the domain residing directly adjacent to the N-terminus of the Pl2pro catalytic domain. In ubiquitin-specific proteases, the function of comparable ubiquitin-like domains is attributed to substrate recruitment or an increase in catalytic efficiency. Ubl2 seems to be more conserved than Ubl1 across different coronavirus species.[2]

If, in SARS-CoV and Murine coronavirus (MHV), Ubl2 is removed, Pl2pro loses its structural integrity. In addition, Pl2pro is then no longer able to act as an Interferon (IFN) antagonist (see below). However, some studies suggest that the Ubl2 domain in MERS-CoV might not be as essential as originally thought and in cell-based studies of this virus, Pl2pro could retain some of its enzymatic functions without the Ubl2 domain.[3]

To date, several inconsistent roles of Ubl2 have been reported, and its exact function and inner workings remain enigmatic. This is underlined by the fact that there are significant differences between the coronaviruses; as a consequence, we need to exercise caution in applying these findings to SARS-CoV-2.

Combating the Host's Immune System

In the family of coronaviridae, viruses with either one or two Plpro domains can be found, with SARS-CoV and SARS-CoV-2 only having one. Confusingly, this single domain is however still called Pl2pro, even if it is the only papain-like protease domain in the viral genome.

Pl2pro cleaves the polyprotein from nsp1 (leader protein) up to nsp3. While Pl2pro cuts at nsp1-(ELNGG↓AV)-nsp2-(RLKGG↓AP)-nsp3-(SLKGG↓KI)-nsp4, nsp5 (3C-like protease) cleaves the rest of the polyprotein.[2] The cysteine protease Plpro is similar to human ubiquitin-specific proteases (USPs) in that it adopts a right-hand fold with "thumb", "palm" and "finger" subdomains.
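The three junctions quoted above share the same pattern in front of the cut: a leucine, one variable residue, and two glycines. As a minimal sketch, the following Python snippet searches a sequence for that pattern. The pattern is simply read off the three junctions in the text, and the concatenated test string is a toy example, not a real polyprotein sequence.

```python
import re

# Junction sequences quoted in the text, with the cut marker removed.
JUNCTIONS = {
    "nsp1/nsp2": "ELNGGAV",
    "nsp2/nsp3": "RLKGGAP",
    "nsp3/nsp4": "SLKGGKI",
}

# Pattern read off the three junctions: Leu, any residue, Gly, Gly, then the cut.
MOTIF = re.compile(r"L.GG")


def cleavage_positions(sequence):
    """Return the 1-based position of the residue directly before each cut."""
    return [match.end() for match in MOTIF.finditer(sequence)]


# Toy sequence: the three junction snippets glued together (hypothetical).
toy_sequence = "".join(JUNCTIONS.values())
print(toy_sequence, cleavage_positions(toy_sequence))
```

A plain pattern search like this will also report matches that are never cut in the real protein, so it can only suggest candidate sites.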

Different regions of Plpro
Fig. 2: Plpro of nsp3 SARS-CoV (PDB-ID: 5E6J) with the catalytic triad marked in red. The Finger domain (blue), palm domain (light green) and thumb domain (forest green). Picture by Kristopher Nolte

Despite the variations of Pl2pro in different coronaviridae, the same catalytic motif of three amino acid residues is essential for the stability and proteolytic activity of the domain: Cys112 is located in the thumb, His273 and Asp287 are located in the palm subdomain. (The numbers identifying these residues can vary between species.)

nsp3Plpro catalytic Mechanism
Fig 3: Catalytic cycle and proposed chemical mechanism of SARS-CoV PL2pro proteolysis. Active site residues of the catalytic triad (Cys112, His273, Asp287) and oxyanion hole residue Trp107 are shown in black. The peptide substrate is shown in green and a catalytic water molecule is shown in blue. [1] Source: The SARS-coronavirus papain-like protease: Structure, function and inhibition by designed antiviral compounds, Báez-Santos et al.

In addition, Pl2pro has deubiquitinating and deISGylating (removal of ISG15 from target proteins) abilities.[4] Both ubiquitin and ISG15 regulate facets of the immune response: they can stimulate the production of cytokines, chemokines and other IFN-stimulated gene products which have antiviral properties.[6] By removing them, Pl2pro acts as an antagonist of the human immune response. ISG15 is a ubiquitin-like modifier composed of two ubiquitin-like folds that has an essential role in marking newly synthesized proteins during the antiviral response.[3] Post-translational modification by ubiquitin and interferon-stimulated gene 15 (ISG15) is reversed by isopeptide bond hydrolysis. Figure 3 shows a proposed mechanism for the cleavage of isopeptide bonds by SARS-CoV Pl2pro.

Ubiquitin bound to Plpro
Fig. 4: Ubiquitin (light blue) bound to Plpro (green) with the catalytic triad marked red. (PDB-ID: 5E6J) Picture by Kristopher Nolte

An example

Toll-like receptors (TLRs) are an important part of the machinery of the human immune response, which recognizes the pathogen-associated molecular patterns. The ability of the host cell to transduce the so-called Toll-like receptor 7 (TLR7) mediated immune response is diminished (Fig. 5) by Pl2pro as it removes Lys63-linked-ubiquitin from the TNF receptor associated factors TRAF3 and TRAF6. [5]

In addition, SARS-CoV can hamper the antiviral activities of interferon. In combination with a transmembrane (TM) domain, the Pl2pro domain inhibits the STING-mediated activation of interferon expression. PL2pro-TM interacts with TRAF3, TBK1, IKKε, STING and IRF3, the key components assembling a regulatory complex for activation of IFN expression.[5]

Fig. 5: Different ways in which Pl2pro of various coronaviruses interacts with the human immune response. A pointed circle symbol marks the binding of one protein to another. If the binding has a positive effect on the protein, it is marked with a plus. The triangle marks the cleavage of ubiquitin from the target protein. In addition, nsp3 cleaves ISG15 off its target, as shown on the right. Picture by Kristopher Nolte.

Another tool to fight the coronavirus in human cells is the "guardian of the genome", p53. The tumor suppressor protein p53 impedes the replication of SARS-CoV, but the virus fights back with Pl2pro, which binds a p53 degradation stimulator named "RING finger and CHY zinc finger domain-containing protein 1" (RCHY1 for short). Supported by the macro domains in nsp3, this binding enhances the stability of RCHY1 and hence promotes the degradation of p53. In addition, Pl2pro blocks another crucial cellular defense mechanism: the NF-κB pathway, which regulates immune responses to infections. SARS-CoV Pl2pro can stabilize IκBα, an inhibitor of NF-κB.[3]

Although the Pl2pro domains of all coronaviruses suppress the immune response, their targets differ between species. For example, SARS-CoV Pl2pro preferentially processes Lys48-linked polyubiquitin chains, which are markers for proteasomal degradation. MERS-CoV Pl2pro, on the other hand, shows no difference in efficiency between Lys48- and Lys63-linked di-ubiquitin chains. Lys63-linked chains are related to signal transduction cascades of the host immune system. Studies have shown that the specificity of Pl2pro for ubiquitin and ISG15 substrates can be altered with as little as a single amino acid change.[6] However, even though there are differences, it is likely that at least some of the functions are similar for SARS-CoV-2.

Structural comparison

In order to predict Pl2pro function for the novel coronavirus SARS-CoV-2, we start, as in the first part of this series, by aligning the SARS-CoV sequence with the one from SARS-CoV-2. Both domains share a similarity of 82.8% over a length of 313 amino acids. This time, however, we go for a more detailed analysis of the 54 individual differences, which are:

T3R N14I V20V N48N H49S V56Y D60N E66V D75T S77P P95Y G99N S114A V115T L116A L119T E123I K125L P129P A134D A143E N155C H170S L171Y Q173F S179D K181C C191T T195Q T196Q G200K N214E L215Q K216F G218K I221Q C225T D228K A229Q Y232K F240P Y250Q L252E Q254K G255H C259T E262S H274K K278S I284C L289L S293S T300I S308N
(The first letter refers to SARS-CoV, and the second to the amino acid residue in SARS-CoV-2.)
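The substitution list uses a compact "letter, position, letter" notation that is easy to process by script. The following Python sketch parses the entries exactly as printed above and reports those whose two letters are identical (for example V20V). It only flags such entries for a second look and makes no claim about what the correct residues are; all function and variable names are illustrative.

```python
import re

# The 54 entries exactly as listed in the text.
SUBSTITUTIONS = (
    "T3R N14I V20V N48N H49S V56Y D60N E66V D75T S77P P95Y G99N S114A V115T "
    "L116A L119T E123I K125L P129P A134D A143E N155C H170S L171Y Q173F S179D "
    "K181C C191T T195Q T196Q G200K N214E L215Q K216F G218K I221Q C225T D228K "
    "A229Q Y232K F240P Y250Q L252E Q254K G255H C259T E262S H274K K278S I284C "
    "L289L S293S T300I S308N"
).split()

TOKEN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(token):
    """Split e.g. 'T3R' into ('T', 3, 'R')."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"not a substitution: {token!r}")
    return match.group(1), int(match.group(2)), match.group(3)


parsed = [parse_substitution(token) for token in SUBSTITUTIONS]

# Entries where both letters agree do not describe a change and deserve a check.
unchanged = [f"{old}{pos}{new}" for old, pos, new in parsed if old == new]

print(len(parsed), "entries;", "identical on both sides:", unchanged)
```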

Figure 6: SARS-CoV (PDB-ID: 5Y3Q) and SARS-CoV-2 (PDB-ID: 6WZU) Pl2pro overlaid over each other. RMSD = 0.758. Differences in SARS-CoV and SARS-CoV-2 marked in red. Picture by Kristopher Nolte

The mutations are evenly spread over the protein. None of the catalytic triad residues (Cys112, His273, Asp287) is changed, as is to be expected given their conservation in all other coronaviruses. On further investigation, however, six sites differ in the motif which interacts with ubiquitin: S170T, Y171H, F216L, Q195K, T225V, and K232Q. Earlier studies concluded that the mutation of position 232 from glutamine to lysine increases the affinity for ubiquitin at the expense of the de-ubiquitination effectiveness.[6] The kinetics of SARS-CoV-2 nsp3 Pl2pro were therefore studied to test whether its protease domain is less effective in binding ubiquitin than those of nsp3 from SARS-CoV and MERS-CoV.
All three Pl2pro variants cleave more ISG15 than ubiquitin. SARS-CoV has the fastest kinetics of the three viruses. The slower kinetics of SARS-CoV-2 resemble those of MERS-CoV more than those of SARS-CoV, with a 10 times higher turnover rate (kcat) as a deISGylase than as a deubiquitinase.[6]

Besides the kinetics, the Pl2pro’s affinity for different polyubiquitin linkage types was measured. The results show that while SARS-CoV-2 Pl2pro can cut K48-linked polyubiquitin chains, it seems to lack the ability to cut other polyubiquitin chains. The K48-linked chains are also cleaved at a slower rate than by SARS-CoV Pl2pro. In this regard, SARS-CoV-2 distinguishes itself from MERS-CoV, which has the ability to cleave K63 linkages. It has been suggested that the decrease in deubiquitinase effectiveness could contribute to the often-mild symptoms that are a factor in why SARS-CoV-2 has been able to evade our quarantine efforts. But this is mere speculation and a lot more research is needed to resolve the matter.[6]

PL2pro as a drug target

Pl2pro was a potential drug target early on in SARS-CoV-2 research. Hilgenfeld et al. name two major challenges we have to overcome to find a drug targeting Pl2pro. One is that the binding sites are tailor-made to bind glycine residues. The other is that this very specific binding motif is rather ubiquitous in our cells. These two problems make it difficult to find an inhibitor which fits and is specific to Pl2pro. However, scientists found a weak spot: a loop called Blocking Loop 2 (BL2) regulates substrate binding and may be a promising target to inhibit Pl2pro.[2] Naphthalene-based inhibitors, which were earlier proposed to inhibit the BL2 of SARS-CoV, were shown to also inhibit SARS-CoV-2 Pl2pro, in particular an inhibitor called GRL-0617.[6]

For in-silico drug development, it might be prudent to choose high-resolution structures which already have a ligand or inhibitor bound, such as 6yva, 6wuu, 6wx4 or 6yaa. Technically speaking, 6wrh, albeit being a mutant, is one of the highest-quality structures available for SARS-CoV-2 Pl2pro.
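For readers who want to fetch the entries named above, here is a small Python sketch that turns the PDB IDs into download links. The URL pattern of the RCSB file server is an assumption on our part and may change; the snippet only builds the links and does not download anything.

```python
# PDB entries mentioned in the text.
PDB_IDS = ["6yva", "6wuu", "6wx4", "6yaa", "6wrh"]

# Assumed URL pattern of the RCSB file server (not taken from the article).
URL_TEMPLATE = "https://files.rcsb.org/download/{pdb_id}.pdb"


def download_url(pdb_id):
    """Return the assumed download link for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return URL_TEMPLATE.format(pdb_id=pdb_id)


for entry in PDB_IDS:
    print(entry, "->", download_url(entry))
```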

In fact, a lot of research is still required to consolidate our understanding of this protein and its domains. In spite of that, we are making progress in our endeavor to fight this virus - and every step we take is one more to win this fight.

Sources

[1] Báez-Santos YM, St John SE, Mesecar AD. The SARS-coronavirus papain-like protease: structure, function and inhibition by designed antiviral compounds. Antiviral Res. 2015;115:21-38. doi:10.1016/j.antiviral.2014.12.015, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5896749/

[2] Lei J, Kusov Y, Hilgenfeld R. Nsp3 of coronaviruses: Structures and functions of a large multi-domain protein. Antiviral Res. 2018;149:58-74. doi:10.1016/j.antiviral.2017.11.001, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7113668/

[3] Clasman JR, Báez-Santos YM, Mettelman RC, O'Brien A, Baker SC, Mesecar AD. X-ray Structure and Enzymatic Activity Profile of a Core Papain-like Protease of MERS Coronavirus with utility for structure-based drug design. Sci Rep. 2017;7:40292. Published 2017 Jan 12. doi:10.1038/srep40292, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5228125/

[4] Lei J, Hilgenfeld R. RNA-virus proteases counteracting host innate immunity. FEBS Lett. 2017;591(20):3190-3210. doi:10.1002/1873-3468.12827, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7163997/

[5] Chen X, Yang X, Zheng Y, Yang Y, Xing Y, Chen Z. SARS coronavirus papain-like protease inhibits the type I interferon signaling pathway through interaction with the STING-TRAF3-TBK1 complex. Protein Cell. 2014;5(5):369-381. doi:10.1007/s13238-014-0026-3, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3996160/

[6] Freitas BT, Durie IA, Murray J, et al. Characterization and Noncovalent Inhibition of the Deubiquitinase and deISGylase Activity of SARS-CoV-2 Papain-Like Protease [published online ahead of print, 2020 Jun 4]. ACS Infect Dis. 2020;acsinfecdis.0c00168. doi:10.1021/acsinfecdis.0c00168, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7274171/

Crystallography has a problem. Some amino acid side chains in our structures simply can’t be seen in our maps (Fig. 1). Crystallographic maps represent many protein molecules in a crystal lattice, thousands of copies of the same molecule averaged over measurement time and unit cells. So, what happens with inherently flexible regions of our protein? The average of many different conformations leaves us with no map to guide us in modelling our side chain. So, what is the best way to deal with this as a model builder?

Figure 1: The sequence tells us this amino acid is a lysine but there is clearly no density to support this side chain model.

A passionate discussion within the Task Force has resulted in the following options for dealing with this situation:

  1. Set the occupancy of the unresolved atoms to 0
  2. Leave the atoms at full occupancy and allow the B-factors to inflate
  3. Trim the side chains to what can be resolved by the density
  4. Mutate the residue to a Proline, set your computer on fire, and walk away laughing maniacally.

Just to be clear, option four should only be considered in the direst of circumstances. Please consider options one to three before resorting to proline and fire, and even then, only with a computer you own. With that said, what is the best option? Sadly, none are ideal solutions to the problem so let’s discuss. 

Option 1 can be misleading as the residue appears to be present in the model (Fig. 2), despite there being no experimental evidence for it, until you check the occupancy or load the corresponding map with your model which will tell you otherwise. An occupancy of zero also adds no useful information to the model and may even exclude atoms in this position, like opening the airlock and sending it flying out into the vacuum of space.

Figure 2: Option 1, where side chain atoms with an occupancy of zero are marked in Coot by dots on the atoms

Option 2 is effectively the opposite of option 1: provide a full-occupancy side chain in a sensible rotamer conformation and accept the resulting phase bias*. However, this can be equally misleading if the downstream user doesn’t check the B-factors of the side chain, which will be very large, as they represent not only (smaller) displacement but also (larger) disorder. In addition, allowing the B-factor to “explode” is not always an effective way to deal with this problem, as strong negative peaks can still be observed around the side chain in some cases. Another argument for maintaining an occupancy of 1 is that the protein sequence tells us a certain amino acid is present at a position, unless evidence of chemical clipping has been provided (mass spec, for example). Therefore, the atoms must be present in the protein and so should be included in the model, leaving the B-factors to deal with the physics of the situation. Options 1 and 2 both have the advantage of providing a complete set of atoms for downstream use in molecular modelling.

*During refinement our model will always bias the phase calculation which gives us our maps. Ideally, we would like our model to maximally affect the phases when we are confident our model is correct and minimally affect the phases when we are less confident. So, an occupancy of 1 (high confidence) where we observe no peaks in our map (low confidence) will lead to what we call phase bias. This can work both ways: setting the occupancy to 0 (option 1) underestimates the contribution of our model.
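To get a feeling for what a "very large" B-factor in option 2 means, it helps to convert it into a displacement. The sketch below uses the standard isotropic relation B = 8π²⟨u²⟩ and returns the root-mean-square displacement in Å. The B-factor values in the loop are illustrative and not taken from any structure discussed here.

```python
import math


def rms_displacement(b_factor):
    """Convert an isotropic B-factor (in Å^2) to an RMS displacement (in Å).

    Uses B = 8 * pi^2 * <u^2>, so sqrt(<u^2>) = sqrt(B / (8 * pi^2)).
    """
    if b_factor < 0:
        raise ValueError("B-factor must not be negative")
    return math.sqrt(b_factor / (8 * math.pi ** 2))


# Illustrative values only: a well-ordered atom versus an inflated side chain.
for b in (20.0, 80.0, 150.0):
    print(f"B = {b:5.1f} Å^2 -> RMS displacement of about {rms_displacement(b):.2f} Å")
```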

This brings us to option 3: trimming down the side chain to what we can see in the map (Fig. 3). The “make them work for it” option. If a downstream user is paying attention and realises that, for example, the side chain they are looking at is meant to be a lysine, despite the model only having atoms up to Cβ, this should be the least misleading of all the options. The residue should not be mutated to, say, alanine in this case, as that would mean you are wilfully misleading downstream users. Upon realising the atoms are missing, the downstream user can then model a (hopefully sensible) rotamer for their simulations if needed. The downside is that this approach does introduce some negative bias in favour of modelling bulk solvent into this area. Like I said, none of the options are ideal solutions.

Figure 3: Lysine following a haircut.
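In practice, options 1 and 3 are done interactively in a model-building program, but what they do to the coordinate file can be sketched in a few lines of Python. The snippet below works on fixed-width PDB ATOM records: one function removes all atoms beyond Cβ for a chosen residue (option 3), the other keeps them but sets their occupancy to zero (option 1). The toy lysine, its chain "A", residue number 42 and the placeholder coordinates are hypothetical.

```python
# Fixed-width columns of a PDB ATOM record, as 0-based Python slices.
NAME = slice(12, 16)     # atom name
CHAIN = 21               # chain identifier
RESSEQ = slice(22, 26)   # residue number
OCC = slice(54, 60)      # occupancy

KEEP = {"N", "CA", "C", "O", "CB"}  # backbone plus C-beta


def atom_line(serial, name, resname, chain, resseq, occ=1.0, b=30.0):
    """Build a toy ATOM record; the coordinates are placeholders (all zero)."""
    return (f"ATOM  {serial:5d}  {name:<3s} {resname:3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occ:6.2f}{b:6.2f}")


def beyond_cb(line, chain, resseq):
    """True for atoms of the chosen residue that lie beyond C-beta."""
    return (line.startswith("ATOM") and line[CHAIN] == chain
            and int(line[RESSEQ]) == resseq
            and line[NAME].strip() not in KEEP)


def trim_side_chain(lines, chain, resseq):
    """Option 3: drop the atoms beyond C-beta."""
    return [ln for ln in lines if not beyond_cb(ln, chain, resseq)]


def zero_occupancy(lines, chain, resseq):
    """Option 1: keep all atoms, set occupancy beyond C-beta to 0.00."""
    return [ln[:54] + f"{0.0:6.2f}" + ln[60:] if beyond_cb(ln, chain, resseq) else ln
            for ln in lines]


lysine = [atom_line(i + 1, n, "LYS", "A", 42)
          for i, n in enumerate(["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"])]

print([ln[NAME].strip() for ln in trim_side_chain(lysine, "A", 42)])
print([float(ln[OCC]) for ln in zero_occupancy(lysine, "A", 42)])
```

Note that the residue name stays LYS after trimming, in line with the advice above not to rename the residue to alanine.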

So, following this discussion between Nick Pearce, Dale Tronrud, Gianluca Santoni, Andrea Thorn, and me, we recommend option 3 as the best of the available solutions. We believe that the end goal of a crystallographic experiment should be to build atoms justified by the experimental data, i.e. the map, and leave the prediction of unobservable atoms to downstream users. We (crystallographers) are not here to “make it easier for users to avoid thinking about it”. However, after publishing the first iteration of this article, a number of crystallographers made the case for option 2 on Twitter, and a poll of those involved resulted in 53.8% in favour of option 2 (Figure 4), so the matter is still far from resolved.

Figure 4: Twitter poll for options 1 to 4.

However, it’s nice to know that if we really can’t agree on the best method, we can at least agree that it is not option 1, and there's always the fallback plan of option 4: watching the PDB burn if we get desperate.

Figure 5: Option 4. Sorry not sorry.

COVID-19 is caused by the new coronavirus SARS-CoV-2. This virus has a characteristic virus hull featuring surface proteins which are commonly called “spikes”. Protruding from the viral hull like “spikes of a crown”, they give the coronavirus its name (corona = crown). These proteins make the first contact with human cells and are akin to keys that use a human receptor called “angiotensin-converting enzyme 2” (ACE2) as a backdoor to gain access to and infect the cell.

Fig. 1. SARS-COV2 Animated picture. Numerous spike proteins, coloured in green, protrude from the virus hull which is coloured in brown. Spikes enable the coronavirus to invade human epithelial cells. Image: Thomas Splettstoesser; www.scistyle.com

1. Function of ACE2

ACE2 is a membrane protein which is anchored in the cell membrane of human epithelial cells. This type of cell can be found on the surface of lung, intestine, heart and kidney tissue. As a type I membrane protein, its primary function is to take part in maturation of angiotensin, a peptide hormone which controls vasoconstriction and blood pressure. ACE2 can be compared to a lock which can be unlocked by the coronavirus spike protein. The virus can then enter the cell and hijack its functions to reproduce itself, thus causing the COVID-19 infection which poses a serious danger to humanity, especially for older people and people with pre-existing conditions. For this reason, one approach to combating SARS-CoV-2 is to target and inhibit the spike to prevent infection. In order to do so, knowledge of the structural features of the spike and its interaction processes with ACE2 is indispensable. (Further information about how macromolecular structures are visualized can be found on our homepage: https://insidecorona.net/visualizing-macromolecular-structures/)

2. Spike: Structure and Fusion Mechanism

Fig. 2. Image of a spike protein (green) protruding out of the viral envelope (brown). This image shows the structure of a spike protein divided into several subdomains. Each subdomain has a specific function necessary for binding and fusion. The transmembrane domain anchors the spike protein in the virus membrane. Heptad repeats 1 and 2 and the fusion peptide play key roles in mediating the fusion process, and with the RBD, the virus makes contact with human cells. Note that only “stumps” of carbohydrate chains are shown. Image: Thomas Splettstoesser; www.scistyle.com

The spike protein has a trimeric shape comprising three identical monomeric structural elements. Each of these monomers can fold out akin to a modern car key with a fold-out key element with specific teeth on its surface. This fold-out key element is the so-called “receptor binding domain” (RBD). The spike can only interact with ACE2 when its RBD is in a folded-out position, exposing its teeth, or “receptor binding motif” (RBM). As the name suggests, it comprises a motif of different amino acids which can then bind and unlock the ACE2 receptor. This key-lock mechanism triggers a cascade of events initiating fusion with the host cell. First, protein scissors are recruited to the binding site. These scissors (furin and transmembrane serine protease 2) cleave the spike protein for subsequent activation. The active spike molecule then rearranges itself to form a long structural “hook” (formed of HR1/HR2 and FP, see Fig. 2) that brings the epithelial cell membrane and the viral membrane into close proximity for fusion. Once the fusion is completed, the path is clear for the virus to transfer its genome, encoded in ribonucleic acid (RNA), into the host cell. This successful transfer then enables the virus to multiply and finally spread from cell to cell, causing COVID-19 in its wake.

Fig. 3. This image shows a spike protein in complex with the human ACE2 receptor (PDB: 6vsb/6lzg). Left: The structure of a spike protein coloured in orange in complex with the human ACE2 receptor coloured in light orange. The white box shows the interaction site which is shown enlarged in the image on the right. Right: The interaction site between spike and ACE2. Spike's "receptor binding domain (RBD)" includes a "receptor binding motif (RBM)" whose amino acids interact with those of the human receptor through hydrophilic interactions. These amino acids are shown as sticks protruding from the RBM and ACE2. Image: Sabrina Stäb

3. Evading the Immune System with Carbohydrate Chains

The human immune system normally recognizes the surface proteins of foreign organisms such as viruses or bacteria and reacts with an immune response to combat them. Spike proteins are such surface proteins, but because of structural peculiarities, the coronavirus evades both the innate and the adaptive human immune system. The secret of these structural peculiarities is the N-glycans. These are long carbohydrate chains which sit on the spike’s surface. Each spike comprises 66 N-glycans forming a protective shield around the protein. Hence the human immune system has problems recognizing spikes and identifying the coronavirus as an enemy.

Fig. 4. Ribbon diagrams of a spike trimer with N-glycans on its surface coloured in cyan (PDB: 6vxx). In image a, the spike protein is shown sideways and in b, the trimer can be seen from above. Unfortunately, neither X-ray crystallography nor cryo-EM can resolve long carbohydrate chains, so the structures of the chains shown in Figure 4 contain a maximum of three sugar monomers, while in most cases, the carbohydrate chains are much longer, covering most of the contact surfaces of the upper spike protein. Image: Sabrina Stäb

The COVID-19 pandemic has a massive impact on our lives, our health and the global economy. Scientists around the world are trying to develop new drugs to combat the virus. Since the spike plays a critical role in the infection process, it is a prime target for drug development against the pandemic. One drug approach to inhibit the interaction between spike and the ACE2 receptor is to cap the spike protein using antibodies. Antibodies are proteins, normally produced by the human immune system to fight viruses. The idea is to treat patients with antibodies that cap the RBD of spike, thus preventing interactions with ACE2. This would lead to a nonfunctional spike, blocking the coronavirus from entering the cell (the key would no longer fit the lock). Another approach includes the development of small molecules that target and inactivate the protein scissor transmembrane serine protease 2 (see chapter 2), as the spike’s functionality depends on its cleavage activity. Since the spike protein decorates the virus hull, it could even be part of a potential vaccine. For this reason, the spike protein could also become the key in the molecular fight against COVID-19.

Overview

The surface proteins, also called the “spike” or S-proteins, protrude from the viral envelope of SARS-CoV-2 like “spikes of a crown”, thus giving the coronavirus its name. They mediate entry into the host cell by binding to a cellular receptor called angiotensin-converting enzyme 2 (ACE2), triggering a cascade of events leading to membrane fusion and entry. The spike protein is formed by three identical monomers, each consisting of the two subunits S1 and S2. Subunit S1 comprises a receptor binding domain (RBD), which interacts with ACE2 on human epithelial cells. ACE2 is a type I membrane protein expressed in lungs, heart, kidneys, and intestines, and takes part in maturation of angiotensin, a peptide hormone which controls vasoconstriction and blood pressure.

Fig. 1. Image of a spike glycoprotein (yellow) protruding out of the viral envelope. Spike subunits S1 and S2 can be divided into several subdomains. The S1 subunit comprises an N-terminal domain (NTD) followed by the receptor binding domain (RBD). The S2 subunit is mainly composed of a fusion peptide (FP) and two heptad repeats (HR1 and 2) which play a key role in mediating fusion with the host cell. Spike proteins are anchored in the virus envelope via a transmembrane domain (TM) and the cytoplasmic tail (CP), both of which have not yet been structurally determined – so their depiction in this image is an educated guess. Note that only “stumps” of carbohydrate chains are shown. Image: Thomas Splettstoesser; www.scistyle.com

Binding Mechanism

To engage the ACE2 receptor, the RBD of S1 undergoes a hinge-like conformational rearrangement that transiently exposes the residues necessary for receptor binding. The heptad repeat 1 and 2 domains (HR1 and HR2) play a key role in mediating fusion and entry (see Fig. 1). The exact mechanism of entry and fusion of SARS-CoV-2 with and into the host cell is still not fully established, but it is likely that the fusion mechanism is similar to that of SARS-CoV. The putative mechanism is that after the RBD binds to the ACE2 receptor, the S2 subunit binds to the host membrane via a fusion peptide (FP) and changes conformation to trigger the association between the HR1 and HR2 domains to form the “fusion core”, which brings the viral and cellular membranes into close proximity for fusion.

The structure of the RBD in complex with the human ACE2 receptor reveals that the interaction occurs via the spike protein RBD and the ACE2 N-terminal peptidase domain. The RBD consists of a twisted five-stranded antiparallel β-sheet (β1, β2, β3, β4 and β7) forming the core together with short connecting α-helices, β-sheets and loops. These short α-helices, β-sheets and loops constitute the receptor binding motif (RBM), which is located as an extended insertion between two β-strands (β4 and β7) and contains most of the ACE2-contacting residues. The ACE2 N-terminal peptidase domain consists of two lobes that form the substrate binding site. The contact between the RBM and ACE2 is made at the bottom side of the ACE2 small lobe, with a concave outer surface in the RBM accommodating the N-terminal helix of ACE2 and thus generating an interface of 1687 Å² (see Fig. 2).

Fig. 2. This image shows the spike RBD/RBM in complex with the ACE2 receptor (PDB: 6lzg). a. The complex between the RBD (yellow-orange) and the small lobe of ACE2 (cyan) is shown. b. The interface of the RBM (yellow-orange) and the N-terminal α-helix of ACE2 (cyan) comprises 15 hydrophilic interactions (dashed lines). Image: Sabrina Stäb

The RBM/ACE2 interface contains a network of different interactions, including hydrophilic interactions with 13 hydrogen bonds and 2 salt bridges, which are shown in Fig. 3. Key residues for receptor binding include the amino acids Leu-455, Phe-486, Gln-493, and Asn-501. The RBD residues Gln-493 and Asn-501 form hydrogen bonds with the respective ACE2 residues Glu-35 and Tyr-41. Phe-486 interacts with the ACE2 amino acids Gln-24, Leu-79 and Tyr-83, and makes contact with Met-82 by van der Waals forces. Another important interaction takes place between the non-polar RBD residue Leu-455 and ACE2 Asp-30, Lys-31 and His-34. Outside the RBM, Lys-417 and the ACE2 residue Asp-30 contribute to receptor binding by forming a salt bridge. Binding of the host cell receptor by subunit S1 destabilizes the prefusion trimer and triggers a structural rearrangement resulting in cleavage and shedding of the S1 subunit and transition of the S2 subunit to a stable postfusion conformation.

Fig. 3. This image shows some of the interactions between RBM and ACE2 (PDB: 6lzg). The table on the right side lists polar interactions between amino acids of the RBM and ACE2, such as hydrogen bonds and salt bridges. The images on the right (a-d) show 13 hydrogen bonds present in this interface. The RBM is coloured in yellow-orange and ACE2 in cyan. Image: Sabrina Stäb

The Role of Glycosylation

The surface of coronavirus spike proteins is densely decorated with heterogeneous N-linked glycans protruding from the trimeric surface. SARS-CoV-2 spike comprises 22 N-linked glycosylation sequons per protomer. N-linked glycans play a key role in proper protein folding and in priming for fusion by host proteases. Glycans can also shield amino acid residues and other epitopes from recognition by immune cells and antibodies, so glycosylation enables the coronavirus to evade both the innate and adaptive immune responses. It may also play a role in binding to the host cell. Unfortunately, neither X-ray crystallography nor cryo-EM can resolve long carbohydrate chains, so the structures (below) contain a maximum of three sugars. In most cases, the carbohydrate chains are much longer, covering most of the contact surfaces of the upper spike protein.
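
To make the term "sequon" more concrete: an N-linked glycosylation sequon is commonly defined as Asn-X-Ser/Thr, where X can be any amino acid except proline. The short Python sketch below scans a one-letter protein sequence for this motif. The function name and the toy sequence are made up for illustration; to reproduce the count of 22 one would have to run it on the actual spike protomer sequence, which is not included here.

```python
import re

# N-linked sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead makes sure overlapping motifs are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return the 1-based positions of the Asn in each N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]

# Toy sequence for illustration only (not the spike sequence).
toy_sequence = "MNATKNPSGNNTSA"
print(find_sequons(toy_sequence))
```

Note that a sequon only marks a potential glycosylation site; whether a glycan is actually attached has to be determined experimentally.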

Fig. 4. This image shows ribbon diagrams of a spike trimer with N-glycans on its surface coloured in cyan (PDB: 6vxx). In panel a, the spike protein is shown from the side and in panel b, the trimer can be seen from above. Image: Sabrina Stäb

Summary

The spike protein acts as a key molecule for fusion and entry, so development of drugs directly targeting this protein may be essential to contain the COVID-19 pandemic. "Capping" the spike proteins with antibodies would interrupt infection. Binding of antibodies to the S1 RBD could inhibit the RBD-ACE2 interaction, which in turn could prevent fusion with the host cell. In addition, in lung cells, spike functionality depends on furin-mediated pre-cleavage at the S1/S2 site for subsequent activation by TMPRSS2 (transmembrane serine protease 2). Thus, inhibitors of either furin or TMPRSS2 could also be considered as a potential treatment for COVID-19. As the spike protein decorates the virus hull, it could also be part of a vaccine. All of this makes the spike protein a major target in the molecular fight against COVID-19.


Further reading:

Chemical & Engineering News: Adding the missing sugars to coronavirus protein structures

https://www.nature.com/articles/s41423-020-0426-7

https://www.nature.com/articles/s41586-020-2180-5

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7102599/

https://www.dpz.eu/en/home/single-view/news/die-vermehrung-von-sars-coronavirus-2-im-menschen-verhindern.html

The world holds its breath as the novel coronavirus continues to spread across the globe, bringing our lives to a halt. We have gathered a lot of knowledge about the virus, but there are still many gaps to fill. The non-structural protein 3 (nsp3) represents one of these gaps in our knowledge. As it is the largest protein encoded by the coronavirus genome, untangling its structure and function poses a huge task.

However, we can glean some knowledge about the specific function of SARS-CoV-2 nsp3 by looking at the virus's subfamily, Orthocoronavirinae. As related viruses share some common traits, academics were not completely unprepared when SARS-CoV-2 came. In the background, while only very few people were worried about a new coronavirus, scientists around the world had been investigating the invisible enemy for decades. Building on this past work, we look at the functions of proteins from other coronaviruses, like Murine Hepatitis Virus (MHV) and SARS-CoV, to learn more about how best to fight SARS-CoV-2.

Fig. 1: The crystal structure of the papain-like protease of SARS-CoV-2 nsp3 (PDB-ID: 6w9c). Picture by Kristopher Nolte.

The gene which produces nsp3 lies on the open reading frame 1a (ORF1a), which encodes polyprotein 1a. The sequence for nsp3 of SARS-CoV is 1922 amino acids long and sandwiched between nsp2 and nsp4. With its papain-like protease domain, nsp3 cleaves not only itself from the polyprotein but also nsp1 and nsp2. In coronaviruses, 18 different domains have been found in nsp3. Each virus type has 10 to 16 of these, out of which eight domains and two transmembrane regions form the conserved part of nsp3, which can be found in every coronavirus known to date [1]:

  1. Ubiquitin-like domain 1 (Ubl1)
  2. Ubiquitin-like domain 2 (Ubl2)
  3. Papain-like protease (PlPro)
  4. Macro domain / X domain (Mac)
  5. Hypervariable region / Glu-rich acidic domain (HVR)
  6. Transmembrane region 1 (TM1)
  7. Transmembrane region 2 (TM2)
  8. Ectodomain / Zinc finger domain (3ecto)
  9. Nidovirus-conserved domain of unknown function (Y1)
  10. Coronavirus-specific carboxyl-terminal domain (CoV-Y)

To start our investigation on SARS-CoV-2 related structural data, we will look into the protein sequences of SARS-CoV and SARS-CoV-2 to learn where they are similar and where they differ.

Genetic Comparison of SARS-CoV and SARS-CoV-2

The nsp3 of SARS-CoV has 16 domains which span 1922 amino acids. The nsp3 protein of SARS-CoV-2 is a bit longer at 1945 amino acids. When compared to each other, there is an overall similarity of 75.97% [2]. In addition to the ten conserved domains, the nsp3 gene of SARS-CoV-2 codes for four further domains:

  1. Nucleic-acid-binding domain (NAB)
  2. Betacoronavirus-specific marker domain (βSM)
  3. Domain preceding Ubl2 and PL2pro (DPUP)
  4. Amphipathic helix 1 (AH1)

Fig. 1: Position of the nsp3 gene on the SARS-CoV-1 genome. Nsp3 is separated into 12 domains. Picture by Thomas Splettstoesser, scistyle.com.

The two domains at the N-terminal end, Ubl1 and HVR, have an alignment of 79% and 64%, respectively. Across the Coronaviridae, these domains tend to be poorly conserved in sequence, although Ubl1 still adopts the expected conserved fold.[4] Whether this also holds for SARS-CoV-2 could be analysed by comparing the sequence alignment with the structural similarity. It is unsurprising that the "hypervariable region" lives up to its name and shows the worst alignment of all. In the related MHV nsp3, this domain is dispensable for replication.[5]
It has been speculated that the Mac1 domain functions as an ADP-ribose 1″-phosphatase; however, the effects of mutations in this region differ from virus to virus.[4] As a result, it is difficult to judge what significance the poor alignment of this domain will have for our understanding of SARS-CoV-2 without further research.

Table 1: The domain amino acid ranges for SARS-CoV-1 were taken from Lei et al., 2018 [1]. The ranges for SARS-CoV-2 were determined by taking the amino acid ranges of CoV-1 and using BLAST [2] to search for the best alignment of the domain sequences. Picture by Kristopher Nolte
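
The procedure described in the caption of Table 1 can be sketched in a few lines of Python: cut a domain out of the full nsp3 sequence using its residue range, then compare it with its aligned counterpart. This is only a minimal illustration and not the script used for the table (those are in the Git repository [3]); the function names and toy sequences are hypothetical, and the alignment itself is assumed to come from a tool such as BLAST. Conventions differ on whether gap columns count in the denominator; this sketch skips them, so the values may differ slightly from what a given tool reports.

```python
def domain_slice(sequence, start, end):
    """Cut out a domain using 1-based, inclusive residue numbers."""
    return sequence[start - 1:end]

def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over all gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Toy data for illustration only (not real nsp3 sequences or ranges).
toy_protein = "ABCDEFGH"
print(domain_slice(toy_protein, 3, 5))

aligned_cov1 = "MKT-LLV"
aligned_cov2 = "MKSALLV"
print(round(percent_identity(aligned_cov1, aligned_cov2), 2))
```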

The Mac1 domain, also known as the X-domain, is followed by two macrodomains which were originally called "SARS-CoV Unique domains" (SUD-N and SUD-M), but were renamed when they were found not to be unique to SARS-CoV. It has since been observed that only Mac3 plays an essential role in viral RNA replication[6], which could explain why Mac3 is one of the most conserved domains in the alignment of SARS-CoV and SARS-CoV-2.

Pl2Pro and its neighbouring domain Ubl2 show some of the highest sequence alignments of all domain comparisons. This could be explained by their essential function in cleaving nsp3 from the polyprotein.
Little is known about the domains following Pl2Pro, and our current structural knowledge is limited to a nuclear magnetic resonance (NMR) structure of NAB. While the structure and function of Y1 and CoV-Y from SARS-CoV-2 are currently unknown, their sequence, which comprises roughly a fifth of nsp3, is highly conserved in all coronaviruses.

Fig. 2: The location of the aligned domains of SARS-CoV (abbreviated CoV-1) and SARS-CoV-2 (abbreviated CoV-2) is shown over the length of nsp3 (TM1 = 1, TM2 = 2, AH1 = A). Picture by Tim Scharf.

In the second part of the series "Untangling nsp3 of SARS-CoV-2", we will delve deeper into some structures of nsp3 of SARS-CoV-1 and SARS-CoV-2 and try to find out how the differences in sequence may have influenced the structure of the protein. For further in-depth reading on the topics discussed here, I highly recommend the sources below.

Table 2: For each domain and its respective counterpart in SARS-CoV-2, a BLAST search was conducted to find matching PDB-IDs. Last update: 18.05.2020. The scripts and the PDB data can be found in our Git repository [3].
Picture by Kristopher Nolte

Sources

  • [1] Lei J, Kusov Y, Hilgenfeld R. Nsp3 of coronaviruses: Structures and functions of a large multi-domain protein. Antiviral Res. 2018 Jan;149:58-74. doi: 10.1016/j.antiviral.2017.11.001. Epub 2017 Nov 8. PMID: 29128390; PMCID: PMC7113668.
  • [2] Madden T. The BLAST Sequence Analysis Tool. 2002 Oct 9 [Updated 2003 Aug 13]. In: McEntyre J, Ostell J, editors. The NCBI Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2002-. Chapter 16. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21097/
  • [3] https://github.com/thorn-lab/coronavirus_structural_task_force
  • [4] Neuman BW. Bioinformatics and functional analyses of coronavirus nonstructural proteins involved in the formation of replicative organelles. Antiviral Res. 2016;135:97-107. doi: 10.1016/j.antiviral.2016.10.005.
  • [5] Hurst KR, Koetzner CA, Masters PS. Characterization of a critical interaction between the coronavirus nucleocapsid protein and nonstructural protein 3 of the viral replicase-transcriptase complex. J Virol. 2013;87:9159-9172.
  • [6] Kusov Y, Tan J, Alvarez E, Enjuanes L, Hilgenfeld R. A G-quadruplex-binding macrodomain within the "SARS-unique domain" is essential for the activity of the SARS-coronavirus replication-transcription complex. Virology. 2015 Oct;484:313-22. doi: 10.1016/j.virol.2015.06.016. Epub 2015 Jul 3. PMID: 26149721; PMCID: PMC4567502.
