The novel coronavirus SARS-CoV-2 incorporates various structural proteins in its protective coat. In the search for a drug target against the spreading pandemic, much scientific research focusses on the characteristic spike glycoprotein as a therapeutic target. But apart from the spikes, several other structural proteins were found to decorate the virus hull, of which the envelope protein (E protein) is the smallest, consisting of only 75 amino acids. Even though it is an integral membrane protein, the envelope protein is also localized in the host ER, Golgi, and ERGIC (ER-Golgi intermediate compartment), where it is essential for virus formation.
Interestingly, research on this protein can shed light on the origin of the novel coronavirus, which currently dominates everyday life all over the globe. Sequence comparisons of several different envelope proteins strengthen the assumption that SARS-CoV-2 may originate from Bat-CoV or Pangolin-CoV due to a high sequence homology. The E protein of SARS-CoV-2's "older brother" SARS-CoV exhibits a nearly identical sequence with 91% homology as well and has been structurally determined based on nuclear magnetic resonance (NMR) data. Yet, until now, solving the 3D structure of SARS-CoV-2 E protein has turned out to be quite challenging, and hence no experimental structure is available for the new coronavirus.
We, as structural biologists, aim to uncover and refine the structures of as many of the novel virus's proteins as possible. But as long as no structure of SARS-CoV-2 E protein has been solved, this can only be achieved by comparing it to the existing structures of the SARS-CoV envelope protein.
The topology of SARS-CoV E protein is mainly separated into three domains: a short hydrophilic N-terminus, which has an identical sequence in SARS-CoV-2 and works as a Golgi-targeting signal; a long, mainly hydrophobic transmembrane domain (TMD); and a long hydrophilic C-terminal domain. Studies on whether the C- and N-termini are luminal or cytoplasmic have produced different results, suggesting that the E protein's topology could differ depending on its multiple functions.
The E protein of SARS-CoV comprises several interesting structural features: a long α-helix with amphipathic parts forms the transmembrane domain (TMD). The C-terminus, however, incorporates a short α-helix which is believed to be in a dynamic equilibrium with a less abundant β-coil-β motif. Both helices are connected by a turn. The β-coil-β motif with a conserved proline residue (Pro-54) has been proposed to function as a Golgi-targeting signal, and to switch its conformation in order to alter the E protein's function in the host cell. Furthermore, the C-terminus contains a PDZ-binding motif (PBM) at residues 73-76. This PBM differs slightly among coronaviruses, but a DLLV motif is conserved in the E proteins of SARS-CoV, Bat-CoV, and SARS-CoV-2. Unfortunately, there are no PDB structures available that exhibit either the β-coil-β motif or the PBM.
The E protein comes in two different forms. Apart from a monomeric structure, the protein also oligomerizes to form a pentameric viroporin in the host cell's Golgi membrane. Whether the E proteins that are embedded in the viral hull are pentamers or monomers is not yet clear. Oligomerization is induced by the amphipathic α-helix of the TMD and is proposed to be mainly mediated by residue Val-25, with residue Asn-15 being slightly involved. Both residues are conserved in SARS-CoV-2 as well. To anchor the pore in the Golgi membrane, the hydrophobic amino acids of the TMD orient towards the phospholipids. Additionally, basic, positively charged residues interact with the negatively charged phospholipids via electrostatic interactions.
Other structural variants are obtained by posttranslational modifications, which have been detected in the E protein of SARS-CoV and other coronaviruses. Palmitoylation is the addition of palmitic acid to cysteine residues, which increases the protein's hydrophobicity. Hence, the palmitoylation of E protein assists in membrane anchoring and probably aids Golgi targeting. Ubiquitination of the E protein might function as negative regulation of E protein levels. It has been shown that an optimal amount of E protein in the host cell is important for the successful production of new viruses. Another modification, namely glycosylation, adds oligosaccharide fragments to asparagine residues in a certain motif (Asn-X-Ser/Thr), which is also conserved in E protein. In SARS-CoV, residue Asn-66 embedded in the motif Asn-Ser-Ser was shown to be glycosylated. This may help to recruit chaperone proteins of the host cell to aid in the correct folding of newly synthesized viral proteins as well as in defense against the host immune system. Experimental data suggest that glycosylation of Asn-66 might also promote E protein's monomeric functions, as it prevents oligomerization.
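The Asn-X-Ser/Thr motif mentioned above is easy to search for in a sequence. The following is a minimal sketch, not a validated glycosylation predictor: the function name `find_sequons` and the toy peptide are hypothetical, and the exclusion of proline at position X is a common convention that we assume here rather than something stated in the text.

```python
import re

# Asn-X-Ser/Thr. Assumption: X is any residue except proline, as is
# conventionally done for N-glycosylation sequons. The lookahead lets
# overlapping motifs (e.g. "NNTS") be reported individually.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each N-X-S/T motif."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


# Hypothetical toy peptide, NOT the real E protein sequence.
toy_peptide = "MKNSSAVNPTLLNGTQ"
print(find_sequons(toy_peptide))
```

A hit only means that the motif is present; whether a given asparagine is actually glycosylated has to be established experimentally, as was done for Asn-66.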
To understand a molecule's biological function is the main goal of experimental structure determination. A viral protein can be targeted by drugs best if the atomic structure is known. The envelope protein has various structural conformations and thus multiple functions, both as a monomer and as a pentamer.
The E protein comprises a Golgi-targeting signal in the β-coil-β motif of the C-terminus and another one in the N-terminal domain. Additionally, palmitoylation is believed to be involved in this function. Accordingly, after being translated at the ER, the E protein is localized to the Golgi membrane. From there, the virus acquires the membrane for a new viral envelope. Once the protein is located at the Golgi, one of its main functions as a monomer is in viral assembly, that is, the process of gathering all the viral macromolecules (proteins and the RNA genome) to form a virus-like particle. During this assembly, the virus-like particle buds into the lumen of the ERGIC and follows the host cell's secretory pathway. Several experiments confirm the involvement of the envelope protein, together with the membrane protein (M), in this process. It has been proposed that the E protein induces membrane curvature and scission, whereas the M protein may coordinate viral assembly. Nevertheless, SARS-CoV-infected cells still produce virus-like particles in the absence of E protein, but virus trafficking to the cell surface and viral secretion are hampered, resulting in a lower number of mature virions, an atypical morphology and a higher rate of propagation-incompetent virions. Further investigation will be necessary to analyze the exact mechanism behind the membrane formation of virions.
After finding its way through the secretory pathway, the mature virion is released from the host cell. The process of detaching from the host membrane is known as scission and is coordinated either by the virus's own scission proteins or by the host cell's scission machinery (called ESCRT). Which one is the case for SARS-CoV-2 is still unclear. Infected cells lacking the scission machinery exhibit a “beads-on-a-string” morphology, with the virions being stuck to the host membrane in an elongated shape. This morphology was found in influenza-infected cells lacking the M2 protein, which indicates that M2 is involved in this scission process. Given that SARS-CoV E protein is suggested to be functionally equivalent to M2, due to similar structural features, the E protein is proposed to be involved in the scission process as well.
While located at the Golgi, some of the SARS-CoV E proteins oligomerize and form a pentameric viroporin. These pores of SARS-CoV E protein function as ion channels. They mainly favor the transport of Na+ and K+, but were also found to be permeable to Ca2+ ions and possibly to H+ ions. Even though the primary purpose of transporting cations is not yet clear, Ca2+ is proposed to trigger the inflammatory response seen in acute respiratory distress syndrome.
Residue Asn-15 has been suggested to act as a "filter" for this ion selectivity, which can further be affected by the charge of the membrane's lipid head groups. Deletion of the envelope protein in its pentameric form demonstrates that ion channel activity is not essential for viral replication, but that its loss attenuates virulence.
Interactions of viral proteins with host cell proteins de-regulate many physiological processes. In patients suffering from SARS-CoV infections, these de-regulating protein-protein interactions greatly contribute to pathogenesis. Some of the observed symptoms are also present in patients infected with SARS-CoV-2.
Interactions of the envelope protein with proteins of the host cell are mediated by its PDZ-binding motif (PBM) at the very end of the C-terminus. The motif binds to the PDZ domain of adaptor proteins, which are subsequently bound by other cellular proteins, activating a signaling cascade that may result in pathogenesis. Some of these interactions were proposed or even proven to induce symptoms like lymphopenia, changes in fluid volume, blood pressure, and water homeostasis, as well as tissue damage, edema and acute respiratory distress syndrome (ARDS), due to an overexpression of inflammatory cytokines (which are also regulated by the leader protein nsp1). Another protein-protein interaction was found to disrupt tight junctions of pulmonary epithelial cells in the lungs. This eventually results in an epithelial barrier failure and virions breaking through the alveolar wall, causing a systemic infection. O. Wittekindt writes: "The breakdown of the epithelial barrier is a hallmark in respiratory distress syndromes (...)" Furthermore, the ion channel activity of the E protein activates the inflammatory pathway by channeling Ca2+, resulting in lung damage in infected mice. Inhibition of the viroporin by hexamethylene amiloride (HMA) reduces the activation of the inflammasome, which makes the ion channel of E protein a potential therapeutic target. Additionally, as a part of the host cell's viral defense, the ER stress response is activated once the protein folding capacity of the ER is overloaded by additional expression of viral proteins. This can lead to apoptosis of the host cell. However, experiments confirm that the E protein contributes to pathogenesis by suppressing the ER stress response to maintain the survival of the host cell.
As a potential target for drug treatment, protein-protein interactions of the E protein are quite promising. Its PBM domain can bind cellular proteins that are involved in pathogenesis. Experimental truncation of this domain shows that it may be possible to find a live vaccine with a mutated but intact PBM and thus attenuated pathogenicity. Identifying more interacting partners could, however, enable a more targeted therapy. The absence of E protein furthermore leads to reduced viral titers, crippled viral maturation, and propagation-defective progeny, making E protein-deficient virions a potential vaccine candidate as well.
In conclusion, one could say that the E protein of SARS-CoV-2 is another valuable drug target. While the protein's "older brother", SARS-CoV E protein, gives us much insight into its function, an experimental structure determination of SARS-CoV-2 E protein would be highly desirable. Until then, the envelope protein of SARS-CoV-2 remains a small but mysterious structure.
 J. Nieto-Torres, M. DeDiego, E. Álvarez, J. Jiménez-Guardeño, J. Regla-Nava, M. Llorente, et al.: Subcellular location and topology of severe acute respiratory syndrome coronavirus envelope protein, Virology, 2011
 M. Bianchi, D. Benvenuto, M. Giovanetti, S. Angeletti, M. Ciccozzi, S. Pascarella: Sars-CoV-2 Envelope and Membrane proteins: differences from closely related proteins linked to cross-species transmission, Preprint, 2020
 D. Schoeman, B. Fielding: Coronavirus envelope protein: current knowledge, Virology Journal, 2019
 Y. Li, W. Surya, S. Claudine, J. Torres: Structure of a Conserved Golgi Complex-targeting Signal in Coronavirus Envelope Proteins, The Journal Of Biological Chemistry, 2014
 Ann (Hui) Liu, in https://animationlab.utah.edu/
 K. Pervushin, E. Tan, K. Parthasarathy, X. Lin, F. Jiang, D. Yu, A. Vararattanavech, T. Soong, D. Liu, J. Torres: Structure and Inhibition of the SARS Coronavirus Envelope Protein Ion Channel, PloS Pathogens, 2009
 J. Nieto-Torres, C. Verdiá-Báguena, J. Jimenez-Guardeño, J. Regla-Nava, C. Castaño-Rodriguez, R. Fernandez-Delgado, et al.: Severe acute respiratory syndrome coronavirus E protein transports calcium ions and activates the NLRP3 inflammasome, Virology, 2015
 J. Nieto-Torres, M. DeDiego, C. Verdiá-Báguena, J. Jimenez-Guardeño, J. Regla-Nava, R. Fernandez-Delgado, et al.: Severe acute respiratory syndrome coronavirus envelope protein ion channel activity promotes virus fitness and pathogenesis, PLoS Pathogens, 2014
 Y. Yang, Z. Xiong, S. Zhang, Y. Yan, J. Nguyen, B. Ng, et al.: Bcl-xL inhibits T-cell apoptosis induced by expression of SARS coronavirus E protein in the absence of growth factors, Biochemical Journal, 2005
 J. Jimenez-Guardeño, J. Nieto-Torres, M. DeDiego, J. Regla-Nava, R. Fernandez-Delgado, C. Castaño-Rodriguez, et al.: The PDZ-binding motif of severe acute respiratory syndrome coronavirus envelope protein is a determinant of viral pathogenesis, PLoS Pathogens, 2014
 K. Teoh, Y. Siu, W. Chan, M. Schlüter, C. Liu, J. Peiris, et al.: The SARS coronavirus E protein interacts with PALS1 and alters tight junction formation and epithelial morphogenesis, Mol Biol Cell, 2010
 O. Wittekindt: Tight junctions in pulmonary epithelia during lung inflammation, Springer Verlag, 2016
In the first part of this series we compared the protein nsp3 from SARS-CoV and SARS-CoV-2 by sequence. Now we delve deeper into the differences between these two proteins and follow through by analyzing the structure of one domain of nsp3 in particular: the papain-like protease. This domain is a very relevant drug target because of its ability not only to cleave the polyprotein, but also to remove some of the post-translational modifications our cells use to fight these viruses. Without papain-like protease, the virus would be unable to spread COVID-19.
Like the entire nsp3 protein, the papain-like-protease (Pl2pro) domain is localized close to the endoplasmic reticulum’s (ER) membranes. The transmembrane domains hold it in place while the majority of the protein protrudes out of the ER membrane into the cytoplasm.
We cannot discuss the Pl2pro domain without its little neighbor, which has been speculated to influence protease domain functionality.
Ubiquitin-like domain 2 (Ubl2) is the domain residing directly adjacent to the N-terminus of the Pl2pro catalytic domain. In ubiquitin-specific proteases, the function of comparable Ubl2 domains is attributed to substrate recruitment or an increase in catalytic efficiency. This ubiquitin-like domain seems to be more conserved than Ubl1 across different coronavirus species.
If, in SARS-CoV and Murine coronavirus (MHV), Ubl2 is removed, Pl2pro loses its structural integrity. In addition, Pl2pro is then no longer able to act as an Interferon (IFN) antagonist (see below). However, some studies suggest that the Ubl2 domain in MERS-CoV might not be as essential as originally thought and in cell-based studies of this virus, Pl2pro could retain some of its enzymatic functions without the Ubl2 domain.
To date, several inconsistent roles of Ubl2 have been reported, and its exact function and inner workings remain enigmatic. This is highlighted by the fact that there are significant differences between the coronaviruses, and as a consequence, we need to exercise caution in applying our findings to SARS-CoV-2.
In the family of coronaviridae, viruses with either one or two Plpro domains can be found, with SARS-CoV and SARS-CoV-2 only having one. Confusingly, this single domain is however still called Pl2pro, even if it is the only papain-like protease domain in the viral genome.
Pl2pro cleaves the polyprotein from nsp1 (leader protein) up to nsp3. While Pl2pro cuts between nsp1-(ELNGG↓AV)-nsp2-(RLKGG↓AP)-nsp3-(SLKGG↓KI)-nsp4, nsp5 (3C-like protease) cleaves the rest of the polyprotein. The cysteine protease Pl2pro is similar to human ubiquitin-specific proteases (USPs) in that it adopts a right-hand fold with "thumb", "palm" and "finger" subdomains.
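The three junctions quoted above share a Leu-X-Gly-Gly pattern with the cut directly after the second glycine. As a rough illustration, the sketch below searches a sequence for this pattern and splits it at the predicted cuts. The pattern is inferred only from these three sites, the function names are hypothetical, and the toy "polyprotein" consists of the junction fragments padded with lowercase placeholders, so it is not a real sequence.

```python
import re

# Leu-X-Gly-Gly, cut after the second Gly. Assumption: the pattern is
# inferred from the three junctions quoted in the text and is used here
# purely for illustration.
LXGG = re.compile(r"(?=L.GG)")


def cleavage_positions(sequence):
    """Return 0-based indices of the first residue after each predicted cut."""
    return [m.start() + 4 for m in LXGG.finditer(sequence)]


def split_at_sites(sequence):
    """Split the sequence into the fragments released by the predicted cuts."""
    cuts = cleavage_positions(sequence)
    starts = [0] + cuts
    ends = cuts + [len(sequence)]
    return [sequence[s:e] for s, e in zip(starts, ends)]


# Toy chain: junction fragments from the text, padded with hypothetical
# "x" placeholders standing in for the (much longer) protein bodies.
toy_polyprotein = "xxxELNGGAVxxxRLKGGAPxxxSLKGGKIxxx"
print(cleavage_positions(toy_polyprotein))
print(split_at_sites(toy_polyprotein))
```

A pattern match alone does not prove cleavage, since accessibility and the residues flanking the motif also matter; the sketch only shows where candidate sites sit in a sequence.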
Despite the variations of Pl2pro in different coronaviridae, the same catalytic motif of three amino acid residues is essential for the stability and proteolytic activity of the domain: Cys112 is located in the thumb, His273 and Asp287 are located in the palm subdomain. (The numbers identifying these residues can vary between species.)
In addition, Pl2pro has deubiquitinating and deISGylating (removal of ISG15 from target proteins) abilities. Both ubiquitin and ISG15 regulate facets of the immune response: they can stimulate the production of cytokines, chemokines and other IFN-stimulated gene products which have antiviral properties. By removing them, Pl2pro acts as an antagonist of the human immune response. ISG15 is a ubiquitin-like modifier composed of two ubiquitin-like folds that has an essential role in marking newly synthesized proteins during the antiviral response. Post-translational modification by ubiquitin and interferon-stimulated gene 15 (ISG15) is reversed by isopeptide bond hydrolysis. Figure 3 shows a proposed mechanism for the cleaving of isopeptide bonds by SARS-CoV.
Toll-like receptors (TLRs) are an important part of the machinery of the human immune response, which recognizes the pathogen-associated molecular patterns. The ability of the host cell to transduce the so-called Toll-like receptor 7 (TLR7) mediated immune response is diminished (Fig. 5) by Pl2pro as it removes Lys63-linked-ubiquitin from the TNF receptor associated factors TRAF3 and TRAF6. 
In addition, SARS-CoV can hamper the antiviral activities of interferon. In combination with a transmembrane (TM) domain, the Pl2pro domain inhibits the STING-mediated activation of interferon expression. Pl2pro-TM interacts with TRAF3, TBK1, IKKε, STING and IRF3, the key components that assemble into a regulatory complex for activation of IFN expression.
Another tool to fight the coronavirus in human cells is the "guardian of the genome", p53. The tumor suppressor protein p53 impedes the replication of SARS-CoV, though the virus fights back with Pl2pro, which binds a p53 degradation stimulator named "RING finger and CHY zinc finger domain-containing protein 1" (or, for short, RCHY1). Enhanced by the Macro domains in nsp3, this binding increases the stability of RCHY1 and hence promotes the degradation of p53. In addition, Pl2pro blocks another crucial cellular defense mechanism: the NF-κB pathway, which regulates immune responses to infections. SARS-CoV Pl2pro can stabilize IκBα, an inhibitor of NF-κB.
Although the Pl2pro domains of all coronaviridae suppress the immune response, the targets differ between species. For example, SARS-CoV Pl2pro preferentially processes Lys48-linked poly-ubiquitin chains, which are markers for proteasomal degradation. MERS-CoV Pl2pro, on the other hand, shows no difference in efficiency between Lys48- and Lys63-linked di-ubiquitin chains. Lys63-linked chains are related to signal transduction cascades of the host immune system. Studies have shown that the specificity of Pl2pro for ubiquitin and ISG15 substrates can be altered by as little as a single amino acid change. However, even though there are differences, it is likely that at least some of the functions are similar for SARS-CoV-2.
In order to predict Pl2pro function for the novel coronavirus SARS-CoV-2, we start by aligning its sequence with the one from SARS-CoV, as we did in the first part of this series. Both domains share a similarity of 82.8% over a length of 313 amino acids. However, this time, we go for a more detailed analysis of the 54 individual differences, which are:
T3R N14I V20V N48N H49S V56Y D60N E66V D75T S77P P95Y G99N S114A V115T L116A L119T E123I K125L P129P A134D A143E N155C H170S L171Y Q173F S179D K181C C191T T195Q T196Q G200K N214E L215Q K216F G218K I221Q C225T D228K A229Q Y232K F240P Y250Q L252E Q254K G255H C259T E262S H274K K278S I284C L289L S293S T300I S308N
(The first letter refers to SARS-CoV, and the second to the amino acid residue in SARS-CoV-2.)
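A list in this notation can be checked mechanically. The sketch below parses the entries exactly as printed above, counts them, flags entries whose two letters are identical (which cannot be substitutions as written and may be transcription artefacts), and derives a simple count-based identity over 313 residues. The helper names are hypothetical, and the count-based figure need not match the quoted similarity exactly, since alignment programs may score gaps and conservative exchanges differently.

```python
import re

# Substitution list copied verbatim from the text above.
SUBSTITUTIONS = """
T3R N14I V20V N48N H49S V56Y D60N E66V D75T S77P P95Y G99N S114A V115T
L116A L119T E123I K125L P129P A134D A143E N155C H170S L171Y Q173F S179D
K181C C191T T195Q T196Q G200K N214E L215Q K216F G218K I221Q C225T D228K
A229Q Y232K F240P Y250Q L252E Q254K G255H C259T E262S H274K K278S I284C
L289L S293S T300I S308N
"""

DOMAIN_LENGTH = 313  # length of the aligned domain as stated in the text


def parse_substitution(token):
    """Split e.g. 'T3R' into ('T', 3, 'R'); raise ValueError if malformed."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
    if match is None:
        raise ValueError(f"not a substitution: {token!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def percent_identity(n_differences, length):
    """Count-based identity: share of positions without a listed difference."""
    return 100.0 * (length - n_differences) / length


entries = [parse_substitution(tok) for tok in SUBSTITUTIONS.split()]
unchanged = [f"{old}{pos}{new}" for old, pos, new in entries if old == new]

print(len(entries), "entries;", len(unchanged), "with identical letters:", unchanged)
print(f"{percent_identity(len(entries), DOMAIN_LENGTH):.1f}% identity if all entries count")
```

Running such a check before interpreting individual positions is a cheap way to catch inconsistencies between a printed list and the underlying alignment.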
The mutations are evenly spread over the protein. None of the residues of the catalytic triad (Cys112, His273, Asp287) are changed, as is to be expected given their conservation in all other coronaviruses. On further investigation, however, six sites differ in the motif which interacts with ubiquitin: S170T, Y171H, F216L, Q195K, T225V, and K232Q. Earlier studies concluded that the mutation of position 232 from glutamine to lysine increases the affinity for ubiquitin at the expense of the de-ubiquitination effectiveness. The kinetics of SARS-CoV-2 nsp3 Pl2pro were studied to test whether the protease domain of nsp3 is less effective at binding ubiquitin than nsp3 from SARS-CoV and MERS-CoV.
All three Pl2pro variants cleave more ISG15 than ubiquitin. SARS-CoV has the fastest kinetics of the three viruses. The slower kinetics of SARS-CoV-2 resemble those of MERS-CoV more than those of SARS-CoV, with a 10 times higher turnover rate (kcat) as a deISGylase than as a deubiquitinase.
Besides the kinetics, the Pl2pro's affinity for different poly-ubiquitin linkage sites was measured. The result shows that while SARS-CoV-2 can cut K48-linked polyubiquitin chains, it seems to lack the ability to cut other polyubiquitin chains. The K48-linked chains are also cleaved at a slower rate than by SARS-CoV. In this regard, SARS-CoV-2 distinguishes itself from MERS-CoV, which has the ability to cleave K63 linkages. It has been suggested that the decrease in deubiquitinase effectiveness may be relevant and could contribute to the often-mild symptoms that are a factor in why SARS-CoV-2 has been able to evade our quarantine efforts. But this is mere speculation, and a lot more research is needed to resolve the matter.
Pl2pro was a potential drug target early on in SARS-CoV-2 research. Hilgenfeld et al. name two major challenges we have to overcome to find a drug targeting Pl2pro. One is that the binding sites are tailor-made to bind glycine residues. The other is that this very specific binding motif is rather ubiquitous in our cells. These two problems make it difficult to find an inhibitor which fits and is specific to Pl2pro. However, scientists found a weak spot: a loop called Blocking Loop 2 (BL2) regulates substrate binding and may be a promising target to inhibit Pl2pro. Naphthalene-based inhibitors, which were earlier proposed to inhibit the BL2 of SARS-CoV, were shown to also inhibit SARS-CoV-2 Pl2pro, in particular an inhibitor called GRL-0617.
For in-silico drug development, it might be prudent to choose high-resolution structures which already have a ligand or inhibitor bound, such as 6yva, 6wuu, 6wx4 or 6yaa. Technically speaking, 6wrh, albeit being a mutant, is one of the highest-quality structures available for SARS-CoV-2 Pl2pro.
In fact, a lot of research is still required to consolidate our understanding of this protein and its domains. In spite of that, we are making progress in our endeavor to fight this virus - and every step we take is one more to win this fight.
 Báez-Santos YM, St John SE, Mesecar AD. The SARS-coronavirus papain-like protease: structure, function and inhibition by designed antiviral compounds. Antiviral Res. 2015;115:21-38. doi:10.1016/j.antiviral.2014.12.015, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5896749/
 Lei J, Kusov Y, Hilgenfeld R. Nsp3 of coronaviruses: Structures and functions of a large multi-domain protein. Antiviral Res. 2018;149:58-74. doi:10.1016/j.antiviral.2017.11.001, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7113668/
 Clasman JR, Báez-Santos YM, Mettelman RC, O'Brien A, Baker SC, Mesecar AD. X-ray Structure and Enzymatic Activity Profile of a Core Papain-like Protease of MERS Coronavirus with utility for structure-based drug design. Sci Rep. 2017;7:40292. Published 2017 Jan 12. doi:10.1038/srep40292, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5228125/
 Lei J, Hilgenfeld R. RNA-virus proteases counteracting host innate immunity. FEBS Lett. 2017;591(20):3190-3210. doi:10.1002/1873-3468.12827, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7163997/
 Chen X, Yang X, Zheng Y, Yang Y, Xing Y, Chen Z. SARS coronavirus papain-like protease inhibits the type I interferon signaling pathway through interaction with the STING-TRAF3-TBK1 complex. Protein Cell. 2014;5(5):369-381. doi:10.1007/s13238-014-0026-3, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3996160/
 Freitas BT, Durie IA, Murray J, et al. Characterization and Noncovalent Inhibition of the Deubiquitinase and deISGylase Activity of SARS-CoV-2 Papain-Like Protease [published online ahead of print, 2020 Jun 4]. ACS Infect Dis. 2020;acsinfecdis.0c00168. doi:10.1021/acsinfecdis.0c00168, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7274171/
Crystallography has a problem. Some amino acid side chains in our structures simply can’t be seen in our maps (Fig. 1). Crystallographic maps represent many protein molecules in a crystal lattice, thousands of copies of the same molecule averaged over measurement time and unit cells. So, what happens with inherently flexible regions of our protein? The average of many different conformations leaves us with no map to guide us in modelling our side chain. So, what is the best way to deal with this as a model builder?
A passionate discussion within the Task Force has resulted in the following options for dealing with this situation:
Just to be clear, option four should only be considered in the direst of circumstances. Please consider options one to three before resorting to proline and fire, and even then, only with a computer you own. With that said, what is the best option? Sadly, none are ideal solutions to the problem so let’s discuss.
Option 1 can be misleading as the residue appears to be present in the model (Fig. 2), despite there being no experimental evidence for it, until you check the occupancy or load the corresponding map with your model which will tell you otherwise. An occupancy of zero also adds no useful information to the model and may even exclude atoms in this position, like opening the airlock and sending it flying out into the vacuum of space.
Option 2 is effectively the opposite of option 1: provide a full-occupancy side chain in a sensible rotamer conformation and accept the resulting phase bias*. However, this can be equally misleading if the downstream user doesn’t check the B-factors of the side chain, which will be very large, as they represent not only (smaller) displacement but (larger) disorder. In addition, allowing the B-factor to “explode” is not always an effective way to deal with this problem, as strong negative peaks can still be observed around the side chain in some cases. Another argument for maintaining an occupancy of 1 is that the protein sequence tells us a certain amino acid is present at a position, unless evidence of chemical clipping has been provided (mass spec, for example). Therefore, the atoms must be present in the protein and should be included in the model, leaving the B-factors to deal with the physics of the situation. Options 1 and 2 both have the advantage of providing a complete set of atoms for downstream use in molecular modelling.
*During refinement our model will always bias the phase calculation which gives us our maps. Ideally, we would like our model to maximally affect the phases when we are confident our model is correct and minimally affect the phases when we are less confident. So, an occupancy of 1 (high confidence) where we observe no peaks in our map (low confidence) will lead to what we call phase bias. This can work both ways: setting the occupancy to 0 (option 1) underestimates the contribution of our model.
This brings us onto option 3: trimming down the side chain to what we can see in the map (Fig. 3). The “make them work for it” option. If a downstream user is paying attention and realises that, for example, the side chain they are looking at is meant to be a lysine, despite the model only having atoms up to Cβ, this should be the least misleading of all the options. The residue should not be mutated to, say, alanine in this case, as that would mean you are wilfully misleading downstream users. Upon realising the atoms are missing, the downstream user can then model a (hopefully sensible) rotamer for their simulations if needed. The downside is that this approach does introduce some negative bias in favour of modelling bulk solvent into this area. Like I said, none of the options are ideal solutions.
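Whichever option a depositor chose, a downstream user can find out with a few lines of code. The following is a minimal sketch that reads ATOM records using the fixed columns of the PDB format and reports residues that are truncated (option 3) or contain zero-occupancy atoms (option 1). The function names are hypothetical; the sketch assumes a model without hydrogens, ignores insertion codes and only knows the 20 standard amino acids. Exploded B-factors (option 2) could be flagged in the same loop by reading columns 61-66.

```python
# Expected number of non-hydrogen atoms per residue: backbone N, CA, C, O
# plus side chain. The C-terminal OXT is not counted.
EXPECTED_HEAVY_ATOMS = {
    "ALA": 5, "ARG": 11, "ASN": 8, "ASP": 8, "CYS": 6,
    "GLN": 9, "GLU": 9, "GLY": 4, "HIS": 10, "ILE": 8,
    "LEU": 8, "LYS": 9, "MET": 8, "PHE": 11, "PRO": 7,
    "SER": 6, "THR": 7, "TRP": 14, "TYR": 12, "VAL": 7,
}


def audit_side_chains(pdb_lines):
    """Map (chain, residue number, residue name) to 'truncated' or
    'zero occupancy' for every residue that is not fully modelled."""
    residues = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        element = line[76:78].strip()  # empty if the element column is absent
        if name == "OXT" or element in ("H", "D"):
            continue
        key = (line[21], int(line[22:26]), line[17:20])
        occupancy = float(line[54:60])
        atoms = residues.setdefault(key, {})
        # Alternate conformations repeat atom names; keep the highest occupancy.
        atoms[name] = max(occupancy, atoms.get(name, 0.0))

    report = {}
    for key, atoms in residues.items():
        expected = EXPECTED_HEAVY_ATOMS.get(key[2])
        if expected is None:
            continue  # non-standard residue, nothing to compare against
        if len(atoms) < expected:
            report[key] = "truncated"
        elif any(occ == 0.0 for occ in atoms.values()):
            report[key] = "zero occupancy"
    return report


def make_atom_line(serial, name, resname, chain, resseq, occupancy=1.0, bfactor=30.0):
    """Build a minimal ATOM record (dummy coordinates) for demonstration only."""
    return "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, name, resname, chain, resseq, 0.0, 0.0, 0.0, occupancy, bfactor)


# A lysine modelled only up to CB, as in the example in the text.
demo = [make_atom_line(i, n, "LYS", "A", 10)
        for i, n in enumerate(["N", "CA", "C", "O", "CB"], start=1)]
print(audit_side_chains(demo))
```

For real work, an established library with a full PDB or mmCIF parser is the safer choice; the point here is only that the information needed to spot each option is present in the coordinate file.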
So, following this discussion between Nick Pearce, Dale Tronrud, Gianluca Santoni, Andrea Thorn, and me, we recommend option 3 as the best of the available solutions. We believe that the end goal of a crystallographic experiment should be to build atoms justified by the experimental data, i.e. the map, and leave the prediction of unobservable atoms to downstream users. We (crystallographers) are not here to “make it easier for users to avoid thinking about it”. However, after the first iteration of this article was published, a number of crystallographers made the case for option 2 on Twitter, and a poll of those involved resulted in 53.8% in favour of option 2 (Figure 4), so the matter is still far from resolved.
However, it’s nice to know that if we really can’t agree on the best method, we can at least agree on not using option 1, and there's always the fallback plan of option 4 and watching the PDB burn if we get desperate.
The surface proteins, also called the “spike” or S-proteins, protrude from the viral envelope of SARS-CoV-2 like “spikes of a crown”, thus giving the coronavirus its name. They mediate entry into the host cell by binding to a cellular receptor called angiotensin-converting enzyme 2 (ACE2), triggering a cascade of events leading to membrane fusion and entry. The spike protein is formed by three identical monomers, each consisting of the two subunits S1 and S2. Subunit S1 comprises a receptor binding domain (RBD), which interacts with ACE2 on human epithelial cells. ACE2 is a type I membrane protein expressed in lungs, heart, kidneys, and intestines, and takes part in the maturation of angiotensin, a peptide hormone which controls vasoconstriction and blood pressure.
To engage the ACE2 receptor, the RBD of S1 undergoes a hinge-like conformational rearrangement that transiently exposes the residues necessary for receptor binding. The heptad repeat 1 and 2 domains (HR1 and HR2) play a key role in mediating fusion and entry (see Fig. 1). The exact mechanism of fusion of SARS-CoV-2 with, and entry into, the host cell is still not fully established, but it is likely that the fusion mechanism is similar to that of SARS-CoV. The putative mechanism is that after the RBD binds to the ACE2 receptor, the S2 subunit binds to the host membrane via a fusion peptide (FP) and changes conformation to trigger the association between the HR1 and HR2 domains to form the “fusion core”, which brings the viral and cellular membranes into close proximity for fusion.
The structure of the RBD in complex with the human ACE2 receptor reveals that the interaction occurs via the spike protein RBD and the ACE2 N-terminal peptidase domain. The RBD consists of a twisted five-stranded antiparallel β-sheet (β1, β2, β3, β4 and β7) forming the core together with short connecting α-helices, β-sheets and loops. These short α-helices, β-sheets and loops constitute the receptor binding motif (RBM), which is located as an extended insertion between two β-strands (β4 and β7) and contains most of the ACE2-contacting residues. The ACE2 N-terminal peptidase domain consists of two lobes that form the substrate binding site. The contact between the RBM and ACE2 is made at the bottom side of the ACE2 small lobe, with a concave outer surface in the RBM accommodating the N-terminal helix of ACE2 and thus generating an interface of 1687 Å² (see Fig. 2).
The RBM/ACE2 interface contains a network of different interactions, including hydrophilic interactions with 13 hydrogen bonds and 2 salt bridges, which are shown in Fig. 3. Key residues for receptor binding include the amino acids Leu-455, Phe-486, Gln-493, and Asn-501. The RBD residues Gln-493 and Asn-501 form hydrogen bonds with the ACE2 residues Glu-35 and Tyr-41, respectively. Phe-486 interacts with the ACE2 amino acids Gln-24, Leu-79 and Tyr-83, and contacts Met-82 through van der Waals forces. Another important interaction takes place between the non-polar RBD residue Leu-455 and ACE2 Asp-30, Lys-31 and His-34. Outside the RBM, the RBD residue Lys-417 contributes to receptor binding by forming a salt bridge with ACE2 Asp-30. Binding of the host cell receptor by subunit S1 destabilizes the prefusion trimer and triggers a structural rearrangement resulting in cleavage and shedding of the S1 subunit and transition of the S2 subunit to a stable postfusion conformation.
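To keep track of which residue touches which, the contacts named in this paragraph can be written down as a small lookup table. The following Python sketch is only a bookkeeping aid: it contains the pairs listed in the text, not the full interface, and the names `CONTACTS` and `ace2_partners` are our own.

```python
# RBD residue -> ACE2 residues it contacts, as listed in the text above.
# This is not an exhaustive interface map.
CONTACTS = {
    "Leu-455": ["Asp-30", "Lys-31", "His-34"],
    "Phe-486": ["Gln-24", "Leu-79", "Met-82", "Tyr-83"],
    "Gln-493": ["Glu-35"],          # hydrogen bond
    "Asn-501": ["Tyr-41"],          # hydrogen bond
    "Lys-417": ["Asp-30"],          # salt bridge, outside the RBM
}


def ace2_partners(contacts):
    """Invert the map: for each ACE2 residue, list the RBD residues contacting it."""
    partners = {}
    for rbd_residue, ace2_residues in contacts.items():
        for ace2_residue in ace2_residues:
            partners.setdefault(ace2_residue, []).append(rbd_residue)
    return partners


# ACE2 residues that are contacted by more than one RBD residue
shared = {a: r for a, r in ace2_partners(CONTACTS).items() if len(r) > 1}
print(shared)  # {'Asp-30': ['Leu-455', 'Lys-417']}
```

Inverting the table shows at a glance that, among the contacts listed here, ACE2 Asp-30 is the only residue engaged by two different RBD residues.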
The surface of coronavirus spike proteins is densely decorated with heterogeneous N-linked glycans protruding from the trimeric surface. The SARS-CoV-2 spike comprises 22 N-linked glycosylation sequons per protomer. N-linked glycans play a key role in proper protein folding and in priming for fusion by host proteases. Glycans can also shield amino acid residues and other epitopes from recognition by cells and antibodies, so glycosylation enables the coronavirus to evade both the innate and adaptive immune responses. It may also play a role in binding to the host cell. Unfortunately, neither X-ray crystallography nor cryo-EM can resolve long carbohydrate chains, so the structures (below) contain a maximum of three sugars. In most cases, the carbohydrate chains are much longer, covering most of the contact surfaces of the upper spike protein.
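An N-linked glycosylation sequon is the sequence pattern Asn-X-Ser/Thr, where X is any amino acid except proline. As a minimal sketch of how such sites can be located in a sequence, the Python snippet below scans a one-letter-code string with a regular expression. The function name `find_sequons` and the short test peptide are made up for illustration; the peptide is not the spike sequence.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps matches from consuming residues, so overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return the 1-based positions of the Asn in every N-X-S/T sequon (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy peptide, not taken from any real protein
toy_peptide = "MNATLNPSGNGTNNSS"
print(find_sequons(toy_peptide))  # [2, 10, 13, 14]
```

A sequon only marks a potential site. Whether a glycan is actually attached there has to be determined experimentally.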
The spike protein acts as the key molecule for fusion and entry, so the development of drugs directly targeting this protein may be essential to contain the COVID-19 pandemic. "Capping" the spike proteins with antibodies would interrupt infection. Binding of antibodies to the S1 RBD could inhibit the RBD-ACE2 interaction, which in turn could prevent fusion with the host cell. In addition, in lung cells, spike functionality depends on furin-mediated pre-cleavage at the S1/S2 site for subsequent activation by TMPRSS2 (transmembrane serine protease 2). Thus, inhibitors of either furin or TMPRSS2 could also be considered as a potential treatment for COVID-19. As the spike protein decorates the virus hull, it could also be part of a vaccine. All of this makes the spike protein a major target in the molecular fight against COVID-19.
The world holds its breath as the novel coronavirus continues to spread across the globe, bringing our lives to a halt. We have gathered a lot of knowledge about the virus, but there are still many gaps to fill. Non-structural protein 3 (nsp3) represents one of these gaps in our knowledge. As it is the largest protein encoded by the coronavirus genome, untangling its structure and function poses a huge task.
However, we can glean some knowledge about the specific function of SARS-CoV-2 nsp3 by looking at the virus's subfamily, Orthocoronavirinae. As related viruses share some common traits, academics were not completely unprepared when SARS-CoV-2 arrived. In the background, while only very few people were worried about a new coronavirus, scientists around the world had been investigating the invisible enemy for decades. Building on this past work, we look at the functions of proteins from other coronaviruses, like Murine Hepatitis Virus (MHV) and SARS-CoV, to learn more about how best to fight SARS-CoV-2.
The sequence encoding nsp3 lies in open reading frame 1a (ORF1a), which encodes polyprotein 1a. The nsp3 sequence of SARS-CoV is 1922 amino acids long and sandwiched between nsp2 and nsp4. Its papain-like protease domain cleaves not only nsp3 itself from the polyprotein, but also nsp1 and nsp2. In coronaviruses, 18 different domains have been found in nsp3. Each virus type has 10 to 16 of these, of which eight domains and two transmembrane regions form the conserved part of nsp3, which can be found in every coronavirus known to date:
To start our investigation of the structural data related to SARS-CoV-2, we will look into the protein sequences of SARS-CoV and SARS-CoV-2 to learn where they are similar and where they differ.
The nsp3 of SARS-CoV has 16 domains which span 1922 amino acids. The nsp3 protein of SARS-CoV-2 is a bit longer at 1945 amino acids. When the two are compared to each other, there is an overall similarity of 75.97%. In addition to the ten conserved domains, the nsp3 gene of SARS-CoV-2 codes for four domains:
The two domains at the N-terminal end, Ubl1 and HVR, align at 79% and 64%, respectively. There seems to be a trend in Coronaviridae for the sequences of these domains to be poorly conserved, while Ubl1 still adopts the expected conserved fold. Whether this holds true here could be analysed by comparing the sequence alignment with the structural similarity. It is unsurprising that the "hypervariable region" lives up to its name and shows the worst alignment of all. In the related MHV nsp3, this domain is dispensable for replication.
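Percentages like these depend on how the score is defined. As a minimal sketch, the Python function below computes the percent identity of two sequences that have already been aligned: identical columns divided by all columns in which at least one sequence has a residue. This is only one of several common conventions and not necessarily the one used for the numbers quoted here. The function name `percent_identity` and the two short sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical columns / columns that are not gap-only.
    Other definitions (e.g. dividing by the shorter sequence length) give
    different numbers for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns in which both sequences carry a gap
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment, not real nsp3 sequence
print(percent_identity("MKT-AILV", "MRTGAI-V"))  # 62.5
```

Applied domain by domain to an alignment of the two nsp3 sequences, a function like this would give per-domain values comparable in spirit to those discussed above.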
It has been speculated that the Mac1 domain functions as an ADP-ribose-1″-phosphatase; however, the effects of mutations in this region differ from virus to virus. As a result, it is difficult to judge, without further research, what significance the poor alignment of this domain will have for our understanding of SARS-CoV-2.
The Mac1 domain, also known as the X-domain, is followed by two macrodomains (Mac2 and Mac3) which were originally called "SARS-CoV Unique domains" (SUD-N and SUD-M), but were renamed when they were found not to be unique to SARS-CoV. It has since been observed that only Mac3 plays an essential role in viral RNA replication, which could explain why Mac3 is one of the most conserved domains in the alignment of SARS-CoV and SARS-CoV-2.
Pl2Pro and its neighbouring domain Ubl2 show some of the highest sequence alignments of all domain comparisons. This could be explained by their essential function of cleaving nsp3 from the polyprotein.
Little is known about the domains following Pl2Pro, and our current structural knowledge is limited to a nuclear magnetic resonance (NMR) structure of NAB. While the structure and function of Y1 and CoV-Y from SARS-CoV-2 are currently unknown, their sequence, which comprises a fifth of the genome, is highly conserved in all coronaviruses.
In the second part of the series Untangling Nsp3 of SARS-CoV-2, we will delve deeper into some structures of nsp3 from SARS-CoV-1 and SARS-CoV-2 and try to find out how the differences in sequence may have influenced the structure of the protein. For further in-depth reading on the topics discussed here, I highly recommend the sources below.
The novel coronavirus (2019-nCoV) is classified as a large positive-sense single-stranded RNA virus from the genus Betacoronavirus. It shows high genetic similarity to SARS-CoV and MERS-CoV and is even more closely related to the bat SARS-like coronavirus, from which it most likely evolved. Even though it shows a lot of similarities to its relatives, further insights into the infection mechanism and the structure of its proteins reveal significant differences.
Like many RNA viruses, the virus has a lipid hull with envelope and other proteins embedded in it. This viral shell is responsible for the interaction with host cells and the protection of the inner parts, most importantly the viral RNA. This RNA acts as a direct template for the translation of two polyproteins named pp1a and pp1ab, which contain the 16 non-structural proteins (nsps) of the replication-transcription complex (RTC). These 16 nsps, encoded by about two thirds of the genome (in terms of length), are cleaved from the polyproteins by the chymotrypsin-like protease (3CLpro, also called main protease) and one or two papain-like proteases to generate the functional individual proteins. The RTC then synthesizes a variety of subgenomic RNAs (sgRNAs) by discontinuous transcription, which serve as templates to produce subgenomic mRNA. Other open reading frames of the genome encode at least four structural proteins that are necessary for the assembly of the virions, the hull and the infection of cells (called S, M, E and N protein for spike, membrane, envelope and nucleocapsid).
The majority of infected cells are ACE2 (angiotensin-converting enzyme 2)-bearing cells of the respiratory system. The viral RNA is introduced into the cell through endocytosis, mediated by the spike glycoprotein of the coronavirus. What does this mean? The S or spike protein, which forms the "corona" around the virus, binds with its receptor-binding domain (RBD) to the receptor, which is located on the surface of the host cells. Afterwards, the virus can merge with the cell through a complicated mechanism, the so-called endocytosis. Once infected, these cells act as multipliers for the virus, which provokes a strong reaction of the immune system. The most common symptoms include cough, fever, fatigue, loss of taste, headache, diarrhoea, dyspnoea, and lymphopenia or pneumonia; in severe cases, the infection can cause the death of the patient.
The structure of the virus, its infection mechanism and multiplication offer numerous possibilities for drug targeting, such as the inhibition of the main protease or the polymerases, the disturbance of the assembly of shell and entry proteins or the replication‐transcription complex and direct mRNA antiviral methods. However, none of them has been proven effective in clinical studies to this point.
Due to a new outbreak of pulmonary diseases caused by SARS-CoV-2, the development of new drugs is essential to contain the COVID-19 pandemic. One promising drug target is the 3C-like protease, also known as main protease or MPro. Most of the virus proteins are translated as one long polypeptide chain, which then has to be cleaved into functional proteins. In the viral polyproteins pp1a and pp1ab, 11 sites towards the C-terminal end (downstream of nsp4) are cleaved by the main protease. As the RNA polymerase complex is part of this chain (nsp7, nsp8, nsp12 and nsp14), inhibition of the main protease stops replication.
The 3C-like protease is a cysteine protease which is characterized by a catalytic dyad consisting of the amino acids cysteine and histidine. The homodimer is composed of two perpendicular protomers forming a catalytic cleft in between. Each of these protomers is composed of three domains. Domains I and II (N-terminal domain) form an antiparallel chymotrypsin-like β-barrel structure in which the substrate binding site is located. Domain III (C-terminal end) consists of five α-helices arranged in a cluster, regulating dimerization through a salt-bridge interaction between Glu-290 of one protomer and Arg-4 of the other.
The N-terminal residues, called the “N-finger” (see image), make contact predominantly with domain II of the other protomer, generating a contact interface of ~1394 Å². Dimerization is essential for protease activity, as the N-terminal residue Ser-1 of one protomer interacts with Glu-166 of the other protomer, keeping the substrate binding site in the right shape. This substrate binding site contains a catalytic dyad consisting of the residues Cys-145 and His-41. Next to the catalytic dyad is the substrate binding pocket called S1. It consists of the side chains of Phe-140 and His-163 and the main chain atoms of Glu-166, Asn-142, Gly-143 and His-172. This pocket mediates the high specificity for a Gln [Leu-Gln↓(Ser,Ala,Gly)] in the substrate to be cut, as the carbonyl oxygen of this Gln is stabilized by the amino acids Gly-143 and Cys-145.
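The recognition pattern Leu-Gln↓(Ser,Ala,Gly) can be turned directly into a sequence search. The Python sketch below returns the position of the Gln after which the chain would be cut. It is a simplification that uses only the motif quoted above and ignores any further sequence or structural context. The function name `find_mpro_sites` and the test fragment are invented for illustration and are not a real polyprotein sequence.

```python
import re

# Leu-Gln followed by Ser, Ala or Gly; the scissile bond lies after the Gln.
# The lookahead leaves the following residue unconsumed.
MPRO_MOTIF = re.compile(r"LQ(?=[SAG])")


def find_mpro_sites(sequence):
    """Return 1-based positions of the Gln after which the motif predicts a cut."""
    # match.end() is the 0-based index just past the Gln,
    # which equals the 1-based position of the Gln itself.
    return [match.end() for match in MPRO_MOTIF.finditer(sequence.upper())]


# Hypothetical fragment: LQ|S and LQ|A match, LQP does not
toy_fragment = "MKAVLQSGFRTTLQPLLQAGK"
print(find_mpro_sites(toy_fragment))  # [6, 18]
```

A hit from such a search is only a candidate site. Whether the protease actually cleaves there depends on how the surrounding residues fit into the substrate binding site.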
The specific substrate binding site S1 is a promising place for an inhibitor to bind and act as a drug. Such an inhibitor would block the cleavage of the polyproteins and hence stop replication of the viral RNA. An advantage of the 3C-like protease as a drug target is that, to date, no human proteases with similar cleavage specificity are known, and as a consequence, newly designed drugs are unlikely to be toxic. The potential inhibitors can be divided into two classes based on their chemical structures. The first class involves peptide chains that fit the catalytic site of the enzyme and form a covalent link with Cys-145, thereby blocking substrate binding. The second class consists of small organic compounds that bind to the enzyme active site, acting as competitive inhibitors that hinder the substrate from entering the active site cavity. A potential drug which belongs to the second class is Lopinavir, an HIV-1 protease inhibitor, which seems to be a promising candidate for the treatment of coronavirus infections. If the efficacy of Lopinavir against SARS-CoV-2 is confirmed, it would have the advantage that it is already approved as an HIV drug for humans.