Yunyun Gao, Johannes Kaub, Gianluca Santoni, Nicholas Pearce & Andrea Thorn

This blog post was published in Crystallography Reviews, please cite: https://doi.org/10.1080/0889311X.2023.2222275

Abstract

The SARS-CoV-1/SARS-CoV-2 main protease cleaves the nascent viral polyproteins into biologically functional molecules, which are essential for viral reproduction inside the host cell. With more than 500 crystal structures available, it is one of the most heavily researched coronavirus proteins and a popular drug target. This review focuses on putting the function and structure of the main protease into a historical perspective, highlighting the structure-based design of inhibitors of the main protease and discussing potential future research directions.

1. Introduction

Main protease (often called MPro or Mpro) is one of the most important non-structural coronavirus proteins: it cuts the viral polyproteins into functional units and is thus essential for the infection cycle. Main protease cleaves the major part of the coronavirus polyproteins pp1a and pp1ab at eleven conserved sites [1,2], producing fully functional proteins that ultimately allow the virus to hijack the host cell and amplify [3]. Main protease is the fifth non-structural protein (nsp) on both pp1a and pp1ab and is therefore also known as nsp5 [2]. Inhibition of main protease can effectively prevent coronavirus replication, making it an ideal target for drug development [4,5]. While this review focuses on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) as well as severe acute respiratory syndrome coronavirus (SARS-CoV-1), the main protease is present in all known coronaviruses. Main protease is a cysteine protease characterized by a conserved cysteine–histidine catalytic dyad. It is also referred to as the 3C-like protease (3CLpro) because its cleavage-site specificity resembles that of the picornavirus 3C protease [1]. Experimental investigations into the main proteases of enveloped, positive-stranded RNA coronaviruses revealed many features even before the first molecular structure was solved. Firstly, the catalytic residues are located in two chymotrypsin-like β-domains and are arranged as a dyad or triad [3,6,7]. Secondly, the specificity of the protease is modulated by the substrate sequence around the cleavage site. For a substrate with the sequence … P2–P1↓P1′–P2′ … (where ↓ denotes the cleavage site), the most important positions are P1, P2 and P1′. For cleavage by main protease, the favoured sequence is (Leu/Met/Phe)–Gln↓(Ala/Ser/Gly) [8,9].
Thirdly, the C-terminal region carries an extension of 110 amino acid residues, unique compared with other known prototypic 3C proteases, which is required for proteolytic activity [10]. Nevertheless, in the absence of a structural model, the precise mechanism of the proteolytic activity could only be speculated upon. The first structure of a coronavirus main protease, from the transmissible gastroenteritis virus (TGEV), was solved using selenomethionine-based multiwavelength anomalous dispersion (MAD) and reported in 2002 [11]. This crystal structure confirmed that the catalytic centre is a dyad, with a cysteine acting as the nucleophile and a histidine as the general acid/base. The protomer folds into three domains: domains I and II form the chymotrypsin-like antiparallel β-barrels, and the C-terminal extension forms the five-α-helix domain III. Two protomers assemble into an obligate homodimer through intermolecular interactions between interfacial residues of domains II and III and the N-terminal residues of the other monomer. The structural arrangement also provided evidence that autocatalytic cleavage occurs with the cleaved bond in a trans conformation [11]. The most studied coronavirus soon became SARS-CoV-1 (severe acute respiratory syndrome coronavirus), as it spread rapidly during the SARS outbreak of 2002 and 2003 [12–14]. This virus, a member of the coronavirus subgenus Sarbecovirus (severe acute respiratory syndrome-related coronaviruses, or sarbecoviruses), has a main protease with a sequence identity of around 50% to those of other prominent coronaviruses such as human coronavirus 229E (HCoV-229E) and Middle East respiratory syndrome coronavirus (MERS-CoV). A sequence identity at this level points to highly conserved tertiary and quaternary structures [1,11,15,16].
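The P/P′ notation and the consensus motif (Leu/Met/Phe)–Gln↓(Ala/Ser/Gly) described above can be made concrete with a short Python sketch. It scans a one-letter protein sequence for the consensus and labels the residues on either side of each putative scissile bond. This is a deliberately naive pattern match under our own assumptions (only P2, P1 and P1′ are checked; no structural or kinetic information is used), so it will miss real sites that deviate from the consensus and flag spurious ones; the function names are hypothetical.

```python
import re

# Consensus quoted above: (Leu/Met/Phe)-Gln | (Ala/Ser/Gly), i.e. P2-P1 | P1'.
# The zero-width lookahead lets overlapping candidate sites be reported.
CONSENSUS = re.compile(r"(?=[LMF]Q[ASG])")


def candidate_cuts(seq: str) -> list[int]:
    """Return 0-based indices of the P1' residue for each consensus match.

    The scissile bond lies between seq[i - 1] (P1, Gln) and seq[i] (P1').
    """
    return [m.start() + 2 for m in CONSENSUS.finditer(seq)]


def label_positions(seq: str, cut: int, n: int = 4, m: int = 2) -> dict[str, str]:
    """Label residues around a cut with Schechter-Berger P/P' names.

    P1..Pn run towards the N-terminus from the cut, P1'..Pm' towards the
    C-terminus; positions that fall outside the sequence are omitted.
    """
    labels = {}
    for k in range(1, n + 1):
        if cut - k >= 0:
            labels[f"P{k}"] = seq[cut - k]
    for k in range(1, m + 1):
        if cut + k - 1 < len(seq):
            labels[f"P{k}'"] = seq[cut + k - 1]
    return labels


if __name__ == "__main__":
    peptide = "AVLQSGFR"  # short illustrative peptide
    for cut in candidate_cuts(peptide):
        print(peptide[:cut] + "|" + peptide[cut:], label_positions(peptide, cut))
```

For the illustrative peptide, the scan places the cut between Gln (P1) and Ser (P1′), with Leu at P2, matching the favoured sequence.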
Owing to prior knowledge of coronavirus gene expression and the multiple models available for molecular replacement, structures of the SARS-CoV-1 main protease were determined quickly: the first crystal structure was available just two months after the sequencing of the viral RNA [15,17]. The SARS-CoV-1 main protease shares common structural features with previously solved coronavirus main protease structures, such as dimerization and substrate-specific active sites. This similarity made inhibitors of other coronavirus main proteases potential drug candidates for SARS [1], but substrate-bound structures of the SARS-CoV-1 main protease revealed that the catalytic site varies around the catalytic dyad, affecting substrate site specificity [15,18,19]. While the SARS outbreak was successfully controlled by public health campaigns, re-emergence of a coronavirus pathogenic to humans was still considered a risk, given the identification of new strains including human coronavirus NL63 (HCoV-NL63), human coronavirus HKU1 (HCoV-HKU1) and MERS-CoV [20,21]. Over the following years, crystallographic SARS-CoV-1 research focussed on two major efforts: understanding the enzymatic reaction mechanism, especially dimerization [22–27], and enabling structure-based drug design [18,23,24,28–37]. The wealth of knowledge derived from these studies later provided a head start in understanding the main protease of a new virus, SARS-CoV-2. The genome of the novel coronavirus SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) was published on January 25th 2020 [38]. The virus and the associated disease, COVID-19, spread across the world within months and caused a global pandemic, which continues to affect almost all aspects of society. In the campaign against COVID-19, the SARS-CoV-2 main protease once more received considerable attention owing to its small size, the body of existing research and its important role in the infection cycle.
The main proteases of SARS-CoV-1 and SARS-CoV-2 have a sequence identity of ∼96%, with the 12 amino acid differences located at residues distant from the active site [39]. The first SARS-CoV-2 crystal structure was solved within three weeks of the deposition of the viral RNA sequence [16], using previously solved main protease structures as homologous models. In the following two years, more than 400 crystal structures of the SARS-CoV-2 main protease were deposited in the PDB, including several large-scale efforts: a fragment screen at Diamond Light Source (UK) [4], a drug molecule screen at Deutsches Elektronen-Synchrotron (DESY; Germany) [5] and, most recently, a compound optimization screen at the MAX IV Laboratory (Sweden) [40]. Various studies have used the available structures to identify, develop or optimize potential inhibitors, to elucidate enzyme mechanisms, to determine the binding modes of repurposed drugs and as test cases for methods research (Figure 1).

Structural biology of SARS-CoV-1/SARS-CoV-2 mainprotease 1

Figure 1. Combined PDB depositions of SARS-CoV-1 and SARS-CoV-2 structures over time. Lines indicate the depositions cited by published studies focusing on different topics: structure-based inhibitor design, enzyme mechanism, drug repurposing and methods research. The number of total depositions includes additional structures without associated publications. Data from September 2022.

2. Structural overview

Coronavirus main protease molecules crystallize readily, and almost all available structures have been solved by X-ray crystallography (712 total depositions in the PDB as of September 2022: 705 X-ray structures, two NMR structures, four joint neutron/X-ray structures and one electron microscopy structure). Of the deposited sarbecovirus main protease crystal structures, 93% have a high-resolution limit better than 2.5 Å and 64% better than 2.0 Å. The highest-resolution structure is 7k3t, an apo form determined to 1.2 Å resolution. 6lu7 [16] is the most cited structure and a popular model for molecular docking studies [41]. In sarbecoviruses, the mature, catalytically active protease forms an obligate homodimer (Figure 2(a)), with each monomer comprising 306 amino acid residues. In each protomer (Figure 2(b)), domain I (residues 8 to 101) and domain II (residues 102 to 184) are fairly rigid [1,11]. The catalytic dyad Cys145–His41 is embedded in a pocket formed by the domain I and domain II barrels. N-terminal residues 1–7 are a remnant of the autoproteolytic process and are often called the ‘N-finger’. One of these residues, Ser1∗ (residues from the neighbouring protomer are marked with an asterisk in the rest of the text, e.g. Ser1∗), interacts with conserved residues of the other protomer, forming a salt bridge to the side chain of Glu166 and backbone interactions with Phe140. These interactions structurally ‘cap’ the outer surface of the active site. Domain III (residues 200 to 302), also known as the C-terminal domain, is connected to domain II by an interdomain loop (residues 177 to 199) that links β-strand 12 and α-helix 3. Domain III is more flexible than the N-terminal domains [42–44], and although it is distant from the active site, it is functionally important because it is indispensable for enzyme dimerization [22,45].
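To make the deposition statistics concrete, the Python sketch below encodes the method breakdown quoted above (which sums to the 712 depositions) and shows how a ‘better than X Å’ fraction is computed. The resolution list in the example is made-up toy data, not the actual PDB distribution, and treating the cutoff as strict is our assumption.

```python
# Deposition counts by experimental method (PDB, September 2022), as quoted above.
DEPOSITIONS_BY_METHOD = {
    "X-ray": 705,
    "NMR": 2,
    "joint neutron/X-ray": 4,
    "electron microscopy": 1,
}


def fraction_better_than(resolutions_angstrom, cutoff_angstrom):
    """Fraction of structures whose high-resolution limit beats the cutoff.

    A 'better' resolution is a numerically smaller d_min, so 1.2 A is better
    than 2.0 A. Strictly smaller values are counted; whether the quoted
    statistics used < or <= is an assumption of this sketch.
    """
    values = list(resolutions_angstrom)
    if not values:
        raise ValueError("need at least one resolution value")
    return sum(1 for d in values if d < cutoff_angstrom) / len(values)


if __name__ == "__main__":
    print(sum(DEPOSITIONS_BY_METHOD.values()))  # total depositions
    toy = [1.2, 1.8, 2.2, 2.8]  # made-up values, not PDB data
    print(fraction_better_than(toy, 2.0), fraction_better_than(toy, 2.5))
```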

Figure 2. An overview of the structure of the SARS-CoV-2 main protease (PDB ID: 6LU7). (a) Surface and cartoon representation of the homodimer in the free apo form, with each monomer in a different colour. (b) Domain structure of the main protease monomer: domain I, domain II, domain III and the N-finger are indicated (blue, purple, cyan and grey, respectively). (c) Close-up of the binding cleft and active site. The S1 subsite is highlighted in blue, S2 in cyan, S1′ in green and S3 to S5 in purple. Pull-outs: views of subsites with component residues labelled. The Cys145–His41 catalytic dyad is highlighted in yellow. Molecular surface representations were rendered using Protein Imager [173]. Creator: Coronavirus Structural Task Force - Yunyun Gao, License: cc-by-sa

2.1. The active site

The Cys145–His41 catalytic dyad is conserved across all coronaviruses [46]. Both kinetic studies [47] and a subsequent molecular dynamics simulation [48] suggest that Cys145 and His41 adopt an uncharged resting state in the free apo form. However, an interpretation of neutron data from a SARS-CoV-2 main protease crystal at pH 7.0 and room temperature (PDB ID: 7jun) shows His41 positively charged, with both nitrogen atoms of the imidazole side chain protonated, and Cys145 deprotonated and negatively charged [49]. It has also been reported that Cys145 is readily oxidized to sulfinic acid [11,50], but such oxidation could have been triggered by incident X-ray radiation, especially in structures determined at room temperature [50]. For sarbecoviruses, the distance between the thiol sulfur of Cys145 and the imidazole nitrogen of His41 is in the range 3.6 to 3.7 Å, beyond the length of a typical hydrogen bond between a thiol group and a histidine nitrogen. A catalytic water is trapped in a pocket formed by His164, Asp187, Arg40 and His41 and likely plays an important role in holding the active site in the proper geometry [44,51]. This water molecule forms hydrogen bonds to the N atom of His41, the Oδ atom of Asp187 and the Nδ atom of His164 [1,49,52]. Two residues in this pocket, Arg40 and Asp187, form a salt bridge and are absolutely conserved [51], whereas His164 is not absolutely conserved, and the H164L mutation has no significant effect on proteolytic activity [11]. The specificity of substrate binding is provided by several subsite pockets around the catalytic dyad, named after the substrate residues that occupy them, i.e. S3, S2, S1 and S1′ (the P1 residue binds in subsite S1, the P1′ residue in S1′, and so on).
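A distance such as the 3.6 to 3.7 Å Cys145–His41 separation can be measured directly from a coordinate file. The Python sketch below reads atoms from the fixed-column PDB format and computes that distance. Treating NE2 as the relevant His41 nitrogen and using 3.5 Å as the hydrogen-bond cutoff are assumptions made for illustration, and the helper names are hypothetical; in practice a library such as Biopython or Gemmi would also handle alternate conformations and the mmCIF format.

```python
import math


def parse_atoms(pdb_text: str) -> dict:
    """Map (chain, residue number, atom name) -> (x, y, z) in Å.

    Reads ATOM/HETATM records using the fixed columns of the PDB format.
    Alternate locations and insertion codes are not handled: the first
    record seen for a given key wins.
    """
    atoms = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()       # columns 13-16
        chain = line[21]                 # column 22
        res_seq = int(line[22:26])       # columns 23-26
        xyz = tuple(float(line[c:c + 8]) for c in (30, 38, 46))  # columns 31-54
        atoms.setdefault((chain, res_seq, name), xyz)
    return atoms


def dyad_distance(atoms: dict, chain: str = "A",
                  cys_atom: str = "SG", his_atom: str = "NE2") -> float:
    """Distance between the chosen Cys145 and His41 atoms of one protomer.

    Using NE2 as the His41 atom facing Cys145 is an assumption; pass
    his_atom="ND1" to inspect the other imidazole nitrogen.
    """
    return math.dist(atoms[(chain, 145, cys_atom)], atoms[(chain, 41, his_atom)])


def exceeds_hbond_cutoff(distance: float, cutoff: float = 3.5) -> bool:
    """True if a donor-acceptor distance is longer than the chosen cutoff.

    The 3.5 Å default is a common rule-of-thumb value, not one taken from
    the text; the check ignores angles and atom types.
    """
    return distance > cutoff
```

Applied to a real entry such as 7jun, `dyad_distance(parse_atoms(text))` would return the separation discussed above for chain A.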

2.2. The S1 subsite

The S1 subsite is a hydrophilic pocket occupied by a glutamine residue at all cleavage sites on the main protease substrates (polyproteins pp1a and pp1ab) [13,38], which underlines the significance of this site for substrate recognition. An oxyanion hole is formed by the loop comprising residues 140–144 together with the cap described above, formed by Ser1∗, Glu166 and Phe140 (Figure 2(c); bottom right). This arrangement is a key component of the catalytic machinery, stabilizing the oxyanion of the transition state during substrate hydrolysis [53]. The main-chain atoms of Met165 and the side chains of His163 and His172 form the back wall of the pocket. In all reported monomer structures, the S1 subsite loses its specificity and the oxyanion hole collapses completely [22,25,54]. However, while Ser1∗ may be important for substrate recognition, truncating this residue does not lead to a complete loss of enzymatic activity (see the dimerization section below).

2.3. The S2 subsite

The S2 subsite is a wide hydrophobic pocket; although it has a high specificity for Leu, it can also accommodate Phe and Met residues [44]. The subsite is bordered by Asp187, Arg188, Gln189 and His164 and capped by the side chains of His41, Met49 and Met165 (Figure 2(c); centre left). The side chains of Met49, Met165 and Gln189 have been proposed to be flexible, adapting to the presence of various P2 groups [5,16,51,55]. For SARS-CoV-1, an S2–S3 cooperative binding mode was previously reported (S3 subsite residues: Thr25, His41, Cys44–Ala46, Met49). Cooperative binding means that, for the substrate residues flanking the cleavage sites in SARS-CoV-1 pp1a/pp1ab, a Phe residue at P3 is required whenever the S2 subsite is occupied by a Phe residue and the S2–Phe binding conformation appears [27]. This mechanism does not always seem to be relevant in SARS-CoV-2: for example, in the apo structure, the S1–S3 pockets are similar to those in the acyl-enzyme intermediate structure (PDB ID: 7kph) [51], which indicates that the formation of S3 does not depend on the binding of P2. However, a docking study using the acyl-enzyme intermediate structure shows that a highly active non-covalent inhibitor, 17a [36], occupies the S2 and S3 subsites with two phenyl-like groups simultaneously [51], suggesting that the S2–S3 cooperative binding mode may be important for designing high-affinity inhibitors. It has also been suggested that, when designing an inhibitor targeting S2–S3 cooperative binding, a structure with the S2–Phe binding conformation such as 7jkv could be considered [56].

2.4. Other subsites

Other subsites are generally regarded as less specific for particular amino acids. The S3 subsite has no well-defined pocket [57] but can be characterized by the backbone of Glu166: the interactions between the Glu166 backbone and the P2/P3 amide carbonyl, along with the P3/P4 amide hydrogen, are conserved for physiological substrates [27,51,58]. The S1′ subsite is a shallow pocket close to the catalytic dyad, surrounded by Thr25, Thr26, Leu27 and Gly143 (Figure 2(c); top left). The S2′ subsite is a narrow but deep pocket composed of residues Thr26, Asn28, Tyr118, Asn119 and Gly143. While the S1′ site exhibits specificity for small side chains such as those of Ser, Ala and Gly, crystallographic snapshots of natural substrate-bound structures suggest that the P1′ side chain is not fully sterically matched to the S1′ pocket; the same observation applies to P2′ and P3′ [24,51]. The S4 and S5 subsites are exposed to the solvent and structurally less defined (Figure 2(c); bottom left). The S4 subsite is a deep hydrophobic pocket formed by the loop comprising residues 185–192 and the side chains of Met165 and Leu167. Although binding specificity is less pronounced for these subsites, they are important for lead optimization: modifications targeting S1′ and S4 have been structurally shown to be effective [57,59–61].
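The residues named in Sections 2.2 to 2.4 can be collected into a small lookup table, which makes it easy to see that some residues (for example Met165 and Gly143) line more than one subsite. The Python sketch below is limited to residues explicitly named above, uses ASCII primes (S1') in its keys, and does not attempt to model the loops 140–144 or 185–192 residue by residue; the function names are our own.

```python
from collections import Counter

# Residues named in the text for each subsite (SARS-CoV-2 numbering).
# "*" marks a residue contributed by the neighbouring protomer.
SUBSITE_RESIDUES = {
    "S1": ["Ser1*", "Phe140", "His163", "Met165", "Glu166", "His172"],  # plus loop 140-144
    "S2": ["His41", "Met49", "His164", "Met165", "Asp187", "Arg188", "Gln189"],
    "S1'": ["Thr25", "Thr26", "Leu27", "Gly143"],
    "S2'": ["Thr26", "Asn28", "Tyr118", "Asn119", "Gly143"],
    "S4": ["Met165", "Leu167"],  # plus loop 185-192
}


def residue_number(label: str) -> int:
    """Extract the sequence number from a label such as 'Met165' or 'Ser1*'."""
    return int("".join(ch for ch in label if ch.isdigit()))


def subsites_lined_by(residue: str) -> list[str]:
    """Subsites (in table order) whose listed residues include `residue`."""
    return [site for site, residues in SUBSITE_RESIDUES.items() if residue in residues]


def shared_residues() -> list[str]:
    """Residues listed for more than one subsite, sorted by sequence number."""
    counts = Counter(r for residues in SUBSITE_RESIDUES.values() for r in residues)
    return sorted((r for r, n in counts.items() if n > 1), key=residue_number)
```

With the residues listed above, `shared_residues()` picks out Thr26, Gly143 and Met165, consistent with the observation that the subsites are not independent pockets.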

2.5. Dimerization

The dimer interface includes the N-finger, α-helix 1 (residues 10 to 15), β-strand 9 (residues 121 to 129), residues 132–142 and α-helix 7 (Arg298, Gln299) (Figure 3). Sequence conservation is particularly pronounced for the N-finger and loop residues 137–142 across all coronavirus main proteases [11,62]. The tip of the N-finger is also a component of the S1 subsite. Mutagenesis studies [63] show that truncating residues 1–3 reduces enzyme activity by 24%, although the dimer is preserved. However, once the truncation reaches the conserved Arg4, the main protease dissociates into a monomeric form with little to no activity. The side chain of Arg4 reaches into the domain II–domain III interface of the other protomer and forms hydrogen bonds with Lys137∗ (highly conserved) and Glu290∗ (completely conserved in known coronaviruses). The importance of these interactions is confirmed by the complete loss of activity and dimerization in an E290A mutant [64]. Molecules with a domain III deletion also do not form dimers [65–67]. The highly conserved α-helix 1 is another key contributor to dimerization: the hydrogen bonds Ser10–Ser10∗, Gly11–Glu14∗ and Glu14–Gly11∗ act as an anchor that fixes the N-fingers of the two monomers [63,65], and the S10A, G11A [54] and E14A [68] mutants all form predominantly inactive monomers. Gln299 is completely conserved among coronavirus main proteases, and its side chain hydrogen bonds to the main chain of Arg4, including in the three monomeric structures [22,25]. The additional hydrogen-bonding pairs Ser123–Arg298∗, Ser139–Gln299∗ and their inverses are also conserved [52,69]. Interestingly, the S123A and S139A mutants show very limited dissociation and retain substantial enzyme activity [25,67,70], whereas the R298A and Q299A mutants retain only ∼1% of the wild-type activity [22,70].
A Gln299 to Glu, Lys or Asp mutation reduces enzyme activity by more than 90%, but R298K has no significant effect on enzyme activity [67,70]. In a structure of the immature protease (PDB ID: 7kfi), in which dimerization is disrupted by an N-terminal insertion, the intermolecular interactions Ser123–Arg298∗ and Ser139–Gln299∗ are absent, while the hydrogen bonds (intermolecular or intramolecular) involving the five absolutely conserved residues Arg4, Ser10, Gly11, Glu14 and Gln299 are intact. In the immature dimer, the N-finger Ser1 hydrogen bonds to Phe140∗ but does not fully cap the S1 subsite, leaving the oxyanion hole incomplete. In the activity assay accompanying this immature structure, the protease shows only 6% of the activity of the mature wild-type enzyme [71]. In the active dimer, the domains III of the two protomers are separated from each other. One crystal structure of domain III alone (PDB ID: 3ebn) was reported to form a dimer with swapped subdomains [66], in which α-helix 3 (residues 200–214) of one domain III protomer is embedded among the other four helices of the other protomer. The crystal structure of a monomeric G11A mutant (PDB ID: 2pwx) shows a conformation in which domain III bends toward the side of the N-finger; this was taken as evidence that dimerization is initiated through domain III association, followed by separation in the mature protein [54]. However, it is hard to imagine that such a domain swap is energetically favoured during formation of the active dimer. Further studies instead postulate that dimerization may be initiated not by domain III association but by two key anchor components, Arg4 and α-helix 1.
Then the intermolecular interactions between α-helix 7 and the serine cluster (Ser123, Ser139) bring the two protomers into contact and secure Ser1∗ in a complete S1 subsite [18,51,54]. While domain III may not initiate dimerization, compounds binding to domain III do disrupt dimer formation [72]. Such allosteric inhibitors, which disrupt dimerization, therefore warrant more consideration as novel drugs.
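Interface interactions such as Ser1∗–Glu166 or Ser10–Ser10∗ are typically identified by searching for close atom pairs between the two protomers. The Python sketch below shows the simplest version of such a search: a brute-force distance scan with a 3.5 Å cutoff. The cutoff, the data layout and the function name are our assumptions; real analyses also check hydrogen-bond geometry, and for whole structures a spatial index such as SciPy's cKDTree avoids the quadratic loop.

```python
import math
from itertools import product

# Each atom: (residue label, atom name, (x, y, z) in Å).
Atom = tuple[str, str, tuple[float, float, float]]


def interchain_contacts(chain_a: list[Atom], chain_b: list[Atom],
                        cutoff: float = 3.5) -> list[tuple[str, str]]:
    """Residue pairs (A, B) with at least one atom pair within `cutoff` Å.

    Brute force over all atom pairs (O(N*M)); a purely geometric criterion
    that ignores hydrogen-bond angles and atom chemistry.
    """
    pairs = set()
    for (res_a, _, xyz_a), (res_b, _, xyz_b) in product(chain_a, chain_b):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.add((res_a, res_b))
    return sorted(pairs)


if __name__ == "__main__":
    # Toy coordinates labelled after residues in the text; not real positions.
    protomer_a = [("Ser1", "N", (0.0, 0.0, 0.0)), ("Arg4", "NH1", (10.0, 0.0, 0.0))]
    protomer_b = [("Glu166", "OE1", (3.0, 0.0, 0.0)), ("Lys137", "NZ", (20.0, 0.0, 0.0))]
    print(interchain_contacts(protomer_a, protomer_b))
```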

Figure 3. Structure-based sequence representation of the SARS-CoV-2 main protease. α-helices, β-strands and the N-finger are marked as red squares, navy squares and grey circles, respectively. The Cys145–His41 catalytic dyad is marked with yellow diamonds. Key residues for dimerization are highlighted in blue. Dashed boxes indicate regions associated with the dimer interface. Residue markers were created using Protter [174]. Creator: Coronavirus Structural Task Force - Yunyun Gao, License: cc-by-sa

2.6. Structural variability and flexibility

There has already been an extensive analysis of the differences between the many available main protease structures [42]. Structural alignments indicate that the global differences between structures determined at room temperature and at cryogenic temperatures, or using different radiation sources (various synchrotrons, X-ray free-electron lasers and in-house diffractometers), are not significant, and that the overall configuration of the two N-terminal domains is largely invariant [42]. However, part of the conformational space of the molecule is revealed by differences between crystal forms: for example, one crystal form shows distinct reorientations of the C-terminal domain, but it is unknown whether this is biologically relevant or merely a conformation induced by crystal packing [42]. Comparisons between ligand-bound and apo structures under comparable physical conditions have revealed significant deformations of the active site, suggesting an expansion of the binding site upon substrate binding [51,52]. However, there is also considerable variability within sets of apo and of ligand-bound structures [42], suggesting inherent flexibility rather than purely substrate-induced conformational changes. Although a number of residues surrounding the binding site are observed to rearrange upon binding, two regions are identified as particularly flexible in molecular dynamics simulations: the P2 helix and the P5 loop [52]. The flexibility of these elements is supported by ensemble refinements against the crystallographic data [73] as well as by a new approach for extracting molecular flexibility directly from experimental B-factors [74].
The latter analysis also hints at heterogeneous behaviour of an external loop (residues 62–80), which exhibits varying levels of disorder in different crystal forms, especially in lower-resolution structures [74], though again this may only be a crystal-induced conformation. The C-terminus of the protein is confirmed to be flexible and partially disordered in structures [42,51]; however, its proximity to the binding site raises the question of what this flexibility could mean for the role of the C-terminus in substrate binding. It has been suggested that differences between structures at different temperatures may make the more physiologically relevant structures preferable for secondary applications such as molecular docking into the binding site [52], though some of the identified differences were due to simple modelling errors [42]. Another study collected data from a series of crystals at different temperatures and showed that, although the average atomic positions of the protein are consistent across temperatures, the extent of variability changes significantly as the temperature increases from cryogenic to near-physiological [73]. Although there may be some question as to the precise nature of the changes between temperatures, all of these studies agree on the highly flexible, plastic nature of the main protease binding cleft.
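Comparisons such as those above usually rest on superposing two structures and reporting a root-mean-square deviation (RMSD). As a minimal sketch, and not the specific pipeline used in [42], the NumPy code below implements the Kabsch algorithm: it removes the centroids, finds the optimal rotation from a singular value decomposition, and returns the RMSD of matched atoms (for example, Cα atoms of the same residues in two structures). Matching atoms between the two structures is assumed to have been done beforehand.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched N x 3 coordinate sets after optimal superposition.

    Implements the Kabsch algorithm: centre both sets, compute the 3 x 3
    covariance matrix and obtain the optimal rotation from its SVD, with a
    sign correction so that the result is a proper rotation (no reflection).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)

    h = p_c.T @ q_c                          # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 would indicate a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    diff = p_c @ rot.T - q_c                 # rotate p onto q and compare
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Because centring and rotation are optimized away, the returned value reflects only genuine differences in shape, which is what makes statements such as "the N-terminal domains are largely invariant" quantitative.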

3. Main protease inhibitors as therapeutics

Viral protease inhibitors have been developed as antiviral agents against many notorious viruses, such as the human immunodeficiency virus (HIV) and the hepatitis C virus [75–77]. The COVID-19 pandemic and the structural conservation of main protease across related viruses make it and its homologs among the most extensively studied antiviral drug targets. The most prominent early coronavirus main protease inhibitors were peptidomimetic compounds targeting SARS-CoV-1 [1,15,31,33]. The peptide chain first interacts specifically and non-covalently with the enzyme at the active (sub)sites, bringing a warhead into the catalytic site. The warhead then reacts with Cys145, forming a covalent bond that blocks the active site (irreversibly or, for some warheads, reversibly). The mechanism of a number of peptidomimetic warhead inhibitors has been directly observed in crystal structures. Warhead groups observed in crystal structures include chloromethyl ketone [15], unsaturated ester [19,28,60,78], epoxide [29,79], aldehyde [32,33,44,55,57,78,80], nitrile [35], α-acyloxymethylketone [81] and various Michael acceptors [59,82–84]. Almost all potent peptidomimetic warhead inhibitors contain a 2-pyrrolidone (often referred to as γ-lactam) moiety at the P1 position as a substitute for the glutamine of the natural substrate. Utilized P2 moieties include hydrophobic substitutes such as leucine and phenylalanine side chains; cyclopropyl and cyclohexyl groups can be accommodated similarly [55,83]. The crystal structures of the main protease in complex with inhibitors show that 2-pyrrolidone can form hydrogen bonds with His163, Glu166 and Phe140 in the S1 subsite. Besides the P1 position, systematic modifications at other positions, such as P1′ [28], P2/P3 [55,57,83,85] and P4/P5 [44,60,86], have also been investigated to improve binding affinity. In addition to substrate analogues, a few warhead inhibitors have been identified by drug repurposing [87–89].
Among these, one of the most prominent compounds is GC376 [87,89–92], a preclinical inhibitor of the main protease of feline infectious peritonitis coronavirus. GC376 carries a bisulfite warhead and consists of a 2-pyrrolidone, a leucine and a benzyl group at the P1, P2 and P3 positions, respectively (Figure 4(c)). A number of subsequent lead optimization studies, supported by structural evidence of specific binding modes, demonstrated that chemical modification of GC376 yields analogues with improved effectiveness against main protease [55,57,84,93,94]. The peptide warhead inhibitors also led to the design of low-molecular-weight non-peptide inhibitors. Two early crystal structures in complex with non-peptide warhead inhibitors bearing benzotriazole ester [31] and halomethyl ketone moieties [95] were reported. Such research has been limited, potentially because these shorter non-peptide inhibitors did not show a significant increase in affinity compared with the peptidomimetic ones [37,96]. Recently, however, large screening campaigns have identified a number of highly potent non-peptide covalent inhibitors, including ebselen derivatives [97] and myricetin derivatives [98]. The development of non-covalent inhibitors is another important approach to overcoming the potential off-target side effects and toxicity that accompany the irreversible binding of covalent inhibitors [40,99]. Despite extensive virtual screening campaigns describing inhibitors that could bind to the main protease [100], only a few hits have been validated by crystal structures of their complexes [30,34,36,61,101–105]. Among them, ML-300 [36] (Figure 4(h)), ML-188 [101] (Figure 4(i)) and perampanel [106] (Figure 4(j)) have frequently been used as lead compounds for further optimization [61,102,107].
The potential scaffolds identified by virtual screening were diverse, but the experimentally determined complexes mostly showed hydrogen-bonding interactions between a pyridinyl-like or chlorophenyl-like moiety on the inhibitor and the His163 side chain of the S1 subsite; this interaction is therefore thought to be key to achieving binding affinity in a non-covalent inhibitor [61,108,109]. There are also efforts to convert identified non-covalent lead compounds into more potent covalent analogues by combining the identified chemical features with a reactive warhead [110,111]. To date, the most potent non-covalent inhibitor with a known binding mode is s-217622 (Figure 4(k)), obtained by structure-based optimization of a pharmacophore hit from a combination of virtual and biological screening [112]. s-217622, later named ensitrelvir, shows favourable drug metabolism and pharmacokinetic profiles and has confirmed antiviral efficacy in phase 2a trials [113]. In addition to structure-based drug design efforts, the broad SARS-CoV-2 drug repurposing campaigns also generated attractive drug candidates against main protease. Multiple crystal structures of main protease in complex with different drugs (baicalein [114], carmofur [115], shikonin [116], leupeptin [5,117], narlaprevir [117,118], boceprevir [89,117,119,120], telaprevir [117,119,120], masitinib [121] and myricetin [98]) have been reported. Among these compounds, baicalein, shikonin and leupeptin are reported to be reversible inhibitors, since their binding modes involve only hydrogen-bonding interactions with the S1 and S2 subsites. Carmofur, an approved antineoplastic drug [122], binds covalently to the catalytic Cys145 residue and occupies the S2 subsite with the alkyl chain of its hexylcarbamoyl group. However, a study combining an enzymatic assay and a binding assay with the reducing agent dithiothreitol suggests that shikonin and carmofur are in fact non-specific main protease inhibitors [123].
The identification of myricetin provides pyrogallol as a warhead for the design of new inhibitors [98]. Narlaprevir, boceprevir and telaprevir are approved serine protease inhibitors against hepatitis C virus infection; they are peptidomimetic warhead inhibitors bearing α-ketoamide warheads with different hydrophobic substituents. Narlaprevir, boceprevir and telaprevir complexes show conformational changes in the active subsites similar to those observed with the other peptidomimetic inhibitors [51,117], suggesting once more that the active pocket is intrinsically flexible and able to adapt to different chemical groups by altering the conformations of the subsites. Narlaprevir, boceprevir and telaprevir also show a degree of binding specificity for the SARS-CoV-2 main protease [100,123,124], which makes them promising lead compounds for further structure-based optimization. Advanced optimization based on variable warheads and on the backbone structures and binding modes of these three serine protease inhibitors has also been explored and has yielded a number of highly potent inhibitors, such as MI-23, MI-09 and MI-30 [119] as well as UAWJ9-36-1 and UAWJ9-36-3 [125] (Figure 4(e)). PF-07321332 (PF-332) [126–128] (Figure 4(f)), a novel reversible covalent peptide-like main protease inhibitor developed by Pfizer (nirmatrelvir; the oral antiviral drug Paxlovid combines nirmatrelvir with ritonavir), has confirmed efficacy from phase 2/3 trials [69].
Compared with the initial candidate PF-00835231 [59], several notable features were introduced: the nitrile warhead was selected for bioavailability and ease of scale-up synthesis; the cyclic P2 moiety from boceprevir/MI-09 [119] was adopted, not only to better fit the S2 subsite but also to improve passive permeability by removing the hydrogen bond donor of the P2/P3 amide; and a distinctive capping group, with a P3 tert-butyl group on the solvent-exposed side and a trifluoroacetamide occupying the S4 pocket, was designed for better antiviral activity as well as metabolic stability. The optimization from PF-00835231 to PF-07321332 is an instructive example of how structural biology, pharmacokinetics and scale-up considerations interact in rational drug development. Tables 1–3 categorize PDB entries of SARS-CoV-1/SARS-CoV-2 main protease structures in complex with peptidomimetic warhead inhibitors, covalent non-peptide inhibitors and non-covalent inhibitors, respectively.
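The division into Tables 1–3 follows two properties discussed in this section: whether a compound binds covalently and whether it is peptidomimetic. The Python sketch below expresses that reading as a small decision rule; it is our interpretation of the table headings, not a procedure stated alongside the tables, and borderline cases (for example repurposed drugs whose binding mode is disputed) would need individual judgement. The enum and function names are hypothetical.

```python
from enum import Enum


class InhibitorTable(Enum):
    """Categories used by Tables 1-3."""
    PEPTIDOMIMETIC_WARHEAD = 1
    COVALENT_NON_PEPTIDE = 2
    NON_COVALENT = 3


def assign_table(covalent: bool, peptidomimetic: bool) -> InhibitorTable:
    """Assign a ligand to a table from its binding type and scaffold."""
    if not covalent:
        return InhibitorTable.NON_COVALENT       # scaffold does not matter here
    if peptidomimetic:
        return InhibitorTable.PEPTIDOMIMETIC_WARHEAD
    return InhibitorTable.COVALENT_NON_PEPTIDE


# Examples taken from the text above: (covalent, peptidomimetic).
EXAMPLES = {
    "PF-07321332 (nirmatrelvir)": (True, True),
    "ebselen derivative": (True, False),
    "s-217622 (ensitrelvir)": (False, False),
}

if __name__ == "__main__":
    for name, (cov, pep) in EXAMPLES.items():
        print(f"{name}: Table {assign_table(cov, pep).value}")
```

Checking covalency first mirrors the headings: every non-covalent ligand lands in Table 3 regardless of its scaffold.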


Figure 4. Compounds bound to the active site of SARS-CoV-2 main protease; a-b, d-f: peptidomimetic warhead inhibitors, c, g: repurposed drugs, h-k: non-covalent inhibitors, l: physiological substrate. (a) 13b is an α-ketoamide peptidomimetic inhibitor. The P3-P2 amide is replaced by a pyridone ring to increase metabolic stability. (b) 11b is an aldehyde peptidomimetic inhibitor. The indole group at P3 improves metabolic stability and forms an extra hydrogen bond with Glu166. (c) GC376, with a bisulfite warhead, was identified through drug repurposing and is an important lead compound. It has good S1 and S2 interactions but less optimal S3 to S5 capping. (d) MPI8 is a highly potent aldehyde inhibitor. The S3 to S5 subsites are effectively capped by the hydrophilic groups at P3 and P4. (e) UAWJ9-36-3 was designed through a hybrid approach. The P1 and P3 groups are adapted from GC376 (c), whereas the P2 group and the P2/P3 amide are replaced by a 6,6-dimethyl-3-azabicyclo[3.1.0]hexane as in boceprevir. (f) PF-07321332 is a nitrile peptidomimetic inhibitor and an approved drug. Its P1 and P2 moieties are the same as those in UAWJ9-36-3 (e). The trifluoroacetamide effectively caps the S4 pocket; together with the tert-butyl group, this capping group achieves an improved S3 interaction. (g) Telaprevir is an example of an α-ketoamide-bearing serine protease inhibitor identified through drug repurposing. (h, i) ML300 and ML188 are non-covalent inhibitors identified through virtual screening. (j) Compound 26 is a highly potent non-covalent inhibitor optimized from perampanel. (k) S-217622 is a highly potent non-covalent oral drug candidate in clinical trials. The catalytic residue His41 is flipped away from Cys145. This results from π–π stacking between the trifluorobenzylic moiety and His41, and from hydrogen bonds mediated by a water molecule. (l) An acyl-enzyme intermediate structure with the C-terminal autoprocessing site bound and Gln306 covalently bound to Cys145. The P5 to P1 residues are shown.
The S1′, S1, S2, S3 and S4 subsites are highlighted. The arrows indicate the His41–Cys145 catalytic dyad. Molecular surface representations were rendered using Protein Imager [173]. Creator: Coronavirus Structural Task Force - Yunyun Gao, License: CC BY-SA

4. Summary

The overall conformation of the main protease is considered stable; however, the binding site is highly flexible. The main protease plays an important role in the viral life cycle and is structurally conserved. Together, these make it one of the primary targets for antiviral drug development. The structural optimization of peptidomimetic warhead inhibitors, as well as high-throughput screens using large compound libraries, has generated valuable information for drug development against COVID-19. Crystal structures of complexes with potent inhibitors have revealed variable binding modes and even mechanistic features, providing crucial insights for drug discovery. However, the inherent flexibility of the binding sites may lead to the identification of promiscuous inhibitors, which require rigorous further inspection. The same applies to drug candidates that interfere with the dimerization interface between the two monomers.

5. Discussion and outlook

During the ongoing pandemic, large numbers of sarbecovirus main protease crystal structures have been determined and published. These include apo and intermediate forms as well as complexes with various compounds. While the number of available structures is large, few new studies focus on revealing the enzyme mechanism. It is well accepted that the N-finger, consisting of only a few key residues, is essential for forming the active quaternary structure. However, discussions of the underlying mechanistic details are inconsistent [46,65]. It also remains unclear how the maturation of the main protease proceeds (i.e. the dimerization mechanism), owing to a lack of atomic-resolution information on the key intermediate states. Because only the mature dimer shows enzymatic activity, more attention should be given to capturing these intermediate states. Such studies could be highly beneficial for exploring new drug design strategies.

While structural biology can be an excellent tool for drug development, caution should be exercised in interpreting the pharmacology of the binding compounds. Some noncovalent inhibitors with distinct binding modes can be non-specific binders [123]. A reactive group alone can form a complex with the enzyme. In the absence of the peptide chain, however, its binding affinity remains significantly lower than that of the natural substrate [158], resulting in a lack of inhibitory or antiviral effect [62]. Structure-based SARS-CoV-2 drug design should therefore consider more than biochemical activity. Antiviral activity, in vivo drug metabolism and pharmacokinetic properties (such as absorptive permeability and metabolic stability) and inhibition reversibility should also be taken into account. To date, continued efforts in structure-based drug design have inspired two antiviral drug candidates: the reversible covalent inhibitor PF-07321332 and the noncovalent inhibitor S-217622.
For both discoveries, drug metabolism and pharmacokinetic profiles were addressed along with binding affinity. Current studies have examined the main protease from prevalent SARS-CoV-2 variants of concern, namely the mutants K90R (Alpha, Beta, Gamma), G15S (Lambda) and P132H (Omicron). They suggest that these mutations lead to almost identical backbone and active-site conformations [150]. The in vivo antiviral activity of PF-07321332 remains the same for these variants [134,150,159]. Other prevalent mutations, T21I (Beta), L89F (Beta) and L205V (Zeta), also remain susceptible to PF-07321332 in biochemical assays [69]. For the spike protein, mutations have led to resistance. Treatment targeting the main protease, in contrast, is so far unaffected by the prevalent mutations observed, which makes the main protease a core target for antiviral therapy against COVID-19.

Resistance may still arise, for instance driven by the use of antiviral monotherapies [160]. In that case, we may need to explore alternative approaches such as targeting the dimerization interface or directly interfering with the dimerization process. However, very few results have been reported on dimer-interfering inhibitors. Crystallographic fragment screening campaigns have identified only a few compounds binding to the dimerization interface [4,5]. Dimerization inhibitors targeting many viral enzymes, including HIV protease, have been reported [161–163]. It is therefore likely that main protease inhibitors inspired by such design strategies will eventually become available.

The large number of models and crystallographic data sets of the same protein, determined within a relatively short time period, also provides a unique opportunity for computational method developers. These data sets can serve as a consistent resource for developing new crystallographic pipelines and methodologies, or for improving current ones.
For this, dedicated databases that validate and re-refine the deposited coronavirus protein models are of great value, such as covid-19.bioreproducibility.org [164] and the Coronavirus Structural Task Force [165]. Moreover, the structural stability and robust crystallizability of the main protease suggest that it can also serve as a model system for developing experimental crystallographic methods. Potential use cases include:
- testing protocols for high-throughput drug screening by serial femtosecond X-ray crystallography [166];
- assessing new screening technologies [167–169] or new experimental compound optimization strategies [170];
- evaluating the performance of new fluorescent probes and assays [171,172];
- validating the sensitivity of computational inhibitor design pipelines [111].

This blog post was published in Crystallography Reviews, please cite: https://doi.org/10.1080/0889311X.2023.2222275

Acknowledgements

We thank Rosemary Wilson for proofreading. All figures are courtesy of the Coronavirus Structural Task Force (insidecorona.net), which retains copyright for both the text and the figures.

References

[1] Anand K, Ziebuhr J, Wadhwani P, et al. Coronavirus main proteinase (3CLpro) structure: basis for design of anti-SARS drugs. Science. 2003;300:1763–1767.
[2] Fehr AR, Perlman S. Coronaviruses: an overview of their replication and pathogenesis. Methods Mol Biol. 2015;1282:1–23.
[3] Ziebuhr J, Snijder EJ, Gorbalenya AE. Virus-encoded proteinases and proteolytic processing in the Nidovirales. J Gen Virol. 2000;81:853–879.
[4] Douangamath A, Fearon D, Gehrtz P, et al. Crystallographic and electrophilic fragment screening of the SARS-CoV-2 main protease. Nat Commun. 2020;11:5047.
[5] Günther S, Reinke PYA, Fernández-García Y, et al. X-ray screening identifies active site and allosteric inhibitors of SARS-CoV-2 main protease. Science. 2021;372:642–646.
[6] Matthews DA, Smith WW, Ferre RA, et al. Structure of human rhinovirus 3C protease reveals a trypsin-like polypeptide fold, RNA-binding site, and means for cleaving precursor polyprotein. Cell. 1994;77:761–771.
[7] Mosimann SC, Cherney MM, Sia S, et al. Refined X-ray crystallographic structure of the poliovirus 3C gene product. J Mol Biol. 1997;273:1032–1047.
[8] Hegyi A, Ziebuhr J. Conservation of substrate specificities among coronavirus main proteases. J Gen Virol. 2002;83:595–599.
[9] Hegyi A, Friebe A, Gorbalenya AE, et al. Mutational analysis of the active centre of coronavirus 3C-like proteases. J Gen Virol. 2002;83:581–593.
[10] Ziebuhr J, Heusipp G, Siddell SG. Biosynthesis, purification, and characterization of the human coronavirus 229E 3C-like proteinase. J Virol. 1997;71:3992–3997.
[11] Anand K, Palm GJ, Mesters JR, et al. Structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra alpha-helical domain. EMBO J. 2002;21:3213–3224.
[12] Drosten C, Günther S, Preiser W, et al. Identification of a novel coronavirus in patients with severe acute respiratory syndrome. N Engl J Med. 2003;348:1967–1976.
[13] Ksiazek TG, Erdman D, Goldsmith CS, et al. A novel coronavirus associated with severe acute respiratory syndrome. N Engl J Med. 2003;348:1953–1966.
[14] Ziebuhr J. Molecular biology of severe acute respiratory syndrome coronavirus. Curr Opin Microbiol. 2004;7:412–419.
[15] Yang H, Yang M, Ding Y, et al. The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor. Proc Natl Acad Sci USA. 2003;100:13190–13195.
[16] Jin Z, Du X, Xu Y, et al. Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature. 2020;582:289–293.
[17] Qin E, Zhu Q, Yu M, et al. A complete sequence and comparative analysis of a SARS-associated virus (Isolate BJ01). Chin Sci Bull. 2003;48:941–948.
[18] Hsu M-F, Kuo C-J, Chang K-T, et al. Mechanism of the maturation process of SARS-CoV 3CL protease. J Biol Chem. 2005;280:31257–31266.
[19] Ghosh AK, Xi K, Ratia K, et al. Design and synthesis of peptidomimetic severe acute respiratory syndrome chymotrypsin-like protease inhibitors. J Med Chem. 2005;48:6767–6771.
[20] Pyrc K, Berkhout B, van der Hoek L. The novel human coronaviruses NL63 and HKU1. J Virol. 2007;81:3051–3057.
[21] Zaki AM, van Boheemen S, Bestebroer TM, et al. Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia. N Engl J Med. 2012;367:1814–1820.
[22] Shi J, Sivaraman J, Song J. Mechanism for controlling the dimer-monomer switch and coupling dimerization to catalysis of the severe acute respiratory syndrome coronavirus 3C-like protease. J Virol. 2008;82:4620–4629.
[23] Tan J, Verschueren KHG, Anand K, et al. pH-dependent conformational flexibility of the SARS-CoV main proteinase (M(pro)) dimer: molecular dynamics simulations and multiple X-ray structure analyses. J Mol Biol. 2005;354:25–40.
[24] Xue X, Yu H, Yang H, et al. Structures of two coronavirus main proteases: implications for substrate binding and antiviral drug design. J Virol. 2008;82:2515–2527.
[25] Hu T, Zhang Y, Li L, et al. Two adjacent mutations on the dimer interface of SARS coronavirus 3C-like protease cause different conformational changes in crystal structure. Virology. 2009;388:324–334.
[26] Barrila J, Gabelli SB, Bacha U, et al. Mutation of Asn28 disrupts the dimerization and enzymatic activity of SARS 3CL(pro). Biochemistry. 2010;49:4308–4317.
[27] Muramatsu T, Takemoto C, Kim Y-T, et al. SARS-CoV 3CL protease cleaves its C-terminal autoprocessing site by novel subsite cooperativity. Proc Natl Acad Sci USA. 2016;113:12997–13002.
[28] Yang H, Xie W, Xue X, et al. Design of wide-spectrum inhibitors targeting coronavirus main proteases. PLoS Biol. 2005;3:e324.
[29] Lee T-W, Cherney MM, Huitema C, et al. Crystal structures of the main peptidase from the SARS coronavirus inhibited by a substrate-like aza-peptide epoxide. J Mol Biol. 2005;353:1137–1151.
[30] Lu I-L, Mahindroo N, Liang P-H, et al. Structure-based drug design and structural biology study of novel nonpeptide inhibitors of severe acute respiratory syndrome coronavirus main protease. J Med Chem. 2006;49:5154–5161.
[31] Verschueren KHG, Pumpor K, Anemüller S, et al. A structural view of the inactivation of the SARS coronavirus main proteinase by benzotriazole esters. Chem Biol. 2008;15:597–606.
[32] Akaji K, Konno H, Mitsui H, et al. Structure-based design, synthesis, and evaluation of peptide-mimetic SARS 3CL protease inhibitors. J Med Chem. 2011;54:7962–7973.
[33] Zhu L, George S, Schmidt MF, et al. Peptide aldehyde inhibitors challenge the substrate specificity of the SARS-coronavirus main protease. Antiviral Res. 2011;92:204–212.
[34] Jacobs J, Grum-Tokars V, Zhou Y, et al. Discovery, synthesis, and structure-based optimization of a series of N-(tert-butyl)-2-(N-arylamido)-2-(pyridin-3-yl) acetamides (ML188) as potent noncovalent small molecule inhibitors of the severe acute respiratory syndrome coronavirus (SARS-CoV) 3CL protease. J Med Chem. 2013;56:534–546.
[35] Chuck C-P, Chen C, Ke Z, et al. Design, synthesis and crystallographic analysis of nitrile-based broad-spectrum peptidomimetic inhibitors for coronavirus 3C-like proteases. Eur J Med Chem. 2013;59:1–6.
[36] Turlington M, Chun A, Tomar S, et al. Discovery of N-(benzo[1,2,3]triazol-1-yl)-N-(benzyl)acetamido)phenyl) carboxamides as severe acute respiratory syndrome coronavirus (SARS-CoV) 3CLpro inhibitors: identification of ML300 and noncovalent nanomolar inhibitors with an induced-fit binding. Bioorg Med Chem Lett. 2013;23:6172–6177.
[37] Shimamoto Y, Hattori Y, Kobayashi K, et al. Fused-ring structure of decahydroisoquinolin as a novel scaffold for SARS 3CL protease inhibitors. Bioorg Med Chem. 2015;23:876–890.
[38] Wu F, Zhao S, Yu B, et al. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579:265–269.
[39] Roe MK, Junod NA, Young AR, et al. Targeting novel structural and functional features of coronavirus protease nsp5 (3CLpro, Mpro) in the age of COVID-19. J Gen Virol. 2021;102:001558.
[40] Luttens A, Gullberg H, Abdurakhmanov E, et al. Ultralarge virtual screening identifies SARS-CoV-2 main protease inhibitors with broad-spectrum activity against coronaviruses. J Am Chem Soc. 2022;144:2905–2920.
[41] Antonopoulou I, Sapountzaki E, Rova U, et al. Inhibition of the main protease of SARS-CoV-2 (Mpro) by repurposing/designing drug-like substances and utilizing nature’s toolbox of bioactive compounds. Comput Struct Biotechnol J. 2022;20:1306–1344.
[42] Jaskolski M, Dauter Z, Shabalin IG, et al. Crystallographic models of SARS-CoV-2 3CLpro: in-depth assessment of structure quality and validation. IUCrJ. 2021;8:238–256.
[43] Tekpinar M, Yildirim A. Impact of dimerization and N3 binding on molecular dynamics of SARS-CoV and SARS-CoV-2 main proteases. J Biomol Struct Dyn. 2022;40:6243–6254.
[44] Wang H, He S, Deng W, et al. Comprehensive insights into the catalytic mechanism of Middle East respiratory syndrome 3C-like protease and severe acute respiratory syndrome 3C-like protease. ACS Catal. 2020;10:5871–5890.
[45] Komatsu TS, Okimoto N, Koyama YM, et al. Drug binding dynamics of the dimeric SARS-CoV-2 main protease, determined by molecular dynamics simulation. Sci Rep. 2020;10:16986.
[46] Xiong M, Su H, Zhao W, et al. What coronavirus 3C-like protease tells us: from structure, substrate selectivity, to inhibitor design. Med Res Rev. 2021;41:1965–1998.
[47] Huang C, Wei P, Fan K, et al. 3C-like proteinase from SARS coronavirus catalyzes substrate hydrolysis by a general base mechanism. Biochemistry. 2004;43:4568–4574.
[48] Paasche A, Zipper A, Schäfer S, et al. Evidence for substrate binding-induced zwitterion formation in the catalytic Cys-His dyad of the SARS-CoV main protease. Biochemistry. 2014;53:5930–5946.
[49] Kneller DW, Phillips G, Weiss KL, et al. Unusual zwitterionic catalytic site of SARS-CoV-2 main protease revealed by neutron crystallography. J Biol Chem. 2020;295:17365–17373.
[50] Kneller DW, Phillips G, O’Neill HM, et al. Room-temperature X-ray crystallography reveals the oxidation and reactivity of cysteine residues in SARS-CoV-2 3CL Mpro: insights into enzyme mechanism and drug design. IUCrJ. 2020;7:1028–1035.
[51] Lee J, Worrall LJ, Vuckovic M, et al. Crystallographic structure of wild-type SARS-CoV-2 main protease acyl-enzyme intermediate with physiological C-terminal autoprocessing site. Nat Commun. 2020;11:5877.
[52] Kneller DW, Phillips G, O’Neill HM, et al. Structural plasticity of SARS-CoV-2 3CL Mpro active site cavity revealed by room temperature X-ray crystallography. Nat Commun. 2020;11:3202.
[53] Ménard R, Storer AC. Oxyanion hole interactions in serine and cysteine proteases. Biol Chem Hoppe-Seyler. 1992;373:393–400.
[54] Chen S, Hu T, Zhang J, et al. Mutation of Gly-11 on the dimer interface results in the complete crystallographic dimer dissociation of severe acute respiratory syndrome coronavirus 3C-like protease: crystal structure with molecular dynamics simulations. J Biol Chem. 2008;283:554–564.
[55] Dai W, Zhang B, Jiang X-M, et al. Structure-based design of antiviral drug candidates targeting the SARS-CoV-2 main protease. Science. 2020;368:1331–1335.
[56] Hattori S-I, Higashi-Kuwata N, Hayashi H, et al. A small molecule compound with an indole moiety inhibits the main protease of SARS-CoV-2 and blocks virus replication. Nat Commun. 2021;12:668.
[57] Yang KS, Ma XR, Ma Y, et al. A quick route to multiple highly potent SARS-CoV-2 main protease inhibitors. ChemMedChem. 2021;16:942–948.
[58] MacDonald EA, Frey G, Namchuk MN, et al. Recognition of divergent viral substrates by the SARS-CoV-2 main protease. ACS Infect Dis. 2021;7:2591–2595.
[59] Hoffman RL, Kania RS, Brothers MA, et al. Discovery of ketone-based covalent inhibitors of coronavirus 3CL proteases for the potential therapeutic treatment of COVID-19. J Med Chem. 2020;63:12725–12747.
[60] Ghosh AK, Xi K, Grum-Tokars V, et al. Structure-based design, synthesis, and biological evaluation of peptidomimetic SARS-CoV 3CLpro inhibitors. Bioorg Med Chem Lett. 2007;17:5876–5880.
[61] Zhang C-H, Stone EA, Deshmukh M, et al. Potent noncovalent inhibitors of the main protease of SARS-CoV-2 from molecular sculpting of the drug perampanel guided by free energy perturbation calculations. ACS Cent Sci. 2021;7:467–475.
[62] Zhang L, Lin D, Kusov Y, et al. α-Ketoamides as broad-spectrum inhibitors of coronavirus and enterovirus replication: structure-based design, synthesis, and activity assessment. J Med Chem. 2020;63:4562–4578.
[63] Hsu W-C, Chang H-C, Chou C-Y, et al. Critical assessment of important regions in the subunit association and catalytic action of the severe acute respiratory syndrome coronavirus main protease. J Biol Chem. 2005;280:22741–22748.
[64] Chou C-Y, Chang H-C, Hsu W-C, et al. Quaternary structure of the severe acute respiratory syndrome (SARS) coronavirus main protease. Biochemistry. 2004;43:14958–14970.
[65] Zhong N, Zhang S, Zou P, et al. Without its N-finger, the main protease of severe acute respiratory syndrome coronavirus can form a novel dimer through its C-terminal domain. J Virol. 2008;82:4227–4234.
[66] Zhong N, Zhang S, Xue F, et al. C-terminal domain of SARS-CoV main protease can form a 3D domain-swapped dimer. Protein Sci. 2009;18:839–844.
[67] Chen S, Zhang J, Hu T, et al. Residues on the dimer interface of SARS coronavirus 3C-like protease: dimer stability characterization and enzyme catalytic activity analysis. J Biochem. 2008;143:525–536.
[68] Shan Y-F, Li S-F, Xu G-J. A novel auto-cleavage assay for studying mutational effects on the active site of severe acute respiratory syndrome coronavirus 3C-like protease. Biochem Biophys Res Commun. 2004;324:579–583.
[69] Ullrich S, Ekanayake KB, Otting G, et al. Main protease mutants of SARS-CoV-2 variants remain susceptible to nirmatrelvir. Bioorg Med Chem Lett. 2022;62:128629.
[70] Lin P-Y, Chou C-Y, Chang H-C, et al. Correlation between dissociation and catalysis of SARS-CoV main protease. Arch Biochem Biophys. 2008;472:34–42.
[71] Noske GD, Nakamura AM, Gawriljuk VO, et al. A crystallographic snapshot of SARS-CoV-2 main protease maturation process. J Mol Biol. 2021;433:167118.
[72] Sun Z, Wang L, Li X, et al. An extended conformation of SARS-CoV-2 main protease reveals allosteric targets. Proc Natl Acad Sci USA. 2022;119:e2120913119.
[73] Ebrahim A, Riley BT, Kumaran D, et al. The temperature-dependent conformational ensemble of SARS-CoV-2 main protease (Mpro). BioRxiv. 2021.
[74] Pearce NM, Gros P. A method for intuitively extracting macromolecular dynamics from structural disorder. Nat Commun. 2021;12:5493.
[75] Ghosh AK, Osswald HL, Prato G. Recent progress in the development of HIV-1 protease inhibitors for the treatment of HIV/AIDS. J Med Chem. 2016;59:5172–5208.
[76] de Leuw P, Stephan C. Protease inhibitors for the treatment of hepatitis C virus infection. GMS Infect Dis. 2017;5:Doc08.
[77] Zephyr J, Kurt Yilmaz N, Schiffer CA. Viral proteases: structure, mechanism and inhibition. Enzymes. 2021;50:301–333.
[78] Lee C-C, Kuo C-J, Ko T-P, et al. Structural basis of inhibition specificities of 3C and 3C-like proteases by zinc-coordinating and peptidomimetic compounds. J Biol Chem. 2009;284:7646–7655.
[79] Lee T-W, Cherney MM, Liu J, et al. Crystal structures reveal an induced-fit binding of a substrate-like Aza-peptide epoxide to SARS coronavirus main peptidase. J Mol Biol. 2007;366:916–932.
[80] Yang S, Chen S-J, Hsu M-F, et al. Synthesis, crystal structure, structure-activity relationships, and antiviral activity of a potent SARS coronavirus 3CL protease inhibitor. J Med Chem. 2006;49:4971–4980.
[81] Bai B, Belovodskiy A, Hena M, et al. Peptidomimetic α-acyloxymethylketone warheads with six-membered lactam P1 glutamine mimic: SARS-CoV-2 3CL protease inhibition, coronavirus antiviral activity, and in vitro biological stability. J Med Chem. 2022;65:2905–2925.
[82] Yin J, Niu C, Cherney MM, et al. A mechanistic view of enzyme inhibition and peptide hydrolysis in the active site of the SARS-CoV 3C-like peptidase. J Mol Biol. 2007;371:1060–1074.
[83] Zhang L, Lin D, Sun X, et al. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors. Science. 2020;368:409–412.
[84] Rathnayake AD, Zheng J, Kim Y, et al. 3C-like protease inhibitors block coronavirus replication in vitro and improve survival in MERS-CoV-infected mice. Sci Transl Med. 2020;12:eabc5332.
[85] Dampalla CS, Kim Y, Bickmeier N, et al. Structure-guided design of conformationally constrained cyclohexane inhibitors of severe acute respiratory syndrome coronavirus-2 3CL protease. J Med Chem. 2021;64:10047–10058.
[86] Dampalla CS, Rathnayake AD, Perera KD, et al. Structure-guided design of potent inhibitors of SARS-CoV-2 3CL protease: structural, biochemical, and cell-based studies. J Med Chem. 2021;64:17846–17865.
[87] Vuong W, Khan MB, Fischer C, et al. Feline coronavirus drug inhibits the main protease of SARS-CoV-2 and blocks virus replication. Nat Commun. 2020;11:4282.
[88] Sacco MD, Ma C, Lagarias P, et al. Structure and inhibition of the SARS-CoV-2 main protease reveal strategy for developing dual inhibitors against Mpro and cathepsin L. Sci Adv. 2020;6:eabe0751.
[89] Fu L, Ye F, Feng Y, et al. Both Boceprevir and GC376 efficaciously inhibit SARS-CoV-2 by targeting its main protease. Nat Commun. 2020;11:4417.
[90] Ma C, Sacco MD, Hurst B, et al. Boceprevir, GC-376, and calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main protease. Cell Res. 2020;30:678–692.
[91] Arutyunova E, Khan MB, Fischer C, et al. N-Terminal finger stabilizes the S1 pocket for the reversible feline drug GC376 in the SARS-CoV-2 Mpro dimer. J Mol Biol. 2021;433:167003.
[92] Shi Y, Shuai L, Wen Z, et al. The preclinical inhibitor GS441524 in combination with GC376 efficaciously inhibited the proliferation of SARS-CoV-2 in the mouse respiratory tract. Emerg Microbes Infect. 2021;10:481–492.
[93] Vuong W, Fischer C, Khan MB, et al. Improved SARS-CoV-2 Mpro inhibitors based on feline antiviral drug GC376: structural enhancements, increased solubility, and micellar studies. Eur J Med Chem. 2021;222:113584.
[94] Liu H, Iketani S, Zask A, et al. Development of optimized drug-like small molecule inhibitors of the SARS-CoV-2 3CL protease for treatment of COVID-19. Nat Commun. 2022;13:1891.
[95] Bacha U, Barrila J, Gabelli SB, et al. Development of broad-spectrum halomethyl ketone inhibitors against coronavirus main protease 3CL(pro). Chem Biol Drug Des. 2008;72:34–49.
[96] Ullrich S, Sasi VM, Mahawaththa MC, et al. Challenges of short substrate analogues as SARS-CoV-2 main protease inhibitors. Bioorg Med Chem Lett. 2021:128333.
[97] Amporndanai K, Meng X, Shang W, et al. Inhibition mechanism of SARS-CoV-2 main protease by ebselen and its derivatives. Nat Commun. 2021;12:3061.
[98] Su H, Yao S, Zhao W, et al. Identification of pyrogallol as a warhead in design of covalent inhibitors for the SARS-CoV-2 3CL protease. Nat Commun. 2021;12:3623.
[99] Kitamura N, Sacco MD, Ma C, et al. Expedited approach toward the rational design of noncovalent SARS-CoV-2 main protease inhibitors. J Med Chem. 2022;65:2848–2865.
[100] Mslati H, Gentile F, Perez C, et al. Comprehensive consensus analysis of SARS-CoV-2 drug repurposing campaigns. J Chem Inf Model. 2021;61:3771–3788.
[101] Lockbaum GJ, Reyes AC, Lee JM, et al. Crystal structure of SARS-CoV-2 main protease in complex with the non-covalent inhibitor ML188. Viruses. 2021;13.
[102] Deshmukh MG, Ippolito JA, Zhang C-H, et al. Structure-guided design of a perampanel-derived pharmacophore targeting the SARS-CoV-2 main protease. Structure. 2021;29:823–833.e5.
[103] Clyde A, Galanie S, Kneller DW, et al. High-throughput virtual screening and validation of a SARS-CoV-2 main protease noncovalent inhibitor. J Chem Inf Model. 2022;62:116–128.
[104] Redhead MA, Owen CD, Brewitz L, et al. Bispecific repurposed medicines targeting the viral and immunological arms of COVID-19. Sci Rep. 2021;11:13208.
[105] Iketani S, Forouhar F, Liu H, et al. Lead compounds for the development of SARS-CoV-2 3CL protease inhibitors. Nat Commun. 2021;12:2016.
[106] Gimeno A, Mestres-Truyol J, Ojeda-Montes MJ, et al. Prediction of novel inhibitors of the main protease (M-pro) of SARS-CoV-2 through consensus docking and drug reposition. Int J Mol Sci. 2020;21:3793.
[107] Han SH, Goins CM, Arya T, et al. Structure-based optimization of ML300-derived, noncovalent inhibitors targeting the severe acute respiratory syndrome coronavirus 3CL protease (SARS-CoV-2 3CLpro). J Med Chem. 2022;65:2880–2904.
[108] Llanos MA, Gantner ME, Rodriguez S, et al. Strengths and weaknesses of docking simulations in the SARS-CoV-2 era: the main protease (Mpro) case study. J Chem Inf Model. 2021;61:3758–3770.
[109] Tanaka S, Tokutomi S, Hatada R, et al. Dynamic cooperativity of ligand-residue interactions evaluated with the fragment molecular orbital method. J Phys Chem B. 2021;125:6501–6512.
[110] Stille JK, Tjutrins J, Wang G, et al. Design, synthesis and in vitro evaluation of novel SARS-CoV-2 3CLpro covalent inhibitors. Eur J Med Chem. 2022;229:114046.
[111] Zaidman D, Gehrtz P, Filep M, et al. An automatic pipeline for the design of irreversible derivatives identifies a potent SARS-CoV-2 Mpro inhibitor. Cell Chem Biol. 2021;28:1795–1806.e5.
[112] Unoh Y, Uehara S, Nakahara K, et al. Discovery of S-217622, a noncovalent oral SARS-CoV-2 3CL protease inhibitor clinical candidate for treating COVID-19. J Med Chem. 2022;65:6499–6512.
[113] Shionogi Co., Ltd. New Data for Shionogi’s COVID-19 Once-Daily Oral Antiviral S-217622 Show Rapid Virus Clearance. News. Shionogi Co., Ltd. [Internet]. [cited 2022 May 3]. Available from: https://www.shionogi.com/global/en/news/2022/04/20220424.html.
[114] Su H-X, Yao S, Zhao W-F, et al. Anti-SARS-CoV-2 activities in vitro of Shuanghuanglian preparations and bioactive ingredients. Acta Pharmacol Sin. 2020;41:1167–1177.
[115] Jin Z, Zhao Y, Sun Y, et al. Structural basis for the inhibition of SARS-CoV-2 main protease by antineoplastic drug carmofur. Nat Struct Mol Biol. 2020;27:529–532.
[116] Li J, Zhou X, Zhang Y, et al. Crystal structure of SARS-CoV-2 main protease in complex with the natural product inhibitor shikonin illuminates a unique binding mode. Sci Bull (Beijing). 2021;66:661–663.
[117] Kneller DW, Galanie S, Phillips G, et al. Malleability of the SARS-CoV-2 3CL Mpro active-site cavity facilitates binding of clinical antivirals. Structure. 2020;28:1313–1320.e3.
[118] Bai Y, Ye F, Feng Y, et al. Structural basis for the inhibition of the SARS-CoV-2 main protease by the anti-HCV drug narlaprevir. Signal Transduct Target Ther. 2021;6:51.
[119] Qiao J, Li Y-S, Zeng R, et al. SARS-CoV-2 Mpro inhibitors with antiviral activity in a transgenic mouse model. Science. 2021;371:1374–1378.
[120] Oerlemans R, Ruiz-Moreno AJ, Cong Y, et al. Repurposing the HCV NS3-4A protease drug boceprevir as COVID-19 therapeutics. RSC Med Chem. 2020;12:370–379.
[121] Drayman N, DeMarco JK, Jones KA, et al. Masitinib is a broad coronavirus 3CL inhibitor that blocks replication of SARS-CoV-2. Science. 2021;373:931–936.
[122] Dementiev A, Joachimiak A, Nguyen H, et al. Molecular mechanism of inhibition of acid ceramidase by carmofur. J Med Chem. 2019;62:987–992.
[123] Ma C, Hu Y, Townsend JA, et al. Ebselen, disulfiram, carmofur, PX-12, tideglusib, and shikonin Are nonspecific promiscuous SARS-CoV-2 main protease inhibitors. ACS Pharmacol Transl Sci. 2020;3:1265–1277.
[124] Baker JD, Uhrich RL, Kraemer GC, et al. A drug repurposing screen identifies hepatitis C antivirals as inhibitors of the SARS-CoV2 main protease. PLoS ONE. 2021;16:e0245962.
[125] Xia Z, Sacco M, Hu Y, et al. Rational design of hybrid SARS-CoV-2 main protease inhibitors guided by the superimposed cocrystal structures with the peptidomimetic inhibitors GC-376, telaprevir, and boceprevir. ACS Pharmacol Transl Sci. 2021;4:1408–1421.
[126] Owen DR, Allerton CMN, Anderson AS, et al. An oral SARS-CoV-2 Mpro inhibitor clinical candidate for the treatment of COVID-19. Science. 2021;374:1586–1593.
[127] Abdelnabi R, Foo CS, Jochmans D, et al. The oral protease inhibitor (PF-07321332) protects Syrian hamsters against infection with SARS-CoV-2 variants of concern. Nat Commun. 2022;13:719.
[128] Li J, Lin C, Zhou X, et al. Structural basis of the main proteases of coronavirus bound to drug candidate PF-07321332. J Virol. 2022;96:e0201321.
[129] Goetz DH, Choe Y, Hansell E, et al. Substrate specificity profiling and identification of a new class of inhibitor for the major protease of the SARS coronavirus. Biochemistry. 2007;46:8744–8752.
[130] Konno S, Kobayashi K, Senda M, et al. 3CL protease inhibitors with an electrophilic arylketone moiety as anti-SARS-CoV-2 agents. J Med Chem. 2022;65:2926–2939.
[131] Li J, Lin C, Zhou X, et al. Structural basis of main proteases of coronavirus bound to drug candidate PF-07304814. J Mol Biol. 2022;434:167706.
[132] Ma Y, Yang KS, Geng ZZ, et al. A multi-pronged evaluation of aldehyde-based tripeptidyl main protease inhibitors as SARS-CoV-2 antivirals. Eur J Med Chem. 2022;240:114570.
[133] Bai B, Arutyunova E, Khan MB, et al. Peptidomimetic nitrile warheads as SARS-CoV-2 3CL protease inhibitors. RSC Med Chem. 2021;12:1722–1730.
[134] Sacco MD, Hu Y, Gongora MV, et al. The P132H mutation in the main protease of Omicron SARS-CoV-2 decreases thermal stability without compromising catalysis or small-molecule drug inhibition. Cell Res. 2022;32:498–500.
[135] Dampalla CS, Rathnayake AD, Galasiti Kankanamalage AC, et al. Structure-guided design of potent spirocyclic inhibitors of severe acute respiratory syndrome coronavirus-2 3C-like protease. J Med Chem. 2022;65:7818–7832.
[136] Lee C-C, Kuo C-J, Hsu M-F, et al. Structural basis of mercury- and zinc-conjugated complexes as SARS-CoV 3C-like protease inhibitors. FEBS Lett. 2007;581:5454–5458.
[137] Pillaiyar T, Flury P, Krüger N, et al. Small-molecule thioesters as SARS-CoV-2 main protease inhibitors: enzyme inhibition, structure-activity relationships, antiviral activity, and X-ray structure determination. J Med Chem. 2022;65:9376–9395.
[138] Fu L, Shao S, Feng Y, et al. Mechanism of microbial metabolite leupeptin in the treatment of COVID-19 by traditional Chinese medicine herbs. MBio. 2021;12:e0222021.
[139] Andi B, Kumaran D, Kreitler DF, et al. Hepatitis C virus NS3/4A inhibitors and other drug-like compounds as covalent binders of SARS-CoV-2 main protease. Sci Rep. 2022;12:12197.
[140] Kneller DW, Phillips G, Weiss KL, et al. Direct observation of protonation state modulation in SARS-CoV-2 main protease upon inhibitor binding with neutron crystallography. J Med Chem. 2021;64:4991–5000.
[141] Lockbaum GJ, Henes M, Lee JM, et al. Pan-3C protease inhibitor rupintrivir binds SARS-CoV-2 main protease in a unique binding mode. Biochemistry. 2021;60:2925–2931.
[142] Kuzikov M, Costanzi E, Reinshagen J, et al. Identification of inhibitors of SARS-CoV-2 3CLPro enzymatic activity using a small molecule in vitro repurposing screen. ACS Pharmacol Transl Sci. 2021;4:1096–1110.
[143] Costanzi E, Kuzikov M, Esposito F, et al. Structural and biochemical analysis of the dual inhibition of MG-132 against SARS-CoV-2 main protease (Mpro/3CLpro) and human cathepsin-L. Int J Mol Sci. 2021;22:11779.
[144] Ghosh AK, Raghavaiah J, Shahabi D, et al. Indole chloropyridinyl ester-derived SARS-CoV-2 3CLpro inhibitors: enzyme inhibition, antiviral efficacy, structure-activity relationship, and X-ray structural studies. J Med Chem. 2021;64:14702–14714.
[145] Malla TR, Brewitz L, Muntean D-G, et al. Penicillin derivatives inhibit the SARS-CoV-2 main protease by reaction with its nucleophilic cysteine. J Med Chem. 2022;65:7682–7696.
[146] Ma C, Xia Z, Sacco MD, et al. Discovery of di- and trihaloacetamides as covalent SARS-CoV-2 main protease inhibitors with high target specificity. J Am Chem Soc. 2021;143:20697–20709.
[147] Alugubelli YR, Geng ZZ, Yang KS, et al. A systematic exploration of boceprevir-based main protease inhibitors as SARS-CoV-2 antivirals. Eur J Med Chem. 2022;240:114596.
[148] Zhao Y, Fang C, Zhang Q, et al. Crystal structure of SARS-CoV-2 main protease in complex with protease inhibitor PF-07321332. Protein Cell. 2022;13:689–693.
[149] Kneller D, Li H, Phillips G, et al. Covalent narlaprevir- and boceprevir-derived hybrid inhibitors of SARS-CoV-2 main protease: room-temperature X-ray and neutron crystallography, binding thermodynamics, and antiviral activity. [Preprint]. Res Sq. 2022.
[150] Greasley SE, Noell S, Plotnikova O, et al. Structural basis for the in vitro efficacy of nirmatrelvir against SARS-CoV-2 variants. J Biol Chem. 2022: 101972.
[151] Xiong M, Nie T, Shao Q, et al. In silico screening-based discovery of novel covalent inhibitors of the SARS-CoV-2 3CL protease. Eur J Med Chem. 2022;231:114130.
[152] Cantrelle F-X, Boll E, Brier L, et al. NMR spectroscopy of the main protease of SARS-CoV-2 and fragment-based screening identify three protein hotspots and an antiviral fragment. Angew Chem Int Ed. 2021;60:25428–25435.
[153] Kneller DW, Li H, Galanie S, et al. Structural, electronic, and electrostatic determinants for inhibitor binding to subsites S1 and S2 in SARS-CoV-2 main protease. J Med Chem. 2021;64:17366–17383.
[154] Zhong B, Peng W, Du S, et al. Oridonin inhibits SARS-CoV-2 by targeting its 3C-like protease. Small Sci. 2022;2:2100124.
[155] Rossetti GG, Ossorio MA, Rempel S, et al. Non-covalent SARS-CoV-2 Mpro inhibitors developed from in silico screen hits. Sci Rep. 2022;12:2505.
[156] Zhang Y, Gao H, Hu X, et al. Structure-based discovery and structural basis of a novel broad-spectrum natural product against the main protease of coronavirus. J Virol. 2022;96:e0125321.
[157] Yang KS, Alex Kuo S-T, Blankenship LR, et al. Repurposing halicin as a potent covalent inhibitor for the SARS-CoV-2 main protease. Curr Res Chem Biol. 2022;2:100025.
[158] Shaqra AM, Zvornicanin SN, Huang QYJ, et al. Defining the substrate envelope of SARS-CoV-2 main protease to predict and avoid drug resistance. Nat Commun. 2022;13:3556.
[159] Vangeel L, Chiu W, De Jonghe S, et al. Remdesivir, Molnupiravir and Nirmatrelvir remain active against SARS-CoV-2 omicron and other variants of concern. Antiviral Res. 2022;198:105252.
[160] Sidebottom DB, Smith DD, Gill D. Safety and efficacy of antivirals against SARS-CoV-2. Br Med J. 2021;375:n2611.
[161] Ye C, Bian P, Zhang J, et al. Structure-based discovery of antiviral inhibitors targeting the E dimer interface of Japanese encephalitis virus. Biochem Biophys Res Commun. 2019;515:366–371.
[162] Pietrucci F, Vargiu AV, Kranjc A. HIV-1 Protease dimerization dynamics reveals a transient druggable binding pocket at the interface. Sci Rep. 2015;5:18555.
[163] Bannwarth L, Kessler A, Pèthe S, et al. Molecular tongs containing amino acid mimetic fragments: new inhibitors of wild-type and mutated HIV-1 protease dimerization. J Med Chem. 2006;49:4657–4664.
[164] Brzezinski D, Kowiel M, Cooper DR, et al. Covid-19.bioreproducibility.org: A web resource for SARS-CoV-2-related structural models. Protein Sci. 2021;30:115–124.
[165] Croll TI, Diederichs K, Fischer F, et al. Making the invisible enemy visible. Nat Struct Mol Biol. 2021;28:404–408.
[166] Guven O, Gul M, Ayan E, et al. Case study of high-throughput drug screening and remote data collection for SARS-CoV-2 main protease by using serial femtosecond X-ray crystallography. Crystals. 2021;11:1579.
[167] Chamakuri S, Lu S, Ucisik MN, et al. DNA-encoded chemistry technology yields expedient access to SARS-CoV-2 Mpro inhibitors. Proc Natl Acad Sci USA. 2021;118:e2111172118.
[168] Gildea RJ, Beilsten-Edmands J, Axford D, et al. Xia2.multiplex: a multi-crystal data-analysis pipeline. Acta Crystallogr D Struct Biol. 2022;78:752–769.
[169] Narayanan A, Narwal M, Majowicz SA, et al. Identification of SARS-CoV-2 inhibitors targeting Mpro and PLpro using in-cell-protease assay. Commun Biol. 2022;5:169.
[170] Sutanto F, Shaabani S, Oerlemans R, et al. Combining high-throughput synthesis and high-throughput protein crystallography for accelerated hit identification. Angew Chem Int Ed. 2021;60:18231–18239.
[171] Deetanya P, Hengphasatporn K, Wilasluck P, et al. Interaction of 8-anilinonaphthalene-1-sulfonate with SARS-CoV-2 main protease and its application as a fluorescent probe for inhibitor identification. Comput Struct Biotechnol J. 2021;19:3364–3371.
[172] Moghadasi SA, Esler MA, Otsuka Y, et al. Gain-of-signal assays for probing inhibition of SARS-CoV-2 Mpro/3CLpro in living cells. MBio. 2022;13:e0078422.
[173] Tomasello G, Armenia I, Molla G. The protein imager: a full-featured online molecular viewer interface with server-side HQ-rendering capabilities. Bioinformatics. 2020;36:2909–2911.
[174] Omasits U, Ahrens CH, Müller S, et al. Protter: interactive protein feature visualization and integration with experimental proteomic data. Bioinformatics. 2014;30:884–886.

Lea C. von Soosten, Maximilian Edich, Kristopher Nolte, Johannes Kaub, Gianluca Santoni and Andrea Thorn

This blog post was published in Crystallography Reviews.
Please cite: https://doi.org/10.1080/0889311X.2022.2098281

Abstract

With up to 17 domains, non-structural protein 3 (nsp3) is the largest protein of SARS-CoV-2. In part due to its large size, many of its functions still remain a mystery. It is known that nsp3 fulfils several essential functions in the infection cycle; however, most of its domains have not been structurally determined. One of its essential functions is to cleave the polyprotein, which is translated first upon infection, into other functional non-structural proteins. Nsp3 is also involved in evading the host immune system and forms large pore complexes that are important for viral replication. Furthermore, it interacts with more than 30 other host and viral proteins, resulting in a multitude of potential ways to affect the host cell and viral replication. The many roles of this coronaviral Swiss army knife make it a promising drug target. In this review, we aim to clarify naming conventions and give an overview of the structures and functions of its domains as a starting point for further research.

Introduction

Non-structural protein 3 (nsp3) is the largest protein of SARS-CoV-2 [1] and plays an important role in the infection cycle. Even though many of its functions are still unknown or uncertain, several studies provide insight into its mechanisms, some of which are essential for viral replication. This makes nsp3 a promising drug target for therapeutics against COVID-19. The complexity of nsp3, however, lies not only in its size but also in its large number of separate functional domains (see Figure 1), making it a ‘Swiss army knife’ of viral proteins. Depending on how the domain borders are defined, it consists of up to 17 domains. At the time of this review, only six of them had been structurally solved.
Upon infection by SARS-CoV-2, the two polyproteins pp1a and pp1ab are translated and cleaved into individual non-structural proteins by two viral proteases. While nsp1 to nsp3 are cleaved from the polyprotein by the Papain-like Protease domain of nsp3, the remaining proteins nsp4 to nsp16 are cleaved by the 3C-like protease (also known as Mpro or nsp5) [2]. Some of the other domains of nsp3 interact with host proteins and interrupt antiviral signal transduction [3], while others interact with nsp4 to form double-membrane vesicles [4], in whose membrane nsp3 is anchored by two transmembrane domains [3] (see Figure 2). These vesicles protect the replication machinery against host proteases and support the replication of viral RNA. Together with nsp4 and nsp6, nsp3 also forms molecular pores in these vesicles through which the RNA can be exported for packaging into new virions [5]. Altogether, nsp3 is an immensely complex protein whose overall fold is still unclear.

The Swiss army knife of SARS-CoV-2: the structures and functions of NSP3 5

Figure 1. Domains of nsp3 in sequence. Structural examples for the domains from SARS-CoV-2 or SARS-CoV-1 are depicted below or above the domain chart, respectively. For domains depicted in grey, no Sarbecovirus structures have been solved yet. Figure was created using Protein Imager [40].


A detailed review on the nsp3 of coronaviruses up until 2018 (excluding SARS-CoV-2) was published by Lei et al. [3], giving a good overview of its function in different viruses.
Here, we provide an overview of the current state of nsp3 structural biology, with a view to recent discoveries boosted by the COVID-19 pandemic.


Figure 2. Interaction network of nsp3 domains and various interaction partners from the virus or the host. Domains are depicted in round boxes; viral components are shown as grey boxes and host components in orange. The connecting lines indicate whether the interaction was demonstrated in vivo (black), demonstrated in vitro (grey, solid) or only proposed (grey, dashed). Some of these interactions have so far only been demonstrated for SARS-CoV-1. Details for each interaction are given in the text.

Function and structure

Some nsp3 domains are present in all coronaviruses, whereas others only exist in specific coronaviruses. The multifunctionality of this protein along with inconsistent domain nomenclature makes it hard to understand in its entirety. An overview of all domains in SARS-CoV-2 nsp3 and their different names used in literature is given in Table 1. So far, no full-length structure of nsp3 has been solved and structural models are available for only 6 of its 17 domains. These include both ubiquitin-like domains (Ubl1 and Ubl2), the macrodomain 1 (Mac1), the Papain-like Protease (PL2pro) and the nucleic-acid-binding domain (NAB) as well as the Y3 domain. At the time of writing this review more than 300 experimentally determined structures of nsp3 domains are available in the PDB. A selection of those is listed in Table 2, describing at least one structure per available domain. In the following, a short introduction to the function of each domain is given, followed by a section with a closer look at existing structures, functional elements, and corresponding mechanisms, where available.

Domains

Table 1 Names and abbreviations. Cells shaded in grey indicate domains with available protein structures. The subsection column lists alternative names for (multi-)domain regions, which were used in the past. The NCBI reference sequences used are NP_828862.2 and YP_009742610.1 for SARS-CoV-1 and SARS-CoV-2, respectively. References: Lei et al. [3]; S. M. Korn et al. [6]; Schuller et al. [7]; N. Salvi et al. [8].
a These numbers are based on predictions by the TMHMM 2.0 server [9].
b These numbers are based on the residue ranges of known structures and/or the predicted transmembrane regions, and these ranges might also include disordered linker sequences. It was not possible to determine the exact residue separating Y1 and CoV-Y.
c Y3 is a part of CoV-Y. Ranges without annotation were determined via available structures.
d Sequence identity was calculated for Y1 combined with CoV-Y without Y3, as their separation is not clear.

| Complete name | Alternative name | Abbreviation used | Subsection | Residue range | Sequence identity to SARS-CoV-1 (%) |
| --- | --- | --- | --- | --- | --- |
| Ubiquitin-like domain 1 | | Ubl1 | nsp3a | 1–111 | 76.58 |
| Hypervariable region | Glu-rich acidic region, intrinsically disordered region (IDR) | HVR | nsp3a | 112–206 | 35.79 |
| Macrodomain 1 | X-domain, Macrodomain, ADP-ribose-1″-phosphatase (ADRP), S2-MacroD, MacroD | Mac1 | nsp3b | 207–379 | 71.68 |
| Macrodomain 2 | SARS-unique domain N (SUD-N) | Mac2 | nsp3c | 413–550 | 69.57 |
| Macrodomain 3 | SARS-unique domain M (SUD-M) | Mac3 | nsp3c | 551–675 | 81.60 |
| Domain preceding Ubl2 and PL2pro | SARS-unique domain C (SUD-C) | DPUP | nsp3c | 676–745 | 74.29 |
| Ubiquitin-like domain 2 | | Ubl2 | nsp3d | 746–804 | 89.83 |
| Papain-like Protease | PLpro | PL2pro | nsp3d | 805–1063 | 81.08 |
| Nucleic-acid-binding domain | | NAB | nsp3e | 1089–1203 | 81.74 |
| Betacoronavirus-specific marker domain | Group-2-specific marker domain (G2M) | βSM | nsp3e | 1204–1412ᵇ | 68.90 |
| Transmembrane region 1 | | TM1 | | 1413–1435ᵃ | 73.91 |
| Nsp3 ectodomain | Lumenal loop | 3Ecto | | 1436–1522ᵃ | 70.11 |
| Transmembrane region 2 | | TM2 | | 1532–1554ᵃ | 78.26 |
| Amphipathic helix 1 | | AH1 | | 1561–1583ᵃ | 86.96 |
| Nidovirus-conserved domain of unknown function | | Y1 | | 1584–?ᵇ | 88.08ᵈ |
| Coronavirus-specific C-terminal domain | CoV-Y | Y2 | | ?–1843ᵇ | 88.08ᵈ |
| Coronavirus-specific C-terminal domain | CoV-Y | Y3 | | 1844–1945ᶜ | 90.20 |
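The residue ranges in Table 1 can be turned into a simple lookup that reports which domain a given SARS-CoV-2 nsp3 residue number falls into. The following is a minimal Python sketch that is not part of the original work. The name `domain_of` is hypothetical. Y1 and Y2 are omitted because the border between them is not known, and residues in the unassigned linker stretches (e.g. 380–412 between Mac1 and Mac2) return None.

```python
from bisect import bisect_right
from typing import Optional

# SARS-CoV-2 nsp3 domain boundaries (1-based, inclusive) taken from Table 1.
# Y1/Y2 are left out because their common border is undetermined.
DOMAINS = [
    (1, 111, "Ubl1"),
    (112, 206, "HVR"),
    (207, 379, "Mac1"),
    (413, 550, "Mac2"),
    (551, 675, "Mac3"),
    (676, 745, "DPUP"),
    (746, 804, "Ubl2"),
    (805, 1063, "PL2pro"),
    (1089, 1203, "NAB"),
    (1204, 1412, "βSM"),
    (1413, 1435, "TM1"),
    (1436, 1522, "3Ecto"),
    (1532, 1554, "TM2"),
    (1561, 1583, "AH1"),
    (1844, 1945, "Y3"),
]

# Start positions, sorted, for binary search over the non-overlapping intervals.
_STARTS = [start for start, _, _ in DOMAINS]


def domain_of(residue: int) -> Optional[str]:
    """Return the Table 1 domain containing `residue`, or None if unassigned."""
    i = bisect_right(_STARTS, residue) - 1  # last interval starting at or before residue
    if i >= 0:
        start, end, name = DOMAINS[i]
        if start <= residue <= end:
            return name
    return None


print(domain_of(1926))  # Y3
print(domain_of(400))   # None (Mac1-Mac2 linker)
```

For instance, `domain_of(1926)` returns "Y3", which is consistent with Cys1926 being located in the Y3 structure 7RQG (Table 2). Binary search via `bisect` keeps the lookup logarithmic, although for 15 intervals a linear scan would work just as well.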

Nsp3a – ubiquitin-like domain 1 and the hypervariable region

Starting at the nsp3 N-terminus, the first two domains are the ubiquitin-like domain 1 (Ubl1) and the Glu-rich acidic region (AC domain), the latter also known as the hypervariable region (HVR). Both exist in all coronaviruses and together are called nsp3a in some of the literature [3]. Although the specific function of the coronaviral ubiquitin-like domain is unknown, studies indicate that the domain interacts with the viral nucleocapsid protein [8,10] and is capable of binding single-stranded RNA in SARS-CoV-1 [3,11]. The nucleocapsid protein packages and protects the viral RNA in the virion. This suggests that Ubl1 acts as a key interaction partner that facilitates association between the nucleocapsid, the replication/transcription complex (RTC) and the viral RNA [10]. It may also play an essential role in viral replication. For example, one study showed that mouse hepatitis virus mutants lacking Ubl1 core regions were unable to replicate [3].

Table 2 PDB entries for nsp3 domains. For each entry, we list the ID, technique, a comment on its importance, an evaluation of the most intense Fourier difference peaks as calculated with Coot, and a statement on the model quality. Where possible, a structure was chosen for each domain. For Mac2, Mac3 and DPUP, structures were only available for SARS-CoV-1. For Mac1 and PL2pro, we list one structure bound to a ligand and an additional ligand-free structure at the highest available resolution.

SARS-CoV-2

| PDB ID | Method | Resolution (Å) | Description | Comment on the highest Fourier difference peaks (Coot) |
| --- | --- | --- | --- | --- |
| 7KAG | X-ray diffraction | 3.21 | Ubl1 domain shown as a dimer. The model includes 13 ethylene glycol molecules and one sulfate. The two monomers are in contact at their N-terminal β-sheet. | 22 peaks, with the highest around Ser93. |
| 7KQP | X-ray diffraction | 0.88 | Macrodomain in complex with ADP-ribose. Only one monomer of the dimer shown is bound to the ligand. | 87 peaks, mainly missing water molecules. |
| 7KR0 | X-ray diffraction | 0.77 | Macrodomain without inhibitor at 100 K, at the highest resolution. The model shows a monomer. | 87 peaks, mainly missing water molecules or crystallization solution components. |
| 7JRN | X-ray diffraction | 2.48 | PL2pro and Ubl2 in complex with the inhibitor GRL0617. The model shows a dimer and includes, besides two inhibitor molecules, two zinc atoms located at the finger domains and five sulfates in total. | 12 peaks. The highest suggests that sulfate 405 could be a water molecule instead. |
| 7D6H | X-ray diffraction | 1.6 | PL2pro and Ubl2 C111S mutant without inhibitor, at the highest resolution. Shown is a monomer with a zinc atom at the finger domain and a phosphate. | 59 peaks, mainly missing water molecules. |
| 7LGO | X-ray diffraction | 2.45 | NAB domain shown as a dimer surrounded by water. | 4 peaks. The highest could point to an alternative conformation of Thr92 in chain B. |
| 7RQG | X-ray diffraction | 2.17 | Tetramer of the Y3 domain, in which a loop and parts of the C-terminus of one monomer are not modelled. | 3 peaks. The highest indicates a possibly cleaved disulfide bridge between Cys1926 of chains B and D. |

SARS-CoV-1

| PDB ID | Method | Resolution (Å) | Description | Comment on the highest Fourier difference peaks (Coot) |
| --- | --- | --- | --- | --- |
| 2W2G | X-ray diffraction | 2.22 | Dimer of Mac2 and Mac3, in which the linker between them is only partially modelled. The model includes two sulfates. | 2 peaks; no major problems can be found in the structure. |
| 2KQW | Solution NMR | | DPUP as a monomer. Although the construct contains the sequence of the SUD region, only the DPUP domain is modelled. | |

Comparison of coronavirus genomes indicates co-evolution between the nucleocapsid protein and Ubl1. While coronaviral nucleocapsid proteins can bind Ubl1 domains of other coronaviruses, strong binding affinity was only measured for proteins of the same virus. For example, in experiments with bovine coronavirus (BCoV) and mouse hepatitis virus (MHV), the binding of BCoV nucleocapsid to MHV Ubl1 was 260-fold weaker than that of MHV nucleocapsid to MHV Ubl1 [3]. Additionally, Ubl1 shows high structural similarity to human ubiquitin and to the ubiquitin-like domain of human interferon-stimulated gene 15 (ISG15). It has therefore been suggested that Ubl1 interacts with ubiquitin- and/or ISG15-targeting proteins and thus interferes with antiviral signal transduction, since such proteins are often involved in immune signalling pathways [3].
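The factor of 260 quoted for BCoV versus MHV nucleocapsid binding can be expressed in energetic terms with the standard relation ΔΔG = RT ln(K_d,weak / K_d,strong). The Python sketch below makes two assumptions that the source does not state: the factor is a ratio of dissociation constants, and the measurements were made at 25 °C (298.15 K). `ddg_from_fold_change` is a hypothetical helper name.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def ddg_from_fold_change(fold: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy penalty in kJ/mol for a given fold change in Kd.

    Assumes fold = Kd(weaker complex) / Kd(stronger complex), so that
    ddG = RT ln(fold). The default temperature is an assumption.
    """
    return R * temperature_k * math.log(fold) / 1000.0


ddg = ddg_from_fold_change(260)
print(f"{ddg:.1f} kJ/mol ({ddg / 4.184:.1f} kcal/mol)")  # 13.8 kJ/mol (3.3 kcal/mol)
```

Because the relation is logarithmic, even a large fold change in affinity corresponds to a modest free-energy difference of a few kcal/mol. That is on the order of a handful of hydrogen bonds or salt bridges at the interface.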
The function of the Glu-rich acidic region remains a mystery. It lives up to its alternative name (hypervariable region): despite its presence in all coronaviruses, it is poorly conserved. Of its 95 residues in SARS-CoV-2, glutamic acid and aspartic acid make up 22% and 11%, respectively. In contrast, the Glu-rich acidic region of SARS-CoV-1 consists of 69 residues, of which glutamic acid and aspartic acid make up 36% and 12%, respectively. It is also suggested to be intrinsically disordered [3]. Possible functions include a regulatory role and interaction with other non-structural proteins of the virus [3]. Glu-/Asp-rich proteins are also generally adept at mimicking DNA or RNA [12], which might support the interaction between Ubl1 and the nucleocapsid.
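The Glu and Asp percentages for the hypervariable region are simple composition counts. If they were rounded to whole percentages, they correspond to 21 Glu and 10 Asp residues out of 95 in SARS-CoV-2, and to 25 Glu and 8 Asp residues out of 69 in SARS-CoV-1. A minimal Python sketch for computing such percentages from a one-letter sequence is shown below. The helper name and the toy sequence are hypothetical. For real use, the HVR sequence would have to be taken from residues 112–206 of the nsp3 reference sequence used in Table 1.

```python
from collections import Counter
from typing import Dict


def residue_percentages(sequence: str, residues: str = "ED") -> Dict[str, float]:
    """Percentage of each requested one-letter residue code in a protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in residues}


# Toy, made-up sequence (not the real HVR): 4 Glu and 2 Asp out of 10 residues.
print(residue_percentages("EEDAKSEEDG"))  # {'E': 40.0, 'D': 20.0}
```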

Overall structure and functional features

Currently, one structure of the SARS-CoV-2 ubiquitin-like domain 1 is available (PDB ID: 7KAG) (see Figure 3 and Table 2). For the hypervariable region, no structure has been deposited so far, likely due to its proposed disordered nature [3]. Its high variability among coronaviruses is consistent with this, as it argues against common conserved structural elements.
For SARS-CoV-1 Ubl1, one NMR ensemble (PDB ID: 2GRI) and one conformer resembling the mean coordinates (PDB ID: 2IDY) have been published. The sequence identity between 7KAG and 2IDY amounts to 76.58%, and the root mean square deviation (RMSD) between their Cα positions is 4.7 Å. However, both termini appear to be disordered: removing the first 18 residues and everything beyond residue 105 reduces the RMSD to 1.8 Å. Despite the high sequence similarity, the two folds differ in the disordered 12 N-terminal residues and slightly in the lengths of their sheets and helices (Figure 3(a)). The secondary structure elements of 7KAG follow the order β1–β2–β3–α1–β4–α2–α3–α4–β5. The overall shape of the domain is similar in both viruses, indicating a conserved function.
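A Cα RMSD with trimmed termini, as described for 7KAG versus 2IDY, can be computed by first superposing the two models optimally. The sketch below uses the Kabsch algorithm with NumPy. It is a minimal illustration, not the procedure used by the authors (the source does not name the program). It assumes that both models have already been matched residue by residue (one Cα per residue, common numbering) and are supplied as arrays. Parsing the PDB files is left out, and the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate arrays after optimal superposition."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centring both coordinate sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Kabsch: SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so that the result is a proper rotation (no mirror).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def trimmed_rmsd(ca_a: np.ndarray, ca_b: np.ndarray, resnums: np.ndarray,
                 first: int, last: int) -> float:
    """RMSD restricted to residues first..last (inclusive), e.g. to drop disordered termini."""
    keep = (resnums >= first) & (resnums <= last)
    return kabsch_rmsd(ca_a[keep], ca_b[keep])
```

If residue numbering starts at 1, dropping the first 18 residues and everything beyond residue 105 corresponds to `trimmed_rmsd(ca_a, ca_b, resnums, 19, 105)`. Superposition is recomputed on the trimmed set, so flexible termini no longer bias the fit of the well-ordered core.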

Nsp3b – macrodomain 1/ ADP-ribose-phosphatase (ADRP)

The macrodomain 1, also known as nsp3b, X-domain or ADP-ribose phosphatase domain (ADRP), is a conserved domain found in all coronaviruses [3]. ADP-ribosylation is a post-translational modification of proteins that can also occur on DNA. It attaches either a single ADP-ribose unit or chains of poly-ADP-ribose (PAR) [13,14], conjugated through C1″ of the distal ribose of the ADP-ribose molecule [7]. These modifications can be formed by the human poly(ADP-ribose) polymerases (PARPs), which are involved in numerous processes, including stress response, protein degradation and signalling [16]. Many of these PARPs are expressed during the interferon (IFN) response, are part of the immune response and have antiviral activity [14,15]. Despite being called ADP-ribose phosphatase, macrodomain 1 is in fact a mono(ADP-ribosyl)hydrolase. It binds mono(ADP-ribosyl) moieties (MAR) and hydrolyses them from target proteins [16]. The macrodomain 1 of SARS-CoV-2 has also been shown to reverse ADP-ribosylation catalysed by human PARP14 [15] and thereby antagonizes the host’s immune response. The macrodomain therefore plays an important role in viral pathogenesis and constitutes an interesting drug target [17]. This function is conserved across coronaviruses despite a sequence divergence of 28% between SARS-CoV-2 and SARS-CoV-1 (Table 1) and of 59% between SARS-CoV-2 and MERS-CoV [16]. Poly(ADP-ribosyl)hydrolase activity has been reported for other viral macrodomains, including that of SARS-CoV-1 [18,19]. In contrast, the SARS-CoV-2 macrodomain 1 binds poly(ADP-ribose) only weakly and cannot hydrolyse it; for this domain, mono(ADP-ribosyl)hydrolase activity has been shown [20].


Figure 3. Structures of the domains Ubl1 (a) (PDB ID: 7KAG), NAB (b) (PDB ID: 7LGO) and Y3 (c) (PDB ID: 7RQG) from SARS-CoV-2. The left column shows the structures in cartoon representation with their surface, and the central column shows the cartoon alone. For Ubl1 and NAB, the right column shows the superposition of the SARS-CoV-2 structure (blue) and its SARS-CoV-1 counterpart (purple; PDB ID for Ubl1: 2IDY; PDB ID for NAB: 2K87). Figure was created using Protein Imager [40].

Overall structure and functional features

The domain adopts the characteristic fold of a MacroD-type macrodomain: a seven-stranded β-sheet sandwiched between two layers of helices. The β-sheet comprises the strands β1, β2, β7, β6, β3, β5 and β4, and the helical layers consist of α1, α2 and α3 on one side and η1, α4/η2, η3, α5 and α6 on the other [17] (see Figure 4).
The ADP-ribose-binding pocket is formed by the C-terminal ends of the β-strands β3, β5, β6 and β7 in the centre of the sheet and the N-terminus of the α1-helix. The substrate is further surrounded by the 3₁₀-helix η3, formed by the loop between β6 and α5, as well as by the connecting loop between β3 and α2 [17].
The pocket can be divided into the adenosine site, formed by the subsites involved in the binding of adenine and the proximal ribose, and the catalytic site, formed by the subsites involved in binding the diphosphate and distal ribose [7,17]. Within the structure, four regions that are highly conserved in different coronaviruses can be identified [16,17]. The first in sequence is the VNAAN motif (residues 36–40) in β3, with N40 forming hydrogen bonds with the 3″-hydroxyl group of the distal ribose. The second conserved region resides in the loop between β3 and α2 around the GGG motif (residues 46–48); it is involved in binding the 2″- and 1″-hydroxyl groups of the distal ribose [16]. The third region is located at the end of β5 with the amino acids VGP at positions 96–98 [16,17]. The last highly conserved region can be found from the end of β6 up to the end of η3, containing the GIF motif at 130–132. It interacts with the phosphates and is also involved in the binding of the distal ribose, where Phe132 might be involved in hydrolysis [16,17]. The substrate-binding regions show some degree of flexibility, which does not lead to large conformational changes within the rest of the protein but instead to local shifts and rearrangements of certain structural elements, such as the β3–α2 or β6–(η3–)α5 loop [17].


Figure 4. The SARS-CoV-2 macrodomain 1. Left: with surface and its substrate ADP-ribose, PDB ID: 7KQP. Right: labelled structure without substrate (PDB ID: 7KQO). Figure was created using Protein Imager [40].

Available structures

To date, 265 structures of the SARS-CoV-2 macrodomain 1 are available; a selection of relevant structures is given here. Ten apo structures are currently available for SARS-CoV-2 nsp3 macrodomain 1, ranging from 0.77 Å to 2.03 Å in resolution (PDB IDs: 7KR0, 7KQO, 7KQW, 6WEY, 5S74, 5S73, 6WEN, 7KG3, 7KR1, 6VXS). Seven structures show the macrodomain bound to ADP-ribose, with resolutions ranging from 0.88 Å to 3.83 Å (PDB IDs: 7KQP, 6W02, 6Z5T, 6WOJ, 6YWL, 7CZ4, 7C33). The remaining 248 structures show the domain in complex with various molecules. These include buffer components such as MES (PDB IDs: 6WCF, 6YWM) and HEPES (PDB ID: 6YWK), nucleotides (PDB IDs: 6W6Y, 7BF4), ADP-ribose analogues (PDB IDs: 7BF5, 7BF6), adenosine (PDB ID: 7BF3) and cAMP (PDB ID: 7JME). Other structures show complexes with the novel compounds PARG-345 (PDB ID: 7LG7) and PARG-329 (PDB ID: 7KXB), the remdesivir metabolite GS-441524 (PDB ID: 7BF6) and ADP-ribose phosphate (PDB ID: 7BF5). Numerous structures were solved in a study by Schuller et al. [7], which combined computational docking with crystallographic screening of small-molecule fragments. This study yielded more than 230 structures with 214 unique fragments bound to macrodomain 1.
In total, 192 of the fragments were located at the active site and 14 bound to a distant pocket at Lys90 on the back of the protein; the rest were spread across the protein’s surface. The distribution of fragments within the active site is another interesting finding of the study. While most bound at the adenine subsite, 54 fragments bound to a site formed by the backbone nitrogens of Phe156 and Asp157 close to the adenine subsite, and only a few bound to the catalytic site [7].
Furthermore, a comparison of apo crystal structures at different temperatures (100 K: PDB ID 7KR0; 310 K: PDB ID 7KR1) showed a more compact structure and some loop displacement close to the active site at the lower temperature [7]. Besides crystallization conditions such as the presence of ligands, several studies observed that small changes at the N- and C-termini of the domain’s amino acid sequence strongly influence the resulting crystal form [7,15,16]. Different crystal forms also differ in how accessible the active site of the macrodomain is to ligands, as symmetry mates partly obstruct possible access points, for example in the C2 crystal packing. Hence, the slightly more accessible P4₃ crystal packing appears to be the most common among the available structures [7]. Macrodomain 1 is an interesting drug target because it antagonizes ADP-ribosylation by cellular (ADP-ribosyl)transferases such as PARP14 [15]. This makes it a crucial part of the virus, owing to its ability to interfere with the host immune response [14,15]. Even though the exact mechanism of action is not yet fully known, the large number of available structures of different complexes already provides plenty of information about this domain and forms a good baseline for the search for suitable inhibitors.


Figure 5. Nsp3c or SUD of SARS-CoV-1: (A) Structure of macrodomain 2 with (left) and without surface (right), PDB ID: 2W2G, amino acids 389–516. (B) Structure of macrodomain 3 with (left) and without surface (right), PDB ID: 2W2G, amino acids 527–652. (C) Structure of the DPUP with (left) and without surface (right), PDB ID: 2KQW. Figure was created using Protein Imager [40].

Nsp3c – macrodomain 2, macrodomain 3 and the domain preceding Ubl2 and PL2pro

Nsp3c, formerly known as the SARS-unique domain (SUD), is predominantly found in the clade of Sarbecoviruses, which contains SARS-CoV-1 and SARS-CoV-2. After it had also been found outside of SARS coronaviruses, the domains were renamed, but all former names are still in use. Nsp3c consists of macrodomain 2 (Mac2, formerly SUD-N), macrodomain 3 (Mac3, formerly SUD-M) and the domain preceding Ubl2 and PL2pro (DPUP, formerly SUD-C) [3]. Mac2 is separated from Mac1 by a linker of 33 residues. Notably, the Mac2 domain does not exist in MERS-CoV [21]. So far, no structures of these domains from SARS-CoV-2 are available. However, a few have been published for SARS-CoV-1, which allows predictions about their SARS-CoV-2 relatives due to the high sequence similarity (Table 1). These include one structure of Mac2 (PDB ID: 6YXJ), five of Mac3 (PDB IDs: 2JZD, 2JZE, 2JZF, 2RNK, 2KQV), two structures spanning Mac2 and Mac3 (PDB IDs: 2W2G, 2WCT) and one showing DPUP (PDB ID: 2KQW).

Overall structure and functional features

Available NMR chemical shift assignments for SARS-CoV-2 reveal details about the secondary structure elements of the three domains. The Mac2 consists of the elements β1–α1–β2–α2–α3–β3–β4–α4 [22], Mac3 of the elements β1–α1–β2–α2–β3–β4–α3–β5–α4–α5–β6–α6 and the DPUP of α1–β1–β2–β3–β4–α2 [23]. When comparing these to the available structures of SARS-CoV-1 (Figure 5), all three domains share a very high secondary structure identity with their SARS-CoV-1 counterparts.
Even though Mac2 and Mac3 also adopt an αβα sandwich fold, as seen in macrodomain 1, they differ from it in their secondary structure elements as well as in their function [23]. In both SARS-CoV-1 and SARS-CoV-2, the Mac2 domain has been shown to bind the middle domain of human poly(A)-binding protein-interacting protein 1 (Paip1) [21]. Paip1 is part of the host translation machinery and stimulates translation [24]. For SARS-CoV-1, the crystal structure of Mac2 in complex with the middle domain of Paip1 (PDB ID: 6YXJ) shows that Mac2 interacts with Paip1 mainly via its N-terminal loop. Furthermore, the SUD has been shown to increase the binding affinity between Paip1 and its binding partner, human poly(A)-binding protein. In cellulo, the SUD increased the translation levels of viral proteins only, suggesting an interaction with the aforementioned human proteins. The sequence identity between these domains in SARS-CoV-1 and SARS-CoV-2 is relatively high (Table 1), especially in the Paip1-binding N-terminal loop of Mac2. A similar function can therefore be assumed in both viruses [21]. Another interesting feature of the SARS-CoV-1 Mac2 and Mac3 domains is their binding to oligo(G)-containing nucleic acids capable of forming G-quadruplexes [25,26]. Substituting the amino acids that interact with the oligo(G)/G-quadruplex, or deleting the Mac3 domain, has been shown to abrogate viral genome replication [25]. Furthermore, the cellular tumour suppressor p53, which has been shown to have antiviral activity against other viruses, can be degraded with the involvement of the SUD together with the PLpro domain [27].

Nsp3d – ubiquitin-like domain 2 and Papain-like Protease 2

While many coronaviruses encode nsp3 proteins with two papain-like protease domains, known as PLpro and PL2pro, nsp3 of SARS-CoV-2, SARS-CoV-1 and MERS-CoV encodes only one such protease [28]. Together with the ubiquitin-like domain 2 (Ubl2), it forms a region also called nsp3d. Although only the protease corresponding to the PL2pro domain is found in the sarbecoviruses, both names, PLpro and PL2pro, are in use; sometimes even the whole nsp3 protein is referred to as PLpro, adding further to the confusion. Ubl2 is directly followed by PL2pro and is sometimes regarded as a subdomain of the protease [29]. One of the major substrates of PL2pro is the viral polyprotein, which contains all 16 nsps. Human cells use similar proteases linked to ubiquitin-like domains, the deubiquitinating enzymes, to regulate several pathways by cleaving ubiquitin from target proteins, and such ubiquitin-specific proteases can be regulated by their ubiquitin-like domain. However, such regulation of PL2pro activity by the attached Ubl2 in SARS-CoV-1 and SARS-CoV-2 is unlikely, owing to structural differences in the linker compared with the human counterparts [3]. The exact function of Ubl2 has not yet been identified. Mutation and deletion experiments on Ubl2 gave inconsistent results: deleting Ubl2 had no impact on the thermal stability of PL2pro, whereas a mutation in the hydrophobic core of Ubl2 did, although this mutation may simply disrupt the fold. Nonetheless, Ubl2 is more conserved than Ubl1 among coronaviruses [3], which may indicate an important function. PL2pro recognizes an LXGG↓XX motif and cleaves the polyproteins pp1a and pp1ab to release nsp1, nsp2 and nsp3 after translation of the viral genome, making it indispensable for infection and replication and thus a promising drug target.
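The LXGG↓XX recognition motif can be read as a simple sequence pattern: leucine, any residue, two glycines, with the scissile bond after the second glycine and at least two further residues. The Python sketch below scans a one-letter sequence for this pattern. The function names and the toy sequence are hypothetical illustrations, not taken from any published tool, and a sequence match is treated here only as a candidate site; whether PL2pro actually cleaves it also depends on the substrate in its structural context.

```python
import re

# Hypothetical pattern for LXGG↓XX: "X" is treated as any residue.
# The lookahead lets overlapping candidate sites be reported.
LXGG_MOTIF = re.compile(r"(?=(L.GG..))")

def find_lxgg_sites(seq: str) -> list[int]:
    """Return 0-based indices of the first residue after each LXGG,
    i.e. the position just after the putative scissile bond."""
    return [m.start() + 4 for m in LXGG_MOTIF.finditer(seq.upper())]

def split_at_sites(seq: str) -> list[str]:
    """Cut the sequence at every candidate site (toy 'digest')."""
    pieces, prev = [], 0
    for site in find_lxgg_sites(seq):
        pieces.append(seq[prev:site])
        prev = site
    pieces.append(seq[prev:])
    return pieces

toy = "MKTAYLRGGAPQSTLKGGVDE"  # made-up sequence, not a viral protein
print(find_lxgg_sites(toy))  # [9, 18]
print(split_at_sites(toy))   # ['MKTAYLRGG', 'APQSTLKGG', 'VDE']
```

Requiring two residues after the glycines mirrors the "↓XX" part of the motif; a free C-terminal LRGG, as at the end of unconjugated ubiquitin, is therefore not reported.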
As an LRGG motif forms the C-terminus of both ubiquitin and the interferon-stimulated gene 15 protein (ISG15), PL2pro also removes ubiquitin and ISG15 from target proteins (de-ubiquitination and de-ISGylation) [2]. Before cleavage, the substrate is bound via two ubiquitin-binding sites, Ub1 and Ub2, and binding is regulated by the blocking loop BL2 [3]. PL2pro can also bind polyubiquitin and cleave di-ubiquitin from it. Since both ubiquitin and ISG15 are involved in various host pathways, including those of the immune response, PL2pro can interfere with the host immune response. Two of the main pathways affected are the interferon regulatory factor 3 and NF-κB pathways, which produce antiviral cytokines [3].
Besides its de-ubiquitinating and de-ISGylating activity, PL2pro also interacts with the immune response in additional ways. It inhibits proteins of the interferon regulatory factor 3 (IRF3) pathway and the NF-κB signalling pathway, although the exact mechanisms are still debated because of inconsistent results [3]. Interestingly, SARS-CoV-1 PL2pro and SARS-CoV-2 PL2pro show different substrate preferences. While SARS-CoV-1 PL2pro favours ubiquitin chains, the SARS-CoV-2 enzyme prefers ISG15, which leads to de-ISGylation of IRF3 and could thus allow better evasion of the immune response [30]. The enzyme activities also differ: SARS-CoV-2 PL2pro is 2500–3000 times more efficient towards ISG15 than towards the polyprotein, whereas SARS-CoV-1 PL2pro is only 100 times more efficient for ISG15 than for the polyprotein [28]. SARS-CoV-1 PL2pro also shows different substrate specificity in vivo and in vitro. In vitro, it digests Lys48-linked polyubiquitin chains from TRAF3 and TRAF6, two regulators of IRF3 and NF-κB, in preference to Lys63-linked polyubiquitin; in vivo, however, the Lys48-linked chains are not removed at all [3]. Possible reasons include interactions between viral proteins and host factors [3] and a rearrangement of PL2pro relative to other domains within nsp3.
Different constructs consisting of nsp3 regions including PL2pro also interact with the viral proteins nsp2, nsp4, nsp6, the viral RNA polymerase nsp12 as well as with ORF3a, ORF7a, and ORF9b proteins, as shown through various protein–protein interaction assays [3]. The interaction with nsp2, ORF3a, and ORF9b was shown for the region from PL2pro to the C-terminus, while the interaction with nsp4, nsp6, nsp12, and ORF7a was shown for the region from PL2pro to the betacoronavirus-specific marker domain. The nature of these interactions remains unclear [3].

Overall structure and functional features

SARS-CoV-2 PL2pro comprises four subdomains [29] and follows a 'thumb-palm-fingers' architecture (Figure 6) [31]. The four subdomains are the N-terminal ubiquitin-like domain (Ubl2), the thumb, the zinc-finger (fingers) and the palm subdomain [29], of which the last three form the catalytic part [31]. The Ubl2 subdomain consists of one α-helix, one 3₁₀-helix and five β-strands. The thumb consists of six α-helices and a β-hairpin. The palm subdomain has six β-strands and contains the catalytic triad Cys111, His272 and Asp286 at its interface with the thumb subdomain [31]. Residues Gly266–Gly271 form a mobile β-loop close to the active site, known as blocking loop 2 (BL2), which is involved in regulating substrate binding [3] by recognizing the LXGG↓XX motif and changes its conformation upon binding a substrate or inhibitor [29,31].
The finger subdomain consists of six β-strands and two α-helices. Within the loops between the β-strands reside four conserved cysteine residues (Cys189, Cys192, Cys224, Cys226), forming a zinc finger and coordinating a zinc ion [29,31].
The PL2pro domain has two distinct binding sites, S1 and S2 (Figure 6), that are used for cleaving ubiquitin or ISG15. The two sites differ in substrate specificity and activity, which is important for characterizing the protein's effect on the host immune system. The S1 Ub/Ubl-binding site interacts specifically with ubiquitin and ISG15. Ubiquitin binds to the palm and finger subdomains, with its C-terminus reaching into the catalytic centre, resulting in an open-hand conformation. ISG15 also binds to the palm subdomain, but contacts the thumb subdomain instead of the fingers; the main contacts are made by Trp123 and Pro130/Glu132 of ISG15 with the α7-helix of the PL2pro thumb domain (Figure 6) [32]. Via the S2 ubiquitin-binding site, PL2pro preferentially cleaves Lys48-linked polyubiquitin, with higher activity towards longer ubiquitin chains; this site is formed around Phe69 in the conserved α2-helix of the thumb domain [32]. SARS-CoV-2 PL2pro closely resembles SARS-CoV-1 PL2pro [29], with a sequence identity of 82.8% and an RMSD of 0.8 Å. Of the 54 residue differences, only six are located in the ubiquitin-interacting motif. Earlier mutations of SARS-CoV-1 PL2pro at these sites increased activity towards ISG15 at the cost of lower de-ubiquitinase activity [28]. This suggests that the PL2pro enzymes of SARS-CoV-1 and SARS-CoV-2 may differ in their substrate-specific activities because of these substitutions [28]. This is of particular interest when studying how SARS-CoV-2 PL2pro interferes with the innate immune system and the host's inflammatory signalling pathways. A full explanation of these effects is beyond the scope of this review; a detailed account is given by Shin et al. [30].


Figure 6. Structure of PL2pro with attached Ubl2 domain (purple) bound to the inhibitor GRL-0617 (PDB ID: 7JRN). In addition to Ubl2, PL2pro consists of the fingers (dark blue, left), thumb (cyan, right) and palm (light blue, bottom) subdomains. The catalytic active site is highlighted in the lower inset, and the blocking loop interacting with the inhibitor in the upper inset. The S2 binding site is located at Phe69. The location of S1 varies with the substrate and is therefore not labelled. The pink helix α7 interacts with ISG15 bound at S1. Figure created using Protein Imager [40].

Available structures

At the time of writing, 41 X-ray crystal structures of the SARS-CoV-2 PL2pro domain are available, 28 of which contain a bound potential inhibitor. Since this list is constantly growing, we refer to the repository of the Coronavirus Structural Task Force, which is updated weekly [33].
Crystallizing the wild-type PL2pro domain without a bound ligand is challenging. Many available structures are therefore of the C111S mutant [29], in which the catalytic cysteine is replaced by serine and which is easier to crystallize.

Therapeutic interest

Owing to its many functions in viral replication and its interactions with the host cell and immune response pathways, nsp3 is, along with Mpro and nsp12, a major drug target for COVID-19 [3,31]. The main targets within nsp3 are macrodomain 1 and the papain-like protease, which have therefore been the subject of numerous studies on the development or discovery of suitable inhibitors and drugs [7,28]. However, targeting the papain-like protease comes with two major challenges, as noted by Lei et al. [3] and by Báez-Santos et al. [34]. First, the S1 and S2 binding sites bind tightly to glycines in the substrate, and the need to mimic such a substrate restricts the design of suitable inhibitors. Second, similar binding motifs are also used by host proteins, which makes it more difficult to design drugs specific for PL2pro. Blocking loop 2 (Figure 6), on the other hand, is better suited as a SARS-CoV-2-specific drug target because it is unique among host proteins and coronaviruses [3]; it has recently been shown to bind GRL-0617, a strong non-covalent inhibitor of SARS-CoV-2 PL2pro [28].

Nsp3e – nucleic-acid-binding domain and the betacoronavirus-specific marker domain

The nucleic-acid-binding domain (NAB) and the betacoronavirus-specific marker domain (βSM) together form nsp3e and are unique to betacoronaviruses [3]. In SARS-CoV-1, the NAB has been shown to unwind DNA and to bind single-stranded RNA containing repeats of three consecutive guanines [3,35], but its exact targets and specific function remain unclear [6]. As the NAB structures of SARS-CoV-1 and SARS-CoV-2 are similar, the function is likely the same in both sarbecoviruses (see below). Gammacoronaviruses share a similar region, which contains a gammacoronavirus-specific marker domain instead of a betacoronavirus-specific marker domain. Although the two might share a common function, no structural information is available yet, and earlier structure predictions indicated that most of this region is intrinsically disordered [3].

Overall structure and functional features

Two NAB structures are currently available: a crystal structure for SARS-CoV-2 (PDB ID: 7LGO) and an NMR structure for SARS-CoV-1 (PDB ID: 2K87) (Figure 3(b)). They share a sequence identity of 81.74% and have similar structures, with a Cα RMSD of 2.7 Å. The secondary structure elements of PDB entry 7LGO (Figure 3(b), Table 2) occur in the order β1–β2–α1–β3–β4–η1–β5–β6–α2, where η1 represents a 3₁₀ helix with a proline-induced kink in its middle. In the SARS-CoV-1 NAB, RNA binding is achieved through a positively charged surface patch [3], formed by Lys74, Lys75, Lys98 and Arg105, which bind single-stranded RNA with the preferred sequence pattern mentioned above [3]. Sequence alignment shows that these same residues are conserved in SARS-CoV-2.
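The Cα RMSD used above to compare the two NAB structures is the root-mean-square distance between paired Cα atoms, RMSD = sqrt((1/N) Σ |x_i − y_i|²) over N residue pairs. The Python sketch below computes only this final step. It assumes that the residue pairing is already known (for example from a sequence alignment) and that the two structures have already been optimally superposed, e.g. with the Kabsch algorithm, which this sketch does not implement; the coordinates in the example are made-up toy values.

```python
import math

Coord = tuple[float, float, float]

def ca_rmsd(a: list[Coord], b: list[Coord]) -> float:
    """RMSD (in Å) over paired Cα coordinates.

    Assumes a[i] and b[i] belong to equivalent residues and that both
    sets are already superposed; no fitting is performed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))

# Toy example: every Cα of b is shifted by 3 Å along x relative to a.
a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
b = [(3.0, 0.0, 0.0), (4.0, 1.0, 1.0)]
print(ca_rmsd(a, b))  # 3.0
```

Because the toy structures are not superposed first, the pure translation shows up as a 3 Å RMSD; after optimal superposition the same pair would give an RMSD of zero, which is why fitting has to precede this calculation.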

Transmembrane domains, amphipathic helix and the ectodomain

One key feature of nsp3 is that it is anchored in the membrane; among the other non-structural proteins, only nsp4 and nsp6 share this property [36]. Two transmembrane domains, TM1 and TM2, cross the membrane of the endoplasmic reticulum or of the attached vesicles, with the nsp3 ectodomain (3Ecto) located on the lumenal side of the membrane between the two transmembrane domains [3]. The ectodomain of SARS-CoV-1 contains two N-glycosylation sites. Both sites are also found in SARS-CoV-2, although the first site has changed from Asn-Ser-Ser to Asn-Ser-Thr. The ectodomain is sometimes referred to as the zinc-finger domain, since such a structural feature was found in this region; however, the zinc finger is not conserved among all coronaviruses, which led to the renaming of this domain [3]. For SARS-CoV-1 nsp3, three transmembrane domains were predicted in silico, but glycosylation experiments investigating their topology indicated that only the first two traverse the membrane [36]. The third is an amphipathic helix (AH1) that follows the second transmembrane domain. It could interact with the membrane, but the exact mode is unknown [3]. For SARS-CoV-2, the transmembrane prediction tool TMHMM 2.0 [9] finds four potential transmembrane helices, located in the regions of TM1, TM2 and AH1 and at the C-terminal end of the ectodomain. Whether this last segment spans the membrane, however, has not been examined experimentally in SARS-CoV-1. For both SARS-CoV-1 and murine hepatitis virus, the N-terminus and the C-terminus of nsp3 were shown to be located in the cytoplasm, which requires an even number of membrane-spanning segments [36]. Otherwise, PL2pro and the cleavage site between nsp3 and nsp4 would be separated by the membrane. In the absence of the transmembrane region, however, no cleavage by PL2pro takes place [37].
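The requirement for an even number of membrane-spanning segments follows from a simple parity rule: each segment that crosses the membrane moves the chain to the opposite side. The hypothetical Python helpers below only illustrate this reasoning and are not a topology predictor; they assume that surface-associated helices such as AH1 are not counted as spans.

```python
# Hypothetical helpers illustrating the parity argument; not a topology predictor.
OTHER_SIDE = {"cytoplasm": "lumen", "lumen": "cytoplasm"}

def c_terminus_side(n_terminus_side: str, n_spans: int) -> str:
    """Return the membrane side of the C-terminus.

    Each membrane-spanning segment moves the chain to the opposite side,
    so only the parity of n_spans matters. Helices that merely lie on the
    membrane surface (such as an amphipathic helix) must not be counted.
    """
    if n_terminus_side not in OTHER_SIDE:
        raise ValueError("side must be 'cytoplasm' or 'lumen'")
    if n_spans < 0:
        raise ValueError("n_spans must be non-negative")
    return n_terminus_side if n_spans % 2 == 0 else OTHER_SIDE[n_terminus_side]

def segment_sides(n_terminus_side: str, n_spans: int) -> list[str]:
    """Side of each soluble segment from N- to C-terminus (n_spans + 1 entries)."""
    c_terminus_side(n_terminus_side, n_spans)  # reuse the input checks
    sides = [n_terminus_side]
    for _ in range(n_spans):
        sides.append(OTHER_SIDE[sides[-1]])
    return sides

# TM1 and TM2 span the membrane, AH1 is not counted:
print(segment_sides("cytoplasm", 2))   # ['cytoplasm', 'lumen', 'cytoplasm']
# Three spans, as first predicted in silico for SARS-CoV-1 nsp3:
print(c_terminus_side("cytoplasm", 3))  # lumen
```

With two spans, the middle segment falls on the lumenal side, consistent with the ectodomain position between TM1 and TM2, while three spans would place the C-terminus and the nsp3/nsp4 junction in the lumen, away from PL2pro.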
No experiments on the membrane topology of SARS-CoV-2 nsp3 have been performed so far, but it is likely to resemble that of SARS-CoV-1. Furthermore, in murine hepatitis coronavirus (MHV), also a betacoronavirus, nsp3 has been identified as a major component of molecular pores within the membranes of double membrane vesicles (DMVs); these pores consist not only of nsp3 but also of the transmembrane proteins nsp4 and nsp6 [5]. The vesicles serve as viral replication organelles, protecting the viral RNA and replication machinery from host proteins, and are formed from the membrane of the endoplasmic reticulum [36]. For translation, viral mRNA strands are released from the DMVs into the cytosol through these molecular pores. The pores have an overall six-fold symmetry and are multi-protein complexes to which nsp3 contributes the largest mass [5]. DMVs are formed through the interaction between the nsp3 ectodomain and the lumenal regions of nsp4, which increases membrane curvature [4]. So far, no high-resolution structure of the ectodomain or the transmembrane domains has been determined experimentally. However, a 30.5 Å resolution cryo-electron microscopy reconstruction of the MHV DMV pore complex provides first insights into the shape of the complex. An experiment with green fluorescent protein fused to the N-terminus of nsp3 also revealed the location of the Ubl1 domain within this complex. These insights are likely also applicable to SARS-CoV-2 [5].

Nidovirus-conserved domain of unknown function and the coronavirus-specific carboxyl-terminal domain (CoV-Y)

The last transmembrane domain, TM2, is followed by the nidovirus-conserved domain of unknown function (Y1) and the coronavirus-specific carboxyl-terminal domain (CoV-Y), which consists of the subdomains Y2 and Y3. Together, they form the C-terminus of nsp3, with a total length of 362 residues. While Y1 is conserved throughout the order of nidoviruses, CoV-Y is conserved only within coronaviruses [3]. Studies have shown that nsp3 binds nsp4 better when the Y1 and CoV-Y domains are present [3]. This interaction is important for the formation of double membrane vesicles, which in turn are important for protection against proteases of the host cell. Additionally, possible interactions between the C-terminal domains of nsp3 and several other viral nsps have been shown in various in vitro experiments (Figure 2), including yeast two-hybrid screening, protein complex immunoprecipitation and GST pull-down assays, although these findings might not apply in vivo [3]. So far, only one structure of the Y3 domain (PDB ID: 7RQG) has been deposited, and at the time of writing no publication describing this PDB entry is available. Aligning SARS-CoV-2 Y3 with the C-terminal region of SARS-CoV-1 nsp3 gives a high sequence identity of 90%, and an alignment of the proposed Y1 and Y2 regions of both viruses gives a sequence identity of 88%. This high sequence similarity, compared with other coronaviruses, suggests an important functional role [38]. No comparable structures from other viruses have been published, so no further comparison can be made.
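Percent sequence identity values such as the 90% and 88% quoted above depend on how the alignment is scored, in particular on which columns form the denominator. The Python sketch below assumes one common convention, counting identical residues over aligned columns in which neither sequence has a gap; other tools divide by the alignment length or by the length of the shorter sequence and can report different values for the same alignment. The function name and toy sequences are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: only columns where neither sequence has a gap
    are counted. Dividing by the full alignment length instead would
    give a lower value whenever gaps are present.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)

# Toy alignment: 9 gap-free columns, 8 of them identical (F/Y mismatch).
print(round(percent_identity("ACDEFGHIKL", "ACDEYGHIK-"), 1))  # 88.9
```

The alignment itself (for example from a Needleman–Wunsch global alignment) is taken as given here; the sketch covers only the final counting step.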

Summary

With 1945 amino acids, non-structural protein 3 is the largest protein of SARS-CoV-2 and consists of up to 17 domains; an overview is given in Table 1. However, few of these domains have been studied in detail to date. The exceptions are Mac1 and PL2pro, which are involved in host immune response evasion and in the viral replication process, respectively. Apart from these two, structures have been determined only for Ubl1, Ubl2, NAB and Y3. While PL2pro cleaves nsp1 to nsp3 from the polyproteins pp1a and pp1ab, Mac1 removes post-translational modifications applied by the host, thereby interfering with the host's signalling system, including its response to infection. The ubiquitin-like domain 1 (Ubl1) interacts with the nucleocapsid protein. The structurally similar Ubl2 is regarded as a subdomain of PL2pro and probably supports its activity, although the details are not clear. The nucleic-acid-binding domain (NAB) has been shown to bind DNA and RNA, but its exact function has not yet been revealed. Of the C-terminal domains of nsp3, only the structure of Y3 has been solved, and studies indicate that this region could be involved in the formation of double membrane vesicles. The remaining domains of nsp3 are either disordered or their structures have yet to be solved. Of these, functions have been proposed only for the transmembrane domains and the SUD region, consisting of the domains Mac2, Mac3 and DPUP. While the transmembrane domains anchor nsp3 in the double membrane vesicles, the domains of the SUD region enhance viral replication by binding to oligo(G)/G-quadruplexes and to human Paip1.
Various interactions between the remaining domains of nsp3 with other viral or host proteins have been observed. Some domains such as those of the SUD-region have been demonstrated to be indispensable or of high importance to SARS-CoV-1 replication and might therefore play a more important role in SARS-CoV-2 than commonly thought. A lot of effort has gone into researching the major drug target domains of nsp3 so that their structures and functions are well established. But to complete the picture of the role of nsp3 in the infection cycle, both the overall architecture and the remaining domains need to be understood as well.

Discussion and outlook

Non-structural protein 3 is known to play an essential role in the viral replication cycle, although the functions of the majority of its domains are still under discussion. Extensive research has been carried out on the domains Mac1 and PL2pro, which have been identified as promising drug targets. However, to fully understand the role of nsp3 in infection, the structures of the other domains need to be solved and the overall architecture of nsp3 resolved. Additionally, all interactions between nsp3 and other non-structural proteins of the virus, as well as all interactions with host proteins and metabolites should be investigated to get a complete picture of the effects this huge protein has on host cells. Only then can we understand all of its functions and its evolution, as well as identify all related drug targets.
A good starting point would be to establish the exact role of nsp3 and the arrangement of each domain within the molecular pore complex. This could reveal new interactions between domains within nsp3, or consolidate proposed ones, and could also hint at interactions between the replication/transcription complex and the pore complex. However, solving the entire protein structure at atomic resolution is difficult, as many domains are separated by flexible linkers or disordered regions. This is exacerbated by the fact that the only known biologically active form is the assembly into a large, multimeric, membrane-spanning pore complex. Because of these difficulties, low-resolution cryo-EM imaging and electron tomography are currently the only viable experimental options, although they can be combined with computational methods such as structure prediction and integrative modelling. A question still waiting to be answered concerns the unusual length of nsp3. Very long polypeptides have a higher chance of accumulating errors during translation or folding, leading to misfolded molecules [39]. Viruses producing large proteins would therefore require a mechanism to prevent or counteract these errors, or to ensure a sufficient number of copies of the protein. Nevertheless, keeping all domains of nsp3 in a single polypeptide seems to be advantageous for betacoronaviruses, since this arrangement is well preserved across them, although only a few mutations would be needed for a PL2pro or Mpro cleavage site to emerge. One possible selection pressure could act on the pore complex of the double membrane vesicles, as each pore is a hexameric complex in which each unit is made of nsp3, nsp4, nsp6 and an unknown number of additional components [5]. If all domains were individual non-structural proteins, their self-assembly into such a complex could be a very rare event, if it occurred at all.
Having all 17 domains already connected, however, minimizes the number of moving parts and limits the number of possible arrangements of components within the complex. And although most domains remain functional as isolated proteins, their colocalization might enable functions that are so far unknown. The non-standard naming of the individual domains makes surveying the current literature difficult, hence we give an overview of all synonyms in Table 1. A consistent naming scheme, together with clear definitions of where each domain begins and ends within nsp3, would further increase understanding and avoid confusion in coronavirus nsp3 research. In conclusion, research should shift its focus from individual nsp3 domains to the interactions and colocalization of multiple domains within the pore complex, with the aim of answering open questions about the multi-layered mechanisms of coronavirus replication.

Acknowledgements

The authors would also like to thank Rosemary Wilson and Caitlin Hatton for support and discussion. All figures are courtesy of the Coronavirus Structural Task Force (insidecorona.net) who retains the rights for text and figures. L. C. v. S. and M. E. both contributed equally in writing this review and share first authorship. The decision on the first author was made by a game of rock-paper-scissors.

Funding

This work was supported by the German Federal Ministry of Education and Research [grant number 05K19WWA], Deutsche Forschungsgemeinschaft [project TH2135/2-1].

This blog post was published in Crystallography Reviews.
Please cite: https://doi.org/10.1080/0889311X.2022.2098281

References

  1. Yoshimoto FK. The proteins of severe acute respiratory syndrome coronavirus-2 (SARS CoV-2 or n-COV19), the cause of COVID-19. Protein J. 2020;39(3):198–216.
  2. Rut W, Lv Z, Zmudzinski M, et al. Activity profiling and crystal structures of inhibitor-bound SARS-CoV-2 papain-like protease: a framework for anti–COVID-19 drug design. Sci Adv. 2020;6:eabd4596.
  3. Lei J, Kusov Y, Hilgenfeld R. Nsp3 of coronaviruses: structures and functions of a large multi-domain protein. Antiviral Res. 2018;149:58–74.
  4. Hagemeijer MC, Monastyrska I, Griffith J, et al. Membrane rearrangements mediated by coronavirus nonstructural proteins 3 and 4. Virology. 2014;458–459:125–135.
  5. Wolff G, Zheng S, Koster AJ, et al. A molecular pore spans the double membrane of the coronavirus replication organelle. Science. 2020;369:1395–1398.
  6. Korn SM, Dhamotharan K, Fürtig B, et al. 1H, 13C, and 15N backbone chemical shift assignments of the nucleic acid-binding domain of SARS-CoV-2 non-structural protein 3e. Biomol NMR Assign. 2020;14:329–333.
  7. Schuller M, Correy GJ, Gahbauer S, et al. Fragment binding to the Nsp3 macrodomain of SARS-CoV-2 identified through crystallographic screening and computational docking. Sci Adv. 2021;7:eabf8711.
  8. Salvi N, Bessa LM, Guseva S, et al. 1H, 13C and 15N backbone chemical shift assignments of SARS-CoV-2 nsp3a. Biomol NMR Assign. 2021;15:173–176.
  9. Krogh A, Larsson B, von Heijne G, et al. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305:567–580.
  10. Carlson CR, Asfaha JB, Ghent CM, et al. Phosphoregulation of phase separation by the SARS-CoV-2 N protein suggests a biophysical basis for its dual functions. Mol Cell. 2020;80:1092–1103.e4.
  11. Serrano P, Johnson MA, Almeida MS, et al. Nuclear magnetic resonance structure of the N-terminal domain of nonstructural protein 3 from the severe acute respiratory syndrome coronavirus. J Virol. 2007;81:12049–12060.
  12. Chou C-C, Wang AH-J. Structural D/E-rich repeats play multiple roles especially in gene regulation through DNA/RNA mimicry. Mol BioSyst. 2015;11:2144–2151.
  13. Munnur D, Bartlett E, Mikolčević P, et al. Reversible ADP-ribosylation of RNA. Nucleic Acids Res. 2019;47:5658–5669.
  14. Alhammad YMO, Fehr AR. The viral macrodomain counters host antiviral ADP-ribosylation. Viruses. 2020;12:384.
  15. Rack JGM, Zorzini V, Zhu Z, et al. Viral macrodomains: a structural and evolutionary assessment of the pharmacological potential. Open Biol. 2020;10:200237.
  16. Alhammad YMO, Kashipathy MM, Roy A, et al. The SARS-CoV-2 conserved macrodomain is a mono-ADP-ribosylhydrolase. J Virol [Internet]. 2021 [cited 2022 Feb 23];95. Available from: https://journals.asm.org/doi/10.1128/JVI.01969-20.
  17. Michalska K, Kim Y, Jedrzejczak R, et al. Crystal structures of SARS-CoV-2 ADP-ribose phosphatase: from the apo form to ligand complexes. IUCrJ. 2020;7:814–824.
  18. Li C, Debing Y, Jankevicius G, et al. Viral macro domains reverse protein ADP-ribosylation. J Virol. 2016;90:8478–8486.
  19. Eckei L, Krieg S, Bütepage M, et al. The conserved macrodomains of the non-structural proteins of chikungunya virus and other pathogenic positive strand RNA viruses function as mono-ADP-ribosylhydrolases. Sci Rep. 2017;7:41746.
  20. Lin M-H, Chang S-C, Chiu Y-C, et al. Structural, biophysical, and biochemical elucidation of the SARS-CoV-2 nonstructural protein 3 macro domain. ACS Infect Dis. 2020;6:2970–2978.
  21. Lei J, Ma-Lauer Y, Han Y, et al. The SARS-unique domain (SUD) of SARS-CoV and SARS-CoV-2 interacts with human Paip1 to enhance viral RNA translation. EMBO J [Internet]. 2021 [cited 2022 Feb 23];40. Available from: https://onlinelibrary.wiley.com/doi/10.15252/embj.2019102277.
  22. Gallo A, Tsika AC, Fourkiotis NK, et al. 1H,13C and 15N chemical shift assignments of the SUD domains of SARS-CoV-2 non-structural protein 3c: ‘the N-terminal domain-SUD-N’. Biomol NMR Assign. 2021;15:85–89.
  23. Gallo A, Tsika AC, Fourkiotis NK, et al. 1H,13C and 15N chemical shift assignments of the SUD domains of SARS-CoV-2 non-structural protein 3c: ‘the SUD-M and SUD-C domains’. Biomol NMR Assign. 2021;15:165–171.
  24. Derry MC, Yanagiya A, Martineau Y, et al. Regulation of poly(A)-binding protein through PABP-interacting proteins. Cold Spring Harbor Symp Quant Biol. 2006;71:537–543.
  25. Kusov Y, Tan J, Alvarez E, et al. A G-quadruplex-binding macrodomain within the ‘SARS-unique domain’ is essential for the activity of the SARS-coronavirus replication–transcription complex. Virology. 2015;484:313–322.
  26. Tan J, Vonrhein C, Smart OS, et al. The SARS-unique domain (SUD) of SARS coronavirus contains two macrodomains that bind G-quadruplexes. PLoS Pathog. 2009;5:e1000428.
  27. Ma-Lauer Y, Carbajo-Lozoya J, Hein MY, et al. p53 down-regulates SARS coronavirus replication and is targeted by the SARS-unique domain and PLpro via E3 ubiquitin ligase RCHY1. Proc Natl Acad Sci USA. 2016;113:E5192–E5201.
  28. Freitas BT, Durie IA, Murray J, et al. Characterization and noncovalent inhibition of the deubiquitinase and deISGylase activity of SARS-CoV-2 papain-like protease. ACS Infect Dis. 2020;6:2099–2109.
  29. Gao X, Qin B, Chen P, et al. Crystal structure of SARS-CoV-2 papain-like protease. Acta Pharm Sin B. 2020;11:237–245.
  30. Shin D, Mukherjee R, Grewe D, et al. Papain-like protease regulates SARS-CoV-2 viral spread and innate immunity. Nature. 2020;587:657–662.
  31. Osipiuk J, Azizi S-A, Dvorkin S, et al. Structure of papain-like protease from SARS-CoV-2 and its complexes with non-covalent inhibitors. Nat Commun. 2021;12:743.
  32. Klemm T, Ebert G, Calleja DJ, et al. Mechanism and inhibition of the papain-like protease, PLpro, of SARS-CoV-2. EMBO J [Internet]. 2020 [cited 2020 Dec 9];39. Available from: https://onlinelibrary.wiley.com/doi/10.15252/embj.2020106275.
  33. Croll TI, Diederichs K, Fischer F, et al. Making the invisible enemy visible. Nat Struct Mol Biol. 2021;28:404–408.
  34. Báez-Santos YM, John SES, Mesecar AD. The SARS-coronavirus papain-like protease: structure, function and inhibition by designed antiviral compounds. Antiviral Res. 2015;115:21–38.
  35. Neuman BW, Joseph JS, Saikatendu KS, et al. Proteomics analysis unravels the functional repertoire of coronavirus nonstructural protein 3. J Virol. 2008;82:5279–5294.
  36. Oostra M, Hagemeijer MC, van Gent M, et al. Topology and membrane anchoring of the coronavirus replication complex: not all hydrophobic domains of nsp3 and nsp6 are membrane spanning. J Virol. 2008;82:12392–12405.
  37. Harcourt BH, Jukneliene D, Kanjanahaluethai A, et al. Identification of severe acute respiratory syndrome coronavirus replicase products and characterization of papain-like protease activity. J Virol. 2004;78:13600–13612.
  38. Neuman BW. Bioinformatics and functional analyses of coronavirus nonstructural proteins involved in the formation of replicative organelles. Antiviral Res. 2016;135:97–107.
  39. Drummond DA, Wilke CO. The evolutionary consequences of erroneous protein synthesis. Nat Rev Genet. 2009;10:715–724.
  40. Tomasello G, Armenia I, Molla G. The protein imager: a full-featured online molecular viewer interface with server-side HQ-rendering capabilities. Bioinformatics. 2020;36:2909–2911.

Oliver Kippes, Andrea Thorn & Gianluca Santoni

This blog post was published in Crystallography Reviews.
Please cite: https://doi.org/10.1080/0889311X.2022.2072835

Abstract

The main focus of drug development against COVID-19 is on the spike protein and the proteases. However, such drugs can be problematic: drugs against the spike protein can lose efficacy through mutations, and protease inhibitors can harm homologous cellular proteases. Here, we review a viral protein that, owing to its conserved and multifunctional nature, may be an alternative drug target: the SARS-CoV-2 nucleocapsid. This protein consists of two ordered and three disordered domains, all of which exhibit RNA-binding activity and are important for assembly of the ribonucleoprotein complex. This complex protects the viral RNA and is important for viral replication. Nucleocapsid might also be involved in modulating the host cell cycle, as well as in replication, translation, viral assembly and other parts of the infection cycle. The two ordered domains, the RNA-binding domain and the dimerization domain, mediate packaging of the RNA into the ribonucleoprotein complex and bind it to membrane proteins. The actual organization of this complex has not yet been conclusively verified, but the large SARS-CoV-2 RNA genome is packed efficiently while remaining very flexible. A better understanding of this protein could lead to an efficient therapeutic measure against the virus and would improve our understanding of COVID-19.

Role and function

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and the COVID-19 pandemic it triggered are a focus of international research. New viral mutations remain a constant threat in the race against the virus and require rapid adaptation to each new variant. Highly conserved viral proteins therefore offer an opportunity to stay ahead of these mutations, as they are potential therapeutic or vaccine targets [1].
The viral RNA of SARS-CoV-2 encodes many proteins: accessory and non-structural proteins, which facilitate the viral infection cycle after infection, and four structural proteins. Structural proteins are present in the virion to initiate infection and protect the viral RNA: these are the spike, envelope, membrane and nucleocapsid proteins. Nucleocapsid is encoded, along with the other structural proteins, in the last third of the viral genome (reading 5′ to 3′) [2]. Nucleocapsid is 419 amino acid residues long and one of the most conserved proteins across coronaviruses, with 91% homology to SARS-CoV. In this review, we present the latest structural information and discuss opportunities for exploiting the nucleocapsid to develop novel therapeutics.
The primary role of the nucleocapsid is to protect the viral RNA by packaging it into a ribonucleoprotein complex inside the virion. To form this complex with the single-stranded viral RNA [4], the nucleocapsid proteins need to oligomerise, which increases their activity [3]. Dimerized nucleocapsid binds RNA with higher affinity, suggesting that the dimerization domain is structurally important for latching onto the RNA and forming a stable complex [5]. Protection of the viral RNA makes nucleocapsid essential for the viral infection cycle and viral assembly [6]. Interactions between the four structural proteins, in particular between nucleocapsid and the membrane protein [7], are also very important for viral assembly [8]. It has been suggested that nucleocapsid may also play a role in virus budding. Although it is not strictly required for budding [9], virus-like particle experiments showed that its presence increases the virus-like particle yield, which suggests at least some participation of the protein in the budding process [10]. Furthermore, overexpression of nucleocapsid enhanced the replication of SARS-CoV-2. This can be explained by the ability of nucleocapsid to antagonize the type I interferon-mediated antiviral pathway, i.e. to suppress immune signalling in the cell [11]. As an antagonist, nucleocapsid inhibits phosphorylation of STAT1 and STAT2, which suppresses their translocation to the nucleus of the host cell. Inhibiting nucleocapsid function would therefore not only hinder the viral replication cycle but may also permit a better immune response of the host cell, making nucleocapsid an excellent drug target.
Since the nucleocapsid protein is required for optimal replication of coronaviruses [6,12], it may also be involved in RNA synthesis. This is implied by the importance of nucleocapsid translation for stimulating the infectivity of genomic RNA [13] and by the observation that, early in infection, SARS-CoV nucleocapsid colocalises intracellularly with replicase components [14]. Experiments with mouse hepatitis virus nucleocapsid also showed that the interaction between non-structural protein 3 (nsp3) and nucleocapsid is important for forming a complex at the 3′ end of the mouse coronavirus genome that initiates RNA synthesis, thereby promoting the infectivity of genomic RNA [15,16]. In addition to its role in viral replication, nucleocapsid also acts in host cell regulation: by regulating cyclin-dependent kinase activity, it arrests the host cell cycle in the DNA synthesis (S) phase [17]. This block of the synthesis phase is postulated to give the virus enough time to use the cellular raw materials for genome replication, viral assembly and budding [18]. RNA chaperone activity has been measured for the nucleocapsids of transmissible gastroenteritis coronavirus and SARS-CoV, and in silico studies of other coronaviruses show similar patterns, which strongly suggests that SARS-CoV-2 nucleocapsid also has RNA chaperone activity. As an RNA chaperone, nucleocapsid would enhance ribozyme cleavage, enable rapid and accurate RNA annealing, and facilitate strand transfer and exchange [19]. Furthermore, nucleocapsid affects the stress responses of host cells: elongation factor 1α has been shown to interact with the dimerization domain of SARS-CoV nucleocapsid, thereby suppressing translation.
The binding of elongation factor 1α also inhibits cytokinesis [20], i.e. the separation of a eukaryotic cell into two daughter cells. Although many of these mechanisms have only been demonstrated for SARS-CoV-1, the similarity between the two nucleocapsids suggests that most, if not all, are also relevant for SARS-CoV-2.

Structural overview

The SARS-CoV-2 nucleocapsid consists of five domains: three intrinsically disordered regions and two ordered domains (Figure 1). Both ordered domains, described below, show RNA-binding activity that promotes ribonucleoprotein packaging [21]. Intrinsically disordered regions (IDRs) are challenging for conventional structural characterization and are mostly studied by molecular simulations [22]. Because they affect the conformation of nucleocapsid in the ribonucleoprotein complex, the precise details of its binding to RNA are still poorly understood. It is still unclear, for example, how this complex is organized in the virus; both proposed arrangements are described in this review. Two of the intrinsically disordered regions are located at the N- and C-termini of the protein, and a third lies in the middle, linking the two ordered domains. The two ordered domains are called the RNA-binding domain (N-terminal) and the dimerization domain (C-terminal); structures of both have been determined by X-ray diffraction and by solution NMR [23,24]. All available structures are summarized in Table 1.

Ordered domains

The two ordered domains, the RNA-binding and dimerization domains, together make up 257 of the 419 residues. Although one domain is called the RNA-binding domain, the RNA-binding epitope is distributed over all five nucleocapsid domains (see Figure 1), including the disordered ones [6].

The RNA-Binding domain

During the viral infection cycle, the RNA-binding domain captures the viral RNA and mediates its packaging into the ribonucleoprotein complex [25]. The RNA-binding domain is rich in aromatic and basic residues arranged in a right-hand-like shape with a protruding basic finger, a basic palm and an acidic wrist (see Figure 2(A)). The SARS-CoV-2 nucleocapsid structures show similar loops that surround the β-sheet core in a sandwich-like structure. The β-sheet core consists of four antiparallel β-strands, a short 3₁₀-helix in front of the β2-strand and a protruding β-hairpin located between the β2- and β5-strands (see Figure 2(B)). The structural basis for RNA binding by nucleocapsid is not yet known, but comparisons with the less pathogenic human coronavirus OC43 suggest that SARS-CoV and SARS-CoV-2 have a unique potential RNA-binding pocket beside the β-sheet core [26,27]. Another possible RNA-binding site lies between the basic finger and the palm region: analysis of the electrostatic potential reveals a highly positively charged cleft there, which could bind RNA. NMR titration with single-stranded RNA showed that the RNA interacts with residues L56, G60, K61, K65, F66, A90, R93, I94, R95, K102, D103, L104, T165, T166, G175 and R177, forming a U-shaped binding epitope. The structures determined in these experiments were deposited in the PDB under the IDs 6YI3, 7ACT and 7ACS [28]. Simulations also suggested that residues T57, H59, S105A, R107A, G170, F171 and Y172 are interesting targets for drug research because of their simulated involvement in RNA binding in SARS-CoV-2 [25].
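The NMR-mapped residues listed above can be tallied by side-chain class as a quick check on the idea that the epitope sits in a positively charged region. This is a minimal sketch under an assumed, deliberately simple charge model at neutral pH (Lys and Arg count +1, Asp and Glu count −1, His is treated as neutral); it ignores pKa shifts and the local environment, and the function name `tally` is illustrative only.

```python
# Residues reported to interact with single-stranded RNA in the NMR titration (from the text)
NMR_RNA_CONTACTS = "L56 G60 K61 K65 F66 A90 R93 I94 R95 K102 D103 L104 T165 T166 G175 R177"

# Simplified side-chain classes; His is treated as neutral (an assumption)
BASIC = set("KR")
ACIDIC = set("DE")
AROMATIC = set("FWY")


def tally(residues: str) -> dict:
    """Count residues by side-chain class from tokens such as 'K61'."""
    codes = [token[0] for token in residues.split()]  # one-letter code of each residue
    basic = sum(c in BASIC for c in codes)
    acidic = sum(c in ACIDIC for c in codes)
    return {
        "total": len(codes),
        "basic": basic,
        "acidic": acidic,
        "aromatic": sum(c in AROMATIC for c in codes),
        "net_charge": basic - acidic,  # crude formal charge, not an electrostatic calculation
    }


if __name__ == "__main__":
    print(tally(NMR_RNA_CONTACTS))
```

For the 16 listed residues this gives 6 basic residues, 1 acidic residue and a formal net charge of +5, which fits the picture of a basic RNA-binding groove; a proper electrostatic surface, as in Figure 2(A), is needed for any quantitative statement.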

Structural biology of SARS-CoV-2 nucleocapsid 11

Figure 1. Schematic overview of the SARS-CoV-2 Nucleocapsid domains with structural examples below. The protein consists of two ordered domains, the RNA-binding domain and the Dimerization domain, connected with a disordered linker. At the ends of the protein are the intrinsically disordered N-terminal arm and the C-terminal tail. Creator: Coronavirus Structural Task Force - Protein Imager, Oliver Kippes License: cc-by-sa

The dimerization domain

The nucleocapsid dimerization domain anchors nucleocapsid to the viral membrane inside the virion, and its RNA-binding affinity allows a physical linkage between the viral membrane and the RNA [25]. The dimerization domain consists of three 3₁₀-helices, five α-helices and two antiparallel β-strands, which form a β-hairpin. This β-hairpin forms a C-shape together with other parts of the dimerization domain. The domains of two nucleocapsid molecules form a tight homodimer with a rectangular slab shape, with the β-hairpins from each protein on one side and the helices on the opposite side. The dimer is stabilized by hydrogen bonds and hydrophobic interactions. The dimerization domain is only stable when nucleocapsid molecules form a dimer or oligomer, and the domain mediates self-association into tetramers, hexamers and higher oligomeric forms [29]. This dimerization is presumably a driving force for viral ribonucleoprotein assembly [30]. An electrophoretic mobility shift assay showed that the amount of free 17-mer single-stranded SARS-CoV-2 RNA oligonucleotide decreased when it was incubated with the recombinantly produced dimerization domain of the SARS-CoV-2 nucleocapsid protein [26,31], showing that the dimerization domain also has RNA-binding activity. As of the writing of this review, the PDB contains 26 SARS-CoV-2 nucleocapsid structures and 6 SARS-CoV structures, most of which show either the RNA-binding or the dimerization domain. There are also some special structures that show the RNA-binding domain in complex with double-stranded RNA, and structures that show short segments of the protein. Most structures are based on X-ray diffraction data and a few on NMR data; all of them are listed and commented on in Table 1.


Figure 2. Structures of the RNA-binding domain (A: PDB 7CDZ, B: 6VYO). A: Electrostatic surface of the RNA-binding domain; red shows negative and blue positive charge potential. The domain consists of an acidic wrist, the basic finger and the basic palm. The area between the basic finger and the basic palm is marked as a possible RNA-binding site. B: The core of the protein consists of four antiparallel β-strands (β1-β2-β5-β6) together with a short 3₁₀-helix in front of the β2-strand. The core is enclosed by surrounding loops. The basic finger is formed by a β-hairpin (β3–β4). Creator: Coronavirus Structural Task Force - Protein Imager, Oliver Kippes License: cc-by-sa

Table 1. PDB entries for nucleocapsid. For each entry we list the ID, method, resolution and a description of its importance, as well as an evaluation of the most intense Fourier difference peaks calculated in Coot, including a statement on model quality.

SARS-CoV-2
ID | Method | Resolution (Å) | Description | Comment on the highest Fourier difference peaks (Coot)
6m3m | X-ray diffraction | 2.7 | RNA-binding domain consisting of four identical monomers. The structure is very similar overall to other nucleocapsid RNA-binding domains but has a unique potential RNA-binding pocket alongside the β-sheet core. | Near C-terminal Ala174; disordered conformer for this residue.
6vyo | X-ray diffraction | 1.7 | RNA-binding domain. The structure consists of four identical monomers and contains the ligands Cl⁻, Zn²⁺, glycerol and 2-(N-morpholino)ethanesulfonic acid. The N-terminal domain provides structural features for RNA binding. | Probable unmodelled buffer molecule near Thr54. Glycerol binding seems questionable.
6wji | X-ray diffraction | 2.05 | Dimerization domain. The structure consists of six identical monomers forming three homodimers and contains the ligand Cl⁻. The C-terminal domain provides structural features for oligomerisation. | The highest FoFc peaks all seem to be related to poorly resolved side chains of surface residues.
6wkp | X-ray diffraction | 2.67 | RNA-binding domain in a monoclinic crystal form. The structure consists of four identical monomers and contains the ligands Zn²⁺ and 2-(N-morpholino)ethanesulfonic acid. The N-terminal domain provides structural features for RNA binding. | 3 peaks; the highest (5.2σ) near Glu118 probably corresponds to a missing water molecule.
6wzo | X-ray diffraction | 1.42 | Dimerization domain in a triclinic (P1) crystal form. In solution this domain forms a homodimer, but it has also been proposed to be involved in tetramer formation. Both 6wzo and 6wzq are tetramers of two homodimers; 6wzq contains the ligand SO₄²⁻. | 52 peaks in the map. The highest (6.97σ) indicates an alternative conformer for Thr334 in chain A.
6wzq | X-ray diffraction | 1.45 | (shared with 6wzo) | 29 peaks. Many cluster around residues 280–284, suggesting incorrect modelling of this loop in chains B, C and D.
6yi3 | Solution NMR | – | Monomeric RNA-binding domain. The N-terminal domain provides structural features for RNA binding. The deposited model was selected because it has the fewest restraint violations. | –
6yun | X-ray diffraction | 1.44 | Dimerization domain in an orthorhombic (P2₁2₁2₁) crystal form. The structure is a homodimer and provides structural features for oligomerisation. | 45 peaks above 5σ. A 12.4σ peak shows a wrongly positioned side chain for Arg31 in chain A. Alternative conformers are also missing for multiple residues (e.g. Arg11 or Ser14).
6zco | X-ray diffraction | 1.36 | Dimerization domain in a tetragonal (I4₁) crystal form. The structure is a monomer and provides structural features for oligomerisation. | 7 peaks, corresponding to unmodelled water molecules.
7acs | Solution NMR | – | N-terminal RNA-binding domain in complex with a 7-mer dsRNA. | –
7act | Solution NMR | – | N-terminal RNA-binding domain in complex with the single-stranded RNA 5'-UCUCUAAACG-3'. It demonstrates the binding capability of the charged binding groove. | –
7c22 | X-ray diffraction | 2.0 | Dimerization domain in a triclinic (P1) crystal form. The structure is a tetramer of two identical homodimers and contains the ligands diethylene glycol and acetate. | 5 peaks; the strongest (5.3σ) could indicate an alternative conformation of Met317 in chain D.
7cdz | X-ray diffraction | 1.8 | RNA-binding domain. Part of a discussion about possible ways to form the ribonucleoprotein complex. | Badly modelled regions highlighted by 35 peaks: in chain A, loop 98–104 and 10 residues at the C-terminus; in chain B, loop 98–104. Overall modelling seems questionable.
7ce0 | X-ray diffraction | 1.39 | Dimerization domain in dimerized form. Published together with an N-terminal domain structure to discuss how the ribonucleoprotein complex might form. | 75 peaks, the highest at 13σ; overall poor fit between map and model.
7de1 | X-ray diffraction | 2.0 | Dimerization domain in dimerized form. The structure was used to show that the dimerization domain has a role in binding viral RNA and transcriptional regulatory sequences. | 17 peaks; no major faults with this structure.
Structures containing peptides from SARS-CoV-2
7kgo | X-ray diffraction | 2.15 | Human leukocyte antigen HLA-A*0201 in complex with the nucleocapsid peptides 351–359, 316–324, 222–230, 159–167, 128–146 and 226–234, and HLA-B*0702 in complex with the nucleocapsid epitope SPRWYFYYL. HLA-A*0201 and HLA-B*0702 each consist of an MHC class I heavy chain and β2-microglobulin; the nucleocapsid-derived peptides are bound between the α-helices. The structures were solved to determine whether and how nucleocapsid-derived SARS-CoV-2 peptides trigger CD8+ T-cell responses. | –
7kgp | X-ray diffraction | 1.396 | (shared with 7kgo) | –
7kgq | X-ray diffraction | 1.34 | (shared with 7kgo) | –
7kgr | X-ray diffraction | 1.55 | (shared with 7kgo) | –
7kgs | X-ray diffraction | 1.58 | (shared with 7kgo) | –
7kgt | X-ray diffraction | 1.9 | (shared with 7kgo) | –
7lg0 | X-ray diffraction | 2.296 | (shared with 7kgo) | –
7ltu | X-ray diffraction | 1.12 | 6-residue segments from nucleocapsid, representing the three segments connected to liquid-liquid phase separation that drive amyloid fibril formation. 7ltu and 7lux contain the segment AALALL, 7luz contains GQTVTK and 7lv2 contains GSQASS. | Peptide structures with negligible, if any, difference peaks.
7lux | X-ray diffraction | 1.3 | (shared with 7ltu) | (shared with 7ltu)
7luz | X-ray diffraction | 1.1 | (shared with 7ltu) | (shared with 7ltu)
7lv2 | X-ray diffraction | 1.3 | (shared with 7ltu) | (shared with 7ltu)
SARS-CoV
1ssk | Solution NMR | – | RNA-binding domain. A monomer of 158 amino acids with a five-stranded β-sheet. This domain binds single-stranded RNA to enable packaging of the viral genome RNA into a helical ribonucleocapsid (RNP). | –
2cjr | X-ray diffraction | 2.5 | Dimerization domain. Interactions within this structure stabilise oligomerisation. Crystal packing of the octamer forms the helical structure of the nucleocapsid. The domain also has RNA-binding activity. | 3 peaks. A 6.4σ peak shows a wrong conformation for Ile331 in chain E.
2gib | X-ray diffraction | 1.75 | Dimerization domain. The structure is a dimer of two identical monomers and contains the ligand SO₄²⁻. Strong interactions between both subunits suggest that the dimer is the functional unit. | Strong negative peaks on Glu324 and 368 suggest possible radiation damage.
2jw8 | Solution NMR | – | Dimerization domain. The structure is a homodimer determined with the stereo-array isotope labelling (SAIL) method to obtain a high-quality solution structure. The deposited model was selected because it has the fewest restraint violations. | –
2ofz | X-ray diffraction | 1.17 | RNA-binding domain (2ofz and 2og3). The domain provides structural features for RNA binding; the study aimed to identify residues important for the RNA-binding mechanism, using comparisons with the homologous protein from avian infectious bronchitis virus. 2ofz is a monomer in a monoclinic crystal form and contains the ligand 1,2-ethanediol. | 56 peaks; clear indication of radiation damage on Glu residues.
2og3 | X-ray diffraction | 1.85 | (shared with 2ofz) | 18 peaks. A peak of 8.9σ at the C-terminus indicates a missing residue.
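The σ values quoted in Table 1 express difference-map peak heights relative to the root-mean-square deviation of the map. The sketch below shows, under simplifying assumptions, how such a peak list can be derived: the map is a hypothetical small periodic grid stored as a flat Python list, σ is taken as the population standard deviation of all grid values, and a peak is a grid point whose height exceeds the cutoff in either sign and that is more extreme than all 26 neighbours. This is not Coot's implementation; a real map would be read from a map or MTZ file with a dedicated crystallographic library, and the name `find_difference_peaks` is illustrative only.

```python
from itertools import product
from statistics import fmean, pstdev


def find_difference_peaks(grid, shape, cutoff=3.0):
    """Return [((i, j, k), height_in_sigma), ...] sorted by |height|, largest first.

    grid  -- flat list of map values, index = (i * ny + j) * nz + k
    shape -- (nx, ny, nz); each dimension must be >= 3 so the 26 neighbours are distinct
    """
    nx, ny, nz = shape
    if min(shape) < 3 or len(grid) != nx * ny * nz:
        raise ValueError("grid must be at least 3x3x3 and match shape")
    mean = fmean(grid)
    sigma = pstdev(grid)  # rms deviation of the map from its mean
    if sigma == 0:
        return []  # a featureless map has no peaks

    def value(i, j, k):
        # Crystallographic maps are periodic, so indices wrap around the unit cell
        return grid[((i % nx) * ny + (j % ny)) * nz + (k % nz)]

    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    peaks = []
    for i, j, k in product(range(nx), range(ny), range(nz)):
        v = value(i, j, k)
        height = (v - mean) / sigma
        if abs(height) < cutoff:
            continue
        neighbours = [value(i + di, j + dj, k + dk) for di, dj, dk in offsets]
        # Positive peaks must be local maxima, negative peaks local minima
        if (height > 0 and all(v > n for n in neighbours)) or (
            height < 0 and all(v < n for n in neighbours)
        ):
            peaks.append(((i, j, k), round(height, 2)))
    peaks.sort(key=lambda p: abs(p[1]), reverse=True)
    return peaks


if __name__ == "__main__":
    # Hypothetical 5x5x5 map: flat background with one positive and one negative feature
    nx = ny = nz = 5
    grid = [0.0] * (nx * ny * nz)
    grid[(1 * ny + 1) * nz + 1] = 10.0  # e.g. an unmodelled water molecule
    grid[(3 * ny + 3) * nz + 3] = -6.0  # e.g. a side chain lost to radiation damage
    for position, height in find_difference_peaks(grid, (nx, ny, nz)):
        print(position, f"{height:+.2f} sigma")
```

With this toy map the function reports a +9.56σ peak at (1, 1, 1) and a -5.79σ peak at (3, 3, 3). Positive peaks of this kind correspond to the "missing water" or "alternative conformer" comments in the table, and negative peaks to comments about radiation damage or misplaced atoms.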

Disordered domains

Intrinsically disordered regions generally contain more polar and charged amino acids than ordered domains. Electrostatic repulsion, together with the lack of stabilising hydrophobic cores, prevents them from adopting a well-defined structure.
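The compositional argument above underlies the charge-hydropathy criterion of Uversky and colleagues, which predicts disorder when a sequence combines a high mean net charge with a low mean hydropathy. The sketch below is a simplified version and an assumption for illustration, not a method used in the studies cited here: it uses the Kyte-Doolittle scale rescaled to 0–1, averages over the whole sequence instead of using a sliding window, and applies the commonly quoted boundary <H> = (<R> + 1.151)/2.785. The example sequences are hypothetical and are not nucleocapsid sequences.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def charge_hydropathy(seq: str):
    """Return (mean absolute net charge <R>, mean normalised hydropathy <H>)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    # Formal charge only: Lys/Arg +1, Asp/Glu -1, His neutral (simplifying assumption)
    net = sum(seq.count(c) for c in "KR") - sum(seq.count(c) for c in "DE")
    mean_r = abs(net) / n
    # Rescale each value from [-4.5, 4.5] to [0, 1] before averaging (no sliding window)
    mean_h = sum((KYTE_DOOLITTLE[c] + 4.5) / 9.0 for c in seq) / n
    return mean_r, mean_h


def predicted_disordered(seq: str) -> bool:
    """Disordered if <H> falls below the boundary <H>_b = (<R> + 1.151) / 2.785."""
    mean_r, mean_h = charge_hydropathy(seq)
    return mean_h < (mean_r + 1.151) / 2.785


if __name__ == "__main__":
    for name, seq in [("hydrophobic", "LLIVFAIVLA"), ("polar/charged", "SRSRSKSESQ")]:
        r, h = charge_hydropathy(seq)
        print(f"{name}: <R>={r:.3f} <H>={h:.3f} disordered={predicted_disordered(seq)}")
```

The hydrophobic example falls well above the boundary (predicted ordered), while the serine/arginine-rich example falls below it (predicted disordered), mirroring the contrast between a folded core and a polar, charged linker. Dedicated disorder predictors use far richer models than this two-parameter rule.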
Despite the disordered nature of the N- and C-terminal regions, they may form transient helices made up of various residues [22]. Transient helices in the N-terminal region and the flexible linker flank the RNA-binding domain and orient arginine residues in the same direction, driving RNA binding. Transient helices in the C-terminal region carry positive charges that are critical for protein–RNA interactions; in other coronaviruses they also mediate binding to the membrane protein, so they likely have a similar function in SARS-CoV-2. The conformation of the N-terminal region is strongly affected by the neighbouring RNA-binding domain: the positively charged N-terminal region is repelled by the domain's positive surface and attracted to its slightly negatively charged parts, which could promote RNA engagement [22]. The linker region comprises residues 174–246. It contains polar regions that are repelled by the neighbouring ordered domains. A positively charged serine- and arginine-rich motif likely acts as a phosphorylation site that regulates direct interactions with RNA, viral membrane proteins and nsp3 [22]. Simulations suggest that the linker either contributes to oligomerisation or acts as a recognition motif for the binding of other proteins. Intrinsically disordered regions could be involved in a number of regulatory functions, including modulation of transcription, translation, post-translational modifications such as phosphorylation, and cell signalling, and may become ordered on contact with another protein domain [6].
The intrinsically disordered regions of SARS-CoV-2 nucleocapsid are thought to be responsible for liquid-liquid phase separation, an important process in eukaryotic cells. It involves the formation of a macromolecule-rich, fluid compartment that is separated from the cytosol without a membrane; one can think of these compartments as droplets or granules within the cytosol. This process also carries certain dangers: liquid-liquid phase separation concentrates proteins, which at high concentration tend to aggregate or jam, preventing chemical reactions [32,33]. The flexible linker of nucleocapsid shows sequence similarities with other proteins that drive liquid-liquid phase separation [21]. During this process nucleocapsid forms amyloid-like fibrils, which may encapsulate RNA during viral replication. Further research with fibril inhibitors in SARS-CoV-2-infected cells could give more insight into the actual functions of these fibrils. Given the neurological complications that occur with COVID-19 and the link between amyloid fibrils and diseases such as dementia and Parkinson's disease, inhibition of this fibril formation could be relevant for understanding both COVID-19 and long COVID, as well as for therapeutic strategies [34].

Ribonucleoprotein complex

To protect the viral RNA, nucleocapsid must interact with the nucleic acid, an interaction preferentially mediated by GGG motifs in the leader RNA sequences [35], and the nucleocapsid proteins need to oligomerise. Based on the SARS-CoV nucleocapsid, the protein interacts with the RNA at multiple sites through the negatively charged phosphate backbone of the RNA and the positively charged groove formed by residues 248–280 of nucleocapsid (dimerization domain). However, the exact mechanism is unknown. Nucleocapsid is also suspected to help with RNA folding [6,36]. Inspection of tomogram slices showed a 'G-shaped' architecture of ribonucleoprotein complexes, 15 nm in diameter and 16 nm in height. Further 2D classification of the ribonucleoprotein complexes revealed three classes: hexagonally packed, triangularly packed, and closely packed against the envelope. 3D refinement then yielded two assembly models, whose prevalence differs with virion shape. Spherical virions appear to contain more membrane-proximal ribonucleoprotein assemblies ('eggs-in-a-nest'; see Figure 3(A)), whereas ellipsoidal virions contain more membrane-free assemblies ('pyramid'). Pyramid-shaped complexes were also observed to assemble into 'eggs-in-a-nest' complexes. The native ribonucleoprotein complexes are highly heterogeneous and densely packed, but locally ordered in the virus. They may associate with the RNA in a 'beads on a string' arrangement (see Figure 3(A)) [37]. Another possible organization of the ribonucleoprotein complex is the helical arrangement seen in SARS-CoV. In this arrangement the RNA is surrounded by nucleocapsid octamers, and the complex has a positively charged surface that could bind RNA through electrostatic interactions. The complex shows a twin helix, with the octamers formed by two tetramers that are wound around each other.
The RNA-binding domains alternate, with one protomer on the inner side of the helical core and the next on the outer side of the helix [35,36]. The exact organization of the ribonucleoprotein complex in SARS-CoV-2 is not known, but it seems likely to differ from that of SARS-CoV; the reasons for this difference are not yet known.

Nucleocapsid as therapeutic target

The multifunctional and conserved nature of the nucleocapsid makes it an interesting drug target, but research is hindered by its disordered regions and the resulting lack of a complete atomic structure. Inhibiting its functions could disturb the viral infection cycle and improve the host immune response. One promising strategy is to inhibit the structurally well-characterised RNA-binding domain and thereby prevent ribonucleoprotein complex formation. Inhibition of the RNA-binding domain prevents viral replication [38]. Comparison of the nucleocapsid protein of human coronavirus OC43 with that of SARS-CoV-2 suggests distinct ribonucleotide-binding patterns, which allowed a potential inhibitor-binding pocket to be identified in SARS-CoV-2 [26]. This pocket lies alongside the β-sheet core (see Figure 2(B)). The drug candidate PJ34 is a potent inhibitor of RNA-binding activity in human coronavirus OC43. Like AMP, it binds to residues N48, N49, T50, A51, Y110 and Y112 and fits into the ribonucleotide-binding pocket of the RNA-binding domain (residues R88, T91, R93, R107, Y109, Y111, R149) [38]. A comparison between the HCoV-OC43 and SARS-CoV-2 nucleocapsid structures shows that the key residues S41, F53, Y109, Y111 and R149 (SARS-CoV-2 numbering) are conserved, so HCoV-OC43 and SARS-CoV-2 have similar binding pockets [27]. A second strategy would be to prevent oligomerisation or to induce abnormal aggregation. Experiments with MERS-CoV showed that the inhibitor 5-benzyloxygramine (P3) stabilizes protein–protein interactions between MERS-CoV nucleocapsids, leading to abnormal full-length oligomerization [39]. P3, which stabilizes a non-native dimeric configuration, showed antiviral activity against MERS-CoV. P3 targets the interface of the N-terminal domain dimer and binds two hydrophobic pockets formed by the two N-terminal domains.
The important residues for the hydrophobic pockets in MERS-CoV are W43 and F135 on monomer 1 and G104, T105, G106 and A109 on monomer 2; these residues are highly conserved [39]. A comparison between SARS-CoV-2 and MERS-CoV also showed that the responsible residues are conserved, except for F135 in MERS-CoV which is replaced by I146 in SARS-CoV-2 [39].


Figure 3. Two proposed organizations of the ribonucleoprotein complex in the virion. A: The 'eggs-in-a-nest' organization, also called 'beads on a string', in which nucleocapsid proteins bind to the RNA and oligomerise like beads on a string. B: The helical organization observed in SARS-CoV, in which nucleocapsids organize as a decamer around the RNA. The RNP number indicates the number of nucleocapsid proteins in a single bead or helix turn, respectively. Creator: Coronavirus Structural Task Force - Protein Imager, Oliver Kippes License: cc-by-sa

Coronavirus nucleocapsid is also functionally linked to non-structural protein 3 (nsp3). An interaction between these two proteins seems to be essential for the virus to enhance infectivity [15] and virus replication [40]. Coronavirus genomic RNA (gRNA) is only minimally infectious upon transfection into host cells; its infectivity increases when mRNA encoding nucleocapsid is co-transfected [15]. Experiments showed that gRNA infectivity cannot be initiated if the ubiquitin-like domain of nsp3 is mutated. NMR titration showed that the nucleocapsid–nsp3 complex involves residues from ubiquitin-like domain 1 and the serine/arginine-rich region of the nucleocapsid [40, p. 3] (SARS-CoV-2 residues 176–206). The interaction may also be important for viral transcription. The nucleocapsid of mouse hepatitis virus can bind to the transcriptional regulatory sequence RNA, which prevents formation of the nucleocapsid–nsp3 complex. This competition between nsp3 and the transcriptional regulatory sequence points to a switch between viral transcription and replication. Phosphorylation of the serine-rich region of mouse hepatitis virus nucleocapsid was shown to decrease its binding affinity for ubiquitin-like domain 1 [40]. Similar mechanisms could be present in SARS-CoV-2 but have not yet been investigated. An in silico study tried to predict the SARS-CoV-2 residues that could be responsible for such interactions. The nsp3 residues S188, S190, R191, N192, R195, S197, T198, P199, G200, S201, K237, G238, Q239, Q241, G243, Q244, T245, V246, T247, K248, F314, P309, S310, A311, S312 and A313 and the nucleocapsid residues S183, S184, R185, S186, S187, S188, R189, S190, R191, S193, S194, R195 and N196 are likely to be involved in this proposed interaction [41].
Further studies are urgently needed. Inhibition of this interaction could be another therapeutic strategy.
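As a small bookkeeping check, the predicted nucleocapsid residues listed above can be compared with the serine/arginine-rich region given for the NMR titration (SARS-CoV-2 residues 176–206) and summarised by residue type. This sketch uses only residue numbers from the text and does not model the interaction itself; the helper names `parse` and `summarise` are illustrative.

```python
from collections import Counter

# Nucleocapsid residues predicted in silico to contact nsp3 (from the text)
PREDICTED_N_RESIDUES = "S183 S184 R185 S186 S187 S188 R189 S190 R191 S193 S194 R195 N196"
SR_REGION = range(176, 207)  # residues 176-206 inclusive


def parse(residues: str):
    """Split tokens like 'S183' into (one-letter code, residue number)."""
    return [(tok[0], int(tok[1:])) for tok in residues.split()]


def summarise(residues: str, region: range):
    """Return (composition by residue type, residues lying outside the region)."""
    parsed = parse(residues)
    composition = Counter(code for code, _ in parsed)
    outside = [f"{code}{num}" for code, num in parsed if num not in region]
    return composition, outside


if __name__ == "__main__":
    composition, outside = summarise(PREDICTED_N_RESIDUES, SR_REGION)
    print(dict(composition))  # {'S': 8, 'R': 4, 'N': 1}
    print(outside)            # []
```

All 13 predicted residues fall inside the serine/arginine-rich region, and 12 of them are Ser or Arg, so the in silico prediction and the NMR-mapped region are at least consistent with each other.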


Figure 4. Structure of the dimerization domain in monomer form (A: PDB 7C22) and dimerized form (B: PDB 2GIB). A: The dimerization domain shows the secondary-structure order η1-α1-α2-η2-α3-α4-β1-β2-α5-η3. The two antiparallel β-strands form a C-shaped β-hairpin. In the dimer, the β-sheet of one monomer docks between the equivalent sheet and the α-helices of the other monomer. B: Dimeric form. The green copy is shown in the same orientation as in A. Creator: Coronavirus Structural Task Force - Protein Imager, Oliver Kippes License: cc-by-sa

Discussion and conclusion

Nucleocapsid is vital to SARS-CoV-2 infection, but due to its relatively large size (ca. 114 kDa as a dimer [42]) and the presence of disordered regions, it is hard to study and much about it is still unknown. Indeed, nucleocapsid is a reminder of the so-called 'dark proteome': since a large portion of the protein world is disordered, the biochemistry of these molecules remains poorly studied and understood. Still, nucleocapsid may be an excellent drug target due to its essential role in the viral infection cycle, its repression of the immune system and its low mutation rate, a further indication of its fundamental role in SARS-CoV-2 biology. Nucleocapsid could also be an interesting vaccine target, as it is the second virion protein to interact with the immune system during infection and is much more conserved than the spike protein, probably because of its RNA interaction and other vital functions in the infection cycle. We do know that inhibition of nucleocapsid strongly decreases RNA-binding affinity. Its essential roles include the formation of the ribonucleoprotein complex, whose structure similarly eludes us. Current data suggest two potential structural arrangements: the helical hypothesis and the 'beads on a string' hypothesis. Furthermore, there seem to be additional structure–function relationships that we do not yet understand. A deeper understanding of the ribonucleoprotein structure will shed light on the viral infection cycle and provide an alternative therapeutic strategy against SARS-CoV-2. Assumptions and inferences from SARS-CoV or other coronaviruses should be validated for SARS-CoV-2, in particular the organization of the ribonucleoprotein complex and the nsp3–nucleocapsid interaction, which is inferred from mouse hepatitis virus. A fold search of the PDB shows that both ordered domains have folds unique to the coronavirus family that are not shared with other kinds of proteins.
More structure solutions are urgently needed, especiallywith inhibitor trials to unveil the precise structuralmechanisms of the interaction with single stranded RNA of RNA recognition. As a comparison, there are over 200 structures of spike protein domains from SARS-CoV-2 but less than 40 for nucleocapsid domains. SARS-CoV-2 will likely stay a common pathogen and hence, it is necessary to find long-term solutions against COVID-19. More knowledge about the biology of the virus is urgently needed, and the nucleocapsid has still a lot of secrets which we need to unravel to get there.

This blog post was published in Crystallography Reviews.
Please cite: https://doi.org/10.1080/0889311X.2022.2072835

Acknowledgements

The authors would also like to thank Johannes Kaub and Rosemary Wilson for support and discussion. All figures are courtesy of the Coronavirus Structural Task Force (insidecorona.net). Figures 1, 2 and 3 were produced using the Protein Imager [43].

References

[1] Matchett WE, Joag V, Stolley JM, et al. Nucleocapsid vaccine elicits spike-independent SARS-CoV-2 protective immunity. J Immunol [Internet]. 2021 [cited 2022 Jan 13]; Available from: https://www.jimmunol.org/content/early/2021/06/30/jimmunol.2100421
[2] Pyrc K, Berkhout B, van der Hoek L. The novel human coronaviruses NL63 and HKU1. J Virol. 2007;81:3051–3057.
[3] SurjitM, Liu B, Kumar P, et al. The nucleocapsid protein of the SARS coronavirus is capable of self-association through a C-terminal 209 amino acid interaction domain. Biochem Biophys Res Commun. 2004;317:1030–1036.
[4] Zlotnick A. Theoretical aspects of virus capsid assembly. J Mol Recognit. 2005;18:479–490.
[5] Forsythe HM, Rodriguez Galvan J, Yu Z, et al. Multivalent binding of the partially disordered SARS-CoV-2 nucleocapsid phosphoprotein dimer to RNA. Biophys J. 2021;120:2890–2901.
[6] McBride R, van Zyl M, Fielding BC. The coronavirus nucleocapsid is a multifunctional protein. Viruses. 2014;6:2991–3018.
[7] Sturman LS, Holmes KV, Behnke J. Isolation of coronavirus envelope glycoproteins and interaction with the viral nucleocapsid. J Virol [Internet]. 1980 [cited 2022 Jan 17]; Available from: https://journals.asm.org/doi/abs/10.1128/jvi.33.1.449-462.1980
[8] He R, Leeson A, Ballantine M, et al. Characterization of protein-protein interactions between the nucleocapsid protein and membrane protein of the SARS coronavirus. Virus Res. 2004;105:121–125.
[9] Vennema H, Godeke GJ, Rossen JW, et al. Nucleocapsid-independent assembly of coronavirus-like particles by co-expression of viral envelope protein genes. EMBO J. 1996;15:2020–2028.
[10] Siu YL, Teoh KT, Lo J, et al. The M, E, and N structural proteins of the severe acute respiratory syndrome coronavirus are required for efficient assembly, trafficking, and release of virus-like particles. J Virol [Internet]. 2008 [cited 2022 Jan 17]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.01052-08
[11] Mu J, Fang Y, Yang Q, et al. SARS-CoV-2 N protein antagonizes type I interferon signaling by suppressing phosphorylation and nuclear translocation of STAT1 and STAT2. Cell Discov. 2020;6:1–4.
[12] Zúñiga S, Cruz JLG, Sola I, et al. Coronavirus nucleocapsid protein facilitates template switching and is required for efficient transcription. J Virol [Internet]. 2010 [cited 2022 Jan 17]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.02011-09
[13] Hurst KR, Ye R, Goebel SJ, et al. An interaction between the nucleocapsid protein and a component of the replicase-transcriptase complex is crucial for the infectivity of coronavirus genomic RNA. J Virol. 2010;84:10276–10288.
[14] Stertz S, Reichelt M, Spiegel M, et al. The intracellular sites of early replication and budding of SARS-coronavirus. Virology. 2007;361:304–315.
[15] Hurst KR, Koetzner CA, Masters PS. Characterization of a critical interaction between the coronavirus nucleocapsid protein and nonstructural protein 3 of the viral replicase-transcriptase complex. J Virol [Internet]. 2013 [cited 2022 Jan 18]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.01275-13
[16] Züst R, Miller TB, Goebel SJ, et al. Genetic interactions between an essential 3′ cis-acting RNA pseudoknot, replicase gene products, and the extreme 3′ end of the mouse coronavirus genome. J Virol [Internet]. 2008 [cited 2022 Jan 18]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.01690-07
[17] Surjit M, Kumar R, Mishra RN, et al. The severe acute respiratory syndrome coronavirus nucleocapsid protein is phosphorylated and localizes in the cytoplasm by 14-3-3-mediated translocation. J Virol [Internet]. 2005 [cited 2022 Jan 18]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.79.17.11476-11486.2005
[18] Surjit M, Liu B, Chow VTK, et al. The nucleocapsid protein of severe acute respiratory syndrome-coronavirus inhibits the activity of cyclin-cyclin-dependent kinase complex and blocks S phase progression in mammalian cells. J Biol Chem. 2006;281:10669–10681.
[19] Zúñiga S, Sola I, Moreno JL, et al. Coronavirus nucleocapsid protein is an RNA chaperone. Virology. 2007;357:215–227.
[20] Zhou B, Liu J, Wang Q, et al. The nucleocapsid protein of severe acute respiratory syndrome Coronavirus inhibits cell cytokinesis and proliferation by interacting with translation elongation factor 1α. J Virol. 2008;82:6962–6971.
[21] SARS-CoV-2 nucleocapsid protein phase-separates with RNA and with human hnRNPs. The EMBO Journal. 2020;39:e106478.
[22] Cubuk J, Alston JJ, Incicco JJ, et al. The SARS-CoV-2 nucleocapsid protein is dynamic, disordered, and phase separates with RNA. bioRxiv. 2020;2020.06.17.158121.
[23] Takeda M, Chang C, Ikeya T, et al. Solution structure of the C-terminal dimerization domain of SARS coronavirus nucleocapsid protein solved by the SAIL-NMR method. J Mol Biol. 2008;380:608–622.
[24] Jayaram H, Fan H, Bowman BR, et al. X-ray structures of the N- and C-terminal domains of a coronavirus nucleocapsid protein: implications for nucleocapsid formation. J Virol. 2006;80:6612–6620.
[25] Khan A, Tahir Khan M, Saleem S, et al. Structural insights into the mechanism of RNA recognition by the N-terminal RNA-binding domain of the SARS-CoV-2 nucleocapsid phosphoprotein. Comput Struct Biotechnol J. 2020;18:2174–2184.
[26] Kang S, Yang M, Hong Z, et al. Crystal structure of SARS-CoV-2 nucleocapsid protein RNA binding domain reveals potential unique drug targeting sites. Acta Pharm Sin B. 2020;10:1228–1238.
[27] Structures of the SARS-CoV-2 nucleocapsid and their perspectives for drug design. The EMBO Journal. 2020;39:e105938.
[28] Dinesh DC, Chalupska D, Silhan J, et al. Structural basis of RNA recognition by the SARS-CoV-2 nucleocapsid phosphoprotein. PLoS Pathog. 2020;16:e1009100.
[29] Ye Q, West AMV, Silletti S, et al. Architecture and self-assembly of the SARS-CoV-2 nucleocapsid protein. Protein Sci. 2020;29:1890–1901.
[30] Lu S, Ye Q, Singh D, et al. The SARS-CoV-2 Nucleocapsid phosphoprotein forms mutually exclusive condensates with RNA and the membrane-associated M protein. bioRxiv. 2020;2020.07.30.228023.
[31] Zhou R, Zeng R, von Brunn A, et al. Structural characterization of the C-terminal domain of SARS-CoV-2 nucleocapsid protein. Mol Biomed. 2020;1:2.
[32] Hyman AA, Weber CA, Jülicher F. Liquid-liquid phase separation in biology. Annu Rev Cell Dev Biol. 2014;30:39–58.
[33] Wang B, Zhang L, Dai T, et al. Liquid-liquid phase separation in human health and diseases. Signal Transduct Target Ther. 2021;6:290.
[34] Tayeb-Fligelman E, Cheng X, Tai C, et al. Inhibition of amyloid formation of the Nucleoprotein of SARS-CoV-2 [Internet]. 2021 [cited 2022 Jan 18]. p. 2021.03.05.434000. Available from: https://www.biorxiv.org/content/10.1101/2021.03.05.434000v2.
[35] Chang C, Hou M-H, Chang C-F, et al. The SARS coronavirus nucleocapsid protein – forms and functions. Antiviral Res. 2014;103:39–50.
[36] Chen C-Y, Chang C-K, Chang Y-W, et al. Structure of the SARS coronavirus nucleocapsid protein RNA-binding dimerization domain suggests a mechanism for helical packaging of viral RNA. J Mol Biol. 2007;368:1075–1086.
[37] Yao H, Song Y, Chen Y, et al. Molecular Architecture of the SARS-CoV-2 virus. Cell. 2020;183:730–738.e13.
[38] Ren P-X, Shang W-J, Yin W-C, et al. A multi-targeting drug design strategy for identifying potent anti-SARS-CoV-2 inhibitors. Acta Pharmacol Sin. 2022;43:483–493.
[39] Lin S-M, Lin S-C, Hsu J-N, et al. Structure-based stabilization of non-native protein–protein interactions of coronavirus nucleocapsid proteins in antiviral drug design. J Med Chem. 2020;63:3131–3141.
[40] Lei J, Kusov Y, Hilgenfeld R. Nsp3 of coronaviruses: structures and functions of a large multidomain protein. Antiviral Res. 2018;149:58–74.
[41] Khan MT, Zeb MT, Ahsan H, et al. SARS-CoV-2 nucleocapsid and Nsp3 binding: an in silico study. Arch Microbiol. 2021;203:59–66.
[42] Zeng W, Liu G, Ma H, et al. Biochemical characterization of SARS-CoV-2 nucleocapsid protein. Biochem Biophys Res Commun. 2020;527:618–623.
[43] Tomasello G, Armenia I, Molla G. The Protein Imager: a full-featured online molecular viewer interface with server-side HQ-rendering capabilities. Bioinformatics. 2020;36:2909–2911. https://doi.org/10.1093/bioinformatics/btaa009

David C. Briggs, Luise Kandler, Lisa Schmidt, Gianluca Santoni & Andrea Thorn

This blog post was published in Crystallography Reviews.
Please cite: https://doi.org/10.1080/0889311X.2023.2173744

Abstract

The coronavirus SARS-CoV-2 is the causative agent for the COVID-19 pandemic. Its proteome is typically separated into three classes of proteins: (1) Structural proteins which facilitate the transport and host cell infiltration of the viral RNA, (2) non-structural proteins which are thought to be essential for the viral life cycle and are all produced from open reading frame 1ab (ORF1ab) on the RNA, and (3) everything else, called accessory proteins. Although it was originally thought that these accessory proteins are non-essential for viral replication, a growing body of evidence suggests that these diverse proteins have crucial roles in virus-host interactions, in particular in the way they interfere with the signalling pathways that modulate the host cell’s response to infection and viral pathogenicity. Here, we summarize efforts to structurally characterize the accessory proteins from SARS-CoV-2.

Introduction

Synthesis of accessory proteins

During host infection, SARS-CoV-2 viral particles bind to the ACE2 receptor on the surface of host cells via their Spike protein. Following entry of the virion into the cell (either via membrane fusion or endocytosis), the virus sheds its coat and ORF1ab of the positive-sense RNA viral genome is translated into protein by host ribosomes. The proteases encoded within ORF1ab cleave the resulting polyprotein into the non-structural proteins. A subset of these proteins, nsps 7, 8, 9, 10, 12, 13, 14 and 16, come together to form the replication and transcription complex (RTC). The RTC is responsible for initiating RNA replication and for subgenomic RNA (sgRNA) synthesis. It is these subgenomic RNAs that serve as the template for accessory protein synthesis.
Subgenomic RNAs are synthesized starting from the 3′ end of the genomic RNA. Synthesis continues until a transcriptional regulatory sequence (TRS) is encountered, at which point the replication and transcription complex may pause and jump to another, complementary TRS: the nascent negative-sense transcript is transferred, through base-pairing interactions, to the complementary leader TRS, and transcription continues until the RTC reaches the 5′ end of the genome.
Skipping the sequences between TRSs in this way produces a range of product sizes, all sharing the same leader sequence. Once transcription is complete, the negative-sense RNA serves as a template for positive-strand synthesis, again performed by the replication and transcription complex. In turn, the various positive-sense subgenomic RNAs serve as viral mRNAs, resulting in expression of proteins not encoded by the large polyprotein [1]. Additionally, full-length, positive-sense genomic RNA is packaged into newly produced viral particles [2].
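The outcome of the template switching described above can be illustrated with a toy string model in Python. This is a minimal sketch, not a model of the real enzymology: it assumes the core TRS ACGAAC (the motif commonly reported for SARS-CoV-2), treats the first TRS as the leader TRS, assumes every downstream (body) TRS is used, and uses a made-up genome in which lowercase placeholder segments stand in for real sequence. The function name `subgenomic_rnas` is hypothetical.

```python
# Toy model of leader-to-body fusion in discontinuous transcription.
# Assumptions (illustrative only): the genome is a positive-sense string,
# strand polarity is ignored, and every body TRS triggers a template switch.
CORE_TRS = "ACGAAC"  # core TRS motif assumed for SARS-CoV-2


def subgenomic_rnas(genome: str, trs: str = CORE_TRS) -> list[str]:
    """Return leader + body fusions for every body TRS, longest first."""
    trs_leader = genome.find(trs)  # first TRS is treated as the leader TRS
    if trs_leader == -1:
        return []
    leader = genome[:trs_leader]  # 5' leader upstream of the leader TRS
    products = []
    body = genome.find(trs, trs_leader + 1)  # next (body) TRS
    while body != -1:
        # Leader and body share the TRS, so it appears once in each fusion.
        products.append(leader + genome[body:])
        body = genome.find(trs, body + 1)
    return products


# Hypothetical toy genome: lowercase placeholders stand in for real sequence.
toy_genome = ("ggtttatacc" + CORE_TRS + "orf1ab" + CORE_TRS + "spike"
              + CORE_TRS + "orf3a" + CORE_TRS + "nucleo")

for sg_rna in subgenomic_rnas(toy_genome):
    print(len(sg_rna), sg_rna)
```

For this toy genome the function returns three nested products (44, 33 and 22 characters), each beginning with the same leader and running to the 3′ end, mirroring the range of product sizes that share one leader sequence. Real template switching is selective and happens during negative-strand synthesis, which this sketch does not attempt to model.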

Table 1. Summary of SARS-CoV-2 accessory proteins with function and available structures.

Accessory protein | Function | Length (aa) | Known host binding partners | PDB entries | References
ORF3a | Ion channel, autophagy inhibitor | 275 | – | 6XDC, 7KJR | [4]
ORF6 | Interferon antagonist | 61 | mRNA export factor RAE-1, Nup98 | 7VPH | [5]
ORF7a | Inhibits host defenses that prevent virion release | 121 | Tetherin | 6W37, 7CI3 | No publication (6W37); [6] (7CI3)
ORF7b | Unknown – Golgi resident | 43 | N/A | – | –
ORF8 | Downregulates MHC I expression & interferon response | 121 | IL17RA receptor, Spike protein | 7JTL, 7JX6, 7F5F | [7] (7JTL); no publication (7JX6); [8] (7F5F)
ORF9b | Disrupts interferon response | 97 | TOM70 | 6Z4U*, 7DHG, 7KDT | No publication (6Z4U); [9, 10] (7DHG, 7KDT)

Location in the SARS-CoV-2 genome

The accessory proteins are six small to medium-sized proteins between 43 and 275 amino acid residues in length (Table 1). Unlike the non-structural proteins, which are encoded within the large polyprotein ORF1ab (expressed as one long polyprotein chain and proteolytically cleaved to yield individual proteins), the accessory proteins are each encoded by their own open reading frames (ORFs). In SARS-CoV-2, these ORFs are found towards the 3′ end of the genome, between the genes for the structural proteins or, in the case of ORF9b, within an alternative reading frame of the nucleocapsid gene (Figure 1(A)). ORF3a (275 amino acids long) is found eight nucleotides after the end of the gene encoding the Spike protein and before the gene for the envelope protein. ORF6 (61 amino acids), ORF7a (121 amino acids), ORF7b (43 amino acids) and ORF8 (121 amino acids) are located between the genes for the membrane protein and nucleocapsid. The gene for accessory protein ORF9b (97 amino acids) is found within an alternative open reading frame ten bases after the start codon of the nucleocapsid gene.
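Whether a downstream start codon lies in the same or an alternative reading frame is simple modular arithmetic: an offset of 10 nucleotides gives 10 mod 3 = 1, i.e. the +1 frame. The sketch below checks this on a short stretch that we believe corresponds to the first 36 nucleotides of the N gene in the Wuhan-Hu-1 reference genome (GenBank NC_045512.2); it is reproduced from memory and should be verified before reuse. The codon table is deliberately partial and covers only the codons in this stretch, and the function names are hypothetical.

```python
# Partial standard codon table: only the codons that occur in the stretch below.
PARTIAL_CODON_TABLE = {
    "ATG": "M", "TCT": "S", "GAT": "D", "AAT": "N", "GGA": "G", "CCC": "P",
    "CAA": "Q", "CAG": "Q", "CGA": "R", "GCA": "A", "GAC": "D", "AAA": "K",
    "ATC": "I", "AGC": "S", "GAA": "E",
}

# Assumed first 36 nt of the N gene (verify against NC_045512.2 before reuse).
N_GENE_START = "ATGTCTGATAATGGACCCCAAAATCAGCGAAATGCA"


def relative_frame(offset_nt: int) -> int:
    """Frame (0, 1 or 2) of a start codon offset_nt bases downstream of another."""
    return offset_nt % 3


def translate(seq: str, start: int) -> str:
    """Translate complete codons from `start`; codons missing from the table become 'X'."""
    return "".join(PARTIAL_CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(start, len(seq) - 2, 3))


orf9b_start = N_GENE_START.find("ATG", 1)  # next ATG after the N start codon
print(orf9b_start, relative_frame(orf9b_start))
print("N frame:    ", translate(N_GENE_START, 0))
print("ORF9b frame:", translate(N_GENE_START, orf9b_start))
```

With this input the next ATG lies 10 nt downstream of the N start codon (frame +1), and the two frames read MSDNGPQNQRNA and MDPKISEM, which should correspond to the N-terminal residues of nucleocapsid and ORF9b respectively: the same nucleotides encode two unrelated proteins.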
This review covers the SARS-CoV-2 accessory proteins described in UniProt [3] as having 'evidence at protein level'. Other accessory proteins have been proposed to be produced during SARS-CoV-2 infection; however, evidence for their expression is uncertain at the time of writing. This ambiguity exists because, while the corresponding open reading frames are detected in the SARS-CoV-2 genome, evidence of protein expression during infection of host cells (either by western blot or mass spectrometry) is lacking.

Structural biology of SARS-CoV-2 accessory proteins 15

Figure 1. (A) The genetic organization of the SARS-CoV-2 genome, with accessory proteins highlighted in green; the scale is in kilobases and positions are approximate. (B) Schematic showing the subcellular localization and function of the SARS-CoV-2 accessory proteins during host cell infection. Creator: Coronavirus Structural Task Force - Lisa Schmidt, License: cc-by-sa

Physiological roles of SARS-CoV-2 accessory proteins

Cell-based assays reveal that, whilst the accessory proteins are not essential for viral replication, SARS-CoV-2 mutants lacking ORF6, ORF7a and ORF8 have altered replication kinetics, that is, slower production of progeny virions, or less efficient virion release or reinfection following production in the host cell [11]. The same study shows that in humanized transgenic mice, infection with ORF3a-deficient and, perhaps to a lesser extent, ORF6-deficient viruses results in reduced lethality compared with wild-type SARS-CoV-2 infection.

ORF3a

SARS-CoV-2 ORF3a is a dimeric viroporin, an integral membrane protein that functions as a non-selective, Ca²⁺-permeable ion channel [4]. Despite its classification as an accessory protein, it does contribute to efficient replication and viral release. Autophagosomes are vesicles in which unwanted material, such as damaged organelles and invading pathogens, is isolated prior to degradation. By binding to VPS39, a protein involved in vesicle sorting, SARS-CoV-2 ORF3a blocks the fusion of autophagosomes with lysosomes, thereby inhibiting autophagosome maturation and hindering autophagy as a cellular defence mechanism.
This binding mechanism blocks the assembly of the STX17–SNAP29–VAMP8 SNARE complex – a protein complex required for the fusion of autophagosomes and lysosomes, a crucial process by which the hydrolytic enzymes present in the lysosomes are released into the autophagosomes to help break down unwanted materials including invading pathogens which have been isolated for degradation [12].
In addition to this, it has been shown that SARS-CoV-1 ORF3a activates the NLRP3 inflammasome. NLRP3 (NACHT, LRR and PYD domains-containing protein 3) is an intracellular sensor that becomes activated on detection of pathogens or other danger signals.
This activation triggers release of the pro-inflammatory cytokines interleukin-1β (IL-1β) and IL-18 [13]. SARS-CoV-1 ORF3a has also been shown to promote expression of IL-1β via an interaction with TRAF3 (tumour necrosis factor receptor-associated factor 3). This interaction with TRAF3 promotes the ubiquitination of ASC (apoptosis-associated speck-like protein containing a caspase recruitment domain) and p105 (nuclear factor NF-κB p105 subunit). Ubiquitination of ASC results in the activation of caspase 1, a protease required for IL-1β and IL-18 maturation [14]. Upon ubiquitination, p105 yields the p50 protein, which promotes the NF-κB pro-inflammatory response that drives pathologic inflammation. SARS-CoV-2 ORF3a retains the SARS-CoV-1 TRAF-binding motif (SARS-CoV-1: 36-PLQAS-40; SARS-CoV-2: 36-PIQAS-40) and is therefore assumed to retain this function, as identified and characterized in [15].
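The motif comparison above can be made explicit with a simple ungapped identity calculation. This is an illustrative sketch with a hypothetical function name; it assumes the two motifs are already aligned residue for residue with the same numbering, which holds here because both start at residue 36.

```python
def compare_motifs(motif_a: str, motif_b: str, first_residue: int):
    """Percent identity and substitutions between two ungapped, equal-length motifs."""
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must be aligned without gaps")
    # Record each mismatch in standard notation, e.g. 'L37I'.
    substitutions = [
        f"{a}{first_residue + i}{b}"
        for i, (a, b) in enumerate(zip(motif_a, motif_b))
        if a != b
    ]
    identity = 100.0 * (len(motif_a) - len(substitutions)) / len(motif_a)
    return identity, substitutions


identity, subs = compare_motifs("PLQAS", "PIQAS", first_residue=36)
print(f"{identity:.0f}% identical; substitutions: {subs}")
```

The only change is leucine to isoleucine at position 37 (80% identity over five residues), a conservative swap between two hydrophobic side chains, which is consistent with the assumption that the motif's function is retained.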

ORF6

ORF6 is a multifunctional disruptor of interferon (IFN) signalling and antiviral immunity. It has been shown to accomplish this by interrupting nucleocytoplasmic transport through direct interaction with the Ribonucleic acid export 1 (RAE1) – Nuclear pore complex protein 98 (NUP98) protein complex [16,17]. This interaction downregulates IFN-β production by preventing Interferon regulatory factor 3 (IRF3) from moving into the nucleus.
ORF6 also interacts with Karyopherin α 2, blocking the import of IRF3 into the nucleus [18]. Once in the nucleus, IRF3 binds to the interferon-stimulated response element and activates transcription of IFN-1 and -2 and associated IFN-stimulated genes as part of the host response to infection [19]. The inhibition of NUP98 also prevents interferon signalling by blocking Signal Transducer and Activator of Transcription 1 (STAT1) from entering the nucleus [20].
ORF6 also inactivates the Major Histocompatibility Complex (MHC) class I pathway by inhibiting NLR Family CARD Domain Containing protein 5 (NLRC5), a key activator in the host antiviral response [21]. Recent studies have shown that ORF6-mediated cytotoxicity is inhibited by the selective nuclear export inhibitor Selinexor (but not Ivermectin) [22], and this could form the basis for future drug discovery projects to ameliorate SARS-CoV-2 symptoms.

ORF7a

SARS-CoV-2 ORF7a is a single-pass type I membrane protein thought to be retained on the surface of the endoplasmic reticulum (ER) after expression into the ER lumen, where it antagonizes the Tetherin protein (also called Bone marrow stromal antigen 2 (BST2)) [23]. Tetherin is a host defense protein that tethers nascent virions to the host membrane, preventing their dispersal. As such, inhibition of Tetherin function aids virion release and subsequent infection of host cells. ORF7a can also become polyubiquitinated with Lys63-linked ubiquitin on Lys119, located on the cytoplasmic side of the ER membrane. This modification leads to antagonism of IFN-1 signalling by blocking phosphorylation of Signal Transducer and Activator of Transcription 2 (STAT2) [24].

ORF7b

Very little is known about ORF7b, but in SARS-CoV-1 (whose ORF7b shares ∼81% sequence identity with SARS-CoV-2 ORF7b) it is found inside the Golgi compartment of the host cell and has also been shown to be present in viral particles [25]. Expression of SARS-CoV-2 ORF7b in HEK-293 or Vero E6 cells induces expression of a range of cytokines, including IFN-β, Tumour necrosis factor alpha (TNF-α) and IL-6, and activates type I IFN signalling via phosphorylation of IRF3. Ultimately, ORF7b activates TNF-α-induced apoptosis [26]. Proteomic analysis suggests that ORF7b might interact with the innate immunity regulators Mitochondrial antiviral-signalling protein (MAVS) and Unc-93 homolog B1 (UNC93B1) [27].

ORF8

ORF8 is a multifunctional protein encoded in a hypervariable region of the SARS-CoV-2 genome [28] and is not conserved in SARS-CoV-1. ORF8 has a signal peptide and is secreted into the extracellular space [29], where it interacts with and activates host Interleukin 17 receptor alpha (IL17RA) by mimicking the natural ligand, Interleukin-17A (IL-17A) [30]. This interaction ultimately leads to the activation of NF-κB and expression of pro-inflammatory cytokines, contributing to the SARS-CoV-2 cytokine storm [31]. It has also been demonstrated that ORF8 interacts directly with the S1 region of the SARS-CoV-2 Spike protein and downregulates Spike expression and S1/S2 cleavage, perhaps explaining the observation that variants lacking ORF8 have increased transmissibility [32]. In addition, ORF8 downregulates MHC-I levels on the cell surface by binding directly to MHC-I and promoting its degradation [33]. As a result, SARS-CoV-2-infected cells are less susceptible to cytotoxic T-lymphocyte-mediated lysis. ORF8 can also function inside the cell as a histone mimic, as it contains an 'ARKS' motif, located in a disordered loop in the crystal structure. This ARKS motif allows ORF8 to interact with chromatin and various histone-modifying enzymes, altering the transcription of genes associated with the response to viral infection [34]. It should be noted that most of the ORF8 protein is missing in variant B.1.1.7 (Alpha variant) owing to a Q27stop mutation [35], but the full-length protein is found in subsequent variants. There is also a polymorphism that results in an L84S mutation; the S84 variant is associated with milder disease and less severe clinical outcomes [36].

ORF9b

ORF9b is a multifunctional and structurally dimorphic protein encoded by an alternative ORF within the SARS-CoV-2 nucleocapsid gene. In SARS-CoV-1, it is expressed through leaky ribosome scanning of the nucleocapsid mRNA [37]. SARS-CoV-2 ORF9b directly interacts with the mitochondrial import receptor subunit TOM70. This binding inhibits activation of an antiviral signalling cascade by locking TOM70 in a state with impaired binding to the Glu-Glu-Val-Asp (EEVD) motif of Heat shock protein 90 (Hsp90) [10]. The interaction between TOM70 and Hsp90 is important for importing proteins from the cytosol into the mitochondria, which in turn is essential for mitochondrial function [38]. Curiously, however, the full-length ORF9b protein is incapable of binding to TOM70 [9]. The mature ORF9b protein exists as a dimer with a hydrophobic central cavity assumed to be involved in lipid binding (see below), which is consistent with a role in virion maturation (PDB: 6Z4U).

Structural overview of SARS-CoV-2 accessory proteins

Structures have so far only been solved for SARS-CoV-2 ORF3a (PDB entries 7KJR, 6XDC), ORF6 (PDB entry 7VPH), ORF7a (PDB entries 7CI3, 6W37), ORF8 (PDB entries 7JTL, 7JX6, 7F5F) and ORF9b (PDB entries 7DHG, 6Z4U, 7KDT). No structures exist at the time of writing for ORF7b.

ORF3a

The cryo-EM structure of SARS-CoV-2 ORF3a has been determined with a single ORF3a homodimer contained within a lipid nanodisc [4]. The PDB contains two structures, 6XDC at 2.9Å resolution and 7KJR at 2.08Å resolution, both determined by the same research group. Because 7KJR is the higher-resolution model, the discussion of the ORF3a structure below focuses on it. The maps are of excellent quality and show two ordered molecules of the lipid phosphatidylethanolamine, as well as visible interactions with the MSP1E3D1 scaffold protein (a nanodisc component).
As can be seen in Figure 2(A), ORF3a consists of an N-terminus (amino acids 1–39), a C-terminus (amino acids 239–275) and a short cytoplasmic loop (amino acids 175–180). The transmembrane domain of each of the symmetrically arranged protomers consists of three helices; together, the six transmembrane helices form an elliptical ion channel in the membrane. The pore is oriented with its N-terminus on the luminal side and its C-terminus on the cytosolic side. The ORF3a viroporin possesses a novel fold, with low structural homology to any other protein structure in the PDB. The third transmembrane helix of each protomer is followed by a helix-turn-helix motif connecting the transmembrane domain with the C-terminal cytosolic domain. An eight-stranded β-sandwich forms the hydrophobic core of this cytosolic domain and links the two monomers through close van der Waals interactions between V168, V225, F230 and I232.
There are several polar cavities within the ion channel, which are thought to reveal the path of the transiting ions; structural studies of different conformational states may be required to ascertain whether this is correct [4]. Kern and colleagues were also able to isolate and determine the structure of a tetrameric form of ORF3a, albeit at lower resolution (6.5 Å). This tetramer is composed of two dimers side by side; however, it is currently not known whether this assembly plays a physiological role (see EMDB entry EMD-22138).


Figure 2. (A) The cryo-EM structure of the viroporin ORF3a (PDB: 7KJR) in a lipid nanodisc (ordered nanodisc components in grey), (B) the crystal structure of the tail of ORF6 (green), bound to the RAE1–Nup98 (blue/cyan) nucleoporin pair (PDB: 7VPH), (C) the crystal structure of the luminal domain of ORF7a (PDB: 7CI3), (D) the crystal structure of the disulphide-linked dimer of ORF8 (PDB: 7JX6, 7JTL, 7F5F), (E) the crystal structure of the full-length ORF9b dimer with bound PEG/lipid molecule (green) (PDB: 6Z4U). Creator: Coronavirus Structural Task Force - Lisa Schmidt, License: cc-by-sa

ORF6

ORF6 is a small protein of only 61 amino acid residues. Secondary structure prediction with Jpred [39] and structure prediction with the ColabFold implementation of AlphaFold2 [40,41] suggest that the protein is composed almost entirely of a single α-helix with a small unstructured carboxy-terminus (ColabFold models available at 10.5281/zenodo.7323979). The only experimental structural information for ORF6 comes from a structure of the ternary complex formed by a synthetic peptide corresponding to residues 41 to 61 of ORF6 with the mRNA export factors Nup98 and RAE-1. In this structure, only the ORF6 C-terminal sequence DEEQPMEID (residues 53–61) is ordered; it binds to the outer face of blades 5 and 6 of the seven-bladed β-propeller of RAE-1 (Figure 2(B)). The surface patch of the RAE1–Nup98 complex occupied by ORF6 is positively charged and is thought to be the mRNA-binding site that RAE1–Nup98 uses to export host mRNA from the nucleus to the cytosol; by blocking this site, ORF6 inhibits host protein synthesis [5].
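The electrostatic argument can be illustrated by counting charged side chains in the ordered ORF6 tail. This is a minimal sketch under a deliberately crude assumption: at neutral pH, Asp and Glu count as −1 and Lys and Arg as +1, while His, the peptide termini and any local pKa shifts are ignored. The function name is hypothetical.

```python
# Crude side-chain charge model at neutral pH (assumption): Asp/Glu -1,
# Lys/Arg +1; His, chain termini and pKa shifts are ignored.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def net_side_chain_charge(peptide: str) -> int:
    """Sum the approximate side-chain charges of a one-letter peptide sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in peptide)


ordered_tail = "DEEQPMEID"  # ORF6 residues 53-61, as given in the text
print(net_side_chain_charge(ordered_tail))
```

For DEEQPMEID the count is −5, a strongly acidic stretch, which fits its binding to a positively charged surface patch on RAE1–Nup98.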

ORF7a

The N-terminal domain of ORF7a consists of seven β-strands arranged in an immunoglobulin (IG)-like β-sandwich fold with greatest structural similarity to the Intercellular adhesion molecule (ICAM) IG fold (Figure 2(C)). Sequence analysis of SARS-CoV-1 predicts that ORF7a encodes a type I transmembrane protein of 122 amino acid residues, including a 15-residue signal peptide at the N-terminus, a single transmembrane helix of ∼21 residues and an endoplasmic reticulum retention signal at the C-terminus. This topology suggests that the N-terminal IG-like domain resides in the lumen of the endoplasmic reticulum, and subcellular localization studies indeed confirm this [22].
The structure of the IG domain of ORF7a has been determined twice, both times by X-ray crystallography. Structure 6W37 covers residues 16–82, was determined to 2.9Å resolution and as yet has no accompanying publication. Structure 7CI3 comprises residues 14–96 and was determined to 2.2Å resolution [6].
Within the folded domain, the IG fold has seven β-strands arranged in two β-sheets: four β-strands (A, G, F, C) in the first sheet and three (B, E, D) in the second. Both sheets are amphipathic, with their hydrophobic sides facing inwards and closely packed against each other. The top of the ectodomain is defined by the BC, DE and FG loops and the bottom by the AB, CD and EF loops. The β-sandwich is stabilized by two disulphide bonds linking the sheets at opposite edges. At the bottom of the structure, a disulphide bridge connects Cys23 on strand A with Cys58 at the end of strand E. At the top, Cys35 of the BC loop is linked to Cys67 at the end of strand F (Figure 2(C)). Additionally, on top of the BED sheet, the DE loop protrudes from the structure and forms a groove together with β-strands C and D. At the centre of this groove lies Glu33, which contributes to the negatively charged bottom of the otherwise mainly hydrophobic groove. Because of this central negative electrostatic potential, the groove may be a site for ligand or cation interaction.

ORF7b

ORF7b is a short protein of 43 amino acid residues, and as yet no structures exist of any portion of it. Despite the lack of a predicted signal sequence [42], the protein is thought to be incorporated into the membrane of the Golgi apparatus and into viral particles, as has been demonstrated for SARS-CoV-1 [25]. Like ORF6, ORF7b is predicted to consist almost entirely of a single α-helix, with a disordered carboxy-terminus (ColabFold models available at 10.5281/zenodo.7323984).

ORF8

ORF8 is secreted from the infected cell into the extracellular space. It has an N-terminal signal sequence, followed by a single domain of roughly 106 amino acids. There are three X-ray crystal structures of SARS-CoV-2 ORF8 in the PDB: 7JTL, which was determined to a resolution of 2.04Å [7], 7JX6, which was determined to 1.6Å (no associated publication), and 7F5F, also determined to 1.6Å [8] (Figure 2(D)). It is worth noting that all proteins were expressed in E. coli and hence lack the N-linked glycosylation on Asn78 [29]. There exists some intrinsic flexibility within the (physiologically relevant) dimer, as can be seen from the subtle changes in orientation in the right-hand monomer in Figure 3(A), when the three dimers are optimally superimposed on the left-hand monomer.
In all three structures, ORF8 forms a disulphide-linked homodimer through Cys20 (although in PDB 7F5F, the dimer spans two asymmetric units), and each protomer contains three additional intramolecular disulphide bonds. The fold of each protomer resembles an immunoglobulin fold, with β-strands B, E (not a β-strand in all structures) & F forming one sheet and strands C, D, G & H forming another sheet. Strand A also forms a β-sheet with strand H, and this two-stranded sheet makes up much of the hydrophobic homodimerization interface. A long and partially disordered loop exists between strands D & E.
There is a polymorphism in ORF8 where the amino acid at position 84 is switched from a hydrophobic leucine to a hydrophilic serine amino acid residue; here the site of the L84S polymorphism is on a face of the ORF8 structure that also contains the N78 Nlinked glycosylation site. A recent preprint suggests that ORF8 can be secreted through an unconventional YIF1B-mediated pathway, where it bypasses the host cell glycosylation machinery in the endoplasmic reticulum [43]. Lin et al demonstrate that this unglycosylated ORF8 is the form that binds to, and activates, IL17RA, rather than the glycosylated from. Taken alongside the L84S polymorphism, this finding allows us to speculate that the IL17RA binding site is on this face of ORF8 that carries both N78 and L84 (Figure 3(B)).
Clearly, further research is necessary to characterize this interaction structurally. If it proves amenable to disruption by small molecules or monoclonal antibodies targeting either ORF8 or IL17RA [31], therapeutics might be developed to prevent the debilitating ORF8-driven cytokine storm seen in some COVID-19 patients.

Structural biology of SARS-CoV-2 accessory proteins 17

Figure 3. (A) Comparison of all three structures of SARS-CoV-2 ORF8, superimposed on the right-hand monomer. (B) Comparison of the L84 (PDB: 7JX6, green) and S84 (PDB: 7F5F, blue) variants of ORF8, showing the putative IL17RA binding surface that harbours both the polymorphism at position 84 and the N-linked glycosylation site at position 78, whose glycosylation is known to inhibit IL17RA binding. Creator: Coronavirus Structural Task Force - Lisa Schmidt, License: cc-by-sa

ORF9b

There are three structures of ORF9b in the PDB. The most complete is PDB entry 6Z4U, which contains residues 1 through 97 and has no accompanying publication. The data extend to 1.95 Å resolution and were phased by molecular replacement using the SARS-CoV-1 ORF9b structure, 2CME [44]. The maps and the model-to-map fit are generally of good quality. The SARS-CoV-2 (6Z4U) and SARS-CoV-1 (2CME) structures share the same fold, with an RMSD of 0.91 Å over 64 matched atom pairs. ORF9b exists as a highly interconnected dimer in solution, with an all-β topology that resembles a pair of β-barrels with extensive strand swapping and extended loops (Figure 2(E)). A hydrophobic cavity lies between the two protomers.
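The RMSD values quoted for ORF9b and ORF8 refer to the root-mean-square deviation of matched atoms after the rigid-body rotation and translation that minimises it. The Python sketch below shows one standard way to compute this, the Kabsch algorithm, assuming NumPy is available and that the two coordinate arrays are already matched pair by pair (for example Cα atoms of aligned residues). The function name `kabsch_rmsd` and the toy coordinates are our own; this is not necessarily the program used to obtain the values in the text.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched coordinate sets P and Q (N x 3) after optimal
    rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays of matched atoms")

    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Rotate P (stored as row vectors) and measure the residual deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Hypothetical toy "structure": five atoms and a rotated, translated copy.
P = np.array([[0, 0, 0], [1.5, 0, 0], [0, 1.5, 0], [0, 0, 1.5], [1, 1, 1]], dtype=float)
theta = np.radians(40)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(P, Q))  # a value close to zero: a pure rigid-body motion
```

Only the shape mismatch (not the overall position or orientation) contributes to the result, which is why RMSDs from different entries can be compared after superposition.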
In the SARS-CoV-1 structure, this cavity has an extended fatty acyl chain modelled into it, consistent with a role as a lipid-binding protein [44]. In the SARS-CoV-2 structure, the cavity has been modelled as holding a polyethylene glycol (PEG) chain from the crystallization solution. Examination of the electron density maps suggests that either interpretation could be correct, as both proteins were crystallized from PEG-containing buffers, although Meier et al. did conduct mass spectrometry experiments confirming the presence of a long-chain fatty acid in the SARS-CoV-1 structure. The absence of an accompanying publication for the SARS-CoV-2 structure prevents unambiguous identification of the bound hydrophobic ligand in PDB entry 6Z4U.
The remaining two structures of ORF9b show a C-terminal fragment of ORF9b in complex with the mitochondrial import receptor TOM70 (Figure 4(B)). PDB entries 7KDT [10] (3.05 Å cryo-EM structure) and 7DHG [9] (2.2 Å X-ray crystallographic structure) are similar, with an RMSD of 1.1 Å over 416 Cα pairs. The structures contain residues 39–78 (7KDT) and 43–78 (7DHG) of ORF9b, which adopt a long α-helix (residues 51–70) whose N-terminal end interacts with a cleft in the C-terminal tetratricopeptide repeat (TPR) domain of TOM70. PISA [45] analysis shows an interaction area of ∼2000 Å² between TOM70 and the ORF9b C-terminal peptide. The central portion of the bound helix (residues 58–66) makes predominantly electrostatic interactions with TOM70, whereas the interactions of the N-terminal end (residues 45–54) are more hydrophobic in nature, with several aliphatic side chains docking into non-polar pockets of TOM70 (Figure 4(C)).
Gao et al. report the affinity of the C-peptide for TOM70 as KD = 0.96 μM, ∼2.6 times tighter than that of the Hsp90 EEVD peptide. Interestingly, although the two are predicted to bind at separate sites, pre-incubating TOM70 with the ORF9b C-peptide reduced the affinity of TOM70 for EEVD almost 30-fold (KD = 72.99 μM).
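The "almost 30-fold" figure follows from the reported numbers once the EEVD dissociation constant without ORF9b is recovered from the "∼2.6 times tighter" comparison. The Python sketch below works through that arithmetic and, purely for illustration, converts each KD into the fraction of TOM70 occupied by EEVD at a hypothetical 10 μM peptide concentration, using the simple single-site relation θ = [L]/([L] + KD), which assumes the ligand is in large excess over the receptor. The back-calculated apo KD is an approximation, not a reported measurement.

```python
# Dissociation constants in micromolar. KD_EEVD_APO is back-calculated from the
# "~2.6 times tighter" statement, so it is an approximation (about 2.5 uM).
KD_C_PEPTIDE = 0.96
KD_EEVD_APO = KD_C_PEPTIDE * 2.6
KD_EEVD_PREINCUBATED = 72.99  # after pre-incubating TOM70 with the ORF9b C-peptide


def fraction_bound(ligand_uM: float, kd_uM: float) -> float:
    """Fractional occupancy of a single binding site (ligand in large excess)."""
    return ligand_uM / (ligand_uM + kd_uM)


fold_loss = KD_EEVD_PREINCUBATED / KD_EEVD_APO
print(f"fold reduction in EEVD affinity: {fold_loss:.1f}")

# Illustrative, hypothetical EEVD concentration of 10 uM.
for label, kd in [("TOM70 alone", KD_EEVD_APO),
                  ("TOM70 + ORF9b C-peptide", KD_EEVD_PREINCUBATED)]:
    print(f"{label}: {fraction_bound(10.0, kd):.2f} of TOM70 occupied by EEVD")
```

At this illustrative concentration, occupancy falls from roughly four fifths to roughly one eighth, which is the practical meaning of a ∼30-fold KD increase.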
Taken together, these data suggest that ORF9b is an allosteric inhibitor of the interaction between TOM70 and Hsp90, perhaps locking TOM70 in a conformation that is less able to bind Hsp90; this in turn inhibits downstream interferon activation.
Curiously, this TOM70-bound structure is completely incompatible with the lipid-bound, all-β, full-length form determined in PDB entry 6Z4U (Figure 4(A,B)), and indeed Gao et al. report that full-length ORF9b does not interact with TOM70: only a truncated version, or an α-helical peptide corresponding to the region seen in 7KDT and 7DHG, binds to TOM70. It is not yet known what causes this switch between the two forms of ORF9b. It could occur at the transcriptional or translational level, or as the result of a proteolytic event; alternatively, the all-β ORF9b structure observed in 6Z4U might unfold in the absence of a lipid, allowing the unfolded protein to adopt the α-helical conformation observed in the TOM70-bound forms. Further research is needed to determine the mechanism(s) controlling this switch. Despite the unanswered questions regarding ORF9b, it is conceivable that a small molecule could be developed to prevent the interaction of TOM70 and ORF9b, which might alleviate symptoms by preventing SARS-CoV-2 from circumventing host antiviral responses.

Summary

So far, efforts to structurally characterize the accessory proteins of SARS-CoV-2 have been largely successful; only ORF6 and ORF7b lack experimentally determined structures for most of their sequence. Given that much of these two proteins is thought to be unstructured or to comprise just a single α-helix, it is not clear how useful isolated structures of these proteins would be for investigating their biological roles. Structures in complex with host proteins (where known), however, would provide molecular details of interactions that may be targets for future drug design efforts. Despite their characterization as ‘accessory’, many of these proteins do have roles that promote effective and efficient infection. For example, ORF3a has been shown to promote efficient release of new viral particles, and ORF8 may regulate Spike protein presentation and maturation. Whilst they may not be high-priority targets for drug screening and design, it is clear that inhibiting the function of accessory proteins (e.g. blocking ORF8–IL17RA interactions) might serve to alleviate the symptoms of SARS-CoV-2 infection in vulnerable patients.
Despite their structural diversity, the SARS-CoV-2 accessory proteins primarily serve to block host-cell responses to viral infection. The interferon signalling pathway, a critical element in the activation of antiviral defense, is inhibited by ORF6 (preventing transport of IRF3 into the nucleus to activate transcription of interferon-response genes) and ORF7a (through blocking STAT2 activation). Pro-inflammatory cytokine production and maturation (which can lead or contribute to the cytokine storm) are upregulated by ORF3a, ORF6, ORF7b and ORF8. Counterintuitively, whilst inflammation is an immune response to pathogens, overstimulation of the immune system can benefit pathogens, as seen with superantigens: proteins that subvert the host's immune system by overactivating it, causing indiscriminate and off-target damage to host tissues [46]. ORF3a also inhibits host defenses by blocking the degradation of invading viruses by phagocytosis.
In terms of future work on the accessory proteins, the lack of structural data on ORF6 should be addressed once a binding partner is found. In addition, whilst we have multiple structures of ORF8, structures of this protein in complex with either interleukin-17 receptor A (IL17RA) or the SARS-CoV-2 spike protein would be most instructive, and either of these interactions, particularly the one with IL17RA, may prove to be a target for future therapeutic design efforts.

This blog post was published in Crystallography Reviews, please cite: https://doi.org/10.1080/0889311X.2023.2173744

Acknowledgements

The authors would also like to thank Johannes Kaub and Rosemary Wilson for support and discussion. Figures in this review use the Protein Imager interface [47].

Funding

This work was supported by the German Federal Ministry of Education and Research [grant numbers 05K19WWA and 05K22GU5] and Deutsche Forschungsgemeinschaft [project TH2135/2-1]. D.C.B. acknowledges that this work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK [grant number CC2068], the UK Medical Research Council [grant number CC2068] and the Wellcome Trust [grant number CC2068].

References

[1] Perlman S, Netland J. Coronaviruses post-SARS: update on replication and pathogenesis. Nat Rev Microbiol. 2009;7:439–450. Epub 2009/05/12.
[2] Hartenian E, Nandakumar D, Lari A, et al. The molecular virology of coronaviruses. J Biol Chem. 2020;295:12910–12934. Epub 2020/07/15.
[3] UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49:D480–D489. Epub 2020/11/26.
[4] Kern DM, Sorum B, Mali SS, et al. Cryo-EM structure of the SARS-CoV-2 3a ion channel in lipid nanodiscs. bioRxiv. 2021. Epub 2020/06/27.
[5] Li T, Wen Y, Guo H, et al. Molecular mechanism of SARS-CoVs Orf6 targeting the Rae1-Nup98 complex to compete with mRNA nuclear export. Front Mol Biosci. 2021;8:813248. Epub 2022/02/01.
[6] Zhou Z, Huang C, Zhou Z, et al. Structural insight reveals SARS-CoV-2 ORF7a as an immunomodulating factor for human CD14(+) monocytes. iScience. 2021;24:102187. Epub 2021/02/23.
[7] Flower TG, Buffalo CZ, Hooy RM, et al. Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein. Proc Natl Acad Sci U S A. 2021;118(2):e2021785118. Epub 2020/12/29.
[8] Chen X, Zhou Z, Huang C, et al. Crystal structures of bat and human coronavirus ORF8 protein Ig-like domain provide insights into the diversity of immune responses. Front Immunol. 2021;12:807134. Epub 2022/01/04.
[9] Gao X, Zhu K, Qin B, et al. Crystal structure of SARS-CoV-2 Orf9b in complex with human TOM70 suggests unusual virus-host interactions. Nat Commun. 2021;12:2843. Epub 2021/05/16.
[10] Gordon DE, Hiatt J, Bouhaddou M, et al. Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms. Science. 2020;370:eabe9403. Epub 2020/10/17.
[11] Silvas JA, Vasquez DM, Park JG, et al. Contribution of SARS-CoV-2 accessory proteins to viral pathogenicity in K18 human ACE2 transgenic mice. J Virol. 2021;95:e0040221. Epub 2021/06/17.
[12] Miao G, Zhao H, Li Y, et al. ORF3a of the COVID-19 virus SARS-CoV-2 blocks HOPS complex-mediated assembly of the SNARE complex required for autolysosome formation. Dev Cell. 2021;56:427–442. Epub 2021/01/11.
[13] Swanson KV, Deng M, Ting JP. The NLRP3 inflammasome: molecular activation and regulation to therapeutics. Nat Rev Immunol. 2019;19:477–489. Epub 2019/05/01.
[14] Thornberry NA, Bull HG, Calaycay JR, et al. A novel heterodimeric cysteine protease is required for interleukin-1 beta processing in monocytes. Nature. 1992;356:768–774. Epub 1992/04/30.
[15] Siu KL, Yuen KS, Castano-Rodriguez C, et al. Severe acute respiratory syndrome coronavirus ORF3a protein activates the NLRP3 inflammasome by promoting TRAF3-dependent ubiquitination of ASC. FASEB J. 2019;33:8865–8877. Epub 2019/04/30.
[16] Kato K, Ikliptikawati DK, Kobayashi A, et al. Overexpression of SARS-CoV-2 protein ORF6 dislocates RAE1 and NUP98 from the nuclear pore complex. Biochem Biophys Res Commun. 2021;536:59–66. Epub 2020/12/29.
[17] Addetia A, Lieberman NAP, Phung Q, et al. SARS-CoV-2 ORF6 disrupts bidirectional nucleocytoplasmic transport through interactions with Rae1 and Nup98. mBio. 2021;12(2):e00065- Epub 2021/04/15.
[18] Xia H, Cao Z, Xie X, et al. Evasion of type I interferon by SARS-CoV-2. Cell Rep. 2020;33:108234. Epub 2020/09/28.
[19] Honda K, Takaoka A, Taniguchi T. Type I interferon [corrected] gene induction by the interferon regulatory factor family of transcription factors. Immunity. 2006;25:349–360. Epub 2006/09/19.
[20] Miorin L, Kehrer T, Sanchez-Aparicio MT, et al. SARS-CoV-2 Orf6 hijacks Nup98 to block STAT nuclear import and antagonize interferon signaling. Proc Natl Acad Sci U S A. 2020;117:28344–28354. Epub 2020/10/25.
[21] Yoo JS, Sasaki M, Cho SX, et al. SARS-CoV-2 inhibits induction of the MHC class I pathway by targeting the STAT1-IRF1-NLRC5 axis. Nat Commun. 2021;12:6602. Epub 2021/11/17.
[22] Lee JG, Huang W, Lee H, et al. Characterization of SARS-CoV-2 proteins reveals Orf6 pathogenicity, subcellular localization, host interactions and attenuation by Selinexor. Cell Biosci. 2021;11:58. Epub 2021/03/27.
[23] Martin-Sancho L, Lewinski MK, Pache L, et al. Functional landscape of SARS-CoV-2 cellular restriction. Mol Cell. 2021;81:2656–2668. Epub 2021/05/01.
[24] Cao Z, Xia H, Rajsbaum R, et al. Ubiquitination of SARS-CoV-2 ORF7a promotes antagonism of interferon response. Cell Mol Immunol. 2021;18:746–748. Epub 2021/01/22.
[25] Schaecher SR, Mackenzie JM, Pekosz A. The ORF7b protein of severe acute respiratory syndrome coronavirus (SARS-CoV) is expressed in virus-infected cells and incorporated into SARS-CoV particles. J Virol. 2007;81:718–731. Epub 2006/11/03.
[26] Yang R, Zhao Q, Rao J, et al. SARS-CoV-2 accessory protein ORF7b mediates tumor necrosis factor-alpha-induced apoptosis in cells. Front Microbiol. 2021;12:654709. Epub 2021/09/07.
[27] Stukalov A, Girault V, Grass V, et al. Multilevel proteomics reveals host perturbations by SARS-CoV-2 and SARS-CoV. Nature. 2021;594:246–252. Epub 2021/04/13.
[28] Chen S, Zheng X, Zhu J, et al. Extended ORF8 gene region is valuable in the epidemiological investigation of severe acute respiratory syndrome-similar coronavirus. J Infect Dis. 2020;222:223–233. Epub 2020/05/21.
[29] Matsuoka K, Imahashi N, Ohno M, et al. SARS-CoV-2 accessory protein ORF8 is secreted extracellularly as a glycoprotein homodimer. J Biol Chem. 2022;298(3):101724. Epub 2022/02/15.
[30] Wu X, Xia T, Shin WJ, et al. Viral mimicry of interleukin-17A by SARS-CoV-2 ORF8. mBio. 2022;13(2):e0040222. Epub 2022/03/29.
[31] Lin X, Fu B, Yin S, et al. ORF8 contributes to cytokine storm during SARS-CoV-2 infection by activating IL-17 pathway. iScience. 2021;24:102293. Epub 2021/03/17.
[32] Chou JM, Tsai JL, Hung JN, et al. The ORF8 protein of SARS-CoV-2 modulates the spike protein and its implications in viral transmission. Front Microbiol. 2022;13:883597. Epub 2022/06/07.
[33] Zhang Y, Chen Y, Li Y, et al. The ORF8 protein of SARS-CoV-2 mediates immune evasion through down-regulating MHC-I. Proc Natl Acad Sci U S A. 2021;118(23):e2024202118. Epub 2021/05/23.
[34] Kee J, Thudium S, Renner DM, et al. SARS-CoV-2 disrupts host epigenetic regulation via histone mimicry. Nature. 2022;610:381–388. Epub 2022/10/06.
[35] Leung K, Shum MH, Leung GM, et al. Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom, October to November 2020. Euro Surveill. 2021;26(1). Epub 2021/01/09.
[36] Nagy A, Pongor S, Gyorffy B. Different mutations in SARS-CoV-2 associate with severe and mild outcome. Int J Antimicrob Agents. 2021;57:106272. Epub 2020/12/22.
[37] Xu K, Zheng BJ, Zeng R, et al. Severe acute respiratory syndrome coronavirus accessory protein 9b is a virion-associated protein. Virology. 2009;388:279–285. Epub 2009/04/28.
[38] Neupert W, Herrmann JM. Translocation of proteins into mitochondria. Annu Rev Biochem. 2007;76:723–749. Epub 2007/02/01.
[39] Drozdetskiy A, Cole C, Procter J, et al. JPred4: a protein secondary structure prediction server. Nucleic Acids Res. 2015;43:W389–W394. Epub 2015/04/18.
[40] Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. Epub 2021/07/16.
[41] Mirdita M, Schütze K, Moriwaki Y, et al. ColabFold – making protein folding accessible to all. bioRxiv. 2022:2021.08.15.456425.
[42] Teufel F, Almagro Armenteros JJ, Johansen AR, et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol. 2022;40(7):1023–1025. Epub 2022/01/05.
[43] Lin X, Fu B, Xiong Y, et al. Unconventional secretion of unglycosylated ORF8 is critical for the cytokine storm during SARS-CoV-2 infection. bioRxiv. 2021.
[44] Meier C, Aricescu AR, Assenberg R, et al. The crystal structure of ORF-9b, a lipid binding protein from the SARS coronavirus. Structure. 2006;14:1157–1165. Epub 2006/07/18.
[45] Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372:774–797. Epub 2007/08/08.
[46] Johnson HM, Torres BA, Soos JM. Superantigens: structure and relevance to human disease. Proc Soc Exp Biol Med. 1996;212:99–109. Epub 1996/06/01.
[47] Tomasello G, Armenia I, Molla G. The protein imager: a full-featured online molecular viewer interface with server-side HQ-rendering capabilities. Bioinformatics. 2020;36:2909–2911. Epub 2020/01/14.

Sam Horrell, Gianluca Santoni & Andrea Thorn

This blog post was published in Crystallography Reviews.
Please cite: https://doi.org/10.1080/0889311X.2022.2065270

Abstract

The SARS-CoV-2 endoribonuclease nsp15 (NendoU) is a Mn²⁺-dependent, uridylate-specific endoribonuclease that SARS-CoV-2 uses to evade the innate immune response by managing the stray RNA generated during replication. At the time of writing, 20 structures of SARS-CoV-2 nsp15 have been deposited in the PDB, most solved by X-ray crystallography and some by cryo-EM. These structures show that an nsp15 monomer consists of three conserved domains: the N-terminal oligomerisation domain, the middle domain and the catalytic NendoU domain. Enzymatically active nsp15 forms a hexamer, a dimer of trimers (point group 32), whose assembly is facilitated by the oligomerisation domain. This review summarises the structural and functional information gained from SARS-CoV-2, SARS-CoV and MERS-CoV nsp15 structures, compiles current structure-based drug design efforts and complementary knowledge, and aims to provide a clear starting point for downstream structure users interested in studying nsp15 as a novel drug target to treat COVID-19.

Introduction

SARS-CoV-2 is a nidovirus with a non-segmented, positive-sense RNA genome, meaning the RNA genome is read from 5′ to 3′ and can be directly translated into viral proteins; it is effectively messenger RNA. The RNA genome of SARS-CoV-2 is one of the largest among RNA viruses [1], comprising a replicase gene that encodes non-structural proteins (nsps), as well as genes for structural proteins and accessory proteins. Through a ribosomal frameshift [2], the replicase gene (ORF1a and ORF1b) gives rise to two different polyprotein chains, pp1a and pp1ab. Once translated, these polyproteins are cleaved by one of the two encoded proteases (the 3C-like protease, nsp5, or the papain-like protease, nsp3) to yield 15 or 16 non-structural proteins, which assemble into a large membrane-bound replication–transcription complex (RTC).
One of these non-structural proteins is nsp15, a 346-amino-acid nidoviral, uridylate-specific and Mn²⁺-dependent [3] endoribonuclease (NendoU). Its gene lies towards the end of the non-structural protein region of the SARS-CoV-2 genome, in ORF1b (encoding residues 6453–6798 of pp1ab [4]). Nsp15 preferentially cleaves RNA on the 3′ side of uridine, producing a 2′,3′-cyclic phosphodiester and a 5′-hydroxyl terminus [5] (Figure 1). Nsp15 is conserved across coronavirus family members [6], to the point where it has been proposed as a universal genetic marker distinguishing nidoviruses [7] from all other RNA virus families. Although highly conserved (SARS-CoV-2 nsp15 shares 88% sequence identity with SARS-CoV, 50% with MERS-CoV and 43% with HCoV-229E), nsp15 has been found to be non-essential for viral replication in mouse hepatitis virus (MHV) [6], SARS-CoV and HCoV-229E. Some nsp15 mutations completely abolished RNA synthesis; however, these mutations resulted in misfolded and insoluble nsp15 when expressed in E. coli [6]. As a result, the loss of RNA synthesis is thought to be a knock-on effect on neighbouring polyprotein components that are critical for replication, rather than a genuine consequence of lacking nsp15 activity [6]. Further evidence of nsp15's non-essential role in viral replication comes from insect nidoviruses and invertebrate roniviruses, which completely lack EndoU activity [8,9].
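To make the product chemistry concrete, the toy Python function below (a hypothetical helper, `nsp15_toy_cleavage`) cuts an RNA string on the 3′ side of uridines and labels the new ends: each upstream fragment ends in a 2′,3′-cyclic phosphate (written `>p`) and each downstream fragment begins with a 5′-hydroxyl (`HO-`). It treats every U as equally cleavable, which is a deliberate simplification of the preferential cleavage described above, and it ignores RNA structure entirely.

```python
def nsp15_toy_cleavage(rna: str) -> list[str]:
    """Split an RNA sequence (written 5'->3') immediately 3' of every internal U.

    Simplified model: every U is a cleavage site. Upstream fragments end in a U
    carrying a 2',3'-cyclic phosphate ('>p'); downstream fragments start with a
    free 5'-hydroxyl ('HO-').
    """
    rna = rna.upper()
    if set(rna) - set("ACGU"):
        raise ValueError("expected an RNA sequence containing only A, C, G and U")

    fragments, start = [], 0
    # A 3'-terminal U has no downstream nucleotide to release, so skip the last base.
    for i, base in enumerate(rna[:-1]):
        if base == "U":
            fragments.append(rna[start:i + 1])
            start = i + 1
    fragments.append(rna[start:])

    # Annotate the chemistry at each newly created end.
    annotated = []
    for k, frag in enumerate(fragments):
        head = "HO-" if k > 0 else ""
        tail = ">p" if k < len(fragments) - 1 else ""
        annotated.append(f"{head}{frag}{tail}")
    return annotated


print(nsp15_toy_cleavage("GGAUCAUUGC"))
```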
Although not essential for viral replication, recent studies suggest that nsp15 plays a role in repressing activation of the host innate immune response [11–13]. During viral replication, positive-sense RNA is translated to produce the viral replication complex, which copies the positive-sense RNA to produce negative-sense RNA. The negative-sense RNA then acts as a template to produce new positive-sense genomic RNA and subgenomic RNA. Subgenomic RNAs are shorter transcribed sections of RNA, produced by initiating transcription in the middle of the template strand (internal initiation), by falling off the template strand before reaching its 5′ end (premature termination), or by jumping off the template strand and reinitiating transcription further along the template (discontinuous transcription). This process produces short and long double-stranded RNA intermediates with polyuridine tracts at the 5′ end, which can be recognized by pattern recognition receptors in the host cell such as RIG-I-like receptors (RLRs), protein kinase R (PKR), oligoadenylate synthetases (OASes) and melanoma differentiation-associated gene 5 (MDA5). These sensors promote an innate antiviral immune response [11,14,15] by activating the type I and III interferon (IFN) response, which induces expression of interferon-stimulated genes through the signal transducer and activator of transcription 1 and 2 (STAT1/2) signalling pathways. By cleaving the 5′-polyuridine tracts in negative-sense viral RNA, nsp15, along with nsp16 and nsp10, limits the accumulation of MDA5-dependent pathogen-associated molecular patterns and thereby delays the host's immune response [16]. Loss of nsp15 activity has been shown to activate the interferon response and reduce viral titers in piglets infected with nsp15-deficient porcine epidemic diarrhea coronavirus (PEDV) [17] and in mice infected with nsp15-deficient MHV [11].
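The 5′-polyuridine tracts targeted by nsp15 sit at the 5′ end of negative-sense RNA. One way to see where such a tract comes from, assuming (as is commonly described, though not stated explicitly above) that it is copied from the 3′ poly(A) tail of the positive-sense genome, is simple base pairing: copying a 3′ poly(A) run yields a 5′ poly(U) run on the complementary strand. The Python sketch below illustrates this with a made-up sequence; the helper names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def negative_strand(positive_5to3: str) -> str:
    """Return the complementary (negative-sense) strand, also written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(positive_5to3.upper()))


def leading_polyU_length(rna_5to3: str) -> int:
    """Length of the uninterrupted run of U at the 5' end of a sequence."""
    n = 0
    for base in rna_5to3:
        if base != "U":
            break
        n += 1
    return n


# Hypothetical 3' end of a positive-sense genome carrying a short poly(A) tail.
genome_3prime_end = "GACUGCAG" + "A" * 12
neg = negative_strand(genome_3prime_end)
print(neg, leading_polyU_length(neg))
```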
It has also been demonstrated that nsp15 plays a role in disrupting formation of autophagosomes, which are double-membraned vesicles containing cellular material to be degraded.

Structural biology of SARS-CoV-2 endoribonuclease NendoU (nsp15) 18

Figure 1. RNA cleavage as performed by nsp15, giving a 2′,3′-cyclic phosphodiester and a 5′-hydroxyl terminus from an RNA nucleotide phosphodiester from PDB entry 1RNA. Figure created using Protein Imager [10]. Creator: Coronavirus Structural Task Force - Sam Horrell & Lisa Schmidt, License: cc-by-sa

Structural overview

SARS-CoV-2 nsp15 consists of an N-terminal oligomerisation domain (Figure 2, blue), a middle domain (Figure 2, purple) and the catalytic C-terminal NendoU domain (Figure 2, teal). The oligomerisation domain is formed from an antiparallel β-sheet (β1–3) that wraps around helices α1 and α2. The middle domain consists of three β-hairpins (β5–6, β7–8 and β12–13), a mixed β-sheet (β4, β9, β10, β11 and β15), two α-helices (α3 and α4) and a right-handed 3₁₀ helix (η4). The catalytic NendoU domain contains two antiparallel β-sheets (β16–18 and β19–21), which form a concave surface flanked by five α-helices (α6, α7, α8, α9 and α10). SARS-CoV-2 nsp15 shows high sequence identity with SARS-CoV nsp15 (88%) and lower sequence identity with MERS-CoV nsp15 (51%); however, the overall structural similarity between the three viruses is very high [1]. Three structures have been solved for SARS-CoV nsp15 (PDB entries 2H85 [18], 2OZK [19] and 2RHB [20]), one for MERS-CoV nsp15 (PDB entry 5YVD [21]), two for mouse hepatitis virus nsp15 (2GTH and 2GTI [3]) and one for human coronavirus 229E nsp15 (PDB entry 4S1T).
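Sequence identity figures such as the 88% and 51% quoted above depend on an alignment and on a convention for the denominator. The Python sketch below computes one common variant (identical positions divided by gap-free aligned columns) from two pre-aligned strings. The function name and the toy sequences are hypothetical, and the values in the text were presumably obtained with dedicated alignment software that may use a different convention.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Convention used here (one of several in common use): identical positions
    divided by the number of aligned columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not real nsp15 sequences.
print(f"{percent_identity('MESLV-PGFNEK', 'MDSLVAPGYNEK'):.1f}% identity")
```

Counting gapped columns in the denominator instead would lower the value, which is one reason published identities for the same pair of proteins can differ by a point or two.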


Figure 2. Crystal structure of the nsp15 monomer represented as a transparent surface and cartoon (left) and as a cartoon (right), coloured by domain, using PDB entry 6X4I. The figure was created using Protein Imager [10]. Creator: Coronavirus Structural Task Force - Sam Horrell & Lisa Schmidt, License: cc-by-sa

At the time of writing, 20 structures of SARS-CoV-2 nsp15 with a variety of bound ligands have been solved using X-ray crystallography and cryo-EM (Table 1) [1,22,23].
The biological assembly of nsp15 is a double-ringed hexamer made up of a dimer of trimers (point group 32, Figure 3). The trimeric form retains some ribonuclease activity, but the monomer shows only residual cleavage [24]. The hexamer is stabilised by the N-terminal oligomerisation domain present in each monomer. A crystal structure of SARS-CoV nsp15 with a 28-amino-acid N-terminal truncation (PDB entry 2OZK) showed a misfolded EndoU active site, suggesting that oligomerisation may act as an allosteric activation switch [19]. The six monomers come together to form the active enzyme, which contains a 100 Å long, 10–15 Å wide, negatively charged channel that is open to solvent at the top, at the bottom and through three side openings in the middle of the hexamer. Formation of the hexamer is essential for full enzymatic activity, making the oligomerisation interfaces a potential target for structure-based drug design.
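The phrase "dimer of trimers (point group 32)" can be made concrete by generating the group's rotation operators. The NumPy sketch below (helper names are our own) closes a threefold rotation along z and a perpendicular twofold along x under matrix multiplication, which yields six operators. Applying them to one hypothetical off-axis monomer position gives three copies in each of two rings related by the twofold, the arrangement described for the nsp15 hexamer. The axis directions and coordinates are arbitrary choices for illustration, not taken from any nsp15 entry.

```python
import numpy as np


def rotation(axis, angle_deg):
    """Rotation matrix about an axis through the origin (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    a = np.radians(angle_deg)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(a) * K + (1 - np.cos(a)) * (K @ K)


def generate_group(generators, tol=1e-8):
    """Close a set of rotation matrices under multiplication."""
    group = [np.eye(3)]
    frontier = [np.eye(3)]
    while frontier:
        new = []
        for g in frontier:
            for h in generators:
                m = g @ h
                if not any(np.allclose(m, x, atol=tol) for x in group):
                    group.append(m)
                    new.append(m)
        frontier = new
    return group


# Point group 32 (D3): threefold axis along z, perpendicular twofold along x.
ops = generate_group([rotation([0, 0, 1], 120), rotation([1, 0, 0], 180)])
print(len(ops), "symmetry operators")

# Apply every operator to one hypothetical monomer centroid (in Å), off both axes.
monomer = np.array([20.0, 0.0, 15.0])
copies = np.array([R @ monomer for R in ops])
print(np.round(copies, 1))  # three copies at z = +15 and three at z = -15
```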
The active site of nsp15 is an electropositive pocket that lies at the interface between the monomers' NendoU domains. The active site is highly conserved between SARS-CoV-2, SARS-CoV and MERS-CoV proteins. Six key residues (His235, His250, Lys290, Thr341, Tyr343 and Ser294) are arranged in a shallow groove in the C-terminal NendoU domain [1]. His235, His250 and Lys290 are proposed to act as a catalytic triad, using a mechanism similar to that observed in RNase A [23]. However, RNase A is metal-independent, whereas SARS-CoV-2 nsp15 is Mn²⁺-dependent, so the mechanism is not an exact match.
Mutation of either histidine in the catalytic triad to alanine eliminates RNA cleavage activity in nsp15 but has no effect on the formation of stable hexamers, showing that these residues do not contribute to nsp15 oligomerisation [22].

Table 1. Data quality indicators for all deposited SARS-CoV-2 nsp15 structures

PDB | Resolution (Å) | Rfree | Rwork | Clashscore | Ramachandran outliers (%) | Sidechain outliers (%) | RSRZ outliers (%) | Method | Comment on the highest Fourier difference peaks (Coot)
6WLC | 1.82 | 0.195 | 0.170 | 2.0 | 0.0 | 2.1 | 0.7 | X-ray | 6WLC, 6X1B, 6X4I, 6WXC and 6W01 all present the same problematic difference peak around the N-terminal region.
6X1B | 1.97 | 0.185 | 0.157 | 2.0 | 0.0 | 1.6 | 0.9 | X-ray | As for 6WLC.
6X4I | 1.85 | 0.189 | 0.166 | 2.0 | 0.0 | 1.7 | 1.7 | X-ray | As for 6WLC.
6WXC | 1.85 | 0.194 | 0.171 | 3.0 | 0.3 | 2.4 | 0.4 | X-ray | As for 6WLC.
6W01 | 1.90 | 0.185 | 0.161 | 2.0 | 0.3 | 1.3 | 1.7 | X-ray | As for 6WLC.
7K0R | 3.30 | N/A | N/A | 4.0 | 0.0 | 1.8 | N/A | Cryo-EM |
6XDH | 2.35 | 0.182 | 0.157 | 2.0 | 0.0 | 1.3 | 2.4 | X-ray | 68 peaks above 5σ. The structure shows clear signs of specific radiation damage.
7K1L | 2.25 | 0.192 | 0.167 | 2.0 | 0.0 | 1.0 | 0.3 | X-ray | 34 peaks above 5σ. At least 2 wrongly modelled sulphates.
6VWW | 2.20 | 0.178 | 0.158 | 2.0 | 0.0 | 1.9 | 0.1 | X-ray | 14 peaks above 5σ. Only missing a possible alternate conformer for N-terminal Met.
7KEG | 2.90 | 0.218 | 0.175 | 2.0 | 0.0 | 7.1 | 1.9 | X-ray | 32 peaks above 5σ. No particular issue apart from a few missing water molecules.
7KEH | 2.59 | 0.220 | 0.187 | 3.0 | 0.0 | 5.7 | 3.6 | X-ray | 24 peaks above 5σ. Only 2 wrong sidechain orientations.
7KF4 | 2.61 | 0.246 | 0.219 | 3.0 | 0.1 | 4.5 | 1.1 | X-ray | 19 peaks above 5σ. Overall good modelling.
7K1O | 2.40 | 0.242 | 0.207 | 2.0 | 0.0 | 1.3 | 2.2 | X-ray | 14 peaks above 5σ. Only missing a possible alternate conformer for N-terminal Met.
7K9P | 2.60 | 0.209 | 0.191 | 1.0 | 0.3 | 3.9 | 0.1 | RT serial X-ray | 12 peaks, mostly around Glu carboxyl groups, showing possible signs of specific radiation damage.
5S72 | 2.51 | 0.277 | 0.211 | 5.0 | 0.4 | 3.6 | 0.6 | X-ray | Structures 5S72, 5S71, 5S70, 5S6Y, 5S6X and 5S6Z were deposited from a PanDDA analysis. No problematic peaks observed.
5S71 | 1.94 | 0.215 | 0.184 | 3.0 | 0.1 | 1.8 | 1.3 | X-ray | As for 5S72.
5S70 | 2.33 | 0.228 | 0.182 | 3.0 | 0.0 | 1.9 | 1.0 | X-ray | As for 5S72.
5S6Y | 2.32 | 0.254 | 0.209 | 3.0 | 0.0 | 1.6 | 0.7 | X-ray | As for 5S72.
5S6X | 2.32 | 0.222 | 0.184 | 3.0 | 0.0 | 2.1 | 0.7 | X-ray | As for 5S72.
5S6Z | 2.28 | 0.222 | 0.191 | 2.0 | 0.0 | 2.6 | 1.3 | X-ray | As for 5S72.

This is unsurprising, as the N-terminal oligomerisation domain is the key player in hexamer formation; hexamer formation in turn clearly plays an allosteric role in forming the active site, as activity is significantly reduced in the monomer.
Uracil specificity is proposed to be governed by Ser294 [20], with the main-chain nitrogen of Ser294 predicted to interact with the O2 carbonyl oxygen of uracil and the hydroxyl group of Ser294 binding to uracil N3. However, mutation studies on homologues have shown that a Ser294Ala mutation significantly decreases activity without completely abolishing it [18], while removing uridine specificity. Tyr343 is also likely to be important in governing uracil specificity, as shown by van der Waals stacking between the ribose sugar of uridine and Tyr343 in the homologue structures [20,21]. Mutation of the residues equivalent to Tyr343 in SARS-CoV and MERS-CoV to alanine caused a near-complete loss of nuclease activity [20,21], suggesting a key role in enzymatic activity.
The structure of SARS-CoV-2 nsp15 has been solved in the presence of various catalytic intermediates, including 5′UMP (PDB entry 6WLC), 3′UMP (PDB entry 6X4I), 5′GpU (PDB entry 6X1B) and uridine 2′,3′-vanadate (PDB entry 7K1L). All intermediates bind to the C-terminal catalytic domain, interacting with conserved active-site residues (His235, His250, Lys290, Trp333, Thr341, Tyr343, Ser294, Gly248, Lys345 and Val292), and the structures show no significant conformational deviations from each other (Cα RMSD = 0.29 Å). The uracil moieties of 5′UMP, guanylyl(3′–5′)uridine (GpU) and uridine 2′,3′-vanadate are all bound by Ser294 and Leu346 (Figure 4, top left, bottom left and bottom right, respectively), reinforcing the idea that uracil recognition is mediated by these residues. Together, these structures confirm the predicted parallels between the reaction mechanisms of SARS-CoV-2 nsp15 and RNase A. The 5′UMP-, 5′GpU- and uridine 2′,3′-vanadate-bound structures support the previously proposed hypothesis about uracil and purine base discrimination, with Ser294 playing a key role [23]. In contrast, the 3′UMP-bound structure shows the uracil base forming a stacking interaction with Trp333 (Figure 4, top right), the guanine binding site identified in the 5′GpU complex, suggesting that the nsp15 active site can accommodate both purine and pyrimidine bases. However, the Trp333-interacting base is likely less relevant when binding larger RNA molecules, as Trp333 provides a potential stacking interaction for bases without selectivity [23]. Comparison of these ligand-bound structures with RNase A catalytic sites suggests that nsp15 acts through a similar reaction mechanism [23].


Figure 3. The structure of the nsp15 hexamer generated by crystallographic symmetry using PDB entry 6X4I. On the left-hand side, the nsp15 hexamer is represented as a transparent surface and cartoon from a side-on view. On the right-hand side, the hexamer has been rotated 90 degrees towards the reader to give a top-down view looking down the 10–15 Å wide channel. The hexamer is coloured by trimer, with trimer 1 in blue (one monomer in light blue) and trimer 2 in teal. The figure was created using the Protein Imager [10]. Creator: Coronavirus Structural Task Force - Sam Horrell & Lisa Schmidt, License: cc-by-sa

Based on these findings, a two-step mechanism has been proposed. It starts with a transphosphorylation reaction in which His250 acts as a general base and deprotonates the 2′-OH of the RNA ribose, with Lys290 stabilising the negative charge that builds up in the transition state. His235 then acts as a general acid, donating a proton to the departing 5′-oxygen of the leaving group. This is followed by a hydrolysis step in which the roles of His250 and His235 are reversed: His235 deprotonates a water molecule and His250 acts as a proton donor to the 2′-oxygen leaving group, converting the 2′,3′-cyclic phosphate to a 2′-OH and a 3′-phosphoryl group. Despite the similar mechanisms, the structural environments of His235 in nsp15 and its RNase A equivalent (His119) differ significantly, with the residues being ∼8 Å apart and making several different hydrogen-bonding interactions. These differences may explain why nsp15 is much more sensitive to pH change than RNase A [22]. What remains unclear is the contribution of Mn²⁺ to the reaction mechanism, particularly as no Mn²⁺ binding site has been located in SARS-CoV-2 nsp15 [22].
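Schematically, the two proposed steps can be written as below, where R and R′ denote the upstream and downstream RNA chains and U>p denotes a uridine 2′,3′-cyclic phosphate (our notation). Mn²⁺ is omitted because its contribution to the mechanism is unresolved.

```latex
\begin{aligned}
&\text{Step 1 (transphosphorylation):}\\
&\quad \mathrm{R\text{-}U(2'\text{-}OH)\text{-}p\text{-}(5'O)\text{-}R'}
\xrightarrow[\text{His235 (general acid)}]{\text{His250 (general base), Lys290}}
\mathrm{R\text{-}U{>}p} + \mathrm{HO\text{-}5'\text{-}R'}\\[6pt]
&\text{Step 2 (hydrolysis):}\\
&\quad \mathrm{R\text{-}U{>}p} + \mathrm{H_2O}
\xrightarrow[\text{His250 (general acid)}]{\text{His235 (general base)}}
\mathrm{R\text{-}U(2'\text{-}OH)\text{-}3'p}
\end{aligned}
```

Written this way, the role reversal of the two histidines between the steps is explicit, as is the fact that the net products are a 3′-phosphate and a free 5′-hydroxyl.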


Figure 4. SARS-CoV-2 nsp15 active site crystal structures with bound reaction intermediates: 5′-UMP (PDB entry: 6WLC) in the top left, 3′-UMP (PDB entry: 6X4I) in the top right, 5′-GpU (PDB entry: 6X1B) in the bottom left, and the cyclic intermediate mimic uridine 2′,3′-vanadate (PDB entry: 7K1L) in the bottom right. Proteins are coloured in teal and represented as cartoons, with active site residues and bound ligands represented as sticks. Bound ligands are coloured white. This figure was made using the Protein Imager [10]. Creator: Coronavirus Structural Task Force - Sam Horrell & Lisa Schmidt, License: cc-by-sa

Therapeutic interest of the protein

As previously mentioned, knockout studies on nsp15 have shown that it is not essential for viral replication. Despite this, an nsp15 inhibitor could provide an effective treatment against SARS-CoV-2 by hampering the virus's evasion and modulation of the innate immune response, helping to promote longer-lasting immunity. Targeting nsp15 is particularly interesting as nsp15 has no close human homologues [25], potentially reducing harmful side effects. A number of biochemical assays have been performed to screen previously approved drugs and various compound libraries for inhibition of nsp15, alongside a number of in-silico studies docking approved therapeutics to guide drug design efforts. A fragment screening study has also been performed, yielding 6 small molecule fragments. Benzopurpurin 4B, C-473872 (CAS registry number: 331675-78-6), and Congo Red, as well as small molecule RNase A inhibitors, have been shown to inhibit nsp15 activity and reduce infectivity of SARS-CoV in Vero cells [26], but further testing on SARS-CoV-2 nsp15 is required. Additionally, nsp15 has been screened against the ReFRAME [27], Pandemic Response Box (Medicines for Malaria Venture (MMV) & Drugs for Neglected Diseases initiative (DNDi)), and COVID Box drug repurposing libraries for compounds giving 50% inhibition at concentrations below 10 μM, identifying 23, 1, and 0 hits, respectively [25]. Two fluorescence resonance energy transfer (FRET) assays used to determine the half-maximal inhibitory concentration (IC50) reduced the hits to 12 (11 from ReFRAME, 1 from the Pandemic Response Box). These were whittled down to 3 (Exebryl-1, Piroxantrone, and MMV1580853) after 9 were identified as false positives caused by the production of reactive oxygen species such as H₂O₂, which destabilised the protein in the assay. Ligand binding was assessed using high-resolution mass spectrometry. Piroxantrone and MMV1580853 showed significantly weaker binding and ultimately no antiviral activity in SARS-CoV-2 assays.
Exebryl-1 bound with a dissociation constant (Kd) of ∼12 μM per monomer for the first binding event, with on average approximately four molecules bound per monomer at 100 μM Exebryl-1. Molecular docking of Exebryl-1 against PDB entry 6XDH using an automated QuickVina 2 docking workflow [28] showed binding in a pocket close to and within the active site. Exebryl-1 demonstrated antiviral activity in three separate assays at concentrations above 10 μM. However, blood plasma levels in Sprague–Dawley rats after an oral dose of 100 mg/kg reached only 9 μM after 1 h and dropped to 4 μM after 4 h, so Exebryl-1 is not expected to reach therapeutic levels in its current form [25].
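To put the reported Kd of ∼12 μM next to the rat plasma levels, one can use the textbook single-site binding isotherm, θ = [L] / ([L] + Kd). The Python sketch below is illustrative only: it assumes one independent site at equilibrium with free ligand equal to total ligand, and it ignores the additional binding events and any plasma protein binding, so the numbers describe site occupancy, not antiviral efficacy.

```python
def fractional_occupancy(ligand_um: float, kd_um: float) -> float:
    """Fraction of sites occupied for simple 1:1 binding: theta = [L] / ([L] + Kd).

    Both concentrations must use the same unit (here micromolar).
    Assumes a single independent site at equilibrium, with free ligand ~ total ligand.
    """
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_um / (ligand_um + kd_um)


KD_FIRST_SITE_UM = 12.0  # approximate first-event Kd for Exebryl-1 quoted in the text

# Plasma levels from the rat study quoted above, plus the 10 uM antiviral threshold
for label, conc_um in [("plasma, 1 h", 9.0), ("plasma, 4 h", 4.0), ("10 uM", 10.0)]:
    theta = fractional_occupancy(conc_um, KD_FIRST_SITE_UM)
    print(f"{label:>12}: {conc_um:4.1f} uM -> occupancy {theta:.2f}")
```

Under these assumptions, the 1 h and 4 h plasma levels correspond to roughly 43% and 25% occupancy of the first site. Both are below the half-saturation point, which is reached when [L] equals Kd.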
A repurposed colorectal cancer drug, Tipiracil, has been found to partially inhibit nsp15 activity in biochemical assays. However, its efficacy is greatly decreased at higher Mn²⁺ concentrations. A structure of nsp15 with Tipiracil bound in the uridine binding pocket has also been solved (PDB entry: 6WXC). In this structure, the uracil ring of Tipiracil stacks against Tyr343 and forms several hydrogen bonds with Ser294, Lys345, and His250 (Figure 5), as well as several interactions with other active site residues mediated by water and phosphate molecules. The only unique interaction for this ligand is between the iminopyrrolidine nitrogen of Tipiracil and Gln245 (Figure 5). Although not an immediate treatment option, this uracil derivative provides a potential scaffold for further SARS-CoV-2 nsp15 inhibitor development [23]. Based on Tipiracil binding at the active site, a library of 85 flavonoid compounds was docked against nsp15 (PDB entry 6WXC) and evaluated using the molecular mechanics/generalized Born surface area (MM/GBSA) method and molecular dynamics as part of an in-silico study, but binding was found to be significantly weaker than that of Tipiracil in all cases [29].
Fragment screens have been performed on nsp15, with six structures currently available in the PDB without an accompanying publication. In addition to the soaked fragments, all of these structures show a citrate molecule bound to the catalytic NendoU domain (Figure 6, CIT). One fragment is bound adjacent to the citrate (PDB entry 5S70, Figure 6, EN300-181428 (WUS)) through a stacking interaction with Trp333 and a hydrogen bond between the NO3 hydrogen of EN300-181428 and O5 of the citrate molecule. Four fragments are bound at the interface between the middle domain (Figure 6, purple) and the N-terminal oligomerisation domain (Figure 6, blue): FUZS-5 (PDB entry 5S71, Figure 6, WUV), Z2889976755 (PDB entry 5S6X, Figure 6, WUG), BBL029427 (PDB entry 5S72, Figure 6, WUY), and PB2255187532 (PDB entry 5S6Z, Figure 6, WUM). Finally, BBL029427 (PDB entry 5S6Y, Figure 6, WUJ) is bound to a loop connecting beta strands in the middle domain. Unfortunately, the crystal packing in these structures prevents the formation of the active double-ringed hexamer from symmetry-related molecules, making it difficult to assess how the fragments interact with the active hexamer. However, this monomeric crystal form could provide a starting point for designing a drug that disrupts formation of the active hexamer by interfering with surfaces on the N-terminal oligomerisation domain.
Molecular docking, all-atom molecular dynamics, and an assessment of absorption, distribution, metabolism, and excretion (ADME) properties have been carried out on PDB entry 6W01 using 15 scalarane sesterterpenes, compounds purified from Red Sea marine sponges with a variety of relevant pharmacological activities, to assess their potential as nsp15 inhibitors [30]. Eight compounds were found to have equivalent or better binding energies than the reference ligand, Benzopurpurin 4B.
All eight compounds bound the C-terminal catalytic domain in the large, shallow active site, forming polar interactions with the catalytic triad (His235, His250, and Lys290), interacting with Trp333 through π-stacking, and forming at least one hydrogen bond with Lys290 as well as further anchoring hydrogen bonds with Gly248 and/or Gln245 [30]. Two of the eight were used in all-atom molecular dynamics simulations and showed good stability, strongly negative binding free energies, and good scores in ADME drug property predictions.


Figure 5. SARS-CoV-2 nsp15 active site crystal structure with bound Tipiracil (PDB entry 6WXC). The protein is coloured in teal and represented as a cartoon, with active site residues and bound Tipiracil represented as sticks. Tipiracil is coloured white. This figure was made using the Protein Imager [10]. Creator: Coronavirus Structural Task Force - Sam Horrell & Lisa Schmidt, License: cc-by-sa

In-silico docking investigations of 32 phytochemicals from Asparagus racemosus have also been performed on nsp15 (PDB entry: 6W01). The top 5 ligands (Asparoside-C, Asparoside-F, Rutin, Asparoside-D, and Racemoside-A) bound at the C-terminal active site with binding free energy scores between −7.165 kcal/mol and −5.993 kcal/mol. Complexes of nsp15 with Asparoside-C, -F, and -D were subjected to further analysis by 100 ns molecular dynamics simulations, which found Asparoside-D and -F to have favourable binding interactions and better affinity than the control ligand Remdesivir [31]. Twenty-three previously approved drugs have also been docked to nsp15, with three (Saquinavir, Aprepitant, and Valrubicin) showing high predicted binding affinities between −9.1 and −9.6 kcal/mol [32]. However, Saquinavir, Aprepitant, and Valrubicin are docked to a site on the opposite side of the active site pocket that houses the catalytic triad, approximately 17 Å away. Barring an undetermined allosteric effect of this binding, which the paper does not mention, further development of these drug candidates by “ . . . modifying them to fit to the SARS-CoV-2 nsp15 active site pocket precisely” needs to be rethought, as the active site was not targeted in the first place.
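Docking scores in kcal/mol are often read as approximate binding free energies. If one accepts that assumption, ΔG = RT ln Kd (1 M standard state) can be rearranged to Kd = exp(ΔG / RT) to give a rough sense of scale. The Python sketch below applies this conversion to the scores quoted above at an assumed 298.15 K. Docking scoring functions are empirical rather than calibrated free energies, so the resulting values are order-of-magnitude illustrations, not predicted affinities.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (mol/L) from a binding free energy: Kd = exp(dG / RT).

    Assumes a 1 M standard state; treating a docking score as dG is itself an assumption.
    """
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


# Scores quoted above: the worst and best of the top-5 phytochemicals,
# and the range reported for the three repurposed drugs
for score in (-5.993, -7.165, -9.1, -9.6):
    kd_um = kd_from_delta_g(score) * 1e6  # convert mol/L to micromolar
    print(f"{score:>7} kcal/mol -> Kd ~ {kd_um:.2g} uM")
```

With these assumptions, −7.165 kcal/mol corresponds to a Kd of roughly 6 μM and −5.993 kcal/mol to roughly 40 μM. Each additional −1.36 kcal/mol (RT ln 10 at 298 K) lowers Kd roughly tenfold.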


Figure 6. Small molecule fragment screening against SARS-CoV-2 nsp15, with nsp15 rendered in flat-field style and coloured by domain (NendoU in teal, middle domain in purple, and N-terminal oligomerisation domain in blue). Fragment binding sites are shown in flat-field style coloured grey, with ligands represented as sticks in the surrounding circles. This is a composite image of PDB entries 5S70 (EN300-181428, WUS), 5S71 (FUZS-5, WUV), 5S6X (Z2889976755, WUG), 5S72 (BBL029427, WUY), 5S6Y (BBL029427, WUJ), and 5S6Z (PB2255187532, WUM). This figure was made using the Protein Imager [10]. Creator: Coronavirus Structural Task Force - Sam Horrell & Lisa Schmidt, License: cc-by-sa

Complementary knowledge

The enzymatic activity of nsp15 has been demonstrated and its crystal structure solved, but its exact role in viral replication remains unclear. In in situ studies, SARS-CoV nsp15 has been shown to co-localise with replicating RNA around the nucleus [33], as well as with nsp8 and nsp12 from the replication/transcription complex [34], in both the presence and absence of RNA. It was also shown that SARS-CoV nsp15 does not co-localise with the M protein [34]. Yeast two-hybrid screens and glutathione S-transferase (GST) pulldown assays have also identified nsp8 and nsp12 as potential binding partners of SARS-CoV nsp15 [35].
Furthermore, in in vitro co-expression assays using the Cantell strain of Sendai virus, nsp15, like nsp13, nsp14, and accessory protein ORF6, showed a strong inhibitory effect on interferon (IFN) production and on nuclear localisation of interferon regulatory factor 3 [36]. However, interferon antagonism under in vitro conditions is not necessarily representative of a real infection: protein expression levels during infection can differ greatly from those in overexpression studies, and altered localisation can have a significant effect [36]. Yuen et al. [36] do not discuss the individual contribution or mechanism of nsp15-mediated interferon inhibition. Overall, SARS-CoV-2 appears less effective than SARS-CoV at suppressing interferon signalling, owing to the loss of the interferon antagonist function of SARS-CoV-2 papain-like protease (PLpro) [36]. Reverse genetic studies (analysis of the phenotype resulting from targeted genetic engineering) have instead suggested that ORF6 is the major player in interferon suppression [37]. However, ORF6 is also less conserved between SARS-CoV and SARS-CoV-2, with only 69% sequence identity, and only 4 of the 10 key amino acids identified in SARS-CoV ORF6 are present in SARS-CoV-2 ORF6 [36].
It has been shown that nsp15 activity is highly dependent on the presence of Mn²⁺ ions, with greatly reduced activity in the presence of Mg²⁺ ions. In the presence of Mn²⁺, nsp15 was able to cleave all four uridine sites in an eicosamer, a 20-nucleotide oligomer with the sequence 5′-GAACU↓CAU↓GGACCU↓U↓GGCAG-3′, with no sequence preference and with a cleavage rate that increased with rising metal ion concentration [23]. This is particularly interesting because Mn²⁺ enhances the activity of SARS-CoV nsp15, but that enzyme's activity does not depend on the presence of Mn²⁺, and no metal binding sites have been identified in coronavirus structures to date [18]. Considering that SARS-CoV-2 nsp15 shares 88% sequence identity with SARS-CoV nsp15 and that all active site residues are conserved, the dependence of SARS-CoV-2 nsp15 on Mn²⁺ is a significant difference between the enzymes. Furthermore, nsp15 alone is promiscuous, cutting at any uridine site in RNA, but becomes site-specific in complex with nsp8 and nsp12, leaving uridine tails between 5 and 10 bases long [16].
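The promiscuous, uridine-specific cutting described above can be illustrated with a short Python sketch that places a cut 3′ of every uridine in a sequence. This is a deliberately simplified model, an assumption for illustration only: it ignores RNA structure, kinetics, metal-ion effects, the 2′,3′-cyclic phosphate chemistry, and the site-specific behaviour seen in complex with nsp8 and nsp12. Applied to the eicosamer above, it reproduces the four reported cleavage sites.

```python
def uridine_cleavage_sites(rna: str) -> list[int]:
    """0-based indices i of uridines that are followed by another nucleotide.

    In this simplified model, each such U is cut on its 3' side, i.e. between
    rna[i] and rna[i + 1]; a 3'-terminal U has nothing downstream to release.
    """
    rna = rna.upper()
    return [i for i, base in enumerate(rna[:-1]) if base == "U"]


def cleave(rna: str) -> list[str]:
    """Split the sequence into the fragments produced by cutting at every site."""
    rna = rna.upper()
    fragments, start = [], 0
    for i in uridine_cleavage_sites(rna):
        fragments.append(rna[start:i + 1])  # each 5' fragment ends with the cleaved U
        start = i + 1
    fragments.append(rna[start:])
    return fragments


eicosamer = "GAACUCAUGGACCUUGGCAG"  # 5' -> 3', 20 nucleotides
print(uridine_cleavage_sites(eicosamer))  # [4, 7, 13, 14]
print("↓".join(cleave(eicosamer)))        # GAACU↓CAU↓GGACCU↓U↓GGCAG
```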
A library of 5000 small molecule compounds has been screened against nsp15 for inhibition of nuclease activity, with twelve compounds showing potential as antiviral treatments in a fluorescence-based biochemical kinetic screen. Further analysis using a gel-based assay found only one compound, NSC95397, able to inhibit nuclease activity at a concentration of 10 μM. However, tests on SARS-CoV-2-infected Vero E6 cells found the compound to be toxic at concentrations above 10 μM and ineffective at inhibiting viral growth at lower concentrations [38].
A fluorescence resonance energy transfer (FRET) assay has been used to measure nsp15 activity on a 6-mer oligonucleotide (5′-AAAUAA) carrying a 5′-fluorescein and a 3′-TAMRA label [21,22]. Activity is measured as an increase in fluorescence that occurs when cleavage separates the fluorescein from the 3′-TAMRA quencher. Nsp15 activity was confirmed for the wild-type protein and abolished in H235A and H250A mutants [22]. FRET analysis was paired with liquid chromatography electrospray ionisation mass spectrometry to demonstrate that the 3′ ends of nsp15 RNA products predominantly carry a 2′,3′-cyclic phosphate (80%) rather than a 3′-phosphate. This is a significant difference from RNase A, which generates a 2′,3′-cyclic phosphate that is then hydrolysed to a 3′-phosphate.
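Raw FRET readings are commonly converted to the fraction of substrate cleaved by scaling between an uncleaved (no-enzyme) control and a fully digested control. The Python sketch below assumes that donor fluorescence rises linearly with the amount of cleaved substrate. The readings are hypothetical placeholders, not data from the cited studies.

```python
def fraction_cleaved(signal: float, f_uncleaved: float, f_fully_cleaved: float) -> float:
    """Scale a fluorescein reading between two controls; clamp to [0, 1] against noise."""
    span = f_fully_cleaved - f_uncleaved
    if span <= 0:
        raise ValueError("fully cleaved control must be brighter than the uncleaved control")
    fraction = (signal - f_uncleaved) / span
    return min(max(fraction, 0.0), 1.0)


# Hypothetical end-point readings in arbitrary fluorescence units
F_NO_ENZYME, F_FULL_DIGEST = 200.0, 2200.0
readings = {"wild type": 1800.0, "H235A": 230.0, "H250A": 190.0}
for sample, signal in readings.items():
    print(f"{sample:>9}: {fraction_cleaved(signal, F_NO_ENZYME, F_FULL_DIGEST):.2f}")
```

Clamping keeps noisy readings slightly below the no-enzyme control (as for the hypothetical H250A value) from producing a negative fraction. Kinetic analyses would apply the same scaling to each time point before fitting an initial rate.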

Summary

SARS-CoV-2 nsp15 is an RNA uridylate-specific, Mn²⁺-dependent [3] endoribonuclease from the nidoviral endoU (NendoU) family, which acts on single-stranded and double-stranded RNA to help SARS-CoV-2 evade detection by the innate immune response. Knockout studies have demonstrated that nsp15 is not essential for viral replication, but numerous studies have shown a reduction in viral titre and virulence in nsp15-deficient SARS-CoV-2 when studied in the presence of an effective immune response. The sequence of nsp15 is highly conserved between SARS-CoV-2, SARS-CoV, MERS-CoV, and HCoV-229E, as is the fold of the monomer and the active hexamer. The monomer consists of three domains: the N-terminal oligomerisation domain, a middle domain, and the NendoU catalytic domain, which houses the active site. The active site is a shallow groove made up of six key residues (His235, His250, Lys290, Thr341, Tyr343, and Ser294). A series of structures with different catalytic intermediates has been solved, and the reaction mechanism is predicted to be similar to that of the well-studied RNase A enzyme. However, the dependence of nsp15 on manganese, whereas RNase A activity is metal independent, casts some doubt on this hypothesis. Three in-silico drug screening studies have been performed on nsp15, two using 6W01 and one using 6WXC as the protein model. 6W01 is a citrate-bound nsp15 structure solved to 1.9 Å resolution with acceptable data processing and refinement statistics overall. The only minor concern is that 5% of the residues in both chains show one issue with their geometry, and a small subset of these also show an issue in their fit to the electron density. 6WXC is a Tipiracil-bound nsp15 structure solved to 1.85 Å resolution. It has a similar minor problem, with 7% of residues in both chains showing one issue with their geometry, but with fewer electron density fit outliers. Use of either model should present no major stumbling blocks for simulation studies.

Discussion & outlook

Nsp15 has been one of the less explored SARS-CoV-2 proteins compared with others such as the main protease and the papain-like protease, which have undergone extensive in-silico drug design studies through a number of large collaborative efforts between universities, synchrotrons, and other organisations [39-45] feeding into the COVID Moonshot project [46]. Overall, the structural work on nsp15 has been sound, and all available models could provide a good starting structure for computational drug design. A series of structures with catalytic intermediates suggests a mechanism akin to that of RNase A; however, the dependence of nsp15 on Mn²⁺ suggests a departure from this mechanism, as the RNase A mechanism is metal independent. The follow-up in-silico studies described above were based on well-validated models with acceptable statistics for the resolution at which the structures were solved, although none has yet pointed to a viable lead compound for clinical application. Because nsp15 is not essential for viral replication, it is a much less desirable target for structure-based drug design than essential viral proteins. However, the impact of nsp15 on SARS-CoV-2 virulence through repression of the innate immune response offers a potential avenue to weaken SARS-CoV-2 by inhibiting nsp15, allowing the immune system to fight off infection before it becomes more severe.

This blog post was published in Crystallography Reviews, please cite:
https://doi.org/10.1080/0889311X.2022.2065270

Acknowledgements

The authors would also like to thank Johannes Kaub and Rosemary Wilson for support and discussion.

Funding

This work was supported by Bundesministerium für Bildung und Forschung: [Grant Number 05K19WWA]; Deutsche Forschungsgemeinschaft: [Grant Number TH2135/2-1].

References

[1] Kim Y, Jedrzejczak R, Maltseva NI, et al. Crystal structure of Nsp15 endoribonuclease NendoU from SARS-CoV-2. Protein Sci. 2020;29:1596–1605.
[2] Cui J, Li F, Shi Z-L. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol. 2019;17:181–192.
[3] Ivanov KA, Hertzig T, Rozanov M, et al. Major genetic marker of nidoviruses encodes a replicative endoribonuclease. Proc Natl Acad Sci U S A. 2004;101:12694–12699.
[4] Naqvi AAT, Fatima K, Mohammad T, et al. Insights into SARS-CoV-2 genome, structure, evolution, pathogenesis and therapies: structural genomics approach. Biochim Biophys Acta Mol Basis Dis. 2020;1866:165878.
[5] Bhardwaj K, Sun J, Holzenburg A, et al. RNA recognition and cleavage by the SARS Coronavirus endoribonuclease. J Mol Biol. 2006;361:243–256.
[6] Deng X, Baker SC. An “Old” protein with a new story: Coronavirus endoribonuclease is important for evading host antiviral defenses. Virology. 2018;517:157–163.
[7] Snijder EJ, Decroly E, Ziebuhr J. Chapter three - The nonstructural proteins directing Coronavirus RNA synthesis and processing. In: Ziebuhr J, editor. Advances in virus Research [Internet]. Academic Press; 2016. p. 59–126. [cited 2022 Jan 7]. https://www.sciencedirect.com/science/article/pii/S0065352716300471.
[8] Nga PT, Parquet MdC, Lauber C, et al. Discovery of the first insect nidovirus, a missing evolutionary link in the emergence of the largest RNA virus genomes. PLoS Pathog. 2011;7:e1002215.
[9] Lauber C, Ziebuhr J, Junglen S, et al. Mesoniviridae: a proposed new family in the order nidovirales formed by a single species of mosquito-borne viruses. Arch Virol. 2012;157:1623–1628.
[10] Tomasello G, Armenia I, Molla G. The protein imager: a full-featured online molecular viewer interface with server-side HQ-rendering capabilities. Bioinformatics. 2020;36:2909–2911.
[11] Deng X, Hackbart M, Mettelman RC, et al. Coronavirus nonstructural protein 15 mediates evasion of dsRNA sensors and limits apoptosis in macrophages. PNAS. 2017;114:E4251–E4260.
[12] Kindler E, Gil-Cruz C, Spanier J, et al. Early endonuclease-mediated evasion of RNA sensing ensures efficient coronavirus replication. PLoS Pathog. 2017;13:e1006195.
[13] Volk A, Hackbart M, Deng X, et al. Coronavirus Endoribonuclease and deubiquitinating interferon antagonists differentially modulate the host response during replication in macrophages. Journal of Virology [Internet]. 2020. [cited 2022 Jan 6]; https://journals.asm.org/doi/abs/10.1128/JVI.00178-20.
[14] Kato H, Takeuchi O, Sato S, et al. Differential roles of MDA5 and RIG-I helicases in the recognition of RNA viruses. Nature. 2006;441:101–105.
[15] Mandilara G, Koutsi MA, Agelopoulos M, et al. The role of Coronavirus RNA-Processing enzymes in innate immune evasion. Life (Basel). 2021;11:571.
[16] Hackbart M, Deng X, Baker SC. Coronavirus endoribonuclease targets viral polyuridine sequences to evade activating host sensors. Proc Natl Acad Sci U S A. 2020;117:8094–8103.
[17] Deng X, Geelen Av, Buckley AC, et al. Coronavirus Endoribonuclease activity in porcine epidemic diarrhea virus suppresses type I and type III interferon responses. Journal of Virology [Internet]. 2019. [cited 2022 Jan 6]; https://journals.asm.org/doi/abs/10.1128/JVI.02000-18.
[18] Ricagno S, Egloff M-P, Ulferts R, et al. Crystal structure and mechanistic determinants of SARS coronavirus nonstructural protein 15 define an endoribonuclease family. PNAS. 2006;103:11892–11897.
[19] Joseph JS, Saikatendu KS, Subramanian V, et al. Crystal structure of a monomeric form of Severe Acute Respiratory Syndrome Coronavirus endonuclease nsp15 suggests a role for hexamerization as an allosteric switch. Journal of Virology [Internet]. 2007. [cited 2022 Jan 6]; https://journals.asm.org/doi/abs/10.1128/JVI.02817-06.
[20] Bhardwaj K, Palaninathan S, Alcantara JMO, et al. Structural and functional analyses of the Severe Acute Respiratory Syndrome Coronavirus Endoribonuclease Nsp15∗. J Biol Chem. 2008;283:3655–3664.
[21] Zhang L, Li L, Yan L, et al. Structural and biochemical characterization of endoribonuclease Nsp15 encoded by Middle East Respiratory Syndrome coronavirus. Journal of Virology [Internet]. 2018. [cited 2022 Jan 6]; https://journals.asm.org/doi/abs/10.1128/JVI.00893-18.
[22] Pillon MC, Frazier MN, Dillard LB, et al. Cryo-EM structures of the SARS-CoV-2 endoribonuclease Nsp15 reveal insight into nuclease specificity and dynamics. Nat Commun. 2021;12:636.
[23] Kim Y, Wower J, Maltseva N, et al. Tipiracil binds to uridine site and inhibits Nsp15 endoribonuclease NendoU from SARS-CoV-2. Commun Biol. 2021;4:1–11.
[24] Saramago M, Costa VG, Souza CS, et al. The nsp15 nuclease as a good target to combat SARS-CoV-2: mechanism of action and its inactivation with FDA-approved drugs. Microorganisms. 2022;10(342):342–363.
[25] Choi R, Zhou M, Shek R, et al. High-throughput screening of the ReFRAME, Pandemic Box, and COVID Box drug repurposing libraries against SARS-CoV-2 nsp15 endoribonuclease to identify small-molecule inhibitors of viral activity. PLOS ONE. 2021;16:e0250019.
[26] Ortiz-Alcantara J, Bhardwaj K, Palaninathan S, et al. Small molecule inhibitors of the SARS-CoV Nsp15 endoribonuclease. Virus Adaptation and Treatment. 2010;2:125–133.
[27] Janes J, Young ME, Chen E, et al. The ReFRAME library as a comprehensive drug repurposing library and its application to the treatment of cryptosporidiosis. PNAS. 2018;115:10750–10755.
[28] Alhossary A, Handoko SD, Mu Y, et al. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics. 2015;31:2214–2216.
[29] Mishra GP, Bhadane RN, Panigrahi D, et al. The interaction of the bioflavonoids with five SARS-CoV-2 proteins targets: An in silico study. Comput Biol Med. 2021;134:104464.
[30] Elhady SS, Abdelhameed RFA, Malatani RT, et al. Molecular docking and dynamics simulation study of hyrtios erectus isolated scalarane sesterterpenes as potential SARS-CoV-2 dual target inhibitors. Biology (Basel). 2021;10:389.
[31] Chikhale RV, Sinha SK, Patil RB, et al. In-silico investigation of phytochemicals from Asparagus racemosus as plausible antiviral agent in COVID-19. J Biomol Struct Dyn. 2021;39:5033–5047.
[32] Mahmud S, Elfiky AA, Amin A, et al. Targeting SARS-CoV-2 nonstructural protein 15 endoribonuclease: an in silico perspective. Future Virol. doi:10.2217/fvl-2020-0233.
[33] Shi ST, Schiller JJ, Kanjanahaluethai A, et al. Colocalization and membrane Association of murine Hepatitis Virus gene 1 products and De novo-synthesized viral RNA in infected cells. Journal of Virology [Internet]. 1999. [cited 2022 Jan 6];
[34] Athmer J, Fehr AR, Grunewald M, et al. In situ tagged nsp15 reveals interactions with Coronavirus replication/transcription complex-associated proteins. mBio [Internet]. 2017. [cited 2022 Jan 6]; https://journals.asm.org/doi/abs/10.1128/mBio.02320-16.
[35] Imbert I, Snijder EJ, Dimitrova M, et al. The SARS-Coronavirus PLnc domain of nsp3 as a replication/transcription scaffolding protein. Virus Res. 2008;133:136–148.
[36] Yuen C-K, Lam J-Y, Wong W-M, et al. SARS-CoV-2 nsp13, nsp14, nsp15 and orf6 function as potent interferon antagonists. Emerg Microbes Infect. 2020;9:1418–1428.
[37] Schroeder S, Pott F, Niemeyer D, et al. Interferon antagonism by SARS-CoV-2: a functional study using reverse genetics. The Lancet Microbe. 2021;2:e210–e218.
[38] Canal B, Fujisawa R, McClure AW, et al. Identifying SARS-CoV-2 antiviral compounds by screening for small molecule inhibitors of nsp15 endoribonuclease. Biochem J. 2021;478:2465–2479.
[39] Cantrelle F-X, Boll E, Brier L, et al. NMR spectroscopy of the main protease of SARS-CoV-2 and fragment-based screening identify three protein hotspots and an antiviral fragment. Angew Chem, Int Ed. 2021;60:25428–25435.
[40] Newman JA, Douangamath A, Yadzani S, et al. Structure, mechanism and crystallographic fragment screening of the SARS-CoV-2 NSP13 helicase. Nat Commun. 2021;12:4848.
[41] Zhao Y, Du X, Duan Y, et al. High-throughput screening identifies established drugs as SARS-CoV-2 PLpro inhibitors. Protein Cell. 2021;12:877–888.
[42] Ma C, Sacco MD, Xia Z, et al. Discovery of SARS-CoV-2 papain-like Protease Inhibitors through a combination of high-throughput screening and a FlipGFP-based reporter assay. ACS Cent Sci. 2021;7:1245–1260.
[43] Douangamath A, Fearon D, Gehrtz P, et al. Crystallographic and electrophilic fragment screening of the SARS-CoV-2 main protease. Nat Commun. 2020;11:5047.
[44] Ahmad S, Abdullah I, Lee YK, et al. Extensive Crystallographic Fragment-Based Approach to Design SARS CoV2 3CLpro Main Protease Inhibitors and Related Metadata. 2021 [cited 2022 Jan 6]; https://chemrxiv.org/engage/chemrxiv/article-details/60c753c8469df4c73ef44e13.
[45] Günther S, Reinke PYA, Fernández-García Y, et al. X-ray screening identifies active site and allosteric inhibitors of SARS-CoV-2 main protease. Science. 2021;372:642–646.
[46] Consortium TCM, Achdout H, Aimon A, et al. Open Science Discovery of Oral Non-Covalent SARS-CoV-2 Main Protease Inhibitor Therapeutics [Internet]. 2021 [cited 2022 Jan 6]. p.2020.10.29.339317. https://www.biorxiv.org/content/10.1101/2020.10.29.339317v2.

This blog post is also published as a citable article: https://doi.org/10.1080/0889311X.2024.2325352

Abstract

Following infection, SARS-CoV-2’s leader protein nsp1 is the very first viral protein to be expressed. Its sequence is highly conserved among different SARS-CoV-2 strains, indicating a vital function and making it a promising target for drugs and vaccination. Nsp1 takes over the host cell's protein expression machinery by binding to the ribosome and obstructing the expression of all but the viral proteins, thereby facilitating the production of SARS-CoV-2 virions by the host cell. To date, 28 structures obtained by cryo-electron microscopy and X-ray diffraction have been published, showing the protein’s structural features as well as a number of interactions with host ribosomes. Nsp1 has also been shown to interfere with immune response pathways and is connected to the cytokine storm that causes organ damage and failure in COVID-19 patients.

Introduction

SARS-CoV-2’s non-structural protein 1 (nsp1) is known by several names, including leader protein [1], host translation inhibitor [2] and host shutoff factor [3]. It is found in all beta-coronaviruses [4], and despite being relatively small, with a length of only 180 amino acid residues [5], it plays a central role in the viral life cycle: nsp1 inhibits host cell translation while facilitating viral translation [6]. Nsp1 is translated from the 5′ end of the SARS-CoV-2 genome [3] and is located at the beginning of open reading frames 1a and 1ab (ORF1a and ORF1ab) of the coronavirus genome; hence, it is the very first protein expressed by an infected host cell [7]. In the resulting polyprotein chain, nsp1 is located at the N-terminus.

Translation of the viral RNA into proteins occurs either shortly after the virus enters the host cell or after viral RNA replication inside the infected cell. The virus does not have its own proteins for RNA translation; instead, it uses the existing translation machinery of the host cell: ribosomes [8]. Ribosomes consist of several strands of ribosomal RNA (rRNA) and ribosomal proteins, which form a larger (60S) and a smaller (40S) subunit [9]. The SARS-CoV-2 leader protein hijacks ribosomes and uses them for the translation of viral RNA, while the translation of host cell proteins is suppressed [6]. To achieve this, nsp1 binds to the mRNA entrance channel of the ribosome [3] and facilitates the degradation of host mRNA, leading, among other things, to severe inhibition of the cellular immune response, which facilitates infection.

BLAST analysis [10] indicates that the sequence of nsp1 in SARS-CoV-1 and SARS-CoV-2 is unique, with no significant similarity to any known sequences in other viruses. The closest sequence similarities were found in vertebrate genes, with BLAST E-values no better than 0.85 [11]. Although more recent studies indicate a common origin of alpha- and betacoronaviral nsp1 despite the lack of sequence homology, the mechanism of interaction between nsp1 and the host cell protein expression machinery is still considered specific to SARS-CoV-1 and SARS-CoV-2 [12].
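The E-value of a BLAST hit is the number of alignments with at least that score expected to arise by chance in the search. Under the Karlin–Altschul statistics used by BLAST, the probability of at least one such chance hit is P = 1 − e^(−E). The short Python sketch below applies this relation to the E-value quoted above and shows why a value of 0.85 does not indicate meaningful similarity.

```python
import math


def evalue_to_pvalue(e_value: float) -> float:
    """Probability of at least one chance hit scoring this well: P = 1 - exp(-E).

    math.expm1 keeps the result accurate for very small E-values.
    """
    if e_value < 0:
        raise ValueError("E-value cannot be negative")
    return -math.expm1(-e_value)


for e in (1e-10, 0.01, 0.85):
    print(f"E = {e:g} -> P = {evalue_to_pvalue(e):.3g}")
```

An E-value of 0.85 corresponds to P ≈ 0.57, a better-than-even chance that a hit this good appears at random. For small E-values, P and E are nearly equal.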

The first structure of SARS-CoV-1 nsp1, covering amino acids 12 to 127, was obtained by nuclear magnetic resonance (NMR) in December 2006 (PDB entry: 2HSX) [13]. It remained the only experimental structure of a coronavirus nsp1 until February 2013, when a crystallographic study compared the structure and function of nsp1 from transmissible gastroenteritis virus (TGEV), an alphacoronavirus, with those of the SARS-CoV-1 protein. TGEV nsp1, while exhibiting very little sequence homology to SARS-CoV-1 nsp1, contains the same structural features: a barrel fold made up of six β-strands with an α-helix located at one of its openings. Despite these similarities, it was postulated that the two proteins inhibit host translation by different mechanisms [14]. Five years later, the first full-length structure of an nsp1 was published, this time from another alphacoronavirus, porcine epidemic diarrhoea virus (PEDV). This structure (PDB entry: 5XBC) again showed similar structural features but low sequence homology [15].
In this review, we will describe the role and functions of this protein, with a particular focus on the available molecular structures.

Structural biology of SARS-CoV-2 leader protein (nsp1) 24

Figure 1. Overlay of the SARS-CoV-2 nsp1 N-terminal domain (PDB entry 7K7P; blue/darker color) and TGEV nsp1 (PDB entry: 6IVC; green/lighter color) with labels indicating common structural features.

Structural overview

The structure of nsp1 can be divided into two folded domains (Figure 2): an N-terminal domain (amino acid residues 1 to 128) and the smaller C-terminal domain (residues 148 to 180), with a flexible unstructured linker between them [3].
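The domain boundaries above can be used to annotate which parts of nsp1 a given structure covers, for example the residue ranges listed in Table 1. The Python sketch below encodes the boundaries stated in the text. The linker range (residues 129 to 147) is an inference from the gap between the two folded domains, and the N-terminal intrinsically disordered region is not separated out.

```python
# Region boundaries (inclusive residue numbers) as stated in the text;
# the linker range is inferred as the gap between the two folded domains.
NSP1_REGIONS = [
    ("N-terminal domain", 1, 128),
    ("linker", 129, 147),
    ("C-terminal domain", 148, 180),
]


def domains_covered(first_residue: int, last_residue: int) -> list[str]:
    """Return the nsp1 regions that overlap the inclusive range first_residue..last_residue."""
    if first_residue > last_residue:
        raise ValueError("first_residue must not exceed last_residue")
    return [
        name
        for name, start, end in NSP1_REGIONS
        if first_residue <= end and last_residue >= start
    ]


print(domains_covered(152, 180))  # ['C-terminal domain'] (range of 6ZM7 in Table 1)
print(domains_covered(100, 160))  # hypothetical construct spanning all three regions
```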


Figure 2. Domains of SARS-CoV-2 nsp1 in sequence with domain folds below. The protein consists of two ordered domains, the N-terminal globular domain (blue, left, PDB entry: 7EQ4) and the C-terminal domain (purple, right, PDB entry: 6ZOJ), with intrinsically disordered regions (N-terminal IDR) on the N-terminal end and between the two domains (Linker).

By the end of 2021, a total of 17 structures of SARS-CoV-2 nsp1 were available (Table 1), of which only three were obtained by X-ray crystallography. These three structures show only the globular N-terminal domain, with a best resolution of 1.25 Å (PDB entry: 7EQ4). They are very similar in fold; for example, 7K7P and 7K3N differ by an RMSD of only 0.35 Å over their Cα positions. The remaining 14 structures were obtained by cryo-electron microscopy (cryo-EM) and show the C-terminal domain of nsp1 in different complexes with ribosomal proteins and rRNA, at resolutions ranging from 2.6 to 3.3 Å. Model-based predictions of the full-length structure of SARS-CoV-2 nsp1 indicate strong structural similarity to SARS-CoV-1 nsp1 (84% sequence identity in the nsp1 sections of ORF1ab [16,17]) and hence a similar fold and function (see Figure 3) [12]. A near-complete assignment of the backbone chemical shifts from 1H, 13C and 15N NMR experiments, interpreted together with experimental data from various other sources and methods, further supports this hypothesis [18].
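
An RMSD such as the 0.35 Å quoted for 7K7P and 7K3N is computed after the two sets of matched Cα atoms have been optimally superposed. The following is a minimal sketch of that calculation using the Kabsch algorithm, assuming the Cα coordinates have already been extracted and paired residue by residue into two NumPy arrays; it is not necessarily the software or atom selection used to obtain the quoted value.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD (in the units of the input, e.g. Å) after optimal rigid superposition.

    p, q: (N, 3) arrays of matched atom coordinates; row i of p pairs with row i of q.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # 1. Remove translation by centring both coordinate sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # 2. Covariance matrix and its singular value decomposition.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # 3. Guard against an improper rotation (a reflection is not a valid superposition).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # 4. Rotate p onto q and take the root of the mean squared per-atom distance.
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because the function removes both translation and rotation, a rigidly moved copy of a structure yields an RMSD of essentially zero, so any remaining value reflects genuine conformational differences between the models.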

Table 1. Overview of available structures for SARS-CoV-1 and SARS-CoV-2 nsp1

SARS-CoV-1
PDB ID | Method | Resolution (Å) | Description
2GDT & 2HSX | NMR | n/a | First structure of its kind (amino acids 12 to 127), showing a previously unknown β-barrel fold. Deposited as an ensemble (2GDT) and as a consensus structure (2HSX) [13].
7OPL | Cryo-EM | 4.12 | Nsp1 (amino acids 13 to 127) in complex with DNA polymerase α [36].

SARS-CoV-2
PDB ID | Method | Resolution (Å) | Description
6ZLW | Cryo-EM | 2.6 | C-terminal domain of nsp1 (amino acids 148 to 180) in complex with human 40S ribosomal subunit [6].
6ZM7 | Cryo-EM | 2.7 | C-terminal domain of nsp1 (amino acids 152 to 180) in complex with human CCDC124–80S–EBP1 ribosomal complex [6].
6ZME | Cryo-EM | 3.0 | C-terminal domain of nsp1 (amino acids 152 to 180) in complex with human CCDC124–80S–eERF1 ribosomal complex and rRNA [6].
6ZMI | Cryo-EM | 2.6 | C-terminal domain of nsp1 (amino acids 148 to 180) in complex with human LYAR–80S ribosomal complex [6].
6ZMO | Cryo-EM | 3.1 | C-terminal domain of nsp1 (amino acids 148 to 180) in complex with human LYAR–80S–eEF1a ribosomal complex and rRNA [6].
6ZMT | Cryo-EM | 3.0 | C-terminal domain of nsp1 (amino acids 152 to 180) in complex with human pre-40S-like ribosome and rRNA, state 1 [6].
6ZN5 | Cryo-EM | 3.2 | C-terminal domain of nsp1 (amino acids 152 to 180) in complex with human pre-40S-like ribosome and rRNA, state 2 [6].
6ZOJ | Cryo-EM | 2.8 | C-terminal domain of nsp1 (amino acids 148 to 180) in complex with human 40S ribosomal subunit [3].
6ZOK | Cryo-EM | 2.8 | C-terminal domain of nsp1 (amino acids 148 to 180) in complex with human 40S ribosomal subunit body domain and 18S rRNA [3].
6ZOL | Cryo-EM | 2.8 | Human 40S ribosomal subunit head domain and 18S rRNA (no involvement of nsp1) [3].
6ZON | Cryo-EM | 3.0 | C-terminal domain of nsp1 (amino acids 151 to 180) in complex with human 43S preinitiation ribosome and 18S rRNA, state 1 [6].
6ZP4 | Cryo-EM | 2.9 | C-terminal domain of nsp1 (amino acids 148 to 180) in complex with human 43S preinitiation ribosome and 18S rRNA, state 2 [6].
7EQ4 | X-ray diffraction | 1.25 | Globular N-terminal domain (amino acids 11 to 125) [40].
7JQB | Cryo-EM | 2.7 | C-terminal domain of nsp1 (amino acids 145 to 180) in complex with rabbit 40S ribosomal subunit and rRNA [7].
7JQC | Cryo-EM | 3.3 | C-terminal domain of nsp1 (amino acids 145 to 180) in complex with rabbit 40S ribosomal subunit and CrPV IRES [7].
7K3N | X-ray diffraction | 1.65 | N-terminal globular domain (amino acids 13 to 127) [41].
7K5I | Cryo-EM | 2.9 | C-terminal domain of nsp1 in complex with human 40S ribosomal subunit and 18S rRNA [23].
7K7P | X-ray diffraction | 1.77 | N-terminal globular domain (amino acids 10 to 127) [16].
8A4Y | X-ray diffraction | 1.099 | N-terminal domain (amino acids 10 to 126) in complex with N-(2,3-dihydro-1H-inden-5-yl)acetamide [37].
8A55 | X-ray diffraction | 0.99 | N-terminal globular domain (amino acids 10 to 126) [17].
8AOU | NMR | n/a | Full-length structure (amino acids 3 to 182) [21].
8ASQ | X-ray diffraction | 1.15 | N-terminal globular domain (amino acids 10 to 126) in complex with N-methyl-1-(4-(thiophen-2-yl)phenyl)methanamine [17].
8AYS | X-ray diffraction | 1.37 | N-terminal globular domain (amino acids 10 to 126) in complex with 4-(2-aminothiazol-4-yl)phenol [17].
8AYW | X-ray diffraction | 1.10 | N-terminal globular domain (amino acids 10 to 126) in complex with (S)-1-(4-chlorophenyl)ethan-1-amine [17].
8AZ8 | X-ray diffraction | 1.18 | N-terminal globular domain (amino acids 10 to 126) in complex with 2-(benzylamino)ethan-1-ol [17].
8AZ9 | X-ray diffraction | 1.42 | N-terminal globular domain (amino acids 10 to 126) in complex with 1-(2-(3-chlorophenyl)thiazol-4-yl)-N-methylmethanamine [17].
8CRF | X-ray diffraction | 1.15 | N-terminal globular domain (amino acids 10 to 126) in complex with fragment hit 5E11 [42].
8CRK | X-ray diffraction | 1.1 | N-terminal globular domain (amino acids 10 to 126) in complex with fragment hit 7H2 [42].
8CRM | X-ray diffraction | 1.42 | N-terminal globular domain (amino acids 10 to 126) in complex with fragment hit 11C6 [42].

The structural models of the N-terminal domain of nsp1 derived from X-ray diffraction have been evaluated by the Coronavirus Structural Task Force [19]. The difference Fourier map for PDB entry 7EQ4 contains a total of nine peaks above 5σ, and one methionine residue should be flipped to better fit the electron density map. 7K3N contains only four peaks above 5σ, but its solvent molecules are not ideally modelled. 7K7P has five peaks above 5σ; an alternative conformation for residue His74 in chain B might improve the model.
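
Counts such as "nine peaks above 5σ" refer to local maxima of the difference map (for example an mFo−DFc map) that exceed five times the map's root-mean-square deviation. The sketch below is a crude grid-based version of that count, assuming the map is a NumPy array sampled over exactly one unit cell, so that wrapping around the edges with np.roll is physically valid. It is not the Task Force's procedure, which this review does not describe; real validation tools interpolate peak positions and usually also report negative peaks, which here can be counted by passing the negated map.

```python
from itertools import product

import numpy as np


def count_difference_peaks(rho: np.ndarray, sigma_cutoff: float = 5.0) -> int:
    """Count grid points that are local maxima above sigma_cutoff × σ(map).

    rho: 3D difference map covering one unit cell (periodic boundaries assumed).
    Flat-topped maxima spanning several equal grid values are each counted here;
    dedicated tools would merge them into a single peak.
    """
    threshold = sigma_cutoff * rho.std()  # σ taken as the r.m.s. deviation of the map
    is_peak = rho > threshold
    # Compare every grid point with its 26 neighbours, wrapping across cell edges.
    for shift in product((-1, 0, 1), repeat=3):
        if shift != (0, 0, 0):
            is_peak &= rho >= np.roll(rho, shift, axis=(0, 1, 2))
    return int(is_peak.sum())
```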

Figure 3. Sequence alignment of SARS-CoV-1 and SARS-CoV-2 nsp1, with differences in sequence highlighted.

Complementary data on the protein’s secondary structure features were obtained via 1H-, 13C- and 15N-NMR in solution and published in May 2021, revealing details about the Cβ positions of the full-length apo structure and indicating intramolecular interactions between the two ordered domains [20].
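
Secondary-structure information of this kind is commonly derived from secondary chemical shifts: the deviation of each residue's observed Cα and Cβ shifts from random-coil reference values, combined as Δδ(Cα) − Δδ(Cβ), tends to be positive in α-helices and negative in β-strands. The sketch below illustrates only that general idea and is not the protocol used in [20]. The ±1.0 ppm threshold, the minimum run of three residues and the random-coil values in the usage example are illustrative assumptions; real analyses take random-coil values from a published table and treat glycine (which has no Cβ) separately.

```python
def classify_secondary_structure(observed, random_coil, threshold=1.0, min_run=3):
    """Return one label per residue: H (helix-like), E (strand-like) or C (coil).

    observed    -- list of (residue_type, ca_ppm, cb_ppm) in sequence order
    random_coil -- dict mapping residue_type -> (ca_ppm, cb_ppm) reference values
    threshold   -- assumed cut-off in ppm for Δδ(Cα) − Δδ(Cβ)
    min_run     -- assumed minimum number of consecutive residues for H or E
    """
    labels = []
    for res, ca, cb in observed:
        rc_ca, rc_cb = random_coil[res]
        combined = (ca - rc_ca) - (cb - rc_cb)
        if combined >= threshold:
            labels.append("H")
        elif combined <= -threshold:
            labels.append("E")
        else:
            labels.append("C")

    # Reset helix/strand runs shorter than min_run back to coil.
    result = list(labels)
    start = 0
    while start < len(labels):
        end = start
        while end < len(labels) and labels[end] == labels[start]:
            end += 1
        if labels[start] != "C" and end - start < min_run:
            result[start:end] = ["C"] * (end - start)
        start = end
    return "".join(result)


# Placeholder random-coil values for illustration only, not a curated table.
RC_EXAMPLE = {"ALA": (52.0, 19.0), "LEU": (55.0, 42.0)}
```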

A structure of the SARS-CoV-2 nsp1 N-terminal domain (amino acids 10 to 126) at atomic resolution (0.99 Å, PDB entry: 8A55) was reported in October 2022 by Ma et al. [17]. The group crystallized this domain using common standardized crystallization screens, obtaining three types of crystals, all in space group P4₃2₁2.

Figure 4. NMR ensemble of SARS-CoV-2 nsp1 in solution, showing a subset of four structures from a structural ensemble (PDB entry: 8AOU). The disordered C-terminal domain (purple) shows affinity towards the N-terminal domain’s (blue) face containing the α-helix (α1) that blocks one side of the β-sheet barrel.

A complete NMR structure model of both domains was reported in January 2023 by Wang et al. [21]. It shows that the C-terminal domain, which is structurally disordered in free nsp1, interacts with the same region of the N-terminal domain that shows affinity towards viral mRNA when the protein is bound to a ribosome.

Structural features

The N-terminal globular domain comprises residues 13 to 128 and is made up of six β-strands forming a mixed parallel/antiparallel barrel and an α-helix covering one of its openings (Figure 5), with a short 3₁₀ helix (η1) next to the barrel. The globular domain's sequential arrangement is β1-α1-β2-η1-β3-η3-β4-β5-β6, and the order of strands in the barrel is β1-β2-β5-β3-β4-β6(-β1). The protein's N- and C-terminal loops, residues 1 to 12 and 129 to 147 respectively, are flexible and disordered [13]. A recent study reports an additional flexible 3₁₀ helix (η3) located between β3 and β4, comprising amino acids 80 to 83 [17].

Figure 5. N-terminal globular domain of SARS-CoV-2 nsp1 (PDB entry: 7EQ4). The β-strands form a semi-open barrel structure, with one aperture blocked by an α-helix (α1).

The C-terminal domain (residues 148 to 180) contains the two helices α2 (residues 153 to 159) and α3 (residues 166 to 178), which anchor the protein to the ribosome [22]. This fundamental interaction is described in the following section.

The two domains are connected by a disordered, 20-residue linker (residues 128 to 147), which is usually omitted in structural studies.

Interaction with ribosomes

The C-terminus of nsp1 binds to the 40S subunit of the host ribosome and thereby inhibits translation. This inhibition causes major disruption to the cellular metabolism and disables important immune response pathways [7].

Human ribosomes, also known as 80S ribosomes (the number indicates their sedimentation coefficient in Svedberg units), consist of two subunits referred to as 40S and 60S. Once nsp1 is released into the cytoplasm of an infected cell, its C-terminus interacts with the 40S ribosomal subunit in different ways, forming various complexes that may involve additional molecular components. These complexes can include TSR1, a ribosome biogenesis factor present in the cytosol, or various eukaryotic initiation factors such as eIF1, eIF1A, eIF3 and an eIF2–tRNAi–GTP (guanosine triphosphate) complex, in which case they effectively constitute 43S preinitiation complexes (PIC) [6].
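
The S values above explain why a 40S and a 60S subunit combine into an 80S rather than a 100S particle: a sedimentation coefficient depends not only on mass but also, through the frictional coefficient, on size and shape. The Svedberg relation, added here only for orientation, makes this explicit:

```latex
s = \frac{v_{\mathrm{sed}}}{\omega^{2} r}
  = \frac{M\,(1 - \bar{v}\,\rho)}{N_A\, f},
\qquad 1\,\mathrm{S} = 10^{-13}\,\mathrm{s}
```

Here v_sed is the sedimentation velocity at radius r in a rotor spinning at angular velocity ω, M is the molar mass, v̄ the partial specific volume, ρ the solvent density, N_A Avogadro's constant and f the frictional coefficient. For a compact, roughly spherical particle f = 6πηR grows with the radius R, which itself scales with M^(1/3), so s increases roughly as M^(2/3) rather than in proportion to M. Sedimentation coefficients of subunits are therefore not additive.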

Complexes formed between nsp1’s C-terminus and complete 80S ribosomes may also involve additional components, including CCDC124 (coiled-coil domain containing short open reading frame 124), ABCE1 (an ATP-hydrolysing enzyme involved in ribosome recycling), eRF1 (a translation termination factor) or the LYAR (Ly 1 antibody reactive) protein, as well as an eEF1A–GTP–tRNA complex. All of these 80S complexes are translationally inactive, and some of these interactions had not been observed in an nsp1-free environment before [6]. The binding mechanism between nsp1 and the ribosomal subunit is the same in all cases:

Nsp1’s C-terminal helix α2 interacts hydrophobically with ribosomal protein uS5 of the 40S body domain and electrostatically with ribosomal protein uS3 of the 40S head domain, placing nsp1 inside the ribosome’s mRNA entry channel and locking the head domain in a position that prevents loading of mRNA, thus making translation impossible. A KH motif (residues 164 and 165) in the loop following α2 can then interact with the pseudoknot structure of helix h18 in the rRNA (Figure 6). Helix α3 of nsp1’s C-terminal domain forms additional stabilizing interactions with h18 and uS5 [6]. While nsp1 exhibits no affinity towards 60S ribosomal subunits, its affinity to the 40S subunit, and therefore to the entire 80S ribosome, relies on the KH motif in its C-terminal domain. Mutations at these positions disrupt the co-localization of nsp1 with both 40S subunits and 80S ribosomes in human embryonic kidney (HEK) cells. Consistent with this, Thoms et al. showed experimentally that K164A and H165A mutants of both SARS-CoV-1 and SARS-CoV-2 nsp1 did not bind 40S ribosomal subunits, unlike the wild-type proteins [6].

Figure 6. Interaction between SARS-CoV-2 nsp1 (purple), human ribosomal RNA (bright orange) and ribosomal proteins (mainly 40S, dark orange), with a highlight on nsp1’s KH motif (molecular representation in the zoomed-in view) mediating the interaction (PDB entry: 6ZOJ).

By limiting the rotational movement of the head domain of the 40S subunit, nsp1 sterically prevents the ribosome from adopting the physiological conformation it needs to load the host cell’s mRNA. Through this blockade, nsp1 plays an important role in suppressing the host’s immune response by preventing expression of proteins involved in immune pathways, such as cytokines and interferons. SARS-CoV-2’s own viral RNA is able to overcome the blockade. Several mechanisms involving specific stem–loop structures in the 5′ untranslated region (UTR) of the viral RNA have been proposed [1], possibly involving the host initiation factor eIF3 [6,7]. In a preprint, Shi et al. suggest that the viral 5′ UTR interacts with the C-terminal domain of nsp1 in a way that makes simultaneous interaction with the ribosome sterically impossible, thereby lifting the blockade for the viral mRNA [23]. It has also been suggested that, while not all ribosomes are blocked by nsp1, the remaining ones are five times more likely to translate mRNAs containing the viral 5′ UTR, making translation of viral RNA significantly more efficient [3].

Complementary knowledge

Similarity to SARS-CoV-1

SARS-CoV-2 nsp1 is highly homologous to SARS-CoV-1 nsp1 both in sequence (84% identity) and in structure. However, there are a number of structural differences between the two proteins that may slightly alter their function in the context of their respective viral life cycles. For instance, residues 23 to 25, a sequence present in both SARS-CoV-1 and SARS-CoV-2 nsp1, fold as a 3₁₀ helix (η1) in SARS-CoV-2 but not in SARS-CoV-1. As there are no sequence differences in the immediate vicinity that could stabilize this feature, it is instead supported by a new interaction with residue Q63 on η2 (η1 in SARS-CoV-1; residues 63 to 65) [16].
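
Percent-identity figures such as the 84% quoted here (or the 12% between PEDV and SARS-CoV-1 nsp1 mentioned in the summary) depend on both the alignment and the denominator used. The sketch below assumes the two sequences have already been aligned by a separate tool and counts identity over gap-free columns only; it is one common convention and not necessarily the one behind the cited values. The sequences in the usage line are hypothetical toy strings, not nsp1.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Other conventions divide by the full alignment length or by the length of
    the shorter sequence, which can give noticeably different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# Toy example with hypothetical sequences: 7 identical out of 8 gap-free columns.
print(percent_identity("ACDEFGHIK", "ACDEYGH-K"))
```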

Furthermore, SARS-CoV-2’s β4-strand (residues 84 to 92) contains four additional amino acids due to the adjacent mutations K84V, V85M and M92L. This extension allows auxiliary interactions involving the loops after β3 and β4, causing a significant mean displacement of their Cα atoms (16.6 Å and 27.6 Å, respectively). This new network of polar interactions stabilizes an additional short β-strand not present in SARS-CoV-1 (β5; residues 95 to 97). Similarly, β1 (residues 13 to 20), β3 (residues 68 to 73) and β6 (β5 in SARS-CoV-1; residues 103 to 109 in SARS-CoV-2) have each been extended by one amino acid. Three mutations (A38V, E44Q and N48D) are located on the structurally conserved α1-helix (residues 34 to 49), and the strands β2 (residues 51 to 54) and β7 (β6 in SARS-CoV-1; residues 117 to 123 in SARS-CoV-2) are conserved as well [16]. An NMR study of nsp1 secondary structure, on the other hand, reports slightly different residue ranges for many of these features and found no evidence for the SARS-CoV-2-specific β5-strand observed by X-ray crystallography. It assigns this sequence to a dynamic region instead, suggesting that the observed difference may be due to crystal packing [20].

Similarity between different SARS-CoV-2 strains

Sequence alignment analysis across different SARS-CoV-2 strains shows that nsp1 rarely mutates: only a single mutation (D75E) is evident, on the C-terminal side of the β4 strand, and it is unlikely to significantly affect the protein’s function because aspartic and glutamic acid both carry an acidic, polar carboxylate side chain [22]. This scarcity of mutations hints at nsp1’s crucial function in the viral life cycle.
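
Substitutions like D75E are written as wild-type residue, position and new residue. A minimal sketch of how such substitutions can be listed and tallied across strains follows, assuming every strain sequence has already been aligned to the reference without insertions or deletions and that numbering starts at residue 1. The sequences in the test are hypothetical toy strings, not real nsp1 variants.

```python
from collections import Counter


def list_substitutions(reference: str, variant: str, start: int = 1) -> list:
    """Substitutions of `variant` relative to `reference`, e.g. ['D75E'].

    Both sequences must be aligned to equal length without indels; residue
    numbering begins at `start`.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=start)
        if ref != alt
    ]


def tally_substitutions(reference: str, strains: list) -> Counter:
    """Count how many strain sequences carry each substitution."""
    counts = Counter()
    for seq in strains:
        counts.update(list_substitutions(reference, seq))
    return counts
```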

Cleavage of host-innate mRNA

In addition to blocking host mRNA translation at ribosomes, the 40S–nsp1 complex causes capped host-innate mRNA to be degraded by inducing endonucleolytic cleavage in the untranslated region (UTR) near its 5′ terminus. However, nsp1 itself does not exhibit RNase activity; it is believed to recruit a host-innate RNase, although it is unclear which one [24]. SARS-CoV-2’s viral RNA is not targeted by this degradation process [1], so viral RNA activity rises relative to host-innate RNA activity over time. A leader sequence present in the 5′ UTR of all SARS-CoV-2 RNAs is thought to protect them against this enzymatic activity. Specifically, a region of the N-terminal domain of ribosome-bound nsp1, which in the free protein is blocked by the C-terminal domain, interacts with the viral 5′ UTR [21]. Hence, the viral RNA is kept from interacting with the endonucleolytically active nsp1–40S complex and instead remains abundantly available for translation by ribosomes free of nsp1 interference. At the same time, host-innate mRNA readily interacts with the complex and is gradually deactivated, until it can no longer compete with the far more abundant viral mRNA for functional ribosomes [3,24].

Immune response interferences

Nsp1 has not only been shown to facilitate the degradation of host-innate mRNA; it also interferes with its nuclear export. Nsp1 binds to NXF1 (nuclear RNA export factor 1), the main receptor for the nuclear export of cellular mRNA. It does not appear to directly abolish NXF1’s ability to bind RNA, but rather changes the conformation of NXF1–RNA complexes to a state that cannot dock with the export adaptors that allow mRNA to pass through the nuclear pore complex, as indicated by a series of in vitro glutathione S-transferase pull-down assays [25]. Such an effect has also been observed for SARS-CoV-1 [26] and other viruses, for example influenza via its NS1 protein [27]. Inhibiting the export of newly transcribed mRNA out of the nucleus severely limits the ability of the infected cell to adjust its transcriptome to the viral threat and prevents the expression of proteins that would be crucial for a functional immune response, such as type-I interferons [28,29]. At the same time, it contributes to the reduction of host-innate mRNA activity in the cytoplasm and hence creates a favourable environment for viral reproduction. The structural mechanism of this inhibition remains to be investigated.

Interestingly, another study suggests that the expression of certain proteins involved in the immune response is in fact upregulated in cells infected with SARS-CoV-2 [30], especially proinflammatory cytokines such as IL-22, an interleukin involved in tissue repair processes that has been shown to play an important role in maintaining the integrity of epithelial cells [31]. This effect is observed in severe cases of COVID-19 requiring hospitalization and intensive care treatment [32], in which increased secretion of these hyperinflammatory factors causes various detrimental complications, including liver damage and heart and kidney failure [33]. How their expression escapes nsp1-mediated downregulation remains unclear.

Figure 7. Interactome of nsp1 with host cell factors, grouped by function: Pyrroline-5-carboxylate reductase 1 and 2 (PYCR1 & PYCR2), mevalonate kinase (MVK), DNA polymerase alpha subunits 1 and 2 (POLA1 & POLA2), DNA primase small and large subunits (PRIM1 & PRIM2) as well as Eukaryotic translation initiation factor 3 subunits G and I (EIF3G & EIF3I).

A comprehensive study of the human cell interactome of SARS-CoV-2 employing tandem affinity purification and proximity labelling strongly suggests an interaction between the nsp1 N-terminal globular domain and the DNA polymerase α complex (Pol α) (see Figure 7). This interaction can also be observed for SARS-CoV-1 nsp1 [34], but not for any nsp1 homologue in other coronaviruses, hinting at an interaction mechanism that is unique to SARS-CoV-1 and SARS-CoV-2 [35]. These results were later confirmed experimentally by cryo-EM using SARS-CoV-1 nsp1, which showed a somewhat higher affinity towards Pol α [36]. The implications of these interactions for SARS-CoV-2 virulence remain to be investigated.

Therapeutic interest in nsp1

The leader protein nsp1 is a crucial pathogenicity factor shared by alpha- and betacoronaviruses. Although there is little sequence conservation between nsp1 variants of different coronavirus genera, their function in the infection cycle is similar: inhibiting the host cell immune response and facilitating expression of viral proteins. Disabling nsp1 activity is therefore a promising strategy for COVID-19 therapy. In addition, only very few mutations have been identified in the SARS-CoV-2 nsp1 sequence across viral strains [12], so the leader protein is a drug target that is unlikely to escape therapeutic approaches by mutation.

A virtual screening of potential drug molecules, assessing their stability in complex with nsp1 by molecular dynamics (MD) simulations, identified a total of 16 compounds with high affinity independent of protein conformation. Three molecules stood out with the most favourable energy scores (ranging from −6.9 to −10.4 kcal mol⁻¹), namely tirilazad (DB13050), phthalocyanine (DB12983) and Zk-806450 (DB2112). These three even exceeded the calculated binding affinity of alisporivir and cyclosporine, two compounds previously identified as potent nsp1 inhibitors in SARS-CoV-1. The study concludes that the specific binding pockets for these drug molecules in SARS-CoV-2 nsp1, which are all located in the protein’s C-terminal domain, do not naturally occur in the human proteome, making severe side effects of medication with these compounds unlikely. However, conclusions drawn from this in silico investigation remain speculative for actual biological systems and require further analysis in vitro [4].
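
Energy scores like these are often read as approximate binding free energies. Under the strong assumption that a score equals the standard binding free energy ΔG° (1 M standard state, 25 °C), the relation ΔG° = RT ln Kd converts it into a dissociation constant. The sketch below performs only this arithmetic; it is not part of the cited study [4], and docking or MD scoring functions are generally not calibrated well enough for the result to be more than an order-of-magnitude guide.

```python
import math

# Gas constant in kcal mol^-1 K^-1 (8.314462618 J mol^-1 K^-1 divided by 4184 J per kcal).
R_KCAL = 8.314462618 / 4184.0


def kd_from_delta_g(delta_g_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant in mol/L implied by ΔG° = RT ln Kd (1 M standard state)."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL * temperature_k))


# Convert the range of scores quoted above, assuming they behave like ΔG° values.
for score in (-6.9, -10.4):
    print(f"{score:6.1f} kcal/mol -> Kd ~ {kd_from_delta_g(score):.1e} M")
```

With these assumptions, −6.9 kcal mol⁻¹ corresponds to a Kd of roughly 9 µM and −10.4 kcal mol⁻¹ to roughly 24 nM; at 25 °C, every 1.36 kcal mol⁻¹ changes Kd by about a factor of ten.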

Another molecular dynamics study screened more than 5,000 phytochemicals traditionally used in drug development for the stability of their interaction with the inter-motif loop (K164 and H165) in nsp1’s C-terminal domain, which is responsible for the protein’s ribosomal affinity. Density functional theory (DFT) calculations were performed to support the results. The five most stable complexes (with energy scores ranging from −9.63 to −8.75 kcal mol⁻¹) were formed with the compounds dihydromyricetin, 10-demethylcephaeline, dihydroquercetin, pseudolycorine and tricetin. All of these molecules are currently being assessed as drugs for various diseases, including diabetes mellitus, hepatitis and cancer, which may speed up the preparation required for clinical studies in the context of COVID-19 [22].

A recent screening study identified two binding sites on the N-terminal domain of nsp1 for a total of five ligand fragments, all containing phenyl rings and, in three cases, five-membered heterocycles [17]. The first binding site consists of β1, β7 and several amino acids from α1, while the second, shallower site is located between α1, β6, β7 and η1. The screening was performed on crystals of nsp1 residues 10 to 126, and the interactions between the N-terminal domain and the ligands involved additional copies of the protein in the crystal (PDB entries: 8AZ9, 8AYS, 8ASQ, 8AZ8 and 8AYW). The binding pockets are not present in SARS-CoV-1 or MERS-CoV nsp1 [17].
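
In a crystal, the additional copies of a protein that contact a ligand are typically generated from the deposited chain by the space-group symmetry operators and lattice translations. As a sketch, assuming the relevant copies are crystallographic symmetry mates and that the screening crystals share the space group P4₃2₁2 reported for these nsp1 N-terminal domain crystals [17], the eight general-position operators of P4₃2₁2 (No. 96, standard setting) can be applied to fractional coordinates. The operators are transcribed here for illustration and should be checked against International Tables for Crystallography, Vol. A, or a crystallographic library before real use. Conversion to Cartesian coordinates (which needs the unit-cell parameters) and neighbouring lattice translations are left out.

```python
from fractions import Fraction

HALF, QUARTER, THREE_QUARTERS = Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)

# General-position operators of P4(3)2(1)2 (No. 96), acting on fractional
# coordinates as x' = R·x + t. Transcribed for illustration; verify before use.
P43212_OPS = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0, 0, 0)),                       # x, y, z
    (((-1, 0, 0), (0, -1, 0), (0, 0, 1)), (0, 0, HALF)),                  # -x, -y, z+1/2
    (((0, -1, 0), (1, 0, 0), (0, 0, 1)), (HALF, HALF, THREE_QUARTERS)),   # -y+1/2, x+1/2, z+3/4
    (((0, 1, 0), (-1, 0, 0), (0, 0, 1)), (HALF, HALF, QUARTER)),          # y+1/2, -x+1/2, z+1/4
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (HALF, HALF, THREE_QUARTERS)),  # -x+1/2, y+1/2, -z+3/4
    (((1, 0, 0), (0, -1, 0), (0, 0, -1)), (HALF, HALF, QUARTER)),         # x+1/2, -y+1/2, -z+1/4
    (((0, 1, 0), (1, 0, 0), (0, 0, -1)), (0, 0, 0)),                      # y, x, -z
    (((0, -1, 0), (-1, 0, 0), (0, 0, -1)), (0, 0, HALF)),                 # -y, -x, -z+1/2
]


def apply_op(op, frac_xyz):
    """Apply one symmetry operator and wrap the result back into the unit cell [0, 1)."""
    rot, trans = op
    return tuple(
        (sum(r * c for r, c in zip(row, frac_xyz)) + t) % 1
        for row, t in zip(rot, trans)
    )


def symmetry_mates(frac_xyz):
    """All eight symmetry-equivalent positions of one atom within the unit cell."""
    return [apply_op(op, frac_xyz) for op in P43212_OPS]
```

Using exact fractions avoids rounding issues when checking whether two positions coincide; the same functions also accept ordinary floats.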

Another study reports a total of four putative binding sites on the surface of nsp1’s N-terminal domain, a result obtained from an analysis based on unbiased MD simulation [37]. Crystallographic experiments were unable to confirm these findings, as the binding sites were at least partially blocked by crystal packing.

Nsp1’s interaction with the 5′ untranslated region of viral mRNA could be altered to obtain an attenuated live vaccine [1]. Studies creating such an attenuated viral strain by purposeful deletions in the nsp1 sequence have been successfully conducted with the protein’s homologue from mouse hepatitis virus (MHV) [38].

Summary

Leader protein nsp1 is encoded at the 5′ terminus of ORF1a of the SARS-CoV-2 genome and is the first viral protein to be expressed by an infected host cell [7]. This relatively short protein is present in a subset of betacoronaviruses. Certain alphacoronaviruses also encode a protein with a similar secondary structure that fulfils similar roles and is therefore also called nsp1, but the sequences of the two proteins show very little homology (for instance, porcine epidemic diarrhoea virus (PEDV) and SARS-CoV-1 nsp1 share a sequence identity of only 12%) [12]. As an important virulence and pathogenicity factor, nsp1 abolishes the host’s protein biosynthesis by deactivating ribosomes and inducing endonucleolytic cleavage of capped mRNA. SARS-CoV-2’s own viral mRNA is not susceptible to these cleavage events facilitated by the nsp1–40S complex. In this way, nsp1 alters the host cell’s transcriptome and promotes the expression of viral components while downregulating all other processes involving protein synthesis, including various immune response pathways [1].

Discussion & outlook

With its critical role of inhibiting host-innate protein expression while simultaneously facilitating translation of the viral genome, nsp1 is very important for the SARS-CoV-2 infection cycle. Although nsp1 is a rather small protein of only 180 amino acids in two distinct domains, its full-length structure has long remained elusive. This is because the C-terminal domain is largely disordered in the absence of a binding partner, and its mechanisms of interaction with the host cell translation machinery are not yet completely understood [12]. Nonetheless, recent studies offer insights into possible interactions between nsp1 and the human ribosome and shed light on the mechanisms involved [21]. Studies have shown that nsp1 inhibits major immune response pathways, downregulating the expression of several interferons and cytokines. It has also been suggested to promote the expression of certain immune response factors, thereby contributing to the cytokine storm responsible for the organ failure behind many COVID-19 fatalities [39]. Understanding these processes on a molecular level is crucial to prevent life-threatening cytokine storms in COVID-19 patients [33].

The characteristic structure of nsp1 allows it to engage in very specific interactions that are indispensable for the viral life cycle [6]. Its sequence is highly conserved between different SARS-CoV-2 strains, exhibiting a remarkably low mutation rate [22]. Even though not all mechanisms of the protein are fully understood, these properties make it a promising drug target, allowing precise inhibition of crucial virulence processes with a low risk of severe side effects or drug evasion via mutation.

In silico screenings have identified a number of compounds that show favourable binding affinities to nsp1 and, hence, come into consideration as potential inhibitors. However, further investigation on actual biological systems in vitro or in vivo is necessary [4].

Remaining questions

SARS-CoV-2 leader protein nsp1 is known to interfere with various important physiological processes in infected host cells at the molecular level, including nuclear mRNA export [25], ribosomal translation [7] and enzymatic degradation of host-innate mRNA [24]. Additionally, it has affinity towards ribosomes and nuclear factors, as well as the DNA polymerase α complex [36]. While all of these interactions contribute to the virulence of SARS-CoV-2, it remains unclear which is the most crucial factor. Structural data exist for a number of ribosomal complexes [6]; however, the mechanisms and implications of these processes remain mostly speculative. As similar interactions have been observed for other disease-causing viruses, including other coronaviruses [26] and influenza A virus [27], thorough investigation of these processes promises insights even beyond the immediate threat of the COVID-19 pandemic.

Acknowledgements

This work was supported by the German Federal Ministry of Education and Research under Grants 05K19WWA and 05K22GU5, and by the Deutsche Forschungsgemeinschaft under Grant TH2135/2-1. The authors would also like to thank Rosemary Wilson for support and discussion and Gianluca Tomasello for his assistance with the review’s illustrations. All figures are courtesy of the Coronavirus Structural Task Force, which retains copyright for both the text and the figures.

References

[1] Vankadari N, Jeyasankar NN, Lopes WJ. Structure of the SARS-CoV-2 Nsp1/5′-untranslated region complex and implications for potential therapeutic targets, a vaccine, and virulence. J Phys Chem Lett. 2020;11:9659–9668. doi:10.1021/acs.jpclett.0c02818
[2] Anwar MU, Adnan F, Abro A, et al. Combined deep learning and molecular docking simulations approach identifies potentially effective FDA approved drugs for repurposing against SARS-CoV-2. Comput Biol Med. 2020 [cited 2022 Apr 13];141:105049. doi:10.1016/j.compbiomed.2021.105049. Available from: https://chemrxiv.org/engage/chemrxiv/articledetails/60c74a96f96a007f8228747a.
[3] Schubert K, Karousis ED, Jomaa A, et al. SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation. Nat Struct Mol Biol. 2020;27:959–966. doi:10.1038/s41594-020-0511-8
[4] de Lima Menezes G, da Silva RA. Identification of potential drugs against SARS-CoV-2 nonstructural protein 1 (nsp1). J Biomol Struct Dyn. 2021;39:5657–5667. doi:10.1080/07391102.2020.1792992
[5] Yoshimoto FK. The proteins of severe acute respiratory syndrome coronavirus-2 (SARS CoV-2 or n-COV19), the cause of COVID-19. Protein J. 2020;39:198–216. doi:10.1007/s10930-020-09901-4
[6] Thoms M, Buschauer R, Ameismeier M, et al. Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2. Science. 2020;369:1249–1255. doi:10.1126/science.abc8665
[7] Yuan S, Peng L, Park JJ, et al. Nonstructural protein 1 of SARS-CoV-2 is a potent pathogenicity factor redirecting host protein synthesis machinery toward viral RNA. Mol Cell. 2020;80:1055–1066.e6. doi:10.1016/j.molcel.2020.10.034
16 J. KAUB ET AL.
[8] Tirumalai MR, Rivas M, Tran Q, et al. The peptidyl transferase center: a window to the past.
Microbiol Mol Biol Rev. 2021; 85:e00104-21.
[9] Khatter H, Myasnikov AG, Natchiar SK, et al. Structure of the human 80S ribosome. Nature.
2015;520:640–645. doi:10.1038/nature14427
[10] Altschul SF, Madden TL, Schäffer AA, et al. Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
doi:10.1093/nar/25.17.3389
[11] Connor RF, Roper RL. Unique SARS-CoV protein nsp1: bioinformatics, biochemistry and
potential effects on virulence. TrendsMicrobiol. 2007;15:51–53. doi:10.1016/j.tim.2006.12.005
[12] MinY-Q,MoQ,Wang J, et al. SARS-CoV-2nsp1: bioinformatics, potential structural and functional
features, and implications for drug/vaccine designs. Front Microbiol. 2020;11:587317.
doi:10.3389/fmicb.2020.587317
[13] Almeida MS, Johnson MA, Herrmann T, et al. Novel β-barrel fold in the nuclear magnetic
resonance structure of the replicase nonstructural protein 1 from the severe acute respiratory
syndrome coronavirus. J Virol. 2007;81:3151–3161. doi:10.1128/JVI.01939-06
[14] Jansson AM. Structure of alphacoronavirus transmissible gastroenteritis virus nsp1 has
implications for coronavirus nsp1 function and evolution. J Virol. 2013;87:2949–2955.
doi:10.1128/JVI.03163-12
[15] Shen Z, Ye G,Deng F, et al. Structural basis for the inhibition of host gene expression by porcine
epidemic diarrhea virus nsp1. J Virol. 2018;92:e01896-17.
[16] Clark LK, Green TJ, Petit CM. Structure of nonstructural protein 1 fromSARS-CoV-2. J Virol.
2021;95:e02019-20.
[17] Ma S, Damfo S, Lou J, et al. Two ligand-binding sites on SARS-CoV-2 non-structural
protein 1 revealed by fragment-based X-ray screening. Int J Mol Sci. 2022;23:12448.
doi:10.3390/ijms232012448
[18] Wang Y, Kirkpatrick J, zur Lage S, et al. 1H, 13c, and 15N backbone chemical-shift assignments
of SARS-CoV-2 non-structural protein 1 (leader protein). Biomol NMR Assign.
2021;15:287–295. doi:10.1007/s12104-021-10019-6
[19] Croll T, Diederichs K, Fischer F, et al. Making the invisible enemy visible [Internet]. bioRxiv;
2020 [cited 2022 Jun 2]. Available from: https://www.biorxiv.org/content/10.11012020.10.07.
307546v2.
[20] Agback T, Dominguez F, Frolov I, et al. 1H, 13C and 15N resonance assignment of the SARS-CoV-2
full-length nsp1 protein and its mutants reveals its unique secondary structure features
in solution. PLoS One. 2021;16(12):e0251834. doi:10.1371/journal.pone.0251834
[21] Wang Y, Kirkpatrick J, zur Lage S, et al. Structural insights into the activity regulation
of full-length non-structural protein 1 from SARS-CoV-2. Structure. 2023;31:128–137.e5.
doi:10.1016/j.str.2022.12.006
[22] Prabhu D, Rajamanikandan S, Sureshan M, et al. Modelling studies reveal the importance of the
C-terminal intermotif loop of NSP1 as a promising target site for drug discovery and screening
of potential phytochemicals to combat SARS-CoV-2. J Mol Graphics Modell. 2021;106:107920.
doi:10.1016/j.jmgm.2021.107920
[23] Shi M, Wang L, Fontana P, et al. SARS-CoV-2 Nsp1 suppresses host but not viral translation
through a bipartite mechanism [Internet]. Cell Biol. 2020 [cited 2022 Apr 13]. Available from:
http://biorxiv.org/lookup/doi/10.1101/2020.09.18.302901.
[24] Huang C, Lokugamage KG, Rozovics JM, et al. SARS coronavirus nsp1 protein induces
template-dependent endonucleolytic cleavage of mRNAs: viral mRNAs are resistant to nsp1-induced
RNA cleavage. PLoS Pathog. 2011;7:e1002433. doi:10.1371/journal.ppat.1002433
[25] Zhang K, Miorin L, Makio T, et al. Nsp1 protein of SARS-CoV-2 disrupts the mRNA export
machinery to inhibit host gene expression. Sci Adv. 2021;7:eabe7386. doi:10.1126/sciadv.abe7386
[26] Narayanan K, Huang C, Lokugamage K, et al. Severe acute respiratory syndrome coronavirus
nsp1 suppresses host gene expression, including that of type I interferon, in infected cells. J
Virol. 2008;82:4471–4479. doi:10.1128/JVI.02472-07
[27] York A, Fodor E. Biogenesis, assembly, and export of viral messenger ribonucleoproteins in the
influenza A virus infected cell. RNA Biol. 2013;10:1274–1282. doi:10.4161/rna.25356
[28] Xia H, Cao Z, Xie X, et al. Evasion of type I interferon by SARS-CoV-2. Cell Rep.
2020;33:108234. doi:10.1016/j.celrep.2020.108234
[29] Lei X, Dong X, Ma R, et al. Activation and evasion of type I interferon responses by SARS-CoV-2.
[30] Liao Y, Li X, Mou T, et al. Distinct infection process of SARS-CoV-2 in human bronchial
epithelial cell lines. J Med Virol. 2020;92:2830–2838. doi:10.1002/jmv.26200
[31] Alcorn JF. IL-22 plays a critical role in maintaining epithelial integrity during pulmonary
infection. Front Immunol. 2020;11:1160. doi:10.3389/fimmu.2020.01160
[32] Costela-Ruiz VJ, Illescas-Montes R, Puerta-Puerta JM, et al. SARS-CoV-2 infection: the
role of cytokines in COVID-19 disease. Cytokine Growth Factor Rev. 2020;54:62–75.
doi:10.1016/j.cytogfr.2020.06.001
[33] Fara A, Mitrev Z, Rosalia RA, et al. Cytokine storm and COVID-19: a chronicle of proinflammatory
cytokines. Open Biol. 2020;10:200160. doi:10.1098/rsob.200160
[34] Gordon DE, Hiatt J, Bouhaddou M, et al. Comparative host-coronavirus protein interaction
networks reveal pan-viral disease mechanisms. Science. 2020;370:eabe9403. doi:10.1126/science.abe9403
[35] Chen Z, Wang C, Feng X, et al. Interactomes of SARS-CoV-2 and human coronaviruses
reveal host factors potentially affecting pathogenesis. EMBO J. 2021;40:e107776.
doi:10.15252/embj.2021107776
[36] Kilkenny ML, Veale CE, Guppy A, et al. Structural basis for the interaction of SARS-CoV-2
virulence factor nsp1 with DNA polymerase α–primase. Protein Sci. 2022;31:333–344.
doi:10.1002/pro.4220
[37] Borsatto A, Akkad O, Galdadas I, et al. Revealing druggable cryptic pockets in the
Nsp1 of SARS-CoV-2 and other β-coronaviruses by simulations and crystallography. eLife.
2022;11:e81167. doi:10.7554/eLife.81167
[38] Züst R, Cervantes-Barragán L, Kuri T, et al. Coronavirus non-structural protein 1 is a major
pathogenicity factor: implications for the rational design of coronavirus vaccines. PLoS Pathog.
2007;3:e109. doi:10.1371/journal.ppat.0030109
[39] Hojyo S, Uchida M, Tanaka K, et al. How COVID-19 induces cytokine storm with high
mortality. Inflamm Regen. 2020;40:37. doi:10.1186/s41232-020-00146-3
[40] Zhao K, Ke Z, Hu H, et al. Structural basis and function of the N terminus of SARS-CoV-2
nonstructural protein 1. Microbiol Spectr. 2021;9:e00169-21.
[41] Semper C, Watanabe N, Savchenko A. Structural characterization of nonstructural protein 1
from SARS-CoV-2. iScience. 2021;24:101903. doi:10.1016/j.isci.2020.101903
[42] Ma S, Mykhaylyk V, Bowler MW, et al. High-confidence placement of fragments into electron
density using anomalous diffraction – a case study using hits targeting SARS-CoV-2
non-structural protein 1. Int J Mol Sci. 2023;24(13):11197. doi:10.3390/ijms241311197

This article was written for the journal Crystallography Reviews and was published online on 11 June 2022. For additional information, data and tables, please see the full publication: https://doi.org/10.1080/0889311X.2022.2065270.

Abstract

SARS-CoV-2's endoribonuclease nsp15 (NendoU) is an Mn2+-dependent, uridylate-specific endoribonuclease that SARS-CoV-2 uses to evade the innate immune response by managing the stray RNA generated during replication. At the time of writing, 20 structures of SARS-CoV-2 nsp15 had been deposited in the PDB, most solved by X-ray crystallography and some by cryo-EM. These structures show that an nsp15 monomer consists of three conserved domains: the N-terminal oligomerization domain, the middle domain, and the catalytic NendoU domain. Enzymatically active nsp15 forms a hexamer as a dimer of trimers (point group 32), whose assembly is facilitated by the oligomerization domain. This review summarises the structural and functional information gained from SARS-CoV-2, SARS-CoV, and MERS-CoV nsp15 structures, and compiles current structure-based drug design efforts and complementary knowledge, with a view to providing a clear starting point for downstream structure users interested in studying nsp15 as a novel drug target to treat COVID-19.

Introduction

SARS-CoV-2 is a nidovirus with a non-segmented, positive-sense RNA genome, meaning the genome has the same 5′ to 3′ sense as messenger RNA and can be translated directly into viral proteins; it is effectively messenger RNA. The SARS-CoV-2 genome is one of the largest among RNA viruses [1] and comprises a replicase gene encoding non-structural proteins (nsps), together with genes for structural and accessory proteins. Through a ribosomal frameshift between ORF1a and ORF1b [2], the replicase gene produces two different polyproteins (pp1a and pp1ab). Once translated, these polyproteins are cleaved by one of the two encoded proteases (the 3C-like protease, nsp5, or the papain-like protease, nsp3) to yield up to 16 non-structural proteins, which assemble into a large membrane-bound replication/transcription complex (RTC).

One of these non-structural proteins is nsp15, a 346-amino-acid nidoviral uridylate-specific, Mn2+-dependent [3] RNA endoribonuclease (NendoU). It is encoded towards the end of the non-structural protein region of the SARS-CoV-2 genome, in ORF1b, and corresponds to residues 6453 to 6798 of the pp1ab polyprotein [4]. Nsp15 preferentially cleaves RNA on the 3′ side of uridines, producing a 2′,3′-cyclic phosphodiester and a 5′-hydroxyl terminus [5] (Figure 1). Nsp15 is conserved across coronavirus family members [6], to the point where it has been proposed as a universal genetic marker distinguishing nidoviruses [7] from all other RNA virus families. Although nsp15 is highly conserved (SARS-CoV-2 nsp15 shares 88% sequence identity with SARS-CoV, 50% with MERS-CoV, and 43% with HCoV-229E), it has been found to be non-essential for viral replication in mouse hepatitis virus (MHV) [6], SARS-CoV, and HCoV-229E. Some nsp15 mutations completely abolished RNA synthesis; however, these mutations also resulted in misfolded and insoluble nsp15 when expressed in E. coli [6]. The loss of RNA synthesis is therefore thought to be a knock-on effect on neighbouring polyprotein components that are critical for replication, rather than a genuine consequence of lacking nsp15 [6]. Further evidence of nsp15's non-essential role in viral replication comes from insect nidoviruses and invertebrate roniviruses, which completely lack EndoU activity [8,9].
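
Percent identity values such as these are derived from a pairwise sequence alignment. The sketch below is a minimal illustration, assuming the two sequences have already been aligned (gaps written as '-') and that identity is normalised by the number of gap-free aligned columns. Other conventions, such as dividing by the length of the shorter sequence, give different numbers, and the toy sequences are hypothetical rather than real nsp15 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Identity is normalised by the number of columns in which neither sequence
    has a gap, which is only one of several conventions in common use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free aligned columns")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Hypothetical toy alignment, not real nsp15 sequences.
print(f"{percent_identity('MKT-LLVAG', 'MKSALLIAG'):.1f}% identity")  # 75.0% identity
```

Because the normalisation convention changes the result, identity values quoted from different studies are only strictly comparable when they were computed the same way.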

Figure 1: RNA cleavage by nsp15, giving a 2′,3′-cyclic phosphodiester and a 5′-hydroxyl terminus, illustrated on an RNA nucleotide phosphodiester from PDB entry 1RNA. Figure created using Protein Imager [10].

Although not essential for viral replication, recent studies suggest nsp15 plays a role in repressing activation of the host innate immune response [11–13]. During viral replication, positive-sense RNA is translated to produce the viral replication complex, which copies the positive-sense RNA to produce negative-sense RNA. The negative-sense RNA then acts as a template to produce new positive-sense genomic RNA and subgenomic RNA. Subgenomic RNAs are shorter transcripts produced by initiating transcription in the middle of the template strand (internal initiation), by falling off the template strand before reaching its 5′ end (premature termination), or by jumping along the template strand and reinitiating transcription further down (discontinuous transcription). This process produces short and long double-stranded RNA intermediates with polyuridine tracts at the 5′ end of the negative strand, which can be recognised by host pattern recognition receptors such as RIG-I-like receptors (RLRs), protein kinase R (PKR), oligoadenylate synthetases (OASes), and melanoma differentiation-associated gene 5 (MDA5). These sensors promote an innate antiviral immune response [11,14,15] by activating the type I and III interferon (IFN) response, which induces expression of interferon-stimulated genes through the signal transducer and activator of transcription 1 and 2 (STAT1/2) signalling pathways. By cleaving the 5′-polyuridine tracts in negative-sense viral RNA, nsp15, along with nsp16 and nsp10, limits the accumulation of MDA5-dependent pathogen-associated molecular patterns and so delays the host's immune response [16]. Loss of nsp15 activity has been shown to activate the interferon response and reduce viral titres in piglets infected with nsp15-deficient porcine epidemic diarrhea coronavirus (PEDV) [17] and in mice infected with nsp15-deficient mouse hepatitis virus [11].
It has also been demonstrated that nsp15 plays a role in disrupting formation of autophagosomes, which are double-membraned vesicles containing cellular material to be degraded.  

Structural overview

SARS-CoV-2 nsp15 consists of an N-terminal oligomerisation domain (Figure 2, blue), a middle domain (Figure 2, purple), and the catalytic C-terminal NendoU domain (Figure 2, teal). The oligomerisation domain is formed from an antiparallel β-sheet (β1-3) that wraps around helices α1 and α2. The middle domain consists of three β-hairpins (β5-6, β7-8, and β12-13), a mixed β-sheet (β4, β9, β10, β11, and β15), two α-helices (α3 and α4), and a right-handed 3₁₀ helix (η4). The catalytic NendoU domain contains two antiparallel β-sheets (β16-18 and β19-21) that form a concave surface flanked by five α-helices (α6, α7, α8, α9, and α10). SARS-CoV-2 nsp15 shows high sequence identity with SARS-CoV nsp15 (88%) and lower sequence identity with MERS-CoV nsp15 (51%); however, the overall structural similarity between the three viruses is very high [1]. Three structures have been solved for SARS-CoV nsp15 (PDB entries 2H85 [18], 2OZK [19], and 2RHB [20]), one for MERS-CoV nsp15 (PDB entry 5YVD [21]), two for mouse hepatitis virus nsp15 (PDB entries 2GTH and 2GTI [3]), and one for human coronavirus 229E nsp15 (PDB entry 4S1T). At the time of writing, 20 structures of SARS-CoV-2 nsp15 had been solved with a variety of bound ligands using X-ray crystallography and cryo-EM (Table 1) [1,22,23].

Figure 2: Crystal structure of the nsp15 monomer represented as a transparent surface and cartoon (left) and as a cartoon (right), coloured by domain, using PDB entry 6X4I. The figure was created using Protein Imager [10].

The biological assembly of nsp15 is a double-ringed hexamer made up of a dimer of trimers (point group 32, Figure 3). The trimeric form retains some ribonuclease activity, but the monomer shows only residual cleavage activity [24]. The hexamer is stabilised by the N-terminal oligomerisation domain present in each monomer. A crystal structure of SARS-CoV nsp15 with a 28-amino-acid N-terminal truncation (PDB entry 2OZK) presented a misfolded EndoU active site, suggesting that oligomerisation may act as an allosteric activation switch [19]. The six monomers come together to form the active enzyme, which contains a 100 Å-long, negatively charged channel, 10 to 15 Å wide, that is open to solvent at the top and bottom and through three side openings in the middle of the hexamer. Because formation of the hexamer is essential for enzymatic activity, the oligomerisation interfaces are a potential target for structure-based drug design.
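
Point group 32 (D3 in Schoenflies notation) contains six rotations: the identity, two rotations about a threefold axis, and three twofold rotations perpendicular to it. This is why a dimer of trimers yields exactly six copies of the monomer. The sketch below generates the group from one threefold and one twofold rotation and checks its order. The axis orientation (threefold along z, twofold along x) and the atom position are arbitrary assumptions, not the orientation or coordinates of any deposited nsp15 model.

```python
import numpy as np


def rotation_about_z(angle_deg: float) -> np.ndarray:
    """Proper rotation matrix about the z axis."""
    t = np.radians(angle_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def close_group(generators, atol=1e-8):
    """Return all distinct products of the generators (the generated finite group)."""
    group = [np.eye(3)]
    frontier = [np.eye(3)]
    while frontier:
        new_ops = []
        for g in frontier:
            for h in generators:
                product = h @ g
                if not any(np.allclose(product, m, atol=atol) for m in group):
                    group.append(product)
                    new_ops.append(product)
        frontier = new_ops
    return group


# Assumed orientation: threefold axis along z, one twofold axis along x.
threefold = rotation_about_z(120.0)
twofold_x = np.diag([1.0, -1.0, -1.0])
ops = close_group([threefold, twofold_x])

# A 180-degree rotation has trace -1; a 120-degree rotation has trace 0.
n_twofolds = sum(1 for m in ops if np.isclose(np.trace(m), -1.0))
print(len(ops), "operations,", n_twofolds, "twofold axes")  # 6 operations, 3 twofold axes

# Applying every operation to one hypothetical atom gives its six symmetry copies.
atom = np.array([10.0, 2.0, 5.0])
copies = [m @ atom for m in ops]
```

For a real assembly, coordinates would first need to be expressed relative to the hexamer centre, where the symmetry axes intersect.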

The active site of nsp15 is an electropositive pocket which lies at the interface between each monomer's NendoU domain. The active site is highly conserved between SARS-CoV-2, SARS-CoV, and MERS-CoV. Six key residues (His235, His250, Lys290, Thr341, Tyr343, and Ser294) are arranged in a shallow groove in the C-terminal NendoU domain [1]. His235, His250, and Lys290 are proposed to act as a catalytic triad, using a mechanism similar to that observed in RNase A [23]. However, RNase A is metal-independent, whereas SARS-CoV-2 nsp15 is Mn2+-dependent, so the mechanisms are not an exact match. Mutation of either histidine in the catalytic triad to alanine eliminates RNA cleavage activity but has no effect on the formation of stable hexamers, showing that these residues play no part in nsp15 oligomerisation [22]. This is unsurprising, as the N-terminal oligomerisation domain is the key player in forming the hexamer; conversely, hexamer formation clearly plays an allosteric role in forming the active site, as activity in the monomer is significantly reduced.

Uracil specificity is proposed to be governed by Ser294 [20], with the main-chain nitrogen of Ser294 predicted to interact with the O2 carbonyl oxygen of uracil and the Ser294 hydroxyl group binding to uracil N3. However, mutation studies on homologues have shown that a Ser294Ala mutation significantly decreased activity without completely abolishing it [18], and removed uridine specificity. Tyr343 is also likely to be important for uracil specificity, as suggested by van der Waals stacking between the ribose sugar of uridine and Tyr343 in cryo-EM structures [20,21]. Mutating the residues equivalent to Tyr343 in SARS-CoV and MERS-CoV to alanine caused a near-complete loss of nuclease activity [20,21], suggesting a key role in enzymatic activity.

Figure 3: Structure of the nsp15 hexamer generated by crystallographic symmetry using PDB entry 6X4I. On the left-hand side, the nsp15 hexamer is represented as a transparent surface and cartoon from a side-on view. On the right-hand side, the hexamer has been rotated 90 degrees towards the reader to give a top-down view looking down the 10 to 15 Å wide channel. The hexamer is coloured by trimer, with trimer 1 in blue (one monomer in light blue) and trimer 2 in teal. The figure was created using Protein Imager [10].

The structure of SARS-CoV-2 nsp15 has been solved in the presence of various catalytic intermediates, including 5′UMP (PDB entry 6WLC), 3′UMP (PDB entry 6X4I), 5′GpU (PDB entry 6X1B), and uridine 2′,3′-vanadate (PDB entry 7K1L). All intermediates bound to the C-terminal catalytic domain, interacting with conserved active-site residues (His235, His250, Lys290, Trp333, Thr341, Tyr343, Ser294, Gly248, Lys345, and Val292), and the structures showed no significant conformational deviations from each other (Cα RMSD = 0.29 Å). The uracil moieties of 5′UMP, guanylyl(3′-5′)uridine (GpU), and uridine 2′,3′-vanadate are all bound by Ser294 and Leu346 (Figure 4, top left, bottom left, and bottom right, respectively), reinforcing the idea that uracil recognition is mediated by these residues. Together, these structures support the predicted parallels between the reaction mechanisms of SARS-CoV-2 nsp15 and RNase A. The 5′UMP-, 5′GpU-, and uridine 2′,3′-vanadate-bound structures support the previously proposed hypothesis of uracil and purine base discrimination, with Ser294 playing a key role [23]. In contrast, the 3′UMP-bound structure shows the uracil base stacking against Trp333 (Figure 4, top right), the guanine binding site identified in the 5′GpU complex, suggesting that the nsp15 active site can accommodate both purine and pyrimidine bases. However, the Trp333 site is likely to be less relevant for binding larger RNA molecules, as it offers a stacking platform for any base without selectivity [23].
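
A Cα RMSD such as the 0.29 Å quoted above is normally computed after optimally superposing matched Cα atoms. The sketch below shows that calculation using the Kabsch algorithm in NumPy. It assumes the two coordinate arrays are already paired residue by residue (same order, same length); reading PDB files and choosing which residues to match are left out, and the coordinates are synthetic.

```python
import numpy as np


def superposed_rmsd(mobile: np.ndarray, target: np.ndarray) -> float:
    """RMSD between paired (N, 3) coordinates after optimal rigid superposition (Kabsch)."""
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of the same shape")
    # Remove translation by centring both coordinate sets on their centroids.
    p = mobile - mobile.mean(axis=0)
    q = target - target.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((fitted - q) ** 2, axis=1))))


# Synthetic example: the same "C-alpha trace" in a different pose, plus small noise.
rng = np.random.default_rng(seed=0)
reference = rng.normal(scale=10.0, size=(100, 3))
angle = np.radians(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = reference @ rot_z.T + np.array([5.0, -3.0, 12.0])
noisy = moved + rng.normal(scale=0.2, size=moved.shape)

print(f"{superposed_rmsd(moved, reference):.3f} A")  # rigid-body copy: essentially zero
print(f"{superposed_rmsd(noisy, reference):.3f} A")  # small, set by the added noise
```

Superposition removes differences due only to position and orientation, so the remaining RMSD reflects genuine conformational differences between the matched atoms.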

Figure 4: SARS-CoV-2 nsp15 active site crystal structures with bound reaction intermediates. 5′UMP (PDB entry 6WLC) in the top left, 3′UMP (PDB entry 6X4I) in the top right, 5′GpU (PDB entry 6X1B) in the bottom left, and the cyclic intermediate mimic uridine 2′,3′-vanadate (PDB entry 7K1L) in the bottom right. Proteins are coloured in teal and represented as a cartoon, with active site residues and bound ligands represented as sticks. Bound ligands are coloured white. This figure was made using Protein Imager [10].

Comparison of these ligand-bound structures with the RNase A catalytic site suggests that nsp15 acts through a similar reaction mechanism [23]. Based on these findings, a two-step mechanism has been proposed. The first step is a transphosphorylation reaction in which His250 acts as a general base and deprotonates the 2′-OH of the RNA ribose, with Lys290 stabilising the negative charge that builds up in the transition state, while His235 acts as a general acid, donating a proton to the departing 5′-oxygen. This is followed by a hydrolysis step in which the roles of His250 and His235 are reversed: His235 deprotonates a water molecule and His250 acts as a proton donor for the 2′-oxygen leaving group, converting the 2′,3′-cyclic phosphate into a 2′-OH and a 3′-phosphoryl group. Despite the similar mechanisms, the structural environments of His235 in nsp15 and its RNase A equivalent (His119) differ significantly, with the residues ~8 Å apart and making several different hydrogen-bonding interactions. These differences may explain why nsp15 is much more sensitive to pH changes than RNase A [22]. What remains unclear is the contribution of Mn2+ to the reaction mechanism, particularly as no Mn2+ binding site has been located in SARS-CoV-2 nsp15 [22].

Therapeutic interest of the protein

As previously mentioned, knockout studies on nsp15 have shown that it is not essential for viral replication. Despite this, an nsp15 inhibitor could provide an effective treatment against SARS-CoV-2 by hampering the virus's evasion and modulation of the innate immune response, helping to promote longer-lasting immunity. Targeting nsp15 is particularly interesting because nsp15 has no close human homologues [25], potentially reducing harmful side effects. A number of biochemical assays have been performed to screen previously approved drugs and various libraries for inhibition of nsp15, alongside several in-silico studies docking approved therapeutics to guide drug design efforts. A fragment screening study has also been performed, yielding six small-molecule fragments.

Benzopurpurin 4B, C-473872 (CAS registry number 331675-78-6), and Congo Red, as well as small-molecule RNase A inhibitors, have been shown to inhibit nsp15 activity and reduce the infectivity of SARS-CoV in Vero cells [26], but further testing on SARS-CoV-2 nsp15 is required. Additionally, nsp15 has been screened against the ReFrame [27], Pandemic Response Box (Medicines for Malaria Venture (MMV) and Drugs for Neglected Diseases initiative (DNDi)), and COVID Box drug repurposing libraries for compounds giving 50% inhibition at concentrations below 10 µM, identifying 23, 1, and 0 hits, respectively [25]. Two fluorescence resonance energy transfer (FRET) assays to determine the half-maximal inhibitory concentration (IC50) reduced the hits to 12 (11 from ReFrame and 1 from the Pandemic Response Box). These were whittled down to 3 (Exebryl-1, Piroxantrone, and MMV1580853) after 9 were identified as false positives caused by the production of reactive oxygen species such as H2O2, which destabilised the protein in the assay. Ligand binding was assessed using high-resolution mass spectrometry. Piroxantrone and MMV1580853 showed significantly weaker binding and ultimately no antiviral activity in SARS-CoV-2 assays. Exebryl-1 bound with a dissociation constant (Kd) of ~12 µM per monomer in the first instance, with, on average, approximately four molecules bound per monomer at 100 µM Exebryl-1. Molecular docking of Exebryl-1 against PDB entry 6XDH using an automated QVina docking workflow [28] showed binding in a pocket close to and within the active site. Exebryl-1 demonstrated antiviral activity in three separate assays at concentrations above 10 µM. However, after an oral dose of 100 mg/kg in Sprague-Dawley rats, blood plasma levels reached only 9 µM after 1 hour and dropped to 4 µM after 4 hours, so Exebryl-1 is not expected to reach therapeutic levels in its current form [25].
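
To set the reported Kd against the plasma levels, a simple single-site binding model gives the fraction of sites occupied at ligand concentration [L] as θ = [L] / (Kd + [L]). This is a deliberate simplification. The text reports roughly four Exebryl-1 molecules per monomer at 100 µM, so binding is not truly single-site, and total plasma concentration is not the same as free drug concentration. The sketch only indicates the order of magnitude of target occupancy.

```python
def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Fraction of sites occupied in a single-site model: [L] / (Kd + [L]).

    Assumes the ligand is in excess, so free and total concentrations are equal."""
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("need a non-negative concentration and a positive Kd")
    return ligand_um / (kd_um + ligand_um)


KD_UM = 12.0  # approximate Kd per monomer reported for Exebryl-1
for label, conc_um in [("plasma, 1 h", 9.0), ("plasma, 4 h", 4.0)]:
    print(f"{label}: {fraction_bound(conc_um, KD_UM):.0%} of sites occupied")
# plasma, 1 h: 43% of sites occupied
# plasma, 4 h: 25% of sites occupied
```

Under this crude model, the measured plasma levels would leave most binding sites unoccupied, which is consistent with the conclusion that Exebryl-1 is unlikely to reach therapeutic levels.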

A repurposed colorectal cancer drug, Tipiracil, has been found to partially inhibit nsp15 activity in biochemical assays; however, its efficacy is greatly decreased at higher Mn2+ concentrations. A structure of nsp15 with Tipiracil bound in the uridine binding pocket has also been solved (PDB entry 6WXC). Its uracil ring stacks against Tyr343 and forms several hydrogen bonds with Ser294, Lys345, and His250 (Figure 5), as well as several water- and phosphate-mediated interactions with other active-site residues. The only interaction unique to this ligand is between the iminopyrrolidine nitrogen of Tipiracil and Gln245 (Figure 5). Although not an immediate treatment option, this uracil-derivative drug provides a potential scaffold for further development of SARS-CoV-2 nsp15 inhibitors [23]. Based on Tipiracil binding at the active site, a library of 85 flavonoid compounds was docked to nsp15 (PDB entry 6WXC) and evaluated using molecular mechanics/generalized Born surface area (MMGBSA) calculations and molecular dynamics as part of an in-silico study, but binding was found to be significantly weaker than that of Tipiracil in all cases [29].

Figure 5: Active site of the SARS-CoV-2 nsp15 crystal structure with bound Tipiracil (PDB entry 6WXC). The protein is coloured in teal and represented as a cartoon, with active site residues and bound Tipiracil represented as sticks. Tipiracil is coloured white. This figure was made using Protein Imager [10].

Fragment screens have been performed on nsp15, with six structures currently available in the PDB without an accompanying publication. In addition to the soaked fragments, all of these structures show a citrate molecule bound to the catalytic NendoU domain (Figure 6, CIT). One fragment binds adjacent to the citrate (PDB entry 5S70, Figure 6, EN300-181428 (WUS)) through a stacking interaction with Trp333 and a hydrogen bond between the NO3 hydrogen of EN300-181428 and O5 of the citrate molecule. Four fragments bind at the interface between the middle domain (Figure 6, purple) and the N-terminal oligomerisation domain (Figure 6, blue): FUZS-5 (PDB entry 5S71, Figure 6, WUV), Z2889976755 (PDB entry 5S6X, Figure 6, WUG), BBL029427 (PDB entry 5S72, Figure 6, WUY), and PB2255187532 (PDB entry 5S6Z, Figure 6, WUM). Finally, BBL029427 (PDB entry 5S6Y, Figure 6, WUJ) binds to a loop connecting β-strands in the middle domain. Unfortunately, the crystal packing in these structures does not allow the active double-ringed hexamer to be generated from symmetry-related molecules, making it difficult to assess how the fragments interact with the active hexamer. However, this monomeric crystal form could provide a starting point for designing drugs that disrupt formation of the active hexamer by interfering with surfaces on the N-terminal oligomerisation domain.

Figure 6: Small-molecule fragment screening against SARS-CoV-2 nsp15, with nsp15 represented as a flat field coloured by domain (NendoU in teal, middle domain in purple, and N-terminal oligomerisation domain in blue). Fragment binding is shown as a flat field, coloured grey, with ligands represented as sticks in surrounding circles. This is a composite image of PDB entries 5S70 (EN300-181428, WUS), 5S71 (FUZS-5, WUV), 5S6X (Z2889976755, WUG), 5S72 (BBL029427, WUY), 5S6Y (BBL029427, WUJ), and 5S6Z (PB2255187532, WUM). This figure was made using Protein Imager.

Molecular docking, all-atom molecular dynamics, and an assessment of absorption, distribution, metabolism, and excretion (ADME) properties have been carried out on PDB entry 6W01 using 15 scalarane sesterterpenes, compounds purified from Red Sea marine sponges with a variety of relevant pharmacological activities, to assess their potential as nsp15 inhibitors [30]. Eight compounds were found to have binding energies equivalent to or better than that of the reference ligand, Benzopurpurin 4B. All eight compounds bound in the large, shallow active site of the C-terminal catalytic domain, forming polar interactions with the catalytic triad (His235, His250, and Lys290), π-stacking with Trp333, forming at least one hydrogen bond with Lys290, and making further anchoring hydrogen bonds with Gly248 and/or Gln245 [30]. Two of the eight were used in all-atom molecular dynamics simulations and showed good stability, strongly negative binding free energies, and good ADME drug-property predictions.

In-silico docking of 32 phytochemicals from Asparagus racemosus has also been performed on nsp15 (PDB entry 6W01). The top five ligands (Asparoside-C, Asparoside-F, Rutin, Asparoside-D, and Racemoside-A) bound at the C-terminal active site with binding free energy scores between ‒7.165 kcal/mol and ‒5.993 kcal/mol. Complexes of nsp15 with Asparoside-C, -F, and -D were subjected to further analysis by 100 ns molecular dynamics simulations, which found Asparoside-D and -F to have favorable binding interactions and better affinity than the control ligand Remdesivir [31]. Twenty-three previously approved drugs have also been docked to nsp15, with three (Saquinavir, Aprepitant, and Valrubicin) showing high predicted binding affinities between ‒9.1 and ‒9.6 kcal/mol [32]. However, the pocket to which Saquinavir, Aprepitant, and Valrubicin were docked lies on the opposite side from the active-site pocket housing the catalytic triad, approximately 17 Å away. Barring an as-yet undetermined allosteric effect of binding at this site, which the paper does not mention, the proposed further development of these compounds ("…modifying them to fit to the SARS-CoV-2 nsp15 active site pocket precisely") needs to be rethought, because the active site was not targeted in the first place.
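
Docking scores reported in kcal/mol are often read as estimated binding free energies. Under that reading, they can be converted to an apparent dissociation constant through ΔG = RT ln Kd (1 M standard state). The sketch below performs this conversion at an assumed temperature of 298.15 K. Docking scores are not calibrated free energies, so the resulting values only help compare the numbers quoted above and are not predicted affinities.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal/(mol*K)


def apparent_kd_molar(delta_g_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Apparent dissociation constant (mol/L) from dG = RT ln(Kd), 1 M standard state."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


# Scores quoted in the text, treated (optimistically) as binding free energies.
for score in (-5.993, -7.165, -9.1, -9.6):
    kd_um = apparent_kd_molar(score) * 1e6
    print(f"{score:7.3f} kcal/mol -> apparent Kd ~ {kd_um:.2g} uM")
```

At room temperature, each additional -1.36 kcal/mol corresponds to roughly a tenfold lower apparent Kd.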

Complementary knowledge

Although the enzymatic activity and crystal structure of nsp15 have been characterised, its exact role in viral replication remains unclear. SARS-CoV nsp15 has been shown to co-localise with replicating RNA around the nucleus [33] and, in in situ studies, with nsp8 and nsp12 from the replication/transcription complex in both the presence and absence of RNA [34]. SARS-CoV nsp15 was also shown not to co-localise with the M protein [34]. Yeast two-hybrid screens and glutathione S-transferase (GST) pulldown assays have also identified nsp8 and nsp12 as potential binding partners of SARS-CoV nsp15 [35].

Furthermore, nsp15, along with nsp13, nsp14, and accessory protein ORF6, has shown a strong inhibitory effect on interferon (IFN) production and on nuclear localisation of interferon regulatory factor 3 in in-vitro co-expression assays stimulated with the Cantell strain of Sendai virus [36]. However, interferon antagonism under in-vitro conditions is not necessarily representative of real infection: individual protein expression levels during infection can differ greatly from those in overexpression studies, and altered localisation can have a significant effect [36]. Yuen et al. (2020) do not discuss the individual contribution or mechanism of nsp15-mediated interferon inhibition in this study. Overall, SARS-CoV-2 appears to be less effective than SARS-CoV at suppressing interferon signalling, owing to the loss of the interferon-antagonist function of the SARS-CoV-2 papain-like protease (PLpro) [36]. Reverse genetic studies (analysis of the phenotype resulting from genetic engineering) have instead suggested that ORF6 is the major player in interferon suppression [37]. However, ORF6 is also poorly conserved between SARS-CoV and SARS-CoV-2, with only 69% sequence identity, and only 4 of the 10 key amino acids identified in SARS-CoV ORF6 are present in SARS-CoV-2 ORF6 [36].

SARS-CoV-2 nsp15 activity has been shown to be highly dependent on the presence of Mn2+ ions, with greatly reduced activity in the presence of Mg2+ ions. In the presence of Mn2+, nsp15 was able to cleave all four uridine sites in an eicosamer (a 20-nucleotide oligomer) with the sequence 5′-GAACU↓CAU↓GGACCU↓U↓GGCAG-3′, showing no sequence preference and an increasing cleavage rate with rising metal ion concentration [23]. This is particularly interesting because Mn2+ enhances the activity of SARS-CoV nsp15 without being required for it, and no metal-binding sites have been identified in coronavirus nsp15 structures to date [18]. Considering that SARS-CoV-2 nsp15 shares 88% sequence identity with SARS-CoV nsp15 and that all active-site residues are conserved, the dependence of SARS-CoV-2 nsp15 on Mn2+ is a significant difference between the enzymes. Furthermore, nsp15 alone is promiscuous, cutting at any uridine site in RNA, but becomes site-specific in complex with nsp8 and nsp12, leaving uridine tails 5 to 10 bases long [16].
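
The cleavage pattern described for the eicosamer follows directly from the rule that nsp15 alone cuts on the 3′ side of every uridine. The sketch below applies that rule to a 5′ to 3′ RNA string and returns the fragments. It models only the cut positions. Sequence context, RNA secondary structure, the site specificity reported for the nsp8/nsp12 complex, and the 2′,3′-cyclic phosphate chemistry are all ignored.

```python
def cleave_after_uridines(rna: str) -> list[str]:
    """Fragments from cutting a 5'->3' RNA on the 3' side of every internal U."""
    rna = rna.upper()
    if set(rna) - set("ACGU"):
        raise ValueError("expected an RNA sequence of A, C, G and U only")
    fragments, start = [], 0
    for i, base in enumerate(rna):
        # Cut after each uridine, except a uridine at the very 3' end.
        if base == "U" and i < len(rna) - 1:
            fragments.append(rna[start:i + 1])
            start = i + 1
    fragments.append(rna[start:])
    return fragments


eicosamer = "GAACUCAUGGACCUUGGCAG"  # the 20-mer from the text, written 5' to 3'
pieces = cleave_after_uridines(eicosamer)
print(len(eicosamer), len(pieces) - 1, pieces)
# 20 4 ['GAACU', 'CAU', 'GGACCU', 'U', 'GGCAG']
```

The four cuts reproduce the arrows in the sequence above, including the single-nucleotide U fragment released from the adjacent UU pair.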

A library of 5000 small-molecule compounds has been screened against nsp15 for inhibition of nuclease activity, with twelve compounds showing potential as antiviral treatments in a fluorescence-based biochemical kinetic screen. Further analysis using a gel-based assay found only one compound, NSC95397, able to inhibit nuclease activity at a concentration of 10 µM. However, tests on SARS-CoV-2-infected Vero E6 cells found the compound to be toxic at concentrations above 10 µM and ineffective at inhibiting viral growth at lower concentrations [38].

A fluorescence resonance energy transfer (FRET) assay has been used to measure nsp15 activity on a 6-mer oligonucleotide (5′-AAAUAA) carrying a 5′-fluorescein label and a 3′-TAMRA label [21,22]. Activity is measured as an increase in fluorescence, caused by cleavage separating the 5′-fluorescein from the 3′-TAMRA label. Nsp15 activity was confirmed for the wild-type protein and abolished in the H235A and H250A mutants [22]. FRET analysis was paired with liquid chromatography electrospray ionisation mass spectrometry to show that the 3′ ends of nsp15 RNA products predominantly carry a 2′,3′-cyclic phosphate (80%) rather than a 3′-phosphate. This is a significant difference from RNase A, which generates a 2′,3′-cyclic phosphate that is then hydrolysed to a 3′-phosphate.
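
The readout relies on Förster resonance energy transfer, whose efficiency falls off with the sixth power of the donor–acceptor distance r: E = 1 / (1 + (r/R0)^6). On the intact 6-mer, the fluorescein donor and TAMRA acceptor are close, so much of the donor's energy is transferred. Once nsp15 cuts the substrate, the two dyes separate and donor fluorescence rises. The sketch below illustrates this relationship, assuming a Förster radius R0 of about 50 Å for this dye pair and using purely illustrative distances. Neither number comes from the cited studies.

```python
def fret_efficiency(distance_angstrom: float, r0_angstrom: float = 50.0) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    if distance_angstrom <= 0 or r0_angstrom <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (distance_angstrom / r0_angstrom) ** 6)


def relative_donor_emission(distance_angstrom: float, r0_angstrom: float = 50.0) -> float:
    """Donor emission relative to an unquenched donor, 1 - E."""
    return 1.0 - fret_efficiency(distance_angstrom, r0_angstrom)


# Illustrative distances: dyes on an intact short oligo vs. separated products.
for state, r in [("intact substrate", 25.0), ("cleaved, dyes apart", 200.0)]:
    print(f"{state}: E = {fret_efficiency(r):.3f}, "
          f"donor emission = {relative_donor_emission(r):.3f}")
```

Because of the sixth-power dependence, the signal changes sharply around R0, so the assay behaves almost like an on/off switch between intact and cleaved substrate.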

Summary

SARS-CoV-2 nsp15 is a uridylate-specific, Mn2+-dependent [3] RNA endoribonuclease of the nidoviral EndoU (NendoU) family, which acts on single-stranded and double-stranded RNA to help SARS-CoV-2 evade detection by the innate immune response. Knockout studies have demonstrated that nsp15 is not essential for viral replication, but studies of nsp15-deficient coronaviruses such as PEDV and MHV have shown reduced viral titres and virulence in the presence of an effective immune response.

The sequence of nsp15 is highly conserved between SARS-CoV-2, SARS-CoV, MERS-CoV, and HCoV-229E, as is the fold of the monomer and of the active hexamer. The monomer consists of three domains: the N-terminal oligomerisation domain, a middle domain, and the NendoU catalytic domain, which houses the active site. The active site is a shallow groove made up of six key residues (His235, His250, Lys290, Thr341, Tyr343, and Ser294). A series of structures with different catalytic intermediates has been solved, and the reaction mechanism is predicted to be similar to that of the well-studied enzyme RNase A. However, nsp15's dependence on manganese, whereas RNase A's activity is metal-independent, casts some doubt on this theory.

Three in-silico drug screening studies have been performed on nsp15, two using 6W01 and one using 6WXC as the protein model. 6W01 is a citrate-bound nsp15 structure solved to 1.9 Å resolution with acceptable data processing and refinement statistics overall; the only minor concern is that 5% of the residues in both chains show one issue with their geometry, and a small subset of those also show an issue in their fit to the electron density. 6WXC is a tipiracil-bound nsp15 structure solved to 1.85 Å resolution; it has a similar minor problem, with 7% of residues in both chains showing one issue with their geometry, but it has fewer electron density fit outliers. Use of either model should present no major stumbling blocks for simulation studies.

Discussion & Outlook

Nsp15 has been one of the less explored SARS-CoV-2 proteins compared with others, such as the main protease and the papain-like protease, which have undergone extensive in-silico drug design studies through a number of large collaborative efforts between universities, synchrotrons, and other organizations [39–45] feeding into the COVID Moonshot project [46]. Overall, the structural work on nsp15 has been sound, and all available models could provide a good starting structure for computational drug design. A series of structures with catalytic intermediates suggests a mechanism akin to that of RNase A; however, the dependence of nsp15 on Mn²⁺ suggests a departure from this mechanism, as RNase A's mechanism is metal-independent. Follow-up in-silico studies (described above) were based on well-validated models with acceptable statistics for the resolution at which the structures were solved, although none has yet pointed to a viable lead compound for clinical application. Because nsp15 is not essential for viral replication, it is a much less desirable target for structure-based drug design than essential viral proteins. However, because nsp15 contributes to SARS-CoV-2 virulence by repressing the innate immune response, inhibiting it offers a potential avenue to weaken the virus and allow the immune system to fight off the infection before it becomes more severe.

Acknowledgements

This work was supported by the German Federal Ministry of Education and Research [grant no. 05K19WWA] and the Deutsche Forschungsgemeinschaft [project TH2135/2-1]. The authors would also like to thank Johannes Kaub and Rosemary Wilson for support and discussion. All figures are courtesy of the Coronavirus Structural Task Force (insidecorona.net), which retains copyright for the text and the figures.

[1]        Kim Y, Jedrzejczak R, Maltseva NI, et al. Crystal structure of Nsp15 endoribonuclease NendoU from SARS-CoV-2. Protein Science. 2020;29:1596–1605.

[2]        Cui J, Li F, Shi Z-L. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol. 2019;17:181–192.

[3]        Ivanov KA, Hertzig T, Rozanov M, et al. Major genetic marker of nidoviruses encodes a replicative endoribonuclease. Proc Natl Acad Sci U S A. 2004;101:12694–12699.

[4]        Naqvi AAT, Fatima K, Mohammad T, et al. Insights into SARS-CoV-2 genome, structure, evolution, pathogenesis and therapies: Structural genomics approach. Biochim Biophys Acta Mol Basis Dis. 2020;1866:165878.

[5]        Bhardwaj K, Sun J, Holzenburg A, et al. RNA Recognition and Cleavage by the SARS Coronavirus Endoribonuclease. J Mol Biol. 2006;361:243–256.

[6]        Deng X, Baker SC. An “Old” protein with a new story: Coronavirus endoribonuclease is important for evading host antiviral defenses. Virology. 2018;517:157–163.

[7]        Snijder EJ, Decroly E, Ziebuhr J. Chapter Three - The Nonstructural Proteins Directing Coronavirus RNA Synthesis and Processing. In: Ziebuhr J, editor. Advances in Virus Research [Internet]. Academic Press; 2016 [cited 2022 Jan 7]. p. 59–126. Available from: https://www.sciencedirect.com/science/article/pii/S0065352716300471.

[8]        Nga PT, Parquet M del C, Lauber C, et al. Discovery of the First Insect Nidovirus, a Missing Evolutionary Link in the Emergence of the Largest RNA Virus Genomes. PLOS Pathogens. 2011;7:e1002215.

[9]        Lauber C, Ziebuhr J, Junglen S, et al. Mesoniviridae: a proposed new family in the order Nidovirales formed by a single species of mosquito-borne viruses. Arch Virol. 2012;157:1623–1628.

[10]      Tomasello G, Armenia I, Molla G. The Protein Imager: a full-featured online molecular viewer interface with server-side HQ-rendering capabilities. Bioinformatics. 2020;36:2909–2911.

[11]      Deng X, Hackbart M, Mettelman RC, et al. Coronavirus nonstructural protein 15 mediates evasion of dsRNA sensors and limits apoptosis in macrophages. PNAS. 2017;114:E4251–E4260.

[12]      Kindler E, Gil-Cruz C, Spanier J, et al. Early endonuclease-mediated evasion of RNA sensing ensures efficient coronavirus replication. PLOS Pathogens. 2017;13:e1006195.

[13]      Volk A, Hackbart M, Deng X, et al. Coronavirus Endoribonuclease and Deubiquitinating Interferon Antagonists Differentially Modulate the Host Response during Replication in Macrophages. Journal of Virology [Internet]. 2020 [cited 2022 Jan 6]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.00178-20.

[14]      Kato H, Takeuchi O, Sato S, et al. Differential roles of MDA5 and RIG-I helicases in the recognition of RNA viruses. Nature. 2006;441:101–105.

[15]      Mandilara G, Koutsi MA, Agelopoulos M, et al. The Role of Coronavirus RNA-Processing Enzymes in Innate Immune Evasion. Life (Basel). 2021;11:571.

[16]      Hackbart M, Deng X, Baker SC. Coronavirus endoribonuclease targets viral polyuridine sequences to evade activating host sensors. Proc Natl Acad Sci U S A. 2020;117:8094–8103.

[17]      Deng X, Geelen A van, Buckley AC, et al. Coronavirus Endoribonuclease Activity in Porcine Epidemic Diarrhea Virus Suppresses Type I and Type III Interferon Responses. Journal of Virology [Internet]. 2019 [cited 2022 Jan 6]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.02000-18.

[18]      Ricagno S, Egloff M-P, Ulferts R, et al. Crystal structure and mechanistic determinants of SARS coronavirus nonstructural protein 15 define an endoribonuclease family. PNAS. 2006;103:11892–11897.

[19]      Joseph JS, Saikatendu KS, Subramanian V, et al. Crystal Structure of a Monomeric Form of Severe Acute Respiratory Syndrome Coronavirus Endonuclease nsp15 Suggests a Role for Hexamerization as an Allosteric Switch. Journal of Virology [Internet]. 2007 [cited 2022 Jan 6]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.02817-06.

[20]      Bhardwaj K, Palaninathan S, Alcantara JMO, et al. Structural and Functional Analyses of the Severe Acute Respiratory Syndrome Coronavirus Endoribonuclease Nsp15*. Journal of Biological Chemistry. 2008;283:3655–3664.

[21]      Zhang L, Li L, Yan L, et al. Structural and Biochemical Characterization of Endoribonuclease Nsp15 Encoded by Middle East Respiratory Syndrome Coronavirus. Journal of Virology [Internet]. 2018 [cited 2022 Jan 6]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.00893-18.

[22]      Pillon MC, Frazier MN, Dillard LB, et al. Cryo-EM structures of the SARS-CoV-2 endoribonuclease Nsp15 reveal insight into nuclease specificity and dynamics. Nat Commun. 2021;12:636.

[23]      Kim Y, Wower J, Maltseva N, et al. Tipiracil binds to uridine site and inhibits Nsp15 endoribonuclease NendoU from SARS-CoV-2. Commun Biol. 2021;4:1–11.

[24]      Saramago M, Costa VG, Souza CS, et al. The nsp15 Nuclease as a Good Target to Combat SARS-CoV-2: Mechanism of Action and Its Inactivation with FDA-Approved Drugs. Microorganisms. 2022;10:342.

[25]      Choi R, Zhou M, Shek R, et al. High-throughput screening of the ReFRAME, Pandemic Box, and COVID Box drug repurposing libraries against SARS-CoV-2 nsp15 endoribonuclease to identify small-molecule inhibitors of viral activity. PLOS ONE. 2021;16:e0250019.

[26]      Ortiz-Alcantara J, Bhardwaj K, Palaninathan S, et al. Small molecule inhibitors of the SARS-CoV Nsp15 endoribonuclease. Virus Adaptation and Treatment. 2010;2:125–133.

[27]      Janes J, Young ME, Chen E, et al. The ReFRAME library as a comprehensive drug repurposing library and its application to the treatment of cryptosporidiosis. PNAS. 2018;115:10750–10755.

[28]      Alhossary A, Handoko SD, Mu Y, et al. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics. 2015;31:2214–2216.

[29]      Mishra GP, Bhadane RN, Panigrahi D, et al. The interaction of the bioflavonoids with five SARS-CoV-2 proteins targets: An in silico study. Comput Biol Med. 2021;134:104464.

[30]      Elhady SS, Abdelhameed RFA, Malatani RT, et al. Molecular Docking and Dynamics Simulation Study of Hyrtios erectus Isolated Scalarane Sesterterpenes as Potential SARS-CoV-2 Dual Target Inhibitors. Biology (Basel). 2021;10:389.

[31]      Chikhale RV, Sinha SK, Patil RB, et al. In-silico investigation of phytochemicals from Asparagus racemosus as plausible antiviral agent in COVID-19. J Biomol Struct Dyn. 2021;39:5033–5047.

[32]      Mahmud S, Elfiky AA, Amin A, et al. Targeting SARS-CoV-2 nonstructural protein 15 endoribonuclease: an in silico perspective. Future Virol. doi:10.2217/fvl-2020-0233.

[33]      Shi ST, Schiller JJ, Kanjanahaluethai A, et al. Colocalization and Membrane Association of Murine Hepatitis Virus Gene 1 Products and De Novo-Synthesized Viral RNA in Infected Cells. Journal of Virology [Internet]. 1999 [cited 2022 Jan 6]; Available from: https://journals.asm.org/doi/abs/10.1128/JVI.73.7.5957-5969.1999.

[34]      Athmer J, Fehr AR, Grunewald M, et al. In Situ Tagged nsp15 Reveals Interactions with Coronavirus Replication/Transcription Complex-Associated Proteins. mBio [Internet]. 2017 [cited 2022 Jan 6]; Available from: https://journals.asm.org/doi/abs/10.1128/mBio.02320-16.

[35]      Imbert I, Snijder EJ, Dimitrova M, et al. The SARS-Coronavirus PLnc domain of nsp3 as a replication/transcription scaffolding protein. Virus Res. 2008;133:136–148.

[36]      Yuen C-K, Lam J-Y, Wong W-M, et al. SARS-CoV-2 nsp13, nsp14, nsp15 and orf6 function as potent interferon antagonists. Emerg Microbes Infect. 9:1418–1428.

[37]      Schroeder S, Pott F, Niemeyer D, et al. Interferon antagonism by SARS-CoV-2: a functional study using reverse genetics. The Lancet Microbe. 2021;2:e210–e218.

[38]      Canal B, Fujisawa R, McClure AW, et al. Identifying SARS-CoV-2 antiviral compounds by screening for small molecule inhibitors of nsp15 endoribonuclease. Biochem J. 2021;478:2465–2479.

[39]      Cantrelle F-X, Boll E, Brier L, et al. NMR Spectroscopy of the Main Protease of SARS-CoV-2 and Fragment-Based Screening Identify Three Protein Hotspots and an Antiviral Fragment. Angewandte Chemie International Edition. 2021;60:25428–25435.

[40]      Newman JA, Douangamath A, Yadzani S, et al. Structure, mechanism and crystallographic fragment screening of the SARS-CoV-2 NSP13 helicase. Nat Commun. 2021;12:4848.

[41]      Zhao Y, Du X, Duan Y, et al. High-throughput screening identifies established drugs as SARS-CoV-2 PLpro inhibitors. Protein Cell. 2021;12:877–888.

[42]      Ma C, Sacco MD, Xia Z, et al. Discovery of SARS-CoV-2 Papain-like Protease Inhibitors through a Combination of High-Throughput Screening and a FlipGFP-Based Reporter Assay. ACS Cent Sci. 2021;7:1245–1260.

[43]      Douangamath A, Fearon D, Gehrtz P, et al. Crystallographic and electrophilic fragment screening of the SARS-CoV-2 main protease. Nat Commun. 2020;11:5047.

[44]      Ahmad S, Abdullah I, Lee YK, et al. Extensive Crystallographic Fragment-Based Approach to Design SARS CoV2 3CLpro Main Protease Inhibitors and Related Metadata. 2021 [cited 2022 Jan 6]; Available from: https://chemrxiv.org/engage/chemrxiv/article-details/60c753c8469df4c73ef44e13.

[45]      Günther S, Reinke PYA, Fernández-García Y, et al. X-ray screening identifies active site and allosteric inhibitors of SARS-CoV-2 main protease. Science. 2021;372:642–646.

[46]      The COVID Moonshot Consortium, Achdout H, Aimon A, et al. Open Science Discovery of Oral Non-Covalent SARS-CoV-2 Main Protease Inhibitor Therapeutics [Internet]. 2021 [cited 2022 Jan 6]. p. 2020.10.29.339317. Available from: https://www.biorxiv.org/content/10.1101/2020.10.29.339317v2.

Since the outbreak of SARS-CoV-2, the infection has continued to spread. At the same time, governmental agencies around the world have adjusted their rules to prevent its spread. These rules are based on scientific studies, public health research and simulation tests that assess how efficiently different mask types prevent the spread of SARS-CoV-2. In this article, we will look at the mask types in use today, how well they can impede viral droplets and aerosols, and how the construction of different masks helps to protect us from infection by SARS-CoV-2.

SARS-CoV-2 droplet sizes and viral transmission

The SARS-CoV-2 virus can be transmitted via droplets and aerosols. 

Droplets are particles ranging in size from 0.05 to 500 μm that are emitted directly while breathing or talking. After being released into the air, larger droplets fall to the ground, while others rapidly evaporate to form droplet nuclei smaller than 5 μm, also called aerosols, which can contain virus particles in the range of 0.02 to 0.3 μm. Droplet nuclei can remain suspended in the air for longer than large droplets and potentially contribute to airborne transmission1,2,3.

SARS-CoV-2 has been observed to be transmitted via 3 modes:4,5,6

  •   Contact transmission (usually via direct contact with infected persons, surfaces, or air)
  •   Droplet transmission over short distances when a person is close to an infected person
  •   Aerosol transmission over longer distances via inhalation of aerosols that remain airborne and travel with the air

Although maintaining a safe distance from an infected or possibly infected person will prevent viral spread via direct contact and droplet transmission, maintaining a safe distance may not be able to prevent spread of infection through airborne aerosols. This is why it becomes even more important to wear a mask.

Mask types and structure

Surgical masks, also called medical face masks or mouth-nose protection (MNS), are disposable products that are normally used daily in clinics and doctors' offices. They consist of multiple layers of special plastics and have a rectangular shape with pleats so that the mask can adapt to the face. The front (outside) is often coloured, whereas the back (inside) is not. The masks have ear loops and a wire noseband (see Figure 1).

Due to the shape and fit of most medical face masks, some of the breathing air can flow past the edges. Especially during inhalation, unfiltered breathing air can be sucked in. Therefore, medical face masks usually offer the wearer less protection against pathogenic aerosols than particle-filtering half-masks (FFP). Medical face masks, however, can protect the mouth and nose of the wearer from pathogen transmission via direct contact, for example with contaminated hands.

Since they are medical devices, their manufacturing and distribution must be carried out in accordance with medical device law. They must therefore comply with the legal requirements and the European standard EN 14683:2019-10. Only then can manufacturers mark the medical masks with the CE mark and distribute them freely in Europe. This is subject to supervision by competent authorities7.

Figure 1: A surgical mask. Picture taken by CSTF.

Particle-filtering half masks, or filtering facepieces (FFP), are items of personal protective equipment (PPE) within the framework of occupational health and safety. They protect the wearer from particles, droplets, and aerosols. When worn correctly, FFP masks fit tightly and offer both external protection and self-protection. Since they are intended by the manufacturer as disposable products, they should be changed regularly and disposed of after use.

FFP masks are produced either with or without an exhalation valve. Masks without exhalation valve filter both the inhaled air and the exhaled air over the mask surface and therefore offer both self-protection and external protection. Masks with valves offer less external protection because exhaled aerosols are not intercepted by the filter material but are only slowed down and swirled to a certain extent by the valve.

Like medical face masks, FFP masks must comply with clear legal and technical requirements. In particular, the filter performance of the mask material is tested with aerosols in accordance with the European standard EN 149:2001+A1:2009. FFP2 masks must filter at least 94% of the test aerosols, and FFP3 masks at least 99%. They are therefore proven to provide effective protection against aerosols. The test standard, together with the CE mark and the four-digit identification number of the notified body, is printed on the surface of the FFP mask7.

Figure 2: An FFP2 mask. Picture taken by CSTF.

Mask standards

The table below shows the currently accepted mask standards and how effectively they filter out bacteria and particles.

Table 1: Filtration capacity of mask standards, evaluated standards include bacteria filtration efficiency (BFE), particle filtration efficiency (PFE), and penetration of filter material (PFM).

Mechanisms of protection

Masks ensure protection from viral spread in three main ways1,5:

Flow resistance inhibits the momentum of exhaled droplets and the velocity of incoming airborne aerosols. This significantly reduces the risk of infection in the vicinity of an infected person, protecting third parties as well. This is afforded by surgical masks, FFP2/N95/KN95, or better particle filtering respirator masks.

Droplet filtration blocks large droplets via gravitational sedimentation and inertial impaction, and the mask also minimizes hand contact with the mouth, nose, or other facial openings that give access to the respiratory tract. It is afforded by most kinds of masks.

Aerosol filtration reduces the spread of aerosols via interception, diffusion, and electrostatic attraction. Electrostatic effects likely result in charge transfer with nanoscale aerosol particles. It is afforded by FFP2/N95/KN95 or better particle filtering respirator masks.

For small aerosol droplets in the range of 0.1 to 1 μm, the mask layers stop particles mainly mechanically: the fibers in the filter layer block the particles' movement and prevent them from diffusing through. For nanometer-sized particles, which can easily slip through the openings in the network of filter fibers, electrostatic attraction is the main way in which mask layers remove these low-mass particles, which are attracted to and bind to the fibers. Filtering by electrostatic attraction is generally most efficient at low particle speeds, such as those of aerosols carried by breathing through a face mask.

It is important to note that openings and gaps, such as those between the mask edge and the face, can compromise performance. Leakage around the mask edges can reduce filtration efficiency by ∼50% or more, which highlights the importance of a proper “fit”8.
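Why fit matters so much can be illustrated with a deliberately simple mass-balance model: assume a fraction of the air bypasses the filter entirely through gaps, while the rest passes through filter material with its nominal efficiency. This model and the function `effective_efficiency` are illustrative assumptions, not the method of the cited study; the 94% value is the FFP2 minimum quoted above, and the leak fractions are hypothetical inputs.

```python
def effective_efficiency(filter_efficiency: float, leak_fraction: float) -> float:
    """Overall fraction of particles removed when part of the air bypasses the filter.

    Assumes air flowing through gaps is not filtered at all, while the rest
    is filtered with the nominal material efficiency.
    """
    if not (0.0 <= filter_efficiency <= 1.0 and 0.0 <= leak_fraction <= 1.0):
        raise ValueError("efficiency and leak fraction must lie in [0, 1]")
    return (1.0 - leak_fraction) * filter_efficiency


ffp2 = 0.94  # minimum FFP2 filtration efficiency (EN 149)
for leak in (0.0, 0.1, 0.5):  # hypothetical leak fractions
    print(f"leak {leak:.0%}: overall efficiency {effective_efficiency(ffp2, leak):.0%}")
```

Under this assumption, if half of the air leaks past the edges, material that filters 94% removes only about 47% of particles overall, which is of the same order as the ∼50% loss reported above.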

Although a home-made fabric mask offers at least some protection against larger droplets and keeps the wearer from touching the mouth and nose, it is not very effective against respirable particles and droplets with a diameter of 0.3 to 2 μm, as these pass through the material largely unfiltered5.

Thus, the inhalation of droplets containing viruses can be prevented by using a tight-fitting mask with particle-filtering properties (self-protection). The FFP2/FFP3 mask type is very well suited to protect people from infection via aerosols, even when the environment is strongly contaminated with infectious droplets5.

How does mask structure affect particle filtration?

For high filtration and blocking efficiency, the construction of the mask layers is very important. The following factors contribute to this efficiency4,8:

Movement of droplets/aerosols is directly affected by interfiber spacing of the mask material and the number of layers. Combining layers of differing fiber arrangement to form hybrid masks uses mechanical filtering and may be an effective approach.
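Why stacking layers helps can be sketched with an idealized model, assuming each layer captures particles independently of the others (real layered media only approximate this). A particle escapes only if it penetrates every layer, so the overall penetration is the product of the per-layer penetrations. The function name and the layer efficiencies below are hypothetical.

```python
from math import prod


def combined_efficiency(layer_efficiencies: list[float]) -> float:
    """Overall efficiency of stacked layers, assuming each acts independently.

    Overall penetration = product of per-layer penetrations (1 - efficiency);
    overall efficiency = 1 - overall penetration.
    """
    penetration = prod(1.0 - e for e in layer_efficiencies)
    return 1.0 - penetration


# Hypothetical stack: two coarse layers at 50% each plus one finer layer at 80%
print(f"{combined_efficiency([0.5, 0.5, 0.8]):.0%}")
```

In this made-up example, three individually modest layers reach about 95% together, which is the intuition behind combining layers with different fiber arrangements.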

Electrostatic interaction impeding aerosol transmission is influenced by the type of mask material. Electrostatic attraction mainly affects the removal of low mass particles, which are attracted to and bind to the fibers. Leveraging electrostatic filtering may be another effective approach8.

The SEM pictures below show the structure and construction of mask fibers and give an insight into the factors that contribute to their high filtering and blocking efficiency.

An FFP2 mask combines layers featuring different spacing and fiber network types to form hybrid masks, employing both mechanical and electrostatic filtering.

Figure 3: SEM image of FFP2 filter layer fibers showing an incoming pseudo droplet and aerosol. A pseudo aerosol, shown here as a yellow dot, is bound to the mask fiber due to electrostatic attraction and, hence, cannot pass through the mask due to electrostatic filtering. A pseudo droplet shown here in blue is larger than the interfiber spacing of the mask fiber and, thus, cannot pass through the mask due to mechanical filtering. Picture: Carl Zeiss GmbH | Coronavirus Structural Task Force.

Why are FFP masks superior? 

Surgical masks and FFP respirators comply with regulations that guarantee they fulfil certain standards (cf. Table 1). The superior protection of FFP masks stems partly from their filtering layer (cf. Figure 3), which uses electrostatic filtration to block smaller particles (~0.1 µm).

Conclusion

While maintaining a safe distance from an infected or possibly infected person will prevent spread of infection through direct contact and droplet transmission, maintaining a safe distance may not effectively prevent the spread of infection through airborne aerosols. This is where it becomes very important to wear a mask.

Masks offer self-protection and minimize transmission of potentially infectious exhaled droplets to the surrounding atmosphere. However, in some situations like closed rooms or highly contaminated places, only masks with high blocking and filtration efficiencies will offer this kind of protection, provided they are closely fitted to prevent air from flowing around the mask edges.


The authors would like to explicitly thank Carl Zeiss GmbH, who provided the microscopic images.


References

1.        Anand, S. & Mayya, Y. S. Size distribution of virus laden droplets from expiratory ejecta of infected subjects. Sci. Rep. 10, 1–9 (2020).

2.        Chirizzi, D. et al. SARS-CoV-2 concentrations and virus-laden aerosol size distributions in outdoor air in north and south of Italy. Environ. Int. 146, 106255 (2021).

3.        Lee, B. U. Minimum sizes of respiratory particles carrying SARS-CoV-2 and the possibility of aerosol generation. Int. J. Environ. Res. Public Health 17, 1–8 (2020).

4.        Sanchez, A. L., Hubbard, J. A., Dellinger, J. G. & Servantes, B. L. Experimental study of electrostatic aerosol filtration at moderate filter face velocity. Aerosol Sci. Technol. 47, 606–615 (2013).

5.        Kähler, C. J. & Hain, R. Fundamental protective mechanisms of face masks against droplet infections. J. Aerosol Sci. 148, (2020).

6.        COVID-19 Scientific Brief: SARS-CoV-2 and Potential Airborne Transmission (2021).

7.        https://www.bfarm.de/SharedDocs/Risikoinformationen/Medizinprodukte/DE/schutzmasken.html Accessed 21 April 2021.

8.        Konda, A. et al. Aerosol Filtration Efficiency of Common Fabrics Used in Respiratory Cloth Masks. ACS Nano 14, 6339–6347 (2020).


Introduction

This protein is known under many different names such as non-structural protein NSP1, leader protein, host translation inhibitor and host shutoff factor. Some of these names already tell us about the function and importance of this relatively small protein. It is found in all betacoronaviruses1 and, even though it only contains 180 amino acids2, it is indispensable for the viral life cycle and the pathogenicity of SARS-CoV-2.

NSP1 plays an important role at the point where the virus needs to express its own genetic information, stored as a string of codons: the viral mRNA is translated into the corresponding amino acids that make up the viral proteins. Translation occurs either shortly after the virus has entered the host cell or after the viral mRNA has been replicated.

The virus does not have its own proteins for this process; instead, it simply uses the existing translation machinery of the host cell: the ribosomes.

As ribosomes are responsible for synthesizing proteins by translating the information on the host’s mRNA into a string of amino acids, they are an important part of human cells. They consist of ribosomal RNA (rRNA) and ribosomal proteins, which form a larger (60S) and a smaller (40S) subunit3.

This is where NSP1 comes into play. It helps the virus hijack ribosomes and use them for the translation of its own mRNA, while the host cell's own translation is shut off4.

To understand how NSP1 is involved in all this, we first have to take a closer look at the structure of the protein.

Structural features & interaction with ribosomes

Even though the full-length structure of NSP1 is still unknown, we know what the two individual domains of SARS-CoV-2 NSP1 (connected via a 20-amino-acid linker) look like and can even say a lot about its interaction with human ribosomes.


Figure 1: a: Schematic structure of NSP1. b: N-terminal domain (PDB: 7K7P). c: C-terminal domain; KH motif (amino acids K164 and H165) in yellow (PDB: 6ZLW).

The first domain is the globular N-terminal domain (amino acids 1–128), which makes up most of the protein. It consists of a β-barrel of seven β-strands, two 3₁₀ helices and one α-helix5, as can be seen in Figure 1b.

Probably the more interesting domain, owing to its crucial role in the interaction with the human ribosome, is the C-terminal domain, which comprises three elements (Figure 1c): two α-helices, α1 and α2, and a loop connecting them4. The shape and surface charge of this C-terminal domain match the mRNA entry channel of the ribosome perfectly, so that it covers the whole path normally taken by the mRNA4. Figure 2 shows the small 40S ribosomal subunit (green) in complex with the C-terminal domain of NSP1 (pink).


Figure 2: a: Ribosomal 40S subunit in complex with the NSP1 C-terminal domain (PDB: 6ZLW). The C-terminal domain is bound to the mRNA channel between the “head” and “body” of the 40S. b & c: NSP1 C-terminal domain shown with and without surface.

While the C-terminal domain is bound to the mRNA entry channel of the host cell’s 40S ribosomal subunit, the N-terminal domain can move around it within a 60 Å radius, connected by the 20 amino acid long flexible linker6.
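A back-of-the-envelope check shows why a 20-residue linker can give the N-terminal domain this much freedom. The sketch below assumes roughly 3.5 Å per residue along a fully extended chain (the Cα–Cα distance is about 3.8 Å, and extended strands advance roughly 3.3–3.5 Å per residue); it is an illustrative estimate, not a calculation from the cited study.

```python
# Rough upper bound on the span of a fully extended 20-residue linker.
# Assumption: ~3.5 Å advance per residue in an extended chain.
LINKER_RESIDUES = 20
RISE_PER_RESIDUE_A = 3.5  # Å per residue, assumed typical value

contour_length = LINKER_RESIDUES * RISE_PER_RESIDUE_A
print(f"Maximum linker span: ~{contour_length:.0f} Å")
```

The estimate of about 70 Å is plausibly consistent with the reported ~60 Å radius, since a flexible linker is rarely fully stretched.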

All these interactions lead to an inhibition of the translation of the host's mRNA. But how does the viral mRNA get translated if NSP1 is bound to the ribosome's mRNA entry channel?

Viral translation

The virus needs a mechanism to circumvent its own translational blockage to maintain the capability for translation of the viral mRNA. It is not yet completely clear how this is accomplished, but different suggestions exist.

The first theory involves the N-terminal domain of NSP1 and the 5’ untranslated region (5’UTR) of the viral mRNA7.

In most coronaviruses, the 5’UTR of the viral mRNA is conserved and forms a complex secondary structure6. Some scientists7 suggest that it might interact with the N-terminal domain, making the interaction between NSP1 and the ribosome sterically impossible and thereby lifting the blockage. This proposal was also based on their finding that the C-terminal domain alone can suppress the host's protein synthesis, whereas the N-terminal domain is needed to bypass the translation inhibition. In addition, artificially extending the linker between the two domains with additional amino acids was shown to reduce viral mRNA translation7.

The second theory suggests that the translational blockage induced by NSP1 is not lifted. In this scenario, most ribosomes would be blocked by NSP1, but those left unblocked could still synthesize proteins. The viral 5’UTR would make the viral mRNA a more favourable template than the host's mRNA, so that these remaining ribosomes translate viral mRNA with a higher efficiency than cellular mRNA6.

Effect on the cells and immune system interference

Translation inhibition of cellular mRNA by NSP1 has another direct and significant effect on the human cell. Besides impairing normal cell functions, it also inhibits the translation of proteins involved in the innate immune response. These include interferons (proteins involved in antiviral activity8) such as IFN-β and IFN-λ1, the cytokine interleukin-8, and antiviral factors that are stimulated by interferons, leading to a downregulation of the cell's defence system4,9.

Earlier studies on SARS-CoV-1 also showed that NSP1 additionally induces cleavage of the host's mRNA, probably by using one of the host's proteins. Again, the viral mRNA is spared10, which makes the impact on the host cell even greater.

Taken together, this protein is a major pathogenicity factor of SARS-CoV-2 and might therefore be an interesting drug target1.

Available structures

As of this writing, 16 structures of SARS-CoV-2 NSP1 are available, two of which show the N-terminal domain. The other structures show the C-terminal domain in complex with a ribosome, a ribosomal subunit or a preinitiation complex. As no full-length structure has been solved so far, only predictions of the whole protein have been made, for example by Clark et al.5.

Available structures of the N-terminal domain: 7k7p, 7k3n.

Available structures of the C-terminal domain: 7k5i, 6zoj, 6zok, 6zm7, 6zlw, 6zmi, 6zp4, 6zon, 7jqb, 6zme, 6zmt, 6zn5, 6zmo, 7jpc.

References

  1. de Lima Menezes, G. & da Silva, R. A. Identification of potential drugs against SARS-CoV-2 non-structural protein 1 (nsp1). Journal of Biomolecular Structure and Dynamics 1–11 (2020) doi:10.1080/07391102.2020.1792992.
  2. Yoshimoto, F. K. The Proteins of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS CoV-2 or n-COV19), the Cause of COVID-19. 19.
  3. Khatter, H., Myasnikov, A. G., Natchiar, S. K. & Klaholz, B. P. Structure of the human 80S ribosome. Nature 520, 640–645 (2015).
  4. Thoms, M. et al. Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2. 8 (2020).
  5. Clark, L. K., Green, T. J. & Petit, C. M. Structure of Nonstructural Protein 1 from SARS-CoV-2. Journal of Virology 95, 12 (2021).
  6. Schubert, K. et al. SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation. Nat Struct Mol Biol 27, 959–966 (2020).
  7. Shi, M. et al. SARS-CoV-2 Nsp1 suppresses host but not viral translation through a bipartite mechanism. http://biorxiv.org/lookup/doi/10.1101/2020.09.18.302901 (2020) doi:10.1101/2020.09.18.302901.
  8. De Andrea, M. et al. The interferon system: an overview. Eur J Paediatr Neurol (2002) doi:10.1053/ejpn.2002.0573.
  9. Vann, K. R. Inhibition of translation and immune responses by the virulence factor Nsp1 of SARS-CoV-2. 4.
  10. Huang, C. et al. SARS Coronavirus nsp1 Protein Induces Template-Dependent Endonucleolytic Cleavage of mRNAs: Viral mRNAs Are Resistant to nsp1-Induced RNA Cleavage. PLoS Pathog 7, e1002433 (2011).

A guest entry by Hauke Hillen

In order for the novel coronavirus SARS-CoV-2 to replicate, it has to achieve two basic tasks: it needs to make copies of its genome that can be packaged into new virus particles, and it needs to activate viral genes to produce the proteins that actually form new virus particles, such as spike or nucleocapsid. Both tasks are carried out by a specialized molecular copying machine called the replication and transcription complex, or RTC for short. The RTC is made up of a number of viral non-structural proteins (nsps) which act together to produce copies of the viral RNA genome. Some of the RTC components have been discussed in previous posts, for example the exonuclease nsp14, which can correct errors that occur during RNA copying (this is called proofreading), or the methyltransferases nsp14 and nsp16, which add chemical modifications to the RNA that help stabilize it and hide it from the immune system (this is called capping).

In this post, we will have a closer look at the enzyme that carries out RNA replication, the RNA-dependent RNA polymerase (RdRp) nsp12, and how it interacts with the other components to form the RTC.

RNA polymera…what?

First off, let’s briefly discuss what an “RNA-dependent RNA polymerase” is, and why it is important for the virus. Polymerases are enzymes found in every living cell that carry out one of the most fundamental tasks in biology: they replicate genetic information. While most cells store their genetic information in the form of DNA (deoxyribonucleic acid) and use RNA (ribonucleic acid) only as transient messenger molecules (mRNAs), some viruses rely on RNA for both storing and transmitting information and are hence called “RNA viruses”. To replicate their genetic information, they need a polymerase that uses RNA as a template to copy the encoded information into a new RNA molecule: an RNA-dependent RNA polymerase. Chemically speaking, RNA is a polymer (a long string of nearly identical building blocks) composed of the four nucleotides adenosine (A), guanosine (G), cytidine (C) and uridine (U), the sequence of which defines the genetic information.

The job of the RNA polymerase is to read the sequence of nucleotides in a template RNA and synthesize new RNA with the same sequence (technically, with a complementary sequence) using individual nucleotides as building blocks. To do this, the RNA polymerase has to carry out three basic steps. First, it needs to bind the template RNA. Second, it has to read the sequence of nucleotides in the RNA. Third, it has to incorporate the correct, matching nucleotide building blocks to polymerize a new RNA strand.
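The “complementary sequence” point can be made concrete with a minimal Python sketch. It assumes simple Watson–Crick pairing only (A with U, G with C), writes both strands 5’→3’ and ignores everything a real RdRp does beyond base pairing; the function names `synthesize_product` and `copy_twice` are hypothetical illustrations, not part of any tool.

```python
# Watson-Crick pairs in RNA (simplifying assumption: no wobble pairs)
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def synthesize_product(template: str) -> str:
    """Return the product strand copied from `template`.

    Both strands are written 5'->3'. The template is read from its 3' end,
    and the product grows 5'->3', so the product is the reverse complement.
    """
    product = []
    # Walk the template from its 3' end towards its 5' end.
    for base in reversed(template.upper()):
        if base not in PAIRS:
            raise ValueError(f"not an RNA nucleotide: {base!r}")
        product.append(PAIRS[base])  # add the matching building block
    return "".join(product)


def copy_twice(template: str) -> str:
    """Copy the template, then copy the copy."""
    return synthesize_product(synthesize_product(template))


if __name__ == "__main__":
    template = "AUGGCUAAC"
    print(synthesize_product(template))  # GUUAGCCAU
    print(copy_twice(template))          # AUGGCUAAC
```

Copying the complementary copy gives back the original sequence, which is why producing a complementary strand is enough to preserve the genetic information.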

Coronaviruses are RNA viruses and therefore have an RNA-dependent RNA polymerase (RdRp, or nsp12). However, they are exceptional in several ways. First, their RNA genomes are almost 30,000 nucleotides long, which makes them among the largest RNA virus genomes known to date. Second, their RNA polymerase nsp12 requires the additional proteins nsp7 and nsp8 to form the active “core” RdRp. Third, this core RdRp assembles with further viral proteins to form the RTC, which can carry out additional functions such as proofreading to remove copying errors, a highly unusual capability for RNA viruses.

Since the RNA polymerase has such a fundamental job during virus replication, it is an attractive drug target for combating viral infections. Indeed, many successful anti-viral drugs against hepatitis C virus, HIV or herpesviruses act by inhibiting viral polymerases. Strikingly, viral RdRp enzymes are remarkably similar in their overall structure even between unrelated viruses, indicating that they share a common evolutionary ancestor and that their function is so essential that it does not allow for drastic changes. Therefore, some known anti-virals developed to treat other viral diseases have also been tested for their activity against the polymerase of SARS-CoV-2, and some have even been approved for clinical use against COVID-19 [1]. However, these repurposed compounds are generally not as effective as many had hoped, because even subtle structural differences between the polymerase enzymes of different viruses can strongly affect how anti-viral drugs act. More specific and efficient drugs against SARS-CoV-2 are therefore badly needed. To discover and develop such compounds, detailed knowledge of the structure and function of the RdRp is necessary.

Structure of the coronavirus RNA-dependent RNA polymerase (RdRp)

The first structure of a coronavirus polymerase was determined shortly before the outbreak of the current COVID-19 pandemic, when Kirchdoerfer and Ward reported the cryo-electron microscopy (cryo-EM) structure of the SARS-CoV-1 RdRp [2]. Since SARS-CoV-2 emerged, scientists all over the world have been racing to determine the structure of its RdRp. As of February 2021, this has led to more than 20 structures of SARS-CoV-2 polymerase complexes in the PDB, and several groups often reported similar structures around the same time.

These structures show that the RNA polymerase nsp12 resembles a right hand with individual domains called palm, fingers and thumb (Figure 1) [3–7]. This “hand” shape is typical of viral RNA polymerases and holds a tight grip on the double-stranded RNA helix that forms between the template and product strands during RNA synthesis. Within the palm lies the “active center” of the enzyme, where nucleotides are added to the growing product chain. The active center is accessible from the surface of the enzyme through a special tunnel, through which nucleotides can enter the substrate-binding site. As the template strand lies opposite the substrate-binding site, each incoming nucleotide is “sampled” to test whether it can form base-pairing interactions with the template base. If it can, the nucleotide remains bound and is added to the 3’ end of the product strand by forming a chemical bond. After that, the RdRp enzyme must slide ahead on the template strand by one nucleotide, which moves the newly added 3’ end of the product RNA from the substrate-binding site (sometimes also referred to as position +1) to the position where the previous 3’ end of the product was located prior to addition (position -1). This “translocation” completes the nucleotide addition cycle, as it positions the next templating base and frees up the substrate-binding site for the next matching building block.
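The nucleotide addition cycle (sampling at position +1, bond formation, translocation to -1) can be sketched as a toy state machine in Python. This is an illustrative model only: the class `ToyRdRp` and its methods are hypothetical, only Watson–Crick pairs are accepted, and the template is simply given in the order in which the enzyme reads it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Watson-Crick pairs: incoming nucleotide -> template base it pairs with
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


@dataclass
class ToyRdRp:
    template: str                      # given in reading order (assumption)
    product: list[str] = field(default_factory=list)
    cursor: int = 0                    # template base facing site +1
    pre_translocated: bool = False     # True while the new 3' end sits at +1

    def sample(self, ntp: str) -> bool:
        """Offer one nucleotide; keep and add it only if it base-pairs."""
        if self.pre_translocated:
            raise RuntimeError("site +1 is occupied; translocate first")
        if PAIRS.get(ntp) != self.template[self.cursor]:
            return False               # mismatch: the nucleotide leaves again
        self.product.append(ntp)       # bond formed at the product 3' end
        self.pre_translocated = True
        return True

    def translocate(self) -> None:
        """Slide ahead by one: the new 3' end moves from +1 to -1."""
        if not self.pre_translocated:
            raise RuntimeError("nothing to translocate")
        self.cursor += 1               # next templating base now faces +1
        self.pre_translocated = False  # substrate-binding site is free again

    def position_of(self, index: int) -> int:
        """Active-site position of product nucleotide `index` (0-based)."""
        k = len(self.product) - index  # 1 = 3' end, 2 = the one before, ...
        if self.pre_translocated:
            return 1 if k == 1 else -(k - 1)
        return -k                      # post-translocated: no position 0


def run(template: str) -> str:
    """Copy a whole template; an unpairable base makes translocate() raise."""
    rdrp = ToyRdRp(template)
    while rdrp.cursor < len(template):
        for ntp in "ACGU":             # "sample" candidate nucleotides
            if rdrp.sample(ntp):
                break
        rdrp.translocate()             # completes one addition cycle
    return "".join(rdrp.product)


if __name__ == "__main__":
    print(run("UACG"))                 # AUGC
    rdrp = ToyRdRp("UA")
    rdrp.sample("A")
    print(rdrp.position_of(0))         # 1  (new 3' end in the substrate site)
    rdrp.translocate()
    print(rdrp.position_of(0))         # -1 (after translocation)
```

Keeping “pre-translocated” as an explicit flag mirrors the text: a new nucleotide can only be sampled once translocation has freed position +1.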

Watching coronavirus multiply – the quest for structures of SARS-CoV-2 RNA polymerase 33
Figure 1 – Structure of SARS-CoV-2 RdRp

Left: Cryo-EM structure of SARS-CoV-2 RdRp (PDB 6YYT). Right: Enlarged active site with the template strand in blue and the nascent chain in red.

In addition to its polymerase domain, nsp12 also contains a domain that is only found in nidoviruses (the virus order that coronaviruses belong to), called the “nidovirus RdRp-associated nucleotidyltransferase” domain, or NiRAN domain for short. Scientists believe that this domain can transfer nucleotidyl residues, which means that it can form chemical bonds between nucleotides and other molecules. This hints that it may be involved in modification of the RNA (so-called “capping”) or in helping the enzyme kickstart RNA synthesis, but its precise role during coronavirus replication is still being studied.

In order to efficiently copy RNA, nsp12 requires two additional viral proteins, nsp7 and nsp8. The structures of coronavirus RdRp show that two molecules of nsp8 and one molecule of nsp7 bind on top of the hand-shaped nsp12. Interestingly, even though identical in amino acid sequence, the two nsp8 molecules adopt slightly different shapes. While one of them interacts with the finger domain of nsp12 directly, the interaction of the other one is mediated by nsp7. Both nsp8 molecules have long “arms” that protrude away from the polymerase and touch the RNA duplex as it emerges from the polymerase during replication. These “sliding poles” are unique to coronaviruses and most likely stabilize the RdRp on the RNA, which may help to make sure it doesn’t fall off during replication of the very large genome.

Seeing is believing - visualizing how anti-viral compounds block RdRp

So how exactly can this structural knowledge help to find new drugs against COVID-19? In many ways, enzymes are like tiny molecular machines. By studying their structure, one can analyze in detail how they work biochemically, and this in turn allows us to come up with ways to block their function. Most known anti-viral drugs that target RNA polymerases are so-called nucleoside analogs, which means they are molecules that structurally resemble the natural building blocks of RNA. These compounds can “trick” the RdRp by binding to the active site, but due to their chemical nature, they either block synthesis of the product RNA or lead to mutations that end the viral life cycle. The structures of SARS-CoV-2 RdRp reveal the exact architecture of the active site and show how the chemical environment that the enzyme creates around the product RNA and the substrate nucleotides facilitates polymerization (Figure 1). This knowledge can help to rationally design or improve compounds so that they bind more efficiently.

This is exemplified by recent studies analyzing how repurposed anti-virals inhibit SARS-CoV-2 RdRp. One such drug is Remdesivir, a compound originally developed against Ebola and other viruses that has also been approved by the FDA and European agencies for treatment of COVID-19 (see also this previous post). Remdesivir chemically resembles adenosine triphosphate (ATP), but has an additional bulky chemical group, a cyano group, attached to the C1’ atom of its ribose moiety. In contrast to most nucleoside analogs, Remdesivir does not block the RdRp immediately, but only after another three nucleotides have been added, a process called “delayed stalling” [8–10]. Initial structures of SARS-CoV-2 RdRp in the presence of Remdesivir showed how it can act as an adenosine analog, how it can be incorporated at the 3’ end of the nascent product RNA strand and how it is translocated to the -1 position (Figure 2a,b) [5,6]. However, these structures could not explain how it leads to inhibition of RNA synthesis.

To pinpoint why Remdesivir interferes with RNA synthesis exactly after three subsequent nucleotides are added, the authors of a recent study used a combination of synthetic chemistry and structural biology [11]. They systematically determined structures of SARS-CoV-2 RdRp bound to a template-product RNA duplex that contained Remdesivir followed by either two or three additional nucleotides at the 3’ end. In the first case, the structure showed that the RdRp was in the post-translocated state, with Remdesivir at position -3 and an empty substrate-binding site, as expected (Figure 2c). In contrast, the RdRp bound to an RNA containing Remdesivir and three additional nucleotides was not in the post-translocated state, and Remdesivir was not located at position -4. Instead, it remained at position -3, and the third additional nucleotide at the 3’ end of the product was stuck in the substrate-binding site (Figure 2d). This state resembles the situation directly after addition of a new nucleotide but before translocation and is hence called the pre-translocated state. This suggests that Remdesivir inhibits the SARS-CoV-2 RNA polymerase by posing a translocation barrier, and the structures provide a molecular and chemical explanation for this. Initially, Remdesivir can be added to the growing RNA just like adenosine triphosphate and then translocated, allowing another two nucleotides to be added. However, after binding and addition of a third nucleotide, Remdesivir cannot be translocated to the -4 position, because its bulky cyano group would clash with a serine residue (Ser861) in the thumb domain of nsp12 (Figure 2c). Therefore, the polymerase gets stuck in the pre-translocated state, which explains why exactly three nucleotides can be added after Remdesivir incorporation: addition of a fourth nucleotide would first require translocation.
This proposed mechanism agrees with previous modelling [5,10] and was independently confirmed shortly afterwards by another structural study, in which the authors managed to trap an identical pre-translocated, stalled intermediate with Remdesivir at position -3 [12].
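The position bookkeeping behind delayed stalling can be checked with a small counting model. The sketch assumes the position numbering used in the text (+1 for the substrate-binding site, -1 for the previous 3’ end, no position 0) and treats the Ser861 clash simply as a hard rule that translocation may not move remdesivir to position -4 (the hypothetical parameter `block_at`); it says nothing about energetics or NTP concentrations.

```python
from __future__ import annotations


def position(k: int, pre_translocated: bool) -> int:
    """Active-site position of the k-th product nucleotide from the 3' end.

    k = 1 is the 3' end. Numbering follows the text: +1 is the
    substrate-binding site, -1 the previous 3' end; there is no position 0.
    """
    if pre_translocated:
        return 1 if k == 1 else -(k - 1)
    return -k


def simulate_delayed_stalling(block_at: int = -4,
                              max_additions: int = 10) -> tuple[int, str, int]:
    """Count nucleotides added after remdesivir until translocation is blocked.

    `block_at` (hypothetical): translocation is forbidden if it would move
    remdesivir to this position or further upstream.
    Returns (nucleotides added after remdesivir, outcome, remdesivir position).
    """
    k_r = 1      # remdesivir was just incorporated: it is the 3' end, at +1
    added = 0
    while added < max_additions:
        # Translocating now would move remdesivir to position -k_r.
        if -k_r <= block_at:
            return added, "stalled, pre-translocated", position(k_r, True)
        # Translocation succeeds, then the next matching nucleotide is added
        # at +1, so remdesivir is one step further from the 3' end.
        added += 1
        k_r += 1
    return added, "not stalled", position(k_r, True)


if __name__ == "__main__":
    print(simulate_delayed_stalling())  # (3, 'stalled, pre-translocated', -3)
    # Remdesivir plus two nucleotides, post-translocated (compare Figure 2c)
    print(position(3, pre_translocated=False))  # -3
```

With the default barrier, the model adds exactly three nucleotides and stops in the pre-translocated state with remdesivir at -3, matching Figure 2d. Because the real barrier can be overcome, the hard rule is a deliberate simplification.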

Figure 2 – How remdesivir inhibits SARS-CoV-2 RdRp

Structural snapshots of remdesivir (purple) moving through the active site of SARS-CoV-2 RdRp. When it reaches the third position after its incorporation into the RNA (-3), its further movement is blocked because the cyano group would bump into Ser861. A) Remdesivir at position +1 (PDB 7BV2). B) Remdesivir at position -1 (PDB 7C2K). C) Remdesivir at position -3 (PDB 7B3B). D) Remdesivir at position -3 with the nucleotide at the 3’ end stuck in the substrate-binding site (PDB 7B3C).

This mechanism also suggests how Remdesivir may at least partially escape the coronavirus proofreading enzyme nsp14, which removes misincorporated nucleotides at the 3’ end of the RNA and thus counteracts anti-virals that target RdRp. Since Remdesivir can be translocated until it reaches position -3 before it causes stalling, it may leave the active site of the enzyme before it can be recognized by the proofreading machinery.

Importantly, these studies also provide clues as to why Remdesivir has had limited success in fighting COVID-19. The structures show that the steric block between Ser861 and the cyano group of Remdesivir is not severe and can therefore be overcome by the enzyme, for example at high concentrations of substrate NTPs [8]. Consistent with this, substituting Ser861 with residues that clash even less (alanine or glycine) makes the RdRp less sensitive or even resistant to Remdesivir [5,13]. This suggests that the translocation barrier could potentially be strengthened by a compound that causes more severe clashes. One way to achieve this could be to modify Remdesivir to carry a bulkier chemical moiety than the cyano group. Thus, the detailed molecular insights into the mechanism of Remdesivir also provide a rational basis for designing more potent anti-virals and testing their effect on the SARS-CoV-2 RdRp.

Similar structure-function studies are now also being undertaken for other promising anti-viral compounds, such as Favipiravir. Like Remdesivir, it is a nucleoside analog that was initially developed against other viruses but showed some promising results against SARS-CoV-2. Structures of the SARS-CoV-2 RNA polymerase with Favipiravir show how it mimics either guanosine or adenosine in the active site of the enzyme by forming unusual base-pairing interactions with cytosine or uracil, respectively, and this leads to errors during RNA copying that eventually kill the virus [14,15]. Another study recently reported the structure of Suramin bound to SARS-CoV-2 RdRp [16]. In contrast to Remdesivir and Favipiravir, this compound is not a nucleoside analog and hence does not get incorporated into the RNA. Instead, two Suramin molecules can apparently bind to the RdRp and thereby prevent its association with the template and product RNA, rendering it inactive.
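A toy enumeration shows how ambiguous pairing can turn into copying errors. Following the text, favipiravir (written `F` here, a hypothetical one-letter code) can be incorporated as a G mimic opposite C or as an A mimic opposite U. As an additional assumption for illustration only, an `F` that ends up in a template is allowed to pair with either C or U in the next round of copying; the function names are hypothetical.

```python
from __future__ import annotations

# Normal Watson-Crick pairing: template base -> incoming nucleotide
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

FAVIPIRAVIR = "F"              # hypothetical one-letter code for favipiravir
# From the text: F mimics G opposite C and A opposite U.
F_INCORPORATED_OPPOSITE = {"C", "U"}
# Assumption for illustration: an F in the template pairs with C or U.
F_TEMPLATE_PARTNERS = ("C", "U")


def incoming_options(template_base: str, drug_present: bool) -> list[str]:
    """Nucleotides that may be added opposite `template_base`."""
    if template_base == FAVIPIRAVIR:
        options = list(F_TEMPLATE_PARTNERS)
    else:
        options = [PAIRS[template_base]]
    if drug_present and template_base in F_INCORPORATED_OPPOSITE:
        options.append(FAVIPIRAVIR)  # drug competes with the natural nucleotide
    return options


def two_round_outcomes(base: str) -> set[str]:
    """Possible bases at one site after copying with drug, then without."""
    outcomes = set()
    for first in incoming_options(base, drug_present=True):
        for second in incoming_options(first, drug_present=False):
            outcomes.add(second)
    return outcomes


if __name__ == "__main__":
    for base in "ACGU":
        print(base, sorted(two_round_outcomes(base)))
    # A ['A']
    # C ['C', 'U']
    # G ['G']
    # U ['C', 'U']
```

Under these assumptions, sites that were C or U in the original template can come back as either C or U after two rounds of copying, while A and G sites are unaffected; the model only lists possibilities and makes no claim about how often each occurs.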

These studies are good examples of how structural biology can visualize complicated chemical reactions in an intuitive way. Based on these results, drugs like Remdesivir or Favipiravir could be rationally optimized to combat COVID-19 more effectively and perhaps cause fewer side effects.

Dissecting the SARS-CoV-2 RTC structure by structure

In addition to providing detailed snapshots of how anti-viral compounds inhibit SARS-CoV-2 RdRp, structural studies are also helping scientists understand how the unique coronavirus RTC combines different functions such as RNA synthesis, proofreading and modification. After the structure of the “core” SARS-CoV-2 RdRp was determined within a few months of the outbreak of COVID-19, scientists quickly moved on to studying how additional non-structural proteins bind to it to form the RTC. One of these is nsp13, which belongs to a protein class called “helicases”. These are enzymes that bind to DNA or RNA and, using chemical energy in the form of ATP, move along them or unwind helices. Cryo-EM structures of nsp13 bound to the SARS-CoV-2 RdRp show that two molecules of nsp13 can bind to the RdRp and suggest that nsp13 may allow the polymerase to move backwards on the RNA (Figure 3) [17,18]. This finding seems unexpected at first, but experts think that such backward movement may be required for the proofreading enzyme nsp14 to remove errors that the polymerase makes during copying, or for producing the mRNAs of certain viral proteins. Another recent cryo-EM structure shows that the small protein nsp9 interacts with the NiRAN domain of the SARS-CoV-2 polymerase [19]. In the accompanying paper, the authors suggest that the NiRAN domain may be involved in RNA capping and that nsp9 seems to block its activity. However, others have proposed that nsp9 is in fact a substrate for nucleotidylation by the NiRAN domain, and that it may be involved in priming the RNA polymerase for initial RNA synthesis [20]. Thus, further studies are necessary to determine whether the NiRAN domain is involved in capping, priming or both.

Figure 3 – Structures help uncover the roles of the different RTC components

Structure of the SARS-CoV-2 RdRp complex with nsp13 (salmon) and nsp9 (cyan) (PDB 7CYQ).

What’s next?

Over the past year, scientists have uncovered the structures of the SARS-CoV-2 RdRp and associated proteins at a record-breaking pace. While these structures provide impressive first glimpses of RdRp complexes, researchers are already working to determine how the remaining nsps interact with the RdRp to form the complete RTC. This may ultimately aid the quest to find new treatment options for COVID-19 that target not only the polymerase itself, but also proofreading or RNA capping. Such drugs are not only desperately needed for the current pandemic, but may also prove useful against future emerging coronaviruses, because the RNA polymerase is typically very similar even between different virus strains.

1. Ledford H: Hopes rise for coronavirus drug remdesivir. Nature 2020, doi:10.1038/d41586-020-01295-8.

2. Kirchdoerfer RN, Ward AB: Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors. Nat Commun 2019, 10:2342.

3. Gao Y, Yan L, Huang Y, Liu F, Zhao Y, Cao L, Wang T, Sun Q, Ming Z, Zhang L, et al.: Structure of the RNA-dependent RNA polymerase from COVID-19 virus. Science 2020, 368:779–782.

4. Hillen HS, Kokic G, Farnung L, Dienemann C, Tegunov D, Cramer P: Structure of replicating SARS-CoV-2 polymerase. Nature 2020, 584:154–156.

5. Wang Q, Wu J, Wang H, Gao Y, Liu Q, Mu A, Ji W, Yan L, Zhu Y, Zhu C, et al.: Structural Basis for RNA Replication by the SARS-CoV-2 Polymerase. Cell 2020, 182:417-428.e13.

6. Yin W, Mao C, Luan X, Shen D-D, Shen Q, Su H, Wang X, Zhou F, Zhao W, Gao M, et al.: Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir. Science 2020, 368:1499–1504.

7. Peng Q, Peng R, Yuan B, Zhao J, Wang M, Wang X, Wang Q, Sun Y, Fan Z, Qi J, et al.: Structural and Biochemical Characterization of the nsp12-nsp7-nsp8 Core Polymerase Complex from SARS-CoV-2. Cell Reports 2020, 31:107774.

8. Gordon CJ, Tchesnokov EP, Woolner E, Perry JK, Feng JY, Porter DP, Götte M: Remdesivir is a direct-acting antiviral that inhibits RNA-dependent RNA polymerase from severe acute respiratory syndrome coronavirus 2 with high potency. J Biol Chem 2020, 295:6785–6797.

9. Tchesnokov EP, Feng JY, Porter DP, Götte M: Mechanism of Inhibition of Ebola Virus RNA-Dependent RNA Polymerase by Remdesivir. Viruses 2019, 11:326.

10. Gordon CJ, Tchesnokov EP, Feng JY, Porter DP, Götte M: The antiviral compound remdesivir potently inhibits RNA-dependent RNA polymerase from Middle East respiratory syndrome coronavirus. J Biol Chem 2020, doi:10.1074/jbc.ac120.013056.

11. Kokic G, Hillen HS, Tegunov D, Dienemann C, Seitz F, Schmitzova J, Farnung L, Siewert A, Höbartner C, Cramer P: Mechanism of SARS-CoV-2 polymerase stalling by remdesivir. Nat Commun 2021, 12:279.

12. Bravo JPK, Dangerfield TL, Taylor DW, Johnson KA: Remdesivir is a delayed translocation inhibitor of SARS CoV-2 replication in vitro. Biorxiv 2020, doi:10.1101/2020.12.14.422718.

13. Tchesnokov EP, Gordon CJ, Woolner E, Kocinkova D, Perry JK, Feng JY, Porter DP, Götte M: Template-dependent inhibition of coronavirus RNA-dependent RNA polymerase by remdesivir reveals a second mechanism of action. J Biol Chem 2020, 295:16156–16165.

14. Naydenova K, Muir KW, Wu L-F, Zhang Z, Coscia F, Peet MJ, Castro-Hartmann P, Qian P, Sader K, Dent K, et al.: Structural basis for the inhibition of the SARS-CoV-2 RNA-dependent RNA polymerase by favipiravir-RTP. Biorxiv 2020, doi:10.1101/2020.10.21.347690.

15. Peng Q, Peng R, Yuan B, Wang M, Zhao J, Fu L, Qi J, Shi Y: Structural basis of SARS-CoV-2 polymerase inhibition by Favipiravir. Innovation 2021, doi:10.1016/j.xinn.2021.100080.

16. Yin W, Luan X, Li Z, Zhou Z, Wang Q, Gao M, Wang X, Zhou F, Shi J, You E, et al.: Structural basis for inhibition of the SARS-CoV-2 RNA polymerase by suramin. Nat Struct Mol Biol 2021, 28:319–325.

17. Chen J, Malone B, Llewellyn E, Grasso M, Shelton PMM, Olinares PDB, Maruthi K, Eng ET, Vatandaslar H, Chait BT, et al.: Structural basis for helicase-polymerase coupling in the SARS-CoV-2 replication-transcription complex. Cell 2020, doi:10.1016/j.cell.2020.07.033.

18. Yan L, Zhang Y, Ge J, Zheng L, Gao Y, Wang T, Jia Z, Wang H, Huang Y, Li M, et al.: Architecture of a SARS-CoV-2 mini replication and transcription complex. Nat Commun 2020, 11:5874.

19. Yan L, Ge J, Zheng L, Zhang Y, Gao Y, Wang T, Huang Y, Yang Y, Gao S, Li M, et al.: Cryo-EM Structure of an Extended SARS-CoV-2 Replication and Transcription Complex Reveals an Intermediate State in Cap Synthesis. Cell 2021, 184:184-193.e10.

20. Slanina H, Madhugiri R, Bylapudi G, Schultheiß K, Karl N, Gulyaeva A, Gorbalenya AE, Linne U, Ziebuhr J: Coronavirus replication–transcription complex: Vital and selective NMPylation of a conserved site in nsp9 by the NiRAN-RdRp subunit. Proc National Acad Sci 2021, 118:e2022310118.
