Structural Task Force
April 22, 2020

Untangling Nsp3 of SARS-CoV-2

Kristopher Nolte

The world holds its breath as the novel coronavirus continues to spread, bringing our lives to a halt. We have gathered a lot of knowledge about the virus, but there are still many gaps to fill. Non-structural protein 3 (nsp3) is one of these gaps. It is the largest protein encoded by the coronavirus genome, so untangling its structure and function is a huge task.

However, we can glean some knowledge about the specific function of SARS-CoV-2 nsp3 by looking at the virus's subfamily, Orthocoronavirinae. Because related viruses share common traits, academics were not completely unprepared when SARS-CoV-2 arrived. In the background, while very few people were worried about a new coronavirus, scientists around the world had been investigating the invisible enemy for decades. Building on this past work, we look at the functions of proteins from other coronaviruses, such as Murine Hepatitis Virus (MHV) and SARS-CoV, to learn how best to fight SARS-CoV-2.

Fig. 1: The crystal structure of the papain-like protease of SARS-CoV-2 nsp3 (PDB-ID: 6w9c). Picture by Kristopher Nolte.

The gene encoding nsp3 lies in open reading frame 1a (ORF1a), which encodes polyprotein 1a. The nsp3 sequence of SARS-CoV is 1922 amino acids long and is sandwiched between nsp2 and nsp4. Through its papain-like protease domain, nsp3 cleaves not only itself from the polyprotein but also nsp1 and nsp2. Across coronaviruses, 18 different domains have been found in nsp3. Each virus has 10 to 16 of these, of which eight domains and two transmembrane regions form the conserved part of nsp3 that is found in every coronavirus known to date [1]:

  1. Ubiquitin-like domain 1 (Ubl1)
  2. Ubiquitin-like domain 2 (Ubl2)
  3. Papain-like protease (PlPro)
  4. Macro domain / X domain (Mac)
  5. Hypervariable region / Glu-rich acidic domain (HVR)
  6. Transmembrane region 1 (TM1)
  7. Transmembrane region 2 (TM2)
  8. Ectodomain / Zinc finger domain (3ecto)
  9. Nidovirus-conserved domain of unknown function (Y1)
  10. Coronavirus-specific carboxyl-terminal domain (CoV-Y)

To start our investigation on SARS-CoV-2 related structural data, we will look into the protein sequences of SARS-CoV and SARS-CoV-2 to learn where they are similar and where they differ.

Genetic Comparison of SARS-CoV and SARS-CoV-2

The nsp3 of SARS-CoV has 16 domains which span 1922 amino acids. The nsp3 protein of SARS-CoV-2 is slightly longer at 1945 amino acids. When the two are compared, the overall similarity is 75.97%.[2] In addition to the ten conserved domains, the nsp3 gene of SARS-CoV-2 codes for four further domains:

  1. Nucleic-acid-binding domain (NAB)
  2. Betacoronavirus-specific marker domain (βSM)
  3. Domain preceding Ubl2 and PL2pro (DPUP)
  4. Amphipathic helix 1 (AH1)

Fig. 1: Position of the nsp3 gene on the SARS-CoV-1 genome. Nsp3 is separated into 12 domains. Picture by Thomas Splettstoesser.

The two domains at the N-terminal end, Ubl1 and HVR, align at 79% and 64%, respectively. There seems to be a trend in Coronaviridae for these domains to be poorly conserved in sequence, while Ubl1 still adopts the expected conserved fold.[4] Whether this holds for SARS-CoV-2 could be analysed by comparing the sequence alignment with the structural similarity. It is unsurprising that the "hypervariable region" lives up to its name and shows the worst alignment of all. In the related MHV nsp3, this domain is dispensable for replication.[5]
It has been speculated that the Mac1 domain functions as an ADP-ribose-1″-phosphatase; however, the effects of mutations in this region differ from virus to virus.[4] As a result, it is difficult to judge, without further research, what significance the poor alignment of this domain has for our understanding of SARS-CoV-2.
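To make the percentages quoted here more tangible, the following is a minimal Python sketch of how a percent identity can be computed from two already aligned sequences. It assumes one common convention (identical columns divided by all alignment columns, gaps included); other tools use different denominators, so it will not necessarily reproduce the numbers in this article. The function name and the toy sequences are made up for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must come from the same pairwise alignment, with '-' marking gaps.
    Gap columns count towards the denominator but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy example (invented sequences, not real nsp3 data):
# 7 columns, 5 identical residues, 1 substitution, 1 gap.
print(f"{percent_identity('MKV-LTA', 'MKIALTA'):.1f}%")  # 71.4%
```

Computing this value per domain rather than over the whole protein is what allows statements such as "Ubl1 aligns better than HVR".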

Table 1: The domain amino acid ranges for SARS-CoV-1 were taken from Lei et al., 2018 [1]. The ranges for SARS-CoV-2 were determined by taking the amino acid ranges of CoV-1 and using BLAST [2] to search for the best alignment of the domain sequences. Picture by Kristopher Nolte.
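The idea of carrying domain boundaries from one sequence over to another can be sketched in a few lines of Python. This is not the BLAST procedure used for the table but a simplified stand-in: it assumes a gapped pairwise alignment of the two proteins is already available and maps a 1-based residue range from the reference onto the query. The function name and the toy sequences are hypothetical.

```python
from typing import Optional, Tuple


def map_range(aligned_ref: str, aligned_query: str,
              start: int, end: int) -> Optional[Tuple[int, int]]:
    """Map a 1-based, inclusive residue range of the reference onto the query.

    Both strings must be rows of the same pairwise alignment ('-' = gap).
    Returns the first and last query positions aligned to the range,
    or None if no query residue aligns to it.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    ref_pos = 0    # residues of the reference seen so far
    query_pos = 0  # residues of the query seen so far
    mapped = []
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_pos += 1
        if q != "-":
            query_pos += 1
        # Only columns where both sequences have a residue can be mapped.
        if r != "-" and q != "-" and start <= ref_pos <= end:
            mapped.append(query_pos)
    if not mapped:
        return None
    return mapped[0], mapped[-1]


# Toy alignment (invented): the query has a two-residue insertion
# and is missing one residue of the reference.
ref = "MKV--LTAG"
qry = "MKVAALT-G"
print(map_range(ref, qry, 4, 6))  # (6, 7)
```

Insertions shift the boundaries, which is one reason the same domain can sit at slightly different positions in SARS-CoV and SARS-CoV-2.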

The Mac1 domain, also known as the X-domain, is followed by two macrodomains which were originally called "SARS-CoV Unique domains" (SUD-N and SUD-M), but were renamed when they were found not to be unique to SARS-CoV. It has since been observed that only Mac3 plays an essential role in viral RNA replication[6], which could explain why Mac3 is one of the most conserved domains in the alignment of SARS-CoV and SARS-CoV-2.

PL2pro and its neighbouring domain Ubl2 show some of the highest sequence alignments of all domain comparisons. This could be explained by their essential function of cleaving nsp3 from the polyprotein.
Little is known about the domains following PL2pro, and our current structural knowledge is limited to a nuclear magnetic resonance (NMR) structure of NAB. While the structure and function of Y1 and CoV-Y from SARS-CoV-2 are currently unknown, their sequence, which comprises about a fifth of nsp3, is highly conserved in all coronaviruses.

Fig. 2: The location of the aligned domains of SARS-CoV (abbreviated CoV-1) and SARS-CoV-2 (abbreviated CoV-2) is shown over the length of nsp3 (TM1 = 1, TM2 = 2, AH1 = A). Picture by Tim Scharf.

In the second part of the Untangling Nsp3 of SARS-CoV-2 series, we will delve deeper into some structures of nsp3 from SARS-CoV-1 and SARS-CoV-2 and try to find out how the differences in sequence may have influenced the structure of the protein. For further in-depth reading on the topics discussed here, I highly recommend the sources below.

Table 2: For each domain and its respective counterpart in SARS-CoV-2, a BLAST search was conducted to find matching PDB-IDs. Last update: 18.05.2020. The scripts and the PDB data can be found in our Git repository [3].
Picture by Kristopher Nolte
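As a rough illustration of how such a search could be post-processed (this is not the script from the repository), the sketch below assumes BLAST+ was run with its default tabular output (`-outfmt 6`) against a database of PDB sequences and keeps only hits above an identity threshold. The function name, the threshold and all rows in the sample are invented placeholders.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, min_identity=90.0):
    """Group hits by query and keep those with pident >= min_identity.

    Returns {query_id: [subject_id, ...]} with subjects sorted by
    descending bit score.
    """
    hits = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        if float(rec["pident"]) >= min_identity:
            hits.setdefault(rec["qseqid"], []).append(
                (float(rec["bitscore"]), rec["sseqid"])
            )
    return {
        query: [subject for _, subject in sorted(found, key=lambda t: -t[0])]
        for query, found in hits.items()
    }


# Invented sample rows with placeholder IDs (not real search results).
SAMPLE = "\n".join([
    "domain_X\t1ABC_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400",
    "domain_X\t2XYZ_B\t99.0\t180\t2\t0\t1\t180\t5\t184\t1e-110\t420",
    "domain_X\t3LOW_A\t45.0\t150\t80\t2\t10\t160\t3\t150\t1e-20\t90",
    "domain_Y\t4DEF_A\t92.0\t120\t10\t0\t1\t120\t1\t120\t1e-60\t230",
])
print(best_hits(SAMPLE))
```

A strict identity cut-off keeps only structures of the same or a very closely related protein; lowering it would also bring in more distant homologues.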


  • [1] Lei J, Kusov Y, Hilgenfeld R. Nsp3 of coronaviruses: Structures and functions of a large multi-domain protein. Antiviral Res. 2018 Jan;149:58-74. doi: 10.1016/j.antiviral.2017.11.001. Epub 2017 Nov 8. PMID: 29128390; PMCID: PMC7113668.
  • [2] Madden T. The BLAST Sequence Analysis Tool. 2002 Oct 9 [Updated 2003 Aug 13]. In: McEntyre J, Ostell J, editors. The NCBI Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2002-. Chapter 16. Available from:
  • [3]
  • [4] Neuman BW. Bioinformatics and functional analyses of coronavirus nonstructural proteins involved in the formation of replicative organelles. Antiviral Res. 2016;135:97-107.
  • [5] Hurst KR, Koetzner CA, Masters PS. Characterization of a critical interaction between the coronavirus nucleocapsid protein and nonstructural protein 3 of the viral replicase-transcriptase complex. J Virol. 2013;87:9159-9172.
  • [6] Kusov Y, Tan J, Alvarez E, Enjuanes L, Hilgenfeld R. A G-quadruplex-binding macrodomain within the "SARS-unique domain" is essential for the activity of the SARS-coronavirus replication-transcription complex. Virology. 2015 Oct;484:313-22. doi: 10.1016/j.virol.2015.06.016. Epub 2015 Jul 3. PMID: 26149721; PMCID: PMC4567502.

Kristopher Nolte

Kristopher joined the Thorn Lab as part of his bachelor thesis.
In this thesis he will refine aspects of the diagnostic tool for graphical X-Ray data analysis (AUSPEX) with the help of machine learning.

But since the corona crisis halted all our lives, he contributes to the task force by using his knowledge of bioinformatics and programming to collect, organize and update all coronavirus-relevant data from the Protein Data Bank (PDB).

Apart from this, he will investigate the proteins of SARS-CoV-2 with bioinformatic tools and will share his results on inside corona.


Coronavirus Structural Taskforce