Babelian SARS-CoV-2 confusion

May 20, 2021

There is a secret code that virologists use to talk about the new coronavirus. This code is made up of synonyms and abbreviations for each of the 28 proteins that facilitate the viral life cycle. In this article, we will shed some light on this mythical language.

First of all, SARS-CoV-2 has three classes of proteins:
Structural proteins are the spike protein, the membrane protein and the envelope protein, as well as the nucleocapsid protein, which forms an extra shell around the single-stranded RNA. They are also known as the S-, M-, E- and N-Protein.

Non-structural proteins (NSPs) keep the viral life cycle going but do not make up the hull or nucleocapsid; these are conveniently numbered 1–16.

And then there are accessory proteins, which seem to be more important in vivo than in vitro; the structures of most of them have not yet been determined.

Of course, this nice and clear naming scheme tells you little about the function and properties of the different proteins, which is why virologists invented plenty of other names for them. And this is where the confusion begins.

Image: SARS-CoV-2 PL2pro meme, by Andrea Thorn.

NSP3, for example, contains two ubiquitin-like domains (UBL1 and UBL2), a papain-like protease domain (PLpro, PL2pro), which includes a zinc finger, a "macro" domain (also known as X domain, Mac1, or ADP ribose phosphatase), a hypervariable region (also called Glu-rich acidic domain or HVR), two transmembrane domains (TM1 and TM2), an ecto domain (3Ecto), which is also a zinc finger, a conserved domain of unknown function called Y1, and a coronavirus-specific carboxyl-terminal domain (CoV-Y). The SARS-unique domains, or SUDs (namely SUD-M, SUD-N, and SUD-C), were all renamed once it turned out that they are not unique to SARS: SUD-N is now Mac2, SUD-M is Mac3 and SUD-C is called DPUP.

If this were not enough to convince you that all of this is confusing, here are some additional names:

S-Protein: spike protein, surface glycoprotein, E2 glycoprotein

NSP1: leader protein

NSP5: 3CLpro, SARS-CoV-2 3C-like protease, 3C-like proteinase, main protease, NSP5A_3CLpro, NSP5B_3CLpro, Mpro, Non-structural protein 5

NSP9: Non-structural protein 9, ssRNA-binding protein

NSP10: Non-structural protein 10, growth factor-like protein, GFL

NSP12: RNA Polymerase, RNA-dependent RNA Polymerase, NiRAN, RdRp

NSP13: NSP13-pp1ab, non-structural protein 13, helicase, NTpase, Hel

NSP14: NSP14A2_ExoN, SARS-CoV-2 3'-to-5' exonuclease, non-structural protein 14, NSP14B_NMT

NSP15: NSP15-A1, SARS-CoV-2 endoRNAse, NSP15B-NendoU, NendoU, uridylate-specific endoribonuclease NendoU

All these names are certainly hard to remember, but as a scientist you need them in order to save the world! So, we made a handy glossary for you that you can access here.
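For readers who would rather let a script do the remembering, the list above can be turned into a small lookup table. The following is only a minimal sketch: the names `SYNONYMS`, `build_index` and `canonical_name` are made up for this illustration and are not part of the glossary, and the entries are simply copied from the list in this article, so the table is as incomplete as that list.

```python
# Synonyms copied from the list in this article (not exhaustive).
# The data layout and all function names are illustrative assumptions.
SYNONYMS = {
    "S": ["S-Protein", "spike protein", "surface glycoprotein", "E2 glycoprotein"],
    "NSP1": ["leader protein"],
    "NSP5": ["3CLpro", "SARS-CoV-2 3C-like protease", "3C-like proteinase",
             "main protease", "NSP5A_3CLpro", "NSP5B_3CLpro", "Mpro",
             "Non-structural protein 5"],
    "NSP9": ["Non-structural protein 9", "ssRNA-binding protein"],
    "NSP10": ["Non-structural protein 10", "growth factor-like protein", "GFL"],
    "NSP12": ["RNA Polymerase", "RNA-dependent RNA Polymerase", "NiRAN", "RdRp"],
    "NSP13": ["NSP13-pp1ab", "non-structural protein 13", "helicase", "NTpase", "Hel"],
    "NSP14": ["NSP14A2_ExoN", "SARS-CoV-2 3'-to-5' exonuclease",
              "non-structural protein 14", "NSP14B_NMT"],
    "NSP15": ["NSP15-A1", "SARS-CoV-2 endoRNAse", "NSP15B-NendoU", "NendoU",
              "uridylate-specific endoribonuclease NendoU"],
}


def _normalize(name):
    """Ignore case and surrounding whitespace when comparing names."""
    return name.strip().casefold()


def build_index(synonyms):
    """Map every normalized alias (and the short name itself) to the short name."""
    index = {}
    for canonical, aliases in synonyms.items():
        for alias in [canonical, *aliases]:
            index[_normalize(alias)] = canonical
    return index


INDEX = build_index(SYNONYMS)


def canonical_name(name):
    """Return the short name for a synonym, or None if it is not in the table."""
    return INDEX.get(_normalize(name))
```

With this table, `canonical_name("Mpro")` and `canonical_name("main protease")` both resolve to `"NSP5"`, while a name that is not in the list returns `None`. Matching ignores case and surrounding whitespace only; spelling variants such as a missing hyphen would need their own entries.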

If you have any more suggestions or corrections for the glossary, please let us know in the comments!


Andrea Thorn

Group Leader @ Institute for Nanostructure and Solid-State Physics, Hamburg University
Andrea is a specialist in crystallography and Cryo-EM structure solution, having contributed to programs like SHELX, ANODE and (a little bit) to PHASER in the past. Her group develops the diffraction diagnostics tool AUSPEX and a neural network for secondary structure annotation of Cryo-EM maps (HARUSPEX), and enables other scientists to solve problem structures. Andrea is […]

One comment on “Babelian SARS-CoV-2 confusion”

  1. And there are:
    ORF: Open Reading Frame
    RBD: Receptor Binding Domain
    (and a very long list about the immune system)
