Structural Task Force

Spike Glycoprotein: Corona’s Key for Invasion

COVID-19 is caused by the new coronavirus SARS-CoV-2. This virus has a characteristic hull featuring surface proteins which are commonly called “spikes”. Protruding from the viral hull like the “spikes of a crown”, they give the coronavirus its name (corona = crown). These proteins make the first contact with human cells and are akin to keys that use a human receptor called “angiotensin-converting enzyme 2” (ACE2) as a backdoor to gain access to and infect the cell.

Fig. 1. Animated picture of SARS-CoV-2. Numerous spike proteins, coloured in green, protrude from the virus hull, which is coloured in brown. Spikes enable the coronavirus to invade human epithelial cells. Image: Thomas Splettstoesser

1. Function of ACE2

ACE2 is a membrane protein anchored in the cell membrane of human epithelial cells. This type of cell can be found on the surface of lung, intestine, heart and kidney tissue. As a type I membrane protein, its primary function is to take part in the maturation of angiotensin, a peptide hormone which controls vasoconstriction and blood pressure. ACE2 can be compared to a lock which can be unlocked by the coronavirus spike protein. The virus can then enter the cell and hijack its functions to reproduce itself, thus causing COVID-19, which poses a serious danger to humanity, especially to older people and people with pre-existing conditions. For this reason, one approach to combating SARS-CoV-2 is to target and inhibit the spike to prevent infection. To do so, knowledge of the structural features of the spike and of its interaction with ACE2 is indispensable. (Further information about how macromolecular structures are visualized can be found on our homepage.)

2. Spike: Structure and Fusion Mechanism

Fig. 2. Image of a spike protein (green) protruding out of the viral envelope (brown). This image shows the structure of a spike protein divided into several subdomains. Each subdomain has a specific function necessary for binding and fusion. The transmembrane domain anchors the spike protein in the virus membrane. Heptad repeats 1 and 2 (HR1, HR2) and the fusion peptide (FP) play key roles in mediating the fusion process, and the virus makes contact with human cells through the receptor binding domain (RBD). Note that only “stumps” of carbohydrate chains are shown. Image: Thomas Splettstoesser

The spike protein is a trimer comprising three identical monomers. Each of these monomers can fold out, akin to a modern car key with a fold-out key element that has specific teeth on its surface. This fold-out key element is the so-called “receptor binding domain” (RBD). The spike can only interact with ACE2 when its RBD is in the folded-out position, exposing its teeth, the “receptor binding motif” (RBM). As the name suggests, it comprises a motif of different amino acids which can then bind and unlock the ACE2 receptor. This lock-and-key mechanism triggers a cascade of events initiating fusion with the host cell. First, protein scissors are recruited to the binding site. These scissors (furin and transmembrane serine protease 2) cleave the spike protein for subsequent activation. The active spike molecule then rearranges itself to form a long structural “hook” (formed of HR1, HR2 and FP, see Fig. 2) that brings the epithelial cell membrane and the viral membrane into close proximity for fusion. Once the fusion is completed, the path is clear for the virus to transfer its genome, encoded in ribonucleic acid (RNA), into the host cell. This successful transfer then enables the virus to multiply and finally spread from cell to cell, causing COVID-19 in its wake.

Fig. 3. A spike protein in complex with the human ACE2 receptor (PDB: 6vsb/6lzg). Left: The structure of a spike protein coloured in orange in complex with the human ACE2 receptor coloured in light orange. The white box shows the interaction site, which is shown enlarged in the image on the right. Right: The interaction site between spike and ACE2. Spike's "receptor binding domain (RBD)" includes a "receptor binding motif (RBM)" whose amino acids interact with those of the human receptor through hydrophilic interactions. These amino acids are shown as sticks protruding from the RBM and ACE2. Image: Sabrina Stäb

3. Evading the Immune System with Carbohydrate Chains

The human immune system normally recognizes the surface proteins of foreign organisms such as viruses or bacteria and reacts with an immune response to combat them. Spike proteins are such surface proteins, but because of structural peculiarities, the coronavirus evades both the innate and the adaptive human immune system. The secret behind these structural peculiarities is the N-glycans. These are long carbohydrate chains which sit on the spike’s surface. Each spike carries 66 N-glycans, which form a protective shield around the protein. Hence the human immune system has problems recognizing spikes and identifying the coronavirus as an enemy.

Fig. 5. Ribbon diagrams of a spike trimer with N-glycans on its surface coloured in cyan (PDB: 6vxx). In image a, the spike protein is shown from the side, and in b, the trimer can be seen from above. Unfortunately, neither X-ray crystallography nor cryo-EM can resolve long carbohydrate chains, so the chains shown here contain a maximum of three sugar monomers, while in most cases the carbohydrate chains are much longer, covering most of the contact surfaces of the upper spike protein. Image: Sabrina Stäb

The COVID-19 pandemic has a massive impact on our lives, our health and the global economy. Scientists around the world are trying to develop new drugs to combat the virus. Since the spike plays a critical role in the infection process, it is a prime target for drug development against the pandemic. One approach to inhibiting the interaction between spike and the ACE2 receptor is to cap the spike protein using antibodies. Antibodies are proteins normally produced by the human immune system to fight viruses. The idea is to treat patients with antibodies that cap the RBD of spike, thus preventing interactions with ACE2. This would lead to a nonfunctional spike, blocking the coronavirus from entering the cell (the key would no longer fit the lock). Another approach is the development of small molecules that target and inactivate the protein scissor transmembrane serine protease 2 (see section 2), as the spike’s functionality depends on this protease’s cleavage activity. Since the spike protein decorates the virus hull, it could even be part of a potential vaccine. For this reason, the spike protein could also become the key in the molecular fight against COVID-19.


The short answer to this question is “almost certainly not”. However, we live in an unprecedented time, in which people are tired of experts while simultaneously believing that having read a meme on social media makes one an expert. So, what do I even mean by “almost certainly”? Between the politicians and the scientists on TV you’re probably tired of not getting a straight answer. I can’t speak for the politicians, but for scientists there is a reason for this. Scientists don’t like to work in absolutes. Not because we want to hide something, but because uncertainty is our home ground. Science works by minimising our uncertainty to the point where we can identify the simplest [1] and most likely outcome based on our observations.

So, was SARS-CoV-2 made in a lab? You can’t help but think: “It could have happened though, right?” This guy, Professor Nikolai Petrovsky from Flinders University, certainly thinks it’s possible. He also said it could be “…a chance transmission of a virus from an as yet unidentified animal to human”, but that’s not as interesting a headline. You can watch his full interview on the topic here.

[1] This is called the law of parsimony, or Occam’s razor, which states that the simplest solution is most likely the right one.

Figure 1: Professor Nikolai Petrovsky of Flinders University being interviewed on Sky News Australia

What will be discussed here?

Before I get into the science, I want to clarify the sort of claims I am addressing. As scientists, we can only address a claim where the data are available and verifiable. Many of the arguments for the virus being created in the lab start with something along the lines of President Trump’s statement on April 30th.

“We have people looking at it very, very strongly. Scientific people, intelligence people, and others. We’re going to put it all together. I think we will have a very good answer eventually. And China might even tell us.”

President Donald Trump, April 2020

Ominous, but obviously lacking any real data. When pressed for evidence to prove his claims he retorted with

“I can’t tell you that. I’m not allowed to tell you that.”

President Donald Trump, April 2020

State secrets aside, this does not cut it on this battleground. It is impossible to make a valid argument from secret data. This would be like me submitting a paper to a scientific journal and replying to reviewers’ comments with

“Just trust me, I have the data to back up my claim that alpacas breathe fire when we’re not looking, but I’m not allowed to show you it because big wool is stopping me”

Dr Sam Horrell, June 2020.

Hence, we will only deal with claims for which there are valid data (see Figure 2).

Figure 2: Fire breathing alpacas caught* on tape in the wild. *Disclaimer: This might have been faked

The other argument typically sounds something like

“Of course it looks like the virus evolved naturally, these are very clever people that know how to cover their tracks”

Karen on Facebook, 2020.

This puts you in the unfortunate position of trying to prove a negative with your counter argument, which is, sadly, not possible. If we’re arguing this in a scientific manner, we must adhere to the burden of proof and provide positive evidence which allows us to find the simplest and most likely conclusion from our observations. A classic fallacy along the “very clever people” line is the creationist argument for a young Earth that God has made look old to trick the non-believers. Any science you try to throw at this is credited to God and a lack of faith, so you can’t argue this logically, but it does run into the problem of infinitely increasing complexity. You can see how we’re rocketing away from the simplest and most likely answer here.

On Natural Selection

Evolution and natural selection are central to this discussion. If you are of the opinion that evolution does not exist, then the rest of this article is not going to convince you and I hope you enjoyed the fire breathing alpaca picture. Natural selection works like this: each time an organism reproduces, there is a chance a mutation will occur in its genome. If this change grants an advantage (e.g. long necked giraffes), it increases the rate of survival and the chance that the giraffe has offspring, allowing the change to persist in the population. If a change results in a considerable disadvantage (e.g. stumpy necked giraffes), it is less likely to be passed on to the next generation and will be selected out. Then there are some mutations which are innocuous and will persist in the genome. Although they are not useful to the species, they are very useful to evolutionary biologists when tracing a species’ genetic lineage. Viruses and bacteria have a considerable advantage when it comes to natural selection, as they reproduce at a much faster rate than us mammals. For example, E. coli cells can divide every 30 minutes, so they can go through dozens of generations over the course of a single day, which means a greater chance of stumbling onto a favourable mutation! Ever wonder why antibiotic resistance is such a big problem? Because of speedy evolution.
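To put a number on that E. coli example, here is a minimal sketch of the arithmetic. It assumes the 30-minute division time quoted above stays constant for a whole day and ignores real-world limits such as nutrients; the function names are made up for this illustration.

```python
def generations_per_day(division_minutes: float) -> int:
    """Number of complete generations in 24 hours for a fixed division time."""
    minutes_per_day = 24 * 60
    return int(minutes_per_day // division_minutes)


def descendants(generations: int) -> int:
    """Idealised number of cells descended from one cell that doubles every generation."""
    return 2 ** generations


n = generations_per_day(30)
print(n)               # 48
print(descendants(n))  # 2**48, far more cells than any real culture could support
```

Every one of those divisions is another spin of the mutation wheel, which is the whole point of the comparison with slowly reproducing mammals.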

The Coronavirus Origin Story

The new coronavirus SARS-CoV-2 was first identified after a pneumonia outbreak on the 12th of December 2019. Its genome was sequenced, and it showed 79.6% sequence identity to the virus causing Severe Acute Respiratory Syndrome (SARS) from 2002, and 96% sequence identity to a bat coronavirus (RaTG13-CoV) which was recently reported by a lab in Wuhan. Since then all manner of conspiracy theories have popped up suggesting that this coronavirus was produced in a lab in Wuhan, was intentionally or accidentally released, and had been specifically designed to target humans. And why not? 96% sounds too high to be a coincidence, right? Releasing this bat virus must be the cause of COVID-19! However, if we compare humans to one of their closest relatives, the chimpanzee, we can see that we also share 96% of our genomes. And as you can see from Figure 3, there are a fair few differences between us. Bringing it back to coronaviruses, 96% identity still leaves about 1,100 differences between these viruses. If we line up the sequences, we see a random distribution of mutations across the genome which follows the natural evolution typical of coronaviruses. We also have the benefit of previous data from the SARS-CoV outbreak in 2002. Human SARS-CoV was found to share 99.8% sequence identity with a palm civet coronavirus, with only 202 differences between the viruses. If this is the level of similarity that has been observed historically, it follows that a 96% identical virus is not likely to be the immediate source of a species-jumping global pandemic. Even if it were the immediate source, this would only prove that the virus came from a bat, a species not known for its molecular biology expertise.
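For readers who want to see what "sequence identity" means in practice, here is a minimal sketch. It assumes the two sequences have already been aligned to the same length (real genome comparisons need a proper alignment tool first), the two toy sequences are made up and are not viral data, and the genome length of roughly 29,900 bases is my assumption rather than a number from this post.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def expected_differences(genome_length: int, identity_percent: float) -> int:
    """Rough number of differing positions implied by a given percent identity."""
    return round(genome_length * (100.0 - identity_percent) / 100.0)


# Made-up 25-base toy sequences that differ only at the last position.
toy_a = "AUGGCUAGCUAGGCUAACGUAGCUA"
toy_b = "AUGGCUAGCUAGGCUAACGUAGCUU"
print(percent_identity(toy_a, toy_b))      # 96.0

# Assumed genome length of roughly 29,900 bases (hypothetical round number).
print(expected_differences(29_900, 96.0))  # 1196
```

The second result lands in the same ballpark as the roughly 1,100 differences quoted above; the exact count depends on the precise identity value and on how the alignment is made.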

Figure 3: An accurate comparison of 96% identical species, Homo sapiens (left) and Pan troglodytes (right). Picture by Thomas Splettstößer.

Super Villain Interlude

If I were a super villain who had released a bat coronavirus aiming to shut down the world with a pandemic, I’d effectively be spinning an evolutionary roulette wheel and hoping it landed on “unprecedented global health crisis”. Not so much maniacal as just lucky. So, it’s highly unlikely (there’s that word again) that SARS-CoV-2 came directly from the bat coronavirus being released from the lab in Wuhan. If we stop for a moment and think about it, the bat coronavirus already existed in the world, so what would releasing it from a lab without extensive modification really achieve? It is much more likely that there is an animal intermediate we’re currently missing in the natural evolution of the coronavirus, most likely the result of having animals in close proximity to other animals as well as humans at the animal market in Wuhan. But as of the writing of this blog this route has not been proven.

Still not convinced that the virus did not come from a lab? OK, let’s keep going. How do we even go about making a virus? At this point we are going to have to dig into some molecular biology, so hold on to your butts!  

Homemade Viruses

We start with everyone’s favourite helical molecule, DNA, and a process called transcription. In transcription, DNA is partially unwound and a single stranded complementary (opposite) copy of the DNA sequence is produced, which we call RNA. RNA is then translated into proteins. When a virus infects a cell, it releases its genetic material (DNA or RNA) and uses our own cellular machinery to produce more viruses. If we were so inclined *cough super villain cough*, we could isolate this genetic material and, using an enzyme called reverse transcriptase, make a DNA copy of the viral RNA genome for our own nefarious purposes (or try and make a vaccine). This copy is called complementary DNA (cDNA) and can be used to produce an infectious virus in a host, which we can manipulate according to our wishes. In fact, this technique has already been used to study caliciviruses, alphaviruses, flaviviruses, arteriviruses, and *drum roll* coronaviruses! This paragraph makes it sound easy, but don’t be fooled: it certainly is not. Making a zoonotic virus, an animal virus that can infect humans, is a significant undertaking, but not as significant as making a zoonotic virus that can be spread between humans.
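The base-pairing rule behind transcription and reverse transcription can be sketched in a few lines. This is a toy model only: it ignores strand direction (5'/3'), enzymes and everything else that makes the real lab work hard, and the nine-base DNA fragment is made up.

```python
# Base-pairing rules: DNA template -> RNA, and RNA -> complementary DNA (cDNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def transcribe(dna_template: str) -> str:
    """Return the RNA strand complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in dna_template)


def reverse_transcribe(rna: str) -> str:
    """Return the cDNA strand complementary to an RNA strand."""
    return "".join(RNA_TO_CDNA[base] for base in rna)


template = "TACGGATTC"          # made-up DNA fragment
rna = transcribe(template)
print(rna)                      # AUGCCUAAG
print(reverse_transcribe(rna))  # TACGGATTC
```

Reverse transcribing the RNA gives back the original template, which is why cDNA can stand in for a viral RNA genome in the lab.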

So how do we know this is not where our SARS-CoV-2 comes from? To start, we are going to investigate the genome of SARS-CoV-2 and compare it with other notable coronaviruses. A recent paper published in Nature Medicine by Andersen and colleagues has identified two notable features in SARS-CoV-2’s genome that can help us answer this question. The first is that SARS-CoV-2 interacts well with a human protein called ACE2 because of five mutations on the spike protein (the bits poking out of the virus in Figure 4; for more information on the spike protein see here). The second is that SARS-CoV-2’s spike protein has an additional twelve bases in its RNA sequence which make it particularly infectious and able to jump between host species. At face value, this sounds like a convincing argument for SARS-CoV-2 being made in a lab. Just add a little change to the genome and release it on an unsuspecting populace. Basic super villain stuff. However, as we dig a little deeper into the science behind this, it begins to seem much less likely.

Figure 4: Illustration of SARS-CoV-2 and its spike protein by Thomas Splettstößer.

SARS-CoV-2 and ACE2 Binding

Let’s start with the optimised binding to human Angiotensin-Converting Enzyme 2, or ACE2 for short. ACE2 is a human enzyme that decorates the outer surface of a variety of cells throughout the human body, including the lungs. On a normal day, ACE2 plays an important role in cardiovascular (heart) and renal (kidney) function by producing vasodilators, key molecules that open blood vessels to increase blood flow and lower blood pressure. On an abnormal day, an invasive virus (SARS-CoV-2) can bind to ACE2, enter our cells, and hijack our cells’ machinery to produce more viruses. If we compare the receptor binding domains of the spike protein from SARS-CoV (SARS-CoV-2’s 2002 predecessor), the bat coronavirus and SARS-CoV-2, we can see five key differences which improve SARS-CoV-2’s interaction with human ACE2. However, computational simulations show the interaction is far from perfect, and the binding differs from previously predicted binding modes. Furthermore, computational modelling suggests the spike protein is capable of recognising ACE2 in a number of animal species, with the exception of mice and rats. If these five key mutations were the only differences, it would be more indicative of deliberate manipulation; however, the presence of 1095 other mutations distributed across the genome is much more suggestive of evolution through an animal intermediate.

Super Villain Interlude II: Electric Boogaloo

If I don my super villain costume again, to cover my tracks and make this look convincing I need to identify and isolate the bat coronavirus, produce cDNA from that virus, develop a system to produce and study my new virus in a lab separate from current published methods, perform extensive computational modelling to identify a previously unreported binding mode for the spike protein, and then add in over a thousand innocuous mutations without impairing the virus. Is all this possible? Of course, we have the technology as I explained earlier. But is it likely? Not really. This would take a large team of world leading experts from several different fields working for years in complete secrecy at the cutting edge of molecular biology. At this point we’re entering that rocky ground from earlier where the justification for the conspiracy theory is getting complex to the point of near impossibility.

Adding Sugar to a Virus Makes it Worse?

Next up, a polybasic furin cleavage site and O-linked glycans! Or, in English, some other stuff that makes SARS-CoV-2 more infectious. Part of SARS-CoV-2’s spike protein has a four-residue sequence made up of two different amino acids, arginine (R) and alanine (A), in the order RRAR, which is recognised and cut by a protease (a protein cutting enzyme) called furin. Cutting this sequence is predicted to be a key factor in the virus binding to and gaining entry to cells. Such sites are also a signature of highly pathogenic avian influenza viruses, where they affect the pathogenicity of the virus and the hosts it can infect. Natural selection of these sites can allow a virus to jump between species and turn a low-level pathogen into a highly pathogenic, ‘we-should-all-be-worried, “it’s over 9000”’-level pathogen.
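Spotting such a motif in a protein sequence is the easy part, as this minimal sketch shows. It uses the pattern R-X-X-R (arginine, any two residues, arginine), which is commonly described as a minimal furin recognition pattern; that pattern is my assumption here, since this post only mentions RRAR. The short fragment is a toy example built around that motif and should not be read as authoritative spike sequence.

```python
import re

# Minimal furin-like pattern: arginine, any two residues, arginine (R-X-X-R).
# A lookahead is used so that overlapping occurrences are reported as well.
FURIN_LIKE = re.compile(r"(?=(R..R))")


def find_furin_like_sites(protein: str) -> list:
    """Return (0-based start index, matched motif) for each R-X-X-R occurrence."""
    return [(m.start(), m.group(1)) for m in FURIN_LIKE.finditer(protein)]


fragment = "QTNSPRRARSVAS"  # illustrative fragment containing RRAR
print(find_furin_like_sites(fragment))  # [(5, 'RRAR')]
```

Finding the motif in a known sequence is trivial; knowing in advance where a new one would make a virus more successful is the hard problem.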

What does that have to do with glycans? When furin cleaves the spike protein it makes two new sites on either side of the cut, which scientists have predicted to be targets for O-linked glycosylation (attachment of a type of sugar to oxygen atoms on a protein). But what do these glycans even do? Well, we don’t exactly know yet for SARS-CoV-2. But we do know from experience that O-linked glycosylation can be used by viruses to avoid the immune system.

So, what does this cleavage site tell us about the possibility of making a coronavirus in a lab? The development of the furin cleavage site and the prediction of glycans also help us put this conspiracy theory to rest. Such cleavage sites are typically the result of a low-pathogenicity virus interacting with an immune system over many generations. Of course, we have the technology to add the RRAR sequence into our hypothetical cDNA virus genome, no problem, but accurately predicting where to put that site is a wholly different challenge. Natural selection in viruses can manage this by rolling the dice many millions of times until a random change, or more likely several changes, grant such a significant advantage that a dominant version of the virus emerges; a process that has been observed previously with influenza and furin cleavage sites. If you want a cleavage site for your new lab-made virus, your best bet is to isolate a genetically similar virus and expose it repeatedly to animals with ACE2 receptors akin to human ACE2. Cell culture wouldn’t cut it, as interaction with an immune system is the driving factor in these changes, and we’ve already seen from the computational modelling that rats and mice aren’t a viable system. A piece of work on this scale represents a considerable time sink and monetary investment in an inefficient process which relies on a roll of the dice to provide the desired results. As we have observed this evolutionary behaviour before in nature, it stands to reason that the furin cleavage site is the result of natural selection and not deliberate manipulation.


We’ve covered a lot of ground from abductive reasoning and a young Earth to molecular biology and furin cleavage sites in our quest to unpick this conspiracy theory. As more studies are published the specifics of this may change, but, barring a colossal government coverup being unmasked, the involvement of deliberate manipulation in a lab appears unlikely. The evidence suggests the virus originated in bats, but it is highly unlikely the bat virus (RaTG13-CoV) is the direct precursor to SARS-CoV-2. Our best candidate for an intermediate species comes from a pangolin coronavirus which has been found to share the five mutations in the spike protein that facilitate ACE2 binding, but not the furin cleavage site [11]. We have shown that it is indeed possible to make our own viruses in a lab, but SARS-CoV-2’s backbone doesn’t match up with any of the currently available reverse genetic systems, so this is unlikely to be a factor. And finally, looking deeper into the genome of SARS-CoV-2 we see ample evidence of natural selection across the whole viral genome, not just in the spike protein’s binding region, and the appearance of a furin cleavage site; a well-documented naturally selected phenomenon observed in viruses previously. Based on the available evidence, discounting any secret data that may be being held hostage in a secret lair hidden in a volcano, we come to the most logical and simple answer. SARS-CoV-2 was most likely not made in a lab but evolved naturally from a bat coronavirus via an animal intermediate, possibly pangolins.



I would like to thank a number of people for help with the writing of this post: Harri Webb for acting as a fire breathing alpaca wrangler, Mary Cruise for proof reading and suggestions, Thomas Splettstößer for the figures that look professionally made, and the members of the Coronavirus Structural Task Force, particularly Alex Payne, Dale Tronrud, and Andrea Thorn, for all their help and suggestions.

Proteins are complex and fickle molecules. Experimental structure determination can teach us a lot about their function, but this is not the easiest thing to do. It’s not as simple as looking through a microscope, focussing, and taking a picture of the protein. It’s more like when you have a broken arm and the doctor uses X-rays to see what is inside, with the minor caveat that you first need to remove your arm from the rest of your body, crystallise your arm with thousands of other identical copies, and THEN shoot the high energy X-rays at the crystal. After successfully navigating a whole lot of maths, you can finally get a protein structure.

Different Methods

Experimentally determined coronavirus protein structures can come from three sources: X-ray crystallography, electron cryo-microscopy (cryo-EM) and solution nuclear magnetic resonance (NMR). Each experimental technique has its own advantages and disadvantages. These structural techniques can also be complemented with a number of other techniques such as mass spectrometry, chemical cross-linking, fluorescence resonance energy transfer, and genetics, to fill in the finer details around structure and function.


If you are looking to find the structure of a “big” molecule (although it's still quite small, really), electron microscopy can help you. Unlike in other techniques, the biomolecule of interest is imaged directly using a beam of electrons and a system of lenses. The complicated bit is turning these 2D pictures into 3D objects. This can be achieved by imaging thousands of copies of the same biomolecule in different orientations, so that we can reconstruct it in 3D. Although cryo-EM is historically considered a lower resolution technique, recent technological advances have brought about a resolution revolution, and some structures are almost as detailed as those from X-ray crystallography. As a result, cryo-EM can now show us amino acid sidechains, surface water molecules, and non-covalently bound ligands, which were previously the purview of X-ray crystallography alone.
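Why image thousands of particles rather than one? A large part of the answer is noise. The following toy sketch shows only that one idea: averaging many noisy copies of the same signal recovers the signal. It is not a reconstruction algorithm (there is no orientation search and no 3D), and the 1D "particle", the noise level and the number of copies are all made up.

```python
import random


def noisy_copy(signal, noise_sd, rng):
    """Return the signal with independent Gaussian noise added to every pixel."""
    return [value + rng.gauss(0.0, noise_sd) for value in signal]


def average(images):
    """Pixel-wise mean of equally sized images."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5


rng = random.Random(0)                    # fixed seed for repeatability
truth = [0.0, 1.0, 0.0, 2.0, 0.0, 1.0]    # made-up 1D "particle"
single = noisy_copy(truth, 1.0, rng)
stack = [noisy_copy(truth, 1.0, rng) for _ in range(2000)]

print(rms_error(single, truth))           # error of one noisy image
print(rms_error(average(stack), truth))   # much smaller after averaging
```

For independent noise, the error of the average shrinks roughly with the square root of the number of images, which is one reason cryo-EM data sets contain so many particles.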

Formation of a cryo-EM structure. (A): Picture showing thousands of different angles of the structure, (B): Averages of the 16 most populated classes out of 118,556 selected particles, (C): 3.3 Å map of the entire complex.
Original pictures from: Matthies, D., Bae, C., Toombes, G.E., Fox, T., Bartesaghi, A., Subramaniam, S., Swartz, K.J. (2018) eLife 2018;7:e37558, edited by Ferdinand Kirsten, License: CC BY-ND 2.0


An important step in NMR spectroscopy is the so-called isotope enrichment. While the typical MRI at your doctor’s just measures the basic whereabouts of the atomic nuclei in a certain tissue, this method can help identify the distribution of carbon atoms in a structure. However, this requires that some carbons differ from others. Different isotopes of carbon, with different numbers of neutrons in their nuclei, are therefore incorporated into the protein while it is being produced. After purification, the protein is placed in a strong magnetic field and probed with radio waves. The distinctive resonance of each isotope is then analysed, yielding information about the whereabouts of the different carbon nuclei and revealing the distances and possible connections between them. Using the knowledge about these distances, scientists can solve the Sudoku-like puzzle to generate an atomic model of the protein. This method only works for small to medium sized proteins, as larger structures cause problems with overlapping peaks in the resonance spectra. On the other hand, NMR spectroscopy has a major advantage in its ability to measure flexible proteins in solution instead of in the solid state, which may hinder molecular movement.

Visualisation of a data set obtained through NMR spectroscopy. Some of the restraints used to solve the structure of a small monomeric hemoglobin are shown here, using software from the BioMagResBank [1]. The protein (PDB: 1vre and 1vrf) is shown in green, and restraints are shown in yellow.
Picture courtesy of

X-ray crystallography

Of the methods discussed here, X-ray crystallography has produced the most structures to date, totalling 145 252 in the PDB compared to 12 965 from NMR and only 4 926 from cryo-EM. However, X-ray crystallography has a major drawback: the need for a protein crystal. While this common method can provide very detailed atomic information, showing every atom of each amino acid and even of ligands, inhibitors, ions and other molecules included in the structure, the process of crystallisation is difficult and might limit which types of protein can be studied. Purifying a protein for crystallisation has become much more straightforward in recent years but is still a non-trivial task. After purification, producing a protein crystal can take months to years before structure-worthy data can be measured from it. Flexible proteins in particular are much harder to crystallise. As enzymes or receptors, a lot of proteins rely on movable parts and different conformations to fully operate, so unfortunately, interesting proteins are often flexible! Once produced, the crystal is cooled in liquid nitrogen and subjected to an intense X-ray beam. You can compare this to a crystal that you hold to the light while observing its reflections on a wall. In this case, the X-rays hit the protein and get diffracted in a specific pattern. The diffracted rays cannot give a direct picture of the crystal; instead, they need to be interpreted with a structural model. The distribution of electrons can be calculated from this pattern, resulting in an electron density map with the estimated location of each atom.

Basic workflow of X-ray crystallography. The protein crystal is exposed to X-rays, which produces a diffraction pattern. The pattern is then interpreted and solved into an electron density map with mathematical algorithms. An atomic model can be estimated and refined based on this map.
Crystal by Andrea Thorn, Diffraction pattern by Sabrina Stäb, image by Ferdinand Kirsten.


The molecular models obtained by these methods open up numerous possibilities: structure-based drug design, computational dynamics simulations and answers to biological questions. But how can we interpret and refine those models to extract every last biological detail? This will be discussed in the next blog entry.


Pharmaceutical drugs can be found by chance, but today, most so-called active pharmaceutical ingredients (APIs) are developed through a long, iterative process of designing and testing them.

Targets and active ingredients

Most medicinal drugs are small molecules with up to 70 atoms, which bind in the body to larger molecules, or macromolecules. These so-called targets are typically proteins (long chains of amino acids), RNA, DNA (long chains of nucleotides) or carbohydrates (long chains of sugars). However, most targets are proteins.

Proteins are the workhorses of all living organisms: fungi, animals, plants, bacteria and even viruses (sic!) utilize them. They are the tools that allow us to digest what we eat, enable cell division, and form muscles and hair. One reason why our diet needs to contain proteins is that they are disassembled into amino acids from which new proteins can be made. Some amino acids we can’t make ourselves, so we HAVE to get them from our diet - these are the “essential” amino acids. To properly function, our body needs to be able to use a large variety of proteins - our genes encode at least 20,000 of them [1]!

An example: How Aspirin works

Our bodies require the proper function of proteins to live, but not all proteins are good for you: for chronic illnesses or metabolic disorders, diminishing the activity of certain proteins might be advantageous. A good example is cyclooxygenase-II, or COX-2 for short. This protein is formed when cells are injured or during inflammation, and it catalyzes an important step in the production of pain mediators called prostaglandins [2]. Without cyclooxygenase-II, these pain mediators cannot be produced.

Acetylsalicylic acid – also known as ASA or Aspirin – binds to cyclooxygenase-II and stops it from working (see image below). As a consequence, your body stops producing those pain mediators, and your pain is relieved*. In this case, cyclooxygenase-II is the target and ASA the active pharmaceutical ingredient.

Inhibition of COX by aspirin
Left: cyclooxygenase-II, a protein that produces pain-mediating prostaglandins. Right: acetylation (4 red atoms) by acetylsalicylic acid (aspirin) blocks the channel into the catalytic center, so that cyclooxygenase-II can no longer produce pain mediators. Its effect only ends when the body has produced new cyclooxygenase-II (a few hours). Image by Andrea Thorn.

Rational drug design

There are two major methods in rational drug design:

Indirect or ligand-based drug design utilizes molecules which are similar to known active pharmaceutical ingredients in their shape and charge. These molecules are then tested for binding and/or inhibition of the target, or, just as often, a resulting change in some biological parameter of interest, such as the killing of a virus!

Direct or structure-based drug design utilizes knowledge about the target. The potential API is chosen to bind to the target - for example, in the case of cyclooxygenase-II, to fit in the channel (see image). In fragment-based drug design, several smaller molecules are bound to sites in the target and with this knowledge, an active pharmaceutical ingredient is designed that combines their properties.

For both of these methods, a pre-selection of potential molecules can be done by computer-aided drug design. However, for structure-based drug design, the macromolecular structure of the target must be known.
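
To give a flavour of what such a computer-aided pre-selection can look like for the ligand-based route, here is a minimal sketch. It uses the Tanimoto coefficient, a common similarity measure in cheminformatics, which is our choice for illustration and not a method named in this article. The feature sets and the names known_api and candidates are hypothetical and written by hand; real tools derive such fingerprints from the molecular structure.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets: shared / total features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical, hand-written feature sets; NOT computed by chemistry software.
known_api = {"aromatic_ring", "carboxylic_acid", "ester"}
candidates = {
    "candidate_A": {"aromatic_ring", "carboxylic_acid", "hydroxyl"},
    "candidate_B": {"amine", "sulfonamide"},
}

# Rank the candidates by similarity to the known active ingredient.
ranked = sorted(
    candidates,
    key=lambda name: tanimoto(known_api, candidates[name]),
    reverse=True,
)
for name in ranked:
    print(name, round(tanimoto(known_api, candidates[name]), 2))
```

Candidates at the top of such a ranking would then go on to the binding or inhibition tests described above; the score only narrows down what to test and says nothing definite about activity.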

The Coronavirus Structural Task Force supports the search for a drug against COVID-19 by validating and, where possible, improving the macromolecular structures of potential coronavirus targets. We also offer drug designers information about different SARS-CoV-2 and SARS-CoV macromolecules. With this, we hope to do our part in the fight against COVID-19.

(blog header image: aspirin tablets by Ragesoss, Wikimedia Commons / license: CC 4.0)

* Unfortunately, instead of prostaglandins, the body then produces more leukotrienes, which can cause asthma attacks - hence, asthma patients should only take aspirin after consulting their GP.

  1. Ponomarenko EA, Poverennaya EV, Ilgisonis EV, et al. The Size of the Human Proteome: The Width and Depth. International Journal of Analytical Chemistry. Published online 2016:1-6. doi:10.1155/2016/7436849
  2. Ricciotti E, FitzGerald GA. Prostaglandins and Inflammation. Arterioscler Thromb Vasc Biol. Published online May 2011:986-1000. doi:10.1161/atvbaha.110.207449

SARS-CoV-2: Not new, but different

The novel coronavirus (2019-nCoV) is classified as a large positive-sense single-stranded RNA virus from the genus Betacoronavirus. It shows high genetic similarity to SARS-CoV and MERS-CoV and is even more closely related to the bat SARS-like coronavirus, from which it most likely evolved. Even though it shares many similarities with its relatives, further insights into the infection mechanism and the structure of its proteins reveal significant differences.

But what is in it?

Like many RNA viruses, the virus has a lipid hull, with envelope and other proteins integrated in it. This viral shell is responsible for the interaction with host cells and for the protection of the inner parts, most importantly the viral RNA. This RNA acts as a direct template for the translation of two polyproteins named pp1a and pp1ab, which contain the 16 non-structural proteins (nsps) of the replication-transcription complex (RTC). Those 16 nsps, encoded by about two thirds of the genome (in terms of length), are cleaved from the polyproteins by the chymotrypsin-like protease (3CLpro, also called the main protease) and one or two papain-like proteases to generate the functional individual proteins. The RTC then synthesizes a variety of subgenomic RNAs (sgRNAs) by discontinuous transcription, which serve as templates to produce subgenomic mRNA. Other open reading frames of the genome encode at least four structural proteins that are necessary for the assembly of the virions, the hull and the infection of cells (called S, M, E and N protein for spike, membrane, envelope and nucleocapsid).

Visualisation of the SARS-CoV-2 structure. The envelope protein E, membrane protein M and spike protein S are bound to the viral envelope; the nucleocapsid protein N and the single-stranded RNA are inside. Image: Thomas Splettstoesser;

Making contact

The majority of infected cells are ACE2 (angiotensin-converting enzyme 2)-bearing cells of the respiratory system. The viral RNA is introduced through endocytosis via the spike glycoprotein of the coronavirus. What does this mean? The S or spike protein, which forms the "corona" around the virus, binds with its receptor-binding domain (RBD) to the ACE2 receptor, which is located on the surface of the host cells. Afterwards, the virus can merge with the cell through a complicated mechanism called endocytosis. Once infected, these cells act as a multiplier for the virus, which provokes a strong reaction of the immune system. The most common symptoms include cough, fever, fatigue, loss of taste, headache, diarrhoea, dyspnoea, lymphopenia and pneumonia; in severe cases, the infection can cause the death of the patient.

Crystal structure of spike protein receptor-binding domain from SARS coronavirus epidemic strain complexed with human-civet chimeric receptor ACE2, picture by Ferdinand Kirsten
Crystal structure of spike protein receptor-binding domain (RBD) from SARS coronavirus epidemic strain (2002-2003) (magenta) complexed with human-civet chimeric receptor ACE2 (green). The green bit is usually bound to the host cell and the magenta bit is at the top of the spike on the outside of the virus. PDB: 3SCL, picture by Ferdinand Kirsten

The structure of the virus, its infection mechanism and multiplication offer numerous possibilities for drug targeting, such as the inhibition of the main protease or the polymerases, the disturbance of the assembly of shell and entry proteins or the replication‐transcription complex and direct mRNA antiviral methods. However, none of them has been proven effective in clinical studies to this point.

Learn more:

Form follows function

Proteins are big molecules, ranging from 400 to 20 000 atoms. They are the workhorses of the living world – they break down what you eat, build your muscles, organise cell division, make up hair and skin. They are formed from amino acids as a long chain that then cross-links and folds into the functional molecule. The sequence of the amino acids – there are 20 different ones – determines the fold, but in many cases we cannot (yet) predict it because it is too complicated, so we have to determine the molecule’s shape experimentally.

Molecular model of Penicillin by Dorothy Hodgkin
Molecular model of Penicillin by Dorothy Hodgkin with electron density, ca. 1945. Picture courtesy of

If we learn the structure of a protein, we can understand how it works and what it does. The coronavirus encodes its own proteins, which are made by human cells once they are infected. These proteins interact with human proteins, and hence, understanding them is crucial: disabling the viral proteins or disrupting their interactions with the human host can stop the infection and permit us to fight the virus and gain the upper hand.

But how do you even measure and visualize something so small? Good question! After the difficult task of making a crystal from such large (and somewhat floppy) molecules and shooting it with X-rays like a madman, the data get interpreted with a model of the structure.

Growth of known protein structures
Growth in the number and complexity of structures in the Protein Data Bank (PDB; courtesy of the RCSB Protein Data Bank)

But even if you have a model, it is really difficult to see anything. There are too many atoms. The growing size and number of the molecular structures we can determine experimentally necessitated a better way to visualize them. A big step towards solving that problem was taken by Jane Richardson of Duke University when she created the Ribbon Diagram in 1980.

The Ribbon Diagram

"I don't see how you could possibly describe a protein structure in a thousand words, but you can come a lot closer with one picture."

- Jane Richardson

These three-dimensional, schematic representations are the most common visualization of protein structures. The ribbon shows the backbone (= amino acid chain) of the protein. Depending on its local fold, which is determined by so-called hydrogen bonds, each stretch of the peptide chain can be assigned to one of three categories, the so-called secondary structures: α-helices, β-sheets and loops. α-helices are shown as coiled ribbons and the strands of β-sheets as arrows. The arrows indicate whether a β-sheet is parallel (all strands run in the same direction) or anti-parallel (neighbouring strands run in alternating directions). Loops are shown as a tube with a smaller diameter than α-helices or β-sheets. Any additional features can then be added, as well as labelling.

Different visualizations of an alpha-helix
Different visualizations of an alpha-helix: The backbone shown as sticks and simplified as a ribbon or cartoon. Picture by Ferdinand Kirsten.
Different visualisations of an anti-parallel beta-sheet
Different visualisations of an anti-parallel beta-sheet with a loop connecting its two strands. The backbone shown as sticks and simplified as a ribbon or cartoon. Picture by Ferdinand Kirsten.

Looking at macromolecular structures with visualizations like the ribbon diagram can provide a good overall view of the protein’s inner conformation, its symmetry and possible interaction and binding sites. Different structural features promote different functions in proteins. A uniform presentation of gathered data is the key to better understanding and comparing these small machines, which work in and around us at every second.

Three views of PDB 6vxs
The ADP-ribose-phosphatase of NSP3 from SARS-CoV-2 portrayed from different angles (Protein Data Bank entry 6vxs). Picture by Ferdinand Kirsten.
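
For readers who want to look under the hood: the backbone path that a ribbon follows can be read straight from a coordinate file. The sketch below assumes the classic fixed-column PDB text format and picks out the alpha carbon ("CA") of each residue. The six coordinate lines are hypothetical toy values written for this example, not data from entry 6vxs, and ca_trace is a name we made up.

```python
import math

# Hypothetical toy coordinates in PDB fixed-column format; NOT taken from entry 6vxs.
PDB_TEXT = """\
ATOM      1  N   MET A   1      11.000  12.000  13.000
ATOM      2  CA  MET A   1      12.000  12.500  13.000
ATOM      3  C   MET A   1      13.200  11.600  13.000
ATOM      4  N   VAL A   2      14.300  12.200  13.000
ATOM      5  CA  VAL A   2      15.800  12.500  13.000
ATOM      6  CA  ASN A   3      19.600  12.500  13.000
"""


def ca_trace(pdb_text):
    """Return (residue name, residue number, (x, y, z)) for every alpha carbon."""
    trace = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; "CA" is the backbone alpha carbon.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            trace.append((line[17:20], int(line[22:26]), xyz))
    return trace


trace = ca_trace(PDB_TEXT)
for (_, _, a), (_, _, b) in zip(trace, trace[1:]):
    # Distance between consecutive alpha carbons, in the units of the file (Angstrom).
    print(round(math.dist(a, b), 2))
```

Ribbon-drawing programs do considerably more than this (smoothing the path and assigning helices and strands), but a chain of alpha carbons like this one is the usual starting point.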

Regarding the current problem child, SARS-Coronavirus-2 (or hCoV2019), ribbon diagrams of viral proteins reveal numerous insights into the structure and function of different parts of the virus and its infection of host cells. While making the atomic model to begin with is a difficult task in its own right, the contributions of Jane Richardson and others shape how we, and every new generation of molecular biologists, perceive and think of macromolecules and the molecular basis of life.

Got interested? Read more about this at:

Coronavirus Structural Taskforce