Coronavirus
Structural Task Force

How can we measure the structures of macromolecules?

Proteins are complex and fickle molecules. Experimental structure determination can teach us a lot about their function, but it is not an easy thing to do. It is not as simple as looking through a microscope, focussing and taking a picture of the protein. It is more like when you have a broken arm and the doctor uses X-rays to see what is inside, with the minor caveat that you first need to remove your arm from the rest of your body, crystallise it together with thousands of other identical copies, and THEN shoot high-energy X-rays at the crystal. After successfully navigating a whole lot of maths, you can finally get a protein structure.

Different Methods

Experimentally determined coronavirus protein structures can come from three sources: X-ray crystallography, electron cryo-microscopy (cryo-EM) and solution nuclear magnetic resonance (NMR) spectroscopy. Each experimental technique has its own advantages and disadvantages. These structural techniques can also be complemented with a number of other techniques, such as mass spectrometry, chemical cross-linking, fluorescence resonance energy transfer and genetics, to fill in the finer details of structure and function.

Cryo-EM

If you are looking to find the structure of a “big” molecule (although it is still quite small, really), electron microscopy can help you. Unlike in the other techniques, the biomolecule of interest is imaged directly, using a beam of electrons and a system of lenses. The complicated bit is turning these 2D pictures into a 3D object. This is achieved by imaging thousands of copies of the same biomolecule in different orientations, so that its 3D shape can be reconstructed. Although cryo-EM has historically been considered a lower-resolution technique, recent technological advances have brought about a “resolution revolution”, and some structures are now almost as detailed as those from X-ray crystallography. As a result, cryo-EM can now show us amino acid side chains, surface water molecules and non-covalently bound ligands, which were previously the purview of X-ray crystallography alone.

Formation of a cryo-EM structure. (A): Pictures of the structure from thousands of different angles, (B): averages of the 16 most populated classes out of 118,556 selected particles, (C): 3.3 Å map of the entire complex.
Original pictures from: Matthies, D., Bae, C., Toombes, G.E., Fox, T., Bartesaghi, A., Subramaniam, S., Swartz, K.J. (2018) eLife 2018;7:e37558, edited by Ferdinand Kirsten, License: CC BY-ND 2.0
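Why image thousands of particles? A single electron micrograph of a biomolecule is extremely noisy, and averaging many images of particles seen in the same orientation (as in the class averages of panel B) lets the real signal stand out. The Python sketch below is a toy illustration under strong simplifying assumptions: a made-up 1D “image”, Gaussian noise, and particles that are already perfectly aligned. Real cryo-EM software also has to work out each particle's orientation and then combine the different views into a 3D map, which is not shown here.

```python
import math
import random

def noisy_copy(signal, noise_sd, rng):
    """One simulated particle image: the true signal plus Gaussian noise."""
    return [value + rng.gauss(0.0, noise_sd) for value in signal]

def average_images(images):
    """Pixel-by-pixel average of already aligned images (a toy class average)."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))

# Hypothetical 1D "projection image" of a particle: a smooth bump on 64 pixels.
true_signal = [math.exp(-(((x - 32) / 6.0) ** 2)) for x in range(64)]

rng = random.Random(0)  # fixed seed, so every run gives the same numbers
errors = {}
for n_particles in (1, 10, 100, 1000):
    # The noise (sd = 2.0) is twice as strong as the peak of the signal itself.
    images = [noisy_copy(true_signal, 2.0, rng) for _ in range(n_particles)]
    errors[n_particles] = rms_error(average_images(images), true_signal)
    print(f"{n_particles:>5} particles: RMS error {errors[n_particles]:.3f}")
```

In this setup the remaining error should shrink roughly with the square root of the number of averaged particles (about 2/√N), which is one reason why cryo-EM data sets contain so many particles.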

NMR-spectroscopy

An important step in NMR spectroscopy is the so-called isotope enrichment. While the typical MRI scanner at your doctor's just measures the basic whereabouts of atomic nuclei (mostly hydrogen) in a certain tissue, this method can help identify the distribution of carbon atoms in a structure. However, this requires that the carbon atoms can be detected at all: the most common carbon isotope, carbon-12, gives no NMR signal, so the protein is enriched with carbon-13, an isotope with one extra neutron in its nucleus. This isotope is incorporated into the protein while it is being produced, before purification. After purification, the protein is placed in a strong magnetic field and probed with radio waves. The distinctive resonances of the nuclei are then analysed, yielding information about the whereabouts of the different carbon nuclei and revealing the distances and possible connections between them. Using the knowledge about these distances, scientists can solve a Sudoku-like puzzle to generate an atomic model of the protein. This method only works for small to medium-sized proteins, as larger structures cause problems with overlapping peaks in the resonance spectra. On the other hand, NMR spectroscopy has a major advantage: it can measure flexible proteins in solution instead of in a solid crystal, which may hinder molecular movement.
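The Sudoku-like step can be pictured as checking a candidate model against the measured distances. The Python sketch below uses made-up coordinates and hypothetical upper-bound distance restraints (in ångström) for four atoms labelled A to D, and simply lists the restraints that the model violates. A real structure calculation goes much further: it repeatedly adjusts the atom positions, together with the known chemistry of the protein, until the violations disappear.

```python
import math

# Hypothetical coordinates (in ångström) of four atoms in a candidate model.
model = {
    "A": (0.0, 0.0, 0.0),
    "B": (3.8, 0.0, 0.0),
    "C": (3.8, 3.8, 0.0),
    "D": (0.0, 3.8, 2.0),
}

# Hypothetical restraints: (atom, atom, upper bound in ångström), each read as
# "these two atoms must be closer than this distance".
restraints = [
    ("A", "B", 4.0),
    ("A", "C", 5.5),
    ("B", "D", 5.0),
    ("A", "D", 4.5),
]

def violations(model, restraints):
    """Return every restraint whose distance in the model exceeds its upper bound."""
    bad = []
    for a, b, upper in restraints:
        d = math.dist(model[a], model[b])  # Euclidean distance (Python 3.8+)
        if d > upper:
            bad.append((a, b, round(d, 2), upper))
    return bad

print(violations(model, restraints))
```

For this toy model only the B-D restraint is violated (5.73 Å instead of at most 5.0 Å), so a structure calculation would have to move B and D closer together without breaking the other three restraints.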

Visualisation of a data set obtained through NMR spectroscopy. Some of the restraints used to solve the structure of a small monomeric hemoglobin are shown here, using software from the BioMagResBank. The protein (PDB: 1vre and 1vrf) is shown in green, and restraints are shown in yellow.
Picture courtesy of PDB101.rcsb.org: https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/methods-for-determining-structure

X-ray crystallography

Of the methods discussed here, X-ray crystallography has produced the most structures to date, totalling 145 252 in the PDB compared to 12 965 from NMR and only 4 926 from cryo-EM. However, X-ray crystallography has a major drawback: it needs a protein crystal. While this common method can provide very detailed atomic information, showing every atom of each amino acid and even of ligands, inhibitors, ions and other molecules included in the structure, crystallisation is difficult and can limit which proteins can be studied. Purifying a protein for crystallisation has become much more straightforward in recent years, but it is still a non-trivial task. After purification, producing a protein crystal can take months to years before structure-worthy data can be measured from it. Flexible proteins in particular are much harder to crystallise. Many proteins, such as enzymes or receptors, rely on movable parts and different conformations to fully operate, so unfortunately, interesting proteins are often flexible!

Once produced, the crystal is cooled in liquid nitrogen and exposed to an intense X-ray beam. You can compare this to holding a crystal up to the light and observing its reflections on a wall. Here, the X-rays hit the protein and are diffracted in a specific pattern. Unlike the reflections on the wall, however, the diffracted rays do not form a picture of the molecule; they have to be interpreted with a structural model. The distribution of electrons can be calculated from this pattern, resulting in an electron density map with the estimated location of each atom.
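At its core, the “whole lot of maths” is a Fourier transform. Each spot in the diffraction pattern corresponds to a structure factor F(h), a wave with an amplitude and a phase, and adding up these waves gives back the electron density. The Python sketch below shows this for a toy one-dimensional “crystal” with two point-like atoms (made-up positions and scattering strengths). It also shows why this is hard: the detector only records spot intensities, so the phases are lost, and a density calculated from the amplitudes alone puts the highest peak in the wrong place. In practice, the phases have to be recovered with additional methods, such as molecular replacement or experimental phasing, which this sketch does not attempt.

```python
import cmath
import math

# Hypothetical 1D "unit cell": two point atoms at fractional positions x_j,
# the second one scattering twice as strongly (a heavier atom).
atoms = [(0.20, 1.0), (0.35, 2.0)]  # (position x_j, scattering factor f_j)
H = 10                               # highest reflection index "measured"

def structure_factor(h):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j): what one diffraction spot encodes."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

reflections = {h: structure_factor(h) for h in range(-H, H + 1)}

def density(x, use_phases=True):
    """Fourier synthesis rho(x) = sum_h F(h) * exp(-2*pi*i*h*x), unscaled."""
    total = 0j
    for h, F in reflections.items():
        # The detector only records the intensity |F|^2, i.e. the amplitude |F|;
        # without the phase, we can only use abs(F) as the coefficient.
        coeff = F if use_phases else abs(F)
        total += coeff * cmath.exp(-2j * math.pi * h * x)
    return total.real

grid = [k / 100 for k in range(100)]
peak_with_phases = max(grid, key=lambda x: density(x, use_phases=True))
peak_without_phases = max(grid, key=lambda x: density(x, use_phases=False))
print("highest density, correct phases:", peak_with_phases)
print("highest density, amplitudes only:", peak_without_phases)
```

With the correct phases, the highest density sits on the heavier atom at x = 0.35; with amplitudes only, it sits at x = 0, where there is no atom at all.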

Basic workflow of X-ray crystallography. The protein crystal is exposed to X-rays, which produces a diffraction pattern. The pattern is then interpreted and converted into an electron density map with mathematical algorithms. An atomic model can be built and refined based on this map.
Crystal by Andrea Thorn, Diffraction pattern by Sabrina Stäb, image by Ferdinand Kirsten.

Prospect

The molecular models obtained by these methods open up numerous possibilities: structure-based drug design, computational dynamics simulations and answers to biological questions. But how can we interpret and refine those models to extract every last biological detail? This will be discussed in the next blog entry.


SARS-CoV-2: Not new, but different

The novel coronavirus (2019-nCoV) is classified as a large, positive-sense, single-stranded RNA virus from the genus Betacoronavirus. It shows high genetic similarity to SARS-CoV and MERS-CoV and is even more closely related to the bat SARS-like coronavirus, from which it most likely evolved. Even though it shows a lot of similarities to its relatives, further insights into the infection mechanism and the structure of its proteins reveal significant differences.

But what is in it?

Like many RNA viruses, the virus has a lipid hull, with the envelope protein and other proteins integrated in it. This viral shell is responsible for the interaction with host cells and for protecting the inner parts, most importantly the viral RNA. This RNA acts as a direct template for the translation of two polyproteins, named pp1a and pp1ab, which contain the 16 non-structural proteins (nsps) of the replication-transcription complex (RTC). These 16 nsps, encoded by about two thirds of the genome (in terms of length), are cut out of the polyproteins by the chymotrypsin-like protease (3CLpro, also called the main protease) and one or two papain-like proteases to generate the functional single proteins. The assembled RTC then synthesises a variety of subgenomic RNAs (sgRNAs) by discontinuous transcription, which serve as templates to produce subgenomic mRNAs. Other open reading frames of the genome encode at least four structural proteins that are necessary for the assembly of the virions, the hull and the infection of cells: the S, M, E and N proteins (spike, membrane, envelope and nucleocapsid).
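How a protease turns one long polyprotein into separate proteins can be sketched as cutting a sequence string at recognition sites. The Python example below assumes a strongly simplified rule loosely based on the main protease, which typically cuts after a glutamine (Q) that is preceded by a leucine (L) and followed by a small residue such as serine (S), alanine (A) or glycine (G). The sequence is made up and far shorter than pp1ab, and the papain-like proteases, which recognise different sites, are ignored.

```python
import re

# Strongly simplified cleavage rule (an assumption for illustration): cut
# directly after "LQ" when the next residue is S, A or G. The pattern matches
# an empty position between two residues, so no residue is removed.
CUT_SITE = re.compile(r"(?<=LQ)(?=[SAG])")

def cleave(polyprotein):
    """Split a one-letter amino acid sequence at every simplified cut site."""
    return CUT_SITE.split(polyprotein)

# Made-up toy "polyprotein", not a real viral sequence.
toy = "MKTAYLQSGFRKLQAVNDTLQGMPELQKW"
fragments = cleave(toy)
print(fragments)
```

Here the last LQ is followed by lysine (K), so it is not cut, and the sketch returns four fragments that together still spell out the full toy sequence.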

Visualisation of the SARS-CoV-2 structure. The envelope protein E, membrane protein M and spike protein S are bound to the viral envelope; the nucleocapsid protein N and the single-stranded RNA are inside. Image: Thomas Splettstoesser; www.scistyle.com

Making contact

The majority of infected cells are ACE2 (angiotensin-converting enzyme 2)-bearing cells of the respiratory system. The viral RNA enters these cells through endocytosis, mediated by the spike glycoprotein of the coronavirus. What does this mean? The S or spike protein, which forms the "corona" around the virus, binds with its receptor-binding domain (RBD) to the receptor ACE2, which is located on the surface of the host cells. Afterwards, the virus is taken up into the cell through a complicated mechanism, the so-called endocytosis. Once infected, these cells start producing new copies of the virus, which provokes a strong reaction of the immune system. The most common symptoms include cough, fever, fatigue, loss of taste, headache, diarrhoea, dyspnoea, and lymphopenia or pneumonia, which can even cause the death of the patient in severe cases.

Crystal structure of spike protein receptor-binding domain (RBD) from SARS coronavirus epidemic strain (2002-2003) (magenta) complexed with human-civet chimeric receptor ACE2 (green). The green bit is usually bound to the host cell and the magenta bit is at the top of the spike on the outside of the virus. PDB: 3SCL, picture by Ferdinand Kirsten

The structure of the virus, its infection mechanism and multiplication offer numerous possibilities for drug targeting, such as the inhibition of the main protease or the polymerases, the disturbance of the assembly of shell and entry proteins or the replication‐transcription complex and direct mRNA antiviral methods. However, none of them has been proven effective in clinical studies to this point.


Form follows function

Proteins are big molecules, ranging from 400 to 20 000 atoms. They are the workhorses of the living world: they break down what you eat, build your muscles, organise cell division, and make up hair and skin. They are built from amino acids joined into a long chain that then folds (and sometimes cross-links) into the functional molecule. The sequence of the amino acids, of which there are 20 different ones, determines the fold, but in many cases we cannot predict it: it is (still) too complicated, so we have to determine the molecule’s shape experimentally.

Molecular model of Penicillin by Dorothy Hodgkin with electron density, ca. 1945. Picture courtesy of https://proteopedia.org/wiki/index.php/Molecular_sculpture

If we learn the structure of a protein, we can understand how it works and what it does. The corona virus encodes its own proteins which are made by human cells when infected. These proteins interact with human proteins, and hence, understanding them is crucial: disabling the viral proteins, or disrupting their interactions with the human host can stop the infection, and permit us to fight the virus and gain the upper hand.

But how do you even measure and visualize something so small? Good question! After the difficult task of making a crystal from such large (and somewhat floppy) molecules and shooting it with X-rays like a madman, the data get interpreted with a model of the structure.

Growth of known protein structures
Growth in the number and complexity of structures in the Protein Data Bank (PDB; courtesy of the RCSB Protein Data Bank http://www.pdb.org/pdb/home)

But even if you have a model, it is really difficult to see anything: there are too many atoms. The increase in the size of the molecular structures we can determine experimentally, and in their number, necessitated a better way to visualize them. A big step towards solving that problem was taken by Jane Richardson of Duke University when she created the ribbon diagram in 1980.

The Ribbon Diagram

"I don't see how you could possibly describe a protein structure in a thousand words, but you can come a lot closer with one picture."

- Jane Richardson

These three-dimensional, schematic representations are the most common visualization of protein structures. The ribbon follows the backbone of the amino acid chain. Depending on the local pattern of hydrogen bonds, each stretch of the chain can be assigned to one of three categories, the so-called secondary structures: α-helices, β-sheets and loops. α-helices are drawn as helical ribbons and the strands of β-sheets as arrows, which show whether neighbouring strands run in the same direction (parallel β-sheets) or in alternating directions (anti-parallel β-sheets). Loops are shown as a tube with a smaller diameter than α-helices or β-sheets. Any additional features can then be added, as well as labels.
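Before a program can draw a ribbon, it has to decide which residues belong to which secondary structure. The Python sketch below is a simplified version of one widely used rule, from the DSSP program, restricted to α-helices: if residue i forms a backbone hydrogen bond with residue i+4, and residue i-1 does the same with residue i+3, then residues i to i+3 are counted as helix. The hydrogen bonds are supplied by hand as a hypothetical input; a real program would first calculate them from the atom coordinates, and β-sheets, which need additional rules, are left out. The second function groups the result into runs and picks a drawing style for each.

```python
def assign_alpha_helix(n_residues, hbonds):
    """Per-residue helix assignment with a simplified DSSP-style rule.

    hbonds is a set of (i, j) pairs (0-based residue indices) meaning that the
    C=O group of residue i is hydrogen-bonded to the N-H group of residue j.
    Returns a string with 'H' for helix and '-' for everything else.
    """
    # A "4-turn" starts at residue i if there is a hydrogen bond from i to i+4.
    turn = [(i, i + 4) in hbonds for i in range(n_residues)]
    ss = ["-"] * n_residues
    for i in range(1, n_residues):
        # Two consecutive 4-turns (at i-1 and i) make residues i..i+3 helical.
        if turn[i - 1] and turn[i]:
            for k in range(i, min(i + 4, n_residues)):
                ss[k] = "H"
    return "".join(ss)

def ribbon_segments(ss):
    """Group the assignment into runs and choose a drawing style for each run."""
    # 'E' (beta-strand) is listed for completeness; assign_alpha_helix never emits it.
    style = {"H": "helix ribbon", "E": "arrow", "-": "thin tube"}
    segments = []
    start = 0
    for k in range(1, len(ss) + 1):
        if k == len(ss) or ss[k] != ss[start]:
            segments.append((start, k - 1, style[ss[start]]))
            start = k
    return segments

# Hypothetical input: 16 residues, backbone H-bonds i -> i+4 for i = 2..7.
hbonds = {(i, i + 4) for i in range(2, 8)}
ss = assign_alpha_helix(16, hbonds)
print(ss)
print(ribbon_segments(ss))
```

For this input, residues 3 to 10 come out as helix and are drawn as a helical ribbon, with thin tubes for the loops on either side.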

Different visualizations of an alpha-helix: the backbone is shown as sticks and simplified as a ribbon or cartoon. Picture by Ferdinand Kirsten.
Different visualisations of an anti-parallel beta-sheet with a loop connecting its two strands. The backbone is shown as sticks and simplified as a ribbon or cartoon. Picture by Ferdinand Kirsten.

Looking at macromolecular structures with visualizations like the ribbon diagram can provide a good overall view of the protein’s inner conformation, its symmetry and possible interaction and binding sites. Different structural features promote different functions in proteins. A uniform presentation of gathered data is the key to better understanding and comparing these small machines, which work in and around us at every second.

Three views of PDB 6vxs
The ADP-ribose phosphatase of NSP3 from SARS-CoV-2, shown from different angles (Protein Data Bank entry 6vxs). Picture by Ferdinand Kirsten.

Regarding the current problem child, SARS-Coronavirus-2 (or hCoV2019), ribbon diagrams of viral proteins reveal numerous insights into the structure and function of different parts of the virus and its infection of host cells. While making the atomic model in the first place is a difficult task in its own right, the contributions of Jane Richardson and others shape how we, and every new generation of molecular biologists, perceive and think of macromolecules and the molecular basis of life.

Got interested? Read more about this at:

https://iubmb.onlinelibrary.wiley.com/doi/full/10.1002/bmb.2002.494030010005

https://research.duke.edu/ribbon-diagrams

https://blogs.sciencemag.org/pipeline/archives/2018/11/05/hail-to-the-ribbon
