Proteins are complex and fickle molecules. Experimental structure determination can teach us a lot about their function, but it is not the easiest thing to do. It’s not as simple as looking through a microscope, focussing, and taking a picture of the protein. It’s more like when you have a broken arm and the doctor uses X-rays to see what is inside, with the minor caveat that you first need to remove your arm from the rest of your body, crystallise it together with thousands of identical copies, and THEN shoot the high-energy X-rays at the crystal. After successfully navigating a whole lot of maths, you can finally get a protein structure.
Experimentally determined coronavirus protein structures can come from three sources: X-ray crystallography, electron cryo-microscopy (cryo-EM) and solution nuclear magnetic resonance (NMR). Each experimental technique has its own advantages and disadvantages. These structural techniques can also be complemented with a number of other techniques, such as mass spectrometry, chemical cross-linking, fluorescence resonance energy transfer, and genetics, to fill in the finer details around structure and function.
If you are looking to find the structure of a “big” molecule (although it's still quite small, really), electron microscopy can help you. Unlike in the other techniques, the biomolecule of interest is imaged directly using a beam of electrons and a system of lenses. The complicated bit is turning these 2D pictures into 3D objects. This can be achieved by imaging thousands of copies of the same biomolecule in different orientations, so that we can reconstruct it in 3D. Although cryo-EM has historically been considered a lower-resolution technique, recent technological advances have brought about a resolution revolution, and some structures are almost as detailed as those from X-ray crystallography. As a result, cryo-EM can now show us amino acid sidechains, surface water molecules, and non-covalently bound ligands, which were previously the purview of X-ray crystallography alone.
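The idea of combining views from different directions can be sketched one dimension lower: rebuilding a flat 2D shape from 1D shadows. The snippet below is only a toy illustration, not how real cryo-EM software works. The shape is invented, only two viewing directions are used, and the simple "back-projection" shown here stands in for the far more sophisticated algorithms used in practice, which also have to work out the unknown orientation of every particle.

```python
# A tiny 2D "molecule": 1 marks density, 0 marks empty space (invented shape).
MOLECULE = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]


def project_rows(image):
    """Shadow seen from the side: total density in each row."""
    return [sum(row) for row in image]


def project_columns(image):
    """Shadow seen from the top: total density in each column."""
    return [sum(column) for column in zip(*image)]


def back_project(row_profile, column_profile):
    """Smear each 1D shadow back across the grid and add the two up."""
    return [[r + c for c in column_profile] for r in row_profile]


rows = project_rows(MOLECULE)
cols = project_columns(MOLECULE)
reconstruction = back_project(rows, cols)

# The brightest cell of the reconstruction, as (value, row index, column index).
peak = max(
    (value, i, j)
    for i, row in enumerate(reconstruction)
    for j, value in enumerate(row)
)

for row in reconstruction:
    print(row)
```

The brightest cell lands on the centre of the shape, but the picture is blurry: some cells that are empty in the original still receive a signal. Two views are simply not enough, which is why thousands of particles in many different orientations are needed.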
An important step in NMR spectroscopy is the so-called isotope enrichment. While the typical MRI at your doctor's just measures the rough whereabouts of the atomic nuclei in a certain tissue, this method can help identify the distribution of carbon atoms in a structure. However, this requires that some carbons differ from others. Different isotopes of carbon, with different numbers of neutrons in their nuclei, are incorporated into the protein while it is produced. After purification, the protein is placed in a strong magnetic field and probed with radio waves. The distinctive resonance of each isotope is then analysed, yielding information about the whereabouts of the different carbon nuclei and revealing the distances and possible connections between them. Using the knowledge about these distances, scientists can solve the Sudoku-like puzzle to generate an atomic model of the protein. This method only works for small to medium-sized proteins, as larger structures cause problems with overlapping peaks in the resonance spectra. On the other hand, NMR spectroscopy has a major advantage in its ability to measure flexible proteins in solution rather than in the solid state, which may hinder molecular movement.
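One small piece of that Sudoku-like puzzle can be sketched in code: checking whether a candidate model agrees with the measured distances. This is a minimal sketch under stated assumptions. The atom names, coordinates and distance limits below are invented, and each measurement is treated as a simple upper limit on the distance between two atoms; real structure calculations juggle thousands of such restraints and search for models that satisfy them all at once.

```python
import math

# Candidate model: hypothetical atom names with x, y, z coordinates in angstroms.
MODEL = {
    "C1": (0.0, 0.0, 0.0),
    "C2": (3.0, 0.0, 0.0),
    "C3": (3.0, 4.0, 0.0),
}

# Invented restraints: (atom, atom, maximum allowed distance in angstroms).
RESTRAINTS = [
    ("C1", "C2", 3.5),
    ("C2", "C3", 4.5),
    ("C1", "C3", 4.0),
]


def violations(model, restraints):
    """Return every restraint the model breaks, with the excess distance."""
    broken = []
    for atom_a, atom_b, limit in restraints:
        distance = math.dist(model[atom_a], model[atom_b])
        if distance > limit:
            broken.append((atom_a, atom_b, distance - limit))
    return broken


broken = violations(MODEL, RESTRAINTS)
for atom_a, atom_b, excess in broken:
    print(f"{atom_a}-{atom_b} is too long by {excess:.2f} angstroms")
```

A model that breaks restraints, like this one, has to be adjusted and checked again until all the pieces of the puzzle fit.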
Of the methods discussed here, X-ray crystallography has produced the most structures to date, totalling 145 252 in the PDB compared to 12 965 from NMR and only 4 926 from cryo-EM. However, X-ray crystallography has a major drawback: the need for a protein crystal. While this common method can provide very detailed atomic information, showing every atom of each amino acid and even of ligands, inhibitors, ions and other molecules included in the structure, the process of crystallisation is difficult and can limit which types of protein can be studied. Purifying a protein for crystallisation has become much more straightforward in recent years but is still a non-trivial task. After purification, producing a protein crystal can take months to years before structure-worthy data can be measured from it. Flexible proteins in particular are much harder to crystallise. As enzymes or receptors, a lot of proteins rely on movable parts and different conformations to fully operate, so unfortunately, interesting proteins are often flexible! Once produced, the crystal is cooled in liquid nitrogen and subjected to an intense X-ray beam. You can compare this to a crystal that you hold to the light and observe its reflections on a wall. In this case, the X-rays hit the protein crystal and are diffracted in a specific pattern. The diffracted rays cannot directly give a picture of the protein, however; they need to be interpreted with a structural model. The distribution of electrons can be calculated from this pattern, resulting in an electron density map with the estimated location of each atom.
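Why the diffraction pattern alone does not give a picture can be shown with a toy calculation. The sketch below assumes a made-up one-dimensional "crystal" with two point-like atoms; the positions, electron counts and number of diffraction orders are all invented. Each diffracted wave has a strength (amplitude) and a shift (phase). Adding all waves back together, a Fourier synthesis, recovers the electron density, but a detector only records the strengths. This is a simplified illustration, not a description of real crystallographic software.

```python
import cmath
import math

# Toy one-dimensional "crystal": (fractional position, number of electrons).
# These two atoms are invented for illustration only.
ATOMS = [(0.20, 6), (0.55, 8)]
H_MAX = 10   # highest diffraction order we pretend to measure
GRID = 100   # sampling points along the repeating unit


def structure_factor(h, atoms=ATOMS):
    """Complex wave F(h) scattered by point-like atoms."""
    return sum(z * cmath.exp(2j * math.pi * h * x) for x, z in atoms)


def density_map(factors, grid=GRID):
    """Fourier synthesis: add the waves F(h) back up into a density profile."""
    rho = []
    for i in range(grid):
        x = i / grid
        total = sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items())
        rho.append(total.real)
    return rho


# With amplitudes AND phases, the map peaks at the atoms.
factors = {h: structure_factor(h) for h in range(-H_MAX, H_MAX + 1)}
true_map = density_map(factors)

# A detector only records intensities, i.e. |F(h)|; the phases are lost.
amplitudes_only = {h: complex(abs(f), 0.0) for h, f in factors.items()}
phaseless_map = density_map(amplitudes_only)

best_true = max(range(GRID), key=true_map.__getitem__)
best_phaseless = max(range(GRID), key=phaseless_map.__getitem__)
print("strongest peak with phases:   ", best_true / GRID)
print("strongest peak without phases:", best_phaseless / GRID)
```

With the phases included, the strongest peak sits on the heavier atom at 0.55. Without them, the strongest peak moves to 0.0, where there is no atom at all. The missing phases have to be recovered by other means, and this is where the structural model comes in.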
The molecular models obtained by these methods open up numerous possibilities: structure-based drug design, computational dynamics simulations and answers to biological questions. But how can we interpret and refine those models to extract every last biological detail? This will be discussed in the next blog entry.
The novel Coronavirus (2019‐nCoV) is classified as a large, positive-sense, single-stranded RNA virus from the genus of betacoronaviruses. It shows high genetic similarity to SARS‐CoV and MERS‐CoV and is even more closely related to the bat SARS-like coronavirus, from which it most likely evolved. Even though it shows a lot of similarities to its relatives, further insights into the infection mechanism and the structure of its proteins reveal significant differences.
Like many RNA viruses, the virus has a lipid hull, with envelope and other proteins integrated in it. This viral shell is responsible for the interaction with host cells and the protection of the inner parts, most importantly the viral RNA. This RNA acts as a direct template for the translation of two polyproteins named pp1a and pp1ab, which contain the 16 non-structural proteins (nsps) of the replication‐transcription complex (RTC). Those 16 nsps, encoded by about two thirds of the genome (in terms of length), are cleaved from the polyproteins by the chymotrypsin‐like protease (3CLpro, also called the main protease) and one or two papain‐like proteases to generate the functional single proteins. The RTC then synthesizes a variety of subgenomic RNAs (sgRNAs) by discontinuous transcription, which serve as templates to produce subgenomic mRNA. Other open reading frames of the genome encode at least four structural proteins that are necessary for the assembly of the virions, the hull and the infection of cells (called S-, M-, E- and N-protein for spike, membrane, envelope and nucleocapsid).
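Cutting a polyprotein into single proteins can be pictured as cutting a long string at marked positions. The sketch below is a deliberately simplified illustration. The sequence is invented, and the cutting rule is an assumption made for this example: the main protease is taken to cut after the two letters LQ (leucine, glutamine) whenever the next amino acid is S, A or G. The real enzyme is more selective than a two-letter pattern.

```python
import re

# Invented toy polyprotein in one-letter amino acid code (not a real sequence).
POLYPROTEIN = "MKTLQSGAVLQAPEELQGTT"

# Simplified, assumed cutting rule: cut after "LQ" when S, A or G follows.
CUT_SITE = re.compile(r"(?<=LQ)(?=[SAG])")


def cleave(sequence):
    """Split the chain at every position matching the cutting rule."""
    return CUT_SITE.split(sequence)


fragments = cleave(POLYPROTEIN)
for number, fragment in enumerate(fragments, start=1):
    print(number, fragment)
```

Joining the fragments back together gives the original chain, so nothing is lost: the protease only turns one long, non-functional chain into several shorter pieces.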
The majority of infected cells are ACE2 (angiotensin-converting enzyme 2)-bearing cells of the respiratory system. The viral RNA is introduced through endocytosis via the spike glycoprotein of the coronavirus. What does this mean? The S- or spike protein, which forms the "corona" around the virus, binds with its receptor-binding domain (RBD) to the receptor, which is located on the surface of the host cells. Afterwards, the virus can merge with the cell through a complicated mechanism, the so-called endocytosis. Once infected, these cells act as a multiplier for the virus, which provokes a strong reaction of the immune system. The most common symptoms include cough, fever, fatigue, loss of taste, headache, diarrhoea, dyspnoea, and lymphopenia or pneumonia; in severe cases, the disease can even cause the death of the patient.
The structure of the virus, its infection mechanism and multiplication offer numerous possibilities for drug targeting, such as the inhibition of the main protease or the polymerases, the disturbance of the assembly of shell and entry proteins or the replication‐transcription complex and direct mRNA antiviral methods. However, none of them has been proven effective in clinical studies to this point.
Proteins are big molecules, ranging from 400 to 20 000 atoms. They are the workhorses of the living world: they break down what you eat, build your muscles, organise cell division, and make up hair and skin. They are formed from amino acids as a long chain that then cross-links and folds into the functional molecule. The sequence of the amino acids (there are 20 different ones) determines the fold, but in many cases we cannot (yet) predict it because it is too complicated, so we have to determine the molecule’s shape experimentally.
If we learn the structure of a protein, we can understand how it works and what it does. The coronavirus encodes its own proteins, which are made by human cells once they are infected. These proteins interact with human proteins, and hence understanding them is crucial: disabling the viral proteins, or disrupting their interactions with the human host, can stop the infection and allow us to fight the virus and gain the upper hand.
But how do you even measure and visualize something so small? Good question! After the difficult task of making a crystal from such large (and somewhat floppy) molecules and shooting it with X-rays like a madman, the data get interpreted with a model of the structure.
But even if you have a model, it is really difficult to see anything. There are too many atoms. The growing size and number of the molecular structures we can determine experimentally called for a better way to visualize them. A big step towards solving that problem was taken by Jane Richardson of Duke University, when she created the ribbon diagram in 1980.
"I don't see how you could possibly describe a protein structure in a thousand words, but you can come a lot closer with one picture."- Jane Richardson
These three-dimensional, schematic representations are the most common visualization of protein structures. The ribbon shows the backbone (= amino acid chain) of the protein. Depending on the local fold, which is determined by so-called hydrogen bonds, each stretch of the peptide chain falls into one of three categories, the so-called secondary structures: α-helices, β-sheets and loops. Helices are shown as coiled ribbons and the strands of β-sheets as arrows, whose directions indicate whether a sheet is parallel (all strands run in the same direction) or anti-parallel (neighbouring strands run in alternating directions). Loops are shown as a tube with a smaller diameter than α-helices or β-sheets. Additional features and labels can then be added.
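The translation from secondary structure to drawing can be sketched as a simple lookup. In this minimal example, each amino acid is assumed to be already labelled with one of three letters, H for helix, E for β-strand and L for loop. The letter scheme and the example string are simplifications invented for this sketch; real programs use more detailed classifications and work from the atomic coordinates.

```python
from itertools import groupby

# Drawing style for each (simplified, assumed) secondary structure label.
SHAPES = {"H": "coiled ribbon", "E": "arrow", "L": "thin tube"}


def ribbon_segments(labels):
    """Group consecutive identical labels into (first, last, shape) segments.

    Residue positions are counted from 1, as is usual for protein sequences.
    """
    segments = []
    start = 1
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length - 1, SHAPES[label]))
        start += length
    return segments


# Invented labels for a 24-residue stretch: a helix followed by two strands.
segments = ribbon_segments("LLHHHHHHLLLEEEEELLEEEEEL")
for first, last, shape in segments:
    print(f"residues {first}-{last}: {shape}")
```

Each segment then only needs to be drawn along the path of the backbone, which is what turns thousands of atoms into a picture you can read at a glance.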
Looking at macromolecular structures with visualizations like the ribbon diagram can provide a good overall view of the protein’s inner conformation, its symmetry and possible interaction and binding sites. Different structural features promote different functions in proteins. A uniform presentation of gathered data is the key to better understanding and comparing these small machines, which work in and around us at every second.
Regarding the current problem child, SARS-Coronavirus-2 (or hCoV2019), ribbon diagrams of viral proteins reveal numerous insights into the structure and function of different parts of the virus and its infection of host cells. While making the atomic model to begin with is a difficult task in its own right, the contributions of Jane Richardson and others shape how we, and every new generation of molecular biologists, perceive and think of macromolecules and the molecular basis of life.
Got interested? Read more about this at: