Protein Structure
Principles & Architecture
Marjolein Thunnissen
Dep. of Biochemistry & Structural Biology
Lund University September 2011
Bioinformatics
Homology, pattern and 3D structure
searches need databases and
database managing tools, search
technique and dedicated tools for
sequence and structure comparison
and detection of similarity, for
homology modelling etc.
All this is the object of
bioinformatics
Why use bioinformatics?
• An explosive growth in the amount of biological information.
• A more global perspective in experimental design.
• Data-mining.
• The potential for uncovering phylogenetic relationships and evolutionary patterns.
Role of (bio)informatics in drug discovery
Genome → Gene → Protein → HTS → Hit → Lead → Candidate drug
Genomics
Bioinformatics
Structural bioinformatics
Chemoinformatics
Structure-Based Drug Design
ADMET Modelling
Structural bioinformatics techniques are valuable in
areas from target identification to lead discovery
06/27/12
Proteins: What do they do?
Receptors, hair and skin, muscles, transport of oxygen, growth and development, digestion, defense against infections, and much more…
• Carry out chemistry as enzymes
• Recognition
  • signal transduction
  • immune recognition
• Capture and emit light
• Pump small and big molecules over membranes
• Movements (muscles), use ATP as energy
• RNA and DNA maintenance/metabolism etc.
• Form hair, nail, cartilage, silk, spiderwebs
• and more
More than 80% of medicines bind to proteins!
Why study 3D structures of biological macromolecules?
1. FUNCTION IS STRUCTURE!
2. Sequence homology is not enough to identify functional relationships.
3. Protein folding is still not fully understood. Predictions do not work satisfactorily.
4. Drug design. Pharmaceutical industry
Protein Structure Hierarchy
Proteins are polymers
Proteins are formed by a chain of repeating molecules. One such molecule is called an amino-acid. There are 20 types of amino-acids, but they all have a common backbone or main-chain.
The DNA code specifies 20 different amino-acids.
These 20 different amino-acids are found in proteins. The names of the different amino-acids can be given as a 3 letter code or a 1 letter code:
Alanine ——> Ala ——> A
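The two naming schemes can be converted into each other with a simple lookup table. The sketch below uses the standard codes for the 20 amino-acids; the function name `to_one_letter` is only an illustrative choice.

```python
# Standard three-letter to one-letter codes for the 20 amino-acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter codes into a one-letter sequence string."""
    # capitalize() lets the lookup accept "ALA", "ala" and "Ala" alike.
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues)


print(to_one_letter(["Ala", "Gly", "Trp"]))  # AGW
```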
The amino-acids can be divided into sub-groups dependent on the nature of their side-chain.
Group 1 Hydrophobic: Ala (A), Val (V), Leu (L), Ile (I), Phe (F), Pro (P) and Met (M)
Group 2 Charged: Asp (D), Glu (E), Arg (R), Lys (K)
Group 3 Polar: Ser (S), Thr (T), Cys (C), Asn (N), Gln (Q), His (H), Tyr (Y) and Trp (W)
Group 4 No special properties: Gly (G)
Alternatively there is also a 5th group:
Group 5 Aromatic rings Phe (F), Tyr (Y), Trp (W) and His (H)
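As a rough illustration, the grouping above can be written down as a lookup and applied to a sequence. This is a minimal sketch: the group membership is taken from the lists above, while the function names and the example sequence are made up for illustration.

```python
from collections import Counter

# Side-chain groups exactly as listed above (one-letter codes).
GROUPS = {
    "hydrophobic": set("AVLIFPM"),
    "charged": set("DERK"),
    "polar": set("STCNQHYW"),
    "no special properties": set("G"),
}
# The alternative fifth group overlaps with the groups above.
AROMATIC = set("FYWH")


def group_of(residue):
    """Return the name of the side-chain group for a one-letter code."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown amino-acid code: {residue!r}")


def composition(sequence):
    """Count how many residues of a sequence fall into each group."""
    return dict(Counter(group_of(res) for res in sequence))


# Hypothetical example sequence, not taken from a real protein.
print(composition("AKGSLLDE"))
```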
The 20 amino-acids: hydrophobic residues
Alanine (Ala, A) Valine (Val, V) Proline (Pro, P)
Isoleucine (Ile, I)
Leucine (Leu, L)
Phenylalanine (Phe, F) Methionine (Met, M)
The 20 amino-acids: charged residues
Arginine (Arg, R) Lysine (Lys, K) Aspartic acid (Asp, D) Glutamic acid (Glu, E)
The 20 amino-acids: polar residues
Serine (Ser, S) Threonine (Thr, T) Tyrosine (Tyr, Y) Histidine (His, H)
Cysteine (Cys, C) Asparagine (Asn, N) Glutamine (Gln, Q) Tryptophan (Trp, W)
The 20 amino-acids: Glycine
Glycine (Gly, G)
Special properties of amino-acids
Since there are 4 different groups attached to the central Cα atom of an amino-acid (except for Glycine), it is an asymmetric atom.
Amino acids are therefore chiral molecules. There are two forms: L-form and D-form:
The natural configuration of amino acids in proteins is always the L-form.
Structure in four dimensions
Because proteins are built from 20 different amino-acids, their structure is described at four different levels (dimensions):
Primary Structure: Amino-acid sequence.
Secondary Structure: Local regular structure: α-helices and β-sheets.
Tertiary Structure: Packing of secondary structure into one or several compact globular domains.
Quaternary Structure: The overall complex, if the protein consists of several polypeptide chains.
How to form the peptide chain?
The protein chain is formed by linking the amino-acids together. The linkage is called the peptide bond:
The chain of amino-acids linked to each other by peptide bonds is also called: polypeptide chain.
Two variants of the peptide bond
Trans peptide bond
Cis peptide bond
Properties of the peptide bond
The peptide bond unit containing the atoms Cₙ, Oₙ, Nₙ₊₁ is a rigid plane with bond lengths and angles nearly the same for each of these units in a polypeptide chain.
The freedom in conformation of this chain comes from rotating around the bonds Nₙ₊₁-Cαₙ₊₁ and Cαₙ₊₁-Cₙ₊₁.
Phi-Psi angles
The rotation around N-Cα is called phi (φ) and the rotation around Cα-C is called psi (ψ). Each amino acid is associated with these two conformational angles. If phi and psi are known for each residue, the conformation of the whole backbone chain is known, since the peptide planes are so rigid.
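Phi and psi are dihedral (torsion) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute such an angle from Cartesian coordinates. The function name and the toy coordinates are illustrative assumptions, not part of the lecture material.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180..180) defined by four points.

    For phi pass C(i-1), N(i), CA(i), C(i); for psi pass N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)          # from p1 back to p0
    b1 = _sub(p2, p1)          # central bond, the rotation axis
    b2 = _sub(p3, p2)          # from p2 on to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the central bond, keeping only the
    # parts of b0 and b2 that are perpendicular to the rotation axis.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (b1[0] * k0, b1[1] * k0, b1[2] * k0))
    w = _sub(b2, (b1[0] * k2, b1[1] * k2, b1[2] * k2))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real atoms): the outer bonds are at right angles.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

In practice the four points would be atom coordinates read from a structure file; the geometry of the calculation stays the same.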
Ramachandran plot
Most combinations of φ/ψ are not allowed since they would cause steric collisions between side chains and main chain.
The φ/ψ pairs can be plotted against each other. Such a plot is called a Ramachandran plot. The residues will cluster in certain areas.
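The clustering can be made visible even without plotting by counting how many φ/ψ pairs fall into each cell of a coarse grid. This is a minimal sketch; the bin size of 30° and the example angles are arbitrary assumptions chosen for illustration.

```python
from collections import Counter


def ramachandran_bins(angles, bin_size=30):
    """Count (phi, psi) pairs per grid cell of bin_size x bin_size degrees.

    Each cell is labelled by its lower-left corner, e.g. (-60, -60)
    covers -60 <= phi < -30 and -60 <= psi < -30.
    """
    counts = Counter()
    for phi, psi in angles:
        cell = (int(phi // bin_size) * bin_size,
                int(psi // bin_size) * bin_size)
        counts[cell] += 1
    return counts


# Hypothetical phi/psi values in degrees, not measured data.
example = [(-60, -50), (-65, -45), (-57, -47), (-135, 135), (-120, 130)]
for cell, n in sorted(ramachandran_bins(example).items()):
    print(cell, n)
```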
Ramachandran plot of Barnase
Glycine residues
Glycine residues lack a side chain. Therefore they can
adopt a much wider range of conformations than other
residues. Glycines are often used to allow
unusual main chain conformations (like a tight turn).
Cysteines can form cross-links
Cysteine residues from different parts of the sequence can link together in a disulfide bridge to form cross-links. This requires an oxidizing environment. Inside the cell the environment is reducing, so disulfide bridges are not often seen there; they are quite normal for extra-cellular proteins.
These cross-links give extra stability to a protein structure. They can also link two polypeptide chains together.
Low and high energy conformation (allowed and
disallowed):
Certain side chain conformations are energetically more favourable than others: these are more frequently seen in proteins. These
conformers are called rotamers.
Rotamers for Phe
Forces holding proteins together
Electrostatic interactions
  Ionic interactions: e.g. salt bridges
  Dipolar interactions: dipole-dipole, induced dipole
Hydrogen bonds: shared H-atom
Hydrophobic packing: mainly entropic
Salt bridges and polar interactions
Electrostatic interactions occur either between fully charged groups (ionic) or between partially charged groups (dipole-dipole).
The force of attraction between δ+ and δ- decreases rapidly with distance. In the absence of water these interactions can be very strong.
In protein molecules ionic bonds occur between the charged residues. Combinations: Arg-Asp, Arg-Glu, Lys-Asp and Lys-Glu.
Dipole-dipole interactions can occur e.g. between Asn-Thr or Ser-Gln (many more combinations possible).
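The effect of distance and of water on an ionic interaction can be estimated with Coulomb's law. The sketch below is a back-of-the-envelope model under stated assumptions: point charges, a uniform dielectric constant (1 for no water, roughly 80 for bulk water), and the commonly used conversion factor of about 332 kcal·Å/(mol·e²). The 3 Å separation is an arbitrary example value.

```python
# Conversion factor so that charges in units of e and distances in
# Angstrom give energies in kcal/mol (assumed, commonly quoted value).
COULOMB_CONSTANT = 332.06


def coulomb_energy(q1, q2, distance, dielectric=1.0):
    """Coulomb energy in kcal/mol between two point charges.

    q1, q2: charges in units of the elementary charge
    distance: separation in Angstrom
    dielectric: relative dielectric constant of the surrounding medium
    """
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)


# A +1/-1 pair (e.g. Lys and Asp side chains) 3 Angstrom apart.
for label, eps in [("no water (dielectric 1)", 1.0),
                   ("bulk water (dielectric ~80)", 80.0)]:
    print(label, round(coulomb_energy(+1, -1, 3.0, eps), 2), "kcal/mol")
```

The same pair is about 80 times weaker in this model when surrounded by water, which is the point made above about the absence of water.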
Hydrogen bonds
Examples from macro-molecules: proteins and DNA.
Hydrogen bonds occur when one hydrogen is shared between two atoms (mostly O and N atoms). One atom "donates" the hydrogen while the other "accepts" it.
The hydrogen bond is strongest when the donor, the hydrogen and the acceptor lie in a straight line.
Hydrophobic interactions
The hydrophobic effect is a major driving force of protein folding. It is based on the fact that apolar and polar molecules do not like to mix, e.g. oil and water. The hydrophobic effect is really an entropy phenomenon: when hydrophobic molecules cluster together, fewer water molecules are forced into an ordered arrangement around them.
In proteins this means that the protein folds such that a core arises in which hydrophobic residues are buried.
Secondary structure
The main driving force behind protein folding is to pack hydrophobic residues into the interior of the protein thereby creating a hydrophobic core.
Problem: the backbone of an amino acid contains some highly polar atoms: O and N.
These atoms have to be "neutralized".
Neutralization is achieved by the formation of hydrogen bonds: the O is an acceptor, while the N is a donor.
Secondary structure is an elegant way for the protein to bury the polar peptide bond in the protein interior.
There are two types of secondary structure: alpha helices and beta sheets
Alpha (α) helices
α-helices are found in proteins when consecutive residues all have φ/ψ angles of approximately -60° and -50°. This gives rise to helix formation. The α-helix is right-handed, has 3.6 residues per turn, and rises 1.5 Å per residue. In proteins, α-helices range from 4 or 5 residues up to over 40 residues long, with an average length of 10 residues (15 Å).
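The helix parameters above translate directly into simple arithmetic. The following sketch only uses the two numbers quoted in the text (3.6 residues per turn, 1.5 Å rise per residue); the function names are illustrative.

```python
RESIDUES_PER_TURN = 3.6   # from the text
RISE_PER_RESIDUE = 1.5    # Angstrom, from the text

# Distance travelled along the helix axis in one full turn (about 5.4 A).
PITCH = RESIDUES_PER_TURN * RISE_PER_RESIDUE


def helix_length(n_residues):
    """Length of an ideal alpha-helix along its axis, in Angstrom."""
    return n_residues * RISE_PER_RESIDUE


def helix_turns(n_residues):
    """Number of turns made by an ideal alpha-helix of n residues."""
    return n_residues / RESIDUES_PER_TURN


for n in (5, 10, 40):
    print(n, "residues:", helix_length(n), "A,", round(helix_turns(n), 1), "turns")
```

For the average helix of 10 residues this gives the 15 Å quoted above.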
Hydrogen bonding pattern in an α-helix
In the α-helix a very regular pattern of hydrogen bonds is formed. Hydrogen bonds are formed between the C=O of residue n and the NH of residue n +4. Therefore all these polar atoms are joined through hydrogen bonds. Exceptions are the NH atoms at the beginning of the helix and the O atoms at the end of the helix. The ends of the helix are polar and are found most often at the surface of the protein.
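The n to n+4 pattern can be enumerated for an idealized helix. In this sketch a helix is simply a range of residue numbers; the function names and the example range 1 to 10 are assumptions made for illustration.

```python
def helix_hbonds(start, end):
    """List (acceptor, donor) residue pairs for an ideal alpha-helix.

    The C=O of residue n accepts a hydrogen bond from the NH of residue
    n + 4. start and end are the first and last residue of the helix.
    """
    return [(n, n + 4) for n in range(start, end - 3)]


def unpaired_groups(start, end):
    """Residues whose NH (helix start) or C=O (helix end) has no partner."""
    free_nh = list(range(start, min(start + 4, end + 1)))
    free_co = list(range(max(end - 3, start), end + 1))
    return free_nh, free_co


print(helix_hbonds(1, 10))
print(unpaired_groups(1, 10))
```

For a helix spanning residues 1 to 10 this gives six hydrogen bonds, with the NH groups of residues 1 to 4 and the C=O groups of residues 7 to 10 left without a partner inside the helix.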
Amphipathic α-helix
A very common position for an α-helix is on the surface of the protein. This means that one side of the helix points towards the solution and the other side towards the hydrophobic core. Because there are 3.6 residues per turn, patterns arise in which residues change from hydrophobic to hydrophilic every 3 to 4 residues. The helix is polar on one side and hydrophobic on the other: amphipathic. A way to look at sequences in a helix is to use a helical wheel representation: a projection of the residues onto a plane perpendicular to the axis of the helix.
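With 3.6 residues per turn, each residue is rotated by 360°/3.6 = 100° relative to the previous one, so the positions on a helical wheel can be computed directly. The sketch below marks residues as hydrophobic using the Group 1 list from earlier in these notes; the example sequence is invented for illustration.

```python
DEGREES_PER_RESIDUE = 100          # 360 degrees / 3.6 residues per turn
HYDROPHOBIC = set("AVLIFPM")       # Group 1 of the side-chain grouping


def helical_wheel(sequence):
    """Return (residue, wheel angle in degrees, is_hydrophobic) per residue."""
    return [
        (res, (i * DEGREES_PER_RESIDUE) % 360, res in HYDROPHOBIC)
        for i, res in enumerate(sequence)
    ]


# Hypothetical peptide, not a real helix sequence.
for res, angle, hydrophobic in helical_wheel("LKELLKKL"):
    print(res, angle, "hydrophobic" if hydrophobic else "polar/charged")
```

If the hydrophobic residues end up at similar wheel angles, the helix has a hydrophobic face and is a candidate amphipathic helix.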
Connecting helices: Helix-turn-helix motif
DNA-binding motif
Ca-binding motif
Four-helix bundle
Beta (β)-strands
The second major type of secondary structure is the β-sheet. In contrast with α-helices, β-sheets are not built from one continuous stretch of sequence but from a combination of several regions of the polypeptide chain. These regions are typically 5 to 10 residues long, and the residues are in a fully extended conformation with φ/ψ angles of around -135°/135°. Such a region is called a β-strand.
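The typical φ/ψ values quoted in these notes (about -60°/-50° for the α-helix and about -135°/135° for the β-strand) can be used for a crude conformational label per residue. This is only a sketch: the tolerance of 40° is an arbitrary assumption, and real secondary structure assignment programs rely on hydrogen bonding patterns rather than on angles alone.

```python
# Reference backbone angles in degrees, as quoted in the text.
REFERENCE = {
    "alpha-helix": (-60, -50),
    "beta-strand": (-135, 135),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def classify(phi, psi, tolerance=40):
    """Label a residue by the nearest reference conformation.

    Returns "other" if neither reference lies within the tolerance
    (an assumed cut-off, not a value from the lecture).
    """
    best_name, best_dist = "other", tolerance
    for name, (ref_phi, ref_psi) in REFERENCE.items():
        dist = max(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


print(classify(-57, -47))
print(classify(-120, 130))
print(classify(60, 60))
```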
The β-strands are aligned adjacent to each other so that hydrogen bonds can be formed between the C=O groups of one strand and the NH groups of another strand. The sheets that are formed are pleated: Cα atoms are alternately a little above and a little beneath the plane of the β-sheet. There are two alignments possible: parallel and anti-parallel.
β-sheets parallel & antiparallel.
A sheet is called parallel if the strands all run in the same biochemical direction (amino-terminal to carboxyl-terminal). If the strands have an alternating pattern, N --> C and then C --> N etc., then it is an antiparallel sheet.
Hydrogen bonding in β-sheets.
The hydrogen bonding pattern is quite different between parallel and antiparallel sheets. In the antiparallel sheet, narrowly spaced hydrogen bonds alternate with more widely spaced ones. The parallel sheet has more evenly spaced hydrogen bonds.
Mixed β-sheets
Almost all β-sheets (whatever the type) are twisted, and this twist always has the same handedness: a left-handed twist when the sheet is viewed perpendicular to the strands (equivalently, a right-handed twist when viewed along the strands).
This is because the strands themselves have a very slight right-handed twist. Sheets can also have a mixed character, partially parallel and partially antiparallel: mixed β-sheets. These are the most common β-sheets in proteins.
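The distinction between parallel, antiparallel and mixed sheets depends only on the relative direction of neighbouring strands. In the sketch below a sheet is described by the direction of each strand in spatial order, with +1 for N --> C pointing one way and -1 for the opposite way; this encoding and the function name are assumptions made for illustration.

```python
def classify_sheet(directions):
    """Classify a beta-sheet from the directions of its adjacent strands.

    directions: list of +1 / -1 values, one per strand, in the order
    in which the strands lie next to each other in the sheet.
    """
    if len(directions) < 2:
        raise ValueError("a sheet needs at least two strands")
    pair_types = {
        "parallel" if a == b else "antiparallel"
        for a, b in zip(directions, directions[1:])
    }
    if len(pair_types) == 1:
        return pair_types.pop()
    return "mixed"


print(classify_sheet([1, 1, 1, 1]))      # parallel
print(classify_sheet([1, -1, 1, -1]))    # antiparallel
print(classify_sheet([1, 1, -1, 1]))     # mixed
```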
Loops and turns
Most proteins are built from several secondary structure elements which are linked to each other by loop regions. These loop regions differ in size and shape. In loops, the main chain C=O and NH groups do not interact with each other through hydrogen bonds; instead they are exposed. This is one reason that loops are often found on the surface of proteins, where these atoms can make hydrogen bonds with water molecules. Loops often contain charged and polar residues.
Some loops (especially in antiparallel β-sheets) are quite common: they are called hairpin loops.
How to represent protein structures?
To get the most information out of pictures of protein structures we need to simplify. We use schematic cartoons to do that.
Topology diagram
In order to have an overview of all the secondary structure elements and the order in which they appear in a protein, simple schematic drawings have been developed. These are called topology diagrams. In these, β-strands are represented by arrows and α-helices by cylinders.
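The information behind such a diagram is just the ordered list of secondary structure elements. The sketch below extracts that list from a per-residue secondary structure string. The letter codes are an assumption borrowed from the common DSSP-style convention (E for β-strand, H for α-helix, anything else for loop), and the example string is invented.

```python
from itertools import groupby


def secondary_elements(ss):
    """Return (type, first index, last index) for each strand or helix.

    ss: per-residue string, "E" = beta-strand, "H" = alpha-helix,
    any other character = loop. Indices are 0-based.
    """
    elements = []
    position = 0
    for code, run in groupby(ss):
        length = len(list(run))
        if code in ("E", "H"):
            elements.append((code, position, position + length - 1))
        position += length
    return elements


# Hypothetical secondary structure string.
print(secondary_elements("CCEEEECCHHHHHHCCEEEEC"))
```

The resulting list gives the order of arrows and cylinders along the chain; which strands lie next to each other in a sheet has to come from the 3D structure.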
β-sheet topology diagrams
Antiparallel β-sheet in aspartate transcarbamoylase
Antiparallel barrel in plastocyanin
Parallel β-sheet in
Level up: motifs
Some simple combinations of secondary structure
elements occur in many different proteins. These can consist
of e.g. two helices connected by a loop, or two
β-strands and a helix. These combinations are called
supersecondary structures or motifs.
Some of these motifs have a particular function (e.g. DNA
binding) but others seem to have no biological role but are
used as building blocks.
Most elementary motifs:
α α (two anti-parallel helices linked by a small loop)
β β (two anti-parallel strands linked by a small loop)
β α β
βαβ motif
For antiparallel β-sheets we can link the strands with small loops (quite often hairpins); for parallel β-sheets, however, we need longer loops or cross-over segments. These segments are frequently made by α-helices. The whole unit then looks like β-strand - loop - α-helix - loop - β-strand. This is called the βαβ motif. The loops in this motif can differ in length (from only several residues to nearly 100) and can contain more secondary structure elements. The motif can have two hands (helix under the strands or above), but the latter is much more common.
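The strand - loop - helix - loop - strand pattern can be searched for in a per-residue secondary structure string. The sketch below is deliberately simplified: it assumes DSSP-style letters (E for strand, H for helix, anything else for loop), minimum lengths of 2 and 3 residues chosen arbitrarily, and loops that contain no further secondary structure. It also cannot tell whether the two strands are really parallel neighbours in the same sheet, which requires the 3D structure.

```python
import re

# strand, loop, helix, loop, strand; the minimum lengths are assumptions.
BAB_PATTERN = re.compile(r"E{2,}[^EH]*H{3,}[^EH]*E{2,}")


def find_bab_candidates(ss):
    """Return (start, end) index pairs of candidate beta-alpha-beta units.

    ss: per-residue secondary structure string. Indices are 0-based and
    end is exclusive, as in Python slices.
    """
    return [match.span() for match in BAB_PATTERN.finditer(ss)]


# Hypothetical secondary structure string.
ss = "CEEEEECCHHHHHHHHCCEEEEEC"
print(find_bab_candidates(ss))
```

Because the search is non-overlapping, consecutive units that share a strand are reported only once; a real analysis would handle that case explicitly.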
Adding βαβ motifs together:
Two ways to join the units together, giving:
open twisted α−β structure
α−β barrels
Three main types of structure based on βαβ motifs
Closed barrel: Triosephosphate isomerase
Open twisted β-sheet: Alcohol dehydrogenase
Open barrel
The active site in all α/β barrels is in a pocket formed by the loop regions that connect the carboxy ends of the β-strands with the adjacent α-helices.
A view from the top of the barrel of the active site of the enzyme RuBisCo (ribulose bisphosphate carboxylase)
Greek Key motif
This motif occurs in proteins with 4 adjacent anti-parallel β-strands. Since the topology diagram resembles an ornamental pattern used in ancient Greece, it is called the Greek key.
This motif is structural and no specific function is associated with it.
The eight strands in γ-crystallin are arranged in two
Greek key motifs
Motifs are used as building blocks.
Motifs and secondary structure elements are used as a kind of Lego blocks to form 3-dimensional structures. If the resulting structure can fold independently it is called a
domain.
Fatty acid binding protein: beta barrel + helix-loop-helix
Lac-repressor: many motifs e.g. helix-loop-helix and 4 helix bundle
Level up: domains
Compact folded units of polypeptide chain
Fundamental units of protein structure.
Nature juggles with domains.
Quite often specific sub-roles are carried out by different domains.
Some proteins however are not easily assembled from domains and are more complex.
Classes of structures
In general all protein structures can be placed into three groups:
all α-helical proteins
all β-sheet proteins
α/β and α+β proteins
α-domain structures
Many different types of structures can be formed by α-helices alone. The first protein structures solved (myoglobin and hemoglobin) had only α-helices. Their fold is called the globin fold.
Hemoglobin
α-domain structures
The helices in an all-helical domain can be packed in an almost parallel manner. This gives rise to two different types of packing: 4-helix bundles or large arrangements.
All β-structures
Up-and-down barrel: Retinol binding protein
Up-and-down sheet: Influenza neuraminidase (propeller-like fold)
All-β structures are predominantly antiparallel (no helices to make crossovers) and consist of packed sheets.
Superoxide dismutase (SOD) comprises eight antiparallel β-strands.
All β-structures (2)
Jelly-roll barrel (2 x Greek key): Viral coat proteins
β-helix: Pectate lyase
α/β and α+β structures
These are the most common structures found. They consist of a central sheet (mixed or parallel) surrounded by α-helices (α/β) or of segregated α and β regions (α+β). There are many variations in these classes (e.g. see how βαβ-units pack). Often the secondary structure elements provide structural strength while loops are involved in the function of the protein.
α+β: Lysozyme
α/β: tyrosyl-tRNA
Level up: Multi-domain proteins
Multiple copies of a similar kind of domain,
e.g. rhodanese, immunoglobulins.
Multiple dissimilar domains; this occurs in
many different proteins.
Combinations can also occur.
Similar domains
Immunoglobulin G :
Mixing similar domains (V & C )
Multi-domain proteins with dissimilar
domains