Protein Structure


Principles & Architecture

Marjolein Thunnissen

Dep. of Biochemistry & Structural Biology

Lund University September 2011

Bioinformatics

Homology, pattern and 3D structure searches need databases and database managing tools, search techniques and dedicated tools for sequence and structure comparison, for detection of similarity, for homology modelling etc.

All this is the object of bioinformatics.

Why use bioinformatics?

• An explosive growth in the amount of biological information.

• A more global perspective in experimental design.

• Data-mining.

• The potential for uncovering phylogenetic relationships and evolutionary patterns.

Role of (bio)informatics in drug discovery

Pipeline: Genome → Gene → Protein → HTS → Hit → Lead → Candidate drug

Disciplines along the pipeline: Genomics, Bioinformatics, Structural bioinformatics, Chemoinformatics, Structure-Based Drug Design, ADMET Modelling

Structural bioinformatics techniques are valuable in areas from target identification to lead discovery.


06/27/12

Proteins: What do they do?

Receptors, hair and skin, muscles, transport of oxygen, growth and development, digestion, defense against infections, and much more.

Proteins: What do they do?

• Carry out chemistry as enzymes

• Recognition

• Signal transduction

• Immune recognition

• Capture and emit light

• Pump small and big molecules over membranes

• Movement (muscles), using ATP as energy

• RNA and DNA maintenance/metabolism etc.

• Form hair, nail, cartilage, silk, spiderwebs

• and more

More than 80% of medicines bind to proteins!


Why study 3D structures of biological macromolecules?

1. FUNCTION IS STRUCTURE!

2. Sequence homology is not enough to identify functional relationships.

3. Protein folding is still not fully understood. Predictions do not work satisfactorily.

4. Drug design. Pharmaceutical industry.

Protein Structure Hierarchy


Proteins are polymers

Proteins are formed by a chain of repeating molecules. One such molecule is called an amino-acid. There are 20 types of amino-acids, but they all have a common backbone or main-chain.

In DNA code: 20 different amino-acids.

In proteins 20 different amino-acids are found. The names of the different amino-acids can be given as a 3 letter code or a 1 letter code:

Alanine ——> Ala ——> A

The amino-acids can be divided into sub-groups dependent on the nature of their side-chain.

Group 1 Hydrophobic: Ala (A), Val (V), Leu (L), Ile (I), Phe (F), Pro (P) and Met (M)

Group 2 Charged: Asp (D), Glu (E), Arg (R), Lys (K)

Group 3 Polar: Ser (S), Thr (T), Cys (C), Asn (N), Gln (Q), His (H), Tyr (Y) and Trp (W)

Group 4 No special properties: Gly (G)

Alternatively there is also a 5th group:

Group 5 Aromatic rings: Phe (F), Tyr (Y), Trp (W) and His (H)
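The grouping above maps naturally onto a small lookup table. The following is only an illustrative Python sketch: the names GROUPS, THREE_TO_ONE and groups_of are invented here, and the group membership simply mirrors the lists on this slide (other textbooks draw the boundaries slightly differently).

```python
# Side-chain groups exactly as listed on the slide (one-letter codes).
GROUPS = {
    "hydrophobic": set("AVLIFPM"),
    "charged": set("DERK"),
    "polar": set("STCNQHYW"),
    "no special properties": set("G"),
    "aromatic": set("FYWH"),  # alternative fifth group; overlaps the others
}

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I", "Phe": "F",
    "Pro": "P", "Met": "M", "Asp": "D", "Glu": "E", "Arg": "R",
    "Lys": "K", "Ser": "S", "Thr": "T", "Cys": "C", "Asn": "N",
    "Gln": "Q", "His": "H", "Tyr": "Y", "Trp": "W", "Gly": "G",
}


def groups_of(residue):
    """Return, sorted by name, every group a one-letter code belongs to."""
    return sorted(name for name, members in GROUPS.items() if residue in members)


if __name__ == "__main__":
    for code in ("Ala", "Phe", "Gly"):
        one = THREE_TO_ONE[code]
        print(code, one, groups_of(one))
```

Because the aromatic group is an alternative view, a residue such as Phe is reported in two groups at once.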

The 20 amino-acids: hydrophobic residues

Alanine (Ala, A) Valine (Val, V) Proline (Pro, P)

Isoleucine (Ile, I)

Leucine (Leu, L)

Phenylalanine (Phe, F) Methionine (Met, M)

The 20 amino-acids: charged residues

Arginine (Arg, R) Lysine (Lys, K) Aspartic acid (Asp, D) Glutamic acid (Glu, E)


The 20 amino-acids: polar residues

Serine (Ser, S) Threonine (Thr, T) Tyrosine (Tyr, Y) Histidine (His, H)

Cysteine (Cys, C) Asparagine (Asn, N) Glutamine (Gln, Q) Tryptophan (Trp, W)

The 20 amino-acids: Glycine

Glycine (Gly, G)

Special properties of amino-acids

Since there are 4 different groups attached to the central Cα atom of an amino-acid (except for Glycine), it is an asymmetric atom.

Amino acids are therefore chiral molecules. There are two forms: L-form and D-form:

The natural configuration of amino acids in proteins is always the L-form.

Structure in four dimensions

Because proteins are built from 20 different amino-acids, their structure is described at four levels (dimensions).

Primary Structure: Amino-acid sequence.

Secondary Structure: Local regular structure: α-helices and β-sheets.

Tertiary Structure: Packing of secondary structure into one or several compact globular domains.

Quaternary Structure: The overall complex, if the protein consists of several polypeptide chains.


How to form the peptide chain?

The protein chain is formed by linking the amino-acids together. The linkage is called the peptide bond:

The chain of amino-acids linked to each other by peptide bonds is also called a polypeptide chain.

Two variants of the peptide bond


Trans peptide bond

Cis peptide bond

Properties of the peptide bond

The peptide bond unit containing the atoms C(n), O(n) and N(n+1) is a rigid plane, with bond lengths and angles nearly the same for each of these units in a polypeptide chain.

The freedom in conformation of this chain comes from rotation around the bonds N(n+1)-Cα(n+1) and Cα(n+1)-C(n+1).

Phi-Psi angles

The rotation around the N-Cα bond is called phi (φ) and the rotation around the Cα-C bond is called psi (ψ). Each amino acid is associated with these two conformational angles. If phi and psi are known for every residue, the conformation of the whole backbone chain is known, since the peptide planes are so rigid.
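Phi and psi are ordinary torsion (dihedral) angles defined by four consecutive backbone atoms: C(i-1), N(i), Cα(i), C(i) for φ, and N(i), Cα(i), C(i), N(i+1) for ψ. As a minimal sketch, assuming atom positions are available as (x, y, z) tuples, the angle can be computed as below. The function names are made up for this illustration and the code uses only the Python standard library.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees for rotation around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the two outer bonds onto the plane perpendicular to the axis.
    s0 = _dot(b0, axis)
    s2 = _dot(b2, axis)
    v = _sub(b0, (s0 * axis[0], s0 * axis[1], s0 * axis[2]))
    w = _sub(b2, (s2 * axis[0], s2 * axis[1], s2 * axis[2]))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi of residue i from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi of residue i from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

The first and last residue of a chain each lack one of the two angles, because the neighbouring atom needed to define it does not exist.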


Ramachandran plot

Most combinations of φ/ψ are not allowed since they would cause steric collisions between side chains and main chain.

The φ/ψ pairs can be plotted against each other. Such a plot is called a Ramachandran plot. The residues will cluster in certain areas.

Ramachandran plot of Barnase
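To make the clustering idea concrete, the sketch below sorts a φ/ψ pair into a coarse region by comparing it with the approximate helix (-60°/-50°) and strand (-135°/135°) angles quoted further on in these notes. The 40° tolerance and the function names are assumptions chosen for illustration. Real validation tools use empirically derived contours, not a simple box.

```python
# Approximate reference angles (degrees) as quoted in these notes.
REFERENCE = {
    "alpha-helix region": (-60.0, -50.0),
    "beta-strand region": (-135.0, 135.0),
}


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def coarse_region(phi, psi, tolerance=40.0):
    """Name the reference region a phi/psi pair falls in, or 'other'.

    The tolerance is an arbitrary illustrative choice, not a standard.
    """
    for name, (ref_phi, ref_psi) in REFERENCE.items():
        close_phi = angular_difference(phi, ref_phi) <= tolerance
        close_psi = angular_difference(psi, ref_psi) <= tolerance
        if close_phi and close_psi:
            return name
    return "other"


if __name__ == "__main__":
    for pair in [(-57.0, -47.0), (-120.0, 130.0), (60.0, 45.0)]:
        print(pair, coarse_region(*pair))
```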

Glycine residues

Glycine residues lack a side chain. Therefore they can adopt a much wider range of conformations than other residues. Glycines are often found where an unusual main chain conformation is needed (like a tight turn).

Cysteines can form cross-links

Cysteine residues from different parts of the sequence can link together in a disulfide bridge to form cross-links. This requires an oxidative environment. Within the cell the environment is reductive, so disulfide bridges are not often seen there; they are quite normal for extra-cellular proteins.

These cross-links give extra stability to a protein structure. They can also link two polypeptide chains together.

Low and high energy conformation (allowed and

disallowed):

Certain side chain conformations are energetically more favourable than others: these are more frequently seen in proteins. These

conformers are called rotamers.

Rotamers for Phe


Forces holding proteins together

Electrostatic interactions

Ionic interactions: e.g. salt bridges

Dipolar interactions: dipole-dipole, induced dipole

Hydrogen bonds: shared H-atom

Hydrophobic packing: mainly entropic

Salt bridges and polar interactions

Electrostatic interactions occur either between fully charged groups (ionic) or between partially charged groups (dipole-dipole).

The force of attraction between δ+ and δ- decreases rapidly with distance. In the absence of water these interactions can be very strong.

In protein molecules ionic bonds occur between the charged residues. Combinations: Arg-Asp, Arg-Glu, Lys-Asp and Lys-Glu.

Dipole-dipole interactions can occur e.g. between Asn-Thr or Ser-Gln (many more combinations are possible).
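The effect of water on these interactions can be illustrated with Coulomb's law for two point charges in a uniform dielectric. This is a deliberately crude model, a sketch and not a force field. The dielectric constants used below (about 80 for bulk water, about 4 for a protein interior) are commonly quoted round numbers and are assumptions here, as is the 3 Å separation for a salt bridge.

```python
# Coulomb constant in units convenient for molecules: kcal*Angstrom/(mol*e^2).
COULOMB_CONSTANT = 332.06


def coulomb_energy(q1, q2, distance, dielectric):
    """Interaction energy (kcal/mol) of two point charges.

    q1, q2     -- charges in units of the elementary charge
    distance   -- separation in Angstrom
    dielectric -- relative dielectric constant of the medium
    """
    if distance <= 0 or dielectric <= 0:
        raise ValueError("distance and dielectric must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)


if __name__ == "__main__":
    # Hypothetical salt bridge: +1 and -1 charges 3 Angstrom apart.
    in_water = coulomb_energy(+1, -1, 3.0, dielectric=80.0)
    buried = coulomb_energy(+1, -1, 3.0, dielectric=4.0)
    print(f"in water: {in_water:.2f} kcal/mol")
    print(f"buried:   {buried:.2f} kcal/mol")
```

In this simple model the same pair of charges interacts twenty times more strongly in the low-dielectric interior than in water, which is the point made above about the absence of water.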

Hydrogen bonds

Examples from macro-molecules: proteins and DNA.

Hydrogen bonds occur when one hydrogen is shared between two atoms (mostly O and N atoms). One atom "donates" the hydrogen while the other "accepts" it.

The hydrogen bond is strongest when the donor, the hydrogen and the acceptor lie in a straight line.

Hydrophobic interactions

This is one of the major forces behind protein folding. It is based on the fact that apolar and polar molecules do not like to mix, e.g. oil and water. The hydrophobic effect is really an entropy phenomenon: when the hydrophobic molecules cluster together, fewer water molecules are forced into an ordered arrangement.

In proteins this means that the protein folds such that a core arises in which hydrophobic residues are buried.


Secondary structure

The main driving force behind protein folding is to pack hydrophobic residues into the interior of the protein thereby creating a hydrophobic core.

Problem: the backbone of an amino acid contains some highly polar atoms: O and N.

These atoms have to be "neutralized".

Neutralization is achieved by the formation of hydrogen bonds: the O is an acceptor, while the N is a donor.

Secondary structure is an elegant way for the protein to bury the polar peptide bond in the protein interior.

There are two types of secondary structure: alpha helices and beta sheets

Alpha (α) helices

α-helices are found in proteins when consecutive residues all have φ/ψ angles of approximately -60° and -50°. The α-helix is right-handed, has 3.6 residues per turn and rises 1.5 Å per residue. In proteins α-helices range from 4 or 5 residues to over 40 residues in length, with an average length of 10 residues (15 Å).
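The numbers above are linked by simple arithmetic, sketched here in Python. The constant and function names are invented for this illustration; only the values 3.6 residues per turn and 1.5 Å per residue come from the text.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # Angstrom

# Derived quantities.
PITCH = RESIDUES_PER_TURN * RISE_PER_RESIDUE      # Angstrom per full turn
ROTATION_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # degrees per residue


def helix_length(n_residues):
    """Length along the helix axis, in Angstrom."""
    return n_residues * RISE_PER_RESIDUE


def helix_turns(n_residues):
    """Number of full turns made by a helix of n_residues."""
    return n_residues / RESIDUES_PER_TURN


if __name__ == "__main__":
    print(f"pitch: {PITCH:.1f} Angstrom per turn")
    print(f"rotation: {ROTATION_PER_RESIDUE:.0f} degrees per residue")
    print(f"10 residues: {helix_length(10):.1f} Angstrom")
```

So the average helix of 10 residues is 15 Å long and makes a little under three turns.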

Hydrogen bonding pattern in an α-helix

In the α-helix a very regular pattern of hydrogen bonds is formed: between the C=O of residue n and the NH of residue n+4. Therefore nearly all of these polar groups are joined through hydrogen bonds. The exceptions are the NH groups at the beginning of the helix and the C=O groups at the end of the helix. The ends of the helix are therefore polar and are most often found at the surface of the protein.
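The n to n+4 rule can be written out for a helix spanning a given residue range. The sketch below is hypothetical helper code, not taken from any library; it lists the bonded pairs and the groups left without a partner at the two ends.

```python
def helix_hydrogen_bonds(first, last):
    """(acceptor, donor) residue pairs: C=O of n bonded to N-H of n + 4."""
    return [(n, n + 4) for n in range(first, last - 3)]


def unpaired_groups(first, last):
    """Return (residues with a free N-H, residues with a free C=O)."""
    bonds = helix_hydrogen_bonds(first, last)
    paired_co = {acceptor for acceptor, _ in bonds}
    paired_nh = {donor for _, donor in bonds}
    residues = range(first, last + 1)
    free_nh = [r for r in residues if r not in paired_nh]
    free_co = [r for r in residues if r not in paired_co]
    return free_nh, free_co


if __name__ == "__main__":
    print(helix_hydrogen_bonds(1, 10))
    print(unpaired_groups(1, 10))
```

For a helix of residues 1 to 10 this gives six hydrogen bonds, with the NH groups of the first four residues and the C=O groups of the last four left free.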

Amphipathic α-helix

A very common position for an α-helix is on the surface of the protein. This means that one side of the helix points towards the solution and the other side towards the hydrophobic core. Because there are 3.6 residues per turn, patterns arise in which residues change from hydrophobic to hydrophilic every 3 to 4 residues. The helix is polar on one side and hydrophobic on the other: it is amphipathic. A way to look at sequences in a helix is to use a helical wheel representation: a projection of the residues onto a plane perpendicular to the axis of the helix.
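A helical wheel is easy to compute: with 3.6 residues per turn, each residue is rotated 100° from the previous one. The sketch below also sums +1 (hydrophobic) or -1 (other) vectors around the wheel as a rough measure of how one-sided the helix is. That binary weighting is a simplification assumed here; published hydrophobic-moment calculations use a graded hydrophobicity scale. The example peptide is made up.

```python
import math

DEGREES_PER_RESIDUE = 100.0     # 360 degrees / 3.6 residues per turn
HYDROPHOBIC = set("AVLIFPM")    # hydrophobic group as listed in these notes


def wheel_angle(index):
    """Angular position (degrees) of residue number index on the wheel."""
    return (index * DEGREES_PER_RESIDUE) % 360.0


def helical_wheel(sequence):
    """List of (residue, angle) pairs for a one-letter sequence."""
    return [(res, wheel_angle(i)) for i, res in enumerate(sequence)]


def amphipathic_score(sequence):
    """Length of the mean +1/-1 vector; near 0 means no preferred face."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    x = y = 0.0
    for res, angle in helical_wheel(sequence):
        weight = 1.0 if res in HYDROPHOBIC else -1.0
        x += weight * math.cos(math.radians(angle))
        y += weight * math.sin(math.radians(angle))
    return math.hypot(x, y) / len(sequence)


if __name__ == "__main__":
    peptide = "LKKLLKKLLKKLLKKL"  # invented example with a 3-4 residue repeat
    for residue, angle in helical_wheel(peptide):
        print(residue, angle)
    print(round(amphipathic_score(peptide), 2))
```

A sequence whose hydrophobic residues recur every 3 to 4 positions scores high, while a uniform sequence spread evenly around the wheel scores close to zero.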


Connecting helices: Helix-turn-helix motif

DNA-binding motif

Ca-binding motif

Four-helix bundle


Beta (β)-strands

The second major type of secondary structure is the β-sheet. In contrast with α-helices, these are not built from one continuous stretch of sequence but from a combination of several regions of the polypeptide chain. These regions are typically 5 to 10 residues long, and the residues are in a fully extended conformation with φ/ψ angles of around -135°/135°. Such a region is called a β-strand.

The β-strands are aligned adjacent to each other so that hydrogen bonds can be formed between the C=O groups of one strand and the NH groups of another strand. The sheets that are formed are pleated: Cα atoms lie alternately a little above and a little below the plane of the β-sheet. There are two alignments possible: parallel and anti-parallel.

β-sheets parallel & antiparallel.

A sheet is called parallel if the strands all run in the same biochemical direction (amino-terminal to carboxyl-terminal). If the strands have an alternating pattern, N --> C then C --> N etc., it is an antiparallel sheet.
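The definition can be expressed as a small rule on the directions of neighbouring strands. In the sketch below each strand is encoded as +1 (N --> C) or -1 (C --> N), listed in the order in which the strands lie side by side in the sheet. This encoding and the function name are assumptions for illustration. The third outcome, "mixed", corresponds to the mixed sheets described later in these notes.

```python
def classify_sheet(directions):
    """Classify a sheet from strand directions (+1 or -1) in spatial order."""
    if len(directions) < 2:
        raise ValueError("a sheet needs at least two strands")
    # Compare every strand with its neighbour in the sheet.
    same = [a == b for a, b in zip(directions, directions[1:])]
    if all(same):
        return "parallel"
    if not any(same):
        return "antiparallel"
    return "mixed"


if __name__ == "__main__":
    print(classify_sheet([+1, +1, +1]))
    print(classify_sheet([+1, -1, +1, -1]))
    print(classify_sheet([+1, +1, -1]))
```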

Hydrogen bonding in β-sheets

The hydrogen bonding pattern is quite different between parallel and antiparallel sheets. In the antiparallel sheet, narrowly spaced hydrogen bonds alternate with more widely spaced ones. The parallel sheet has more evenly spaced hydrogen bonds.

Mixed β-sheets

Almost all β-sheets (whatever the type) are twisted, and this twist always has the same handedness: a left-handed twist when viewed perpendicular to the strands. This is because the strands themselves have a very slight right-handed twist.

Sheets can also have a mixed character, partially parallel and partially antiparallel: mixed β-sheets. These are the most common β-sheets in proteins.


Loops and turns

Most proteins are built from several secondary structure elements which are linked to each other by loop regions. These loop regions differ in size and shape. Their main chain C=O and NH groups do not interact with each other through hydrogen bonds; instead they are exposed. This is one reason that loops are often found on the surface of proteins, where these groups can make hydrogen bonds with water molecules. Loops are often rich in charged and polar residues.

Some loops (especially in antiparallel β-sheets) are quite common: they are called hairpin loops.

How to represent protein structures?

To get the most information out of pictures of protein structures we need to simplify. We use schematic cartoons to do that.

Topology diagram

In order to have an overview of all the secondary structure elements and the order in which they appear in a protein, simple schematic drawings have been developed. These are called topology diagrams. In these, β-strands are represented by arrows and α-helices by cylinders.

β-sheet topology diagrams

Antiparallel β-sheet in aspartate transcarbamoylase

Antiparallel barrel in plastocyanin

Parallel β-sheet in


Level up: motifs

Some simple combinations of secondary structure elements occur in many different proteins. These can consist of e.g. two helices connected by a loop, or two β-strands and a helix. These combinations are called supersecondary structures or motifs.

Some of these motifs have a particular function (e.g. DNA binding), but others seem to have no biological role of their own and are used as building blocks.

Most elementary motifs:

α α (two anti-parallel helices linked by a small loop)

β β (two anti-parallel strands linked by a small loop)

β α β

βαβ motif

In antiparallel β-sheets the strands can be linked by small loops (quite often hairpins), but parallel β-sheets need longer loops or cross-over segments. These segments are frequently made by α-helices. The whole unit then looks like β-strand - loop - α-helix - loop - β-strand. This is called the βαβ motif. The loops in this motif can differ in length (from only a few residues to nearly 100) and can contain more secondary structure elements. The motif can have two hands (helix below or above the strands), but the latter is much more common.
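The strand - helix - strand order can be searched for in a per-residue secondary structure string. The sketch below assumes a DSSP-style string in which E marks strand residues, H marks α-helix residues and any other character is loop; the function names are invented. It only finds candidates: a true βαβ motif also requires the two strands to lie parallel in the same sheet, which cannot be read from the string alone.

```python
from itertools import groupby


def elements(ss):
    """Collapse a per-residue string into (type, start, end) elements.

    Only helix (H) and strand (E) runs are kept; end is exclusive.
    """
    result = []
    position = 0
    for code, run in groupby(ss):
        length = len(list(run))
        if code in ("H", "E"):
            result.append((code, position, position + length))
        position += length
    return result


def beta_alpha_beta_candidates(ss):
    """Return consecutive strand, helix, strand triples."""
    elems = elements(ss)
    return [
        (a, b, c)
        for a, b, c in zip(elems, elems[1:], elems[2:])
        if (a[0], b[0], c[0]) == ("E", "H", "E")
    ]


if __name__ == "__main__":
    example = "EEEE--HHHHHH--EEEE"  # made-up secondary structure string
    print(elements(example))
    print(beta_alpha_beta_candidates(example))
```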

Adding βαβ motifs together:

Two ways to join the units together, giving:

open twisted α-β structure

α-β barrels

Three main types of structure based on βαβ motifs

Closed barrel: Triosephosphate isomerase

Open twisted β-sheet: Alcohol dehydrogenase

Open barrel


The active site in all α/β barrels is in a pocket formed by the loop regions that connect the carboxy ends of the β-strands with the adjacent α-helices.

A view from the top of the barrel of the active site of the enzyme RuBisCo (ribulose bisphosphate carboxylase)

Greek Key motif

This motif occurs in proteins with 4 adjacent anti-parallel β-strands. Since the topology diagram resembles an ornamental pattern used in ancient Greece, it is called the Greek key.

This motif is structural and no specific function is associated with it.

The eight strands in γ-crystallin are arranged in two Greek key motifs

Motifs are used as building blocks.

Motifs and secondary structure elements are used as a kind of Lego blocks to form 3-dimensional structures. If the resulting structure can fold independently it is called a domain.

Fatty acid binding protein: beta barrel + helix-loop-helix

Lac-repressor: many motifs e.g. helix-loop-helix and 4 helix bundle


Level up: domains

• Compact folded units of polypeptide chain.

• Fundamental units of protein structure.

• Nature juggles with domains.

• Quite often specific sub-roles are carried out by different domains.

• Some proteins however are not easily assembled from domains and are more complex.


Classes of structures

In general all protein structures can be placed into three groups:

all α-helical proteins

all β-sheet proteins

α/β and α+β proteins

α-domain structures

Many different types of structures can be formed by α-helices alone. The first protein structures solved (myoglobin and hemoglobin) had only α-helices. Their fold is called the globin fold.

Hemoglobin

α-domain structures

The helices in an all-helical domain can be packed in an almost parallel manner. This gives rise to two different types of packing: four-helix bundles or large arrangements.


All β-structures

All-β structures are predominantly antiparallel (no helices to make crossovers) and consist of packed sheets.

Up-and-down barrel: Retinol binding protein

Up-and-down sheet, propeller-like fold: Influenza neuraminidase

Superoxide dismutase (SOD) comprises eight antiparallel β-strands

All β-structures (2)

Jelly-roll barrel (2 x Greek key)

Viral coat proteins

β-helix

Pectate lyase

α/β and α+β structures

These are the most common structures found. They consist of a central sheet (mixed or parallel) surrounded by α-helices (α/β), or of segregated α and β regions (α+β). There are many variations in these classes (e.g. see how βαβ-units pack). Often the secondary structure elements provide structural strength while loops are involved in the function of the protein.

α+β: Lysozyme

α/β: tyrosyl-tRNA


Level up: Multi-domain proteins

• Multiple copies of a similar kind of domain, e.g. rhodanese, immunoglobulins.

• Multiple dissimilar domains; this occurs in many different proteins.

• Combinations can also occur.


Similar domains

• Immunoglobulin G: mixing similar domains (V & C)


Multi-domain proteins with dissimilar domains

The three domains of pyruvate kinase do not show any structural resemblance to each other.

Multi-domain game

• Combining domains in a chain is a rapid way to combine functions.
