Mapping Protein-Protein Interfaces and Protein Topologies by Novel Enrichment of Chemical Cross-links and Downstream Identification Using MALDI Mass Spectrometry

At any moment in time there are a number of proteins working tirelessly within a cell. While some are catalyzing a wide spectrum of chemical reactions, others act as gatekeepers regulating the passage of molecules across a membrane. Certain proteins form the structural framework that gives cells their shape while others serve to relay messages across the cytoplasm. These are a few roles in a long list of functions proteins execute in order for cells to survive and thrive1. With proteins performing such sophisticated functions, it is not surprising that they are the most structurally elaborate molecules known. Even a small protein is composed of hundreds of carefully positioned atoms associating by covalent and noncovalent bonds, all determined by billions of years of evolution2.

It is remarkable that all this functional and structural complexity can be reduced to combinations of 20 amino acids, strung together by covalent peptide bonds to form the long chains we know as proteins3. A quick calculation reminds us of this reality. Since any of the 20 amino acids can occur at any position in the peptide chain, there could be more than 10^390 different proteins among those 300 amino acids long alone. The amino acid sequence of a protein specifies its structure, which in turn dictates its function4. With the sequencing of the human genome, the amino acid sequences of more than 10,000 human proteins have been identified5. The next logical step in investigating the complexity of the proteome is to identify the structure and function of all these proteins, which is by no means a simple task5.
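As a quick check of this estimate, the number of possible sequences can be computed on a logarithmic scale. The sketch below is only illustrative: the function name is hypothetical, and it assumes that all 20 amino acids are equally allowed at every position.

```python
import math


def log10_sequence_count(chain_length: int, alphabet_size: int = 20) -> float:
    """Return log10 of the number of distinct chains of the given length."""
    # alphabet_size ** chain_length, expressed as a power of ten
    return chain_length * math.log10(alphabet_size)


exponent = log10_sequence_count(300)
print(f"20^300 is roughly 10^{exponent:.1f}")  # roughly 10^390.3
```

Working with the logarithm avoids printing a 391-digit integer and makes the order of magnitude explicit.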

Structural Genomics: Protein Topology Mapping

To date, X-ray crystallography is the most widely used technique for determining the structure of individual proteins6. A protein produced in large quantities by a heterologous expression system is experimentally induced to form a three-dimensional periodic array, commonly known as a crystal. This crystal is subsequently irradiated with a narrow beam of parallel X-rays. While most of the X-rays simply pass through unaffected, some are scattered by the atoms of the protein. Due to the ordered nature of the crystal, scattered waves reinforce one another, producing detectable diffraction spots6. Analysis of the overall diffraction pattern gives a three-dimensional electron-density map, which is correlated with the amino acid sequence to yield the structure of the protein at atomic resolution6. Although X-ray crystallography is a powerful technique, it is limited to the small subset of proteins that can be isolated in large quantities and are capable of forming ordered crystals. Even for those proteins that are suitable for the technique, it often takes years to optimize crystallization conditions to obtain crystals of sufficient quality.

Nuclear magnetic resonance (NMR) spectroscopy is another popular method for mapping protein structures and does not require a crystalline starting sample7. All that is needed is a concentrated protein solution and a strong magnetic field. NMR capitalizes on the magnetic spin that atomic nuclei in proteins possess. The resulting magnetic moment aligns with an external magnetic field and temporarily changes its orientation in response to radiofrequency pulses of electromagnetic radiation7. When the nuclei return to their original alignment, they emit radiofrequency radiation whose characteristics depend on the atomic environment of the nuclei7. Since this environment consists of the neighboring amino acids, the emission spectrum provides information about the structure of the protein7. Unfortunately, NMR is limited to the few proteins of 20,000 Daltons or less, as the resolution of the technique decreases as the size of the protein increases7.

Although technological advancements in X-ray crystallography and NMR would likely address their respective limitations, they would not resolve the major shortcoming the two techniques share: both rely heavily on the isolation of proteins from their cellular milieu. One thing that is clear in the literature on protein folding is that the environment plays an enormous role in the structure of the protein8. It is therefore not farfetched that the structures of proteins resolved by X-ray crystallography and NMR are artifacts of the technique and are far from the real fold of the endogenous protein.

Towards the Interactome: Protein Interaction Mapping

Proteins perform a number of different functions dictated by their structure4. However, they do not act in isolation. In order to execute their functions, proteins must interact with each other as part of higher-order complexes, producing large networks of protein-protein interactions9. Mapping these networks is a massive task, and as such most experimental methods have been directed at characterizing the interactome of individual proteins10. In addition to contributing to the cellular interactome, the interaction network of an individual protein provides insight into its cellular function. Its role can be inferred from the established functions of the proteins to which it specifically binds10.

A powerful genetic strategy used to map binary protein interactions is the yeast two-hybrid system11. This technique capitalizes on the modular nature of transcriptional activators12. These proteins have an isolated domain which binds consensus DNA sequences and a second separate protein domain which activates gene transcription12. The “bait” protein, whose interaction network is being probed, is fused to the DNA-binding domain of a transcriptional activator12. When expressed in yeast this recombinant fusion protein will associate with the regulatory region of a reporter gene12. The “prey” proteins, potential interacting proteins, are fused to the transcriptional activation domain of the transcriptional activator and individually expressed along with the “bait” fusion protein in yeast12. If the two co-expressed fusion proteins interact the two domains of the transcriptional activator are united and the reporter gene will be expressed12. In this way cellular interactomes have been generated for yeast, C. elegans, and Drosophila11.

Another genetic method for generating protein interaction networks is fluorescence resonance energy transfer (FRET). This technique relies on the fact that interacting proteins are in close proximity in the cell13. Immunofluorescence is often used first to show an overlapping cellular localization. Here the proteins of interest are labeled with primary antibodies and fluorescently labeled secondary antibodies in fixed, permeabilized cells. The labeled cells are subsequently imaged by epifluorescence microscopy or spinning-disk confocal microscopy. If the two proteins colocalize by immunofluorescence, it demonstrates that the proteins likely, but not definitively, interact13. For stronger evidence of an interaction, FRET is used. As in the yeast two-hybrid system, the “bait” protein is recombinantly tagged to produce a fusion protein with a fluorochrome13. The “prey” proteins are recombinantly tagged to produce fusion proteins with a different fluorochrome whose absorption spectrum overlaps with the emission spectrum of the fluorochrome attached to the “bait” protein13. When the two vectors are co-expressed in live cells and the “bait” and “prey” proteins interact, their fluorochromes are brought together and the energy of the light absorbed by the “bait” fluorochrome is transferred to the “prey” fluorochrome, producing a characteristic fluorescent emission at the cellular location of their interaction13.

A genetic method that maps a different set of protein interactions is synthetic-lethality analysis in yeast. Synthetic lethality occurs when two separate non-lethal gene deletions together lead to lethality14. The reason for this is that the two gene products function in parallel cellular pathways14. While the loss of one protein in a pathway can be compensated for by a parallel pathway, the loss of two proteins in parallel pathways cannot be compensated for, leading to a dramatic impairment15. Synthetic genetic array (SGA) technology has automated this screening process and has been successfully used to map the entire interactome of yeast15. This genetic interactome is complementary to, but distinct from, interaction networks based on physical protein interactions. Synthetic lethality identifies interactions that transcend direct protein-protein interaction, as they rely on pathway membership. Consistent with this, yeast proteins have numerous genetic interactions while they have, on average, only eight physical interactors identified by yeast two-hybrid screens15.

In addition to the many genetic techniques available to build protein interaction networks, there are a number of biochemical methods as well. In vitro binding assays are a common tool used to demonstrate that two proteins are capable of interacting. A “bait” protein is recombinantly tagged with an affinity motif such as GST and expressed in a heterologous expression system. The “prey” protein is expressed in a similar way. After both proteins are isolated and purified separately, the “bait” protein is immobilized on a matrix through its affinity motif. The “prey” protein is then incubated with the immobilized “bait” protein. If the two proteins co-elute from the solid support, it indicates that the proteins physically interact. This concept has been scaled up to map protein interaction networks. Similar to the microarrays used to study transcription on a large scale, protein arrays have been developed to identify protein interactions on a large scale16. Thousands of “prey” proteins are immobilized on glass slides and fluorescently labeled “bait” protein is incubated with the array. After a series of washes, each fluorescent spot that remains represents a “prey” protein to which the “bait” protein binds, indicating an interaction between the two.

Another widespread biochemical tool to identify interacting proteins is affinity purification. In this technique, which is quite similar to in vitro binding assays, the “bait” protein is first recombinantly tagged at the N- or C-terminus with an affinity motif such as FLAG, HA, Myc, or a Protein A peptide17. After genetic tagging, the fusion protein is expressed heterologously. It is assumed that the overexpressed tagged protein functions similarly to the endogenous protein and thus also interacts with its normal protein interactors17. The “bait” protein is then isolated by affinity chromatography directed against the affinity motif, immobilizing the “bait” protein on the affinity matrix17. Endogenous proteins in the cell extract that interact with the “bait” protein will in turn be immobilized indirectly on the affinity matrix through their association with the “bait” protein. These proteins can then be eluted and individually identified by western blotting. After a protein interaction has been demonstrated by this technique, co-immunoprecipitation can be used for a more convincing demonstration of a robust interaction. In this “gold standard” method, an antibody coupled to a solid matrix is used to pull down the endogenous “bait” from the cell extract18. As with affinity tag based purification, any proteins that interact with the “bait” protein will be captured along with the “bait”18.

Western blotting for individual proteins after co-purification or co-immunoprecipitation with the “bait” protein is time consuming and impractical for generating large protein interactomes. With the genomes of model organisms sequenced, the proteins produced by these organisms are all known4. Identification of proteins only requires matching some amino acid sequences within the unknown proteins to known, catalogued gene products. High-throughput mass spectrometry and subsequent database searches use this approach to identify interacting proteins5. Mass spectrometry relies on the fact that charged particles respond predictably when exposed to electric and magnetic fields in a vacuum5. For matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF), proteins are experimentally digested into peptides, mixed with an organic acid and loaded onto a metal slide5. A laser is then used to eject the peptides from the slide as an ionized gas, with each molecule carrying a single positive charge5. The charged peptides are accelerated by an electric field towards a detector. The time it takes a charged particle to reach the detector depends on its mass and charge: larger peptides move more slowly than smaller peptides. The resulting mass-to-charge ratios (m/z) of the peptide fragments are presented in a spectrum known as a peptide mass fingerprint (PMF)5. This PMF can be used to unambiguously identify a protein through a search of databases in which the masses of expected peptides from proteins have been tabulated5.
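The dependence of flight time on mass and charge can be illustrated with the idealized relation t = L * sqrt(m / (2 z e U)) for an ion accelerated through a potential U and drifting over a field-free length L. The following is a minimal sketch under stated assumptions: the 20 kV accelerating voltage and 1 m flight tube are illustrative values rather than instrument settings from this work, and effects such as initial velocity spread or delayed extraction are ignored.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one proton, in coulombs
DALTON_KG = 1.66053906660e-27          # one unified atomic mass unit, in kg


def flight_time_s(mass_da: float, charge: int = 1,
                  accel_voltage_v: float = 20_000.0,
                  tube_length_m: float = 1.0) -> float:
    """Idealized linear time of flight for an ion of given mass and charge."""
    mass_kg = mass_da * DALTON_KG
    # Kinetic energy gained in the accelerating field: z * e * U
    kinetic_energy_j = charge * ELEMENTARY_CHARGE_C * accel_voltage_v
    velocity_m_s = math.sqrt(2.0 * kinetic_energy_j / mass_kg)
    return tube_length_m / velocity_m_s


# Singly charged peptides of 1000 Da and 4000 Da (illustrative masses)
for mass in (1000.0, 4000.0):
    print(f"{mass:.0f} Da: {flight_time_s(mass) * 1e6:.2f} microseconds")
```

Because flight time scales with the square root of m/z, a singly charged peptide four times heavier takes twice as long to reach the detector.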

The use of all these different experimental methods to identify protein and cellular interactomes is evidence that each technique has benefits over the others but at the same time suffers from its own limitations. Where all techniques fall short is in their identification of false positive protein interactors and their failure to detect genuine interactors (false negatives). With the widespread presence of protein interaction domains, the structurally dynamic nature of proteins and the fundamentally noncovalent bonds through which they associate, proteins have an intrinsic ability to bind proteins they do not normally interact with physiologically. For this reason, when proteins are artificially placed together by these techniques, it is not unexpected that there will be false positives. Under normal physiological conditions these proteins might be found in different cellular compartments, may have different transcriptional profiles or may have different folds that leave them sterically inaccessible to each other. The other consequence of an artificial encounter is that valid protein interactions might be missed. It may be that the proteins have weak or no intrinsic affinity for each other and it is the cellular milieu that facilitates and promotes their transient association.

Combining Structural Genomics and the Interactome: Cross-linking and Mass Spectrometry

The ultimate goal of proteomics is to assemble a comprehensive representation of proteins and their functions in a cell. The hope is to depict the structures of proteins as well as the interfaces they use to associate with their interactors to perform their functions. To date only small steps have been taken towards this, with the structures of proteins being investigated in isolation from their interactions, while interaction interfaces have been almost ignored. Often in biology the greatest discoveries are not small steps but giant leaps towards a better understanding of the cell. In vivo chemical cross-linking coupled with mass spectrometry identification is the giant leap towards building a representation of the proteins of a cell and promises to be revolutionary in the field of proteomics. It allows protein structures to be solved, protein interactors to be identified, and protein interfaces to be mapped for the entire cell, all at the same time. It seems too good to be true, and in part it is. But current methodology development in our lab is making this a reality.

A whole cell chemical cross-linking experiment begins with the selection of a chemical cross-linking reagent19. Traditional cross-linking reagents possess a single reactive functional group at each of their two ends for covalent attachment to complementary functional groups found on proteins19. In a homobifunctional cross-linking reagent these functional groups are identical, while in a heterobifunctional reagent the two groups are different20. These functional groups can be constitutively active or inducible by an external stimulus such as UV light20. Equally important as the functional groups are the membrane permeability and solubility of the reagent, as well as its tether length, the distance the cross-link can span between residues20.

The cross-linking reagent is then added to the cell for a limited time. Once in the cell the cross-linking reagent reacts at either one or both of its ends, attaching to a single amino acid or bridging two respectively. In the latter case amino acids in close proximity would be linked. Thus covalent cross-links would be formed between neighboring peptide chains within a protein and at the interface between two interacting proteins. Here lies the power of chemical cross-linking. With knowledge of which stretches of amino acids within a protein are cross-linked to each other, the structure of the protein can be solved by taking into consideration the distance constraint defined by the cross-linker tether length. Cross-linking of the protein in its cellular milieu ensures the structure is preserved and is not altered by non-physiological conditions after the cellular integrity is disrupted. Thus it resolves the major issue that X-ray and NMR techniques suffer from, albeit at the cost of the atomic resolution. Cross-linking also resolves the issue interactome techniques face with false positive and false negative interactors. In vivo cross-linking ensures that only proteins that physiologically interact are bound together by covalent bonds21. Thus false positives are largely avoided through stringent washing steps that remove non-physiological interactors that just have an intrinsic affinity to the “bait” protein. At the same time the formation of covalent bonds between protein interactors allows for stabilization of transient and weak interactors, dramatically reducing false negatives. Cross-linking also maps the largely ignored interfaces between two interacting proteins. Information about the amino acids that span the interface of interacting proteins, together with information about the structure of the protein afforded by cross-linking, can be used to map the topology of protein complexes for the entire cell.
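The way a tether length acts as a distance constraint can be sketched in a few lines: a candidate structural model is only compatible with an observed cross-link if the two linked residues lie within the maximum span of the cross-linker. Everything in the example is hypothetical (the residue numbers, the coordinates in Angstroms, and the 12 Angstrom span), and a real analysis would also account for side-chain length and flexibility.

```python
import math


def violated_cross_links(coords, cross_links, max_span):
    """Return the cross-links whose residue pair is farther apart than max_span.

    coords maps a residue number to an (x, y, z) position in Angstroms;
    cross_links is a list of (residue_a, residue_b) pairs.
    """
    return [
        (a, b)
        for a, b in cross_links
        if math.dist(coords[a], coords[b]) > max_span
    ]


# Hypothetical model: three residues with made-up coordinates
model = {10: (0.0, 0.0, 0.0), 45: (3.0, 4.0, 0.0), 80: (30.0, 0.0, 0.0)}
observed = [(10, 45), (10, 80)]

# Assumed maximum span of 12 Angstroms for an illustrative cross-linker
print(violated_cross_links(model, observed, max_span=12.0))  # [(10, 80)]
```

A model that violates many observed cross-links can be rejected, which is how a set of cross-links narrows down the possible folds or docking arrangements.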

After limited cross-linking, the cellular proteins are isolated and typically first separated by SDS-PAGE5 (Figure 1). Protein gel bands are then excised and digested in-gel with trypsin, which cleaves at lysines and arginines5. This produces a mixture of peptide products typically around 5 amino acids long5. Assuming a short cross-linking exposure time, unmodified peptides are overabundant in this mixture; the cross-linker has reacted with and covalently attached to only the remaining small fraction of peptides (Figure 2). Among the cross-linked peptides, one can distinguish three types of cross-linking products. The first group of peptides is modified by a cross-linker that has reacted at only one of its two functional groups. These peptides are known in the literature as “dead end” peptides20. The second group of peptides is modified twice by a single cross-linker, such that both functional groups of the cross-linker have reacted within a single peptide. These peptides are called “intrapeptide” cross-links20. The third group of peptides has been modified once by a cross-linker and linked through space to a second peptide also modified by the same cross-linker. These “interpeptide” cross-links are the informative cross-links between peptide chains within a protein or between proteins, and they constitute an extremely small fraction of the final peptide mixture20. In most cases MALDI MS/MS or ESI MS/MS is then used to identify these informative cross-links in the mixture of all peptides produced by the trypsin proteolysis (Figure 1). However, identification of “interpeptide” cross-links is nearly impossible due to the inherent complexity of the final peptide mixture. In fact, in the literature, the identification of “interpeptide” cross-links has been compared to looking for ‘a needle in a haystack’20.
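The trypsin cleavage rule described above can be illustrated with a small in silico digest. This is a simplified sketch: it cleaves after every lysine (K) and arginine (R), ignores missed cleavages and the commonly cited exception before proline, and the input sequence is an arbitrary example rather than a protein from this study.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split a one-letter protein sequence after every K or R."""
    peptides = []
    start = 0
    for index, residue in enumerate(sequence):
        if residue in "KR":
            # Trypsin cuts on the C-terminal side of the basic residue
            peptides.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        # Remaining C-terminal peptide that does not end in K or R
        peptides.append(sequence[start:])
    return peptides


print(tryptic_digest("AGKLMRPEPTIDEK"))  # ['AGK', 'LMR', 'PEPTIDEK']
```

Even this toy digest shows why the mixture becomes complex: every protein contributes many peptides, and only a small fraction of them carry a cross-linker.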

In order to reduce the complexity of whole cell peptide mixtures, protein complexes of interest are often extensively enriched. This is the case for the time-controlled transcardiac perfusion cross-linking (tcTPC) method, the first successful strategy to identify in vivo cross-linked protein interactions from complex tissue22. After mild formaldehyde cross-linking by transcardiac perfusion, cross-linked protein complexes were immunoaffinity purified with highly selective antibodies22. This method enabled identification of more than 20 protein interactors of the prion protein and many protein interactors for the Alzheimer's disease protein amyloid precursor protein22. In spite of extensive protein enrichment and the hundred fold reduction in sample complexity, the final peptide mixture is still too complex for the tcTPC method to successfully identify individual “interpeptide” cross-links. As such the method is limited to identifying protein interactors and cannot identify the structure of proteins and the interfaces through which they interact. It is clear that in addition to isolating protein complexes of interest, more stringent enrichment steps are required to reduce the sample complexity.

Efforts to identify cross-linked peptides have focused on specially designed cross-linking reagents. One approach involves the use of a trifunctional cross-linker with the two traditional functional groups for cross-linking and an affinity tag such as biotin23. After chemical cross-linking and enzymatic digestion, cross-linked peptides are selectively purified from unmodified peptides by their affinity tag using a complementary system like streptavidin23. In another approach that also relies on enrichment, the chemical cross-linker is designed to become fluorescent when both functional groups react with complementary protein groups24. Using HPLC, cross-linked peptides can be selectively purified from the proteolytic peptide mixture based on their fluorescence24. Other custom cross-linker approaches rely on mass spectrometry to distinguish cross-linked peptides from unmodified peptides. In one study, after use of a 1:1 mixture of isotope-labeled cross-linkers, peaks in the mass spectrum could easily be identified as cross-linked peptides based on their characteristic isotopic pattern25. Another approach identifies cross-links by marker ions released from the product ion of cross-linked peptides after low-energy fragmentation26. An equally common technique uses a thiol-cleavable cross-linking reagent and compares the mass spectrum of peptide samples before and after reduction of the cross-linker27. Peptide peaks not observed in the spectrum following reduction are putative cross-linked peptides27.

Although there is an abundance of custom cross-linkers that allow for identification of cross-linked proteins, they are unfortunately neither practical nor effective. Since these large cross-linkers cannot readily cross lipid membranes, they cannot be properly delivered into cells to efficiently cross-link proteins. Thus any advantage they provide in the downstream analysis of peptides is of modest practical benefit. But even for those few cross-linkers designed to be membrane permeable, there is still the inherent inability to discriminate “dead end” and “intrapeptide” cross-links from “interpeptide” cross-links. As such, methods that identify “interpeptide” cross-links independent of the choice of cross-linker are desired. One such method capitalizes on the two C-termini of “interpeptide” cross-links after proteolysis versus the one C-terminus of unmodified, “dead end” and “intrapeptide” cross-linked peptides. After chemical cross-linking, proteins were digested by trypsin in ‘heavy’ 18O water, resulting in the incorporation of two 18O atoms at each C-terminus28. Relative to their unlabeled counterparts, labeled “interpeptide” cross-links shift by 8 u, while unmodified, “dead end” and “intrapeptide” cross-linked peptides shift by only 4 u28. Thus “interpeptide” cross-links can be identified in the mass spectrum by their characteristic mass shift28. Unfortunately, this method still does not resolve the issue of ion suppression of already rare “interpeptide” cross-links by complex peptide mixtures. It is clear that there is currently no generic enrichment strategy to selectively purify and identify “interpeptide” cross-links.
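The arithmetic behind the 4 u and 8 u shifts can be made explicit: each C-terminus takes up two 18O atoms, and each 18O is about 2.004 u heavier than 16O. The sketch below classifies a light/heavy peak pair by its mass difference. The function names, the example masses and the 0.05 u tolerance are assumptions made for illustration, and complete labeling of every C-terminus is assumed.

```python
# Monoisotopic mass difference between 18O and 16O, in unified atomic mass units
O18_MINUS_O16 = 17.99916 - 15.99491  # about 2.004 u


def expected_shift(n_c_termini: int) -> float:
    """Mass shift when every C-terminus incorporates two 18O atoms."""
    return n_c_termini * 2 * O18_MINUS_O16


def classify_peak_pair(light_mass: float, heavy_mass: float,
                       tolerance: float = 0.05) -> str:
    """Assign a light/heavy peak pair by its 18O-induced mass shift."""
    shift = heavy_mass - light_mass
    if abs(shift - expected_shift(2)) <= tolerance:
        return "two C-termini (interpeptide cross-link)"
    if abs(shift - expected_shift(1)) <= tolerance:
        return "one C-terminus (unmodified, dead end or intrapeptide)"
    return "unassigned"


# Made-up peak pairs, for illustration only
print(classify_peak_pair(2000.000, 2008.017))
print(classify_peak_pair(1500.000, 1504.008))
```

The classification only flags candidates; it does nothing about the ion suppression problem noted above, since the rare cross-linked species still have to be detected in the first place.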

Methodology Development: Finding a Needle in a Haystack

We decided to exploit the two termini of “interpeptide” cross-links after proteolysis versus the single terminus of unmodified, “dead end” and “intrapeptide” cross-linked peptides. Our intent was to selectively label a single terminus of each peptide with an affinity tag that would allow us to separate the “interpeptide” cross-links, which carry two affinity tags, from the rest of the peptide mixture, which would contain only single affinity tags. A review of current protein labeling strategies that target endogenous amino acid nucleophiles, N-terminal amino groups, or C-terminal carboxyl groups demonstrated the absence of a site-specific labeling strategy. Through a serendipitous experimental finding we realized that cyanogen bromide (CNBr) chemical cleavage of proteins C-terminal to methionines generates peptide fragments, on average 50 amino acids long, with a homoserine lactone (HSL) moiety at the C-terminus. The mechanism involves nucleophilic attack of the methionine thioether sulfur on the carbon of CNBr, followed by cyclization to form an iminolactone, which is hydrolyzed by water, resulting in cleavage of the peptide bond and generation of a homoserine lactone. The intermediate reactivity of the HSL electrophile is ideal for chemoselective, bio-orthogonal labeling of the C-termini of peptides. Aminolysis by a primary amine was selected as our method of nucleophilic attack, as the mild nucleophilicity of the amine prevents it from nonspecifically modifying amino acids but still allows it to conjugate to the homoserine lactone to form a stable amide bond (Figure 3).
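The cleavage rule can be sketched as an in silico CNBr digest that also records which fragments end in a homoserine lactone and are therefore available for labeling. The example uses the sequence of the model peptide listed in the Experimental Section (without its N-terminal acetyl group); the function name is hypothetical, and complete cleavage at every methionine is assumed.

```python
def cnbr_fragments(sequence: str) -> list[tuple[str, bool]]:
    """Cleave C-terminal to every methionine (M).

    Returns (fragment, has_hsl) pairs, where has_hsl is True when the
    fragment ends at a cleaved methionine, i.e. its C-terminal residue
    has been converted to a homoserine lactone.
    """
    fragments = []
    start = 0
    for index, residue in enumerate(sequence):
        if residue == "M":
            fragments.append((sequence[start:index + 1], True))
            start = index + 1
    if start < len(sequence):
        # The original C-terminus keeps its free carboxyl group
        fragments.append((sequence[start:], False))
    return fragments


# Model peptide from the Experimental Section
for fragment, has_hsl in cnbr_fragments("CAPQEGILEDMPVDPDNEAY"):
    print(fragment, "HSL" if has_hsl else "free C-terminus")
```

Only the fragment ending at the former methionine carries the lactone, which is what makes the subsequent aminolysis selective for CNBr-generated C-termini.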

As with all chemical reactions, a number of experiments were conducted to optimize conditions. The duration of the reaction, the reaction temperature, the concentration of the primary amine reagent, the pH of the reaction and the solvent of the reaction were all explored (Figure 4). It was found that the reaction reaches equilibrium within 6 hours; however, since overnight incubations did not cause any appreciable side reactions, they were used for convenience. Not surprisingly for an endothermic reaction, an elevated temperature accelerated product formation, and the optimum reaction temperature was found to be between 45 and 55 °C. The ideal molar ratio of primary amine functionalized reagent to 200 pmol of HSL-containing peptide was 1000:1 to 10,000:1. Since it was known that under basic conditions the HSL moiety is consumed and undergoes alcoholysis to form the free acid homoserine, reactions were carried out under acidic conditions. Because the reactivity of the primary amine functionalized reagent decreases dramatically in acidic conditions due to protonation, an acid scavenger such as triethylammonium was added to reduce protonation. Anhydrous solvents like dimethyl formamide were used in preference to water-based solvents, as H2O promotes the isomerization of homoserine lactones to homoserines, thus reducing the homoserine lactones available to react with primary amine tags.

Tandem Tetrahistidine and Biotin Enrichment of Cross-linked Peptides

With a successful chemoselective peptide terminal labeling strategy developed, we decided to conjugate one of two different affinity tags to each C-terminus. The presence of two different affinity tags on “interpeptide” cross-links would allow for a tandem affinity purification of these rare through-space cross-links from a mixture of other peptides. We selected the polyhistidine and biotin affinity tags for this method, which are bound by nickel-nitrilotriacetic acid (Ni-NTA) functionalized IMAC matrices and streptavidin agarose, respectively. These affinity systems were chosen to make the method more user friendly, as many biochemists are already familiar with them. Since the conjugation of the primary amine functionalized biotin tag proceeded at a higher rate than that of the primary amine functionalized tetrahistidine tag, tetrahistidine and biotin were conjugated consecutively to the HSL at the C-termini of peptides generated by CNBr cleavage of proteins. Unmodified, “dead end” and “intrapeptide” cross-linked peptides would carry only one of the two affinity tags, while a large fraction of “interpeptide” cross-links would carry both. Focusing on the through-space cross-linked peptides, those functionalized with a 4xHis tag at either one or two C-termini are initially purified by Ni-NTA affinity chromatography (Figure 5). The select few cross-linked peptides that do not possess a tetrahistidine tag, whether they are simply unlabeled or biotinylated at one or two C-termini, are washed away (Figure 5). Thereafter, C-terminally biotinylated cross-linked peptides are purified from the eluent of the Ni-NTA column by incubation with streptavidin beads. Those very few cross-linked peptides that do not possess a biotin tag, but instead are labeled with a tetrahistidine tag at one or two C-termini, are washed away.

In effect, a selection pressure is placed on cross-linked peptides that possess a biotin tag and a tetrahistidine tag at their two C-termini, a requirement that non-cross-linked peptides cannot hope to fulfill with their single C-terminus. The result is a population of only through-space cross-linked peptides. The enrichment methodology was successfully tested on a model peptide synthesized with a variety of amino acids, including a cysteine residue allowing for the formation of an intermolecular disulfide cross-link and a methionine for the formation of a C-terminal HSL after CNBr chemical cleavage. The different stages of the methodology were monitored using MALDI mass spectrometry (Figure 6).
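The selection logic of the tandem purification can be summarized by enumerating the possible tag states. The following is a minimal sketch under simplifying assumptions: every C-terminus carries exactly one of the two tags, unlabeled termini are ignored, and both affinity steps are treated as perfectly selective. The tag labels and function names are illustrative.

```python
from itertools import product

TAGS = ("His4", "biotin")


def tag_states(n_c_termini: int) -> list[tuple[str, ...]]:
    """All ways of placing one tag on each C-terminus of a peptide species."""
    return list(product(TAGS, repeat=n_c_termini))


def survives_tandem_purification(tags: tuple[str, ...]) -> bool:
    """Retained by Ni-NTA (needs His4) and then by streptavidin (needs biotin)."""
    return "His4" in tags and "biotin" in tags


# One C-terminus: unmodified, "dead end" and "intrapeptide" species
single = [s for s in tag_states(1) if survives_tandem_purification(s)]
# Two C-termini: "interpeptide" cross-links
double = [s for s in tag_states(2) if survives_tandem_purification(s)]

print(single)  # []
print(double)  # [('His4', 'biotin'), ('biotin', 'His4')]
```

Under these assumptions no single-terminus species can pass both steps, while the mixed-tag half of the four possible states of a two-terminus cross-link is retained; cross-links carrying two identical tags are lost.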

The enrichment methodology was then tested on a more complex system (Figure 7). Azurin, a copper-binding protein, was selected as a model protein due to the presence of a single natural internal disulfide cross-link and multiple methionine residues. A CNBr digest that cleaved at these residues produced the target fragment of 4756.29 Da, consisting of two peptides linked by a disulfide bond, a single peptide of 4922.51 Da and many other smaller fragments. As before, the tetrahistidine and biotin affinity tags were sequentially conjugated to the peptides. To add complexity, the CNBr-digested, affinity-conjugated azurin was spiked into a peptide mixture generated by CNBr chemical cleavage of reduced and alkylated proteins from a bacterial cell extract. Interestingly, the initial Ni-NTA IMAC purification copurified a small number of bacterial peptides in addition to the expected azurin peptides. However, following the streptavidin purification step only “interpeptide” cross-links from azurin remained. This demonstrates the usefulness of the two purification steps in this tandem purification methodology. In this experiment the different stages of the methodology were again monitored using MALDI mass spectrometry. If more complex mixtures of cross-linked CNBr fragments are used, they can first be separated by two-dimensional gel electrophoresis. A number of short peptides can then be generated from each cross-linked fragment by in-gel trypsinization and used for identification by MALDI MS/MS or ESI MS/MS. Fourier transform mass spectrometers and orbitrap mass spectrometers could identify the cross-linked fragments directly, without the need to separate and digest them first.

Experimental Section (published in our Analytical Chemistry 2009 paper)

Peptides and other reagents

Bovine serum albumin (BSA) and horse liver alcohol dehydrogenase (ADH) were purchased from Sigma-Aldrich (Oakville, ON, Canada). The N-terminally acetylated model peptide AcCAPQEGILEDMPVDPDNEAY was synthesized using an automated peptide synthesizer (Applied Biosystems, Foster City, CA, USA) and azurin protein was generously provided by Dr. Yi Lu (University of Illinois, Urbana, IL, USA). The tetrahistidine (His-His-His-His-CO-NH-CH2-CH2-NH2) and biotin (biotin-CO-NH-CH2-CH2-NH2) conjugation reagents were purchased from AnaSpec (San Jose, CA, USA) and Biotium (Hayward, CA, USA), respectively. All other chemicals were from Sigma-Aldrich.

Cyanogen bromide cleavage

Model peptide or protein (50 µg) was subjected to cyanogen bromide cleavage in a 100 µl reaction volume containing 100 mM CNBr (diluted from a 5 M CNBr stock in ACN) in 86% trifluoroacetic acid (TFA). The incubation proceeded in the dark at room temperature for 16 h. Following CNBr cleavage, all non-peptide reagents were removed by centrifugal vacuum concentration. Near-complete removal of CNBr and TFA required three cycles of water addition, brief mixing and speed vacuum concentration to a volume of 10 µl.
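As a rough guide to what this reaction produces, the sketch below performs an idealized in-silico CNBr digest and works out the stock dilution. It is a minimal illustration under stated assumptions: cleavage is taken to be complete at every methionine, the acetyl group of the model peptide is omitted from the sequence string, and the helper names `cnbr_fragments` and `stock_volume_ul` are hypothetical.

```python
def cnbr_fragments(sequence):
    """Split a one-letter sequence C-terminal to every methionine (M).

    Idealized: assumes complete cleavage at every Met. In the real
    reaction the Met ending each fragment is converted to homoserine
    lactone (HSL); here it is simply left as 'M' in the string.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue == "M":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Original C-terminal fragment: no Met, hence no HSL.
        fragments.append(sequence[start:])
    return fragments


def stock_volume_ul(final_mm, final_volume_ul, stock_mm):
    """Volume of stock required, from C1 * V1 = C2 * V2."""
    return final_mm * final_volume_ul / stock_mm


model_peptide = "CAPQEGILEDMPVDPDNEAY"  # N-terminal acetyl group omitted
print(cnbr_fragments(model_peptide))     # ['CAPQEGILEDM', 'PVDPDNEAY']
print(stock_volume_ul(100, 100, 5000))   # 2.0 (µl of 5 M stock per 100 µl)
```

Only the first fragment ends in a former methionine, so only it would present an HSL for tag conjugation.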

Conjugation of affinity tags to homoserine lactone moiety

CNBr-cleaved model peptide or protein (1-5 µg) was fully converted into its lactonized form by addition of 20 µl of 100% TFA and subsequent drying in a speed vacuum concentrator. Primary amine-containing tags were dissolved in anhydrous dimethyl sulfoxide (DMSO) and added to the dried lactonized pellet. To maximize the conjugation efficiency, multiple parameters of this reaction were optimized (as detailed in the Results section). The yield of conjugates was monitored by MALDI-TOF mass spectrometry. Following the conjugation of the tetrahistidine tag, a C18 reversed-phase ZipTip (Millipore, Bedford, MA, USA) clean-up was undertaken to remove non-reacted tetrahistidine reagent. The peptides were eluted from the ZipTip matrix with 50% isopropanol in 0.1% TFA and the eluate was evaporated to dryness in a speed vacuum concentrator. No clean-up of non-reacted labeling reagent was required following the second conjugation step with the biotin reagent.

IMAC purification

The tetrahistidine-tagged peptides were purified by immobilized metal affinity chromatography (IMAC), essentially following a previously described method [41]. Briefly, Gelloader® pipette tips (Eppendorf, Hamburg, Germany) were packed with 10-15 µl of 50% w/v nickel-nitrilotriacetic acid (Ni-NTA) agarose (Qiagen, Valencia, CA, USA). Following the conjugation step, the volume of the reaction mix containing an excess of non-conjugated tag reagent was initially reduced in a speed vacuum concentrator to 2 µl, then diluted with Ni-NTA Loading Buffer (50 mM Na2HPO4, 300 mM NaCl, pH 8.0) and loaded onto a pre-equilibrated Ni-NTA-packed Gelloader® tip. Following extensive washing with Ni-NTA Loading Buffer containing 2 mM imidazole, the slurry was briefly subjected to a pre-elution rinse with water, and peptides were eluted with 0.2% TFA in 50% acetonitrile. Following volume reduction by speed vacuum concentration, the pH of the eluate was neutralized by the addition of an excess of phosphate-buffered saline (PBS; 20 mM Na2HPO4, 150 mM NaCl).

Streptavidin Beads

Streptavidin agarose beads (Pierce, Rockford, IL, USA) were pre-equilibrated in PBS and added to the pH-adjusted eluate from the Ni-NTA purification step (15 µl of 50% slurry per sample). The capture of biotinylated peptides proceeded overnight at 4 °C with end-over-end rotation. Subsequently, the streptavidin beads were washed with 2 × 50 µl of 0.1% BSA in PBS; 2 × 50 µl of 0.1% BSA + 0.1% SDS in 1× PBS; 2 × 50 µl of 0.1% BSA + 1 M NaCl in 1× PBS; 2 × 50 µl of 20% methanol + 100 mM imidazole in 50 mM NH4HCO3; and 50 µl of water. Biotin-tagged peptides were eluted with 30% acetonitrile in 5% formic acid.

Mass Spectrometry

Analytes were mixed on the matrix-assisted laser desorption ionization (MALDI) target with an equal volume of 2,5-dihydroxybenzoic acid (DHB) matrix (100 mg/ml in 50% acetonitrile, 0.1% TFA) and droplets were air dried. Data were acquired on a QStarXL quadrupole time-of-flight (QqTOF) mass spectrometer (Applied Biosystems/MDS Sciex, Concord, ON, Canada) equipped with an orthogonal MALDI (oMALDI™) source and a nitrogen laser operating at 337 nm with a pulse frequency of 20 Hz. Mass spectra were collected under the control of the operating software Analyst® QS (Applied Biosystems/MDS Sciex) by averaging 300 laser shots.

Quaternary Ammonium One Step Enrichment of Cross-linked Peptides

While the use of two separate affinity tags to enrich cross-linked peptides by their C-termini is a significant step forward in the field, it still suffers from a few drawbacks. Because of nonspecific binding and less-than-quantitative recovery of peptides from affinity matrices, the multiple purification steps translate into large sample losses. These losses are exacerbated by the fact that a portion of the already substoichiometric “interpeptide” cross-links end up as tetrahistidine/biotin tag conjugation products that are unsuitable for purification by this method, such as cross-links carrying the same tag on both C-termini. A methodological adjustment that would resolve these issues is to label all C-termini with a single affinity tag and enrich them in a single step. However, isolating doubly tagged “interpeptide” cross-links from singly tagged peptides then requires the ability to resolve one tag from two. Preliminary experiments using tetrahistidine tags and biotin tags demonstrated that neither tag is suitable for this kind of enrichment on its own.
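The way these losses compound can be made concrete with a back-of-the-envelope model. The sketch below is not taken from the published work: it assumes that the two C-termini of a cross-link react independently, that each purification step has a fixed fractional recovery, and all numerical values are hypothetical and chosen only to show the trend. The function names are likewise illustrative.

```python
def hetero_tagged_fraction(f_his, f_biotin):
    """Fraction of cross-links carrying one His4 tag and one biotin tag.

    Assumption: each C-terminus takes the His4 tag with probability f_his
    in the first conjugation, and each C-terminus still untagged takes
    biotin with probability f_biotin in the second conjugation.
    """
    p_his = f_his
    p_biotin = (1.0 - f_his) * f_biotin
    return 2.0 * p_his * p_biotin


def overall_recovery(tagged_fraction, step_recoveries):
    """Usable fraction of cross-links after a series of purification steps."""
    recovery = tagged_fraction
    for step in step_recoveries:
        recovery *= step
    return recovery


# Hypothetical values: 70% recovery per affinity step.
two_tag = overall_recovery(hetero_tagged_fraction(0.5, 1.0), [0.7, 0.7])
# Single tag at 90% efficiency per C-terminus, one purification step.
one_tag = overall_recovery(0.9 ** 2, [0.7])

print(f"two tags, two steps: {two_tag:.3f}")
print(f"one tag, one step:   {one_tag:.3f}")
```

Under these assumptions no more than half of the cross-links can ever carry one tag of each kind, before any purification loss is taken into account, which is the motivation for a single-tag strategy.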

After a thorough review of the literature on chromatography techniques and types of affinity systems, it was decided that a charge-based separation would be suitable. Theoretically, a single positive charge could be separated from two positive charges using a negatively charged matrix, if not by tight association then at least by differential migration. Because the intrinsic charge of peptides varies, the permanently positively charged quaternary ammonium functional group was selected as a tag. The choice reflected the assumption that the intrinsic charge of the peptides would be negligible compared to the large positive charge of the quaternary ammonium. To support this, a weak polycarboxylate cation exchange resin was selected as the affinity matrix.

The methodology begins with the conjugation of a primary amine-functionalized quaternary ammonium to the HSL at the C-termini of CNBr-cleaved peptides (Figure 8). To remove the excess unreacted reagent, peptides were purified by ZipTip using 10% ACN to compete with the small reagent. The conjugated peptides are then separated on a column packed with pre-equilibrated NTA, a polycarboxylate weak cation exchange resin, and cross-linked peptides are selectively purified in the eluate. The methodology was applied to the model protein Azurin. Figure 9 demonstrates the highly efficient conjugation of the primary amine-functionalized quaternary ammonium to CNBr-cleaved Azurin peptides. Since the intrinsic charge of the peptides is influenced by the pH of the buffer, initial experiments were conducted at a neutral pH of 7. At this pH all peptides conjugated with a quaternary ammonium were purified while unlabeled peptides were lost in the flowthrough (Figure 10). Although this validates the affinity system, it does not allow for separation of one positive charge from two positive charges. The pH of the buffer was therefore raised to 8.5 to decrease the affinity of the peptides for the matrix. In these experiments only peptides with two C-termini, and thus two quaternary ammoniums, were enriched (Figure 11). These peptide pairs were not covalently cross-linked but were instead associated with each other through an intrinsic affinity between the peptides. Unfortunately, the larger disulfide-linked peptides were not retained by the column. So while this experiment demonstrates that doubly charged associated peptides can be separated from singly charged peptides, it can only do so with small fragments.
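One possible way to picture why raising the pH weakens binding is the Henderson-Hasselbalch relation: titratable amines on a peptide lose their positive charge as the pH rises, whereas a quaternary ammonium does not. The sketch below is an interpretation offered for illustration, not a mechanism stated in the source, and the pKa of 8.0 is an assumed, illustrative value for a peptide amine.

```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a titratable base that carries
    its proton (and hence a positive charge) at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ASSUMED_AMINE_PKA = 8.0   # illustrative value, not taken from the source
QUATERNARY_CHARGE = 1.0   # permanent charge, independent of pH

for ph in (7.0, 8.5):
    amine = fraction_protonated(ph, ASSUMED_AMINE_PKA)
    print(f"pH {ph}: amine +{amine:.2f}, quaternary ammonium +{QUATERNARY_CHARGE:.2f}")
```

With the assumed pKa, the average amine charge falls from about +0.91 at pH 7 to about +0.24 at pH 8.5, so at the higher pH the permanently charged tags would account for a larger share of the positive charge that holds a peptide on the resin.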

Benzylamine One Step Enrichment of Cross-linked Peptides

The partial success of the quaternary ammonium single-tag method suggested that the approach is feasible but that the appropriate selection of the tag and affinity system is critical. Due to the widespread use of reverse phase chromatography and the opportunity for automated separation by high pressure liquid chromatography (HPLC), an enrichment strategy based on hydrophobicity was selected. The choice of affinity tag was largely guided by the opportunity to simultaneously increase conjugation efficiency. Benzylamine was selected because the benzyl group is highly hydrophobic but at the same time has sufficient electron density to donate to the attached primary amine group, increasing its nucleophilicity.

Due to the variable intrinsic hydrophobicity of peptides, the methodology begins with the fractionation of CNBr-cleaved peptides on a reverse phase column using an HPLC-generated ACN gradient. Fractions are collected across the gradient, each consisting of peptides with similar hydrophobicities. Each fraction is then conjugated with benzylamine through the HSL at the C-termini of the peptides within the fraction. Two benzylamines are incorporated into “interpeptide” cross-links while a single benzylamine is incorporated into the other peptides. A single conjugated fraction is then resolved again on a reverse phase column with an HPLC-generated ACN gradient. Because two benzylamines confer greater hydrophobicity than one, the cross-linked peptides are retained on the column longer and elute later than the other peptides, which carry a single benzylamine. Thus the few “interpeptide” cross-links can be selectively and easily isolated from other irrelevant peptides. Various complex samples were generated to test the method, and preliminary evidence indicates that the addition of the benzylamine tags produces a significant shift in the retention of peptides on a reverse phase column: a peptide with two tags elutes later than a peptide with one tag, which in turn elutes later than a peptide with no tags. However, further optimization of the HPLC gradient is necessary before the methodology is ready for application. If optimization is not sufficient to resolve one benzylamine tag from two, there exist many polyaromatic compounds with higher hydrophobicities that could be used instead. When completed, this method promises to revolutionize the field of proteomics.
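The reason the first fractionation step matters can be seen in a toy model of the two-dimensional separation. This is a sketch under strong assumptions rather than a description of measured chromatographic behavior: retention is treated as additive, every benzylamine tag is assumed to add the same shift, units are arbitrary (for example, % ACN at elution), and the function names are hypothetical.

```python
def retention(intrinsic, n_tags, shift_per_tag):
    """Toy additive model of reverse phase retention (arbitrary units)."""
    return intrinsic + n_tags * shift_per_tag


def tags_resolved(fraction_width, shift_per_tag):
    """True if every doubly tagged peptide elutes after every singly
    tagged peptide from the same first-dimension fraction.

    Peptides within one fraction are assumed to have intrinsic retention
    between 0 and fraction_width, measured from the start of the fraction.
    """
    latest_single = retention(fraction_width, 1, shift_per_tag)
    earliest_double = retention(0.0, 2, shift_per_tag)
    return earliest_double > latest_single


print(tags_resolved(fraction_width=2.0, shift_per_tag=3.0))  # True
print(tags_resolved(fraction_width=5.0, shift_per_tag=3.0))  # False
```

In this model the doubly tagged peptides are cleanly separated only when the shift contributed by one tag exceeds the spread of intrinsic retention within the fraction, which is why narrow first-dimension fractions or a more hydrophobic tag would both help.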

References

1. Alberts, B. The cell as a collection of protein machines -- preparing the next generation of molecular biologists. Cell 92, 291-294 (1998).

2. Sali, A. & Kuriyan, J. Challenges at the frontiers of structural biology. Trends Biochem. Sci. 24, M20-M24 (1999).

3. Aloy, P. and Russell, R.B. (2002) The third dimension for protein interactions and complexes. Trends Biochem Sci 12, 633-638.

4. Mewes, H. W. et al. MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 30, 31-34 (2002).

5. Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature 422, 198-207.

6. Abola, E., Kuhn, P., Earnest, T. & Stevens, R. C. Automation of X-ray crystallography. Nature Struct. Biol. 7, 973-977 (2000).

7. Sali, A., Glaeser, R., Earnest, T. and Baumeister, W. (2003) From words to literature in structural proteomics. Nature 422, 216-225.

8. Govindarajan, S., Recabarren, R. & Goldstein, R. A. Estimating the total number of protein folds. Proteins 35, 408-414 (1999).

9. Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dumpelfeld, B., Edelmann, A., Heurtier, M.A., Hoffman, V., Hoefert, C., Klein, K., Hudak, M., Michon, A.M., Schelder, M., Schirle, M., Remor, M., Rudi, T., Hooper, S., Bauer, A., Bouwmeester, T., Casari, G., Drewes, G., Neubauer, G., Rick, J.M., Kuster, B., Bork, P., Russell, R.B. and Superti-Furga, G. (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631-636.

10. Schmitt-Ulms, G., Legname, G., Baldwin, M.A., Ball, H.L., Bradon, N., Bosque, P.J., Crossin, K.L., Edelman, G.M., DeArmond, S.J., Cohen, F.E. and Prusiner, S.B. (2001) Binding of neural cell adhesion molecules (N-CAMs) to the cellular prion protein. J Mol Biol 314, 1209-1225.

11. Ito, T. et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA 98, 4569-4574 (2001).

12. Fields, S. and Song, O. (1989) A novel genetic system to detect protein-protein interactions. Nature 340, 245-246.

13. Zal, T. and Gascoigne, N.R. (2004) Using live FRET imaging to reveal early protein-protein interactions during T cell activation. Curr Opin Immunol 16, 674-683.

14. Ooi, S.L., Shoemaker, D.D. and Boeke, J.D. (2003) DNA helicase gene interaction network defined using synthetic lethality analyzed by microarray. Nat Genet 35, 277-286.

15. Tong, A.H., Evangelista, M., Parsons, A.B., Xu, H., Bader, G.D., Page, N., Robinson, M., Raghibizadeh, S., Hogue, C.W., Bussey, H., Andrews, B., Tyers, M. and Boone, C. (2001) Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364-2368.

16. Lockhart, D. J. & Winzeler, E. A. Genomics, gene expression and DNA arrays. Nature 405, 827-836 (2000).

17. Gavin, A.C., Bosche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J.M., Michon, A.M., Cruciat, C., Remor, M., Hofert, C., Schelder, M., Brajenovic, M., Ruffner, H., Merino, A., Klein, K., Hudak, M., Dickson, D., Rudi, T., Gnau, V., Bauch, A., Bastuck, S., Huhse, B., Leutwein, C., Heurtier, M.A., Copley, R.R., Edelmann, A., Querfurth, E., Rybin, V., Drewes, G., Raida, M., Bouwmeester, T., Bork, P., Seraphin, B., Kuester, B., Neubauer, G. and Superti-Furga, G. (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141-147.

18. Hernandez, H., Dziembowski, A., Taverner, T., Seraphin, B. and Robinson, C.V. (2006) Subunit architecture of multimeric complexes isolated directly from cells. Embo J 7, 605-610.

19. Melcher, K. (2004) New chemical crosslinking methods for the identification of transient protein-protein interactions with multiprotein complexes. Curr Protein Pept Sci 4, 287-296.

20. Sinz, A. (2006) Chemical cross-linkers and fourier transform ion cyclotron resonance mass spectrometry for structural analysis of a protein/peptide complex. J Amer Soc Mass Spec 17, 1100-1113.

21. Back, J.W., de Jong, L., Muijsers, A.O. and de Koster, C.G. (2003) Chemical cross-linking and mass spectrometry for protein structural modeling. J Mol Biol 331, 303-313.

22. Schmitt-Ulms, G., Hansen, K., Liu, J., Cowdrey, C., Yang, J., DeArmond, S., Cohen, F.E., Prusiner, S.B. and Baldwin, M.A. (2004) Time-controlled transcardiac perfusion cross-linking for the study of protein interactions in complex tissues. Nat Biotechnol 22, 724-731.

23. Alley SC, Ishmael FT, Jones AD, Benkovic SJ. Mapping protein-protein interactions in the bacteriophage T4 DNA polymerase holoenzyme using a novel trifunctional photo-cross-linking and affinity reagent. J. Am. Chem. Soc. 2000; 122: 6126.

24. Kosower EM, Kosower NS. Bromobimane probes for thiols. Methods Enzymol. 1995; 251: 133.

25. Muller DR, Schindler P, Towbin H, Wirth U, Voshol H, Hoving S, Steinmetz MO. Isotope-tagged cross-linking reagents. A new tool in mass spectrometric protein interaction analysis. Anal. Chem. 2001; 73: 1927.

26. Back JW, Artal Sanz M, de Jong L, de Koning LJ, Nijtmans LGJ, de Koster CG, Grivell LA, van der Spek H, Muijsers AO. A structure for the yeast prohibitin complex: structure prediction and evidence from chemical crosslinking and mass spectrometry. Protein Sci. 2002; 11: 2471.

27. Bennett KL, Kussmann M, Bjork P, Godzwon M, Mikkelsen M, Sørensen P, Roepstorff P. Chemical cross-linking with thiol-cleavable reagents combined with differential mass spectrometric peptide mapping—A novel approach to assess intermolecular protein contacts. Protein Sci. 2000; 9: 1503.

28. Back JW, Notenboom V, de Koning LJ, Muijsers AO, Sixma TK, de Koster CG, de Jong L. Identification of cross-linked peptides for protein interaction studies using mass spectrometry and 18O labeling. Anal. Chem. 2002; 74: 4417.

Methods and select figures taken from our publication: Shi et al. Method for the Affinity Purification of Covalently Linked Peptides Following Cyanogen Bromide Cleavage of Proteins. Analytical Chemistry. 2009; 81: 9885-9895.