Heterologous Protein Expression in E. coli

Heterologous protein expression means that a protein is produced in a cell that does not normally make it. The gene coding for the protein is transferred into the host cell by transformation, transfection or transduction, in a form that favours its expression. The host cells are referred to as heterologous expression systems, and the product is termed a recombinant protein. Recombinant proteins are used in immunization, in therapeutic and biotechnological applications, and in biochemical studies. Many heterologous expression systems are available: Escherichia coli; yeasts (Saccharomyces cerevisiae, Pichia pastoris etc.); Dictyostelium discoideum (which has circular nuclear plasmids packaged in a nucleosomal structure); Xenopus oocytes (from the South African clawed toad); insect cells (infected with recombinant baculoviruses); and mammalian cell lines such as COS cells, an engineered line of African green monkey kidney cells (COS stands for CV-1 origin, SV40). Of these, E. coli is a convenient system because it offers high levels of heterologous gene expression, scalable experiments, low cost, fast growth and the ability to express labeled proteins (Tolia and Joshua-Tor, 2006), together with a very wide range of vectors and genetic resources, including promoters and regulatory systems. The table below compares bacteria (E. coli), yeast, baculovirus and mammalian cell culture with respect to cell growth rate, method complexity, cost, expression level and post-translational modifications:

Figure 1- http://homepage.agron.ntu.edu.tw

"In general, the expressed protein accumulates either in the cytoplasmic or periplasmic space i.e. intracellular and or can be expressed extracellular. The cytoplasm is the first choice for the heterologous protein production because higher yields can be obtained" (Kay Terpe, 2006). E. coli has a remarkable capacity to produce large quantities of protein, but recombinant proteins are often obtained in low amounts. Problems can arise at two main stages: first, cloning of the gene into the expression vector, and second, expression of the recombinant gene in the host. This has led to a variety of strategies for achieving high-level expression at low cost. For the cloning stage these include the design of the expression vector, promoter strength (transcriptional regulation) and translation initiation and termination (translational regulation). For the expression stage they include gene dosage, mRNA stability, correct protein folding, host design, codon usage, toxicity of the protein and the fermentation factors that can be used to manipulate expression conditions.

DNA sequences involved in transcription

Transcription of a gene involves three different DNA sequences and one multicomponent protein: the promoter, the transcriptional terminator, the regulatory sequence and RNA polymerase. RNA polymerase consists of five different subunits, termed α, β, β', ω and σ. While α2ββ'ω constitutes the core enzyme, σ confers promoter specificity. The N-terminal domain of α dimerizes and binds to β and β', while its C-terminal domain, tethered to the N-terminal domain through a flexible linker, is responsible for interaction with some promoters and transcriptional activators. The β subunit is the target of the antibiotic rifampicin, and β' allows nonspecific binding to DNA. The role of ω is not well established, but it is assumed to play a part in RNA polymerase assembly (Gruber and Gross, 2003).

A promoter consists of three regions: the -35 box, the -10 box and the spacer region separating the two boxes. Aligning many promoters allows a so-called consensus sequence to be deduced. For σ70 the consensus is TTGACA-N17-TATAAT, the optimal promoter sequence with a spacer of 17 nucleotides. The main function of the promoter is to direct the initiation of transcription (Schumann and Ferreira, 2004).

Several promoters are used for heterologous protein expression in E. coli. Plac is negatively regulated by LacI, so sufficient levels of repressor are needed (supplied by the lacIq or lacIq1 alleles on the vector); the PlacUV5 variant is very popular because its regulation does not depend on CAP. Ptrp is negatively regulated by TrpR. Vectors containing this promoter can be transformed into any strain and are easily induced by tryptophan starvation, but they are not suitable for expressing proteins with a high Trp content. The hybrid promoters Ptac and Ptrc are induced by IPTG and are considerably stronger than either Plac or Ptrp. PBAD is induced by arabinose (Invitrogen). The T7 system uses T7 promoters, which require T7 RNA polymerase. T7 RNA polymerase (encoded by T7 gene 1) has stringent specificity for its own promoters, initiates and elongates chains about five times faster than E. coli RNA polymerase and, unlike the E. coli enzyme, is resistant to rifampicin. The pET vectors (plasmid for expression by T7 RNA polymerase) are commercially available from Novagen.
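As a rough illustration of how a candidate promoter might be compared with the σ70 consensus TTGACA-N17-TATAAT given above, the following Python sketch scans a DNA sequence for the best-matching pair of hexamers. The function names, the allowed spacer range of 15 to 19 nucleotides and the plain mismatch count are assumptions made for this example; they are not taken from the cited sources, and real promoter strength depends on more than hexamer identity.

```python
# Consensus hexamers for the sigma-70 promoter, as quoted in the text.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_sigma70_match(seq, spacers=range(15, 20)):
    """Return (mismatch_count, start_of_minus35_box, spacer_length).

    Every position and every spacer length in `spacers` is tried
    (hypothetical range, chosen for illustration). Returns None if the
    sequence is too short to hold both boxes.
    """
    seq = seq.upper()
    best = None
    for spacer in spacers:
        span = 6 + spacer + 6  # -35 box + spacer + -10 box
        for i in range(len(seq) - span + 1):
            box35 = seq[i:i + 6]
            box10 = seq[i + 6 + spacer:i + span]
            score = mismatches(box35, MINUS_35) + mismatches(box10, MINUS_10)
            candidate = (score, i, spacer)
            if best is None or candidate < best:
                best = candidate
    return best
```

A score of 0 with a spacer of 17 corresponds to the consensus itself; higher scores indicate hexamers that deviate from it.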

A transcriptional terminator is required to end transcription. There are two classes of terminators, factor-independent and factor-dependent (Schumann and Ferreira, 2004). The first class consists of an inverted repeat followed by several A residues on the template DNA strand. Once RNA polymerase has transcribed the inverted repeat, the mRNA immediately folds into a stem-loop structure that causes the enzyme to pause. Because the stem-loop is followed by several U residues, which pair only weakly with the A residues on the template DNA, the enzyme dissociates. No terminator, however, causes every RNA polymerase molecule to dissociate, so some read-through transcription into the neighbouring gene(s) occurs. To reduce this read-through, two different transcriptional terminators are often placed in tandem on expression vectors. The tandem terminators T1 and T2, derived from the rrnB rRNA operon of E. coli, are particularly effective (Brosius et al., 1981). Terminators of the second class depend on the Rho factor.
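The structure of a factor-independent terminator described above (an inverted repeat followed by a run of U residues in the transcript) can be sketched as a simple pattern search. This is only an illustrative toy: the function names, the stem length of 5, the loop sizes of 3 to 8 and the run of 4 U residues are all assumptions for the example, and only perfectly paired stems are found. Real terminator prediction also weighs hairpin stability.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string containing only A, C, G and U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_hairpin_terminators(seq, stem=5, loops=range(3, 9), u_run=4):
    """Return (stem_start, loop_length) for each perfect stem-loop that is
    immediately followed by `u_run` U residues.

    `seq` may be given as DNA (coding strand) or RNA; T is read as U.
    All parameter defaults are hypothetical values for illustration.
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - stem + 1):
        left = rna[i:i + stem]
        partner = reverse_complement(left)  # sequence able to pair with `left`
        for loop in loops:
            j = i + stem + loop             # start of the right arm
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + u_run]
            if right == partner and tail == "U" * u_run:
                hits.append((i, loop))
    return hits
```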

DNA sequences involved in translation

The translation initiation region comprises four different sequences: (1) the Shine-Dalgarno sequence, (2) the start codon, (3) the spacer region between the Shine-Dalgarno sequence and the start codon, and (4) sometimes translational enhancers (Schumann and Ferreira, 2004).

The Shine-Dalgarno sequence is part of the ribosome-binding site (RBS) and pairs with the complementary 3' end of the 16S rRNA during translation initiation (Shine and Dalgarno, 1974). In E. coli the initiation codons are AUG, GUG and UUG. The spacing between the Shine-Dalgarno sequence and the initiation codon varies from 5 to 13 nucleotides and influences the efficiency of translation (Gold, 1988). Secondary structure in the translation initiation region of the mRNA also plays an important role in the efficiency of gene expression (Ramesh et al., 1994): mutating specific nucleotides upstream or downstream of the Shine-Dalgarno sequence can suppress the formation of mRNA secondary structures and enhance translation efficiency (Coleman et al., 1985; Gross et al., 1990).
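The spacing rule described above can be checked with a short script. The sketch below looks for a Shine-Dalgarno motif and reports any AUG, GUG or UUG codon lying 5 to 13 nucleotides downstream of it. The motif AGGAGG is the commonly quoted E. coli consensus and is an assumption here, since the text does not give a sequence; the function name and return format are likewise hypothetical.

```python
START_CODONS = ("AUG", "GUG", "UUG")


def find_initiation_sites(mrna, sd="AGGAGG", min_gap=5, max_gap=13):
    """Return (sd_start, gap, start_codon) for each start codon found
    `min_gap` to `max_gap` nucleotides after an exact copy of `sd`.

    `sd` defaults to an assumed consensus motif; T is read as U so that
    DNA input also works.
    """
    mrna = mrna.upper().replace("T", "U")
    sites = []
    pos = mrna.find(sd)
    while pos != -1:
        sd_end = pos + len(sd)
        for gap in range(min_gap, max_gap + 1):
            codon = mrna[sd_end + gap:sd_end + gap + 3]
            if codon in START_CODONS:
                sites.append((pos, gap, codon))
        pos = mrna.find(sd, pos + 1)
    return sites
```

Only exact matches to the motif are found; natural ribosome-binding sites often match the consensus only partially, so a real search would need to tolerate mismatches.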

ATP-dependent proteases

ATP-dependent proteases recognize heterologous proteins in the cytoplasm and degrade them. Degradation proceeds in three steps. First, the protease recognizes the protein through an internal degradation signal. Second, ATP hydrolysis drives unfolding of the protein and its translocation into the proteolytic chamber. Finally, the protein is hydrolysed. Five different ATP-dependent proteases have been identified in E. coli: Lon, ClpAP, ClpXP, ClpYQ and FtsH (Schumann and Ferreira, 2004).

Codon Usage

The codon usage of E. coli often differs from that of the gene encoding the recombinant protein. Most amino acids are encoded by more than one codon, and each organism has its own bias in codon usage. In each cell, the tRNA population closely reflects the codon bias of the mRNA population (Dong et al., 1996). When the mRNA of a recombinant gene contains codons that are rare in E. coli, the matching tRNAs can become limiting, which may lead to translational stalling, premature termination of translation, translational frameshifting and amino acid misincorporation (Kurland and Gallant, 1996). Expression may be improved by supplying the limiting amino acid in the culture medium (Kane, 1995). To improve the expression of heterologous proteins containing rare codons, a number of engineered E. coli strains have been developed; these are listed in table 1.

For example, the codon usage for arginine in four different species is presented in table 3. The codons AGA and AGG are rare in E. coli but frequently used in Saccharomyces cerevisiae and Homo sapiens. Overexpression of genes with a high content of rare arginine codons may therefore result in defective synthesis of the corresponding protein. Besides their number, the location of rare codons within the coding region can significantly influence the level of translation. Chen and Inouye (1990) demonstrated that the closer AGG codons were to the initiation codon, the stronger their effect on protein synthesis. Single AGG codons and, particularly, tandems of two to five AGG codons had stronger effects when placed closer to the translation start, because rare codons close to the initiation codon may stall the ribosome and prevent new ribosomes from loading (Chen and Inouye, 1994).
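A coding sequence can be screened for the rare arginine codons discussed above with a few lines of Python. The sketch below lists every in-frame AGA or AGG codon, flags those near the start of the gene and marks tandem pairs. The function names and the window of 25 codons are assumptions chosen for illustration, not thresholds established in the text.

```python
RARE_ARG = {"AGA", "AGG"}


def rare_arginine_codons(cds):
    """Return (codon_number, codon) for each in-frame AGA/AGG codon.

    Codon 1 is the start codon; the sequence is read in frame from its
    first base, and any trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [(n, c) for n, c in enumerate(codons, start=1) if c in RARE_ARG]


def summarize_rare_arginine(cds, window=25):
    """Summarize rare Arg codons: total count, those within the first
    `window` codons (hypothetical cut-off), and positions that begin a
    tandem pair of rare codons."""
    positions = [n for n, _ in rare_arginine_codons(cds)]
    return {
        "total": len(positions),
        "near_start": [n for n in positions if n <= window],
        "tandem_starts": [n for n in positions if n + 1 in positions],
    }
```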

Codon usage can be optimized by in vitro gene synthesis, which removes the rare codons altogether; site-directed mutagenesis can be used instead if the gene contains only a few rare codons.
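The substitution itself is easy to sketch in code. The example below replaces each in-frame AGA or AGG codon with a synonymous arginine codon and reports how many changes were needed, which gives a rough sense of whether a handful of point mutations would be enough. The choice of CGT as the replacement is an assumption (it is a frequently used arginine codon in E. coli), and the function name is hypothetical.

```python
def replace_rare_arginine(cds, preferred="CGT"):
    """Return (new_sequence, number_of_codons_changed).

    Each in-frame AGA or AGG codon is swapped for `preferred`, an assumed
    replacement codon that also encodes arginine, so the protein sequence
    is unchanged. Out-of-frame occurrences are left untouched.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    changed = sum(1 for c in codons if c in ("AGA", "AGG"))
    new_cds = "".join(preferred if c in ("AGA", "AGG") else c for c in codons)
    return new_cds, changed
```

A full optimization would consider every amino acid and the secondary structure of the resulting mRNA; this sketch handles only the two arginine codons named in the text.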

Gene Dosage

A high plasmid copy number leads to rapid intracellular production of the protein and can result in product aggregation, whereas a low copy number leads to slower, more sustained production of soluble and active protein. If promoter strength, mRNA stability and the efficiency of translation initiation are favourable, a low gene dosage will give a high yield provided that the production period is sustained.

mRNA Stability

mRNA stability can affect expression rates. In E. coli, two exonucleases have been identified, RNase II (rnb) and polynucleotide phosphorylase (pnp), both of which attack mRNA molecules at their 3' end. No exonuclease attacking from the 5' end has been identified. The 3' → 5' degradation of transcripts by either exonuclease can be delayed by secondary structures at or near the 3' end. Secondary structures at the 5' end that sequester the Shine-Dalgarno sequence and/or the start codon within a double-stranded stem significantly reduce translation of the transcript, since it is barely recognized by the 30S ribosomal subunit. Some of these stem-loop structures can act as stabilizers when fused to heterologous mRNAs (Schumann and Ferreira, 2004). Vectors ensuring translational coupling of recombinant genes have also been developed (Tarragona et al., 1992; Birikh et al., 1995).

Inclusion bodies and how to prevent their formation

Rapid production of recombinant proteins can lead to the formation of insoluble aggregates known as inclusion bodies (Betts and King, 1999). These are large, spherical particles that are clearly separated from the cytoplasm and result from the failure of the quality control system to repair or remove misfolded or unfolded proteins (Schumann and Ferreira, 2004). The aggregates do not consist of pure recombinant polypeptide chains: they contain impurities such as host proteins, ribosomal components and circular and nicked forms of plasmid DNA, and may contain heat shock proteins. To avoid inclusion body formation, the heterologous protein should be produced slowly, which can be achieved with low-copy-number vectors, weak promoters, low growth temperature, a solubilizing fusion partner and fermentation at extreme pH values.


A lower level of protein synthesis, from a weaker promoter or from a strong promoter under partial induction, has been found to give more soluble protein with greater specific activity (Hockney, 1994). Growth at low temperature promotes correct folding because slower folding kinetics minimize self-association of the polypeptide chains. Growing the cells in relatively high concentrations of polyols or sucrose also suppresses the formation of inclusion bodies in the periplasmic space. These compounds influence the physicochemical processes that lead to protein-protein association. They do not affect cytoplasmic proteins, because they do not permeate the cell membrane. For example, cells grown in the presence of sorbitol at 25 °C have been shown to produce 400-fold higher levels of recombinant protein than control cultures (Blackwell and Horgan, 1991).

Sometimes it is desirable to obtain heterologous proteins, such as a derivative of tissue plasminogen activator or human growth hormone, as inclusion bodies. In these proteins the hydrophobic sequences are not properly protected by chaperonins, and intermolecular interactions produce stable aggregates. Recovery involves cell lysis and centrifugation, followed by solubilization in a buffer containing a denaturant such as urea and then in vitro refolding, aided by low-molecular-weight folding enhancers such as guanidinium chloride, urea or polyethylene glycol.

Host design consideration

The complete genome sequence of E. coli is known, so the host can be modified to stabilize the protein product, to make metabolism more efficient and to improve protein folding. For example, Chou, Bennett and San (1994) decreased specific glucose uptake by inactivating the ptsG gene, which encodes the glucose-specific enzyme II of the phosphotransferase system (PTS). Other enzyme IIs could still mediate glucose transport, although at a lower rate. In this way glycolytic flux was decreased, reducing the acetate spillage caused by accumulation of acetyl-CoA.

Fermentation Factors

Optimizing the culture medium (such as LB or 2YT) and the culture conditions is essential for the expression of heterologous proteins in E. coli. Several conditions contribute to optimal protein production. Proper aeration should be maintained; a common guide is the 20% rule, under which the volume of culture medium should not exceed 20% of the maximum capacity of the flask. Induction at the proper growth phase also affects protein expression, and mid-log phase cultures are most often used. The concentration of inducer should be determined empirically. Temperature also affects expression: cultures are typically grown at 37 °C before induction and at 23-30 °C afterwards.
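The 20% rule is simple arithmetic, and a small helper makes it easy to check a planned culture against the flask on hand. The function names below are hypothetical, and the rule is applied exactly as stated in the text, with the fraction left adjustable.

```python
def max_culture_volume_ml(flask_capacity_ml, fraction=0.20):
    """Largest culture volume (ml) permitted for a flask of the given
    maximum capacity, under the 20% rule described in the text."""
    return flask_capacity_ml * fraction


def obeys_aeration_rule(culture_ml, flask_capacity_ml, fraction=0.20):
    """True if the culture volume does not exceed the permitted fraction
    of the flask capacity."""
    return culture_ml <= max_culture_volume_ml(flask_capacity_ml, fraction)
```

For instance, a 250 ml flask would take at most about 50 ml of culture, and a 2 l flask about 400 ml.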

Extracellular Protein Expression

It is very difficult to selectively release heterologous proteins from E. coli into the surrounding medium. This was illustrated in an experiment by Wan and Baneyx (1998), in which the third topological domain of TolA was secreted into the periplasm. Upon induction, however, most of the periplasmic proteins were released and the culture suffered a loss in viability of three orders of magnitude.

Other factors affecting the expression of a recombinant protein in the host cell are the affinity tag used, the size of the heterologous protein and the source of the heterologous protein.

Affinity Tag used

Affinity tags allow a variety of proteins to be purified using simple procedures. The two most commonly used tags are glutathione S-transferase (GST tag) and six histidine residues ((His)6 tag). Small affinity tags such as the His-tag do not interfere with protein expression. Tags of bacterial origin are generally expressed at higher levels than tags from non-bacterial sources.

Size of heterologous protein

The size of the heterologous protein plays an important role in its expression and stability. Proteins smaller than 100 kDa are generally well tolerated in E. coli and are expressed at significant levels, whereas proteins larger than 100 kDa tend to be expressed poorly and are degraded.
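Before cloning, the size of a protein can be estimated from its length to see on which side of the 100 kDa mark it falls. The sketch below uses the common rule of thumb of roughly 110 Da per amino acid residue; that average, like the function names, is an assumption for illustration and gives only an approximate mass.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def approx_mass_kda(n_residues):
    """Approximate protein mass in kDa from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def below_size_limit(n_residues, limit_kda=100.0):
    """True if the estimated mass is under the limit quoted in the text."""
    return approx_mass_kda(n_residues) < limit_kda
```

By this estimate a protein of 300 residues comes to about 33 kDa, while one of 1000 residues comes to about 110 kDa and would exceed the limit.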

Cell-free protein synthesis

Cell-free protein synthesis offers many potential advantages: it allows the synthesis of proteins that are toxic to cell division, it allows most of the metabolic resources to be focused on product synthesis, and it provides great flexibility in manipulating protein synthesis and folding. These factors give large protein yields, but they also increase the production cost (Swartz, 2001).

Design of an optimal expression system for E. coli

Based on the information above, we can propose the design of an optimal expression system for E. coli. It should be composed of DNA elements that direct efficient transcription and powerful translation and that stabilize the transcript. It should yield authentic recombinant protein, free of truncated or extended versions, that stays soluble and accumulates to about 20% of the total cellular protein. Suitable promoters, such as Plac, Ptrp or a combination of both, should be used together with an expression vector such as one of the pET series and an appropriate strain of E. coli, and the protein can be tagged with a His-tag. Read-through transcription into neighbouring genes is prevented by two strong factor-independent transcriptional terminators arranged in tandem. The transcript itself is stabilized by inverted repeats at both ends, which form stem-loop structures that impair endonuclease attack at the 5' end and exonucleolytic degradation from the 3' end without impairing translation. Last but not least, efficient translation is assured by a strong Shine-Dalgarno sequence, an AUG start codon located about 8 nucleotides downstream and the extended stop codon UAAU. Folding of the nascent polypeptide chains is aided by coexpression of folding chaperones. It has to be stressed, however, that no expression system is optimal for all recombinant proteins. Each protein poses a new problem, and high-level synthesis has to be optimized in each case by empirical variation of the different parameters (Schumann and Ferreira, 2004).


Conclusion

Many organisms and expression systems are used for the production of recombinant proteins, but E. coli remains the first choice. Owing to its long history of use in protein production, its genome sequence is known and can easily be altered as needed. The variety of promoters, expression vectors, tags and strains makes it suitable for expressing a wide range of recombinant proteins, and this, combined with rapid growth and high production rates, makes it a powerful and versatile expression system. New approaches such as cell-free protein expression open up further opportunities to exploit this simple and highly productive organism.


References

Tolia NH and Joshua-Tor L (2006) Nature Methods. Nature Publishing Group. http://www.nature.com/naturemethods

Terpe K (2006) Overview of bacterial expression systems for heterologous protein production: From molecular and biochemical fundamentals to commercial systems. Appl Microbiol Biotechnol 72:211-222.

Gruber TM and Gross CA (2003) Multiple sigma subunits and the partitioning of bacterial transcription space. Annu Rev Microbiol 57:441-466.

Schumann W and Ferreira LCS (2004) Production of recombinant proteins in Escherichia coli. Genet Mol Biol 27(3).

Brosius J, Ullrich A, Raker MA, Gray A, Dull TJ, Gutell RG and Noller HF (1981) Construction and fine mapping of recombinant plasmids containing the rrnB ribosomal RNA operon of E. coli. Plasmid 6:112-118.

Shine J and Dalgarno L (1974) The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: Complementarity to nonsense triplets and ribosome binding sites. Proc Natl Acad Sci USA 71:1342-1346.

Gold L (1988) Posttranscriptional regulatory mechanisms in Escherichia coli. Annu Rev Biochem 57:199-233.

Ramesh V, De A and Nagaraja V (1994) Engineering hyperexpression of bacteriophage Mu C protein by removal of secondary structure at the translation initiation region. Protein Engin 7:1053-1057.

Coleman J, Inouye M and Nakamura K (1985) Mutations upstream of the ribosome-binding site affect translation efficiency. J Mol Biol 181:139-143.

Dong H, Nilsson L and Kurland CG (1995) Gratuitous overexpression of genes in Escherichia coli leads to growth inhibition and ribosome destruction. J Bacteriol 177:1497-1504.

Kane JF (1995) Effects of rare codon clusters on high level expression of heterologous protein in Escherichia coli. Curr Opin Biotechnol 6:494-500.

Chen G-FT and Inouye M (1990) Suppression of the negative effect of minor arginine codons on gene expression: Preferential usage of minor codons within the first 25 codons of the Escherichia coli genes. Nucleic Acids Res 18:1465-1473.

Chen G-FT and Inouye M (1994) Role of the AGA/AGG codons, the rarest codons in global gene expression in Escherichia coli. Genes Dev 8:2641-2652.

Betts S and King J (1999) There's a right way and a wrong way: In vivo and in vitro folding, misfolding and subunit assembly of the P22 tailspike. Structure 7:R131-R139.

Tarragona-Fiol A, Taylorson CJ, Ward JM and Rabin BR (1992) Production of mature bovine pancreatic ribonuclease in Escherichia coli. Gene 118:239-245.

Hockney RC (1994) Recent developments in heterologous protein production in Escherichia coli. Trends Biotechnol 12:456-463.

Blackwell JR and Horgan R (1991) A novel strategy for production of a highly expressed recombinant protein in an active form. FEBS Lett 295:10-12.

Wan EW and Baneyx F (1998) TolAIII co-overexpression facilitates the recovery of periplasmic recombinant proteins into the growth medium of Escherichia coli. Protein Express Purif 14:13-22.

Chou C-H, Bennett GN and San K-Y (1994) Effect of modified glucose uptake using genetic engineering techniques on high-level recombinant protein production in Escherichia coli dense cultures. Biotechnol Bioeng 44:952-960.

Swartz JR (2001) Advances in Escherichia coli production of therapeutic proteins. Curr Opin Biotechnol 12:195-201.