The general features of HPV and cervical cancer


Human papillomaviruses (HPVs) are responsible for more than 50% of cervical cancers in women. Every year, around 50 000 new cases are diagnosed (1). This explains the strong interest in understanding the disease and in developing an effective treatment.

To find an effective drug against this infection, we need a better understanding of the viral life cycle and of the interactions between the viral proteins and the host cell. In this essay, I will focus on the interaction between two kinds of proteins: the host cell protein, TopBP1 (topoisomerase IIβ-binding protein 1) and two viral proteins, E1 and E2.

Previous research shows that the interactions between E1 and E2 (32) are involved in three important mechanisms of the viral life cycle: replication and transcription of the viral genome, and segregation of the episomes during mitosis (32, 33 and 13). Several experiments (GST-tag pull-down assays, yeast two-hybrid assays, etc.) provide evidence of in vitro and in vivo interactions between the E1-E2 complex and the host cell protein TopBP1 (18). TopBP1 is a highly regulated protein involved in many mechanisms in eukaryotic cells, such as cellular replication.

This leads to the following hypothesis: HPV replication escapes cell cycle control by interacting directly with TopBP1, which in healthy mammalian cells is strictly controlled by S-phase kinases. Through this direct interaction, the HPV genome could be duplicated without any host cell control (18).

Previous research found that the interactions between TopBP1 and the two viral proteins are required for efficient viral DNA replication: they enhance the loading of DNA polymerase onto the viral helicase E1 (3). TopBP1 also regulates the association of E2 with mitotic chromatin (13). The mechanisms of this regulation remain poorly understood. Current studies address this problem by attempting to determine the crystal structure of the TopBP1-E2-E1 complex.

Further studies could lead to a better understanding of the human papillomavirus life cycle and to the identification of novel anti-viral targets with therapeutic potential. One example would be a molecule that interferes with the formation of the TopBP1-E2 complex and thereby prevents HPV replication.

In this essay related to my lab work, I will describe two proteins involved in the complex: TopBP1 and E2. I will describe their roles in the HPV life cycle, their structures and the nature and consequences of their interactions.

Papillomaviruses establish infections in the stratified epithelium of the skin (or mucous membranes) of humans and animals. Researchers have already sequenced more than 130 different human papillomavirus genotypes, and have found that each virus is species-specific as well as tissue-tropic (5). The papillomavirus virion is non-enveloped and icosahedral, with a diameter of 55-60 nm. The viral genome is contained in this structure, packaged in a nucleohistone complex. These viruses can cause different types of disease such as warts, anogenital warts and, in some cases, malignant carcinomas of the cervix, vulva, vagina, penis or anus (6).

They can be separated into high- (HPV16, -18, -31, -33, -45 and -56) or low-risk (HPV6, -11) types, in relation to their oncogenic potential (34).

To initiate the infection, papillomaviruses have to reach the primitive basal keratinocytes of the epidermis through injuries or microabrasions and enter by clathrin-mediated endocytic pathways (22).

In the host cell, the viral genome is maintained in the nucleus of the dividing basal cells as extrachromosomal, circular, double-stranded DNA of approximately 7000-8000 bp (figure 1) (6). However, recent studies showed that integration of HPV DNA into the host genome, which disrupts E1 and E2 (two viral open reading frames), is an important step in the development of cancer (5, 20). This will be expanded upon in the next section.

There are two types of HPV: the low-risk HPVs, which are responsible for benign genital warts, and the high-risk HPVs, which are found in malignant lesions of the cervix (HPV16 is found in 50-70% of cases and HPV18 in 7-20%) (1).

The viral genome codes for two kinds of proteins: those expressed in the lower layers of a papilloma (early region, E) and two others expressed in the more differentiated cells of the epidermis (late region, L).

The genome codes for only 9 proteins, but each viral protein interacts with and regulates many cellular proteins. Figure 1 lacks the E0 protein because it was discovered only a few months ago. E0 is located between E1 and E7 on the genome (7).

All of these proteins play a very important role in the virus's life cycle and allow the virus to replicate and to evade the host's immune system.

In figure 1, we can see that the HPV genome is divided into three different regions: a non-coding upstream regulatory region (URR), an early region with six different open reading frames (E0 to E2 and E4 to E8), and a late region. The early proteins are expressed at the beginning of the infection, in the dividing mitotic cells, and are involved in viral replication and in escape from the immune system.

The late region codes for two capsid proteins, L1 and L2 (5). L1 capsomers represent about 90% of the viral capsid (8).

As previously stated in the introduction, we will focus on two viral proteins. The first is E1, an ATP-dependent helicase that binds and unwinds the viral replication origin (9, 10). This protein is involved in the replication mechanism of the HPV. E1 works in cooperation with the E2 protein (8), which is also a transcriptional regulator of the virus (33). We will discuss the role of the E2 protein in the next section.

The HPV replication is dependent on the cellular DNA synthetic machinery; E1 and E2 alone are not sufficient to permit the replication of the viral genome, as the virus has to use the DNA polymerase of the host cell. The papillomavirus replication cycle happens in three steps: Establishment, Maintenance and Amplification (see figure 2).

Just after the infection, the viral genome is transported to the nucleus in order to start the first step of the papillomavirus life cycle: the Establishment. During this phase, the genome is amplified in the nucleus but is kept to a low number of copies (10 to 50 per cell).

In the Maintenance phase, these genomes are maintained at a constant low copy number in the dividing basal cells. This means that HPVs need an efficient mechanism for partitioning their genomes into daughter cells; this mechanism will be discussed in the next section.

The third phase of the life cycle occurs in the differentiated cells. The genome is amplified to a large number of copies (at least 1000 copies/cell) to allow for the formation of new virion particles.
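
To put the copy numbers quoted above side by side, the short Python sketch below computes the minimum fold increase implied between the low-copy phases (10 to 50 copies per cell) and the Amplification phase (at least 1000 copies per cell). The variable and function names are illustrative only, and the calculation assumes nothing beyond the figures given in the text.

```python
# Copy numbers per cell as quoted in the text.
LOW_COPY_RANGE = (10, 50)   # Establishment / Maintenance phases
AMPLIFIED_MIN = 1000        # lower bound for the Amplification phase


def min_fold_increase(start_copies, amplified_min=AMPLIFIED_MIN):
    """Smallest fold increase needed to go from start_copies to amplified_min."""
    return amplified_min / start_copies


# Fold increase for the low and high ends of the low-copy range.
folds = [min_fold_increase(copies) for copies in LOW_COPY_RANGE]
print(folds)  # [100.0, 20.0]
```

In other words, the figures in the text imply at least a 20- to 100-fold amplification of the genome in differentiated cells.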

HPV uses a special replication strategy in which the replication and the virus assembly occur in cells destined for death (keratinocytes). The result is that the replication of the virion does not generate any inflammatory response and the HPV is almost invisible to the host (1).

The HPV E2 protein:

The E2 protein is a protein of about 43 kDa that binds to a 12 bp palindromic DNA sequence of the viral genome. In physiological conditions, E2 forms a homodimer and interacts with E1 and other cellular proteins. E2 plays different roles in the life cycle of the HPV. It is an essential factor for regulating viral replication, viral transcription and interaction with the chromosomes (3, 12, 13, 32 and 33).
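
A DNA sequence is called palindromic when it reads the same as its reverse complement. The Python sketch below illustrates that property on a 12 bp example. The example sequence is hypothetical and chosen only for illustration; it is not taken from this essay or its references, and the function names are likewise illustrative.

```python
# Watson-Crick complement for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """A DNA palindrome equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


# Hypothetical 12 bp site, for illustration only.
example_site = "ACCGAATTCGGT"
print(len(example_site), is_palindromic(example_site))  # 12 True
```

This symmetry fits the fact that E2 binds as a homodimer: each half of a palindromic site can present the same sequence to one monomer.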

Structurally, the E2 protein has three distinct domains. The amino-terminal transactivation domain (TAD, ~200 residues) is involved in the transcription and replication functions of the protein through interactions with cellular proteins such as AMF1, TopBP1, TFIIB, p300/CBP and SMN (25, 26, 27, 28 and 29) and with viral proteins (E1 and L2). The carboxyl terminus (~100 residues) is involved in the formation of the homodimer and in binding to the DNA. Finally, a flexible, proline-rich hinge region (~70 residues) lies between these domains; its function is still under investigation (see figure 4) (3, 14).

Structure of E2 TAD:

Particular attention is given to the TAD domain because this domain is involved in the interactions between E2 and TopBP1.

Structural investigations of the HPV16 E2 TAD show an overall L-shaped three-dimensional structure composed of two domains. The N1 domain is composed of a three-helix bundle (green) and the N2 domain is composed almost entirely of antiparallel β-sheet structures (red) (figure 5) (12). Between these two domains, residues forming two consecutive single turns of helix (yellow) contain 12 conserved amino acids that have a structural role; this region plays an important role in the orientation of the two E2 domains (23).

About 17% of the 200 amino acids are highly conserved among the HPVs. Mutagenesis experiments show that three of these (Arg37, Ile73 and Glu39), located on the N1 domain, are involved in protein-protein interactions (figure 4) (23).

Crystal studies show the dimer association between two E2 TADs. The amino acids involved in the dimer interface belong to the αA and αB helices of the N1 domain and to residues 142-144 of N2. The interface is formed by hydrogen bonds and non-polar contacts.

Regulation of viral replication:

The formation of the prereplication complex E1-E2 at the viral origin of replication (ori site) is an essential step toward the formation of an active replication complex (32).

As previously stated, the domain of E2 involved in the regulation of replication is the TAD. The role of E2 is to recruit E1 monomers to form a di-hexameric helicase at the origin of replication, and to transform the complex into an active ATP-dependent helicase able to unwind the viral DNA (9, 10).

As both E1 and E2 form homodimers in physiological conditions, the prereplication complex at the ori site is a ternary complex. The steps necessary for the formation of this complex remain unclear.

E2 has a double function: it initiates the prereplication complex and is a helicase loader.

E2 can also regulate replication by interacting with cellular proteins such as TopBP1. It is possible that, under certain conditions, the ability of TopBP1 to activate transcription in conjunction with E2 is important. This relationship is not fully understood, however, and is still under investigation (3).

Regulation of viral transcription:

When E2 is expressed from the virus, it acts as a transcriptional repressor of E6/E7. A malfunction of this protein can therefore contribute to the development of carcinomas. HPV-associated carcinomas are known to express more E6 and E7 proteins, which target two cellular tumor suppressor proteins, p53 and pRb, respectively (35 and 30). In most cervical carcinomas, E1 and/or E2 are disrupted by integration. Mutagenesis studies show that both the DNA-binding and transactivation functions of E2 are required for repression of early HPV16 gene (E6/E7) transcription (20). Recent studies point to a novel mechanism of regulation by E2: the N-terminal dimer of E2 can regulate the transcription of the other ORFs of the viral genome by a DNA-looping mechanism, via the interaction of two E2 dimers (see figure 7). This would bring distally bound transcription factors close to the site of transcription initiation. Such a mechanism has already been described for other transcription factors, but the hypothesis still has to be confirmed by further experiments (24).

Interaction of HPV E2 protein with mitotic chromosomes:

HPVs possess episomal genomes, so they must ensure that their viral genome is maintained in future cell generations. To do so, they need an efficient partitioning mechanism, which is provided by the viral E2 protein.

The E2 protein links the viral genome to the cellular chromosomes to ensure segregation into daughter cells following cell division. This link is present at every stage of mitosis and breaks after the last stage. Mutagenesis and immunofluorescence studies show that the transactivation domain of E2 is sufficient for the association with mitotic chromosomes. The DNA-binding domain plays another important role: it mediates the link between the E2 binding site on the viral genome and the cellular chromosome (13).

Some experiments show that TopBP1 is involved in regulating the interaction between HPV16 E2 and chromatin. Removing TopBP1 enhances the affinity of E2 for chromatin. The current hypothesis is that TopBP1 acts as a chromatin receptor for E2 during mitosis and that, in the absence of TopBP1, E2 associates with higher affinity with an alternative chromatin receptor. Interactions between TopBP1 and E2 may therefore play a key role in chromosome segregation (19).

The topoisomerase IIβ binding protein 1:

TopBP1 is a 180 kDa protein that was originally identified as a DNA topoisomerase II interacting protein. It is involved in many different cellular processes, including the DNA-damage checkpoint, regulation of some transcription factors and recruitment of DNA pol-α (DNA replication) (37 and 36).

These protein-protein interactions are mediated by the nine BRCT domains that are spread along the protein (figure 7) (15, 16).

TopBP1 is distinctive in that it contains 9 BRCT domains (BRCT: carboxyl-terminal domain of the breast cancer gene 1 product, BRCA1). These domains are conserved structures that mediate protein-protein interactions, and they are conserved among proteins involved in DNA repair pathways (3, 4 and 21).

The overall topology of a BRCT domain is a central, four-stranded, parallel β-sheet flanked on one side by a single helix (α2) and on the opposite side by a pair of helices (α1 and α3). There are two subclasses of BRCT domains: those that act as singletons (e.g. the C-terminal domain of XRCC1) and are implicated in the formation of homo- or hetero-dimers with other BRCT domains (by a mechanism that is not fully understood), and those that act as tandem pairs and are involved in phospho-peptide binding (e.g. BRCA1) (4, 7, 15).

In this essay, we will focus on four BRCT domains (BRCT0, BRCT1, BRCT2 and BRCT6) because we suspect that they interact with the viral protein E2 (a hypothesis based on GST pull-down assays). BRCT0, -1 and -2 are part of the N-terminal region of human TopBP1 (amino acids 1-290) (PDB: 2XNH). The exact nature of the interaction between TopBP1 and E2 is not completely understood; several crystal studies are in progress to clarify the relationship between these two proteins (3).

Crystal structure of the TopBP1 N-terminal region:

In figure 8 we can see that the overall 3D shape of TopBP1 N-terminal domain is an elongated cylinder containing three BRCT domains.

The three BRCT domains have the standard topology of BRCT domains, but they have a different spatial arrangement: the central β-sheets in these BRCT domains are perpendicular to each other rather than parallel (see figure 8). This could be explained by the very short length of the linkers between consecutive BRCT domains. The linker between the BRCT0 α3 helix and the BRCT1 β1 strand is 17 amino acids long (amino acids 91-108), and the corresponding linker between the BRCT1 α3 helix and the BRCT2 β1 strand is 22 amino acids long (amino acids 181-203) (7, 15, 16).

In canonical BRCT domains, the linkers are 32-49 amino acids long, which permits a different spatial arrangement of the BRCT domains with fewer steric clashes (15).
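
The comparison of linker lengths can be checked with a few lines of Python. This is a minimal sketch that uses only the residue numbers and the canonical range quoted in the text; the dictionary keys and function names are illustrative, and the length is taken as the difference between the two boundary residues because that reproduces the quoted values of 17 and 22.

```python
# Linker boundaries (first and last residue numbers) as quoted in the text.
LINKERS = {
    "BRCT0-BRCT1": (91, 108),
    "BRCT1-BRCT2": (181, 203),
}

# Linker length range quoted in the text for canonical BRCT domains.
CANONICAL_RANGE = (32, 49)


def linker_length(start, end):
    """Length as end - start, which matches the 17 and 22 quoted in the text.

    Counting both boundary residues would give one more (18 and 23).
    """
    return end - start


def shorter_than_canonical(length):
    """True if the linker is shorter than the canonical minimum."""
    return length < CANONICAL_RANGE[0]


for name, (start, end) in LINKERS.items():
    length = linker_length(start, end)
    print(name, length, shorter_than_canonical(length))
```

Both TopBP1 linkers fall well below the canonical minimum of 32 residues, which is the basis for the steric argument made above.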

It is already known that some BRCT tandem pairs are able to bind a phospho-peptide (16). Many proteins involved in the recognition of phosphoproteins in the DNA-damage pathway have BRCT domains. The conserved region that allows the interaction with the phospho-peptide is a cluster of polar amino acids, present in only one of the two BRCT domains of the tandem pair. These polar amino acids could interact with the phosphorylated residue through hydrogen bonds.

A second requirement for phospho-peptide binding by tandem repeats is the presence of hydrophobic residues in the cleft formed by the junction of the two BRCT domains (the α2 helix of the N-terminal BRCT and the α1 and α3 helices of the C-terminal BRCT). These hydrophobic residues confer binding specificity for the ligand residues C-terminal to the phosphorylated serine, threonine or tyrosine.

Crystal studies of these three BRCT domains show that the pairs of hydrophilic residues that provide the phosphate interaction (T114/K155 and T208/K250) are present in BRCT1 and BRCT2, respectively. In the BRCT0 domain, the equivalent amino acids are hydrophobic (L14 and V67) and do not allow any contact with a phospho-peptide, which is hydrophilic.
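
The reasoning about which domains can contact a phosphate can be written out as a small lookup. The Python sketch below uses the residues named in the text and a deliberately simplified polar/hydrophobic classification of amino acids; that classification and all names in the code are assumptions made for illustration, not a structural prediction method.

```python
# Residues at the putative phosphate-binding positions, as named in the text
# (one-letter amino acid code followed by residue number).
POCKET_RESIDUES = {
    "BRCT0": ["L14", "V67"],
    "BRCT1": ["T114", "K155"],
    "BRCT2": ["T208", "K250"],
}

# Simplified set of polar or charged side chains (an assumption for this sketch).
POLAR = set("STNQYKRHDE")


def can_contact_phosphate(residues):
    """True if every listed pocket residue has a polar or charged side chain."""
    return all(residue[0] in POLAR for residue in residues)


binding = {domain: can_contact_phosphate(res) for domain, res in POCKET_RESIDUES.items()}
print(binding)  # {'BRCT0': False, 'BRCT1': True, 'BRCT2': True}
```

The result simply restates the observation in the text: BRCT1 and BRCT2 carry the polar residues needed for phosphate contact, whereas BRCT0 does not.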

Because of the 3D shape of the protein, these two phospho-peptide binding zones are remote from each other (see green zones in figure 8).

Another major difference between the TopBP1 BRCT 1-2 domains and the canonical tandem domains is the lack of a peptide binding cleft at the interface of the two domains.

This evidence points to a new class of tandem BRCT arrangement, in which the two BRCT domains independently offer binding sites for phosphorylated residues.

Further investigations are needed to fully understand the role of the TopBP1 BRCT0 domain (7, 15 and 16).

The arrangement of three closely linked BRCT domains in this protein had never been described before; it differs from the singletons and tandem pairs already known. The mechanism of the interactions between the TopBP1 N-terminal domain and the viral protein E2 is still poorly understood. Crystal studies of the interface between these two proteins are still in progress.

Crystal structure of the TopBP1 BRCT6 domain:

The overall topology of the BRCT6 domain is the same as the canonical BRCT domain structure (β1-α1-β2-β3-α2-β4-α3) (figure 10). However, the lengths of the connecting loops are less conserved, and a 3₁₀-helix replaces most of the β1-α1 loop found in the canonical BRCT domain.

The BRCT6 domain lacks three residues that make essential hydrogen bonds with the phosphate. Only Ser913 is conserved in the BRCT6 domain, but it points away from the solvent. This change of orientation is caused by the 3₁₀-helix. In addition, two lysines of the 3₁₀-helix generate steric hindrance in the phospho-peptide pocket (17).

Moreover, the BRCT6 domain shows a lack of the hydrophobic residues in the cleft formed by the junction of the two BRCT domains.

This evidence suggests that the degenerate phospho-peptide binding pocket would not allow any phospho-peptide interactions. Further investigations (crystal studies of the E2-BRCT6 dimer) are needed to understand the exact interactions between E2 and the BRCT6 domain of TopBP1.


We know that four domains of TopBP1 interact with the TAD of the HPV protein E2: BRCT0, -1, -2 and -6 (18). However, the exact nature of these interactions, such as which amino acids are involved, is still poorly understood. A better understanding could lead to the creation of drugs able to prevent interactions between E2 and TopBP1, thereby stopping the HPV life cycle.

Other strategies could lead to similar results, such as creating molecules that prevent E1 or E2 from binding to the DNA.

Some progress has already been made in interfering with the interaction between the two viral proteins E1 and E2. Small molecules (termed "indandiones") are able to inhibit the interaction between the N-terminal domain of E2 and E1 by binding reversibly to the TAD of E2 (31). Unfortunately, these molecules only inhibit the replication of low-risk HPVs, such as HPV6 and -11.

These results encourage further work towards a better understanding of the HPV replication process, so that other, more efficient molecules can be found that interfere with and stop the replication of high-risk HPVs.