Entire Genome Characterization Of Human Papillomavirus Biology Essay



The global prevalence of HPV16 exceeds that of other HPV types. This project aimed to obtain basic molecular knowledge of HPV16 by sequencing the whole genome of HPV16 isolated from Thai women at various clinical stages of disease progression. We analyzed seven HPV16 samples from infected women whose cytology ranged from normal to cervical cancer and found two notable changes within the coding region: the E2-219P prototype was replaced by E2-219T in the cervical cancer sample, and the L2-269S prototype by L2-269D in the CIN III sample. Phylogenetic analysis based on the whole genome and on the E2, E6, L1 and L2 genes showed the Thai samples to be more closely related to the European, European-German, East-Asian and North-American types than to the Asian-American or African type 1 and type 2. The L1 polypeptides of the vaccine strain showed a close phylogenetic relationship to our samples. The results provide basic data for future research on cervical cancer pathogenesis and specific vaccine development.


Human papillomavirus (HPV) is the major cause of cervical cancer and warts, and HPV type 16 is the type most often implicated in cervical cancer. HPV has been identified as a causal agent of cervical squamous neoplasia and has been linked to the development of neoplasia at several other mucosal sites. HPVs are a highly variable group of small, non-enveloped, icosahedral DNA viruses that replicate in the nucleus of squamous epithelial cells. Each virion contains a single molecule of double-stranded circular DNA about 8,000 bp in size. The virus particles have a density in cesium chloride of 1.34 g/mL [1]. The genome contains about 10 designated open reading frames (ORFs), classified as early (E) or late (L) according to their location in the genome and the time at which they are expressed during infection. The viral genes are expressed from several promoters via splicing of polycistronic mRNAs [2]. Genital HPVs have been classified into high- and low-risk types according to their potential to induce invasive cancer. Based on previous research, HPV types 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 68, 73, and 82 have been considered high risk (HR-HPV) and types 26, 53, and 66 potentially high risk (probably HR-HPV), whereas types 6, 11, 40, 42, 43, 44, 54, 61, 70, 72, 81, and CP6108 have been regarded as low risk [3]. In the north-east of Thailand, HPV genotypes 16 and 18 have been reported to cause two out of three cervical cancer cases [4]. However, Clifford et al. studied the global distribution of HPV, including Thailand at Lampang (northern province) and Songkla (southern province), and found that heterogeneity was a significant finding in Asia [5]. HPV16 is the most common cause of cervical cancer, and numerous variants of HPV16 have been identified in different geographic locations and ethnic groups [6, 7].
Because the HPV16 genome displays numerous variations, HPV16 isolates can be classified into variants; this taxonomy does not affect the traditional identification and characterization of HPV16, and independent isolates differ by minor genomic changes of approximately 2% nucleotide variation within the genome. HPV16 variants are associated with different forms of cervical cancer [8, 9]. In addition, a previous study reported that HPV16 variants display different biological properties in vitro, which may be responsible, in part, for differences in pathogenicity, carcinogenic risk and immunogenicity [10]. HPV16 variants coevolved with the three major phylogenetic ethnic groups: Africans, Caucasians and Asians. These variants cluster into five distinct groups: European (E), Asian (As), Asian-American (AA), which originated from Central and South America, African 1 (Af-1) and African 2 (Af-2) [11, 12]. Some authors have suggested that certain non-European HPV16 variants are associated with a higher risk of cervical intraepithelial neoplasia and invasive cancer [13]. Variation in critical genome regions of HPV16 variants, such as the long control region (LCR) and the E6 and E2 genes, may be associated with the oncogenic potential of HPV [14, 15]. Molecular characterization of the variability found in the LCR may contribute to a better understanding of the association of HPV16, especially the non-European variants, with a potentially greater oncogenicity [13]. Some previous reports have studied HPV16 at different stages of cervical cancer by examining the distribution of LCR, E7, E6, L1, L2 and E2 polymorphisms [16-21]. However, Lee concluded that polymorphisms of the HPV16 E6 gene, rather than of other HPV16 genes, differentially influence tumor development and that additional factors are most likely involved as well.

The present study aimed to characterize the whole-genome (LCR, early genes and late genes) variability of HPV16 in Thai women at different clinical stages of cervical cancer based on cytological findings. It presents whole-genome results representative of Southeast Asia in comparison with HPV16 variants from other countries. Finally, the study is intended to provide useful information for identifying the nucleotides associated with viral function, viral persistence and pathogenicity, and for the future development of more specific vaccines.

2. Materials and methods

All study protocols were approved by the Ethics Committee of the hospital and the Faculty of Medicine, Chulalongkorn University. The HPV-positive samples were chosen from among the specimens obtained during the patients' routine check-up or investigation and treatment. All patients were informed and permission was granted by the director of the hospital. The specimens were anonymized and identified only by a code number. In addition, all specimens were used exclusively for academic research and the patients were not remunerated.

2.1 Sample collection

Seven HPV genotype 16-positive samples from Thai women in Bangkok province with different cytological findings were obtained from Samitivej Srinakharin hospital, Thailand. The specimens originated from the patients' routine check-up or investigation and treatment. All specimens were collected for liquid-based cytology (LBC; ThinPrep®, Hologic, West Sussex, UK) and tested for HPV DNA using Hybrid Capture II (Digene). Genotypes were determined by direct sequencing of the PCR-amplified whole genome followed by BLAST analysis. All HPV samples were stored at -70 °C until used. All seven patients were positive by Hybrid Capture II; general information on all patients is shown in Table 1.

2.2 DNA extraction

DNA was obtained by organic (phenol-chloroform) extraction of the samples. Briefly, cellular pellets were re-suspended in 400 µl of lysis buffer. Samples were incubated at 95 °C for 30 min, mixed for 2 min, and digested with 50 µl of proteinase K (20 g/l). After overnight incubation at 50 °C, samples were heated to 95 °C for 10 min to inactivate the proteinase K. Phenol-chloroform extraction followed by high-salt isopropanol precipitation was performed as described previously [22], and the purified material was re-suspended in a final volume of 30 µl deionized water.

2.3 Human papillomavirus (HPV) detection and amplification for whole genome sequencing

PCR amplification of HPV was performed using the specific primer sets listed in Table 2. The nucleotide sequences of interest were downloaded from the GenBank database and aligned using CLUSTAL X (Version 1.81 from ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX) and the BioEdit sequence alignment software, Version 5.0.9 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Assay target regions were first identified by visual inspection of the sequence alignment. Primers were chosen from conserved regions of all specific sequences. Primers and probes were analyzed using the primer design software OLIGOS Version 9.1 and FastPCR Version 3.8.20 (Ruslan Kalendar, Institute of Biotechnology, University of Helsinki, Finland) to predict the percentage of G+C content as well as the potential for dimerization, cross-linking and secondary structure.

Polymerase chain reaction was performed to amplify the HPV genome. The reaction mixture comprised 2 µl DNA, 0.5 µM of each primer shown in Table 2, 10 µl 2.5X Eppendorf MasterMix (Eppendorf, Hamburg, Germany), and nuclease-free water to a final volume of 25 µl. The amplification reaction was performed in a thermal cycler (Eppendorf, Hamburg, Germany) under the following conditions: initial denaturation at 94 °C for 3 min, followed by 40 amplification cycles consisting of denaturation at 94 °C for 30 sec, primer annealing at 55 °C for 45 sec, and extension at 72 °C for 1.30 min, and concluded by a final extension at 72 °C for 7 min. HPV primer positions and PCR products are depicted in Figure 1 and Table 2.

2.4 Housekeeping gene detection

The housekeeping gene β-globin was selected to serve as an internal control for DNA extraction, using conventional PCR as the detection method. Primer sequences for the β-globin gene have been previously described [23]. The reaction mixture consisted of 2 µl DNA, 0.5 µM each of the β-globin forward and reverse primers, 10 µl 2.5X Eppendorf MasterMix (Eppendorf, Hamburg, Germany), and nuclease-free water to a final volume of 25 µl. The amplification reaction was performed in a thermal cycler (Eppendorf, Hamburg, Germany) under the following conditions: initial denaturation at 94 °C for 3 min, followed by 35 amplification cycles consisting of denaturation at 94 °C for 30 sec, primer annealing at 55 °C for 30 sec, and extension at 72 °C for 30 sec, and concluded by a final extension at 72 °C for 7 min.
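As a quick plausibility check on the two cycling programs in sections 2.3 and 2.4, the following Python sketch adds up the programmed hold times and confirms that 10 µl of 2.5X master mix in a 25 µl reaction gives a 1X final concentration. The function name is hypothetical, ramp times between temperatures are ignored, and the HPV extension step written as "1.30 min" is assumed here to mean 1 min 30 sec (90 sec); that reading is an assumption, not something the text states.

```python
def program_seconds(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time of a PCR run, ignoring ramp times."""
    return initial_s + cycles * sum(step_seconds) + final_s

# HPV whole-genome amplification (section 2.3).
# Assumption: the extension time "1.30 min" is read as 90 seconds.
hpv_total = program_seconds(3 * 60, 40, (30, 45, 90), 7 * 60)

# beta-globin internal control (section 2.4).
globin_total = program_seconds(3 * 60, 35, (30, 30, 30), 7 * 60)

# Final master mix concentration: 10 ul of a 2.5X stock in a 25 ul reaction.
master_mix_final_x = 2.5 * 10 / 25

print(f"HPV program: {hpv_total / 60:.1f} min")
print(f"beta-globin program: {globin_total / 60:.1f} min")
print(f"Master mix final concentration: {master_mix_final_x:.1f}X")
```

Under these assumptions the HPV program holds for about two hours and the β-globin program for roughly one hour, before ramping is taken into account.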

2.5 Agarose Gel Electrophoresis and nucleotide sequencing

The PCR products were mixed with loading buffer and run on a 2% agarose gel (FMC Bioproducts, Rockland, ME) at 100 V for 60 min. After electrophoresis the DNA bands were stained with ethidium bromide and visualized by UV transillumination (Gel Doc 1000, BIO-RAD, CA). For identification of nucleotide sequences, the PCR-amplified products were purified using the Perfectprep Gel Cleanup kit (Eppendorf, Hamburg, Germany) according to the manufacturer's specifications. The purified DNA served as template for DNA sequencing using the Big Dye Terminator V.3.0 Cycle Sequencing Ready Reaction kit (ABI, Foster City, CA) in the ABI PRISM® 310 automated DNA sequencer. Nucleotide sequences were determined in duplicate and in both directions, using forward and reverse primers, to ensure that sequence variations were not due to sequencing errors. When a difference was observed, the sequence was determined in triplicate to confirm the consistency of the sequencing result.

2.6 Sequence analysis and phylogenetic tree construction

Nucleotide sequences were analyzed and assembled using the Lasergene 6 Package® (DNASTAR, Inc., Madison, WI) and the BLAST analysis tool (http://www.ncbi.nlm.gov/BLAST). Complete genome sequences were prepared and aligned using Clustal W as implemented in the BioEdit program (version …). Phylogenetic trees were constructed by neighbor-joining analysis with the Tamura-Nei model as implemented in the MEGA3 program [24]. All nucleotide sequences of HPV16 obtained in this study were submitted to the GenBank database under accession numbers FJ610146-52 (Table 1).
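To illustrate the tree-building step, the sketch below implements the basic neighbor-joining procedure on a small distance matrix in plain Python. It is not the MEGA3 implementation used in the study: the function name, the node labels and the five-taxon toy matrix are all hypothetical, and the distances are arbitrary numbers rather than Tamura-Nei distances computed from HPV16 sequences.

```python
def neighbor_joining(names, dist):
    """Basic neighbor-joining on a symmetric distance table (dict of dicts).

    Returns (joins, last_edge): each join is (node_a, node_b, branch_a,
    branch_b, new_node); last_edge connects the two remaining nodes.
    """
    nodes = list(names)
    d = {a: dict(dist[a]) for a in nodes}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other active nodes.
        totals = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair that minimizes the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node.
        len_a = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = "node%d" % (len(joins) + 1)
        # Distances from the new node to every remaining node.
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
        joins.append((a, b, len_a, len_b, new))
    last_edge = (nodes[0], nodes[1], d[nodes[0]][nodes[1]])
    return joins, last_edge

# Hypothetical toy matrix (arbitrary distances, not HPV16 data).
labels = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy = {x: {y: rows[i][j] for j, y in enumerate(labels)} for i, x in enumerate(labels)}
joins, last_edge = neighbor_joining(labels, toy)
for a, b, len_a, len_b, new in joins:
    print(f"join {a} ({len_a}) + {b} ({len_b}) -> {new}")
print("last edge:", last_edge)
```

In practice the input would be a matrix of model-corrected pairwise distances from the aligned genomes; a dedicated phylogenetics package also handles tie-breaking, bootstrap support and tree output, which this sketch omits.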

3. Results

3.1 Complete genome analysis of HPV16 in Thailand

Whole genome sequences of HPV16 from the Thai samples (FJ610146-52) were aligned with NC_001526, K02718, EU118173, AF125673, U89348, FJ006723, AF536180 (African type 1), AF534061 (East-Asian), AF536179 (European-German type), AF472508 (African type 1), AF472509 (African type 2), AY686580 (European), AY686581 (European), AF402678 (Asian-American) and AY686579 (Asian-American). The reference sequences AF402678 and AY686579 are classified as Asian-American type and originated from Costa Rica [25]. Analysis of the HPV nucleotide sequences showed 97.9-99.8% similarity among the Thai and all reference sequences (data not shown). Phylogenetic analysis revealed that the HPV16 isolates from Thailand were closely related to the HPV16 reference strain (Figure 2). The European-German type (AF536179) is related to the CU1 (normal) and CU7 (cervical cancer; CA) samples, whereas the reference sequence AF125673, isolated in North America, is closely associated with the CU2 (atypical squamous cells of undetermined significance; ASC-US) and CU3 (atypical squamous cells, cannot exclude HSIL; ASC-H) samples. The CU4 (cervical intraepithelial neoplasia grade 1; CIN I) sample is related to the reference sequence (NC_001526) and a European strain (AY686581), while CU5 (cervical intraepithelial neoplasia grade 2; CIN II) is closely related to the East-Asian type (AF534061). Furthermore, CU6 (cervical intraepithelial neoplasia grade 3; CIN III) is separate from all reference strains, while the African type 1, African type 2 and Asian-American types are not closely related to any of the samples (Figure 2).
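The similarity figures above are pairwise percent identities over aligned sequences. A minimal Python sketch of that calculation follows; the function name and the short toy sequences are hypothetical, and counting a gap opposite a base as a mismatch is one possible convention (the text does not say how gaps were treated).

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences carry a gap are skipped; a gap opposite
    a base counts as a mismatch (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Hypothetical toy alignment, not HPV16 data.
identity = percent_identity("ACGTACGT-A", "ACGTACGTTA")
print(f"{identity:.1f}% identity")
```

Applied to every pair of aligned genomes, the same function yields the similarity matrix from which a range such as the one reported can be read off.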

3.2 Characterization of the non-coding and coding genes of HPV16

3.2.1 Nucleotide sequence variation in the non-coding region

Nucleotide variations in the HPV16 genome were observed in the long control region (LCR) at positions 1-83 and 7155-7905 and in the upstream regulatory region of the L2 gene at positions 4102-4235 (positions based on reference sequence NC_001526). Most nucleotide variations among HPV16 variants are found in the upstream regulatory region of L2. In every variant, including the samples from this study, we observed both insertions and deletions between the early and late genes (data not shown). The complete genomes of CU5 and CU7 comprised 7,905 bp, whereas those of the other samples comprised 7,906 bp, indicating variable genome lengths among HPV16 isolated from Thai women. Based on the pattern of nucleotide variation, all samples displayed more similarity to the European, European-German and East-Asian types than to the African type 1, African type 2 and Asian-American (South America) types.

3.2.2 Amino acid variations in the coding region

Amino acid analysis showed that the E4 gene was more conserved than any other gene. In contrast, we observed pronounced amino acid variations in E2, E1, E6, E5 and E7, in that order (data not shown). Based on alignment of the amino acid sequences encoded by E2, E6, L1 and L2, we highlight only the essential positions described elsewhere [21]. In the E6 region, position 83 of CU2, CU3 and CU6 changed from E6-83L to E6-83V (L83V; prototype to variant). In the E2 region, position 219 of CU1, CU2, CU3, CU5 and CU6 changed from E2-219P to E2-219S (P219S), while position 219 of CU7 was translated to "T" (P219T). In the L2 region, position 243 of CU2 and CU3 changed from L2-243V to L2-243I (V243I). In the same region, position 269 of CU1, CU5 and CU7 changed from L2-269S to L2-269P (S269P), while position 269 of CU6 was translated to "D" (S269D). In the L1 region, position 266 changed from L1-266T to L1-266A (T266A) in all samples except CU4 (Table 3). Subsequently, we performed phylogenetic tree analysis based on the E2, E6, L1 and L2 genes (Figure 3). The results indicated that the European-German type is closely related to CU1 and CU7, whereas CU4 and CU5 are more closely associated with the European and East-Asian types, respectively. The relationships of CU2, CU3 and CU6 depended on the gene examined, but these samples were consistently related to the North-American and European types. Finally, Figure 3 shows that E2, E6, L1 and L2 of all samples are distinct from the Asian-American type and the African type 1 and type 2.
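The substitution pattern described above can be tabulated and screened programmatically. The Python sketch below encodes the residues reported in the text for the five positions and flags any residue that is neither the prototype nor the common variant. The dictionary and function names are hypothetical, and positions that the text does not name for a given sample are assumed to carry the prototype residue, since Table 3 itself is not reproduced here.

```python
# Prototype and common variant residue at each position, as named in the text.
PROTOTYPE = {"E6-83": "L", "E2-219": "P", "L2-243": "V", "L2-269": "S", "L1-266": "T"}
COMMON_VARIANT = {"E6-83": "V", "E2-219": "S", "L2-243": "I", "L2-269": "P", "L1-266": "A"}

# Observed residues per sample. Assumption: any position the text does not
# name for a sample is taken to be the prototype residue.
OBSERVED = {
    "CU1": {"E6-83": "L", "E2-219": "S", "L2-243": "V", "L2-269": "P", "L1-266": "A"},
    "CU2": {"E6-83": "V", "E2-219": "S", "L2-243": "I", "L2-269": "S", "L1-266": "A"},
    "CU3": {"E6-83": "V", "E2-219": "S", "L2-243": "I", "L2-269": "S", "L1-266": "A"},
    "CU4": {"E6-83": "L", "E2-219": "P", "L2-243": "V", "L2-269": "S", "L1-266": "T"},
    "CU5": {"E6-83": "L", "E2-219": "S", "L2-243": "V", "L2-269": "P", "L1-266": "A"},
    "CU6": {"E6-83": "V", "E2-219": "S", "L2-243": "V", "L2-269": "D", "L1-266": "A"},
    "CU7": {"E6-83": "L", "E2-219": "T", "L2-243": "V", "L2-269": "P", "L1-266": "A"},
}

def substitution_label(site, residue):
    """Format a change such as 'P219S' from a site key like 'E2-219'."""
    position = site.split("-")[1]
    return f"{PROTOTYPE[site]}{position}{residue}"

def unusual_substitutions(observed):
    """List (sample, gene, label) where the residue is neither prototype nor common variant."""
    hits = []
    for sample in sorted(observed):
        for site, residue in observed[sample].items():
            if residue not in (PROTOTYPE[site], COMMON_VARIANT[site]):
                gene = site.split("-")[0]
                hits.append((sample, gene, substitution_label(site, residue)))
    return hits

unusual = unusual_substitutions(OBSERVED)
for sample, gene, label in unusual:
    print(sample, gene, label)
```

With the residues as transcribed here, the screen picks out exactly the two changes that the text singles out, in the CIN III and cervical cancer samples.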

4. Discussion

This project attempted to analyze the entire genome of HPV16 obtained from patients with different cervical cytology findings. Cervical cytology was assessed by a pathologist and the respective samples were tested with a commercially available assay, Hybrid Capture II. We found HPV16 genome variation in Thai women: the total length of the HPV16 genome was 7,905 or 7,906 base pairs, comprising the long control region (LCR), early genes, late genes and upstream regulatory region. Various nucleotide variations were discernible within the upstream regulatory region between the early and late genes (data not shown). Insertions and deletions in this region account for the different genome lengths displayed by HPV16 variants. In comparison with the reference (NC_001526), the LCR of all samples showed a sequence change from GC to CGG at positions 7432-7433 and a deletion of A at position 7864 (positions as indicated for the reference strain). In addition, nucleotide alignment within the coding region of all samples showed an ATC insertion at position 6903 and a GAT deletion at position 6955. Alignments of the reference sequence (NC_001526) with the samples showed variations that depended on the genome region. For example, the LCR of most samples, except CU4 (CIN I), showed a change from G to T at position 7193 (G7193T), and all our samples exhibited a change from G to A at position 7521 (G7521A). We focused on these positions because both are integral parts of transcription factor binding sites and thus influence infection properties [26]. Several previous studies have reported that disruption of Yin Yang 1 (YY1) binding sites within the LCR may result in up-regulation of E6/E7 expression, potentially allowing malignant conversion without integration [14, 26]. However, we could not detect variations in the YY1 binding sites in any sample of this study.
Xi et al. suggested that nucleotide changes in LCR variants of HPV16 may be more closely associated with a risk of disease progression, possibly resulting in pathogenicity [27, 28]. According to a previous study of E6 variation at position 350, the E6-350G variant was found mostly in Europe and America but not in Southeast Asia [29]. Yet in the course of this project we detected E6-350G in three samples and E6-350T in four. Some epidemiological studies have revealed that in HPV16 a nucleotide change from T to G at nt 350 within E6 correlates with high-grade lesions and cancer [30, 31], whereas another study speculated that this might be population dependent [17]. The T350G (L83V) variation within E6 was suggested to be associated with an increased risk of persistent infection and cytological progression to cervical intraepithelial neoplasia grade 2/3 [32]. However, in contrast with some previous studies [29, 33], our research did not show this correlation. The dissimilarity may be explained by various factors. For example, variation in HPV16 may affect the biological properties of the protein. In addition, the host's immune response to specific viral epitopes encoded by variants, together with the high level of diversity among HPV16 E6 and E5 proteins, may drive the evolutionary selection of E2, since HPV16 E2 variants frequently co-segregate with the E6 T350G (L83V) variant that has been associated with viral persistence [34]. This co-segregation was proposed to act as an additional risk factor for the development of cervical cancer [25, 32]. Moreover, some previous studies revealed that variations in the E2 protein might alter its affinity for cellular transcription factors or for HPV16 DNA, and such variants have been suggested to be associated with the risk of cervical neoplasia [25, 27, 35].
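The link between the nucleotide label T350G and the amino acid label L83V can be reproduced with a short calculation. The Python sketch below maps a genome position to a codon number within an open reading frame and translates the reference and altered codons. Two inputs are assumptions that do not come from the text: that E6 residue numbering starts at an ATG at nt 104, and that the reference codon covering nt 350-352 is TTG. The function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codon_position(nt_pos, orf_start):
    """Return (codon number, position within codon), both 1-based."""
    offset = nt_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the reading frame")
    return offset // 3 + 1, offset % 3 + 1

def amino_acid_change(nt_pos, orf_start, ref_codon, alt_base):
    """Format the amino acid change caused by a single-base substitution."""
    codon_no, pos_in_codon = codon_position(nt_pos, orf_start)
    alt_codon = ref_codon[:pos_in_codon - 1] + alt_base + ref_codon[pos_in_codon:]
    return f"{CODON_TABLE[ref_codon]}{codon_no}{CODON_TABLE[alt_codon]}"

E6_START = 104         # assumption: first base of the ATG used for E6 numbering
E6_REF_CODON = "TTG"   # assumption: reference codon covering nt 350-352

change = amino_acid_change(350, E6_START, E6_REF_CODON, "G")
print("T350G ->", change)
```

Under these assumptions nt 350 is the first base of codon 83, which is consistent with the paired notation T350G (L83V) used in the text.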
Another research group has suggested that an amino acid variation in L1 at codon 202 (H202D) would decrease the infectious potential of HPV because it can prevent viral capsid formation [36]. However, we did not detect this polymorphism. Moreover, Lee and collaborators suggested that HPV should be analyzed at E6-83, L2-243, L1-266, L2-269 and E2-219 and concluded that five combinations were possible based on the E6-83L prototype and two combinations based on the E6-83V variant [21]. Our results agreed with Lee's except for the CA and CIN III samples. In the CA patient's sample, E2-219 was neither the E2-219S variant nor the E2-219P prototype but E2-219T. Similarly, in the CIN III patient's sample, L2-269 was neither the L2-269P variant nor the L2-269S prototype but L2-269D (Table 3). Variations at these positions are not seen in the other reference sequences, whether European, European-German, Asian-American (South America), North-American, or African type 1 and type 2. Thus, these alterations deserve close observation, as they may be responsible for malignant progression. In addition, a previous study from Thailand revealed that this E6 mutation coincided with a specific E7 mutation at residue 29 leading to a substitution from "N" (asparagine) to "S" (serine) [37]. This E7 variant was also more frequent in cervical cancer samples than in precursor lesions [38]. However, in this study we detected this change in only one sample, CU5 (CIN II); none of the other samples was mutated at this residue. In the E5 region, we focused on amino acid positions 11 to 24, which encode the trans-membrane helical region, and on positions 46 to 50, which are conserved among HPV types [39]. The E5 gene of our samples was conserved at positions 46 to 50 but differed from the reference sequence (NC_001526) at amino acid position 20.
According to a previous study, E5 amino acid variations may alter the protein's capacity for transformation by affecting its interactions with the EGFR, the 16 kDa subunit of the H+-ATPase or, potentially, other cellular proteins [39]. Based on complete genome analysis, HPV16 from Thai women is more closely related to the European, European-German, East-Asian and North-American types than to the Asian-American or African type 1 and type 2. HPV vaccines have recently come into common use worldwide, and many research groups have attempted to assess their efficacy. We therefore compared the L1 polypeptide of the vaccine strain (US Patent 6613557) with sequences from the GenBank database and with all seven samples (Figure 4). Phylogenetic tree analysis of L1 polypeptides revealed that the vaccine polypeptides were closely related to the reference sequence (NC_001526) and K02718 and thus more similar to all seven samples than to the Asian-American and African type 1 and type 2; therefore, the vaccine's potential to prevent HPV16 infection of Thai women is somewhat limited. To increase vaccine efficacy, attention should be focused on point mutations in the gene encoding L1. Another gene might also be used to augment the vaccine's protective efficacy.

In conclusion, we found numerous nucleotide and amino acid variations in the genome of HPV16 isolated from infected Thai women. Whole genome analysis of these samples showed HPV16 from Thailand to be more closely related to the European strain than to the Asian-American and African type 1 and type 2. This study revealed that infected women with CIN III and CA displayed amino acid alterations at critical positions in the E2 and L2 regions, but we cannot draw any inference about the relationship between clinical stage of disease progression and amino acid alterations, as only one sample was available for each clinical stage. However, we hope that these new data, which cover the entire genome of HPV16 from Southeast Asia, can serve as basic data for scientific research on cervical cancer pathogenesis and provide useful information for vaccine development on a global scale.


We would like to express our gratitude to the Thailand Research Fund (Royal Golden Jubilee Ph.D. Program), the Commission on Higher Education, Ministry of Education and the Center of Excellence in Clinical Virology, Chulalongkorn University for their generous support and the staff of the Department of Pathology, Samitivej Srinakharin hospital, Thailand for providing the samples. Finally, we also would like to thank Ms. Petra Hirsch for reviewing the manuscript.