Bio-monitoring Through Next Generation Sequencing


To undertake bio-monitoring of environmental water samples, it is important to understand the genetic diversity they contain through metagenomics. Current studies are limited by complex ecosystems, inefficient methodologies and unreliable genetic information. In this pilot study, we illustrate how next generation sequencing (NGS) can contribute to metagenomic sequencing, data assembly and characterization for 14 environmental water samples collected from different locations in New Zealand. The project uses custom-made tag adapters, each containing a unique 6 bp indexing tag that allows multiple samples to be pooled within a single lane of a flow cell. This multiplexing lowers the sequencing cost of metagenomics studies and increases both sample throughput and the amount of data available for assessing water quality. We will also demonstrate the downstream metagenomic analysis, in which tagged samples are assembled and analyzed using the SCIMM and PHYSCIMM algorithms, which cluster sequences in a k-means-like fashion, before MEGAN is used to assign the data to taxa on a phylogenetic tree. This pilot study will also serve as the foundation for a water metagenomics database for ongoing water quality assessment.

Recent developments in DNA sequencing, known collectively as next generation sequencing (NGS), have gained a firm foothold in genomics research. The amount of data generated by these platforms has enabled many researchers to answer breakthrough questions in a timely manner; one example is the Human Microbiome Project, which has uncovered many new microbial communities associated with the human body1.

Environmental habitats are very complex and full of microorganisms that previously could be detected by sampling but never fully characterized or understood2. Building a comprehensive picture of the diversity within microbial communities is a major challenge for many researchers. Complicated methodologies, high sequencing costs and unreliable, complex data are some of the roadblocks for metagenomics research2. Recently, the field has gained momentum through the adoption of NGS technologies, whose high-capacity hardware and software systems researchers can use to analyze environmental samples3. NGS platforms generate far more information than was achievable with earlier technologies, and this has greatly accelerated metagenomics studies3.

New Zealand provides a unique opportunity for environmental studies because of its extensive agricultural industry. Waste from domesticated animals, effluent from milking sheds and fertilizer run-off combine to create nutrient-rich environments for the microbial communities present in river water adjacent to farmland. These microbes can have a significant impact on livestock and human populations, and each year the Ministry of Health invests considerably in monitoring water quality in New Zealand. Microorganisms such as Giardia lamblia, Cryptosporidium parvum, Campylobacter jejuni, Salmonella enterica and Escherichia coli are examples of organisms that cause infectious zoonotic diseases in New Zealand4,5,6,7,8.

With the introduction and continual expansion of NGS platforms, new approaches such as bio-monitoring by metagenomic sequencing have become feasible. Illumina's current 'TruSeq' chemistry, using a 2 x 150 bp paired end run on the 'HiSeq2000' instrument, is capable of generating more than 600 Gb of data with 5-6 billion reads, far surpassing the Illumina GAIIx of two years ago9. Sequencing has therefore become affordable for environmental studies, especially with the use of multiplexed sequencing assays9,10. Multiple samples can now be tagged separately and then pooled into a single lane of a flow cell, drastically reducing the time and cost of sequencing and allowing higher throughput.
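
As a rough illustration of why pooling lowers cost, the sketch below splits one lane's cost and read output evenly across the pooled samples. The lane cost and read count are hypothetical placeholders, not figures from Illumina or from this study, and the even split is an assumption: real pools are rarely perfectly balanced.

```python
from typing import Tuple

# Hypothetical figures for illustration only; substitute real quotes and run yields.
LANE_COST = 3000.0        # assumed cost of sequencing one flow-cell lane (arbitrary currency units)
LANE_READS = 150_000_000  # assumed number of reads passing filter in one lane


def per_sample_share(n_samples: int,
                     lane_cost: float = LANE_COST,
                     lane_reads: int = LANE_READS) -> Tuple[float, int]:
    """Return (cost per sample, expected reads per sample) for an evenly balanced pool."""
    if n_samples < 1:
        raise ValueError("a lane must contain at least one sample")
    return lane_cost / n_samples, lane_reads // n_samples


for n in (1, 7, 14):
    cost, reads = per_sample_share(n)
    print(f"{n:>2} samples per lane: {cost:,.2f} per sample, ~{reads:,} reads each")
```

The same division shows the trade-off: every extra sample in the pool lowers the cost per sample but also the sequencing depth available to it, so the number of samples per lane has to be balanced against the coverage needed to detect rare organisms.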

In this study, DNA extracted from water collected at selected sites in the North and South Islands will be tagged, pooled and run on an Illumina NGS platform. The sequence data will then be analyzed quantitatively and qualitatively for comparative metagenomics, and the resulting database will be stored at the Institute of Veterinary, Animal and Biomedical Sciences (IVABS), Massey University, for future reference. The cost-effectiveness of implementing NGS for bio-monitoring of water in New Zealand will be explored in this pilot study.


A New Vision: Environmental bio-monitoring through Next Generation Sequencing.


To collect water samples across New Zealand from previously identified high risk areas.

Collection of five 1-litre 'grab' water samples from the North Island and four from the South Island by Anthony Pita of the Protozoa Research Unit (PRU), IVABS, Massey University.

Initial on-site filtration of 100 litres of water.

To filter the water samples.

Filtration of 100 ml and 900 ml portions of the 'grab' samples through 0.45 and 0.22 µm filters.

Filtration of four additional concentrated water samples through 0.45 and 0.22 µm filters.

To isolate and extract genomic DNA.

Trial and optimization of DNA extraction kits using filtered water from the 'Duck Pond' at Massey University.

Extraction of genomic DNA from the filters.

To use custom-made indexing adapter tags for multiplexed sequencing.

To prepare and perform NGS on an Illumina platform.

Preparation of genomic DNA libraries using custom indexing adapter tags.

Quantification and quality check of the prepared libraries before submission.

Submission of metagenomics samples for high throughput sequencing.

To analyze the Illumina NGS data (bioinformatics).

Quantitative and qualitative determination of the organisms in the metagenomic samples.

Comparative analyses of samples from different collection sites.

Comparative studies of metagenomic samples from different seasons (only if time permits).

Research Design/Methodology

Collection of water samples - 1-litre 'grab' water samples were collected from the following sites:

(North Island)

Oroua River (Manawatu - Wanganui)

Hicks Road Spring (Maungatautari, Waikato)

Waikato River (Hamilton exit)

Waikato River (Tuakau exit)

Lower Huia Dam (Auckland)

(South Island)

Ashley River (Rangiora)

North West Christchurch Aquifer Well (Christchurch)

Seadown well (Timaru)

Pareora Water Supply (South Canterbury)

For this metagenomics study, 1 litre of water was collected from each site, couriered to Massey University in a chilly bin and filtered at the earliest opportunity, usually within 48 hours of collection. In addition, 100 litres of water was filtered on site through a 1 µm pore size filter. Microorganisms were then extracted from this filter at the PRU, Massey University, using a Seward Stomacher 3500 series laboratory paddle blender, within 72 hours of filtering. 100 ml of this concentrated sample was provided for additional comparative studies.

Filtration process - Filters of two different pore sizes (0.45 and 0.22 µm) were used. Water samples were filtered as soon as possible using a filtration apparatus kit from Sigma Aldrich, to prevent possible degradation of organisms in the samples11. The filtration apparatus was autoclaved between samples to prevent cross contamination, and the filters were then stored at -80°C.

Extraction of genomic DNA - Preliminary extractions of DNA from filters were trialled on E. coli cultures and 'Duck Pond' water, using the MagNA Lyser instrument (Roche) and the Epicentre Metagenomics kit, to determine the best extraction parameters. Good recovery of microorganisms from the filter paper is essential for a successful extraction. For bead beating in the MagNA Lyser, the filters are combined with ceramic beads in a tube. The oscillation of the instrument agitates the contents of the tube, rapidly washing the microorganisms off the filters and disrupting the cells as they collide with the beads12. Alternatively, filters were washed using the Epicentre Metagenomics kit 'filter wash buffer', which contains lysozyme and 0.2% Tween 20.

Custom index tag adapter - Sequencing multiple fragment libraries pooled from different samples is now possible with the introduction of indexing tags for multiplexed sequencing10. This technology allows as many as 96 libraries to be pooled in a single flow cell10. For this project, custom indexing tags adding an extra 6 bp tag were designed according to the Illumina specification and synthesized by Sigma Aldrich with a phosphorothioate bond. The phosphorothioate bond protects the adapter's 3' end from exonuclease activity, so that the adapter remains intact for ligation and for hybridisation to the flow cell during cluster generation13. Using custom indexing tags eliminates the extra 7 indexing cycles of the Illumina NGS run, making sequencing more cost-effective and yielding higher-throughput data14. It also allows more samples to be pooled into one lane of a flow cell, further increasing throughput.
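
After sequencing, the pooled reads have to be sorted back to their samples. Assuming, as the removal of the separate 7-cycle index read suggests, that the 6 bp tag is read as the first six bases of each read, a minimal demultiplexing sketch could look like the following. The sample names and tag sequences are hypothetical. It also checks the minimum pairwise Hamming distance between tags, because tolerating one sequencing error per tag is only unambiguous when every pair of tags differs at three or more positions.

```python
from typing import Dict, Optional, Tuple

TAG_LEN = 6  # assumed: the tag occupies the first 6 bases of each read

# Hypothetical sample-to-tag table; the real tag sequences are not given in this proposal.
SAMPLE_TAGS: Dict[str, str] = {
    "Oroua_River": "ACGTAC",
    "Waikato_Hamilton": "TGCATG",
    "Ashley_River": "GATCGA",
}


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags: Dict[str, str]) -> int:
    """Smallest Hamming distance between any two tags; >= 3 lets one mismatch be tolerated safely."""
    seqs = list(tags.values())
    return min(hamming(a, b) for i, a in enumerate(seqs) for b in seqs[i + 1:])


def demultiplex_read(seq: str,
                     tags: Dict[str, str] = SAMPLE_TAGS,
                     max_mismatches: int = 1) -> Tuple[Optional[str], str]:
    """Assign a read to a sample by its leading tag.

    Returns (sample name, read with the tag trimmed off), or (None, original read)
    when the read is too short, matches no tag, or matches more than one tag.
    """
    if len(seq) < TAG_LEN:
        return None, seq
    prefix = seq[:TAG_LEN]
    hits = [name for name, tag in tags.items() if hamming(prefix, tag) <= max_mismatches]
    if len(hits) != 1:
        return None, seq
    return hits[0], seq[TAG_LEN:]


print(min_pairwise_distance(SAMPLE_TAGS))
print(demultiplex_read("ACGTAGTTGCA"))  # one mismatch in the tag, still assigned
```

Reads that cannot be assigned unambiguously are set aside rather than guessed, which limits cross-sample contamination in the per-sample data.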

Multiplex de novo sequencing - An Illumina NGS platform such as the GAIIx or HiSeq 2000 was proposed for sequencing the metagenomics samples. The main objectives of the metagenomic sequencing are to extract as much genetic information as possible from the environmental samples and to obtain high-coverage data for determining the composition of the microbial community in the water samples. If possible, sequencing will be a paired end (PE) run, either 2 x 100 bp or 2 x 150 bp. Paired end sequencing uses the same standard DNA library preparation as single-read sequencing, but with paired end adapters, so that both the forward and reverse strands of each cluster are read during the paired end cycle14. Combining paired end runs with a multiplexing assay provides greater coverage of the metagenomics samples and generates higher quality, more readily alignable data14. A typical 2 x 150 bp paired end run on the HiSeq 2000 today can easily generate more than 200 million reads and up to 6.5 Gb of data per day15.
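
To judge whether a planned run gives "high coverage" for a particular organism, a common back-of-the-envelope estimate is the mean depth C = (reads x read length x relative abundance) / genome size, together with the Lander-Waterman expectation that a fraction 1 - e^(-C) of that genome is hit by at least one read. The sketch below assumes reads are sampled in proportion to each organism's share of the DNA in the library, which ignores extraction and GC biases; all input values in the example are hypothetical.

```python
import math


def expected_coverage(read_pairs: int, read_len: int,
                      rel_abundance: float, genome_size: int) -> float:
    """Mean sequencing depth for one community member.

    Assumes paired-end reads (two reads per pair) drawn in proportion to the
    organism's share of the DNA in the library.
    """
    bases_from_organism = read_pairs * 2 * read_len * rel_abundance
    return bases_from_organism / genome_size


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman expectation of the genome fraction hit by at least one read."""
    return 1.0 - math.exp(-coverage)


# Hypothetical example: 10 million 2 x 150 bp read pairs for one sample,
# and an organism making up 0.1% of the DNA with a ~5 Mb genome.
c = expected_coverage(10_000_000, 150, 0.001, 5_000_000)
print(f"mean depth {c:.2f}x, about {fraction_covered(c):.0%} of the genome covered")
```

Under these assumptions, low-abundance organisms may be only partially covered even in a large run, which is one reason to weigh read length and the number of pooled samples against the coverage the study needs.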

Analysis of Illumina NGS data - For the bioinformatics, the data will be clustered using a Markov model-based method16. For this project, the data will be assembled based on the estimated coverage of the planned 2 x 150 bp run. Next, data parameters such as sequencing coverage, insert length, error rate and quality score are assessed and optimized using the SolexaQA software17,18. The downstream analysis will use two primary methods: the unsupervised Sequence Clustering with Interpolated Markov Models (SCIMM) and the semi-supervised PHYSCIMM, which combines SCIMM with the Phymm classifier16,19. Both algorithms are effective for sequence analysis because they exploit the fairly constant sequence composition, or genome signature, within a genome16,20. Sequences are partitioned into clusters in a k-means-like manner, and each sequence is then reassigned to the cluster under which it has the maximum likelihood score, iterating until the final clusters are identified16,21. For the taxonomic classification of sequences, the MEtaGenome ANalyzer (MEGAN) software is used22. The software can process large metagenomic datasets, assigning reads to nodes of a phylogenetic taxonomy based on their similarity to known sequences22,23,24. However, data analysis is not limited to the methods above, as other clustering methods will also be surveyed in the future. The final objective of the project is to enter the results into a database created by Dr Patrick Biggs for future reference and comparison.
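
Quality-based read trimming is one of the parameters assessed before clustering. The sketch below illustrates one common approach, keeping the longest contiguous stretch of bases that meet a quality threshold, similar in spirit to SolexaQA's dynamic trimming; it is not SolexaQA's own code. It assumes Phred+33 quality encoding (older Illumina pipelines used an offset of 64) and an illustrative cutoff of Q13, which corresponds to an error probability of about 5% since P = 10^(-Q/10).

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (offset 33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def trim_longest_good_segment(seq: str, qual: str,
                              min_q: int = 13, offset: int = 33) -> Tuple[str, str]:
    """Keep the longest run of consecutive bases with Phred quality >= min_q.

    Returns the trimmed sequence and quality strings; both are empty if no base passes.
    """
    scores = phred_scores(qual, offset)
    best_start, best_len = 0, 0
    run_start = 0
    # A sentinel score below the cutoff closes the final run.
    for i, q in enumerate(scores + [min_q - 1]):
        if q < min_q:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = i + 1
    end = best_start + best_len
    return seq[best_start:end], qual[best_start:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(trim_longest_good_segment("ACGTACGTAC", "II#IIIII#I"))
```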

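SCIMM itself trains an interpolated Markov model for each cluster and reassigns sequences by likelihood, which is too involved for a short example. To show the underlying "genome signature" idea, the sketch below swaps in a simpler technique: each sequence is reduced to its tetranucleotide frequency vector, and the vectors are grouped with plain k-means (farthest-first initialization, squared Euclidean distance). The toy sequences are invented for illustration, and this is not the SCIMM or PHYSCIMM algorithm.

```python
from itertools import product
from typing import List

K = 4  # tetranucleotide signature
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}

Vector = List[float]


def signature(seq: str) -> Vector:
    """Relative frequency of each of the 256 tetranucleotides; k-mers with other bases are skipped."""
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        j = KMER_INDEX.get(seq[i:i + K])
        if j is not None:
            counts[j] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], n_clusters: int, iters: int = 20) -> List[int]:
    """Plain k-means; returns a cluster label for each vector."""
    # Farthest-first initialisation: start from the first vector, then repeatedly
    # add the vector farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < n_clusters:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(n_clusters), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy reads: two GC-rich and two AT-rich sequences.
reads = ["GCGCGGCC" * 8, "GCGCGGCC" * 6, "ATATAATT" * 8, "ATATAATT" * 6]
print(kmeans([signature(r) for r in reads], n_clusters=2))
```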

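MEGAN's central idea is to place each read on the lowest common ancestor (LCA) of the taxa it matches well, so that reads hitting several related species are assigned conservatively to a higher rank. The sketch below assumes the reads have already been compared against a reference database (for example with BLAST) and that each hit is reduced to a (taxon, bit score) pair. The min-score and top-percent filters mirror the kind of thresholds MEGAN exposes, but the values here are illustrative, and the taxonomy is a small hand-written fragment with intermediate ranks omitted, not the full NCBI taxonomy MEGAN uses.

```python
from typing import Dict, List, Optional, Tuple

# Simplified taxonomy fragment (child -> parent); intermediate ranks are omitted.
PARENT: Dict[str, str] = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Gammaproteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Campylobacter jejuni": "Campylobacter",
    "Campylobacter": "Epsilonproteobacteria",
    "Epsilonproteobacteria": "Proteobacteria",
    "Proteobacteria": "Bacteria",
}


def lineage(taxon: str) -> List[str]:
    """Path from a taxon up to the root of the taxonomy."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa: List[str]) -> Optional[str]:
    """Deepest node shared by the lineages of all given taxa."""
    paths = [lineage(t) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up the first lineage, the first shared node is the lowest one.
    return next((node for node in paths[0] if node in shared), None)


def assign_taxon(hits: List[Tuple[str, float]],
                 min_score: float = 50.0,
                 top_percent: float = 10.0) -> Optional[str]:
    """Assign a read to the LCA of its good hits, or None if no hit passes min_score."""
    good = [(taxon, score) for taxon, score in hits if score >= min_score]
    if not good:
        return None
    best = max(score for _, score in good)
    kept = [taxon for taxon, score in good if score >= best * (1 - top_percent / 100)]
    return lowest_common_ancestor(kept)


print(assign_taxon([("Escherichia coli", 200.0), ("Salmonella enterica", 195.0)]))
```

With the top-percent filter, a read whose best hit clearly outscores the rest stays at species level, while near-equal hits to several genera push it up to their shared family or phylum.
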
The outcomes of this research will contribute to an improved protocol for water quality testing in New Zealand. The current water screening tests conducted by the Ministry of Health or local councils detect only certain species of microbes. Hence, this research proposal aims to find an affordable method for detecting a wider range of microbes, which may help in the early detection of disease outbreaks and will advance the study of the dynamics of water ecosystems.

Only about 1% of the species in environmental habitats are known25. Understanding metagenomics and biodiversity is valuable for protecting New Zealand's ecological systems. The results of this research project will help us better understand how biodiversity works, and may provide insight into unknown gene sequences or molecules that could be useful for future gene therapies, especially those addressing antibiotic resistance26.

The bioinformatics database created in this project will also be used for future assessments and for potential water metagenomics research. The project could therefore significantly benefit environmental studies across New Zealand, particularly in large agricultural sectors such as farming, where environmental impact plays a crucial role in quality control.

1 - Water collection, 2 - Water filtration, 3 - DNA extraction, 4 - Custom indexing tag, 5 - Multiplexed sequencing and 6 - Bioinformatics analyses

The arrows show the estimated time to complete each aim of this metagenomics study. NGS is expected to be completed before July 2011, and the downstream analysis will be undertaken in August 2011.

Section C

Ethical/Regulatory Approval

Ethical/GTC permission not required.

No ethical or regulatory approval is required for the project, as it does not involve human or animal test subjects or any genetically modified organisms. However, the project may need a permit to export genomic DNA samples for sequencing overseas.