The rapid advances in genomics have been driven by the development of novel genome sequencing technologies. Notably, the invention of second-generation sequencing gave scientists the throughput and cost-efficiency needed to sequence thousands of genomes, which had previously been deemed infeasible. More recently, what can be considered a third generation has emerged. This third generation allows amplification-free reading of DNA molecules in long consecutive stretches. It is dominated by two methods:
- Nanopore sequencing, commercialized by Oxford Nanopore Technologies (ONT)
- Single-molecule real-time (SMRT) sequencing
Nanopore sequencing is conceptually the simpler of the two methods. A single small pore is inserted into an insulating membrane across which an electrical potential is applied. A DNA strand is pulled through the pore, and the sequence is inferred from the changes in current caused by the passing combinations of bases. David Deamer made a rough sketch of this concept in 1989, and its implementation took almost two decades.
Principle Behind Nanopore Sequencing
MinION nanopore sequencing rests on a very basic principle: strands of DNA are driven through a nanopore electrophoretically.
The nanopore, a microscopic opening, sits in an insulating membrane that separates two compartments filled with saline solution, and an electric potential is applied across the membrane. Single-stranded DNA fragments are added to one of the compartments, where they are captured by the electric field and threaded through the pore. The way in which the bases influence the current through the nanopore is measured, and these measurements can be decoded to retrieve the DNA sequence.
Two kinds of pores
a) Biological pore
b) Solid state pore
Charge and Structure of the Nanopore
“The structural property which makes the biological pore suitable for DNA sequencing is a constriction site at which the passing strand exerts the most influence on the electrical current”. The length of this constriction largely determines how many bases influence the current at once, that is, the number of bases that are “read” concurrently at a given time. This number should be low enough for the current produced by each different combination of bases to be identified, yet high enough to give overlap between successive combinations of bases. This overlap is an advantage during basecalling, because it allows each base to be read several times.
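The trade-off described above can be made concrete with a toy calculation. The sketch below is illustrative only: it assumes an idealised pore in which exactly k consecutive bases sit in the constriction at each step and the strand advances one base at a time. The function names are hypothetical and no real current levels are modelled.

```python
def kmers_in_pore(sequence: str, k: int) -> list[str]:
    """Return the successive k-base windows that occupy the constriction."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def times_each_base_is_read(sequence: str, k: int) -> list[int]:
    """Count, for every base, how many k-base windows contain it."""
    counts = [0] * len(sequence)
    for start in range(len(sequence) - k + 1):
        for offset in range(k):
            counts[start + offset] += 1
    return counts


def distinct_signal_levels(k: int) -> int:
    """Number of base combinations the basecaller must tell apart (4 bases)."""
    return 4 ** k


if __name__ == "__main__":
    strand = "GATTACA"  # toy strand, not real data
    print(kmers_in_pore(strand, 3))
    print(times_each_base_is_read(strand, 3))
    print({k: distinct_signal_levels(k) for k in (1, 3, 5)})
```

A larger k means that interior bases are read more often, but the number of combinations to distinguish grows as 4 to the power k, which is the tension the paragraph describes.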
To initiate sequencing, the DNA strand first needs to move towards one side of the pore, called the cis side. There it is captured by the electric field, threaded through the pore, and emerges on the other side, called the trans side.
Two forces are taken into consideration here. The electrophoretic force is induced by the positive electric potential applied at the trans side, which attracts the negatively charged DNA and drags it in. Positive particles move in the opposite direction as the negative particles leave the cis side. The capture of DNA strands is strengthened by the formation of a positively charged zone around the cis entrance of the pore. The electro-osmotic force is induced by the net flow of water and ions through the pore, which is influenced by strand translocation.
Nanopore Sequencing Advantages
- Target molecules can be detected at very low concentrations.
- Biomarkers or genes can be screened.
- Analysis can be provided at low cost and high speed.
Because nanopore sequencing produces results quickly, it can be used as a powerful diagnostic tool for identifying infectious agents. Its advantages in read length and in the time taken to identify a pathogen are particularly valuable in a hospital environment.
Lambda phage DNA Sequencing with MinION Nanopore Technology
The MinION is a sequencer small enough to fit into a pocket. It has been developed by Oxford Nanopore. The MinION's main goal is to read the bases of DNA in real time using the nanopore sequencing methodology.
The main goal of sequencing lambda phage DNA with the MinION is to show that it can produce long reads that are accurate enough to be aligned back to their reference genome. This is helpful for sequencing any genome in real time and obtaining results of reasonable accuracy. The accuracy of the MinION sequencer is generally about 93%. Its portability, affordability, fast data production and ability to produce long reads make it suitable for use in real-time settings.
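To put the ~93% figure in context, it can be converted to the Phred scale used in FASTQ quality strings, Q = -10 log10(p), where p is the per-base error probability. The following is a minimal sketch, assuming errors are spread uniformly and independently along a read (a simplification) and using a hypothetical 10,000-base read length that does not come from the run described here.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred quality score."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of miscalled bases in a read, assuming uniform errors."""
    return read_length * (1.0 - accuracy)


if __name__ == "__main__":
    accuracy = 0.93          # figure quoted in the text
    read_length = 10_000     # hypothetical read length, for illustration only
    print(f"Phred score: {phred_from_accuracy(accuracy):.1f}")
    print(f"Expected errors: {expected_errors(read_length, accuracy):.0f}")
```

Under these assumptions, 93% accuracy corresponds to a Phred score of about 11.5, or roughly 700 expected errors in a 10,000-base read. Reads with this error rate can still be aligned because they are long.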
HiSeq vs MinION
The Illumina sequencer is a second-generation DNA sequencer that produces data on a high-throughput sequencing platform. It produces short reads which are highly accurate. The drawbacks of the Illumina sequencer are that it is not portable and not cost-effective. It needs a laboratory to run, so it cannot be used in real-time applications. The MinION, in contrast, produces very long reads of up to a few megabases. It is cost-effective and portable, being about the size of a cell phone, and can be used in real-time applications for clinical diagnosis and pathogen surveillance. It does not need a laboratory bench to prepare the library.
The Illumina sequencer produces genome assemblies at a low price, but its ability to resolve long repeats from short reads is very limited. The opposite is true of the MinION, whose long reads can span repeated sequences.
Getting started with the MinION sequencer
To get started with the MinION sequencer, a computer with a good configuration is needed, and a dummy flow cell is used to check the hardware and software setup by running a data exchange.
After checking that the dummy flow cell is working, we open the MinKNOW GUI from its icon on the desktop and establish a connection remotely.
We then enter the flow cell ID and the sample ID being used. Here we are sequencing a small viral genome, that of lambda phage, and then comparing the result with the reference.
To carry out the run, which is also called the MinION burn-in, we need to set up all the required reagents and materials.
Before the genetic sample is loaded onto the sequencer, it must undergo a few processing steps. This is called library preparation, because the long strands of DNA are broken into a library of DNA fragments with special sequences attached to their ends.
Here we used a ready-made kit to prepare the library. A ready-made kit is helpful for performing MinION nanopore sequencing in real-time settings, where a laboratory for preparing libraries may not be available.
Real time sequencing
With the library prepared, we load the sample into the MinION, which is connected to a computer or laptop for sequencing. The prepared DNA is loaded into the flow cell, where it flows over a membrane spotted with nanopores. We must make sure that no air bubbles flow across the membrane. As the DNA bases pass through the pores, their electrical signatures are read by the electronics within the flow cell.
We can visualize how each pore is functioning, so the number of active pores in the flow cell is easy to see. The incoming data can be viewed as soon as sequencing starts. The results are produced in the form of FASTQ files.
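For readers unfamiliar with the format, a FASTQ file stores each read as four lines: a header starting with `@`, the bases, a separator starting with `+`, and a quality string of the same length as the bases. The sketch below is a simplified reader, assuming unwrapped four-line records and the common Phred+33 quality encoding. The names `FastqRecord`, `parse_fastq` and `mean_phred` are hypothetical helpers, not part of MinKNOW or Galaxy.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from an iterable of FASTQ lines (four lines per read)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(stripped, None)
        separator = next(stripped, None)
        quality = next(stripped, None)
        if quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(record: FastqRecord) -> float:
    """Mean base quality of a read, assuming Phred+33 encoding."""
    return sum(ord(c) - 33 for c in record.quality) / len(record.quality)


if __name__ == "__main__":
    toy = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]  # toy data, not real output
    for rec in parse_fastq(toy):
        print(rec.name, len(rec.sequence), mean_phred(rec))
```

With a real file, `parse_fastq(open("reads.fastq"))` would stream the records, where `reads.fastq` is a placeholder file name.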
The results can now be analyzed using the Galaxy platform. IGV is used as visualization software to view the reads generated by the sequencer.
Below are the steps performed to analyze the MinION nanopore sequencer output files.
2. Using Galaxy with your own login, perform the following analysis:
a) Map the raw sequence reads provided to you (use 5 FASTQ files) to the lambda DNA reference sequence.
The Galaxy platform is used to analyze the output files obtained from MinION nanopore sequencing. It is an open-source platform on which data from biomedical research can be analyzed. Its aim is to make computational biology accessible to scientists and researchers who do not have a computational background. It performs all the tasks necessary to create a bioinformatics workflow.
Here we are using Galaxy to analyze the raw sequence reads, stored as FASTQ files, generated by the MinION sequencing run. The analysis involves mapping these raw reads to the lambda reference sequence, which is downloaded from NCBI GenBank (Nucleotide) in FASTA format. The accession number for Enterobacteria phage lambda, complete genome, is NC_001416.1.
Figure 1 Galaxy platform home page
In this step we upload the FASTQ files obtained from the lambda phage sequencing. The reference file is uploaded along with the output files so that the reads can be checked against it.
Figure 2 FASTQ and reference files are uploaded into the Galaxy platform
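Before uploading, the downloaded reference can be sanity-checked locally. The sketch below is a minimal FASTA reader, assuming a plain-text file whose header lines start with `>`. The function name `read_fasta` and the toy bases are illustrative only and are not the lambda sequence.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence id: sequence} from an iterable of FASTA lines."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The id is the first word of the header, e.g. an accession number.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            parts[name].append(line.upper())
    return {key: "".join(chunks) for key, chunks in parts.items()}


if __name__ == "__main__":
    toy = [">NC_001416.1 toy example\n", "ACGT\n", "ttga\n"]  # not real data
    for seq_id, seq in read_fasta(toy).items():
        print(seq_id, len(seq))
```

For the real lambda reference, the reported length would be expected to be 48,502 bases, which is consistent with the region 0 to 48,501 shown later in the visualization window.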
Custom Reference Genome
A custom reference genome is a reference that holds the nucleotide sequences of the scaffolds, transcripts or chromosomes of a species and identifies a genome build. The reference genome in FASTA format is taken automatically from the file uploaded earlier. To assign our reference genome, I specify the build name as Lambda_Enterobacteria and the build key as Lambda_Reference, and select the reference dataset.
Figure 3 Custom Builds to specify the name of the build and the build key.
Concatenation of FASTQ files
The datasets are concatenated so that all the FASTQ files are placed in a single entry, giving one file that contains all 5 sets of raw sequence reads. Clicking on the Text Manipulation section displays a number of tools, among them Concatenate datasets tail-to-head (cat), which joins all the selected datasets into one. When selecting the files for concatenation, we need to make sure that we select only the raw sequence FASTQ files and not the reference genome file.
Figure 4 Concatenation of datasets from tail to head
The output of Concatenate datasets tail-to-head is shown below. It is one large dataset that combines all 5 FASTQ files.
Figure 5 Output page of the concatenated datasets
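Outside Galaxy, the same tail-to-head concatenation can be sketched in a few lines. This is an illustrative equivalent, not the code Galaxy runs. The function name `concatenate_fastq` and the suffix check are assumptions added here to mirror the warning about not including the reference file.

```python
from pathlib import Path
from typing import Iterable

FASTQ_SUFFIXES = {".fastq", ".fq"}


def concatenate_fastq(inputs: Iterable[str], output: str) -> int:
    """Append each input FASTQ file to `output` in order; return bytes written."""
    paths = [Path(name) for name in inputs]
    # Refuse anything that is not a FASTQ file, e.g. the FASTA reference.
    for path in paths:
        if path.suffix.lower() not in FASTQ_SUFFIXES:
            raise ValueError(f"{path.name} does not look like a FASTQ file")
    written = 0
    with open(output, "wb") as out:
        for path in paths:
            data = path.read_bytes()
            # Make sure the last record of one file cannot fuse with the
            # first record of the next.
            if data and not data.endswith(b"\n"):
                data += b"\n"
            out.write(data)
            written += len(data)
    return written
```

A call such as `concatenate_fastq(["run1.fastq", "run2.fastq"], "all_reads.fastq")` would produce the single combined file, where the file names are placeholders.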
BWA MEM Mapping
BWA-MEM is an algorithm for aligning sequence reads, or longer query sequences, against a reference genome. BWA (Burrows-Wheeler Aligner) is a software package for mapping low-divergence sequences against a long reference genome. BWA-MEM performs chimeric alignment and supports paired-end reads. It chooses automatically between end-to-end and local alignments. It is robust to sequencing errors and is applicable to a wide range of sequence lengths, from short reads to sequences of megabase scale.
In our context we align the concatenated FASTQ file against the reference genome. The tool takes FASTQ files as input and produces its output in BAM format. This BAM file is then used by various downstream utilities.
Figure 6 Performing NGS mapping with BWA-MEM for medium and long reads against the reference genome.
The output file is generated from two input files, the concatenated dataset and the reference file. The mapped reads are in BAM format.
Figure 7 Results of the Mapping
The output BAM and BAI files can be downloaded here.
By clicking on this we can visualize the aligned sequence.
Figure 8 Final mapped file, from which the BAI and BAM datasets can be downloaded for further analysis.
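For comparison, a roughly equivalent mapping on the command line can be sketched as follows. This is an assumption about how the step could be reproduced locally with `bwa` and `samtools`, not a record of the parameters the Galaxy tool used. In particular, the `-x ont2d` preset is chosen here because the reads are nanopore reads, and the file names are placeholders. The helper only builds the command strings, so it can be inspected without either program being installed.

```python
import shlex
from typing import List


def mapping_commands(reference: str, reads: str, prefix: str = "lambda_mapped") -> List[str]:
    """Build shell commands to index, map, sort and index (assumed settings)."""
    ref = shlex.quote(reference)
    fastq = shlex.quote(reads)
    bam = shlex.quote(f"{prefix}.bam")
    return [
        # Build the BWA index of the reference genome.
        f"bwa index {ref}",
        # Map the reads and sort the alignments into a BAM file.
        f"bwa mem -x ont2d {ref} {fastq} | samtools sort -o {bam} -",
        # Create the BAI index that genome browsers need alongside the BAM.
        f"samtools index {bam}",
    ]


if __name__ == "__main__":
    for command in mapping_commands("lambda_reference.fasta", "all_reads.fastq"):
        print(command)
```

The last command writes the index next to the BAM file, which corresponds to the BAM and BAI pair downloaded from Galaxy.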
Trackster is a visualization tool for large datasets. It is a fast and highly interactive visualization browser.
Figure 9 Visualization of the files using Trackster
After clicking on Trackster, a new window pops up asking whether to visualize the dataset in a new track or in an already existing one. It is shown below.
Figure 10 Window for choosing to view the dataset in a new visualization track
Galaxy has a very good integrated visualizer, which provides tools for exploring the data and dynamic filters that help in understanding the analyzed data. In the visualization window we can view the genomic region from 0 to 48,501. For a more specific selection, we can specify the chromosomal location by giving the start and end positions of the region. When choosing to visualize the dataset in a new visualization track, we have to specify the name of the browser and the reference genome build.
Figure 11 Naming the browser for the new visualization window.
Figure 12 The aligned sequences are visualized
The workflow of the above process is listed step by step in the workflow window. It contains every single step we used to process the data. After extracting the workflow, it can be given a name, here Lambda DNA. This name helps in finding the workflow extracted from the history, and we can edit it whenever necessary.
Figure 13 The Workflow of the above applied steps
Overall Work Flow
The overall workflow of this project, with its active steps, is shown below in the form of a flow chart.
Figure 14 Workflow Canvas of the Lambda DNA reference sequence
Figure 15 Login for my Galaxy platform
Integrative Genomics Viewer (IGV)
The Integrative Genomics Viewer (IGV) is a high-performance, interactive tool for exploring large genomic datasets. It is a lightweight visualization tool that allows integrated genomic datasets to be explored together. It supports array-based and next-generation sequencing data, genomic annotations, mutations, copy number and methylation data. IGV uses efficient file formats, such as multi-resolution files, for real-time analysis of large datasets. Data can be loaded from remote or local sources.
IGV supports the visualization of diverse data types across many samples and the correlation of these integrated datasets with clinical and phenotypic variables. Sample annotations can easily be defined and linked to data tracks using a tab-delimited format.
Here we are using IGV to visualize the mapped reads obtained from the Galaxy platform.
1. Integrative Genomics Viewer
- Hengyun Lu, Francesca Giordano, Zemin Ning. Oxford Nanopore MinION Sequencing and Genome Assembly. National Centre of Gene Research, Chinese Academy of Sciences, Shanghai 200233, China; The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK. Received 9 March 2016, revised 7 May 2016, accepted 31 May 2016, available online 17 September 2016.