Pco Ant Algorithm For Dna Phylogenetic Inferencing Biology Essay

Published:

Abstract: We developed a new approach called PCO-Ant algorithm (Phylogenetic Construction and Optimization through Ant colony) for reconstruction of phylogenetic trees using ant colony optimization metaheuristics. A tree is constructed using a fully connected graph and the problem is approached similarly the well-known travelling salesman problem. This methodology was used to develop an algorithm for constructing phylogenetic tree using pheromone matrix. In the course of constructing phylogenetic tree, the ants travel in the digraph and update the pheromone on the paths it passed. At each step, ants choose the next vertex according to a certain probability depending on the pheromone and the heuristic information of the edge, and the digraph is first modified, and then the strong connected components of the updated digraph are computed which are used to construct phylogenetic trees. After getting a group of phylogenetic trees, we also use ant colony and its pheromone feedback system as a global optimization technique to derive the optimal topology of the phylogenetic tree from given object set. It has been observed that the implementation of the algorithm based on random DNA sequences give better results as compared to other tree construction methods. These results are very promising and suggest more efforts for further developments.

Lady using a tablet
Lady using a tablet

Professional

Essay Writers

Lady Using Tablet

Get your grade
or your money back

using our Essay Writing Service!

Essay Writing Service

Keywords: Phylogeny, Evolutionary computation, Ant colony optimization, Neighbor joining, Bioinformatics.

Introduction

Phylogenetics is the science which studies evolutionary relationship between species. Hypothesis about such relationship are expressed by phylogenetic tree based on the Darwinian principle of the natural evolution of the species. Phylogenetic tree construction is a challenging and widely studied problem in Bioinformatics. Other main reason is its NP-complete characteristic. Therefore, like others, it is still an open problem for research when dealing with the phylogeny of a large number of species.

The basic principle of tree construction is to infer the evolutionary process of taxa (biological entities such as genes, proteins, individuals, populations, species, or higher taxonomic units) from their molecular sequence data [1] [2]. A number of methods have been proposed for constructing phylogenetic trees. These methods can be divided into two types in terms of the type of data they use; distance matrix methods and character-state methods [3]. A distance matrix consists of a set of n(n - 1)/2 distance values for n taxa, whereas an array of character states (e.g. nucleotides of DNA sequences, residues of amino acid sequences, etc.) is used for the character-state methods. The unweighted pair-group method with arithmetic mean (UPGMA) [4] and the neighbor-joining (NJ) method [5] belong to the former, whereas the maximum parsimony (MP) method [6] and the maximum likelihood (ML) method [7] belong to the latter. In these methods, the ML method tries to make explicit and efficient use of all character-states based on stochastic models of those nucleotide sequences. Maximum likelihood in this specific case means that one intends to find the topology which yields the highest probability of producing (evolving) the observed data (alignment). The problems that arise within this context are how to compute the likelihood of a set of sequences placed in a given trees, how to optimize branch-lengths in order to obtain the maximum score (likelihood value) for that particular tree, and how to devise a probabilistic model of nucleotide substitution.

Phylogenetic tree constructing problem is NP-complete, and it is very similar to the well-known TSPs because of the combinatorial complexity of inferring the topology of the best tree, thus we develop a more efficient phylogenetic tree constructing and optimization algorithm called PCO-Ant Algorithm. This algorithm tested using randomly generated sequences. Using the same sequences, we also test the TSPs and NJ methods. In fact, PCO-Ant Algorithm not only provides a group of good initial tree topologies, but also has global optimization on these trees. It also produces an ensemble of trees of almost similar quality. The ensemble of high qualified solutions of PCO-Ant Algorithm allows experts to decide which topology is most likely since the quality criterion (the fitness value) used does not guarantee the optimal tree topology. This paper is organized into number of sections: II) Ant colony optimization III) Ant colony optimization based model IV) PCO- Ant Algorithm Overview V) Illustration with example 1 VI) Illustration with example followed by conclusion.

Ant Colony Optimization

Social insects that live in colonies, such as ants, termites, wasps, and bees, develop specific tasks according to their role in the colony. One of the main tasks is the search for food. Real ants, when searching for food, can find such resources without visual feedback (they are practically blind), and they can adapt to changes in the environment, optimizing the path between the nest and the food source. This fact is the result of stigmergy, which involves positive feedback, given by the continuous deposit of a chemical substance, known as pheromone.

Lady using a tablet
Lady using a tablet

Comprehensive

Writing Services

Lady Using Tablet

Plagiarism-free
Always on Time

Marked to Standard

Order Now

A classic example of the construction of a pheromone trail in the search for a shorter path is shown in fig. 1 and was first presented by [8]. In figure 1A there is a path between food and nest established by the ants. In figure 1B an obstacle is inserted in the path. Soon, ants spread to both sides of the obstacle, since there is no clear trail to follow (figure 1C). As the ants go around the obstacle and find the previous pheromone trail again, a new pheromone trail will be formed around the obstacle. This trail will be stronger in the shortest path than in the longest path, as shown in figure 1D.

As shown in [12], there are many differences between real ants and artificial ants, mainly: artificial ants have memory, they are completely blind and time is discrete. On the other hand, an ant colony system allows simulation of the behavior of real-world ant colonies, such as: artificial ants have preference for trails with larger amounts of pheromone, shorter paths have a stronger increment in pheromone, and there is an indirect communication system between ants, the pheromone trail, to find the best path.

Figure 1A. Ants in a pheromone trail between nest and food; B. an obstacle interrupts the trail; C. ants find two paths to go around the obstacle; D. a new pheromone trail is formed along the shorter path.

Ant colony Optimization Based Model

To define how ant colony optimization (ACO) is applied to the reconstruction of phylogenetic tree, a number of methods have been proposed by [9] [10] [11] [13]. In this paper, we used a fully connected graph, constructed using the distance matrix among species. In the graph, nodes represent the species and edges represent the evolutionary distance between species. Initially, ants start in a randomly selected node. Then, they travel across the structured graph, and at each node a transition function (Equation (i)) determines its direction. This equation represents the probability that the kth ant, being at node i, goes to node j in its next step;

Pk(i,j)= ----(i)

where Pk(i, j) is the probability of transition between node i and j, τ is the pheromone trail between two nodes, d(i, j) is the evolutionary distance between nodes i and j, Jik is the set of nodes connected to node i and already visited by the kth ant, and α and β are arbitrary constants.

Equation (i) is composed of two terms: the first is based on the evolutionary distance between species i and j, and the second is based on the accumulated experience - the pheromone trail. This trail is represented as a matrix (like that for the distance between species), whose values are dynamically changed by the algorithm, and determined according to the paths chosen by ants. Therefore, Ï„ (i,j) represents the attractiveness of node j, while the ant is at node i. Therefore, the objective of a given ant is to find a path in the graph that maximizes the transition probabilities, thus obtaining a sequence of species that produces the smallest evolutionary distance.

3.1 Heuristic Function

The heuristic function ηij or 1/ (equation (i)) is a problem dependent function that measures the "quality" of the edge (i, j) which connects the vertices i and j representing the two objects. Here the "quality" means the preference of the edge to be selected by the ants. Obviously, the less distance between the two connected objects, the more preferred the edge should have. Therefore, ηij should be associated with the distance between objects. So it is given by the following formula.

ηij = 1/d (object i, object j)

Different from pheromone τij, ηij is static and unidirectional heuristic information determined by the distance information.

3.2 Calculation of new evolutionary distance (i.e. Branch-length)

New evolutionary distance or branch length between two species is calculated by means of Equation (ii):

dnu(i,j)……(ii)

where u is a node that does not belong to the set of nodes connected to node i and already visited by the kth ant, dnu(i, j) is the distance between the new node n and node u, based on the previous distances between (i, u) and (u, j), d(i, u) is the distance between nodes i and u, and η is a scale constant that defines the distance between the new node n and its descendents i and j. This procedure is repeated until all nodes belong to the list of already visited nodes, and then a path is constructed. The score of this path is given by the sum of the transition probabilities of the adjacent nodes of the path. Paths constructed by the ants are used for updating the pheromone trail.

3.3 Pheromone Update

Lady using a tablet
Lady using a tablet

This Essay is

a Student's Work

Lady Using Tablet

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Examples of our work

3.3.1 Local Update

Based on the initial value, in the latter stages the ants will update this pheromone digraph for the construction and optimization of the phylogenetic trees. The pheromone trail matrix is updated according to Equation (iii)

-------------------(iii)

where ρ is the rate of evaporation of the pheromone, which reduces the persistence of the environment to the ants. In this system, the rate of increment of pheromone, Δτ(i, j), was modified to allow an increment proportional to all the obtained paths, given by the division of the current path and the best path, as shown in Equation (iv):

---(iv)

where k is the number of ants, c(t) is the path constructed by an ant up to time t, Sc(t) is the score of path c(t), and Sbest is the score of the best path found up to now. Using this procedure, ants travel through the graph, and at the end of a predefined number of cycles, it is possible to reconstruct the tree using the best path found.

3.3.2 Global Update

In the global updating period, is the increment of Ï„ij by sub-colony k, which is defined as follows:

--------(v)

Here, Treek denotes the phylogenetic tree constructed by sub-colonyk, fitness (sub-colonyk) is the fitness value of Treek. Once the topology and the branch lengths of Treek are determined, a new distance matrix can be deduced. By comparing this distance matrix Dk with the original distance matrix D (calculated from the given objects), a quality measurement showed in Equation (vi) can be assigned to the tree as its fitness value:

Fitness (sub-colonyk) = c. (1- )

---------------(vi)

The summation extends over all n(n-1)/2 distances between the n objects. If the distances are concentrated within a narrow scope, a high fitness value will be assigned to the tree, and if the reconstructed distance equal the original distances, the fitness will reach the highest value C. By global pheromone updating, the pheromone deposited on the edges of high fitness valued trees will be much higher than others.

3.4 Sensitive Parameter

Parameter α controls the exploration of the search space, by weighting the importance of the pheromone trail in the decision of an ant when it arrives at a branch. The algorithm is sensitive high values of this parameter, leading to a fast convergence to a local optimum.

Parameter β defines the relative importance of the distance between species in the transitions between nodes. In practice, we observed that it has to be higher than α. But values that are too high make the algorithm converge to a tree that groups species sequentially.

The pheromone trail evaporation is controlled by the parameter ρ, which is influenced by the number of ants (k) and the number of cycles. Experimentally, we observed that values higher than 0.8 do not allow convergence to the same tree, and values lower than 0.2 make the algorithm find trees with larger distances between branches. It is supposed that this is a consequence of the convergence to a local optimum at the beginning of the run. Regarding the number of ants (k), we found two distinct behaviors. When k is too low (say, k < 4), or too high (say, k > 10), a random behavior is observed in the resulting trees for repeated runs. For intermediary, but high values of k (say, 10 < k< 50), a well-defined tree can be obtained, but with distances greater than those obtained by other approaches. The range within which the best trees were obtained was 4 < k< 10, although we believe that this value may depend on other parameters. Future work will address this issue.

The evolutionary distance between an ancestor and two descendent species is controlled by parameter η. For our data set, we observed that the best tree was obtained using η = 0.5, meaning that the distance between the ancestor and the two descendents is the same.

3.5 Optimization of Phylogenetic Trees

To construct a tree from a set of species, one must have a metric to decide if a tree is better than another one. Many criteria have been proposed. But in general, they turn out to be NP-hard to optimize. There is still no consensus in the biology community on how to make a good tree.

One way to counteract this problem is to execute many different phylogenetic clustering methods resulting in various starting tree topologies. A choice from the generated trees gives rise to the best one. Another way to handle this problem is to use a global optimization technique to derive the optimal topology of the tree artificial ants in the same sub-colony try to construct an independent phylogenetic tree as a solution of the problem by their cooperation; and different sub-colonies construct different trees so as to maintain diversity of candidates. After optimizing these trees, the performance of these solutions is improved. After getting a group of phylogenetic trees, the ant colony and its pheromone feedback system act as a global optimization technique to derive the optimal topology of the phylogenetic tree.

The algorithm showed in this paper is tested using randomly generated sequences. Using the same sequences, we also test the TSP and NJ methods. In fact, PCO-Ant algorithm not only provides a group of good initial tree topologies, but also has global optimization on these trees.

PCO- Ant Algorithm Overview

Once the optimal path is found by ants, the procedure of the phylogeny construction is straightforward. The execution of the PCO-Ant Algorithm, gives a linear sequence of species and a measure of closeness between them, using the pheromone matrix. Using these elements, the phylogenetic tree can be constructed. In the course of constructing phylogenetic tree, the ants travel in the digraph and update the pheromone on the paths it passed. At each step, ants choose the next vertex according to a certain probability depending on the pheromone and the heuristic information of the edge, and the digraph is first modified and then the strong connected components of the updated digraph are computed which are used to construct phylogenetic trees. After getting a group of phylogenetic trees, we also use ant colony and its pheromone feedback system as a global optimization technique to derive the optimal topology of the phylogenetic tree. The overall algorithm for constructing the phylogenetic trees and from them, choose the best tree is given below:

Begin

1 Initialization

1.1 Initialize parameters: k, m, α, β, ρ η; (: k=No. of ants, m=No. of sub-colony, ρ= pheromone trail evaporation, η= evolutionary distance between an ancestor and two descendent species).

1.2 Initialize the pheromone digraph; //*By measuring the distance between the species.*//

1.3 For each ant in each sub-colony do

Chooses an initial object i to visit randomly;

End for

Iterative Process // Finding the optimal path

While (not termination) do

2.1 For each sub-colony do //* m sub-colonies*//

2.1.1 Set R denotes the current object set, the initial value of R equals the given object set of the problem and i is an initial object.

2.1.2 While (not termination) do

For each ant k in current sub- colony do // *m ants*//

While (allowedk not empty) do

// *Where allowedk is a set of the nodes can be chosen by the kth ant at node i for the next step and kth ant find (i, j) pair that have the largest transition probablity value*//

Select the next object j to visit;

//* The kth ant, begin at node i, goes to node j*//

Compute the probability function P; // *Transition probability*//

allowedk= allowedk-{j}

End while // *Here we get the (i, j) pair that have the largest transition probablity value*//

Reset allowedk= R

// *Where allowedk is a set of the nodes can be chosen by the kth ant at node j for the next step and kth ant find (j,l) pair that have the largest transition probablity value and step 2.1.2 will continue untill all connected paths are traversed and finally kth ant will reach to its initial position that iàjàlài*//

End for

Have local pheromone updating on each edge in the digraph according to the evolutionary distance between objects;

End while

End for // *All possible paths are found at last iteration*//

Obtain each phylogenetic tree constructed by each sub-colony (see tree construction method in algorithm)

End while

//*m sub-colony construct phylogenetic trees according to all possible paths found at last iteration. *//

3 Tree Constructions

While not (all species grouped)

3.1 Find (i, j) pair that have the largest value in the pheromone matrix

If (i or j) already grouped then change index by group index

Group (i, j) pair into a new species k

Compute the distance between current species and ancestor

Delete the value of (i,j) pair

End while

3.2 Obtain each phylogenetic tree constructed by each sub-colony

4 Tree Optimization // Global updation

While (not terminaton) do

4.1 Calculate evolutionary distance (branch length) of each tree

4.2 Calculate the fitness value of each sub-colony (tree) //*For finding the optimal tree. *//

4.3 Have global pheromone updating operation according to the fitness value of the constructed phylogenetic trees

End while

5 Output

Optimal tree found which has highest transition probability but lower evolutionary distance.

End

Illustration: Example 1

5.1 Discussion and Results

Step 1: Initialization

Step 1.1: Initialize parameters: k=8 (No. of ants), m=2 (No. of sub-colony), α=1, β=2, ρ=0.6 (pheromone trail evaporation), η= 0.5 (evolutionary distance between an ancestor and two descendent species).

Step 1.2: Initialize the pheromone digraph: Here, we consider three randomly generated DNA sequences with length 6.

Seq1: AACGTG, Seq2: TCGAAG, Seq3: ATCGAG

Creating the pheromone digraph (see figure 2) by measuring the distance between the species.

Here we calculate the distance by the formula is: Number of homologous character / Total number of characters.

Step 1.3: Here, we choose node 1# as an initial object.

Figure 2. Digraph for three sequences of example 1

Step 2: Finding the optimal path

Step 2.1: Transition probability for ants: travel across the structured graph, and at each node a transition function (equation (i)) determines its direction. This equation represents the probability that the kth ant, being at node i, goes to node j in its next step;

Pk(i,j)= --------------(i)

Iteration 1(Moving path by 1st ant)

Transition probability for Ist ant in path (1, 2): P1(1, 2)= 1/65 is calculated according to equation (i) [Where, , i=1, j=2, u=3]

Similarly, transition probability in path (1, 3): P1 (1, 3)=64/65

# Path 1à3 is selected, because transition probability of P1(1,2) < P1(1,3), and (1,3) pair have largest value in the pheromone matrix, so ant choose path (1,3).

Transition probability for path (3,2) where node 3# is an current object:

P1 (3, 2)=1/9 and # path 1à3à2 is formed.

Transition probability for path (2, 1), P1 (2,1)=1/9 and finally, #path 1à3à2à1 is found.

#Local Pheromone update: pheromone update by the formula given below (see section 3.3.1):

∆τ = 1 is calculated where, is the score of path c(t) and that is (64/65+1/9+1/9), and Sbest is the score of the best path found up to now

Pheromone updated paths : #Ï„(1,3)=4/5, #Ï„(3,2)=3/5, #Ï„(2,1)=1/2

#Pheromone update graph after 1st iteration is shown in figure 3.

Figure 3. Pheromone updated graph after 1st Iteration of example 1

Iteration 2 (Moving path by 2nd ant)

Transition probability for 2nd ant in path (1, 2): P2(1, 2)= 125/637 according to equation (i) [Where, , i=1, j=2, u=3]

Similarly, transition probability for 2nd ant in path (1, 3): P2 (1, 3)= 512/637.

# Path 1à3 is selected

Transition probability for path (3,2) where node 3# is an current object:

P2 (3, 2)=1/3 and # path 1à3à2 is formed.

Transition probability for path (2, 1), P2 (2, 1)= 125/341 and finally, path 1à3à2à1 is found.

#Local Pheromone update: pheromone updates as same as iteration 1

Step 2.2: Iteration 3 to iteration 8, for all eight ants follow the same path and that is: # path 1à3à2à1.

Step 3: Tree Construction

Step 3.1: So, the tree will be generated from the above path followed by all ants is shown in figure 4.

Figure 4. Tree obtaied after 8th iteration of example 1

Calculate the branch length of the tree by neighbor joining method using the formula given below:

dnu(i,j)

Now, the distance matrix is:

# Distance between (1,3) node to node 1# d(1,3)1=4/12

# Distance between (1,3) node to node 3# d(1,3)3=4/12

So, d((1,3) ,2)=d(1,2)+[d(1,2)-d(3,2)]η =1/12

# Distance between ((1,3) ,2) node to node 2# d((1,3) ,2), 2 =1/24

# Distance between ((1,3) ,2) node to node (1,3)

# d((1,3) ,2), (1,3)=-7/24

Step 3.2: Here, we obtain one phylogenetic tree and that is the final optimal tree (see figure 5).

Figure 5. Optimal tree with branch-length of example 1

Step 4: DNA phylogenetic tree optimization

Step 4.1: Only one final tree obtained, there is no need of optimization. For better understanding of optimization technique for this algorithm see example 2.

Step 4.2: Fitness value of the optimal tree is:

# Optimal Treetotal- branch-length=b1+b2+b3+b4

=4/12+4/12-7/24+1/24=0.416

Step 5: Output

# From example 1, we get final tree with branch-length and is shown in figure 5.

5.2 Comparison Study

We compare PCO-Ant algorithm with NJ, TSP algorithms by using same DNA sequences and after comparing we found that the performance of our algorithm is comparable and publishable to that of other tree-construction methods and table 1 shows the results.

Table 1Comparison of PCO-Ant algorithm with other existing algorithms

Method

Total lengths of the tree

Neighbor-joining

0.784

TSP-Approach

0.728

PCO-Ant

0.416

Illustration: Example 2

6.1 Discussion and Results

Step 1: Initialization

Step 1.1: Initialize parameters: k=8 (No. of ants), m=2 (No. of sub-colony), α=1, β=2, ρ=0.6 (pheromone trail evaporation), η= 0.5 (evolutionary distance between an ancestor and two descendent species).

Step 1.2: Initialize the pheromone digraph: Here, we consider four randomly generated DNA sequences with length 6.

Seq1: AACGTG, Seq2: TCGAAG, Seq3: ATCGAG, Seq4: ATGCCA

Step 1.3: Creating the pheromone digraph (see figure 6) by measuring the distance between the species and we choose node 1# as initial object.

Figure 6. Digraph for 4 sequences of example 2

Step 2: Finding the optimal path

Step 2.1: Transition probability for ants: travel across the structured graph, and at each node a transition function (equation (i)) determines its direction. This equation represents the probability that the kth ant, being at node i, goes to node j in its next step;

Pk(i,j)= --------------(i)

Iteration 1(Moving path by 1st ant)

Transition probability for Ist ant in path (1, 2): P1(1, 2)= 1/66

Similarly, transition probability in path (1, 3): P1 (1, 3)= 64/66

Similarly, transition probability in path (1, 4): P1 (1, 4)= 1/66

# Path 1à3 is selected

Transition probability for path (3,2) where node 3# is an current object:

P1 (3, 2)= 1/10

Similarly, transition probability for path (3, 4), P1 (3,4)= 1/10

# In this case two paths have same transition probability so, ant can choose any path that are path # 1à3à4 or path # 1à3à2

Case1: After path # 1à3à4 is selected then

Transition probability for path (4,2) where node 4# is an current object:

P1 (4, 2)= 1/10 and path # 1à3à4à2 is formed.

Similarly, transition probability for path (2, 1), P1(2,1)= 1/10 and finally, # path 1à3à4à2à1 is formed.

Case2: After path # # 1à3à2 is selected then

Transition probability for path (2,4) where node 2# is an current object:

P1 (2, 4)= 1/10 and path # 1à3à2à4 is formed.

Similarly, transition probability for path (4,1), P1(4,1)= 1/10 and finally, path # 1à3à2à4à1 is formed.

# Local Pheromone update: pheromone update by the formula given below (see section 3.3.1):

# ∆τ = 2 is calculated according to the formula.

# Pheromone updated paths : #Ï„(1,3)=4/5, #Ï„(3,4)=1, #Ï„(4,2)=9/10 #Ï„(2,1)=9/10

# Pheromone update graph after 1st iteration is shown in figure 7.

Figure 7. Pheromone updated graph for 1st iteration of example 2

Iteration 2 (Moving path by 2nd ant)

Second ant start to move from node 1# of updated graph (see fig. 7)

Transition probability for 2nd ant P2(1,2)= 0.5855, P2(1,3)= 0.0037, P2(1,4) =0.4110

Choose, path #1à2

P2(2,3)= 0.0247, P2(2,4)= 0.4876, obtained path #1à2à4

Similarly, P2(4,3)= 0.5768, P2(3,1)= 0.3375, obtained final path # 1à2à4à3à1

# Local pheromone updated value by 2nd ant:

# ∆τ =2.42

#Ï„(1,2)=1.508, #Ï„(2,4)=1.508, #Ï„(4,3)= 1.568, #Ï„(3,1)= 1.448

# Now, again updated new pheromone graph after 2nd iteration is shown in figure 8.

Figure 8. Pheromone updated graph for 2nd iteration of example 2

Iteration 3 (Moving path by 3rd ant)

Again third ant starts from the node 1# with updated pheromone value (see fig 8)

Transition probability for 3rd ant P3(1,2)= 0.5260, P3(1,3)= 0.4734, P3(1,4)= 0.000459 and choose path #1à2

P3(2,3)= 0.0034, P3(2,4)= 0.4969, obtained path #1à2à4

P3(4,3)= 0.525192, obtained path #1à2à4à3

P3(3,1)= 0.44688, obtained path # 1à2à4à3à1

# Local pheromone updated value by 3rd ant

# ∆τ=3.28

#Ï„(1,2)=2.216, #Ï„(2,4)=2.216, #Ï„(4,3)=2.252, #Ï„(3,1)=2.1808

# New, updated pheromone graph after 3rd iteration is shown in figure 9.

Figure 9. Pheromone updated graph for 3rd iteration of example 2

Step 2.2: Iteration 4 to iteration 8 is given in table 2 where it has been observed that other ants also follow the same path #1à2à4à3à1 i.e. the smallest path.

Step 2.3: Total three paths are formed from 1-8th iteration and that are #1à3à4à2à1, #1à3à2à4à1 and #1à2à4à3à 1.

Table 2 Traverse path and updated Pheromone values for ant 4 to ant 8 of example 2

Ant no.

Traverse Path by ants

Pheromone updated value

4

1à2à4à3à1

Ï„(1,2)=3.0376

Ï„(2,4)=3.0376

Ï„(4,3)=3.0592

Ï„(3,1)=3.016

5

1à2à4à3à1

Ï„(1,2)=3.93656

Ï„(2,4)=3.93656

Ï„(4,3)=3.94352

Ï„(3,1)=3.9176

6

1à2à4à3à1

Ï„(1,2)=4.8696

Ï„(2,4)=4.8696

Ï„(4,3)=4.8738

Ï„(3,1)=4.8585

7

1à2à4à3à1

Ï„(1,2)=5.829

Ï„(2,4)=5.829

Ï„(4,3)=5.832

Ï„ Ï„(3,1)=5.8231

8

1à2à4à3à1

Ï„(1,2)=6.7694

Ï„(2,4)=6.7694

Ï„(4,3)=6.7712

Ï„(3,1)=6.7658

Step 3: Tree Construction

Step 3.1: From these three paths three respective phylogenetic trees are generated and shown in figure 10.

Figure 10. Possible topologies are obtained after ants traverse all paths of example 2

Calculate the branch length of the three respective tree by neighbor joining method using the formula given below:

dnu(i,j)

For Tree 1:

The distance matrix is:

# Distance d(1,3)1=4/12

# Distance d(1,3)4=1/12

# Distance d(1,3)2=1/12

#Distance d(1,3,4)2=d((1,3),2)+[d((1,3),2)-d(4,2)]η =1/24

Total branch-length (evolutionary distance) of Tree1=b1+b2+b3+b4+b5+b6 =4/12+4/12+1/12+ (-3/12)+1/24+(-1/24) =0.5

For Tree 2:Tree2 total-branch. Length=0.5

For Tree 2: Tree3total-branch.length=0.42

Step 3.2: Here, we obtain three phylogenetic trees (see figure 10).

Step 4: DNA phylogenetic tree optimization

Step 4.1: In the global updating period, is the increment of Ï„ij by sub-colony k, which is defined as follows (see section 3.3.2).

# Fitness (sub-colonyk) = c. (1- )

# Fitness (Tree1) =0.9651, #Fitness (Tree2) =0.9651, #Fitness (Tree3) =0.9652

Step 4.2: Globally updated pheromone values of Tree1:

#Ï„(1,3)=4.445, #Ï„(3,4)=4.448, #Ï„(4,2)=4.442, #Ï„(2,1)=4.442

Tree2: #Ï„(1,3)= 4.445, #Ï„(3,2)=.586, #Ï„(2,4)= 4.442, #Ï„(4,1)=.486

Tree3: #Ï„(1,2)= 4.446, #Ï„(2,4)= 4.446, #Ï„(4,3)=4.452, #Ï„(3,1)=4.445

Step 5: Output

# From example 2, we get optimal tree that is tree 3 which has highest transition probability but lower evolutionary distance. So, tree 3 is optimal tree (see figure 11).

Figure 11. Optimal tree for four sequences of example 2

Note: The implementation of the algorithm using DNA sequences with maximum length can be considered as a part of future scope.

Conclusion

Since the huge amount of required biological raw data has now become available and due to the high algorithmic and computational complexity of underlying algorithms and models the inference of a "tree of life" containing representative species of all living organisms on earth is one of the "grand challenges" of Bioinformatics in our days. In the spirit of the "grand challenge", this paper covers the development of novel concepts for inference of large phylogenies based on the ant-colony optimization method, which has proved to be the most accurate model for inference of huge and complex phylogenetic trees. Thus, the overall goal of this work can be stated as: "How to inexpensively compute more accurate phylogenetic trees in less time?" Here, for tree construction and optimization both the cases we use ant colony algorithm. The results show that our algorithm converges much faster and also achieves higher quality optimal tree than NJ and TSP methods.