Applications of genetic algorithm in software testing

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.


Software testing is one of the major and primary techniques for achieving high quality software. Software testing is done to detect presence of faults, which cause software failure. However, software testing is a time consuming and laborious work. It consumes almost 50% of the software system development resources [5].

Every year software is becoming more complex, to meet the software quality standards . To ensure that software does not fail, this requires software to be tested more thoroughly. Software can be tested in many ways like unit testing, system testing, regression testing, integration testing. A thorough testing of software demands on effort, time and cost of the testing organisation Cost of testing can be reduced if each part of software can be tested automatically.

Automated software testing is better than manual testing. No powerful test data generation tools are commercially available today [7]. Several researchers have proposed different soft computing technique to generate test data/test cases automatically, like artificial intelligence, neural networks and genetic algorithms. Artificial Neural network (ANN) has been used for estimating the testing effort, maintenance, reliability prediction, metrics generation and in reverse software engineering. Some researchers have already proved that ANN can also be used as an oracle in software testing. Several researchers did work on artificial intelligence for software testing, verification and validation. Fuzzy logic is also used by many researchers for determining software maintenance. Some researchers have even combined these techniques for better results. Applications of these techniques in software testing are an emerging area of research. GA has been applied in many optimization problems, for generating test plans for functionality testing and many other areas.

2. Genetic Algorithm

A GA is a search algorithm that is inspired by the way nature evolves species using natural selection of the fittest individuals.

GA has well defined steps. A basic algorithm for GA is as follows [Somerville]

The pseudo code for GA is as follows:

Initialize (population)

Evaluate (population)

While (stopping condition not satisfied)


Selection (population)

Crossover (population)

Mutate (population)

Evaluate (population)


The possible solutions to the problem being solved are represented by a population of chromosomes. This initial population can be totally random, or can be created manually using processes such as the greedy algorithm [ ].

A GA uses three operators on its population.

• Selection

• Crossover or recombination

• Mutation

Selection generates the new population from the old one, thus starting a new generation. A selection mechanism is used to select the new chromosomes for the new population. Each chromosome is evaluated in present generation to determine its fitness value. This is done by applying a fitness function to each chromosome. This fitness value is used to select the better chromosomes from the population for the next generation. The individuals with lowest fitness value are called as elites and are passed to the next generation.

After selection, the crossover operation is applied to the selected chromosomes. The outcome of crossover is to combine two fit chromosomes to get even better chromosomes. The chromosomes of the parents are mixed in some way during crossover, most of the time by swapping a portion of the data between the parents. This process is repeated with different parent individuals until the next generation has enough individuals. After crossover, the mutation operator is applied to a randomly selected subset of the population.

Mutation alters chromosomes in small ways to introduce new good traits. Mutation operator is applied to bring diversity in the chromosome population.

3. Genetic algorithms in testing

This section will illustrate the work done by the researchers in the field of genetic algorithm. Different test case generation approaches using GA and also with the help of GA combined with model based testing, fuzzy is described in this section.GA is also used for the test planning, for prioritizing the test cases in regression testing which is also illustrated in this section.

3.1 Test case generation guided by data flow dependency

Data flow analysis is used to show the control flow and the interactions between the variables of a program. Girgis [2] used the data flow dependency information for the test case generation. A path is described as a sequence of nodes and edges to represent the sequence of a program.

Program is converted into a flow graph where each node represents a block in a program. The edges of the flow graph depict the control flow of the statements. Variables can be divided into 'c-uses' and 'p-uses'. c-uses variables are those which are used in computations or as a predicates in a program. p-uses variables are associated with edges of the flow graph. The data flow relationships can be determined by def c-use and def p-use i.e. the def clear paths to their c-use at node i and def clear paths to their p-use at edge (i, j).List of def c-use and def p-use path in a program is determined. dcu paths and dpu paths are determined along with their killing nodes. Killing nodes are those nodes path that occur in the path containing the other definition of a variable. Therefore, the term 'def-use paths' will be used to mean the set of dcu-paths and dpu-paths together [2].

The proposed GA by Girgis uses a binary vector as a chromosome to represent the input of a program. The length of the input is determined by the domain and the precision. The domain is represented by Di= [ai,bi].Each variable in a program can take values from domain and for precision each domain should be cut into (bi-ai).10di<=2mi-1.

A test case vi is effective if its fitness value eval (vi) > 0.All the test cases are selected with effective eval (vi) or good fitness value. Selection can be done by either roulette wheel selection or random selection [Goldberg 89].The effective test cases then become the parents of the new population. If none is effective then all the individuals are chosen as the parents.

3.2 Test case generation from control flow graph

In [5] have presented a method for optimizing software testing efficiency by identifying the most critical path clusters in a program. The Software under test (SUT) is converted into a control flow graph (CFG).CFG is a simple notation for the representation of control flow. Weights are assigned to the edges of the CFG .Higher weights are given to the loops, branching paths and less weight is given to the sequential paths. The summation of the weights along the edges comprising a path determines the criticality of the path. Higher the summation more critical is the path and therefore must be tested before other paths. In this way by identifying the most critical paths that must be tested first, the testing efficiency is increased.

The approach for test cases generation from the directed graph proposed by Dr.Velur Rajappa,Arun Biradar and Satanik Panda [1 ] is illustrated below. Test cases can be generated by converting the program is converted into a directed flow graph with all the intermediate states.Then the crossover and mutation operator can be applied to the directed graph based on their fitness value. This is a different technique for generating test cases from directed graph as compared to the test cases which are usually generated from the model based testing using UML diagrams.

A Directed graph is represented by G= [V, E], where V represents state or vertices and edges represents flow of control. First the graph is converted into a binary string which is illustrated below in the figure.

In the dual graph generation, if an edge1 is an incoming to some node and the edge 2 is outgoing edge for the same node then an edge is created from edge 1 to edge 2 which acts as a nodes in a dual graph generation. All possible two links combination in the dual graph for example bc,bf,bg... are traced. All the dual combinations are then encoded in 0 and 1 format as a genetic population.

Another test generation approach [13] used resource request algorithm. Using the GA suitable test data set is generated which covers the need for each process. Fitness function is determined by counting the number of times a data enters and remain in the resource request algorithm. Higher the value of the fitness function value, higher is the chances of avoiding a deadlock. The test data with the higher value of the count is selected for crossover for the next generation and the mutation.

Resource request algorithm includes Banker's algorithm in itself. During initialization two arrays are created) Allocation array b) need array. The allocation array lists the resources for the processes. The need array lists the resources which are needed by the process. The fitness function will be calculated by evaluating the strength of test data which is determined by the number of decision nodes it passes and gave the results. Another evaluation constraint is to find the number of nodes covered by the test data or the time to execute the instruction at particular node.

3.3 Test case generation from UML state chart diagram using GA

GA can be used to generate the test data using UML state chart diagram as described by Doungsa [3]. Sometimes after coding developers don't have time to test the software. Generating test cases from UML state chart diagram can solve this problem by generating them before the coding. Then the test cases can be generated as per the specifications of the software. Specifications can be in the form of UML diagrams, formal language specifications or natural language description.

State diagrams or state chart diagrams are used to help the developer better understand any complex/unusual functionalities or business flows of specialized areas of the system. In short, State diagrams depict the dynamic behaviour of the entire system, or a sub-system, or even a single object in a system.

Sequence of triggers for UML state diagram can be used as a chromosome [5]. The sequence of triggers is an input for the state diagram which acts as test cases for a program to be tested. Each trigger is examined to check for the transitions which leads to a new state .Each trigger checks for state and transition coverage. If the trigger can generate new state from the current state then next trigger is checked. If the trigger in a sequence cannot generate a new state then tracing for the state coverage will be stopped and the state and the transition coverage are recorded without taking rest of the sequence to consider. As each trigger is traced, new states and transitions are recorded. The process continues for fixed number of generations until all the states and transitions are covered. Fitness of the chromosome is evaluated by using objective function as follows:

aW + bX+ cY + Z

Where, a, b, c are constant value where a = 0 when there is no guard condition in selected transition is a number of states in test cases which value of attribute in that state make guard condition to be true's is a number of transitions which is covered by this test but have not been covered by previous test set. Y is a number of states that can be reached by test case to reachable transition source. Z is a number of state and path coverage for the test case.

Test cases are the selected based on their fitness function. Test case with best fitness value is selected as parents. Based on the fitness function the selection operator is used to apply crossover and mutation operator to the sequence of triggers. Crossover operator is then applied to the sequence of triggers .This operator then generates new states and transitions. After a new generation is created, UML state diagram is then executed again to check for new chromosomes.

3.4 Test case generation based on fuzzy based extension of genetic algorithm (FAexGA)

Mark Last, Shay Eyal and Abraham kandel [6] have used FAexGA approach for test case generation. The crossover probability varies according to the age intervals assigned during the lifetime. The crossover probability of young and old individuals is assigned low while for other age interval this probability is high. The very young offsprings crossover probability is low thus enabling exploration capability .Old offsprings have also less crossover probability and eventually dying out would help avoiding a local optimum or premature convergence. On the other hand, middle-age offsprings are frequently used for crossover operation.

Fuzzy logic controller (FLC) is used for determining probability of crossover. The fuzzification interface of the FLC includes the variables that determine the age of the offsprings. FLC assigns every parent values Young or Middle-age or Old. These values determine the membership for each rule in the FLC rule base.


















Table1: Fuzzy rule for crossover probability

The fuzzy rule base used in this experiment is presented in Table1. Each cell defines a single fuzzy rule. For example, the upper left cell refers to the following rule: "If Parent 1 is old and Parent2 is old then crossover probability is Low".

3.5 Focused software usage testing

Robert M .Patton, Annie S. Wu, and Gwendolyn H. Walton [8] have proposed the usage models for defining usage scenarios. They are used in test planning, to generate a sample of test cases that represent usage scenarios, and to test results. In system testing determining the nature and the location of the errors can be difficult which later on can be the problem for the developers to fix the errors. The system testing contains only small information from the usage scenarios of the system. Due to the limited information generalizing the testing results could be difficult.

To solve these problems GA accepts, the domain data generated by the usage model and the results of system test as the two inputs.

A set of test cases are the initial population generated from a usage model. Each individual in the population represents a single test case. Each individual represents a single test case and is sent to the Tester and is then applied to the Software under Test. The Software under Test processes this input and generates the output that is later analyzed by the Test Oracle. The Test Oracle will then determine if the output is correct or incorrect or if the software under test failed or crashed. The GA uses this result along with the likelihood that it would occur as defined by the usage model to help determine the overall fitness of the individual.

3.6 Unit testing of object-oriented software

A test case for any object oriented software consists of the definition of testing prerequisites, a test program consisting parameters, their types and their values, method calls as well as the test oracle, which is used for validation of test results[9]. Each test case can be considered as a sequence of statements S= {s1, s2,...., sn}. A statement consists of the following essential components:

• Target object

• Method

• Parameters

• Receiver

This is the information which will be needed to encode a genotype individual or chromosome.

The GA must be able to generate test programs, but since they don't have understanding of programs, statement or objects therefore some kind of encoding of these components is required to be defined which may allow the representation of a test program as a chromosome which can be used with the GA. Genetic programming (GP) is based on hierarchically organized trees which requires specialized genetic operators for crossover and mutation. Test case sequences will be represented by trees and these trees must contain the information of which methods should be called in sequence, and which target objects and parameter objects should be used for the individual method calls.

In Jose, Mario and Francisco have proposed the methodology for evaluating the quality of both feasible and unfeasible test cases [21].The test cases that are terminated with a call to a method and are completely executed are termed as feasible test cases whereas the test cases which abort prematurely are termed as unfeasible test cases.

Test cases are represented as strongly typed genetic programming (STGP) individuals where each individual contains number of STGP trees equal to the number of arguments in the current method or method under test (MUT). The trees are traversed by depth first traversal algorithm which generates the sequence of method calls or scenarios. Traversing the trees by depth first traversal generates a linear sequence of computations of MUT which is represented by a CFG.The edges between the nodes represents the flow of control from one node to the other node. The nodes are assigned the weights and the fitness of test cases is computed. The fitness of feasible test case is computed on the basis of number of times a particular CFG node was exercised by the test cases of previous generations whereas unfeasible test cases are measured by the method calls that threw the exceptions or distance between the runtime exception index which results in prematurely termination of a test case.

3.6. Prioritization of test cases in regression testing

In regression testing, test cases need to be executed again and again.Re-execution of the test cases requires resources which can be costly or insufficient. To overcome this problem, R.Krishnamoorthy and S.A Sahaya Arul Mary have prioritized test cases by ordering them according to their execution time and the number of faults detected [10]. The test cases which can detect more faults in less execution time are executed first and can overcome the problem of limited resources. An Average percentage of faults detected (APFD) metric can be used to measure the effectiveness of the test cases ordering.

The execution time of each test case should be recorded excluding the class loading time. The initial time and the shutdown time should be included while recording the execution time because these operations can increase the execution time.

The Program and each test case in a test suite are inputs to the genetic algorithm along with the user specified parameters .Few of them are mentioned below:

* Test suite

* Collection of all permutations of elements of test suite i.e. 2T

* Time budget

* Percent of total test suite time

* Crossover and mutation probability

* Program coverage weight

Fitness function

Each test suite will be assigned a fitness based on the percentage of code covered in a program P by that test suite and the time at which each test covers its associated code in P. If a test suite execution time does not exceeds the time budget then its fitness value can be considered as good and is the candidate for the next generation.


Crossover is used to vary the test suites from one test suite to the recombination .A random number r is generated such that r ε [0,1].If r is less than user specified crossover probability the crossover operator is applied. Otherwise the parent individuals remain unchanged and mutation is applied.

3.7 Automatic Structural testing

Maha alzabidi,Ajay Kumar and A.D Shaligram[7]have presented the automatic structural test case design using evolutionary testing .Software structural testing is done by taking path coverage for testing. For path testing a control flow graph (CFG) is used in this paper for the representing a program where the nodes are the basic blocks and the edges between the nodes indicates the flow of the program. Meaningful paths are extracted from CFG and are selected as a target path for testing. Test cases are generated to trace the new path which leads to the target path. The test result is evaluated to determine that the testing objective criteria are satisfied by executing the selected path.

In this work more emphasis is given to the fitness function improvement. The fitness function is named as a Shifted-Modified-SIMILARITY (SMS) which is a modification to the hamming distance. The symmetric difference or hamming distance is calculated for cascading edges for target path and current path. Similarities are then normalized and summed associated with a weighting factor. This value is used as an objective function to evaluate the individuals in the population. Performance of different GA parameters is studied in this paper. Parameters have been applied on different test programs and results have shown that the double crossover is more effective than single crossover applied on a test program.

3.8 Test plans for functionality testing

Francisca Eanuelle Vieira,Ronaldo Menezes,Marcio Braga[15] have presented GA based technique to generate good test plans in an unbiased way to avoid experts interference. The emphasis is given on the fact that an error in a program is not necessarily due to the last operation executed by the user but may have been due to a sequence of previously executed operations that leads an application to an inconsistent state. In other words, as a sequence of operations is executed, the state of inconsistency is non-decreasing or a problem in a software application is directly proportional to the level of inconsistency of the state in which application is. In this work the operation of large granularity have been chosen so that the sequence of operation that leads application to inconsistent state can be identified. Fitness value is calculated as:

Where p=l1,l2, is a test plan or sequence of operations. Where t is a transition function for converting one operation ,li to the next operation li+1 in a sequence or a new state. Larger the value of fitness function, better the sequence is considered which is likely to take the application to an inconsistent state.


[1] Dr.Velur Rajappa,Arun Biradar,Satanik Panda "Efficient software test case generation Using Genetic algorithm based Graph theory "International conference on emerging trends in Engineering and Technology,2008 IEEE.

[2] Moheb R.Girgis "Automatic test generation for data flow testing using a Genetic algorithm" Journal of computer science,vol.11,no.6(2005)898-915

[3] Chartchai Doungsa-ard ,Keshav Dahal,Alam gir Hossain ,and Taratip Suwannasart "An automatic test data generation from UML state diagram using genetic Algorithm".

[4] Praveen Ranjan Srivastava , Priyanka Gupta , Yogita Arrawatia , Suman Yadav "Use of Genetic Algorithm in generation of feasible test data" SIGSOFT software Engineering Notes, March 2009,volume 34,number 2.

[5] Praveen Ranjan Srivastava and Tai-hoon Kim "Application of Genetic aglorithm in software testing" International Journal of software Enginnering and its Applications" , vol.3, No.4, October 2009.

[6] Mark Last, Shay Eyal and Abraham kandel "Effective Black-box testing with Genetic Algorithms".

[7] Maha alzabidi, Ajay Kumarand A.D.shaligram "Automatic software structural testing by using evolutionary algorithms for test data generations" IJCSNS International Journal of Computer science and Network Security, Vol.9, No.4, April 2009.

[8] Robert M .Patton, Annie S.Wu, and Gwendolyn H .Walton "A Genetic Algorithm approach to focused software usage testing".

[9] Nirmal Kumar Gupta and Dr.Mukesh Kumar Rohil "Using Genetic algorithm for unit testing of object oriented software "First International conference on Emerging trends in Engineering and technology,2008,IEEE.

[10] R.Krishnamoorthy and S.A Sahaaya Arul Mary "Regression Test suite Prioritization Using genetic Algorithms" International Journal of Hybrid InformationTechnology,Vol.2,No.3,July 2009

[11] D.J Berndt and A. Watkins 'High volume Software Testing using Genetic Algorithms Proceedings of the 38t h Hawai International Conference on system sciences, 2005, IEEE.

[12] Sultan H.Aljahdali and Mohammed E.EI-Telbany "Genetic algorithms for Optimizing Ensemble of models in software reliability Prediction"ICGST -AIML Journal, Volume 8, Issue 1, June 2008.

[13] Praveen Ranjan Srivastava, Vinod Ramachandran, Manish Kumar, Gourab Talukder,Vivek Tiwari, Prateek Sharma "Generation of Test Data Using Meta Heuristic Approach".

[14] Stefan Wappler, Frank Lammermann "Using Evolutionary algorithms for Unit Testing of Object oriented software", GECCO, ACM, June 2005.

[15] Francisca Emanuelle,Ronaldo Menezes,Marcio Braga "Using Genetic algorithms for test plans for functional testing" ACM, March 2006.

[16] Mark Harman,Kiran Lakhotia ,Phil McMinn "A Multi-objective approach to search -based Test data generation"GECCO,ACM, July 2007.

[17] Maha Alzabidi,Ajay Kumar and A.D.Shaligram "Automatic Software structural testing by using Evolutionary Algorithms for test data generations"IJCSNS International Journal of Computer science and Network security ,Vol.9 No.4,April 2009.

[18] Somerville, I., "Software engineering", 7th Ed. Addison-Wesley

[19] Goldberg,D.E,"Genetic Algorithms: in search, optimization and machine learning", Addison Wesley, M.A,1989

[20] Kristen R.Walcott, Gregory M. Kapfhammer "Time aware Test suite prioritization", ACM, July, 2006.

[21] Jose Carlos,Mario,Alberto,Francisco "A strategy for evaluating feasible and unfeasible test cases for the evolutionary testing of object-oriented software",ACM,May,2008.