Genetic Algorithm For Regression Test Suite Prioritization Technology Essay

Published:

Regression testing is a testing technique which is used to validate the modified software. The regression test suite is typically large and needs an intelligent method to choose those test cases which will detect maximum or all faults at the earliest. Many existing prioritization techniques arrange the test cases on the basis of code coverage with respect to older version of the modified software. In this approach, a new Genetic Algorithm to prioritize the regression test suite is introduced that will prioritize test cases on the basis of total fault coverage and complete code coverage. The genetic algorithm would also automate the process of test case prioritization. The results representing the effectiveness of algorithms are presented with the help of an Average Percentage of Faults detected(APFD) metric.

Keyword: Genetic Algorithm; Prioritization; Regression Testing; Automation Testing.

1. Introduction

Regression testing is retesting changed segments of application system. It is performed frequently to ensure the validity of the altered software. In most of the cases, time and cost constraint is prominent; hence the whole test suite cannot be run. Thus, prioritization of the test cases becomes essential so that maximum faults can be exposed in minimum time. The ordering of test cases in prioritization technique is such that, the highest priority test case is executed earlier than lower priority test cases. The priority criteria can be set accordingly e.g. to increase rate of fault detection, to achieve maximum code coverage, to cover important features earlier and so on. When execution time is short, time constraint prioritization technique is necessitated.

Lady using a tablet
Lady using a tablet

Professional

Essay Writers

Lady Using Tablet

Get your grade
or your money back

using our Essay Writing Service!

Essay Writing Service

Test case prioritization involves scheduling test cases in an order that increases their effectiveness in meeting some performance goals. Two goals are considered. One of the performance goals is to run those test cases that achieve total code coverage at the earliest [Aggarwal, et. al(2004)]. Second goal is to achieve total fault coverage in minimum execution time. In this work we propose a technique that achieves 100% code coverage and total fault coverage. Many techniques are used for prioritization and are divided into various categories. The three broad categories for prioritization are Greedy algorithms, non-evolutionary algorithms and evolutionary algorithms. Evolutionary algorithms have been chosen as they are global optimization methods and scale well to higher dimensional problems. They can be easily adjusted to the problem at hand and can be change and customised.

The Evolutionary Algorithm (EA) is interactive and metaheuristic process that operates on a set of population (also known: chromosomes, individuals, genotype etc.). Each individual in population represents a latent solution to the problem being solved. Most of the implementations of evolutionary algorithms came from any of these three basic types: Genetic Algorithm(GA), Evolutionary Programming (EP) and Evolutionary Strategies (ES). All these are strongly related but independently developed. Among evolutionary techniques, the GA, invented by John Holland in the 1960s at the University of Michigan, is the most widen group of methods representing application of evolutionary tools. They depend on the use of selection, crossover (recombination) and mutation operators. Replacement is usually done by generations of new individuals. The original goal of genetic algorithms was to formally study the phenomenon of evolution and adaptation as it occurs in nature [Holland, (1992)].

Automated software testing has been considered critical for big software development organizations but is often too expensive or difficult for smaller companies to implement. Automation Testing means developing and executing test cases that can run unattended, comparing the actual to expected results and logging status. It is automating the manual testing process in use. It Improves accuracy, increase test coverage, saves time and money and helps developers and testers. This algorithm automates the process of prioritize the test suites as per the criteria given to genetic algorithm.

In this paper the related work is presented in section 2, the GA method has been explained in section 3. Section 4.1 presents the Genetic algorithm flowchart, 4.2 presents a formalised algorithm for regression test suite prioritization, 4.3 explains the explanation of the algorithm, 4.4 present the prioritization of test cases based on total fault coverage with examples, 4.5 present the prioritization of test cases based on total code coverage with an example demonstrating the working of the GA. Section 5 gives the comparision of various prioritisation approaches to GA. And finally in the succeeding sections applications,conclusion and futute work is presented.

2. Related Work

Lady using a tablet
Lady using a tablet

Comprehensive

Writing Services

Lady Using Tablet

Plagiarism-free
Always on Time

Marked to Standard

Order Now

In this section an overview of related work of the test case prioritization and genetic algorithms is presented. Many researchers addressed prioritization problem and proposed various techniques for it. Many techniques are used for prioritization such as Greedy algorithms for test case prioritization[li, (2007)], additional greedy algorithms[li, (2007)], 2-optimal algorithms[bharti], non-evolutionary algorithms such as goal programming method[Bryson], logarithmic least square method[Crawford and Williams], weighted least square method[chu] and evolutionary algorithms[salami]. Most frequent among all is total fault-detection technique [Rothermel, (1999)].

In the test case prioritization using genetic algorithms, the prioritization criterion is based on fitness function of population and genetic operators [Krishnamoorthi, (2009)]. Further, Genetic algorithm is used for network security in cryptography [Gorodilov and Morozenko (2008)]. GA is also used in Data Mining operations[Kamble(2010) ]and Robotics[Cernic, et.al]. It is also used in problem solving such as Travelling Salesman Problem[Al-Dulaimi and Ali (2008)].

3. The Genetic Algorithm Method

Over several years, organisms are evolving on the basis of fundamental principle-natural selection "survival of fittest" to accomplish noteworthy results. In 1975, Holland employed principle of natural evolution to optimization problems and built first GA. The GA is based on fundamental principle of evolution and genetics by S.N. Sivanandam.

Population is the main factor in GA. A population P = (c1, ……..,cm) is formed from a set of chromosomes and each chromosome is composed of genes where each gene is an instance of an allele. The GA populates the population of chromosomes by successively replacing one population with another and a fitness based on fitness function is assigned to each chromosome. The selection of strong individual is included in next population and individuals with low-finess are eliminated from each generation. [Kristen R. W, ( 2005)]. There are two main concepts in genetic algorithm viz: crossover and mutation.

3.1 Crossover

The crossing over (key operator) is process of yielding recombination of alleles via exchange of segments between pairs of chromosomes.Crossover is applied on an individual by switching one of its allele with another allele from another individual in the population. The individuals resulting from crossover are very different from their initial parents. The code below suggests an implementation of individual using crossover:

Child1 = c*parent1 + (1-c)*parent2

Child2 = (1-c)*parent1 + c*parent2

3.2 Mutation

The mutation is a process wherein one allele of gene is randomly replaced by (or modified to) another to yield new structure .It alters an individual in the population. It can regenerate all or a single allele in the selected individual. To maintain integrity, operations must be secure or the type of information an allele holds should be taken into consideration. That is, mutation must be aware of binary operations, or it must be able to deal with missing values.

A simple piece of code:

child = generateNewChild();

The optimization problems are solved by GA's recombination and replacement operators, where recombination is key operator and frequently used, whereas, replacement is optional and applied for solving optimization of problem.

4. Genetic Algorithm for Prioritization of Test Cases

The initial population is automatically generated in this approach. The evaluation of the set of candidate solution has been done with the help of genetic algorithm. Two different stopping criteria has been used in this approach, 1. total fault coverage 2. Total code coverage.

4.1 Flowchart

Fig.1. Flowchart of Genetic Algorithm.

4.2 Algorithm

STEP 1. Generation of initial population

Generate 'n' number of chromosomes {c1, c2… cn}

STEP 2. Initialization of population

Set Test Suite= No. of chromosomes (n)

STEP 3. Fitness function criterion set

Set fitness function= total fault coverage+less time of execution/total code coverage

STEP 4. Select suitable population on the basis of Fitness Function

SELECT (Best 2 chromosomes based on fitness function)

STEP 5. Genetic Operators Applied

Do for selected Chromosome(s)

While (all faults are covered)

Do crossover

Do mutation

Remove Duplicacy

EndWhile

EndFor

STEP 6. Optimization of solution cheked.

If (solution!= feasible)

Goto STEP 5

Else END.

4.3 Explanation of Algorithm

The main component in GA is the set of population. The optimal solution is searched on the basis of desired population which further can be replaced with the new set of population. The GA starts with generation of population (test suites). The initialization of test cases is done according to the problem e.g. if problem deals with routing problem then number of chromosomes will be number of nodes in the route. The two fitness criterion chosen are maximum fault covered in minimum execution time and total code coverage. Henceforth, this fitness function will help in selecting suitable population for problem. Further, the genetic operations are performed. Firstly, crossover, which recombines two individuals to form population for next generation. Secondly, mutation, which randomly swaps the individuals to produce the near optimal population. Thirdly, the redundant individuals are removed. Finally, the solution is checked for optimization. If solution is not optimized, then, the new population is reproduced and genetic operators are applied.

4.4 Prioritization of test cases based on total fault coverage

Lady using a tablet
Lady using a tablet

This Essay is

a Student's Work

Lady Using Tablet

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Examples of our work

This technique is based on prioritization of test cases in order to achieve total fault coverage in minimum execution time. To ensure that all the faults are covered by the test cases we use mutation testing. Mutation testing is one way to test those tests. It involves deliberately altering a program's code, then re-running a suite of valid tests against the mutated program.  A good test will detect the change in the program and fail accordingly. Thus, genetic algorithm has been developed and their effectiveness is measured by using mutation testing. This is illustrated by two examples in which mutants are created.

4.4.1 Examples

4.4.1.1 Example 1

The problem taken is "college program for admission in courses". The problem specification is available at website http://HYPERLINK "http://www.planet-source-code.com/"www.planet-source-code.com.

In this example a test suite has been developed which consisted of 40 test cases. For simplification, to explain the technique, a test suite with 9 test cases is considered in it, covering a total of 5 faults.

The regression test suite T contains nine test cases with the default ordering {T1, T2, T3, T4, T5, T6, T7, T8, T9}, the faults covered by each test case and the execution time required by each test case in finding faults are as shown in Table 1(a,b). The equal priority is given to the number of faults covered and minimum execution time in selecting test suites.

Table 1(a): Test cases with the faults.

Test case/

Faults

F1

F2

F3

F4

F5

T1

X

X

X

X

T2

X

X

T3

X

X

X

T4

X

X

X

T5

X

T6

X

X

X

T7

X

T8

X

X

X

T9

X

Table 1(b). Test cases with the number of faults covered and time taken.

TEST CASE

NO.OF FAULTSCOVERED

EXECUTION TIME (UNIT)

T1

4

11.5

T2

2

11.5

T3

3

12.33

T4

3

10.66

T5

1

15

T6

3

8.33

T7

1

15

T8

3

10

T9

1

11

Step 1: Test case Generation

We are applying the foremost step of our algorithm by selecting the randomized test suites. The number of test cases is the number of chromosomes generated.This is explained in the table given below.

Table 2. Execution of Example.

G-Genes, T-execution time, FC faults covered

CHROMOSOME

OBSERVATIONS FOR IST ITERATION

TOTAL

TIME

(UNITS)

C1

G

T1

T2

T4

33.66

T

11.5

11.5

10.66

FC

1,2,3,5

1,2

1,4,5

C2

G

T2

T7

T9

T6

T4

56.49

T

11.5

15

11

8.33

10.66

FC

1,2

1

3

1,3,5

1,4,5

C3

G

T3

T6

T9

T8

41.66

T

12.33

8.33

11

10

FC

1,3,5

1,3,5

3

1,2,4

C4

G

T4

T5

T8

T9

46.66

T

10.66

15

10

11

FC

1,4,5

1

1,2,4

3

C5

G

T5

T8

T9

T3

48.33

T

15

10

11

12.33

FC

1

1,2,4

3

1,3,5

C6

G

T6

T9

T4

T8

39.99

T

8.33

11

10.66

10

FC

1,3,5

3

1,4,5

1,2,4

C7

G

T7

T1

T2

T4

48.66

T

15

11.5

11.5

10.66

FC

1

1,2,3,5

1,2

1,4,5

C8

G

T8

T9

T5

T1

47.5

T

10

11

15

11.5

FC

1,2,4

3

1

1,2,3,5

C9

G

T9

T2

T3

T8

44.83

T

11

11.5

12.33

10

FC

3

1,2

1,3,5

1,2,4

Step 2: Select an input for GA algorithm based on the fitness function

The fitness function in this is selecting minimum test cases to cover all the faults in minimum execution time. Thus based on this fitness function, we get two iterations with test suite {T1, T2 and T4} and {T6, T9, T8, and T4}.

Step 3.1: Apply Genetic Algorithm on first test suite to further prioritize.

Step 3.1.1 Do crossover

Applying crossover on this test suite

T1

T2

T4

Thus we get the two test suites as two offsprings of order

T1

T4

T2

and

Step 3.1.2 Do mutation

After applying mutation on one of the offspring, we get the test suite as:

T4

T1

In this we have to take all the three test cases to cover all the faults and in same execution time and no test suite formed after crossover covers all the faults. Thus we take second best randomized input to solve for optimized results.Hence, we see that by applying genetic algorithm on the output of an optimization technique we get an optimal solution in less time.

Step 3.2. Apply Genetic Algorithm on second test suite to further prioritize

T6

T9

T4

T8

Step 3.2.1. Do crossover

Thus the test suites we get after crossover as two offsprings are

T6

T8

T4

T9

Step 3.2.2 Do mutation on one of the best offspring.

After applying mutation, we get the test suite as:

T8

T6

We see that by executing only two test cases T6 and T8 we can cover all the faults and in minimum execution time of 18.33 seconds.

Step 3.2.3 Removing the duplicates from the test suites we get

T8

T6

T4

T9

We remove the test case T4 and T9 because all the faults are detected by Test cases T6 and T8. Thus, test suite all faults in minimum execution time.

T8

T6

Thus, by using this example we prioritize the test cases which cover all the faults in Minimum execution time in a much simple manner.

4.4.1.2 Example 2

Another problem specification for "Hotel Reservation" which reserves the rooms in hotel and maintains the record. The complete problem specification is available at the website http://HYPERLINK "http://www.planet-source-code.com/"www.planet-source-code.com.

Table 2(a): Test cases with the faults.

TEST CASE/FAULTS

F1

F2

F3

F4

F5

T1

X

X

X

T2

X

T3

X

X

X

T4

X

T5

X

X

X

X

Table 2(b). Test cases with the number of faults covered and time taken.

TEST CASE/FAULTS

Total Faults Covered

Time of execution

T1

3

12.2

T2

1

10

T3

3

10.67

T4

1

7

T5

4

9.97

The test suite which covers all the faults is

T1

T5

4.5 Prioritization of test cases based on total code coverage

Prioritization based on total code coverage is done by structural testing. It compares test program behaviour against the source code. It can be used to find the number of test cases required to guarantee a given level of test coverage. This is achieved through path testing which is a group of test techniques based on selecting a set of test paths through the program. If the set of paths is properly chosen, it means some measure of thoroughness is achieved. The control flow of a program can be analysed using graphical representation known as flow graph. Flow graph generation is the first step of path testing. Then decision to decision (DD) path graph is generated form flow graph. It is used to find independent paths. An independent path is any path through the DD path graph that introduces at least one new set of processing statements or new conditions. Therefore, we need to execute all independent paths at least once during path testing.The metrics used for code coverage are like statement coverage, decision coverage, condition coverage and path coverage [Cornett].

The example taken for complete code coverage is the simple triangle problem which takes inputs as a triple of positive integers (x, y, and z) and gives the output as scalene, isosceles, equilateral, not a triangle and invalid inputs.[Aggarwal and Singh]. The test cases are generated according to robust value testing. The test cases, expected outputs, statements covered, conditions and independent paths covered are shown in the table 3(a).

INPUTS

Table 3(a): Test cases with inputs and outputs

TEST CASE NO.

X

Y

Z

EXPECTED OUTPUT

INDEPENDENT PATH

CONDITION COVERED

NODES COVERED

STATEMENT COVERED

1

50

50

0

INVALID INPUT

abfgnpqr

3

8

21

2

50

50

1

ISO

abcdeghjkmqr

5

12

23

3

50

50

2

ISO

abcdeghjkmqr

5

12

23

4

50

50

50

EQUI

abcdeghimqr

4

11

22

5

50

50

99

ISO

abcdeghjkmqr

5

12

23

6

50

50

100

NT A TRI

abcdegnoqr

4

10

21

7

50

50

101

INVALID INPT

abcegnpqr

4

9

20

8

50

0

50

INVALID INPT

abfgnpqr

3

8

21

9

40

20

30

SCALENE

abcdeghjlmqr

5

12

24

10

50

1

50

ISO

abcdeghjkmqr

5

12

23

11

50

2

50

ISO

abcdeghjkmqr

5

12

23

12

50

99

50

ISO

abcdeghjkmqr

5

12

23

13

50

100

50

NT A TRI

abfgnoqr

3

8

20

14

50

101

50

INVALID INPT

abcegnpqr

4

9

21

15

0

50

50

INVALID INPT

abfgnpqr

3

8

21

16

1

50

50

ISO

abcdeghjkmqr

5

12

23

17

20

40

30

SCALENE

abcdeghjlmqr

5

12

24

18

80

50

80

EQUI

abcdeghimqr

4

11

22

19

100

50

50

NT A TRI

abcdegnoqr

4

9

21

20

101

50

50

INVALID INPT

abcegnpqr

4

9

21

Step 1: Test case Generation

We are applying the foremost step of our algorithm by selecting the randomized test suites. The number of test cases is the number of chromosomes generated. This is explained in the table given below.

Table 3 (c) Execution of Example

G-Genes, C- Chromosome

CHROMOSOMES

OBSERVATIONS FOR 1stITERATION

C1

G

T1

T5

T6

T9

T4

T7

T11

T17

T20

C2

G

T2

T4

T9

T12

T16

T18

T7

T8

T10

T6

T20

C3

G

T3

T15

T17

T19

T6

T20

T4

T13

T5

T14

C4

G

T4

T6

T17

T9

T12

T10

T20

T1

T3

T12

T7

C5

G

T5

T8

T12

T15

T19

T20

T14

T6

T11

T10

T7

C6

G

T6

T12

T1

T20

T16

T2

T19

T4

C7

G

T7

T9

T13

T15

T14

T18

T19

T20

T4

C8

G

T8

T10

T14

T20

T12

T4

T9

T2

T6

T7

C9

G

T9

T19

T12

T8

T1

T5

T4

T10

T17

T20

C10

G

T10

T12

T14

T16

T18

T20

T6

T2

T4

T13

C11

G

T11

T13

T15

T20

T19

T1

T18

T17

T3

T4

C12

G

T12

T14

T16

T18

T4

T20

T9

T7

C13

G

T13

T17

T20

T19

T6

T14

T12

T6

T7

T4

C14

G

T14

T4

T19

T6

T8

T12

T5

T20

T3

T1

C15

G

T15

T3

T7

T9

T4

T1

T18

T10

T14

T20

C16

G

T16

T10

T19

T20

T5

T11

T8

T14

T4

T12

C17

G

T17

T5

T1

T16

T7

T6

T17

T12

T20

T2

T4

C18

G

T18

T19

T15

T17

T5

T20

T8

T9

T2

T4

C19

G

T19

T4

T13

T14

T3

T6

T1

T7

T3

T20

C20

G

T20

T4

T1

T2

T8

T3

T18

T6

T9

T16

T19

Step 2: Select an input for GA algorithm based on the fitness function

The fitness function in this is selecting minimum test cases to cover all the independents paths with minimum test cases. Two test suites of eight test cases and two test suites of nine test cases are selected as per the fitness function.

The crossover is applied on test suites of similar length. The 3-point crossover is applied on two sets of test suites.

Two offsprings are formed on applying crossover. One of the two offspring covers all the independent paths while the other does not cover all the independent paths and hence that offspring is discarded.

Thus based on this fitness function, we get two iterations with test suite {T6,T12,T1,T20,T16,T2, T19 and T4} and {T12,T14,T16,T18,T4,T20,T9,and T7}.

Step 3.1: Apply Genetic Algorithm on first test suite to further prioritize.

Step 3.1.1 Do crossover

Applying crossover on test suite of eight test cases

Thus the test suites we get after crossover as two offsprings are

T6

T12

T1

T18

T4

T20

T9

T7

T12

T14

T16

T20

T16

T2

T19

T4

And

The first offspring i.e. test suite obtained after crossover covers all the independent paths and that test suite is selected for mutation. The test suite selected is as:

T6

T12

T1

T18

T4

T20

T9

T7

Step 3.2.2 Do mutation on one of the best offspring and the process shown is as:

The result obtained after applying mutation is

T6

T7

T1

T9

T4

T20

T18

T12

Step 3.2.3 Removing the duplicates from the test suites

Thus, removing the duplicate test cases T18 and T12, we get the final test suite which covers all the seven independent paths as below and this is the final result.

T6

T7

T1

T9

T4

T20

Step 3.1.1 Do crossover

Applying crossover on test suite of nine test cases

Thus the test suites we get after crossover as two offsprings are

T1

T5

T6

T15

T14

T18

T19

T20

T4

T7

T9

T13

T9

T4

T7

T11

T17

T20

And

The first offspring i.e. test suite obtained after crossover covers all the independent paths and that test suite is selected for mutation. The test suite selected is as:

T1

T5

T6

T15

T14

T18

T19

T20

T4

Step 3.2.2 Do mutation on one of the best offspring and the process shown is as:

The result obtained after applying mutation is

T1

T5

T6

T4

T14

T20

T19

T18

T15

Step 3.2.3 Removing the duplicates from the test suites

Thus, removing the duplicate test cases T18 and T15, we get the final test suite which covers all the seven independent paths as below and this is the final result.

T1

T5

T6

T4

T14

T20

T19

Thus, the final test suite which covers all the independent paths is

T6

T7

T1

T9

T4

T20

5. Comparison

The examples mentioned in section 4.4 are compared with respect to the following prioritization: No order, Random order, Reverse order, optimal order of the test cases. The orderings with respect to these approaches are listed in Table 3. These approaches are compared by calculating Average Percentage of Faults Detected (APFD) [Elbaum, (2004); Rothermel, (2000), Rothermel, (1999)]. The results are shown in fig. 2 to fig. 6. It shows that the prioritization achieved using GA attain the results that are comparable to that obtain with optimum ordering. The technique is superior to other approaches also as is clearly shown with the percentage of fault coverage achieved.

Table 3. Order of test cases for various prioritization approaches for example 1.

No Order

Random Order

Reverse Order

Optimum Order

GA

Order

T1

T5

T9

T8

T8

T2

T7

T8

T6

T6

T3

T1

T7

T5

T1

T4

T3

T6

T4

T3

T5

T6

T5

T1

T5

T6

T2

T4

T7

T2

T7

T4

T3

T3

T4

T8

T8

T2

T2

T9

T9

T9

T1

T9

T7

Fig. 2 APFD for original ordering (no order) of test cases on example 1. Fig. 3. APFD for random ordering of test cases on example 1.

Fig. 4. APFD for reverse ordering of test cases on example 1. Fig. 5. APFD for Optimal Fault Coverage ordering of test cases on example 1.

Fig. 6. APFD for ordering using GA Prioritization order of test cases on example 1.

Table 4. Order of test cases for various prioritization approaches for example 2.

No Order

Random Order

Reverse Order

Optimum Order

GA

Order

T1

T5

T5

T5

T1

T2

T2

T4

T3

T5

T3

T1

T3

T4

T2

T4

T3

T2

T1

T3

T5

T4

T1

T2

T4

Fig. 7 APFD for original ordering of test cases on example 2 Fig. 8 APFD chart for random order on example 2

Fig. 9 APFD chart for OPTIMAL ORDER on example 2 Fig. 10 APFD chart for REVERSE on example 2

Fig. 11 APFD chart for GA on example 2

6. Application of the Proposed Approach

This approach may be used by the software practitioners to reduce the time and effort required for prioritization of test cases in the test suite. The proposed approach may lead to greater savings of time and effort in larger and complex projects as compared to smaller ones. Using GA approach, software practitioners can effectively select & prioritize test cases from a test suite, with minimum execution time. Hence, the proposed algorithm may prove to be useful in real-life situations.

7. Conclusion and Future Work

This paper proposes GA algorithm for effective execution of time-constrained test case prioritization. Here, different prioritization approaches have been analyzed on two different examples and their finite solution obtained, respectively. Through Genetic Algorithm technique, an approach has been identified to pull out suitable population, which was further formulated by GA operations to make it more flexible and efficient. The elaborations of results are shown through APFD, which will help in measuring usefulness of the proposed algorithm.

The algorithm is solved manually and is a step towards Test Automation. In future an automation tool is to be developed to implement the proposed algorithm which can solve large number of test cases in efficient time.