# Genetic Algorithm For High Level Synthesis Biology Essay

**Published:** **Last Edited:**

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

In recent years, a variety of algorithms have been proposed for global optimization including stochastic or heuristic algorithms (Youunis, Gu, Dong & Li, 2008). Stochastic algorithms involve randomness and/or statistical arguments and in some instances are based on analogies with natural processes (Zhigljavsky & Ilinskas, 2008). The algorithms based on the mechanics of natural selection and genetics have come to be known collectively as evolutionary algorithms (EAs) (Goldberg, 1989). Well-known examples of such algorithms are genetic algorithms (GAs), evolutionary strategies, genetic programming, ant colony optimization, and particle swarm optimization. Among these algorithms, GAs are perhaps the most widely known types of EAs today (Gen, Cheng, & Lin, 1999).

GAs received considerable attention about their potentials as novel optimization technique for complex problems, especially for the problem with non-differentiable solution space (Wang & Dang, 2007). Although GAs require a large amount of computation time, they also offer certain unique features with respect to classical gradient-based algorithms. For example, having located local suboptimal solutions, GAs can discard them in favour of more promising subsequent local solutions and, therefore, in the long run they are more likely to obtain better solutions for multimodal problems (Gen & Cheng, 1997). In addition, GAs are robust, flexible and problem independent. The robustness of GA's comes from their ability to locate the global optimum in multimodal function. This feature is especially required in engineering problems such as scheduling, system optimization, recognition, etc. Also, due to their heuristic nature, arbitrary constraints can be enforced on the objective function without adding to the complexity of the problem.

Generally, heuristics offer poor quality results and exhaustive search consumes huge computation time. Due to this, GAs are a more popular choice as they have been applied to various scheduling and allocation problems (Torbey & Knight, 1998; Vyas & Srinivas, 2006) and many combinatorial optimization problems (Naoki, 2002; Takashima, Murata, Shibata & Ito, 2003) successfully. Therefore, this chapter explores the use of GAs for high level synthesis and describes how GAs can be used to search for good quality solutions with respect to the scheduling and allocation problem. The proposed GA is appropriate to be considered for big and practical problems. It performs simultaneous allocation, scheduling and binding of storage and functional units in minimizing multiple related performance constraints such as area, throughput and latency. The potential of GA in producing high quality solutions quickly will be demonstrated using a set of high-level synthesis benchmarks.

## Introduction to Genetic Algorithms

GAs are a type of stochastic search that can be used to search for an optimal solution to the evolution function of an optimization problem (Pittman & Murthy, 2000). In the early seventies, Holland (1975) proposed GAs as computer programs that imitate the natural evolutionary process. In 1980, De Jong extended GAs to functional optimization and in 1989 a comprehensive mathematical model of a GA was presented by Goldberg. Compared to the traditional optimization and search algorithms, GAs do not focus on a single solution, rather it tackles a group of trial solutions in parallel. GA involves five basic steps as follows:

Step 1: Create an initial population of random solutions, known as chromosomes that represent one candidate solution to the problem.

Step 2: Evaluate the fitness of each chromosome using the objective function and any other constraints imposed on the required solution.

Step 3: Create a set good solutions by selecting a number of chromosomes with the best fitness

Step 4: If the top-ranking chromosome in the set satisfies the requirements imposed on the solution, that chromosome will be taken as the required solution, and the process is complete. Otherwise, proceed to Step 5.

Step 5: Apply crossover on a pair of chromosomes in the set to produce more chromosomes. Then, randomly apply mutation to the chromosome set. Then, the process is repeated from Step 2.

The idea of GA is to allow fit chromosomes in the population to reproduce by combining their genetic materials to produce better chromosomes. The chromosomes are modelled by some finite-length data structures. Each chromosome is associated with a fitness value that reflects how good it is with respect to the other solutions in the population. Based on this fitness, the selection step will select the better solutions to propagate to the next generation. The recombination process is simulated through a crossover mechanism that exchanges portions of data strings between chromosomes. On the other hand, mutation introduces new genetic material that causes random alterations of the strings. The frequency of occurrence of these genetic operations is controlled by certain pre-set probabilities. The selection, crossover, and mutation processes make up the basic GA cycle, which is repeated until some pre-determined criteria are satisfied. This allows the generation of better solutions to form the population.

A schematic representation of the genetic search approach is presented in Figure 2.1 (Vyas & Srivinas, 2006). The features related to the fundamental steps of GAs, portrayed above, such as encoding schemes, chromosome representation, fitness function, population initialization, genetic operators, and selection methods can be found in (Goldberg, 1989; Pittman & Murthy, 2000). A thorough study on GAs including the theoretical analysis based on so-called schema theorem will not be presented here since a rich literature is available on that topic. Instead, a general overview to the important aspects that is necessary to the configuration of GAs will be presented.

Figure 2.1 Conceptual representation of the optimization process through a genetic algorithm.(Vyas & Srivinas, 2006).

## 2.2 GA Evolution

There is no straightforward method to establish whether GA is able to locate a local or global optimum. The usage of time and resources will be inefficient if GA gets trapped in a local optimum. With this viewpoint in mind, a GA that converges too quickly should be avoided as it might get trapped in a local optimum. Therefore, a proper balance of convergence, which can be observed from the perspective of the so-called exploration-exploitation trade-off, should be maintained at all times.

A GA is regarded to be exploitative when the correlation of the scores between two subsequent populations is high and explorative when the correlation of the scores between two subsequent populations is low. In GA, the basic mechanism to manage the exploration-exploitation trade-off is recombination and selection.

## High-level Synthesis Problem Definition

## 2.3.1 Scheduling, Allocation and Binding

At present, one of the important steps taken to increase design productivity in the hardware/software co-design process is to raise the level of abstraction. Nowadays, the implementation process of most design flows is based on descriptions at the register transfer level (RTL). This move is made possible by the capability of high level synthesis and also the availability of languages for simulation, e.g. SystemC.

The main purpose of high-level synthesis is to describe a system at higher level in order to increase the productivity gain and to explore the design space, which should lead to better designs. This will have a direct impact on cost of a design. Figure 2.2 depicts the evolution of the cost during the development time (Axelsson, 2000), where it can be seen that early design decisions have a much higher resulting cost span than design decisions taken at the end of the development time.

Figure 2.2 Up to 90% of the costs are determined at the first part of the design. (Axelsson, 2000).

High-level synthesis is the process of automatically translating abstract behavioural models of digital systems to hardware (Arato, Mann & Orban, 2005; Mirkhani & Navabi, 2006; Vyas & Srinivas, 2006). High-level synthesis process can be divided into the three subtasks: operation scheduling, resource allocation, and resource binding. The operation scheduling assigns each operation in the design to a time step in which it will be executed. Resource allocation determines the resource type (e.g., register, multiplier, and adder) and the number of these types of resources that should be included in the design. Resource binding determines which resources that should be used to implement each specific operation. The main aim of high-level synthesis is to find the best architectural solution to implement a given behavioural description while fulfilling a set of constraints and goals. Constraints may include the area limit, execution order, completion time and throughput rate. Some possible goals are minimal power consumption, minimal area or maximum throughput.

## 2.3.2 Constraint Sets and Performance Measures

The role of a high-level synthesis system is to locate an optimal solution. It needs to satisfy the constraints as well as fulfilling the performance measures. The performance measures are crucial as it allows the high-level synthesis system to differentiate between good and bad solutions. There are many types of performance measures such as the throughput rate (find a schedule such that input data can be offered as fast as possible), completion time (optimize the number of cycle steps between consumption of input data and production of output data) and resource allocation (find a schedule which induces a minimal resource allocation). A set of feasible solutions are composed of solutions that does not violate the constraints and best fulfil the performance measures.

To construct a schedule, operations are assigned to clock cycles. However, the assignment of operations to clock cycles should not be done randomly. This has to be done by taking the constraints into consideration. The range of cycle steps available for operations to start their execution without violating any constraints is known as feasible schedule range which can change during the construction of schedule. For example, if resources become occupied in particular cycle step, the feasible schedule range of unscheduled operations needing this resource type for execution will be temporarily halted.

Scheduler allow operations to be assigned to cycle steps outside the feasible schedule range, may result in inefficient scheduling methods which need backtracking or repair algorithms to come up with a feasible schedule (Composano, 1991). Therefore, in order to have efficient scheduling, algorithms to determine feasible schedule ranges should be very efficient. One way is to restrict the search space of a scheduler by only allowing operations to be scheduled within their feasible range. The feasible scheduler ranges should be restricting search space as good as possible without excluding al optimal solutions from the search space. In practical high-level synthesis problem is subject to constraints, such as precedence constraints (algorithmic behaviour), resources constraints (network structure), and time constraints (completion time and data arrival rate) (Dhodhi, Hielsher, Storer & Bhasker, 1995).

2.3.2.1 Precedence constraints

The execution order of nodes inside a data-flow graph is constrained by the structure of the data-flow graph. For example, denote the edge (x, y) as transfer of data from operation 'x' to 'y'. This means that data produced by operation 'x' is consumed by operation 'y'. This implies that operation y only can start its execution after operation 'x' has produced data for 'y'. The number of cycle steps operation 'x' requires to produce data for operation 'y' is known as intra-iteration distance. In general the intra-iteration distance for each outgoing edge between operations to another operation equals the operation execution delay of the first operation.

2.3.2.2 Time Constraints

Another important constraint in high-level synthesis is time. Consider the same example above involving the edge (x, y) as transfer of data from operation 'x' to 'y'. The time constraints between node 'x' and node 'y' can be denoted as maximal distance between the start cycle step of operation 'x' and operation 'y'. Node 'x' and 'y' will form a dependency whereby the time constraint between 'x' and 'y' is in such a way that 'y' cannot starts its execution and consume data before 'x' has finished its execution and produced this data. The most common time constraints in high-level synthesis are the global time constraint. A global time constraint denote how many cycles it takes to process all input data into output data.

2.3.2.3 Resource Constraints

A resource constraint basically refers to the upper bound on the resource allocation that can be used during scheduling. The most common resource constraint used in high-level synthesis is an upper bound on the number of functional units. Therefore, this places a restriction on the number of operations that can be scheduled simultaneously which require the same operation type. The scheduling method in high-level synthesis that is normally used is quite successful in dealing with constraints on the number of functional units. However there exist schedulers which can cope with memory allocation constraints and interconnect allocation constraints in scheduling. For example the method cut-reduction (Depuydt, 1993) adds edges to a dependence graph to lower the number of possible simultaneously data transfers. However, cut-reduction is only suitable to be applied to smaller processes. The application of branch and bound will consume a large amount of time for larger processes and directly affects the run-time efficiency.

## 2.3.3 DFG and Execution Order

As described in the previous section, the constraints directly affect the execution order of operations. The operations are normally represented in a data-flow graph which shows the order of execution of operations at each clock cycle step. Therefore, data-flow graphs are very useful in performing the steps involved in high-level synthesis. In a data-flow graph, the edges represent the transfer of data values and nodes represent operations. If there are values available on all incoming edges of a node, the node will execute by consuming these values, and subsequently generate an output value on all outgoing edges.

Figure 2.3 shows the data-flow graph of a 5th order wave digital. This will be used as an example to illustrate the conversion of a data-flow graph to a clocked datapath. Figure 2.4 provides a schedule of the 5th order wave digital filter where operations have been assigned to cycle steps. As can be seen from Figure 2.4, the maximum number of operations to be scheduled concurrently is three multiplications and three additions. Therefore, at least three adders and three multipliers are required for this purpose. In this example, a multiplication is assumed to require two clock cycles for execution while an adder require one clock cycle.

The high-level synthesis steps are then performed which consists scheduling, allocation, and binding. Figure 2.5 shows the final data-path of the 5th order wave digital filter consisting of three multipliers, three adders, registers and multiplexers for data storage and selection respectively.

Figure 2.3 Data-flow graph of a 5th order wave digital filter.

Figure 2.4 Schedule of 5th order wave digital filter.

Figure 2.5 Data-path of 5th order wave digital filter.

## 2.3.4 The Challenge of High-level Synthesis

As the complexity of modern designs grow, the design space consisting many possibilities of implementation will grow as well. Table 2.1 shows the solution spaces for two benchmarks. EWF is an elliptic wave filter used in digital signal processing and ROTOR (Radivojevic & Brewer, 1996) is a coordinate translator for robotic applications. This makes the need for an efficient optimization algorithm crucial. Genetic Algorithm is one optimization algorithm worth considering.

Table 2.1 Solution space for EWF and ROTOR benchmarks

Design

Operations

Precedence Relations

Paths

Space

EWF

34

47

57

1070-1090

ROTOR

28

37

28

1060-1080

## 2.4 Genetic Algorithm Design

## 2.4.1 GA Parameters

The parameters of GA are as follows:

The population size: The population size is varied from 100 to 10000.

The crossover probability: The crossover probability is varied from 60 to 80%.

The maximum number of generations: This is varied with up to tens of thousands of generations. However, an industrially good solution is found within a hundred.

The probability of mutation: The probability of mutation is varied from 0.1 to 1%.

Initial population: Randomly generated.

Crossover type: Single-point crossover.

Selection method: Roulette Wheel using steady state generational replacement.

The stopping criteria: Total number of generations.

## 2.4.2 Chromosomal Representation

The construction of the chromosome is as shown in Figure 2.6. It is made up of a vector whose length is equal to the number of nodes in the data-flow graph. In Figure 2.7, the nodes in the data-flow graph are numbered in a top-down and left to right sequence. This will ensure that different hardware instances for the same resource will not have the same number. Then, each node in the data-flow graph is then encoded as a gene. A gene consists of the hardware resource, data connectivity that is stored with each gene so that the chromosome is an exact representation of the DFG, register that receives port output from the device allocated to the node and inputs and outputs stored in a special look-up table.

It is possible to combine two nodes and allocate to the same resource. Whenever this happens, the index of the gene is updated by assigning the same index number to the functional unit field in the affected gene. Referring to Figure 2.8, number 3 is allocated to the functional unit field in gene 3 and 6 to indicate that the same adder is being shared by gene 3 and 6. Referring to Figure 2.8, the output registers of both the genes overlap. This indicates that sharing of registers is not allowed.

Figure 2.6 Chromosome representation.

Figure 2.7 shows the relationship between a gene and hardware resources while Figure 2.8 shows the datapath after the combination of nodes 3 and 6.

Figure 2.7 Chromosome mapping. The chromosome corresponds to a maximal resource datapath where no sharing at all is explored or used.

Figure 2.8 Sample chromosome for the biquad example after the combination of node 3 and node 6, and the corresponding datapath.

In each generation, new datapaths are created through reproduction of chromosomes. These chromosomes may contain different number of functional units, multiplexer and registers. Nodes belonging to the chromosome which are compatible are combined together. This reduces the number of registers consumed.

2.4.2.1 Functional Unit Allocation

The data path cost will be reduced when two nodes of the chromosome are combined together. Apart from the reduction in the number of registers, this cost reduction is also contributed by the reduction in the number of functional unit. Therefore, the cost of a chromosome is reduced by Cost (FU1) +Cost (FU2) - Cost (FU1 ïƒˆïƒFU2).

2.4.2.2 Multiplexer Allocation

There are two possible arrangements to allocate input ports to multiplexer when combining two nodes to the same chromosome cell. The input ports can be assigned to the right or left multiplexer. However, the arrangement is unique for non-commutative operations such as subtraction. In this case, when two nodes are combined, the non-commutative operations are first allocated to the multiplexers at the input ports of the resulting node. This is called incremental register alignment and the remaining allocation is performed to reduce the number of multiplexers used.

2.4.2.3 Registers Allocation

It is possible to combine registers provided that their life spans do not overlap. When two or more nodes are combined, the output registers of these nodes are combined as well. It is possible to have several binding with the same functional units cost but differs in register configurations. The left edge algorithm (Mirkhani & Navabi, 2006) is applied incrementally at the output ports of the resulting gene to minimize the number of registers consumed. The left edge algorithm running time is extremely fast since only a small set of registers is considered at a time. This is illustrated in Figure 2.9.

Figure 2.9 Left Edge Algorithm: a) Initials genes, b) Resulting gene: three registers and 5 nodes.

2.4.2.4 Feasible schedules

The GA uses a modified version of List Scheduling (Composano, 1991) to construct feasible schedules. List scheduling will list all ready operations of the dataflow graph according to their relative priorities. An operation is considered ready if and only if all of its predecessors have completed executing (Composano, 1991). List scheduling then selects the operation with the highest priority from the list and assigns the operation to an available unit.

## 2.4.3 Objective Function

The register-transfer level (RTL) structure is based on the datapath cost. Examples of datapath components are multiplexers, functional units and registers. The cost of the datapath is given as the sum of the cost of the individual datapath components as given in Equation (2.1):

(2.1)

where Nfu(i) refers to the total number of functional units belonging to type i;

Afu(i) refers to the area of functional units belonging to type i.

Nr refers to the total number of registers

Ar refers to the area of a register.

Mmux(j) refers to the number of multiplexers belonging to type j

Mj refers to the area of the multiplexer

## 2.4.4 Architectural Description

In high-level synthesis, the output is given as a data-path that realizes the required functionality. The data-path is typically made up of the following:

Functional units that consist of adders, arithmetic logic unit and multiplier. The function of the functional units is to implement arithmetic or logic functions.

Storage units that consist of registers, memory and register files. Storage units are used for data storage.

Interconnection units that are used to connect or transfer data. They include tri-state drivers, buses and multiplexers.

Input and output logic.

## 2.5 Results and Discussions

## 2.5.1 Benchmark Results

The algorithm was tested on a set of examples including the AR filter, discrete cosine transform (DCT) and fifth order elliptical wave filter as described in the following sections (Agrawal, Pande & Mehendale, 2001; Ahmed & Dhodhi, 1995; Kung, Whitehouse & Kailath, 1985). The resources were taken from the component library (CL) presented in (Agrawal et al., 2001).

The AR Filter

In order to derive schedules for this benchmark, clock cycle values of 11, 13, 15, 16, 18, 20 and 34 is used. Two design characteristics were chosen: non-pipelined multicycled and pipelined multipliers. Multi-cycling allows an operation to be scheduled over multiple time steps. This happens when the latency of the functional unit used exceeds the clock period of the system. The results are tabulated in Table 2.2.

Table 2.2 Results from the AR filter benchmark

Design

Clock Cycles

ALUs

Regs

Mux

Mux In

Non-Pipelined Multicycled Multipliers

11

15

18

34

4(*), 2(+)

3(*), 2(+)

2(*), 2(+)

1(*), 1(+)

8

9

8

7

12

10

6

4

46

45

36

35

Pipelined Multipliers

11

13

16

20

4(*), 2(+)

2(*), 2(+)

2(*), 1(+)

1(*), 1(+)

8

8

10

9

12

8

6

4

43

39

42

38

Fifth-Order Elliptic Wave Filter

This benchmark is made up of multiplication and addition operators. It is comprised of 34 nodes. Four different schedules were considered using 17, 18, 19, and 21 clock cycles. Two design characteristics were chosen: non-pipelined multicycled and pipelined multipliers. Table 2.3 tabulates the results obtained.

Table 2.3 Results from the Fifth Order Elliptic Wave Filter Benchmark

Design

Clock Cycles

ALUs

Regs

Mux

Mux In

Non-Pipelined Multicycled Multipliers

17

18

19

21

3(*), 3(+)

2(*), 3(+)

1(*), 3(+)

1(*), 2(+)

10

9

9

8

8

8

5

5

32

28

28

24

Pipelined Multipliers

17

18

19

21

2(*), 3(+)

1(*), 3(+)

1(*), 2(+)

1(*), 2(+)

10

10

10

8

8

7

5

5

34

28

27

24

Discrete Cosine Transform

Discrete Cosine Transform (DCT) is widely applied in image compression. It is made up of 48 operators: 16 multiplication, 7 subtractions and 25 additions. Four schedules were generated for this example using and both pipelined and multicycled multipliers. The results are tabulated in Table 2.4.

Table 2.4 Results from the Discrete Cosine Transform (DCT) Benchmark

Design

Clock Cycles

ALUs

Regs

Mux

Mux In

Non-Pipelined Multicycled Multipliers

10

11

14

18

19

25

34

5(*), 5(+-)

5(*), 5(+-)

4(*), 4(+-)

2(*), 3(+-)

2(*), 3(+-)

1(*), 2(+-)

1(*), 2(+-)

19

19

20

18

18

21

19

14

15

14

9

8

5

5

61

58

64

56

59

56

49

Pipelined Multipliers

10

11

13

19

20

25

33

3(*), 6(+-)

3(*), 3(+-)

3(*), 4(+-)

1(*), 3(+-)

1(*), 3(+-)

1(*), 2(+-)

1(*), 2(+-)

22

18

18

14

18

21

20

13

9

10

7

9

5

5

64

60

59

47

54

56

52

Figure 2.10 depicts the results obtained for the elliptic wave filter benchmark.

Figure 2.10 shows the results for the elliptic wave filter benchmark. A population of 500 chromosomes were considered. It can be seen that GA is able to converge to a good solution in a short time. In Figure 2.11, it is observed that the data path cost improves as the population size is steadily increased. Figure 2.12 illustrates the variation in time as the number of generations is increased.

Figure 2.10 Results for the elliptic wave filter benchmark

Figure 2.11 Population size vs. design cost for the elliptic wave filter benchmark

Figure 2.12 Number of generation vs. CPU time for the elliptic wave filter benchmark

Figures 2.13-2.16 show the effect of module selection on the design cost for the fifth order elliptic wave filter and discrete cosine transform benchmarks respectively. In addition, the graph also shows the time-area relationship based on different constraints. The experiments were run using two groups of components: the reduced set and the complete set. Based on the complete set, the reduced set contains only the slowest and fastest components. The rationale of having the reduced set is to ensure that the same ending and starting points for the for the design space. This will help to clearly show the distinction between the costs of using the complete set and reduced set.

Based on the results obtained, it can be concluded that a smaller area design is produced when using a complete set of components. On the other hand, designs implemented using the reduced component set were unable to locate many design points. In addition, most of the design points with different latency values were mapped to the same area. This in turn creates a disturbed design curve. However, by using a complete component set a smoother design curve is achieved.

Figure 2.13 DCT design using resource set as(+3, *2).

Figure 2.14 DCT design using resource set as (+2, *3).

Figure 2.15 EWF design using resource set as (+2, *2).

Figure 2.16 EWF design with resource set as (+2, *1).

## 2.5.2 Real Areas and Delays and Multiple Hardware

Table 2.5 shows the results of the implementation of the elliptical wave filter using a single-cycle adder and a 2-cycle multiplier. A linear combination of area and latency is considered in these examples. All the shown results are known to be optimal (Radivojevic & Brewer, 1996). Table 2.5 also shows the implementation of the Rotor example using a single cycle Table lookup to calculate the trigonometric functions and a single-cycle ALU for all other operations. The CPU times are similar to those reported in (Radivojevic & Brewer, 1996).

Table 2.5 EWF and ROTOR implementation results

EWF

Rotor

Clock

cycles

Number of

adders

Number of

multipliers

Objective

function

Clock

cycles

ALU

Table-

lookups

CPU

17

3

3

L

13

1

1

42.1

18

2

2

L

7

2

1

17.3

21

2

1

L

7

3

1

11.7

28

1

1

L/A

6

4

1

2.1

In Section 2.5.1, the scheduling examples presented were implemented using only one type of FU per operation. The advantage of GA is that it is able to utilize multiple functional units with real latency and area constraint without introducing much complexity. Some multipliers and adders were synthesized in a standard sub-micron technology and added into a library of functional units as shown in Table 2.6. The speed is in ns while the area refers to NAND gates.

Table 2.6 Practical functional units

Adder

Multiplier

No. of bits

Area (gates)

Speed (ns)

Area (gates)

Speed (ns)

16

151

19.15

1376

33.05

16

189

9.96

1534

26.44

32

312

39.68

6053

69.50

32

461

9.98

6502

59.15

Using the resources from Table 2.6, the Rotor and 16-bit elliptical wave filter were designed as illustrated in Table 2.7. Two different objective functions were used: the first objective function represented by ad is proportional to the product of area and delay of the resulting architecture while the second objective function represented by adw emphasizes more weight on the delay. For the elliptical wave filter, the best solution was obtained when the adw is used as the objective function. For the case of Rotor, the solutions include the best-fit for each criterion demonstrating area-delay trade-offs. The best solutions correspond to the lowest period.

Table 2.7 Multiple functional units and real delays/area

Benchmark

Delay (ns)

Period (ns)

FU Area (gates)

Objective Function

EWF

210

10

1874

adw

220

10

1874

adw

220

10

1874

ad

Rotor

80

10

1098

adw

90

10

1269

ad

110

10

720

adw

130

10

682

ad

## Genetic Algorithm Parameter Tuning

Several experiments were carried out to investigate the effects of varying the GA parameters. For the elliptical wave filter benchmark, the GA is executed using a mutation rate of 0.005 and a crossover rate of 0.5. The GA is a steady-state where solutions below the average fitness will be replaced. The elliptical wave filter benchmark was also tested using a generational GA. The results obtained were compared to the steady-state and the comparison is as shown in Figure 2.18. It can be seen that the steady-state provided a better result as compared to the generational run.

Figure 2.17 Fitness of the elliptical wave filter

Figure 2.18 Generational vs. steady-state GA

Figure 2.19 shows the average fitness obtained for the elliptical wave filter by varying the mutation rate from 0 to 1%. The crossover rate is fixed at 50%. It is observed that the effect of varying the mutation on the average fitness for the first 20 generations is minimal. The impact is stronger when the rate of improvement is the highest.

Figure 2.19 Mutation rate effect.

Figure 2.20 shows the effect of varying the crossover rate on the mutation. The crossover rate is varied from 10 to 100% and the mutation rate is fixed at 0.5%. Similar to the results obtained in Figure 2.19, it is observed that a more random distribution of average fitness is obtained as the number of generations increase.

Figure 2.20 Crossover rate effect.

Further experiments were carried out to tune the GA parameters by varying the maximum number of generations, percentage of crossover and population size. The discrete cosine transform benchmark is used to run the experiments. The results are tabulated in Tables 2.8 and 2.9. The worst CPU time (using Pentium III machine) recorded for 10 generations is 2.42 minutes for the elliptical wave filter and 6.36 minutes for the discrete cosine transform.

Table 2.8 Algorithm results and tuning for the DCT design with resources (+2, *3)

no. of runs=10, population size=130, max. no. of generations=10000

Required latency

Min. area achieved

No. of runs the best fitness occurred

Best fitness during the runs

Average generation in which the best fitness has occurred

210

88000

10

8550

0

250

67550

7

9090

1030.29

300

59050

3

9180

5001.67

350

54400

2

9230

10011.00

400

55150

5

9260

4291.40

450

51500

1

9270

5140.00

500

50200

1

9290

5616.00

550

47400

6

9290

9712.67

600

45400

8

9290

1782.75

650

42600

3

9300

3115.67

704

38400

10

9300

0

Table 2.9 Algorithm results and tuning for the EWF design with resources (+2, *2)

no. of runs=10, population size=60, max. no. of generations=70000

Required latency

Min. area achieved

No. of runs the best fitness occurred

Best fitness during the runs

Average generation in which the best fitness has occurred

120

49000

10

3000

0

150

39050

2

3200

17546.5

200

32150

6

3380

4516.83

250

28050

1

3440

1433.00

300

25500

5

3470

7569.60

350

25000

2

3490

1684.50

400

23850

6

3490

1160.50

450

22750

9

3500

4270.22

500

20900

6

3515

1657.00

544

19700

10

3530

0

## Conclusions

The work described in this chapter explored the use of genetic algorithms for the design of high-level synthesis of several types of digital circuits. Despite the successful application of genetic algorithm and the encouraging results obtained, GA does not guarantee a certain degree of performance and is highly dependent on the choice of the parameters used. This can considered to be the main drawback of the use of genetic algorithms. The parameters to be used are either selected by trial and error or adapted from values gained by best practices in the selected problem domain. Both strategies are suboptimal, both in the time spent by the researchers involved and the resulting algorithm, of which only a vanishingly small parameter search space has been experimented with. Another problem is that parameter tuning does not lead to an optimal algorithm, as the optimum will change during the course of the run of the GA. It is also clear that there are not any single parameters that will be effective for all problem domains.

Many researchers have looked into adaptation in order to overcome the suboptimal fixed parameters (Eiben, Hinterding & Michalewicz, 1999) gained through the brute-force method. In adaptive parameter control, the parameters are changed using the current state of the GA. The use of adaptation is more similar as a meta algorithm where through adaptation the algorithm parameters are controlled during runtime according to an external mechanism, which stands outside the genetic algorithm itself. However, there are new issues to deal with as the GA has to be fundamentally changed in order to make the adaptation of parameters possible (Eiben et al., 1999; Naoki, 2002; Takashima et al., 2003). Adaptive GAs will be the main topic presented in the next chapter.