# Anns Non Linear Mapping Structures Computer Science Essay

Published:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

## 2.0 Introduction

Artificial Neural Networks (ANNs) are non-linear mapping structures evolved based on the function of human brain and are powerful tools for modeling,when the underlying data relationship is unknown. ANNs can identify and learn the correlated patterns between the input data and corresponding target data values. After training, these networks can be used to predict the outcome of new independent input data set. ANNs imitate the learning process of the human brain and can process non-linear and complex data involved in problems even if the data are imprecise and noisy. Thus ANNs are ideally suited for modeling of different applications data in different fields. The main feature of these networks is their adaptive nature, where "Learning by example" replaces "programming" in solving problems. This feature makes computational models very appealing in application domains where one has little or incomplete understanding of the problems to be solved but where training data is readily available.

## 2.1.0 Literature Review:

In 1943 Mc.cullochpitts proposed a neurons model which performs weighted sum of the inputs to there elements followed by a threshold logic computations. The drawback of this model of computations is that the weights are fixed and hence the models do not learn from examples. Hebb, in 1949 prepared [1] a leaving scheme for adjusting a connection weight based on prepared part synaptic weight values of the variables. So this law became a fundamental learning rule in neural networks.

Rosen blatt ,in 1958[2] proposed the perceptron models , which has adjustable weights by the perceptron learning law . In 1960, windows and Hoff and his group [3] proposed an Adaptive Linear elements (ADALINE) model for completing elements and they used the (LMS (least mean square) learning rule to adjust the weights of the network.

The mathematical frame work for a new training scheme of layered networks was discovered in 1974 by Paul Werbos [4] . During the period from 1965 to 1986 the study of learning in networks of threshold elements and of the mathematical theory of neural network was pursued by sun -IchiAmarikunihiko Fukushima from Japan, developed a class of neural network architectures known as Neocognitrous [5], it is a model for visual pattern recognition and biological plausibility.

Hopfield , in 1982 gave the energy analysis of feedback neural network architecture for associative memories [6], [7].Another revitalization of parallel distributed processing, edited by Mcclelland and David Rumelhart in 1986 [8].

In addition to this a bias weight that is considered as a negative input weight in to all the non-input nodes. Each non input node outputs a result by applying an activator function to the weighted sum commonly used activation function is the sigmoid function defined by logistic function with a slope parameter [9].

In 1986 Rumelhartetal showed [10] that it is possible to adjust the weight of a multilayer feed forward neural network by using a implicit mapping in a systematic way to learn in a set of inputs and outputs pattern pairs.

## 2.1.1 Learning:

Learning also called "training" is accomplished in ANN's using examples, is archived by adjusting the connection weights in interactively so that trained ANN's can perform certain tasks . The popular learning rules are delta rule, the Hebbian rule etc. Under delta rule a Gradient descent-based optimization algorithm such as back propagation [17] can then be used to adjust connection weights in the ANN iteratively in order to minimize the error an adjusting a local step size for each system parameter. But BP has draw backs due to its use of Gradient descent [18], [19], its gets trapped in a local minimum of the error function and is inefficient to find a global minimum if the error function is multi model and non-differentiable. The Hebbian rule, anti Hebbian rule and the competitive learning rules are given in [11] [12] & [13].

The most advanced of these algorithms during 1990 to 2000, stochastic meta-descent (SMD) adapts local step size by a dual gradient descent procedure that uses fact curvature matrix-vector products [14], [15]. There are capable of exactly minimizing a dimensional un constrained quadratic problem in iterations [16]. Zhang et al, in 1998 provided the general summary of traditional statistical methods in ANN providing the guidelines for neural network modeling.

Adopting ideas from conjugate Gradient methods, Nicol N Schraudolph and there Gradient [18] proposed three new stochastic gradient descent techniques and tested them in very ill-conditioned quadratic bowls only one of the techniques performed well by an order of magnitude in all settings. There is no improvement through a systematic re-derivation of conjugate gradient methods in the lines stochastic setting. One way to overcome gradient descent-based training algorithm short comings is to adopt "evolutionary ANN's".

In an excellent manner, various aspects of ANN overview is provided by Cheng and Titterington [19] in 1994and Warner and Misra in 1996 [20].

## 2.1.2 Evolutionary Computation:

The origins of Evolution computing can be traced in 1950's and 1960's with the idea that evolutionary process could be applied to engineering problems of optimization. The Evolutionary computer include Evolutionary algorithms (EA's) genetic programming, evolutionary strategies, evolutionary programming. EA's refers to a class of population based stochastic search algorithms that was developed from ideas and principles of nature. Different evolutionary algorithms have evolved during these years. Genetic algorithms mainly developed in USA by JH. Holland. In Germany evolutionary strategies was developed by I Rechenberg and evolutionary programming was developed by H.P schwefel. All these constitutes a different approach, however, they are inspired by the same principle of natural evolution. Evolutionary algorithms or genetic algorithms were initially developed by Bremermann[21] in 1958 but popularized by Holland who applied EA's to formally study adaptation in nature [22] & [23] [24]

During 1960 to 1970, the evolutionary Programming introduced by Fogel [25], [33] and extended by Burgin and Fogel [26], [27] to create artificial intelligence. In these papers the approach was to evolve finite state machines (FSM) to predict events on the basis of former observations. An FSM is an abstract machine used to transforms a sequence of input symbols in to a sequence of output symbols.

Evolution Strategies, as developed by Rechemberg [28] and schwefel [29] were initially designed with the goal of solving difficult discrete and continuous, parameter optimization problems and extended by Herdy [30], Kusawe[31] in 1990.

After 1980, the evolutionary algorithms approach once again evolved to weight training in ANN's consists of two major phases. The first phase is to decide the representation of connection weights i.e. whether in the form of binary string or in Real form. The canonical genetic algorithm has always use binary strings to encode alternative solutions, often called chromosomes. The evolving ANN connection weights followed this approach [32], [33] here each connection weight is represented by a number of bits with certain length. An ANN is encoded by concatenation of all the connection weights of the neural network in the chromosome. The order of concatenation is to put weights to the same hidden/output nodes together. Separately inputs to the same hidden node for apart in the binary representation would increase the difficulty to build useful feature detectors. So to evolve connection weights it is difficult to apply crossover operators.

One of the problems faced by evolutionary training of ANN's is the permutation problem given in [34], [35], it is due to many-to-one mapping from the representation (genotype) to the actual ANN (phenol type). The permutation problem makes crossover operator very inefficient and ineffective in producing good off spring this is given in [36], [37].

To avoid the above said problem a Real number have been proposed to represent connection weights directly i.e. One real number per connection weights [38] [39]. Each individual in an evolving population will be a real vector. Special search operators have to be designed. A large number of tailored genetic operators which incorporated many heuristics about training in [40] by Montana and Davis. In their work, the useful feature detectors formed around hidden nodes was to retain during evolution. Their results showed that the evolutionary training approach was much faster then Back propagation for the problems they considered. Bartlett and Downs [30] also demonstrated that the evolutionary approach was faster and scalable than B.P. In [41], [42] the basic search operator has been Gaussian mutation other mutation operators such as Cauchy mutation [43] can also be used.

Evolving connection weights by EP can be implemented by generating initial population of each individual is a pair of real valued vector, each individual creates a off spring may be replaced by Cauchy mutation [44] for faster evolution. Determine the fitness of every individual, all parents and off spring and conducting tournament selection.

The evolutionary approach saves a lot of human efforts in developing different training algorithms for different types of ANN's. In some problems evolutionary training can be slow in comparison with fast variants of BP and conjugate gradient algorithms [45] [46].

During 1990 to 2000, Feed forward artificial neural network consists of architecture with no feedback loop on total nodes. They have the property of being static producing only signal output pattern for each input pattern, while implementing the network all the nodes are numbered in ascending order as given in [47], [48]. Each node performs a transfer function, most researches prefer to use the sigmoid function is given by

2.0

Xin Yao, in 1999 has given a reviews of different combinations between ANN's and evolutionary algorithms, which includes EA's to evolve ANN connection weights, architectures, learning rules and discussed different search operators in [49]. The architecture design is crucial in successful application of ANN's. This includes its topological structure, i.e. connectivity, and the transfer function of each node in the ANN. Architecture design depends heavily on the expert experience and a tedious trail-and-error process. There is no systematic way to design a near-optimal architecture for a given task automatically, the effort towards automatic design of constructive and destructive algorithms are given in [50], [51].

In 1994, Angaliveet. Al., [52], proposed a structural hill climbing methods, which was susceptible to trapped at structural local optima. Design of the optimal architecture for an ANN can be considered as a search problem in the architecture space where each point represents architecture. The optimal criterion is lowest training error, lowest network complexity,highest point on this space surface. The Milleret. Al,[53] indicated that EA's a beta candidate for searching the surface than those constructive and destructive algorithms given above. Considerable research has been carried out during 90's like genetic neural network on MIMD computer, [54], structure design of ANN by S.Bornohold and D.Grandanz, [55], topology of ANN's and weight optimization given by [52] relatively little amount has been done on the evolution of node transfer functions.

## 2.1.3 Review on Architectures:

The Direct encoding scheme connection of architecture is directly specified by its binary representation [57] & [58], with two inputs and one output feed forward ANN was considered its connectivity between the nodes represents a matrix with diagonal elements represents all zeros. Since ANN is feed forward, we only need to represent the upper triangle of connectivity matrix in order to reduce the chromosome length. An evolutionary algorithm can then be applied to evolve a population of such chromosomes and evaluate the fitness of each chromosome, to convert the chromosome back to an ANN, initialize it with random weights and train it. It may facilitate rapid generation and optimization of design was given in [53].

Another flexibility given by M. Bichsel D. and B. Fogel in [59], [60] provided by the evolution of architectures stems from the fitness definition. There is no limitation on differentiable or continuous on how the fitness should be for each individual according to training result and complexity of the architecture. The training result pertaining to architecture such as the error and the training time is often used in the fitness statistics can be readily being introduced into the fitness function without much difficulty. One potential problem of the direct encoding scheme is scalability. A large or complex ANN requires a large matrix and thus increase the computation time to the evolution. So reduce the size of matrices domain knowledge was used to reduce the search space. The length of the chromosome can be reduced greatly in this care [61], given by J. D. Schaffer, but sufficient domain knowledge and expertise are difficult to obtain in practice. The permutation problem given above for connection weights still exist and value un wanted side effects in the evolution of architectures. Some researchers thus avoided crossover and adopted only mutations in architectures [52], [62] because two functionally equivalent ANN's which order their hidden nodes differently have two different genotype representations, the probability of producing a highly fit offspring by recombining them is often very low.

Although, crossover may be useful and important in increasing the efficiency of evolution for some problems given by W. M. Spears in [63], [64]. Hancock, in their paper [35] suggested that the permutation problem might not be as severe as had been supported with population size and the selection mechanism he used. Thierens [65] proposed a genetic encoding scheme of ANN's which can avoid thepermutation problem.

However, EA's are much less sensitive to initial conditions of training. These algorithms always search for globally optimum solution. S. A. Harp,C. A. Perez and C. A. Holzmann discussed on Indirect encoding scheme of ANN Architectures gives more compact genotype representation explained in [66], [67] but it may not be very good generalization ability and argued that the indirect encoding scheme is biologically more plausible than direct one, because the genetic information encoded in chromosomes to specify independently the whole nervous system according to the discoveries of neuroscience.

P. J. B. Hancock proposed in his work, in parametric representation of ANN architectures may be specified by a set of parameters like number of hidden nodes in each layer, the number of connection between two layers etc. These representations are proposed by [69]. The combination of learning parameter with architectures in the genotype representation was proposed by Harpetal [70].

Developmental rule representation is quite different indirect encoding method [71], [68] which are used to construct architectures in chromosomes. A developmental rule is a recursive equation [67] or a generation rule similar to a production rule in a production system [72]. The connectivity pattern of the architecture in the form of a single element matrix, by repetitively applying suitable developmental rules to nonterminal elements in the current matrix until the matrix contains only terminal elements. There elements indicate the presence or absence of a connection. An example [73] described the illustrations how ANN Architecture can be defined given a set of rules.

Wilson [74] used simulated annealing in ANN architecture design. One advantage of using simulated annealing instead of GA's in the evolution is the avoidance of the destructive effect of crossovers. Mirrill and Port [75] proposed another method for encoding architectures is Fractal representation. Which is based on the use of fractal subsets of the plane? They given that the fractal representation of architectures was biologically more plausible than the developmental rule representation. The three real valued parameters they used were an edge code, an input co efficient, and an output co efficient to specify each node in architecture. Fast simulated annealing [76] was used in the evolution.

Andersen and TSoi [77] proposed a different method that each individual in population represents a hidden node rather than the whole architecture. Architecture was built layer by layer, i.e. hidden layers are added one by one if the current architecture could not reduce the training error below certain threshold, Smith and Cribbs [78] also used an individual to represent a hidden node rather than the entire ANN.

## 2.1.4 Evolution of node transfer functions.

The evolution of architectures deals with the topological structure of an architecture. There is a fixed transfer function of each node in the architecture and are predefined of eachnode in the architecture and are predefined by human experts. But, the transfer functions are also an important part of ANN architecture and have significant impact on ANN's performance. Such architectures as given by [79], [80] the transfer function was often assured to be the same in all the nodes of an ANN.

The first EA's were applied on both topological structures and node transfer functions even through the simple ANN's with seven nodes were considered by stork etc al [81]. White and Ligomenides [82] adopted an approach to the evolution of both topological structures and node transfer functions. In their work, each individual in the initial population 80% nodes in the ANN used the sigmoid transfer function and 20% nodes used the Gaussian transfer function. The evolution process was used to get the optimal mixture between these two transfer functions automatically.

Lin and Yao [83] used EP to evolve ANN's rather than fixing the total number of nodes and evolve mixture of different nodes; their algorithm allowed growth and shrinking of whole ANN by adding or deleting a node randomly and achieves good performance for some benchmark problems. Hwang et.al [84] went a step further. They evolved ANN, both node transfer function an connection weights for projection of neural network.

## 2.1.5 Simultaneous Evolution of Architectures and connection weights

Connection weights in ANN have to be evolved often a near-optimal architecture is found. One major problem [85] with the evolution of architectures without connection weights was nosily fitness evolution. There are two major sources of noise; first source is a random initialization of weight produce different training results. Hence the same genotype may have quite different fitness's used in training. The second source of noise is the Different training algorithms produces different results even for the same set of initial weights. Such noisemay mislead evolution. In order to reduce the noise architecture usually has to be trained many times form different random initial man fitness. Angleliveet. Al [52] [86] was given more general discussion on the mapping between genotypes to phenotypes because noise is caused by one-to-may mapping.

One way to alleviate this problem is to evolve ANN architecture and connection weights simultaneously. In this case X. Yao and Y. Liu proposed [87], [88] each individual in a population was fully specified ANN with complete weight information. Since there was a one-to-one mapping between genotype and its phenotype fitness evaluation is accurate.

In evolving ANN's, one issue is the choice of search operators used in EA's. The crossover based and mutation based EA's have been used. Crosser over works best if the building blocks are exists, but there is no clarity what a building block in an ANN since ANN's emphasize distributed representation [89], i.e. Knowledge in ANN is distributed among all the weights and recombining one part of an ANN with another part of another ANN is likely to destroy both ANN's.

If the ANN's uses a radical basis function (RBF) network or nearest - neighbor multilayer perceptron's and does not use distributed representation, crossover might be very useful operator and are discussed in [90]. Yao and Lin [91] developed an automatic system, EP net, based on evolutionary program for simultaneous evolution of ANN architecture and connection weight without crossover and they use only mutation operator.

## 2.1.6 Evolution of Learning rules:

An ANN training algorithms may give different performance when applied to different architectures. The learning rules are used to adjust the connection weights depending on the type of architectures under investigation. Hebbian and often learning rules mentioned above in the beginninghave been proposed to deal with different architectures. But the designing an optimal learning rule becomes very difficult in practice.

Artola et al., [92] proposed a new rule, can learn more patterns than optimal Hebbian rule and can exceptions as well as regularities. But it is, still difficult to say that this rule is optimal for all ANN's. An ANN should learn its learning rule dynamically instead of fixed manually. Various models [93] proposed relation between evolution and learning is extremely complex but some of them were deal with the issue of now learning can guide evolution [94] and the relation between evolution of architecture and the connection weights in [95] but also in modeling the creative process in biological system. A typical cycle of the evolution of learning rules is given below Fig (2.0).

Decode each individual in the present generation in to a learning rule

Build a set of ANNs with randomly generated architectures and initial connection weights and train them using the decoded learning rule.

Calculate the fitness of each individual according to the average training result.

According to their fitness select the parents from the current generation.

Generate off spring, using search operators to parents, this forms a new generation.

Figure 2.0. Single population Evolution cycle of the learning rule.

The iteration stops when population converges or a predefined number of iterations have been reached.

As a first attempt of learning rules [96], the adaptive adjustment of back propagation parameters like learning rate and moments through evolution could be considered. Harp et al., [97] proposed BP's parameters in chromosomes together with ANN's architecture the near optimal combination of BP with architecture can be found by Jacobs [98] because the simultaneous evolution of both algorithmic parameters and architectures facilitates exploration of interaction between learning and architectures.

The evolution of learning rules was also carried out by others [99]. Fontanari and Meir [101] used chalmers' approach to evolve learning rules for binary perceptions. They have considered four local variables. Bengioet. Al's approach [100] is slightly different from chalmers in the sense that gradient descent algorithms and simulated annealing rather than EA's were use to find near-optimal real-valued co efficient which will be determined by evolution.

The problem of finding a near optimal set of input features [102] to ANN gives very good results. In the evolution of input features, each individual represents a subset can be represented by a binary chromosomes ofall the possible inputs in the population. Each bit in the chromosomes corresponds to a feature '1' indicates presence of a feature, while 'o' indicates absences of the feature. The evolution of an individual is carried out by training an ANN with their inputs and the results were used to calculate its fitness value.

An EA is used to search for a near-optimal set of control parameters, the ANN will be in fitness evolutional instead of the real control system [103]. This combination has two advantages one is the fast fitness evolution based on ANN's and the second was provides safer evolution of control system.

After 2000 year, Evolutionary algorithms selection plays an important role in solving optimal problems. Zhan G & Kim [104] has given the comparison of selection methods for evolutionary optimization. A set of selection schemes has been compared in the context to optimization of real life industrial problems. They consider the combined effect of selection and other genetic operators on solution quality and convergence speed. The optimization process was considered to be converged if it either reached the maximum generation or achieved no performance improvement in a fixed number of consecutive generations. The experimental results shown strong evidence that ranking selection and tournament selection are better choice than proportional selection. They are suggests that large selection differential combined with small heritability is an important property for effective evolutionary optimization.

In 2001, Gosh and Verma[105] was proposed a new idea of learning weights for ANN's called as LMS based evolutionary ANN. It was a hybrid learning approach for training of a feed-forward ANN, which combines evolutionary algorithms with matrix solution methods for hidden and output layers. This algorithms was applied on benchmark problems like XOR 10 bit add parity. In this method the EA was applied on first layer and least squares in the second layers of the ANN. The hidden layer weights are evolved using the evolutionary algorithm (EA) when a certain number of generations or error goal interms of RMS was reached, the training stops.

R Ghosh et al [106], [107] in 2002 proposed a evolutionary architecture using LMS and evolutionary weight updating for a feed forward multilayered perception (MLP). They were given the criteria for stopping and near optimal solution in multidimensional complex search space comprising the architecture and the weight variables.

In 2003, Abbas, [108], proposed a new evolutionary algorithm for multi objective computation, Which speeding up the back propagation method. The optimization [109] of multi objective computation was given using neuro-evolution process for constructing ensembles of ANN.

In 2006, Rivero, Dorado et al, [110] developed an ANN by a new technique that uses genetic programming (GP). The genetic programming has be changed in order to work with graph programming has to be changed in order to work with graph structures, So that ANNs can be developed, this also allowed the obtaining of simplified network that solve the problem with a small group of neurons. This system achieves good results. The drawback of this system has an over fitting problem. One research line could be to sue any technique to avoid over fitting such as early stop.

Xin Yao and Islam [111], in 2008 proposed divide-and conquer strategy to decompose a complex problem in to simpler sub-problems and solve them by a group of solvers. Combining ANN's with evolutionary compilation at the level of weights, Architecture and learning rules, this work towards evolving ANN ensembles ANN ensembles could be regarded as an effective approach to implement the divide and conquer strategy in practice. Evolving ANN ensembles automatically, this includes automatic weight updating individual ANN architectures and the ensemble structures.

Eva Volna, [112] distinguish among three kinds of evolution in artificial neural networks, i.e.the evolution of connection weights, of architectures, and of learning rulesand each kind of evolution in detail andanalyses critical issues related to different evolutions. This articleconcentrates on finding the suitable way of using evolutionaryalgorithms for optimizing the artificial neural networkparameters using optimal combinations of architecture, learningrule and connection weightsglobal search procedures are moresuitable for the evolution of architectures and that oflearning rules on slow time scales, which tends to explorethe search space in coarse grain (locating optimal regions),while local search procedures are more suitable for theevolution of connection weights on the fast time scale.

Dario Floreano et al [113] gives an overview of the most prominent methods for evolving ANNs with a special focus on recent advances in the synthesis of learning architectures and to evolve architectures and learning for dynamic neural networks, which are very powerful systems for real-world problems such as robot control.

In 2010, Daniel Rivero et al,[114] described a new technique that uses Genetic Programming (GP) in order to automatically develop simple ANNs, with a low number of neurons and connections. Experiments have been carried out in order to measure the behavior of the system and to compare the results obtained using other ANN methods with evolutionary computation. As explained herein, the system has other important features such as variable discrimination, which provides new information on the problems to be solved.

2.2 The entire work of this thesis for design of Artificial Neural Network as follows

A Gradient Descent algorithm was developed on fixed feed-forward network and to estimate the minimum mean square error for N-Bit parity benchmark problems. In this we also find the number of iteration and time taken to reach the good convergence.

Evolutionary training algorithm was developed on a feed forward network to optimize weights and is compare mean square error with most widely used learning algorithm like Gradient Back propagation. Evolutionary training algorithm gives the optimized connection weights to salve the N-bit parity (XOR) problems and three real time/world datasets classification problems (Heart, Pima Indians diabetes, WP breast cancer) that are well known for bench marking purposes in the neural learning field.

The next aim of this work is architecture optimization which was implemented using evolutionary Algorithm will continue to find smaller, less complex neural networks that generalize better after a first solutions (optimized weights), network has been found. We use two criteria's for the evolution of the performance of this method: first is the best neural network topology classification accuracy found by the evolutionary search. This criterion gives best meaningful in the real-world problems. The second criterion is the selection of relevant input variables gives much meaningful with the artificial problems where we already know the correct answer. We developed direct network encoding method to have stronger control over the individual neurons which gives better performance in this task. This algorithm first tested on N-Bit parity problems and then applied on three real world classification problems (Heart, Pima Indians diabetes, WP breast cancer).

Finally the timing analysis was also made on the above algorithms.

## 2.3 Thesis Organization:

An organization of the thesis with relevant topics is covered in chapters 1 to 5.

Chapter 1 is a overview to artificial neural networks and to some of problems in that field and also gives a mathematical model for the ANNs structures and learning algorithms. Drawbacks of the existing back-propagation algorithm.

Chapter 2 gives an overview to literature Review provides a theoretical background for the research and a review of relevant recent work.

Chapter 3 is a similar overview to evolutionary algorithms, deals with various methods and operators used for applying evolutionary algorithms to the design of Artificial Neural networks connection weights and architectures.

Chapter 4 describes the primary aim of the thesis; development and implementation of Gradient descent learning algorithm and evolutionary optimized learning algorithm on connection weights and applying these algorithms on N-Bit parity and a Real world dataset bench mark classification problems used for evaluating the optimal weights on Feed Forward ANNs. The results of different runs and discussions of the results were given.

Chapter 5 gives the usage of the scheme theorem and genetic operators to evolving ANN structures and weights simultaneously. A proposed evolutionary mutation applied on the architecture and applied on the five classification problems. The results of different runs and discussions were covered.

Chapter 6, this chapter includes conclusions and gives some future directions for further work.

We designed and implemented the algorithms required for this work using MATLAB tool.

Appendices: Appendix covers the mathematical model of generalized delta learning rule Appendix B covers the properties of Selection operator and C refers mathematical model of Schema theorem.