# The Background On Neural Networks Computer Science Essay


Before going into the details of neural networks, I would like to review some brief history and the work of the different people that gave birth to Artificial Intelligence. Some important papers, such as "The Chinese Room Argument", "The Symbol Grounding Problem" and "Intelligence without Representation", will be discussed in detail.

1940s, the beginning of Artificial Intelligence: McCulloch and Pitts designed the first neural net. The net combines many neurons together to increase computational power, and they deduced that such nets can, in principle, compute any computable function.

1949 Donald Hebb, learning: The next major development in AI came when Hebb published his book "The Organization of Behavior", in which the idea of learning through the adjustment of weights appeared for the first time. He compared such nets with the human brain and concluded that the connectivity of brain neurons changes continuously as the brain learns different things.

1954 Minsky: In 1954 Minsky wrote his doctoral thesis comparing neural nets with the human brain. Then in 1961 he published the paper "Steps toward Artificial Intelligence", which contains much of what is today known as neural networks.

1958 Frank Rosenblatt: Rosenblatt introduced the class of neural nets called perceptrons. A typical perceptron consists of an input layer connected to other neurons through links with adjustable weights. Perceptron learning repeatedly changes these weights until the net is trained, allowing the net to reproduce correctly all of the target input-output pairs.

1960 Delta Rule [Widrow &amp; Hoff]: In 1960 Widrow and his student Hoff produced a learning rule very close to perceptron learning, known as the delta or least-mean-square (LMS) rule. In the delta rule the weights are adjusted so that the difference between the net output and the desired target is reduced, i.e. a small mean squared error.
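The delta rule is easy to state concretely. Below is a minimal sketch in Python (not part of the assignment code): a single linear unit learns an invented mapping y = 2x by repeatedly nudging its weight against the error.

```python
# Delta (LMS) rule for a single linear unit: w <- w + eta*(target - output)*x,
# which shrinks the squared error on each presentation. The task here is an
# invented one: learn y = 2*x from three samples.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, eta = 0.0, 0.05     # initial weight and learning rate (assumed values)

for epoch in range(200):
    for x, target in samples:
        output = w * x                     # net output of the linear unit
        w += eta * (target - output) * x   # delta-rule weight update

print(w)   # converges toward 2.0
```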

1972 Kohonen: Kohonen's early work was based on associative-memory neural networks. His later work concerns self-organizing maps, which are trained without supervision; self-organizing maps have applications in speech recognition.

Grossberg &amp; Carpenter: Carpenter, together with Grossberg, developed a theory of self-organizing neural networks called Adaptive Resonance Theory. Grossberg's work in the field of neural networks is strongly mathematical and biological.

Back-Propagation: Minsky's work showed that almost every problem can be solved using a two-layer feed-forward network, but no solution for adjusting the weights was presented. Rumelhart, Hinton and Williams answered this problem in 1986. They showed that the error of the hidden layers can be determined by back-propagating the error of the output layer, hence the name back-propagation rule. To understand back-propagation, assume that learning patterns are shown to the net and the net produces its outputs; the net output is then compared with the desired output and an error is computed. Our goal is to reduce this error, and by the delta rule we know we have to adjust the weights, as shown in the diagram.

[Figure: back-propagation network, Ref. [4]]

The back-propagation learning algorithm suffers from the problem of local minima. The error surface of a BP net contains many hills and valleys, so the net can get trapped in a local minimum even when a much deeper minimum is nearby. An obvious remedy is to use more hidden units and hence have more freedom, but there must be some upper limit, as an oversized net can still get trapped in local minima.
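To make the back-propagation rule above concrete, here is a minimal sketch in Python of a 1-2-1 sigmoid network trained on an invented three-point data set. The output-layer error is propagated back through the output weights to give each hidden unit its share of the error; all values here are hypothetical.

```python
import math

# A minimal 1-2-1 sigmoid network trained by back-propagation.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]   # an invented non-linear "bump"
w1 = [0.5, -0.5]; b1 = [0.1, -0.1]            # hidden layer (2 units)
w2 = [0.3, 0.3];  b2 = 0.0                    # output layer
eta = 0.5                                     # assumed learning rate

def forward(x):
    h = [sigmoid(w1[j] * x + b1[j]) for j in range(2)]
    y = sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)
    return h, y

def sse():                                    # summed squared error
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

err_before = sse()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        delta_out = (y - t) * y * (1 - y)     # output-layer error term
        for j in range(2):
            delta_h = delta_out * w2[j] * h[j] * (1 - h[j])  # back-propagated
            w2[j] -= eta * delta_out * h[j]
            w1[j] -= eta * delta_h * x
            b1[j] -= eta * delta_h
        b2 -= eta * delta_out
print(err_before, sse())   # the training error shrinks
```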

1980 J. Searle, the Chinese Room Argument: This argument is directed against the existence of true artificial intelligence. It states that a person sitting in a closed room who cannot understand Chinese can still reply to Chinese characters by merely manipulating a set of rules, so that a person outside assumes the person inside understands Chinese. With this argument Searle interprets that although a computer implements programs, it knows nothing of the language it appears to process.

Turing's Test: Turing gave a similar example to the Chinese Room Argument, in which he described a "paper machine", a computer-like program, written in English, for playing chess. A person who simply follows these instructions can play chess without even knowing how to play. The Turing test proposes that if a computer can understand and hold a conversation in English with humans, and the humans cannot tell it from a person, then the computer is intelligent.

1990 S. Harnad, the Symbol Grounding Problem: Harnad introduces the concept of the SGP using the Chinese Room Argument. The SGP concerns how symbols (words) get their meaning, or how mental states come to be meaningful. Symbols are manipulated according to rules based on the symbols' shapes, not their meanings, so an artificial agent, such as a robot, appears to have no access to the meaning of the symbols it can successfully manipulate. It is like someone who is expected to learn Chinese from a Chinese-Chinese dictionary. The symbols constituting a symbolic system are never linked to their corresponding meanings; they are just conventions agreed by users. In Harnad's phrase: "How can the semantic interpretations of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meaning in our heads?"

Neural networks find applications in robotics, pattern recognition, signal processing, etc. A neural net has the ability to learn from input data and is thus a close approximation to the human brain. A net consists of layers of neurons with connections between the layers; neurons can also be thought of as processing units. As in the human brain, signals are passed between neurons over connection links.

[Figure: multi-layer feed-forward network, Ref. [4]]

There is also a weight associated with each connection link, and every element of the net except the inputs has an activation function (linear or non-linear) associated with it. A neural net can be summarized by:

How the connections are made between different neurons.

How the weights are adjusted (learning of nets) through learning algorithms.

Which activation functions we use.

The above points are now discussed in detail:

Neurons in a neural net are arranged in layers. The behavior of these neurons depends on the activation function and on the weighted connections over which transmission is done. The intermediate layers are known as hidden layers. One thing to note is that the input layer performs no computation, so it is not counted in the net's total number of layers.

Now let us discuss the simple neuron model shown in the figure.

[Figure: simple neuron model, Ref. [1]]

The figure shows different input signals on connecting links, each associated with a weight: a signal Xm at the input of connection link m, connecting to neuron k, is multiplied by the weight Wkm, where the first subscript k refers to the neuron and the second subscript to the input. In the figure we also see a summing junction that adds the weighted input signals and a bias, and an activation function associated with the layer to limit the output amplitude of the neuron. The bias has the effect of increasing or lowering the net input of the activation function; its input is usually fixed at 1 so that its weight can shift the activation function, without which the net may settle prematurely and we would be left with wrong results. Depending on the complexity of the problem, a single-layer or multilayer net is selected.
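The neuron model just described (weighted links into a summing junction, a bias, and an activation function limiting the output) can be sketched in a few lines; the input values and weights below are made up for illustration.

```python
import math

# The neuron model of the figure: inputs x_m on links with weights w_km are
# added at a summing junction together with a bias, and an activation
# function phi limits the output amplitude. All numbers are made up.
def neuron(inputs, weights, bias, phi=math.tanh):
    v = sum(w * x for w, x in zip(weights, inputs)) + bias   # summing junction
    return phi(v)                                            # squashed output

out = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=1.0)
print(out)   # tanh(1.1)
```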

Single-Layer Feed-forward Net: This can be thought of as a single layer of connection weights: input units through which the signal is fed to the net, and output units through which the response of the net is read. All inputs are directly connected to the outputs, but the inputs are separate from one another, and likewise the outputs. A single-layer net is usually suitable for simple problems; it works for linearly separable data, in which the patterns can be separated, but for scenarios in which the data is not linearly separable, like the famous XOR problem, we have to use a multilayer net.

Multi-Layer Feed-forward Net: As the name implies, it has additional layers, more accurately hidden layers, between the input and output units, and is thus more capable of handling complex problems, since the net has more freedom in how it maps inputs to outputs. Of course, training is more difficult and depends on more sophisticated algorithms.

Feedback Nets: Feedback nets differ from feed-forward nets in that a layer of neurons feeds its outputs back to the inputs of the corresponding input neurons. The presence of feedback loops greatly enhances the learning ability and the performance of the network. Feedback loops may be thought of as unit-delay loops, represented by z^-1. Depending on how the weights are adjusted (training), there are usually two types of training: supervised and unsupervised.

Supervised Training: This is the most common form of training. We specify our patterns mapped to suitable targets, together with the hidden layers and their activation functions; the weights are then adjusted according to the training algorithm. This process is repeated again and again until the net is trained, after which we show the net unseen data, which is the real test of how good the training is.

Unsupervised Training: The only difference in this sort of training is that we do not specify any target values; the net weights are adjusted so that the most similar inputs are grouped into the same patterns at the outputs. Kohonen's self-organizing nets are an example.

## Types of Activation Functions:

Piecewise Linear Function: As shown in the figure, there is a linear region of unity slope. If only the linear region is used, we have the situation of a simple linear combiner.

Sigmoid Function: The most common form is the S-shaped logistic sigmoid, which can be seen as a good balance between linear and non-linear behavior.

Hyperbolic Tangent: As shown in the figure, it is often desirable to limit the function output to between -1 and 1, in which case we use the hyperbolic tangent function.
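The three activation functions above can be written out directly; the piecewise-linear definition below assumes the common form with a unity-slope region on [-0.5, 0.5].

```python
import math

# The three activation functions named above. The piecewise-linear form
# assumes the common definition with a unity-slope region on [-0.5, 0.5].
def piecewise_linear(v):
    return max(0.0, min(1.0, v + 0.5))   # clipped linear combiner

def logistic(v):                          # S-shaped sigmoid, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-v))

def tanh_act(v):                          # output limited to (-1, 1)
    return math.tanh(v)

print(piecewise_linear(0.0), logistic(0.0), tanh_act(0.0))
```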

## Learning Process:

One of the most significant qualities of neural networks is their ability to learn from input patterns and then keep on improving. Learning means adjusting the weights on every iteration to improve the response of the network. Learning is done through learning algorithms, which are sets of rules for solving the learning problem. There is no single algorithm for a particular problem; we have to keep checking the response using different algorithms, which differ in how they adjust the weights and how fast they learn.

## Learning Algorithms:

Gradient Descent Algorithm: GD is considered the simplest method to find the minimum of a function. It works on the following equation:

x_{i+1} = x_i - λ·dF(x_i)

Parameter updating is done by adding the negative of the scaled gradient. As the name implies, the method uses the derivative of the function, that is the slope, and the idea of descent; gradient descent is an iterative process. To reduce the value of the function we move in the negative direction of the slope. To elaborate on the equation above: first we compute the derivative dF(x) of our function F(x) that needs to be minimized, where x is the independent variable with respect to which we differentiate; then we change the value of x as in the equation, where i is the iteration number and λ is the step size. The step size should be neither too small nor too big, but an intermediate value: too large a step size will overshoot the function minimum, and too small a step size will result in a long convergence time. Since GD is an iterative procedure, we repeat these steps until the minimum of the function is reached. Gradient descent is a nice, simple method and often converges quickly, but it has some drawbacks as well: it converges to a local minimum rather than the global minimum, and the derivative of the function must be available.
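The iteration just described can be sketched directly; here gradient descent minimizes the hypothetical function F(x) = x², whose derivative is dF(x) = 2x, with an arbitrarily chosen step size.

```python
# Gradient descent on the hypothetical function F(x) = x**2, so dF(x) = 2*x,
# following x_{i+1} = x_i - lambda*dF(x_i) with an arbitrary step size.
dF = lambda x: 2.0 * x     # derivative of F(x) = x**2
x, step = 5.0, 0.1         # arbitrary start point and step size (lambda)

for i in range(100):
    x = x - step * dF(x)   # move against the slope

print(x)   # approaches the minimum at x = 0
```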

Levenberg-Marquardt Back-Propagation Algorithm: This is the most commonly used algorithm, and on a wide variety of problems it outperforms gradient descent and other methods. It is an iterative approach that locates the minimum of a function expressed as a sum of squares of non-linear functions, and it can be thought of as a combination of gradient descent and the Gauss-Newton method. When the current outputs are far from the actual targets, the LM algorithm behaves like gradient descent, which is slow but sure to converge; conversely, when the current outputs are close to the targets, LM behaves like the Gauss-Newton method.
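A scalar sketch of the idea, fitting the invented model y = exp(a·x) by least squares: a large damping term mu makes the update a (scaled) gradient-descent step, a small mu approaches the Gauss-Newton step. This illustrates the principle only, not MATLAB's trainlm.

```python
import math

# Scalar Levenberg-Marquardt sketch on an invented least-squares problem.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.5 * x) for x in xs]    # data generated with true a = 0.5

def sse(a):
    return sum((y - math.exp(a * x)) ** 2 for x, y in zip(xs, ys))

a, mu = 0.0, 1.0
for _ in range(50):
    r = [y - math.exp(a * x) for x, y in zip(xs, ys)]   # residuals
    J = [x * math.exp(a * x) for x in xs]               # d(model)/da
    step = sum(j * e for j, e in zip(J, r)) / (sum(j * j for j in J) + mu)
    if sse(a + step) < sse(a):
        a, mu = a + step, mu * 0.5   # success: behave more like Gauss-Newton
    else:
        mu *= 2.0                    # failure: fall back toward gradient descent

print(a)   # close to the true value 0.5
```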

Resilient Back-propagation (trainrp): When sigmoid functions are used as activation functions in a multilayer network, their slope approaches zero as the input gets large. Using an algorithm such as traingdx then causes problems, because the gradient can have a very small magnitude and therefore produce only small changes in the weights and biases, even though the weights and biases are far from their optimal values. The trainrp algorithm is used to eliminate this problem.
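The vanishing slope is easy to verify numerically: the logistic sigmoid's derivative s'(v) = s(v)·(1 - s(v)) peaks at 0.25 and collapses as |v| grows, which is what stalls plain gradient training.

```python
import math

# The logistic sigmoid's slope s'(v) = s(v)*(1 - s(v)) peaks at 0.25 for
# v = 0 and collapses toward zero as the net input grows.
def slope(v):
    s = 1.0 / (1.0 + math.exp(-v))
    return s * (1.0 - s)

print(slope(0.0), slope(5.0), slope(10.0))   # 0.25, then ever smaller
```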

## Requirements Analysis:

The objective of this assignment is to develop a neural net in the MATLAB programming environment that can classify bacterial data, so one of the obvious requirements is the data file containing all the recorded bacteria of different types. The data has been provided as an Excel file, which can easily be imported into MATLAB. The records were collected using a technique known as dielectrophoresis, the term used to describe the polarization, and associated motion, induced in particles or cells by a non-uniform electric field. The phenomenon was used to collect bacteria by charging microelectrodes at appropriate frequencies (refer to Figure 1). The files can be read as a sequence of experiments, where each column represents one experiment and each row corresponds to a particular applied electric field. Five different types of bacteria have been recorded in the data file:

sa0907

sa1704

ec1404

ec1104

sm1310

Pre-Processing: As the data is not pre-formatted, we need to pre-process it; this includes sorting the data and removing the hills and caps, so that the data can be presented to the net in a better way and training can be done easily.

## Figure: all bacteria plotted after preprocessing/sorting

Starting from top

newesa1704

newec1104

newec1404

newsa0907

newsm1310

## Patterns specification:

After the data is pre-processed it is ready to be presented to the net.

## Targets specification:

The next requirement is to specify targets for every particular type of bacteria (pattern), so that at the end of training the output can be classified. For example, we specify for sa0907 the target '000' in binary, so every time the sa0907 bacteria is shown to the net the output is '000'; targets for the other types of bacteria are set in the same fashion.

## MATLAB Neural Network Toolbox:

The major requirement for this assignment is the MATLAB NN toolbox, which contains all the necessary tools to create a network in which we can specify our patterns and targets. As the data is not linearly separable, we need to specify hidden layers to give our network the freedom to adjust its weights for the best training.

## Dividevec (command to divide our patterns into train, validation and test matrices):

Dividevec is a convenient command provided by MATLAB with which we can divide our patterns into training and validation sets, plus the unseen data used for the real test, called the test matrix.

## Command to create Feed Forward Neural Network:

A neural net consists of the input layer, the hidden layers with their activation functions, the output layer (targets) and the training or learning algorithm. The MATLAB command that sets up all of the above is newff, e.g.

net = newff(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF)

The first argument specifies the minimum and maximum of the input data; then we specify the sizes of our hidden and output layers, then the activation functions for both, and finally the training algorithm. We have full freedom in choosing the hidden layers, the activation functions and the training algorithms. MATLAB provides different activation functions, such as the logistic sigmoid, pure linear and hyperbolic tangent, and different learning algorithms, such as trainlm, traingd and traingdx.

## Command that initializes Neural Network:

init() is the command that initializes the network with initial weight and bias values, depending on the network's initialization function. The bias input is usually taken as 1.

## Design Considerations:

In the design we need to take care of several things: how to present our patterns to the net; how to represent the targets and what the length of the target vectors should be (the matrix dimensions of patterns and targets must agree); how many hidden layers and nodes are required; which activation functions to select for better convergence; how many outputs we need, depending on the number of bacteria; how to divide our patterns into training and unseen data; and which training algorithm to adopt and its effect on overall training. All of these points are discussed in detail below.

First of all I defined my patterns in a matrix,

patterns=[newesa1704 newec1104 newec1404 newsa0907 newsm1310];

I defined all five patterns at once, and all the patterns had been preprocessed (sorted, with all hills removed). One thing to take care of here is that the number of rows must be equal for all bacteria, as in preprocessing we often delete rows that do not match or are highly hilly compared to other readings. In my case the number of rows is 28, as I deleted some rows in the preprocessing phase.

Secondly, I normalized my patterns before feeding them into the network using:

patterns=premnmx(patterns);

As we have been provided with five bacteria, each bacterium can be represented at the output with three bits, e.g. newsa1704 "000", newec1104 "001", newec1404 "010", newsa0907 "011" and newsm1310 "111" ("new" means the bacteria have been preprocessed):

| newsa1704 | newec1104 | newec1404 | newsa0907 | newsm1310 |
| --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 1 | 1 |
| 0 | 1 | 0 | 1 | 1 |

The number of rows for each bacteria target needs to be 28, because every bacteria pattern has 28 rows; when we plot our patterns and targets the number of rows should be equal, i.e. the matrix dimensions must agree. For this purpose I did two things: first, the targets are padded with 25 extra zeros, e.g. if for newec1104 the output is "001", at the output we have "001" followed by 25 zeros; second, to match the columns of each and every bacteria, the repmat function repeats this 28-row column once per column of the bacteria. So, in short, I used 28 inputs and 28 output nodes.

A=[0;0;0;zeros(25,1)]; B=[0;0;1;zeros(25,1)]; C=[0;1;0;zeros(25,1)]; D=[0;1;1;zeros(25,1)]; E=[1;1;1;zeros(25,1)];

targets=[repmat(A,[1 60]) repmat(B,[1 60]) repmat(C,[1 50]) repmat(D,[1 57]) repmat(E,[1 60])];
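For comparison, the same padding-and-repeat construction can be sketched in plain Python (column counts 60, 60, 50, 57, 60 as used in the appendix; each target column is stored as a list here):

```python
# Plain-Python sketch of the target matrix: each 3-bit code is padded with
# 25 zeros to make a 28-row column, and that column is repeated once per
# experiment of the corresponding bacteria (counts taken from the text).
def make_target(code, n_cols, n_rows=28):
    column = code + [0] * (n_rows - len(code))   # pad the code to 28 rows
    return [column[:] for _ in range(n_cols)]    # repeat, like repmat

targets = (make_target([0, 0, 0], 60) +   # newsa1704
           make_target([0, 0, 1], 60) +   # newec1104
           make_target([0, 1, 0], 50) +   # newec1404
           make_target([0, 1, 1], 57) +   # newsa0907
           make_target([1, 1, 1], 60))    # newsm1310
print(len(targets), len(targets[0]))      # 287 columns of 28 rows
```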

In the command below I divide my patterns into training, validation and unseen test data: 60% of the data is for training, 20% for validation and 20% for the unseen data.

[trainV,valV,testV] = dividevec(patterns,targets,0.20,0.20);
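A plain-Python stand-in for what dividevec does, splitting column indices 60/20/20 into training, validation and test sets (the shuffle and seed below are illustrative assumptions, not MATLAB's exact behavior):

```python
import random

# Shuffle the column indices, then cut them into 60% training,
# 20% validation and 20% test index sets.
def divide(n_columns, val_frac=0.2, test_frac=0.2, seed=0):
    idx = list(range(n_columns))
    random.Random(seed).shuffle(idx)       # fixed seed for repeatability
    n_val = int(n_columns * val_frac)
    n_test = int(n_columns * test_frac)
    return idx[n_val + n_test:], idx[:n_val], idx[n_val:n_val + n_test]

train_idx, val_idx, test_idx = divide(287)
print(len(train_idx), len(val_idx), len(test_idx))   # 173 57 57
```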

Next I defined the structure of my net using the newff command. As said above, in the first argument I defined the minimum and maximum elements of my patterns. Next comes the hidden layer: as the data of this assignment is not linearly separable, single-perceptron training will not work and we need hidden units. We can have as many hidden nodes as we want, giving our network the freedom to converge to the best results, but too many create problems when we show unseen data to the network, so we have to restrict our selection; in my case I used 5 hidden nodes as a starting point. I also specified the activation function for my hidden layer, for which I used "purelin". I specified the size of my output layer as well: as my patterns have 28 rows and I am plotting patterns against targets, both have to be of the same dimension, so I used 28 output nodes, with "logsig" as their activation function; the output activation can also be "tansig", "logsig", etc. Results with different activation functions are discussed in the results and discussion section.

mynnet = newff([minmax(premnmx(patterns))],[5 28],{'purelin','logsig'},'trainlm');

N.B.: MATLAB always creates a fully connected (mesh) network.

Last but not least is the learning algorithm, which significantly impacts the training. There are several algorithms provided by MATLAB, such as trainlm, traingdx and traingda; results with these algorithms are analyzed in the next section. Trainlm is the fastest and gives the best results.

Next I initialized my net using the init command, which simply allocates random weights.

init(mynnet);

Next comes the training phase which is done by using command:

net1=train(mynnet,trainV.P,trainV.T,[],[],valV,testV)

Every time we run this train command it will change the weights and train the network using the patterns and targets; for every input there is a target.

Next two commands:

mynnet.trainParam.epochs = 10000;

mynnet.trainParam.goal = 0.01;

With these two commands we specify when to stop training: we define the number of epochs, i.e. train the network at most 10000 times, and we specify the error goal, so that training stops once the error falls below it. If we don't specify these two conditions we have to break the training manually with Ctrl+C.

Once the network is trained on the seen patterns, we can test it against the unseen data, which is the real test of how well the network has been trained. The following command:

result=sim(net1,patterns)

will simulate the results. The outputs in this case will not be whole numbers like 1 or 0; the network returns floating-point numbers like 0.866, 0.544, etc., so we need to round the results so that a decision can be made easily. We also need to define a threshold for which values count as 1 and which as 0: any value above 0.5 is taken as 1, and any value below as 0.

result2=round(result);

The accuracy of our results is checked using the formula:

Accuracy = Correct Patterns / Total Patterns
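The rounding, the 0.5 threshold and the accuracy formula can be sketched together; the output and target values below are invented for illustration.

```python
# Thresholding the simulated outputs at 0.5 and scoring accuracy with
# Correct Patterns / Total Patterns. The values below are invented.
outputs = [0.866, 0.544, 0.120, 0.031, 0.973]   # floating-point net outputs
desired = [1, 1, 0, 1, 1]                        # hypothetical targets

predicted = [1 if o > 0.5 else 0 for o in outputs]   # round at the threshold
correct = sum(p == t for p, t in zip(predicted, desired))
accuracy = correct / len(desired)
print(predicted, accuracy)   # 4 of 5 correct -> 0.8
```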

Finally, I plotted my patterns against the targets, and my patterns against the simulated results, to check how close the two pairs are, i.e. how good the training is. I differentiated between the two using 'o' and 'x' markers; the command for this purpose is:

plot(patterns,targets,'o',patterns,result,'x')

## Implementation & Testing:

First of all I will discuss the accuracy of my results for the unseen data, using the above formula:

| Name of Bacteria | Correct Patterns | Total Patterns | Accuracy (%) |
| --- | --- | --- | --- |
| newsa1704 | 59 | 60 | 98 |
| newec1104 | 58 | 60 | 96 |
| newec1404 | 45 | 50 | 90 |
| newsa0907 | 55 | 57 | 96 |
| newsm1310 | 59 | 60 | 98 |

After running the code in the appendix I obtained the following three simulations:

[Figure: training plot of mean squared error against epochs]

The plot of the learning phase shows the goal as a black line. It shows the mean squared error of the network, starting at a large value and decreasing to a smaller one. The plot has three curves: the blue one shows the training phase, the green one the validation phase, and the red one the real test on the unseen data. With the help of the dividevec command I divided my patterns 60% to train the network, 20% to validate how well the network has generalized, and 20% as an independent test on the unseen data. Training on the training vectors continues as long as the network error keeps decreasing on the validation vectors; when the network has learnt well, training stops.


## plot of patterns against the unseen data

Above is the plot of patterns against actual targets, and patterns against the unseen-data results, which actually shows how well the network has learnt. It is clear from the plot that some values of the unseen data do not converge to zero or one; this is because when we run the sim command the output is a floating-point number, e.g. for 1 it may return 0.8, 0.7 or 0.9, and similarly for 0 it may return 0.4 or 0.3, so we need to round these values, which is the next phase below.

## plot of patterns against unseen data after rounding all elements of result

The tabular results above were calculated from the plot above; as we can see, the patterns and results overlap each other, with only a few mismatches, as seen from the accuracy.


## Training using traingdx algorithm

Improving Results: We can improve our results by initializing the network again and again, which changes the weights each time, and saving the net with the best performance. Further, we can increase the number of hidden nodes to give the network more freedom to learn. Below is the plot when I used 8 hidden nodes instead of 5; the number of epochs is reduced.

Below is the training using the traingdm algorithm, which is quite slow but converges well, as can be seen in the plot; it also overcomes the problem of local minima.

[Figure: training using the traingdm algorithm]

## Conclusion:

In conclusion I can say that my network has learnt well, and I can classify all my bacteria at the output with only a few errors. It can be concluded that the trainlm algorithm converges fastest for the given problem, taking the fewest epochs: as can be seen in the plots, traingd takes 246 epochs to converge while trainlm takes only 11. Trainlm also reduced the mean squared error further; however, it takes a lot of memory. Increasing the number of hidden nodes also reduces the time to converge, as is clear from the plots above: it takes only 7 epochs to reach a minimum for the same problem. It can also be concluded that the learning rate should be a compromise between too small and too big: if it is set too high, the algorithm can oscillate and become unstable; if it is too small, the algorithm takes a long time to converge. Using traingdm allows a network to respond not only to the local gradient but also to recent trends in the error surface, letting it ignore small features in the surface. Without momentum a network can get stuck in a shallow local minimum; with momentum it can slide through such a minimum. Traingdm is slow, but it converges well.

## Appendix:

%loading all bacteria files, all files have been Preprocessed

load bacteria1;

%creating a matrix of all bacteria patterns

patterns=[newesa1704 newec1104 newec1404 newsa0907 newsm1310];

%Normalizing my patterns for fast training

patterns=premnmx(patterns);

%preparing targets, making no. of rows of my targets equal to the number of

%rows of bacteria

A=[0;0;0;zeros(25,1)],B=[0;0;1;zeros(25,1)],C=[0;1;0;zeros(25,1)],D=[0;1;1;zeros(25,1)],E=[1;1;1;zeros(25,1)];

%repeating the above made column to match the number of columns of each

%bacteria, 60,60,50,57,60 are the number of columns of sa1704, ec1104,

%ec1404, sa0907, sm1310 respectively

targets=[repmat(A,[1 60]) repmat(B,[1 60]) repmat(C,[1 50]) repmat(D,[1 57]) repmat(E,[1 60])];

%Dividing my patterns into train, test and validation portions

[trainV,valV,testV] = dividevec(patterns,targets,0.20,0.20);

%creating the net, consisting of 5 hidden-layer nodes and 28 outputs

mynnet = newff([minmax(premnmx(patterns))],[5 28],{'purelin','logsig'},'trainlm');

%initializing my net with random weights

init(mynnet);

%specifying the goal and when to stop training

mynnet.trainParam.epochs = 99900;

mynnet.trainParam.goal = 0.001;

%training my net for optimum performance

net1=train(mynnet,trainV.P,trainV.T,[],[],valV,testV)

%simulating results for the unseen data the actual test

result=sim(net1,patterns)

%rounding my result for better recognition of my patterns at output

result1=round(result)

figure

%plotting my patterns against targets with '0' and patterns against results

%of unseen data results with 'x'

plot(patterns,targets,'o',patterns,result1,'x')

%title of plot

title('Plot for the comparison of patterns against unseen data results after rounding')

figure

plot(patterns,targets,'o',patterns,result,'x')

title('plot of patterns and result before rounding the result')