Intelligent Wind Data Analysis
BACK PROPAGATION ALGORITHM
Multilayer perceptrons have been applied successfully to a variety of difficult problems by training them in a supervised manner with a widely used algorithm known as the error back-propagation algorithm. The algorithm is based on the error-correction learning rule. It may be viewed as a generalization of the equally popular least mean square (LMS) filtering algorithm.
Error back-propagation training consists of two passes of computation through the different layers of a network:
- A forward pass.
- A backward pass.
In the forward pass, an input vector is applied to the input nodes of the network, and its effect propagates through the network layer by layer until a set of outputs is produced as the actual response of the network. The weights of the network are all fixed during the forward pass. In the backward pass, by contrast, all the weights are adjusted in accordance with an error-correction rule. The error signal is the desired response minus the actual response of the network.
This error signal is then propagated backward through the network, against the direction of the synaptic connections. The weights are adjusted so as to move the actual response of the network closer to the desired response. A multilayer perceptron has three distinctive characteristics:
- The model of each neuron in the network includes a nonlinear activation function. A commonly used sigmoid function is the logistic function:
f (x) = 1 / (1 + e^(-x))
Another commonly used function is the hyperbolic tangent.
The presence of nonlinearities is essential; otherwise the input-output relation of the network could be reduced to that of a single-layer perceptron.
- The network contains one or more layers of hidden neurons that are part of neither the input nor the output. These hidden neurons enable the network to learn complex tasks.
- The network exhibits a high degree of connectivity. A change in the connectivity of the network requires a change in the population of synaptic connections or their weights.
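The role of the nonlinearity can be illustrated with a small sketch. It assumes two layers with no activation function at all; the weight values and the helper names matvec and matmul are made up for illustration. Under that assumption the two layers give exactly the same result as a single layer whose weight matrix is the product of the two.

```python
def matvec(W, x):
    """Multiply weight matrix W (list of rows) by input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# Illustrative weights (assumed values, not taken from the essay).
W1 = [[1.0, 2.0], [0.5, -1.0]]   # first layer: 2 inputs -> 2 neurons
W2 = [[2.0, -1.0]]               # second layer: 2 neurons -> 1 output
x = [3.0, 4.0]

# Two purely linear layers applied one after the other.
two_layer = matvec(W2, matvec(W1, x))

# One linear layer whose weights are the product W2 * W1.
one_layer = matvec(matmul(W2, W1), x)

print(two_layer, one_layer)
```

Both results are identical, which is why a hidden layer adds representational power only when a nonlinear activation sits between the layers.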
6.2 FLOW CHART
Figure 2.1: Flowchart showing working of BPA
6.3 Types of Transfer Function
The activation or transfer function, denoted by Φ(.), defines the output of a neuron in terms of the activity level at its input. Each neuron of an ANN has an associated transfer function that determines its output. The transfer functions used in MATLAB software are presented in Table 3.1 in the appendix (Demuth and Beale, 2004).
The Log-Sigmoid transfer function (Logsig) accepts inputs anywhere between minus infinity and plus infinity and squashes the output into the range between 0 and 1:
f (x) = 1 / (1 + e^(-x))
The Hyperbolic Tangent Sigmoid function (Tansig) is another important transfer function. Its input can range from minus infinity to plus infinity, and its output varies from -1 to +1:
f (x) = 2 / (1 + e^(-2x)) - 1
A linear function (Purelin), whose output is equal to its input, is usually used at the output stage of the neural network:
f (x) = x
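The three transfer functions can be sketched in a few lines of Python. The function names mirror the MATLAB names used in the text, but this is only an illustrative re-implementation using the standard math module, not the toolbox code.

```python
import math

def logsig(x):
    """Log-sigmoid: maps any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tansig(x):
    """Hyperbolic tangent sigmoid: maps any real input into (-1, +1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def purelin(x):
    """Linear transfer function: output equals input."""
    return x

for value in (-2.0, 0.0, 2.0):
    print(value, logsig(value), tansig(value), purelin(value))
```

The tansig expression is mathematically the same as tanh(x). Note that math.exp raises an overflow error for very large negative inputs, so a production version would need to guard against that case.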
6.4 Usefulness of back propagation technique
Back propagation is a systematic method for training multilayer ANNs, and it has a strong mathematical foundation. It has dramatically extended the range of problems to which ANNs can be applied.
A set of inputs is applied to a neuron, either from outside or from a previous layer. Each input is multiplied by a weight, and the products are summed. This sum is termed NET. The activation function f is then applied to NET, thereby producing the signal OUT.
NET = X1W1 + X2W2 + … + XNWN (5.2)
OUT = 1 / (1 + e^(-NET))
This function is known as the sigmoid. It compresses the range of NET so that OUT lies between zero and one. Multilayer networks have greater representational power than single-layer networks only if a nonlinearity is introduced. The back-propagation algorithm requires that the function be differentiable everywhere, which the sigmoid satisfies.
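As a minimal sketch of a single neuron, the following computes NET as the weighted sum of the inputs and OUT by applying the sigmoid to NET. The function names and the sample values are assumptions chosen for illustration. The derivative is included because it can be written in terms of OUT alone, which is what makes the sigmoid convenient for back propagation.

```python
import math

def neuron_output(inputs, weights):
    """Return (NET, OUT) for one neuron with a sigmoid activation."""
    net = sum(x * w for x, w in zip(inputs, weights))   # NET = X1W1 + ... + XNWN
    out = 1.0 / (1.0 + math.exp(-net))                  # OUT = sigmoid(NET)
    return net, out

def sigmoid_derivative(out):
    """Derivative of the sigmoid, expressed through OUT itself."""
    return out * (1.0 - out)

# Illustrative inputs and weights (assumed values).
net, out = neuron_output([1.0, 2.0], [0.5, -0.25])
print(net, out, sigmoid_derivative(out))
```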
6.4.1 Multi layer network
A multilayer network can be trained with the back-propagation algorithm. The first set of neurons, connected to the inputs, serves only as distribution points; these neurons perform no input summation. The input signal is simply passed through to the weights on their outputs. Each neuron in the subsequent layers produces NET and OUT signals.
6.4.1 Overview of training
The objective of training is to adjust the weights so that the application of a set of inputs produces the desired set of outputs. We refer to these input and output sets as vectors. Training assumes that each input vector is paired with a target vector representing the desired output; together these are called a training pair. Usually, a network is trained over a number of training pairs, which together are called the training set.
Before training starts, all the weights must be initialized to small random numbers. This ensures that the network is not saturated by large weight values, and it prevents certain other training pathologies.
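A possible way to set up such an initialization is sketched here. The text only says "small random numbers", so the uniform distribution, the default limit of 0.5, and the function name init_weights are all assumptions made for this example.

```python
import random

def init_weights(n_neurons, n_inputs, limit=0.5, seed=None):
    """Return one row of small random weights per neuron.

    Each weight is drawn uniformly from [-limit, +limit]; the limit
    of 0.5 is an assumed value, not one prescribed by the text.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(n_inputs)]
            for _ in range(n_neurons)]

# Weights for a layer of 3 neurons, each receiving 2 inputs.
hidden_weights = init_weights(3, 2, seed=0)
print(hidden_weights)
```

Passing a seed makes a training run repeatable, which is helpful when comparing training algorithms.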
Training with the back-propagation algorithm requires the following steps:
- Select the next training pair from the training set and apply the input vector to the network input.
- Calculate the network’s output.
- Calculate the error, i.e., the difference between the network output and the desired (target) output.
- Adjust the weights of the network so as to minimize the error.
- Repeat steps 1 to 4 for each vector in the training set until the error for the entire set is acceptably low.
The operations in steps 1 and 2 are similar to the way in which the trained network will ultimately be used: an input vector is applied and the resulting output is calculated. Calculations are performed layer by layer. First the outputs of the neurons in layer j are calculated; these serve as the inputs to layer k. The layer k outputs are then calculated, and they constitute the output vector of the network.
In step 3, each network output, labelled OUT, is subtracted from its corresponding component of the target vector to produce an error, which is used in step 4. Step 4 adjusts the weights of the network, with the training algorithm determining the polarity and magnitude of the weight changes.
These four steps are repeated until the error between the actual output and the target output falls to an acceptable level. At that point the network is said to be ‘trained’ and can be used for recognition with its weights held constant.
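The training steps can be sketched as a loop. To keep the example short it is deliberately simplified: it trains a single sigmoid neuron rather than a multilayer network, learns the logical OR function, and treats a constant third input of 1.0 as a bias. The starting weights, training rate, tolerance, and function names are all assumptions for illustration.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def total_error(weights, training_set):
    """Sum of squared errors over every training pair."""
    err = 0.0
    for x, target in training_set:
        out = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        err += (target - out) ** 2
    return err

def train(weights, training_set, rate=0.5, tolerance=0.05, max_epochs=10000):
    weights = list(weights)
    for _ in range(max_epochs):
        for x, target in training_set:                    # step 1: next training pair
            net = sum(w * xi for w, xi in zip(weights, x))
            out = sigmoid(net)                            # step 2: network output
            error = target - out                          # step 3: target minus output
            delta = out * (1.0 - out) * error
            weights = [w + rate * delta * xi              # step 4: adjust the weights
                       for w, xi in zip(weights, x)]
        if total_error(weights, training_set) < tolerance:  # step 5: stop when low enough
            break
    return weights

# Logical OR; the third input is a constant 1.0 acting as a bias (assumption).
OR_SET = [([0.0, 0.0, 1.0], 0.0), ([0.0, 1.0, 1.0], 1.0),
          ([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
initial = [0.1, -0.1, 0.0]
trained = train(initial, OR_SET)
print(total_error(initial, OR_SET), total_error(trained, OR_SET))
```

Once training stops, the weights are held constant and the same forward calculation is used to classify new inputs.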
Steps 1 and 2 can be expressed in vector form as follows:
An input vector X is applied and an output vector Y is produced, where the input-target vector pair (X, T) comes from the training set.
Calculations in a multilayer network are performed layer by layer, starting from the layer nearest to the input. The NET value of each neuron in the first layer is calculated as the weighted sum of that neuron’s inputs. The activation function f then squashes NET to produce the OUT value of each neuron in that layer. Once the set of outputs for a layer has been found, it serves as the input to the next layer of neurons. This process is repeated until the final set of network outputs is produced.
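The layer-by-layer calculation can be sketched as follows. Representing the network as a list of weight matrices, one row of weights per neuron, is an assumption of this example, as are the function names and the sample weights.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(layers, x):
    """Propagate input vector x through the network layer by layer.

    layers is a list of weight matrices, ordered from the layer nearest
    the input to the output layer. Returns the OUT values of every layer,
    with the input vector itself as the first entry.
    """
    outputs = [list(x)]
    for weights in layers:
        previous = outputs[-1]
        layer_out = []
        for row in weights:
            net = sum(w * p for w, p in zip(row, previous))  # NET: weighted sum
            layer_out.append(sigmoid(net))                   # OUT: squashed NET
        outputs.append(layer_out)
    return outputs

# Illustrative 2-2-1 network (assumed weights).
network = [
    [[0.5, -0.5], [1.0, 1.0]],   # hidden layer: 2 neurons, 2 inputs each
    [[1.0, -1.0]],               # output layer: 1 neuron, 2 inputs
]
activations = forward(network, [1.0, 1.0])
print(activations[-1])
```

Keeping the OUT values of every layer, not just the last one, is useful because the weight adjustments during training need them.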
6.4.2 Adjusting the weight of output layers
Because a target value is available for each neuron in the output layer, adjusting the associated weights is easily accomplished using a modification of the delta rule. Interior layers are referred to as “hidden layers” because their outputs have no target values for comparison.
The output of a neuron in layer k is subtracted from its target value to produce an error signal. This is multiplied by the derivative of the squashing function, OUT * (1 - OUT), calculated for that neuron in layer k, thereby producing the value δ.
Then δ is multiplied by OUT from neuron j, the source neuron for the weight in question. This product is in turn multiplied by a training rate coefficient (typically 0.01 to 1.0), and the result is added to the weight.
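The output-layer adjustment described above can be written as two small functions. The names output_delta and update_weight and the sample numbers are assumptions for illustration; the arithmetic follows the description in the text.

```python
def output_delta(out_k, target):
    """Error signal (target - OUT) times the sigmoid derivative OUT * (1 - OUT)."""
    return out_k * (1.0 - out_k) * (target - out_k)

def update_weight(weight, delta_k, out_j, rate):
    """Add rate * delta * OUT_j to the weight from neuron j to neuron k."""
    return weight + rate * delta_k * out_j

# Illustrative values: output neuron k produced 0.5, the target is 1.0,
# and the source neuron j produced 1.0.
delta = output_delta(0.5, 1.0)
new_weight = update_weight(0.25, delta, out_j=1.0, rate=0.5)
print(delta, new_weight)
```

Because the output was below its target, δ is positive and the weight increases, which raises NET and therefore OUT on the next presentation of the same input.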
5.4 Adjusting the weights of the hidden layers
Hidden layers have no target vector, so the training process described above cannot be used. Back propagation trains the hidden layers by propagating the output error back through the network, adjusting the weights layer by layer.
For the hidden layers, the error value δ must be generated without the benefit of a target vector. First, δ is calculated for each neuron in the output layer. It is used to adjust the weights feeding into the output layer, and it is then propagated back through the same weights to generate a value of δ for each neuron in the first hidden layer. These values of δ are used in turn to adjust the weights of this hidden layer and, in a similar way, are propagated back to all the preceding layers.
Consider a single neuron in the hidden layer just before the output layer. In the forward pass, this neuron propagates its output value to the neurons in the output layer through the interconnecting weights. During training, these weights operate in reverse, passing the value of δ from the output layer back to the hidden layer. Each of these weights is multiplied by the δ value of the neuron to which it connects in the output layer. The value of δ needed for the hidden-layer neuron is obtained by summing all such products and multiplying the sum by the derivative of the squashing function.
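The hidden-layer calculation can be sketched in the same style. The function name hidden_delta and the sample values are assumptions for illustration; the sum of weight-times-δ products and the multiplication by OUT * (1 - OUT) follow the description above.

```python
def hidden_delta(out_j, weights_to_outputs, output_deltas):
    """Delta for one hidden neuron j.

    weights_to_outputs[k] is the weight from hidden neuron j to output
    neuron k, and output_deltas[k] is the delta of output neuron k.
    """
    # Pass the output deltas back through the connecting weights.
    back_propagated = sum(w * d for w, d in zip(weights_to_outputs, output_deltas))
    # Multiply by the derivative of the squashing function at neuron j.
    return out_j * (1.0 - out_j) * back_propagated

# Illustrative values: hidden neuron j produced 0.5 and feeds two output neurons.
delta_j = hidden_delta(0.5, [1.0, -2.0], [0.125, 0.25])
print(delta_j)
```

The resulting δ is then used to adjust the weights feeding into neuron j in the same way as for an output neuron, and the same procedure is repeated for each preceding layer.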
6.1 Training Algorithms
There are several different back-propagation training algorithms. They differ in their computation and storage requirements, and no single algorithm is best suited to all locations. Table 6.1 (appendices) summarizes the training algorithms included in MATLAB software. A few of them are described briefly below.
6.1.1 Resilient Back-propagation (trainrp)
Sigmoid transfer functions are typically used in the hidden layers of multilayer networks. These functions are often called “squashing” functions because they compress an input of infinite range into an output of finite range. A significant characteristic of sigmoid functions is that their slope must approach zero as the input gets large. This causes a problem when steepest descent is used to train a multilayer network with sigmoid functions: the gradient can have a very small magnitude and therefore cause only small changes in the weights and biases, even though the weights and biases are far from their optimal values. The purpose of the resilient back-propagation training algorithm is to eliminate these harmful effects of the magnitudes of the partial derivatives.
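The idea can be sketched for a single weight. In resilient back-propagation only the sign of the derivative decides the direction of the update, while a separate step size grows when the sign stays the same and shrinks when it flips. The code below is a simplified illustration, not the trainrp implementation; the function name and the limits step_min and step_max are assumptions, and the factors 1.2 and 0.5 are commonly used values.

```python
def rprop_step(weight, grad, prev_grad, step,
               inc=1.2, dec=0.5, step_min=1e-6, step_max=50.0):
    """One simplified resilient back-propagation update for a single weight.

    grad and prev_grad are the current and previous partial derivatives of
    the error with respect to this weight. Returns (new_weight, new_step).
    """
    if grad * prev_grad > 0:        # same sign twice: take a larger step
        step = min(step * inc, step_max)
    elif grad * prev_grad < 0:      # sign flipped: a minimum was overshot
        step = max(step * dec, step_min)
    # Only the sign of the gradient is used, never its magnitude.
    if grad > 0:
        weight -= step
    elif grad < 0:
        weight += step
    return weight, step

# A tiny gradient and a huge gradient of the same sign give the same update.
w_small, _ = rprop_step(1.0, 1e-9, 1e-9, 0.5)
w_large, _ = rprop_step(1.0, 1e3, 1e3, 0.5)
print(w_small, w_large)
```

Because the magnitude of the derivative never enters the update, a weight in a flat region of the sigmoid still moves by a useful amount.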
6.1.2. Scaled Conjugate Gradient (trainscg)
Most conjugate gradient algorithms require a line search at each iteration. This line search is computationally expensive, because the network response to all training inputs must be computed several times for each search. The scaled conjugate gradient algorithm (SCG) was designed to avoid this time-consuming line search. It is too complex to explain in a few lines, but the basic idea is to combine two approaches: the conjugate gradient approach and the model-trust region approach.
6.1.3. Levenberg-Marquardt (trainlm)
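The Levenberg-Marquardt algorithm was designed to approach second-order training speed without having to compute the Hessian matrix. When the performance function has the form of a sum of squares, as is typical in training feedforward networks, the Hessian matrix can be approximated from the Jacobian matrix as H ≈ J^T J and the gradient can be computed as g = J^T e, where J contains the first derivatives of the network errors with respect to the weights and biases, and e is the vector of network errors. The Jacobian matrix can be computed through a standard back-propagation technique.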
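A single Levenberg-Marquardt step can be illustrated on a very small problem. The sketch below is not a neural network: it assumes a straight-line model y = a*x + b with only two parameters, so that the 2-by-2 system can be solved by hand. The damping parameter mu, the function names, and the data are assumptions for illustration. The step solves (J^T J + mu*I) d = -J^T e, where e holds the errors (model output minus target).

```python
def sum_squared_error(params, xs, ts):
    a, b = params
    return sum((a * x + b - t) ** 2 for x, t in zip(xs, ts))

def lm_step(params, xs, ts, mu):
    """One Levenberg-Marquardt update for the model y = a*x + b."""
    a, b = params
    e = [a * x + b - t for x, t in zip(xs, ts)]   # error vector
    # Each Jacobian row is [de/da, de/db] = [x, 1].
    jtj_aa = sum(x * x for x in xs)
    jtj_ab = sum(xs)
    jtj_bb = float(len(xs))
    g_a = sum(x * ei for x, ei in zip(xs, e))     # gradient g = J^T e
    g_b = sum(e)
    # Solve (J^T J + mu * I) d = -g with Cramer's rule.
    m_aa = jtj_aa + mu
    m_bb = jtj_bb + mu
    det = m_aa * m_bb - jtj_ab * jtj_ab
    d_a = (-g_a * m_bb + g_b * jtj_ab) / det
    d_b = (-g_b * m_aa + g_a * jtj_ab) / det
    return (a + d_a, b + d_b)

# Illustrative data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0]
ts = [1.0, 3.0, 5.0]
start = (0.0, 0.0)
print(lm_step(start, xs, ts, mu=0.0))   # behaves like a Gauss-Newton step
print(lm_step(start, xs, ts, mu=1.0))   # damped, shorter step
```

With mu equal to zero the step is a Gauss-Newton step using the approximate Hessian, and for this linear model it lands on the exact solution. A larger mu shortens the step and turns it toward the gradient-descent direction.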