Wireless Communication Network And Channel Equalisation Computer Science Essay


The field of telecommunications has made tremendous progress over the past decade, especially in wireless communications. Although the wireless channel suffers higher distortion than the wire-line channel, wireless service is maturing and adding aspects of wire-line and Internet service that are attractive to mobile users. Wireless communication will provide apparent benefits by offering location-based services and an always-connected digital assistant in the future [1]. For many users, wireless service may completely replace wire-line and/or Internet services. As wireless service becomes more versatile and cost-effective, its usage will continue to rise. The world is becoming a global village due to advances in the field of wireless communications.

Digital signals are usually transmitted over band-limited channels in order to use the precious transmission bandwidth more efficiently. However, nonlinear distortion is a non-negligible factor that degrades data transmission in wireless communication systems [2]. In addition, the dispersive nature of these channels and multipath propagation give rise to intersymbol interference (ISI) at the receiver end, where the transmitted data are invariably distorted in amplitude and phase [3,4]. Therefore, when designing a wireless communication system, the aim is to minimize the effects of distortion and maximize data throughput.

1.2 Channel Equalization

Channel equalization is the process of reducing amplitude, frequency and phase distortion in a communication channel with the intent of improving transmission performance [4]. It is used to extract transmitted signals in the presence of noise and interference at the receiver end. The noise and interference may arise from a variety of sources and phenomena. For example, the data may have been corrupted by a noisy transmitter or by transmission through foliage.

Figure 1.2.1: Equalizer and channel response

Channel equalization may be classified into linear and nonlinear channel equalization. An equalization process is said to be linear if the output of an equalizer is a linear function of the observations applied to the equalizer's input. Otherwise, the equalization is nonlinear [3].

If a signal x(n) is transmitted through a linear dispersive channel, the received signal y(n) can be denoted by [5]

y(n) = \sum_{k=0}^{P-1} a_k x(n-k) + e(n)    (Equation 1.1)

The number of delay elements used in the equalizer determines the finite duration of its output response and is denoted by P-1. Here e(n) is Gaussian random noise and a_k is the channel impulse response. The signal x(n) is random with equal probability, and channel equalization is used to estimate the original input signal x(n) from the received signal y(n) in the presence of additive noise and intersymbol interference, which arise due to the finite bandwidth of the channel.
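As a quick illustration of this channel model, the sketch below (Python/NumPy) passes a random ±1 sequence through a short FIR channel a_k and adds Gaussian noise e(n). The channel coefficients and noise level are purely illustrative assumptions, not values used elsewhere in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-tap channel impulse response a_k (illustrative values only).
a = np.array([0.26, 0.93, 0.26])
P = len(a)

# Random +/-1 input sequence x(n) with equal probability.
x = rng.choice([-1.0, 1.0], size=100)

# Received signal y(n) = sum_k a_k * x(n-k) + e(n), with e(n) Gaussian noise.
e = 0.05 * rng.standard_normal(len(x))
y = np.convolve(x, a)[:len(x)] + e
```

The equalizer's task is then to recover x(n) given only y(n).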

Adaptive equalizers have been playing a main role in digital communication systems for reliable data transmission, and numerous equalization algorithms have been proposed. Among the various equalization methods, the linear transversal equalizer (LTE) with a feedforward structure has been widely used and is frequently implemented using the least mean square (LMS) algorithm. However, due to varied channel environments, complex equalization problems cannot be effectively resolved using linear decision boundaries [3]. In particular, the linear equalizer does not perform well under a large amount of intersymbol interference. To compensate for the shortcomings of the linear equalizer based on the LMS algorithm, many researchers have proposed nonlinear channel equalization approaches such as blind deconvolution, back-propagation learning, radial basis function networks and so on.

Figure 1.2.2: Linear finite impulse response (FIR) equalizer

1.3 Scope of This Study

A neural network (NN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information [6]. A NN is made up of the interconnection of a large number of nonlinear processing units referred to as neurons. It has been applied as a nonlinear channel equalizer by many researchers.

In this thesis, gradient descent, a derivative-based algorithm, and Particle Swarm Optimization (PSO), a derivative-free algorithm, are analyzed to solve the Legendre NN (LeNN) weight optimization problem for channel equalization in wireless communication systems.

The remainder of this thesis is organized as follows. Chapter 2 presents the theoretical background and literature review of the communication channel equalization problem and the proposed solutions. Chapter 3 gives a detailed description of the equalization process of the LeNN and the formulation of the gradient descent method and the PSO method for the learning process. Chapter 4 contains the simulation results and performance analysis of the PSO-based LeNN equalizer and its comparison with the gradient descent-based LeNN equalizer. Finally, a conclusion is presented in Chapter 5.


2.1 Adaptive Communication Channel Equalization

The purpose of adaptive channel equalization is to compensate for signal distortion in a time-varying communication channel. As mentioned earlier in Chapter 1, signals passed through the channel in wireless communication systems are distorted by ISI and nonlinear channel dispersion.

A realistic multipath nonlinear channel model is considered. Nonlinear intersymbol interference (ISI) leads to a significant error rate in a nonlinear communication channel. ISI is a form of signal distortion in which one symbol interferes or overlaps with the next symbol [7]. Symbols are invariably distorted in amplitude and phase at the receiver end. This effect makes the communication less reliable. ISI is usually caused by multipath propagation or band-limited channels [2].

Figure 2.1.1: Intersymbol interference

Multipath propagation is a phenomenon in which a wireless signal from a transmitter reaches the receiver via different paths [8]. These paths may be the result of reflection from buildings, refraction through foliage, and atmospheric effects such as atmospheric ducting and ionospheric reflection that may be adjacent to the main path. The delay caused by the different lengths of the propagation paths results in different versions of the signal arriving at the receiver at different times. These delays mean that part of or the entire symbol spreads into the subsequent symbols. Therefore, subsequent symbols are interfered with by the delayed symbol, which leads to faulty detection of those symbols. In addition, the various paths often distort the amplitude and/or phase of the signal, thereby introducing further interference into the transmitted signal.

Figure 2.1.2: Illustration of Multipath Propagation

A signal experiences ISI when it is transmitted through a band-limited channel [9], i.e., one whose frequency response is zero above the cutoff frequency. Frequency components above the cutoff frequency are removed when a signal passes through such a channel. Frequency components below the cutoff frequency may also be attenuated by the channel.
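A band-limited channel behaves like a low-pass filter, so a single transmitted symbol is smeared over several symbol periods. The sketch below uses a crude moving-average FIR filter as a hypothetical stand-in for such a channel, just to make the symbol spreading visible:

```python
import numpy as np

# Illustrative low-pass channel: a 4-tap moving average (an assumption, not a
# physical channel model). It removes high-frequency content of the input.
h_lp = np.ones(4) / 4.0

# A single transmitted symbol followed by silence.
pulse = np.zeros(8)
pulse[0] = 1.0

# The received pulse is spread over four symbol periods, overlapping the
# positions of subsequent symbols: this overlap is ISI.
received = np.convolve(pulse, h_lp)[:len(pulse)]
```

The single unit pulse emerges as four consecutive samples of amplitude 0.25, each of which would add to whatever symbols follow.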

Figure 2.1.3: Spectrum of a baseband signal as a function of frequency

Therefore, in order to improve transmission performance, it is necessary to update the equalizer coefficients to track the channel changes. The block diagram of an adaptive equalization model is presented in Figure 2.1.4 and described as follows. The external time-dependent inputs consist of the desired signal d(k), the multipath channel, the channel nonlinearity and the interfering noise q(k).

Figure 2.1.4: Block diagram of an adaptive wireless digital communication channel equalizer

The transmitted signal tk is BPSK modulated, i.e., tk ∈ {-1, 1}, where each symbol is drawn from a uniform distribution.

The transmission medium is modeled as a linear dispersive channel, an FIR filter whose output at the kth instant is given by:

a_k = \sum_{i=0}^{N_h} h_i t_{k-i}    (Equation 2.1)

where hi denotes the channel coefficients and Nh denotes the channel order.

The channel nonlinearity is described by

b_k = \Phi(a_k)    (Equation 2.2)

where \Phi(\cdot) denotes a memoryless nonlinear function, and is represented by the 'nonlinear dispersion' block.

The channel output is corrupted by additive noise q_k, and the received signal at the kth time instant is given by

r_k = b_k + q_k    (Equation 2.3)

The task of an equalizer is to set its coefficients in such a way that the equalized received signal y_k is a close estimate of the desired output d_k. In the learning phase, the desired output d_k is known (i.e., the pilot learning bits) and weight optimization is performed. The weights are frozen after learning is completed and are then used to estimate the subsequent transmitted sequence.
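The complete signal path described above can be sketched end to end. The channel coefficients, the tanh saturation standing in for the unspecified nonlinearity of Equation 2.2, and the noise level are all assumptions for illustration, not the channel model used in the simulations of Chapter 4:

```python
import numpy as np

rng = np.random.default_rng(1)

# BPSK symbols t_k drawn uniformly from {-1, +1}.
t = rng.choice([-1.0, 1.0], size=1000)

# Linear dispersive (FIR) part, Eq. 2.1: a_k = sum_i h_i * t_{k-i}.
h = np.array([0.34, 0.87, 0.34])   # hypothetical channel coefficients h_i
a = np.convolve(t, h)[:len(t)]

# Memoryless nonlinearity, Eq. 2.2. The exact form depends on the channel
# model; a saturating tanh is assumed here purely as an illustration.
b = np.tanh(a)

# Additive noise q_k gives the received signal, Eq. 2.3: r_k = b_k + q_k.
snr_db = 20.0
q = np.sqrt(10 ** (-snr_db / 10)) * rng.standard_normal(len(b))
r = b + q
```

The equalizer sees only r and, during the learning phase, the pilot symbols t.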

2.2 Neural Network

Neural networks consist of simple elements operating in parallel, inspired by biological nervous systems [10]. As in biological nervous systems, the connections between the elements determine the network function. A neural network is trained to perform a particular function by adjusting the values of the connections (weights) between elements [11].

Figure 2.2.1: Biological Nervous System

The procedure of modifying the weights of a network is defined as the learning rule [12]. It is also known as the training algorithm. Based on a comparison of the output and the target, the weights are adjusted or trained so that a particular input achieves a specific target output. Figure 2.2.2 illustrates how this works. Training pairs consist of many sets of inputs and target outputs. Typically, many training pairs are required to train the network.

Figure 2.2.2: Block diagram of neural network [10]

The starting point for most neural networks is a model neuron, as in Figure 2.2.3. A neuron is made up of multiple inputs and a single output. Each input is modified by a weight w, which multiplies the input value. The neuron combines these weighted inputs and, with reference to a threshold value and an activation function, produces an output y [11].

Figure 2.2.3: A Model Neuron

The first artificial neuron was produced in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts. However, due to the limitations of the technology at that time, it could not accomplish much [12]. Nevertheless, NNs have the remarkable ability to derive meaning from complicated or imprecise data. Therefore, they can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained NN can be regarded as an expert in the field of information it has been given to analyze.

NNs have been trained to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems. NNs can also be trained to solve problems that are difficult for conventional computers or human beings. NN paradigms are used in engineering, financial, and other practical applications.

From a signal processing perspective, NNs have attracted interest among researchers due to the following important properties [3]:

• Nonlinearity. The nonlinear nature of the neurons in the network is particularly useful if the system responsible for generating the input signal is inherently nonlinear.

• Weak statistical assumptions. A NN allows 'the dataset to speak for itself'. It performs based on the data in the environment in which it operates, provided the data are large enough to represent the environment.

• Learning. A NN has a built-in capability to learn from its environment by undergoing a learning session to adjust its free parameters.

NN-based channel equalizers have been reported to give better performance than linear channel equalizers [13, 14, 15]. Nonlinear signal processing in the NN allows it to perform nonlinear mapping between the high-dimensional input and output spaces [16].

Examples of NN types are: feed-forward back-propagation, generalized regression, Hopfield, perceptron, radial basis function, etc. The multilayer perceptron (MLP) is one of the earliest NN models proposed for channel equalization [16]. Later on, the functional link-based NN (FLANN), the radial basis function (RBF) network, the recurrent NN, the Legendre NN (LeNN) and a few other types of NN were proposed to solve the channel dispersion problem.

The MLP-based NN is a multi-layered architecture in which each layer consists of a certain number of nodes or processing units. Generally, it consists of an input layer, one or more hidden layer(s) and one output layer. An input pattern is applied to the MLP during the training phase, and the outputs for each node of all the layers are computed. The MLP network is complex because its design requires determining the number of hidden layers and the number of neurons in each of the hidden layers [3]. Although the MLP is able to provide a robust solution, its excessive training time and high computational complexity are two major drawbacks of this approach [16].

Figure 2.2.4: Multilayer perceptron with a single hidden layer

The FLANN is a single-layer flat structure in which the hidden layers are eliminated by transforming the input pattern to a higher-dimensional space such that, in the projected higher-dimensional space, the pattern becomes linearly separable. It performs pattern enhancement by expanding the input pattern to a higher-dimensional pattern using trigonometric functions. Due to the absence of a hidden layer, the FLANN provides a computational advantage over the MLP [16].

Figure 2.2.5: Schematic diagram of an FLANN

The RBF network consists of an input layer of source nodes, a single hidden layer of nonlinear processing units, and an output layer of linear weights. The transfer functions connecting the input layer to the hidden layer are nonlinear, while those connecting the hidden layer to the output layer are linear. There is thereby a bridge between linear adaptive filters and NNs, since the weights are trained in a supervised fashion using an appropriate linear filtering method [3]. RBF networks and MLPs are both feedforward universal approximators. However, they perform input-output mapping in different ways. Advantages of RBF networks over MLPs trained with backpropagation algorithms are a more straightforward training process and a simpler network structure.

Figure 2.2.6: RBF network

Figure 2.2.7: Illustration of the transformation involved in an RBF network

The input-output mapping performed by RBF network can be described as [3]:

y = w_0 + \sum_{k=1}^{K} w_k \varphi(\| u - t_k \|)    (Equation 2.4)

The term \varphi(\| u - t_k \|) is the kth radial-basis function, which computes the 'distance' between the input vector u and its own center t_k. The output signal produced by the kth hidden unit is a nonlinear function of that distance. The weight w_k is a complex weight that connects the kth hidden node to the output node of the network. The constant w_0 represents a bias that may be complex. The nonlinear transformation is defined by the set of radial-basis functions \varphi(\| u - t_k \|), and the linear transformation is defined by the set of weights w_k, k = 1, 2, ..., K.
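The mapping of Equation 2.4 is easy to sketch. A Gaussian basis function is assumed below (the text does not fix a particular \varphi), and the centers, widths and weights are tiny hypothetical values chosen only to exercise the formula:

```python
import numpy as np

def rbf_output(u, centers, widths, w, w0):
    """Eq. 2.4: y = w0 + sum_k w_k * phi(||u - t_k||), Gaussian phi assumed."""
    d = np.linalg.norm(centers - u, axis=1)        # distance to each center t_k
    phi = np.exp(-(d ** 2) / (2 * widths ** 2))    # Gaussian radial-basis function
    return w0 + phi @ w

# Tiny illustrative network: 2-D input, 3 hidden units (all values hypothetical).
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
widths = np.array([1.0, 1.0, 1.0])
w = np.array([0.5, -0.3, 0.2])
w0 = 0.1

y = rbf_output(np.array([0.0, 0.0]), centers, widths, w, w0)
```

At u = (0, 0) the first basis function is fully active (distance 0) and the other two contribute exp(-0.5) each, so the output is w0 + 0.5 - 0.1·exp(-0.5).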

The structure of the LeNN is similar to that of the FLANN. The LeNN uses Legendre orthogonal functions instead of the trigonometric functions used by the FLANN in the functional expansion. The major advantage of the LeNN over the FLANN is that evaluating Legendre polynomials involves less computation than evaluating trigonometric functions. Therefore, the LeNN trains faster than the FLANN. Some important properties of Legendre polynomials are: (i) they are orthogonal polynomials; (ii) they arise in numerous problems, especially those involving spheres or spherical coordinates or exhibiting spherical symmetry; and (iii) in spherical polar coordinates, the angular dependence is always best handled by spherical harmonics, which are defined in terms of Legendre functions [16].

There are many algorithms for the training process in neural networks; most of them can be viewed as a straightforward application of optimization theory and statistical estimation. They include gradient descent backpropagation, Levenberg-Marquardt, resilient backpropagation, the one-step secant method, etc. [17].

Recent developments in this field include the use of particle swarm optimization and other swarm intelligence techniques, such as stochastic diffusion search, the gravitational search algorithm and intelligent water drops, in the training process of NNs.

2.3 Legendre NN

The structure of the Legendre NN proposed by Jagdish et al. [16] is shown in Figure 2.3.1.

Figure 2.3.1: Schematic diagram of LeNN

The signal arriving at the receiver at the kth time instant is denoted by rk = rk,I + jrk,Q, where rk,I and rk,Q are the in-phase and quadrature components, respectively.

The input signal to the LeNN equalizer is denoted by

rk = [rk,I,rk,Q],

rk-1= [rk-1,I,rk-1,Q] ,

rk-2= [rk-2,I,rk-2,Q] .

Therefore, t1 = rk,I; t2 = rk,Q; t3 = rk-1,I; t4 = rk-1,Q; t5 = rk-2,I; and t6 = rk-2,Q. Using the Legendre expansion block, the 6-dimensional input pattern is enhanced to an 18-dimensional pattern Xk. The expanded input for the LeNN [16] is shown in Table 2.3.1.

Table 2.3.1: Legendre expansion for LeNN

x      LeNN
x1     t1
x2     L2(t1)
x3     L3(t1)
x4     t2
x5     L2(t2)
x6     L3(t2)
x7     t3
x8     L2(t3)
x9     L3(t3)
x10    t4
x11    L2(t4)
x12    L3(t4)
x13    t5
x14    L2(t5)
x15    L3(t5)
x16    t6
x17    L2(t6)
x18    L3(t6)

The LeNN uses Legendre orthogonal functions [16]. The Legendre polynomials are denoted by Ln(x), where n is the order and -1 < x < 1 is the input of the polynomial. The Legendre polynomials satisfy the following formula:

(1 - x^2) \frac{d^2 L_n(x)}{dx^2} - 2x \frac{d L_n(x)}{dx} + n(n+1) L_n(x) = 0    (Equation 2.5)

The recursive formula to generate higher order Legendre polynomials is expressed as [16]

(n+1) L_{n+1}(x) = (2n+1) x L_n(x) - n L_{n-1}(x), \quad L_0(x) = 1, \; L_1(x) = x    (Equation 2.6)
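The recurrence of Equation 2.6 and the expansion of Table 2.3.1 can be sketched together. The sample input values below are hypothetical; each input component t_i contributes t_i, L2(t_i) and L3(t_i) to the expanded pattern:

```python
def legendre(n, x):
    """L_n(x) via the recurrence of Eq. 2.6:
    (n+1) L_{n+1}(x) = (2n+1) x L_n(x) - n L_{n-1}(x), with L_0 = 1, L_1 = x."""
    prev, cur = 1.0, x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1) * x * cur - k * prev) / (k + 1)
    return cur

# 6-dimensional input pattern (hypothetical sample values) expanded to the
# 18-dimensional pattern of Table 2.3.1.
t = [0.5, -0.2, 0.1, 0.9, -0.7, 0.3]
X = [f for ti in t for f in (ti, legendre(2, ti), legendre(3, ti))]
```

For example, L2(0.5) = (3·0.25 - 1)/2 = -0.125 and L3(0.5) = (5·0.125 - 1.5)/2 = -0.4375, matching the closed forms L2(x) = (3x² - 1)/2 and L3(x) = (5x³ - 3x)/2.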

2.4 Learning Rules

The learning rule is applied to train the network to perform some particular task. It can be categorized into two main groups: supervised learning, and unsupervised learning [18].

In supervised learning, a set of training data exemplifying proper network behavior is provided: inputs to the network together with the corresponding target outputs. The output produced for each input is then compared with the target. The learning rule is then used to adjust the weights of the network in order to obtain outputs closer to the targets. The efficiency of supervised learning of NNs as channel equalizers is commonly judged by the minimization of an error function that depends on the weights of the network.

In unsupervised learning, weights are adjusted based on the inputs only, without a particular target output to fulfill. It mostly performs clustering operations. It is widely applied in applications such as vector quantization, where the input patterns are grouped into a finite number of classes.

The network is ready for training once the weights are initialized. The purposes of training the network are to perform function approximation (nonlinear regression), pattern association, or pattern classification. A set of examples of proper network behavior, the network inputs r and target outputs d, is required for the training process. The weights of the network are adjusted iteratively to minimize the network performance function during the training process. The default performance function for feedforward networks is the mean square error (MSE) [19], i.e., the average squared error between the network outputs y and the target outputs d.

2.4.1 Least-Mean Square (LMS) Algorithm

The LMS algorithm was developed by Widrow and Hoff in 1960. It is a member of the family of stochastic gradient algorithms. An important characteristic of the LMS algorithm is its simplicity. It requires neither matrix inversion nor measurements of the pertinent correlation functions. The LMS algorithm consists of two basic processes: a filtering process and an adaptive process [3].

The filtering process involves obtaining the output of a transversal filter produced by a set of tap inputs and generating an estimation error by comparing this output with the desired response. The adaptive process involves the automatic adjustment of the tap weights of the equalizer in accordance with the estimation error.

These two processes work together, forming a feedback loop around the LMS algorithm.

Figure Illustration of LMS block diagram
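The two processes can be sketched as a short loop. The near-identity test channel, step size and tap count below are assumptions chosen only to show the filtering and adaptation steps:

```python
import numpy as np

rng = np.random.default_rng(2)

def lms_equalize(r, d, n_taps=8, mu=0.05):
    """LMS loop: filtering (y = w.u), error (e = d - y), adaptation (w += mu*e*u)."""
    w = np.zeros(n_taps)
    y = np.zeros(len(r))
    for k in range(n_taps - 1, len(r)):
        u = r[k - n_taps + 1:k + 1][::-1]   # tap-input vector [r_k, r_{k-1}, ...]
        y[k] = w @ u                         # filtering process
        e = d[k] - y[k]                      # estimation error
        w += mu * e * u                      # adaptive process
    return w, y

# Hypothetical check: a near-identity channel, so LMS should learn w ~ [1, 0, ..., 0].
d = rng.choice([-1.0, 1.0], size=2000)
r = d + 0.1 * rng.standard_normal(len(d))
w, y = lms_equalize(r, d)
```

After convergence the leading tap dominates and the sign of the filter output matches the transmitted symbols almost everywhere.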

2.4.2 Gradient Descent Backpropagation Algorithm

The term 'backpropagation' is an abbreviation for "backwards propagation of errors". It is a generalization of the Widrow-Hoff learning rule to multiple-layer networks with nonlinear differentiable transfer functions [20]. Inputs and target outputs are used to train a network until its outputs approximate a predefined function.

It is a supervised learning method and an application of the delta rule. It requires information on the difference between the actual and desired output for any given input. It is mostly used for feed-forward networks. Backpropagation is only applicable when the activation functions used by the neurons in the network are differentiable.

Standard backpropagation is a gradient descent algorithm, as is the Widrow-Hoff learning rule, in which the network weights are moved by a fraction of the negative gradient of the performance function at each iteration. There are a number of variations on the basic algorithm that are based on other standard optimization techniques, namely the conjugate gradient and Newton methods. The gradient descent algorithm has been one of the most studied and used algorithms for the learning process in neural networks.

There are two different ways in which this gradient descent algorithm can be implemented: incremental mode and batch mode [19].

In batch (or 'deterministic') mode, the weights of the network are updated only after the entire training set has been applied to the network. The gradients calculated at each training example are summed to determine the change in the weights. The weights are then adjusted in the direction of the negative gradient of the performance function.

In incremental (or 'stochastic') mode, the gradient of the cost function is evaluated on a single training example. The parameters are then adjusted by an amount proportional to this approximate gradient. After each training example, the parameters of the model are updated. Therefore, incremental gradient descent can be faster than batch gradient descent for systems with large data sets.
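The two modes can be contrasted on a toy noiseless least-squares problem. The problem, step sizes and epoch counts below are illustrative assumptions; both modes should recover the same weight vector:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy noiseless least-squares problem: recover w_true from examples (x, d).
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
d = X @ w_true

def batch_gd(X, d, lr=0.1, epochs=500):
    """Batch mode: one weight update per full pass, using the summed gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - d) / len(d)
    return w

def incremental_gd(X, d, lr=0.01, epochs=20):
    """Incremental (stochastic) mode: a weight update after every single example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, d):
            w -= lr * (x @ w - t) * x
    return w

w_batch = batch_gd(X, d)
w_incr = incremental_gd(X, d)
```

Note that the incremental version performs 200 updates per epoch versus one for the batch version, which is the source of its speed advantage on large data sets.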

A simple stochastic gradient descent backpropagation algorithm is often used in the learning process to find the weights that minimize the error. The errors (the learning) propagate backwards from the output nodes to the inner nodes of the NN. In other words, this algorithm is used to calculate the gradient of the error of the network with respect to the network's adjustable weights. It allows quick convergence to satisfactory local minima of the error in suitable kinds of networks.

Typically, a new input leads to an output close to the target output, even without training on that particular input. This generalization property makes it possible to train a network on a representative subset of input-target pairs and obtain satisfactory results without training the network on all possible input-output pairs.

A learning rate is used to accelerate the learning process and achieve fast convergence of the system. The changes to the weights are determined by multiplying the negative of the gradient by the learning rate, so the size of each step depends on the learning rate. The algorithm becomes unstable if the learning rate is too large; if the learning rate is set too small, the algorithm takes a long time to converge [19].

Gradient descent is based on the observation that if a real-valued function F(x) is defined and differentiable in a neighborhood of a point z, then F decreases fastest if one goes from z in the direction of the negative gradient of F at z. This suggests the iteration

x_{n+1} = x_n - \gamma \nabla F(x_n), \quad \gamma > 0

The sequence (x_n) converges to the desired local minimum, with

F(x_0) \ge F(x_1) \ge F(x_2) \ge \dots

The process of the gradient descent algorithm is shown in the figure below, where F is assumed to be defined on the plane and the shape of its graph is a bowl. The contour lines indicate the regions where the value of F is constant. The arrow moves in the direction of the negative gradient at the respective point; the negative gradient at each point is orthogonal to the contour line. Gradient descent leads to the bottom of the bowl, where the value of the function F is minimal.

Figure Illustration of gradient descent search method

Recently, Jagdish, Pramod and Goutam proposed a gradient descent-based LeNN channel equalizer for digital communication systems. It is a derivative-based algorithm, and the simulated LeNN produces a single output. They showed that their proposed equalizer is superior to multilayer perceptron (MLP)-based and functional link neural network (FLANN)-based equalizers for highly nonlinear and dispersive channels [16]. However, there are still some limitations [21]. For instance, it takes a relatively long time to converge, and the method is only applicable to differentiable cost functions.

Differentiable functions are those that look like a straight line near the desired point of differentiation. In other words, differentiation is about finding the slope of the tangent line. No tangent line means a non-differentiable function. Examples of non-differentiable functions are:

(i) A jump. The function is not continuous since it jumps at a point x.

Figure Jump discontinuity

(ii) A kink. The absolute value function |x|, which equals x when x is positive and -x when x is negative, has a kink at x = 0.

Figure A kink

(iii) The function is unbounded and goes to infinity.

Figure Infinite discontinuity

2.4.3 Particle Swarm Optimization

PSO was first proposed by J. Kennedy and R. Eberhart in 1995 [22], inspired by the social behavior of bird flocking and fish schooling. PSO shares some similarities with a particular class of evolutionary algorithms, the Genetic Algorithm (GA). The system starts with a population of random solutions and searches for the optimum by updating the best-so-far solutions. However, unlike GA, PSO involves no evolution operators such as mutation and crossover.

The PSO algorithm is a simple stochastic search algorithm requiring no gradient information about the objective function; the algorithm converges quickly and its characteristics can be easily controlled.

In the PSO algorithm, each member of the population is called a 'particle', and each particle 'flies' around the multidimensional search space with a velocity that is constantly updated using the particle's own experience and the experience of the particle's neighbours or of the whole swarm.

The position and velocity updating rules proposed by Kennedy and Eberhart [22] are as follows:

v_{id}^{k+1} = w_k v_{id}^{k} + c_1 r_1 (p_{id}^{k} - x_{id}^{k}) + c_2 r_2 (p_{gd}^{k} - x_{id}^{k})    (Equation 2.7)

x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}    (Equation 2.8)

where x_{id} is the particle position, v_{id} is the particle velocity, c1 and c2 are acceleration constants, r1 and r2 are two independent uniformly distributed random numbers, w_k is the inertia weight [23], p_{id}^k is the best previous position along the dth dimension of particle i up to iteration k (memorized by every particle), and p_{gd}^k is the best previous position among all the particles along the dth dimension up to iteration k (memorized in a common repository). The second term, c_1 r_1 (p_{id}^k - x_{id}^k), is called the cognitive component, and the third term, c_2 r_2 (p_{gd}^k - x_{id}^k), is called the social component. The particles are able to exchange information with their neighbors, to memorize previous positions and to use this information to make decisions.

Another attractive feature of PSO is that it has few adjustable parameters. Different versions of the PSO algorithm can be formed with slight variations in order to perform well in a wide variety of applications. The number of hidden layers and the range of the weights can be manipulated in order to obtain better results in different trials. There are two key steps in applying the PSO method to optimization problems: the representation of the solution and the fitness function. In terms of complexity, PSO is a simple algorithm since it takes real numbers as particles.

The parameters w_k, c1 and c2 are user-selected coefficients. Typically, the ranges for these three parameters are set at 0.8 ≤ w_k ≤ 1.2, 0 ≤ c1 ≤ 2 and 0 ≤ c2 ≤ 2. The values r1 and r2 are randomly regenerated in the range [0, 1] for each velocity update. The value of the inertia weight w_k determines whether the particle's inertia is damped or accelerated in its original direction. A lower value of the inertia weight speeds up the convergence of the swarm to the optimum, while a higher value encourages exploration of the entire search space. The cognitive component causes the particle to tend to return to its best individual position, while the social component causes the particle to move towards the best position of the swarm at that time instant. The random values r1 in the cognitive component and r2 in the social component cause the particle to move in a semi-random manner in the directions of the individual best experience of the particle and the global best experience of the swarm [24].
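A minimal PSO following Equations 2.7 and 2.8 can be sketched as below. The inertia and acceleration values are the widely used constriction-style constants rather than values taken from this thesis, and the sphere function serves only as a hypothetical test objective:

```python
import numpy as np

rng = np.random.default_rng(4)

def pso(cost, dim, n_particles=30, iters=200, w=0.729, c1=1.494, c2=1.494):
    """Minimal PSO per Eqs. 2.7-2.8 (constriction-style w, c1, c2 assumed)."""
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))      # particle positions
    v = np.zeros((n_particles, dim))                    # particle velocities
    pbest = x.copy()                                    # individual best positions
    pbest_f = np.array([cost(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()            # global best position
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. 2.7
        x = x + v                                                   # Eq. 2.8
        f = np.array([cost(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, pbest_f.min()

# Hypothetical check on the sphere function, whose minimum is 0 at the origin.
best_x, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=3)
```

Note how the only problem-specific ingredient is the cost function: the same loop later serves for LeNN weight optimization with the equalizer's MSE as the fitness.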

The particle search is a repeating process that stops when a stopping criterion is met. Common stopping conditions are a maximum iteration number or a minimum error condition.

Figure Cognitive component search space contribution for 2-dimensional problem [25]

Figure Social component search space contribution for 2-dimensional problem [25]

Figure Combination of two components search space contribution for 2-dimensional problem [25]

Figure Illustration of particle updates

In the past several years, PSO has been successfully used in a wide range of research and application areas, as well as for specific applications focused on a specific requirement. It is used for the training of NNs, structural optimization, and system identification in biomechanics. Examples of PSO applications are Parkinson's disease detection [26], human tumor analysis [26], image recognition [27], extraction of rules from fuzzy networks [28], electromagnetic optimization problems [29], optimal shape and sizing design, topology optimization, biochemistry processes [25], etc.


3.1 The LeNN Channel Equalization Process

Figure 3.1.1: Schematic diagram of proposed LeNN

The received signal rk is mapped using the Legendre orthogonal functions. The initial weights are set randomly or as zeros. The output of the LeNN is

y_k = \tanh\left( \sum_{i=1}^{18} w_i x_{k,i} \right)    (Equation 3.1)

This output is then compared with the desired output d_k to produce the error signal e_k. The error signal is then used to update the weights in the learning phase. The training process continues iteratively until a predetermined maximum number of iterations is reached or until the mean square error (MSE) falls below a predefined small value (e.g., -30 dB). During the testing phase, the NN weights obtained from the learning phase are used for equalization.

Figure 3.1.2: Flowchart for equalization process in LeNN

Pseudocode for Equalization process in LeNN:

I) For all received bits

Normalize the bits

rx = rx/max(abs(rx));

Map using Legendre polynomials

X = legen(fliplr(toeplitz(rx(3:P),rx(3:-1:1))));


a) Start the training process

1) Update the weights

2) Loop until the maximum number of iterations is attained

%for gradient descent algorithm

W = W+0.7*dW;

%for PSO algorithm

[W,mse] = psosearch1(X,tx(2:P-1));


b) Use the new received bits

X = legen(fliplr(toeplitz(rx(P+1:N),rx(P+1:-1:P-1))));

1) Apply the nonlinear activation function at the output node

y = tanh(X*W);

2) Compare the equalized output with the desired output

sign(y) ~= tx(P:N-1);

3) Calculate the bit error rate

ber(k1,k2) = sum(sign(y) ~= tx(P:N-1))/(N-P);


3.2 Formulation of Gradient Descent Algorithm

The algorithm in [16] minimizes Ek through the iterative updates of weights for all training patterns.

The cost function at the kth instant is given by:

E_k = \frac{1}{2} e_k^2 = \frac{1}{2} (d_k - y_k)^2    (Equation 3.2)

where d_k is the desired output at the kth instant, y_k is the equalizer output and e_k = d_k - y_k denotes the error term. In each iteration, a mapped input pattern is applied to the LeNN and the error e_k is obtained. The error value is then used in the backpropagation algorithm to minimize the cost function until it reaches a pre-defined minimum value or the maximum number of iterations is reached. The weights are updated in each iteration as follows:

wji(k+1) = wji(k) + μΔwji(k) , (Equation 3.3)

where μ is the learning parameter, which is set between 0 and 1. In this study, its value is 0.7.

The gradient of the cost function (Equation 3.2) is given by

∂Ek/∂wji = −ek (∂yk/∂wji) , (Equation 3.4)

The weight wji is updated by

Δwji(k) = −∂Ek/∂wji , (Equation 3.5)

Since a nonlinear tanh(·) function is used at the output node, the weight update becomes:

Δwji(k) = ek(1 − yk²)xi . (Equation 3.6)

where xi is the mapped Legendre input.
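The derivation above can be sketched as a single-pattern training loop (Python with illustrative names; the original implementation is MATLAB):

```python
import numpy as np

# Single-pattern sketch of Equations 3.2-3.6 (illustrative names; the
# original implementation is MATLAB). mu = 0.7 as stated in the text.
def gd_step(w, x, d, mu=0.7):
    y = np.tanh(x @ w)            # LeNN output with tanh output node
    e = d - y                     # error term ek
    dw = mu * e * (1 - y**2) * x  # mu times Equation 3.6
    return w + dw, 0.5 * e**2     # updated weights and cost Ek (Eq. 3.2)

w = np.zeros(3)                   # weights start at zero, as in the pseudocode
x = np.array([0.5, -0.3, 0.2])    # a mapped input pattern
cost = None
for _ in range(200):              # iterate until the error is small
    w, cost = gd_step(w, x, d=0.8)
```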

Figure 3.2.1: Flowchart for Gradient descent algorithm

Pseudocode for Gradient descent algorithm:

I) Initialize the weights in the network (set to zeros here)

W = zeros(1,9);

II) Do:

a) Set number of iterations

for k2 = 1:1000

b) For each example in the training set

y = equalizer output

d = target output

1) Calculate error (y-d) at the output units

e = y-tx(k3-1);

2) Compute the negative gradient for all weights

dW = -e.*(1-y.*y)*transpose(x);

3) Update the weights in the network

W = W+0.7*dW;

4) Evaluate the cost function



Until maximum iteration is attained

Return the network

3.3 Formulation of PSO

The learning process involves updating the weights of the LeNN in order to minimize a given cost function (also called the fitness function). The best fitness value for each particle and the best fitness value for the swarm are continually updated at each iteration. The algorithm terminates after a predefined number of iterations.

Initially, two random M × N matrices, one for positions and one for velocities, are generated, where M is the dimension of each particle and N is the number of particles.

For each of the N columns of the position matrix, the individual best fitness and the global best fitness are found based on a given cost function, and the corresponding best positions are retrieved:

[f1 f2 f3 f4 … fN]

In the next iteration, the particles' positions are updated by recalculating the respective velocities (Equations 2.7 and 2.8). The computed individual best fitness and global best fitness values are compared with the values obtained from the previous iteration (the best remembered values). The best positions, individual best fitness and global best fitness are updated whenever a better fitness value is found, and the iterations continue.
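The loop above can be sketched as follows (Python; the helper names and the inertia weight wk are assumptions, and a toy cost function stands in for the equalizer fitness):

```python
import numpy as np

# Minimal global-best PSO sketch of the loop described above. The
# velocity update follows the standard PSO form; the /4 position damping
# and the c1, c2, N, iteration values mirror the pseudocode, but helper
# names and the inertia weight wk are assumptions.
rng = np.random.default_rng(0)

def pso_minimise(cost, M, N=512, iters=20, c1=1.75, c2=1.75, wk=0.8):
    pos = 2 * rng.random((M, N)) - 1            # particle positions (weights)
    vel = 2 * rng.random((M, N)) - 1            # particle velocities
    best_pos, best_fit = pos.copy(), cost(pos)  # individual bests
    for _ in range(iters):
        g = best_pos[:, np.argmin(best_fit)]    # global best position
        r1, r2 = rng.random((M, N)), rng.random((M, N))
        vel = wk * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g[:, None] - pos)
        pos = pos + vel / 4                     # damped position update
        fit = cost(pos)
        better = fit < best_fit                 # keep improved personal bests
        best_fit[better] = fit[better]
        best_pos[:, better] = pos[:, better]
    return best_pos[:, np.argmin(best_fit)], best_fit.min()

# Toy cost: squared distance of each particle (one column each) to a target
target = np.array([0.2, -0.3])
w_best, f_best = pso_minimise(lambda p: np.sum((p - target[:, None])**2, axis=0), M=2)
```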

Figure 3.3.1: Flowchart for PSO algorithm

Pseudocode for PSO algorithm:

I) For M x N particles:

Initialize particles

M = 9;

N = 2^M;

c1 = 1.75;

c2 = 1.75;

pos = 2*rand(M,N)-1;

vel = 2*rand(M,N)-1;

bestpos = pos;

bestfit = calc(pos,X,tx);

II) Do:

a) Set number of iterations

for k = 1:20


b) For each particle:

1) Calculate fitness value

function fit = calc(W,X,tx)

y = tanh(X*W);

e = bsxfun(@minus,y,tx);

fit = sum(e.*e,1);

2) If the fitness value is better than the best fitness

value (bestfit) in history

u = logical(fit < bestfit);

3) Set current value as the new bestfit and update the

bestpos for the particles

bestfit(u) = fit(u);

bestpos(:,u) = pos(:,u);


c) For each particle:

1) Find in the particle neighborhood, the particle with

the best fitness

[u,v] = min(bestfit);

W = bestpos(:,v);

2) Calculate particle velocity according to the velocity

in equation (8)

vel = vel + c1*bsxfun(@times,r1,bestpos-pos) ...
      + c2*bsxfun(@times,r2,bsxfun(@minus,W,pos)); % standard PSO velocity update (cognitive + social terms)


3) Update particle position according to the position in

equation (9)

pos = pos+vel/4;


Until the maximum iteration is attained

Return the network

3.4 Clipping in PSO

A clipping function is added to the proposed PSO algorithm. The proposed clipping technique acts as a threshold in order to improve the suppression of high-frequency noise. Two parameters are computed from the PSO training process: a clipping parameter c and a weight vector w. The c and w that give the best fitness in the PSO search are then used to process the received signal during the Legendre mapping and at the output node of the LeNN. The accuracy of the signal estimate was analyzed for different noise environments.
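The clipping step itself is simple; a minimal Python sketch, assuming the tanh(c·rx) form used in the pseudocode below:

```python
import numpy as np

# Sketch of the clipping step: the received signal is soft-clipped by a
# trainable parameter c before the Legendre mapping (the tanh(c*rx) form
# follows the pseudocode; c itself is found by the PSO search).
def clip_signal(rx, c):
    return np.tanh(c * rx)      # bounds every sample to (-1, 1)

rx = np.array([0.2, 1.5, -3.0])   # -3.0 models an impulsive spike
clipped = clip_signal(rx, c=1.0)  # the spike is bounded, small samples pass
```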

The flow of this algorithm is similar to the previous case in Section 3.3. The main difference is the computation of the objective function.

Pseudocode for Clipping PSO algorithm:

I) For M x N particles:

Initialize the particles (one extra dimension is added to each of the N particles for the clipping parameter)

M = 10;

N = 2^M;

c1 = 1.75;

c2 = 1.75;

pos = 2*rand(M,N)-1;

pos(end,:) = abs(2*rand(1)-1); % the threshold has a positive value

vel = 2*rand(M,N)-1;

II) Do:

a) Set number of iterations

for k = 1:20


b) For each particle:

1) Calculate fitness value

function fit = calc(pos,rx,tx)

legen = @(x) [x (3*x.*x-1)/2 (5*x.*x.*x-(3*x))/2];




rxnew = tanh(c*rx);

X = legen(fliplr(toeplitz(rxnew(3:P),rxnew(3:-1:1))));

y = tanh(X*W);

e = bsxfun(@minus,y,tx);

fit = sum(e.*e,1);

2) If the fitness value is better than the best fitness

value (bestfit) in history

if fit(k3)<bestfit(k3)

3) Set current value as the new bestfit and update the

bestpos for the particles.




c) For each particle:

1) Find in the particle neighborhood, the particle with

the best fitness. The first 9 positions

indicate the best weight and the last position

indicates the best clipping parameter.

[u,v]= min(bestfit);

W = bestpos(:,v);

c = bestpos(end,v);

2) Calculate particle velocity according to the velocity

in equation (8)

vel = vel + c1*bsxfun(@times,r1,bestpos-pos) ...
      + c2*bsxfun(@times,r2,bsxfun(@minus,W,pos)); % standard PSO velocity update (cognitive + social terms)


3) Update particle position according to the position in

equation (9)

pos = pos+vel/4;


Until the maximum iteration is attained

Return the network


A 3-path channel is used where the channel impulse response is given by [3]

hk = ½[1 + cos(2π(k − 2)/W)], k = 1, 2, 3 (Equation 4.1)

The eigenvalue ratio (EVR) indicates how the channel spreads: the higher the EVR, the harder the channel is to equalize because its spread is worse. The parameter W determines the EVR of the channel; the value used in this simulation is 2.9 [3].

Since nonlinear dispersion varies with time and from place to place, the severe nonlinear models used in this simulation are [16]:

NL=0: bk = ak ,

NL=1: bk = tanh(ak) ,

NL=2: bk = ak + 0.2ak² − 0.1ak³ ,

NL=3: bk = ak + 0.2ak² − 0.1ak³ + 0.5cos(πak) . (Equation 4.2)

The implementation of the channel model (Equation 4.1) and nonlinear model (Equation 4.2) are chosen as these have been widely used by other researchers [3,16,30].
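This simulation chain can be sketched in Python. This is a sketch under assumptions: the three-tap raised-cosine response hk = ½[1 + cos(2π(k − 2)/W)] with W = 2.9 and the four polynomial/tanh nonlinear models are taken to be the widely used forms from [3] and [16].

```python
import numpy as np

# Sketch of the channel simulation, assuming the widely used three-tap
# raised-cosine response h_k = 0.5*(1 + cos(2*pi*(k-2)/W)) with W = 2.9
# and the common polynomial/tanh nonlinear models from the literature.
W = 2.9
h = 0.5 * (1 + np.cos(2 * np.pi * (np.arange(1, 4) - 2) / W))

def nonlinear(a, NL):
    """Apply nonlinear model NL (NL=0 is the linear channel)."""
    if NL == 0:
        return a
    if NL == 1:
        return np.tanh(a)
    b = a + 0.2 * a**2 - 0.1 * a**3
    return b if NL == 2 else b + 0.5 * np.cos(np.pi * a)

tx = np.sign(np.random.randn(1000))   # BPSK symbols
a = np.convolve(tx, h)[:1000]         # dispersive (ISI) channel output
b = nonlinear(a, NL=3)                # nonlinear distortion
```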

In order to make the system more realistic, two cases of additive noise are considered, namely additive Gaussian noise and non-Gaussian noise. Additive Gaussian noise is created using the randn function in MATLAB, which generates arrays of normally distributed random numbers with mean 0, variance σ² and standard deviation σ; the noise power is set according to the SNR value in dB. The non-Gaussian noise contains components with two different variances, σ1² and σ2², modelling impulsive noise. The impulsive probability is set to 0.05, which means the noise contains 5% impulsive samples, while the damping factor k relating the two variances is set to 100 in this simulation.
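The two noise models can be sketched as follows (Python with numpy; the exact mixture parameterisation for the impulsive case is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the two noise models. The mixture parameterisation for the
# non-Gaussian case is an assumption: 5% of the samples are drawn from a
# component whose variance is k = 100 times larger.
def gaussian_noise(n, snr_db, signal_power=1.0):
    sigma = np.sqrt(signal_power / 10**(snr_db / 10))   # set by the SNR in dB
    return rng.normal(0.0, sigma, n)

def impulsive_noise(n, snr_db, p=0.05, k=100):
    noise = gaussian_noise(n, snr_db)
    spikes = rng.random(n) < p           # 5% impulsive samples
    noise[spikes] *= np.sqrt(k)          # variance inflated by k
    return noise

v = gaussian_noise(100000, snr_db=20)    # sigma = 0.1 at 20 dB
```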

In the simulation, the learning rate μ for the gradient descent algorithm is set to 0.7, and the first 16 received bits are used as pilot bits.

For the PSO algorithm, N particles of random positions (weights) and velocities are spread over M dimensions, and the parameters c1 and c2 are fixed. For the learning phase, P = 64 pilot bits are mapped using the three Legendre polynomials in (Equation 2.5), yielding M = 9 values for each triplet of received samples.

Adjustable parameters and their respective value for PSO are shown in Table 4.1:

Table 4.1: Adjustable parameter and their value for PSO algorithm

Parameters                       Value
Number of particles, N           512
Dimension of particles, M        9
Learning factors, c1 and c2      1.75
Random numbers, r1 and r2        0 < r < 1
Stopping condition               Maximum iteration, kmax = 20
Inertia weight, wk               0.8 to 1.2 (see Section 4.4)


A number of experiments were carried out to examine the performance of channel equalization using PSO and gradient descent as the learning methods for the LeNN. A comparative analysis of the PSO algorithm with the additional clipping function against the generic PSO algorithm is also performed.

4.1 MSE Performance

To examine the convergence characteristics and MSE performance of the two equalizers, each equalizer is trained with a different number of iterations. The MSE characteristics for NL = 3 with 20 dB additive noise are shown in Figure 4.1.1.



Figure 4.1.1: MSE characteristics of LeNN-based equalizer with SNR=20dB

(a) Gradient descent algorithm; (b) PSO algorithm

The gradient descent-based equalizer settles at about −40 dB while the PSO-based equalizer settles at about −150 dB. The gradient descent algorithm provides a slower convergence rate: its MSE floor is reached at about 1000 iterations, whereas the PSO-based equalizer takes only 20 iterations to converge.

4.2 BER Performance

To study the BER performance, 50 trials, each consisting of a 1000-bit BPSK modulated signal, are used to test the LeNN-based equalizers. The two equalizers were trained for 1000 iterations (gradient descent-based) and 20 iterations (PSO-based), after which their weights are frozen and stored. New signals are transmitted and passed through the channel to calculate the bit error rate at the output of the equalizers. The equalizers estimate the transmitted signals based on the new received signals; a mismatch between a transmitted symbol and the NN equalizer output gives a bit error. This process was repeated for additive Gaussian noise and non-Gaussian noise ranging from 0 dB to 20 dB in steps of 2 dB. The BER performances for the four NL models are shown in Figure 4.2.1.

Additive white noise environment:





Figure 4.2.1: BER performance for the LeNN-based equalizers in additive white noise environment.(a)NL=0; (b)NL=1; (c)NL=2; (d)NL=3.

Non-Gaussian noise environment:





Figure 4.2.2: BER performance for the LeNN-based equalizers in non-Gaussian noise environment.(a)NL=0;(b)NL=1;(c)NL=2;(d)NL=3.

From the graphs shown, the BER decreases as the SNR increases. Both learning methods generally perform satisfactorily under severe nonlinear dispersion in the presence of noise distortion. A marginal improvement (of about 1-2 dB) of the PSO learning method over gradient descent in terms of BER is observed for both additive Gaussian noise and non-Gaussian noise distortion. A drawback of the gradient descent learning method is its slow learning speed, which increases computational time: the elapsed time for the gradient descent algorithm is about 436 milliseconds per trial, while PSO takes only about 81 milliseconds per trial, a computational saving of roughly 80%.

4.3 Cost Functions Analysis

Figure 4.3.1 shows the BER plot for different error cost functions using the PSO search. Overall, the results show that the sum of squared differences gives the best result for the equalizer when corrupted by either additive Gaussian noise or non-Gaussian noise.



Figure 4.3.1: BER vs SNR plot for PSO using different type of error cost functions for NL=3.(a) in additive Gaussian noise environment; (b) in non-Gaussian noise environment (5%)

4.4 Manipulation of Parameters

In the simulation, the convergence rate of the gradient descent algorithm for different learning parameters μ with 20 dB additive noise is analyzed. From the graph below, the system with the smallest learning rate provides the slowest convergence rate; its MSE settles between −35 and −38 dB at 3000 iterations. The system with an intermediate learning rate performs much better than the first. The MSE convergence rate is highest for the largest learning rate considered, with an MSE floor of about −45 dB at 1500 iterations. A higher learning rate speeds up the learning process and exhibits fast convergence, while a small learning rate causes the algorithm to take a longer time to converge. However, the learning rate cannot be made too large, or the algorithm becomes unstable.

Figure 4.4.1: MSE characteristics for different learning parameters with SNR = 20dB

For the PSO algorithm, the typical range for the inertia weight wk is between 0.8 and 1.2. In the simulation, the computation time with respect to the change of wk in a 20 dB additive noise environment is analyzed. When wk = 0.8, the execution time is about 0.118844 s; for wk = 1.0, about 0.087685 s; and for wk = 1.2, about 0.081983 s for a single trial. A lower inertia weight damps the particle's velocity, so it takes a longer time to reach the best position. A higher inertia weight accelerates the particle towards the best position and hence shortens the computation time needed for the training process. However, the inertia weight must not exceed a certain limit; otherwise a particle overshoots the best position and the system becomes unstable.

4.5 Clipping PSO Performance

From the graph (Figure 4.5.1), the clipping-based technique leads to a considerable improvement of the LeNN equalizer compared with the generic PSO algorithm for different noise environments and a wide range of input SNR values. In the additive noise environment, an improvement of more than 2 dB in BER is observed over the generic PSO algorithm. In the non-Gaussian noise environment, generic PSO gives better performance at low SNR values, while the clipping algorithm gives more than 2 dB of improvement over generic PSO when the SNR is above 8 dB.



Figure 4.5.1: BER vs SNR plot for two different types of PSO algorithm for NL=3.

(a) in additive Gaussian noise environment; (b) in non-Gaussian noise environment (5%)


5.1 Summary of the Work

In this work, two nonlinear channel equalizers using the Legendre Neural Network (LeNN) with gradient descent and PSO learning methods are proposed, and their performance is validated using simulated BPSK signals. Moreover, two different types of noisy environments are formulated. In addition, a clipping function is implemented in the PSO algorithm and its performance is compared with the generic PSO algorithm.

It should be noted that PSO algorithm enables the use of derivative-free cost function, while gradient descent algorithm can only be applied to derivative-based cost function. In addition, it has been shown that implementation of PSO algorithm greatly reduces the system complexity and computation time. Performance comparison in terms of BER in noisy nonlinear dispersive environment suggests PSO search as the better learning method in LeNN-based equalizer over gradient descent algorithm.

5.2 Recommendations for Future Work

Future enhancements such as higher-order modulation and multi-access schemes (16-QAM, CDMA, OFDM) may be implemented in order to form a more complete transmission system. In addition, finding an optimal choice of the inertia weight coefficients that gives the best training results for the PSO approach may be developed in the near future. Other particle criteria, such as the swarm size, may also be varied in the PSO approach.