The Origin Of Error Correcting Code Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The origin of error-correcting code. Communication is an act of conveying message form one point to the other; and communication in it self is not perfect even if the message is accurately stated because it may be garbled during transmission. As a result when a message travels from an information source to a destination both the sender and the receiver would like assurance that the received message is free from errors and if it contains error it will be detected and corrected. Richard W Hamming in 1974 was motivated by mistake which occurred internally in a computer and as a result stated to look into the theory of error correction [1 Thomas M Thompson].

In 1948 the fundamental concepts and mathematical theory of information transmission were laid by C. E Shannon [Shannon C. E. "Mathematical Theory of Communication" Bell system Tech J. Vol 27, No 23, July 1948 pp 379-423] He perceived that it is possible to transmit digital information over noisy channel with arbitrary small error probability if there is proper channel coding and decoding in place. The goal of error free transmission can be achieved if the information transmission rate is less than the channel capacity. He did not however indicate how to achieve the task his theory which is called the Shannon capacity theory and it was the theory that opens many researchers to the finding of efficient method for error-control coding by introducing redundancies to allow for error correction.

Earlier in information theory before the works of Shannon; it are the works of Nyquist and Hartley [Drajic D. B, "60 Years of Shannon Information Theory" 8th International conference on Telecommunications in modern satellite and cable broadcasting services, 2007 pp 109-116.]. Nyquist found the minimum frequency band to transmit independent discrete signals at a given rate while Hartley proposed to use the logarithmic measure for information; Hartley said that the information transmitted is proportional to the logarithm of the number of different signals we use - to the alphabet size).

The Shannon approach was totally different in the sense that one should not consider the signals, but the information. The information is represented (encoded) by signals, which are carriers of information. That means also that it is possible that transmitted signals do not carry any information at all (from the Information Theory point of view). Of course, these signals may be needed for the proper functioning of the communication system itself.

Shannon theory equation stated below is mainly to prove that any communication channel could be characterized by a capacity at which information could be reliably transmitted. It should notice that, for higher signal-to-noise ratios, the doubling of the signal (transmitted) power (3 dB) increases capacity for only one bit per second per-Hertz of the bandwidth.

C = B log2 (1 + )

C = channel capacity in bits/sec

B = bandwidth in Hertz

S/N = signal to noise ratio.

Most of Shannon's coding theorems applied to sources and channels without memory. In the last decade and a half, considerable progress has been made in establishing coding theorems for more general situations [J. Wolfowitz, Coding Theorems of Information Theory, 2nd ed. New York: Sorinaer. 1964.]; but recently, the work now incorporate the proving of the channel coding theorem for stationary continuous-time channels with asymptotically vanishing memory.

To evaluate the efficiency of error control coding, the quantity "coding gain" can be defined giving the difference in signal-to-noise ratios (in dB) achieving the same error probability without and with the error control coding Figure2.[ D. Drajic, D. Bajic, "Communication System Performance: Achieving the Ultimate Information-Theoretic Limits?", IEEE Comm. Mag., Vol. 40 (2002), No. 6, pp 124-129]). Instead of the signal-to-noise ratio (S/N) it is easier for comparison to introduce Eb/N0 ratio, where Eb is the "energy per bit." comparison of various systems, Eb is calculated (not measured) as the energy per information bit (and not per channel bit). The coding gain can be "negative" as well, especially for smaller values of Eb/N0. In the figure the "Shannon limit", obtained from the expression (calculated as 10log(1/log2e) = -1.59dB) .[ D. Drajic, D. Bajic, "Communication System Performance: Achieving the Ultimate Information-Theoretic Limits?", IEEE Comm. Mag., Vol. 40 (2002), No. 6, pp 124-129.]).

As mentioned earlier error control is achieved by adding extra bits to the properly designed redundancy into the message before transmission the redundancy can be added either by inserting the extra bits called parity-check symbols which is the conventional coding or by expanding the used channel signal-set and, finally, by coded modulation which is a combination of both.

The conventional coding lowers the bandwidth efficiency that is by either expanding the bandwidth used or reducing the data rate. It is suitable for the channel where the power is limited and not bandwidth (e.g. the space channel)[ Drajic D. B, "60 Years of Shannon Information Theory" 8th International conference on Telecommunications in modern satellite and cable broadcasting services, 2007 pp 109-116.]. The coded modulation neither expands the bandwidth used, nor does it reduces the data rate, but increases the efficiency at the expense of the increased power. Therefore, it is suitable for the bandwidth limited systems (e.g. the telephone channel). When bandwidth is limited, the spectral efficiency must be taken into account. It is the ratio of equivalent binary rate and the used bandwidth (the dimension is information bits per second per Hz).

Figure xxx coding gain (from D. Drajic, D. Bajic, "Communication System Performance)

Chapter 1


Channel Decoder

Source Decoder


Source Encoder

Channel Encoder


1 A typical communication system

Figure 1: Typical communication system (redrawn from Drajic D. B,).

Explanation of a typical communication system

A communication system is the process of getting information from the information source to the information sink (destination); but in so doing there is enormous data that is sent and because bandwidth is expensive there is therefore need to reduce the size as well as limit the errors getting to the destination.

The source encoder's task is to encode (programme) the information signals in an efficient way. An encoder is a mapping or algorithm that transforms each transmitted sequence of symbols from the message to a another sequence by adding redundancy while the decoder transforms the received sequence and remove redundancy [Henk van Tiborg]

The channel encoder's task is to encode the information by the channel signals (symbols) in such a way that no information is lost if the signals are distorted and even if some finite error probability exists. Shannon showed that the error probability can be made as small as we wish if the information flow is smaller then the channel capacity - C - depending on bandwidth and SNR.

The channel is the XXXXXXXXXXXXXXX

After that are the corresponding decoders that reinstate the signal back to it's initial state before it was encoded. And finally the information at the destination is hopefully error free.

Chapter 2

Channel coding

Channel coding is the process of matching the source encoder output to the transmission channel and it generally involves Error control coding; Transmission coding and Scrambling [Graham Wade coding techniques] and the channel coding theorem states that the information can be transmitted with the probability of error being as small as possible until the information flow is less than the channel capacity.

The statement was a surprise because it was thought that the noise limits the reliability of the transmission, but in actual sense noise limits the rate of the reliable transmission. It means that we can transmit reliable by using an "unreliable" channel. The theorem was proved on the basis of random coding argument. So, it did not show how to find a channel code. It only proved the existence of "good" codes and gave the limit that can be approached by long coding sequences (theoretically when length approaches to infinity).

Error-control coding covers a large class of codes and decoding techniques. Block codes and Convolutional codes are the commonly used types in digital communication system for reliable transmission of data from the source to destination.

Figure 2 Overview of Error control coding (redrawn from Graham Wade coding techniques pg 199)

The two main classes of error control are used to achieve an acceptable error rate at the receiver are Automatic repeat request (ARQ) and Forward error correction (FEC).

2.1 Automatic Repeat Request (ARQ)

The ARQ uses the error detection method together with a feedback channel that allows for re transmission in the case the message was received with error in which case the receiver will ask for the message again. This kind of method is basically where there is variable delay can be condoned and a constant through put is not essential. In the ARQ systems the receiver only detects error but can not correct it but request for the information to be re transmitted. The use of redundant code normally called parity check helps the receiver detects whether the message was received in error.

The hybrid ARQ is the same as the ARQ but with a little hardware modification to reduce the retransmission rate and thus increase the throughput.

2.2 Forward Error Correction FEC

Forward error correction controls the received error via forward transmission only. The FEC is applicable where the throughput needs to be constant and a bounded delay eg real time systems; digital storage systems, deep-space and satellite communication, terrestrial wireless system. Deep-space systems began using FEC in the early 1970's to reduce transmission power requirements [ Wicker, S and Bhargava, V.K. " Reed Solomon Codes and their Applications, IEEE press, New Jersey 1994]

in real time systems the error codes are normally selected for the worst case scenario ie worst transmission channel conditions to get the required BER and thus reduce when conditions are better [Graham Wade coding techniques]

Figure 3 Block diagram of a FEC communication system (redrawn from The communications handbook Jerry d Gibson)

The FEC encoder maps the source data onto the modulator and the FEC decoder attempts to correct errors that passes through the discrete data channel which comprises of the modulator, channel and the demodulator.

Reference to the above diagram figure 2 it can be seen that the different types of channel coding fall under Block codes and Convolutional codes (see later). The block codes have strong basis for linear algebra and Galois fields. Galois fields are a finite field operation where all operations performed in the field result in answers in the same field.

The choice between the use of either block code or Convolutional codes depends on factors like code rate, the type of decoding technique, word synchronization and the data format.


2.3 Linear block codes

Block code is a type of coding generated by adding extra bits (r bits) formed by a linear combination of the original data sequence (k bits) to the data block. The added bits are usually known as parity check bits and the complete bit sequence (n information bits + r check bits) is known as a codeword as shown in the figure below. Therefore n = k + r.

Figure 4 Block coding (From Mischa Schwartz Pg 162)

In block codes if the check bits r = 1 then it means that the check bit defined as the modulo 2 sum (XOR) of the k information bits. In the binary fields the operation modulo-2 equivalent to the exclusive-OR function of Boolean algebra.

The inverse of addition (subtraction) is equivalent to addition, division by zero is not allowed and division by 1 is equivalent to multiplication by 1. The modulo-2 addition is

0 + 0 = 0

0 + 1 = 1

1 + 1 = 0

0 x 0 = 0

O x 1 = 0

1 x 1 = 1 [sweeney]

In this case the mod 2 sum of the transmitted n bits should always be zero unless there is an error in detection. This is simple and can detect an odd number of errors. For example for an 8 bit information bit sequence 10110010 the parity bit is 0 and can thus be said to be even parity. Increasing the number of parity bits increases the accuracy of the error correction code and can lead to error detection. Many different codes exist to detect and correct different types of errors

2.4 Coding of Block codes

Consider an information bit sequence denoted by

The corresponding codeword will be denoted by

For a systematic code c1 = d1, c2 = d2 ………………. ck = dk and the remaining bits are provided by the weighted modulo 2 sum of the information bits.





Depending on the structure of the codes the values of the hij can be 0 or 1.

For a (7,3) code (n = 7 output bits for k = 3 information bits) there are 8 possible codewords shown in table 1 for a code defined as





000 0000


001 1101


010 0111


011 1010


100 1110


101 0011


110 1001


111 0100

Table 1 (7,3) code produced from equation (2)

The hamming distance between any two code vectors is the number of symbols in which they differ. Eg

1 1 0 1 0 0 1

1 1 1 0 1 0 0

The hamming distance in this case is 4 because the highlighted symbols differ.

Looking at the second column in table 1, each code differs in at least three positions to the next value. There dmin = 3

Number of detectable errors =

Number of correctable errors = t =

single-error-correcting, double-error-detecting (SECDED) codes. SECDED codes are commonly used in computer memory protection schemes.

In general a Hamming distance of d = 2x + 1, allows x errors to be corrected.

If equation (1) and (2) are turn into a matrix representation for ease of manipulation then


Where c is the codeword and d the information bits. G is a k x n sized matrix called the code generator matrix which has as its 1st k columns a k x k identity matrix Ik so that the information data bits are maintained in the codeword. The remaining columns represent the transposed array of weight coefficients and can be represented by an array P. i.e. G = [Ik, P].

For the example considered;




Note that the 1st three columns of G must be the identity matrix

2.5 Decoding of Block codes

Decoding a transmitted codeword can be done by applying the same parity check calculation to the information bits and comparing the parity bits (ie the last r columns) with the received codeword. If the two match then the information bits are considered to be correct. If they do not match then a look up table can be used to find the most likely values of the information bits, through comparison.

Mathematically, we know from equation (7.3) that . c can be rewritten in the form;


Where d is the received information bits and dP the received check bits.

So a bit stream is received which contains information bits d. The corresponding check bits dP should correspond to the parity check sequence cp and so a comparison is performed using modulo-2 addition. If there are no errors;


An alternative method of decoding is to form a parity check matrix


Where T represents the matrix transpose.


For the example we have considered;


The matrix H can be used for error detection and correction. Let r be a vector representing the received data stream. Multiplying this by the transpose of the parity check matrix we get a vector s called the syndrome.


If the syndrome is non-zero then this indicates that one or more errors have occurred.

From our example, if the information vector is 101, the seven bit codeword is 1010011 (table 1). Say an error occurs in the second position, changing the 0 to a 1 then the received codeword r is 1110011.

From equation (11)

Which is the same as the second column in the matrix H, indicating that an error has occurred in the second bit.

Limitation to block codes

Hamming codes normally limit our coding capabilities first they are only single-error-correcting and Secondly they have only a limited range of values of length n and dimension k, and the available values may not go well with the system. These problems are thus overcome by looking to other types of codes. [sweeney]

2.6 Cyclic codes

Cyclic codes are a subset of linear block codes, and they have all the properties of block codes but the difference is such that if any codeword is shifted cyclically.

Most common structure to be encountered belongs to the subclass known as cyclic codes.

The popularity of cyclic code is partly because their structural properties provide protection against bursty errors in addition to simplifying the logic required for encoding and decoding, though the simplified decoding may be achieved at the expense of delay [sweeyney].

The codewords lateral shifts of one another, with mod-2 addition of the codewords resulting in another codeword. Codewords can be represented as polynomial expressions, which allows them to be easily generated by shift registers. For example, the codeword can be represented as a codevector

c = (c1,c2,…,cn). (c2,c3,…,cn, c1) and (c3,c4,…,cn, c1, c2 ) are also codewords.

For example the (7,3) code shown in table 1, each codeword can be obtained by lateral shifts of the other codewords (apart from the all zero one).

Cyclic codes can be described as a polynomial, each n-bit codeword c = (c1,c2,…,cn) is described as an (n-1) order polynomial in x, where each power of x represents a 1-bit shift in time. The full polynomial is given by


Other codewords can be formed by shifting bits and allowing the highest order term to 'wrap around'. Shifting once we get;


to provide another codeword.

The G matrix described in equation (5) above can also be represented in polynomial form. Starting with the 1st column, this is easily done by representing value of 1 with its corresponding x value. So for the example we have previously considered, the G matrix is given by


And in polynomial form is given by;


The way the G matrix is constructed means the last row is always of the order n-k with the last term of the polynomial being a 1 i.e. xn-k + … + 1. This is known as the code generator polynomial as the full G matrix can be obtained from g(x). If we use r = n-k, the form of g(x) is;


Remembering that the first k columns have to represent the identity matrix, the rules for obtaining the full G matrix from the code generator polynomial are;

Put the code generator polynomial into the bottom row of G.

To calculate the next row up shift all values once to the left

if it doesn't violate the identity matrix condition for the 1st k columns, it's fine

if it violates the identity matrix condition for the 1st k columns then mod2 add it to one of the other rows below.

Continue to do this for all rows.

So for the example we have considered in equation (15) we put the code generator polynomial in the bottom row. To calculate the second row we shift by one to give us x5 + x4 + x3 + x + 0. However the term in x4 violates the identity matrix condition, therefore we must add it to the row below to give x5 + x3 + x2 + x + 1 which is the second row. Shifting this row one to the left gives us the top row as the identity matrix condition isn't violated.

So for example, the length 7 cyclic codes. (xn - 1) can be factorized into the following irreducible polynomials

x7 −1 = (x − 1)(x3 + x + 1)(x3 + x2 + 1) .

As these are binary codes, all the minus signs can be replaced by plus signs:

x7 + 1 = (x + 1)(x3 + x + 1)(x3 + x2 + 1) .

As there are 3 irreducible factors, there are 23 = 8 cyclic codes. The 8 generator polynomials are:

(i) 1=1

(ii) x + 1 = x + 1

(iii) x3 + x + 1 = x3 + x + 1

(iv) x3 + x2 + 1 = x3 + x2 + 1

(v) (x + 1)(x3 + x + 1) = x4 + x3 + x2 + 1

(vi) (x + 1)(x3 + x2 + 1) = x4 + x2 + x + 1

(vii) (x3 + x + 1)(x3 + x2 + 1) = x6 + x5 + x4 + x3 + x2 + x + 1

(viii) (x + 1)(x3 + x + 1)(x3 + x2 + 1) = x7 + 1

The polynomials of (iii) and (iv) have degree 3 and so generate [7,4] codes. [7,3] codes are generated by (v) and (vi).

As noted previously, given G, one can calculate the codewords. An alternative way of calculating codewords is to note that each codeword is a product of the polynomials representing the information bits a(x) and the code generating polynomial g(x) i.e. c(x) = d(x)g(x). Table 2 below shows how the codewords can be generated using mod2 arithmetic for the case when g(x) = x4 + x3 + x2 + 1. These are the same codewords as those shown in Table 7.2.








x4 + x3 + x2 + 1



x5 + x4 + x3 + x


x + 1

x5 + x2 + x + 1



x6 + x5 + x4 + x2


x2+ 1

x6 + x5 + x3 + 1


x2+ x

x6 + x3 + x2 + x


x2 + x + 1

x6 + x4 + x + 1


Table 2 codewords Generation directly from the generator polynomial g(x)= x4 + x3 + x2 + 1

In wireless systems, block codes are used only for error detection not correction as the number of parity bits becomes very large for error correction and other methods are more appropriate. Error detection for cyclic codes is again simply just comparing the expected parity bits with the received data. Usually 8, 12 or 16 parity bits are appended to the data bits to provide error detection.


Many of the most important block codes for random-error correction fall into the family of BCH codes, named after their discoverers Bose, Chaudhuri and Hocquenghem. BCH codes include hamming codes as a special case.

BCH codes are a class of cyclic codes discovered in 1959 by Hocquenghem [A. Hocquenghem, Codes correcteurs d'erreurs, ChifFres, Vol. 2, pp. 147-156, 1959] and

independently in 1960 by Bose and Ray-Chaudhuri [R.C. Bose and D.K. Ray-Chaudhuri, On a class of error-correcting binary group codes,Information and Control, Vol. 3, pp. 68-79, 1960]. They include both binary and multilevel codes and the codes discovered in 1960 by Reed and Solomon [I.S. Reed and G. Solomon, Polynomial codes over certain finite fields, J. Soc. Indust. Applied Math. Vol. 8, pp. 300-304, 1960] were soon

recognized to be a special case of multilevel BCH codes [D.C. Gorenstein and N. Zierler, A class of error-correcting codes in pm symbols, J. Soc. Indust. Applied Math. Vol. 9, pp. 207-214, 1961].


2.8 Reed Solomon codes

Reed Solomon codes are a special example of multilevel BCH codes. Because the symbols are nonbinary, an understanding of finite field arithmetic is essential even for encoding. Moreover the decoding methods will be similar to those encountered for binary BCH codes; there are two distinctly different approaches to the encoding of Reed Solomon codes. One works in the time domain through calculation of parity check symbols. The other works in the frequency domain through an inverse Fourier transform. We shall meet the time domain technique first as it is more likely to be encountered in practice. The standard decoding method is essentially a frequency domain technique, but modification is needed for RS codes which can be achieved in two different ways [Sweeney]

Reed Solomon codes are block- based codes with high code rate and efficiency for burst error correction. It can detect and correct multiple random symbol errors. The reed Solomon codes are used in deep-space applications as well as in Compact Disks (CD) and DVD's.

2.9 Convolutional Coding

Convolutional coding and block coding are the two major forms of channel coding. Convolutional codes operate on serial data, one or a few bits at a time. Block codes operate on relatively large (typically, up to a couple of hundred bytes) message blocks. There are a variety of useful convolutional and block codes, and a variety of algorithms for decoding the received coded information sequences to recover the original data [G. D. Forney, Jr., "Convolutional Codes II: Maximum-Likelihood Decoding," Information Control, vol. 25, June, 1974, pp. 222-226.].

Convolutional codes have the ability to correct errors and were first introduced in satellite and space communications [Schwartz].

Convolutional codes are usually described using two parameters: the code rate (R) and the constraint length (L). The code rate, is expressed as a ratio of the number of bits into the Convolutional encoder (k) to the number of channel symbols output by the Convolutional encoder (n) in a given encoder cycle R= k/n.

The constraint length parameter, L, denotes the "length" of the Convolutional encoder, i.e. how many k-bit stages are available to feed the combinatorial logic that produces the output symbols. Closely related to L is the parameter m, which indicates how many encoder cycles an input it is retained and used for encoding after it first appears at the input to the Convolutional encoder. The m parameter can be thought of as the memory length of the encoder.

Convolutionally Encoding the Data

Convolutionally encoding the data is accomplished using shift register s and associated combinatorial logic that performs modulo-two addition. (A shift register is merely a chain of flip-flops wherein the output of the nth flip-flop is tied to the input of the (n+1)th flip-flop. Every time the active edge of the

clock occurs, the input to the flip-flop is clocked through to the output, and thus the data are shifted over one stage.) The combinatorial logic is often in the form of cascaded exclusive-or gates.Convolutional codes involve the operation of modulo-2 arithmetic on a sliding stream of bits. The Convolutional coder consists of a K-stage shift register, with input bits being shifted along the register one bit at a time (K is called the constraint length of the coder). Modulo-2 sums of the contents are read out at a rate ® times as fast by shifting out ® bits for every one in. Performance of the encoder can be improved by increasing both K and®, although at the cost of complexity and bit rate of operation.

Convolutional encoding using Viterbi decoding is a FEC technique that is particularly suited to a channel in which the transmitted signal is corrupted mainly by additive white gaussian noise (AWGN). AWGN is a channel model whereby the only mutilation to the communication data is a linear addition of white noise and that implies a random signal with same power within a fixed bandwidth at a centre frequency (Flat Power spectral Density in units Watts/Hz of a bandwidth). The AWGN is suitable for mathematical models because it gives an insight of the behaviour of the system and further is a good model for broadband wireless systems, satellite and deep space communications.

An encoder can be classified as a constraint length K, rate 1/® encoder. An example of a K = 3, rate 1/® encoder is shown in Figure 5 (these are two examples of the same encoder, the data is just read out at different points of the register.

Code rate----------

Constraint length………………

Figure 5 Example of a K = 3, rate ½ encoder (from Schwartz P180)

In both cases data bits are fed into the input and for each input two data bits are read out at the output stage (the output data rate is twice that of the input). To form the outputs at 1 and 2 the values in the registers are added modulo-2 as shown at g1 and g2. The output is read first at line 1 followed by line 2. Convolutional encoders are initialized by setting all the bits to zero.

As an example, consider the encoder shown in figure 5 for the input bit sequence shown below;

Input bits: 0 1 1 0 1 0 0 1

Output bits: 0 0 1 1 0 1 0 1 0 0 1 0 1 1 1 1

For an 8 bit input sequence, 16 bits are output. The encoder is initially all zeros. For another 0 input the bits within the register are 0,0,0.

g1 = 0 ƒ… 0 ƒ… 0 = 0 (modulo-2 addition of each value)

g2 = 0 ƒ… 0 = 0 (modulo-2 addition of the 1st and third values)

For the next input of 1, the contents of the register becomes (1,0,0), which gives

g1 = 1 ƒ… 0 ƒ… 0 = 1

g2 = 1 ƒ… 0 = 1

and so on for the rest of the sequence.

Each function generator g can be represented by a K-bit vector, 1 when a connection is made and 0 when no connection is made. For the example considered g1 = [111] and g2 = [101]. For larger encoders such as the K=9 rate ½ encoder shown in figure 7.3 the function generators are represented in octal form. In binary form g1=[111101011] and g2=[101110001], in octal form g1=[753] and g2=561. This encoder is used in the third generation cdma2000 system.

Figure 6 K=9, rate ½ convolutional encoder (from Schwartz p181, used in cdma2000)

A convolutional encoder can be represented in 3 different ways (i) state diagram (ii) trellis (iii) an ever expanding tree.

Convolutional coding is also used in space communications, for example, the Mars rovers use a K = 15, rate 1/6 encoder.

State diagram

For the encoder shown in figure 5 the output bits depend on the incoming bit and the two bits in the right hand stages. The arrangement of the bits in the two right hand stages is known as the state and an incoming bit can cause a change in the state. The way the encoder can change state can be represented as a state diagram (shown in figure 7).

Figure 7 State diagram representation of a K = 3, rate ½ encoder (from Schwartz P182)

a, b, c, d represents the states, the values of the states are shown in the square box. The incoming bit is shown in brackets along the arrows and the output bits are the pair of bits along the arrow.

The two rightmost bits can be in one of four states a = 00, b = 10, c = 01, and d = 11. Consider an encoder in state c = 01, for a zero bit arriving the three bits are 001, giving g1 = 1 and g2 = 1 and an output of 11, at the next shift of bits the encoder changes to state a = 00. If a 1 bit had arrived the three bits in the encoder would be 101 giving g1 = 0 and g2 = 0, at the next shift of bits the encoder moves to state b. It is important to note that an encoder cannot move from one state to any other state, there are defined paths that an encoder can follow around the state diagram (and hence a defined set of output bits for a given sequence of input bits). This property of convolutional encoders provides the error detection and correction capability. With different definitions of g1 and g2 the values of the output bits would change, but the paths of the state diagram remain the same.

Trellis Representation

Figure 8 shows the trellis representation of the same encoder. In this case, the states are shown along the left hand column and time is represented along the horizontal axis.

Figure 8 Trellis representation of a k = 3 rate ½ encoder (from Schwartz P 183)

From each state the encoder can move into one of two states depending on the input bit. For each pair of arrows emerging from a state, the upper arrow represents a 0 input bit and the lower arrow a 1 input bit. The two values along the arrows represent the output bits.

Tree Representation

An example of the tree representation is shown in figure 9 Similar to the trellis representation, when moving from one state to another, the upper lines represent a 0 input bit and the lower lines a 1. The two bits along the lines represent the output bits.

Figure 9 Decision tree representation of a Convolutional encoder (from Schwartz P 184)

The decision tree demonstrates that the number of possible sequences increases as 2L with time, along with the Hamming distance for each sequence. This is one of the powerful features of convolutional encoders, as a given path increases in length it can be more readily distinguished from other paths and errors can be more easily detected and corrected.

3.0 Viterbi Decoding

Viterbi decoding is one of two types of decoding algorithms used with Convolutional encoding the other type is sequential decoding.

Sequential decoding has the advantage that it can perform very well with long-constraint length convolutional codes, but it has a variable decoding time [Lin, Ming-Bo, "New Path History Management Circuits for Viterbi Decoders," IEEE Transactions on Communications, vol. 48, October, 2000, pp. 1605-1608.]. The discussion of sequential decoding algorithms will not be highlighted here since it is beyond the scope of this project.

One possible strategy for decoding is simply to compare the received bits with the set of expected sequences from the encoder. The one that matches best is the one most likely to have been transmitted. The selection of the most likely sequence to have been transmitted allows error correction to be performed. Say we select a sequence of L bits, the number of possible combinations of these bits is 2L. Increasing the bit length increases the number of combinations and hence the accuracy. This is known as maximum-likelihood decoding.

The main drawback with this approach is that as L increases the number of comparisons that have to be made increases. For example, for L = 100 the number of comparisons that have to be made is 2100. One way of overcoming this problem is to use a Viterbi algorithm.

Rather than simply comparing every sequence the Viterbi algorithm uses our knowledge about the form of the encoder to reduce the number of comparisons that are made.

For example, consider the state diagram in figure 7 If an encoder is in state b, it could only have reached here from either state a or state c. The Hamming distance between the received sequence of bits and the two sequences of bits corresponding to the paths into state b is compared. The smallest Hamming distance path is retained as the 'survivor' and the other path is discarded. This means that the number of paths remains constant for each time interval. For example, consider the tree representation in Figure 9, there are four paths into state a after 4 time intervals, applying the Viterbi algorithm at each time step would mean that only one of these paths would 'survive'.

So how does this work in practice? Let's consider an example using the K = 3 rate ½ encoder considered previously. The decoder has been running for a little while, selecting the most likely paths and the most likely received bits using information from the last 5 time intervals. From previous iterations we have arrived at states a, b, c and d for the paths shown in Table 7.3 (only the previous 4 intervals are shown). Note that we do not know for certain we are in one of these four states, we are just trying to calculate the most likely paths, given the received data.

Possible Paths

Interval ®





Received bits ®


























Table 3 Initial values for the Viterbi decoding example.

Two things occur in interval 5; (i) we receive new data bits (in this case the bits 10) (ii) the output bits corresponding to the two possible transitions from the previous state (shown in Table 7.3) to the current state are calculated.

Table 4 adds the new information.

For an encoder in state a, it can either stay in state a (with the output of 00) or it can move to state b (with the output of 11).

For an encoder in state b, it can move to state c (with the output of 10) or it can move to state d (with the output of 01).


The new possible output values are written in the column corresponding to interval 5.

Possible Paths

Interval ®






Received bits ®






Previous State

Current state

Hamming distance

































































Table 4 (i) new data bits are received in interval 5 (ii) output transitions are included for the previous

Note the number of paths in Table 2 have increased by a factor of 2. To avoid this growing as 2n, we now need to decide which of the new paths survive into the next time interval. We also need to decide which of the paths corresponds to the most likely data actually transmitted. Both can be done using the Hamming distance as shown in Table 7.4.

To select the paths that will survive into the next time interval we need to select the smallest Hamming distances for paths that reach each of the states a, b, c, and d. These are highlighted by* in Table 4 and are maintained into the next time interval, dropping the oldest time interval column (1 in this case).

The path that provides the smallest overall Hamming distance provides the most likely data transmitted. The oldest bits (from interval 1) are then decoded as the output bits from the decoder. In this case the most likely bits to have been transmitted are 00 (highlighted by ** in table 4). Even though the bits received were 01, the Viterbi decoder has corrected the error.

The major advantage of the Viterbi decoder is that only 2K-1 comparisons need to be made at each time interval. Therefore for the K = 3, L = 5 encoder shown in the previous example, the number of comparisons that need to be made are 2K-1L = 40 whereas if a comparison of all paths is performed then 2L = 64 comparisons need to be made.

For this example this is not a significant improvement but consider the case of the K = 9 encoder used in cdma2000 and shown in figure 3. A value of L = 4 or 5 times K has been shown to provide a performance comparable to that of very large L so let's set L = 4K = 36. In this case the Viterbi algorithm requires 2K-1L = 256 x 36 = 9216 instead of 236 comparisons (a factor of 7 million less).

Turbo codes

We have so far considered block codes and convolutional codes but there are advantages in combining them.

Let's first consider serial concatenation of encoders shown in figure 10. A block encoder carriers out the initial coding, the encoded blocks are interleaved, or spread out over time over a number of blocks before being fed to the convolutional encoder. This reduces the occurrence of bursts of errors. The block encoder is known as the outer coder and the convolutional encoder as the inner encoder.

Figure 10 Serial concatenation of encoders (Schwartz P190)

Turbo codes offer high performance decoding by parallel concatenation of convolutional decoders. Some of the important work in this area has been by Berrou and Glavieux 'Near optimum error correcting coding and decoding: turbo codes,' IEEE Trans on Communications 44:1261-71 (1996). The excitement in the coding community comes from the near optimum performance of the codes. But what does 'optimum' mean in this context?

It relates to work by Shannon defining the capacity C of a channel. For the case of Gaussian white noise, the Capacity (in bits/sec), for input signals of power S, noise with average power N, in a W Hz bandwidth is given by;


At some rate R bps below the capacity, error free transmission can be obtained. Alternatively in order to achieve low bit error probability, the bits can be transmitted with signal levels above the noise.

For example to achieve a bit error probability of 10-5 using Phase Shift Keying (PSK) the signal to noise ratio needs to be 9.6dB.

For the K=7, rate ½ encoded described earlier in this lecture, the SNR needs to be 4.2dB.

For turbo codes, the SNR needs to be 0.7dB.

This is extremely close to the optimum, allowing low power, high bandwidth communication, hence the excitement and uptake in 3G systems (see Schwartz P191-2 for more information).

A turbo encoder involves the parallel concatenation of recursive systematic convolutional (RSC) encoders (figure 11). Systematic in this context means that the input is fed directly to the output as shown by line c2. Recursive means the output is fed back to the input.

Figure 11 Example of a recursive systematic convolutional encoder (from Schwartz P193)

Figure 12 Rate-1/3 turbo encoder

Parallel concatenation of the RSC encoder is shown in figure 7.9 with an example of a rate 1/3 turbo encoder. The first output is obtained directly from the input, the second from a RSC and the final output is obtained through an interleaver and RSC2. It should be noted that the design of the interleaver is important to the performance in order to decorrelate the input bits fed to both encoders.

A maximum likelihood approach is again used in decoding.

Glossary of terms.

Bit Error Rate (BER) is a measure of average likelihood that a bit will be in error. For example a channel with a BER of 10-9 means that there is an average of 1 bit to be received in error in every 1,000 000 000 bits.

A typical communication system.

The structure

Included Components (components' function)

Channel coding (why we need channel coding?)

* you can talk regarding to the wireless communication system

* you can talk regarding to the different implementation of wireless communication system

2. Channel coding

Different types of channel

Different types of coding scheme (Comparing on SNR, BER, Coding Gain etc)

Convolutional Code (Why choose convolutional code?)

3. Viterbi Decoding

Comparing on different algorithm and Make a conclusion on)