Designing Of Bit Serial Galois Field Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

In this work, we present Galois field multiplier design for different bit-length. Finite field arithmetic is a prominent solution for error correction coding, computer algebra and cryptographic applications because they allow to store and handle the field element in a manageable way. The Galois field GF (2m) is the binary field and is more efficient for hardware because the field operations are implemented with simple VLSI gates. The basic arithmetic operation (i.e., addition, multiplication and inversion) in prime and binary extension fields has wide application in cryptography.

A basic multiplier have been proposed for 4-bit, 8-bit, 16 bit and 32 bit which will provide multiplication in finite field. The coding has been done in Verilog HDL[7] and compilation and simulation is done using MODELSIM -Altera 6.5b. And the result has been verified through the site mentioned in the reference[6].

The cryptography uses finite field multiplication to extract logarithms, which are typically very difficult to do. It has great use in digital signal processing because finite field operations can do filtering without a additional error introduction due to rounding and other means.

Keywords: Finite field, Galois Field multiplier, unified multiplier, cryptography, irreducible polynomial, VLSI.

1. Introduction

In this era, cryptography is used for secure transmission of data. Several methods are present which can be used in cryptography. But most fascinating method is finite field theory [1]. As finite field is having the capability to store and handle the data efficiently. Property of binary finite field GF(2m) like “carry-freeâ€Â arithmetic operation make it interesting for hardware implementation. Depending on the type of application, the degree of m can be changed over a wide range. Typical application is elliptic curve cryptography[2] where m value is about 200.

Depending upon different application several architecture and algorithms have been proposed. For reference we have gone through a result of normal basis multiplier [3].But sticking to the well accepted cryptographic standard the polynomial based architecture is the best choice. A polynomial basis GF(2m) multiplier architecture can be realized using bit-serial, bit-parallel or digit-serial fashion. But for application requiring less area like smart cards, the choice of bit-serial multiplier is best.

In recent years, several methods have been proposed for multiplication in finite field theory. This paper presents a bit-serial multiplier architecture [4] for the binary finite field GF(2m) used in cryptography. The given architecture performs addition of field elements regardless of irreducible polynomial (IP) like trinomials or pentanomials. The field elements multiplications is done using the principle of shift and add algorithm and requires m cycles to produce the result.

Further, this paper has been structured as follows: basic preliminary and definitions are defined section 2. The proposed architecture is presented in section 3. Result and output in section 4. Finally section 5. concludes the paper with conclusion and comments.

2. Preliminaries and Definition

A finite field (or Galois field) consist of finite set of elements in which two operation addition and multiplication can be performed between any two field elements, which follows the commutative and associative law for both of the two operation. The smallest GF is 2m with two elements 0 and 1. GF(2m) consist of 2m elements where m is a non-zero positive integer. Any element in GF(2m) can be represented with polynomial of degree m-1.for example if α(t) is an element in GF(2m) can be represented as follows:

2.1 Addition

Addition of two elements in GF(2m) is done by adding the coefficient of the two elements using modulo 2,the operation is nothing but simply a bitwise XOR of the coefficient of the elements having equal power of t. for example if a(t) and b(t) be the two elements in GF(2m) ,then the addition result r(t) can be seen as:

Where ; ;

The addition in GF(2m) is rather simple than a normal addition because it does not require carry propagation from low significant bit to most significant bit. For hardware implementation, this function can be performed by m XOR gate.

2.2 Multiplication

Multiplication in GF(2m) involves multiplying the two polynomials using modulo m reduction technique.

In this each and every co-efficient multiplicand elements are sensed by the multiplier and partial result is added to the intermediate result. for example if a(t) and b(t) be the two elements in GF(2m) ,then the multiplication result r(t) can be seen as:


Product, =. = .;

GF(2m) involves multiplication which can be seen by a simple algorithm:

STEP1: Initialize r(t)=0.

STEP2: for (i=m-1; i=0;i--).

STEP2.a: r(t) = t.r(t)+aib(t).

STEP2.b: if degree r(t)=m; then r(t)=r(t)-p(t).

STEP3: return result r(t).

The above algorithm is called as Shift and add algorithm.

3. Multiplier Architecture

Depending on the scheduling the inputs, in general there are two version of the bit-serial architecture are available that is MSB-first version, LSB-second version. This architecture uses first version of bit-serial multiplier.

Figure1 shows the architecture of GF(2m) bit-serial adder for 4bit .This architecture is implemented using

C:\Users\megha jain\Desktop\mparh.png

Figure 1

PISO and SIPO and Multiplier Cell [figure2].

C:\Users\megha jain\Desktop\mpcell.png

Figure 2

The architecture shown in figure 1 is able to multiply when all of the operand has been loaded. The multiplier has the ability to multiply and add the current result (or intermediate result) to the multiplicand using the signal MPY/ADD that is if signal is 1 multiplication is performed otherwise addition. We have kept MPY/ADD signal 1 always and thus the previous bit of the result is feed to each multiplier cell by multiplexer. For the multiplication operation simple algorithm is followed, which can be described with the help of the flowchart shown in the figure 3.

C:\Users\megha jain\Desktop\flow.png

Figure 3

This architecture performs multiplication for the algorithm provided in the section2.2 by using a simple process of shift and add. The generation of partial product ai.b(t) is done by simple logical AND operation and we know t.r(t) is nothing but 1-bit left shift and calculation of the intermediate and final result is done by logical XOR operation . This whole operation for 1-bit of the whole polynomial is done using a cell which we defined as multiplier cell [figure2].so for the m-bit we required m such cells to produce the result.

The point which make this architecture different in the way that it subtract IP p(t) from r(t)(if rm-1=1) or

0 from r(t) (if rm-1=0) within the same clock cycle while in the algorithm described in section2.1 involves calculation of the whole result and p(t) is subtracted from r(t) if degree of r(t) equals to m. This architecture involves four register to hold the values of the polynomials a(t),b(t),p(t)and r(t). a(t) is stored in the multiplier register which should load the m-bit parallel and shift a(t) by 1-bit at each clock cycle starting from the MSB. The IP register to store the polynomial P. The multiplicand b(t) is stored in the multiplicand register which again load m-bit data parallely but provide parallel data out i.e r(t) which is computed in the result register and loaded to multiplicand register at every clock cycle which can be feed to the core of the crypto-coprocessor typically used in the smart card system.

The architecture proposed is very simple but highly power conservative in the sense that it uses the clock gating method which is providing the clock pulse to the shift register only when they required shifting action else making data stable. The typical application of elliptic cryptography required m=200 but the same multiplier can be used for m=120 so providing the clock for only 120 time for shifting the polynomial b(t) and holding the data for the further process in multiplicand register.

4. Results and output

The functionality of the GF(2m) multiplier algorithm can be understand by considering a simple example given below:

Let a=3(0011) and b=7(0111), so our multiplier perform a (a Ã- b) mod p multiplication. Initial result is assumed to be 0000 and IP is taken as 0011

Each process result has been shown in each step:

Initial result=0000+ 0(0011) =0000+0000=0000

Partial result=0000+1(0011) =0000+0011=0011

Partial result=0011+1(0011) =0110+0011=0101

Partial result=0101+1(0011) =1010+0011=1001=9

Final result=1010

Similarly, for a=15(111) and b=15(1111) output =a(1010) which is shown as the simulated output figure 4.


Figure 4

Now we increased to 8-bit, a=83,b=57 and output=c1.figure 5 shows the output


Figure 5

Now bit length is increased upto 16 bit, a=ffff, b=ffff and output=abfa. Figure 6 shows the output

C:\Users\megha jain\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.Word\162.png

Figure 6

Now bit length is increased up to 32 bit, a=ffffffff,, b=ffffffff and output=55554039 .Figure 7 shows the output

C:\Users\megha jain\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.Word\322.jpg

Figure 7

Power of the circuit is calculated using the Altera-QuartusII 9.1sp2 Power play analyzer tool at 200MHz and result has been tabulated in the table 1.

No. of bits

Dynamic thermal power

4 bits


8 bits


16 bits


32 bits


Table 1

5. Conclusion and Comments

In this paper, a design of bit serial binary finite field GF(2m) multiplier with irreducible polynomial is discussed. A 1-bit of the multiplier of the proposed architecture consists of 1-bit multiplier cell (consisting 1 Ex-OR gate, 1 Multiplexer and 2 AND gates) and 3 register cells. Although, several previous implementations of Galois field multipliers have been able to reduce the area and complexity, but this architecture allows any irreducible polynomial to be utilized. The advantage of the architecture lies in the clock gating method by which the low power implementation can be achieved.

Further the architecture is very simple to be molded into different loading style of the data.