# Cryptography Is Becoming More And More Important Computer Science Essay

**Published:** **Last Edited:**

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Cryptography involves encryption and decryption of messages. Encryption is the process of converting a plain text into cipher text by using an algorithm, while decryption is the process of getting back the

encryption message. A cryptographic algorithm is the mathematical function used for encryption and decryption.

Elliptic Curve Cryptography (ECC) is an emerging type of public cryptography and considered as the best. ECC is the most efficient public key cryptosystem that uses shorter keys while provided the same security levels as other cryptosystems. The use of shorter implies lower space requirements for key storage and faster arithmetic operations. These advantages are important when public key

cryptography is implemented in constrained devices, such as in mobile devices. ECC has a very unique mathematical structure that enables the process of taking any two points on a specific curve and adding them to get a result as another point on the same curve. This special feature is advantageous for cryptography due to the inherent difficulty of determining which original two points were used to get the new point.

The choice of various parameters in the equation will set the level difficulty exponentially as compared to the key length. ECC consists of a few basic operations and rules that define how addition, subtraction,

multiplication and doubling are performed. Implementing cryptography involves extensive mathematics

and effective engineering and also good algorithm to integrate both. Cryptography implementation of this kind imposes several challenges, which may require a trade-off in performance, security and flexibility. An approach studied in recent years combines the advantages of software (flexibility) and hardware (performance) in a new paradigm of computation referred to as reconfigurable computing.

The performance of an ECC co-processor can be related to the indices viz. throughput, power and the area involved. It is in this perspective that newer architectures are proposed to enhance the

performance of ECC co-processor suitable for Galois Field (GF) over prime field, GF(p) and binary field, GF(2m). Besides efforts are aimed to modify the existing architectures in a direction that will assist the

portable devices to effectively meet the requirements and standards. The proposed architectures use different multipliers, the Modified Programmable Cellular Automata, data selector like Linear Feedback

Shift Register (LFSR) and adders like Carry Look ahead Adder (CLA) for performance enhancement. The architectures are implemented in Spartan-3E family device XC3S1600E using ModelSim 5.7 and Xilinx 9.2i.

As the demand of wired and wireless communication keeps exploding, data security has become an urgent need for modern vital applications such as financial services, private and healthcare information, personal identification, confidential communication and storage etc. As forecasters predicted the count of wireless users are in unprecedented growth. As the wireless industry explodes, it faces a growing need for security. Both for secure (authenticated, private) web transactions and for secure (signed, encrypted) messaging, a full and efficient public key infrastructure is needed. Cryptography is the science of writing in secret code. A basic task in cryptography is to enable users to communicate securely over an insecure channel in a way that guarantees their transmissions privacy and authenticity. The process of essaying the crypt algorithm and obtaining the secured information is known as Cryptanalysis. The initial unencrypted data is referred to as plaintext. It is encrypted into cipher

text, which will in turn (usually) be decrypted into usable plaintext. Within the context of any application-to-application communication, there are some specific security requirements, including:

Authentication: The process of proving one's identity. Privacy/confidentiality: Ensuring that no one can read the message except the intended receiver. Integrity: Assuring the receiver that the received message has not been altered in any way from the original.

Non-repudiation: A mechanism to prove that the sender really sent this message.

There are various standard bodies guiding the implementation of security protocols for the industry. Some of the organizations involved in standards activities are the Internet Engineering Task Force

(IETF), American Bankers Association, International Telecommunications Union, Institute of Electrical and Electronics Engineers (IEEE), and National Institute of Standards and Technology

(NIST).

Cryptographic algorithms are mainly classified into two ways.

ô€¸ Symmetric Key Cryptography (SKC) that uses a single key for both encryption and decryption

ô€¸ Asymmetric Key Cryptography (AKC) that uses one key for encryption and another for decryption

Symmetric encryption is also called as secret or private key encryption that is used in sender and receiver stations to encrypt and decrypt the data. It uses a single key to encrypt and decrypt messages.

Symmetric-key systems are simpler, faster and require the concerned parties to exchange the key in a secure way. The Figure 1.1 shows the process of encryption and decryption using SKC. An AKC also called as Public Key Cryptography (PKC), which uses two keys, a public key known to everyone for encryption

and a private or secret key known only to the recipient of the message for decryption. The main advantage is that the underlying primitives used are based on well known problems such as Integer Factorization Problem (IFP) and Discrete Logarithm Problems (DLP). The asymmetric process offers a higher security at a lower speed and greater complexity.

Among various data security schemes, PKC is robust and effective for these cure data transaction and messaging. The robustness typically relies on the difficulty of integer factorization or on finding the

discrete logarithm in a finite field. The superiority of any cryptography lies in providing higher security levels with smaller keys, bandwidth savings, and faster implementations. Such performance improvements are particularly important in the wireless arena where computing power,

memory, and battery life of devices are more constrained. Three basic choices for public key systems are available for these applications (Lauter K 2004):

â€¢ Rivest, Shamir, and Adleman (RSA)

â€¢ Diffie-Hellman (DH) or Digital Signature Algorithm (DSA) modulo a prime p

â€¢ Elliptic Curve Diffie-Hellman (ECDH) or Elliptic Curve Digital Signature Algorithm (ECDSA)

RSA is a system that was published in 1978 by Rivest R. L., Shamir A., and Adleman L. Whitfield Diffie and Martin Hellman proposed the public key system in 1976 now called Diffie-Hellman Key Exchange. DH is key agreement and DSA is signature, and they are not directly interchangeable, although they can be combined to do authenticate key agreement. Both the key exchange and DSA are based on the difficulty of solving the DLP in the multiplicative group of integers modulo a prime p.

The Elliptic Curve Cryptography (ECC) was proposed by Koblitz (1987) and Miller (1985) and predominant usage in the portable device such as smart cards, mobile phone, Personal Digital

Assistant (PDA), Personal Identity Verification (PIV), High bandwidth Digital Content Protection (HDCP) etc. The success of cryptographic functions depends on the efficiency of fulfilling the realistic security

applications. Today's the main challenge in the cryptography is achieving same level of security per best currently known attacks, with much smaller parameters, leading to significant performance

advantages.

The ECC is a public-key based technology that involves finite field arithmetic operations. It is based on Elliptic Curves (ECs) defined over a finite binary field. The basic operations are addition, subtraction,

multiplication, squaring and inversion. The second hierarchical step extends to the point addition and the point doubling, followed by scalar or point multiplication. The major challenge in the cryptography procedure is to assist in the secured access of the original data from the encrypted data. A large computation time requires performing the operation of a large number of encrypted data and eschewing the crypt analysis to be expensive. The hardware implementation of ECC adds utility value in

this perspective and allows a platform to examine the performance of cryptography algorithms. The merits of the scheme are measured in terms of throughput, power and the area involved in the methodology. In a secured communication system, PKC plays a powerful role with little degradation of application's performance. The performance degradation occurs due to the complexity of mathematical

computations in public key cryptographic algorithms. The new cryptosystem is required to be strong enough to ensure the same or even greater level of security with small number of key sizes. The main

motive of cryptographic system is to improve the throughput with smaller key size and enhance the security of the data.

A relatively large amount of research has been performed on architecture ECC co-processors over Galois Field (GF). The ECC can be performed over GF of either prime field (p) or binary field (2m). Greater

emphasis has been placed on reducing the power dissipation of important circuit functions while maintaining these high speeds. Therefore, power dissipation as well as circuit speed should be

considered at the architectural level. For certain systems a general purpose microprocessor satisfies

the requirements, but the places where high performance is the main criteria, cryptographic co-processors in hardware are indispensable. When very high performance is required or when a high volume of coprocessors is needed, Field-Programmable Gate Array (FPGA)s are chosen as implementation platforms.

The primary objective echoes to explore the use of an efficient multiplier in scalar multiplication, with a view to enhance the speed of the process and reduce the power consumed and area involved.

In this research, ECC crypto processors proposed and designed based on the EC (Miller 1985) (Lawrence Washington 2003) and efficient architecture is designed by using the efficient multiplier, data

selector, adder and Programmable Cellular Automata (PCA). The objectives of this research are as follows.

â€¢ An ECC co-processor using the Modified Programmable Cellular Automata (MPCA) is proposed, which can perform the scalar multiplication over the GF(2m). This parallel and polynomial basic co-processor design based on Low- Power High-Speed Error Tolerant Shift and Add Multiplier attempts to provide a higher throughput and efficiency and use a lower device utility area.

â€¢ Scalar multiplication is one of the commonly used operations in ECC over GF(2m), although it is more

complex and time consuming process. A mixture of multipliers namely array multiplier, modified Booth

multiplier and Hybrid Encode Low Power (HELP) multiplier are used in Montgomery scalar multiplication

algorithm (Lopez and Dahab 1999) to reduce the circuit complexity, power consumed and computational time.

â€¢ The third aim is to implement the efficient architecture for ECC over the prime field GF(p). It is proposed to incorporate a data selector and an adder in the architecture of ECC. The high speed and low power ECC process is designed using the efficient data selector and adder in the Montgomery inversion algorithm (Kaliski Jr 1995).

â€¢ It is proposed to develop 160-bit hardware architecture for ECC over the prime field GF(p) and avail the role of Spartan-3E family device XC3S500E to investigate its performance. The processor includes a Linear Feedback Shift Register (LFSR) based effective data selector, Montgomery inversion algorithm to perform point addition, point multiplication, shifting and Carry Look ahead Adder (CLA) to reduce the power consumption.

The thesis is organized as follows.

A survey of existing ECC processors is reviewed in the second chapter.

The background of ECC involves DLP, basic arithmetic function and co-ordinate conversion over GF(2m) and GF(p), which are described in the third chapter. The features of FPGA and ModelSim5.7

software are also briefed in the same chapter.

The ECC co-processor over the GF(2m) through the MPCA is designed in the fourth chapter. The results obtained using Low-Power High-Speed Error Tolerant Shift and Add Multiplier is discussed in the

same chapter.

The details of designing ECC architecture using array multiplier, modified Booth multiplier and HELP multiplier in Montgomery scalar multiplication algorithm are outlined in the fifth chapter.

The architecture for ECC over GF(p) with an efficient data selector and adder using in Montgomery inversion algorithm is described in the sixth chapter. The theory of CLA and LFSR is reviewed

in the same chapter.

The hardware implementation of ECC over GF(p) based on the pin package available on Spartan-3E family device XC3S500E is explained in the seventh chapter.

The thesis is concluded in the eighth chapter along with the directions of future research in the area.

The PKC is based on the idea of using one key to encrypt the message and another to decrypt it. With ECC emerging as a serious alternative (Alfred Menezes et al 2001), the desired level of security can

be attained with significantly smaller keys. This makes ECC very attractive for small-foot print devices with limited computational capacities, memory and low-bandwidth network connections. Another

major advantage of ECC is that the domain parameters can be judiciously chosen to improve implementation performance. However ECC is still considered to be impracticable for very low-end devices. The contributions in the area of ECC are remarkable over last decades. There are efforts to have unified architectures for fields GF(p) and GF(2m). The performance differences of the architecture for

prime and binary fields are recorded.

The ECC can be implemented in both binary field over GF(2m) and prime field over GF(p). The ECC over binary field has a several advantages. Since ECC implementation over GF(2m) only involve the

logic and shift operations, it can be easily designed on FPGA and is faster compared with ECC over GF(p). The ECC design over GF(p) can involve a lot of integer multiplication operations. In modern computer

field binary number system is primarily preferred. Moreover GF(2m) can support all possible values of 'm' and major operations such as squaring and addition, which make the hardware architecture so simple. The ECC architecture over GF(2m) use simple XOR to perform addition operation

and squaring, which are based on linear operation. Even though the ECC implementation over GF(2m) has several advantages over the GF(p), it is not flexible because of the field parameters used are often fixed. On the other hand, the ECC implementation over GF(p) is more flexible because the algorithm is

fixed even when operation's length changes. In this research ECC architectures have been designed for both GF(2m) and GF(p).

Developing a high-speed pipelined application-specific processor for ECC using FPGA technology is being the objective of many researchers. Numerous ECC hardware cryptographic processors have been presented in the literature, which includes acceleration techniques to improve the performance also. For most of these implementations, efforts are concentrated on algorithm optimization or improved arithmetic architectures and rarely on a processor architecture particularly suited for ECC point multiplication.

Low-power, high-speed, increased throughput, reduced area and improved security level are the prime objectives of any design. The optimization goal in many applications is usually to reduce the latency

of a point multiplication in terms of the number of required cycles. The existing ECC is implemented in both software and hardware. The main cause for implementing the ECC in hardware is that the software implementation has several downsides. Some downsides of ECC software implementations are mathematical operations in finite fields that are huge and the instruction sets that are insufficient. To

achieve the real time data processing in portable devices, hardware implementation of ECC must be practically feasible. Xu Huang et al (2010) have propounded an algorithm based on 1's complement subtraction to represent scalar in scalar multiplication on ECC. It offers less Hamming weight and significantly improves the computational efficiency of scalar multiplication. Moreover it presented

a fuzzy controller for dynamic window sizing. This allows the program to run under optimum conditions by allocating available RAM and ROM at the sensor node within a wireless sensor network.

Dahab et al (2006) have proposed new software algorithms with Gaussian Normal Basis (GNB) for ECC to perform a proficient multiplication over the binary finite fields F2 m. It is considered that the

vector-level algorithms for executing binary field multiplication in software also utilize a mapping to ring where fast polynomial-based techniques can be employed. Marcio Juliato et al (2005) have explored the probable use of custom instructions in a reconfigurable hardware platform to hasten arithmetic operations in the binary finite fields F2 using a GNB representation and also using the speedy field multiplier in a hardware/software approach to accelerate point multiplication on ECC. Koschuch et al (2006) have proposed hardware/software codesign of ECC over GF(2191) on an 8051 Microcontroller. Its modest hardware accelerator confirmed the significance of removing systemlevel performance bottlenecks caused by the transfer of operands between hardware accelerator and external RAM.

Huang et al (2011) have presented a fuzzy controller for the dynamic window sizing to allow the calculation process to run under optimum conditions. This is achieved through balanced case allocation

of the available RAM and ROM at the sensor node within a wireless sensor network. The whole Quality of Service (QoS) is improved and the power consumption profile is worth noting. The simulation results

showed that the average calculation time decreased by approximately 15 percentage in comparison to traditional algorithms in an ECC wireless sensor network.

Min Tian et al (2010) have proposed an efficient EC scalar multiplication algorithm suitable for wireless network. The approach speeds up the pre-computation stage of window-w Non-Adjacent Form

(NAF) of the EC scalar multiplication in lower costs by conjugate additions and strategic points.

Yangtao et al (2010) have designed and implemented a certificate authority based on ECC by using Java programming technique, which can sign X.509v3 digital certificate to client and then validate client certificate. The key pair can be obtained from the PKC standard#12, which is used in encryption, decryption and digital signature.

Mohamed Hassan et al (2009) have described a solution to apply a low cost-low area scalable ECC over GF(2m) using a hardwaresoftware co-design approach. This work is implemented with different

word size m, such as m = 113,131,163 and193 based on the curves recommended by the ECC standards. It is also parameterized for 8, 16, and 32 bit data widths. Zhimin Chen et al (2010) have proposed a scalable parallel software implementation of Montgomery multiplication for multicore systems using parallelization algorithm. It is analysed and confirmed on dual-core, quad-core and eight-core rototypes.

Mohamed Hassan et al (2009) have presented an entirely software implementation of scalable ECC over GF(2m) on a tiny microcontroller (PicoBlaze). This design is worked as either standalone

or as part of a software-hardware co-design for ECC. Lopez et al (2000) have proposed a standard polynomial basis efficient algorithm for multiplication in the binary finite fields F2 m, for

software implementations of ECC and this new algorithm provided better results than "shift and add" method.

Hai Yan et al (2006) have suggested the study software implementations of ECC co-processors with different word sizes. It is evidenced that a 163-bit crypto system can perform in 13.9s on an 8-bit

processor at a clock rate of 8MHz.

Petre Anghelescu et al (2009) have illustrated an encryption system implemented on a structure of Hybrid Additive Programmable Cellular Automata (HAPCA), which can back up both software and

hardware implementations.

Petre Anghelescu (2011) has developed a high-performance encryption system that works according with the PCA theory. This parallel processing Cellular Automata (CA) has been implemented in

software using C# programming language.

Sheng Uei Guan et al (2004) have reported a new class of CA, Self Programming Cellular Automata (SPCA), with distinct application to pseudorandom number generation. The behavioral complexity has

been increased and utilized, by changing a cell's state transition rules. Good performance has been obtained using simple neighborhoods with certain CA length, transition rules etc.

Petre Anghelescu (2010) has developed a high-performance cryptosystem based on PCA. The CA is programmable according to the rules stored in the file memory. To ensure the security of the algorithm

the scalable structure of PCA has been used and this model has provided the security in Dynamic Link Library (DLL), in order to assure the data encryption of medical data sent over the internet.

Qiuxia Zhang et al (2011) have obtained a new digital signature scheme through improving the original digital signature scheme, and enhanced the security of the digital signature.

William Chelton et al (2008) have proposed a high-speed pipelined Application-Specific Instruction set Processor (ASIP) for ECC over GF(2163) using FPGA technology. To acquire the performance

acceleration, three methodologies have been used, namely, pipelining, reduced the instruction set and combined algorithm for point doubling and point addition by using application specific instructions.

Rodriguez et al (2005) have established EC scalar multiplication architecture over GF(2163) using FPGA arithmetic logic unit. This architecture has carried out EC point addition and point doubling efficiently using a parallel version of the half-and-add method in mixed-coordinate representation.

Ansari et al (2008) have archived high-performance architecture of EC scalar multiplication based on the Montgomery ladder method over finite field GF(2m). This proposed architecture executed the scalar multiplication for a word size of (4/8/16/32 - bit) using pseudo pipelined word-serial finite field multiplier.

Choi et al (2009) have proposed ECC processor over GF(2163) with altered Lopez-Dahab EC point multiplication algorithm and uses GNB for GF(2163) field arithmetic. Two new word level arithmetic units have been designed to achieve a high throughput rate using Xilinx XC4VLX80.

Mohammed Benaissa et al (2006) have proposed the adaptable ECC processor using novel word-level algorithm, which enables enhanced flexibility and performance of the ECC processors. Nazar Saqib et al (2004) have presented a generic parallel and reconfigurable architecture for fast EC scalar multiplication over GF(2191) using Montgomery multiplier.

Sakiyama et al (2007) have proposed a reconfigurable curvebased crypto processor that accelerates the scalar multiplication by exploiting Instruction-Level Parallelism (ILP) of ECC over GF(2163) and

Hyper Elliptic Curve Cryptography (HECC) over GF(283). This architecture has been implemented using the 0.13Î¼m Complementary Metal-Oxide-Semiconductor (CMOS) technology.

Kais et al (2012) have suggested a fast method to find the inversion in GF(2m) using FPGA by reducing the number of multiplication operations in the Fermat's Theorem and transferring the squaring into a fast method to find the exponentiation to 2k. The proposed algorithm, the multiplicative inverse in GF(2m), is achieved by number of multiplications depending on log2(m). The number of multiplications is in the range between log2(m) and 2log2(m)-2. If 'm' equals 163 then the number of multiplication operations is 9 and number of exponentiation operation each one with one clock cycle equals 10.

Jarvinen et al (2008) have proposed the parallelization of ECC hardware accelerators using ECs over binary field GF(2m). To realize the efficient ECC processor the latency of point multiplication is

reduced with parallel field using in arithmetic processors and this design has been implemented on an Altera Stratix II FPGA.

Lai et al (2010) have proposed the word-serial finite field arithmetic unit with optimized operation scheduling and bit-parallel modular reduction to perform the Montgomery scalar multiplication

algorithm for high-performance ECC architecture over binary field. Shu et al (2005) have presented the polynomial basis ECC processor over GF(2163) and GF(2233). LFSRs and the Most Significant Digit serial (MSD) multipliers have been used to speed up the scalar multiplications.

Gura et al (2002) have proposed the programmable hardware accelerator to speed up point multiplication for ECs over binary polynomial fields GF(2163). The accelerator is based on a scalable

architecture capable of handling curves of arbitrary field degrees up to m = 255.

Choudhury et al (1978) have presented a new algorithms based on CA operations for performing fast multiplication and inversion over GF(2m). The new design has highly parallel, modular and well-suited for Very Large-Scale Integration (VLSI) implementation.

Guitouni Zied et al (2008) have described the implementation of fast parallel ECC scalar multiplication architecture based on polynomial using PCA. EC point multiplication design termed in projective coordinate is developed and optimized using new designs of EC arithmetic's point addition and doubling operations.

Nandi et al (1994) have defined certain primary transformations based on CA. These fundamental functions are implemented with a class of PCA built around rules 51, 153, and 195 and the high quality pseudorandom pattern generators has been built round rules 90 and 150 with a rule selector to generate key in stream ciphers.

Petre Anghelescu et al (2008) have presented a hardware implementation in a FPGA circuit of an efficient encryption algorithm based upon a one-dimensional HAPCA .The encryption algorithm based

on the theory of CA has been ralized using simple digital circuits and memory, and it has been implemented on a XILINX XC3S400.

Philip Leong et al (2002) have presented a micro-coded ECC processor using FPGA technology. The control part of the processor is micro-coded, enabling curve operations to be incorporated into the

processor and hence reducing the chip's I/O requirements.

Kadir et al (2011) have conducted experiments on sidechannel attacks in the ECC hardware implementations using binary algorithms by observing power consumption of ECC processor on

FPGA. Experimentation of the side-channel attack is conducted to guess the secret key for data encryption and decryption by looking at the physical differences on hardware side effects. In this study, side-channel attack experimentation is successful in getting the key.

Sakamoto et al (2011) have presented a fault-based security evaluation for ECC implementation using the Montgomery Powering Ladder (MPL) and evaluated the security of the Lopez-Dahab algorithm

using fault sensitivity analysis.

Varchola et al (2011) have designed compact FPGA based architectures for standardized ECC over prime fields. The minimal FPGA resource consumption architecture has used block memories (to minimize the storage area instead of registers), a 16-bit data path and a single 16-bit hardware multiplier. The second processor design employs a 32-bit data path and several hardware multipliers for improved throughput. Both implementations are not fixed to a single curve and support point multiplications for both NIST curves P-256 and P-224.

Al-Somani et al (2011) have designed first ECC processor using Actel IGLOO AGLN250V2-VQFP100 on a Nano-FPGA. The Nano- FPGAs offer groundbreaking possibilities in power, size, leadtimes,

operating temperature and cost. The synthesis results show that the targeted Nano-FPGA can not exceed the values of m ô€‚” 11 bits. This is because of the limited number of resources available on Nano-

FPGAs, which opens a new challenging opportunity for future Nano- FPGAs to satisfy the needs of critical portable applications.

Sakthivel et al (2012) have proposed a technique to reduce the time complexity of point doubling during the scalar multiplication processing. The proposed technique schedules coding for fast processing

and reduces number of clock cycles during its atomic operations. It also reduces number of hazards and stalls during its processing.

Janagan et al (2012) have given the scope of computing the Montgomery ladder algorithm in ECC. The compactness is achieved by reducing data paths by using multipliers and carry-chain logic. Multiplier performs effectively in terms of area/time if the word size of multiplier is large. A solution for Simple Power Analysis (SPA) attack is also provided.

Xia et al (2012) have analyzed the Secure Electronic Transaction (SET) protocol's operation mode in electronic commerce. The typical design of SET protocol's ECC application scheme includes key generation, digital signature and envelop algorithms. It also presented a secure implementation model of SET transaction, which ensures the validity, confidentiality, integrity and non-repudiation of

transaction.

Wenger et al (2011) have presented a low-resource processor that supports ECC operations for less than 9 kGEs. This optimized 16- bit microcontroller provides high flexibility and scalability for various

applications. The design allows the use of an optimized RAM-macro block and reduces the complexity by sharing various resources of the controller and the datapath. The total size of the processor is 8,958 GEs

for a 0.13 ô€ˆm CMOS technology and needs 285 kilo-cycles for a point multiplication. It shows that the roposed solution is well suitable for low-power designs by providing a power consumption of only 3.2 ô€ˆW at 100 kHz.

Sakiyama et al (2007) have presented a high-speed public-key crypto-processor that exploits three-level parallelism in ECC over GF(2n). The proposed crypto-processor employs a Parallelized Modular

Arithmetic Logic Unit (P-MALU) that utilizes two types of different parallelism for accelerating modular operations. The sequence of scalar multiplications is also accelerated by exploiting ILP and processing

multiple P-MALU instructions in parallel. The synthesis results show that scalar multiplication of ECC over GF(2163) on a generic curve can be computed in 20 and 16 ô€ˆs respectively for the binary NAF and the Montgomery method. The performance can be accelerated furthermore on a Koblitz curve and reach scalar multiplication of 12 ô€ˆs with the TNAF (ô€„²-adic NAF) method. This fast performance allows to perform over 80,000 scalar multiplications per second and also to enhance the security in wireless mobile applications.

Tujillo-Olaya et al (2010) have presented efficient hardware architectures for ECC using polynomial and GNB. The scalar point multiplication is implemented using random curves over GF(2233) and

the Lopez-Dahab algoithm. In this case, the GF(2m) multiplication is implemented in hardware using three algorithms for polynomial basis and three for GNB. The crypto-processors based on polynomial basis with D=32 and GNB with D=30 use 76 ô€ˆs and 60 ô€ˆs for scalar multiplication and 26697 and 18567 LUTs, respectively. The compilation and synthesis results show that the GNB crypto-processor

presents a better performance than polynomial basis crypto-processor. However, the last one is less complex and more scalable from the design point of view.

Sameh M. Shohdy et al (2009) have proposed a modification in karatsuba-ofman algorithm, which is one of the best algorithms used to perform multiplication operation over GF. The modification

contrasted on truncating karatsuba-ofman algorithm in a low level and using the classic polynomial multiplication algorithm. In addition, the proposed architecture for implementing ECC on hardware using Montgomery algorithm is in projective coordinates. The results show that the proposed architecture is able to compute GF(2191) EC scalar multiplication operations in 72.939 ô€ˆs on Xilinx Virtex-II XC2V6000 FPGA device and 100.68 ô€ˆs on Xilinx VirtexE 2600. Also, the proposed

architecture can be changed to be suitable for any arbitrary GF size with little modifications.

Yong-ping et al (2009) have proposed a novel highperformance hardware architecture of processor for EC scalar multiplication based on the Lopez-Dahab algorithm over GF(2163) in polynomial basis representation. In the proposed architecture, multiplication, addition, and squaring can be performed in parallel by the decomposition of computation. The point addition and point doubling iteration operations can be performed in six multiplications by optimization and solution of data dependency. The implementation results based on Xilinx VirtexII XC2V6000 FPGA show that the proposed design can do random EC scalar multiplication GF(2163) in 34.11ô€ˆs, occupying 2821 registers and 13,376 LUTs.

Ciran et al (2006) have reported a hardware architecture to perform unified modular inversion and multiplication for ECC over GF(p) to improve the performance using Fermat's Little Theorem and

full-word multiplier with miniature critical path delay.

artin Feldhofer et al (2002) have proposed an efficient implementation in terms of self-timed and low-power approach in crypto arithmetic unit of ECC, which computes the modular operations addition, multiplication, and inversion in prime fields.

Byrne et al (2007) have proposed ECC architecture for prone the side channel attacks by making the use of special addition chains; it is possible to implement a SPA resistant cryptosystem. Mark Hamilton et al (2011) have proposed EC processor with comparison of different modular multipliers, when working with a

Mersenne prime modulus and it is used for fast modular reduction techniques.

Santosh Ghosh et al (2011) have proposed a programmable GF(p) arithmetic unit and a suitable counter measure against differential power analysis attack and doubling attack for ECC. The proposed scalar

multiplication hardware is implemented on the Xilinx Virtex-2 Pro FPGA platform. The proposed parallel architecture is inherently programmable, memory less, and resistant against timing and power

attacks. Vliegen et al (2010) have proposed an FPGA based application specific ECC processor over a 256-bit prime field. The FPGA's dedicated multipliers and carry-chain logic are used to obtain a small

datapath, without introducing the parallelism. The architecture can prevent SPA attacks and the instructions are stored in the FPGA's Block RAM.

Orlando et al (2001) have presented the ECC processor architecture for the computation of point multiplication for curves defined over the field of GF(p) using high-radix Montgomery multiplier.

Zhang Jiahong et al (2009) have reported a fast mechanism of accelerating the EC point operation formulae by adopting modified Jacobian coordinates.

Khalil-Hani et al (2009) presented a tightly-coupled hardware architectural enhancement to the Altera FPGA-based Nios II embedded processor for ECC. The hardware acceleration of the arithmetic

operation is provided by custom logic tightly coupled to the processor core and directly controlled by the instruction stream. Experimental results show that for the point multiplication operation with custom instructions and tightly-coupled hardware is faster than the coprocessor based hardware.

Lee et al (2006) have proposed an architecture to focus on the inversion modules on ECC coprocessor over GF (pm) using the FPGA. The design involved the three variants of Extended Euclidian Algorithm

and inversion using the iterative Frobenius map. Inversion using the iterative Frobenius map shows the best performance among various similar designs in term of speed and area. Lai et al (2011) have proposed a unified architecture capable of working both in parallel and serial for both prime field and binary field. To achieve the higher throughput and energy adaptive security the advanced field inversion method and scheduler-controlled data path have been integrated into the processor. A 160-bit dual processor has been designed using 130nm CMOS technology, the fabricated chip measures 4.97mm2 with the core area of 1.35mm2.

Chen et al (2010) have presented a microcode-based architecture with a novel reconfigurable datapath which can perform either prime field GF(p) operations or binary extension field GF(2m) operations for arbitrary prime numbers, irreducible polynomials, and precision. An algorithmic optimization or finement can be made at a higher level based on the reconfigurable datapath also proved that the

developed processor has full cryptography algorithm flexibility, high hardware utilization, and high performance.

Lai et al (2008) have presented a two-phase scheduling, highthroughput, parallel, and scalable ECC processor over the both GF(p) and GF(2m). This dual-field ECC architecture supports arbitrary

ECs and arbitrary finite fields with different field sizes. A 160-bit ECC processor has been implemented using 0.13Î¼m CMOS technology.

Lai et al (2009) have presented the high-throughput, parallel, scalability, cost effectiveness, and power consumption dual-field ECC processor chip. It enables all the ECC functions with the programmable

field and curve parameters over both the prime and binary fields. A 160- it processor has been designed using 0.13ô€ˆm CMOS technology with the core size of 1.44mm2.

Satoh et al (2003) have presented high scalability and flexibility between speed, hardware area, and operand size with an ECC processor architecture over the GF(p) and GF(2m) using a dual field

multiplier for arbitrary prime numbers and irreducible polynomials. A Montgomery multiplier with an optimized data bus and an on-the-fly redundant binary converter boost the throughput of the EC scalar

multiplication. The processor has been designed using a 0.13Î¼m CMOS standard cell library.

Lee et al (2012) have presented a new differential poweranalysis counter measure performing all field operations in a randomized Montgomery domain to eliminate the correlation between target and reference power traces. The 521-bit dual filed ECC processor is implemented in 90-nm CMOS and can perform one EC scalar multiplication in 8.08ms over the prime field GF(521) and 4.65ms over GF(2409), respectively, with 4.3 percentage area and 5.2 percentage power overhead.

Sakiyama et al (2006) have proposed the parallel processing crypto-processor for ECC over GF(p) and GF(2m) to speed up EC point multiplication. To increase the speed of modular processes the ILP and

multiple sets of modular arithmetic logic units have been used. Kazuyuki Tanimura et al (2008) have proposed a scalable unified dual-radix architecture for Montgomery multiplication in GF(p) and GF(2m). It also unifies 4 parallel radix-216 multipliers in GF(p) and a radix-264 multiplier in GF(2m) into a single unit.

Savas et al (2004) have coined two new hardware architectures for performing multiplication in GF(p) and GF(2m). The first architecture utilizes a pre-computation technique that reduces the critical

path delay at the expense of using extra logic which has a limited negative impact on the silicon area for operand precisions of cryptographic interest. The second architecture computes multiplication faster in GF(2m) than GF(p), which confirms with the premise of GF(2m) for hardware realizations.

Vijeyakumar et al (2011) have proposed Low- Power High- Speed Error Tolerant Shift and Add Multiplier, which enables the removal of input multiplexer, switching of adder cells and bypassing adder for zero bit values of the multiplier constant.

Ning Zhu et al (2010 and 2011) have proposed Low-Power High-Speed Truncation-Error-Tolerant Adder, which is able to ease the strict restriction on accuracy, and at the same time achieve tremendous

improvements in both the power consumption and speed performance.

Justin Hensley et al (2004) have presented an area and energy efficient asynchronous Booth multiplier for mobile devices. It also has novel counter flow organization to data bits flow in one direction, and

the Booth commands piggyback on the acknowledgments flowing in the opposite direction. The arithmetic and shifter units have been merged together to obtain significant improvement in area, energy as well as speed.

Sandeep Kumar et al (2006) have developed a optimum digit serial GF(2m) multipliers for curve-based cryptography with the Double Accumulator Multiplier (DAM) and N-Accumulator Multiplier (NAM).

Saravanan et al (2009 and 2010) have suggested a high efficiency HELP multiplier for image processing applications. In hybrid multiplier, the operation is performed that depends on the number of 1's

and its position in the multiplier data.

Mark Hamilton et al (2011) have presented the comparison of different modular multipliers such as serial multiplier, Booth multiplier etc suitable for use in an EC processor with Mersenne prime modulus.

The design has been made use of the DSP48E blocks on Virtex 5 FPGAs.

Li et al (2002) have presented a low-complexity PCA based versatile modular multiplier in GF(2m). The proposed versatile multiplier is flexible and easily extended to high order of 'm' for more

security, and low-cost serial implementation is feasible in the restricted computing environments, such as smart cards and wireless devices.

Yinan Kong (2010) has constructed a 12-bit modular multiplier for using in the channel of a Residue Number System (RNS). The modular multiplier is implemented on FPGA and optimized by

evaluating different versions of the improved Barrett algorithm. Rahaman et al (2007) have presented a C-testable technique for detecting transition faults with 100 percentage fault coverage in the

polynomial basis bit parallel multiplier circuits over GF(2m). These multipliers have found critical applications in PKC and needed secure internal testing. A Built-in Self-Test (BIST) circuit is proposed for

generating test patterns internally and also has three extra pins for the control inputs and provides public key security.

Hyejung Kim et al (2008) have proposed a low energy modulo-multiplier for ECC processor. The multiplier uses only two 40- bit multipliers to execute 160-bit operation based on the Montgomery

modulo-multiplication algorithm. One modulo-multiplication is executed with 20 clock cycles at 40MHz operating frequency and it has been implemented by using 0.18ô€ˆm CMOS process.

Morales Sandoval et al (2011) have presented a novel multipliers for Montgomery multiplication defined on binary fields GF(2m). Different to state of the art Montgomery multipliers, the proposed architecture uses a LFSR as the main building block, also shows architectural variations by selecting between bit-serial and digitserial Montgomery multipliers. The results show that the use of LFSRs simplifies the design of the multipliers architecture, reducing area resources and retaining high performance comparatively. The security of ECC lies on the hardness of the mathematical computation which is based on the DLP (Itoh and Tsujii 1988),

(Hankerson et al 2004). The discrete logarithm functions to the Abelian group formed by the points of an EC over a finite field. The IFP, the DLP, and the ECDLP are the three major computational problems

visage on the applications of PKC. Figure 3.1 provides a sample EC that is used to implement the cryptographic schemes. The elements of the group are the rational points on the EC, together with a special point O (called the "point at infinity").

Since ECC adapted to portable devices it has to occupy less area, low power and performed in high speed. One of the most time consuming operations in ECC is scalar multiplication (Schneier 1996),

(Diffie and Hellman 1976), an operation of the form k.P, where, 'k' is a positive integer and 'P' is a point on the EC. Scalar multiplication k.P can be calculated by adding the point P to itself k-1 times and in

addition the resulting point named as 'Q' should be on the EC. Another tedious operation on ECC is inverse operation i.e., to recover 'k' when the points 'P' and Q = k.P are given, is known as the ECDLP.

Definition 3.2.1: A group (William Stallings 2005) is a set, A, together with an operation 'â€¢' that combines any two elements 'a' and 'b' to form another element denoted a ô€¸ b . The symbol 'â€¢' is a general

placeholder for a concretely defined operation. To qualify as an Abelian group, the set and operation, (A,ô€¸) must satisfy five requirements known as the Abelian group axioms:

Closure

ô€€… 'a', 'b' in A, the result of the operation aô€¸bis also in A.

Associativity

ô€€… 'a',' b' and 'c' in A, the equation (aô€¸b )ô€¸c ô€€ aô€¸(bô€¸c )

holds. Identity element an element 'e' in A, such that ô€€… elements 'a' in A, the equation eô€¸aô€€ aô€¸eô€€ a holds.

Inverse element For each 'a' in A, an element 'b' in A such that aô€¸bô€€ bô€¸aô€€ e where, 'e' is the identity element.

Commutativity

ô€€… 'a', 'b' in 'A', aô€¸bô€€ bô€¸a.

More compactly, an Abelian group is a commutative group. A group in which the group operation is not commutative that is called a "non-abelian group" or "non-commutative group". There are some differences in exactly what axioms are used to define a ring. Here one set of axioms is given, and comments on variations follow.

Definition 3.2.2: A ring R (William Stallings 2005), sometimes denoted by{R,ô€€Ž,ô€µ}, is a set of elements with two binary operations called addition and multiplication. To qualify as a ring, the set and two

operations(R,ô€€Ž,ô€µ), must satisfy the following requirements known as the ring axioms.

(R,ô€€Ž)is required to be an Abelian group under addition:

ô€¸ Closure under ô€€… 'a', 'b' in R, the result of the operation aô€€Žb is addition also in R.

ô€¸ Associative addition of ô€€… 'a', 'b', 'c' in R, the equation (aô€€Žb)ô€€Žcô€€ aô€€Ž(bô€€Žc) holds

ô€¸ Existence of additive identity. an element '0' in R, such that ô€€… elements 'a' in R, the equation ô€’ ô€€Žaô€€ aô€€Ž ô€’ ô€€ a holds.

ô€¸ Existence of additive inverse. For each 'a' in R, an element 'b' in R such thataô€€Žbô€€ bô€€Žaô€€ 0.

ô€¸ Commutativity of addition. ô€€… 'a','b' in R the equation aô€€Žbô€€ bô€€Ža holds.

ô€¸ (R, Ã-) is required to be a monoid under multiplication:

ô€¸ Closure under multiplication. ô€€… 'a', 'b' in R, the result of the operation aô€µbis also in R.

ô€¸ Associativity of multiplication. ô€€… 'a', 'b' and 'c' in R, the equation (a ô€µb)ô€µc ô€€ a ô€µ(bô€µc) holds.

ô€¸ Existence of multiplicative identity. an element '1' in R, such that ô€€… elements 'a' in R, the equation

1ô€µaô€€ aô€µ1ô€€ a holds.

The distributive laws:

ô€¸ ô€€…'a', 'b' and 'c' in R, the equation

a ô€µ ô€€‹bô€€Žc ô€€Œô€€ ô€€‹a ô€µ b ô€€Œô€€Ž ô€€‹a ô€µcô€€Œ holds.

ô€¸ ô€€…'a', 'b' and 'c' in R, the equation

ô€€‹a ô€€Ž b ô€€Œ ô€µ c ô€€ ô€€‹a ô€µ c ô€€Œô€€Ž ô€€‹b ô€µ c ô€€Œ holds.

This definition assumes that a binary operation on R is function defined on RÃ-R with values in R. Therefore, for any 'a' and 'b' in R, the addition a + b and the product a Ã- b are

elements of R.

Definition 3.2.3: A field (William Stallings 2005) (F,ô€€Ž,ô€µ) is a set of numbers F together with two operations and that satisfies the following properties.

ô€¸ (F,ô€€Ž)is an Abelian group with identity 0.

ô€¸ (Ã-) is associative.

ô€¸ an identity 1ô€‚F with 1 ô€º 0 such that 1ô€µaô€€ aô€µ1ô€€ a

ô€€… aô€‚F.

ô€¸ the operation Ã- is distributive over +, i.e.,

aô€µ(bô€€Žc)ô€€ (aô€µb)ô€€Ž(aô€µc)and

(bô€€Žc)ô€µa ô€€ (bô€µa)ô€€Ž(cô€µa) ô€€… a,b,c ô€‚F.

ô€¸ aô€µbô€€ bô€µaô€€…a,bô€‚F.

ô€¸ For everyaô€º0,aô€‚F, an element aô€€1 ô€‚Fsuch that

aô€€1ô€µaô€€ aô€µaô€€1ô€€ 1.

Point addition is the addition of two points 'J' and 'K' on an EC to obtain another point 'L' on the same EC. It is considered as two points 'J' and 'K' on an EC as shown in Figure 3.2. If Kô€‚-J then a line drawn through the points 'J' and 'K' will intersect the EC at exactly one more point '-L'. The reflection of the point '-L' with respect to X-axis gives the point 'L', which is the result of addition of points 'J' and 'K'.

Thus on an EC L=J + K. If K=-J the line through this point intersect at a point at infinity 'O'. Hence J+(-J)=O. This is shown in Figure 3.3. 'O' is the additive identity of the EC group. A negative of a point is the

reflection of that point with respect to X-axis. Point doubling is the addition of a point 'J' on the EC to itself to obtain another point 'L' on the same EC. To double a point 'J' to get 'L', i.e. to find L=2J, consider a point 'J' on an EC as shown in Figure 3.4. If 'y' coordinate of the point 'J' is not zero then the tangent line at 'J' will intersect the EC at exactly one more point '-L'. The reflection of the point '-L' with respect to X-axis gives the point 'L', which is the result of doubling the point 'J'. Thus L=2J. If 'y' coordinate of the point 'J' is zero then the tangent at this point intersects at a point at infinity 'O'. Hence 2J=0 when yJ=0. This is shown in Figure 3.5. The polynomial basis IEEE 1363 standard specifications for PKC (2000) over the binary field GF(2m), the EC is defined as y2 ô€€Ž xy ô€€ x3 ô€€Ž ax2 ô€€Ž b (3.1) where, 'a','b'ô€‚GF(2m) and b ô€‚ 0. If 'P' is a point on EC and 'k' is a large integer, computation of the Q=kP, that is add 'P' by 'k-1' times,

where 'Q' is the point on the EC over GF(2m). The operation kP can be performed by iterative point double and point addition. The set of points on the EC along with a special point 'O', is called the point at infinity, which forms a group under addition. The identity element of the group is the point at infinity 'O'. The arithmetic operations permitted on the group are point addition and point doubling

which are described as follows.

Let 'P' and 'Q' be two points on the curve with coordinates (x1, y1) and (x2, y2). Also, let P ô€‚ Â±Q, then adding the two points results in a third point R=(P + Q). The addition is performed by drawing a line

through 'P' and 'Q' as shown in Figure 3.6. The point at which the line intersects the curve is ô€ƒ (P +Q). The inverse of this is R=(P + Q). Let the coordinates of R be (x3, y3), then the equations for x3 and y3 are

X ô€€ ô€€Ž ô€€Ž X ô€€Ž X ô€€Ž a 1 2

2

3 ô€ ô€ (3.2)

3 1 3 3 1 Y ô€€ ô€ (X ô€€Ž X ) ô€€Ž X ô€€Ž Y (3.3)

where , ( ) /( ) 1 2 1 2 ô€ ô€€ Y ô€€Ž Y X ô€€Ž X (3.4)

If P=ô€ƒQ, then P+(ô€ƒP) is O.

Let 'P' be a point on the curve with coordinates (x1, y1) and ô€€³ô€‚ô€ƒP. The double of 'P' is the point 2â€¢P=(x3, y3) obtained by drawing a tangent to the curve through 'P'. The inverse of the point at which the

tangent intersects the curve is the double of 'P' in the Figure 3.7. For point doubling the affine coordinates are:

if Pô€‚Q

X ô€€ ô€€Ž ô€€Ž X ô€€Ž X ô€€Ž a 1 2

2

3 ô€ ô€ (3.5)

41

1 3 3 1

2

3 Y ô€€ ô€ (X ô€€Ž X ) ô€€Ž X ô€€Ž Y (3.6)

( ) /( ) 2 1 2 1 ô€ ô€€ Y ô€€Ž Y X ô€€Ž X (3.7)

if P=Q

X ô€€ ô€2 ô€€Ž ô€ ô€€Ž a

3 (3.8)

3

2

3 1 Y ô€€ X ô€€Ž (ô€ ô€€Ž1)X (3.9)

1 1 1 ô€ ô€€ X ô€€Ž Y / X (3.10)

In order to avoid the time consuming inversion operation, point operations with affine coordinate (x, y) to be mapped into the projective coordinate (X,Y,Z), which can replace field inversion with a

sequence of multiplications. The general form of projective coordinate is

specified (Sandeep Kumar 2006).

In the standard projective coordinates, a point is represented by the tuple ô€€‹X ,Y ,Z ô€€Œ,Z ô€º0, which corresponds to the affine point ô€€‹X ô€€ Z,Y ô€€ Z ô€€Œ. The point at infinity 'O' is (0, 1, 0) and the negative of

ô€€‹X,Y,Zô€€Œ is ô€€‹X ,ô€€Y ,Z ô€€Œ.

In the Jacobian projective coordinates, a point is similarly represented by the tuple ô€€‹X ,Y ,Z ô€€Œ,Z ô€º0, which corresponds to the affine point ô€€‹X ô€€ Z 2 ,Y ô€€ Z 3 ô€€Œ. The point at infinity 'O' is (1, 1, 0) and the negative

of ô€€‹X ,Y ,Z ô€€Œ is ô€€‹X ,ô€€Y ,Z ô€€Œ.

The mapping to the projective coordinate is done as X ô€€ x,Y ô€€ y, Z ô€€ 1 (3.11)

After introducing the new coordinate Z, the equation (3.1) has become projective coordinate form as given below.

Y 2 ô€€Ž XYZ ô€€ X 3Z ô€€Ž aX 2Z 2 ô€€Ž bZ 4 (3.12)

Now the implementation of ECC processor is based on the above equation (3.12). After completion of the successive addition and multiplication operations, it is reverted back to affine coordinates as

follow.

x ô€€ X / Z, y ô€€ Y / Z 2 (3.13)

One module of the this thesis proposed ECC architecture over GF(2m) based on the Montgomery scalar multiplication with projective coordinate presented in IEEE 1363 standard specifications (2000).

Input = ô€€‹ 1 2 1 0 ô€€Œ K , K ,...K , K nô€€ nô€€

P(x, y)ô€‚GF(2m )

output: Qô€€‹x y ô€€Œô€€ kP 3 3 ,

set 2

2

4

1 1 2 X ô€ x, Z ô€1, X ô€ x ô€€Ž b, Z ô€ x

for i from n-2 down to 0 do

if (ki = 1) then

ô€€‹ ô€€Œ 1 1 X , Z ô€€ƒô€„¸ ô€€‹ ô€€Œ 1 1 2 2 M X , Z , X , Z add

ô€€‹ ô€€Œ 2 2 X , Z ô€€ƒô€„¸ ô€€‹ ô€€Œ 2 2 M X , Z double

else

ô€€‹ ô€€Œ 2 2 X , Z ô€€ƒô€„¸ ô€€‹ ô€€Œ 2 2 1 1 M X , Z , X , Z add

ô€€‹ ô€€Œ 1 1 X , Z ô€€ƒô€„¸ ô€€‹ ô€€Œ 1 1 M X , Z double

end if

end for

ô€€‹ ô€€Œ 1 1 2 2 Q Mxy X , Z , X , Z ô€

return Q

44

ô€€‹ 1 1 2 2 ô€€Œ M X , Z , X , Z add

## {

ô€€‹ ô€€Œ2

1 2 2 1 Z ô€ X ô€µ Z ô€€Ž X ô€µ Z

ô€€‹ ô€€Œ ô€€‹ ô€€Œ 1 2 2 1 X ô€ x ô€µ Z ô€€Ž X ô€µ Z ô€µ X ô€µ Z

return ô€€‹X , Z ô€€Œ

## }

ô€€‹ ô€€Œ 1 1 M X , Z double

## {

4

1

4

1 X ô€ X ô€€Ž bô€µ Z

2

1

2

1 Z ô€ X ô€µZ

return ô€€‹X , Z ô€€Œ

## }

In algorithm 4, shifting is used to perform the division operation.

ô€€‹ ô€€Œ 1 1 2 2 Mxy X ,Z , X ,Z

## {

1

1 Z

X ô€ X

ô€€‹ ô€€Œ Z x y

Z x X

X x y x X x Y ô€€Ž ô€‚» ô€‚¼

## ô€‚º

## ô€‚« ô€‚¬

## ô€‚ª ô€‚¸ ô€‚¹ ô€‚·

## ô€‚¨ ô€‚© ô€‚§

## ô€€Ž ô€µ ô€‚¸ ô€‚¹ ô€‚·

## ô€‚¨ ô€‚© ô€‚§

## ô€ ô€€Ž ô€µ ô€€Ž ô€€Ž ô€€Ž

2

2

1

2 1

return ô€€‹X , Z ô€€Œ

## }

The point addition is one of the most important arithmetic operations in Montgomery multiplication algorithm of key generation.

The Figure 3.8 shows the execution methodology of point addition. Inputs for point addition module are (X1, Z1), (X2, Z2) and x, output is (X3, Z3). The requirement of any multiplier in point addition module is to perform the multiplication operation. In Montgomery multiplication algorithm, another frequently

used arithmetic operation is point doubling. The Figure 3.9 depicts the algorithmic steps of point doubling. Inputs for point doubling module are (X1, Z1) and 'b', and the output is (X2, Z2). ECC architecture focuses on the Weierstrass equations of ECs over GF(p) characterized in the IEEE 1363 standard specifications for PKC(2000).

y2ô€€ x3ô€€Žô€„xô€€Žô€… (3.14)

Where, x, y ô€‚ GF(p) and 4ô€„3ô€€Ž27ô€…2ô€º0 in the GF(p). Each value of the 'ô€„®' and 'ô€ˆ•' gives a different EC. All

points (x, y) which satisfies the above equation plus a point at infinity lies on the EC. The public key is a point in the curve while the private key is a random number. The public key is generated by multiplying the private key with the generator point 'G' in the curve. The generator point 'G', the curve parameters 'ô€„®' and 'ô€ˆ•', together with few more constants constitutes the domain parameters of ECC. The fundamental operations on the ECC are point adding, doubling and scalar multiplication.

Consider two distinct points 'J' and 'K' such that ( , ) J J J ô€€ X Y

and ( , ) K K K ô€€ X Y

Let L ô€€ J ô€€Ž K where, ( , ) L L L ô€€ X Y , then

X S X X P L J K mod 2 ô€€ ô€€ ô€€ (3.15)

48

Y Y S X X P L J J L ô€€ ô€€ ô€€Ž ( ô€€ )mod (3.16)

( ) /( ) J K J K S ô€€ Y ô€€Y X ô€€ X , 'S' is the slope of the line through 'J'

and 'K'.

If K ô€€ ô€€J i.e. K X Y P J J ô€€ ( ,ô€€ )mod then J ô€€Ž K ô€€ O . where, 'O' is

the point at infinity.

If K ô€€ J then J ô€€Ž K ô€€ 2J then point doubling equations are

used. Also J ô€€Ž K ô€€ K ô€€Ž J .

Consider a point (Prabu M. and Shanmugalakshmi R 2009) 'J'

such that ( , ) J J J ô€€ X Y , where yJô€‚0.

Let L=2J where ( , ) L L L ô€€ X Y , Then

X S X P L J 2 mod 2

ô€€ ô€€ (3.17)

Y Y S X X P L J J L ô€€ ô€€ ô€€Ž ( ô€€ )mod (3.18)

S X a Y P J J ô€€ (3 ô€€Ž ) /(2 )mod , 'S' is the tangent at point 'J' and 'a' is one of the parameters chosen with the EC if yJ=0 then 2J=O, where 'O' is the point at infinity.

The scalar multiplication is done based on add-and-double method, which involved the modular inversion algorithm (IEEE 1363 standard specifications 2000). The squaring and multiplication

operations are performed in terms of point addition and doubling. These modular inversion operations are expensive using the affine coordinates. So the affine coordinate is converted into projective coordinate to avoid expensive modular inversion operation. The projective coordinate point

representation can reduce the modular inversion in each point addition and doubling. Moreover, at the end of the scalar multiplication, the modular inversion is used once when converting back from projective to affine coordinates. The sequence of operations involved in the scalar multiplication is as follows; first conversion from affine to projective coordinate, then performing the fundamental arithmetic operations, finally converting back from projective to affine coordinates. The formula for conversion from affine coordinates to projective coordinates (IEEE 1363 standard specifications 2000),

(Ciaran McIvor 2006) is presented as, X ô€ x,Y ô€ y, Z ô€ 1 (3.19)

Then the general procedure to perform the EC point addition using projective coordinates requires the following computation.

( , , ) ( , , ) ( , , ) 0 0 0 1 1 1 2 2 2 X Y Z ô€€Ž X Y Z ô€€ X Y Z (3.20) Finally, conversion from projective coordinates to affine coordinates is as given as follows.

x ô€€ X / Z 2 , y ô€€ Y / Z 3 (3.24)

As can be seen from (3.8) and (3.10), point addition and point doubling require sixteen and ten prime field multiplications, respectively.

Modular arithmetic (Rivest 1978) plays an important role in public key cryptographic systems. The modular inversion operations are the most essential operations in ECC. The complexity, speed and area of a co-processor are based on this modular inversion operation. In 1985, P. L. Montgomery has proposed the Montgomery inversion algorithm, which is being used in both in ECC and RSA.

The two-phase Montgomery inversion algorithm over GF(p) is presented in the algorithm 5. Given ô€„® ô€‚ GF(p), where, p is a prime number, The Phase I of the algorithm produces ô€„®-1 Ã- 2k (mod p), where,

m ô€‚” k ô€‚” 2m, and 'm' is the field size. Phase II performs the correction step.

Input: a ô€‚ô€€¾1,P ô€€1ô€€ and p

Output: r ô€‚ô€€¾1,P ô€€1ô€€, where r ô€€ aô€€12m mod P and m P 2 ô€€ log

Phase I

i. u ô€€ p, v ô€€ a, r ô€€ 0, s ô€€ 1, k ô€€ 0

52

ii. while v ô€€¡ 0 do

iii. if u is even then: u ô€€ u ô€€¡ô€€¡ 1, s ô€€ s ô€