# Implementation Of AES Algorithm On ARM Computer Science Essay


The AES encryption/decryption algorithm is widely used for security in modern consumer electronic products. To shorten the encryption/decryption time for large amounts of data, a hardware implementation of the algorithm is necessary; a low-cost requirement, however, can be met with a software-only implementation. How to balance the cost and efficiency of software and hardware implementations is a question worth discussing. In this paper, we implement the AES encryption algorithm in hardware combined with software, using the custom instruction mechanism provided by the ARM7 with the Keil platform. We explore various combinations of hardware and software to realize the AES algorithm and discuss the best candidate solutions for different needs.

## Categories and Subject Descriptors

D.3 [Programming Languages]: Miscellaneous;

B.7.1 [Integrated Circuits]: Design Styles-Advanced Technologies, Algorithm implemented in hardware

## General Terms

Algorithm, Embedded Systems and Applications

## Keywords

Cryptography, AES, custom instruction, ARM processor

## 1. INTRODUCTION

With the rapid development of the Internet and e-commerce, portable memory devices such as USB disks and SD cards are becoming increasingly popular, and how to prevent an encryption system from being broken has become an important issue. Because the Data Encryption Standard (DES) algorithm has a data block of only 64 bits and a key of only 56 bits, it can no longer meet today's system needs.

Hence, the National Institute of Standards and Technology (NIST) launched a campaign to solicit a new encryption algorithm. After a series of evaluations, the Rijndael algorithm of Joan Daemen and Vincent Rijmen became the new encryption standard and has replaced the existing symmetric encryption standard (DES) [4].

The AES algorithm is a round-based encryption/decryption algorithm built from four operations: AddRoundKey, ShiftRows, SubBytes and MixColumns. To shorten the encryption/decryption time for large amounts of data, a hardware implementation of the algorithm is necessary; a low-cost requirement, however, can be met with software only.

In recent years, a large body of literature has appeared on hardware/software implementation of the AES algorithm. The approaches can be divided into three types: (1) full software implementation, for low-cost devices that do not require high speed; (2) full hardware implementation, in which SubBytes is the computation that requires the most hardware; and (3) software/hardware co-design, which implements part of the algorithm in hardware and the rest in software so as to balance cost and efficiency. To mix the two kinds of computation, turning the hardware into custom instructions that support the software is a feasible method. How to balance the cost and efficiency of software and hardware implementations is a question worth discussing. In this paper, we implement the AES encryption algorithm in hardware combined with software, using the custom instruction mechanism provided by the ARM7 with the Keil platform. We explore various combinations of hardware and software to realize the AES algorithm and discuss the best candidate solutions for different needs. In the next section, we briefly review the AES algorithm.

## 2. AES ALGORITHM

The Advanced Encryption Standard is a symmetric block cipher. The data block size is fixed at 128 bits, while the key length can be 128, 192 or 256 bits. AES is a round-based algorithm. The number of rounds Nr is 10, 12 or 14 when the key length is 128, 192 or 256 bits, respectively. Each round of the AES algorithm performs three transformations: AddRoundKey, SubBytes and ShiftRows. Every round except the final one also performs MixColumns. The key used in each round, called the round key, is generated from the initial key by a separate key scheduling module.

## Figure 1. AES encryption/decryption flow

The 128-bit data block is divided into 16 bytes, which are represented by a 4×4 matrix of bytes. The entries are denoted by $S_{0,0}, S_{0,1}, S_{0,2}, S_{0,3}$, $S_{1,0}, S_{1,1}, S_{1,2}, S_{1,3}$, $S_{2,0}, S_{2,1}, S_{2,2}, S_{2,3}$, $S_{3,0}, S_{3,1}, S_{3,2}, S_{3,3}$. The matrix represents a state S. All four transformations map an input state to an output state. AddRoundKey involves only one bit-wise XOR operation between the state S and the round key. ShiftRows cyclically shifts the $k$th row of the state matrix $k$ bytes to the left, $k = 0, \ldots, 3$. The positions change to $S_{0,0}, S_{0,1}, S_{0,2}, S_{0,3}$, $S_{1,1}, S_{1,2}, S_{1,3}, S_{1,0}$, $S_{2,2}, S_{2,3}, S_{2,0}, S_{2,1}$, $S_{3,3}, S_{3,0}, S_{3,1}, S_{3,2}$. MixColumns treats each column of the state matrix as a polynomial over $GF(2^8)$ and multiplies it modulo $x^4 + 1$ by the polynomial $a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$.

SubBytes is a nonlinear transformation, which substitutes each byte of the state with its multiplicative inverse in $GF(2^8)$ and then performs an affine transformation. The irreducible polynomial $m(x) = x^8 + x^4 + x^3 + x + 1$ is used in the AES algorithm to construct $GF(2^8)$. The affine transformation consists of a bitwise matrix multiplication with a fixed 8×8 binary matrix followed by an XOR with $\{63\}_h$. The module performing the SubBytes transformation is called the SBOX.
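The three linear transformations described above can be sketched in C as follows. This is a minimal illustration rather than the implementation used in this paper; the function names and the `s[row][col]` layout are assumptions made for the example.

```c
#include <stdint.h>

/* State stored as s[row][col], matching the S(r,c) notation of the text. */

/* AddRoundKey: XOR every state byte with the matching round-key byte. */
static void add_round_key(uint8_t s[4][4], uint8_t rk[4][4])
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            s[r][c] ^= rk[r][c];
}

/* ShiftRows: rotate row k left by k bytes (row 0 is unchanged). */
static void shift_rows(uint8_t s[4][4])
{
    for (int r = 1; r < 4; r++) {
        uint8_t t[4];
        for (int c = 0; c < 4; c++)
            t[c] = s[r][(c + r) % 4];
        for (int c = 0; c < 4; c++)
            s[r][c] = t[c];
    }
}

/* Multiply by {02} in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* MixColumns: output byte r of a column is
   {02}a[r] ^ {03}a[r+1] ^ a[r+2] ^ a[r+3], indices taken modulo 4. */
static void mix_columns(uint8_t s[4][4])
{
    for (int c = 0; c < 4; c++) {
        uint8_t a[4];
        for (int r = 0; r < 4; r++)
            a[r] = s[r][c];
        for (int r = 0; r < 4; r++) {
            uint8_t y = a[(r + 1) % 4];
            s[r][c] = (uint8_t)(xtime(a[r]) ^ xtime(y) ^ y
                                ^ a[(r + 2) % 4] ^ a[(r + 3) % 4]);
        }
    }
}
```

Multiplication by {03} is written as `xtime(y) ^ y`, so the whole of MixColumns needs only shifts and XORs.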
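To make the two steps of SubBytes concrete, the sketch below computes a single S-Box entry directly from its definition instead of reading it from a table. It is an illustration only: the function names are assumptions, and the inverse is obtained by the slow but simple route of raising the input to the power 254.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse as a^254 (a^255 = 1 for a != 0); maps 0 to 0. */
static uint8_t gf256_inv(uint8_t a)
{
    uint8_t r = 1;
    for (int i = 0; i < 254; i++)
        r = gf256_mul(r, a);
    return r;
}

/* Rotate an 8-bit value left by n bits, 0 < n < 8. */
static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* SubBytes for one byte: inversion, then the affine transformation.
   The fixed binary matrix is circulant, so the matrix product reduces
   to XORing four rotations of the inverse with the inverse itself. */
static uint8_t sub_byte(uint8_t a)
{
    uint8_t x = gf256_inv(a);
    return (uint8_t)(x ^ rotl8(x, 1) ^ rotl8(x, 2)
                       ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63);
}
```

A table-based SBOX would simply store `sub_byte(i)` for all 256 values of `i`.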

## 3. ROUNDKEY GENERATION

There are two main approaches to generate the round key used in the AES process. Keys can be generated on-the-fly by a concurrently executing data path that computes the next round key during the time the actual data path completes computing the current AES round. The second alternative is to pre-compute all roundkeys and store them in a roundkey memory.

A critical point in the implementation of a cryptographic system is the "key setup time" which is defined as the amount of time required to start cryptographic operations after a new cipherkey has been provided. On-the-fly key generators can be designed in a way to completely eliminate any latency overhead when changing cipherkeys, at least for encryption. For decryption, the first roundkey that is required is the last roundkey that has been used for encryption. Since the key expansion uses recursion, there is no simple way to obtain the last roundkey directly from the cipherkey. This must be done by computing all roundkeys for the encryption. The last roundkey so obtained can be used as an initial vector for the inverse key schedule.
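The on-the-fly approach and the need to walk through the whole schedule before decryption can be sketched for the 128-bit case as follows. This is a software illustration under assumptions: the names `next_round_key` and `last_round_key` are hypothetical, only AES-128 is covered, and the S-Box is computed rather than stored to keep the example short.

```c
#include <stdint.h>
#include <string.h>

static uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* S-Box entry from its definition: inverse (a^254), then affine map. */
static uint8_t sub_byte(uint8_t a)
{
    uint8_t x = 1, y;
    for (int i = 0; i < 254; i++) {          /* x = a^254 */
        uint8_t p = 0, u = x, v = a;
        for (; v; v >>= 1, u = xtime(u))
            if (v & 1)
                p ^= u;
        x = p;
    }
    y = x;
    for (int i = 0; i < 4; i++) {            /* XOR in four left rotations */
        x = (uint8_t)((x << 1) | (x >> 7));
        y ^= x;
    }
    return y ^ 0x63;
}

/* Replace rk (16 bytes, four consecutive 32-bit words) by the next
   round key and advance the round constant. Only the current round
   key has to be stored. */
static void next_round_key(uint8_t rk[16], uint8_t *rcon)
{
    rk[0] ^= sub_byte(rk[13]) ^ *rcon;       /* SubWord(RotWord(w3)) ^ Rcon */
    rk[1] ^= sub_byte(rk[14]);
    rk[2] ^= sub_byte(rk[15]);
    rk[3] ^= sub_byte(rk[12]);
    for (int i = 4; i < 16; i++)
        rk[i] ^= rk[i - 4];                  /* each word XORs the previous one */
    *rcon = xtime(*rcon);
}

/* The last round key, needed first for decryption, is only reachable
   by stepping through all ten rounds of the encryption key schedule. */
static void last_round_key(const uint8_t key[16], uint8_t out[16])
{
    uint8_t rcon = 0x01;
    memcpy(out, key, 16);
    for (int round = 0; round < 10; round++)
        next_round_key(out, &rcon);
}
```

Calling `next_round_key` once per round while the data path processes the current round corresponds to the on-the-fly scheme; `last_round_key` shows the extra pass that decryption requires after a key change.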

The AES-128 mode requires 10 roundkeys of 128 bits each in addition to the cipherkey. An on-the-fly key generator of a flexible AES implementation that supports both encryption and decryption for all standard key lengths needs to be able to store 256 bits of cipherkey, 128 bits for the current roundkey, and finally 128 bits for the last roundkey. This is more than one quarter of the total amount of storage that is needed for all roundkeys. Consequently, pre-computing all roundkeys is not always a bad decision.
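The storage comparison can be checked with a short calculation. It assumes that the total is taken for the 256-bit key length, where Nr + 1 = 15 roundkeys of 128 bits are needed; the text does not state which key length the comparison refers to.

$$256 + 128 + 128 = 512 \text{ bits}, \qquad 15 \times 128 = 1920 \text{ bits}, \qquad \frac{512}{1920} \approx 0.27 > \frac{1}{4}$$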

## 4. OPTIMISATION FOR ARM PROCESSORS

ARM is the leading provider of 32-bit embedded RISC microprocessors with almost 75% of the market. ARM offers a wide range of processor cores based on a common architecture [5] [3], delivering high performance together with low power consumption and system cost.

ARM processors implement a load/store architecture. Depending on the processor mode, 15 general-purpose registers are visible at a time. Almost all ARM instructions can be executed conditionally on the value of the ALU status flags. Load and store instructions can load or store a 32-bit word or an 8-bit unsigned byte from memory to a register or from a register to memory.

The ARM arithmetic logic unit has a 32-bit barrel shifter that is capable of shift and rotate operations. The second operand to all ARM data-processing and single register data-transfer instructions can be shifted before data processing or data transfer is executed, as part of the instruction. The amount by which the register should be shifted may be contained in an immediate field in the instruction, or in the bottom byte of another register. When the shift amount is specified in the instruction, it may take any value from 0 to 31, without incurring any penalty in the instruction cycle time.

At first sight, the key expansion defined for AES does not look hardware intensive. After all, only four SubBytes operations are required per AES round. However, the additional flexibility required to support all three key lengths results in a very cumbersome and slow implementation. For faster implementations with large parallel data paths, the critical path through the key generator is usually longer than the actual data path. For small implementations that use a data path of 32 bits or less, more area is required to implement a key generator than the actual data path.
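One way the barrel shifter can help AES is sketched below: a whole state column is packed into one 32-bit word and MixColumns is computed with rotations and XORs only. This is an illustrative formulation, not necessarily the one used in this work; the packing (byte 0 of the column in the least significant byte) and the function names are assumptions.

```c
#include <stdint.h>

/* Rotate right by n bits, 0 < n < 32. With a constant n, an ARM compiler
   can typically fold the rotation into the shifted second operand of the
   XOR that consumes it, so it costs no extra instruction. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Multiply all four packed bytes by {02} in GF(2^8) at once. */
static uint32_t xtime4(uint32_t w)
{
    return ((w & 0x7f7f7f7fu) << 1) ^ (((w >> 7) & 0x01010101u) * 0x1bu);
}

/* MixColumns on one packed column w = b0 | b1 << 8 | b2 << 16 | b3 << 24.
   Uses {02}b[i] ^ {03}b[i+1] = {02}(b[i] ^ b[i+1]) ^ b[i+1]. */
static uint32_t mix_column_word(uint32_t w)
{
    uint32_t r1 = ror32(w, 8);               /* brings b[i+1] to position i */
    return xtime4(w ^ r1) ^ r1 ^ ror32(w, 16) ^ ror32(w, 24);
}
```

With this packing, the column (db, 13, 53, 45) is the word `0x455313db`, and `mix_column_word` returns `0xbca14d8e`, that is, the column (8e, 4d, a1, bc).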

## 5. HARDWARE AND SOFTWARE IMPLEMENTATIONS

We first describe the main considerations in the hardware implementation and then those in the software implementation. The AddRoundKey operation involves only one bit-wise XOR operation. The MixColumns operation can also be implemented with XOR gates only [7]. The ShiftRows operation can be realized by wiring. There are two approaches to designing S-Box circuits: (1) table lookup and (2) combinational circuit. The former uses ROM or RAM to store the table. In the latter design, the inversion in $GF(2^8)$ is the most complicated operation. To reduce the hardware complexity, composite field arithmetic is exploited [11], by which the original inversion in $GF(2^8)$ is mapped to operations in the composite field $GF((2^4)^2)$. Basically, an element $a \in GF(2^8)$ is represented as a linear polynomial $a_h x + a_l$ with coefficients in $GF(2^4)$. Let the eight bits of $a \in GF(2^8)$ be $\{a_7, \ldots, a_0\}$ and the four bits of $a_h$ ($a_l$) be $\{a_{h3}, \ldots, a_{h0}\}$ ($\{a_{l3}, \ldots, a_{l0}\}$). Then the mapping can be computed as follows [3]:

$$a_{l0} = a_C \oplus a_0 \oplus a_5, \quad a_{l1} = a_1 \oplus a_2, \quad a_{l2} = a_A, \quad a_{l3} = a_2 \oplus a_4,$$

$$a_{h0} = a_C \oplus a_5, \quad a_{h1} = a_A \oplus a_C, \quad a_{h2} = a_B \oplus a_2 \oplus a_3, \quad a_{h3} = a_B,$$

where $a_A = a_1 \oplus a_7$, $a_B = a_5 \oplus a_7$ and $a_C = a_4 \oplus a_6$.
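The mapping equations translate directly into XORs on single bits. The sketch below is a software model of that logic; the function name and the packing of the result as `(ah << 4) | al` are assumptions made for the example.

```c
#include <stdint.h>

/* Extract bit i of a. */
#define BIT(a, i) (((unsigned)(a) >> (i)) & 1u)

/* Map a in GF(2^8) to the pair (ah, al) in GF(2^4), packed as
   (ah << 4) | al, using the bit equations given in the text. */
static uint8_t map_to_composite(uint8_t a)
{
    unsigned aA = BIT(a, 1) ^ BIT(a, 7);
    unsigned aB = BIT(a, 5) ^ BIT(a, 7);
    unsigned aC = BIT(a, 4) ^ BIT(a, 6);

    unsigned al0 = aC ^ BIT(a, 0) ^ BIT(a, 5);
    unsigned al1 = BIT(a, 1) ^ BIT(a, 2);
    unsigned al2 = aA;
    unsigned al3 = BIT(a, 2) ^ BIT(a, 4);

    unsigned ah0 = aC ^ BIT(a, 5);
    unsigned ah1 = aA ^ aC;
    unsigned ah2 = aB ^ BIT(a, 2) ^ BIT(a, 3);
    unsigned ah3 = aB;

    unsigned al = al0 | (al1 << 1) | (al2 << 2) | (al3 << 3);
    unsigned ah = ah0 | (ah1 << 1) | (ah2 << 2) | (ah3 << 3);
    return (uint8_t)((ah << 4) | al);
}
```

Because every output bit is an XOR of input bits, the map is linear over GF(2). It is also one-to-one, which is what allows the result of the composite-field inversion to be mapped back afterwards.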

The inversion of $a_h x + a_l$ requires modular reduction to guarantee that the result is also a two-term polynomial. The irreducible polynomial $n(x) = x^2 + \{1\}x + \{e\}$ is used. Let '$\times$' denote multiplication. The inversion can be derived as follows:

$$(a_h x + a_l)^{-1} = (a_h \times d)\,x + (a_h \oplus a_l) \times d,$$

where $d = \left((a_h^2 \times \{e\}) \oplus (a_h \times a_l) \oplus a_l^2\right)^{-1}$.
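The inversion formula can be modelled in software as shown below. The sketch makes one assumption that the text does not state: $GF(2^4)$ is constructed with the polynomial $x^4 + x + 1$. All function names are illustrative, and `composite_mul` is included only so that the inverse can be checked by multiplying back.

```c
#include <stdint.h>

/* Multiply in GF(2^4); the modulus x^4 + x + 1 is an assumption. */
static uint8_t gf16_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 4; i++) {
        if (b & 1)
            p ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x10)
            a ^= 0x13;                       /* reduce by x^4 + x + 1 */
    }
    return p & 0x0f;
}

/* Inverse in GF(2^4) as a^14 (a^15 = 1 for a != 0); maps 0 to 0. */
static uint8_t gf16_inv(uint8_t a)
{
    uint8_t a2 = gf16_mul(a, a);
    uint8_t a4 = gf16_mul(a2, a2);
    uint8_t a8 = gf16_mul(a4, a4);
    return gf16_mul(gf16_mul(a8, a4), a2);   /* a^(8+4+2) */
}

/* Invert v = (ah << 4) | al, read as ah*x + al modulo x^2 + x + {e}. */
static uint8_t composite_inv(uint8_t v)
{
    uint8_t ah = v >> 4, al = v & 0x0f;
    uint8_t d = gf16_inv((uint8_t)(gf16_mul(gf16_mul(ah, ah), 0x0e)
                                   ^ gf16_mul(ah, al)
                                   ^ gf16_mul(al, al)));
    uint8_t rh = gf16_mul(ah, d);
    uint8_t rl = gf16_mul((uint8_t)(ah ^ al), d);
    return (uint8_t)((rh << 4) | rl);
}

/* Multiply two composite-field elements; used only to check the inverse. */
static uint8_t composite_mul(uint8_t u, uint8_t v)
{
    uint8_t uh = u >> 4, ul = u & 0x0f, vh = v >> 4, vl = v & 0x0f;
    uint8_t hh = gf16_mul(uh, vh);
    /* x^2 = x + {e}, so the hh term feeds both coefficients */
    uint8_t rh = (uint8_t)(hh ^ gf16_mul(uh, vl) ^ gf16_mul(ul, vh));
    uint8_t rl = (uint8_t)(gf16_mul(hh, 0x0e) ^ gf16_mul(ul, vl));
    return (uint8_t)((rh << 4) | rl);
}
```

The only inversion left is the one in $GF(2^4)$, which is far cheaper in gates than a direct inversion in $GF(2^8)$.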

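Those operations can be reduced to bit-wise logical AND and XOR functions. In this work, the squaring in $GF(2^4)$ and the multiplication by $\{e\}$ are merged into the following logic, which maps the input bits $\{a_3, \ldots, a_0\}$ to the output bits $\{q_3, \ldots, q_0\}$:

$$q_0 = a_1 \oplus a_2, \quad q_1 = a_0, \quad q_2 = a_B \oplus a_3, \quad q_3 = a_B,$$

where $a_B = a_0 \oplus a_1$.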
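The merged logic can be compared against the two separate operations in a few lines of C. As before, the construction of $GF(2^4)$ with $x^4 + x + 1$ is an assumption that the text does not state, and the function names are illustrative.

```c
#include <stdint.h>

/* Multiply in GF(2^4); the modulus x^4 + x + 1 is an assumption. */
static uint8_t gf16_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 4; i++) {
        if (b & 1)
            p ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x10)
            a ^= 0x13;                       /* reduce by x^4 + x + 1 */
    }
    return p & 0x0f;
}

/* Squaring followed by multiplication with {e}, merged into the
   three-XOR bit equations given in the text. */
static uint8_t square_scale_e(uint8_t a)
{
    unsigned a0 = a & 1u, a1 = (a >> 1) & 1u;
    unsigned a2 = (a >> 2) & 1u, a3 = (a >> 3) & 1u;
    unsigned aB = a0 ^ a1;
    unsigned q0 = a1 ^ a2, q1 = a0, q2 = aB ^ a3, q3 = aB;
    return (uint8_t)(q0 | (q1 << 1) | (q2 << 2) | (q3 << 3));
}
```

Under the stated assumption, `square_scale_e(a)` equals `gf16_mul(gf16_mul(a, a), 0x0e)` for all sixteen inputs, so the term $a_h^2 \times \{e\}$ costs three XOR gates instead of a squarer followed by a constant multiplier.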

## Figure 2. Hardware implementation of (a) SBOX and (b) AES algorithm

In the software part, AddRoundKey and SubBytes operate on individual bytes, so it does not matter how the data is arranged in memory. However, since ShiftRows manipulates data within one row while MixColumns works within one column, the two operations cannot both read their 4 bytes in a single access. Since MixColumns involves more algorithmic actions, the original state matrix is transposed to simplify the MixColumns operation in paper [8]. However, this requires modifying the key generation procedure. The approach in papers [9,10] combines SubBytes and MixColumns into an extended SBox table. The extended SBox tables are 256-word tables of 32-bit entries, generated by concatenating the four values $S_{i,j} \times \{03\}$, $S_{i,j} \times \{02\}$, $S_{i,j} \times \{01\}$ and $S_{i,j} \times \{01\}$ of each SBox table output $S_{i,j}$.
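The extended-table idea of [9,10] can be sketched as follows. The byte order inside each 32-bit word ({02}S in the most significant byte, then S, S, {03}S) is one common choice and an assumption here, as are the function names; a single table is used and the other three are derived from it by rotation.

```c
#include <stdint.h>

static uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* S-Box entry from its definition: inverse (a^254), then affine map. */
static uint8_t sub_byte(uint8_t a)
{
    uint8_t x = 1, y;
    for (int i = 0; i < 254; i++) {          /* x = a^254 */
        uint8_t p = 0, u = x, v = a;
        for (; v; v >>= 1, u = xtime(u))
            if (v & 1)
                p ^= u;
        x = p;
    }
    y = x;
    for (int i = 0; i < 4; i++) {            /* XOR in four left rotations */
        x = (uint8_t)((x << 1) | (x >> 7));
        y ^= x;
    }
    return y ^ 0x63;
}

static uint32_t t0[256];

/* Build the extended table: each word packs {02}S, S, S, {03}S. */
static void init_t0(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t s = sub_byte((uint8_t)i);
        uint8_t s2 = xtime(s);
        uint8_t s3 = (uint8_t)(s2 ^ s);
        t0[i] = ((uint32_t)s2 << 24) | ((uint32_t)s << 16)
              | ((uint32_t)s << 8) | s3;
    }
}

/* Rotate right by n bits, 0 < n < 32. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* SubBytes and MixColumns for one column (b0 on top) in four lookups.
   The result holds the top output byte in its most significant byte. */
static uint32_t sub_mix_column(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return t0[b0] ^ ror32(t0[b1], 8) ^ ror32(t0[b2], 16) ^ ror32(t0[b3], 24);
}
```

The table costs 1 KB of memory per copy, which is the price paid for replacing the $GF(2^8)$ multiplications of MixColumns by lookups and XORs.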

In this paper, SubBytes and MixColumns are executed separately.

## 6. CUSTOM INSTRUCTIONS

In this paper, we use an ARM7 processor to implement the AES algorithm using custom hardware instructions. The advantages of custom instructions include a shorter instruction sequence and acceleration by hardware.

With the ARM processor development kit, we can convert a hardware circuit into a custom instruction and add it to the instruction set of the CPU. Depending on the amount of data and the execution time, four types of custom instructions are supported: combinatorial, multi-cycle, extended and register file. In this work, we implement the AES algorithm using custom instructions on the ARM processor with the Keil compiler software, in various combinations of hardware and software.

## 7. EXPERIMENTAL RESULTS

We explored design space with a parameterized synthesizable design. Relevant programmable parameters include:

SW, TSBOX or GSBOX: A user can choose a software table (SW), a pre-stored hardware table (TSBOX), or a transformation generated by combinational logic (GSBOX) to implement the SBOX. The last option is realized by composite field arithmetic as described in Section 5.

Number of SBOX: If using TSBOX or GSBOX, a user can choose how many SBOX units to implement: 1, 4, 8 or 16.

MixColumn: A user can choose whether to implement it using hardware.

ShiftRow+AddRoundkey: A user can choose whether to implement it using hardware.
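The four parameters listed above could be captured in a small configuration record such as the one below. All type, field and function names are hypothetical; the text describes the options but not how they are represented in the design.

```c
#include <stdbool.h>

/* Hypothetical names throughout; only the option values come from the text. */
enum sbox_impl {
    SBOX_SW,      /* software look-up table */
    SBOX_TSBOX,   /* pre-stored hardware table */
    SBOX_GSBOX    /* combinational logic using composite field arithmetic */
};

struct aes_design_config {
    enum sbox_impl sbox;   /* how SubBytes is implemented */
    int  num_sbox;         /* 1, 4, 8 or 16; used for TSBOX and GSBOX only */
    bool hw_mix_column;    /* MixColumn in hardware? */
    bool hw_shift_add;     /* ShiftRow+AddRoundkey in hardware? */
};

/* Check that a configuration is one of the combinations described. */
static bool config_is_valid(const struct aes_design_config *c)
{
    if (c->sbox == SBOX_SW)
        return true;       /* num_sbox is ignored for the software table */
    return c->num_sbox == 1 || c->num_sbox == 4
        || c->num_sbox == 8 || c->num_sbox == 16;
}
```

Enumerating all valid records gives the design space that the experiments explore.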

The custom instructions are implemented with the ARM7 processor development kit. In the experiment, the ARM7 processor hardware is used in combination with software written in embedded C and developed with the Keil software. The time is measured for running 32 packets of data, each of 128 bits. The round keys are pre-calculated using the same implementation (either look-up table or combinational logic) as used in the data path. After the round keys have been prepared, the 32 packets are encrypted sequentially.

## Figure 3. AES algorithm source code

The source code is developed in embedded C. Fig. 3 shows an example of the source code of the AES encryption and decryption algorithm.

After the source code has been developed, the program is burned into the ARM processor using the flash burner Philips Flash Utility V2.2.3.
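The measured workload, pre-computing the round keys and then encrypting the packets one after another, can be sketched in portable C as follows. This is a pure-software reference for the 128-bit key length only, written for this illustration; it is not the source code of Fig. 3, and all names are assumptions.

```c
#include <stdint.h>
#include <string.h>

enum { NR = 10 };                            /* rounds for a 128-bit key */

static uint8_t sbox[256];

static uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* Build the look-up table once: inverse (a^254), then the affine map. */
static void init_sbox(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t x = 1, y;
        for (int k = 0; k < 254; k++) {
            uint8_t p = 0, u = x, v = (uint8_t)i;
            for (; v; v >>= 1, u = xtime(u))
                if (v & 1)
                    p ^= u;
            x = p;
        }
        y = x;
        for (int k = 0; k < 4; k++) {
            x = (uint8_t)((x << 1) | (x >> 7));
            y ^= x;
        }
        sbox[i] = y ^ 0x63;
    }
}

/* Pre-compute all 11 round keys (176 bytes) from a 128-bit cipher key. */
static void expand_key(const uint8_t key[16], uint8_t rk[176])
{
    uint8_t rcon = 0x01;
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4];
        memcpy(t, rk + i - 4, 4);
        if (i % 16 == 0) {                   /* first word of each round key */
            uint8_t t0 = t[0];
            t[0] = sbox[t[1]] ^ rcon;
            t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];
            t[3] = sbox[t0];
            rcon = xtime(rcon);
        }
        for (int k = 0; k < 4; k++)
            rk[i + k] = rk[i + k - 16] ^ t[k];
    }
}

/* Encrypt one block in place; byte i of the block is S(i % 4, i / 4). */
static void encrypt_block(uint8_t s[16], const uint8_t rk[176])
{
    for (int i = 0; i < 16; i++)
        s[i] ^= rk[i];                       /* initial AddRoundKey */
    for (int round = 1; round <= NR; round++) {
        uint8_t t[16];
        /* SubBytes and ShiftRows: row r takes its byte from column c + r */
        for (int c = 0; c < 4; c++)
            for (int r = 0; r < 4; r++)
                t[4 * c + r] = sbox[s[4 * ((c + r) % 4) + r]];
        for (int c = 0; c < 4 && round < NR; c++) {   /* MixColumns */
            uint8_t *a = t + 4 * c;
            uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
            a[0] = xtime(a0 ^ a1) ^ a1 ^ a2 ^ a3;
            a[1] = xtime(a1 ^ a2) ^ a2 ^ a3 ^ a0;
            a[2] = xtime(a2 ^ a3) ^ a3 ^ a0 ^ a1;
            a[3] = xtime(a3 ^ a0) ^ a0 ^ a1 ^ a2;
        }
        for (int i = 0; i < 16; i++)
            s[i] = t[i] ^ rk[16 * round + i];         /* AddRoundKey */
    }
}

/* Round keys are prepared once; the packets are then encrypted in turn. */
static void encrypt_packets(uint8_t packets[][16], int n, const uint8_t key[16])
{
    uint8_t rk[176];
    init_sbox();
    expand_key(key, rk);
    for (int p = 0; p < n; p++)
        encrypt_block(packets[p], rk);
}
```

Calling `encrypt_packets(packets, 32, key)` reproduces the 32-packet workload. On a hosted build the call can be bracketed with `clock()` from `<time.h>` for a rough timing; on the target, a hardware timer would be read instead, which is not shown here.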

## Figure 4. Burning process

Fig. 4 shows an example of burning (dumping) the program into the ARM processor.

When the burning process is completed, the data is transmitted from HyperTerminal. Figs. 5(A) and 5(B) show the HyperTerminal link used to transmit the data to the processor.

## Figure 5(A). HyperTerminal

## Figure 5(B). LCD display data

When the data is received, the round keys are generated. The number of rounds depends on the key length: Nr is 10, 12 or 14 when the key length is 128, 192 or 256 bits, respectively.

Fig. 6 shows an example of the AES key generation.

## Figure 6. AES key generation

When the key generation is completed, the first stage of the AES algorithm, encryption, starts. Here the original data is transmitted in encrypted form. Fig. 7 shows an example of the encrypted form.

## Figure 7. Encryption process

After encryption is complete, the encrypted data is decrypted, and finally the original message is received. Fig. 8 shows an example of the received original data. By using this process, we can provide high security for transmitted data.

## Figure 8. Received original message window

Earlier, this algorithm was implemented on the Altera Nios II platform with a parameterized synthesizable design. Using an ARM processor instead can provide more security and accuracy for data transmission.

## 8. ACKNOWLEDGMENTS

The authors would like to thank Dr. Syed S Basha for the invaluable support during the development of this work.

## 9. CONCLUSION

The AES encryption/decryption algorithm is widely used for security in modern consumer electronic products. In this paper, we have implemented the AES encryption and decryption algorithm in hardware combined with software, using the custom instruction mechanism provided by the ARM7. Using embedded C on the Keil platform, we explored various combinations of hardware and software to realize the AES algorithm and discussed the best candidate solutions for different needs.