This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
The S-DES algorithm is a simplified version of the DES cryptographic algorithm. Unlike the DES algorithm which uses 64-bit block of plaintext and a 56-bit key as input, the educational purpose algorithm developed by Professor Edward Schaefer makes use of 8-bit block of plaintext and a 10 bit key. Being a private key cryptographic technique, the algorithm is known to everybody but the key is kept secret. The same algorithm can be used for decryption and encryption.
Basically, encryption is the process whereby the original plaintext is converted into ciphertext which appears to be complete gibberish in nature whereas decryption being the reverse process of encryption, converts the ciphertext back to its original form.
Before encryption or decryption process can take place two sub keys are generated from the 10-bit key. These two sub keys are later used in the stages of encryption and decryption. Firstly, the 10-bit key (k1, k2, k3, k4, k5, k6, k7, k8, k9, k10) is permuted according to the following permutation rule:
P10(k1,k2,k3,k4,k5,k6,k7,k8,k9,k10) = (k3,k5,k2,k7,k4,k10,k1,k9,k8,k6)
Then a circular left shift is performed onto the first five and second five bits of the result obtained from the permutation.
Key after permutation = 1 0 1 1 0 1 0 1 0 0
Key after left shift = 0 1 1 0 1 0 1 0 0 1
Next the function P8= (k6, k3, k7, k4, k8, k5, k10, k9) takes 8 bits from the 10-bit result and performs a permutation. This yields to the first sub key denoted as K1.To obtain the second sub key (K2), a circular left shift of two bit positions is performed on the first five and second five bits result of the previous circular shift.
The encryption algorithm involves mainly five steps:
1. An initial permutation (IP)
2. A complex function labeled fk1
3. A simple permutation function (SW)
4. A complex function labeled fk2
5. Inverse of initial permutation
Given a string of bits, the algorithm takes 8 bit block of plain text and applies permutation according to the following rule IP (b2, b6, b3, b1, b4, b8, b5, b7).
The function fk consists of a combination of permutation and substitution. In general the functions that enclose the complex function can be expressed as
Whereby L, R and SK represent the left four bits, right four bits and subkey respectively.
Function F (R, SK)
The function F (R, SK) takes the right four last bits of the formatted plain text and applies expansion/permutation denoted as E/P.
Using the above result, an exclusively ored operation is performed with the first subkey (K1).Then the first four bits and the second four bits are fed to S-box S0 and S-box S1 respectively.
The first and the fourth bits determine the row of the S-box whereas the second and third bits determine the column of the S-box. After converting each pair of bits in decimal format, the entry in each S-box is identified. Since the entries of the S-boxes are in decimal each value obtained is converted into binary. A permutation P4 is applied to the above result.
The outcome of the P4 permutation is exclusively ored with the left four bits acquired from the initial permutation. The above data and the left four bits from the initial permutation are brought together to form an 8-bit block.
The switch function swaps the first and second four bits of the previous result.This interchanging of bits is done so that on the second occurrence of fk , it will operate on different bits.The E/P,S0,S1 and P4 functions remain unchanged only the subkey changes.
Inverse initial permutation
An inverse permutation IP-1 is applied to the above result. This yield finally to the ciphertext.
The decryption algorithm follows the same procedures as that explained for the encryption one but the only difference is that on the first occurrence of function fk, the first subkey being used is K2 instead of K1.
Genetic algorithm, commonly named as GA, has been introduced in the United States in the 1970s by John Holland at the University of Michigan. GA is a subfield of evolutionary algorithms and is mostly used to solve combinatorial optimization problems. Combinatorial optimization is the process of finding the best solutions given limited resources. Therefore GA is a search algorithm that needs little information to reach optimal solutions unlike other traditional search methods . GA uses the concepts based on Darwins theory of natural selection to determine solutions to optimization problems.GA is different from other non deterministic search methods as it operates on a population of solutions rather than on a single solution.The set of solutions that are usually binary strings,represents the chromosomes.GA consists of mathematical genetic operators such as selection, crossover and mutation that are essential for generation evolution and a fitness function which defines how good is each individual in the population.
A fitness function, also known as the cost function is a problem specific user defined heuristic. The fitness function determines the quality of the solutions that is, it assesses each solution and return a measurable fitness value. In the context of cryptanalysis, a cost function evaluates the suitability of a candidate key for a given text and the most common cryptanalytic technique used for attacking ciphers besides mail, brute force is through the use of frequency analysis combined with other mathematical methods. Frequency analysis is the study of frequency of letters or group of letters also called n-grams in a ciphertext. An n size gram of 1 is called a unigram;size of 2 a bigram;size of 3 a trigram.Larger size is referred to by the value of n,e.g.,'four-gram','five-gram' and so on.Usually,the frequencies of those n-grams are compared to English language frequencies extracted from a corpus of sample text.The effectiveness of this method is dependent on the length of the ciphertext and the corpus.If the ciphertext is too short,the extracted n-gram statistics will not correlate with the frequency of natural language statistics thus yielding little insight into mapping between plaintext and ciphertext.Church house (2001) suggested that a minimum of 200 letters is necessary if unigram statistics are used and 50 to 60 letters if bigrams and trigrams statistics are used.The corpus material should possess a similar profile to the plaintext content,so it must be selected carefully,taking into account the language and style of plaintext.
Equation (1) uses the concepts of frequency analysis to find the cost value of each solution.
u,b and t represent the unigrams,bigrams and trigrams statistics.K and D denote the known language statistics and decrypted message statistics respectively.α, β and γ are weights assigning different priorities to each of the three statistics where α+ β+γ=1.The unigrams are ignored because its frequencies remain unchanged.Only bigrams and trigrams are being used.
Selection is the first reproductive stage of genetic algorithm.In the selection phase, individuals having traits that will increase the probability of survival are chosen to reproduce.That is infeasible chromosomes are discarded and only those fit for mating will have higher chances to generate new solutions.Selection is fitness dependent and is done using different algorithms.There are many schemes used in selecting fitter chromosomes and some common methods of selections are the roulette wheel,Boltzmann selection,tournament selection and steady state selection.In this paper focus will be mainly on tournament selection .
Tournament selection is being increasingly used as a GA selection scheme as it is simple to code and is efficient for both parallel and non parallel architetures .Tournament selection offers us the possibility to vary the degree to which fitter individuals are being picked up to form part of the mating pool.As the selection pressure increases,more better individuals are chosen but a too high selection pressure may lead to premature convergence.At the same time a weak selection pressure will result in too slow convergence.
Tournament selection is likely to be a competition among individuals in the search space.First, it randomly chooses k individuals from the population pool where k is the tournament size.Then it compares the objective value of each individual taking part in the competition and the best objective value is chosen to be a parent in the next generation.The fittest chromosomes may be picked up several times if the population pool is small.In the context of this paper, the mating pool contains only the tournament winners and has a lower average fitness than the population average fitness.The selection pressure can be varied by either decreasing or increasing the tournament size.
The most important part of search process in GA is crossover.Crossover exploits the search space by creating new chromosomes in order to find the solution space.Basically, two parents are randomly selected from the mating pool and those parents undergo an exchange and recombination process to produce two new chromosomes which will form part in the next generation.In this process,one point,called the crossover site,along the length of the chromosomes is randomly chosen and data from beginning of binary string to crossover site is copied from one parent and the rest is copied from the other parent.An example is illustated below.
As crossover is purely random in nature, the children can be either better or worse than their parents. The number of pairs of individuals that are picked for mating is dependent on the crossover rate. A high crossover rate will produce a greater amount of new individuals but if set too high,good individuals may be modified and the genetic material contained in them is lost.However, a low crossover rate will slow down the search due to loss of exploration power.
In this paper a two point crossover operator is applied.This scheme randomly selects two dissimilar crossover sites causing each parent to break into three segments.The segments found between the crossover points are exchanged to produce two offsprings.This illustrated below.
The mutation process,unlike the crossover scheme is an exploration operation.Mutation can search new areas of solutions by introducing changes in the genetic sequence of the chromosomes. Evolution process is dependent on mutation because it is the only way that new alleles are created.Though it cannot change many bits in an individual,mutation avoids premature convergence and preserves the population diversity.The number of mutations permissible is dependent on the mutation rate.Usually,the mutation rate is kept low to protect genetic material as a too high mutation rate will cause a huge loss of hereditary material.
Since the individuals have been encoded in terms of '0's and '1's,a flip bit method is applied to emulate mutation.For an individual that has been chosen to mutate, a point along the length of individual is randomly selected and the bit at that point is inverted to become a '0' if original value was '1' or to become '1' if original value was '0'.An example is illustrated below.
1 1 0 1 0 0
0 1 0 1 0 0
Memetic algorithm also called MA was first introduced by Pablo Moscato in 1989.The latter used Dawkin's notion of meme and concepts of natural selection to create a hybrid GA.Generally, a meme is a unit of information that reproduces itself as people exchange ideas .Memes undergo natural selection like genes but they can be transmitted between any two individuals.Meme spreading is faster than gene as a meme from a single individual can be copied and adopted by unlimited number of individuals whereas gene replication is confined to the small number of children a single parent can produce. Before a meme is passed on, it is typically adapted by the person who transmits it as that person thinks, understands and processes the meme, whereas genes get passed on whole .