# Classical Ciphers And Cryptanalysis Computer Science Essay

**Published:** **Last Edited:**

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

There are several approaches to break substitution ciphers, each of them having advantages and disadvantages over one another. Substitution ciphers represent the basic building blocks of more complex and more secure ciphers that are used today. Hence understanding the vulnerability of simple ciphers is important in using and building more complex ciphers.

Different techniques were explored to find the key of the cipher and thereby decrypting the entire cipher text. Several possible methods to break a substitution cipher were explored which include exhaustive search, simulated annealing, frequency analysis, genetic algorithm, particle swarm optimization, tabu search and relaxation algorithm.

The exhaustive search method is the simplest out of all algorithms used to break substitution ciphers. This technique is possible when the cryptographic systems have finite key space and allowing for all possible key combinations to be checked until the correct one is found. This method could be an acceptable technique for breaking a mono alphabetic shift cipher.

The first attempt using the exhaustive search is not the best choice, since it is time consuming, but it decrypts the text with 100% accuracy.

Brute force method is a way of trying to break simple substitution ciphers, but the number of possible keys that need to be checked is large. Thus, practically, it may be impossible to do an exhaustive search with in a reasonable amount of time. To overcome this, new algorithms have been developed for faster breaking of the cipher.

Another method to crack substitution ciphers faster than the exhaustive search is the frequency analysis method. This is the "classic" method of decrypting substitution cipher text. The frequency analysis is based on the assumption that each letter in the plain text can always be substituted by another letter of the original ciphered text. Frequency analysis is the process of determining at what frequency each symbol of the encrypted text occurs within the cipher text. This information will be used along with the knowledge of frequencies of symbols within the language used in the cipher to help determine which cipher text symbol maps to the corresponding plaintext symbol.

The frequency analysis algorithm is the fast approach to decipher text. But, it requires the knowledge of the language statistics of the original

text. The disadvantage is that it relies on constant human interaction for

determining the next move in the process.

Automated attack algorithms were developed for which human intervention is not necessary. These methods will be finished either after a predetermined number of iterations or after a message has been successfully decrypted. One such automated attack algorithms is the genetic algorithm which is widely used for cracking substitution ciphers.

Like other heuristic algorithms, the genetic algorithm will not always produce the exact result. They give solutions which are nearest to the correct one. In case of deciphering, after using the "best" key produced by genetic algorithm, most of the time, it is easy for a human to read the "decrypted" text. Then make small changes to reproduce the correct plain text. The experiments performed using the genetic algorithm method suggested that a fitness of about 0.9 is enough to determine the vowel substitutions and consonant substitutions after which the visual examination by a human can be used to decrypt the entire text.

Simulated annealing is another technique that is similar to the genetic algorithm which is used to break substitution ciphers. The main difference is that the genetic algorithm has a pool of possible keys at each moment, while the simulated annealing keeps one value at a time. When combined with a few other simplifications, simulated annealing makes the approach much simpler than the genetic algorithm.

The genetic algorithm matches more letters correctly than the simulation annealing does for any given length of the cipher text. This difference is not too high, so the simulated annealing method is still a good technique for breaking simple ciphers. To reach the correct solution, simulated annealing takes less iterations, but the total time it takes is more than the time needed by genetic algorithm to reach the same result since the simulated annealing method spends long time in each iteration because of its detailed examination of each possible perturbation. Even though the genetic algorithm seems to be better than simulated annealing in all aspects, the main advantage of the simulated annealing is that it is much easier to implement and therefore, it still has an important role in cracking simple substitution ciphers.

Tabu search is another optimization technique used for breaking substitution ciphers. The experimental results suggest that the genetic algorithm recovers slightly more characters than the other two algorithms. Simulated annealing algorithm is much simpler to implement than genetic algorithms and the tabu search. Tabu search method obtains the desired result faster than the other two algorithms.

The particle swarm optimization method is another method based on machine learning processes that is used for breaking substitution ciphers. The algorithm starts by selecting a random population of potential solutions, each of which is called a particle. Particle swarm optimization is a good method for breaking simple substitution ciphers as long as bigrams are used to calculate the fitness of particles. Considering unigram as a measure for fitness does not give any useful results for deciphering even the simplest ciphers.

Another technique used to break substitution ciphers is the relaxation algorithm. This is a graph-based technique that relies on iterative and parallel updating of values associated with each node. The nodes of the graph, vi, are elements of the cipher alphabet. Each node has a

random variable, li, associated with it which represents the probabilities of the possible characters that this node represents. The probabilities of a node are updated based on the appearance of its two neighbors in the cipher text and the trigram analysis of the original language.

## Sujith Ravi and Kevin Knight :Attacking Letter Substitution Ciphers with Integer Programming, Cryptologia, 33:4, 321 - 334, Oct 2009

Ravi and Knight introduced a method that uses low-order letter n-gram models to solve substitution ciphers. This method is based on integer programming which performs an optimal search over the key space. This method guarantees that no key is overlooked. This can be executed with standard integer programming solvers. The proposed method studies the variation of decipherment accuracy as a function of n-gram order and cipher length. Empirical testing of Shannon's information theory for decipherment uncertainty which includes the famous unicity distance was also carried out.

Fifty ciphers each of lengths 2, 4, 8, . . . , 256 were created. These ciphers are solved with 1-gram, 2-gram, and 3-gram language models. The average percentage of cipher text decoded incorrectly was recorded.

It was observed that solution obtained by integer programming is exact in achieving the objective. With a 2-gram model, EM algorithm resulted in 10% error for a 414-letter cipher, where as integer programming method provided a solution with 0.5% error only on the same cipher. For short cipher lengths, much higher improvement was observed when integer programming method was used. For example, on a 52-letter cipher, using a 2-gram model, the solution from integer programming method resulted in 21% error, which is low when compared to 85% error given by the EM method. The 1-gram model works badly in this scenario, which is inline with Bauer's observation that for short texts.

The average key and message equivocations are plotted using 1-gram, 2-gram, and 3-gram language models. The message equivocation curve follows Shannon's prediction which is rising and then falling. Short ciphers have relatively few solutions and the overall uncertainty is not that high. As the cipher length increases, message equivocation rises. At some point, it then decreases, as the cipher begins to disclose its secret through patterns of repetition.

The work mainly focuses on letter substitution ciphers which also include spaces. The work compares decipherment on ciphers with spaces and without spaces using different n-gram English model. For ciphers without spaces, English models are retrained on text without spaces. Cracking space-free ciphers is more difficult than with simple language models.

DE-ENC RYPTION OF A TEXT IN SPANISH USING PROBABILITY AND STATISTICS by Bárbara E. Sánchez Rinza, Diana Alejandra Bigurra Zavala, Alonso Corona Chavez 18th International Conference on Electronics, Communications and Computers

Bárbara E. et. all presented a method for de-ciphering texts in Spanish using the probability of usage of letters in the language. This method is basically to perform Crypto-analysis of a mono alphabetic cryptosystem. The method uses probability and usage of letters in Spanish language to break the encrypted text files. This method assigns weights to different alphabetical letters of Spanish language . The assignment of weights is depending on their usage in the language. For this purpose analysis of the frequency of different symbols in the Spanish plain text is done. The same analysis was done on cipher text also. Every encrypted character is mapped to a single character in the original message and vice versa. In this way the original text is retrieved from the cipher text. Few characters vary because there are letters and symbols that have the same frequency. This method of deciphering cryptograms in Spanish to obtain the original text gave positive results, still the deciphering was not 100% successful as there may be letters and symbols that have the same frequency values.

## DECRYPTING ENGLISH TEXT USING ENHANCED FREQUENCY ANALYSIS by K.W. Lee, C.E. Teh, Y.L. Tan

A deciphering model was developed by Lee et. al to automate the cryptanalysis of mono alphabetic substitution ciphers. The method proposed by Lee et. al uses enhanced frequency analysis technique. The method is a three hierarchical approach. To perform deciphering of mono alphabetic substitution cipher, monogram frequencies, keyword rules and dictionary are used one by one. As a first step, for all of the letters in the cipher text the monogram frequencies are computed. In the second step, the keyword rules are used to decipher the other unknown cipher text letters. For letters that are not yet deciphered in the second step, they will be identified in the third step. In this step, the cipher text letters will be recognized by the dictionary checking. This approach was tested on two short cryptograms and it was observed that both cryptograms achieved successful deciphering results in good computational time. It was observed that this enhanced frequency analysis approach performs faster decryption than the Hart's approach. Due to the combined properties of keyword rules and dictionary checking, the Hart's approach weakness could be hindered.

## To Decode Short Cryptograms by George W Hart

G W Hart proposed a method for solving cryptograms which works well even in difficult cases where only a small sample of text is available and the probability distribution of letters are far from what is expected. This method performs well even on longer and easier cryptograms. An exponential time is required in the worst case, but in practice it is quite fast. This method fails completely when no words of the plain text were in the dictionary.

Samuel W. Hasinoff, Solving Substitution Ciphers, Technical Report, University of Toronto, Department of Computer Science, 2003.

Sam Hasinoff presented a system for the automatic solution of short substitution ciphers. The proposed system operates by using n -gram model of English and stochastic local search over all possible keys of the key space. This method resulted in median of 94% cipher letters correctly decoded. The technique consists of two main components, a generic stochastic local search (SLS) method and a scoring function. The search method is for navigation in the key space and scoring function is to evaluate the goodness of various keys. The scoring function of a particular key is as the log likelihood of an n -gram language model that is applied to the cipher text, decrypted using that key. Here n -gram models of characters are considered. Such models can also be extended to the level of words. Most practical systems employ bigrams or trigrams.

## Analysis for Decipherment Problems by Kevin Knight, Anish Nair, Nishit Rathod etc.

Knight et. al discussed a number of natural language decipherment problems that use unsupervised learning. These include letter substitution ciphers, phonetic decipherment, character code conversion,

and word-based ciphers with importance to machine translation. Basic unsupervised learning techniques often fail on the first trial. So, techniques for understanding errors and increasing the performance were discussed. These include letter substitution, character code conversion, phonetic decipherment, word based decoding etc.

An efficient algorithm that accomplishes the same thing as a naive application of the EM algorithm to break a substitution cipher was implemented. Unsmoothed parameter values for an English letter-bigram P(p) from sample data were estimated and tabulated. Then uniform P(c | p) were set up. The decipherment yielded by EM method results in 68 errors. Using a 1.5-million character data set instead of a 70,000-character data set reduced the number of errors from 68 to 64. Lambda interpolation smoothing was applied to P(p). This reduced errors further to 62.

## Probabilistic Methods for a Japanese Syllable Cipher

## Sujith Ravi and Kevin Knight

Sujith Ravi et. al studied about attacking Japanese syllable substitution cipher. Different Natural language processing techniques were used to attack a Japanese substitution cipher. They made several novel improvements over previous probabilistic methods, and report improved results.

In general the receiver uses the key convert cipher text to plain text. But a third party who intercepts the message may guess the original plaintext by analyzing the repetition patterns of the cipher text. From a natural language perspective, this cryptanalysis task can be viewed as a kind of unsupervised tagging problem. Language modeling (LM) techniques are used to rank proposed decipherment.

This work mainly attacks on difficult cipher systems that have more characters than English, on cipher lengths that are not solved by low-order language models and relate the language-model perplexity to decipherment accuracy.

The work involves processing the input file into syllables and removing typographical errors. The sequence considered is approximately one million syllables in size and contains 65 unique syllable types. This data is split into three parts: LM training data, LM smoothing data and Plaintext messages (various sizes). When 3-gram LM was trained on various data, from the decipherment results it can be concluded that more LM data (i.e., more knowledge about the language) leads to better decipherment.

With improved LM smoothing further improvements in accurate decipherment of shorter texts can be achieved. Further algorithms may lead to accurate decipherment of more complex Japanese cipher systems, including translation to other languages.

## How to decipher Rongorongo

## Kyle Gorman

The ciphertext c is the Rongorongo corpus. Knowledge of the underlying language i.e.Rapa Nui yields in the plaintext model p which can be expressed as a n-gram language model P(p).The method follows the familiar noisy-channel framework.

When a new ciphertext sequence c was found , first use expectation-maximization (EM) to set all free parameters to maximize P(c). This is same (by Bayes Rule) as maximizing the sum over all p of P(p) o P(c|p). Then use the algorithm to choose the p maximizing P(p) oP(c|p), which is equivqlent to the original goal of maximizing P(p|c), or plaintext given

Ciphertext as discueed by Knight (2006)."

## A Method for Distorting the Frequency of Character Occurrence in the Simple Substitution Cipher by M. P. Mineev and V. N. Chubarikov

A simple substitution cipher uses substitution on the set of letters in the plain text alphabet such that different letters in the cipher text orrespond to different letters in the plaintext. To encode a text by using character wise substitution, an "infinite key" was used and each letter in the plaintext will be replaced by a letter of the cipher text by means of a one-to-one self-mapping of the set of letters. Then, the knowledge of the key is necessary to reconstruct the plaintext. The work of Mineev et.al is concerned with a similar smoothing effect on the simple substitution cipher resulting from contracting the alphabet by quadratic residues and quadratic non residues in finite fields. As a sample, the Russian alphabet were considered in the proposed work.

Instant Cipher text-Only Cryptanalysis of GSM Encrypted communication by Elad Barkan and Eli Biham

New methods were proposed for attacking the encryption and the security

protocols used by GSM and GPRS. The described attacks are easy to apply, and do not require knowledge of the conversation. GSM operators should replace the existing cryptographic algorithms and protocols as early as possible, or switch to the secured third generation cellular system. Even GSM networks that use the new A5/3 surrender to the proposed attacks. Emphasis is on cipher text-only attack which is made possible by the fact that the error-correction codes are in use before the encryption. In case of GSM, the addition of such a structured redundancy before encryption is performed, reduces the security of the system.

## A fast method for cryptanalysis of substitution ciphers by T.Jackobsen

Jackobsen proposed method for cryptanalysis of substitution ciphers. In this method the initial guess of the key is refined through a number of iterations. In each step the recovered plain text using the current key is evaluated to hoe close it is to the correct key. To solve the cipher using this method bi gram distribution of letters in cipher text and plain text are sufficient. A distribution matrix is constructed only once and in each iteration the matrix is manipulated. This method is suitable for both mono and poly alphabetic substitution ciphers.

## Solving Substitution Ciphers with Genetics Algorithm by Joe Gester

In general any substitution cipher takes as a key a transformation from each cipher-text character to a plain-text character. When the key space is large trying all possible keys i.e. a brute force attack is not a viable option. Joe Gester proposed and implemented the simplest approach based on searching the more likely used keyword generated key space.

The proposed Genetic algorithm is involves an iterative process of finding the fitness of the individuals in the population. Then selectively genetic operators are applied to the members of the population to create a new generation and the process is repeated. Each generation is created by selecting members of the previous generation randomly and weighted according to their fitness.

The proposed method uses a simple genetic algorithm approach to search the key space of cryptograms. If this method is not satisfactory, then attempt was made to search a smaller problem space by restricting the key searched to those which are generated by a keyword. In first approach fitness function were used for rating the quality of each individual in the population's Solution, that is based on trigram and bigram counts. Then new populations were constructed by selecting randomly either crossover or mutation. In this approach rapidly the populations reach local maxima or never seem to converge to anything resembling English. The fitness function has evaluated the fitness of a candidate properly. The method used for crossover was more of a crossover coupled with several mutations rather than a simple crossover.

To allow for a simplified crossover mechanism it will be necessary to allow duplicate characters. An ad-hoc heuristic approach may be suitable to automate solving substitution ciphers.