Abstract - As modern encryption algorithms are broken, the world of information security looks in new directions to protect the data it transmits. The concept of using DNA computing in the fields of cryptography and steganography has been identified as a possible technology that may bring forward a new hope for unbreakable algorithms. The DNA-based cryptography is a new and very promising direction in cryptography research. This paper proposes a novel algorithm to secure the data communication.
The proposed technique combines encryption and data hiding using properties of deoxyribonucleic acid (DNA) sequences, and consists of two phases. In the first phase, the secret data is encrypted using a DNA and amino acids-based RSA. In the second phase, the encrypted data is steganographically hidden in a reference DNA sequence using an insertion technique. The proposed algorithm works on any binary data, since the data is first transformed into a sequence of DNA nucleotides using a binary coding rule. These nucleotides are then represented as an amino acid structure so that they can pass through the specially designed RSA, which encrypts them into another DNA sequence. This encrypted DNA data is then randomly inserted into a reference DNA sequence to produce a faked DNA sequence with the encrypted data hidden. To recover the embedded secret data, the receiver carries out the inverse process with the help of both the embedding parameters and the reference DNA sequence.
Index Terms - DNA, amino acids, RSA, insertion technique
The growth of computers and communication systems brought with it a demand from the private sector for means to protect information in digital form and to provide security services. Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. Some data may be secret information that is exposed to unauthorized access. To keep unauthorized users away, a variety of techniques have been used, such as cryptography and data hiding.
Until modern times cryptography referred almost exclusively to encryption, which is the process of converting ordinary information (plaintext) into unintelligible gibberish (i.e., ciphertext). Decryption is the reverse, in other words, moving from the unintelligible ciphertext back to plaintext. A cipher is a pair of algorithms that create the encryption and the reversing decryption.
Steganography is a science that focuses on hiding specific messages using specialized techniques in such a way that only the sender and the intended receiver are able to disclose them. The carrier could be any medium used to convey information, including wood or slate tablets, tiny photographs or word arrangements. Modern steganography techniques that use digital information offer opportunities not only to hide information, but also to develop a general theoretical framework for hiding different kinds of data such as sound tracks, images, videos, and even 3D objects.
Since security is one of the most important issues, the evolution of cryptography and cryptanalysis remains a field of ongoing research. The latest development in this field is DNA cryptography.
DNA-based cryptography is a new and very promising direction in cryptography research. DNA can be used in cryptography for storing and transmitting information, as well as for computation. The massive parallelism and extraordinary information density inherent in this molecule are exploited for cryptographic purposes.
In DNA-based cryptography, DNA is used as the information carrier. DNA is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints or a recipe, or a code, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules.
The principle of DNA steganography is to conceal the information that needs protection among a large number of irrelevant DNA sequence chains. Decoding then becomes like looking for a needle in a haystack, which makes it difficult for attackers to identify the correct DNA fragment. Only the intended receiver can find the correct DNA fragment, using information agreed in advance between the two parties, and then retrieve the information concealed in it. One can argue that steganography is not actually encryption, since the plaintext is not encrypted but only disguised within other media.
Data-hiding techniques are becoming increasingly important in a variety of digital media applications, including annotation, ownership protection and authentication. Most previous work has focused on how to protect information from intruders using cryptology. However, cryptology is not sufficient when transmitting data over an insecure, public channel. Data hiding is different from encryption in that encryption concerns protecting the content of messages, while data hiding concerns concealing the very existence of the embedded data using a steganographic approach. An increasing number of applications drive the development of data-hiding techniques.
Many data-hiding methods have been introduced, including invisible inks, microdots, digital signatures, and spread spectrum communications, and DNA-based data-hiding techniques have recently been added to that list. These techniques depend on the high randomness of DNA to hide a message without it being noticed.
This paper introduces a new cryptography method based on the central dogma of molecular biology. Here real DNA is not utilized to perform the cryptography process, rather this method will simulate the critical processes in central dogma. Data hiding using Insertion technique is combined with DNA-based RSA encryption to enhance security, effectiveness and applicability.
Adleman, with his pioneering work, set the stage for the new field of bio-computing research. His main idea was to use actual chemistry to solve problems that are either unsolvable by conventional computers or require an enormous amount of computation. By the use of DNA computing, the Data Encryption Standard (DES) cryptographic protocol can be broken. In DNA steganography, a DNA-encoded message is first camouflaged within the enormous complexity of human genomic DNA and then further concealed by confining this sample to a microdot. Recent research considers the use of the human genome in cryptography.
One-time pad cryptography with DNA strands and research on DNA steganography (hiding messages in DNA) have also been reported in the literature.
However, researchers in DNA cryptography are still looking at theory much more than practice. The constraints of its high-tech lab requirements and computational limitations, combined with the labor-intensive extrapolation it requires, prevent DNA computing from being of efficient use in today's security world.
Another approach is led by Ning Kang, who did not use real DNA computing but only the principal ideas of the central dogma of molecular biology to develop his cryptography method. The method only simulates the transcription, splicing, and translation processes of the central dogma; thus, it is a pseudo DNA cryptography method.
Another investigation is based on a conventional symmetric encryption algorithm called "Yet Another Encryption Algorithm" (YAEA), developed by Saeb and Baith. That study introduces the concept of using DNA computing in cryptography in order to enhance the security of cryptographic algorithms. This is considered a pioneering idea that stood behind my work in this paper.
OVERVIEW OF DNA
Deoxyribonucleic acid 'DNA'
DNA is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints or a recipe, or a code, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information.
The DNA double helix is stabilized by hydrogen bonds between the bases attached to the two strands. The four bases found in DNA are adenine (abbreviated A), cytosine (C), guanine (G) and thymine (T). These four bases are attached to the sugar/phosphate to form the complete nucleotide.
The genetic code
The genetic code consists of 64 triplets of nucleotides. These triplets are called codons. With three exceptions (the stop codons), each codon encodes one of the 20 amino acids used in the synthesis of proteins. That produces some redundancy in the code: most of the amino acids are encoded by more than one codon.
The genetic code can be expressed as either RNA codons or DNA codons. RNA codons occur in messenger RNA (mRNA) and are the codons that are actually "read" during the synthesis of polypeptides (the process called translation). But each mRNA molecule acquires its sequence of nucleotides by transcription from the corresponding gene.
The DNA codons are read the same as the RNA codons, except that the nucleotide thymidine (T) is found in place of uridine (U). So in DNA codons we have (TCAG) and in RNA codons we have (UCAG).
Central dogma of molecular biology
The central dogma of molecular biology describes the flow of genetic information within a biological system. The dogma is a framework for understanding the transfer of sequence information between sequential information-carrying biopolymers, in the most common or general case, in living organisms. There are 3 major classes of such biopolymers: DNA and RNA (both nucleic acids), and protein. There are 3 × 3 = 9 conceivable direct transfers of information that can occur between these. The dogma classes these into 3 groups of 3: 3 general transfers (believed to occur normally in most cells), 3 special transfers (known to occur, but only under specific conditions in case of some viruses or in a laboratory), and 3 unknown transfers (believed never to occur). The general transfers describe the normal flow of biological information: DNA can be copied to DNA (DNA replication), DNA information can be copied into mRNA (transcription), and proteins can be synthesized using the information in mRNA as a template (translation).
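The two general transfers used later in this paper, transcription and translation, can be simulated in a few lines. The following is a minimal sketch, assuming the input is the coding strand of the gene (so transcription reduces to replacing T with U) and using the standard genetic code. The function names `transcribe` and `translate` and the sample sequence are illustrative choices, not part of the proposed algorithm.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Coding-strand DNA -> mRNA: every T is replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA codon by codon until a stop codon or the end of the string."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":  # stop codon ends translation
            break
        protein.append(amino)
    return "".join(protein)


mrna = transcribe("ATGGCTTGGTAA")  # hypothetical sample gene
protein = translate(mrna)
print(mrna, protein)
```

Splicing, which the pseudo DNA method mentioned earlier also simulates, is left out of this sketch.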
Phase 1: Encrypting Data Using a DNA and Amino Acids-Based RSA
RSA is an algorithm for public-key cryptography that is based on the presumed difficulty of factoring large integers, the factoring problem. RSA stands for Ron Rivest, Adi Shamir and Leonard Adleman, who first publicly described it in 1977. Whether breaking RSA encryption is as hard as factoring is an open question known as the RSA problem. The RSA algorithm involves three steps: key generation, encryption and decryption.
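The three RSA steps can be illustrated with deliberately tiny numbers. This is a textbook sketch only, assuming small hand-picked primes and no padding; the helper names `make_keys`, `encrypt` and `decrypt` are hypothetical, and real use would require large random primes and a vetted library.

```python
from math import gcd


def make_keys(p, q, e):
    """Key generation: n = p*q, and d is the inverse of e modulo (p-1)(q-1)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)(q-1)")
    d = pow(e, -1, phi)  # modular inverse, available in Python 3.8+
    return (e, n), (d, n)


def encrypt(m, public_key):
    """Encryption: c = m^e mod n, for an integer message 0 <= m < n."""
    e, n = public_key
    return pow(m, e, n)


def decrypt(c, private_key):
    """Decryption: m = c^d mod n."""
    d, n = private_key
    return pow(c, d, n)


# Toy parameters, far too small to be secure.
public_key, private_key = make_keys(61, 53, 17)
cipher = encrypt(65, public_key)
plain = decrypt(cipher, private_key)
print(public_key, private_key, cipher, plain)
```

In the proposed scheme, the value passed through RSA would be the key (ambiguity) sequence rather than the plaintext itself.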
In the proposed algorithm, the plaintext is sent to the encryption process and is led through a number of steps to produce the DNA-encrypted form.
The plaintext is first converted into ASCII format and then into the corresponding binary format. The binary form of the data is converted to DNA form using a binary coding rule, which maps each 2-bit pattern to one nucleotide. Since there are four nucleotides and four 2-bit patterns, 4! = 24 binary coding rules are possible. Table 1 shows one such binary coding rule.
Table 1 : Binary coding rule
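The conversion from text to DNA form can be sketched as follows. The rule used here (00 to A, 01 to C, 10 to G, 11 to T) is an assumption made for illustration and may differ from the rule in Table 1; `text_to_dna` and `dna_to_text` are hypothetical helper names.

```python
from itertools import permutations

# Assumed binary coding rule, for illustration only; any of the 24 rules works.
RULE = {"00": "A", "01": "C", "10": "G", "11": "T"}
INVERSE_RULE = {base: bits for bits, base in RULE.items()}


def text_to_dna(text):
    """ASCII text -> 8-bit binary string -> one nucleotide per 2 bits."""
    bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))
    return "".join(RULE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_text(dna):
    """Inverse conversion: nucleotides -> bits -> ASCII text."""
    bits = "".join(INVERSE_RULE[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


# Each assignment of the four nucleotides to 00, 01, 10, 11 is one coding rule.
all_rules = list(permutations("ACGT"))

dna = text_to_dna("Hi")
print(dna, dna_to_text(dna), len(all_rules))
```

Each character yields four nucleotides, so the DNA form is four times as long as the ASCII text.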
After converting the whole text into A, C, G, T format, the sequence is read in groups of three letters, and each group is checked against the amino acid table. Every three-letter group (codon) denotes an amino acid, which is represented by a single letter of the alphabet. The DNA form is converted to the amino acid form according to any of the 26! tables that can be generated dynamically from the standard universal table of amino acids and their codons given in Table 2. The non-encoding sequences are taken out, and the text is split into two parts: the encoding sequence (amino sequence) and the non-encoded sequence (the key sequence, or ambiguity sequence). The key is supplied to the RSA algorithm for another layer of encryption, and the output of the whole encryption process is two text sequences: the encoded amino sequence and the encrypted ambiguities (key sequence).
Table 2: Amino acids and their 64 codons
Constructing the alphabet table
Table 2 contains only 20 amino acids, in addition to 1 start and 1 stop signal, while the English alphabet has 26 letters. The letters that remain to be filled are B, J, O, U, X and Z, so these letters are made to share the codons of some amino acids. The start codon is the same as the codon of amino acid M, so it is not used separately. The 3 stop codons are assigned to B. The space is assigned to J. Three amino acids (L, R, S) have 6 codons each. Looking at the codons of each, 4 are of one type and 2 are of another type. Those 2 of the other type are shifted to the letters O, U and X respectively. The letter Z takes one codon from Y, so that Y: UAU and Z: UAC. The new distribution of codons is illustrated in Table 3.
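A: GCU, GCC, GCA, GCG
B: UAA, UGA, UAG
C: UGU, UGC
D: GAU, GAC
E: GAA, GAG
F: UUU, UUC
G: GGU, GGC, GGA, GGG
H: CAU, CAC
I: AUU, AUC, AUA
J: (space)
K: AAA, AAG
L: CUU, CUC, CUA, CUG
M: AUG
N: AAU, AAC
O: UUA, UUG
P: CCU, CCC, CCA, CCG
Q: CAA, CAG
R: CGU, CGC, CGA, CGG
S: UCU, UCC, UCA, UCG
T: ACU, ACC, ACA, ACG
U: AGA, AGG
V: GUU, GUC, GUA, GUG
W: UGG
X: AGU, AGC
Y: UAU
Z: UAC
Table 3 : New Distribution for codons on English alphabet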
Counting the number of codons of each character, we find that the number varies between 1 and 4 codons per character. This number is called the 'ambiguity' of the character. The fact that one character can have more than one DNA representation adds confusion, which enhances the strength of the algorithm.
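The alphabet table and the ambiguity of each letter can be derived programmatically from the standard genetic code and the reassignments described above. The sketch below is one plausible reading of the scheme, assuming that the ambiguity sequence records which of a letter's codons was used; the names `build_alphabet_table`, `encode` and `decode` are hypothetical, and J is left without a codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def build_alphabet_table():
    """Apply the reassignments from the text to obtain letter -> list of codons."""
    table = {}
    for codon, amino in STANDARD.items():
        letter = amino
        if amino == "*":
            letter = "B"  # the 3 stop codons go to B
        elif amino == "L" and codon.startswith("UU"):
            letter = "O"  # the 2 odd codons of L go to O
        elif amino == "R" and codon.startswith("AG"):
            letter = "U"  # the 2 odd codons of R go to U
        elif amino == "S" and codon.startswith("AG"):
            letter = "X"  # the 2 odd codons of S go to X
        elif codon == "UAC":
            letter = "Z"  # Z takes one codon from Y
        table.setdefault(letter, []).append(codon)
    return table


TABLE = build_alphabet_table()
AMBIGUITY = {letter: len(codons) for letter, codons in TABLE.items()}
LOOKUP = {codon: (letter, index)
          for letter, codons in TABLE.items()
          for index, codon in enumerate(codons)}


def encode(sequence):
    """Split a codon string into a letter sequence and an ambiguity index sequence."""
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA notation
    letters, indices = [], []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        letter, index = LOOKUP[rna[i:i + 3]]
        letters.append(letter)
        indices.append(index)
    return "".join(letters), indices


def decode(letters, indices):
    """Rebuild the codon string (in RNA notation) from letters and indices."""
    return "".join(TABLE[letter][index] for letter, index in zip(letters, indices))


letters, indices = encode("GCATACTTG")
print(letters, indices, decode(letters, indices))
```

Without the index sequence, a letter such as A could stand for any of four codons, which is why that sequence acts as the key.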
Fig 1: Flowchart of DNA-Based RSA
Phase 2: Hiding Encrypted Data into Reference DNA sequence Using an Insertion Technique
This phase deals with the hiding of encrypted DNA sequence in a Reference DNA sequence using the Insertion technique. Figure 2 is a block diagram which illustrates this phase.
First, divide both the encrypted DNA sequence and the reference DNA sequence into segments, where each segment contains a random number of DNA nucleotides, so the segments are not fixed in length. Next, insert each segment of the encrypted DNA sequence before the corresponding segment of the reference DNA sequence. Finally, we get a faked DNA sequence with the encrypted DNA sequence hidden.
Input: A reference DNA sequence [Ref], random number seeds K and R, and the DNA-coded amino sequence obtained from the encryption [DC].
Output: A faked DNA sequence [C] with the secret message [DC] hidden.
Step 1: Start
Step 2: Generate a random number sequence (r1, r2, r3, ...) using random number seed R and a random number sequence (k1, k2, k3, ...) using another random number seed K. Then find the smallest integer t such that $\sum_{i=1}^{t} r_i > |DC|$, where |DC| is the length of [DC].
Step 3: Divide [DC] into segments with lengths r1, r2, r3, ..., rt-1 in order, denote these segments by DC1, DC2, DC3, ..., DCt-1, and let the residual part be DCt.
Step 4: Divide [Ref] into segments with lengths k1, k2, k3, ..., kt-1 in order and truncate the residual part of [Ref]. Denote these segments by Ref1, Ref2, Ref3, ..., Reft-1.
Step 5: Insert each [DCi], 1 ≤ i ≤ t-1, of [DC] before [Refi] of [Ref]. Finally, put DCt at the end of the sequence to produce [C], the faked DNA sequence with the encrypted data hidden.
Step 6: Stop
Fig 2: DNA-Based RSA Insertion method diagram
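The insertion steps and their inverse can be sketched as follows. This is a minimal illustration under stated assumptions: segment lengths are drawn uniformly from 1 to `max_len` (a hypothetical parameter), Python's `random` module stands in for the seeded generators R and K and is not cryptographically secure, and the names `embed` and `extract` are illustrative.

```python
import random


def embed(dc, ref, seed_r, seed_k, max_len=4):
    """Hide dc in ref by interleaving randomly sized segments (Steps 2 to 5)."""
    rng_r, rng_k = random.Random(seed_r), random.Random(seed_k)
    # Step 2: draw r_1..r_t until the running sum exceeds len(dc).
    r = []
    while sum(r) <= len(dc):
        r.append(rng_r.randint(1, max_len))
    k = [rng_k.randint(1, max_len) for _ in range(len(r) - 1)]
    if sum(k) > len(ref):
        raise ValueError("reference sequence is too short")
    # Steps 3 to 5: emit DC_i followed by Ref_i for i = 1..t-1.
    fake, dc_pos, ref_pos = [], 0, 0
    for r_i, k_i in zip(r[:-1], k):
        fake.append(dc[dc_pos:dc_pos + r_i])
        dc_pos += r_i
        fake.append(ref[ref_pos:ref_pos + k_i])
        ref_pos += k_i
    fake.append(dc[dc_pos:])  # residual part DC_t; the rest of ref is truncated
    return "".join(fake)


def extract(fake, ref, seed_r, seed_k, max_len=4):
    """Recover dc from the faked sequence using the same seeds and reference."""
    rng_r, rng_k = random.Random(seed_r), random.Random(seed_k)
    secret, pos, ref_pos = [], 0, 0
    while True:
        r_i = rng_r.randint(1, max_len)
        if len(fake) - pos < r_i:  # fewer than r_i symbols left: this is DC_t
            secret.append(fake[pos:])
            return "".join(secret)
        secret.append(fake[pos:pos + r_i])
        pos += r_i
        k_i = rng_k.randint(1, max_len)
        if fake[pos:pos + k_i] != ref[ref_pos:ref_pos + k_i]:
            raise ValueError("faked sequence does not match the reference")
        pos += k_i
        ref_pos += k_i


dc = "ACGTTGCAAGCT"    # hypothetical encrypted DNA sequence
ref = "GATTACA" * 10   # hypothetical reference DNA sequence
fake = embed(dc, ref, seed_r=7, seed_k=11)
recovered = extract(fake, ref, seed_r=7, seed_k=11)
print(fake, recovered == dc)
```

The receiver can locate the residual part without knowing the length of [DC] in advance: at every step before the last, at least r_i + k_i symbols remain, while at the last step fewer than r_t remain.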
The decryption process is simply the inverse of the encryption process. The receiver processes every sequence received and extracts the plaintext. The encrypted ambiguity sequence is first decrypted using RSA to retrieve the real key sequence. Then, using this key, the amino sequence is converted back to a DNA sequence. This is then converted to binary using the binary coding rule, and the corresponding ASCII values are taken. Finally, the plaintext is retrieved.
As security is one of the most important issues, the evolution of cryptography and cryptanalysis remains a field of ongoing research. The latest development in this field is DNA cryptography, which emerged after the computational ability of DNA was disclosed. DNA computing, a new computational paradigm, brings potential challenges and opportunities to traditional cryptography, so DNA cryptography is worth pursuing as a new field for powerful algorithms. If DNA cryptography is to be developed, the advantages inherent in DNA should be fully explored, such as developing nanoscopic storage based on the tiny volume of DNA, realizing fast encryption and decryption based on its vast parallelism, and exploiting difficult biological problems. DNA cryptography therefore does not exclude traditional cryptography, and it is possible to construct a hybrid cryptosystem from the two.
My proposed system is RSA based on DNA. It introduces some modifications to RSA processing by using biological concepts such as DNA and amino acid structures. Real DNA is not utilized; rather, the project simulates the critical processes in the central dogma of molecular biology. With the introduction of biological aspects of DNA sequences into computing, researchers have proposed new data hiding methods based on DNA sequences. Recently, Harvard researchers have been able to use sequencing technology to store 70 billion copies of a yet-unpublished book in DNA binary code. My proposed framework consists mainly of two stages: the first encrypts the plaintext using amino acid and DNA-based RSA, and the second applies a secure insertion method to hide the encrypted DNA ciphertext in a reference DNA sequence for an increased level of security. Thus the proposed scheme can not only encrypt secret information into DNA sequences but also hide the encrypted data in another reference DNA sequence. Therefore, it is difficult for an attacker to detect whether or not there are secret messages hidden in a DNA sequence without knowing the embedding parameters.