This chapter defines and explains the origin of, and the need for, data integrity. It covers the background needed to validate integrity in digital systems: the general history of integrity technology, the characteristics of integrity, methods of verifying integrity, and a brief introduction to the hash functions applied to data for this purpose.
Data integrity is the property that data has not been altered in an unauthorized manner since it was created, transmitted, or stored by an authorized source. Verifying data integrity requires that only a subset of all candidate data items satisfies the criteria that distinguish acceptable items from unacceptable ones.
Cryptographic techniques for data integrity rely on either authentic channels or secret information. Data integrity is concerned specifically with the bitwise composition of data. Operations that invalidate integrity include insertion of bits, including entirely new data items from fraudulent sources; deletion of bits (short of deleting entire data items); re-ordering of bits or groups of bits; inversion or substitution of bits; and any combination of these, such as message splicing (re-use of proper substrings to construct new or altered data items). Data integrity includes the notion that data items are complete. For items split into multiple blocks, the above alterations apply analogously, with blocks regarded as substrings of a contiguous data string.
Graham Shaw, in his paper on digital document integrity, presents a technique for verifying the integrity of digital documents in order to protect them from attack. The technique proposed is digital watermarking.
Digital watermarking is a modern form of the ancient art of steganography, which is, in essence, the ability to hide information inside other information. The carrier files include images, documents, audio and video data in a wide range of formats. The approach described by Shaw prevents removal of the watermark once applied and also, through control of the key, prevents unauthorized application of watermarks to adulterated data. It is important to note that digital watermarking is not a form of data encryption, in that the original record remains fully accessible, but it can easily be combined with encryption technology. Signum has commercialized two forms of its digital watermarking: 'Sure sign' for copyright ownership and 'VeriData' for data authentication. Both share the same essential features:
Application through a secure permutation key
No requirement for additional metadata
No increase to data files sizes
Embedded codes survive high levels of data compression and print output.
From the above, the potential use of digital watermarks is very broad and can be customized to a particular application.
2.2 Methods Providing Integrity
Many technologies are used to verify the integrity of data and to ensure that it is not changed by an unauthorized party. Some of the methods that can be used to secure documents are discussed below.
2.2.1 Integrity with MAC
Message Authentication Codes (MACs) are designed specifically for applications where data integrity is essential. The originator of a message x computes a MAC h_k(x) over the message using a secret MAC key k that is shared with the intended recipient, and sends both (effectively x || h_k(x)). The recipient determines by some means the claimed source identity, separates the received MAC from the received data, independently computes a MAC over this data using the shared MAC key, and compares the computed MAC with the received MAC. The recipient interprets agreement of these values to mean that the data is authentic and has integrity, that is, it originated from the other party who knows the shared key and has not been altered in transit. This corresponds to Figure 2.1.
Figure 2.1: MAC only
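The exchange in Figure 2.1 can be sketched as follows. This is a minimal illustration, assuming HMAC-SHA-256 as the MAC algorithm h_k (the text does not name one) and a fixed-length tag appended to the message; the names make_tagged, verify_tagged and shared_key are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # HMAC-SHA-256 tag length: 32 bytes


def make_tagged(key: bytes, message: bytes) -> bytes:
    """Originator: return x || h_k(x)."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def verify_tagged(key: bytes, received: bytes) -> bytes:
    """Recipient: split off the MAC, recompute it, and compare.

    Returns the message if the two MACs agree, otherwise raises ValueError.
    """
    if len(received) < TAG_LEN:
        raise ValueError("too short to contain a MAC")
    message, tag = received[:-TAG_LEN], received[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("MAC mismatch: data altered or wrong key")
    return message


shared_key = b"example shared MAC key"  # assumption: agreed in advance
packet = make_tagged(shared_key, b"transfer 100 to account 42")
assert verify_tagged(shared_key, packet) == b"transfer 100 to account 42"
```

Changing any byte of the packet, or verifying with a different key, makes verify_tagged raise instead of returning the message.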
2.2.2 Integrity with MDC and Authentic Channel
In some cases it is not feasible to rely on a secret key to provide integrity. The key can be avoided by hashing the message and protecting the authenticity of the hash via an authentic channel. The originator computes a hash-code of the message data using an MDC, sends the data to the recipient over an unsecured channel, and transmits the hash-code over an independent channel known to provide data origin authentication. Such authentic channels may include telephone (authenticity through voice recognition), any data medium (e.g., floppy disk, piece of paper) stored in a trusted place (e.g., locked safe), or publication over any difficult-to-forge public medium (e.g., daily newspaper). The recipient independently hashes the received data and compares the resulting hash-code with the one received. If these values agree, the recipient accepts the data as having integrity. This corresponds to Figure 2.2.
Applications of this approach include the distribution of software or public keys over untrusted networks, and virus protection software. A common example of combining an MDC with an authentic channel to provide data integrity is a digital signature scheme such as RSA, which typically involves the use of an MDC, with the asymmetric signature providing the authentic channel.
Figure 2.2: MDC and Authentic Channel
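The scheme in Figure 2.2 can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the MDC (the text does not name one); the function names and the sample data are hypothetical, and the authentic channel itself (telephone, newspaper, and so on) is outside the code.

```python
import hashlib
import hmac


def mdc(data: bytes) -> str:
    """Hash-code of the data, as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def accept(received: bytes, published_digest: str) -> bool:
    """Recipient: hash the received data and compare the result with the
    hash-code obtained over the authentic channel."""
    return hmac.compare_digest(mdc(received), published_digest.lower())


# Originator side: the data travels over the unsecured channel,
# the hash-code over the authentic one.
release = b"installer bytes sent over an untrusted network"
published = mdc(release)

# Recipient side.
print(accept(release, published))              # unmodified data
print(accept(release + b"\x00", published))    # data altered in transit
```

The guarantee is only as strong as the authentic channel: an attacker who can replace the published hash-code as well as the data defeats the check.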
2.2.3 Integrity with Encryption
Digital signatures provide guarantees of both authentication and integrity. A common misconception is that encryption alone also provides data integrity. It does not: to provide authentication and integrity, further mechanisms must be combined with encryption.
2.2.3.1 Data Integrity using Encryption and an MDC
This method is applied when both integrity and confidentiality are required. The originator of a message x computes a hash value H = h(x) over the message, appends it to the data, and encrypts the augmented message using a symmetric encryption algorithm E with the shared key k, producing the ciphertext
C = E_k(x || h(x))    (2.1)
The ciphertext is then sent to the recipient, who determines which key to use for decryption and separates the recovered data x' from the recovered hash H'. The recipient then independently computes the hash h(x') of the recovered data and compares it with H'. If the two match, the recovered data is accepted as both authentic and having integrity. [1]
This corresponds to Figure 2.3. The intention is that the encryption protects the appended hash, and that it is infeasible for an attacker without the encryption key to alter the message without disturbing the correspondence between the decrypted plaintext and the recovered MDC.
Figure 2.3: MDC and Encipherment
2.2.3.2 Data integrity using Encryption and a MAC
In many cases it is preferable to use a MAC rather than the MDC in the method of equation (2.1). In this case a MAC algorithm h_k' replaces the MDC h, and the message sent is
C' = E_k(x || h_k'(x))    (2.2)
Using a MAC here has the advantage that, should the encryption algorithm be defeated, the MAC still provides integrity.
The drawback is the need to manage both a MAC key and an encryption key. Care must be taken to ensure that dependencies between the MAC and encryption algorithms do not lead to security weaknesses, and as a general recommendation these algorithms should be independent [14].
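The construction C' = E_k(x || h_k'(x)) can be sketched as follows. This is an illustration under stated assumptions: HMAC-SHA-256 stands in for the MAC h_k', and because the Python standard library has no symmetric cipher, E is replaced by a toy keystream cipher built from SHA-256 that must not be used in practice. All function names are hypothetical, and the two keys are generated independently, in line with the recommendation above.

```python
import hashlib
import hmac
import os

TAG_LEN = 32    # HMAC-SHA-256 output size in bytes
NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream SHA-256(key || nonce || counter). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def protect(enc_key: bytes, mac_key: bytes, x: bytes) -> bytes:
    """Originator: compute x || h_k'(x), then encrypt it under k.
    A fresh random nonce is sent in front of the ciphertext."""
    tag = hmac.new(mac_key, x, hashlib.sha256).digest()
    plaintext = x + tag
    nonce = os.urandom(NONCE_LEN)
    return nonce + _xor(plaintext, _keystream(enc_key, nonce, len(plaintext)))


def recover(enc_key: bytes, mac_key: bytes, c: bytes) -> bytes:
    """Recipient: decrypt, split x' from the MAC, recompute and compare."""
    if len(c) < NONCE_LEN + TAG_LEN:
        raise ValueError("ciphertext too short")
    nonce, body = c[:NONCE_LEN], c[NONCE_LEN:]
    plaintext = _xor(body, _keystream(enc_key, nonce, len(body)))
    x, tag = plaintext[:-TAG_LEN], plaintext[-TAG_LEN:]
    expected = hmac.new(mac_key, x, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return x


enc_key = os.urandom(32)   # encryption key k
mac_key = os.urandom(32)   # independent MAC key k'
ciphertext = protect(enc_key, mac_key, b"confidential report")
assert recover(enc_key, mac_key, ciphertext) == b"confidential report"
```

Replacing the HMAC call with an unkeyed hash gives the form of equation (2.1). With an XOR-based cipher such as this toy one, an attacker who knows x could then rewrite both the message and its hash without the key, which is one way to see why the keyed MAC is the safer choice.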
2.3 Cryptographic Hash Algorithms
A large number of hash functions have been developed to date; some have been found to be weak and their use is discouraged. Even when a hash function itself has never been broken, a successful attack against a weakened variant may undermine experts' confidence in it and lead to its abandonment. For example, in 2004 weaknesses were found in a number of hash algorithms, including SHA-0, RIPEMD, and MD5. These weaknesses called into question the security of algorithms derived from them, such as SHA-1 (a strengthened version of SHA-0), RIPEMD-128, and RIPEMD-160 (both strengthened versions of RIPEMD). Some commonly used hash functions are described briefly below.
A hash function takes an input string of arbitrary length and produces a fixed-length hash value. A hash algorithm must withstand all known forms of cryptanalytic attack, so a hash function should have the following properties:
Preimage resistance: For a given hash h it should be difficult to find any message m such that h = hash(m). This concept is related to that of a one-way function. Functions that lack this property are vulnerable to preimage attacks.
Second preimage resistance: For a given input m1, it should be difficult to find another input m2 (different from m1) such that hash(m1) = hash(m2). This property is sometimes referred to as weak collision resistance. Functions that lack this property are vulnerable to second preimage attacks.
Collision resistance: It should be difficult to find any two different messages m1 and m2 such that hash(m1) = hash(m2). Such a pair is called a (cryptographic) hash collision, and this property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as that required for preimage resistance; otherwise collisions may be found by a birthday attack.
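The birthday attack mentioned above can be demonstrated on a deliberately shortened hash. The sketch below assumes SHA-256 truncated to its first 24 bits as a stand-in for a weak hash function; truncated_hash and find_collision are hypothetical names, and the message format is arbitrary.

```python
import hashlib
from itertools import count


def truncated_hash(message: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256(message), as an integer."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - bits)


def find_collision(bits: int):
    """Birthday search: hash distinct messages until two share a value.

    Returns (m1, m2, attempts), where m1 != m2 hash to the same value.
    """
    seen = {}
    for i in count():
        message = b"msg-%d" % i
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message


m1, m2, attempts = find_collision(24)
print(m1, m2, attempts)
```

For an n-bit hash value a collision is expected after roughly 2^(n/2) attempts (on the order of a few thousand for n = 24), whereas finding a preimage by brute force takes about 2^n attempts. This is the reason the hash value must be about twice as long to give collision resistance at the same level.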
The following are some of the most useful applications of hash functions.
A significant application of secure hashes is verification of message integrity, that is, determining whether any alterations have been made to a message. This can be achieved by comparing message digests calculated before and after transmission of the message (or any other event).
A message digest can also serve as a means of reliably identifying a file. Several source code management systems, including Git, Mercurial and Monotone, use the SHA-1 hash (sha1sum) of various types of content (file content, directory trees, ancestry information, etc.) to identify them uniquely.
Another application of hash algorithms is verification of passwords. For obvious reasons, passwords are generally not stored in clear text but in digest form. To authenticate a user, the password supplied by the user is hashed and compared with the stored hash. This is sometimes referred to as one-way encryption. For both performance and security reasons, most digital signature algorithms specify that only the message digest be "signed", not the whole message.
Hash algorithms can also be used to generate pseudorandom bits. Hashes are also used to identify files on peer-to-peer file sharing networks. For example, in an ed2k link, an MD4-variant hash is combined with the file size, providing sufficient information for locating file sources, downloading the file and verifying its contents. Magnet links are another example. Such file hashes are often the top hash of a hash list or a hash tree, which allows for additional benefits.
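The password verification application described above can be sketched as follows. Note that the sketch goes beyond the plain hashing in the text: it assumes a random per-user salt and the iterated PBKDF2-HMAC-SHA-256 function from the standard library, which is a common way to slow down guessing. The iteration count and the function names are illustrative assumptions.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # assumption: tune to the hardware in use


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the clear-text password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Hash the supplied password the same way and compare with the stored digest."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))
print(check_password("wrong guess", salt, stored))
```

Only the salt and the digest are stored, so a leaked password file does not directly reveal the passwords.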
The Secure Hash Algorithm (SHA-1), based on MD4, was proposed by the U.S. National Institute of Standards and Technology (NIST) for certain U.S. federal government applications.
The hash value is 160 bits long, and five 32-bit chaining variables are used. The compression function has four rounds, using the MD4 step functions f, g, and h as follows: f in the first, g in the third, and h in both the second and fourth rounds. Each round has 20 steps instead of 16. SHA-1 uses four non-zero additive constants. The byte ordering used for converting between streams of bytes and 32-bit words in the official SHA-1 specification is big-endian.
MD5 was designed as a strengthened version of MD4, before actual MD4 collisions had been found. It has enjoyed widespread use in practice, but has since also been found to have weaknesses. It produces a 128-bit hash value. MD5 has been employed in a wide variety of security applications, and is also commonly used to check the integrity of files. The input is processed in 512-bit blocks. The MD5 algorithm is designed to be quite fast on 32-bit machines. In addition, it does not require any large substitution tables, so it can be coded quite compactly. MD5 is slightly more complex and slower than MD4, but its design offers a higher level of security.
RIPEMD-160 (RACE Integrity Primitives Evaluation Message Digest) is a 160-bit message digest algorithm developed in 1996 by Hans Dobbertin, Antoon Bosselaers, and Bart Preneel. It is an improved version of RIPEMD, which in turn was based on the design principles used in MD4, and is similar in performance to the more popular SHA-1. RIPEMD-160 was designed in the open academic community, in contrast to the NSA-designed SHA-1 and SHA-2 algorithms. RIPEMD-160 takes a message as input in 512-bit blocks and returns a 160-bit message digest as output. The added complexity and number of steps of SHA-1 and RIPEMD-160 make them slower to compute than MD5. [18]
SHA-256 is a newer hash function that operates on 32-bit words. It processes the message in 512-bit blocks and produces a 256-bit message digest. After each message block is processed, the corresponding 32-bit words of the intermediate hash values are added together, and the value obtained after the final block is the message digest of the whole message.
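The digest and block sizes quoted above can be checked with Python's standard hashlib module. This is a small sketch; the helper name properties is hypothetical. RIPEMD-160 is left out because its availability depends on the OpenSSL build behind hashlib, and MD5 may be disabled on some restricted builds.

```python
import hashlib


def properties(name: str):
    """Return (digest size in bits, block size in bits) for a hashlib algorithm."""
    h = hashlib.new(name)
    return h.digest_size * 8, h.block_size * 8


for name in ("md5", "sha1", "sha256"):
    digest_bits, block_bits = properties(name)
    sample = hashlib.new(name, b"abc").hexdigest()
    print(f"{name:7s} digest={digest_bits:3d} bits  block={block_bits} bits  {sample}")
```

Each of the three algorithms consumes 512-bit blocks, while the digest lengths are 128, 160 and 256 bits respectively.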
2.3.7 Performance Results of Hash Functions
Table 2.1 summarizes the features of MD5, SHA-1, and RIPEMD-160. Table 2.2 compares the performance of the MD-like hash functions RIPEMD-160, RIPEMD-128, RIPEMD, SHA-1, MD5, and MD.
Table 2.1 Comparison of MD5, SHA-1, and RIPEMD-160 (features compared: basic unit of processing, number of steps, primitive logical functions)
Table 2.2 Performance of MD-like hash functions
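Relative performance of this kind can be measured on one's own machine with a short script. The sketch below is an assumption-laden illustration and not the benchmark behind Table 2.2: it times Python's hashlib implementations on a buffer of zero bytes, and the function names, buffer size and round count are arbitrary choices. Algorithms that the local build does not provide (RIPEMD-160 in particular) are skipped.

```python
import hashlib
import time


def throughput_mb_s(name: str, size: int = 1 << 20, rounds: int = 20) -> float:
    """Hash `rounds` buffers of `size` bytes and return megabytes per second."""
    data = bytes(size)
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return (size * rounds) / elapsed / 1e6


def measure_all(names=("md5", "sha1", "ripemd160", "sha256")):
    """Measure every listed algorithm that this Python build supports."""
    results = {}
    for name in names:
        try:
            results[name] = throughput_mb_s(name)
        except ValueError:  # algorithm not provided by this build
            continue
    return results


if __name__ == "__main__":
    for name, speed in measure_all().items():
        print(f"{name:10s} {speed:8.1f} MB/s")
```

The absolute figures depend heavily on the processor and on whether hardware acceleration is used, so only the ordering on a given machine is meaningful.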
2.4 State of the Art Today
Chet Hosmer provides an overview of the integrity techniques in use, describing a number of methods that information security and computer science have so far applied to the field of digital evidence. A checksum is a method for detecting errors in digital data. Typically a 16- or 32-bit polynomial is applied to each byte of the digital data to be protected. The result is a small integer value, 16 or 32 bits in length, that represents the contents of the data. This integer value must be saved and secured. At any point in the future the same polynomial can be applied to the data and the result compared with the original. If the results match, some level of integrity exists. The common types are CRC-16 and CRC-32. The advantages of checksums are that they are easy and fast to compute, require little storage, and are useful for detecting random errors. The drawback of this technique is that it gives little assurance against malicious attacks.
Another method is to use one-way hash algorithms such as MD2, MD4, MD5 and SHA to protect digital data against unauthorized change. The process generates a large fixed-length integer value (ranging from 80 to 240 bits) representing the digital data. The process is described as one-way because it has two distinctive properties. First, given a hash value, it is hard to construct new data that results in the same hash. Second, given the original data, it is hard to find any other data that produces the same hash value. The advantages are that it is easy to compute and can detect both malicious modification and random errors. The main drawbacks are that the hash values must be kept in secure storage, and that the method binds neither an identity nor a time to the data.
A digital signature is another secure method, which binds the identity of the signer to digital data integrity methods such as one-way hash values. All of these procedures use a public key cryptosystem in which the signer uses a private key to create a digital signature. Anybody can verify the signature with the help of the signer's published public key certificate. The common procedures used for digital signatures are RSA, DSA and PGP. The advantages of this method are that it binds identity to the integrity operation and prevents unauthorized regeneration of the signature unless the private key is compromised. The limitations of this technique are that it is slow, that the private key must be protected, and that it does not bind the data to a time.
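The strengths and limits of a checksum can be illustrated with the CRC-32 function from Python's zlib module. This is a minimal sketch; the sample messages and helper names are hypothetical.

```python
import zlib


def crc32(data: bytes) -> int:
    """CRC-32 of the data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


original = b"pay alice 100 dollars"
stored = crc32(original)

# Random error: a single flipped bit always changes the CRC.
damaged = bytearray(original)
damaged[4] ^= 0x01
assert crc32(bytes(damaged)) != stored

# Malicious change: no key is involved, so an attacker who replaces the
# data simply recomputes the checksum and replaces that as well.
forged = b"pay mallory 9999 dols"
forged_checksum = crc32(forged)

# CRC-32 is also affine: for equal-length inputs, the checksum of an XOR
# combination is predictable from the individual checksums.
a, b, c = b"AAAA", b"BBBB", b"CCCC"
assert crc32(xor_bytes(xor_bytes(a, b), c)) == crc32(a) ^ crc32(b) ^ crc32(c)
```

The checksum therefore protects only as long as the stored value itself cannot be altered, and it offers no resistance to someone deliberately constructing data with a chosen checksum.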
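The hash-then-sign idea behind digital signatures can be sketched with textbook RSA. The parameters below (p = 61, q = 53) are a toy assumption chosen so that the arithmetic is visible; they offer no security, and real schemes use much larger keys together with a padding scheme. The function names are hypothetical.

```python
import hashlib

# Toy textbook parameters, far too small for real use.
N = 61 * 53   # public modulus, 3233
E = 17        # public exponent
D = 2753      # private exponent: (E * D) % ((61 - 1) * (53 - 1)) == 1


def digest_mod_n(message: bytes) -> int:
    """Hash the message and reduce the digest into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Signer: apply the private exponent to the message digest."""
    return pow(digest_mod_n(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone: apply the public exponent and compare with a fresh digest."""
    return pow(signature, E, N) == digest_mod_n(message)


document = b"evidence file contents"
signature = sign(document)
print(verify(document, signature))              # valid signature
print(verify(document, (signature + 1) % N))    # altered signature
```

Only the holder of D can produce a signature that verifies, while verification needs nothing more than the public values N and E. Because the toy modulus is so small, different documents can share a reduced digest, which is one of the reasons it is unsuitable for real use.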