Data integrity requirements

Chapter 5 XML data integrity based on concatenated hash function

This chapter analyzes the XML data integrity requirements that arise in practice. Based on the summarized requirements, the following sections build the integrity model CSR for XML data. The specification for expressing the integrity model is also described. After evaluation and comparison, the advantages and disadvantages are given at the end. The major work of this chapter has been published in the International Journal of Computer Science and Information Security (Liu et al., 2009b).

5.1 Introduction

Data integrity has applications in many domains, including e-government, e-commerce, e-financial services, e-business, e-banking, e-learning, e-healthcare, mobile communications, heterogeneous networks, digital factories, multi-agent systems, and grid computing (Wu et al., 2002; Chen and Lu, 2004; Rushinek, 2002; Boritz and No, 2005; Jones et al., 2000; O'Neill, 2007; Yee et al., 2006; Blobel, 2004; Dankers et al., 2002; Ekelhart et al., 2008; Karnouskos, 2005; Woerner and Woern, 2005; Oliveria et al., 2006; Cody et al., 2008). For example, Wu and Chen described the need for data integrity when official documents are transmitted between government agencies for e-government in Taiwan (Wu et al., 2002; Chen and Lu, 2004). O'Neill pointed out the importance of data integrity through an assessment of a bank's web service (O'Neill, 2007). IBM gives the following example of data integrity: assume the data is a funds transfer and the hacker alters a random piece of the data that happens to be the account number. When the bank decrypts the data, the account number is not a valid account; therefore, the data tampering is detected and the transaction is not completed. However, assume instead that the data altered by the hacker is the amount of money, changed from 1000 units to 9000 units (IBM, 2008). In this case, the transaction would be completed using the incorrect amount. Therefore, research into this area would be of great benefit.

However, existing integrity models generate a digest value for XML data content without considering the features of XML data. For non-XML data formats, a user can generate a digest value of the data content directly to ensure integrity, but protecting content integrity alone is not enough for XML data. For example, signed XML data can be copied into another document while its signature remains valid. An attacker can exploit this to forge a document with a valid signature. Therefore, besides data content integrity, XML data integrity should also consider element location information and element context meaning in a fine-grained security setting. Location information of an XML element refers to the position of this element in the XML data (McIntosh and Austel, 2005). An element's full meaning depends on its position in the XML data, and the element loses its original meaning if the position is changed. Thus, XML data integrity should also protect the location information of an XML element. Another factor that affects the meaning of XML elements is the context relationship: an element no longer has its original meaning without its context relationship in the XML data. This paper defines this as context referential integrity. In other words, an XML element has its full meaning only in relation to other elements in the same XML data, but existing integrity models for XML data provide no mechanism to protect this meaning.

Furthermore, most of these models are based on the Merkle hash tree (Devanbu et al., 2001; Bertino et al., 2004). When generating a digest value, the Merkle hash tree introduces additional virtual nodes. These virtual nodes increase the number of hash operations, which lowers the efficiency of digest value-generation.

Motivated by the problems above, this chapter aims to present XML data integrity requirements combined with XML data features. Based on the XML data integrity requirements presented, it proposes an integrity model for XML data, and improves the efficiency of digest value-generation for XML data.

This chapter presents an XML data integrity model named CSR. The model consists of three parts, and CSR is an acronym for these parts: 'C' for content integrity, 'S' for structure integrity, and 'R' for context referential integrity. The three parts are combined with a concatenated hash function. Content integrity ensures the integrity of XML data content by using a concatenated hash function. Structure integrity protects the location information of an element in XML data by hashing an absolute path string from the root node. Finally, context referential integrity protects the integrity of context-related elements. This paper also describes how the model is combined with the XML specification, and integrates the presented integrity model into the XML signature. The evaluation shows that the presented integrity model generates digest values more efficiently than the Merkle hash tree-based integrity model for XML data.

The major contributions of this chapter are a set of XML data integrity requirements that take XML data features into account, and an integrity model for XML data that satisfies these requirements with higher efficiency. The details are as follows.

  1. A description of XML data integrity requirements related to XML data features under fine-grained XML security. The three aspects considered are content integrity, structure integrity, and context referential integrity.
  2. An integrity model for XML data, built on a concatenated hash function according to the presented requirements.
  3. A method that uses a concatenated hash function to generate digest values for XML data, with higher efficiency than the Merkle hash tree-based digest value-generation process.

5.2 Theory guidance for XML data integrity

Well-established means exist to assure information integrity, such as hashes or checksum mechanisms (Geuer-Pollman, 2004). Both approaches can be used to detect changes to the original message. However, hashes are aimed at malicious changes, while checksums are deployed to detect accidental changes.

In this chapter, data integrity is ensured by a hash function mechanism. The reasons for adopting a hash function as the integrity method are as follows (Geuer-Pollman, 2004).

  • A checksum is useful in detecting accidental modification such as corruption to stored data or errors in a communication channel.
  • Checksums provide no security against a malicious agent, as their simple mathematical structure makes them easy to break. The CRC family is an example.
  • A hash function is one-way and collision-resistant and rests on a more complex mathematical model, so it provides a higher level of security than a checksum.
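
The "simple mathematical structure" of a checksum can be illustrated with a short Python sketch. It relies on a known property of CRC-32: for three messages of equal length, the checksum of their XOR equals the XOR of their checksums, so checksums of unseen messages can be predicted from known ones. The messages and the helper `xor_bytes` are hypothetical and chosen only for illustration; SHA-256 stands in here for a generic cryptographic hash.

```python
import hashlib
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR several equal-length byte strings together (illustrative helper)."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def sha256_int(data: bytes) -> int:
    """SHA-256 digest as an integer, so digests can be XORed like CRC values."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


# Hypothetical funds-transfer messages, all of the same length.
a = b"amount=1000;acct=12345"
b = b"amount=9000;acct=12345"
c = b"amount=5000;acct=99999"
mixed = xor_bytes(a, b, c)

# CRC-32 is affine over XOR: the checksum of the mix is predictable.
crc_is_affine = zlib.crc32(mixed) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)

# A cryptographic hash shows no such relation.
sha_is_affine = sha256_int(mixed) == sha256_int(a) ^ sha256_int(b) ^ sha256_int(c)

print("CRC-32 affine:", crc_is_affine)
print("SHA-256 affine:", sha_is_affine)
```

The relation holds for CRC-32 regardless of the message contents, which is why a checksum detects accidental corruption but offers no protection against deliberate modification.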

5.3 XML data integrity model CSR based on concatenated hash function

The integrity model in this paper draws on the models presented by DOM-HASH and Bertino, although the construction process is different. The integrity model presented by Bertino is based on the Merkle hash tree (Bertino et al., 2004). In this paper, the integrity model CSR is constructed on the theory of the concatenated hash function. Like the Merkle hash tree, the concatenated hash function is designed to hash tree structures. The reasons for adopting a concatenated hash function to construct the integrity model for XML data are as follows.

  • Concatenated hash functions can handle arbitrary tree structures, whereas the Merkle hash function mainly deals with binary tree structures (Merkle, 1989). Thus, a concatenated hash function is more suitable for XML data.
  • Concatenated hash functions reduce the number of hash operations, so they generate digest values for XML data more efficiently than the Merkle hash tree.
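
The efficiency argument can be made concrete by counting hash invocations. The sketch below is one possible reading of the two constructions, not the exact algorithms of the cited works: the concatenated variant hashes a node's label together with all child digests in a single call, while the binary variant first reduces the child digests pairwise, which corresponds to the extra virtual nodes. The `Node` class, the `CountingHash` wrapper, and the sample tree are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


class CountingHash:
    """SHA-1 wrapper that records how many times it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, data: bytes) -> bytes:
        self.calls += 1
        return hashlib.sha1(data).digest()


def concatenated_digest(node: Node, h: CountingHash) -> bytes:
    """One hash per node: label followed by all child digests.

    A production scheme would also length-prefix the label to avoid
    ambiguity; that is omitted here for brevity.
    """
    child_digests = b"".join(concatenated_digest(c, h) for c in node.children)
    return h(node.label.encode("utf-8") + child_digests)


def binary_merkle_digest(node: Node, h: CountingHash) -> bytes:
    """Child digests are combined pairwise (virtual nodes) before the node hash."""
    level = [binary_merkle_digest(c, h) for c in node.children]
    while len(level) > 1:
        paired = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # odd digest is carried up unchanged
        level = paired
    return h(node.label.encode("utf-8") + (level[0] if level else b""))


# Hypothetical element with four leaf children.
tree = Node("Certificate", [Node("Title"), Node("Description"),
                            Node("Measurements"), Node("Results")])

concat_hash, merkle_hash = CountingHash(), CountingHash()
concatenated_digest(tree, concat_hash)
binary_merkle_digest(tree, merkle_hash)
print("concatenated:", concat_hash.calls, "binary Merkle:", merkle_hash.calls)
```

For this five-node tree the concatenated construction needs one hash call per node, while the pairwise reduction adds three calls for the virtual nodes. The gap grows with the number of children per element.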

The basic idea of integrity model CSR is that content integrity, structure integrity, and context referential integrity are combined with a concatenated hash function. The chapter first analyzes the requirements of XML data integrity, then describes the model definition.

5.3.1 XML data integrity requirements

To illustrate the requirements of XML data integrity, an example is given in figure 5.1; it is a real application document derived from a website. Note that some details have been omitted.

  • Content integrity (CI)

XML data content refers to the element name, attributes, and values of an element or of a sub-tree of the XML data. Content integrity means that XML data content is not changed or destroyed during transmission or storage. This is ensured by generating a digest value of the XML data. As shown in figure 5.1, content integrity for element 'Title' should cover the tag name 'Title' and the related value 'Certificate of calibration'.

  • Data structure integrity (STI)

Structure integrity protects the location information of an element in XML data (McIntosh and Austel, 2005). If the location of an element in the XML data is changed, verification fails. Location information of an XML element refers to the position of this element in the XML data and consists of three parts: parent, level, and order among siblings. This position helps readers to understand the meaning of the element. In other words, an element has different meanings when it is located in different positions in the XML data. As shown in figure 5.1, there are three 'Description' elements, in lines 04, 07, and 11. Each 'Description' element has a completely different meaning depending on its location: line 04 is a description of the certificate information; line 07 is a description of the measurement; line 11 is a description of the measured results. Thus, location information for an XML element is an important aspect and needs to be protected.

  • Context referential integrity (CRI)

In most situations, when the XML data format is adopted, an element considered on its own, without its context relationship, also loses its original meaning. For example, as shown in figure 5.1, the measurement result has a complete meaning only in relation to the measurement method or technique deployed in the certificate. These two elements are produced by different responsible parties, so they cannot be signed by a single person or signed together, because each unit is responsible only for its own role. In this situation, element 'Certificate/Results' has a complete meaning only in relation to element 'Certificate/Measurements': a given set of test results corresponds to a given measurement. In other words, an XML element has its full meaning only when related to other elements in the same XML data, and these elements are defined as context-related elements in this paper.

Context referential integrity is used to protect context-related elements of an element in XML data. It will provide a binding between an element and context-related elements. This means if context-related elements of an element have been altered, it will also lead to an invalid verification.
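
The binding described above can be sketched in Python. The formal definition of the model is not reproduced in this section, so the following is only an assumed construction: the digest of an element is hashed together with the digests of its context-related elements, so that altering a related element invalidates verification even though the element itself is unchanged. The function names and the sample document are hypothetical, and `ET.tostring` is used in place of proper XML canonicalization.

```python
import hashlib
import xml.etree.ElementTree as ET


def content_digest(element: ET.Element) -> bytes:
    """Digest of one element's serialized content (no canonicalization here)."""
    return hashlib.sha1(ET.tostring(element, encoding="utf-8")).digest()


def cri_digest(element: ET.Element, related: list) -> bytes:
    """Bind an element to its context-related elements in a single digest."""
    h = hashlib.sha1()
    h.update(content_digest(element))
    for node in related:
        h.update(content_digest(node))
    return h.digest()


# Hypothetical certificate: 'Results' only makes sense given 'Measurements'.
doc = ET.fromstring(
    "<Certificate>"
    "<Measurements>method A</Measurements>"
    "<Results>42.0</Results>"
    "</Certificate>"
)
results = doc.find("Results")
measurements = doc.find("Measurements")

results_before = content_digest(results)
signed = cri_digest(results, [measurements])

# Tamper with the context-related element only.
measurements.text = "method B"
results_after = content_digest(results)
tampered = cri_digest(results, [measurements])

print("content digest unchanged:", results_before == results_after)
print("binding still verifies:", signed == tampered)
```

The content digest of 'Results' alone does not notice the change, whereas the combined digest does, which is the property context referential integrity is meant to provide.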

In summary, the basic requirement for XML data integrity is that XML data has not been changed or destroyed or lost in an unauthorized manner. Considering the features of XML data as analyzed above, the detailed integrity requirements for XML data are as follows.

  • XML data content, including element name, value, and attribute, has not been changed, destroyed, or lost.
  • Element location information, including element parent, level, and order in sibling, should be protected in an XML data.
  • In order to preserve the complete meaning of an element in XML data, context-related elements should be protected together with this element.
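
The location-information requirement can be made concrete with a small Python sketch that derives an absolute path string for each element and hashes it. The sample document is hypothetical and only loosely modelled on the element names mentioned in the text, and the path format (tag name plus position among all siblings) is an assumption, not the notation used by the model.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical document with three 'Description' elements in different places.
DOC = """<Certificate>
  <Title>Certificate of calibration</Title>
  <Description>certificate information</Description>
  <Measurements><Description>measurement</Description></Measurements>
  <Results><Description>measured results</Description></Results>
</Certificate>"""


def location_paths(root: ET.Element) -> dict:
    """Map each element to a path encoding its parent chain, level, and sibling order."""
    paths = {root: f"/{root.tag}[1]"}
    stack = [root]
    while stack:
        parent = stack.pop()
        for index, child in enumerate(parent, start=1):
            paths[child] = f"{paths[parent]}/{child.tag}[{index}]"
            stack.append(child)
    return paths


def structure_digest(path: str) -> str:
    """SHA-1 digest of the absolute path string, in hexadecimal."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


root = ET.fromstring(DOC)
paths = location_paths(root)
descriptions = [paths[e] for e in root.iter("Description")]

for path in descriptions:
    print(path, structure_digest(path))
```

The three 'Description' elements share a tag name but receive different paths and therefore different digests, so moving one of them to another position would be detected.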

5.3.2 Definition of integrity model CSR

To develop a formal model for XML data integrity this paper introduces a definition for XML data as in definition 5.1. Based on the requirement for XML data integrity presented above and XML data definition, the integrity model CSR for XML data is defined as follows:

5.4 Combination with XML specification

XML security research has two sides: firstly, how traditional security technologies can be used to solve problems existing in XML data; secondly, how to describe the security technologies in XML format. Based on the theoretical model presented for XML data integrity, this subsection describes how the model is expressed in XML format. XML data content integrity is already described in the XML signature specification by W3C; thus, this section only describes structure integrity and context referential integrity.

5.4.1 Specification for structure integrity

The structure integrity is ensured by three elements as follows.

  • The 'STIGenerate Algorithm' is an element, which describes the algorithm used to generate the location information of an element in the original XML data.
  • The content of the 'DigestMethod' element is the definition of digest algorithm adopted in this specification, and the default algorithm is SHA-1.
  • The content of the 'DigestValue' element shall be the base64 encoding of the 160-bit SHA-1 digest, viewed as a 20-octet octet stream.
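
As a minimal sketch of the last two items, the following Python snippet computes a SHA-1 digest and encodes it the way a 'DigestValue' is described above. The input path string is a hypothetical example of location information; how the 'STIGenerate Algorithm' actually serializes a location is not specified here.

```python
import base64
import hashlib

# Hypothetical location information for one element.
location = "/Certificate[1]/Results[4]/Description[1]"

raw_digest = hashlib.sha1(location.encode("utf-8")).digest()  # 20 octets
digest_value = base64.b64encode(raw_digest).decode("ascii")   # text for 'DigestValue'

print(len(raw_digest), digest_value)
```

A 20-octet digest always encodes to 28 base64 characters, the last of which is padding.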

5.4.2 Specification for context referential integrity

In the specification, context referential integrity includes four elements as follows:

  • The 'CRIGenerate Algorithm' is an element, which describes the algorithm used to generate the digest value of context-related elements.
  • The 'RelatedNode' element records the context-related elements.
  • The content of the 'DigestMethod' element is the definition of digest algorithm adopted in this specification, and the default algorithm is SHA-1.
  • The content of the 'DigestValue' element shall be the base64 encoding of the 160-bit SHA-1 digest, viewed as a 20-octet octet stream.

5.7 Summary

Data integrity is fundamental to data authentication. A major problem for XML data authentication is that signed XML data can be copied into another document while its signature remains valid. This is caused by the way XML data integrity is currently protected. Through investigation, this paper found that, besides data content integrity, XML data integrity should also protect element location information and context referential integrity in a fine-grained security setting. The aim of this paper is to propose a model for XML data integrity that considers XML data features. This chapter presents an XML data integrity model named CSR (content integrity, structure integrity, context referential integrity) based on a concatenated hash function. XML data content integrity is ensured using an iterative hash process, structure integrity is protected by hashing an absolute path string from the root node, and context referential integrity is ensured by protecting context-related elements. The presented XML data integrity model satisfies the integrity requirements of fine-grained security and is compatible with the XML signature. The evaluation shows that the presented integrity model generates digest values more efficiently than the Merkle hash tree-based integrity model for XML data.
