XML has become the standard way of representing and transferring data over the World Wide Web. The problem with XML documents is their very high degree of redundancy, which means that they demand large storage capacity and high network bandwidth for transmission.
Because they are so widely used, XML documents are often queried through vague queries written by naive users with little background in writing a good query.
This report addresses these two problems by designing a system with two stages. The first stage is a new compression technique called XQPoint. This technique separates the XML document into containers and compresses each container using a back-end compressor suited to the type of data in that container. The second stage is a vague query processor, which separates each query into sub-queries and retrieves the relevant information from the compressed XML documents accordingly. Only the most relevant information is decompressed and returned to the user. We expect that XQPoint will achieve a better compression ratio and that the query processor will be the first to retrieve information from compressed XML documents according to vague queries.
The eXtensible Markup Language (XML) is a W3C standard that has been adopted and sustained by both industry and the research community. In recent years, we have witnessed an increasing volume of digital information that is either created directly as XML or converted from another data representation. The importance of XML comes from several factors: its ability to represent different data types in one document, its answer to the problem of long-term accessibility, and its role as a solution to the interoperability problem (Al-Hamadani et al., 2009).
Because the schema is replicated in each record, an XML document is considered a self-describing data file, which means that it carries a great deal of redundancy in both its tags and its attributes (Ray, 2001). For these reasons, the need to compress XML documents is becoming increasingly pressing. Furthermore, a strong need has evolved to retrieve information directly from the compressed documents and then decompress only the retrieved information (Ferragina et al., 2006).
Because XML documents are used so widely and by such different kinds of users, it has become important to deal with all kinds of queries. Some of these queries have imprecise constraints that cannot be processed directly because of the grammar limitations of the query languages. Such queries, known as vague queries, are common when the users of the XML documents have little knowledge of the document structure or lack the skills to write a precise and meaningful query.
The remedy for these two problems is a compression technique that can retrieve information from the compressed version according to vague queries. Two types of XML compressors have been used so far. The first type is the non-queriable compressors, which are used to compress XML documents for archival purposes. The second type is the queriable compressors, which allow the compressed XML documents to be queried. None of the compressors of the second type solves the problem of vague queries.
This report proposes a new XML compressor technique called "XQPoint" which consists of two stages. In the first stage, it separates the data part of the XML document from the structure part, then compresses the data part using suitable compressors depending on the type of the data, while the structure part is compressed using the fixed-point dictionary-based compressor. The second stage is to process the vague queries by decomposing them into multiple sub-queries, retrieve information from the compressed XML document according to each sub-query, combine the retrieved results into groups, and finally return only the most relevant groups.
2. Theoretical perspective
This section describes the background of the research. It consists of four parts: XML compression techniques, a brief definition of XPath and NEXI, query types, and answering vague queries.
The first sub-section describes the differences between XML compressors, gives a brief description of some of these compressors, and compares them. Since our system will use the NEXI query language, a brief description of the structure of this type of query is given. We then discuss the types of queries and concentrate on the vague query type.
2.1 XML compression techniques
Recently, a large number of XML compression techniques have been proposed, each with different characteristics. This section discusses the differences between these compressors and their main features.
XML compressors can be classified into two classes: XML-blind and XML-conscious compressors. XML-blind, or general-purpose, compressors treat the XML document as a traditional text document, ignore its structure, and apply general-purpose text compression techniques to it. These techniques can in turn be classified into two main classes: arithmetic compressors and dictionary compressors (Augeri et al., 2007, Augeri, 2008). Arithmetic compressors encode the input according to the estimated probability of each symbol, so that frequent symbols cost fewer bits than rare ones. PPM, CACM3, and PAQ are examples of this kind of compressor (Moffat., 1990, Cleary and Witten, 1984, Alistair et al., 1998). Dictionary compression techniques, by contrast, substitute each string in the input with a reference to a dictionary maintained by the encoder. WinZip, GZIP, and BZIP2 are given as examples of this compression class (BZip2, 1996, GZip, 1992, WinZip, 1990).
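To make the XML-blind case concrete, the following minimal sketch compresses a small, highly repetitive XML document with two general-purpose compressors from the Python standard library. The sample document is invented for this illustration, and `zlib` (the DEFLATE algorithm behind gzip) and `bz2` merely stand in for the tools named above.

```python
import bz2
import zlib

# Hypothetical, highly repetitive XML document built only for this illustration.
records = "".join(
    f"<book><title>Title {i}</title><year>{2000 + i % 10}</year></book>"
    for i in range(500)
)
xml_bytes = f"<library>{records}</library>".encode("utf-8")

# XML-blind compression: the document is treated as plain bytes,
# with no knowledge of tags, attributes, or nesting.
deflated = zlib.compress(xml_bytes, 9)
bzipped = bz2.compress(xml_bytes, 9)

for name, blob in (("zlib", deflated), ("bz2", bzipped)):
    print(f"{name}: {len(xml_bytes)} -> {len(blob)} bytes")

# Both compressors are lossless, so the original bytes are recovered exactly.
restored = zlib.decompress(deflated)
```

The repeated tag names are exactly the redundancy these compressors exploit, but the output is an opaque byte stream that cannot be queried without full decompression.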
XML-conscious compressors, in contrast, try to exploit the structure of XML documents in order to achieve a better compression ratio and a shorter compression time than the XML-blind type, and to generate compressed documents that remain usable without being decompressed. XML-conscious compressors can be classified, according to whether the compressed documents can be queried, into two main sub-classes: queriable and non-queriable compressors.
2.1.1. Non queriable XML compressors:
This kind of compressor shows good compression performance, but the resulting document cannot be queried without decompressing it. The main purpose of these compressors is to achieve the highest compression ratio for archival purposes. Examples of this type are:
XMill (Liefke and Suciu, 2000): This technique compresses the structure (i.e. tags and attributes) of the XML document separately from its data, encoding the structure in a dictionary-based fashion and then passing it to a back-end compressor. Every element and attribute name is assigned an integer that serves as its key in the dictionary. The data part is grouped into containers depending on the type of the data and its path from the root. Each container is compressed separately using a compression technique suited to the data type in that container.
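The separation of structure and data described for XMill can be sketched as follows. This is only an illustration of the idea, not XMill's actual format: the function name, the end-of-element marker `0`, and the use of `zlib` for every container are assumptions, attributes are ignored, and containers are keyed by path alone rather than by path and data type.

```python
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict

def split_into_containers(xml_text):
    """Return (tag_dictionary, structure_tokens, containers)."""
    root = ET.fromstring(xml_text)
    tag_ids = {}                       # element name -> integer key
    structure = []                     # sequence of tag ids and end markers
    containers = defaultdict(list)     # root-to-node path -> text values

    def visit(elem, path):
        path = f"{path}/{elem.tag}"
        tag_id = tag_ids.setdefault(elem.tag, len(tag_ids) + 1)
        structure.append(tag_id)
        text = (elem.text or "").strip()
        if text:
            containers[path].append(text)
        for child in elem:
            visit(child, path)
        structure.append(0)            # 0 marks the end of an element

    visit(root, "")
    return tag_ids, structure, dict(containers)

doc = (
    "<lib><book><title>A</title><year>2001</year></book>"
    "<book><title>B</title><year>2002</year></book></lib>"
)
tag_ids, structure, containers = split_into_containers(doc)

# Each container is compressed on its own; a real system would pick a
# compressor per data type instead of using zlib everywhere.
compressed = {
    path: zlib.compress("\n".join(values).encode("utf-8"))
    for path, values in containers.items()
}
```

Grouping values that share a path puts similar data next to each other, which is what lets a type-specific back-end compressor do better than a single pass over the whole document.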
Millau (Girardot and Sundaresan, 2000): In order to compress the structure of the XML documents, Millau takes advantage of the document schema if this schema is available. It is an extension of the WBXML (Wireless Application Protocol Binary XML) format, which is designed to reduce the size of XML documents for transmission purposes.
XWRT (Skibinski et al., 2007): This technique is similar in idea to XMill, with a slight difference. The tag and attribute names in an XML document, which normally occur with high frequency within the same document, are encoded using a semi-dynamic dictionary. The XML document is first scanned to determine the frequent tags and attributes, which are put in the dictionary. A second scan of the document then replaces all occurrences of the dictionary words with their dictionary indices.
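The two-pass, semi-dynamic dictionary idea can be illustrated with a short sketch. It is not XWRT's real encoding: the `#index` token format, the `min_count` parameter, and the restriction to element names (attribute names and ordinary words are left alone) are assumptions made to keep the example small.

```python
import re
from collections import Counter

# Matches the name in a start tag "<name" or an end tag "</name".
TAG_NAME = re.compile(r"</?([A-Za-z_][\w.-]*)")

def build_dictionary(xml_text, min_count=2):
    # Pass 1: count element names and keep the frequent ones.
    counts = Counter(TAG_NAME.findall(xml_text))
    frequent = [name for name, c in counts.most_common() if c >= min_count]
    return {name: index for index, name in enumerate(frequent)}

def encode(xml_text, dictionary):
    # Pass 2: replace every frequent name with its dictionary index.
    def substitute(match):
        name = match.group(1)
        if name not in dictionary:
            return match.group(0)
        prefix = match.group(0)[: -len(name)]   # "<" or "</"
        return f"{prefix}#{dictionary[name]}"
    return TAG_NAME.sub(substitute, xml_text)

doc = "<lib><book>A</book><book>B</book></lib>"
dictionary = build_dictionary(doc)
encoded = encode(doc, dictionary)
print(dictionary)
print(encoded)
```

Because the dictionary is built from the document itself, it has to be stored alongside the encoded output so that the decoder can reverse the substitution.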
RNGzip (League and Eng, 2007): This technique compresses the XML document with respect to a Relax NG schema. The schema must first be agreed upon by the sender and the receiver, and it acts as a key in the encoding and decoding process. Using this schema, RNGzip builds a tree automaton for the specific XML document. Only a small amount of information then needs to be transmitted, from which the receiver reconstructs the complete XML document.
2.1.2. Queriable XML compressors:
The main goal of this type of compressor is to allow the compressed version of the XML document to be queried without decompressing it. The compression ratio of these techniques is much lower than that of the XML-blind or the non-queriable techniques. However, these techniques are important for resource-limited applications and mobile devices. A brief description of some of these techniques follows.
XGrind (Tolani and Haritsa, 2000): In 2000, Tolani and Haritsa introduced the first queriable XML compressor, which can query the compressed file without fully decompressing it. It is considered a homomorphic compressor: the compressed XML document can be viewed like the original XML document, except that its tag, element and attribute names are replaced with their corresponding dictionary-based encodings. The data part of the document is encoded using Huffman encoding. To query the compressed document, XGrind's query processor finds each simple path and checks whether it satisfies the path in the given query. The main drawback of XGrind is that it can handle only exact-match and prefix-match queries on the compressed documents.
Xpress (Min et al., 2003): This technique uses the reverse arithmetic encoding method to encode the labels and paths of the XML document. Instead of representing each tag by a unique identifier, as XGrind does, Xpress encodes a label path as a distinct interval between 0.0 and 1.0. To encode the data part of the XML document, Xpress chooses among different compression techniques depending on the type of the data, without the need for human intervention.
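The idea of mapping label paths to intervals can be sketched as below. This is a simplified illustration in the spirit of reverse arithmetic encoding, not Xpress's published algorithm: the frequency table, the alphabetical ordering of tags, and the exact narrowing rule are assumptions. The point it shows is that a longer path receives an interval nested inside the interval of any of its suffixes, so a suffix query becomes an interval containment test.

```python
from fractions import Fraction

def tag_intervals(frequencies):
    """Partition [0, 1) among tags in proportion to their frequencies."""
    total = sum(frequencies.values())
    intervals, low = {}, Fraction(0)
    for tag in sorted(frequencies):
        high = low + Fraction(frequencies[tag], total)
        intervals[tag] = (low, high)
        low = high
    return intervals

def path_interval(path, intervals):
    """Interval of a label path, narrowing from the last tag backwards."""
    low, high = intervals[path[-1]]
    if len(path) == 1:
        return low, high
    parent_low, parent_high = path_interval(path[:-1], intervals)
    width = high - low
    return low + width * parent_low, low + width * parent_high

def contains(outer, inner):
    """True when interval `inner` lies inside interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

# Hypothetical tag frequencies for a small library document.
iv = tag_intervals({"lib": 1, "book": 2, "title": 1})
full = path_interval(["lib", "book", "title"], iv)

# A query such as //book/title matches lib/book/title by containment.
print(contains(path_interval(["book", "title"], iv), full))
print(contains(path_interval(["lib", "title"], iv), full))
```

Exact fractions are used here only to keep the containment test free of rounding effects; a practical encoder would work with fixed-precision numbers.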
XQZip (Cheng and NG, 2004): Unlike XQueC, XQZip groups the XML data into blocks and then applies the gzip compressor to them. To process a query, it decompresses the specific block needed in order to retrieve its contents. This technique removes the duplicate structures that occur in an XML document in order to improve query performance. Although XQZip processes different types of XPath queries, it is slower than other compressors because of its partial decompression.
XSeq (Lin et al., 2005): This technique adapts Sequitur, a grammar-based text compression algorithm, to compress the containers. Sequitur is a linear-time algorithm that builds a context-free grammar for the input string. XSeq uses this grammar to process the data values that match the given query and to avoid scanning irrelevant data. Furthermore, the context-free grammar allows XSeq to process queries without even partial decompression.
XQueC (Arion et al., 2007): This technique separates the data from the structure of XML documents. The data are stored in containers according to their path location within the document, and each container element is compressed individually. This benefits retrieval, since a complete container can be retrieved in response to a query. With this idea, XQueC can process more types of queries on the compressed version without the partial decompression used in some previous compressors.
QXT (Skibinski and Swacha, 2007): This is an extension of XWRT that adds query-friendly concepts in order to process queries by partial decompression. The technique scans the XML file twice. In the first pass, a dynamic dictionary is created together with the frequencies of its items. This dictionary is stored within the compressed file. In the second pass, QXT encodes the data and places them into containers. When the size of a container exceeds a given threshold, the container is compressed using a general-purpose compressor and written to disk. To process a query, the QXT query processor first searches the dictionary to determine which container should be decompressed. After that container is decompressed, only the relevant data are decoded to XML format.
2.2 XPath and NEXI query languages
XPath is an expression language, not a programming or query language per se (Kay, 2008). Its main purpose is to return a node or several nodes from an XML document according to a specific expression. XPath's three data model categories and three operation categories are its main building blocks.
According to Kay (2004), a typical path expression in XPath consists of a sequence of steps separated by the «/» operator. Each step works by following a relationship between nodes in the document (Holman, 2002, Andrew Watt, 2002). Several further symbols are used in XPath expressions, such as:
- «*» indicates the selection of all elements and attributes.
- «@» indicates an attribute.
- «//» indicates all the nodes below the current node (its descendants).
Path expressions thus provide a very powerful mechanism for selecting nodes within an XML document, and this power lies at the heart of the XPath language (Kay, 2004, Sigurbjornsson and Trotman, 2003).
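The symbols listed above can be tried out with the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module. The sample document is invented for this illustration, and in this subset «*» selects child elements only.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library>"
    "<book id='b1'><title>XML Basics</title><author>Lee</author></book>"
    "<shelf><book id='b2'><title>Compression</title></book></shelf>"
    "</library>"
)

# "/" separates steps: books that are direct children of the root.
direct_books = doc.findall("./book")

# "*" selects all child elements of each matched node.
children = [e.tag for e in doc.findall("./book/*")]

# "//" reaches descendants at any depth below the current node.
all_titles = [t.text for t in doc.findall(".//title")]

# "@" refers to an attribute, here used inside a predicate.
b2 = doc.findall(".//book[@id='b2']")

print(children)
print(all_titles)
```

Note how `./book` finds only the first book, while `.//title` also reaches the title nested inside `shelf`.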
Narrowed Extended XPath I (NEXI) is an XML query language that follows XPath with some modifications. First, a NEXI retrieval engine is designed to deduce the semantics from the query, in contrast to XPath, which has predefined semantics. Furthermore, NEXI replaces the contains() function, which XPath uses to indicate that an element contains specific content, with the about() function, which indicates that an element is about the content. This modification allows NEXI to process fuzzy queries. The language has extensions for question answering, multimedia searching, and searching heterogeneous document collections (Trotman and Sigurbjörnsson, 2005).
A NEXI query of this kind requires that a certain context (i.e., path) be relevant to a specific content description (i.e., cont) (Trotman and Sigurbjörnsson, 2005).
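As a rough illustration of the context and content parts of a NEXI query, the sketch below pulls the path, the target of about(), and the keywords out of a query string. It handles only simple about() filters, not the full NEXI grammar, and the example query and the `parse_nexi` helper are hypothetical.

```python
import re

# One filtered step: //path[about(target, keywords)]
ABOUT = re.compile(
    r"(?P<path>//[\w/*]+)\[about\(\s*(?P<target>[^,]+?)\s*,\s*(?P<cont>[^)]+?)\s*\)\]"
)

def parse_nexi(query):
    """Return (context path, about() target, keyword list) for each filtered step."""
    return [
        (m["path"], m["target"], m["cont"].split())
        for m in ABOUT.finditer(query)
    ]

query = "//article[about(., XML compression)]//sec[about(.//title, vague query)]"
steps = parse_nexi(query)
for path, target, keywords in steps:
    print(path, target, keywords)
```

Each tuple pairs a structural context with a content description, which is the split that a content-and-structure retrieval engine works from.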
2.3 Types of queries
There are different types of queries, and most of them have been processed by previous compression techniques in order to retrieve information from compressed XML documents (Lin et al., 2005). Table 2 shows the main types of queries with a brief description of each.
2.4 Vague queries
Since vague queries are the central issue in our research, this section gives a brief description of such queries and of how they arise in the information retrieval domain.
Much imprecise and uncertain data exists in the real world. Since it is important to answer any user's query with exact or approximate answers even when the query has vague conditions, the need to process vague queries is increasing rapidly (Zhao and Ma, 2009).
Vague logic is a generalization of fuzzy logic (Kumar and Biswas, 2009). According to the vague set theory of Gua W. L. and Buehrer (1993), vague search is a combination of the following search techniques:
- a-vague-equality search, and
- Vague-proximity search
A vague set (VS) is a combination of two sets: (1) 'evidence for', or truth membership $t_A(x)$ for the element x in the vague set A, and (2) 'evidence against', or false membership $f_A(x)$ for the element x in the vague set A, such that:
$$t_A(x) + f_A(x) \le 1$$
Furthermore, each membership $\mu(u)$ in a vague set A is graded by the subinterval $[t_A(u),\ 1 - f_A(u)]$, i.e. $0 \le t_A(u) \le \mu(u) \le 1 - f_A(u) \le 1$ (Liu et al., 2008).
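A vague membership value can be modelled in a few lines. The sketch below assumes the standard vague-set constraint that the truth and false memberships each lie in [0, 1] and sum to at most 1; the class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VagueValue:
    t: float  # truth membership t_A(x), evidence for
    f: float  # false membership f_A(x), evidence against

    def __post_init__(self):
        if not (0.0 <= self.t <= 1.0 and 0.0 <= self.f <= 1.0):
            raise ValueError("memberships must lie in [0, 1]")
        if self.t + self.f > 1.0:
            raise ValueError("t + f must not exceed 1")

    def interval(self):
        """Subinterval [t, 1 - f] that bounds the membership grade."""
        return (self.t, 1.0 - self.f)

    def admits(self, grade):
        """True when `grade` is a possible membership under this evidence."""
        low, high = self.interval()
        return low <= grade <= high

v = VagueValue(t=0.5, f=0.25)
print(v.interval())
print(v.admits(0.6), v.admits(0.8))
```

The gap between `t` and `1 - f` is the undecided region; when it shrinks to zero the vague value reduces to an ordinary fuzzy membership.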
Several kinds of queries are considered vague. Some of them are:
- Variation in spelling (e.g. papers published at the International Conference on Internet Computing, which can be abbreviated to ICIC)
- Low correlation between the query components (e.g. Leader of the University of Huddersfield)
- Comparative words (e.g. cheaper, most beautiful)
- Statistical concepts (e.g. average, median).
3. Current state and Problem identification
Although vague queries have been processed before using different approaches, all of these approaches deal only with the original XML documents. Some of them depend on the tree pattern of the XML document (Sihem Amer-Yahia et al., 2002, P. Mark Pettovello and Fotouhi, 2006), while others depend on decomposing the vague query into two sub-queries and retrieving information according to the nested tags that distinguish XML documents from other text documents (Vojkan Mihajlović et al., 2006, Pehcevski, 2006, Andrew Trotman and Mounia Lalmas, 2006).
Neither the tree structure nor the nested tags survive in the compressed XML documents. This makes it impossible for the existing techniques to retrieve information from the compressed documents.
Many systems nowadays convert plain XML documents to a compressed form before answering the user's query, but none of them handles vague queries. Table 3 shows some examples of well-known XML compressors and the query types they process. Figure 1 explains the experiment that has been carried out on all the queriable XML compressors.
For this reason, our research focuses on how to handle a vague query when retrieving information from a compressed XML document.
3.1 Aims and Objectives
According to the literature review, the main aim of this research is to develop a system that solves the problem of retrieving information from compressed files according to vague queries. The objectives drawn from this research are:
- Develop XQPoint as a new compression technique. The input to XQPoint is an XML document and the output is the compressed version of this document.
- Develop the query processor, which processes vague queries and retrieves information accordingly from the compressed XML document.
4. Initial System Architecture
The proposed system consists of two main stages. The first is a new XML compression technique named XQPoint, which converts normal XML documents to a compressed version. The second is a retrieval technique that processes vague NEXI queries in order to retrieve the relevant information from the compressed document. The next two sub-sections describe the structure of these two stages.
4.1 Design the XQPoint compression technique
The following section first describes how XQPoint treats each part of the XML document, then explains its architecture and the data set that will be used to test the compressor part of the system.
4.1.1 Architecture of XQPoint
The XQPoint compressor treats the structure part of an XML document differently from the data part. Figure 2 shows that the XML document is first analyzed in order to separate its components into different containers. Each element or attribute name is associated with a unique pair of numbers [IDpre, IDpost]. These pairs are called structural identifiers and have been used in several querying techniques (Al-Khalifa et al., 2002, Halverson et al., 2003, Grust, 2002, Paparizos et al., 2003). IDpre is the position of the node in the preorder traversal of the tree, while IDpost is its position in the postorder traversal. In this way the position of each node within the complete XML tree becomes recognizable. Figure 3 shows a sample document with its nodes' structural identifiers. The list of identifier pairs is then encoded in binary with 2*log2(N) bits per node, where N is the number of elements in the document. The total number of bits needed to store all the elements is therefore N*2*log2(N).
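The structural identifiers can be computed in a single traversal, as the following sketch shows. The function names and the sample document are hypothetical, and rounding log2(N) up to a whole number of bits is an assumption for documents whose element count is not a power of two.

```python
import math
import xml.etree.ElementTree as ET

def structural_ids(root):
    """Return (IDpre, IDpost, tag) triples in preorder, numbering from 1."""
    ids = []
    pre = post = 0

    def visit(elem):
        nonlocal pre, post
        pre += 1
        my_pre = pre                  # assigned on the way down
        for child in elem:
            visit(child)
        post += 1                     # assigned on the way back up
        ids.append((my_pre, post, elem.tag))

    visit(root)
    return sorted(ids)

def is_ancestor(a, d):
    """a is an ancestor of d iff it is earlier in preorder and later in postorder."""
    return a[0] < d[0] and a[1] > d[1]

def bits_per_node(n):
    # Two identifiers per node, each stored in ceil(log2(n)) bits.
    return 2 * math.ceil(math.log2(n))

ids = structural_ids(ET.fromstring("<a><b><c/></b><d/></a>"))
print(ids)
print(len(ids) * bits_per_node(len(ids)), "bits in total")
```

The ancestor test needs only the two numbers of each node, which is why the relationship between nodes stays recognizable after the tags themselves have been encoded.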
In order to compress the data part of the XML document, XQPoint separates the data of the document into different containers according to their path position (an encoded path) from the root and type of these data. Each of these containers is compressed using different encoding techniques as follows:
- Integer data type: XQPoint uses variable-byte coding to encode integers. This approach stores integers of varying magnitude as a sequence of bytes. The leftmost seven bits of each byte store part of the integer value, while the rightmost bit indicates whether that byte is the last byte in the sequence (Stanford, M.Manikandan et al., 2006).
- Floating-point type: Predictive Floating-Point Compression is used to compress this kind of data. This approach splits the number into sign, exponent, and mantissa and treats each part with a context-based arithmetic coder (Cheng and NG, 2004).
- Enumerated data and text data: the enumerated data in XML are attribute values that occur frequently, such as sets of countries, departments in a university, or zip codes. XQPoint uses the Fixed Point Number Representation Technique (FPNRT), which has been used as a compression technique in spelling checkers, to convert the enumerated data words and the text data into numeric values. The numeric value is calculated using the following formula:
Where n represents a data word length, ASC is the ASCII code of any letter in the word, and i is the letter's position in the word.
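As an illustration of the integer container encoding listed above, here is a minimal variable-byte codec. It assumes the common layout in which the low seven bits of each byte carry the payload and the high bit flags the final byte of a number; the text describes the flag at the opposite end of the byte, which changes only the bit positions, not the principle. The function names are hypothetical.

```python
def vb_encode(number):
    """Encode a non-negative integer as a variable-byte code."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    chunks = [number & 0x7F]
    number >>= 7
    while number:
        chunks.append(number & 0x7F)
        number >>= 7
    chunks.reverse()            # most significant 7-bit group first
    chunks[-1] |= 0x80          # flag bit marks the final byte
    return bytes(chunks)

def vb_decode(data):
    """Decode a concatenation of variable-byte codes into a list of integers."""
    numbers, current = [], 0
    for byte in data:
        current = (current << 7) | (byte & 0x7F)
        if byte & 0x80:         # final byte of this number
            numbers.append(current)
            current = 0
    return numbers

values = [5, 824, 214577, 0]
stream = b"".join(vb_encode(v) for v in values)
print(len(stream), "bytes for", len(values), "integers")
print(vb_decode(stream))
```

Small values take a single byte, so a container dominated by small integers shrinks well below a fixed four-byte representation.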
4.1.2 Data set used to test XQPoint compressor
To test the XQPoint compression technique, we should choose a set of XML documents of different types. These documents should differ in size, number of tags, number of nodes, depth of the longest path, and data ratio (DR), which is:
where DRd is the data ratio of the XML document (d), (D) is the data, and (Si) represents the size of the XML document.
According to their main characteristics, XML documents can be categorized into three types: (Maneth et al., 2008, Sakr, 2009)
- Textual documents (TD): The DRd of this type of documents exceeds 70%. The structure of these documents is very simple. Books and articles are examples of this type.
- Structural documents (SD): In this type of XML documents, the DRd is less than 30%. Baseball box score and line-item shipping are two examples of this type.
- Regular documents (RD): These documents have DRd between 40 and 60 percent. Relational databases are examples of this type.
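The three categories above can be applied automatically once a data ratio is available. The sketch below assumes that the data ratio is the share of the document's bytes taken up by element text and attribute values; this reading, the helper names, and the "unclassified" label for ratios that fall between the stated ranges are assumptions.

```python
import xml.etree.ElementTree as ET

def data_ratio(xml_text):
    """Share of the document's bytes taken by text and attribute values."""
    root = ET.fromstring(xml_text)
    data_bytes = 0
    for elem in root.iter():
        for chunk in (elem.text, elem.tail):
            if chunk and chunk.strip():
                data_bytes += len(chunk.strip().encode("utf-8"))
        for value in elem.attrib.values():
            data_bytes += len(value.encode("utf-8"))
    return data_bytes / len(xml_text.encode("utf-8"))

def classify(ratio):
    """Map a data ratio to the TD / SD / RD categories."""
    if ratio > 0.70:
        return "TD"
    if ratio < 0.30:
        return "SD"
    if 0.40 <= ratio <= 0.60:
        return "RD"
    return "unclassified"   # 30-40% and 60-70% are not assigned in the text

sample = "<a><b>hello</b></a>"
ratio = data_ratio(sample)
print(f"{ratio:.3f}", classify(ratio))
```

A test corpus can then be checked for balance by counting how many documents fall into each category.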
4.2 Design the query processor
The second part of the proposed system is the vague query processor. As shown in Figure 2, the query is handled by the query processor part of the XQPoint architecture. The structure of this part is adopted from the query decomposition technique proposed in (Al-Hamadani et al., 2009), which decomposes the vague query into two parts: QCO, which refers to Content-Only retrieval, and QCAS, which refers to Content-And-Structure retrieval.
As Figure 4 shows, the query processor handles each query in several steps. The first step is query decomposition, which separates each query into sub-queries. Figure 5 depicts an example of a vague NEXI query that passes through this step and is decomposed into four sub-queries.
The second step is sub-query relaxation, in which each sub-query is manipulated separately according to a specific XML document. Relaxation can be performed by changing the node sequence, adding nodes, deleting nodes, or changing attribute values. A threshold is attached to each sub-query in order to determine the level of relaxation applied to it. A lower threshold means a low level of relaxation (i.e. fewer changes), while a higher threshold means a high level of relaxation.
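One of the relaxation operations, node deletion, can be sketched as follows. This is an assumption-laden illustration rather than the proposed processor: the relaxation level is taken to be the number of deleted nodes, the threshold is the maximum level allowed, the target (last) node is never deleted, and paths use only the descendant step «//» as in NEXI.

```python
from itertools import combinations

def relax_by_deletion(path, threshold):
    """Return (level, relaxed_path) pairs for every way of deleting up to
    `threshold` non-target nodes; the level is the number of deleted nodes."""
    steps = path.strip("/").split("//")
    deletable = range(len(steps) - 1)        # never delete the target node
    results = []
    for level in range(threshold + 1):
        for removed in combinations(deletable, level):
            kept = [s for i, s in enumerate(steps) if i not in removed]
            results.append((level, "//" + "//".join(kept)))
    return results

for level, relaxed in relax_by_deletion("//article//sec//p", threshold=1):
    print(level, relaxed)
```

Keeping the level next to each relaxed sub-query makes it possible to rank later results by how far the matching query drifted from the original.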
The third step of the query processor is retrieving the compressed XML documents that match each relaxed sub-query. The retrieved documents are ranked according to the threshold attached to the sub-query. The final step is to group the retrieved documents according to the main NEXI query. The Top-K ranked documents are decompressed and returned to the user.
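The ranking and Top-K selection can be sketched with a small helper. The scoring rule used here is an assumption: a document's rank is decided first by the lowest relaxation level at which it was matched, then by how many sub-queries matched it. The function name and the sample hits are hypothetical.

```python
import heapq

def top_k_documents(hits, k):
    """hits: (doc_id, relaxation_level) pairs gathered from all relaxed
    sub-queries. Less relaxation ranks first; ties go to more matches."""
    best_level, match_count = {}, {}
    for doc_id, level in hits:
        best_level[doc_id] = min(level, best_level.get(doc_id, level))
        match_count[doc_id] = match_count.get(doc_id, 0) + 1
    return heapq.nsmallest(
        k, best_level, key=lambda d: (best_level[d], -match_count[d], d)
    )

hits = [("d1", 0), ("d2", 1), ("d1", 1), ("d3", 2), ("d2", 1)]
print(top_k_documents(hits, k=2))
```

Only the documents returned by this step would need to be decompressed, which keeps the decompression cost proportional to K rather than to the size of the collection.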
5. Study plan
With the increasing importance of XML for storing and transferring data via the World Wide Web, there is a growing need to reduce the size of XML documents and to work with these documents in their compressed form. As XML documents spread, their users range from experts who write precise queries to naive users who write vague ones. For these reasons, there is an increasing need for a system that can both compress XML documents and retrieve the most relevant information according to vague queries. This report proposes such a system. It develops a new compression technique, XQPoint, which separates the data from the structure of XML documents and then compresses each part as appropriate. The vague query processor then decomposes the vague query, processes each sub-query separately, and combines the retrieved results.
We expect that this technique will achieve a high compression ratio and retrieve information efficiently from the compressed version.
- AL-HAMADANI, B. T., ALWAN, R. F., LU, J. & YIP, J. (2009) Vague Content and Structure (VCAS) Retrieval for XML Electronic Healthcare Records (EHR). Proceeding of the 2009 International Conference on Internet Computing. USA.
- AL-KHALIFA, S., JAGADISH, H., PATEL, J., WU, Y., KOUDAS, N. & SRIVASTAVA, D. (2002) Structural Joins: A Primitive for Efficient XML Query Pattern Matching. 8th International Conference on Data Engineering. San Jose, CA, USA, IEEE.
- ALISTAIR, M., RADFORD, M. N. & IAN, H. W. (1998) Arithmetic coding revisited. ACM Trans. Inf. Syst., 16, 256-294.
- ANDREW TROTMAN & MOUNIA LALMAS (2006) Strict and Vague Interpretation of XML-Retrieval Queries. SIGIR'06.
- ANDREW WATT (2002) XPath Essentials, John Wiley & Sons Publishing.
- ARION, A., BONIFATI, A., MANOLESCU, I. & PUGLIESE, A. (2007) XQueC: A query-conscious compressed XML database. ACM Trans. Internet Technol., 7, 10.
- AUGERI, C. (2008) On Some Results in Unmanned Aerial Vehicle Swarms. Dept. of Electrical & Computer Engineering. San Diego, CA, USA, Air Force Institute of Technology.
- AUGERI, C. J., BULUTOGLU, D. A., MULLINS, B. E., BALDWIN, R. O. & LEEMON C. BAIRD, I. (2007) An analysis of XML compression efficiency. Proceedings of the 2007 workshop on Experimental computer science. San Diego, California, ACM.
- BZIP2 (1996) http://www.bzip.org/.
- CHENG, J. & NG, W. (2004) XQZip: Querying Compressed XML using Structural Indexing. International Conference on Extending Data Base Technology (EDBT).
- CLEARY, J. & WITTEN, I. (1984) Data Compression Using Adaptive Coding and Partial String Matching. Communications, IEEE Transactions, 32, 396 - 402.
- EUROPEAN ENVIRONMENT AGENCY http://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-database-1
- FERRAGINA, P., LUCCIO, F., MANZINI, G. & MUTHUKRISHNAN, S. (2006) Compressing and searching XML data via two zips. Proceedings of the 15th international conference on World Wide Web. Edinburgh, Scotland, ACM.
- GIRARDOT, M. & SUNDARESAN, N. (2000) Millau: an encoding format for efficient representation and exchange of XML over the Web. Computer Networks, 33, 747-765.
- GRUST, T. (2002) Accelerating XPath location steps. ACM SIGMOD International Conference on Management of Data. Madison, WI, USA, ACM.
- GUA W. L. & BUEHRER, D., J (1993) Vague sets. IEEE Transactions on Systems, 23, 610-614.
- GZIP (1992) http://www.gzip.org/.
- HALVERSON, A., BURGER, J., GALANIS, L., KINI, A., KRISHNAMURTHY, R., RAO, A., TIAN, F., VIGLAS, S., WANG, Y., NAUGHTON, J. & DEWITT, D. (2003) Mixed Mode XML Query Processing. 29th International Conference on Very Large Data Bases. Berlin, Germany.
- HOLMAN, G. K. (2002) XSLT and XPath, Prentice Hall PTR.
- KAY, M. (2004) XPath 2.0 Programmers Reference, Wiley Publishing, Inc.
- KAY, M. (2008) XSLT 2.0 and XPath 2.0 Programmer's Reference, Wiley Publishing, Inc.
- KUMAR, A. & BISWAS, R. (2009) A Study of Vague Search to Answer Imprecise Query. IJCSNS International Journal of Computer Science and Network Security, VOL.9, 198-205.
- LEAGUE, C. & ENG, K. (2007) Schema-Based Compression of XML Data with Relax NG. IEEE data compression conference (DCC). Utah.
- LIEFKE, H. & SUCIU, D. (2000) XMill: an Efficient Compressor for XML Data. ACM.
- LIN, Y., ZHANG, Y., LI, Q. & YANG, J. (2005) Supporting Efficient Query Processing on Compressed XML Files. SAC'05. USA, ACM.
- LIU, Y., WANG, G. & FENG, L. (2008) A General Model for Transforming Vague Sets into Fuzzy Sets. Springer Berlin / Heidelberg, 5150, 133-144.
- M.MANIKANDAN, BAGAN, K. B. & T.PRATHIBA (2006) Images and Integer Compression Using Bit Based Cascade Coding. GVIP Journal, Volume 6.
- MANETH, S., MIHAYLOV, N. & SAKR, S. (2008) XML Tree Structure Compression. XANTEC'08, IEEE Computer Society.
- MIN, J.-K., PARK, M.-J. & CHUNG, C.-W. (2003) XPRESS: a queriable compression for XML data. Proceedings of the 2003 ACM SIGMOD international conference on Management of data. San Diego, California, ACM.
- MOFFAT., A. (1990) Implementing the PPM data compression scheme. IEEE Trans. on Comm., 38(11), 1917-1921.
- P. MARK PETTOVELLO & FOTOUHI, F. (2006) MTree: An XML XPath Graph Index. SAC'06. Dijon, France, ACM.
- PAPARIZOS, S., AL-KHALIFA, S., CHAPMAN, A., JAGADISH, H. V., LAKSHMANAN, L. V. S., NIERMAN, A., PATEL, J. M., SRIVASTAVA, D., WIWATWATTANA, N., WU, Y. & YU, C. (2003) TIMBER: A Native System for Querying XML. ACM SIGMOD International Conference on Management of Data. San Diego, CA, USA, ACM.
- PEHCEVSKI, J. (2006) Evaluation of Effective XML Information Retrieval. School of Computer Science and Information Technology. Australia, RMIT University.
- PLAYS, S. http://www.cafeconleche.org/examples/shakespeare/.
- RAY, E. T. (2001) Learning XML Guide to Creating Self-Describing Data, O'Reilly Media Inc.
- SAKR, S. (2009) XML compression techniques: A survey and comparison. Journal of Computer and System Sciences, 75, 303-322.
- SIGURBJORNSSON, B. & TROTMAN, A. (2003) Queries: INEX 2003 working group report. 2nd workshop of the initiative for the evaluation of XML retrieval (INEX).
- SIHEM AMER-YAHIA, SUNGRAN CHO & SRIVASTAVA, D. (2002) Tree Pattern Relaxation. Extending Database Technology, 2287, 496 - 513.
- SKIBINSKI, P., GRABOWSKI, S. & SWACHA, J. (2007) Effective Asymmetric XML Compression. CADSM.
- SKIBINSKI, P. & SWACHA, J. (2007) Combining Efficient XML Compression with Query Processing. ADBIS. Springer-Verlag.
- STANFORD http://nlp.stanford.edu/IRbook/html/htmledition/variable-byte-codes-1.html#fig:vbalgorithm.
- TOLANI, P. M. & HARITSA, J. R. (2000) XGRIND: A Query-friendly XML Compressor. IEEE 18th international conference on Data Engineering.
- TROTMAN, A. & SIGURBJÖRNSSON, B. O. (2005) Narrowed Extended XPath I (NEXI). Advances in XML Information Retrieval. Berlin / Heidelberg, Springer.
- VOJKAN MIHAJLOVIĆ, DJOERD HIEMSTRA & BLOK, H. E. (2006) Vague Element Selection and Query Rewriting for XML Retrieval. Sixth Information Retrieval workshop. Dutch Belgian.
- WASHINGTON http://www.cs.washington.edu/research/xmldatasets/data/.
- WATERLOO http://softbase.uwaterloo.ca/~ddbms/projects/xbench/index.html.
- WIKIPEDIA http://download.wikipedia.org/enwikinews/.
- WINZIP (1990) http://www.winzip.com/.
- ZHAO, F. & MA, Z. M. (2009) Vague Query Based on Vague Relational Model. AISC Springer-Verlag Berlin Heidelberg.