XQuery - The future of Enterprise Information Integration?

Abstract

Explain what XQuery is and why it was developed.

Say that I am going to investigate what XQuery offers by comparing it against the requirements of EII.

The Holy Grail that is EII.

Every organisation has had the same problem at one time or another. Data, whilst in existence, is not readily available through a single, easy-to-use interface and format. The main cause of this problem is that, ever since the database era began, enterprises have been accumulating more and more information in an ever-growing variety of formats. The problem has been exacerbated in recent years by the rapid growth of e-business and the dominance of the internet. Expectations have also grown: users now expect to be able to access analytical, operational, internal and/or external information at the drop of a hat.

The Holy Grail, in this situation, would be true Enterprise Information Integration (EII). By EII I mean quick and easy access to information from disparate data sources without first moving that information into a physical central repository such as a data warehouse. While the notion seems relatively straightforward, in practice it presents a number of challenges.

First and foremost amongst these is the fact that, within any modern enterprise, information is stored in a wide variety of formats across different business units. Data is no longer stored solely in structured formats such as relational database management systems (RDBMSs). Enterprises are increasingly amassing data in unstructured formats such as emails, Word documents and web pages. Additionally, XML is fast becoming the preferred format for exchanging data and so needs to be catered for.

These heterogeneous sources may be scattered not only over a number of servers but potentially over millions of databases, some of which may be owned by other enterprises. For any EII system to have a real future, it must be possible to run queries over structured, semi-structured, unstructured and distributed information. One possible solution is to create a virtual data model that makes all the disparate data sources look like one. Federated queries are then posed against this virtual model: each is decomposed into distributed queries that run on the individual data sources, and the partial results are joined, or federated, into a single answer. Choosing a common and appropriate data model therefore becomes a key factor in any EII solution.
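
The idea of a virtual data model can be sketched in Python. This is a minimal illustration under stated assumptions, not a design for a real EII product: an in-memory SQLite table stands in for a relational source, a small XML string stands in for a partner's XML feed, and the common model is simply one Python dictionary per record. All table, element and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Source 1 (hypothetical): an in-memory SQLite table standing in for an RDBMS.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# Source 2 (hypothetical): an XML document, e.g. a feed from a partner system.
ORDERS_XML = """
<orders>
  <order customer="1" total="120.50"/>
  <order customer="2" total="80.00"/>
  <order customer="1" total="15.25"/>
</orders>
"""

def customers_view():
    """Wrapper for the relational source: yield rows in the common model (dicts)."""
    for cid, name in db.execute("SELECT id, name FROM customers"):
        yield {"customer_id": cid, "name": name}

def orders_view():
    """Wrapper for the XML source: yield orders in the same common model."""
    for order in ET.fromstring(ORDERS_XML).iter("order"):
        yield {"customer_id": int(order.get("customer")),
               "total": float(order.get("total"))}

def federated_spend():
    """A federated query: run a sub-query on each source, then join the results."""
    names = {c["customer_id"]: c["name"] for c in customers_view()}
    spend = {}
    for o in orders_view():
        name = names[o["customer_id"]]
        spend[name] = spend.get(name, 0.0) + o["total"]
    return spend

print(federated_spend())
```

Each wrapper hides its source's native format, so the federated query only ever sees the common model; adding another source means writing one more wrapper rather than changing the query.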

Whilst this is a good starting point, EII needs to provide a lot more than access alone. As mentioned above, modern users require information from various sources. To present this information in a useful form, the data needs to be analysed, combined, manipulated and potentially transformed. It may also be necessary to run functions on this data, or to let each user shape the data retrieved by supplying request variables. Furthermore, different users will require different views of the same data. For example, a Credit Approval portal may show a customer's personal details, registered credit cards and credit rating, while a Product Service portal may show similar information without the credit rating but with the addition of service information.
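
Role-specific views over one integrated record can be illustrated with a small Python sketch. It assumes, purely for illustration, that each portal's view is a declared list of fields and that a user's request variables can only narrow that view, never widen it; the field names, values and the `view_for` helper are hypothetical.

```python
# A combined customer record as it might come out of the virtual data model.
# Every field name and value here is hypothetical.
CUSTOMER = {
    "name": "Alice Smith",
    "address": "1 High Street",
    "credit_cards": ["**** 1234"],
    "credit_rating": "AA",
    "service_history": ["boiler repair"],
}

# Each portal's view is declared as the list of fields it may see.
VIEWS = {
    "credit_approval": ["name", "address", "credit_cards", "credit_rating"],
    "product_service": ["name", "address", "credit_cards", "service_history"],
}

def view_for(portal, record, only=None):
    """Project the shared record onto one portal's view.

    `only` models user-supplied request variables: if given, it narrows the
    view further, but the caller can never see fields outside the view.
    """
    allowed = VIEWS[portal]
    wanted = allowed if only is None else [f for f in allowed if f in only]
    return {f: record[f] for f in wanted if f in record}

print(view_for("product_service", CUSTOMER))
print(view_for("credit_approval", CUSTOMER, only={"name", "credit_rating"}))
```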

Another crucial requirement is real-time information. The best information is useless if it is out of date. It is true that not all applications or industries require real-time data; however, to cater for industries that do, such as financial services, the capability must exist. There are other issues relating to real-time data, such as administrators denying access to their systems on performance grounds. These will not be explored in this paper.

Possibly even more crucial than the considerations above is the meaning of data within the various data sources. As can be expected, database schemas for the same domain that were developed independently will inevitably be very different. This problem is not limited to databases: different XML documents, or any other data structures for that matter, designed to hold similar data will be subject to semantic heterogeneity. Because semi-structured data is more flexible, semantic heterogeneity is often more pronounced in data structures of this type. Nor does semantic heterogeneity occur only at the schema level. Data values representing the same information might be stored differently. For example, two fields may both hold company names, but in one schema a company is represented as IBM whilst in the other it is represented as International Business Machines. For these reasons it is imperative that an EII solution can reconcile the schemas of all the data sources involved and derive meaning from their data.
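
Both levels of semantic heterogeneity can be made concrete with a short Python sketch. It assumes two hypothetical sources that name the company column differently and spell the same company differently, and it reconciles them with a hand-written field mapping and synonym table; all names and figures are illustrative.

```python
# Hypothetical synonym table: every known variant maps to one canonical value.
CANONICAL_COMPANY = {
    "ibm": "International Business Machines",
    "international business machines": "International Business Machines",
}

# Two hypothetical sources that disagree at both levels:
# schema level (the column is "company" in A but "vendor" in B) and
# value level ("IBM" vs "International Business Machines").
SOURCE_A = [{"company": "IBM", "contract_value": 500}]
SOURCE_B = [{"vendor": "International Business Machines", "contract_value": 250}]

# Schema mapping: which local field holds the company name in each source.
FIELD_MAP = {"A": "company", "B": "vendor"}

def canonical(name):
    """Map a company name to its canonical form; unknown names pass through unchanged."""
    cleaned = name.strip()
    return CANONICAL_COMPANY.get(cleaned.lower(), cleaned)

def contract_totals():
    """Total contract value per company across both sources, after reconciliation."""
    totals = {}
    for source, rows in (("A", SOURCE_A), ("B", SOURCE_B)):
        for row in rows:
            company = canonical(row[FIELD_MAP[source]])
            totals[company] = totals.get(company, 0) + row["contract_value"]
    return totals

print(contract_totals())
```

Without the mappings, the same query would report IBM and International Business Machines as two separate companies. A hand-maintained synonym table would not scale on its own; it is shown only to make the problem concrete.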

Let's not forget that the data itself may be a challenge. Data inconsistencies and missing data play a huge role in EII. Any error in the underlying data needs to be addressed prior to integration; failure to do so will result in the error being replicated in the final view. Additionally, as in any RDBMS, missing data needs to be handled to ensure that correct results are returned. Any query language used on the virtual data model proposed earlier needs to be able to cater for and deal with these issues.
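
A minimal Python sketch of cleansing data before it is integrated follows. It assumes a single hypothetical source in which the same country is written three different ways and revenue is sometimes absent or unparseable; the lookup table and helper names are illustrative only. Missing values are made explicit as `None`, loosely mirroring SQL `NULL`, so that later steps can decide how to treat them.

```python
# Hypothetical raw rows, as they might arrive from one source before integration.
RAW_ROWS = [
    {"id": 1, "country": "UK", "revenue": "1200"},
    {"id": 2, "country": "uk ", "revenue": None},              # inconsistent code, missing value
    {"id": 3, "country": "United Kingdom", "revenue": "n/a"},  # unparseable value
]

# Hypothetical lookup that maps every spelling seen in the data to one code.
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB", "gb": "GB"}

def parse_revenue(value):
    """Return revenue as a float, or None if it is missing or cannot be parsed."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None

def cleanse(row):
    """Normalise the country code and make missing revenue explicit (None)."""
    return {
        "id": row["id"],
        "country": COUNTRY_CODES.get(row["country"].strip().lower()),
        "revenue": parse_revenue(row["revenue"]),
    }

clean = [cleanse(r) for r in RAW_ROWS]
missing_ids = [r["id"] for r in clean if r["revenue"] is None]
known_total = sum(r["revenue"] for r in clean if r["revenue"] is not None)

# Report the gap instead of silently treating missing revenue as zero.
print(f"total over known rows: {known_total}; missing revenue for ids {missing_ids}")
```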

Finally, once we have retrieved the data from all of the disparate sources and constructed our view, it would be useful, if not critical for some industries, to perform validation or rule checking on this data. By validation I do not mean simply checking the structure of the data, as XML Schema validation does, but rather checking that the data complies with a number of business rules. For example, a company may want to check that a supplier is not based in a country upon which sanctions are currently imposed.
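
Rule checking of this kind can be sketched as a list of small rule functions applied to each integrated record. The Python below is an illustration under stated assumptions: the `Supplier` type and the rule functions are hypothetical, and the sanctioned-country set contains placeholder codes rather than real sanctions data.

```python
from dataclasses import dataclass

@dataclass
class Supplier:
    name: str
    country: str  # two-letter country code

# Placeholder codes, NOT real sanctions data; a real system would load an
# authoritative, regularly updated list.
SANCTIONED_COUNTRIES = {"XX", "YY"}

def not_sanctioned(supplier):
    """Business rule: the supplier must not be based in a sanctioned country."""
    if supplier.country in SANCTIONED_COUNTRIES:
        return f"{supplier.name}: based in sanctioned country {supplier.country}"
    return None

def has_country(supplier):
    """Business rule: every supplier must have a country recorded."""
    if not supplier.country:
        return f"{supplier.name}: no country recorded"
    return None

RULES = [not_sanctioned, has_country]

def validate(suppliers):
    """Apply every rule to every supplier and collect the violation messages."""
    violations = []
    for supplier in suppliers:
        for rule in RULES:
            message = rule(supplier)
            if message is not None:
                violations.append(message)
    return violations

print(validate([Supplier("Acme", "GB"), Supplier("Globex", "XX"), Supplier("Initech", "")]))
```

Keeping each rule as a separate function means rules can be added or retired without touching the integration code itself.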

Considering all of these requirements, it is clear why true EII has posed such a challenge in past years. If we are to progress down the route of first creating a virtual data model on which federated and/or decomposed queries can be run, we need a query language that is flexible yet powerful enough to offer solutions to these challenges.

XQuery as a possible solution.

A number of data models and query languages have previously been suggested as possible solutions to the information integration problem. Amongst these are object-oriented solutions such as ODMG/OQL and relational solutions such as SQL. The main reasons these solutions have failed in the past are the difficulty of naturally mapping data from all the relevant data sources into a single model and the lack of consensus on which data model should be used.

XML is fast becoming the preferred format for data integration and interchange. This is largely because XML is simple yet very flexible. Additionally, almost any data can easily be represented in XML. It therefore stands to reason that an XML query language capable of selecting, re-ordering and manipulating XML data would be a perfect candidate for EII.
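
The three operations named above (selecting, re-ordering and manipulating XML data) can be illustrated with Python's standard `xml.etree.ElementTree` module. This is not XQuery; it is a hedged stand-in showing what such a query language has to express. In XQuery itself the same query would be written as a single FLWOR (for, let, where, order by, return) expression. The catalogue document, its element names and the helper function are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML document to query.
CATALOGUE = ET.fromstring("""
<catalogue>
  <book year="2005"><title>B</title><price>30</price></book>
  <book year="1999"><title>A</title><price>45</price></book>
  <book year="2010"><title>C</title><price>25</price></book>
</catalogue>
""")

def cheap_books_by_year(root, max_price):
    # Select: keep only books priced below max_price.
    selected = [b for b in root.findall("book") if float(b.findtext("price")) < max_price]
    # Re-order: sort the selection by publication year.
    selected.sort(key=lambda b: int(b.get("year")))
    # Manipulate: construct a new document with a different shape.
    result = ET.Element("cheapBooks")
    for b in selected:
        ET.SubElement(result, "item", title=b.findtext("title"), year=b.get("year"))
    return result

print(ET.tostring(cheap_books_by_year(CATALOGUE, 40), encoding="unicode"))
```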

Explain the original requirements of XQuery and its current characteristics.

Relate my requirements to what XQuery offers.

(Relate: show how they are connected to each other and to what extent.)
