The Concept Of Metadata And Access Tools Computer Science Essay

For example, a digital image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document.

Metadata is itself data. As such, it can be stored and managed in a database, often called a registry or repository. However, metadata often cannot be identified just by looking at it, because nothing in a value itself tells a user whether it is metadata or ordinary data.[1]

Data Warehouse Access Tools

The principal purpose of data warehousing is to provide information to business users for strategic decision making.

These users interact with the data warehouse using front-end tools. Although regular and custom reports are the primary delivery vehicles for analysis in most data warehouses, many development efforts in the data warehouse arena focus on exception reporting, also known as alerts.

Example: If the data warehouse is designed for assessing the risk of currency trading, an alert can be activated when a certain currency rate drops below a predefined threshold.
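
The alert idea in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the RateAlert class, the currency pair, the threshold and the sample rates are all hypothetical and do not come from any particular warehouse product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateAlert:
    """Hypothetical alert rule: fire when a currency rate drops below a threshold."""
    currency_pair: str
    threshold: float

    def check(self, rate: float) -> Optional[str]:
        # Return a message only when the rate is strictly below the threshold.
        if rate < self.threshold:
            return f"ALERT: {self.currency_pair} at {rate:.4f}, below {self.threshold:.4f}"
        return None


alert = RateAlert(currency_pair="EUR/USD", threshold=1.05)
for rate in (1.0712, 1.0533, 1.0488):  # made-up sample rates
    message = alert.check(rate)
    if message:
        print(message)
```

Only the last sample rate falls below the threshold, so only that one produces a message. In a real warehouse the rates would come from a query or a feed rather than a literal tuple.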

Access tools can be divided into five main groups:

Data query and reporting tools.

Application development tools.

Executive information system (EIS) tools.

On-line analytical processing (OLAP) tools.

Data mining tools.

Q2: Discuss the concept of Data warehouse administration and management in detail.

ANS. Metadata storage

Metadata can be stored either internally, in the same file as the data, or externally, in a separate file. Metadata that is embedded with content is called embedded metadata. A data repository typically stores the metadata detached from the data. Both ways have advantages and disadvantages:

Internal storage allows metadata to be transferred together with the data it describes; thus, metadata is always at hand and can be manipulated easily. However, this method creates high redundancy and does not allow metadata to be held together in one place.

External storage allows bundling metadata, for example in a database, for more efficient searching. There is no redundancy, and metadata can be transferred simultaneously when using streaming. However, as most formats use URIs for that purpose, the method of how the metadata is linked to its data should be treated with care. What if a resource does not have a URI (resources on a local hard disk or web pages that are created on-the-fly using a content management system)? What if metadata can only be evaluated if there is a connection to the Web, especially when using RDF? How can one detect that a resource has been replaced by another with the same name but different content?

Moreover, there is the question of data format: storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools. On the other hand, these formats are not optimized for storage capacity; it may be useful to store metadata in a binary, non-human-readable format instead to speed up transfer and save memory.
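
The trade-off between a human-readable and a binary format can be illustrated with a small sketch. The image metadata fields and values below are made up, and the fixed binary layout is just one possible choice, not a standard format.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical image metadata; field names and values are made up for illustration.
width, height, color_depth = 1920, 1080, 24

# Human-readable form: an XML element with one attribute per field.
element = ET.Element("image", width=str(width), height=str(height), depth=str(color_depth))
xml_bytes = ET.tostring(element)

# Compact form: two unsigned 16-bit integers and one unsigned byte, little-endian.
binary_bytes = struct.pack("<HHB", width, height, color_depth)

print(xml_bytes.decode())
print(len(xml_bytes), "bytes as XML,", len(binary_bytes), "bytes as binary")

# The binary form round-trips, but only if the reader knows the layout "<HHB".
assert struct.unpack("<HHB", binary_bytes) == (width, height, color_depth)
```

The binary record is much smaller, but it cannot be read or edited without knowing the layout, which is the cost the paragraph above describes.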

Database management

Each relational database system has its own mechanisms for storing metadata. Examples of relational-database metadata include:

Tables listing all tables in a database, with their names, sizes and the number of rows in each.

Tables listing the columns in each database, the tables they are used in, and the type of data stored in each column.

In database terminology, this set of metadata is referred to as the catalog. The SQL standard specifies a uniform means to access the catalog, called the INFORMATION_SCHEMA, but not all databases implement it, even if they implement other aspects of the SQL standard. For an example of database-specific metadata access methods, see Oracle metadata.
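
As a small illustration of database-specific catalog access, the sketch below uses SQLite through Python's sqlite3 module. SQLite is one of the systems that does not provide INFORMATION_SCHEMA; it exposes its catalog through the sqlite_master table and the table_info pragma instead. The product and store tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, brand TEXT, price REAL)")
conn.execute("CREATE TABLE store (id INTEGER PRIMARY KEY, country TEXT)")

# SQLite keeps its catalog in the sqlite_master table rather than INFORMATION_SCHEMA.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# PRAGMA table_info lists each column of a table; fields 1 and 2 are name and declared type.
columns = {
    table: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({table})")]
    for table in tables
}

for table, cols in columns.items():
    print(table, cols)
conn.close()
```

On a system that implements the standard, the same information would typically come from views such as INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.COLUMNS.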

Q3: What are the various DBMS schemas for decision support? Elaborate


A dimensional model looks very different. In an enterprise data warehouse, you can have a number of separate fact tables, each representing a different process within the organization, such as orders, inventory, shipments, and returns. These separate fact tables will be "threaded" by as many common dimension tables as possible. The tables surrounding each fact table are called dimension tables, and are much smaller than the fact table. Although the dimension tables have several descriptive text fields, they will always have far fewer rows and take up much less disk space than the fact table. Each dimension table has a single-part key. The fields in dimension tables are typically textual and are used as the source of constraints and row headers in reports.

Star join schemas support two specific kinds of queries: browse and multitable join. Browse queries operate on only one of the dimension tables and do not involve joins. A typical browse query occurs when the user asks for a pull-down list of all the brand names in the product dimension table, perhaps subject to constraints on other elements in the dimension table. This query must respond instantly because the user's full attention is on the screen. Multitable join queries occur after a series of browses and involve constraints placed on several of the dimension tables that are all joined to the fact table simultaneously. The goal is to fetch hundreds or possibly thousands of underlying records into a small answer set for the user, grouped together by one or more textual attributes selected from the dimension tables. Even so-called table scans fit this second paradigm, because there will always be some kind of constraint and some kind of grouping action in a decision-support query. This second kind of query is rarely instantaneous, because of the significant resources required to satisfy the query.
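
The two kinds of query described above can be sketched against a tiny in-memory SQLite database. All table names, column names and rows here are assumptions made up for the illustration; a real star join schema would be far larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, brand TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, store_key INTEGER, units INTEGER);
INSERT INTO dim_product VALUES (1, 'Acme', 'tv'), (2, 'Bolt', 'tv'), (3, 'Acme', 'radio');
INSERT INTO dim_store   VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales  VALUES (1, 1, 5), (1, 2, 3), (2, 1, 7), (3, 2, 4), (2, 2, 1);
""")

# Browse query: one dimension table, no joins, e.g. to fill a pull-down list of brands.
brands = [row[0] for row in conn.execute(
    "SELECT DISTINCT brand FROM dim_product WHERE category = 'tv' ORDER BY brand")]

# Multitable join: constraints on the dimensions, fact rows grouped by textual attributes.
totals = conn.execute("""
    SELECT p.brand, s.region, SUM(f.units)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_store   s ON f.store_key   = s.store_key
    WHERE p.category = 'tv'
    GROUP BY p.brand, s.region
    ORDER BY p.brand, s.region
""").fetchall()

print(brands)
print(totals)
conn.close()
```

The browse query touches only the small dimension table, which is why it can respond quickly. The join has to read the fact table, which is where most of the cost lies at realistic data volumes.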

Part B:

Q4: How can we map the DW to Multi Processor architecture? Elaborate.


The central data warehouse database is the cornerstone of the data warehousing environment. This database is almost always implemented on relational database management system (RDBMS) technology. However, this kind of implementation is often constrained by the fact that traditional RDBMS products are optimized for transactional database processing. Certain data warehouse attributes, such as very large database size, ad hoc query processing and the need for flexible user view creation including aggregates, multi-table joins and drill-downs, have become drivers for different technological approaches to the data warehouse database. These approaches include:

Parallel relational database designs for scalability that include shared-memory, shared disk, or shared-nothing models implemented on various multiprocessor configurations (symmetric multiprocessors or SMP, massively parallel processors or MPP, and/or clusters of uni- or multiprocessors).

An innovative approach to speed up a traditional RDBMS by using new index structures to bypass relational table scans.

Multidimensional databases (MDDBs) that are based on proprietary database technology; conversely, a dimensional data model can be implemented using a familiar RDBMS. Multi-dimensional databases are designed to overcome any limitations placed on the warehouse by the nature of the relational data model. MDDBs enable on-line analytical processing (OLAP) tools that architecturally belong to a group of data warehousing components jointly categorized as the data query, reporting, analysis and mining tools.

Q5: What is role of OLAP in data mining?


In large data warehouse environments, many different types of analysis can occur. You can enrich your data warehouse with advanced analytics using OLAP (On-Line Analytic Processing) and data mining. Rather than having a separate OLAP or data mining engine, Oracle has integrated OLAP and data mining capabilities directly into the database server. Oracle OLAP and Oracle Data Mining (ODM) are options to the Oracle Database.

OLAP and data mining are used to solve different kinds of analytic problems:

OLAP provides summary data and generates rich calculations. For example, OLAP answers questions like "How do sales of mutual funds in North America for this quarter compare with sales a year ago? What can we predict for sales next quarter? What is the trend as measured by percent change?"

Data mining discovers hidden patterns in data. Data mining operates at a detail level instead of a summary level. Data mining answers questions like "Who is likely to buy a mutual fund in the next six months, and what are the characteristics of these likely buyers?"

OLAP and data mining can complement each other. For example, OLAP might pinpoint problems with sales of mutual funds in a certain region. Data mining could then be used to gain insight about the behavior of individual customers in the region. Finally, after data mining predicts something like a 5% increase in sales, OLAP can be used to track the net income. Alternatively, data mining might be used to identify the most important attributes concerning sales of mutual funds, and those attributes could be used to design the data model in OLAP.
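
The OLAP-style question about sales compared with a year ago comes down to a roll-up followed by a percent-change calculation. The sketch below shows only that arithmetic in plain Python; the sales records are made up, and a real OLAP engine would compute this over pre-aggregated cubes rather than a list.

```python
# Made-up detail-level sales records: (region, year, quarter, amount).
sales = [
    ("North America", 2023, 1, 120.0),
    ("North America", 2023, 1, 80.0),
    ("North America", 2024, 1, 150.0),
    ("North America", 2024, 1, 100.0),
]


def total_sales(records, region, year, quarter):
    """Summarize detail rows into one figure, as an OLAP roll-up would."""
    return sum(amount for r, y, q, amount in records
               if (r, y, q) == (region, year, quarter))


def percent_change(current, previous):
    """Percent change of current relative to previous."""
    return (current - previous) / previous * 100.0


this_quarter = total_sales(sales, "North America", 2024, 1)
year_ago = total_sales(sales, "North America", 2023, 1)
print(f"{percent_change(this_quarter, year_ago):.1f}% change versus a year ago")
```

With these sample figures the totals are 250 and 200, giving a 25.0% increase. Data mining, by contrast, would work on the individual records rather than on the two totals.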

Q6: Draw a Star schema diagram for Sales database


Consider a database of sales, perhaps from a store chain, classified by date, store and product.

Fact.Sales is the fact table and there are three dimension tables Dim.Date, Dim.Store and Dim.Product.

Each dimension table has a primary key on its PK column, relating to one of the columns (viewed as rows in the example schema) of the Fact.Sales table's three-column (compound) primary key (Date_FK, Store_FK, Product_FK). The non-primary key [Units Sold] column of the fact table in this example represents a measure or metric that can be used in calculations and analysis. The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim.Date dimension).

Using schema descriptors with dot-notation, combined with simple suffix decorations for column differentiation, makes it easier to write the SQL for star schema queries. This is because fewer underscores are required and table aliasing is minimized. Most SQL database engines allow schema descriptors, and also permit decoration suffixes on surrogate key columns. Square brackets, which are physically easier to type on the keyboard (no shift key needed), are not intrusive and make the code easier to read.

For example, the following query extracts how many TV sets have been sold, for each brand and country, in 1997.
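
A minimal sketch of such a query, run through Python's sqlite3 module, might look as follows. It is an illustration under assumptions, not the original query: the tables are named with underscores (Fact_Sales, Dim_Date and so on) because SQLite would read a dot as a schema qualifier, the Brand, Country and Product_Category columns are assumed, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Date    (PK INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Dim_Store   (PK INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (PK INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT);
CREATE TABLE Fact_Sales (
    Date_FK    INTEGER REFERENCES Dim_Date(PK),
    Store_FK   INTEGER REFERENCES Dim_Store(PK),
    Product_FK INTEGER REFERENCES Dim_Product(PK),
    Units_Sold INTEGER,
    PRIMARY KEY (Date_FK, Store_FK, Product_FK)  -- compound key over the three dimensions
);
INSERT INTO Dim_Date    VALUES (1, 1996), (2, 1997);
INSERT INTO Dim_Store   VALUES (1, 'France'), (2, 'Spain');
INSERT INTO Dim_Product VALUES (1, 'Acme', 'tv'), (2, 'Bolt', 'tv'), (3, 'Acme', 'radio');
INSERT INTO Fact_Sales  VALUES
    (2, 1, 1, 10), (2, 2, 1, 4), (2, 1, 2, 6), (1, 1, 1, 99), (2, 1, 3, 50);
""")

# Join the fact table to all three dimensions, filter on year and category,
# then total the measure per brand and country.
rows = conn.execute("""
    SELECT P.Brand, S.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    JOIN Dim_Date    D ON F.Date_FK    = D.PK
    JOIN Dim_Store   S ON F.Store_FK   = S.PK
    JOIN Dim_Product P ON F.Product_FK = P.PK
    WHERE D.Year = 1997 AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country
    ORDER BY P.Brand, S.Country
""").fetchall()
print(rows)
conn.close()
```

The 1996 sale and the radio sale are filtered out by the constraints on the date and product dimensions, so only the three 1997 TV totals remain.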

Star schema used by example query.