Metadata And Access Tools Computer Science Essay



Metadata is data about data. For example, a digital image may include metadata that describes how large the picture is, its color depth, its resolution, when it was created, and other details. A text document's metadata may record how long the document is, who the author is, when it was written, and a short summary of its contents.

Metadata can be stored and managed in a database, often called a registry or repository. However, it is often impossible to identify metadata just by looking at it, because the same value can be ordinary data in one context and metadata in another.

A real-world example of metadata is a library catalog, which contains information about each book, such as its contents, the names of its authors, and its location in the library.
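The image and text-document examples can be pictured as small records that describe an item without containing the item itself. The sketch below is only an illustration: the field names, values, file names and the `describe` helper are all hypothetical.

```python
# Hypothetical metadata records; field names and values are illustrative only.
image_metadata = {
    "width_px": 1920,
    "height_px": 1080,
    "color_depth_bits": 24,
    "resolution_dpi": 72,
    "created": "2010-05-01",
}

document_metadata = {
    "length_words": 1500,
    "author": "A. Student",
    "written": "2010-05-01",
    "summary": "Overview of metadata and access tools.",
}

# A registry (repository) stores metadata keyed by the item it describes.
registry = {
    "photo.jpg": image_metadata,
    "essay.txt": document_metadata,
}

def describe(item_name):
    """Return the metadata for an item without reading the item itself."""
    return registry[item_name]
```

Calling `describe("photo.jpg")` returns the description of the image, much as a library catalog entry describes a book without being the book.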

Metadata Types:-

Structural Metadata

Guide Metadata

Business Metadata

Technical Metadata

Process Metadata

Administrative Metadata

Access Tools:- There are two main access tools: OLAP and data mining.

OLAP (Online Analytical Processing) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information. It is an approach to answering multidimensional analytical queries swiftly. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining.

For example:- A simple cube might contain a store's sales as the measure and Date/Time as a dimension. Each sale carries a Date/Time label that describes when it took place. Such a cube also supports time intelligence.

For example:- Comparisons of sales performance between different time periods.
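The cube described here, with sales as the measure and Date/Time as the dimension, can be sketched in a few lines of Python. This is a simplified illustration and not how an OLAP server works internally; the sales figures and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (date as "YYYY-MM-DD", amount).
sales = [
    ("2009-01-15", 120.0),
    ("2009-02-03", 80.0),
    ("2010-01-20", 150.0),
    ("2010-02-11", 95.0),
]

def roll_up(facts, level):
    """Sum the sales measure along the Date/Time dimension.

    level is "year" or "month" and selects how much of the date is kept.
    """
    width = {"year": 4, "month": 7}[level]
    totals = defaultdict(float)
    for date, amount in facts:
        totals[date[:width]] += amount
    return dict(totals)

by_year = roll_up(sales, "year")
by_month = roll_up(sales, "month")

# Time intelligence: compare one period with the same period a year earlier.
change = by_month["2010-01"] - by_month["2009-01"]
```

Here `by_year` and `by_month` are two views of the same facts at different levels of the time dimension, and `change` compares January 2010 with January 2009.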

Data Mining:- This term describes knowledge discovery in databases. Sometimes we have to extract information from large databases that may hold data accumulated over several years. With data mining tools, an end user who has little or no programming skill can ask ad hoc queries and get the required answers quickly. Data mining techniques are easily combined with spreadsheets and other software development tools, so that mined data can be analyzed and processed quickly.

For example:- Using past behaviour to rank customers.
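A very small sketch of ranking customers by past behaviour follows. It assumes that "behaviour" simply means total past spending, which is a deliberate simplification; real data mining tools use richer statistical models. The purchase data and the `rank_customers` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical purchase history: (customer, amount spent).
purchases = [
    ("alice", 40.0),
    ("bob", 15.0),
    ("alice", 60.0),
    ("carol", 70.0),
    ("bob", 10.0),
]

def rank_customers(history):
    """Rank customers by total past spending, highest first."""
    totals = defaultdict(float)
    for customer, amount in history:
        totals[customer] += amount
    return sorted(totals, key=lambda c: totals[c], reverse=True)

ranking = rank_customers(purchases)
```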

Qus 2:- Discuss the concept of Data Warehouse administration and management in detail.

Answer:- A data warehouse is very large and contains a huge amount of data, which is updated whenever an update is required. In addition to the main components of a data warehouse, almost all data warehouse products include gateways that transparently access multiple enterprise data sources, without the need to rewrite applications to interpret and use the data. Moreover, in a heterogeneous data warehouse environment, the various databases reside on disparate systems and therefore require internetworking technologies and tools. Managing a data warehouse includes:

Data quality checks.

Data warehouse storage management.

Backup and recovery.

Managing and updating metadata.

Monitoring updates from multiple sources.

Replicating and distributing data.

Auditing and reporting data warehouse usage and status.

Qus 3:- What are the DBMS schemas for decision support? Elaborate.

Answer:- Many methods and tools are used to query data and support decisions, for example ER design techniques. However, these are not appropriate here because they do not handle complex or multidimensional queries efficiently. So there are other DBMS schemas for decision support, whose design should reflect a multidimensional view of the data:

Star Schema

Snowflake Schema

Fact Constellation Schema

Of these, the star schema is generally the best suited to decision support.

Star Schema:- It provides a multidimensional view of data that is expressed using relational database semantics. In a star schema, information is classified into two groups: facts and dimensions.

Facts are the core data elements. The fact table contains raw numeric items that represent relevant business facts.

For example: units of individual items sold, price, discount values, etc.

Dimensions are the attributes that describe the facts. A dimension table does not have a compound primary key.

For example: the product types purchased and the date of purchase.

In a star schema, the fact table is much larger than any of its dimension tables. A star schema makes complex queries easy to handle. For example: find the share of total sales represented by each product in different markets, categories and periods, compared with the same period a year ago.
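A cut-down version of such a query can be tried with Python's built-in `sqlite3` module. The sketch below computes only each product's share of total sales and leaves out markets, periods and the year-ago comparison; the table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product_dim(product_id),
    units_sold INTEGER,
    amount REAL
);
INSERT INTO product_dim VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO sales_fact VALUES (1, 10, 20.0), (1, 5, 10.0), (2, 1, 70.0);
""")

# Join the fact table to one dimension and express each product's
# sales as a percentage of the total across the whole fact table.
rows = conn.execute("""
    SELECT p.name,
           SUM(f.amount) * 100.0 / (SELECT SUM(amount) FROM sales_fact) AS share_pct
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

Adding a market or date dimension would mean one more dimension table and one more join of the same shape, which is why such queries stay manageable in a star schema.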

Qus 4:- How can we map the Data warehouse to multiprocessor architecture? Elaborate.

Answer:- We use a multiprocessor architecture to speed up and scale up the data warehouse. Because a data warehouse holds a huge amount of data, performance decreases as it grows, and parallelism is used to counter this. In fact, the parallel database architecture largely determines the scalability of the solution. There are three main DBMS architectures for parallel processing:

Shared Memory architecture or shared everything

Shared Disk architecture

Shared Nothing architecture

Shared Memory architecture or shared everything:- This is the traditional approach to implementing a relational database management system on symmetric multiprocessor (SMP) hardware. It is simple to implement compared with the other approaches. In this architecture everything is shared: all processors access a common memory and all of the disks, so no processor has memory or disks dedicated to it.

This approach is easy to implement

Expensive to build

Difficult to scale-up

[Figure omitted: architecture diagram. PU = Processor Unit]

Shared Disk Architecture:- In this architecture all disks are shared, but every processor has its own memory. When a query arrives, the processor first looks for the data in its local memory; if the data is not there, it reads it from the shared disks. When many requests arrive at once, the shared disks can become a bottleneck. To avoid conflicts between processors working on the same data, a locking system is needed, and this is provided by a distributed lock manager (DLM). Shared disk architectures can reduce performance bottlenecks resulting from data skew and can increase system availability.
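The need for a lock manager can be illustrated with a toy Python sketch in which threads stand in for processors and a dictionary stands in for the shared disk. This is an assumption-laden simulation inside one process, not a real distributed lock manager, which coordinates locks across separate machines; the `LockManager` class and the block name are hypothetical.

```python
import threading

class LockManager:
    """Toy stand-in for a distributed lock manager (DLM): one lock per data block."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, block_id):
        # Create the lock for a block on first use; the guard makes this thread-safe.
        with self._guard:
            return self._locks.setdefault(block_id, threading.Lock())

shared_disk = {"block-1": 0}  # the disk that every processor can reach
dlm = LockManager()

def processor(updates):
    """Each 'processor' increments the same shared block under the block's lock."""
    for _ in range(updates):
        with dlm.lock_for("block-1"):
            shared_disk["block-1"] += 1

threads = [threading.Thread(target=processor, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every update to the block happens while its lock is held, no increment is lost, at the cost of processors waiting for one another when they want the same block.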

Shared Nothing Architecture:- In this environment, data is partitioned across all disks, and the DBMS is partitioned across multiple co-servers, each of which resides on an individual node of the parallel system. However, this type of environment is hard to implement, and because nothing is shared, the data on a failed disk is unavailable unless it has been copied to other disks.

Support for function shipping

Support for data partitioning

Parallel join strategies.
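Data partitioning and function shipping can be sketched as follows. The sketch assumes a simple hash partitioning scheme and uses threads in one process to stand in for separate nodes, so it only imitates the idea; the rows, the node count and the helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows: (customer_id, sale_amount).
rows = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)]
NUM_NODES = 3

def partition(data, num_nodes):
    """Hash-partition rows so that each node owns a disjoint slice of the data."""
    nodes = [[] for _ in range(num_nodes)]
    for key, amount in data:
        nodes[key % num_nodes].append((key, amount))
    return nodes

def local_sum(node_rows):
    """The function 'shipped' to a node: it only sees that node's own rows."""
    return sum(amount for _, amount in node_rows)

nodes = partition(rows, NUM_NODES)

# Each node computes a partial result in parallel; only the small results travel.
with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
    partials = list(pool.map(local_sum, nodes))

total = sum(partials)
```

The point of function shipping is visible in the last step: the work moves to where the data lives, and only one number per node has to be sent back and combined.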

Qus 5:- What is the role of OLAP in data mining?

Answer: - OLAP stands for Online Analytical Processing. It is a category of software tools that provides analysis for data stored in a database. The chief component of OLAP is the OLAP server, which sits between a client and a Database management system (DBMS). The OLAP server understands how data is organized in the database and has special functions for analyzing the data. There are OLAP servers available for nearly all the major database systems.

OLAP tools enable users to analyze different dimensions of multidimensional data. For example, they provide time series and trend analysis views. OLAP is often used in data mining.

When mining techniques are applied to the data in a data warehouse, hidden patterns are revealed. However, the result of mining is data in two-dimensional form. When OLAP is applied to this two-dimensional data, we get a multidimensional view of the data (data that can be viewed from various perspectives), which is more refined. This multidimensional data is easier to interpret and more useful for making important decisions.

Qus 6:- Draw a star schema diagram for sales database.

Answer:- The star schema (also called star-join schema, data cube, or multi-dimensional schema) is the simplest style of data warehouse schema. The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is more effective for handling simpler queries.

The fact table holds the main data. It includes a large amount of aggregated data, such as price and units sold. There may be multiple fact tables in a star schema.

Dimension tables, which are usually smaller than fact tables, include the attributes that describe the facts. Often there is a separate table for each dimension. Dimension tables can be joined to the fact table as needed.

The reason for using a star schema is its simplicity for users: queries are never complex, because the only joins and conditions involve a fact table and a single level of dimension tables.
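In place of a drawing, one possible star schema for a sales database can be written down as table definitions using Python's built-in `sqlite3` module. This is a sketch under assumptions: the choice of three dimensions (date, product, store) and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_id    INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- The fact table sits at the centre of the star and points at every dimension.
CREATE TABLE sales_fact (
    date_id    INTEGER REFERENCES date_dim(date_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    store_id   INTEGER REFERENCES store_dim(store_id),
    units_sold INTEGER,
    price      REAL
);
""")

# List the dimension tables that the fact table references (the points of the star).
referenced = sorted(
    row[2] for row in conn.execute("PRAGMA foreign_key_list(sales_fact)")
)
```

The fact table `sales_fact` is the centre of the star, and each foreign key is one arm leading to a dimension table; no dimension table references another, which is the single level of dimension tables mentioned above.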