Metadata is data about data: it describes the data warehouse and is used to build, manage and maintain it. Data stored in the data warehouse is accessed with the help of metadata, so it is necessary to create an interface between users and metadata. Metadata is defined as data providing information about one or more other pieces of data, such as:
means of creation of the data,
purpose of the data,
time and date of creation,
creator or author of data,
placement on a computer network where the data was created
For example, a digital image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. Similarly, a text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document.
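The idea of file-level metadata can be seen directly with Python's standard library. This is an illustrative sketch only; the file name "report.txt" is a made-up example, and the document is created inline so the snippet is self-contained.

```python
import datetime
import os

# Create a small text document so the example is self-contained.
with open("report.txt", "w") as f:
    f.write("Quarterly sales summary.\n")

info = os.stat("report.txt")

# Metadata: data about the document, not the document's contents.
metadata = {
    "size_bytes": info.st_size,                                  # how large the file is
    "modified": datetime.datetime.fromtimestamp(info.st_mtime),  # when it was last written
}
print(metadata["size_bytes"])
```

Note that none of the document's actual text appears in `metadata`; it only describes the document, which is exactly the role metadata plays for the data warehouse.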
Metadata can be of two types: technical metadata and business metadata.
Technical metadata contains information about the data warehouse itself. It is used by warehouse designers when they develop and manage tasks, and it includes information about the data sources and the algorithms used to transfer the data. Technical metadata includes:
Access authorization, backup history, information delivery history, data access, etc.
Data mapping operation
The rules used to perform data cleanup and data enhancement
Data structure definitions for data target
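The items above can be pictured as one technical-metadata record for a single warehouse target table. The field and table names below (`sales_fact`, `orders_oltp`, the cleanup rules) are hypothetical examples, not a standard layout.

```python
# A hypothetical technical-metadata entry for one warehouse target table.
# All names here are illustrative assumptions, not real system identifiers.
tech_metadata = {
    "target_table": "sales_fact",              # data structure definition for the target
    "source_system": "orders_oltp",            # where the data comes from
    "mapping": {"ord_amt": "sales_amount"},    # data mapping: source column -> target column
    "cleanup_rules": ["trim whitespace", "reject negative amounts"],
    "last_backup": "2024-01-15",               # backup history
    "access_roles": ["etl_admin", "dw_designer"],  # access authorization
}
print(tech_metadata["mapping"]["ord_amt"])
```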
Business metadata contains information about the data stored in the data warehouse. This information is very useful for users to understand the business perspective of the information. Business metadata informs users about:
Subject areas and information object type
Information home pages
Information to support all data warehouse components
Data warehouse operational information
The main purpose of a data warehouse is to provide information to business users for decision making. These users interact with the data warehouse using front-end tools. Many of these tools require an information specialist, and all end-user tools use metadata to gain access to the data stored in the warehouse. These tools are divided into five main groups:
Data query and reporting tools
Application development tools
Executive information system tools
Online analytical processing tools
Data mining tools
Data query and reporting tools:
These tools are designed for ease of use and point-and-click operation. They can be further divided into two parts:
Production reporting tools:
These tools are used in large companies to report on regular operations and to support high-volume batch jobs such as calculating and printing paychecks.
Desktop report writers:
These tools are designed for end users.
Application development tools:
These tools are designed for the client-server environment so that they can access all major database systems, including Oracle. Examples of application development tools are PowerBuilder from Powersoft and Visual Basic from Microsoft.
Online analytical processing tools:
These tools are based on the concept of a multidimensional database and allow users to analyze the data using complex multidimensional views. These tools assume that the data is organized in a multidimensional model, which may be supported by a relational database designed to enable multidimensional properties.
Data mining tools:
Data mining discovers meaningful new patterns and trends by mining the large amounts of data stored in the data warehouse using artificial intelligence and mathematical techniques. Data mining is used to discover data, visualize data and correct data. The strategic value of data mining is time-sensitive, particularly in the finance sector of the industry.
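A toy illustration of the pattern-discovery idea is counting which product pairs are bought together most often in a set of transactions. Real data-mining tools use far more sophisticated techniques (e.g. association-rule mining); this sketch, with invented basket data, only shows the flavor of finding patterns in stored data.

```python
from collections import Counter
from itertools import combinations

# Invented transaction data: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "butter"},
    {"bread", "milk"},
]

# Count how often each pair of products appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequently co-purchased pair is a (toy) discovered pattern.
top_pair, count = pair_counts.most_common(1)[0]
print(top_pair, count)
```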
Q2. Discuss the concept of data warehouse administration and management in detail.
A data warehouse is typically several times larger than an operational database because it stores large amounts of historical data. A data warehouse may be updated as often as once a day if the application requires it. Data warehouse products include gateways to transparently access multiple enterprise data sources without having to write applications to utilize the data.
Managing the data warehouse includes:
Security and priority management
Data query checks
Updates from multiple sources
Backup and recovery
Storage management of data warehouse
Replicating and distributing data
Reporting on the use of the data warehouse
Q3. What are the various DBMS schemas for decision support? Elaborate.
Ans: Data layout for best access: Businesses across all industries have developed considerable expertise in implementing efficient operational systems such as payroll, inventory tracking and purchasing. Indeed, the original objectives in developing an abstract model known as the relational model were to address a number of shortcomings of non-relational database management and application development. For example, the early database systems were complex to develop and difficult to understand, install, maintain and use. The required skill set was expensive, difficult to attain and in short supply.
Database schema definition often focuses on maximizing concurrency and optimizing insert, update and delete performance by defining relational tables that map very efficiently to operational requests while minimizing contention for access to individual records.
Multidimensional Data Model: The multidimensional nature of business questions is reflected in the fact that, for example, marketing managers are no longer satisfied by asking simple one-dimensional questions such as "How much revenue did the new product generate?" Instead, they ask questions such as "How much revenue did the new product generate by month?" One way to look at the multidimensional data model is to view it as a cube. The multidimensional cube is the foundation of multidimensional technology, which is not relational by nature.
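The cube view can be sketched with plain Python: each cell is addressed by a coordinate along every dimension (product, month, region here) and holds a measure. The product names and revenue figures are invented for illustration.

```python
# A tiny "cube": cells keyed by (product, month, region), holding revenue.
cube = {
    ("widget", "Jan", "North"): 100,
    ("widget", "Jan", "South"): 150,
    ("widget", "Feb", "North"): 120,
    ("gadget", "Jan", "North"): 200,
}

# "How much revenue did the product generate by month?" is a roll-up
# along the month dimension: sum every cell that shares a month.
revenue_by_month = {}
for (product, month, region), revenue in cube.items():
    revenue_by_month[month] = revenue_by_month.get(month, 0) + revenue

print(revenue_by_month)
```

Answering the same question by region or by product is the identical loop keyed on a different coordinate, which is why the cube model fits multidimensional business questions so naturally.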
Star Schema: The multidimensional view of data that is expressed using relational database semantics is provided by the database schema design called the star schema. The basic premise of star schemas is that information is classified into two groups: facts and dimensions. Facts are the core data elements being analyzed; for example, units of individual items sold are facts. Dimensions are attributes about the facts.
STAR join and STAR index: A star join is a high-speed, single-pass, parallelizable multi-table join. Red Brick's RDBMS can join more than two tables in a single operation. Moreover, even when joining only two tables, the star join outperforms many join methods implemented by traditional OLTP RDBMSs. Star indexes are created on one or more foreign-key columns of a fact table. STAR indexes are very efficient; they can be built and maintained rapidly.
Q4. How can we map the DW to Multi Processor architecture? Elaborate.
Ans: There is a very large amount of data in a data warehouse, so the search for better performance and scalability becomes a real necessity. Speed-up and scale-up are the measures used for linear performance and scalability, and they can be satisfied by parallel hardware architectures. These architectures are based on multiprocessor system designs such as the shared-memory model and the shared-disk model. Database vendors started to take advantage of parallel hardware architectures by implementing multiserver and multithreaded systems designed to handle a large number of client requests. For better performance there are two goals:
Speed-up: the ability to execute the same request on the same amount of data in less time. With twice the resources, it ideally cuts the response time in half and allows the system to support more concurrent users.
Scale-up: the ability to obtain the same performance on the same request as the database size increases, i.e., the same performance on twice as much data with twice the resources, so that multiple queries can be processed at the same time. To map the data warehouse to a multiprocessor architecture there are three main database management system software architecture styles: shared-memory architecture, shared-disk architecture and shared-nothing architecture.
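The two goals can be expressed as simple ratios. The timings below are invented purely to show the arithmetic: a ratio of 2.0 when resources double is linear speed-up, and a ratio of 1.0 when data and resources double together is linear scale-up.

```python
# Speed-up: same work, more processors, less elapsed time (invented timings).
time_1_cpu = 100.0   # seconds for the request on 1 processor
time_2_cpu = 50.0    # seconds for the same request on 2 processors
speed_up = time_1_cpu / time_2_cpu
print(speed_up)      # 2.0 means linear speed-up

# Scale-up: data and resources grow together, elapsed time holds steady.
time_1x_data_1cpu = 100.0
time_2x_data_2cpu = 100.0
scale_up = time_1x_data_1cpu / time_2x_data_2cpu
print(scale_up)      # 1.0 means linear scale-up
```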
Shared memory architecture:
Shared memory means shared everything: a single RDBMS server can utilize all processors, access all memory and access the entire database, and a single system image is presented to the users. In a shared-memory multiprocessor system, all components communicate with each other by exchanging messages and data via the shared memory, and all processors can access all of the data.
[Diagram: global shared memory]
Shared Disk Architecture:
In the shared-disk architecture, each RDBMS server can read, write, update and delete records from the same shared database. Shared-disk architecture can reduce performance bottlenecks and increase system availability.
Shared Nothing Architecture:
In this architecture the data is partitioned across all disks and the DBMS is partitioned across servers. Each server has ownership of its own disk; each processor has its own memory and disk and communicates with other processors by exchanging messages and data over the network.
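The shared-nothing idea can be sketched in miniature: each "server" owns one partition of the data, computes a partial result independently, and a coordinator combines the partial results. Real systems exchange messages over a network; here the partitions are just Python lists with invented sales figures.

```python
# Invented sales rows to be spread across three "servers".
sales = [10, 20, 30, 40, 50, 60]

# Partition the rows round-robin: each server owns only its own slice.
partitions = [sales[i::3] for i in range(3)]

# Each server sums only the partition it owns (no shared disk or memory)...
partial_sums = [sum(p) for p in partitions]

# ...and a coordinator combines the partial results into the final answer.
total = sum(partial_sums)
print(total)
```

Because each partition is summed independently, the partial sums could run on separate machines at the same time, which is where the architecture's scalability comes from.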
Q5. What is the role of OLAP in data mining?
Ans: OLAP stands for online analytical processing. Solving business problems such as market analysis and financial forecasting requires query-centric database schemas that are array-oriented and multidimensional in nature. These business problems are characterized by the need to retrieve large numbers of records from very large data sets and summarize them on the fly. The multidimensional nature of the problems it is designed to address is the key driver for OLAP.
These tools are based on the concept of multidimensional databases and allow a sophisticated user to analyze the data using elaborate, multidimensional, complex views. Typical business applications for these tools include product performance and profitability, effectiveness of a sales program or a marketing campaign, sales forecasting, and capacity planning. These tools assume that the data is organized in a multidimensional database or in a relational database designed to enable multidimensional properties.
For example, a telephone company might want a customer dimension to include details such as all telephone numbers as part of an application that is used to analyze customer turnover. This would require support for multi-million-row dimension tables and very large volumes of fact data. OLAP can handle very large data sets using parallel execution and partitioning, as well as offering support for advanced hardware and clustering.
Q6. Draw a star schema diagram for sales database.
Ans: A star schema consists of fact tables and dimension tables. Fact tables contain the quantitative or factual data about a business (the information being queried). This information often consists of numerical, additive measurements and can comprise many columns and millions or billions of rows. Dimension tables are usually smaller and hold descriptive data that reflects the dimensions, or attributes, of a business. SQL queries then use joins between fact and dimension tables and constraints on the data to return selected information. Fact and dimension tables differ from each other only in their use within a schema. For example, a fact table in a sales database, implemented with a star schema, might contain the sales revenue for the products of the company from each customer in each geographic market over a period of time. The dimension tables in this database define the customers, products, markets, and time periods used in the fact table. Dimension tables allow a user to browse a database to become familiar with the information in it and then to write queries with constraints so that only the information satisfying those constraints is returned from the database.
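The sales-database star schema described above can be sketched with SQLite from Python's standard library. All table and column names (`sales_fact`, `product_dim`, `customer_dim`, `time_dim`) and the inserted rows are illustrative assumptions, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: small, descriptive attributes of the business.
cur.execute("CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, region TEXT)")
cur.execute("CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT)")

# Fact table: foreign keys into each dimension plus the additive measure.
cur.execute("""
    CREATE TABLE sales_fact (
        product_id  INTEGER REFERENCES product_dim(product_id),
        customer_id INTEGER REFERENCES customer_dim(customer_id),
        time_id     INTEGER REFERENCES time_dim(time_id),
        revenue     REAL
    )
""")

# Invented sample data.
cur.execute("INSERT INTO product_dim VALUES (1, 'widget')")
cur.execute("INSERT INTO customer_dim VALUES (1, 'North')")
cur.execute("INSERT INTO time_dim VALUES (1, 'Jan'), (2, 'Feb')")
cur.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                [(1, 1, 1, 100.0), (1, 1, 2, 150.0)])

# A typical star query: join the fact table to a dimension and aggregate.
cur.execute("""
    SELECT t.month, SUM(f.revenue)
    FROM sales_fact f
    JOIN time_dim t ON f.time_id = t.time_id
    GROUP BY t.month
    ORDER BY t.month
""")
rows = cur.fetchall()
print(rows)
```

Adding a constraint on another dimension (e.g. `WHERE c.region = 'North'` after joining `customer_dim`) follows the same join-then-aggregate shape, which is what makes the star layout convenient for ad hoc analysis.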