Multi Dimensional Databases Versus Relational Databases Computer Science Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Many people ask about the difference between implementing On-Line Analytical Processing (OLAP) with a Relational Database Management System (ROLAP) and with a Multidimensional Database (MDD). In this document, we will demonstrate that an MDD affords major advantages over ROLAP: many orders of magnitude faster data retrieval, many orders of magnitude faster computation, less disk space, and less programming effort.

Characteristics of On-Line Analytical Processing

OLAP software permits analysts, managers, and executives to gain insight into enterprise performance through fast interactive access to a wide variety of views of data organized to reflect the multidimensional nature of the enterprise data. An OLAP service must meet the following fundamental requirements:

• The base level of data is summary data (e.g., total sales of a product in a region in a given period)

• Historical, current, and projected data

• Aggregation of data and the ability to navigate interactively to various levels of aggregation (drill down)

• Derived data which is calculated from input data (performance ratios, variance Actual/Budget, ...)

• Multidimensional view of the data (sales per product, per region, per channel, per period, ...)

• Ad hoc fast interactive analysis (response in seconds)

• Medium to large data sets (1 to 500 gigabytes)

• Frequently changing business model (weekly)

As an illustration we will use the six-dimension business model of a hypothetical beverage company. Each dimension is made of a hierarchy of members: for instance, the Time dimension has months (January, February, ...) as the leaf members of the hierarchy, quarters (Quarter 1, Quarter 2, ...) as the next-level members, and Year as the top member of the hierarchy. We will assume the following number of members for each dimension:

• Channel 6 members

• Product 1500 members

• Market 100 members

• Time 17 members

• Scenario 8 members

• Measures 50 members

A simple OLAP scenario consists of retrieving the actual profit of the company for the current month and comparing it with the budget, then drilling down per market region and product family. Further drilling might be needed in case of a large difference between Budget and Actual.

Relational Approach

Given the popularity of relational DBMSs, it is tempting to implement the OLAP facility as a semantic layer on top of a relational store. This layer would provide a multidimensional view, computation of derived data, drill-down intelligence, and generation of the appropriate SQL to access the relational store. The typical third-normal-form representation of data is totally unsuitable in this environment because of the overhead of processing joins and restrictions across a very large number of tables. Instead, a denormalized Star Schema is used to provide adequate performance.

The relational design consists of a Fact table and one table per dimension. The Fact table contains one row for every set of measures and a dimension-id column for each dimension. Roll-up summaries, such as East region, are precalculated and stored in the Fact table. Each dimension table represents the hierarchy of its dimension and contains the full name of the members of the dimension. The number of potential rows in the Fact table is the cross product of the dimensions: Channels (6) * Products (1500) * Markets (100) * Time (17) * Scenario (8) = 122 million. With 80% sparsity, which is typical, the number of rows is about 24 million. With the 50 columns of the Measures dimension the row size is about 500 bytes, and the Fact table amounts to about 13 gigabytes. Each dimension-id column requires an index to perform the joins and restrictions with reasonable efficiency. If we assume a 4K block size and a 30-byte average index entry, each index would require about 800 megabytes; 5 indexed dimensions correspond to about 4 gigabytes of indexes. The entire database size is about 17 gigabytes.
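The sizing arithmetic above can be reproduced with a short Python sketch. The member counts, the 20% density, the 500-byte row, and the 30-byte index entry come from the text; the function and constant names are illustrative assumptions. The raw products come out slightly below the rounded figures quoted above (13 gigabytes and 800 megabytes), which presumably also cover block and index-structure overhead that this sketch ignores.

```python
# Member counts per dimension, taken from the beverage company example.
MEMBERS = {"Channel": 6, "Product": 1500, "Market": 100, "Time": 17, "Scenario": 8}

DENSITY_PERCENT = 20    # 80% sparsity: 20% of the combinations hold data
ROW_BYTES = 500         # approximate Fact row size quoted in the text
INDEX_ENTRY_BYTES = 30  # average index entry size quoted in the text


def potential_rows(members):
    """Cross product of the member counts of all dimensions."""
    total = 1
    for count in members.values():
        total *= count
    return total


def relational_estimate(members=MEMBERS):
    """Raw size estimate of the star schema (storage overhead ignored)."""
    rows = potential_rows(members)
    stored_rows = rows * DENSITY_PERCENT // 100
    return {
        "potential_rows": rows,
        "stored_rows": stored_rows,
        "fact_bytes": stored_rows * ROW_BYTES,
        # One index per dimension-id column; raw entries only.
        "index_bytes_each": stored_rows * INDEX_ENTRY_BYTES,
    }


if __name__ == "__main__":
    for name, value in relational_estimate().items():
        print(f"{name}: {value:,}")
```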

Fact Table


Retrieving all Measures for Actual and Budget, for every month of the year at the corporate level (total products, total channels, total markets), could be expressed as:

SELECT Scenario.member, Time.member, Sales, COGS, Margin, ..., Profit

FROM Fact, Channel, Product, Market, Time, Scenario

WHERE Fact.Chan-id = Channel.Chan-id and Channel.member = 'Channel'

and Fact.Prod-id = Product.Prod-id and Product.member = 'Product'

and Fact.Mkt-id = Market.Mkt-id and Market.member = 'Market'

and Fact.Time-id = Time.Time-id and Time.rollup = 'Year'

and Fact.Scen-id = Scenario.Scen-id and Scenario.member = 'Actual'

UNION SELECT Scenario.member, Time.member, Sales, COGS, Margin, ..., Profit

FROM Fact, Channel, Product, Market, Time, Scenario

WHERE Fact.Chan-id = Channel.Chan-id and Channel.member = 'Channel'

and Fact.Prod-id = Product.Prod-id and Product.member = 'Product'

and Fact.Mkt-id = Market.Mkt-id and Market.member = 'Market'

and Fact.Time-id = Time.Time-id and Time.rollup = 'Year'

and Fact.Scen-id = Scenario.Scen-id and Scenario.member = 'Budget'

It is reasonable to expect that at most one fourth of the top three levels of indexes (365 megabytes) will remain in memory. In this case the 6-way join will result in an average of about 10 I/Os per retrieved row, or about 240 I/Os for this query (12 months * 2 scenarios = 24 rows).

Now we will examine the computation of derived data using SQL and a 3GL. The roll-up columns are trivial to produce by means of the column expression facility of SQL. The following SQL statement computes the Margin, Tot Exp, and Profit columns:

UPDATE Fact SET Margin = Sales - COGS,

Tot Exp = Mkt Exp + Payroll + Misc,

Profit = Sales - COGS - (Mkt Exp + Payroll + Misc)

The roll-up rows can be generated by SQL INSERT statements similar to the following ones, which generate the roll-ups for the East region:

INSERT INTO Market VALUES ('Market', 'East', '100')

INSERT INTO Fact (Chan-id, Prod-id, Mkt-id, Time-id, Scen-id, Sales, COGS, ..., Profit)

SELECT Chan-id, Prod-id, '100', Time-id, Scen-id, SUM(Sales), SUM(COGS), ..., SUM(Profit)

FROM Fact, Market

WHERE Fact.Mkt-id = Market.Mkt-id and Market.rollup = 'East'

GROUP BY Chan-id, Prod-id, Time-id, Scen-id
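As a runnable illustration of this roll-up pattern, the following sketch uses Python's built-in sqlite3 module on a tiny, made-up data set. The table layout, the underscore identifiers (hyphens are not valid in unquoted SQL identifiers), the column order of the market table, and the sample rows are all assumptions made for the example; only the INSERT ... SELECT ... GROUP BY shape follows the statements above.

```python
import sqlite3

EAST_ID = 100  # id given to the East roll-up member, as in the text


def build_east_rollup():
    """Create a tiny star schema and insert the East roll-up row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE market (mkt_id INTEGER PRIMARY KEY, member TEXT, rollup TEXT);
        CREATE TABLE fact (chan_id INTEGER, prod_id INTEGER, mkt_id INTEGER,
                           time_id INTEGER, scen_id INTEGER, sales REAL, cogs REAL);
        INSERT INTO market VALUES (1, 'New York', 'East');
        INSERT INTO market VALUES (2, 'Florida', 'East');
        INSERT INTO market VALUES (3, 'California', 'West');
        INSERT INTO fact VALUES (1, 1, 1, 1, 1, 100.0, 40.0);
        INSERT INTO fact VALUES (1, 1, 2, 1, 1, 50.0, 20.0);
        INSERT INTO fact VALUES (1, 1, 3, 1, 1, 70.0, 30.0);
    """)
    # Register the roll-up member itself in the dimension table.
    conn.execute("INSERT INTO market VALUES (?, 'East', 'Market')", (EAST_ID,))
    # Sum the measures of every market whose parent is East.
    conn.execute("""
        INSERT INTO fact (chan_id, prod_id, mkt_id, time_id, scen_id, sales, cogs)
        SELECT f.chan_id, f.prod_id, ?, f.time_id, f.scen_id,
               SUM(f.sales), SUM(f.cogs)
        FROM fact AS f JOIN market AS m ON f.mkt_id = m.mkt_id
        WHERE m.rollup = 'East'
        GROUP BY f.chan_id, f.prod_id, f.time_id, f.scen_id
    """, (EAST_ID,))
    row = conn.execute(
        "SELECT sales, cogs FROM fact WHERE mkt_id = ?", (EAST_ID,)
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(build_east_rollup())
```

The California row is left out of the East total because its parent is West, which is the only filtering the roll-up needs.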

This roll-up method can only be used for simple aggregations. Any computation which is not commutative and associative will need a 3GL program with cursors. For example, the computation of Variance as Actual - Budget will need a SELECT of an Actual row, a SELECT of the matching Budget row, a computation of the variance, and an INSERT of the Variance row. This will cost an average of 17 I/Os per row. To compute and store all the Variance rows would take: 20% * Product * Market * Channel * Time * 17 = 52 million I/Os (about 237 hours of I/O time)!
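The I/O count above can be checked with a few lines of Python. The member counts, the 20% density, and the 17 I/Os per row are taken from the text. The text does not state the disk speed, so the rate computed here is only the value implied by the quoted 237 hours, not a figure from the source; the function names are illustrative.

```python
DENSITY_PERCENT = 20       # 20% of the combinations hold data
IOS_PER_VARIANCE_ROW = 17  # average cost per Variance row quoted in the text


def variance_ios(product=1500, market=100, channel=6, time=17):
    """Total I/Os needed to compute and store every Variance row."""
    combinations = product * market * channel * time
    stored = combinations * DENSITY_PERCENT // 100
    return stored * IOS_PER_VARIANCE_ROW


def implied_ios_per_second(total_ios, hours):
    """I/O rate implied by a total I/O count and an elapsed time in hours."""
    return total_ios / (hours * 3600)


if __name__ == "__main__":
    total = variance_ios()
    print(f"{total:,} I/Os")
    print(f"{implied_ios_per_second(total, 237):.1f} I/Os per second implied")
```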

This procedure would have to be repeated for all derived values. It is clearly not practical to do this through SQL. The only practical solution is to write a custom 3GL program which performs all the calculations at the time the relational database is loaded with the base data. Once both base and derived data have been loaded, the indexes can be built to provide adequate performance.

In spite of the popularity of relational databases in OLTP applications, this analysis illustrates that the relational model is not well suited for OLAP because of the large number of I/Os needed to perform simple drill-downs and calculations. An alternative is to stage the OLAP data in a store which is designed for multidimensional analysis.

Multidimensional Database Approach

We will use the same OLAP model with a server that is based on a multidimensional database such as Essbase. The data relevant to the analysis is extracted from a relational data warehouse or other data sources and loaded into a multidimensional database which looks like a hypercube with 6 dimensions (in this example). The following implementation of this hypercube is specific to the Essbase product. It is patented [7], and some of the advantages described might not apply to other multidimensional database implementations.

The dimensions which typically have data in every cell, such as Time, Scenario, and Measures, are represented by a dense block, shown in the following illustration as a cube. The other dimensions are called sparse dimensions, and for every combination of sparse dimension members where data exists (Retail->Cola->New York, Retail->Cola->Florida, Retail->Cola->East, etc.), there is an entry in the index pointing to a dense block on disk. In the beverage company example, a block would consist of Time members * Scenario members * Measure members * 8 bytes per cell = 55K bytes; with 80% sparsity all the blocks would occupy 10 gigabytes. The index would occupy 6 megabytes; because of its small size it would remain in memory.
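The block and database sizes quoted above follow from the member counts, as this small Python sketch shows. The split into dense and sparse dimensions, the 8-byte cell, and the 20% density come from the text; the names are illustrative, and the bytes-per-index-entry figure is merely derived from the quoted 6 megabytes rather than stated in the source.

```python
DENSE = {"Time": 17, "Scenario": 8, "Measures": 50}
SPARSE = {"Channel": 6, "Product": 1500, "Market": 100}
CELL_BYTES = 8
DENSITY_PERCENT = 20  # share of sparse combinations that actually hold data


def product_of(counts):
    """Multiply the member counts of a group of dimensions."""
    total = 1
    for count in counts.values():
        total *= count
    return total


def block_bytes():
    """Size of one dense block: one cell per dense member combination."""
    return product_of(DENSE) * CELL_BYTES


def existing_blocks():
    """Number of blocks, one per sparse combination that holds data."""
    return product_of(SPARSE) * DENSITY_PERCENT // 100


if __name__ == "__main__":
    print(f"block size: {block_bytes():,} bytes")
    print(f"blocks: {existing_blocks():,}")
    print(f"all blocks: {block_bytes() * existing_blocks():,} bytes")
    print(f"index bytes per entry if 6 MB: {6_000_000 / existing_blocks():.1f}")
```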

A typical OLAP retrieval proceeds top down, i.e. starting from the top level of aggregation in each dimension (Channel->Market->Product->Year->Actual->Profit). The block corresponding to the combination of the top-level members of the sparse dimensions is located via an index lookup and brought into memory with a single I/O, and the data is located by offset calculation inside the block. The following conclusions can be drawn from this description:

1. Retrieval is very fast because

• The data related to any combination of dimension members can be retrieved with a single I/O.

• Data is clustered efficiently in a multidimensional array.

• Values are precalculated (see Calculation below).

• The index is small and can therefore usually reside entirely in memory.

2. Storage is very efficient because

• The blocks contain only data.

• A single index locates the block corresponding to a combination of sparse dimension members.

• A single small index usually resides entirely in memory.
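The addressing scheme described above (a small index keyed by sparse members, pointing to dense blocks addressed by offset) can be sketched in Python as follows. This is only a toy in-memory model under stated assumptions: the class names, the row-major cell order, and the use of a dictionary as the index are illustrative choices, not a description of how Essbase is actually implemented.

```python
from array import array

# Dense dimension sizes from the beverage company example.
N_TIME, N_SCENARIO, N_MEASURE = 17, 8, 50


class DenseBlock:
    """One block of 8-byte cells covering all dense member combinations."""

    def __init__(self):
        self.cells = array("d", [0.0]) * (N_TIME * N_SCENARIO * N_MEASURE)

    @staticmethod
    def offset(time, scenario, measure):
        # Row-major relative offset (assumed layout): no search is needed.
        return (time * N_SCENARIO + scenario) * N_MEASURE + measure

    def get(self, time, scenario, measure):
        return self.cells[self.offset(time, scenario, measure)]

    def put(self, time, scenario, measure, value):
        self.cells[self.offset(time, scenario, measure)] = value


class Cube:
    """Sparse index: (channel, product, market) -> DenseBlock."""

    def __init__(self):
        self.index = {}

    def put(self, sparse_key, dense_key, value):
        block = self.index.get(sparse_key)
        if block is None:  # blocks exist only where data exists
            block = self.index[sparse_key] = DenseBlock()
        block.put(*dense_key, value)

    def get(self, sparse_key, dense_key):
        block = self.index.get(sparse_key)  # one lookup, then one offset
        return None if block is None else block.get(*dense_key)


if __name__ == "__main__":
    cube = Cube()
    cube.put(("Retail", "Cola", "New York"), (0, 1, 3), 42.0)
    print(cube.get(("Retail", "Cola", "New York"), (0, 1, 3)))
    print(cube.get(("Retail", "Cola", "Florida"), (0, 1, 3)))
```

In this model a read costs one index lookup plus one multiplication-and-add offset, which mirrors the single-I/O retrieval argument in the list above.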


Essbase provides a default calculation which is optimized for efficient roll-up and takes advantage of the clustering described previously. Typically all cells of a block containing input data, such as Retail->Cola->Florida, are calculated at once within the block; then a roll-up block, such as Retail->Cola->East, is computed by summing the cells of all its children blocks. The computation of the variance between Actual and Budget can be accomplished with 2 I/Os per block (read and write) for a total of 20% * Product * Market * Channel * 2 = 360,000 I/Os (about 2 hours of I/O time) for the whole database, compared to 237 hours in the relational approach. All other derived data can be calculated at the same time without the need for any more I/Os.

The calculation is very efficient because:

• Only one read and one write I/O per block are necessary to roll up an entire database.

• The memory image of a block is an array with efficient relative offset addressing.

• Roll-ups can be accomplished by taking advantage of the isomorphic nature of the multidimensional array representation of the data.
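Because every block shares the same layout, a roll-up reduces to adding arrays cell by cell, and a derived value such as Variance is filled in with offset arithmetic inside the block. The Python sketch below illustrates both steps on a deliberately tiny block (2 time members, 3 scenarios, 2 measures). The sizes, the row-major layout, and all names are assumptions made for the example; only the I/O count at the end uses figures from the text.

```python
N_TIME, N_SCENARIO, N_MEASURE = 2, 3, 2  # toy sizes, not the example's
ACTUAL, BUDGET, VARIANCE = 0, 1, 2       # scenario positions (assumed)


def offset(time, scenario, measure):
    """Row-major relative offset of a cell inside a block."""
    return (time * N_SCENARIO + scenario) * N_MEASURE + measure


def new_block():
    return [0.0] * (N_TIME * N_SCENARIO * N_MEASURE)


def roll_up(children):
    """Sum child blocks cell by cell; identical layouts make this trivial."""
    parent = new_block()
    for block in children:
        for position, value in enumerate(block):
            parent[position] += value
    return parent


def add_variance(block):
    """Fill the Variance cells as Actual - Budget, in place, in memory."""
    for time in range(N_TIME):
        for measure in range(N_MEASURE):
            actual = block[offset(time, ACTUAL, measure)]
            budget = block[offset(time, BUDGET, measure)]
            block[offset(time, VARIANCE, measure)] = actual - budget
    return block


def block_ios(product=1500, market=100, channel=6, density_percent=20):
    """One read and one write per existing block, as in the text."""
    return product * market * channel * density_percent // 100 * 2


if __name__ == "__main__":
    florida, new_york = new_block(), new_block()
    florida[offset(0, ACTUAL, 0)], florida[offset(0, BUDGET, 0)] = 10.0, 8.0
    new_york[offset(0, ACTUAL, 0)], new_york[offset(0, BUDGET, 0)] = 5.0, 4.0
    east = add_variance(roll_up([florida, new_york]))
    print(east[offset(0, VARIANCE, 0)], block_ios())
```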

Comparison between the Relational and the Multidimensional Models

The study of our example demonstrates the following differences between the best relational alternative and the multidimensional approach.

                                                     Relational   Multidimensional   Improvement

Disk space requirement (gigabytes)                       17              10              1.7

Retrieve the corporate measures, Actual vs
Budget, by month (I/Os)                                 240               1              240

Computation of Variance Budget/Actual for
the whole database (I/O time in hours)                  237               2*             110

* This could include the calculation of numerous other derived data without any additional I/O.

The multidimensional database in our example uses about half the disk space, retrieves data between 8 and 200 times faster, and calculates the derived data at least 2 orders of magnitude faster than the relational one. We have also seen that, even when the best relational alternative was used, the generation and calculation of the Fact table and dimension tables were a serious programming and maintenance challenge, requiring a complex 3GL program for the computation of the derived values and a sophisticated query processor to generate the proper SQL.

The basic reason for the huge difference in performance comes from the data models. The relational model assumes a table model where the only way to address a row is through the contents of one of its fields, requiring massive indexing. The multidimensional model uses array addressing, with relative offsets. The content addressing of the relational model was created in the early 1970s to provide flexibility in data restructuring which did not exist in the popular databases of the time, IMS and CODASYL. It has proven over the years to be a very successful model for OLTP. However, for OLAP applications, the relational cursor that navigates through a multitude of indexes cannot compete with the multidimensional array cursor that works via relative offsets.

68 SIGMOD Record