# Online Analytical Processing Olap Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The need for more intensive decision support prompted the introduction of a new generation of tools. These new tools, called online analytical processing (OLAP), create an advanced data analysis environment that supports decision making, business modeling, and operations research activities. OLAP system share four main characteristics; they:

ï‚· Use multidimensional data analysis techniques

ï‚· Provide easy-to-use end user interface.

ï‚· Support client/server architecture.

## 9.1 MULTIDIMENSIONAL DATA ANALYSIS TECHNIQUES

This is the most distinguishing characteristic of OLAP tools. Multidimensional data analysis refers to the processing of data such that data are viewed as part of a multidimensional structure. The interest in the multidimensional aspect of data analysis stems from the fact that business decision makers usually view data from a business perspective. That is, they tend to view business data as they relate to other business data.

To better understand this view, let's examine how a business data analyst might investigate sales figures. In this case, (s)he probably is interested in the sales figures as they relate to other business variables such as customers and time. In other words, customers and time are viewed as different dimensions of sales.Sales are located in the intersection of a customer row and a time column

Note that the tabular view of sales data is not well suited to decision support, because the relationship INVOICE ---> LINE between INVOICE and LINE does not provide a business perspective of the sales data. On the other hand, the end user's view of sales data from a business perspective is more closely represented by the multidimensional view of sales than by the tabular view of separate tables. Note also that the multidimensional view allows end users to consolidate or aggregate data at different levels: total sales figures by customer and by date. Finally, the multidimensional view of data allows a business data analyst to easily switch business perspectives (dimensions) from sales by customer to sales by division, by region, and so on.

Multidimensional data analysis techniques are augmented by the following functions:

ï€ ï‚· Advanced data presentation functions: 3-D graphics, pivot tables, crosstabs, data rotation. three-dimensional cubes, and so on. Such data presentation facilities are compatible with desktop spreadsheets, statistical packages, and query and report- writer packages.

## ï€ ï‚· Advanced data aggregation, consolidation, and classification functions:

That allow the business data analyst to create multiple data aggregation levels,

slice and dice data ,and drill down and roll up data across different dimensions and aggregation levels. For example, aggregating data across the time dimension (by week, month, quarter, and year) allows the business data analyst to drill down and roll up across time dimensions.

ï‚· Advanced computational functions: business-oriented variables (market share, period comparisons, sales margins, product margins, percentage changes, etc.), financial and accounting ratios (profitability, overhead, cost allocations, returns, etc.), statistical and forecasting functions, and so on. These functions are provided automatically, and the end user does not need to redefine their components each time they are accessed.

ï‚· Advanced data modeling functions: support for "what-if" scenarios, variable assessment, variable contributions to outcome, linear programming, and other modeling tools.

Because many of the analysis and presentation functions are common to existing desktop spreadsheet packages, most OLAP vendors have closely integrated their systems with desktop spreadsheet such as Microsoft Excel and Lotus 1-2-3. Using the features available in graphical end user interfacess such as Windows, the OLAP menu option simply becomes another option within the Lotus or Excel menu bar. This seamless integration is a plus for OLAP systems and for spreadsheet vendors, because end users gain access to advanced data analysis features by using familiar programs and interfaces. Therefore, additional training and development costs are minimized.

To deliver efficient decision support, OLAP tools must have advanced data access features. Such features include:

ï‚· Access to many different kinds of DBMSs, flat files, and internal and external data sources.

ï‚· Access to aggregated data warehouse data as well as to the detail data found in operational databases.

ï‚· Rapid and consistent query response times.

ï‚· The ability to map end user requests, expressed in either business or model terms, to the appropriate data source and then to the proper data access language (usually SQL). The query code must be optimized to match the data source, regardless of whether the source i operational or data warehouse.

ï‚· Support for very large databases. As we already explained, the data warehouse can easily and quickly grow to multiple gigabytes and even terabytes.

To provide a seamless interface, OLAP tools map to their own data dictionaries the data elements from the data warehouse and from the operational database. These metadata are then used to translate end user data analysis requests into the proper (optimized) query codes, which are then directed to the appropriate data source(s).

## 9.3 EASY-TO-USE END USER INTERFACE

Advanced OLAP features become much more useful if access to them is kept simple. OLAP tool vendors learned this lesson early and have equipped their sophisticated data extraction and analysis tools with easy-to-use graphical interfaces. Many of the interface features are "borrowed" from previous generations of data analysis tools that are already familiar to end users. This familiarity makes OLAP easily accepted and readily used.

## 9.4 CLIENT/SERVER ARCHITECTURE

Client/server architecture provides a framework within which new systems can be designed, developed, and implemented.

The client/server environment enables us to divide an OLAP system into several components that define its architecture. These components can then be placed on the same computer or they can be distributed among several computers. Thus OLAP is designed to meet ease-of-use, as well as system flexibility, requirements.

## 10. OLAP ARCHITECTURE

The OLAP operational characteristics can be divided into t h r e e main modules:

ï‚· OLAP graphical user interface (GUI).

ï‚· OLAP analytical processing logic.

ï‚· OLAP data-processing logic.

These three OLAP modules, residing in the client/server environment, make it possible to use OLAP's three defining characteristics: multidimensional data analysis, advanced database support, and easy-to use interface. Figure illustrates the client/server OLAP components and attributes.

The OLAP system exhibits:

Client/server architecture

Easy to use GUI

Dimensional presentation

Dimensional modeling

Dimensional analysis

Multidimensional data

Analysis

Manipulation

Structure

Database support

Datawarehouse

Operational DB

Relational

Multidimensional

As Next Figure illustrates, OLAP systems are designed to use both operational and data warehouse data. Although Figure shows the OLAP system's components to be located on a single computer, this scenario is only one of many. In fact, one problem with the installation shown here is that each-data analyst must have a powerful computer to store the entire OLAP system and perform all data processing

In other words, each end user must have his/her own "private" copy (extract) of the data and programs. This approach does _not provide the benefits of a single business-image shared among all users.

A more common and more practical architecture is one in which the OLAP GUI runs on client workstations, while the OLAP engine, or server, composed of the OLAP analytical processing logic and OLAP data-processing logic, runs on a shared computer. In this case, the OLAP server will be a front end to the data warehouse's decision support data. This front-end or middle layer (because it sits between the data warehouse and the end user GUI) accepts and processes the data-processing requests generated by the many end user analytical tools. The end user GUI might be a custom- made program or, more likely, a plug-in module that is integrated with Lotus 1-2-3, Microsoft Excel, or some third-party data analysis and query tool.

If you examine Figure, you will note that the data warehouse is created and maintained by a process or software tool that is independent of the OLAP system. This independent software performs the data extraction, filtering, and integration necessary to transform operational data into data warehouse data. This scenario reflects the fact that, in most cases, the data warehousing and data analysis activities are handled separately.

FIGURE: OLAP SERVER WITH MULTIDIMENSIONAL DATA STORE

ARRANGEMENT

In most implementations, the data warehouse and OLAP are two interrelated and

complementary environments. While the data warehouse represents the integrated, subject-oriented, time-variant, and nonvolatile decision support data, the OLAP system provides the front end through which end users access and analyze such data. Yet an OLAP system can also directly access operational data, transforming it and storing it in a multidimensional structure. In other words, the OLAP system can

Next Figure represents a scenario in which the OLAP engine extracts data from an operational dates base and then stores it in a multidimensional structure for further data analysis. The extraction process follows the same conventions used with data warehouses. Therefore, the OLAP provides a mini-data-warehouse component that looks remarkably like the data mart mentioned in previous sections. In this scenario, the OLAP engine has to perform all the data extraction, filtering, integration, classification, and aggregation functions that the data warehouse normally provides. In fact, when properly implemented, the data warehouse performs all data preparation functions instead of letting OLAP perform those chores, so there is no duplication of functions. Better yet, the data warehouse handles the data component much more efficiently than OLAP does, so you can appreciate the benefits of having a central data warehouse serve as the large enterprise decision support database.

We have now summarized the main OLAP architectures you are likely to encounter. Whatever arrangements of the OLAP components, one thing is certain:multidimensional data must be used. But how are such multidimensional data to be stored and managed best? OLAP proponents are sharply divided: some favor the use of relational databases to store the multidimensional data, whereas others argue for the superiority of specialized multidimensional databases to store multidimensional data. We will next examine the basic characteristics of each approach

## 11 OLAP TOOLS

OLAP tools are categorized according to the architecture used to store and process multi-dimensional data.

There are four main categories:

- Multi-dimensional OLAP (MOLAP)

- Relational OLAP (ROLAP)

- Hybrid OLAP (HOLAP)

- Desktop OLAP (DOLAP)

## 11.1 RELATIONAL OLAP

Relational online analytical processing (ROLAP) provides OLAP functionality by using relational databases and familiar relational query tools to store and analyze multidimensional data. This approach builds on existing relational technologies and represents a_natural extension to all those companies that already use relational database management systems within their organizations. ROLAP adds the following extensions to traditional RDBMS technology:

ï‚· Multidimensional data schema support within the RDBMS.

ï‚· Data access language and query performance are optimized for multidimensional data.

ï‚· Support for very large databases (VLDBs).

## 11.1.1 MULTIDIMENSIONAL DATA SCHEMA WITHIN THE RDBMS

Relational technology uses normalized tables to store data. The reliance on normalization as the design methodology for relational databases is seen as a stumbling block to its use in OLAP sys1ans. Normalization divides business entities into smaller pieces to produce the normalized tables. For example, sales data components might be stored in four or five different tables. The reason for losing normalized tables is to reduce redundancies, thereby eliminating data anomalies, and to facilitate data updates. Unfortunately, for decision support purposes, it is easier to understand data when they are seen with respect to other data Given this view of the data environment, we have stressed that decision support data tend to be non-normalized, duplicated, and preaggregated. These characteristics seem to preclude the use of standard relational design techniques and RDBMS as the foundation for multidimensional data.

Fortunately for those heavily invested in relational technology, ROLAP uses a special design technique to enable the RDBMS technology to support multidimensional data representations. This special design technique is known as a star schema.In effect, the star schema creates the near equivalent of a multidimensional database schema from the existing relational database.

11.1.1.1 Star Schema: The new star schema is designed to optimize data query operations rather than data update o p e r a tions. Naturally, changing the data design foundation means that the tools used to access such dam will have to change. End users who are familiar with the traditional relational query tools will discover that these tools will not work efficiently with the new star schema. However, RQLAP saves the day by adding support for the star schema to the use of familiar query tools. ROLAP provides advanced data analysis functions, and improves query optimization and data visualization-methods

11.1.1.2 Snow Flake Schema: In this schema there is No LEVEL in dimension tables. Dimension tables are normalized by decomposing at the attribute level. Each dimension table has one key for each level of the dimensionís hierarchy The lowest level key joins the dimension table to both the fact table and the lower level attribute table.

## 11.1.2 DATA ACCESS LANGUAGE AND QUERY PERFORMANCE OPTIMIZED FOR MULTIDIMENSIONAL DATA

Another criticism of relational databases is that the SQL used with RDBMSs is not

suited to perform advanced data__analysis. Most decision support_data requests require the use of multiple-pass SQL queries or multiple nested SQL statements. To answer this criticism, ROLAP extends SQL_so that it can differentiate between access requirements for data warehouse data (based on the sta.schema) and operational data

(normalized tables). In this way, a ROLAP system is able to properly generate the

SQL code required to access the star schema data.

Query performance is also improved because the query optimizer is modified so it can

identify the SQL code's intended query targets. For example, if the query target is the data warehouse, the optimizer passes the requests to the data warehouse. However, if the end user performs drill-down queries against operational data, the query optimizer identifies this operation and properly optimizes the SQL requests before passing them through to the operational DBMS. Another source of improved query performance is the use of advanced indexing techniques such as bitmapped indexes within relational databases. Bitmapped indexes are much more efficient at handling large amounts of data than are the indexes typically found in many relational databases.

ROLAP tools are mainly 3-tier client/server products in which the end user interface, the analytical processing, and the data processing take place on different computers.

## 11.1.3 SUPPORT FOR VERY LARGE DATABASES

If the relational database is used in a DSS role, it also must be able to store very large mounts of data. Both the storage capability and the process of loading data into the database are crucial. Decision support data are normally loaded in bulk (batch) mode from the operational data. Therefore, the RDBMS must have the proper tools to import, integrate, and populate the data warehouse with operational data. Most of the relational data-loading tools perform load operations in batch mode. However, batch operations require that both the source and the destination databases be reserved

(locked). The speed of the data-loading operations is important, especially when you

realize that most operational systems run 24 hours a day, 7 days a week, 52 weeks a year. Therefore, the window of opportunity for maintenance and batch loading is open only briefly.

Given the existence-of an open client/server architecture, ROLAP provides advanced decision support- capabilities that are scalable to the entire enterprise. Clearly, ROLAP is a logical choice for companies that already use relational databases for their operational data. Given the size of the relational database market, it is hardly surprising that most current RDBMS vendors have extended their products to support

data warehouses.

## 11.2 MULTIDIMENSIONAL OLAP

Multidimensional online analytical processing (MOLAP) extends OLAP

functionality to multidimensional database management systems (MDBMSs).

(An MDBMS uses special proprietary techniques to store data in matrixlike n- dimensional arrays.) MOLAP's premise is that multidimensional databases are best suited to manage,store, and analyze multidimensional data. Most of the proprietary techniques used in MDBMSs are derived from engineering fields such as computer- aided design/computer-aided manufacturing (CAD/CAM) and geographic information systems (GlS).C o nceptually, MDBMS end users visualize the stored data as a three- dimensional cube_ known as a data cube. The location of each data value in the data cube is a function of the x, y, and z axes in a three-dimensional space. The x, y, and z axes represent the dimensions of the data value. The data cubes can grow to n-number of dimensions, thus becoming hypercubes. Data cubes are created by extracting data from the operational databases or from the data warehouse. One important charac- teristic of data cubes is that they are static; that is, they are not subject to change and must be created before they can be used. In other words, data cubes cannot be created by ad hoc queries. Instead, you query precreated cubes with defined axes; for example a cube for sales will have the product, location, and time dimensions, and you will only be able to query those dimensions. Therefore, the data cube creation process is critical and requires in-depth front-end design work. This front end design work may be well justified by the fact that MOLAP databases are known to be much faster than their ROLAP counterparts, especially when dealing with small to medium data sets: In order to speed data access, data cubes are normally held in memory, in what is called the cube cache. Because MOLAP also benefits from a client/server infrastructure, the cube cache can be located at the MOLAP server at the MOLAP client, or in both locations. Figure shows the basic MOLAP architecture.

FIGURE : MOLAP CLIENT/SERVER ARCHITECTURE

The ability to capture the data cube-in memory provides faster response times, but it also makes the MDBMS more resource-intensive (memory, storage, and processor) than its relational counterpart. In addition, ROLAP proponents argue that the data cube approach limits the flexibility, scalability, and ease of integration.

Because the data cube is predefined with a set number of dimensions, the addition of a new dimension requires that the entire data cube be recreated. This recreation process is a time-consuming operation. Therefore, if data cubes are created too often, the MDBMS loses some of its speed advantage over the relational database. And, although MDBMSs have performance advantages over relational databases, the MDBMS is best suited to small and medium data sets. Scalability is somewhat limited, because the size of the data cube is restricted to avoid lengthy data access times caused by having less work space (memory) available for the operating system and the application programs. In addition, the MDBMS makes use of proprietary data storage techniques that, in turn, require proprietary data access methods using a multidimensional query language.

Multidimensional data analysis is also affected by how the_database system handles sparsity is a measurement of the density of the dataheld in the data cube. Sparsity is.

computed by dividing the total number of actual values-in the cube by the total number of cells in the cube because the data cube's dimensions are predefined, not all cells are populated. In other words some cells are empty. Returning to our sales example, there might be many products that are not sold during a given time period in a given location. In fact, you will often find that fewer than 50 percent of data cube's cells are populated. In any case, multidimensional databases must handle sparsity effectively in order to reduce processing overhead and resource requirements.

Relational proponents also argue that using proprietary solutions makes it difficult to integrate the MDBMS with other data sources and tools used within the enterprise. Yet, in spite of the fact that it takes a substantial investment of time and effort to integrate the new technology and the existing information systems architecture, MOLAP may be a good solution for those shops in which smal medium-sized databases are the norm and application software speed is critical.

## 11.3 RELATIONAL VERSUS MULTIDIMENSIONAL OLAP

Table summarizes some OLAP and MOLAP pros and cons. However, we should emphasize that some of the advantages of one over the other may be rearranged, as technology advances. For Example, faster processors and more powerful computers might make the speed and size arguments coot. Keep in mind, too, that the selection of one or the other often depends on the evaluator's vantage point. For example, a proper OLAP evaluation must include price, supported hardware platforms, compatibility with the existing DBMS, programming requirements, performance, and availability of administrative tools. Nevertheless, the summary in Table provides a useful comparison starting point.

ROLAP and MOLAP vendors are working toward the integration of their respective solutions within a unified decision support framework. As additional features are added to their products, the differences in their functionality and capability are reduced. It is quite conceivable that common ground will be found by the RDBMS and MDBMS proponents, thus producing a new kind of DBMS that uses the best features of both ROLAP and MOLAP. Perhaps this new DBMS will be able to handle tabular and multidimensional data with the same ease. In the meantime, relational databases use the star schema design successfully to handle multidimensional data, and their market share makes it unlikely that their popularity will fade anytime soon.

## 11.4 Web-based OLAP

A category of OLAP products that began to emerge in 1997 is Web-based OLAP(WOLAP). These products allow users of Web browsers or network computers to access and analyze data warehouse data. WOLAP users may not actually be using the Internet-housed World Wide Web. Security issues deter many organizations, quite reasonably, from putting their valuable data on display at a corporate Web site. Even if passwords are used to limit access, it is an opportunity for a security breach. More likely, an intranet will provide users with access to internal data while using familiar Web browsers and protocols.

A WOLAP system is still an OLAP system operating in a client/server model. The limitation to Web-based operation limits client functionality to what Web browsers or their equivalent can be expected to do. This leads to two possibilities:

1. The analysis can be done entirely on the server, with the results converted to

HTML and sent to the client for display.

2. The analysis programs can be written as Java applets, JavaScript code, or any other form usable by most browsers, and downloaded as needed.

ï‚· Either approach is independent of client architecture.

ï‚· As long as the client has the necessary Web software and is attached to a suitable network, it can use any version of Windows, the Mac OS, or any other environment of the user's choice.

ï‚· WOLAP also eliminates the need to install a software package on the user's computer, with the attendant issues of administration, upgrades, and more.

Although the surface similarity between the acronyms ROLAP and WOLAP seems to suggest that these are alternative approaches to OLAP, that is not the case.

ï‚· ROLAP refers to the nature of the underlying database - in this case relational.

ï‚· WOLAP refers to the way in which analysis software is structured and executed -

in this case via the Web or a Weblike intranet.

WOLAP applications can work with any type of database they are programmed to use:

relational, multidimensional, or perhaps something else yet to be invented.