The Parallel Database Architecture Computer Science Essay

Published:

Parallel database systems are the key to high performance transaction and database processing. These systems utilize the capacity of multiple locally coupled processing nodes. Typically, fast and inexpensive microprocessors are used as processors to achieve high cost-effectiveness compared to mainframe-based configurations. Parallel database systems aim at providing both high throughputs for on-line transaction processing (OLTP) as well as short response times for complex ad-hoc queries. In this report I will illustrate the parallel processing capabilities of oracle and IBM DB2.Both DBMSs offers a high performance parallel processing capabilities. But they have some differences in the details as following.

Parallel database architecture:

Both Oracle and IBM offer parallel processing to support very large databases (VLDB). This can be achieved by dividing the database over several numbers of servers. Oracle uses (RAC) Real Application Cluster, and IBM uses DB2 UDB ESE with (DPF) the Database Partitioning Feature.

Lady using a tablet
Lady using a tablet

Professional

Essay Writers

Lady Using Tablet

Get your grade
or your money back

using our Essay Writing Service!

Essay Writing Service

IBM DB2 is considered a shared-nothing architecture. However, in order to provide availability to the data, the database must be created on shared-disks. Shared-nothing refers to ownership of the data during runtime, not the physical connectivity. In IBM DB2, it is possible for the disks to be connected only to a subset of nodes that serve as secondary owners of the data in the partition. If only a subset is used then some nodes will execute heavier workloads than others and reduce the overall system throughput and performance. Unlike IBM DB2, Oracle RAC 11g requires full connectivity from the disks to all nodes and hence avoids this problem.

1.2.1 Oracle with Real Application Clusters

Real Application Clusters (RAC) is Oracle 9i's clustering technology, which provides an environment capable of supporting large databases. RAC is based on a shared disk architecture aimed at achieving high availability of a distributed environment.

RAC is an extension to the Oracle database, which enables building a multi node database environment.

1.2.2 IBM DB2 UDB ESE with the Database Partitioning Feature

IBM Data management software extends DB2 UDB to the parallel multi node environment in order to provide a scalable solution capable of supporting large amounts of data. The partitioning feature in DB2 UDB is based on a shared nothing architecture. In DPF every node in the cluster has its own dedicated memory, operating system, and storage units. An application of a shared nothing architecture is aimed at achieving high scalability and improving performance. This option in DB2 UDB ESE DPF does not require any clustering technologies to run.

DB2 UDB ESE DPF uses two levels of parallelism in order to achieve high-quality performance:

Intra-partition parallelism: This is the ability to have multiple processors process different parts of an SQL query, index creation, or a database load within a database partition.

Inter-partition parallelism: This provides the ability to break up a query into multiple parts across multiple partitions of a partitioned database, on one server or multiple database servers.

Fig2: Shared disk architecture

Fig3: Shared nothing architecture

1.3 Parallel Query Processing

Without the parallel query feature, the processing of a SQL statement is always performed by a single server process. With the parallel query feature, multiple processes can work together simultaneously to process a single SQL statement. This capability is called parallel query processing. By dividing the work necessary to process a statement among multiple server processes, the Oracle Server can process the statement more quickly than if only a single server process processed it.

The parallel query feature can dramatically improve performance for data-intensive operations associated with decision support applications or very large database environments. Symmetric multiprocessing (SMP), clustered, or massively parallel systems gain the largest performance benefits from the parallel query feature because query processing can be effectively split up among many CPUs on a single system.

It is important to note that the query is parallelized dynamically at execution time. Thus, if the distribution or location of the data changes, Oracle automatically adapts to optimize the parallelization for each execution of a SQL statement.

The Oracle Server can use parallel query processing for any of these statements:

SELECT statements

sub queries in UPDATE, INSERT, and DELETE statements

Lady using a tablet
Lady using a tablet

Comprehensive

Writing Services

Lady Using Tablet

Plagiarism-free
Always on Time

Marked to Standard

Order Now

CREATE TABLE ... AS SELECT statements

CREATE INDEX

In IBM DB2 parallel query execution can be achieved through operation pipelining and data partitioning. IBM decides to implement the data partitioning because:

DB2 supports partitioned table which allows a DB2 user to distribute his large table up to 64 different disks.

Parallel query execution through data partitioning works well with simple query which accesses large amount of data.

DB2 query processing includes two phases, a query compilation phase and a query execution phase.

1.4 Parallelizing individual operations

IBM DB2 can initiate multiple parallel operations when it accesses data from a table or index in a partitioned table space.

Query I/O parallelism manages concurrent I/O requests for a single query, fetching pages into the buffer pool in parallel. This processing can significantly improve the performance of I/O-bound queries. I/O parallelism is used only when one of the other parallelism modes cannot be used.

Query CP parallelism enables true multitasking within a query. A large query can be broken into multiple smaller queries. These smaller queries run simultaneously on multiple processors accessing data in parallel, which reduces the elapsed time for a query.

To expand even farther the processing capacity available for processor-intensive queries, DB2 can split a large query across different DB2 members in a data sharing group. This feature is known as Sysplex query parallelism.

DB2 can use parallel operations for processing the following types of operations:

Static and dynamic queries

Local and remote data access

Queries using single table scans and multi-table joins

Access through an index, by table space scan or by list prefetch.

Sort

When a view or table expression is materialized, DB2 generates a temporary work file. This type of work file is shareable in CP mode if there is no full outer join case.

On the other hand, Oracle can parallelize operations that involve processing an entire table or an entire partition. These operations include:

SQL queries requiring at least one full table scan or queries involving an index range scan spanning multiple partitions.

Operations such as creating or rebuilding an index or rebuilding one or more partitions of an index.

Partition operations such as moving or splitting partitions

CREATE TABLE AS SELECT operations, if the SELECT involves

a full table or partition scan.

INSERT INTO . . . SELECT operations, if the SELECT involves a full table or partition scan.

Update and delete operations on partitioned tables

Oracle divides the task of executing a SQL statement into multiple smaller units, each of which is executed by a separate process. When parallel execution is used, the user's shadow process takes on the role of the parallel coordinator. The parallel coordinator is also, referred to as parallel execution coordinator or query coordinator.

1.5 Parallel query optimization

Parallel query optimization is very important and essential for the overall performance of a relational database, especially for the execution of complex SQL statements. A query optimizer determines the most efficient plan for executing each query by considering available access paths and by factoring in information based on statistics for the schema objects (tables or indexes) accessed by the SQL statement. Most of modern DBMSs including oracle and IBM DB2 use a cost-based query optimizer. The goal of cost-based query optimizer is to maintain the computer resources which are CPU path length, amount of disk buffer space, disk storage service time, and interconnect usage between units of parallelism. These decisions query optimizer takes have a great effect on SQL performance.

Oracle has two different version of a query optimizer.

Cost-Based Optimizer (CBO)

In evaluating potential query plans, this optimizer makes use of available statistics to better estimate the cost of the query. Besides simply knowing the size of tables and indexes involved, it tries to estimate the selectivity of conditions involved in devising its plan.

Rule-Based Optimizer (RBO)

The Rule-Based Optimizer does not look at the explicit statistics, but tries to make decisions on general rules, such as push selections before joins, or prefer a sort-merge join to a nested loops join.

Lady using a tablet
Lady using a tablet

This Essay is

a Student's Work

Lady Using Tablet

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Examples of our work

In the new version of Oracle (Oracle10i) the RBO was no longer supported and they depend entirely on the CBO.

On the other hand, IBM DB2 uses autonomic query optimizer that automatically self-validates its model without requiring any user interaction to repair incorrect statistics or cardinality estimates. By monitoring queries as they execute, the autonomic optimizer compares the optimizer's estimates with actual at each step in a QEP (query execution plan), and computes adjustments to its estimates that may be used during future optimizations of similar queries.