Detection And Recovery Systems For Database Corruption Computer Science Essay


Detection of and recovery from database corruption is an important topic in computer science. Databases store the data that computer programs need, and that data may be very sensitive. A database can be corrupted physically or logically by catastrophic and non-catastrophic failures, and software and hardware failures are the main causes. Recovery is needed to prevent data loss and inconsistency after a failure. This paper describes how databases become corrupted and how corruption is detected, prevented and recovered from.

\section {Introduction}

A database is an integrated collection of data. A Database Management System (DBMS) provides the services needed to maintain databases. One important service a DBMS provides is recovery from database corruption. As databases become more complicated, they become more prone to corruption. Databases can be corrupted by both hardware and software crashes.

A transaction performs insertion, deletion, modification or retrieval on a database. These functions are carried out by basic read and write operations. A transaction failure can corrupt the database. A transaction may fail during execution for several reasons: computer failure, a transaction or system error, local errors or exception conditions detected by the transaction, concurrency control enforcement, disk failure, and physical problems and catastrophes.

Users access databases through software interfaces. Software errors are the greatest threat to the integrity and consistency of a database. Application code may be integrated with the database or independent of it, but both kinds perform operations on the database. Databases are stored on physical media such as hard drives. Hardware is more reliable today than in the past, but there is no guarantee that it is always reliable, so database corruption due to hardware failures can still occur.

Detection of and recovery from database corruption is therefore very important for protecting data and reducing data loss. Recovery from hardware failures is usually done by restoring the database from a backup; backup is the most widely used recovery technique. Recovery from transaction failures usually means that the database is restored to the most recent consistent state before the failure. To restore the database to a previous consistent state, the system must keep track of past changes. This information is typically kept in the system log. Deferred update (NO-UNDO/REDO) and immediate update (UNDO/REDO and UNDO/NO-REDO) are the two main techniques for recovery from non-catastrophic transaction failures. The implementations of deferred update and immediate update differ between single-user and multiuser environments.

Transaction rollback, shadow paging, methods based on the ARIES recovery algorithm, and backup are other recovery techniques. Shadow paging does not require a log in a single-user environment, but in a multiuser environment it requires a system log. Recovery in multidatabase systems is somewhat different from recovery of a single database because the participating databases may use different types of DBMS.

Detecting database corruption is very important, because corruption must be detected before it can be recovered from. Generating reports and running queries on the data are basic ways to detect database inconsistency. Most database management systems ensure data integrity and consistency.


\section {Database corruption}

Database corruption is physical or logical damage to a database. It has many causes, and it takes two different forms. In the first, the database is physically corrupted, or a large portion of it is damaged, by a catastrophic failure. In the second, the database becomes inconsistent, which means it is logically damaged.

Three types of failure cause database corruption: transaction failures, system failures, and disk or media failures. A transaction is a logical program unit consisting of one or more database operations. To ensure data integrity, the database must guarantee four properties of transactions, called the ACID properties \cite{ref3}: atomicity, consistency, isolation and durability. The database becomes inconsistent if a transaction fails during execution or terminates abnormally. Transaction execution can fail for many reasons. Operating system errors and software errors can terminate a transaction. A transaction may itself contain errors or erroneous conditions such as division by zero, integer overflow or invalid parameters. The concurrency control mechanism of the DBMS may decide to terminate a transaction to prevent deadlocks or system crashes. Disk failures and system failures also cause transaction failures. Recovery is needed to bring the database back to a consistent state.

A hardware malfunction or a bug in the database software or the operating system causes a system crash and the loss of the contents of volatile storage. Databases are stored on nonvolatile media such as hard disks, but while the database is in use, all or part of it is loaded into main memory, which is volatile. After a system crash the contents of main memory are lost, and any changes made to pages in main memory that had not yet been written out are not reflected in the database on disk. Recovery is then needed to restore the lost data and to redo the operations or transactions whose effects did not reach the disk.

A disk or media crash is the loss of data in disk blocks. A read/write head crash or a malfunctioning read or write operation can cause a disk crash. A backup or copy of the database is needed to recover from disk crashes.

Database hacking is another source of corruption. A database can become inconsistent because of changes made by hackers. Besides inconsistency, a database can also hold invalid data without entering an inconsistent state.


\section{Detection and prevention of database corruption}

Most DBMSs ensure data integrity and consistency. After a failure it is necessary to determine whether the database has been corrupted. Many DBMSs use codeword-based techniques to detect and prevent database corruption.

Read prechecking is a mechanism that prevents a transaction from using corrupted data. The consistency of the data in a protection region is checked before any of it is read. Each protection region has a codeword computed from its data. On each read operation the codeword is recomputed and compared with the stored codeword to check data integrity. If they do not match, there must be some corruption in that protection region. The protection latch for the region is acquired while the codeword is being calculated and while the data is being updated. Data Codeword and Data Codeword with deferred maintenance \cite{ref2}, together with database auditing, are other techniques used to detect data corruption.
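
The read prechecking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name \texttt{ProtectionRegion} is hypothetical, CRC-32 from the standard \texttt{zlib} module stands in for whatever codeword function a real DBMS uses, and a \texttt{threading.Lock} plays the role of the protection latch.

```python
import threading
import zlib


class ProtectionRegion:
    """A block of data guarded by a codeword (CRC-32 is an assumption here)."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.codeword = zlib.crc32(self.data)
        self.latch = threading.Lock()  # stands in for the protection latch

    def write(self, offset: int, new_bytes: bytes) -> None:
        # A legitimate update changes the data and the codeword together,
        # while holding the latch.
        with self.latch:
            self.data[offset:offset + len(new_bytes)] = new_bytes
            self.codeword = zlib.crc32(self.data)

    def read(self) -> bytes:
        # Read prechecking: recompute the codeword and compare it with the
        # stored one before handing any data to the transaction.
        with self.latch:
            if zlib.crc32(self.data) != self.codeword:
                raise ValueError("corruption detected in protection region")
            return bytes(self.data)
```

A write that bypasses \texttt{write}, for example a stray pointer in application code, changes the data without updating the codeword, so the next \texttt{read} raises an error instead of returning corrupted data.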

No technique can completely prevent failures from corrupting a database. Recovery techniques, however, protect data against the effects of corruption and therefore act as a preventive mechanism. The recovery manager of a DBMS continuously monitors the system and logs the details needed for recovery.


\section {Recovery}

This section describes in detail the recovery techniques used to recover from database corruption. Algorithms that recover from transaction failures must preserve the ACID properties of transactions.

\subsection {Log based recovery}

A log is a file structure that records modifications to the database. It contains a sequence of log records, each of which holds the details of a database activity that the recovery process needs. For example, an update log record contains a transaction identifier, which uniquely identifies the transaction; a data-item identifier, which uniquely identifies the data item changed by the transaction; the old value of the data item before the update; and the new value after the update. The log also records the start of each transaction and its commit or abort. When a transaction updates the database, the log record must be written before the database is modified.

The log record for a write operation carries two types of information: information needed for UNDO and information needed for REDO. The old value, or before image (BFIM), is the UNDO-type log entry, and the new value, or after image (AFIM), is the REDO-type log entry \cite{ref1}. The UNDO-type entry is needed to undo the effects of an operation, and the REDO-type entry is used to redo it. Most recovery techniques use one or both types of entry.
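
As a minimal sketch of such an update log record, the Python fragment below stores both images and shows how each is used. The names \texttt{UpdateRecord}, \texttt{undo} and \texttt{redo} are illustrative assumptions, and the database is modelled as a plain dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRecord:
    """One update log record (field names are illustrative)."""
    txn_id: str      # identifies the transaction
    item: str        # identifies the data item that was changed
    old_value: int   # before image (BFIM), the UNDO-type entry
    new_value: int   # after image (AFIM), the REDO-type entry


def undo(db: dict, rec: UpdateRecord) -> None:
    # Restore the before image.
    db[rec.item] = rec.old_value


def redo(db: dict, rec: UpdateRecord) -> None:
    # Reapply the after image.
    db[rec.item] = rec.new_value
```

Because both functions overwrite the item with a stored value rather than recomputing it, applying either of them several times has the same effect as applying it once.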

Checkpoints are another type of log entry. A checkpoint record is added to the log when all modified buffers have been force-written to disk. Checkpoints can be taken periodically or after a certain number of transactions have committed; the recovery manager of the DBMS decides. Writing a checkpoint entry requires suspending the executing transactions, and if the buffers are very large, those transactions are delayed for a long time. Fuzzy checkpointing reduces this delay. In fuzzy checkpointing, transactions are not kept suspended while the buffers are written to disk; the previous checkpoint remains the valid one until all buffers have been written, after which the new checkpoint becomes the valid checkpoint. Checkpoints are very useful in undo/redo recovery mechanisms because they help identify which transactions need to be undone or redone during recovery.

To make recovery more efficient, the DBMS also maintains a list of active transactions and lists of all transactions committed and aborted since the last checkpoint.

Steal/no-steal and force/no-force \cite{ref1} are two further pairs of terms in log-based recovery. In the no-steal approach, a cache page updated by a transaction cannot be written back to disk until the transaction commits. In the steal approach, a cache page can be written back to disk before the transaction commits. If all pages updated by a transaction are written to disk immediately when the transaction commits, the approach is called force; otherwise it is called no-force. Typical database systems use a steal/no-force strategy.
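
The buffer policy determines which recovery actions are needed: stealing means uncommitted changes may be on disk and must be undone, and not forcing means committed changes may be missing from disk and must be redone. The small Python function below, whose name \texttt{recovery\_actions} is only illustrative, spells out this mapping.

```python
def recovery_actions(steal: bool, force: bool) -> str:
    """Name the recovery scheme implied by a buffer management policy."""
    # Steal: uncommitted updates can reach the disk, so UNDO is required.
    undo = "UNDO" if steal else "NO-UNDO"
    # No-force: committed updates may not be on disk yet, so REDO is required.
    redo = "NO-REDO" if force else "REDO"
    return f"{undo}/{redo}"
```

For the steal/no-force strategy, \texttt{recovery\_actions(True, False)} gives UNDO/REDO, which is why systems using that strategy must log both before and after images.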

When an update is performed on the database, the write-ahead logging (WAL) \cite{ref1} protocol governs how the log is written. Under this protocol the appropriate log entries are recorded and flushed to disk before the database is modified. To allow recovery from disk failures and system crashes, the log must reside in stable storage, and log records must reach stable storage as soon as possible after they are created. A log kept on the same disk as the database is of no use after a disk failure. Large transactions with many database updates may add millions of records to the log. In that case the log may outgrow the log buffer, and log details of the transaction can be lost.
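
The ordering that WAL imposes can be illustrated with a toy in-memory model. Everything here is an assumption made for the sketch: the class \texttt{WalStore} is hypothetical, a Python list stands in for stable storage, and a dictionary stands in for the database on disk.

```python
class WalStore:
    """Toy model of write-ahead logging (not a real storage engine)."""

    def __init__(self):
        self.log_buffer = []   # log records still in volatile memory
        self.stable_log = []   # stands in for the log on stable storage
        self.disk = {}         # stands in for the database on disk

    def log_update(self, txn, item, old, new):
        # The log record is created first, in the log buffer.
        self.log_buffer.append((txn, item, old, new))

    def flush_log(self):
        # Move all buffered log records to stable storage.
        self.stable_log.extend(self.log_buffer)
        self.log_buffer.clear()

    def write_item(self, item, value):
        # WAL rule: the log is flushed before the database is modified.
        self.flush_log()
        self.disk[item] = value
```

Flushing the whole buffer on every database write is the simplest way to honour the rule; a real system would flush only up to the log record that describes the page being written.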

Deferred update and immediate update are the two main recovery techniques based on logging.

\subsubsection {Deferred Update}

In the deferred update technique, actual updates to the database are postponed until the transaction completes its execution and reaches its commit point. During execution all updates are recorded only in the buffers. When the transaction reaches its commit point, the log is force-written to disk, following write-ahead logging (WAL), and then the database buffers are written to disk. The database itself is updated only after the transaction reaches its commit point. If the transaction fails before its commit point, no operation needs to be undone; it is enough to discard the buffers in main memory, because none of the transaction's changes have reached the actual database.

Recovery with deferred update in a single-user environment uses only REDO procedures, which is why deferred update is also known as the NO-UNDO/REDO algorithm. The algorithm uses the list of transactions committed since the last checkpoint and the list of active transactions, which contains at most one transaction in a single-user environment. All write operations of the committed transactions are redone, in the order in which they were written to the log, using the after image (the REDO-type log entry). The transaction in the active list is then restarted. Redo operations must be idempotent, meaning that executing an operation repeatedly is equivalent to executing it once; otherwise the database becomes inconsistent after recovery.
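
A minimal sketch of this NO-UNDO/REDO procedure is given below. It assumes a simplified log format invented for the example: a list of tuples, either \texttt{("write", txn, item, new\_value)} or \texttt{("commit", txn)}, holding the records written since the last checkpoint.

```python
def recover_deferred(db: dict, log: list) -> list:
    """Redo committed writes in log order; return transactions to restart."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new_value = rec
            db[item] = new_value          # apply the after image (AFIM)
    # Transactions that wrote but never committed left no trace in the
    # database, so they are simply restarted; nothing is undone.
    active = {rec[1] for rec in log if rec[0] == "write"} - committed
    return sorted(active)
```

Running the function a second time on the same log leaves the database unchanged, which is the idempotence property required of redo.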

In a multiuser environment with concurrency, the recovery process may be considerably more complex because of the concurrency control methods used by the DBMS, but the principle is the same as in the single-user case: redo all write operations of the transactions committed since the last checkpoint, and restart all active transactions. The redo can be made more efficient by scanning the log in reverse order, from the end, and redoing each data item only once, so that each item receives its most recent value.
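
The backward scan that redoes each data item only once can be sketched as follows. The function name \texttt{redo\_latest\_only} and the log format, a list of \texttt{(txn, item, new\_value)} tuples in log order, are assumptions made for the illustration.

```python
def redo_latest_only(db: dict, log: list, committed: set) -> dict:
    """Scan the log backward and redo only the last committed write per item."""
    redone = set()
    for txn, item, new_value in reversed(log):
        if txn in committed and item not in redone:
            db[item] = new_value   # most recent committed value of this item
            redone.add(item)
    return db
```

Earlier writes to an item that has already been redone are skipped, because a later committed write would overwrite them anyway.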

The deferred update technique is suitable when transactions are short. All updated pages must be buffered until the transaction commits, so transactions that are very large or long-running and that change a large portion of the database pages require large buffers, which is neither effective nor cost-effective. The technique works well for transactions that touch a limited number of data items. If a transaction changes the same data item several times, deferred update is a particularly good fit because only a few pages are needed. A disk failure may occur after the log has been recorded but before the buffers are flushed to disk, and recovery in such situations can be complex.

\subsubsection {Immediate update}

In this technique, an update operation in a transaction is written to the actual database without waiting for the transaction to reach its commit point, but the log records are added to the log before the database is updated. There are two categories of immediate update. If all updates are recorded on disk before the transaction commits, no operation needs to be redone when recovering from a failure; the database can be recovered from a transaction failure using the before images. This is known as the UNDO/NO-REDO recovery algorithm.

In the other category, a transaction is allowed to commit before all its updates are recorded in the database on disk, and recovery uses both UNDO-type and REDO-type log entries. This variation of immediate update is called the UNDO/REDO recovery algorithm. The algorithm also differs between single-user and multiuser environments, because different DBMSs use different concurrency control protocols in a multiuser environment. Two lists are maintained: one contains the active transactions and the other contains the transactions committed since the last checkpoint. In a single-user environment the active list contains at most one transaction. To recover from a failure in a single-user environment, undo all write operations of the active transaction, in the reverse of the order in which they were written to the log, and redo all write operations of the committed transactions in the order in which they were written to the log. The same two lists are maintained in a multiuser environment, and the same recovery process is applied with concurrent execution. The redo of committed transactions can be made more efficient by starting from the end of the log and redoing only the last update of each item. Recovery in a concurrent environment requires a locking mechanism for concurrency control.
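
The UNDO/REDO procedure can be sketched as two passes over the log. The log format is an assumption made for this example: tuples of the form \texttt{("write", txn, item, old, new)} or \texttt{("commit", txn)}, covering the records since the last checkpoint.

```python
def recover_undo_redo(db: dict, log: list) -> dict:
    """Undo uncommitted writes backward, then redo committed writes forward."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Pass 1: undo writes of active (uncommitted) transactions in reverse
    # log order, restoring the before images.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            db[rec[2]] = rec[3]
    # Pass 2: redo writes of committed transactions in log order,
    # reapplying the after images.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            db[rec[2]] = rec[4]
    return db
```

In the example used to check this sketch, transaction T1 commits and T2 does not, so after recovery the database holds T1's value for the item T1 wrote and the original value for the item only T2 touched.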

Checkpoints are very useful in recovery algorithms based on both deferred and immediate update. Without them the whole log would have to be searched to find the operations to redo and undo. With checkpoints, both techniques need to undo or redo operations only back to the last checkpoint, because changes made before the last checkpoint were already recorded in the database when the checkpoint was taken.

\subsection {ARIES}

Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is another advanced recovery algorithm, used by many DBMSs. ARIES guarantees the atomicity and durability properties of transactions in the face of process, transaction, system and media failures. It uses a steal/no-force approach for writing and is based on three concepts: write-ahead logging (WAL), repeating history during redo, and logging changes during undo. Repeating history during redo means that ARIES retraces all actions of the database system prior to the failure in order to reconstruct the database state at the time of the failure. Logging changes during undo means that the undo operations performed during recovery are themselves recorded in the log. ARIES undoes the operations of uncommitted transactions. If the recovery process fails after some of these undo operations have completed, they do not need to be undone again; only the undo operations not yet completed remain to be performed. The log records written during undo are called compensation log records, or CLRs \cite{ref4}. In ARIES, CLRs are redo-only log records. This logging mechanism prevents completed undo operations from being repeated. ARIES uses checkpoints and fuzzy checkpoints to increase efficiency and to avoid unnecessary redo and undo operations during recovery.

ARIES uses a single log sequence number (LSN) scheme. Every log record has an associated LSN, which increases monotonically and gives the disk location, or address, of the log record. The page\_LSN field, which is stored in the page itself, is updated whenever the page is updated and a log record is written. The page\_LSN contains the LSN of the log record that describes the latest update to the page. This is important for tracking the logged updates to the page during restart and media recovery.

ARIES uses several data structures \cite{ref4} to maintain the data needed for recovery or restart. The log record is a data structure with fields such as LSN, type (update, compensation), transaction id, previous LSN, page id, undo-next LSN and data. The transaction table is another data structure used by ARIES; it contains the transaction id, transaction state, last LSN and undo-next LSN fields. To represent information about dirty buffer pages during normal processing, ARIES uses the dirty page table, which is also used during recovery. The dirty page table contains two fields, page id and recovery LSN (RecLSN). These data structures make recovery more efficient.

The ARIES recovery process consists of three main steps: the analysis phase, the redo phase and the undo phase. After any kind of failure or crash, the ARIES recovery manager first accesses the last checkpoint in the log and starts the recovery process from there. The analysis phase identifies the updated (dirty) pages in the buffer and the transactions that were active when the failure occurred. It also determines the point in the log at which redo must start, by finding the smallest RecLSN in the dirty page table. All updates with an LSN smaller than this value have already been written to disk successfully or have been overwritten in the buffer.
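
A much simplified sketch of the analysis phase is shown below. It assumes that the scan starts from an empty checkpoint and that the log is a list of \texttt{(lsn, kind, txn, page)} tuples, where \texttt{kind} is either \texttt{"update"} or \texttt{"commit"}; the function name \texttt{analyze} and this record layout are illustrative, not the actual ARIES record format.

```python
def analyze(log: list):
    """Rebuild the transaction table and dirty page table from the log."""
    txn_table = {}      # active transaction -> LSN of its last log record
    dirty_pages = {}    # page id -> RecLSN (first LSN that dirtied the page)
    for lsn, kind, txn, page in log:
        if kind == "update":
            txn_table[txn] = lsn
            dirty_pages.setdefault(page, lsn)   # keep the earliest LSN only
        elif kind == "commit":
            txn_table.pop(txn, None)            # no longer active
    # Redo starts at the smallest RecLSN in the dirty page table.
    redo_start = min(dirty_pages.values(), default=None)
    return txn_table, dirty_pages, redo_start
```

The transactions left in the returned transaction table are the ones that were still active at the time of the failure and will have to be undone.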

The redo phase reapplies updates from the log to the database. In general, redo is applied only to committed transactions, but ARIES redoes all necessary updates in the log, starting from the redo point identified in the analysis phase and continuing to the end of the log. Information stored in the data pages, such as the page\_LSN, helps determine which redo operations are necessary.
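
The following sketch shows how the page\_LSN stored in each page can be used to decide whether a logged update has to be reapplied. The data layout is an assumption made for the example: each page is a dictionary with a \texttt{value} and a \texttt{page\_lsn}, and the log is a list of \texttt{(lsn, page, value)} tuples.

```python
def redo_phase(pages: dict, log: list, dirty_pages: dict, redo_start: int) -> list:
    """Reapply logged updates that are not yet reflected in the pages."""
    applied = []
    for lsn, page, value in log:
        if lsn < redo_start:
            continue                     # before the redo point
        if page not in dirty_pages or lsn < dirty_pages[page]:
            continue                     # update is known to be on disk
        if pages[page]["page_lsn"] >= lsn:
            continue                     # page already contains this update
        pages[page]["value"] = value     # repeat history
        pages[page]["page_lsn"] = lsn
        applied.append(lsn)
    return applied
```

Updates are reapplied regardless of whether their transaction committed; the effects of uncommitted transactions are removed later, in the undo phase.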

In the undo phase, ARIES scans the log backward and undoes all update operations of the active transactions in reverse order. CLRs are written to the log during the undo phase.
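
The role of compensation log records can be illustrated with a simplified undo pass. The record layout below is an assumption made for the sketch: log records are dictionaries ordered by \texttt{lsn}; update records carry \texttt{item}, \texttt{old} and \texttt{prev} (the LSN of the same transaction's previous record, 0 if there is none); CLRs carry \texttt{undo\_next}, the LSN of the next record that still has to be undone.

```python
def undo_phase(db: dict, log: list, losers: set) -> list:
    """Undo updates of uncommitted transactions and log a CLR for each."""
    next_lsn = log[-1]["lsn"] + 1 if log else 1
    resume = {}    # txn -> undo_next taken from its most recent CLR
    written = []
    for rec in reversed(list(log)):    # iterate over a copy: log grows below
        txn = rec["txn"]
        if txn not in losers:
            continue
        if rec["type"] == "clr":
            # The first CLR met in a backward scan is the most recent one.
            resume.setdefault(txn, rec["undo_next"])
        elif rec["type"] == "update":
            if txn in resume and rec["lsn"] > resume[txn]:
                continue               # already undone before the crash
            db[rec["item"]] = rec["old"]
            clr = {"lsn": next_lsn, "type": "clr", "txn": txn,
                   "item": rec["item"], "undo_next": rec["prev"]}
            log.append(clr)
            written.append(clr)
            next_lsn += 1
    return written
```

If recovery is interrupted and run again, the CLRs already in the log tell the undo pass where to resume, so completed undo operations are not repeated.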

Sometimes the recovery process itself has to be restarted because of a failure. To avoid repeating recovery work that has already been completed, ARIES takes checkpoints during the analysis, redo and undo phases.

After a system failure it may be necessary to resume processing new transactions as soon as possible. In that case part of the recovery work is deferred and new transaction processing starts. This is called selective or deferred restart. DB2 supports this kind of function in its implementation of ARIES.

With ARIES, some pages can be recovered independently without affecting others. This is called recovery independence. It is a very effective way of recovering from transaction failures without stopping transaction execution. A transaction can record savepoints \cite{ref4} and later roll back partially to a savepoint in situations such as deadlock. ARIES includes several optimization techniques that reduce recovery time, improve concurrency and reduce logging overhead.

\subsection {Shadow paging}

Shadow paging is a crash recovery technique that does not use a log for recovery, although a log may be used in a multiuser environment for concurrency control. The technique treats the database as a collection of fixed-size disk blocks called pages. A database can have any number of pages, and a page table is used to locate them on disk: if the database consists of $n$ pages, the page table contains $n$ entries, each pointing to a page on disk. The idea behind shadow paging is to keep two page tables during a transaction, the shadow page table and the current page table. The two tables are identical when the transaction starts. The shadow page table is never modified during transaction execution. When the transaction performs a write operation, a new copy of the page to be modified is created and the current page table is updated to point to the new page. All modifications are visible only through the current page table. There are then two copies of the database page: a new version, pointed to by the current page table entry, and the old version, pointed to by the shadow page table. Figure 1 illustrates the concepts of shadow and current page tables. An unmodified copy of each database page therefore always exists. The shadow page table must be kept in stable storage for recovery to succeed.

\begin{figure}[ht]
\caption{Shadow paging}
\end{figure}



When the transaction commits, all buffer pages are first written back to disk. The current page table is then written to disk without overwriting the shadow page table. Finally, the current page table becomes the new shadow page table and the old shadow page table is discarded. If a crash occurs before this final switch, even after the buffer pages have been written to disk, the old shadow page table can be used to recover, and no operation needs to be redone or undone. If a failure occurs during transaction execution, all pages modified by the transaction are discarded and the transaction is restarted. Because neither redo nor undo operations are used, shadow paging is called a NO-UNDO/NO-REDO algorithm.
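
The interplay of the two page tables can be sketched with a small in-memory model. The class \texttt{ShadowPagedDB} is hypothetical: a dictionary stands in for the disk, the page tables are Python lists, and the atomic switch of the page table on disk is reduced to a single assignment.

```python
class ShadowPagedDB:
    """Toy single-transaction model of shadow paging."""

    def __init__(self, pages):
        self.disk = dict(enumerate(pages))      # physical page id -> contents
        self.shadow = list(range(len(pages)))   # logical page -> physical page
        self.current = list(self.shadow)        # identical at transaction start
        self.next_free = len(pages)

    def write(self, logical, contents):
        # Copy-on-write: the first write to a page allocates a new physical
        # page; the page referenced by the shadow table is never touched.
        if self.current[logical] == self.shadow[logical]:
            self.current[logical] = self.next_free
            self.next_free += 1
        self.disk[self.current[logical]] = contents

    def read(self, logical):
        return self.disk[self.current[logical]]

    def commit(self):
        # The current page table becomes the new shadow page table.
        self.shadow = list(self.current)

    def abort(self):
        # Also what recovery does after a crash: fall back to the shadow table.
        self.current = list(self.shadow)
```

Neither \texttt{commit} nor \texttt{abort} frees the physical pages that are no longer referenced, which is why a separate garbage collection step is needed in practice.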

Shadow paging offers several advantages over log-based recovery techniques, but it also has major disadvantages. It eliminates the overhead of writing a log record for every operation performed on the database. Its other major advantage is faster recovery from crashes, since nothing needs to be redone or undone.

Commit overhead is one disadvantage of shadow paging. Committing a single transaction may require multiple pages to be output to disk, whereas log-based schemes need to output only the log records at the commit point. Using tree structures for the page tables reduces this overhead, but the modified pages must still be output to disk. Even a transaction with few modifications changes at least one whole page of the database, whereas log-based recovery applies only the modified data to the database on disk. Data fragmentation is another problem. Shadow paging causes database pages to change location, so the locality of the pages is lost, which reduces the performance of the database. A further disadvantage is garbage collection. Each time a transaction commits, the old versions of the database pages it modified become inaccessible, as does the old shadow page table. This inaccessible data must be garbage collected, so a separate garbage collection mechanism is required. In a concurrent environment shadow paging is more complicated and requires some form of logging. Because of these drawbacks, shadow paging is not a very popular recovery mechanism.

\subsection {Transaction rollback}

If a transaction fails for any reason, it must be rolled back to avoid corruption. Any data items changed by the failed transaction must be reset to their old values, which are restored using UNDO-type log entries. If a transaction T has read a value written by a rolled-back transaction S, then T must also be rolled back. If another transaction has read a value written by T, it must be rolled back as well. This phenomenon is called cascading rollback. Because cascading rollback is complex and time-consuming, it is generally avoided; other recovery mechanisms work better.
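
The set of transactions affected by a cascading rollback can be computed as a simple closure. In the sketch below, the function name and the \texttt{reads\_from} mapping, which records for each transaction the transactions whose uncommitted values it has read, are assumptions made for the example.

```python
def cascading_rollback(failed: str, reads_from: dict) -> set:
    """Return every transaction that must be rolled back with `failed`."""
    to_rollback = {failed}
    changed = True
    while changed:                      # repeat until no new reader is found
        changed = False
        for reader, writers in reads_from.items():
            if reader not in to_rollback and writers & to_rollback:
                to_rollback.add(reader)
                changed = True
    return to_rollback
```

With S failing, T having read from S and U having read from T, the result contains S, T and U, while a transaction that read only from an unrelated transaction is left alone.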

\subsection {Backup}

Log-based recovery techniques and shadow paging are used to recover from non-catastrophic failures. The recovery manager of a DBMS is also responsible for recovery from catastrophic failures such as disk crashes and other physical failures \cite{ref6}. The log can be used to recover from media failures if it is not kept on the same disk as the database and is never discarded after a checkpoint. However, the log usually grows faster than the database, and it is not practical to keep the log forever.

Backup is the most widely used technique for recovering from such failures. Typically the full database and the log are periodically copied to a cheap storage medium such as magnetic tape or optical disk. The archived database copy and log must be stored in a secure remote location. To recover from a failure, the database is restored from the latest backup copy. To avoid losing recently updated data, the log is backed up at more frequent intervals than the full database. All the transactions in the backed-up log can then be applied to the restored database.

Taking an archive copy of a large database is a lengthy process, and it is not always possible to shut the database down to take a backup. Full dump and incremental dump \cite{ref5} are two levels of archiving. In a full dump the entire database is copied as a backup. In an incremental dump only the database elements changed since the previous full or incremental dump are copied. To recover from a media failure, first restore the database from the full dump and then apply the changes recorded in the subsequent incremental dumps. Backup is a main recovery feature of most DBMSs. The database administrator is responsible for deciding how frequently backup copies of the database are created and where they are stored.
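
Restoring from a full dump followed by incremental dumps and the backed-up log can be sketched as below. The representation is an assumption made for the example: each dump is a dictionary from data item to value, and the log tail is a sequence of \texttt{(item, value)} pairs for committed updates made after the last dump.

```python
def restore(full_dump: dict, incremental_dumps: list, log_tail=()) -> dict:
    """Rebuild the database from archive copies after a media failure."""
    db = dict(full_dump)                  # start from the full dump
    for dump in incremental_dumps:        # apply incremental dumps, oldest first
        db.update(dump)
    for item, value in log_tail:          # replay the backed-up log
        db[item] = value
    return db
```

The order of the incremental dumps matters: a later dump must overwrite an earlier one wherever both contain the same data item.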


\section {Conclusion and Future Directions}

In this paper we described how databases become corrupted and some of the techniques used to recover from corruption. Only a small number of mechanisms are available for detecting database corruption. Most recovery algorithms are aimed at recovering the database from transaction failures. Log-based recovery algorithms are the most widely used technique. Shadow paging is nevertheless a good technique with a fast recovery mechanism; if a garbage collection mechanism and an efficient algorithm for maintaining page tables were added to it, it could become the best option. Deferred update is suitable only for transactions with few updates or few data items. Log-based algorithms have a significant problem with the logging mechanism itself: if a transaction runs for a long time, the log grows and can exceed main memory or the allocated disk space. ARIES is a recovery algorithm that uses several techniques to reduce recovery time and increase performance. It is a very complex algorithm to understand, but most commercial DBMSs use some implementation of ARIES for their recovery features. To reduce the time taken for recovery, most techniques record the details needed for recovery during normal processing, and normal-processing performance may therefore decrease significantly.

All of these recovery algorithms share another problem: there is no way to find out when the corruption occurred in the database. Sometimes transactions use corrupted data and still execute successfully, so the database is corrupted but the corruption cannot be identified. Errors in user programs may also corrupt the database. Programmers are responsible for writing programs that do not affect the integrity of the database. Database security is very important because hacking a database is a kind of corruption. Database administrators are responsible for handling these types of failures. Most DBMSs provide security as a key feature.

In this paper we were concerned with recovery concepts rather than their implementation. Different DBMSs implement recovery algorithms according to their own requirements. No single technique can meet all the requirements of a DBMS, and therefore a DBMS implements more than one recovery technique. When implementing a recovery algorithm we must consider data protection, performance, concurrency, security and other factors.