Detection And Recovery Techniques For Database Corruptions Computer Science Essay



Database corruption means that some data is lost or some database functions become inaccessible due to hardware or software failures. This paper discusses techniques used to prevent, detect, and recover from corruption.

Detection mechanisms can identify corruption early, before major damage is caused. DBMSs provide various tools to analyze a database and detect corruption. The techniques modern DBMSs use to recover from corruption under non-catastrophic conditions can be classified as deferred update and immediate update. Recovery concepts such as write-ahead logging and checkpointing rely on the log file, while shadow paging and disk mirroring do not require a log file for recovery.

The backup facilities provided with a DBMS help recovery from catastrophic conditions, when the log file itself becomes unavailable. Since hacking can also corrupt a database, organizations apply various security mechanisms and database auditing to block unauthorized access.




The more people use databases, the more those databases become corrupted. The causes of database corruption can be categorized into system failures, transaction failures, human errors, and other causes. System failures occur due to hardware faults or operating-system problems. Modern DBMSs are required to support high capacity, speed, and concurrency; failures in the mechanisms implemented to meet these requirements also cause corruption. Mistakes made by the DBA contribute to corruption as well. Natural disasters, terrorist attacks, hacking, and virus attacks may also corrupt a database.\\


It is the DBA's duty to regularly check the database for corruption. To help the DBA with this task, almost all DBMSs provide tools that analyze the database for consistency. Beyond that, corruption can be detected when unusual results or errors occur while reading from or writing to the database.\\


Database recovery can be discussed in two categories: recovery from catastrophic conditions and recovery from non-catastrophic conditions. Every organization takes backups of its valuable data with the intention of recovering it if a considerable part of the database gets corrupted by a catastrophe. There are different types of backups, such as full, incremental, and differential backups. The DBA should design and plan backups to ensure both availability and safety of data.\\


Deferred update and immediate update techniques are used by modern DBMSs to recover the database from non-catastrophic conditions. In deferred update, changes are first written to database buffers, and the physical database is updated only after the transaction commits. If the transaction aborts, the physical database is never updated. In the immediate update technique, the database is allowed to change before the transaction commits. If the transaction aborts, the log file is used to recover the database, since every action on the database is written to the log file before the physical database is modified.\\


Several recovery algorithms are available; ARIES is the most widely used among them.\\


Due to the rapid growth of the internet and of databases, attacks on databases have also grown. Organizations are therefore compelled to use security mechanisms to protect their databases. Role-based access control, firewalls, VPNs, IDSs, and antivirus software are the most popular security techniques today. In addition, organizations perform database auditing to look for malfunctions of the database system.


\section{Database Corruptions}

Simply put, database corruption means that some data in a database is lost or some of its functions are inaccessible \cite{website:databaseDev}. There are many causes of corruption, and they can be categorized into system failures, transaction failures, human errors, and other causes.

\subsection{System Failures}

Most system failures are hardware related: disk drive failures, system peripheral failures, running out of disk space, power outages, and network hardware problems. Software issues such as operating-system problems also contribute.

\subsection{Transaction Failures}

Every database system has to deal with transactions that modify the database. Manually or automatically aborting transactions before they commit, or updating the database with incorrect but valid data, can corrupt it\cite{modernDBmanage}. Lacking a proper concurrency control mechanism when many users access the database also contributes to corruption.\\


A transaction that does not corrupt the database is required to maintain four properties, known as the ACID properties: atomicity, consistency, isolation, and durability.


\begin{itemize}

\item Atomicity\\

Modifications must follow the all-or-nothing rule. The entire transaction has to complete successfully if it is to change the data\cite{website:wikiACID}. If any part of the transaction fails, the whole transaction fails\cite{transaction} and no change is made to the database.

\item Consistency\\

A transaction should not break the consistency rules of the database\cite{transaction}. Transactions should bring the database from one consistent state to another\cite{website:wikiACID}.

\item Isolation\\

A transaction should not see data values that have been updated by partially completed concurrent transactions\cite{website:wikiACID}.

\item Durability\\

Once a transaction has committed successfully, its data should not be lost\cite{transaction}. The DBMS maintains durability by keeping a log file.

\end{itemize}
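As a minimal illustration of atomicity, the following Python sketch uses the standard sqlite3 module (the accounts table and the transfer function are hypothetical illustrations, not part of any DBMS): either both updates of a transfer are applied, or neither is.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move money between accounts atomically: both updates commit or neither does."""
    try:
        with conn:  # sqlite3's context manager commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            cur = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # triggers rollback of everything
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)   # succeeds: both rows change together
transfer(conn, "alice", "bob", 500)  # fails: neither row changes (atomicity)
```

The failed second transfer leaves the balances exactly as the first one left them; a partial debit is never visible.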


\subsection{Human Errors}

A database administrator can cause critical corruption: running wrong batch jobs, inserting incorrect data into tables, or deleting table data or log file data. Careless mistakes such as shutting down the computer without closing the DBMS\cite{website:everythingAccess} and not installing proper updates also contribute to corruption.

\subsection{Other Causes}

Data center disasters caused by natural events such as floods or fires, or by terrorist attacks, may make the database unavailable\cite{modernDBmanage}. Disk mirroring, discussed in section 7.4.1, helps to overcome this with no downtime\cite{modernDBmanage}. Malfunctioning software due to virus attacks\cite{website:goArticles} and the hacking of databases also contribute to corruption.


%Corruption detection

\section{Corruption Detection}

Early detection of database corruption helps DBAs a lot: it allows a quick rescue from major corruption and prevents the unexpected results caused by corrupted data.\\


A database, in this sense, means the data plus database objects such as tables, queries, reports, forms, and macros\cite{website:databaseDev}. If a database is corrupted, you will face difficulties when dealing with it. Errors generated while opening or closing the database, running queries, reports, or forms, updating records, or traversing records may be an indication that the database is corrupted\cite{website:databaseDev}.\\


It is the database administrator's duty to keep the database uncorrupted and to take corrective action if it is corrupted. For this, almost all DBMSs are equipped with corruption detection tools that help DBAs check for corruption\cite{oracle11g}.\\




\begin{table}[h]
\centering
\begin{tabular}{|p{0.5in}|p{2.0in}|p{2.7in}|} \hline

\textbf{Vendor} & \textbf{Tool} & \textbf{Action} \\ \hline

Oracle & ANALYZE & Checks for corrupted data blocks \\

& DBVERIFY utility & Checks the structural integrity of database files for corruption \\

& DBMS\_REPAIR & Detects and fixes data block corruption \\ \hline

MSSQL & DBCC CHECKDB & Validates the overall database integrity \\

& DBCC~CHECKCATALOG & Validates the system catalog integrity \\

& DBCC~CHECKTABLE & Validates the integrity of a single table \\ \hline

MySQL & CHECK TABLE & Checks a table or tables for errors \\

& mysqlcheck & Performs table maintenance: checks, repairs, optimizes, or analyzes tables \\ \hline

\end{tabular}
\caption{Corruption detection tools of popular DBMSs}
\end{table}
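For a concrete (if modest) example of this kind of tool, SQLite ships a built-in consistency check, the integrity\_check pragma, which can be driven from Python's standard sqlite3 module. The wrapper function below is an illustrative sketch, not any vendor's official interface:

```python
import sqlite3

def check_database(path):
    """Run SQLite's built-in consistency check, playing a role analogous to
    Oracle's DBVERIFY or MySQL's CHECK TABLE. Returns (ok, messages)."""
    conn = sqlite3.connect(path)
    try:
        # PRAGMA integrity_check returns a single row "ok" on a healthy
        # database, or a list of problem descriptions otherwise.
        rows = [r[0] for r in conn.execute("PRAGMA integrity_check")]
        return rows == ["ok"], rows
    finally:
        conn.close()

ok, messages = check_database(":memory:")  # an empty database is trivially consistent
```

A DBA could schedule such a check to run regularly and alert on any result other than "ok".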


Besides these, there is third-party software that analyzes databases for corruption.


%Corruption recevery

\section{Corruption Recovery}

If a database is corrupted, it is said to be in an inconsistent state. Recovery means bringing the database back to its most recent consistent state \cite{databaseSystems}, the state just before it became inconsistent. Recovery strategies can be divided into two categories \cite{databaseSystems}. The first is keeping a backup: when a large portion of the database is corrupted, the backed-up database is restored and the transactions from the log file are reapplied until the database reaches its last consistent state \cite{databaseSystems}. This is usually applied in catastrophic conditions. If the database is inconsistent for a non-catastrophic reason, the second strategy is applied: the database is brought back to a consistent state by undoing or reversing the changes that caused the inconsistency \cite{databaseSystems}.\\


There are two types of techniques to recover from non-catastrophic corruption: deferred update and immediate update. In the deferred update method, the physical database is updated only after the transaction commits, so Undo is never needed. While the transaction is processing, its updates are written to database buffers \cite{databaseSystems}. If the transaction completes successfully, the updates are first written to the log file and then the actual database is updated. If the transaction fails to complete, there is no need to update the physical database. Redo may still be needed in this case, so deferred update is a "No-Undo/Redo" type technique.\\


In the immediate update technique, the database may be updated before a transaction reaches its commit point \cite{databaseSystems}. Updates are recorded in the log file, which helps recovery if the transaction fails before it commits. This is an "Undo/Redo" type algorithm.\\


Since it keeps buffers, deferred update is applicable only to short transactions; otherwise the buffers may require a large amount of memory. If multiple users access the same data set, deferred update is preferable\cite{website:msdnImediateUpdate}: it keeps changes in a local buffer and writes them only when the transaction commits, so other users never see the partial changes of a transaction, which helps maintain the isolation property of a transaction.
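As a rough illustration of the deferred (No-Undo/Redo) approach, the toy class below (all names hypothetical) buffers writes locally and applies them to the shared store only at commit, so an abort needs no undo at all:

```python
class DeferredUpdateTxn:
    """No-Undo/Redo sketch: writes are buffered locally and only reach the
    shared 'database' dict when commit() is called."""
    def __init__(self, db):
        self.db = db
        self.buffer = {}          # local write buffer, invisible to other users

    def write(self, key, value):
        self.buffer[key] = value  # never touches the physical database directly

    def read(self, key):
        # A transaction sees its own buffered writes first.
        return self.buffer.get(key, self.db.get(key))

    def commit(self):
        self.db.update(self.buffer)  # the physical update happens only now

    def abort(self):
        self.buffer.clear()          # nothing in the database itself to undo

db = {"x": 1}
t1 = DeferredUpdateTxn(db)
t1.write("x", 99)
t1.abort()            # database is untouched: db["x"] is still 1
t2 = DeferredUpdateTxn(db)
t2.write("x", 2)
t2.commit()           # the change reaches the database only at commit
```

A real DBMS would additionally log the buffered after-images before commit so that Redo is possible after a crash.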

\subsection{Recovery Concepts}

\subsubsection{Write Ahead Log, Steal/Nosteal, Force/No force}

Write-ahead logging ensures that the physical database is not updated until the associated log record has been written to disk\cite{databaseSystems}. The advantage of WAL is that it avoids the following failure: if the disk were updated with the dirty (modified) page first, the DBMS would be unable to roll back should the server fail before the log file is updated\cite{website:microsoftWal}.\\


Let's see how deferred update and immediate update use WAL. Deferred update can be described in two steps\cite{databaseSystems}.


\begin{enumerate}

\item The transaction is not allowed to update the disk until it reaches its commit point.

\item The transaction is not marked as committed until its updates have been written to the log file and the log has been force-written to disk.

\end{enumerate}


Step 2 incorporates WAL.

As discussed earlier, in immediate update a modification must be written to the log file before it is written to disk, which preserves the WAL principle.\\


Most DBMSs use the steal/no-steal and force/no-force concepts, which define when cached or buffered data blocks are written to physical disk.\\


If a cached page that has been modified by a transaction is not allowed to be written to disk before the transaction commits, the approach is called {\bf "No-Steal"}\cite{databaseSystems}. The deferred update technique falls into this category. Most DBMSs use a "pin-unpin" bit which indicates whether a page can be written to disk\cite{databaseSystems}. If a cached page can be written to disk before the commit point, the approach is called {\bf "Steal"}.

If a modified page is written to disk immediately after the transaction commits, it is the {\bf "Force"} approach; otherwise it is "No-Force"\cite{databaseSystems}.\\

Modern DBMSs use the Steal/No-Force combination. A large buffer is not needed when the steal method is used, and the no-force approach saves I/O cost, since it allows multiple transactions to reuse a modified page while it is still buffered.
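The WAL rule can be sketched in a few lines. The Python sketch below (the record format and function names are hypothetical, not any DBMS's actual API) forces a log record carrying before- and after-images to disk before touching the page, and then uses the before-images to undo a failed transaction:

```python
import json, os, tempfile

def wal_write(log_path, db, txn_id, key, old, new):
    """Write-ahead logging: the log record is forced to stable storage
    *before* the in-memory database page is modified."""
    with open(log_path, "a") as log:
        log.write(json.dumps({"txn": txn_id, "key": key, "old": old, "new": new}) + "\n")
        log.flush()
        os.fsync(log.fileno())   # the log record reaches disk first
    db[key] = new                # only now is the page modified

def undo(log_path, db, failed_txn):
    """Roll a failed transaction back by applying before-images in reverse order."""
    with open(log_path) as log:
        records = [json.loads(line) for line in log]
    for rec in reversed(records):
        if rec["txn"] == failed_txn:
            db[rec["key"]] = rec["old"]

log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = {"a": 1}
wal_write(log_path, db, "T1", "a", 1, 42)   # db["a"] becomes 42; before-image is safely logged
undo(log_path, db, "T1")                    # transaction fails: restore the BFIM
```

Because the before-image reached disk before the page changed, the rollback works even if the server crashes between the two steps.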

\subsubsection{Check Point in System Log}

In this mechanism, database pages whose dirty bit is set to 1 (that is, only the modified pages) are periodically force-written to disk, clearing the database buffers. This means that at a checkpoint, all modifications up to that point are guaranteed to be on disk\cite{databaseSystems}. So if a crash occurs, there is no need to redo transactions that committed before the checkpoint, because their changes are guaranteed to be on disk\cite{databaseSystems}.\\


Checkpointing may be done at time intervals or transaction intervals. There are also specific scenarios in which a checkpoint must be taken: when an "ALTER DATABASE" command is applied to a database\cite{website:checkpoint}, and when the database server is about to shut down, in which case a checkpoint must be processed on every database running on that server\cite{website:checkpoint}. \\


This mechanism saves time in the recovery process. However, taking a checkpoint can itself be time consuming and can require significant processing power, because all running transactions must be suspended while the buffers and log file are force-written to disk.
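A minimal sketch of checkpointing, assuming a toy buffer pool with per-page dirty bits (all names are illustrative):

```python
class BufferPool:
    """Checkpoint sketch: only pages with the dirty bit set are force-written,
    then the dirty bits are cleared, so work flushed at a checkpoint never
    needs redo after a crash."""
    def __init__(self):
        self.pages = {}    # page_id -> (data, dirty_bit), the in-memory buffer
        self.disk = {}     # simulated stable storage

    def write(self, page_id, data):
        self.pages[page_id] = (data, True)     # modify in the buffer, mark dirty

    def checkpoint(self):
        flushed = []
        for pid, (data, dirty) in self.pages.items():
            if dirty:
                self.disk[pid] = data          # force-write the dirty page
                self.pages[pid] = (data, False)  # clear its dirty bit
                flushed.append(pid)
        return flushed

pool = BufferPool()
pool.write("p1", "hello")
pool.write("p2", "world")
pool.checkpoint()   # flushes both dirty pages to disk
pool.checkpoint()   # a second checkpoint has nothing left to flush
```

The second checkpoint returning an empty list mirrors the guarantee in the text: everything before the checkpoint is already on disk.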

\subsubsection{Transaction Rollback}

If a transaction fails for some reason, it has to be rolled back\cite{databaseSystems}: the data values it updated must be restored to the state they were in before the transaction began. Simply put, the database should be restored to its BFIM (before image). The Undo entries in the log are used for this restoration.

\subsubsection*{Cascading Rollback}

Cascading rollback is, in effect, recursive rollback. While a transaction is processing, it updates some data values in the database. If that transaction rolls back for some reason, the updated values must be restored. But another transaction may have read those updated values before the restoration, so that second transaction must also be rolled back\cite{databaseSystems}, and so on recursively. A problem arises when the second transaction commits before the first is chosen for rollback. Cascading rollback is a heavyweight process requiring considerable time and CPU, so most databases are designed in a way that avoids cascading rollback\cite{databaseSystems}.

\subsubsection{Shadowing Pages}

Unlike the other recovery mechanisms, this one does not require a log file for recovery. The database is assumed to be made of fixed-size blocks (pages)\cite{databaseSystems}, and there is a directory that maps to the database pages. All updates to the physical database go through this directory, which is held in main memory. The {\bf current directory} points to the most recent database pages. When a transaction begins execution, the current directory is copied to a directory called the {\bf shadow directory}, which is never modified until the transaction commits; the shadow directory therefore points to the BFIM. The shadow directory is kept on physical disk, so it can be recovered if a crash occurs. When a page is to be updated during the transaction, the page is not overwritten; instead the modified page is copied to an unused location and the current directory entry is pointed at the new page. The current directory thus always represents the new image, while the shadow directory stays unchanged.\\


If a transaction fails before it commits, recovery is straightforward, since the BFIM of the database is reachable through the shadow directory. Recovery requires discarding the current directory, reinstating the shadow directory, and freeing the disk slots that were used to store the updated pages. If the transaction commits successfully, the shadow directory is discarded.
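Shadow paging as described can be sketched with two in-memory directories; the class below is a hypothetical illustration, not a real DBMS interface:

```python
class ShadowPagedDB:
    """Shadow-paging sketch: updates go to fresh page slots referenced by the
    current directory; the shadow directory keeps pointing at the before-images,
    so abort is just 'discard current, reinstate shadow' -- no log needed."""
    def __init__(self, pages):
        self.pages = dict(pages)              # page storage: slot -> data
        self.current = {k: k for k in pages}  # logical page -> slot
        self.shadow = None
        self.next_slot = len(pages)

    def begin(self):
        self.shadow = dict(self.current)      # copied once, never modified until commit

    def write(self, page, data):
        slot = self.next_slot                 # copy-on-write into an unused slot
        self.next_slot += 1
        self.pages[slot] = data
        self.current[page] = slot             # only the current directory moves

    def read(self, page):
        return self.pages[self.current[page]]

    def commit(self):
        self.shadow = None                    # discard the shadow directory

    def abort(self):
        self.current = self.shadow            # reinstate the BFIM
        self.shadow = None

db = ShadowPagedDB({0: "A", 1: "B"})
db.begin()
db.write(0, "A'")
db.abort()          # recovery: the current directory is replaced by the shadow
```

A full implementation would also free the orphaned slots on abort, as the text notes.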


\subsubsection{Disk Mirroring}

In this approach, two or more copies of the database are kept in place and all instances are updated simultaneously. When one copy becomes unavailable, the system automatically switches to an existing copy\cite{modernDBmanage}. "Oracle Data Guard" in Oracle\cite{website:oracaleDataGuard}, "Replication" in MySQL\cite{website:mysqlReplication}, and "Standby Server" in MSSQL\cite{website:stanadbyServer} are examples of this. It is a simple solution that provides the shortest downtime and high availability, and it does not require log files for recovery. However, it offers no solution if all instances are corrupted or unavailable.

\subsection{Database Recovery Algorithms}

Many algorithms have been implemented to recover database systems. ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) is the most widely used among them. ARIES guarantees the atomicity and durability properties of a transaction in the face of process, transaction, system, and media failures\cite{databaseSystems}. It is a Steal/No-Force algorithm based on WAL, repeating history during redo, and logging changes during undo\cite{databaseSystems}.\\


ARIES includes several optimization techniques to reduce recovery time, improve concurrency, and reduce logging cost.


\section{Backups/Recovery from Catastrophic Conditions}

In the chapter "Corruption Recovery" above, several recovery techniques were discussed. All of them except shadow paging use the database log file to recover, and even shadowing has to store the shadow directory on physical disk. In a major crash such as a disk failure, these techniques will not help if the log file also becomes unavailable.\\

Since modern organizations depend heavily on data, it is the database administrator's duty to minimize downtime and keep data available. For this purpose, most DBMSs provide backup facilities.

\subsection{Backup Types}

\subsubsection{Full Backup}

A full backup backs up all data, folders, and files. This is the heaviest backup technique and is performed least frequently.

\subsubsection{Differential Backup}

This technique backs up all the data that has been modified since the last full backup \cite{Chervenak}; it captures the difference between the last full backup and the current database.

\subsubsection{Incremental Backup}

Only data modified since the last backup (full, differential, or incremental) is backed up in this approach\cite{website:wikiIncreBackup}. This is the quickest backup, since it backs up the least data.\\


Note that with any backup type, backing up does not always mean backing up the database itself; it may sometimes mean the log file associated with the database. The log file can then be used to bring an already backed-up copy up to date. Backing up the log file is efficient, since the log is relatively small.\\


The DBA should choose the backup type that best fits the situation. Since a full backup takes a long time, it should normally be done after a major change; otherwise database downtime will be long, which affects the availability of data. It is best practice to perform full backups less often and incremental backups more frequently\cite{Chervenak}.
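A hedged sketch of the incremental idea, using file modification times as the change indicator (real backup tools track changes more robustly, for example with archive bits or block-level change maps; all paths and names here are illustrative):

```python
import os, shutil, tempfile

def incremental_backup(src_dir, dest_dir, last_backup_time):
    """Incremental-backup sketch: copy only the files modified after the
    previous backup, which is why this type moves the least data."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) > last_backup_time:
            shutil.copy2(path, os.path.join(dest_dir, name))  # preserves mtime
            copied.append(name)
    return copied

src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
for name in ("old.db", "new.db"):
    with open(os.path.join(src, name), "w") as f:
        f.write(name)
os.utime(os.path.join(src, "old.db"), (1000, 1000))  # pretend old.db predates the last backup
changed = incremental_backup(src, dst, last_backup_time=2000)  # copies only new.db
```

Only the file modified after the recorded backup time is copied, mirroring why incremental backups are the quickest.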

\subsection{Designing Backup Strategy}

DBAs must design an optimal backup strategy to meet the organization's business needs. Several factors should be taken into consideration:


\begin{itemize}

\item It is better to schedule full backups for when applications access the data least (off-peak times)\cite{website:microsoftBackupPlan}.

\item If changes to the data are frequent, schedule several differential backups between full backups\cite{website:microsoftBackupPlan}. This is effective, since differential backups copy only modified data.

\item If changes occur only in a particular part of the database, it is unwise to back up the whole database\cite{website:microsoftBackupPlan}; it is more effective to back up only the changing part (a partial backup).

\end{itemize}


\subsection{Backup Security}

If we use full backups and somehow lose a backed-up log file, it may not be possible to recover the database. It is therefore advisable to keep multiple copies of each log backup; if one is unavailable, another can be used instead.\\

It is recommended to store the chain of log backups for a series of database backups. If the most recent database backup is lost, an earlier database backup can be restored and the changes applied from the log files taken since that full backup. In this way the database can still be recovered.


DBMSs provide restricted access facilities to secure backups, such as password protection and role-based access. They also provide backup media protection using EFS.

% database security

\section{Database Security}

If an organization provides users with features such as online transactions, is involved in military activities, or simply works with valuable data, that data is often the target of attacks. Such organizations must take measures to safeguard the database against this kind of corruption. That is where database security matters.

\subsection{Security Measures}

Many security mechanisms are in use today; some of the popular ones are discussed here.


\begin{itemize}

\item Physical Server Protection\\

It is best practice to keep the physical server in a secured place. Losing or damaging a server can be severe, since it holds valuable data. If unauthorized personnel gain access to it, they can misuse it, damage the hardware, change the superuser password, and so on\cite{databaseSecurity}.

\item Role-Based Access Control\\

This refers to restricting access to database objects based on the identity of an individual user or group\cite{roleBasedAccess}. For example, users are identified by a username and password, and each user is allowed to perform certain tasks depending on the role assigned to them. Access control has two major branches, MAC and DAC.\\


MAC (mandatory access control) defines users' access to system resources\cite{roleBasedAccess}. This technique is often used by governments and the military\cite{website:softpedia}. Under MAC, access to database objects is strictly controlled by the OS according to configuration details provided by the DBA.\\

DAC (discretionary access control) allows users to control their own data\cite{roleBasedAccess}: a user can grant or revoke privileges for others. Most industries use this technique, since their clients control their own data as they wish\cite{website:softpedia}.

\item Firewall\\

A firewall sits between the database server and the outside world to protect the database from unauthorized access\cite{databaseSecurity}. Almost all organizations use firewalls to protect their data. There are four types of firewalls: packet filters, application gateways, circuit-level gateways, and proxy servers\cite{databaseSecurity}. They all use rules to detect unauthorized access.

\item VPN\\

This technique allows only authorized remote users to access data. Organizations use it to let authorized personnel access data from outside the organization's environment while keeping unauthorized outsiders out\cite{databaseSecurity}.

\item Antivirus Applications\\

These protect data from malicious code such as viruses, worms, and Trojans.

\item IDS\\

Intrusion detection systems collect information from various system resources, analyze it for patterns of misuse or abnormal activity, and raise an alert when such activity is found\cite{databaseSecurity}. They monitor the network and protect against attacks such as DoS and IP spoofing.

\end{itemize}
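The role-based access idea above reduces, at its core, to a mapping from roles to permissions; the sketch below uses hypothetical role and permission names purely for illustration:

```python
# Hypothetical role and permission names, for illustration only.
ROLE_PERMISSIONS = {
    "admin":   {"select", "insert", "update", "delete", "grant"},
    "analyst": {"select"},
    "clerk":   {"select", "insert"},
}

def authorize(user_roles, action):
    """Role-based access check: a user may perform an action only if at least
    one of the roles assigned to them grants that permission."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)

authorize(["analyst"], "select")   # permitted by the analyst role
authorize(["analyst"], "delete")   # denied: no assigned role grants it
```

A real DBMS layers this on top of authentication and stores the role assignments in its system catalog.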


Though these mechanisms seem to protect data, some have very simple holes. For example, a firewall prevents unauthorized outside access, but surveys have found that 70\% of attacks come from inside the corporate network and only 30\% from outside, and a firewall cannot stop these inside attacks. Security holes in applications that interact with databases also let attackers slip through the security mechanisms; SQL injection and buffer overflow attacks are examples.
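As an illustration of the SQL injection hole just mentioned, the Python sketch below (using the standard sqlite3 module and a hypothetical users table) contrasts query construction by string concatenation with a parameterized query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def lookup_unsafe(name):
    # String concatenation: attacker-controlled input becomes part of the SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = '" + name + "'").fetchall()

def lookup_safe(name):
    # Parameterized query: the driver treats the input as data, never as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

payload = "x' OR '1'='1"
lookup_unsafe(payload)   # the OR clause matches every row: all secrets leak
lookup_safe(payload)     # no user is literally named the payload: nothing leaks
```

The same injected input that dumps every row through the concatenated query returns an empty result through the parameterized one.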



\subsection{Database Auditing}

Auditing is the process of increasing the security, reliability, and availability of the database\cite{databaseSecurity}. It ensures that mistakes do not render the security layers invalid or ineffective. The advantage of auditing is that it allows security issues to be identified and corrected quickly, before they cause major damage\cite{databaseSecurity}.\\


Even though there are many ways to protect data, none of them provides a 100\% guarantee. So it is better to have a contingency plan set up, because it may be the only thing that saves you in a crisis.


\section{Conclusion }

Database corruption can be thought of as corruption due to hardware failures and software failures. Modern hardware is more reliable, but it still fails at times, and there is little to be done about hardware failures other than repairing or replacing the hardware. Handling software corruption is more complex: it happens mainly due to transaction problems, concurrency problems, and the like. Every DBMS provides several recovery mechanisms to protect the database from corruption. Even so, none of the many ways to protect data provides a 100\% guarantee, so it is better to have a contingency plan set up, because in the computing world one must always expect the unexpected. \\


Though there are many theoretical approaches to protecting a database from corruption, most DBMS vendors do not implement them exactly as stated. For example, some theoretical treatments disallow concurrency, but vendors allow it for performance reasons and include mechanisms to recover if corruption happens because of it. Commercial vendors deviate from the theory because commercial applications require high concurrency and availability. \\


Although built-in DBMS tools and other security tools are available, they do not replace the DBA; they only assist the DBA's work. The DBA should regularly check for corruption and apply security patches and updates. Choosing the best security measures to prevent unauthorized access, and designing and scheduling backup plans to optimize data availability, are the DBA's responsibilities. \\


With the rapid development of the internet and the increasing value of data, database security has become a greater concern. There are a number of techniques for preventing attacks, but though they seem to protect data, some have very simple holes. For example, a firewall prevents unauthorized outside access, yet surveys have found that 70\% of attacks come from inside the corporate network and only 30\% from outside, and a firewall cannot stop these inside attacks. In addition, firewalls are not intelligent; they operate on the same predefined rules, so an intelligent attacker who knows how they work may be able to bypass them. Security holes in applications that interact with databases also let attackers through the security mechanisms; SQL injection and buffer overflow attacks are examples. \\


Research on traditional database corruption detection and prevention has reached a stable state, but there is a trend toward research on distributed databases. Most current research aims to find out how to apply these traditional concepts to distributed databases, likely driven by the rapid development of distributed computing.