Taking measures beforehand to prevent a disaster or minimize its effects is as important as drawing up the disaster recovery plan itself. The proposed plan considers the potential threats that can lead to a disaster, the areas of vulnerability, and procedures to minimize the risks. Both natural and human-made threats are covered in this plan.
Tornado and high winds
Terrorist actions and sabotage
The threat of fire in the Administrative Building, particularly in the equipment room, is quite high and carries the highest risk factor of all the threats considered in this plan. The building houses UPS batteries that produce hydrogen gas, as well as many electrical devices and connections that could overheat or short out and cause a fire. Computers within the building are also an easy target for anyone wishing to disrupt university operations. A wide-area fire during dry periods is another potential threat.
The administrative building is equipped with fire alarms and smoke detectors.
Fire extinguishers and fire blankets are placed at regular intervals throughout the building. A Halon gas fire extinguishing system is set up in the machine room and tape vaults.
The building is furnished with flame resistant products which can reduce the impact of fire.
Staff have been trained to use fire blankets and extinguishers in the event of fire.
Flood water will not only disrupt power but also bring in mud and silt that can destroy sensitive electrical connections. Water in a room with high-voltage electrical equipment poses a high risk of electric shock.
Storm sewer modifications have been made to the building. The machine room floor is fitted with water detectors and sump pumps.
Tornados and High Winds:
A tornado is one of the most destructive disasters. Little can be done beyond a few precautionary measures to minimize the damage.
Protective covering should be deployed over the computing equipment and magnetic tape racks to avoid water and wind damage.
Weather radio alerts are installed in the administrative building, and staff are told which parts of the building are strongest so that they can seek shelter there in such circumstances.
The threat of an earthquake in Hyderabad is low but should not be ignored. The administrative building is not built to earthquake-resistant standards. An earthquake has the potential to be the most disruptive event for the disaster recovery plan, because the cold site on the campus could be affected in the same way.
The preventive measures for an earthquake are the same as those for a tornado.
With modern networking, the potential for unauthorized access has become more common. Computer crime usually affects the integrity of data or involves its misuse. It can originate from external or internal sources: intruders can break in through the network, and staff can create viruses or introduce faulty code.
Systems should be protected from unauthorized access with the help of security products.
Passwords should be changed periodically.
Logs of invalid attempts to access data should be reviewed regularly by the Security Administrator.
Antivirus software is installed on the systems.
Firewalls are in place.
Data should be encrypted.
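The regular review of invalid access attempts mentioned above can be partly automated. The following is a minimal sketch only: the function name `flag_invalid_attempts`, the record format (account name plus a success flag) and the threshold of three failures are all assumptions made for illustration and are not part of the plan.

```python
from collections import Counter


def flag_invalid_attempts(events, threshold=3):
    """Return accounts with at least `threshold` failed access attempts.

    `events` is an iterable of (account, succeeded) pairs. Both this
    record format and the default threshold are assumptions of the sketch.
    """
    failures = Counter(account for account, succeeded in events if not succeeded)
    return {account: count for account, count in failures.items() if count >= threshold}


# Illustrative records only.
events = [("alice", False), ("alice", False), ("bob", True),
          ("alice", False), ("bob", False)]
flagged = flag_invalid_attempts(events)
```

A Security Administrator would still need to read the flagged entries; the sketch only narrows down which accounts deserve attention first.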
Terrorist Action and Sabotage:
University computer systems are potential targets for terrorist actions such as bombing. The threat of kidnapping of key personnel also exists.
To deter terrorist action, a few measures can be taken, such as providing adequate lighting throughout the building at night.
All the doors in the machine room should be strong and properly locked.
Security measures should be observed when letting people into the machine room area.
Suspicious activities should be reported to the police.
Security guards should be on guard.
II. Disaster Preparation
Certain preparations have to be made in advance in order to facilitate recovery from the disaster in the administrative building.
Disaster Recovery Planning
Disaster Lock Boxes
Disaster Recovery Planning:
Having a plan is the most important preparation. In the event of a disaster, Computing Services will follow this plan, and the other departments within the university that depend on Computing Services rely on it as well.
Each business unit of the university should develop a plan for how it will conduct its business in the event of a disaster, either in its own building or at Computing Services, that prevents access to data for a period of time. Business units should be able to function while the network is down and should have a plan to synchronize their data with the data that is restored on the central computers. For example, the payroll office should be able to produce the payroll while the central computer systems are down, and should then update the central computers once services are resumed.
In the event of a disaster, if the main building is destroyed, rebuilding or repairing it may take an extended period of time, so computer and network services must be restored at an alternative site. The university has a number of options:
Hot Site: This is an expensive option, but the most appropriate one for a very large organisation. The hot site is set up in a remote location, probably in a different city, and is identical to the main site in every respect. The two sites are ideally connected by high-speed communication channels so that users in classrooms and offices at the primary site can access the computers at the remote site.
Cold Site: A site set up away from the primary site and used on a temporary basis to house the computers and network services. The cold site has all the arrangements needed to set up a temporary network to provide emergency services. Its aim is to reduce recovery time and lessen the economic impact of a disaster. The university has chosen the cold site approach for its disaster recovery plan, and arrangements have been made for it in the Business Administration Building (BAB).
Disaster Recovery Company: Many companies provide disaster recovery services on an annual subscription basis. These companies may deploy the hot site at their own premises or sometimes provide a mobile service (equipment in vans). This is a typical third-party scenario.
Disaster Partnership: A mutual agreement between organisations to aid each other in the event of a disaster. The agreement covers the types of facilities that each party will provide.
This part of the plan consists of a complete inventory of the computers, network systems and software that must be restored after a disaster. Because the systems change, the inventory requires periodic updates to reflect the current configuration. Agreements have been made with vendors for emergency replacements. The current systems should be replicated to avoid delays in recovery.
Data is the most important asset of any organisation: new hardware and buildings can be purchased and new employees hired, but the data stored on the old equipment cannot be bought at any price. It can only be restored from a backup site.
There are many ways in which data can survive a disaster:
Remote Dual Copy
Disk subsystem: The approach is based on a disk subsystem located at a site away from the primary computer facility, with fibre optic cabling coupling the remote disks to the disk subsystem at the primary site.
Automatic data update: Data written to disk at the primary site are automatically transmitted to the remote site and written to disk there as well. This guarantees that you have the most up-to-the-second updates for the databases at the primary site in case it is destroyed.
Recovery process: Recovery is simplified by locating the remote disk subsystem at the disaster recovery site.
Cost factor: This option is somewhat expensive, but not prohibitively so. It does not require that an entire computer system be built at a hot site, just the disk subsystem. This option is typically limited to mainframe disk systems only.
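The remote dual copy idea can be illustrated with a small Python sketch. It is a toy model under stated assumptions: two in-memory dictionaries stand in for the primary and remote disk subsystems, and the names `DualCopyDisk`, `write`, `lose_primary` and `recover` are hypothetical. The point it shows is that a write completes only after both copies are updated, which is what keeps the remote copy current.

```python
class DualCopyDisk:
    """Toy model of remote dual copy: every write goes to both sites."""

    def __init__(self):
        # Hypothetical stand-ins for the primary and remote disk subsystems.
        self.primary = {}
        self.remote = {}

    def write(self, block_id, data):
        # The write completes only after both copies have been updated.
        self.primary[block_id] = data
        self.remote[block_id] = data

    def lose_primary(self):
        # Simulate destruction of the primary site.
        self.primary = None

    def recover(self):
        # Recovery reads from the remote copy, which mirrors every completed write.
        return dict(self.remote)


disk = DualCopyDisk()
disk.write("payroll-0001", "salary record")
disk.write("admissions-0001", "student record")
disk.lose_primary()
recovered = disk.recover()
```

A real disk subsystem also has to handle a failed or slow link to the remote site, which this sketch deliberately leaves out.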
Automated Off-Site Tape Backup
Tape subsystem: The approach uses a robotic tape subsystem located at a site away from the primary computer facility, with fibre optic cabling (the campus backbone network would be suitable) coupling the subsystem to the primary computer facility.
Data update: Copies of operating system data, application and user programs, and databases can be transmitted to the remote tape subsystem, where they are stored on magnetic tape (writable optical disk media can also be used, but may be more expensive).
Convenience: While this option does not guarantee the up-to-the-second updates available with the remote dual copy disk option, it does provide a means of conveniently taking backups and storing them off-site at any time of the day or night.
Advantage: A major benefit is that backups can be made from mainframes, file servers, distributed (Unix-based) systems, and personal computers.
Cost factor: Although such a system is expensive, it is not prohibitively so.
Off-Site Tape Backup Storage
Transportation: This option calls for moving backup tapes made at the primary computer facility to an off-site location.
Location: The choice of location is important, because the security of the data and its quick availability must be balanced against each other.
Drawbacks: If a disaster strikes at the time the backup is being made, the new tapes may be lost along with the primary site. There is also always a risk that tapes are physically damaged or lost in transit.
Cost factor: There is also the time, expense, and energy of having to transport the tapes.
The Approach Chosen by Osmania University for Backup
The University has opted to take periodic backups of its primary mainframe systems, databases, file servers, and UNIX systems and to store those backups in two locations elsewhere on campus. The primary storage location is the Deccan Information System Centre (DISC). The second location is the Business Administration Building (BAB), Room 105, which is also the cold site recovery suite. The tape vault at the Administrative Services Building is the final storage location, where the oldest generation of system and application backup tapes is kept.
Backup procedure followed by Osmania University
Every system that Computing Services operates is backed up regularly. The backup media for each of these systems is relocated to an off-site storage area where there is a high probability that the media will survive in the event a disaster strikes. Two off-site storage locations are used:
Deccan Information System Centre (DISC), secondary site
Business Administration Building (BAB), Room 105 (cold site)
Three backups are maintained at all times. The latest backups are stored at the Deccan Information System Centre (DISC). The second most recent are stored at the Business Administration Building (BAB). The oldest are stored in the tape vault at the Administrative Services Building.
University Backup Cycle
When a new backup is made, the tapes are rotated through these sites. The new tapes go to DISC, the tapes previously at DISC go to BAB, and the tapes previously at BAB go to the Administrative Services Building (ADSB). The tapes at ADSB are retained for use with the next round of backups.
After careful consideration, it was decided that backups should be made and rotated bi-weekly to be cost-effective.
In general, backups for each subsystem are cycled through the three sites.
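The rotation described above can be sketched in a few lines of Python. This is an illustration only: the function name `rotate` and the set labels are hypothetical, the labels identify backup generations rather than physical cartridges, and the sketch assumes that the set leaving ADSB is the one reused for the next backup.

```python
from collections import deque

SITES = ["DISC", "BAB", "ADSB"]  # newest generation first, oldest last


def rotate(generations, new_set):
    """Place a new backup set at DISC and shift older sets one site down.

    `generations` is a deque of set labels ordered DISC, BAB, ADSB.
    Returns the set that leaves ADSB and becomes available for reuse
    (None while fewer than three generations exist).
    """
    retired = generations.pop() if len(generations) == len(SITES) else None
    generations.appendleft(new_set)
    return retired


generations = deque()
reusable = [rotate(generations, label) for label in ["set-A", "set-B", "set-C", "set-D"]]
locations = dict(zip(SITES, generations))
```

After four backups, the newest set is at DISC, the previous one at BAB, and the one before that at ADSB, while the first set has been released for reuse.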
Disaster Lock Boxes
An up-to-date copy of the plan is secured at the cold site. The plan is crucial for making decisions in the event of a disaster. Two lock boxes are maintained to hold these materials, and their contents are identical. One resides at the Business Administration Building; the other resides in the tape vault just off the machine room in the Administrative Services Building. The information in the boxes is kept up to date, and the necessary measures are taken to ensure that at least one copy is available in the event of a disaster.
The lock boxes are to remain locked at all times. Keys to the boxes are kept by several key people within the department, including
Director of Computing Services - Mohammed Imran
Technical Services Manager - Aziz Ahmed
Operations Manager - Shravan Kumar
Disaster Recovery Plan Coordinator - Mohammed Irfan
In a disaster situation when entry into a lock box is needed but the key is not available, you can physically break the lock with bolt cutters.
Disaster recovery planning for University
The first step in drawing up the plan is organizing the disaster recovery planning team. The selected team members are expected to have experience, in-depth knowledge of the network system, the ability to perform under pressure and to make quick and efficient decisions, and strong analytical skills, and to be reliable and forthright.
If the team is strong in the areas mentioned above, the impact of a disaster can be minimized to a great extent, and quick recovery and business continuity can be expected.
The university disaster team is divided into eight sub-teams so that each can progress independently. For an efficient recovery, the eight Disaster Recovery Teams are expected to work simultaneously on different areas in the event of a disaster at the university.
1. Recovery Management Team: The Recovery Management Team oversees the whole recovery process. The other seven teams are represented in the Recovery Management Team. The Recovery Manager leads the team and has the final authority on decisions that must be made during the recovery. The Recovery Manager is responsible for appointing the other members of the Recovery Management Team. Each member of the Recovery Management Team is responsible for appointing the other members of their respective team(s).
2. Damage Assessment Team: The team is led by the Technical Coordinator. It is responsible for providing two things:
i. Information that enables the Recovery Management Team to choose the recovery site.
ii. An assessment of the salvageability of major hardware components.
Based on this assessment, the Recovery Management Team can begin the process of acquiring replacement equipment for the recovery.
3. Facility Recovery Team: The team will be led by the Facility Coordinator.
i. This team is responsible for the details of preparing the recovery site to accommodate the hardware, supplies, and personnel necessary for recovery.
ii. Detailed layouts and instructions for the Cold Site preparation are included in the recovery plan.
4. Network Recovery Team: The team will be led by the Network Coordinator.
This team will be responsible for overseeing the restoration of the campus network and all network connections necessary at the recovery site.
Because there is such a high degree of reliance on the campus network, for instruction, research, and administrative purposes, very high emphasis must be placed on restoring the network as quickly as possible.
5. Platform Recovery Team: The team will be led by the Technical Coordinator.
The team is responsible for communicating needs and status information to the other recovery teams and for coordinating restoration operations between parties working on different computer platforms.
6. Applications Recovery Team: The team will be led by the Application Coordinator.
This team will be responsible for conducting activities leading up to the approval and acceptance of application systems for production use.
7. Computer Operations Team: The team will be led by the Computer Operations Coordinator.
The team is responsible for providing all computing services.
8. Administrative Support Team: The team will be led by the Administrative Coordinator.
One of the most important functions of this team is to take on the burden of administrative details so that the engineers and technicians responsible for systems recovery can concentrate on their recovery work.
Recovery Management Team Roster
Computer Operations Coordinator
Some of the anticipated team tasks include:
Provide support for executing acquisition paperwork.
Assist with the detailed damage assessment and insurance procedures.
Determine the status of staff working at the time of the disaster.
Provide counselling services for staff or family members having emotional problems resulting from the disaster.
Assist the individual Team Coordinators in locating potential team members.
Coordinate food and sleeping arrangements of recovery staff as necessary.
Provide support to track time and expenses related to the disaster.
Provide delivery and transportation services to the Cold Site or other locations as required.
Provide public relations support (this function may be provided by University Relations).
Assist in contracting with outside parties for work to be done in the recovery process (such as the installation of equipment, or consulting assistance for the installation or recovery of software systems).
Network and back-up scenario of disaster recovery plan
This is a brief description of the scenario, which is taken from Osmania University.
This University scenario has three sites.
The Administrative Building, which holds all admissions records, student details, payroll data and staff details. This is where the fire occurred and where all the important data is stored. Without an adequate recovery plan to get the data back, the University would be in a critical position.
The University has a backup policy.
One is the primary backup in the Business Administration Building, which is also the cold site.
The other is the secondary backup, which is in the Deccan Information System Centre.
Although the Administrative Building has suffered a disaster, the University can recover its data because of these backup systems. Even if there is a problem with one of the backup systems, such as the cold site, the data can be retrieved from the Deccan Information System Centre.
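The fallback between the two backup sites can be written down as a simple selection rule. The sketch below is hypothetical: the function name `choose_restore_source` is invented for illustration, and the default preference order (BAB first, then DISC) simply follows the primary and secondary labels used in this scenario.

```python
def choose_restore_source(available, preference=("BAB", "DISC")):
    """Return the first backup site in `preference` that is available.

    `available` maps a site name to True or False. The default order
    follows the primary/secondary description above and is an assumption.
    """
    for site in preference:
        if available.get(site, False):
            return site
    return None


normal = choose_restore_source({"BAB": True, "DISC": True})
cold_site_down = choose_restore_source({"BAB": False, "DISC": True})
```

If neither site is available the function returns `None`, which corresponds to the case the backup policy is meant to prevent.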
Disaster Event: Fire at the Administrative Building of Osmania University
In May 2008 a fire caused by an electrical short circuit in the air conditioning broke out in the Administrative Building of Osmania University. The building was the centre of communication for the campus.
Loss of resources
Computing and the network system
Human (death or injury)
Data and backup tapes
Switches, Routers, Hubs
All the network resources necessary for the smooth functioning of the computing systems of the department and of other dependent departments were destroyed by the fire.
Data: If the data is not recovered, its loss would be the most serious impact of the disaster, because all information across the campus was stored centrally at the site. Almost all services will be halted unless the data is restored from backup.
Impact of the disaster: Since the data was stored centrally, the impact of the disaster is hard to overstate. It has impaired the network and also the various units for which it was the source of information.
Loss of infrastructure: The disaster has destroyed the infrastructure to such an extent that it has to be rebuilt from scratch. It is a major financial loss as well.
Loss of reputation: If the system is not recovered quickly, the university's reputation, earned over many decades through the services it provides (quality of education, availability, management, timeliness, etc.), will suffer.
Mistrust: This is a major concern to be considered.
Of all the major losses and impacts mentioned above, the worst is data loss.
New hardware can be purchased and infrastructure can be rebuilt, but if the data is not recovered the business may not survive.
We shall now see what data backup procedures were practised and how effective the disaster recovery plan of Osmania University (Hyderabad, India) is.
Activating the Disaster Recovery Plan
The first step is initiated by the Recovery Manager, who works through the recovery plan with the help of a capable team. The next step is to establish the Recovery Control Centre. The Recovery Manager then sets the plan in motion.
The Recovery Manager should retrieve the Disaster Recovery Lock Box to obtain an up-to-date copy of the plan if one was not handed out at the first meeting of the Recovery Management Team.
The Recovery Manager should appoint the remaining team members.
The Recovery Manager discusses the agenda with the team in the Recovery Management meeting at the Recovery Control Centre.
Each team member is responsible for their respective area. Tasks are assigned to the team members.
The Recovery Manager reviews the options and makes the final decision about where to carry out the recovery.
Adjustments to the plan are discussed and key personnel are identified.
The Recovery Management Team should immediately start the recovery process with their respective team members. This includes immediately relocating equipment such as computers, links, telephone lines, fax machines, copiers and furniture.
If possible, mobile communication will be helpful during the early phases of the recovery process.
All personnel must exercise extreme caution to ensure that physical injury or death is avoided while working in and around the disaster site itself. No one is to perform any hazardous tasks without first taking appropriate safety measures.
Equipment Protection and Salvage
This section describes the steps to follow after an incident (here, a fire) in order to protect the critical resources in the damaged area.
It is extremely important that any equipment or material holding vital information, such as magnetic media and paper stock, is protected from further damage at the incident site. Some of it may be repairable or salvageable and can save restoration time. In the event of a fire, quickly cover all magnetic tape cartridges, computer equipment and undamaged paper stock with plastic sheeting or tarpaulins. Contact the police to post security guards at the incident site to prevent looting or scavenging.
After protecting the media from further damage, begin salvage immediately. Although backups are retained off-site, the data stored on the media is valuable and its loss would be hard to bear.
Move the salvageable equipment and supplies to a safe location until the cold site is ready. Take great care when moving the equipment to avoid damage.
An inventory estimate will be submitted to the Technical Coordinator and the Administrative Coordinator, who will decide which damaged items to procure in order to begin building the recovery systems.
Damage assessment determines the damage caused to the hardware and the facility by the fire. The recovery will take place at the cold site (Business Administration Building).
The team estimates the time needed to repair or replace each damaged resource. The estimate includes ordering, shipping, installation and testing time.
Working from the list provided in the recovery section for each platform, the team will examine the hardware items and separate them into two groups: destroyed or missing items, and salvageable items. Hardware engineers will evaluate the salvageable items and repair them if necessary. Based on this input, the Recovery Management Team can begin the process of acquiring replacements.
The damage to the structure, electrical system, air conditioning and building network is also to be evaluated. If the estimate shows that recovery at the original site will take more than 14 days, migration to the cold site (the Business Administration Building) is recommended.
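The time estimate and the 14-day rule can be combined in a short worked example. It is a sketch under stated assumptions: the names `DamagedItem` and `recommend_site` are hypothetical, the day counts are invented for illustration, and the sketch assumes that replacements are procured in parallel, so the slowest item determines the overall estimate.

```python
from dataclasses import dataclass

COLD_SITE_THRESHOLD_DAYS = 14  # from the plan: more than 14 days means cold site


@dataclass
class DamagedItem:
    name: str
    ordering_days: int
    shipping_days: int
    installation_days: int
    testing_days: int

    def lead_time(self):
        # Time to replace = ordering + shipping + installation + testing.
        return (self.ordering_days + self.shipping_days
                + self.installation_days + self.testing_days)


def recommend_site(items):
    """Recommend a recovery site from the slowest item's lead time.

    Assumes items are procured in parallel, so the longest lead time
    determines when the original site could be back in service.
    """
    longest = max((item.lead_time() for item in items), default=0)
    site = "cold site" if longest > COLD_SITE_THRESHOLD_DAYS else "original site"
    return site, longest


# Illustrative figures only; they are not taken from the plan.
items = [
    DamagedItem("core router", 2, 5, 1, 1),
    DamagedItem("air conditioning", 3, 10, 4, 1),
]
site, days = recommend_site(items)
```

With these figures the air conditioning needs 18 days, which exceeds the threshold, so the sketch recommends the cold site.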
University's Emergency Procurement Procedures
The success of this plan in ensuring a timely recovery of the central computer and network facilities hinges on our ability to purchase goods and services with lightning speed.
The Administrative Support Coordinator is responsible for all emergency procurement for Computing Services. All Disaster Recovery Team members must submit their requests to the Coordinator. The Coordinator will follow the regulations established for emergency procurement and will work with the Buyer that has been appointed by the Purchasing Office to complete the acquisition.
The Administrative Support Coordinator is also responsible for tracking all acquisitions to ensure that financial records of the disaster recovery process are maintained and that all acquisition procedures will pass audit review.
The Administrative Support Coordinator must also be aware of the University's insurance coverage to know what is and is not allowed under our policies. In the event an item to be purchased is disallowed by insurance coverage, or if expenses exceed the dollar limits of the insurance coverage, the Coordinator must consult with the Recovery Manager and other responsible University personnel (such as the University's Business Manager).
BUILDING AND PROPERTY INSURANCE:
Currently, Osmania University carries insurance coverage provided by Safeguard Insurance Company through Reliance Insurance Agency, Abids Station Road (04023348058 voice, 04024540930 fax). Mr. Kashif Mohiuddin, Vice President, is the authorized Agent's Representative.
The insurance provides very comprehensive coverage.
- Cost of restoring a damaged facility.
- Equipment covered, loss of business income.
- The extra expenses incurred in equipping and operating a replacement or temporary facility.
As soon as possible after a loss, the Director of Risk Management and Insurance should be notified. If this individual cannot be reached, the University's insurance agent should be notified. The agent will then notify the insurance company. The insurance company will assign an adjuster, loss control engineer, and other experts to assist in the loss claim.
In the event of loss or damage, take all reasonable steps to protect the covered property from further damage. Also, keep a record of all expenses for emergency and temporary repairs which should be covered in the settlement.
Business Interruption Coverage
The University's insurance coverage provides for loss of business income in the event a disaster strikes. The coverage is quite extensive and should cover most expenses incurred as a result of the implementation of this disaster recovery plan. The following is a portion of the University's insurance policy which describes the coverage for loss of business income.
BUSINESS INCOME - We will pay for the actual loss of Business Income you sustain due to the necessary suspension of your "operations" or of tenancy during the "period of restoration". The suspension must be caused by direct physical loss of or damage to property at or within 500 feet of insured premises. The loss or damage must be caused by or result from a Covered Cause of Loss.
Some exclusions apply. Refer to the actual University insurance policy for detailed descriptions of these exclusions, loss determination and other coverage.
Purchasing Vendor List
This document contains a list of hardware, software, and supplies vendors.
Block and Quall
2342 Station Road
0402445566 (Mohammed Masood,Shaid)
Products: BTI ELC2 and ELC3 LAN Controllers
0406578345 (Gopal Krishna)
Products: Omegamon/CICS, Omegamon/MVS, OmegaView
0405590804 (Pradeep )
Cold site preparation
A cold site has been prepared for the recovery of the primary computing and network facilities after a disaster. At Osmania University, the BAB 105 suite in the Business Administration Building has been designated as the cold site. If the Recovery Management Team opts to use this site for recovery, some work must be done to house the computer systems, network equipment and disaster recovery personnel.
The site will require additional work to meet the power and cooling requirements of the mainframe equipment. Proper telecommunications and networking to the building must also be considered.
Cold site area
The cold site in the BAB 105 suite is located on the fifth floor of Business Administration Building.
The suite consists of four computer labs (105C, 105E, 105G, 105J), space for graduate students (105B, 105D, 105F, 105K), storage areas (105A, 105J, 105JA, 105L, 105M), the room where the network enters the building (105AB), and one further space (105).
Site preparation works
The cold site is only minimally prepared, so a lot of work has to be done to get it ready. The following is a quick review of that work.
All occupied space in BAB 105 must be cleared to make it available for the Computing Services staff and selected users to work in.
Breaker panels and conduit back to the main power source must be installed. Electrical contractors will have to install a power distribution unit.
The plan does not call for power conditioning equipment, i.e., a UPS or motor generator, which may put the installed equipment at risk during a power failure.
Install the air conditioning equipment.
Additional fibre optic cable is needed to terminate the campus backbone network at the cold site.
The plan makes special provision for installing the equipment in such a way that the raised flooring found in major computer rooms is unnecessary.
Rooms 105C, 105E, 105G and 105J should be combined for the installation of the equipment.
The entrance to 105G should be widened to accommodate the larger pieces of equipment.
The cold site should be kept under electronic entry security.
There should be an access point for a loading dock in the vicinity of the cold site where large trucks can deliver equipment.
The task of preparing the remaining end-user applications can begin as soon as the subsystem and platform system software operate correctly. Each platform has its own unique recovery path: in some cases there will be very little to do except testing, and in other cases analysis and data synchronization are required.
The Application Recovery Team is responsible for carrying out this phase. The team should review each application area. An analyst who is familiar with the application should do the review, working with an application user representative.
The items that should be considered:
Review of the application and status of file and database after general platform recovery.
Identify any changes to areas where the application must be synchronized with others.
Identify the changes needed to bring the application to production status.
Identify and review application output to certify the application.
The payroll application was identified as one of the most critical applications. A delay in processing it could put a lot of pressure on students, staff and others who depend on it. The web server and purchasing applications are also given high priority, since they will be needed during recovery.
There are three critical payment applications every month:
FIRST PAYMENT - which should be paid on the 2nd of every month
SECOND PAYMENT - which should be paid on the 15th of every month
THIRD PAYMENT - which should be paid on the 28th of every month
A secondary plan is being developed for the case where a disaster takes place and the University is unable to meet these obligations.
The main interim solutions
Manual procedures should be implemented that allow payment obligations to be met, with records maintained by hand and updated on the system once it is ready.
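Whether the manual procedures are needed depends on whether a payment falls due before the systems are back. A minimal sketch of that check follows; the function names `next_payment_date` and `needs_manual_payroll` are hypothetical, and the dates in the example are illustrative rather than taken from the incident.

```python
from datetime import date

PAYMENT_DAYS = (2, 15, 28)  # days of the month named in the plan


def next_payment_date(today):
    """Return the first payment date on or after `today`."""
    for day in PAYMENT_DAYS:
        if day >= today.day:
            return date(today.year, today.month, day)
    # All of this month's payments have passed; roll over to next month.
    if today.month == 12:
        return date(today.year + 1, 1, PAYMENT_DAYS[0])
    return date(today.year, today.month + 1, PAYMENT_DAYS[0])


def needs_manual_payroll(disaster_day, expected_recovery_day):
    """True if a payment falls due before systems are expected back."""
    return next_payment_date(disaster_day) < expected_recovery_day


# Illustrative dates only.
due = next_payment_date(date(2008, 5, 20))
manual = needs_manual_payroll(date(2008, 5, 20), date(2008, 6, 5))
```

In this example the next payment is due on 28 May, before the assumed recovery date of 5 June, so the manual procedure would have to be used for that payment.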
There are many ways of testing a DRP. We consider walk-through testing for our scenario.
Walk Through Testing:
All the recovery team members meet to verbally walk through the specific steps of each part of the disaster recovery process as documented in the disaster recovery plan. This is done mainly to confirm the effectiveness of the plan and to identify bottlenecks, gaps or other weaknesses in it.
Some benefits of testing are:
The feasibility of the recovery process is determined.
The compatibility of the backup facilities is verified.
The adequacy of the teams working in the recovery process is ensured.
The ability of the organization to recover is demonstrated.
A mechanism for maintaining and updating the recovery plan is provided.
Maintenance of DR plan of the university
A disaster recovery plan is critical, but if a workable procedure for maintaining the plan is not also developed and implemented, the plan will rapidly become obsolete.
Access to the plan:
The disaster recovery plan has been designed to be accessed as a WWW document through a web browser or web server.
For consistency in format and use, some standards have been applied to the design. These standards should be followed by the maintainers when adding to and revising the plan.
The plan will be evaluated frequently, and all portions will be checked by Technical Services. The Disaster Recovery Plan Coordinator is responsible for ensuring that they meet the standards.
Updating the plan:
In the changing environment of the computer industry, it is inevitable that this disaster recovery plan will become outdated unless someone keeps it updated. Whenever there are changes in software or hardware, Computing Services management will decide whether it is necessary to change the plan. Technical Services will incorporate the changes into the body of the plan and distribute it.
The desire for business continuity is the motivation for recovery planning. Although it requires investment, the benefit comes to light when disaster strikes. The existence of a DRP does not mean that the site is completely secure, but it lets the organisation stay in business by minimizing the impact of the disaster. The University's decision to draw up a disaster recovery plan was a wise one. The DRP kept the university's functions stable, if not fully operational. The payroll department, which was critical, was given top priority at the temporary site. By investing in a DRP, the university avoided the worst consequences.