The more readily excused kinds of software fault are still failures. It is certainly much harder to miscalculate the weight a bridge can bear than to forget to free memory, yet such small faults are just as capable of causing a great deal of damage. Because software is now integrated into billions of electromechanical devices, and because its developers are fallible, the margin for error is broad. (Musick)
The harm that a small software error can do to a large population was demonstrated in 1992 by the failure of the London Ambulance Service's (LAS) computer-aided dispatch (CAD) system. However, a careful look at the events surrounding the incident shows that more than a software error was involved. In fact, the careless attitude with which the application's development was approached from its conception set the stage for such a severe failure. (Musick)
Accident reports are intended to explain the causes of system failures. They draw on the findings of many teams of experts and are the outcome of a lengthy investigation. From a software engineering perspective they are vital documents, because they guide the interventions of the authorities that must reduce the impact and frequency of system failures. However, a number of problems are associated with current practice. For example, Rand recently reported a lack of techniques for investigating the role of software in failures. Similarly, no established techniques exist to ensure that these failures inform subsequent design. Johnson's paper therefore proposes that a number of relatively simple graphical notations could improve the next generation of accident reports. (Johnson)
Given the importance of accident reports for the development and operation of safety-critical systems, it is surprising that they have received relatively little research attention. The extensive literature on safety-critical development, and on the use of design documents in general, is not matched in the area of accident reporting. Research continues to focus on techniques for analysing the causes of software failure rather than on the delivery mechanisms that publicize those findings to practising systems engineers and interface designers. (Johnson)
The London Ambulance Service is the largest ambulance service in the world, covering 620 square miles and serving about seven million people. The LAS has 700 vehicles, 70 ambulance stations, a helicopter, more than 3,000 staff and a number of motorcycles. On average it responds to more than 2,500 calls per day. With all this in mind, the LAS planned to launch an efficient computer-aided dispatch (CAD) system, which nevertheless ended in loss and failure. What follows describes the problems and the disaster that arose in implementing such a high-responsibility, sensitive project. (London Ambulance service essay)
In the mid-1980s the LAS emergency dispatch system was run entirely manually and comprised three main tasks:
After receiving a call requesting an emergency ambulance, the call taker filled out a paper form, located the caller's position on a map, and placed the form on a conveyor belt. (Musick)
The conveyor belt carried the form to another LAS employee, who analysed the statuses and locations of the ambulances in the region of the call, assigned the call to an available unit, and wrote the assignment on the form. (Musick)
A dispatcher then contacted the ambulance assigned to the call and gave its operator(s) further details of the call. Looking back from 2006, when computers pervade everyday life, a manual system that depends on paper, people's memories and human reasoning for optimal resource utilization looks primitive. (Musick)
The government imposed a rule that all calls should be answered within three minutes. The highlights of the proposed computerized system included mobile data terminals (MDTs) and an automatic vehicle location system (AVLS), which would be installed in the emergency vehicles and would aid communication through a computer terminal. The LAS looked into adopting an existing system but met complications with each of the available choices. LAS management decided to build a new one and proceeded to collect requirements without input from ambulance crews or dispatchers. (Musick)
Management set ambitious goals. Rather than simply assisting dispatchers, this fully computerized system would do nearly everything automatically. A person would only need to receive a call, enter the data into a computer terminal, and then act if the system displayed an exception message because no ambulance had been available for more than 11 minutes. The software would map the call locations. The system would then use this map data, together with the status and location details supplied by the AVLS, to find and dispatch the available ambulance closest to the incident. (Musick)
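The allocation rule described above can be illustrated with a minimal sketch. It is not the LAS software: the `Ambulance` class, the `dispatch` function, the straight-line distance estimate and the assumed 40 km/h average speed are all hypothetical, and the 11-minute rule is read here as "raise an exception when the nearest available unit is estimated to be more than 11 minutes away", which is only one possible interpretation of the source.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

EXCEPTION_THRESHOLD_MIN = 11.0  # threshold quoted in the text
ASSUMED_SPEED_KMH = 40.0        # assumption: average road speed, not from the source


@dataclass
class Ambulance:
    unit_id: str
    x_km: float            # last position reported by the (hypothetical) AVLS feed
    y_km: float
    available: bool = True


def eta_minutes(unit: Ambulance, x_km: float, y_km: float) -> float:
    """Straight-line travel time estimate from the unit's last reported position."""
    distance_km = math.hypot(unit.x_km - x_km, unit.y_km - y_km)
    return distance_km / ASSUMED_SPEED_KMH * 60.0


def dispatch(fleet: List[Ambulance], x_km: float, y_km: float) -> Tuple[Optional[Ambulance], str]:
    """Allocate the closest available unit, or return an exception message for a human."""
    candidates = [u for u in fleet if u.available]
    if not candidates:
        return None, "EXCEPTION: no unit available"
    nearest = min(candidates, key=lambda u: eta_minutes(u, x_km, y_km))
    if eta_minutes(nearest, x_km, y_km) > EXCEPTION_THRESHOLD_MIN:
        return None, "EXCEPTION: nearest unit over 11 minutes away"
    nearest.available = False  # the unit is now committed to this incident
    return nearest, "DISPATCHED"
```

Note that a rule of this kind is only as good as the positions and statuses it is given: if the reported location or availability of a unit is wrong, the "closest available" choice is wrong too.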
After many problems, including the cancellation and re-design of the project, a software system was built and deployed on the morning of October 26, 1992. Within a few hours, complications began to arise. The AVLS could not keep track of all the ambulances. The system began sending several units to some locations and none at all to others. Vehicles were not being assigned to call locations efficiently. The system began generating a large number of exception messages. The problem was compounded when people called back repeatedly because the ambulances they expected had not arrived. As more and more incidents entered the system, it became increasingly congested. The LAS switched back to a part-manual system the next day, and shut the computer system down completely when it stopped working altogether eight days later. (Musick)
Because of the vast area served by the LAS, the failure of the computer system affected many people. As many as 46 deaths were attributed to ambulances that were requested but did not arrive in time. One heart attack patient waited six hours for an ambulance before her son finally took her to hospital. Four hours after that, the LAS called to ask whether an ambulance was still needed. Another woman called the LAS every 30 minutes for up to three hours before an ambulance finally arrived. By then it was too late, because her husband had already died. (Musick)
At the time the system went live, the software had 81 known issues and no load tests had been run. No provision had been made for a backup system. The 10-month gap between when dispatchers were first trained to use the software and when it was deployed also played a part in the disaster. Three primary faults immediately caused the failure:
The software did not function well when given incomplete or invalid data about the statuses and positions of ambulances. (Musick)
The deployed system had a wide variety of quirks in various parts of the user interface. For example, parts of the MDT screens had black spots, which prevented ambulance operators from seeing all the information they needed. When ambulance crews tried to correct a mistake after pushing the wrong button, the system did not accept the fix. Here again the software failed to compensate for error conditions that arise in normal, day-to-day operation. (Musick)
However, the root cause of the main failure was a memory leak in a small portion of code. This defect kept the memory that held incident information allocated on the server even after it was no longer needed. As with any memory leak, after enough time the memory filled up and the system failed. (Musick)
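The original faulty code is not reproduced in the sources cited here, so the following is only a hypothetical sketch of the general pattern behind such a leak: a server-side store that keeps every incident record after the incident is closed, compared with one that releases it. The class and function names are invented for illustration.

```python
from typing import Dict


class LeakyIncidentStore:
    """Keeps every record forever: memory use grows with every call handled."""

    def __init__(self) -> None:
        self._records: Dict[int, dict] = {}

    def open(self, incident_id: int, details: dict) -> None:
        self._records[incident_id] = dict(details, closed=False)

    def close(self, incident_id: int) -> None:
        # Defect: the record is only flagged as closed and is never released.
        self._records[incident_id]["closed"] = True

    def retained(self) -> int:
        return len(self._records)


class ReleasingIncidentStore(LeakyIncidentStore):
    """Same interface, but a closed incident no longer occupies memory."""

    def close(self, incident_id: int) -> None:
        self._records.pop(incident_id, None)


def simulate(store: LeakyIncidentStore, n_incidents: int) -> int:
    """Open and close n incidents, then report how many records are still held."""
    for i in range(n_incidents):
        store.open(i, {"location": "hypothetical"})
        store.close(i)
    return store.retained()
```

Running `simulate` with a large number of incidents shows why a small defect of this kind matters on a long-running server: the leaky store holds one record per incident ever handled, while the releasing store holds none once all incidents are closed. A load test of this sort is exactly what the text says was never run.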
Risk management expert and IEEE member Robert Charette suggests, "Project managers make bad decisions and this is probably the single biggest reason of the failure of software today." Although the software controlling the LAS's CAD system had many faults, the events surrounding the failure, the process the LAS used to develop the CAD system, and the state of the LAS as an organization played an even larger role in the system's failure. (Musick)
When the project was restarted, the LAS staff running it set out with the primary aim of saving as much money and time as possible. Although this is a reasonable goal for any project, it was given too much weight in the LAS's CAD project. In an attempt to save money, the LAS decided to reuse the hardware that had already been bought for the earlier failed project. (Musick)
Two people were in charge of selecting a software vendor to build the system: a manager who expected to be made redundant, and a contractor who was with the organisation only temporarily. The roles of these individuals call into question their ability to choose the best company for the job. Furthermore, the selection committee had to treat a bid's price as the most important factor in choosing a vendor. All bids above £1.5 million were rejected immediately. This was a very low ceiling, especially considering that the earlier CAD project had failed even after £7.5 million had been poured into it. (Musick)
The choice of a development organization was further constrained by the requirement that the project be finished in 11 months. Any bid that did not meet this severe constraint was again excluded. Many companies proposed modified deployment schedules in which some functionality would be delivered by the 11-month deadline and the rest over the following year. These were turned down too. (Musick)
The LAS received a bid of just under £1 million from a consortium of companies. The software part of the system was "presented as a throw-in in a deal of hardware" for a meagre £35 thousand and was to be produced by a company called System Options. System Options was also named as the project lead. Although it had built many smaller software packages, System Options had never undertaken a project of this size and had no previous experience with "control systems, safety-critical, real-time, and command." Its inexperience led many members of the contractor selection committee to raise concerns about the company's ability to carry out the task. Although an audit of the selection process confirmed these concerns, the LAS selected the company anyway. (Musick)
The system was unable to retrieve all the important data.
There were radio communication faults.
There were problems with the crews.
There were fewer people to take calls. (Hougham, 1996)
The Crash of the System
When the system started up on 26 October 1992 it was only lightly loaded. The main problems, which were caused in particular by the communications systems, could still be managed by staff. As the number of ambulance requests from different places increased, the amount of incorrect information recorded by the system gradually increased as well. For instance, several ambulances were sent to the same place for the same incident, which left the system with fewer ambulances to allocate. The system also had to handle calls that had not gone through the required protocol, and these generated exception messages. The number of exception messages grew so quickly that staff were unable to clear the queue. It became very difficult for staff to attend to all the messages, which quickly scrolled off the screen. This eventually slowed the system. (Flowers & Stephen, 1996)
With very few resources to allocate, and growing problems with the waiting queues, it took even longer to allocate resources to incidents. The two worst problems seen in the system were:
The inability of the software to recognise duplicate calls made by different people about the same incident.
The failure of the software to keep track of the calls that had been logged.
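To make the first of these problems concrete, the sketch below groups calls that report roughly the same place within a short time window, so that several callers reporting one incident do not each trigger a separate allocation. It is an illustrative heuristic only: the `Call` class, the 0.2 km radius and the 30-minute window are assumptions, not details from the inquiry reports.

```python
import math
from dataclasses import dataclass
from typing import List

RADIUS_KM = 0.2     # assumption: calls this close together may be the same incident
WINDOW_MIN = 30.0   # assumption: and only if received within this many minutes


@dataclass
class Call:
    caller: str
    x_km: float
    y_km: float
    minute: float   # time the call was logged, in minutes since start of shift


def group_calls(calls: List[Call]) -> List[List[Call]]:
    """Group calls into incidents; every logged call is kept in exactly one group."""
    incidents: List[List[Call]] = []
    for call in sorted(calls, key=lambda c: c.minute):
        for incident in incidents:
            first = incident[0]
            near = math.hypot(call.x_km - first.x_km, call.y_km - first.y_km) <= RADIUS_KM
            recent = call.minute - first.minute <= WINDOW_MIN
            if near and recent:
                incident.append(call)  # probable duplicate: attach to the existing incident
                break
        else:
            incidents.append([call])   # no match: treat as a new incident
    return incidents
```

Because each call is stored inside the incident it belongs to, the same structure also addresses the second problem in a simple way: no logged call is lost, and a repeat caller can be matched to an incident that already has a unit assigned.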
Meanwhile, patients became very frustrated by the late arrival of ambulances. This led to a sharp increase in calls to LAS headquarters, most of which concerned incidents that had already been recorded. The increase in calls, combined with a slow system, led to further failures, which in turn caused more problems and delays for patients. On the ambulance side, crew members became very frustrated by the incorrect allocations. This led to more occasions on which crews failed to press the right buttons. Crew frustration also appears to have contributed to an increased volume of voice radio traffic. (Flowers & Stephen, 1996)
Factors That Contributed to the Disaster
The inquiry reports suggested that neither the CAD system nor its users were ready for full implementation on 26 October.
The CAD software was incomplete, not properly tuned, and not fully tested.
There were outstanding problems with the system's data communications, mainly to and from the mobile data terminals.
There was scepticism about the Automatic Vehicle Location System (AVLS).
Staff in Central Ambulance Control (CAC) and the ambulance crews lacked confidence in the system and were not fully trained.
Physical changes to the layout of the control room meant that CAC staff were working in unfamiliar positions, without paper backup, and were less able to work with colleagues.
No attempt had been made to foresee the effect of inaccurate or incomplete data reaching the system.
All these imperfections led to an increase in the number of exception messages that staff had to deal with, which in turn led to further problems. (Flowers & Stephen, 1996)
To address these problems, the LAS reverted to a partly manual system, whose basic aim was to give staff the opportunity to override allocations. When a reboot failed to fix the persistent problem, the service fell back on a fully manual system. (LASCAD FAILURE)
• Re-configuring the control room.
• Installing more CAD terminals along with the RIFS screens.
• Removing the paper backup system.
• Physically separating the resource allocators from the radio operators and the exception rectifiers.
• Going 'pan London' instead of operating in three divisions.
• Using only the resource allocations proposed by the system.
• Allowing some of the call takers to allocate resources.
• Separating the allocators for the various call sources. (Sommerville, 2004)
Because of many of these shortcomings, the LAS decided to computerize its dispatch system. The first attempt failed in 1990 after around £7.5 million had been spent. The second attempt, begun in 1991, also faced serious challenges and collapsed on 4 November 1992, nine days after its launch. The project was designed to include a gazetteer and mapping software for locating places, CAD hardware and software for automatic resource allocation, and a communication system linking all the vehicles, with the aim of producing accurate results. (Adamu, Alkazmi, Alsufyani, Shaigy, Chapman, & Chappell.)
The CAD project suffered mainly from strict time and financial constraints. A better outcome would have required more negotiation and communication between the stakeholders who controlled the schedule and the finances. Communication had also broken down between LAS management and staff, which shows that not all stakeholders' interests were represented in the project. LAS management and staff should have gone through training sessions and been held to agreed standards for the benefit of the system. Moreover, the system was never properly tested, so the LAS had no advance knowledge of how the system would behave or how problems would be resolved. (Adamu et al)
The tight schedule was coupled with continual changes to the system, which left the developers unable to carry out the testing required. The project would therefore have benefited from an independent quality assurance team. Since then the LAS has benefited greatly from computerizing its dispatch system, and it is now working to improve efficiency further by signing a new contract. (Adamu et al)