# ATH Diagnostics Inc.


### Abstract

The present paper is the second coursework for the module CIM238, Network Design and Management. The coursework consists of two parts. The first part is an availability study of the expanded extranet of ATH Diagnostics Inc., which was developed in the final part of coursework A. The expanded extranet comprises the second-floor LAN at the Thessaloniki branch office and the LAN at the branch office in Sofia, Bulgaria. The second part is a capacity assignment problem concerning the minimum average delay of a packet in a given topology.

### I. Introduction

The subject of the present essay is divided into two parts. The first part is an availability study of the expanded extranet, which is one of the subjects of coursework A. The expanded extranet of ATH Diagnostics Inc. comprises the second-floor LAN in the Thessaloniki branch office, which serves 25 users, and the LAN of the branch office in Sofia, which serves 10 users. It is assumed that a company which sells turnkey solutions makes an offer for the maintenance of the expanded extranet, and the stated availability is part of the proposed Service Level Agreement (SLA). The SLA declares that the annual overall downtime of the expanded network should be less than 8 hours and 46 minutes, with a 48-hour response time.

From the above declarations I assume that the company undertakes that skilled personnel will be in contact with the customer within 48 hours of receiving the first phone call. The offered “8.766 hours” of overall downtime means that the company undertakes that the resolution time will not exceed the quoted time. That covers everything from simple help-desk support up to field work by specialized personnel. In general, I assume that the signed agreement is oriented to service availability, considering that the company sells the supporting hardware and the application software, including the image view application [2],[3].

Consequently, the first part evaluates the offered infrastructure according to the total availability of the several service scenarios, from the user's perspective. The second part is a network optimization problem. For a specific network topology with a given maximum cost, I calculate the optimum link capacities so that the average delay of a packet that traverses the network is minimal.

### II. Exercise A

### A. Definitions

Before proceeding to the study, I cite the necessary equations and definitions [1],[8]. Reliability is the probability that the system will perform its intended function under specified working conditions, for a specified period of time.

The reliability function R(t) is the probability that a system operates successfully, without failure, in the interval (0,t]:

$$R(t) = P(T > t) \tag{1}$$

where T is a random variable representing the time to failure.

The failure probability F(t), or Unreliability, is the probability that the system will have failed by time t:

$$F(t) = P(T \le t) = 1 - R(t) \tag{2}$$

The Mean Time to Failure or MTTF is the expected time that a system spends in working condition before a failure occurs. It is given by:

$$\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt = \frac{1}{\lambda} \tag{3}$$

where parameter λ is the failure rate.

Maintainability is the probability that a failed system will be restored to a functioning state within a given period of time. If the time to repair T (the total downtime) has PDF g(t), then the maintainability V(t) is defined as the probability that the failed system will be back in operating condition by time t:

$$V(t) = P(T \le t) = \int_0^{t} g(s)\,ds = 1 - e^{-\mu t} \tag{4}$$

where parameter μ is the repair rate.

The Mean Time to Repair or MTTR is the expected time that a system remains in non-operating condition. It is given by:

$$\mathrm{MTTR} = \frac{1}{\mu} \tag{5}$$

Formulas 4 and 5 are valid if the time distribution functions follow an exponential distribution.

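The Mean Time between Failures or MTBF is the expected value of the random time between system failures. It is given by the formula:

$$\mathrm{MTBF} = \mathrm{MTTF} + \mathrm{MTTR} \tag{6}$$

The Availability A(t) of a system is defined as the probability that the system is functional (available) at time t. The steady state availability is given by the formula:

$$A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} \tag{7}$$

Practically, Availability A is the percentage of time during which a service is functioning properly. By substituting formulas 3 and 5 into formula 7, we get the availability as a function of the rates λ and μ:

$$A = \frac{1/\lambda}{1/\lambda + 1/\mu} = \frac{\mu}{\lambda + \mu} \tag{8}$$

If the MTTF is replaced with the MTBF in formula 7, we get a good approximation of the availability A, since the MTTR is much smaller than the MTTF in practice:

$$A \approx \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} \tag{9}$$

Formula 9 will be in use for the rest of the coursework. Similarly, Unavailability UA is the percentage of time during which a service is down. It is expressed as follows:

$$UA = 1 - A = \frac{\mathrm{MTTR}}{\mathrm{MTBF} + \mathrm{MTTR}} \tag{10}$$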

### 1) Availability of a System

If a system consists of separate subsystems and the system fails when any one of its elements fails (serial A), then the total availability is the product of the availabilities of the subsystems [2]:

$$A_{serial} = \prod_{i=1}^{n} A_i \tag{11}$$

If a system consists of redundant subsystems (parallel A) and the failure of one subsystem does not imply a total failure, then the total unavailability is the product of the unavailabilities of the subsystems:

$$UA_{parallel} = \prod_{i=1}^{n} (1 - A_i)$$

$$A_{parallel} = 1 - \prod_{i=1}^{n} (1 - A_i) \tag{14}$$

Formulas 9, 11 and 14 will be in use for the rest of the coursework.
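As a minimal sketch of formulas 9, 11 and 14, the following Python functions compute a component availability from its MTBF and MTTR and combine availabilities in series and in parallel. The function names are illustrative assumptions and are not taken from the coursework's spreadsheets.

```python
from math import prod


def availability(mtbf_hours, mttr_hours):
    """Formula 9: A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def serial(availabilities):
    """Formula 11: every block must work, so the availabilities multiply."""
    return prod(availabilities)


def parallel(availabilities):
    """Formula 14: a redundant group is down only if all of its blocks are down."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Example with the MTBF and MTTR assumed later for the Cisco 500-24 switch.
a_switch = availability(350_000, 48)
print(f"Cisco 500-24 availability: {a_switch:.6%}")
```

Keeping the three formulas as separate functions mirrors the methodology: parallel groups are reduced first, and the results are then multiplied along the serial chain.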

### B. The ATH Diagnostics Case

### 1) Methodology

Most real systems (or networks) contain both serial and parallel subsystems, according to their specific topology. Following the method in [2], I first calculate the availability of the parallel subsystems with formula 14 and then the availability of the resulting serial chain with formula 11. The partial availabilities are calculated from formula 9.

I assume that the MTBF values are taken from the datasheets of the relevant components, and that the MTTR times are given. I assume that ATH Diagnostics does not keep any spare parts in stock for troubleshooting by its own personnel, and depends exclusively on external support. Since the SLA offers a 48-hour response time, I assume that this is the mean time for which a failed component will be inoperable, that is, the component's MTTR. Thus, every part which needs replacement will be replaced after 48 hours at the earliest.

As I have mentioned in the introduction, I assume that the specific SLA contains paragraphs which state the availabilities for the services which the expanded extranet supports [3],[4]. These are the Web service, the Email service, the Database service and the Image View. Obviously, a user in the Sofia branch office will experience a different availability from a user in Thessaloniki. Therefore, I consider different scenarios, one for each application and for each of the two remote offices. The common requirement across the distinct scenarios is that the overall annual downtime should always be less than 8 hours and 46 minutes, or equivalently that the total availability should be not less than 99.9%.

### 2) Assumptions

I have assigned some up-to-date hardware as network components, mainly small business models, according to the nature of ATH Diagnostics [6],[7]. I have also selected some arbitrary MTBF values for the network components, although real values are likely to be close. In general, the MTBF of a server is calculated like the availability of a system which combines parallel and serial components. Major contributors to the total MTBF are the moving parts, such as the hard disks and the chassis fans. Similarly, the MTBF of a router is derived from the basic frame and the several add-on interfaces. Usually, most of the parts of an assembly are essential to the device's functionality, therefore the MTBF calculation treats them as serial parts.

The two subnets are depicted in Fig. 1. The selected hardware is:

Routers: I have selected the Cisco 2901 router (MTBF 300,000 hours) for the ATH_2 and THES nodes and the Cisco 1941 router (250,000 hours) for the SOF node. A common feature is that they have at least two empty slots for general-purpose interface cards (such as X.21 for the FRAD interfaces) and two Fast Ethernet ports.

Switches: I have selected the Cisco 2960-24 switch (450,000 hours) as the main switch in the ATH_2 and THES nodes, the Cisco 520-48 switch (400,000 hours) for the THES second floor and the Cisco 500-24 switch (350,000 hours) in the SOF node. The port count is chosen according to the number of users.

Servers: I have selected the HP ProLiant ML300 Server series (MTBF 600,000 hours). The standard configuration includes hot-plug hard drives, a RAID controller and two hot-plug redundant power supplies. Each application server is a stand-alone tower server, running the Windows Server 2003 OS.

It is important to note that, although the SOF node is in another country, I consider that the interconnection to the Frame Relay infrastructure belongs to the same provider, with availability 99.999%. The power provider, however, is different. In Greece the nodes ATH_2 and THES are powered by DEH with 2 hours of annual downtime, so the availability is 99.977%. In Bulgaria the annual downtime of the public power network is 4 hours, so the availability is 99.954%.

One critical aspect to consider is User Error. Normally I would have to consider system crashes due to user interference, including application misuse. I have chosen to follow an easier path and to assign an annual downtime to every application, regardless of whether it is caused by a software bug or by human interference. However, all the applications can be restarted in case something goes wrong. So I have assigned 60 minutes of downtime to the commercial applications (the Web service, the Email service and the Database service) and 3 hours of downtime to the Image View application.

Therefore the availability will be 99.989% for the three commercial applications and 99.966% for the custom application. The availability results from formula 9, as follows:

$$A_{commercial} = \frac{8766}{8766 + 1} \approx 99.989\%, \qquad A_{Image\ View} = \frac{8766}{8766 + 3} \approx 99.966\%$$

The 8766 hours accumulated annually also account for the extra ¼ day per year contributed by leap years (365.25 × 24 = 8766). I also assume that all physical layer connections, from Ethernet 100baseT up to the X.21 serial protocol, have 100% availability. Considering the simplicity of the paths, I implement the routing policy with static routes and not with a dynamic routing protocol such as RIP or OSPF. So there is no need to calculate any protocol availability.
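The downtime-based availabilities quoted in this section can be reproduced with a short Python sketch. It assumes, as the text does, that formula 9 is applied with MTBF = 8766 hours and MTTR equal to the annual downtime. The helper name and variable names are illustrative only.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours, including the leap-year quarter day


def availability_from_annual_downtime(downtime_hours):
    """Formula 9 with MTBF = 8766 h and MTTR = annual downtime in hours."""
    return HOURS_PER_YEAR / (HOURS_PER_YEAR + downtime_hours)


commercial = availability_from_annual_downtime(1.0)       # Web, Email, Database
image_view = availability_from_annual_downtime(3.0)       # custom application
greek_power = availability_from_annual_downtime(2.0)      # ATH_2 and THES
bulgarian_power = availability_from_annual_downtime(4.0)  # SOF

# Availability floor implied by the SLA: at most 8.766 hours down per year.
sla_floor = 1.0 - 8.766 / HOURS_PER_YEAR

for name, value in [
    ("Commercial applications", commercial),
    ("Image View", image_view),
    ("Power, Greece", greek_power),
    ("Power, Bulgaria", bulgarian_power),
    ("SLA floor", sla_floor),
]:
    print(f"{name}: {value:.3%}")
```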

### C. Availability Scenarios

I consider individual scenarios, one for each of the four services and for each of the two sub-networks which comprise the expanded extranet [2]. So there are eight scenarios in total. Since the commercial applications have equal MTBF, the number of scenarios reduces to four, two for the Sofia user and two for the Thessaloniki user. To calculate the total availability, it is necessary to draw the Reliability Block Diagram or RBD, from the Start Node to the End Node. The total availability results from parallel and serial node combinations.

### 1) Scenario for the user in SOF node

The start node is the SOF LAN, which is the Cisco 500-24 switch. To have Web service on every LAN terminal, the router Cisco 1941, the SOF power network, the Frame Relay service, the ATH_2 power network, the router Cisco 2901 and the ATH_2 main switch, which is the Cisco 2960-24, should be operational. Furthermore, the HP ProLiant ML300 Web server in the Athens server farm and the Web server application should be operational, so they are in series with the rest of the components. Fig. 2 displays the relevant RBD.

Availability calculation matrix for Web service in SOF node

| Component | MTBF (hours) | MTTR (hours) | Availability (%) |
| --- | --- | --- | --- |
| Cisco 500-24 | 350000 | 48 | 99.986% |
| Cisco 1941 | 250000 | 48 | 99.981% |
| SOF Power | 8766 | 4 | 99.954% |
| Frame Relay | 8766 | 0.0877 | 99.999% |
| ATH_2 Power | 8766 | 2 | 99.977% |
| Cisco 2901 | 300000 | 48 | 99.984% |
| Cisco 2960-24 | 450000 | 48 | 99.989% |
| Web service | 8766 | 1 | 99.989% |
| HP ProLiant ML300 | 600000 | 48 | 99.992% |
| Total Availability | 8766 | 13.020 | 99.852% |

In the table above, I calculate the total availability by using formula 11. I multiply the partial availabilities of the components which are essential for the Web service operation, starting from the end user and ending at the application server.

The total availability of the Web service for the user in Sofia, and consequently of the rest of the commercial applications, is 99.852%. The annual downtime (MTTR) is 13.020 hours, or about 13 hours, 1 min and 12 sec, far above the contracted value of 8 hours and 46 min (A is 99.9%).

Then, I replace the Web service with the Image View, which has an annual MTTR of 3 hours, and I repeat the calculations. The total availability falls to 99.829% and the annual downtime rises to 15.023 hours, or about 15 hours and 1 min. The calculated downtimes are far above the contracted limit and therefore modifications to the network are inevitable.
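The serial calculation for the Sofia user can be sketched in Python as follows. The MTBF and MTTR pairs are the ones listed in the matrix above. The annual downtime is obtained by solving formula 9 for MTTR with MTBF = 8766 hours, which is an assumption about how the "Total Availability" row was produced, although it reproduces the tabulated figures. All names are illustrative.

```python
from math import prod

HOURS_PER_YEAR = 8766

# (MTBF hours, MTTR hours) of every infrastructure block in the serial RBD.
SOF_CHAIN = {
    "Cisco 500-24": (350_000, 48),
    "Cisco 1941": (250_000, 48),
    "SOF Power": (8766, 4),
    "Frame Relay": (8766, 0.0877),
    "ATH_2 Power": (8766, 2),
    "Cisco 2901": (300_000, 48),
    "Cisco 2960-24": (450_000, 48),
    "HP ProLiant ML300": (600_000, 48),
}
APPLICATIONS = {"Web service": (8766, 1), "Image View": (8766, 3)}


def availability(mtbf, mttr):
    """Formula 9."""
    return mtbf / (mtbf + mttr)


def scenario(application):
    """Return (total availability, annual downtime in hours) for one service."""
    blocks = list(SOF_CHAIN.values()) + [APPLICATIONS[application]]
    total = prod(availability(mtbf, mttr) for mtbf, mttr in blocks)  # formula 11
    downtime = HOURS_PER_YEAR * (1 - total) / total  # formula 9 solved for MTTR
    return total, downtime


for app in APPLICATIONS:
    total, downtime = scenario(app)
    print(f"{app}: A = {total:.3%}, downtime = {downtime:.3f} h/year")
```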

### 2) Scenario for the user in THES node

The start node is the THES second-floor LAN, which is the Cisco 520-48 switch. To have Image View on every LAN terminal, the switch Cisco 2960-24, the router Cisco 2901, the THES power network, the Frame Relay service, the ATH_2 power network, the router Cisco 2901 and the ATH_2 main switch, which is the Cisco 2960-24, should be operational. Furthermore, the HP ProLiant ML300 Radiology server in the ATH_2 server farm and the Image View application should be operational, so they are in series with the rest of the components. Fig. 4 displays the relevant RBD.

As in the previous scenario, the total availability is calculated as the product of the partial availabilities:

Availability calculation matrix for Image View in THES node

| Component | MTBF (hours) | MTTR (hours) | Availability (%) |
| --- | --- | --- | --- |
| Cisco 520-48 | 400000 | 48 | 99.988% |
| Cisco 2960-24 | 450000 | 48 | 99.989% |
| Cisco 2901 | 300000 | 48 | 99.984% |
| THES Power | 8766 | 2 | 99.977% |
| Frame Relay | 8766 | 0.0877 | 99.999% |
| ATH_2 Power | 8766 | 2 | 99.977% |
| Cisco 2901 | 300000 | 48 | 99.984% |
| Cisco 2960-24 | 450000 | 48 | 99.989% |
| Image View | 8766 | 3 | 99.966% |
| HP ProLiant ML300 | 600000 | 48 | 99.992% |
| Total Availability | 8766 | 13.525 | 99.846% |

From the table above it is clear that the total availability of the Image View application for the user of the THES node is 99.846%. This gives an annual MTTR of 13 hours, 31 min and 30 sec, which is above the contracted value. The total availability for the rest of the commercial applications (with MTTR 1 hour) rises to 99.869% and the annual downtime falls to 11 hours, 31 min and 22 sec, which is still above the contracted limit. In the following section I consider a generic solution to increase the total reliability.

### D. Network upgrade

The planned network upgrade is based on the following principles [2],[4]. Considering the nature of ATH Diagnostics Inc., it is crucial to install a power backup system in every diagnostic center (Sofia, Thessaloniki) and in the headquarters in Athens, where the server farm is installed. Therefore, I install a UPS system in the ATH_2, THES and SOF nodes, which backs up the power supply from the national power networks. The ATH_2 UPS is redundant to the ATH_2 power, the SOF UPS to the SOF power, and so on. The UPS system that I have chosen has an MTBF of 400,000 hours. Considering that it is also maintained by the contracting company, its MTTR is 48 hours. So the UPS availability is 99.988%.

The Cisco 2901 which is installed in the ATH_2 node is the main router of the topology, and all application traffic passes through it. It is advisable to install a redundant 2901 router. Cisco routers support the HSRP protocol, which provides failover capability. When it is in use, a failure of the main router activates the redundant router.

### 1) Improved scenario for the user in SOF node

The RBD in Fig. 6 shows the composite availabilities which emerge from the upgraded network, and the RBD in Fig. 7 shows the simplified serial availabilities.

The following equations show how the combined availability is calculated from formula 14:

$$A_{SOF\ Total\ Power} = 1 - (1 - A_{SOF\ Power})(1 - A_{SOF\ UPS})$$

$$A_{ATH\_2\ Total\ Power} = 1 - (1 - A_{ATH\_2\ Power})(1 - A_{ATH\_2\ UPS})$$

$$A_{Router\ pair} = 1 - (1 - A_{2901})^2$$

The equivalent MTTR and MTBF are obtained by solving formula 9 for MTTR and MTBF, respectively:

$$\mathrm{MTTR} = \mathrm{MTBF}\,\frac{1 - A}{A}, \qquad \mathrm{MTBF} = \mathrm{MTTR}\,\frac{A}{1 - A}$$

Fig. 8 displays the calculated availabilities. It is interesting to see how small the annual power downtime becomes with a redundant power source, or how large the total MTBF of a router pair becomes when 48 hours of downtime are assumed. Note that I have assumed that the failover mechanism is instantaneous (zero delay).

Availability resolution matrix for parallel combinations

| Component | MTBF (hours) | MTTR (hours) | Availability (%) |
| --- | --- | --- | --- |
| SOF Power | 8766 | 4 | 99.954390% |
| SOF UPS | 400000 | 48 | 99.988001% |
| SOF Total Power | 8766 | 0.000479724 | 99.999995% |
| ATH_2 Power | 8766 | 2 | 99.977190% |
| ATH_2 UPS | 400000 | 48 | 99.988001% |
| ATH_2 Total Power | 8766 | 0.000239916 | 99.999997% |
| Cisco 2901 | 300000 | 48 | 99.984003% |
| Failover Cisco 2901 | 300000 | 48 | 99.984003% |
| ATH_2 Router pair | 1875599997 | 48 | 99.999997% |
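The parallel combinations in the matrix above can be reproduced with the following Python sketch. It assumes an instantaneous failover, as stated in the text, and holds the MTBF fixed at 8766 hours for the power groups and the MTTR fixed at 48 hours for the router pair, which is how the tabulated rows appear to have been derived. The helper names are illustrative.

```python
def availability(mtbf, mttr):
    """Formula 9."""
    return mtbf / (mtbf + mttr)


def parallel(*availabilities):
    """Formula 14: the group is down only when every member is down."""
    unavailability = 1.0
    for a in availabilities:
        unavailability *= 1.0 - a
    return 1.0 - unavailability


def equivalent_mttr(a, mtbf):
    """Formula 9 solved for MTTR at a given MTBF."""
    return mtbf * (1.0 - a) / a


def equivalent_mtbf(a, mttr):
    """Formula 9 solved for MTBF at a given MTTR."""
    return mttr * a / (1.0 - a)


ups = availability(400_000, 48)
sof_power = parallel(availability(8766, 4), ups)
ath_power = parallel(availability(8766, 2), ups)
router_pair = parallel(availability(300_000, 48), availability(300_000, 48))

print(f"SOF total power:   MTTR = {equivalent_mttr(sof_power, 8766):.9f} h")
print(f"ATH_2 total power: MTTR = {equivalent_mttr(ath_power, 8766):.9f} h")
print(f"ATH_2 router pair: MTBF = {equivalent_mtbf(router_pair, 48):.0f} h")
```

Because the combined unavailabilities are of the order of 1e-8, the last digits of the equivalent MTBF depend on floating-point rounding, so small differences from the spreadsheet value are to be expected.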

The improved total availability of the Web service in the SOF node is calculated as in the previous section, and the calculations are shown in the following matrix, in Fig. 9. The annual downtime has fallen to 5 hours, 36 min and 43 sec, because the availability is now 99.936%, well above the contracted value. Similarly, the availability of the Image View service has risen to 99.913% and the annual downtime has fallen to 7.613 hours, or about 7 hours and 37 min, which is below the contracted value as well.

Improved availability of Web service in SOF node

| Component | MTBF (hours) | MTTR (hours) | Availability (%) |
| --- | --- | --- | --- |
| Cisco 500-24 | 350000 | 48 | 99.986288% |
| Cisco 1941 | 250000 | 48 | 99.980804% |
| SOF Total Power | 8766 | 0.000479724 | 99.999995% |
| Frame Relay | 8766 | 0.0877 | 99.999000% |
| ATH_2 Total Power | 8766 | 0.000239916 | 99.999997% |
| ATH_2 Router pair | 1875599997 | 48 | 99.999997% |
| Cisco 2960-24 | 450000 | 48 | 99.989334% |
| Web service | 8766 | 1 | 99.988594% |
| HP ProLiant ML300 | 600000 | 48 | 99.992001% |
| Total Availability | 8766 | 5.612 | 99.936025% |

### 2) Improved scenario for the user in Thessaloniki node

The second-floor LAN in Thessaloniki uses the Image View application, whose data traffic flows through the upgraded infrastructure in the ATH_2 node. The THES total power is equal to the ATH_2 total power. The RBD resolution is similar to that of the previous scenario.

I have done the rest of the calculations as in the previous scenario. The total availability of the Image View service has risen to 99.907% and the annual downtime has fallen to 8 hours and 7 min. For the commercial applications the values are 99.93% and 6 hours, 6 min and 58 sec, respectively.

Improved availability of Image View in THES node

| Component | MTBF (hours) | MTTR (hours) | Availability (%) |
| --- | --- | --- | --- |
| Cisco 520-48 | 400000 | 48 | 99.988001% |
| Cisco 2960-24 | 450000 | 48 | 99.989334% |
| Cisco 2901 | 300000 | 48 | 99.984003% |
| THES Total Power | 8766 | 0.000239916 | 99.999997% |
| Frame Relay | 8766 | 0.0877 | 99.999000% |
| ATH_2 Total Power | 8766 | 0.000239916 | 99.999997% |
| ATH_2 Router pair | 1875599997 | 48 | 99.999997% |
| Cisco 2960-24 | 450000 | 48 | 99.989334% |
| Image View | 8766 | 3 | 99.965789% |
| HP ProLiant ML300 | 600000 | 48 | 99.992001% |
| Total Availability | 8766 | 8.117 | 99.907487% |

### E. Final results

The following matrix, in Fig. 10, summarizes the results for the expanded extranet of ATH Diagnostics.

| Scenario | Availability before upgrade | MTTR before (hours) | Availability after upgrade | MTTR after (hours) |
| --- | --- | --- | --- | --- |
| Web service in Sofia | 99.852% | 13.020 | 99.936% | 5.612 |
| Image View in Sofia | 99.829% | 15.023 | 99.913% | 7.613 |
| Web service in Thes. | 99.869% | 11.523 | 99.930% | 6.116 |
| Image View in Thes. | 99.846% | 13.525 | 99.907% | 8.117 |
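As a quick check of the results matrix against the SLA, the following Python sketch compares each annual downtime with the contracted limit of 8.766 hours and converts decimal hours to hours, minutes and seconds. The downtime figures are copied from the matrix above, while the helper names are illustrative assumptions.

```python
SLA_LIMIT_HOURS = 8.766

# Scenario -> (annual downtime before upgrade, after upgrade), in hours.
RESULTS = {
    "Web service in Sofia": (13.020, 5.612),
    "Image View in Sofia": (15.023, 7.613),
    "Web service in Thes.": (11.523, 6.116),
    "Image View in Thes.": (13.525, 8.117),
}


def meets_sla(downtime_hours):
    """True when the annual downtime is below the contracted limit."""
    return downtime_hours < SLA_LIMIT_HOURS


def to_hms(hours):
    """Convert decimal hours to (hours, minutes, seconds), rounded to the second."""
    total_seconds = round(hours * 3600)
    h, remainder = divmod(total_seconds, 3600)
    m, s = divmod(remainder, 60)
    return h, m, s


for name, (before, after) in RESULTS.items():
    h, m, s = to_hms(after)
    status = "meets SLA" if meets_sla(after) else "violates SLA"
    print(f"{name}: {before} h -> {after} h ({h} h {m} min {s} s), {status}")
```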

It is clear that the Image View, which is the most demanding application, causes the largest loss in total availability. As I have assumed in the beginning, all services run on stand-alone tower servers. Using redundant servers would require replacing the tower servers with a server cluster, or a blade server, installed in a rack. This would raise the cost to an unreasonable level, not suitable for the nature of ATH Diagnostics.

Instead of doing this, I have added redundant power everywhere, which is critical given the medical services that the company offers. I have also added a redundant router to the server LAN, with load balancing features, especially in view of a future growth in traffic volume. Hence, the overall throughput and processing power will be doubled. Eventually, in every scenario, the annual downtime has been reduced below the contracted limit, with the maximum MTTR occurring at the THES node for the Image View service.

### III. Exercise B

Exercise B is the optimization problem of capacity assignment (CA), defined as follows. Given are the network topology, the external traffic flow requirements {λi}, the channel cost rate d and the total cost D of the network. The task is to minimize the average packet delay T in the network with respect to the channel capacities {Ci}, under the total cost constraint D.

### A. Formulas and definitions

Before proceeding to the problem, I cite the necessary equations and definitions [8],[9]. I assume that the network under consideration consists of M communication channels, i=1,2,3,..,M, each with a capacity Ci in bits per second (bps). The traffic enters the network from external sources and follows a Poisson process.

The average rate of packets which flow on the i-th channel is equal to the sum of the average packet flow rates of the paths which traverse this channel:

$$\lambda_i = \sum_{j\text{-}k \,\ni\, i} \lambda_{j\text{-}k} \tag{1}$$

where j-k is a followed path which includes the i-th channel. It is assumed that all packet lengths follow the exponential distribution, with mean L=1/μ bits. It is also assumed that traffic follows static routes, so there is no dynamic routing protocol active on the network. The total traffic rate λ is given by:

$$\lambda = \sum_{i=1}^{M} \lambda_i \tag{2}$$

The cost Di of constructing the i-th channel (in €) is given as a linear function of its capacity Ci, which is assumed to be available in continuous rather than discrete values:

$$D_i = d_i C_i \tag{3}$$

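The i-th channel is represented as an M/M/1 system with Poisson arrivals of rate λi and exponential service times of rate μCi. In general, Kleinrock's independence assumption is taken to hold. The expected time that a packet spends on the i-th channel is given by:

$$E[T_i] = \frac{1}{\mu C_i - \lambda_i} = \frac{1}{\mu\,(C_i - r_i)} \tag{4}$$

$$r_i = \frac{\lambda_i}{\mu} \tag{5}$$

where ri is the arrival rate in bits per second (bps). The mean packet delay in the network is given by:

$$E[T] = \frac{1}{\lambda} \sum_{i=1}^{M} \lambda_i\, E[T_i] \tag{6}$$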

To solve the capacity assignment problem, it is assumed that the network topology and the traffic flows {λi} are known and fixed. The traffic that flows on the i-th channel should be less than the channel's capacity: ri < Ci.

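To minimize E[T] with respect to the capacity assignment {Ci}, a Lagrangian problem is formed and the solution (for i=1,2,3,..M) is given by the formula:

$$C_i = r_i + \frac{D_{res}}{d_i} \cdot \frac{\sqrt{r_i d_i}}{\sum_{j=1}^{M} \sqrt{r_j d_j}}$$

The variable Dres is the residual cost that remains after subtracting from the total cost D the cost estimated from the traffic rates ri:

$$D_{res} = D - \sum_{i=1}^{M} d_i r_i \tag{8}$$

The cost Dtot of the entire network is given by the sum of the costs Di of channel construction:

$$D_{tot} = \sum_{i=1}^{M} D_i = \sum_{i=1}^{M} d_i C_i$$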
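The capacity assignment can be sketched in Python as below. This is a minimal illustration of the classical square-root assignment under a linear cost, assuming the cost of a channel is di·Ci with no fixed setup term. The two-channel input data and the 1000-bit mean packet length are hypothetical values chosen for easy checking, not the data of the exercise, and the function names are illustrative.

```python
from math import sqrt


def assign_capacities(rates_bps, cost_rates, budget):
    """Square-root capacity assignment under a linear cost d_i * C_i."""
    base_cost = sum(d * r for d, r in zip(cost_rates, rates_bps))
    residual = budget - base_cost  # D_res: budget left after carrying the traffic
    if residual <= 0:
        raise ValueError("the budget does not cover the offered traffic")
    norm = sum(sqrt(r * d) for r, d in zip(rates_bps, cost_rates))
    return [
        r + (residual / d) * sqrt(r * d) / norm
        for r, d in zip(rates_bps, cost_rates)
    ]


def mean_delay(rates_bps, capacities, packet_bits):
    """Mean packet delay: sum of lambda_i * E[T_i] divided by the total rate."""
    packet_rates = [r / packet_bits for r in rates_bps]  # lambda_i = r_i / L
    delays = [packet_bits / (c - r) for c, r in zip(capacities, rates_bps)]
    in_flight = sum(lam * t for lam, t in zip(packet_rates, delays))
    return in_flight / sum(packet_rates)


# Hypothetical two-channel example (not the coursework data).
rates = [4e6, 1e6]      # bps
costs = [0.001, 0.004]  # EUR per bps
budget = 12_000.0       # EUR
capacities = assign_capacities(rates, costs, budget)
print(capacities, mean_delay(rates, capacities, packet_bits=1000))
```

The assignment spends the whole budget by construction, so any other split of the same budget, such as `[6.1e6, 1.475e6]` in this example, gives a larger mean delay.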

### B. Problem Solution

According to the given topology (Fig. 1), node 1 is the single gateway of the network, and the traffic rates ri of the individual channels are calculated by using formula 1 with r variables (bps instead of pps). Therefore, the traffic r1 that passes through channel 1 is the traffic r1-3 from node 3 plus the traffic r1-8 from node 8. The traffic of the rest of the channels is calculated similarly, and the results are shown below, in Fig. 2.

Traffic per channel

| Channel | Contributing flows | Values (Mbps) | Total (Mbps) |
| --- | --- | --- | --- |
| r1 | r1-3 + r1-8 | 4.698 + 5.01 | 9.708 |
| r2 | r1-8 | 5.01 | 5.01 |
| r3 | r1-5 + r1-9 + r1-4 + r1-7 + r1-6 + r1-2 + r1-0 | 3.982 + 1.696 + 7.286 + 1.9 + 5.576 + 7.4877 + 3.638 | 31.5657 |
| r4 | r1-9 | 1.696 | 1.696 |
| r5 | r1-4 + r1-7 + r1-6 + r1-2 + r1-0 | 7.286 + 1.9 + 5.576 + 7.4877 + 3.638 | 25.8877 |
| r6 | r1-6 + r1-2 + r1-0 | 5.576 + 7.4877 + 3.638 | 16.7017 |
| r7 | r1-2 + r1-0 | 7.4877 + 3.638 | 11.1257 |
| r8 | r1-0 | 3.638 | 3.638 |
| r9 | r1-7 | 1.9 | 1.9 |
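The per-channel sums above can be reproduced with a short Python sketch. The flow values and the channel membership are copied from the traffic table, while the dictionary layout and function name are illustrative choices.

```python
# External flows from gateway node 1, in Mbps, keyed by path.
FLOWS_MBPS = {
    "1-3": 4.698, "1-8": 5.01, "1-5": 3.982, "1-9": 1.696, "1-4": 7.286,
    "1-7": 1.9, "1-6": 5.576, "1-2": 7.4877, "1-0": 3.638,
}

# Paths that traverse each channel, as listed in the traffic table.
CHANNEL_FLOWS = {
    1: ["1-3", "1-8"],
    2: ["1-8"],
    3: ["1-5", "1-9", "1-4", "1-7", "1-6", "1-2", "1-0"],
    4: ["1-9"],
    5: ["1-4", "1-7", "1-6", "1-2", "1-0"],
    6: ["1-6", "1-2", "1-0"],
    7: ["1-2", "1-0"],
    8: ["1-0"],
    9: ["1-7"],
}


def channel_loads_bps():
    """Formula 1 with bit rates: sum the flows on each channel, converted to bps."""
    return {
        channel: sum(FLOWS_MBPS[path] for path in paths) * 1e6
        for channel, paths in CHANNEL_FLOWS.items()
    }


loads = channel_loads_bps()
for channel, load in loads.items():
    print(f"r{channel} = {load:,.0f} bps")
print(f"Sum over all channels = {sum(loads.values()):,.0f} bps")
```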

The rates are in Mbps, so I multiply by 1e6 to convert to bps. The rate for the first channel is r1 = 9.708·1e6 = 9,708,000 bps. The rest of the rates are calculated correspondingly.

The mean packet length L = 1/μ is also given. The packet rate follows from formula 5, solved for λi, so that λi = μ·ri = ri/L. The rest of the packet rates are calculated correspondingly. The total traffic rate λ = 8936.066 pps is given by formula 2.

I multiply the channel length Vi (km) by the cost rate d to get the channel cost rate di. For the first channel d1=0.001·3.46=0.00346 €/bps.

The optimum channel capacities are calculated from the optimum capacity formula of Section A. The partial quantities should be worked out first. I initially calculate the total estimated cost due to the input traffic. The residual cost Dres is then obtained from formula 8 by subtracting the total setup cost A.

The second quantity is the sum, over all channels, of the square roots of the products of traffic rate and cost rate.

Finally, the optimum capacity for channel 1 is calculated by substituting the above quantities into the optimum capacity formula.

The calculated capacities are shown in the eighth column of the following table. It is worth mentioning that the total capacity C is found to be 111,004,799.3 bps or 111 Mbps.

The ninth column has the channel costs according to the linear capacities. Then, the total cost of channel construction Dtot is 645,000 €, so there is a 20,512 € margin to allocate extra capacity.

The expected time that a packet spends on channel 1 is given by formula 4.

The rest of the expected times are shown in column 10. Column 11 contains the number of packets per channel during time E[Ti]. The sum of the partial numbers gives the number of packets in the network during the expected time E[T], which is 215.116 packets. Finally, the minimum average delay of a packet in the network is given by formula 6.
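The final step can be sketched in Python as follows, under the assumption that formula 6 divides the number of packets in the network by the total traffic rate λ of formula 2. The two input figures are the ones quoted in the text, and the function name is illustrative.

```python
def mean_network_delay(packets_in_network, total_packet_rate_pps):
    """Formula 6 read as E[T] = (sum of lambda_i * E[T_i]) / lambda."""
    return packets_in_network / total_packet_rate_pps


# Figures quoted in the text: 215.116 packets in the network, 8936.066 pps.
delay_s = mean_network_delay(215.116, 8936.066)
print(f"E[T] = {delay_s * 1e3:.2f} ms")
```

Under this reading the minimum average delay comes out at roughly 24 ms.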

### IV. Conclusion

In the first part of the present coursework, I have developed an availability study based on the expanded extranet, which I designed in the final part of coursework A. The proposed network upgrade takes account of the nature of ATH Diagnostics, with the estimated service availability as the main parameter and the upgrade cost as the secondary one. The final part includes the capacity assignment optimization problem. The accompanying CD contains this document, the Excel spreadsheets which I have used to calculate availability (av.xls) and minimum average delay (ca.xls), and the Visio drawings.

### V. References

[1] M. Xie, K.L. Poh and Y.S. Dai, Computing Systems Reliability: Models and Analysis, Springer, 2004.

[2] C. Oggerino, High Availability Network Fundamentals, Cisco Press, 2001.

[3] E. Wustenhoff, Service Level Agreement in the Data Center, Sun BluePrints OnLine, 2002.

[4] D. Kakadia, S. Halabi and B. Cormier, Enterprise Network Design Patterns: High Availability, Sun BluePrints OnLine, 2003.

[5] Hellenic Telecommunications Organization (OTE S.A.), “OTE Business,” Dec. 2009; http://www.otebusiness.gr/

[6] Cisco Systems, Inc., “Small Business - Cisco Systems,” Dec. 2009; http://www.cisco.com/cisco/web/solutions/small_business/index.html

[7] HP Corporate, “HP Servers,” Dec. 2009; http://welcome.hp.com/country/us/en/prodserv/servers.html

[8] C. Papagianni, Network Design, MSc Lecture Notes, TEI of Piraeus, Piraeus, 2009.

[9] F. Gebali, Computer Communication Networks: Analysis and Design, 3rd ed., Northstar Digital Design Inc., Victoria, BC, Canada, 2005.