Multi-Threaded Architecture of Servers


It is important to understand how request handling is done in Apache Axis2, which serves as the application server in our multi-tenant environment. Apache Axis2 is one of the most widely used web services engines and can also run as a standalone server. In our environment, each physical server runs a single instance of Apache Axis2. Axis2 servers are multi-threaded by design and can therefore process requests in parallel. Each server maintains a large pool of threads, and each arriving request takes a separate thread from the pool for processing; the thread is released when processing completes. All requests are processed concurrently, and the number of concurrent threads can grow up to the thread pool size. If all threads in the pool are busy when a request arrives, that request is rejected. Auto-scaling load balancers are used to avoid this: in auto-scaling environments, the number of servers increases as the load increases, which helps keep the 'concurrency level' (average number of concurrent threads) of the servers below the maximum. The WSO2 Load Balancer performs auto-scaling using the 'number of messages in flight' as its load measurement, together with a pre-defined threshold on that value [35]. The 'number of messages in flight' is the number of messages that have been distributed from the load balancer to the servers but for which the corresponding responses have not yet been received. This measurement reflects the concurrency level of the servers, and the concurrency level is a good indicator of their current load.
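The thread-pool behavior described above can be sketched as follows; this is a minimal, hypothetical illustration (the class and its names are ours, not the Axis2 API): each request tries to take a free thread from a fixed-size pool and is rejected when the pool is exhausted.

```python
import threading

class ThreadPoolServer:
    """Sketch of a multi-threaded server with a bounded thread pool.
    Each accepted request runs on its own thread; when all pool slots
    are busy, a new request is rejected instead of queued."""

    def __init__(self, pool_size):
        self.free_threads = threading.Semaphore(pool_size)  # available slots
        self.accepted = 0
        self.rejected = 0

    def handle(self, process):
        # Try to grab a free slot without blocking; reject if pool exhausted.
        if not self.free_threads.acquire(blocking=False):
            self.rejected += 1
            return False

        def worker():
            try:
                process()
            finally:
                self.free_threads.release()  # return the thread to the pool

        threading.Thread(target=worker).start()
        self.accepted += 1
        return True
```

With a pool of size 2, a third concurrent request is rejected while the first two are still being processed, mirroring the behavior an auto-scaling load balancer tries to avoid.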

4.1 Concurrency vs. Performance

The number of concurrent threads affects the performance of the system. Although it is not the only factor on which performance depends, it has a significant effect. The number of concurrent threads in a server directly affects the service time of the user requests being processed: the average service time of requests increases as the concurrency level rises, and vice versa. This happens because all concurrent threads share a single CPU (or a limited number of CPUs) and limited memory. Most often, processor sharing is done in a time-shared manner, and various architectures share the available memory among multiple threads. Moreover, concurrent threads may contend for locks, which block a thread's execution while the lock is held by another thread. Therefore, a higher number of concurrent threads degrades performance, which leads to an increase in service time. We ran a test and plotted the gathered data to visualize how the number of concurrent threads affects the average service time of requests (figure 7). We ran this test on a single physical machine running a single instance of Axis2, which lets us analyze our environment more precisely.

We used an Intel Pentium 4 machine running Windows XP for this test, with an Axis2 web service that takes around 350 milliseconds to process a request in the absence of any concurrency overhead. We used a web client that can send requests at configurable rates. We executed the client program several times, for five minutes each time, increasing the rate at which it sent requests. We used the Apache Synapse mediation framework to monitor the requests and measure the required parameters. We measured the service time of each request and computed the average service time within each run. We also calculated the average number of concurrent threads within the same interval: we measured the 'number of messages in flight' every two seconds and averaged the readings at the end of the run. We repeated this procedure several times with different client request arrival rates in order to increase the number of concurrent threads running. We plotted the data in a scatter diagram to visualize how the average service time is affected by concurrency, together with the standard deviation of each value to show the variation of the measured service times. The average service time increases as the number of concurrent threads grows (figure 7). At the beginning, the curve is flat: a small number of parallel threads does not affect service times, but higher levels of concurrency affect them dramatically. As expected, the standard deviations increase as the service time increases. A single-core machine already loses performance when handling 20-40 threads concurrently. In any case, concurrency's effect on service time makes the concurrency level a good attribute with which to control the performance level experienced by the user.
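The concurrency measurement used in the test can be illustrated with a short sketch (the helper names are ours, not the Synapse monitoring API): given the start and end times of requests, we take one 'messages in flight' reading per sampling period and average the readings.

```python
def concurrency_samples(requests, sample_period):
    """Emulate periodic 'messages in flight' readings. `requests` is a
    list of (start, end) times; one reading is taken every
    `sample_period` time units, counting requests in flight then."""
    horizon = max(end for _, end in requests)
    samples, t = [], 0.0
    while t <= horizon:
        samples.append(sum(1 for s, e in requests if s <= t < e))
        t += sample_period
    return samples

def average_concurrency(requests, sample_period):
    """Average of the periodic readings: the measured concurrency level."""
    samples = concurrency_samples(requests, sample_period)
    return sum(samples) / len(samples)
```

For three overlapping requests arriving at times 0, 1 and 2, sampling once per time unit yields readings rising to 3 and falling back to 0, and the average of those readings is the concurrency level reported for the run.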

4.2 Elastic Server-Groups

We identify the concurrency level as a good parameter for differentiating the service given to different classes of users. For example, a request from a user in a higher service class should be processed on a server with a lower concurrency level, and vice versa. There may be any number of service classes, as long as each pair of classes is separated by a gap in the required performance level. We suggest a simple grouping mechanism for the available server instances in order to support different service classes: we divide all available server instances into as many groups as there are service classes, associating each class with a particular group (figure). Each group behaves as a server farm, and the user requests of each class are distributed across the servers in the corresponding group. We use the Round Robin algorithm to distribute requests within a group because it is simple and treats all servers alike. The number of servers in each group (the group sizes) is the key to providing service differentiation: the group sizes are chosen so as to maintain different concurrency levels on the servers of different groups. In general, the fewer the servers in a group, the higher the concurrency, and vice versa. However, according to queuing theory, several parameters affect the concurrency level of a server, and we use the relationships among those parameters to maintain the gaps between the concurrency levels of the different server groups. We present our approach in detail in the sections that follow.
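The per-class round-robin dispatch can be sketched as follows (a simplified illustration; the class, method and server names are hypothetical). Each service class owns a list of servers, and requests of that class cycle through the list.

```python
class GroupedBalancer:
    """Sketch of the grouping mechanism: each service class has its own
    server group, and requests are dispatched round-robin within it."""

    def __init__(self, groups):
        # groups: dict mapping service class -> list of server ids.
        # The lists can be re-assigned at run time to resize the groups.
        self.groups = groups
        self.next_index = {cls: 0 for cls in groups}

    def dispatch(self, service_class):
        """Pick the next server of the class's group, round-robin."""
        servers = self.groups[service_class]
        i = self.next_index[service_class] % len(servers)
        self.next_index[service_class] = i + 1
        return servers[i]
```

Because the groups are plain lists, making them elastic amounts to moving server ids between lists between observation intervals; no per-server state needs to migrate, matching the multi-tenant setting described above.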

However, we cannot assign servers to these groups statically, because cloud environments are highly dynamic and the demand may vary at any time. Therefore, we make the groups elastic: we dynamically re-assign servers to the groups at run time to maintain the difference in concurrency levels among the groups. Multi-tenancy makes this simple, since any server can process the requests of any tenant; no costly VM migrations are needed. This gives better elasticity to the server groups we propose and helps them adjust quickly to changing demands at run time. The proposed solution manages only the available servers in order to treat users in different classes differently; we do not start new instances or shut down any worker nodes. We focus solely on providing different levels of quality of service for different service classes, using the available servers in the most appropriate way.

4.3 Queuing Theory

Queuing Theory is the mathematical study of queues. It provides a mathematical analysis of the related processes that cause queuing, including arrival at a system, time spent in the system, and departure from the system. The theory allows calculating important performance measures such as the average response time, the expected number of items receiving service, and the probabilities of the system being in certain states.

4.3.1 Little's Law

"Little's Law" or "Little's Theorem" [36] gives a fundamental relationship used in queuing theory. Little's Law deals with queuing systems. A queuing system consists of discrete objects, usually called 'items', which arrive at the system at some rate. The items spend some time in the system, either waiting in a queue, receiving service, or both, and then leave. Little's Law says that the average number of items in a queuing system, denoted L, equals the average arrival rate of items to the system, λ, multiplied by the average waiting time (time spent in the system) of an item, W [37]. Thus,

L = λW .

Although this result looks intuitively reasonable, it is remarkable in that the relationship holds for any arrival process distribution, service distribution, and service order, and is not influenced by any other attribute of the system [38]. The Poisson process is said to be the best approximation of the request arrival process for Internet-based computing systems [39], so most research studies assume a Poisson request arrival pattern. But Little's Law is proven to hold regardless of the arrival pattern. The only assumption needed is that the system is in 'steady state', whereby the average arrival rate is less than or equal to the service rate of the system.
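This distribution-independence is easy to check numerically. The sketch below (our own helper, measuring L, λ and W from a finite trace that is empty at both ends of the interval) shows that L = λW holds exactly for two very different arrival patterns, one evenly spaced and one bursty.

```python
def littles_law_terms(events, T):
    """Measure L, lambda and W from a trace of (arrival, departure)
    pairs inside [0, T], with the system empty at both ends."""
    N = len(events)
    area = sum(d - a for a, d in events)  # total time spent in the system
    L = area / T        # average number of items in the system
    lam = N / T         # average arrival rate
    W = area / N        # average time an item spends in the system
    return L, lam, W
```

Whether the three arrivals are spread across the interval or all land at once, the measured quantities satisfy L = λW, which is exactly the point of the theorem.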

4.3.2 Queuing Models

Queuing Theory is widely used in the IT industry to analyze network-related processes. It fits most situations in computer networking well and has played a significant role in building dependable services over the Internet and other networks. To analyze computer-related queuing systems more specifically, several standard queuing models have been introduced, including the 'Single Server-Single Queue' model, the 'Multiple Server-Single Queue' model, and the 'Infinite Server' (IS) model. They describe different situations in which queuing theory can be applied and provide pre-defined relationships between the related processes.

The 'Single Server-Single Queue' model describes a system where a single queue is associated with a single server (figure 0). Only one item can be receiving service at the server at a time, and the others must wait in the queue. Items are served in FCFS (First-Come-First-Served) order. In this model, the queue and the server together can be identified as the system; applying Little's Law then, the number of items in the system is the number of requests in the queue plus the request at the server. It is also valid to consider the queue or the server individually as a queuing system.


The 'Multiple Server-Single Queue' model describes a system where a single queue is associated with more than one, but a finite number of, servers, as in figure 0. As many items as there are servers can be receiving service at a time, and the rest must wait in the queue until a server becomes available. Here too, the whole system can be identified as the queuing system, as can the queue or a single server separately.


The IS model extends the 'Multiple Server-Single Queue' model to an infinite number of servers, as figure 0 shows. Each arriving item finds an available server and goes to it; no queuing is associated with this model, as there is no need to queue items. Here, the set of servers is considered the queuing system. Even though we do not encounter systems with infinite servers in real life, the IS model provides very useful approximations for many queuing systems.


Little's Law can be applied to any of these models to derive important relationships. In our research, we are most interested in the IS model, as it is useful for modeling our environment as a queuing system; Little's Law is directly applicable and can be expressed meaningfully in the IS setting.

4.4 A Real world example of Little's Law

For a better understanding of both Little's Law and the setting of the IS model, consider a common real-world example: a supermarket. Here, the 'items' are customers and the 'system' is the supermarket. Customers arrive at the supermarket at different rates at different times of the day, so we pick an observation period, say 6 pm to 9 pm. The customers arrive at some rate, spend some time buying their day-to-day needs, and leave the supermarket, as figure 0 shows.


We can use Little's Law to find the average time a customer spends in the supermarket during the observed period. First, we measure the total number of customers arriving from 6 pm to 9 pm and divide it by 180 (the number of minutes from 6 pm to 9 pm) to get the average arrival rate per minute. Second, we periodically measure the number of customers in the supermarket during the period and take the average: for example, we can read the number of customers inside every 10 minutes throughout the period and then divide the sum of the readings by their count. According to Little's Law, dividing the average number of customers in the supermarket by the average arrival rate gives the average time a customer spent inside. This example demonstrates the IS model: a customer usually does not need to wait outside the supermarket for a chance to enter, because a supermarket can accommodate a large number of customers at once. It therefore simulates a model in which there is a server for every incoming request. The example also illustrates the IS model's arrival process and service process, and how Little's Law applies to them.
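The arithmetic of the example can be written out directly; the figures below are hypothetical readings, not measured data.

```python
# Hypothetical figures for a 6 pm - 9 pm observation window.
total_arrivals = 540                  # customers entering during the window
observation_minutes = 180             # 6 pm to 9 pm

# Periodic readings of the number of customers inside the supermarket.
readings = [44, 46, 45, 43, 47, 45]

arrival_rate = total_arrivals / observation_minutes  # lambda, customers/minute
avg_inside = sum(readings) / len(readings)           # L, average customers inside
avg_visit = avg_inside / arrival_rate                # W = L / lambda, minutes
```

With these numbers, λ = 3 customers per minute and L = 45 customers inside on average, so Little's Law gives an average visit of W = 45 / 3 = 15 minutes.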

4.5 Modeling Multi-tenant Environments using Little's Law

In modeling the multi-tenant environment, the basic setting to consider is a load balancer dividing the workload across a farm of multi-threaded servers. As each request arriving at a server is processed by a separate thread from the pool, the IS model best describes the work carried out in the servers: we can map the threads in the pool to servers and the user requests to items. We need to assume that the thread pool is large enough to handle the load at any time. In most clouds the resources are scalable, so the load on a single server does not grow without bound as the system scales up. With auto-scaling load balancers present in modern cloud environments, it rarely happens that all threads of a particular server are busy and a request is rejected. Therefore, under the assumption that the number of concurrent threads never exceeds the maximum allowed, a server in our setting emulates an infinite-server queuing system.

We can also compare the behavior of a multi-threaded server with the supermarket example. The number of customers inside the supermarket maps to the number of user requests in the server. The number of user requests in a server at any given time equals the number of concurrent threads running in that server, since a separate thread handles each request. Requests arrive at a server at some rate, spend some time in the server being processed, and then leave the server with the response message. In this model, a group of servers represents a group of queuing systems. With this setting, we can apply Little's Law to derive important relationships.

4.6 Proof of Little's Law for Multi-threaded Architecture

Even though we use the IS model to describe the operation of multi-threaded servers, the IS model originally considers a set of independent servers. Concurrent threads are not independent, because they affect each other and reduce performance, as explained earlier under 'Concurrency vs. Performance'. Despite this difference, Little's Law can still be applied to multi-threaded servers, as they emulate proper queuing systems. To show this applicability, we present a detailed proof of Little's Law specifically for the multi-threaded architecture. This proof helps to justify the way we apply Little's Law in the dynamic adjustment of server groups.

There are many proofs of Little's Law presented under different settings. Most of them [40][41] analyze queuing processes over the entire time axis (0 ≤ t < ∞) and show that L = λW. But in real-world applications it is meaningless to analyze the process over the entire time axis, because the objective is, in most cases, to control one of the parameters (L, λ, or W) in real time [36]. Moreover, all observations practically take place in a finite time interval. Therefore, a proof of Little's Law for a finite time interval is proposed in [36]. The authors show that the relationship holds for a finite time interval bounded by any value T (0 ≤ t ≤ T), in two different settings of the queuing process: (1) the system is empty at the beginning of the observation (t = 0) and at the end (t = T), and (2) the system is not empty at the beginning and end of the observation.

These proofs are helpful for our solution, since we are dealing with the server-grouping mechanism discussed in section 4.2. We need to re-size the groups at regular intervals to manage the concurrency levels of the servers under dynamic work demands. In each interval, we observe the system and calculate the attributes needed to compute the new group sizes. Our application of Little's Law is similar to the second setting mentioned above, in which the system is not empty at the beginning and end of the observation. To understand it more clearly, we present a simple proof of Little's Law for a multi-threaded server, following the same method as [36]. For simplicity, we use the setting in which the system is empty at the beginning and end of the observation; it is proven that if Little's Law holds for a system in the first setting, it also holds in the second [36].

We consider a single multi-threaded Axis2 server that is idle at the beginning of the observation, then processes a set of requests concurrently as they arrive, and becomes idle again. A plot of the number of requests in the server versus time during the observation period might look like figure 0.

(Figure: Little's Law, setting 1, for a multi-threaded server — number of requests in the server versus time.)

Here is the proof of Little's Law for a multi-threaded server over a finite time interval (0 < T < ∞). We use the following notation.


n(t) = number of requests in the system at time t

λ = average arrival rate in [0, T] (requests/time unit)

N = number of requests arriving in [0, T]

L = average number of requests in the system during [0, T]

W = average time a request spends in the system during [0, T]

A = ∫₀ᵀ n(t) dt = area under n(t) over [0, T] (request-time units).

Using the above notation, we can see that

L = A/T , W = A/N and λ = N/T .

Therefore,

L = A/T = (N/T)(A/N) = λW .
This proof is simple enough that it is clear what is happening if we look at figure 0. The key reason the relationship holds is that the area under the plot represents both the total waiting time of all requests and the sum of all 'L' values throughout the period [0, T]. This is possible because a request that is in the system is also spending time in it; that is the physical reason why Little's Law is true [36].
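The two readings of the area can be checked numerically. The sketch below (our own helper functions, not part of the proof in [36]) computes the area once as the sum of the sojourn times of all requests and once as a Riemann sum of n(t), and confirms they agree.

```python
def area_from_sojourns(events):
    """Total waiting time: sum of the time each request spends in the server."""
    return sum(d - a for a, d in events)

def area_from_nt(events, T, dt=0.01):
    """Riemann sum of n(t), the number of requests in the server at time t."""
    return sum(
        sum(1 for a, d in events if a <= i * dt < d)  # n(t) at t = i*dt
        for i in range(int(round(T / dt)))
    ) * dt
```

Both computations yield the same area A; dividing A by T gives L, dividing it by the number of requests gives W, and with λ = N/T the identity L = λW follows immediately, as in the proof above.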

4.7 Controlling the Level of Concurrency with Server-Groups

Since we are interested in controlling the concurrency level, there are two parameters on which the concurrency level depends: the arrival rate and the service time. The service time is what we ultimately want to differentiate by controlling the concurrency level; therefore, the only attribute we can alter to affect the concurrency level is the arrival rate. The system can alter the arrival rate differently for each service class by using the server-grouping mechanism we have discussed in section 4.2. Consider a particular service class, class k. Let Ak be the average arrival rate of class-k user requests at the load balancer and Nk the number of servers in the class-k group. Since we use round robin to distribute requests over the Nk servers, the average arrival rate at one server, λk, must be as follows.

λk = Ak/Nk

Therefore, by changing the number of servers in the group, Nk, we can alter the average arrival rate of a single server. Applying Little's formula for a random server of class k-group, we get the following.

Lk = λkWk ,

Lk = (AkWk)/Nk

Here Lk is the average number of concurrent threads in any server of the class-k group within the observed period, and Wk is the average service time over the period. We can see that the group sizes are directly related to the average number of concurrent threads, allowing us to use the group sizes to control the concurrency level of the servers.
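The relationship Lk = (AkWk)/Nk is a one-line computation; the sketch below (a hypothetical helper of ours) shows how doubling a group's size halves the expected concurrency per server.

```python
def expected_concurrency(A_k, W_k, N_k):
    """L_k = (A_k * W_k) / N_k: the average number of concurrent threads
    per server in a group of N_k servers fed round-robin at class
    arrival rate A_k, with average service time W_k."""
    return (A_k * W_k) / N_k
```

For example, a class arriving at 100 requests/second with a 0.35-second average service time keeps 5 servers at an average of 7 concurrent threads each; growing the group to 10 servers drops that to 3.5.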

4.8 On Demand Server-Groups Adjustment

Since we focus only on differentiating the service, we do not start up or shut down servers on demand. Therefore, we cannot maintain a desired absolute level of QoS at all times, but we can maintain the desired 'distances' between the QoS levels of the classes. We adjust the concurrency levels of the servers by changing the group sizes on demand, in order to maintain the differences among the average service times. Even if the resources are insufficient to provide quality service to any class, we guarantee that higher-class users still get better service than lower-class users. If we combine our solution with auto-scaling, it becomes possible to maintain the desired concurrency levels, and hence the service times, at all times, no matter how the work demand increases or decreases. We show that our solution can easily be extended to support auto-scaling resources.

We achieve service differentiation by maintaining a desired ratio among the concurrency levels of the groups. As figure 7 shows, the ratio of the thread levels is closely reflected in the service times of the user requests. Since we consider the same type of service for all users, a particular concurrency level indicates a known service time regardless of the group. We monitor the system to identify changes in arrival rates and service times separately for each service class, and we update the group sizes at regular intervals, where each interval is the observation period for the calculations. For the implementation, we used 10-second intervals: every 10 seconds, we calculate the new group sizes from the data gathered during the last 10 seconds, and the changes take effect during the next 10 seconds. Here we assume that the work demand of the client base during the last time frame will remain nearly the same during the next one. During the observation period we measure the arrival rates and the service times of the user requests separately for each class and calculate their averages; according to Little's Law, these are the two parameters that cause changes in the concurrency of the servers. What we need after each observation period is the new set of group sizes that suits the latest demands. In this section we present the calculations, based on queuing theory, used to obtain the new group sizes. Our solution supports any number of service classes, so we present a general solution for m classes. For ease of presenting the calculations and formulas for group-size adjustment, we use the following notation.

m = number of service classes

Wk* = desired service time of class k

Lk* = the 'average number of threads' corresponding to Wk* (desired concurrency level of class k)

Rj = ratio of desired concurrency levels between class 1 and class j (Rj = L1*/Lj*)

N = total number of servers available

Nk = number of servers in group k, which corresponds to class k

Ak = average arrival rate of class-k requests to the system

λk = average arrival rate at any server of group k

Wk = actual average service time of class-k requests

Using the results of the initial test, we first define a desired service time Wk* for each service class. Then we find the corresponding desired concurrency levels Lk* according to figure 7. What matters is the ratio among this set of concurrency levels, L1* : L2* : ... : Lm*.

To feed this information to the system, we take the ratio between class 1 and each other class, Rj = L1*/Lj*, which gives m-1 ratios. Maintaining these m-1 ratios achieves the objective of maintaining the ratio among the concurrency levels of all classes. λk and Wk are average values within the observation period. We assume that the average number of requests in a server equals the average number of concurrent threads working in that server. Applying Little's Law to a random server, say in group k, we get

Lk = λkWk = (AkWk)/Nk .

Applying this formula to a group-1 server, we get

L1 = (A1W1)/N1 .

Applying it to a group-j server, we get

Lj = (AjWj)/Nj .

Dividing the first by the second, and requiring the ratio L1/Lj to equal Rj,

Rj = (A1W1Nj)/(AjWjN1) , so Nj = Rj (AjWj)/(A1W1) N1 .

Physically, the summation of all group sizes must equal the total number of servers. Therefore,

N1 + N2 + ... + Nm = N .

Substituting Nj = Rj (AjWj)/(A1W1) N1 for j = 2, ..., m into this sum,

N1 ( 1 + Σj=2..m Rj AjWj/(A1W1) ) = N ,

which gives

N1 = N A1W1 / ( A1W1 + Σj=2..m Rj AjWj ) .
This last formula lets us find the preferred number of servers for group 1, N1. All we need to know is the total number of available servers, the desired ratios of the concurrency levels, and the average arrival rates and average service times of the requests in each service class. After finding N1, we can find the sizes of the other groups using Nj = Rj (AjWj)/(A1W1) N1. In this manner we can find the new sizes of the server groups corresponding to all classes. If the new sizes differ from the earlier ones, we re-assign servers to the relevant groups. We present the algorithm we use to adjust these group sizes in the implementation chapter.
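The final formulas can be put together in a short sketch (the function name and its list-based interface are ours, not the implementation chapter's algorithm): given N, the desired concurrency ratios, and the measured per-class arrival rates and service times, it returns the real-valued group sizes.

```python
def group_sizes(N, R, A, W):
    """Group-size calculation from the formulas above.
    N: total servers; A[k], W[k]: measured average arrival rate and
    service time of class k+1; R[j]: desired ratio L1*/L(j+1)*, with
    R[0] = 1.0 for class 1 itself. Returns real-valued sizes N1..Nm;
    rounding to whole servers is a separate step."""
    base = A[0] * W[0]                                   # A1 * W1
    denom = base + sum(R[j] * A[j] * W[j] for j in range(1, len(A)))
    N1 = N * base / denom                                # formula for N1
    # Nj = Rj * (Aj*Wj)/(A1*W1) * N1 for the remaining groups.
    return [N1] + [R[j] * A[j] * W[j] / base * N1 for j in range(1, len(A))]
```

For two classes with equal demand (A = 10 requests/unit time, W = 1 for both) and a desired concurrency ratio of 2 between class 1 and class 2, nine servers split into groups of 3 and 6: the smaller group runs at twice the per-server concurrency of the larger one, which is exactly the requested differentiation.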