Abstract. As we approach towards physical limits of processor and memory speed it is difficult for a single computer system to handle ever increasing demand of processing and memory requirements of modern applications of these days. To overcome this problem distributed systems are useful where multiple computers interact through shared memory to achieve a common goal through concurrent programming. Concurrent programming results the most efficient use of resources but it's difficult to handle and results many concurrency and synchronization problems.
Lot of algorithms has been developed to solve above problems and are being continuously developed based on importance of the problem in computer science. We present a comparative analysis of different concurrency control algorithms based on parameters like mutual exclusion, deadlock freedom, starvation freedom, first in first out fairness and local spinning. This helps the reader to understand nature of these algorithms and utilize any of the algorithms that best suits to his/her requirements.
Keywords: Distributed Shared Memory Systems, Concurrency, Deadlock and Synchronization.
To run applications faster and in efficient way application developers took advantages of improved processor speed and memory size. But as we approach physical limits of processor and memory speed it is becoming more and more difficult to pack a large number of transistors on one single processor chip and increasing processing power at the same time. Now in order to fulfill the requirements of high speed and efficiency processor manufactures are focusing on multiprocessor computers also called multi-cores. In multiprocessor environment multiple processors executes instructions concurrently and collaborate with each other to achieve a common goal.
Another way to achieve concurrency, run applications faster and efficiently is to use distributed systems where multiple computers instead of multiple processors communicate through a common network and they interact within their network to achieve common goal for example in Network of workstations, Automatic banking (teller machine) system, Automotive systems. These systems deliver high performance and computation power with the use of shared memory. Shared memory is globally physical memory that is accessible to all processors. Every processor has its local memory for computation and some portion of it dedicated for shared memory. If remote data is required, computational task communicate with remote processors as they are fully aware where the data resides. However communication between processors require extensive message passing through different models and algorithms which most of the programmers find difficult to control.
Figure 1: Distributed Shared Memory System
A DSM system generally comprises on set of nodes or clusters as mentioned in Figure 1. Cluster itself can be a single processor or multiprocessor interconnected by a shared bus. Each cluster has local memory and a portion of it is mapped to global address space. Information about data or files is particularly organized in a table or directory. Directory organization is very important design decision either through single or double link list or any other appropriate data structure depending upon the hardware constraints. The system directory is distributed to the whole system.
This fundamental change from single processor to multi-processors or multi-computers requires a fundamental change in application development. Writing applications that better utilize multi-processor and multi-computer architectures is very difficult and challenging as compared to single processor applications. This requires understanding of several basic concepts.
Synchronization is very important in our daily routine. A student and a teacher must reach in class at the same time for the lecture and you must reach at a bus stand on time to pick up a bus. This type of synchronization is called cooperation where both participants must be available at the same time to complete a task. There is another type of synchronization called coordination where only one person not all should do a task at a time e.g. you and your spouse may have to synchronize on who will buy the food, which one of you will take the car, or use the single computer you have at your home.
Similarly when it comes to multiple processes using some resources, we must ensure synchronization between them so that two or more processes cooperate to achieve a common goal and no two processes use a shared resource simultaneously. Without proper synchronization between processes it is not possible to ensure data consistency and integrity.
These are the operations whose functionality in not interfered by other concurrent activities. In other words these are indivisible and execute as a whole. Most of the algorithms, which solve problems related to concurrency, assume some architectural support for atomic operations. Almost all the architectures supports atomic registers, which serves the purpose of shared objects and allow to perform some atomic operations on data. These are also called safe registers. In these registers, a read operation that is concurrent to the write operation must obtain correct value of the data. Following are some atomic operations that are supported by lot of architectures.
Fetch-and-increment: increments the value of a register r by 1 and returns old value of r.
Compare-and-swap: takes a register r, and two values: new and old. Returns true if the current value of r and old value are equal and updates r to new value as well and if value of r and old value are different, false is returned and r is left unchanged.
Read-modify-write: many times there is a need to read a value of a register r and based on the value of r its new value is calculated and updated to same register r. This operation serves this purpose even if multiple processes are trying to update same register r. it will also ensure that other processes are able to update value of register r in order they request but only one process at time.
Modern operating systems and programming languages also implement this synchronization mechanism by semaphores and monitors to ensure locks and exlusive access to resources.
Since resources are limited and shared among multiple processes, in such a situation it is critical to ensure that only one process uses a non shared resource at a time. Such problems arise in operating systems, database systems, multiprocessor computers, and computer networks.
In other words, we need to define a mechanism that ensures mutual exclusive access to critical section (a piece of code where a process uses a resource) among a number of competing processes. To achieve mutual exclusion we must need to satisfy following first two properties.
Mutual exclusion: There is only one process is in its critical section a time.
Deadlock-freedom: If a process wants to enter in its critical section, then some process, not necessarily the same one, eventually enters its critical section. This ensures that there is no deadlock in the system and it works fine.
Even after satisfying deadlock-freedom property, still there are chances of starvation of one of the processes. Starvation states that a process requesting to enter in its critical section may not be able to do so and continue waiting forever for other processes.
Starvation-freedom: To resolve starvation issue we need to satisfy starvation freedom property. It says if a process is wants to enter in its critical section, then it must be able to enter in critical section at some point of time.
Although starvation-freedom property allows processes to execute their critical sections but many times a process can enter its critical section before some other process can execute its critical section. Such a behavior is prevented by the following fairness requirement.
First-in-ï¬rst-out (FIFO): No beginning process can enter its critical section before a process that is already waiting for its turn to enter its critical section.
Mutual exclusion and deadlock freedom properties are required in order to solve the original problem of mutual exclusion. In solving the problem, it is assumed that a process entering in its critical section must complete it and exit its critical section successfully irrespective of other processes waiting to enter in critical section.
Distributed shared memory systems are essential to high level of efficiency and better resource utilization in the modern enterprise applications of these days. But DSM systems present lot of problems related to data consistency and synchronization and because of importance of DSM systems, lot of people have worked to solve these problems. Since Dijksta identified and solved mutual exclusion problem in his paper [15 Bakery Algo], lot of other people have studied and published some very good solutions of these problems. As the lots of solutions are available to these problems some people have also tried to compare some of these solutions based on different parameters.
One of the most recent studies in this domain is done by Gadi [2008T-SMsync]. In this paper Gadi has discussed importance of synchronization problems and explained mutual exclusion problem in details with lot of examples. He has mentioned some of the algorithms that solve these problems but he does not go in details of these algorithms, instead he focuses on different parameters like deadlock freedom, starvation freedom and local spinning. Then he moves to explaining conventional problems how these are solved by different algorithms.
Another survey of different solutions for problems related to DSM systems have been performed by Bill [sdmsurvey] in 1991that is quite old. This paper discusses different design choices for a DSM system with respective to totally different parameters like scalability, heterogeneity coherence protocols and data access. It also gives examples of different systems like Ivy, Linda, Mirage and Dash. Most of these are hardware implementations of DSM system.
Many other people have done work in this domain like  and [A Comprehensive Software Distributed Shared Memory System] but most of them focus on synchronization problems rather than comparison of different algorithms. Also some people [10.1.1.49] explain this in terms of multiple groups of processes and synchronization between the groups and within the group and also in terms of single writer and multiple readers and consumer and produces problems.
Review of Algorithms
There are lot of algorithms have been implemented that solve problems related to concurrent access to shared memory. Some the algorithms are also very efficient and still new algorithms are being implemented because of the importance of the problem and to take advantages of latest development in hardware and software.
We will ï¬rst review Bakery algorithm. It is one of the most known and elegant mutual exclusion algorithms that use safe registers only. It uses a boolean array choosing[1..n], and an integer array number[1..n] of unbounded size registers. Where n is the number of processes competing for mutual exclusive resource. The entries choosingi and numberi can be read by all the processes but can be updated only by processi. The relation"<" used in the algorithm on ordered pairs of integers is the lexicographic order relation and is deï¬ned by [a, b] < [c, d]. If a < c, or if a = c and b < d. The statement await condition is used as an abbreviation for while Â¬condition do skip. The algorithm is given below.
As Bakery has pointed out, the correctness of the Bakery algorithm depends on how the maximum is computed. We assume a simple correct implementation in which a process ï¬rst reads into local memory all the n number registers, one at a time, and then computes the maximum over these n values.
Figure 2: Simple Bakery Algorithm
Black-White Bakery Algorithm
The problem with Bakery algorithm was that it uses unbounded size registers, which means that whenever a process wants to enter in the critical section it is given a number that is grater then any number used by any other waiting process. It is just like giving a sequence number to the customers waiting for a service. In Bakery algorithm this number keep on growing without any bound.
In the black-white Bakery algorithm, this problem of unbounded size registers is solved by using an extra bit that is called color bit and is shared by all the processes. The value of color bit can be black=0 or white=0. By using this bit Bakery algorithm is bounded to n. where n is number of processes sharing some critical section.
When a process pi wants to enter in critical section, it sets its color bit to the value of shared color bit and gets a number that is greater than number of all the processes with the same color bit as color bit of process pi. Process enters in the critical section when its color bit matches shared color bit and its number is lowest then the number of other process with the same color bit. The order between the processes with different color bit is defined as follows, if process pi has different color bit then shared color bit, it is given priority and the priority within the processes of same color bit is decided based on number given to the process when it requested to enter critical section. The process with smallest number is allowed to enter in the critical section.
Now let discuss how and when shared color bit is updated so the processes with different color bit also got equal change to enter their critical section to satisfy the starvation freedom property. Whenever a process pi leaves critical section, it sets the color of the shared bit to the opposite color of its own color bit. It gives the priority to the process of the opposite color bit to enter their critical section.
Figure 3: Black-White Bakery Algorithm
This algorithm uses three data structures: (1) a single shared bit with name color, (2) a boolean array choosing [1...n] and (3) an array of size n where each array element is a colored bit. In the first line, process pi requests to enter in the critical section by setting its choosingi to true. Then it sets its color bit to the color of the shared bit in the second line. In the third line it gets the number that is greater than the number of all other processes of the same color bit. To find the maximum number process pi reads the respective number from local memory of all other processes one at a time using atomic operations to avoid inconsistency. Then in line 5 to 10 process waits in the loop to get its chance to enter in critical section. The loop will end once its number is lowest from the numbers of all other processes. Process pi gets its chance to enter in the critical section when loop ends. After performing the required operations in the critical section in line 11 process pi sets necessary values for exit section. In the exit code at line 12, process pi changes the value of shared color bit to the value different then its own color bit value i.e. if pi has a white color bit, it sets shared color bit to black and in case of black color bit it sets shared color bit to white. At the end process pi sets its number to zero to indicate it has completed its critical section and it does not need to enter in critical section anymore.
Mutual Exclusion problem defines the access of critical section, and this access is to be mutually exclusive. Mutual exclusion in single process can be achieved at hardware level but in multiprocessor it is very simple because multiprocessors implement the atomic test-and-set instruction executions. Because in single processors there is no test-and-set operations so researchers concentrated on finding shared-memory mutual exclusion algorithms in sequential processors, Since the designers thought that critical section contention is very rare and considered a "starved" also acceptable.
A process that access the shared memory in modern high speed processors can takes more time. So the criterion of evaluating an algorithm is based on the number of reads and writes.
Simplest algorithm purposed by Michel Fischer is given. await b is a abbreviation for while âŒb do skip:
Repeat await <x ==0>;
Until <x == i>;
This algorithm in the absence of contention requires five memory access times, Delay ensures to handle worst case time when there can occur deadlock state. And this worst case time will be O(N) and best case time will be the time require to access the memory. This delay can go in worst condition where systems use priority base access, in such cases there may not be upper bounds of time.
Now we tried to construct a better algorithm, we consider the minimal sequence of memory access and infect goal is to have such algorithm which having some definite or constant number of memory access say it could be O(N). For such algorithm we need to have some assumption that there will be no delay.
Lets consider Si is the sequence for process i. This sequence si executed when every read returns its first or initial value or a value written by an earlier operation in Si. Si cannot be accessed by the Sj. Therefore all the Si should access the same set of memory words. As the number of memory words are fixed and didn't depend on the N so on increasing the N its sure that for many processes i for which Si gives the identical sequence of writes and reads..
Its assumed that first operation will be not be a read instead of write, as every process first read the value of X before any processing and then writes or modifies. And there couldn't be any write operation to any other variable Y, as two write operations can't take place. And therefore the second operation in Si sequence should be read. And it shouldn't be the read operation on the X.
Start: (x := i);
if ( y # 0) then goto start FI;
( y := i);
if (x # i) then delay;
if ( y # i) then goto start 8 R;
Now each process have a define sequence, in which first operation would be the write to X and then read to some other variable Y. So there is no reason to perfume a read operation on any variable before write operation. And last operation in this sequence could not be the write one. As write can't help in making decision either a process should go into its critical section or not. So the best algorithm is in which the sequence Si consist:
wirte x, read y, write y, read x
This sequence is abbreviated as w-x, r-y, w-y and r-x. This sequence ensure that after executing the critical section process needs at least one write operation to indicate that the critical section is emptied or freed, So the next processes identify that there is no contention. And this write operation couldn't be on the X since the X is the first variable on which write operation is performed and this write should be on y. The minimum sequence would be like w-x, r-y, w-y, r-x, critical section and w-y. This algorithm guarantees mutual exclusion and dead lock free but in this a process can be starved.
In the algorithm above requires the upper bound as well as time needed to execute the critical section. So some time it required to have no upper bound in needed. More modified form of the previous algo is given
start: (b[i] := true);
(x := i);
if (y # 0) then (b[i] := false);
await (y = 0);
got0 start A;
(y := i);
if (x # i) then (b[i] := false);
for j := 1 to IV do await (-b[j]) od;
if (y#i) then await (y=O);
got0 start A Fi Fi;
(y := 0);
(b(i] := false)
In this algorithm minimal protocol to enter the critical section had to be of the form w-x, r-y, w-y, r-x. Let's assume that there are 3 processes executing this protocol
w2-x, w1-x, r1-y, r2-y, w1-y, w2-y, r1-x, w-x, r2-x
Local Spinning Algorithm
The aim of local spinning algorithm is to solve group mutual exclusion problem recently posed by Joung , which is generalization of both mutual exclusion problem and reader-writer problem. In mutual exclusion problem N processes attend sessions labeled from 1â€¦M. The processes which request to attend the same session can do so concurrently but if processes requests different session they can't do so at the same time. It is mandatory that if a process requests to attend a session is able to do so. Further if a process p wants to attend a session t and no other process is attending different session, then p can attend t without waiting to leave the other processes.
We assume that N processes cycles through entry section, critical section, non-critical section and exit section. Every time process p enters from its non-critical section to entry section, p non-deterministically chooses a "session" p:t from 1â€¦M sessions. The group mutual exclusion problem steps to design entry and exit sections in order to ensure following key properties in all executions.
Mutual Exclusion: If distinct processes p and q are executing within critical section then p.t = q.t.
Concurrent Entering: If p requests a session p: t and no process q, such that p.t is not equal to q.t, is executing outside its non-critical section, then p eventually reaches its critical section.
Bounded Waiting: If process p leaves its critical section it will eventually return into its non-critical section and any process leaves its non-critical section it will return to its critical section.
In local spin group mutual exclusion algorithm, a process p generates bounded number of remote memory references in joining and leaving session once.
Figure 4: A Simple Local Spin Group Mutual Exclusion Algorithm for process p
Our algorithm is designed to spin locally by reducing contention for processor-to-memory interconnects. The key advantages include admitting a process to its session in constant time in the absence of contention, spinning locally in Cache Coherent and Non-uniform Memory Access systems, and improvements in the complexity measures.
Conclusion goes here.