# Markov Decision Process Based Optimal Gateway Selection Algorithm Biology Essay

**Published:** **Last Edited:**

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Abstract: In this paper we study the problem of the optimal gateway selection in heterogeneous networks. A numerical study of the optimal policies for capacity of the gateways is presented. The problem facing the decision maker is to determine a suitable plan of action to follow over the duration of the operation so that the total gain is optimized. This paper is shown to some Markov decision processes as well as effective solution techniques which can be programmed to handle gateway selection problem.

Keywords: PSTN, SIP, PSTN-IP Telephony Gateway, Markov Decision Process, Cost-reward function

I. Introduction

A Markov decision process is a controlled stochastic process satisfying the Markov property with costs assigned to state transitions. The Markov decision processes (MDPs) can be used to gateway model and solve dynamic decision-making problems that are multi-period and occur in stochastic circumstances. In a Markov decision process, a system that can move from one distinguished state to another state. At each step, a decision maker effects the transition probabilities for the system by selecting from a well-defined set of alternatives, a particular course of action. Associated with each action is a call blocking probability. Kushal et al. [1] concludes that more calls are accepted when they are directed proportionally to the number of free circuits of a gateway, rather than all directed to the less utilized one in the help of some gateway selection algorithms which include minimum utilization based selection (MUBS) and probabilistic selection based on utilization (PSBU). These algorithms based on heuristic approach. He also measured the adaptive update rate with its effects and selected the gateway on the basis of minimum utilization of the gateway. It determined the probabilistic selection over low update rate using two consecutively least utilization. The improvement in telephony gateway selection algorithms by doing transient analysis and its impact on call performance especially in terms of call

Corresponding author. Tel. No. 91 3222 283571

Email address: chinmay@gssst.iitkgp.ernet.in (C. Chakraborty)

blocking probability at location server (LS) (In-between updates from GWs) are also done. In this study, continuous time Markov chain (CTMC) of telephony gateway registration protocol (TGREP) [2] is also highlighted. This work is based on periodic update information about state of own gateways learned through TGREP. Further they do not consider the end-to-end (delay) path for processing the calls between callee and caller. But we are choosing a particular objective for gateway selection under which optimal action the transition cost is optimized. The logical gateway model is designed as addendum to MDP. Although both of the techniques are based on stochastic approach, MDP uses a better optimal control policy as compared to CTMC. The MDPs have been widely studied and applied to a large number of optimization problems. Many solutions have been developed but all basically rely on Bellman's principle of optimality [3]. This consideration leads to the heart of dynamic programming. Evan [4] stated the optimality criterion if a strategy is optimal for the optimally truncated problem for every stage, then that strategy is optimal for every stage. The cost function is a mapping between state space and real functions. State space is a mathematical model of a physical system as a set of input, output and state variables. The Markov processes in which it takes decisions are made at each stage. The MDP is not generally used for complicated arrival process but mostly handles the Poisson arrivals. A Markov decision process is a 4-tuple (S , A, R (S , A ),T ), where S is a finite set of possible states, A is a finite set of possible actions, R(S,A) is the instantaneous cost-reward function of taking action a in state s and T(S1,A,S) is the probabilistic state transition respectively. In this way the need for MDP in gateway modeling is realized.

## II. Modeling and Analysis/ Overview of Operation

Telephony gateways are considered as a finite capacity. First it is required to establish the telephony gateway registration protocol (TGREP) session, then it can send the initial update to the telephony gateway. The PSTN-IP telephony gateway (PITG) sends routing message to proxy server the detailed information about the total number of circuits or path to the session. Each ongoing call uses one of these paths whereas each call consists of n number of packets. The LS makes its routing decision based on available path information as reported by the gateways.

When IP node is operated to make a call to a particular target location within the PSTN environment, it contacts the local signaling server (either SIP proxy or H.323 GK). The server contacts the LS in its Internet Telephony Administrator Domain (ITAD) to determine the gateway to use for completing the call. It is customary for gateways to send regular updates of their dynamic resource information to Location servers, which is used by LS while choosing a gateway. A call routing decision may also be needed even if both the end users are in the PSTN, but there is an IP cloud in between. This may occur if a telephony provider uses VoIP as a backbone for telephony services. In such a case, the incoming call is received by ingress gateway which needs to connect to an egress gateway to terminate the call. To achieve the above-mentioned objectives the following system model is proposed in Fig. 1.

## Routing

TGREP

SIP

## Proxy

Index

## GW

TL2

TL3

Egress

LS

Ingress

TL1

## GW

LS

TRIP

## PSTN

## PITG

## Networks

Fig. 1: PSTN-IP telephony gateway model

## III. Analytical Modeling of Logical Gateway Capacity

The logical gateway capacity model is developed based on Markov decision process. The call arrival rate and call departure processes both are modeled as Poisson processes. This model indicates the number of calls ongoing to the single logical gateway. We can write V N (a , i) for the expected transition costs over the next N stages, starting in present state (a,i) using fixed policy. Where recurrence relation, stage in process and present state are represented as V , N and (a , i). When the number of calls ongoing at the logical gateway then a increases by one with Poisson rate than the present state (a,i) and ends with rate (a+1)Âµ. Similarly the number of ongoing calls departs from present state then a decreases by one

and ends with rate aÂµ. From the present state, the logical gateway may send update information with rate Î±, resulting in i becoming equal to a. The transition state diagram of logical gateway is as shown in Fig. 2.

Fig. 2: MDP based logical gateway model

Under various conditions of the actual ongoing calls the expected discounted cost will occur that is shown in equation below. In fact, a system with single gateway having capacity T is represented as a MDP with T 2 states. In this case the scaling constant is represented as d = Î» + T Î¼ +Â± , where defined as the parameter can be Î» is the call arrival rate (Poisson), T is the total gateway capacity, Âµ is the call holding time and Î± is the update rate respectively. Let Î² ( 0,1)be a discount factor. It is typically close to 1.

Mathematically, we can characterize V N as follows:

Case 1: When number of calls ongoing through the gateway and their updates varies in between 0 to T.

if 0 â‰¤iâ‰¤T,0 â‰¤aâ‰¤T,then

N +1

Î»

N

(T âˆ’a)Â¼

N

aÎ¼

N

Î±

N

V (a ,i )=

Î»+T Î¼+Â±

Î²V (a +1,i )+

Î»+T Î¼+Â±

Î²V (a ,i )+

Î»+T Î¼+Â±

Î²V (a âˆ’1,i )+

Î»+T Î¼+Â±

Î²V (a ,a)

(1)

Case 2: When there are no ongoing calls through the gateway but their updates varies in between 0 to T.

if 0 â‰¤iâ‰¤T, a=0 ,then

N +1

Î»

N

T Î¼

N

Î±

N

V

(0, i)=

Î²V (1,i )+

Î²V

(0, i)+

Î²V

(0,0)

Î»+T Î¼+Â±

Î»+T Î¼+Â±

Î»+T Î¼+Â±

(2)

Case 3: when the number of ongoing calls equal to maximum of gateway capacity and as well as their updates varies in between 0 to T. So in this case packet will be dropped when it exceed the capacity of gateway and the cost incurred is 1.

if 0 â‰¤iâ‰¤T, a=T,

then

T Î¼

V N+1(T ,i )=

Î»

Î²V N (T ,i )+

Î²V N (T âˆ’1,i )+

Î±

Î²V N (T ,T )+1

Î»+T Î¼+Â±

Î»+T Î¼+Â±

Î»+T Î¼+Â±

(3)

We can assume

that

the expected

discounted

costs

are null

for

initialization

i.e.

V N (a ,i)=0andV N+1(a ,i)=0

where a,i {0,T}.

The

algorithm

of

logical

gateway

converges in a finite number of iterations to the policy that comes from the difference between future and present state, this is called gain. Since all element in states of the recurrent chain have identical rows in the S matrix, such states all have the same gain. All elements of the S matrix are non-negative. The difference between V N +1 (a , i ) V N (a , i) for any large N; it is equal to the increase in the long-run expected returns of the system caused

by

starting

in

state

1

rather

than

state

2.

Mathematically,

we

can

represent

thatVâ€²(a, i)=abs VN+1(a, i) VN (a, i).

The policy iteration based logical gateway selection algorithm is as follows:

## â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦...

Step 1: Initialization

- Set the initial values of the value function parameters a & i, where a , i {0,T}

Set the initial counter N = 0

Set the initial state V N (a , i )=V N +1 (a , i)= 0

Step 2: Policy Improvement

- for Î» = 0 to 160 for N = 1 to T

ComputeV â€²(a ,i )Âabs V

N +1

(a ,i ) V

N

(a ,i), (a,i) T,

N =0,1,.....,T

Calculate V â€²(a , i)// converges in a finite number of iterations if V â€²(a , i)= I // I indicates a matrix having all of it elements 1 then compute Pb (Â»)= // calculate gain for particular Î»

end end

end

Step 3: If policy converges, then stop; else go to 2,

## â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦

On the other hand the computing CTMC model appears as shown in Fig. 3.

Fig. 3: M/M/T/T model of logical gateway

The call blocking probability of the gateway is given by

ÏT

T !

Pb

## =

(4)

T

âˆ‘Ï

j

## /

j!

j=0

This is the well-known Erlang's B loss formula [5]. The call blocking probability increases with increase in call arrival rate as shown in Fig. 4 for T = 80. The MDP based logical gateway selection method is compared with CTMC based gateway selection method. It gives almost same results, so we can justify that MDP based technique is true.

The logical gateway selection parameters are considered in this case:

Table 1: Parameters for logical gateway system

## Parameter

## Our Consideration

T

80 calls/system

Î²

1

Âµ

1 sec

Î±

1/30 sec

Î»

60 packets/sec

a

0 to 80

i

0 to 80

The load of the gateway, expressed in Erlangs, is defined as Ï=(Â»i +Â¼Î»0)=Â»Î¼.

The call

blocking model using Erlang formula is verified with MDP based logical gateway model that is shown in table 2:

Table 2: Comparison of block calls for different method

## Parameter

## Our Consideration

T

80 calls/system

Î²

1

Âµ

1 sec

Î±

1/30 sec

Î»

60 packets/sec

a

0 to 80

i

0 to 80

Fig. 4: Blocking probability performance against Load for Logical Gateway

## IV. Gateway Selection in Two-Gateway Model

The actual scenario is considered where the LS needs to choose particular gateway over multiple available gateways for optimal packet routing. The TGREP protocol gives the routing (updates) information from all gateways to the LS. This system is modeled as a MDP. For this gateway model, we consider a system with two gateways GW1 and GW2, of capacities T1 and T2 respectively. The state space of the model is represented as S, which is described by four variables i.e. S ={a1 , i1 , a2 ,i2} in MDP. Obviously, 0 â‰¤ a1 â‰¤T1 , 0 â‰¤ i1 â‰¤T1 , 0 â‰¤ a2 â‰¤T2 and 0 â‰¤ i2 â‰¤T2 .

The general transition state diagram of the model is as shown in Fig. 5. When routing algorithm chooses to route an ongoing calls to GW1, a1 increases by one and it chooses to route a call to GW2, a2 increases by one. The call arrival rates depend on the routing algorithm. The calls end with rate a1Âµ and a2Âµ. The last updates as per ongoing calls are not equal to the number of ongoing calls. From these states, gateways may send updates (assume Poisson) with rate Î±1 and Î±2 respectively, resulting in i1 (i2) becoming equal to a1 (a2). If a call routed to the gateway when it has no circuits are available, i.e.

a1 =T1 or a2 =T2 , then the call is blocked. It is easy to see that the state space of the MDP or CTMC can become quite large as the total capacity of gateways becomes large.

Fig. 5: MDP model of PITG system with two gateways

This system has two gateways with capacities T1 and T2 are represented as a MDP with ( T1 + 1) 2 * (T2 +1)2 states. This model can be generalized to more than two gateways.

For n gateways the state will be the n-ordered of pairs (ak ,ik ), where ak is the number of calls ongoing at gateway Gk and ik is the value received in the last update from Gk .

Now the gateway selection performance is analyzed in terms of capacity using routing algorithms. We have analytically modeled the PITG with different gateway selection algorithms.

The objective of the gateway selection model is different, one may choose a min-max approach where typically model tries to obtain the best performance under worst possible strategies because âˆ‚1 (n)is monotonically non increasing and âˆ‚2 (n)is monotonically non decreasing function. After certain iteration later both functions will converges at one particular point i.e. called gain for particular value of call arrival rate ( Î» ).

Mathematically, we can represent that

N +1

N

âˆ‚1(n)=max(a,i)V

(a ,i ) V (a ,i)

(5)

N +1

N

âˆ‚2(n)=min(a,i)V

(a ,i ) V (a ,i)

(6)

Bellman [2] equations relate the value function to itself via the problem dynamics. For the discounted ( Î² =1) objective function is as follows

N +1

a

N

V

(i )=C (i ,a )+Â²âˆ‘p ij *V

(j ), i

(7)

j S

In this case, there is one equation per state i. Suppose the processes are applying action a in state si . The immediate or one-step cost on the initial transition would be C (i , a)and the probability of moving to state s j would be p a ij .

In this case, the two gateways are considered in between SIP and PSTN networks. The gateway model is developed using different algorithms i.e. minimum utilization based selection (MUBS) and probabilistic based on utilization (PSBU).

The policy iteration based gateway selection algorithms are as follows:

## â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦...

Step 1: Initialization

- Set the initial values of the value function parameters a1, i1, a2, i2, where a1 , i , a 2 , i2 {0,T}

Set the initial counter N = 0

Set the initial state V N (a1 , i1 , a2 , i2 )=V N +1 (a1 , i1 , a2 , i2 )= 0

Step 2: Policy Improvement

- for Î» = 0 to 160 for N = 1 to T

Compute the policy values V N +1 (i)Â C (i, a)+ pa ij *V N ( j ), i

j S

Compute âˆ‚ 1 ( n)Â max(a , i) V N +1 (a,i) V N (a,i) // element maximum Compute âˆ‚ 2 ( n)Â min(a , i) V N +1 (a,i) V N (a,i) // element minimum

ComputeV â€²(a1 ,i1 , a2 ,i2 )Â abs V N +1 (a1 ,i1 , a2 ,i2 ) V N (a1 ,i1 , a2 ,i2 ) ,

Where ( a1 , i1 , a 2 ,i2 ) T , N = 0,1,.....,T

if V â€²(a1 , i1 , a 2 ,i2 )= I // I indicates a matrix having all of it elements 1 then calculate Pb (Â»)= // Gain or blocking probability

// Pb (Â»)=V â€²(a1 , i1 , a 2 ,i2 ) break

end end

end

Step 3: If policy converges, then stop; else go to 2,

## â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦

## a. Minimum Utilization Based Selection (MUBS) Algorithm

The TGREP protocol sends the gateways update information to the LS, whether the route is congested less or more. This algorithm uses the available circuit's of all gateways and selects a path which is minimum utilized. The utilization of gateway is defined as time of system is busy over time of system is available. The cost-reward model is developed for this algorithm.

A Markov decision process is a sequential decision problem where the set of actions, rewards and transition probabilities depend only on the current state of the system and the current action selected. The history of the problem has no effect on the current decisions. Depending on the action, the system returns new state information with some reward. The rewards assigned to the states to measure the call blocking probability. For this algorithm, the call arrival with Poisson rate in the MDP model with reward that is represented in following functions. A function is considered that associated with updates as per ongoing calls and the capacities of the gateway. Therefore, if a gateway has utilization strictly less than other, then the algorithm will send all the calls to that gateway. But the utilization of the two gateways is equal, and then the calls are equally split between the two gateways.

Mathematically, we can write the cost-reward function as

i

i

1

2

f (i1,i 2,T1,T2 )=1 if T1

âˆ’T2

0

=0

if

i

1

## âˆ’

i

2

0

T 1

T 2

=0.5

if

i

1

## =

i

2

T 1

T 2

In this case the scaling constant (d) is represented as Î»+(T1 Î¼ ) + (T2 Î¼ ) + Î± 1 +Â±2 , where defined as the parameter can be Î» is the call arrival rate (Poisson), T1 and T2 are the capacity of the gateways, Âµ is the call holding time, Î±1 and Î±2 are the update rates respectively.

When actual number of ongoing calls goes to GW1 then it increase by one and compare with gateway capacity, takes minimum value similarly second route also. This algorithm based on the principle of recursive function i.e. equation (7).

Mathematically,

d= [Â»+(T1 Î¼ ) + (T2 Î¼ ) + Î± 1 +Â±2 ]

The value function is represented as

Î»

Î»

V N+1(a1,i1,a2,i2)=

f V N (min(a1 +1,T1),i1,a2,i2)+

(1 âˆ’f) VN (a1, i1,min(a2 +1,T2),i2)

d

d

a Î¼

(T âˆ’a)Î¼

## +

1

V N (max(a1âˆ’1,0),i1,a2,i2)+

1

1

V N (a1,i1,a2,i2)

d

d

a Î¼

(T âˆ’a )Î¼

## +

2

V N (a1,i1,max(a2 âˆ’1,0),i2)+

2

2

V N (a1,i1,a2,i2)

d

d

Î±d1 V N (a1 , a1 , a2 ,i2 )+ Î±d2 V N (a1 ,i1 , a2 , a2 )

f max (a1 + 1 âˆ’ T1 ,0)+ (1 âˆ’ f ) max (a2 + 1 âˆ’ T2 ,0)

(8) The MDP based MUBS algorithm gives a similar performance according to CTMC based

MUBS algorithm in Fig. 6. We have also seen that MUBS algorithm using either MDP based or CTMC based gateway selection schemes gives the worse performance in terms of blocking probability than the logical gateway.

Fig. 6: Comparison of blocking probability performance against Load with (a) algorithm for Logical gateway and (b) MUBS algorithm

## b. Probabilistic Selection Based on Utilization (PSBU) Algorithm

This is another gateway selection algorithm. The PSBU algorithm is modeled with the help of MDP. The problem with minimum utilization based selection is in between two updates, this is always picking the same gateway. If update rate is low for that particular gateway can quickly reach its capacity then it will start call dropping. Hence, a superior way to select a gateway is to assign probabilities to each gateway based on their current state in terms of capacity and select gateways based on those probabilities. There are two ways to assign probability. The first method is that the utilization is inversely proportional to the probability of choosing gateway. The second method is that the probability directly proportional to the remaining capacities of gateways. We have seen that second method is better than first one.

Here the cost-reward function f has probabilistic value 0.5, when last update as per ongoing calls is equal to the capacity of both gateways. The scaling constant d is identical like MUBS algorithm. This algorithm obeys the principle of recursive relationship (7).

Mathematically, we can write the cost-reward function as

1

f (i1,i2,T1,T2)=2

If i1 =T1and i2 =T2

i

1 âˆ’

1

T

## =

1

Otherwise

2 âˆ’

i

## âˆ’

i

1

2

T1

T2

The value function is represented as

Î»

Î»

V N+1(a1,i1,a2,i2)=

f V N (min(a1+1,T1),i1,a2,i2)+

(1 âˆ’f) VN (a1, i1,min(a2 +1,T2),i2)

d

d

a Î¼

(T âˆ’a)Î¼

## +

1

V N (max(a1 âˆ’1,0),i1,a2,i2)+

1

1

V N (a1,i1,a2,i2)

d

d

a Î¼

(T âˆ’a )Î¼

## +

2

V N (a1,i1,max(a2 âˆ’1,0),i2)+

2

2

V N (a1,i1,a2,i2)

d

d

Î±

Î±

+ 1 VN (a1,a1,a2,i2)+ 2 VN (a1,i1,a2,a2)

d

d

+f max

a +1âˆ’T ,0 +1âˆ’f

max

a

+1 âˆ’T,0

(9)

## (

1

1

## )

## (

## )

## (

2

2

## )

The gateway selection scheme is analyzed and verified using MDP based PSBU algorithm that provides a similar performance with PSBU-CTMC based gateway selection technique in terms of blocking calls, but PSBU gives a better performance than MUBS algorithm, these results are shown in Fig. 7. The gateway selection parameters are considered in this case:

Table 3: Parameters for gateway selection

## Parameter

## Our Consideration

T1

40 calls/system

T2

40 calls/system

Î²

1

Âµ

1 sec

Î±1

1/30 sec

Î±

1/30 sec

2

Î»

60 packets/sec

a1

0 to 40

i

0 to 40

1

Fig. 7: Comparison of blocking probability performance against Load with (a) algorithm for Logical gateway (b) MUBS algorithm and (c) PSBU algorithm

## c. Optimal Gateway Selection (OGS) Algorithm

The MDP based model is compared with CTMC based model for gateway selection using MUBS and PSBU algorithm. The performance in terms of blocking probability of the model is almost same. The CTMC has no criteria regarding choosing optimality. A policy prescribes an admissible decision for each state for a given stage. Here we have modeled the gateway under optimality condition by using MDP approach. The optimality is an important issue for finding the optimal route from multiple gateways to single LS. The principle of optimality is defined as that it will minimize or maximize over long-term performance. In this case the mapping function is similar to MUBS algorithm. This function indicates the cost-reward structure of the model. This mapping function f varies with different condition of calls ongoing to gateways with respect to capacity.

The Dynamic programming recursive relationship [2] can be written as

N +1

a

N

V

(i )=minaA

C (i ,a )+ p ij *V

(j ), i

(10)

j S

It is to observe that V N +1 (i)is the minimum expected n-step cost, given initial state i. The processes are applying action a in state si . The one-step cost on the initial transition would be C (i , a)and the probability of moving to state s j would be p a ij . It is known [Ber'87] that V N +1 (). is the minimum non-negative solution of the discount optimality equation.

Mathematically, we can write the cost-reward function in this case as

f (i1,i 2,T1,T2 )=1 if V

N

(min (a1,T1)+1, i1, a2, i2) V

N

(a1,i1, min(a2,T2)+1)<0

## =

V N

min

a ,T

+1, i, a , i

âˆ’V N

a ,i , min a ,T

+1 >0

0 if

## (

## (

1 1 )

1

2 2 )

## (

1 1

(2

2 )

## )

V N

min

a ,T

+1, i, a , i

âˆ’V N

a ,i , min a ,T

+1 =0

=0.5 if

## (

## (

1 1 )

1

2 2 )

## (

1 1

(2

2 )

## )

The scaling constant d is also analogous to MUBS and PSBU algorithm. When actual number of ongoing calls goes to GW1 then it increase by one and compare with gateway capacity, takes minimum value similarly second route also. This algorithm is selecting the gateway which has least utilization over two gateways. Under optimal condition, the OGS gives a better performance in terms of blocking probability than MUBS and PSBU

algorithms as shown in Fig. 8. The results with different algorithmic schemes for gateway selection are shown in table 4.

The optimal value function for this criterion is defined as

Î»

V N+1(a1,i1,a2,i2)=

min(VN (min (a1,T1)+1, i1, a2,i2),VN (a1,i1,min(a2,T2) +1,i2))

d

( a âˆ’1) Î¼

(T âˆ’a +1)Î¼

## +

1

V N (max(a1 âˆ’2,0)+1,i1,a2,i2)+

1

1

V N (a1,i1,a2,i2)

d

d

( a âˆ’1) Î¼

(T âˆ’a +1)Î¼

## +

2

V N (a1,i1,max(a2 âˆ’2,0)+1,i2)+

2

2

V N (a1,i1,a2,i2)

d

d

Î±

Î±

+ 1 VN (a1,a1,a2,i2)+ 2 VN (a1,i1,a2,a2)

d

d

+f max

a âˆ’T ,0

+1 âˆ’f

max

a

âˆ’T ,0

(11)

(1

1

## )

## (

## )

(2

2

## )

The policy iteration based optimal gateway selection algorithm is as follows:

## â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦...

## Algorithm: OGS

Step 1: Initialization

- Set the initial values of the value function parameters a1, i1, a2, i2, where a1 , i , a 2 , i2 {0,T}

Set the initial counter N = 0

Set the initial state V N (a1 , i1 , a2 , i2 )=V N +1 (a1 , i1 , a2 , i2 )= 0

Step 2: Policy Improvement

## -

for Î»= 0to160

for N =1toT

N +1

a

N

Compute the policy values V

(i )ÂminaA C (i ,a )+ p ij *V

(j ), i

j S

Compute âˆ‚1(n)Âmax(a,i)

N +1

(a,i) V

N

(a,i)// element maximum

V

Compute âˆ‚2(n)Âmin(a,i)

N +1

(a,i) V

N

(a,i)// element minimum

V

## â€²

N +1

N

ComputeV

a ,i ,a ,i

â†abs V

a ,i ,a ,i

âˆ’V

a ,i ,a ,i

## ,

## (

1 1

2 2 )

## (

1

1

2 2 )

(1 1

2

2 )

Where (a1,i1,a2,i2) T,

N =0,1,.....,T

## â€²

if V (a1, i1, a2,i2)= I// Iindicates a matrix having all of it elements 1

then calculate Pb (Â»)= // Gain or blocking probability

// Pb (Â»)=V â€²(a1 , i1 , a 2 ,i2 ) break

end end

end

Step 3: If policy converges, then stop; else go to 2,

## â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦â€¦

Table 4: Results with different algorithmic schemes

Fig. 8: Comparison of blocking probability performance against Load with (a) algorithm for Logical gateway (b) MUBS algorithm (c) PSBU algorithm and (d) OGS algorithm

## V. Conclusions

An optimal gateway selection and packet routing are imperative from studying aspects related to end-to-end QoS guarantees in packet networks. The key components and issues were studied and simulated. The gateway selection scheme is analyzed with MDP. The logical gateway is modeled by us with MDP and is verified with CTMC based logical gateway scheme also. The performance of those schemes is almost same, which in term verifies our MDP based gateway selection technique. A PITG model has been simulated and verified using MUBS algorithm with MDP that the performance is same with CTMC based technique when used for MUBS. However, MUBS gives worse performance than logical gateway. Thirdly, the gateway model is analyzed using MDP based PSBU algorithm that provides a performance similar to that of PSBU-CTMC based model. The PSBU algorithm gives a better performance than MUBS in practical systems. Under optimal condition, the OGS gives a better performance than MUBS and PSBU algorithms. Assuming observable state variables a1, a2 for two gateways the MDP is constructed which indicates that design policies may be better than PSBU and MUBS.