An Experimental Methodology Computer Science Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

I study the performance of querying mechanisms in realistic settings by applying the method described above on real hardware. This part presents the experimental methodology: how the hardware components were installed and how the experiments were conducted with them.

The experiments used Moteiv Tmote Sky devices, generally known as TelosB motes. The experimental topology is a square grid. However, in contrast to a perfectly symmetric grid, in which each node has only 4 neighbours (border nodes excepted), in the practical case nodes transmit packets to and receive packets from whichever nearby nodes they are actually connected to, not exactly the four immediate grid neighbours. The experiments were run in both an indoor and an outdoor environment, characterized as follows:

Indoor: the indoor environment is a very large, empty room on the fourth floor of the campus block. A 100-node sensor network test bed operates on the third and fourth floors of this block. Other sources of interference are the many research labs that use wireless devices and the presence of a large number of IEEE 802.11b/g networks.

Outdoor: the outdoor environment is a large open area outside the campus block, where interference is notably lower than indoors. Grids of various sizes were set up: 5*5 and 7*7 grids for the indoor environment, and 8*8 and 9*9 grids for the outdoor environment.

These tests were carried out in different ways, with a few noticeable alterations to the environments. This is significant as confirmation for the later calculations, because the initial experiments do not cover a wide range of densities, node counts, or environments. While deploying the nodes I took care to replace faulty nodes with working ones, to ensure that the results hold in general for the conditions described and are not artifacts of "complete implementation or with rare events".


Hitting time computation

In this part we assume a random geometric graph defined over a circular area of radius H, i.e., the nodes are drawn inside a circle rather than a square. For this type of graph we numerically calculate the hitting time of a random walk that starts from a point on the edge of the circle and ends when it hits the target, which is placed at the centre. Such a symmetric setting is simple enough to compute the hitting time; however, it also helps to develop a more general approach: in a different geometry we can usually consider a circular region of suitable radius centred at the target node, which the packet performing the random walk eventually enters. Using this model, I explain why selecting the next node uniformly at random is not the right rule for achieving a low hitting time.

In the following study, I assume that: (i) every node has the same transmission range R, i.e., it can reach any other node at a distance of at most R from itself with a single local transmission; (ii) the distribution of the neighbors of a newly reached node is independent of that of the sending node; (iii) the random walk exploits lookahead, i.e., when the packet reaches a neighbor of the target it is forwarded to the target at the next step; (iv) the fact that the region is limited is modelled by assuming a virtual circle with radius H + R, so that when a packet is sent to a node at distance H + x it is bounced back to a virtual node located at distance H - x from the target.

Figure 1: An example of a random walk on the symmetric setting.

Our aim is now to compute the average hitting time E[h(u, v)] for v located at the center of the area and u located at distance r0 from the target (see Figure 1). For the sake of simplicity this hitting time is denoted as ht(r0). Let ri be a random variable (r.v.) representing the distance from the target node of the node processing the packet at step i.

The hitting time ht(r0) is computed by studying a random walk on the segment [0, H], where position 0 is an absorbing state, i.e., we can think that when the packet is received by the target the distance becomes zero and remains so for all future steps. Let fi(r) be the probability density function

Figure 2: Variation of the Euclidean distance after packet retransmissions.

To compute the progress pdf let us consider the circle c of radius R centered at the selecting node and let C1 (C2) be the circle of radius r0 (r) centered at the target node (see Figure 2). The distance of the packet varies from r0 to r, i.e. it makes the progress indicated in the figure with the dashed line, only when the node selected as the next hop in the walk lies on the arc delimited by the intersection of

The length $\gamma(r_0, r)$ can be expressed as $2\phi_c r$, where $\phi_c$ is the angle under which the arc is seen from the target.

In Figure 3 we compare the hitting time obtained from the above model against the one estimated by simulation for H = 1 and R = 0.3. The analysis provides accurate results when the total number of nodes is higher than 300. The reduction of the hitting time with increasing node density can be surprising at first glance: as the total number of nodes in a fixed area increases, it should become harder to reach exactly a given node. This is not the case because of the lookahead assumption: in order to hit a target it is sufficient to reach any of its neighbors, and the number of the target's neighbors grows with node density.
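The simulation side of this comparison can be illustrated with a minimal sketch. It assumes nodes placed uniformly at random in a disc of radius H, a target at the centre, a source on the edge, and a uniform random walk with lookahead; the function names and the `max_steps` cut-off are illustrative choices, not taken from the study.

```python
import math
import random


def sample_disc(n, radius, rng):
    """Draw n points uniformly at random inside a disc centred at the origin."""
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())  # sqrt gives uniform density over the area
        theta = 2 * math.pi * rng.random()
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def hitting_time(n, H=1.0, R=0.3, max_steps=100_000, rng=None):
    """Steps taken by one uniform random walk with lookahead to reach the target.

    Node 0 is the target at the centre, node 1 the source on the edge (n >= 2).
    Returns None if the walk meets an isolated node or exceeds max_steps
    (both are assumptions of this sketch).
    """
    rng = rng or random.Random()
    pts = [(0.0, 0.0), (H, 0.0)] + sample_disc(n - 2, H, rng)
    neigh = [[j for j in range(n) if j != i and math.dist(pts[i], pts[j]) <= R]
             for i in range(n)]
    current, steps = 1, 0
    while current != 0 and steps < max_steps:
        if not neigh[current]:
            return None
        # Lookahead: a neighbour of the target forwards straight to the target.
        current = 0 if 0 in neigh[current] else rng.choice(neigh[current])
        steps += 1
    return steps if current == 0 else None


if __name__ == "__main__":
    rng = random.Random(1)
    runs = [hitting_time(300, rng=rng) for _ in range(10)]
    done = [t for t in runs if t is not None]
    if done:
        print("mean hitting time over", len(done), "runs:", sum(done) / len(done))
```

Averaging many such runs for increasing n gives an empirical curve of the kind the model is compared against.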

How to reduce the hitting time

The packet orbiting problem is due to the uniform random selection of the next node to visit among all the neighbors, which is performed by the selecting node; it has a strong negative effect on the hitting time.


In this section we propose two next hop selection rules for eliminating the orbiting problem and thus reducing the hitting time. They are referred to as Double Range and Ring Based. The first selection rule aims at obtaining a progress pdf with a peak located at long progress, while the aim of the other rule is to obtain a constant progress pdf. The hitting time is again evaluated for the circular setting described in the previous section.

Since the rules are based on the neighbors' distances, and the power required to send a packet can in principle be regulated according to the distance of the selected node, we have also estimated the average power associated with the search by assuming that when a node sends the packet to a neighbor at distance r it consumes the power r^2.

3.1 Double Range Selection rule

The double range (DR) selection rule divides the area covered by a node into two zones: the near zone Zn, namely the set of points at a distance less than Ri from the selecting node, and the far zone Zf. A neighbor that lies in the far (near) zone is called a far (near) node. The value 0 < Ri <= R is called the interior radius and is the single parameter of this method. To maximize the probability of making a long jump, the selecting node tries to pick one of the far nodes at random; if there are no far nodes, it picks a near node at random.

The area of the far zone is $A_F = \pi (R^2 - R_i^2)$ and the area of the near zone is $A_N = \pi R_i^2$. A neighbor is therefore a near node with probability $p_N = R_i^2 / R^2 = (R_i / R)^2$. The selecting node is assumed to have k neighbors.

The progress pdf is the sum of two contributions. The first accounts for the case in which the far zone is not empty, and the second for the case in which it is empty.
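The DR rule and the weighting of the two cases can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the expression for the empty far zone assumes that the k neighbors are placed independently and uniformly over the coverage disc, which the text does not state explicitly.

```python
import math
import random


def double_range_next_hop(node, neighbours, r_inner, rng=random):
    """Pick the next hop with the double range (DR) rule.

    node and neighbours are (x, y) positions; r_inner is the interior radius Ri.
    Returns None when the node has no neighbours at all.
    """
    if not neighbours:
        return None
    # Far nodes are those at distance Ri or more from the selecting node.
    far = [p for p in neighbours if math.dist(node, p) >= r_inner]
    # Prefer a far node; fall back to a near node only if the far zone is empty.
    return rng.choice(far if far else list(neighbours))


def prob_far_zone_empty(r_inner, r_range, k):
    """Probability that all k neighbours are near nodes: pN ** k.

    Assumes independent, uniformly placed neighbours (assumption of this sketch).
    """
    p_near = (r_inner / r_range) ** 2
    return p_near ** k
```

With Ri = R/2 and k = 3 neighbors, for instance, the far zone would be empty with probability (1/4)^3 under this assumption, so the second contribution carries little weight.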



This section describes dynamic search (DS). DS is a generalization of flooding, modified breadth-first search (MBFS) and random walk (RW). It operates in two phases, each with its own search strategy. The choice of strategy depends on h and n, where h is the hop count and n is the decision threshold.

Phase 1: When h ≤ n

In this phase, while the hop count does not exceed the decision threshold, dynamic search acts as flooding or MBFS. The number of query messages sent by the query source depends on the predefined transmission probability p: if d is the link degree of the query source, then p*d query messages are sent to its neighbors. If p equals 1, dynamic search resembles flooding; otherwise it operates as MBFS with transmission probability p.

Phase 2: When h > n

In this phase, once the hop count exceeds the decision threshold, dynamic search switches to RW. Each node that receives the query message forwards it to one of its neighbors, provided the node does not hold the queried resource. If the nodes visited by DS at hop h = n have coverage cn, then the operation of DS from that point on can be regarded as RW with cn walkers.

Looking at the whole operation reveals the difference between DS and RW. Take the decision threshold n = 2 on the topology of Figure 1. When h > 2, DS performs the same as RW with c2 = 12 walkers. Now consider an RW search with k = 12 walkers: at the first hop the walkers visit only 4 nodes, but the cost is that of 12 walkers. This shows the significant difference between DS and RW.

Figure 1: A simple network topology in which each vertex has link degree 4

Let

S = query source,

R = vertex which receives the query message,

F = queried resource,

Mi = i-th query message, and

TTL = time-to-live limit.

Figure 2 shows the pseudo code of DS.

Figure 2: The pseudo code of DS
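The two phases described above can be sketched in Python as follows. This is an illustrative sketch based on the prose description, not the pseudo code of the figure. Several details are assumptions: a node does not send the query back to the vertex it received it from (except a walker at a dead end), duplicates are dropped in Phase 1, at least one message is always forwarded, and a node holding F does not forward in either phase.

```python
import random


def dynamic_search(graph, source, has_resource, n, p, ttl, rng=random):
    """Sketch of dynamic search (DS) on an adjacency-list graph.

    graph: dict vertex -> list of neighbours; has_resource: set of vertices holding F.
    n: decision threshold; p: transmission probability; ttl: hop limit.
    Returns (set of query hits, number of query messages sent).
    """
    hits, messages = set(), 0
    visited = {source}
    frontier = [(source, None)]  # (vertex holding a query message, vertex it came from)
    for h in range(1, ttl + 1):
        next_frontier = []
        for node, parent in frontier:
            others = [v for v in graph[node] if v != parent]
            if h <= n:
                # Phase 1: flooding (p == 1) or MBFS, about p*d messages per vertex.
                if not others:
                    continue
                count = max(1, round(p * len(others)))
                targets = rng.sample(others, count)
            else:
                # Phase 2: random walk, one neighbour per message.
                pool = others or graph[node]
                if not pool:
                    continue
                targets = [rng.choice(pool)]
            for t in targets:
                messages += 1
                if t in has_resource:
                    hits.add(t)  # a vertex holding F does not forward the query
                elif h > n or t not in visited:
                    next_frontier.append((t, node))
                visited.add(t)
        frontier = next_frontier
    return hits, messages
```

On a tree-like topology of link degree 4 with n = 2 and p = 1, this sketch sends 4 messages at the first hop and 12 at the second, which matches the 4 and 12 of the example above.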

Random walks are used in several algorithms proposed for wireless networks. The RWMS (Random Walk-based Membership Service) for ad hoc networks is one example. The service provides each node with a partial, uniformly chosen view of the network nodes. That algorithm uses a random walk as a sampling technique, whereas the aim of our protocol is to locate a target.

The group membership list is collected by a single random walk agent traversing the network. They apply a single random walk that covers the whole network, not for searching. An efficient token passing algorithm is exploited in NASCENT to provide a network layer service dedicated to group communication in ad hoc networks. Again, the goal of the random walk is not to perform a search. A previously visited node is hidden to subsequent selections with a given probability, which is called the bias of the walk. The work exploits only one form of bias. Differently from our case, no information is used.

Here nodes are allowed to choose the next hop among a small subset selected at random. The authors discuss the power of such a strategy for improving the performance of a random walk considerably. Their results are consistent with our findings, because making an informed choice is a way to achieve a strong bias.

Random walks over wireless networks are also studied from a graph theoretical point of view in several papers. However, all these studies focus on unbiased walks. Biased random walks are widely adopted for search in unstructured P2P architectures, both in the form of bias-by-memory and bias-by-information. The interested reader can, for example, refer to for a survey.

However, several key aspects make search in P2P different from search in wireless networks. First, the topology of P2P networks is best modeled as a power-law graph, whereas wireless networks adhere to the random geometric graph model. Second, the channel model in the two networks is quite different. While in P2P nodes are connected via unicast channels (a TCP connection), in wireless networks a transmission is inherently broadcast.

The cost of implementing the same technique, like lookahead or next hop selection, is then quite different. Last, the changes in the topology are strongly correlated; this makes some sources of information, like the distance among nodes, meaningful only in mobile networks.


In this section we present the performance analysis of DS. We apply Newman's power-law random graph as the network topology, adopt the generating functions to model the link degree distribution, and analyze the DS based on some performance metrics, including the guaranteed search time, query hits, query messages, success rate, and search efficiency.

A. Network Model

We use generating functions to derive the desired quantities of a random graph. First, the generating function for the link degree distribution is denoted by $G_0(x)$. Then the average degree of a randomly chosen vertex, i.e. the average number of first neighbors, is $z_1 = G_0'(1)$, and the average number of second neighbors is $z_2 = G_0'(1)\,G_1'(1)$.
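As a small numerical illustration of these two quantities, the sketch below evaluates them for a degree distribution given as a table of probabilities. It assumes the usual definitions $G_0(x) = \sum_k p_k x^k$ and $G_1(x) = G_0'(x) / G_0'(1)$, which the text does not spell out; the function name is hypothetical.

```python
def neighbour_counts(degree_probs):
    """Return (z1, z2) for a degree distribution given as {k: p_k}.

    Assumes G0(x) = sum_k p_k x^k and G1(x) = G0'(x) / G0'(1), so that
    z1 = G0'(1) = sum_k k p_k and z2 = G0'(1) * G1'(1) = sum_k k (k - 1) p_k.
    """
    z1 = sum(k * p for k, p in degree_probs.items())
    z2 = sum(k * (k - 1) * p for k, p in degree_probs.items())
    return z1, z2


if __name__ == "__main__":
    # Every vertex has link degree 4.
    print(neighbour_counts({4: 1.0}))
```

For a graph in which every vertex has degree 4 this gives 4 first neighbors and 12 second neighbors on average.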

Performance Metrics

Success Rate (SR)

Success rate (SR) is the probability that the query is successful, i.e., that there is at least one query hit. Assume that the queried resources are uniformly distributed in the network with replication ratio R; then SR can be calculated as

$$SR = 1 - (1 - R)^{C} \qquad \text{(Equation 3)}$$

where R is the replication ratio and C is the coverage. This formula shows that the SR depends strongly on the coverage of the search algorithm. Below, we use Equation (3) to obtain an important metric, the guaranteed search time.

Guaranteed Search Time (GST)

To represent the capability of a search algorithm to find the queried resource with a given probability, we define the guaranteed search time (GST) as the search time needed to guarantee query success with the required success rate SRreq. GST is the hop count at which a search succeeds with this probabilistic guarantee. Using Equation (3), GST is reached when the coverage C equals $\log_{(1-R)}(1 - SR_{req})$.
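The relation between coverage, success rate and the coverage needed for a required success rate can be checked numerically. The sketch below simply evaluates Equation (3) and its inverse; the function names are illustrative.

```python
import math


def success_rate(R, C):
    """Equation (3): probability of at least one hit among C visited nodes."""
    return 1.0 - (1.0 - R) ** C


def required_coverage(R, sr_req):
    """Coverage C at which Equation (3) reaches sr_req: log base (1 - R) of (1 - sr_req)."""
    return math.log(1.0 - sr_req) / math.log(1.0 - R)


if __name__ == "__main__":
    C = required_coverage(0.01, 0.9)
    print(C, success_rate(0.01, C))
```

The GST is then the first hop at which the accumulated coverage of the search algorithm reaches this value of C.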

Query Hits (QH)

The number of query hits highly depends on the coverage, i.e., the number of total visited nodes. Assume that the queried resources are uniformly distributed with replication ratio R in the network, and the coverage is C; hence the number of query hits is RC. The coverage C can be regarded as the summation of the coverage at each hop.

Therefore, we first analyze the coverage Ch at the hth hop. Let Vh be the event that a vertex is visited at the hth hop, and let Pi(Vh) be the probability that vertex i is visited at the hth hop. When the hop count h = 1, Ch is the expected number of vertices visited at the first hop. When the hop count h is larger than 1, the calculation of Ch must exclude the event that the vertex has already been visited at a previous hop.

Implement a Biased Random Walk

The previous sections provide a theoretical indication of how well biased random walks perform for searching. As our analysis has shown, the hitting time is very sensitive to the bias level. Motivated by this encouraging result, we now turn to a practical implementation of a search algorithm. Before that, we discuss the possible options for achieving bias and present general implementation frameworks that follow these options.

In a random walk, the walker moves by making blind and memoryless random selections. The decision about the next node to visit is blind because the walker does not use any external information to decide, and it is memoryless because a walker that visits the same node twice behaves the same way both times. Bias arises when this fairly simple decision mechanism is altered so that the walker is statistically "pushed" toward the target. In this paper, we explore two orthogonal ways to bias the walk, dubbed bias-by-information and bias-by-memory.

In Bias-by-information, the walker exploits information available at the currently visited node, which indicates the most appropriate decision to take for reaching the target. The information used for deciding is maintained by some protocol, which basically corresponds to the oracle used in our model. In the literature, many examples fall in this category.

In P2P architectures, where this technique is widely used, the information can consist of some topological knowledge, such as the connectivity degree of the neighbors of the selecting node; another option is to rely on a learning protocol that estimates the goodness of the candidate nodes according to previous searches. A routing protocol can also be considered a special case of this class. The routing tables represent the information stored at nodes, while the routing protocol, which is in charge of keeping the tables up to date, corresponds to the oracle. In this particular case, the "search" becomes deterministic.

In Bias-by-memory, the walker keeps memory of its previous selections, so that the bias merely consists of forcing it to visit new portions of the network. No sense of direction is exploited. Bias-by-information is potentially more effective than bias-by-memory, especially if the number of targets is low. For example, the best one can expect from bias-by-memory is that a new node is visited at each step. If we assume that every newly visited node has the same probability of being the target, then the lowest average hitting time is (n + 1)/2, where n is the total number of candidate nodes.
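The value (n + 1)/2 follows from averaging the step at which the target is found: if each step visits a new node and the target is equally likely to be any of the n candidates, it is found at step i with probability 1/n. A short exact check of this sum, with an illustrative function name:

```python
from fractions import Fraction


def best_case_mean_hitting(n):
    """Mean step at which the target is found when every step visits a new node
    and the target is equally likely to be any of the n candidate nodes."""
    return sum(Fraction(i, n) for i in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 10, 400):
        print(n, best_case_mean_hitting(n))
```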

The hitting time may be much shorter under bias-by-information, the lowest value being the average network distance of a requesting node from the target. We now discuss two implementation frameworks that allow one to design random walks of these two classes, along with their performance assessment.

Bias-by-Information Implementation Frameworks

The first implementation strategy for an informed random walk is called ALL(_). In this case, all nodes are equipped with an oracle functionality, which tags all neighbors either as near or far. A near (far) neighbor is a node whose distance from the target is smaller (greater) than that of the tagging node. The oracle is required to provide correct discriminations with probability; a near (far) node is correctly tagged as near (far) with probability and wrongly as far (near) with probability 1. The next node to visit is selected at random among the estimated near nodes. If no nodes are tagged as near, then the selection is made at random among all nodes. A given bias level is obtained by regulating the estimation correctness. The expected bias level of this framework can be computed as follows: Let C be the average number of near nodes of a randomly observed selecting node and F the average number of far nodes; then, the expected number of near nodes that the oracle correctly tags as such is C, while the expected number of far nodes, wrongly tagged as near nodes,
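The selection step of this first framework can be sketched as below. The symbol for the oracle's correctness probability is not legible in the text, so the parameter is called `accuracy` here as a placeholder; the function names and the representation of neighbors as a list with true near/far flags are likewise assumptions of the sketch.

```python
import random


def oracle_tags(is_near_flags, accuracy, rng=random):
    """Tag each neighbour as near (True) or far (False).

    Each tag equals the true flag with probability `accuracy` and is
    inverted otherwise (`accuracy` is a placeholder name).
    """
    return [flag if rng.random() < accuracy else not flag for flag in is_near_flags]


def all_next_hop(neighbours, is_near_flags, accuracy, rng=random):
    """Choose the next node at random among the neighbours tagged as near.

    Falls back to a random choice among all neighbours when none is tagged near.
    """
    tags = oracle_tags(is_near_flags, accuracy, rng)
    tagged_near = [v for v, tag in zip(neighbours, tags) if tag]
    return rng.choice(tagged_near if tagged_near else list(neighbours))
```

With `accuracy` set to 1 the walker always steps to a truly near node when one exists; lowering it weakens the bias, which is how a given bias level would be regulated.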

The second framework is called PARTIAL(I, k). This implementation strategy requires only a subset I of nodes to be equipped with the oracle, which correctly tags min{k, NNeigh} near nodes, NNeigh being the total number of near neighbors. When the walker arrives at an informed node, i.e., one equipped with the oracle, it makes an informed step by selecting one of the recognized near nodes, at random. In this case, the bias level is where, i is the probability that the selecting node is informed.

Bias-by-Memory Implementation Frameworks

To implement bias-by-memory, the identifiers of the visited nodes must be available at the walker. The first implementation is based on a Memory List, carried by the walker. The list contains at most H different identifiers of the most recently visited nodes.

The next node to visit is selected at random among the neighbors that do not appear in the list. If all neighbors are in the list, the selection is at random among all neighbors. The other implementation option is called Distributed Memory and corresponds to H that is equal to the number of nodes. In this case, each node maintains a flag associated to the walker that indicates if the node was visited or not. At selection time, a node probes the status of all neighbors and sends the walker to one unvisited node, selected uniformly at random. The overhead due to probing can easily be avoided by leveraging the broadcast nature of transmissions (see Section 4.2.1).
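The Memory List variant can be sketched with a bounded queue. This is a minimal illustration under the description above; the function name and the adjacency-list graph format are assumptions. Setting H to the number of nodes imitates the Distributed Memory option, although a real implementation would keep the flags at the nodes rather than in the walker.

```python
import random
from collections import deque


def memory_walk(graph, start, steps, H, rng=random):
    """Random walk biased by a list of the H most recently visited nodes.

    graph: dict node -> list of neighbours (every node needs at least one).
    Returns the list of visited nodes, starting with `start`.
    """
    memory = deque([start], maxlen=H)  # oldest identifiers drop out automatically
    path = [start]
    for _ in range(steps):
        neighbours = graph[path[-1]]
        fresh = [v for v in neighbours if v not in memory]
        # Prefer neighbours that are not in the list; otherwise pick any neighbour.
        nxt = rng.choice(fresh if fresh else neighbours)
        if nxt in memory:
            memory.remove(nxt)  # keep the identifiers in the list distinct
        memory.append(nxt)
        path.append(nxt)
    return path
```

On a ring of six nodes with H = 2, for example, the walker can never turn back, so five steps are enough to visit every node.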

4.2.1 Evaluation

Fig. 18 shows the bias induced by the first solution as a function of H. The distributed memory option corresponds to H = 400. Since we saw that the bias and the hitting time for H ≥ 100 are the same, the plot shows results only up to H = 100. The bias level increases rapidly as memory capacity is added, and then it stabilizes. Note that in this case the bias cannot be regulated; rather, it is just a way to model the side effect of the memory.

Performance Evaluation

Compared with guided search and the simple routing protocol, filtering with a routing update table gives the best search performance. Initially, when the number of queries is low, guided search performs well. As the number of queries increases, the filtering mechanism with a routing update table becomes the most suitable one, giving the best results, up to 90%. Hence it improves the search performance of the peer. The routing update table contains past successful search results, which are used for future reference. The table can be updated every second.

Simulation Model

The simulation is based on NS-2, using Tcl with C++. Network simulators such as NS-2 have been used for testing P2P protocols, while other network simulators, like OMNeT++, have been used to build a simulator specifically designed for P2P systems, namely OverSim. We plot the number of queries on the X-axis and the success rate on the Y-axis, and compare the search performance of guided search, simple routing and routing with filtering for a varying number of queries (50, 100, 150, ...).

Simulation Results

Experiments were run using different parameters, protocols and system settings. The performance analysis presented here is designed to compare the effects of different filtering mechanism parameters, such as NOP, success rate and queries, together with P2P protocols, with the aim of improving search performance.

Figure 1: Performance of filtering mechanism

Figure 2: Success rate performance (%)


We compare the performance of the three different search methods that realize the square-root principle: our popularity biased random walks, object replication, and topology reconstruction. Although object replication and topology reconstruction can be used to support both query flooding searches and random walk searches, our simulation study focuses on the performance of the three methods supporting random walk searches.

It is important to keep in mind that object replication and topology reconstruction must maintain up-to-date replication copies and network topologies to achieve the search performance presented in this section. Our popularity-biased random walks do not incur such costs. These costs can be very substantial in large networks with dynamic searchable datasets.

In addition to the performance comparison, another objective of the simulation is to evaluate the impact of system parameters (e.g., network topologies and query distributions) on the random walk search performance.

4.1 Simulation Setup

The three search methods are set up as follows.

Square root replication: Each object is replicated randomly over the network in a way that the number of replication copies is proportional to the square-root of its popularity. One uniform random walker is used for searching the network while we set the average number of replication copies as the number of random walkers used in square-root topology and square-root biased walks. This is intended to make a fair comparison since the expected search time for square-root replication is inversely proportional to the average number of replication copies.

Square root topology: Uniform random walkers are used to search the network. The degree of each node is proportional to the square root of its content popularity. To transform the original topology into this square-root topology, we compute the node degree sequence and use the PLRG algorithm to generate the new randomized topology with the desired node degree sequence.

Square root biased walks: Without adjusting topologies or replicating data, each query issues a number of random walkers that travel the network according to Equation (6). Multiple random walkers coordinate with each other by periodically calling back the source to learn whether any other walker has found the target. If so, the remaining walkers terminate upon their next call.
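The allocation behind square-root replication can be illustrated with a short sketch. It only shows how a total budget of copies would be split in proportion to the square root of each object's popularity; the function name, the dictionary format and the use of fractional copy counts (left unrounded) are assumptions of this sketch, not details of the simulation.

```python
import math


def square_root_copies(popularity, total_copies):
    """Split total_copies among objects in proportion to sqrt(popularity).

    popularity: dict object -> query popularity (any non-negative weight).
    Returns a dict object -> (fractional) number of replication copies.
    """
    weights = {obj: math.sqrt(q) for obj, q in popularity.items()}
    norm = sum(weights.values())
    return {obj: total_copies * w / norm for obj, w in weights.items()}


if __name__ == "__main__":
    # An object that is 4 times as popular receives only twice as many copies.
    print(square_root_copies({"a": 4, "b": 1}, 30))
```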

We use random graphs and random power-law graphs as network topologies in our simulation. Random graphs represent those peer-to-peer topologies where new links are made independently of existing node degrees. Random power-law graphs represent those networks where new links are more likely to attach to nodes with large degrees. In our simulation, the random power-law graphs are generated by using the PLRG algorithm. We use random power-law graphs with α = 0.8. We generate random graphs by connecting each new node to some nodes selected uniformly at random from all existing nodes. Some default simulation parameters are listed in Table 1.

Table 1: Default Simulation Parameter

4.2 Simulation Results

Figures 1 and 2 present the search time and communication overhead on different network topologies (random graphs and random power-law graphs), query popularity distributions, and a variety of settings on the random walker count (k).

We observe that the three random walk search methods have similar performance with small variations (average 14% difference for random graphs and 19% for random power-law graphs). The performance difference is mainly due to the different speeds at which the three methods converge to the targeted square-root object probe distribution.

We also measure the random walk convergence speeds specifically. Figures 3 and 4 show that square-root replication has better convergence speed than the square-root biased walks, which in turn have better convergence speed than the square-root topology. Due to their non-uniformity, our popularity-biased random walks have slower convergence and hence lower search performance than square-root replication. The square-root topology has the slowest convergence speed because square-root network topologies tend to have worse expansion properties than typical topologies such as random graphs and random power-law graphs.

Based on the results in Figures 1 and 2, we also examine the impact of system parameters on the random walk search performance. In terms of the query popularity distribution, it was found that the search performance for high-skewness popularity distributions (α = 1.2) is higher than that for low-skewness distributions (α = 0.6). This can be explained by the results in Equation (5), which show that the search time for α = 1.2 is $\Theta(n^{0.8})$ and the time for α = 0.6 is $\Theta(n)$.

Slightly higher time and communication overhead are observed for random power-law graphs than those for random graphs, which is mainly because random graphs have slightly better expansion properties than random power-law graphs. Consequently random walks tend to converge faster on random graphs.

The simulation results also suggest that increasing the number of random walkers can significantly reduce the search time with only a slight increase in the communication overhead.

Such increase is due to the convergence overhead associated with each random walker (so more walkers would incur more overhead).

Figure 1: The search time and communication overhead on random graphs, where k is the number of random walkers (the average number of replication copies for square-root replication).

Figure 2: The search time and communication overhead on random power-law graphs, where k is the number of random walkers (the average number of replication copies for square-root replication).