# Correction Method Of The Spatial Point Pattern Biology Essay

Several effects arise in the analysis of spatial point patterns, and the edge effect is one of them. It therefore has to be handled, or at least minimised, with an edge effect correction method.

The literature offers several edge correction methods, but they differ in purpose, benefits and shortcomings.

In this project, I review papers on edge correction methods written by established researchers. I summarise their views and identify the more effective and the least effective edge correction methods.

The summary statistical functions, namely the G-function, F-function and K-function, are also included as an introduction to spatial pattern analysis, which here is based on Ripley's K-function.

2. Introduction

The main part of my project is a review of the journals, articles and reviews related to my topic, the comparison of edge correction methods, in order to see how the edge effect correction methods applied to the K-function differ between authors.

Firstly, the project starts with a brief introduction to the three summary statistical functions. The G-function, F-function and K-function serve different purposes. They describe the nearest neighbour distances (G-function), the point-to-nearest-event distances (F-function) and the pattern over a range of distances (K-function).

Secondly, the main part of the project is the review of papers on edge correction methods. All of the edge correction methods come from the reference papers, and I have chosen the three most common ones as the objects of my review: the buffer zones method, the toroidal method and Ripley's circumference correction method. I give a concise summary of each and draw a conclusion about them.

Because this is a review project, it poses fewer difficulties than other final year projects, but some remain: the limited number of papers available for review, the limited time, and the reassignment of the final year project topic.

3. Summary statistic functions for spatial point pattern

Summary statistic functions are used to describe the relationship between points and events, or between events and events, and to decide whether a pattern is clustered or regular. They include the G-function, the F-function, the K-function and others.

We can use these functions to obtain the information we need and to learn more about a pattern through analysis, calculation and plots of the functions. Different functions serve different purposes. The following is a brief description of the G-function, F-function and K-function.

In the following sections, I use two spatial point patterns. The first is a generated set of i.i.d. (independent and identically distributed) uniform points, which follows CSR (complete spatial randomness): the points are distributed uniformly and independently of one another. The second is the redwood seedling data pattern.

In this project, the sampling points lie on the unit square $[0, 1]^2$.
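A pattern like the generated one can be simulated in a few lines. The following is only a minimal sketch using the Python standard library; the function name `simulate_csr` and the seed value are illustrative assumptions, not part of the original analysis.

```python
import random

def simulate_csr(n, seed=None):
    """Return n i.i.d. uniform points (x, y) on the unit square."""
    rng = random.Random(seed)  # local generator, so a seed gives a reproducible pattern
    return [(rng.random(), rng.random()) for _ in range(n)]

# 100 points, matching the size of the generated pattern described in the text.
points = simulate_csr(100, seed=1)
print(len(points), points[:3])
```

Because every coordinate is drawn independently from the same uniform distribution, the resulting pattern satisfies CSR conditional on the number of points.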

The following two figures show these two patterns.

Fig.1 A generated spatial point pattern with 100 points.

Fig. 2 The distribution of redwood seedling data pattern

3.1 G-function

The G-function gives the probability that the distance from an event to its nearest neighbouring event in the study area is at most r. Under CSR (complete spatial randomness), it is the distribution function of the distance from an arbitrary event of the process to its nearest neighbour:

$$G(r) = 1 - \left(1 - \frac{\pi r^2}{|A|}\right)^{n-1},$$

where n is the number of events and $|A|$ is the area of the study region.

When n is large, the intensity can be written as $\lambda = n/|A|$ (which equals n on the unit square) and

$$G(r) \approx 1 - \exp(-\lambda \pi r^2).$$

In a G-function plot, a higher value of G(r) means a higher probability of finding the nearest neighbour of an event within distance r. A G-function that is high at small distances therefore indicates an aggregated pattern.

For a regular pattern, G(r) is much lower, or even zero, at small distances.

In other words, the G-function describes the distance from an event to its nearest other event in a given region: the more aggregated the pattern, the higher the value of G and the greater the chance of finding a nearest neighbour close by.
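To make the comparison concrete, the sketch below computes an uncorrected empirical G(r) for a simulated CSR pattern and sets it beside the standard large-n CSR approximation $1 - \exp(-\lambda \pi r^2)$. The function names `g_hat` and `g_csr` are hypothetical, and no edge correction is applied, so this is not the estimator behind the figures.

```python
import math
import random

def nearest_neighbour_distances(points):
    """Distance from each event to its nearest other event."""
    dists = []
    for i, (xi, yi) in enumerate(points):
        best = min(math.hypot(xi - xj, yi - yj)
                   for j, (xj, yj) in enumerate(points) if j != i)
        dists.append(best)
    return dists

def g_hat(points, r):
    """Uncorrected empirical G(r): share of events with a neighbour within r."""
    d = nearest_neighbour_distances(points)
    return sum(1 for v in d if v <= r) / len(d)

def g_csr(r, intensity):
    """Large-n CSR approximation G(r) = 1 - exp(-lambda * pi * r^2)."""
    return 1.0 - math.exp(-intensity * math.pi * r * r)

rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(100)]
for r in (0.02, 0.05, 0.10):
    print(r, round(g_hat(points, r), 3), round(g_csr(r, 100), 3))
```

An empirical curve lying clearly above the CSR curve at small r would point towards aggregation, and one lying below it towards regularity.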

Fig. 3: The G-function against the distance r for the generated point pattern.

The above plot shows the G-function against the distance r for the pattern of Fig.1, which has 100 points in the square area.

Fig. 4: The G-function against the distance r for the redwood seedling data pattern.

In Fig.3 and Fig.4, the green line is the theoretical value of G(r) for a stationary Poisson process of the same estimated intensity, while the red and black lines are the "reduced sample" (or "border correction") estimator of G(r) and the spatial Kaplan-Meier estimator of G(r), respectively.

We focus only on the green line. As mentioned above, the theoretical value of G(r) increases smoothly with the distance r at which G(r) is estimated. This means that the probability of finding the nearest neighbour depends on the distance r.

Starting from a chosen event, the probability of finding another event nearby is low when the distance r is small, and it increases as r increases.

This also suggests that the spatial point pattern looks regular in the immediate neighbourhood of the chosen event, since the green line starts to rise at r less than 0.01 and then increases smoothly with r. At much longer distances, however, the pattern appears clustered, because more events can be found.

3.2 F-function

The F-function describes the distance from an arbitrary point to the nearest of the n events. It uses the distance r from each generated point in a given area to the nearest of the n events. Under CSR, when the number of events n is large, its distribution function can be written approximately as:

$$F(r) \approx 1 - \exp(-\lambda \pi r^2),$$

where $\lambda$ is the intensity (the mean number of events per unit area).

For a regular pattern, the F value is much higher: the events are spread evenly, so an arbitrarily chosen point is much more likely to have an event nearby.

For an aggregated pattern, almost all the events cluster together. If we choose a point arbitrarily, the chance of finding an event nearby is much lower than for a regular pattern.
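The effect of aggregation on F can be illustrated with a small simulation. This is a rough sketch only: `f_hat` is a hypothetical helper that applies no edge correction, and the tightly packed cluster is an invented example rather than the redwood data.

```python
import math
import random

def f_hat(events, r, n_test=2000, seed=0):
    """Uncorrected empirical F(r) on the unit square.

    Share of randomly placed test points whose nearest event lies within r.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_test):
        px, py = rng.random(), rng.random()
        nearest = min(math.hypot(px - ex, py - ey) for ex, ey in events)
        if nearest <= r:
            hits += 1
    return hits / n_test

rng = random.Random(1)
csr_events = [(rng.random(), rng.random()) for _ in range(100)]
# Hypothetical clustered pattern: 100 events packed tightly around the centre.
clustered_events = [(0.5 + rng.uniform(-0.02, 0.02),
                     0.5 + rng.uniform(-0.02, 0.02)) for _ in range(100)]

print(f_hat(csr_events, 0.1), f_hat(clustered_events, 0.1))
```

With the same number of events, the clustered pattern leaves most of the square empty, so far fewer test points have an event within r and the F value is lower.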

The following plot shows the F-function against the distance r for the pattern of Fig.1, which has 100 points in the square area.

Fig. 5: The F-function against the distance r for the generated data pattern.

Fig. 6: The F-function against the distance r for the redwood seedling data pattern.

Fig.5 and Fig.6 are read in the same way as the plots of the G-function: the green, red and black lines are the theoretical value of F(r) for a stationary Poisson process of the same estimated intensity, the "reduced sample" (or "border correction") estimator of F(r) and the spatial Kaplan-Meier estimator of F(r), respectively.

The green line increases much more smoothly than for the G-function as the distance r at which F(r) is estimated increases. Again, this means that the probability of finding the nearest event depends on the distance r.

Unlike the G-function, the F-function is a point-to-event function. Starting from an arbitrarily chosen point, it gives the probability of finding an event within a small distance r of that point, and this probability also increases as r increases.

Fig.5 suggests that this spatial point pattern is regular, as the green line starts to rise at r less than 0.01 and increases smoothly with r. Close to the chosen point few events can be found, but as r increases more events are found near it.

The points of the pattern in Fig.1 are distributed arbitrarily over the study area. The pattern seems broadly regular, although in some places it also shows clustering.

The redwood seedling data pattern differs from the generated pattern.

For the redwood seedling pattern of Fig.2, the green line rises more steeply than for the generated data pattern. This suggests that the redwood seedling pattern is clustered, and that it is easier to classify than the generated data pattern.

3.3 K-function

The K-function is related to the intensity $\lambda$, the mean number of events per unit area. It is the expected number of further events within distance r of an arbitrary event, divided by the intensity. The K-function characterises the second-order properties of a stationary isotropic process and is defined as:

$$K(r) = \lambda^{-1}\, E[N_0(r)],$$

where $N_0(r)$ is the number of further events within distance r of an arbitrary event.

The plot of the K-function is read in a similar way to that of the G-function: for an aggregated pattern the K value is higher, meaning that more events are counted within the region.

Conversely, a lower value indicates a regular pattern, in which the events are spread evenly over the space and fewer events are found within a given distance.
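As an illustration of what the K-function counts, the sketch below implements an uncorrected estimate, assuming the form (area / n²) times the number of ordered pairs of events within distance r, and compares it with the CSR value πr². The name `k_naive` is hypothetical, and the lack of edge correction means the estimate is biased downwards near the boundary.

```python
import math
import random

def k_naive(points, r, area=1.0):
    """Uncorrected K estimate: (area / n^2) * number of ordered pairs within r."""
    n = len(points)
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                count += 1
    return area * count / (n * n)

rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(100)]
for r in (0.05, 0.10, 0.20):
    # Second column: estimate; third column: theoretical pi * r^2 under CSR.
    print(r, round(k_naive(points, r), 4), round(math.pi * r * r, 4))
```

Values above πr² suggest aggregation at that distance, and values below it suggest regularity, which is the reading of the K plot described above.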

Fig. 7: The K-function against the distance r for the generated data pattern.

Fig. 8: The K-function against the distance r for the redwood seedling data pattern.

Fig.7 shows the K-function against the distance r for the pattern of Fig.1, which has 100 points in the square area.

Fig.7 and Fig.8 are read in a similar way to the plots of the G-function and F-function. The plotted lines are the border-corrected estimate of K(r), the translation-corrected estimate of K(r), the Ripley isotropic-correction estimate of K(r) and the theoretical Poisson K(r).

The green line increases much more slowly than for the G-function as the distance r at which K(r) is estimated increases. It means that more neighbouring points are found when the distance r is longer.

From Fig.7 we can see that the shorter the distance r, the fewer events are counted; conversely, more events are counted when r is long enough.

Fig.7 and Fig.8 suggest that this spatial point pattern is regular, as the green line starts to rise at r less than 0.01 and increases with r. Close to the chosen location few events can be found; as r increases, more events near it are counted.

For the redwood seedling data pattern, the green line rises steadily at the beginning but steeply towards the end of the plot. This shows that the redwood seedling pattern is clustered and that its events are not distributed as evenly as those of the generated data pattern.

Unlike the G-function and F-function, the K-function counts the number of events within a given area. Starting from an arbitrarily chosen point or event, it counts the events within distance r of it. According to the literature reviewed, it is one of the most powerful of these functions.

4. A review of those edge effect correction methods

This section reviews the journals, articles and reviews related to the comparison of edge correction methods, to see how the edge effect correction methods applied to the K-function differ between authors.

Most spatial statistics require edge correction, because the theoretical distributions for spatial point statistics assume an unbounded area, that is, one without any edge or boundary. The edge effect is therefore a problem in the analysis of spatial point patterns. It can be ignored in some exploratory analyses, but it does exist, and we need methods that let us treat the analysis area as if it had no edge, so that the results are easier to obtain and more accurate.

Fig. 9: The edge effect surrounding the analysis area, which is shown in red.

Edge effect correction is a powerful technique for dealing with edge effect problems in the analysis of spatial point patterns. Because the edge effect may lead to misleading results, it needs to be corrected. Even though not all of the edge effect can be removed, edge correction helps us to obtain the most accurate result possible.

Diggle (1987) stated that all edge correction methods reduce bias at the expense of increased variance.

Edge effect correction can also improve the statistical power of the K-function more effectively than edge correction of the nearest-point functions.

Hence, we review the literature to assess the power of these edge effect corrections, together with their advantages and disadvantages.

In the papers reviewed, the authors carried out tests to find out which edge correction methods are the most effective and which are the less effective.

4.1 Edge correction methods

In this review, I focus on three edge corrections: the buffer zones method, the toroidal method and Ripley's circumference method. Each has its pros and cons, and my task is to examine the three methods in turn.

I describe them one by one, in a particular order. The reason for this order becomes clear in the course of the review.

Fig.10 The upper and lower envelopes (5% significance level; 2,000 realizations)

4.1.1 Buffer Zones

The buffer zones method, also called the guard area correction, is the simplest of the three edge effect correction methods. It has been applied to two types of plot: rectangular plots by Sterner et al. (1986) and circular plots by Szwagrzyk & Czerwczak (1993).

The buffer zones correction builds a buffer area, which takes one of two forms: a buffer area inside the study area (the inner guard area method) or a buffer area outside the study area (the outer guard area method). The points in the buffer area are then used only as destinations when measuring distances between events, or between points and events.

The buffer zones method rests on the assumption that the distribution pattern in the buffer area is the same as that inside the study area. The points in the interior of the study area are counted as points i, while the points in the buffer area are counted only as points j.

Fig.11 Edge effect correction methods: the inner guard area method, with the buffer area marked.

The inner guard area method is intended for studies that can work with a smaller area. The original study area is reduced to an inner region, which is treated as the new study area, while the strip between the inner region and the original boundary serves as the guard area. Within the new study area the edge effect appears to be eliminated. When the K-function analysis of the new study area needs to consider points beyond its boundary, we can look at the original study area and check whether there are events near the location in question. We can then follow the usual K-function procedure to analyse the new spatial point pattern.
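The inner guard area idea can be sketched as follows, under several assumptions that are mine rather than the source's: the study area is the unit square, the guard strip has a fixed width `guard`, events in the interior act as points i, all events (including those in the guard strip) act as points j, and the intensity is estimated as n per unit area. The function name `k_inner_guard` is hypothetical.

```python
import math

def k_inner_guard(points, r, guard):
    """K estimate on the unit square with an inner guard strip of width `guard`.

    Only events at least `guard` away from the boundary are used as centres (i);
    every event, including those in the guard strip, may be counted as j.
    """
    if r > guard:
        raise ValueError("r must not exceed the guard width")
    n = len(points)
    centres = [i for i, (x, y) in enumerate(points)
               if min(x, 1.0 - x, y, 1.0 - y) >= guard]
    if not centres:
        raise ValueError("no events left in the interior")
    count = 0
    for i in centres:
        xi, yi = points[i]
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                count += 1
    intensity = n / 1.0  # events per unit area on the unit square
    # Mean number of neighbours per interior centre, divided by the intensity.
    return count / (len(centres) * intensity)

# Tiny hand-checkable example: one interior centre, one event in the guard strip.
print(k_inner_guard([(0.12, 0.5), (0.05, 0.5)], r=0.1, guard=0.1))
```

The sketch also makes the drawback visible: events in the guard strip never act as centres, so part of the data is given up.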

Fig.12 Edge effect correction methods: the outer guard area method, with the guard area marked.

The outer guard area method sets a smaller region within the original redwood seedling pattern as the study area and uses the rest of the original pattern area as the guard area. This is equivalent to enlarging the study area and treating the enlargement as the guard area.

Under the outer guard area method, the guard area is much larger than the original study area.

When analysing the original study area with the K-function, the edge around it is effectively removed, because the guard area lets us see the points outside the study area. With the outer guard area it is easy to find the nearest neighbours of a chosen location: we check whether there are events outside the original study area and whether they are closer to the chosen location than the events inside it. We can then carry out the K-function analysis as though the original study area had no edge, and in this way the edge effect problem is addressed.

Now, let us look at the plots that use the buffer zones method to address the edge effect problem.

Fig.13 The plot of K estimation with 200 points in a fixed area (under the buffer zones correction method)

Fig.14 The K estimation with the redwood seedling data (under the buffer zones correction method)

In the two plots above, the blue line is the K estimate with the buffer zones correction and the red line is the theoretical value K(r). After adjustment by the buffer zones method, the blue curve is slightly narrower than the red one in Fig.13. In Fig.14 there is a large difference between the blue and red curves, and with the buffer zones method the estimated K value is roughly directly proportional to the distance r.

To sum up from the plots, the buffer zones method matters most for clustered patterns: Fig.14 uses clustered data, and there we observe a large difference between the buffer-corrected estimate and the theoretical curve.

The buffer zones method has some benefits and drawbacks.

Among its benefits, the buffer zones method can be applied to a study area of any shape, so the shape of the study area is not a concern. However, the outer guard area method can be used only when data from outside the study area are available and reliable, and the inner guard area method can be used only when data from outside the inner region are available. In some situations the buffer zones method is one of the choices, or perhaps the only available choice.

According to Ikuho Yamada and Peter A. Rogerson, and as the plot of Fig.10 suggests, the non-correction method may outperform the outer guard area method.

The buffer zones method is the simplest of the edge effect correction methods. In the papers reviewed, however, it tends to be set aside as one of the less effective methods, performing similarly to no correction at all, that is, a K-function analysis carried out without any edge correction. The inner guard area method also has an unavoidable drawback: the information in the guard area is discarded.

4.1.2 The toroidal method

This method treats the study area as if it were wrapped around a torus: the top edge of the study area is joined to the bottom edge and the left edge to the right edge, so that the area appears to have no edges.

Because the surface is shaped like a ring, no point can fall outside it, and any point can be reached from any other point. Two points that are far apart in the original plot may therefore be close to each other when the distance is measured across the joined edges.

In particular, points on opposite sides of the plot are close to each other, so no boundary exists and the edge effect problem can be addressed by the toroidal method.

Equivalently, the toroidal method replicates the original study area eight times around itself. It assumes that the point pattern outside the study area is the same as that within the study area.
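A minimal sketch of the toroidal correction on the unit square is given below. Instead of building the eight copies explicitly, it wraps each coordinate difference, which gives the same distances for r up to 0.5. The names `torus_distance` and `k_toroidal` are hypothetical, and the estimator assumes the simple (area / n²) pair-count form.

```python
import math

def torus_distance(p, q):
    """Shortest distance between p and q on the unit square wrapped as a torus."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, 1.0 - dx)  # going across the left/right edge may be shorter
    dy = min(dy, 1.0 - dy)  # going across the top/bottom edge may be shorter
    return math.hypot(dx, dy)

def k_toroidal(points, r):
    """K estimate using toroidal distances (unit square, so area = 1)."""
    n = len(points)
    count = sum(1 for i in range(n) for j in range(n)
                if i != j and torus_distance(points[i], points[j]) <= r)
    return count / (n * n)

# Two events near opposite edges are neighbours once the edges are joined.
pair = [(0.05, 0.5), (0.95, 0.5)]
print(torus_distance(pair[0], pair[1]), k_toroidal(pair, 0.15))
```

The example pair would contribute nothing to an uncorrected estimate at r = 0.15, which is exactly the undercounting near the boundary that the wrap-around removes.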

Fig.15 The edge correction methods: toroidal method

Because the toroidal edge correction has these properties, we can use it to eliminate the edge effect.

Let us consider the plots that use the toroidal method to address the edge effect problem.

Fig.16 The plot of K estimation with 200 points in a fixed area (under the toroidal method)

Fig.17 The plot of K estimation with the redwood seedling data (under the toroidal method)

In Fig.16 and 17, the red line is the K estimate with the toroidal correction and the blue line is the theoretical value K(r).

With the toroidal method, the red curve is close to, or roughly parallel to, the blue one in Fig.16. Fig.17 shows a large difference between the blue and red curves: the red curve shows the estimate after the edge effect has been corrected, moving away from the theoretical curve (blue) to the toroidal-corrected curve (red).

To conclude, the toroidal method is most useful for clustered patterns and for small plots such as Fig.2. Unlike the buffer zones method, it makes full use of all the data in Fig.2, as a comparison of Fig.14 and Fig.17 shows. Because Fig.2 is a clustered pattern and has fewer points than Fig.1, Fig.17 shows a greater difference between the theoretical curve and the toroidal-corrected curve.

The toroidal method also has benefits and drawbacks.

Among the benefits, the toroidal method is a little more effective than the edge correction methods that are specially designed for each statistic.

As for drawbacks, the toroidal method is suitable only for rectangular study areas. Moreover, because it assumes that the point pattern outside the study area is the same as that inside, a bias may occur when a cluster lies close to the edge of the study area. When using the toroidal method, we should therefore check whether the point pattern inside and outside the study area satisfies this assumption.

4.1.3 The Ripley's circumference method

This method is based on the proportion of the circumference of a circle that falls inside the study area.

Ripley's circumference correction method was proposed by Ripley (1976, 1977, 1981) and employed by Getis & Franklin (1987) and Anderson (1992). It is designed specifically for the K-function in spatial data analysis. It is a weighted edge correction method, and its weighted edge-corrected function $\hat{K}(h)$, an approximately unbiased estimator of $K(h)$, is defined as

$$\hat{K}(h) = \frac{A}{n^2} \sum_{i=1}^{n} \sum_{j \neq i} \frac{I_h(d_{ij})}{w_{ij}},$$

where n is the number of events in the study plot, A is the area of the study plot, $I_h(d_{ij})$ is a counter variable (equal to 1 if $d_{ij} \le h$ and 0 otherwise), $d_{ij}$ is the distance between points i and j, h is the distance from an arbitrary point, and $w_{ij}$ is a weighting factor that corrects for the edge effect.

Ripley's circumference correction method performs the edge correction through the weighting factor $w_{ij}$, which is the proportion of the circumference of the circle centred at point i and passing through point j that lies within the study area.
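The weighting factor can be illustrated numerically. The sketch below does not use the explicit formulae for rectangles; instead it approximates the weight by sampling angles around the circle and counting how many sampled positions fall inside the unit square. The estimator assumes the form (A / n²) times the sum of 1 / w_ij over ordered pairs with d_ij ≤ h. The names `ripley_weight` and `k_ripley`, the unit-square study area and the restriction h ≤ 0.5 are all assumptions made for this illustration.

```python
import math

def ripley_weight(centre, radius, n_angles=3600):
    """Approximate share of the circle's circumference inside the unit square."""
    cx, cy = centre
    inside = 0
    for k in range(n_angles):
        theta = 2.0 * math.pi * (k + 0.5) / n_angles  # midpoint of each arc
        x = cx + radius * math.cos(theta)
        y = cy + radius * math.sin(theta)
        if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
            inside += 1
    return inside / n_angles

def k_ripley(points, h):
    """Weighted K estimate on the unit square (area A = 1)."""
    if h > 0.5:
        raise ValueError("this sketch assumes h <= 0.5 so weights stay positive")
    n = len(points)
    total = 0.0
    for i, pi in enumerate(points):
        for j, pj in enumerate(points):
            if i == j:
                continue
            d = math.hypot(pi[0] - pj[0], pi[1] - pj[1])
            if d <= h:
                # A pair near the edge gets weight < 1, so it counts for more.
                total += 1.0 / ripley_weight(pi, d)
    return total / (n * n)

print(ripley_weight((0.5, 0.5), 0.1))  # circle fully inside the square
print(ripley_weight((0.0, 0.5), 0.1))  # centre on an edge
print(ripley_weight((0.0, 0.0), 0.1))  # centre on a corner
```

A pair whose circle lies only half inside the study area is counted twice as heavily, which compensates for the neighbours that could not be observed outside the boundary.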

The mathematical expression is rather complex; a graphical description is much simpler.

There are three possible spatial relations between the circle and the edge of the rectangular or circular study area, which are shown in the following figures.

Fig.18 (a), (b), (c) Edge correction methods: Ripley's circumference method

In Fig.19 and 20, the red line is the K estimate with Ripley's circumference correction and the blue line is the theoretical value K(r).

After adjustment by Ripley's circumference method, the red curve is nearly the same as the blue one in Fig.19. That is, with Ripley's circumference method we obtain a result that is equivalent to the theoretical value K(r), which shows that the method makes little difference for the i.i.d. uniform point pattern.

Fig.19 The plot of K estimation with 200 points in a fixed area (under Ripley's circumference correction method)

Fig.20 The plot of K estimation with the redwood seedling data (under Ripley's circumference correction method)

In Fig.20, there is a large difference between the red curve and the blue one. With Ripley's circumference method, the estimated K value is nearly directly proportional to the distance r.

To conclude, like the buffer zones method and the toroidal method, Ripley's circumference method proves its worth in Fig.20, which uses clustered data.

Ripley's circumference correction method helps to avoid the bias caused by the edge effect for distances smaller than the radius of the circle that circumscribes the study area. For example, it reduces the bias as long as the distance h is less than 70.7% of the side of the area (this result is due to Getis (1983)).
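The 70.7% figure can be checked with a short calculation, assuming a square study area of side s. The circle that circumscribes the square passes through its corners, so its radius R is half the diagonal:

$$R = \frac{1}{2}\sqrt{s^2 + s^2} = \frac{\sqrt{2}}{2}\, s \approx 0.707\, s.$$

The condition that the distance is smaller than R is therefore the same as the distance being less than about 70.7% of the side.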

Ripley's circumference correction method is convenient only for study areas of simple shape, such as circles and rectangles, for which explicit formulae for the weighting factor $w_{ij}$ are given (Cressie 1991). For study areas of arbitrary shape it is difficult to derive the weighting factor $w_{ij}$.

Although Ripley's circumference correction method has these drawbacks, it is an effective correction method because of its weighting factor $w_{ij}$, which gives, for each pair of points i and j, the proportion of the circumference of the circle centred at point i and passing through point j that is contained in the study area.

5. Conclusion

Fig.21 The statistical power of cluster detection (r = 40)

Fig.22 The statistical power of regularity detection (d = 5)

To find the more effective and the least effective of the three edge correction methods, we use the results of Ikuho Yamada and Peter A. Rogerson on the probability of detecting clustering and regularity in clustered and regular patterns. They compared the three edge correction methods above with the non-correction method.

The plots of Fig.21 and Fig.22 come from the paper by Ikuho Yamada and Peter A. Rogerson.

Fig.21 shows the statistical power of clustering detection for the clustering radius r = 40. From Fig.21, the statistical power of clustering detection of Ripley's circumference method, the toroidal method and the non-correction method exceeds 95% up to a distance h of 15, whereas that of the outer guard area method is less than 88%.

Fig.22 shows the statistical power of regularity detection for d = 5, that is, a distance of 5 between points i and j. In Fig.22, Ripley's circumference method, the toroidal method and the non-correction method all appear to perform better than the outer guard area method.

To sum up, Ripley's circumference method and the toroidal method perform much better than the outer guard area method. Even the non-correction method has a higher power of clustering and regularity detection than the outer guard area method.

From the results shown in Fig.21 and Fig.22, we can conclude that the outer guard area method is the least efficient method and is better avoided: carrying out the spatial data analysis without any edge correction may give a better result than using the outer guard area method.

There are other edge correction methods that can be used to account for the points lying outside the study area. This review covers only a small part of the edge correction methods in spatial statistics, but it has already broadened my horizons in statistical research.