According to the Merriam-Webster Dictionary, statistics is "a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data." Various subdisciplines deal with these aspects; mathematical statistics, for example, studies statistics from a mathematical standpoint using tools such as probability theory. Statistics is an important tool for obtaining and understanding the meaning of the countless data in our world.
The earliest known writing on statistics is a 9th-century book entitled "Manuscript on Deciphering Cryptographic Messages", written by Al-Kindi. In it, he gave a detailed description of how to use statistics and frequency analysis to decipher encrypted messages; this was the birth of both statistics and cryptanalysis. (Al-Kadi, 1992)
What Is Spatial Statistics
Spatial analysis, as spatial statistics is often called, is a field of study that uses various techniques to study the topological, geometric, and geographic properties of its subjects. It is used widely in our world, from astronomy, with its studies of the placement of galaxies in the cosmos, through geographic data, to chip fabrication engineering, with its 'place and route' algorithms for building complex wiring structures.
The first statistics on spatially referenced data were tied to maps, which, as representations of whole or partial areas, are in fact models. In 1686 Halley (famous for the discovery of Halley's Comet) drew wind directions on a map to analyze their relationships. In 1734 Bernoulli asked whether the orbits of the six planets then known could be random; in 1970 Watson answered this question for all nine planets known at the time. (Stehlíková, 2001)
In 1935 Sir R. A. Fisher (eugenicist and geneticist) began to account for spatial relations in his field experiments and laid the foundations of randomization and replication. Papadakis in 1984 developed nearest neighbor designs.
Why Spatial Analysis
Most data intended for any kind of analysis are tied to a location in space. Think of medical records: patients have location histories, and so do epidemics. Think of sales records: each sale is made at a particular point of sale (POS) and is thus located in space. Location is not always necessary for evaluating data, but when it is, spatial analysis can ease the process.
Overview of the History of Spatial Analysis of Crime
Spatial analysis of crime originates in the middle of the 19th century with the work of the early social ecologists in France, Guerry (1833) and Quetelet (1842). Their works included maps analyzing population-based rates of crime, suicide, alcoholism, population age structure, family structure, educational levels, and population diversity in 19th-century French "Departments" (départements). These works are among the earliest examples of social ecological crime studies.
The next phase took place in the 1920s within the Chicago School of sociology. Although viewed as primarily empirical rather than theoretical, it is difficult to deny its importance in the theoretical development of community studies and criminology. The social disorganization theory of crime was born from the observations of Shaw and McKay (1942). Thrasher's (1927) census of urban street gangs is another important Chicago School work on crime analysis; Thrasher identified ganglands primarily in parts of the city transitioning from residential to commercial use.
"New Chicago School" and Place-based Crime Theories
The field's evolution led to a revival of contemporary ecological studies of crime at the end of the 20th century. At the 1996 annual meeting of the American Society of Criminology, the question "Whither the Chicago School?" was discussed, which led to the school's revival. The sociological approach evolved into place-based theories, the best known being routine activities theory (Cohen, et al., 1979) and rational choice theory (Cornish, et al., 1986).
The rise of computerized mapping and spatial analysis techniques in the following years made it possible for scientists analyzing crime in space to utilize geographic information systems (GIS).
Curry and Spergel (1988) found crime to be correlated with poverty and a lack of social control, but violence to be correlated with a certain level of social disorganization. Tita (1999) found that gangs form in high-crime neighborhoods; on the other hand, the arrival of gangs in an area does not change local crime levels, the only notable exception being a significant increase in shots fired.
Bernard Cohen's (1980) study of street-level prostitution in New York City is an example of a blend of quantitative spatial measures and qualitative observational studies. The study found that streetwalking was present at all income levels across Manhattan. However, using hand-drawn maps, Cohen identified similarities among the blocks and street corners frequented by prostitutes and their johns; these are the "hotspots" of prostitution activity. Further study of census data and participant observation revealed a lack or absence of young children and young women in these areas, with local households mostly made up of single adults and unrelated room-mates. The study also identified built-environment triggers of prostitution, such as wide streets (to provide a flow of johns), the types of businesses present, and spatial proximity to places suitable for public sex (dark alleys, parks, or lots).
Another of Cohen's (1980) works, Deviant Street Networks, might be considered one of the first empirical studies of the spatial and temporal intersection of likely offenders and the crime-encouraging properties of place proposed by routine activities theory. On the other hand, Cohen's study underscores the importance of specifying the correct size of the areal unit of study, as it examined mainly census tracts. GIS makes it possible to obtain measures and study areal units at smaller levels of aggregation.
Routine activities theory (RAT), first introduced in Cohen and Felson (1979), later refined in Felson (1994), and extended to crime pattern theory in Brantingham and Brantingham (1993), suggests that crime occurs in space and time based on the presence of three factors: 1) a suitable target, 2) a likely offender, and 3) the absence of a capable guardian. RAT focuses heavily on victims (targets) and the reasons and means by which they become targets. A likely offender is an individual capable of and willing to engage in criminal activity. From the perspective of spatial planning, however, the most interesting factor is the last one, the absence of a capable guardian. A capable guardian is not only a person but also the space itself, the environment, which can be heavily influenced by urban and environmental design. This is where spatial planners and their decisions come into effect. For example, high-rise housing might seem to provide better oversight because it increases population density, but residents live vertically and so are physically removed from monitoring activities at street level. (Newman, 1972)
Places which lack a capable guardian can take the form of facilities that are attractive to offenders or that somehow evoke criminal activity. Such facilities can also be rich in suitable targets, or they may be used for activities that increase the risk of likely offenders being present (e.g. alcohol consumption, drug use, drug dealing). In criminology, these places in and around such facilities are called crime hotspots.
These conditions leave this type of housing with fairly few place managers to monitor and control public behavior, and they seriously limit the informal social control exercised over all forms of disruptive behavior, from minor inappropriate activities to more serious illicit ones. Roncek and Francik (1981) found elevated crime levels in and near public housing even after controlling for the composition of the resident population on a variety of attributes. This supports a criminogenic role for the facility itself, independent of the types of people found there.
Aside from physical features, crime at places is apparently influenced by the routine activities that occur there, hence the theory's name. Crime is not distributed evenly or randomly over space. Instead, higher levels of crime infect places with certain types of facilities and not others. In some cases, crime seems to be elevated by a target-rich environment: for example, thefts from 24-hour convenience stores, auto thefts from large parking lots, or robberies of shoppers in heavily frequented commercial areas. (Engstad, 1975)
In others, certain activities such as alcohol consumption seem to contribute to increased levels of violence (Roncek and Bell, 1981). Still other places seem prone to higher levels of crime because of the types of people they attract and deter. Places with abandoned buildings or neglected housing with absent owners are attractive to illegal drug dealers looking for places where they can establish stable marketing locations without fear of owner or neighbor complaints (Eck, 1994).
The concentration of crime in some places was noted in Brantingham and Brantingham (1982). Sherman, Gartin, and Buerger (1989) published one of the first studies to quantify how highly concentrated crime in a city is in relatively few small areas: 3.3 percent of street addresses and intersections in Minneapolis generated 50.4 percent of all dispatched police calls for service. While these studies were done to raise effectiveness in fighting crime, they also serve as proof that there are strong relationships between crime and place.
Crime studies that examine the spatial distribution of crime clearly demonstrate that certain land uses and population characteristics are associated with crime hotspots. Roncek and Maier (1991) found a positive relationship between levels of crime and the number of pubs located in city blocks in Cleveland.
The influence of pubs on crime was even more apparent when they were located in areas with more anonymity and lower guardianship. Five of the top ten hotspots identified in Sherman, Gartin, and Buerger (1989) included bars. Cohen, Gorr, and Olligschlaeger (1993) found that drug hotspots tended to be in areas with bars, neglected businesses, or areas with poverty and low family cohesion.
Skogan and Maxfield (1981) discovered that environmental conditions such as abandoned buildings, public incivilities, disorderly youths, broken windows or other forms of vandalism, public drug use or drinking, prostitution, loitering, noise, litter, and obscene behavior increase community fear of crime. "Broken windows" and other public signs of disorder signal that a community has lost its ability to exercise social control, further encouraging crime (Wilson et al., 1982). This suggests that crime hotspots may arise at first as concentrations of "soft" crimes that later turn into more serious crimes.
Crime hotspots can be predictable concentrations of crime, but they can also be random. When hotspots occur randomly, there is obviously no connection to the place itself; they are induced by other sources. Therefore it is crucial to identify hotspots carefully and to identify their correlation with crime. (Anselin, et al., 2000)
Analyses Utilizing GIS and Spatial Econometrics
Hotspots are a very important tool for analyzing crime patterns in space, but by themselves they contribute very little to understanding crime motives and the reasons behind crime concentrating in hotspots. It is very important to determine whether a hotspot is a real concentration or only a coincidence.
In spatial econometrics it is very important to separate random clusters of data from clusters that are actually correlated. The human mind is very weak at this task because of its tendency to find order and meaning in observations that are in fact random; a classic example is cloud watching, when individuals look for patterns or pictures in clouds. Because of that, simple visual interpretation of a map is inadequate. (Rheingans and Landreth, 1995)
A crime hotspot is a location, or small area within an identifiable boundary, with a concentration of criminal incidents. These places where crime is concentrated at high rates over extended periods of time may be analogous to the small percentage of chronic offenders who are responsible for a large percentage of crime. (Anselin, et al., 2000)
The first to study the life cycle of crime hotspots were Sherman (1995) and Spelman (1995), who described crime hotspots in terms of processes such as initiation, growth, crime-type hardening or escalation in crime seriousness, persistence, decline, displacement, and termination.
How to Identify a Hotspot
First we need to divide the whole study area into grid cells or other areal units of similar size. There are two approaches we can use: 1) fixed boundaries and 2) ad hoc boundaries. Fixed boundaries (e.g., census tracts, police precincts, or uniform grid cells) have the advantage that they can be used with time series and are more easily recognizable, but actual hotspots might cross the boundaries of our units. Ad hoc areal units might better reflect the actual size and location of hotspots but are hard to incorporate into models. (Anselin, et al., 2000)
Each cell in the grid can become a hotspot based on threshold values we have defined: if the number of crimes of a designated type exceeds the fixed threshold, the cell is declared a hotspot for the time period we are examining. This is our rule base.
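As a minimal sketch of such a rule base (all numbers here are hypothetical: simulated incident coordinates, an arbitrary 1 km grid, and an analyst-chosen threshold), the grid-and-threshold approach might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical crime incident coordinates within a 10 x 10 km study area.
x = rng.uniform(0, 10, 500)
y = rng.uniform(0, 10, 500)

# Overlay a uniform grid of 1 km x 1 km cells and count incidents per cell.
counts, _, _ = np.histogram2d(x, y, bins=10, range=[[0, 10], [0, 10]])

# Rule base: a cell is a hotspot for this period if its count exceeds
# a fixed, analyst-chosen threshold.
THRESHOLD = 8
hotspots = counts > THRESHOLD
print(f"{int(hotspots.sum())} of {counts.size} cells flagged as hotspots")
```

Applied per time period, the same rule base yields a hotspot time series for each cell, which is what makes fixed boundaries convenient for longitudinal comparison.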
Sherman and Weisburd (1995) identified hotspots in Minneapolis, Minnesota, of no more than one linear street block, an area in which a police officer can easily see and be seen. Hot times (temporal crime hotspots) were between 7:00 p.m. and 3:00 a.m. In Jersey City, New Jersey, hotspots were defined by intersections and the four connected street blocks, and hot times were from noon to midnight (Weisburd and Green 1994).
Hotspot Modeling and Analysis
Understanding the relationship between place and crime requires knowledge of the dynamics of hotspot development over space and time, with special attention to the ways that a location's facilities and utilization contribute to criminal behavior. This sort of knowledge can be derived from combining theory with empirical research. (Anselin, et al., 2000)
The life cycle of hotspots includes various stages of development, the duration of time spent in each stage, and probabilities of transitions between the stages. To better understand a hotspot we require space and time data of crime and its variables for a sample of cities. Those data should include a consistent rule base for identifying hotspots at different stages of life cycle. Then we will have a better basis for distinguishing random occurrences from systematic hardening of soft-crime hotspots to more serious crimes. (Anselin, et al., 2000)
Once we have described our hotspots, we proceed to prediction through models. As noted above, a variety of soft crimes (e.g., vandalism and public order disturbances) leads to serious crimes like assault and robbery. Predictive models of leading indicators require a dependent variable (e.g., the number of acts of public alcohol consumption per month) together with precursor leading-indicator variables, such as littering in prior months or in contiguous areas.
The Vector Autoregression (VAR) model is a common time series model for estimating and testing leading indicators, and has been used extensively since the 1980s. In these models, the variable we are predicting is explained by its own past values and the past values of all other variables (the leading indicators) in the system (Holden, 1995). The Bayesian Vector Autoregression (BVAR) model is a form of VAR which incorporates Bayesian methods: the model parameters are treated as random variables and prior probabilities are assigned to them. (Wikipedia contributors, 2012)
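To make the VAR idea concrete, here is a minimal sketch under stated assumptions: two simulated zero-mean series stand in for a crime count and a leading indicator (the coefficient matrix and all data are invented), and the VAR(1) coefficients are recovered by ordinary least squares, with each variable regressed on the lagged values of both.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
# True (invented) VAR(1) coefficients for two hypothetical monthly series:
# row 1 = serious crime, which responds to its own lag and the indicator's lag;
# row 2 = the leading indicator, which depends only on its own lag.
A = np.array([[0.5, 0.3],
              [0.0, 0.6]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.5, size=2)

# Least-squares estimate: regress y_t on y_{t-1} for both equations at once
# (intercept omitted, since the simulated process has mean zero).
X, Y = y[:-1], y[1:]
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = B.T  # lstsq solves X @ B = Y, so B.T matches A's orientation
print(np.round(A_hat, 2))
```

With enough observations the estimate is close to the true matrix; in a BVAR, priors would shrink these same coefficients instead of leaving them to unrestricted least squares.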
BVAR first introduced by Litterman (1980, 1986) relies on Bayes' estimates of priors to overcome collinearity and degrees of freedom problems that typically arise in applications of vector autoregressive models. Doan, Litterman, and Sims (1984) introduced the so-called Minnesota priors for BVAR. LeSage and Pan (1995) introduced spatial contiguity to further specify the priors in regional studies. BVAR models have been successful in time series analysis and forecasting models for regional data, especially in exploratory analyses of the appropriate time- and space-lagged model specifications (LeSage 1989, 1990; LeSage and Pan 1995).
Granger and Newbold (1977) introduced rules and tests for a weak form of causality testing based on VAR, relative to the limited information set of variables used. Now known as "Granger causality": factor A "Granger causes" B if lagged A is a significant predictor of B, but lagged B is not a significant predictor of A. Enders (1995) presents a standard F-test to determine Granger causality.
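The F-test version of this idea can be sketched as follows. This is a minimal one-lag illustration with simulated data (the series, coefficients, and function name are all hypothetical): the restricted model uses only the variable's own lag, the unrestricted model adds the other variable's lag, and the F-statistic compares their residual sums of squares.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T = 300
# Simulated series in which x "Granger causes" y, but not the reverse.
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()

def granger_f(b, a):
    """One-lag Granger test: do lags of a help predict b beyond b's own lag?"""
    Y = b[1:]
    X_r = np.column_stack([np.ones(len(Y)), b[:-1]])          # restricted
    X_u = np.column_stack([np.ones(len(Y)), b[:-1], a[:-1]])  # adds lag of a
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return np.sum((Y - X @ beta) ** 2)
    rss_r, rss_u = rss(X_r), rss(X_u)
    dof = len(Y) - X_u.shape[1]
    f = (rss_r - rss_u) / (rss_u / dof)
    return f, stats.f.sf(f, 1, dof)

f_xy, p_xy = granger_f(y, x)  # does x Granger-cause y?
f_yx, p_yx = granger_f(x, y)  # and the reverse?
print(f"x->y: F={f_xy:.1f} p={p_xy:.4f}; y->x: F={f_yx:.1f} p={p_yx:.4f}")
```

In the simulated setup the first test should be strongly significant and the second should not be, matching the one-directional construction of the data.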
Exploratory Spatial Data Analysis
In the 1990s Luc Anselin defined exploratory spatial data analysis (ESDA). ESDA is a collection of techniques to describe and visualize spatial distributions; identify atypical locations or spatial outliers; discover patterns of spatial association, random clusters, or hotspots; and suggest spatial regimes or other forms of spatial heterogeneity. (Anselin, 1994)
Cliff and Ord (1973, 1981) reviewed a large number of spatial statistics and other map summaries in their classic treatments of spatial autocorrelation, which grew out of interest in quantifying patterns in maps. Detection of clusters and outliers in maps is a major concern in epidemiology and medical statistics (Marshall, 1991). The presence or absence of pattern in a map is captured by the concept of spatial autocorrelation, the coincidence of similarity in value and similarity in location. When high values in a place tend to be associated with high values in contiguous areas, and likewise low values with low values, there is positive spatial autocorrelation, or spatial clustering. Conversely, when high values at a location are surrounded by nearby low values, or vice versa, negative spatial autocorrelation is present in the form of spatial outliers. To apply spatial autocorrelation properly we need a point of reference, which is spatial randomness: under spatial randomness, the particular arrangement of crimes on a given map would be just as likely as any other arrangement, and any grouping of high or low values in a particular area would be totally spurious. (Anselin, et al., 2000)
Point pattern analysis
The formal assessment of the presence and extent of spatial autocorrelation depends on the type of data under consideration. The simplest situation is when only the location of a given phenomenon is known (for example, the street addresses where burglaries occurred). In this situation, the primary interest lies in assessing whether these locations, abstracted as points on a map, are seemingly randomly scattered across space, or instead, show systematic patterns in the form of clusters (more points are systematically closer together than they would be in a purely random case) or dispersion (more points are systematically further away from each other than under randomness). Point pattern analysis is concerned with detecting when "significant" deviations from spatial randomness occur. (Anselin, et al., 2000)
To determine whether our data form random clusters, we need to test the random spatial location of individual observations (points). Methods for such analysis are of two types: 1) those based on the multiplicity of observations in a quadrat or cell and 2) those based on proximity. (Stehlíková, 2001)
The quadrat count method is based on Pearson's chi-squared goodness-of-fit test for a Poisson distribution of the counts of observations per cell. This technique is easily carried out in a GIS by overlaying a square grid on our sample points; each cell of the grid becomes a quadrat. Disadvantages of the method are that the quadrat size is determined by the individual preference of the observer and that counts in nearby cells may be correlated (spatial autocorrelation). (Anselin, et al., 2000)
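A common form of this chi-squared test is the index of dispersion: under a Poisson (random) pattern the variance-to-mean ratio of the quadrat counts is about one. A minimal sketch, assuming simulated random points and an arbitrarily chosen 8 x 8 grid:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Simulated incident locations under complete spatial randomness.
pts = rng.uniform(0, 1, size=(400, 2))

# Overlay an 8 x 8 grid; each cell is one quadrat.
counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=8)
counts = counts.ravel()

# Index-of-dispersion form of the chi-squared test:
# chi2 = (k - 1) * sample variance / mean of the quadrat counts.
k = counts.size
chi2 = (k - 1) * counts.var(ddof=1) / counts.mean()
p = stats.chi2.sf(chi2, df=k - 1)
print(f"chi-squared = {chi2:.1f}, p = {p:.3f}")  # small p would suggest clustering
```

Note how the quadrat-size choice mentioned above enters directly: changing `bins` changes the counts, the degrees of freedom, and potentially the conclusion.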
Kernel estimation is an extension of the quadrat approach in which a smooth estimate is derived by moving a window of fixed size over the data; the window becomes the kernel. The average within the window is taken as an indicator of the intensity of the event at that location (e.g., how many violations per square kilometer). A particular implementation of this technique consists of drawing many overlapping circles of variable sizes and assessing the extent to which "clusters" may be present.
Kernel estimation or kernel smoothing is one method for examining large-scale global trends in point data. The goal of kernel estimation is to estimate how event levels vary continuously across a study area based on an observed point pattern for a sample of points (Bailey and Gatrell 1995). Kernel estimation creates a smooth map of values using spatial data. The smoothed map appears like a spatially based histogram, with the level at each location along the map reflecting the point pattern intensity for the surrounding area.
In kernel estimation, a moving three-dimensional function of a given radius or "bandwidth" visits every cell of a grid overlaid on the study area. As the kernel visits each cell, distances are measured from the center of the grid cell to each observation falling within the bandwidth. Each distance contributes to the intensity level of that grid cell, with greater weight given to observations lying closer to the center of the cell.
Bandwidth is crucial because it determines the amount of smoothing applied to a point pattern. In general, a large bandwidth results in a large amount of smoothing, producing a fluid map with low intensity levels, whereas a smaller bandwidth results in less smoothing, producing a spiky map with local variations in intensity. Ideally, the bandwidth should reflect the actual spacing of the points in the distribution; however, there is no firm rule for determining it. (Anselin, et al., 2000)
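The effect of bandwidth can be seen in a small sketch using a Gaussian kernel (via `scipy.stats.gaussian_kde`; the cluster location, bandwidth values, and grid are all arbitrary illustrative choices): a small bandwidth yields a spiky surface with a high peak, a large one a flattened, fluid surface.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
# Hypothetical incidents: one tight cluster plus scattered background noise.
cluster = rng.normal([2, 2], 0.3, size=(100, 2))
noise = rng.uniform(0, 5, size=(50, 2))
pts = np.vstack([cluster, noise]).T  # gaussian_kde expects shape (dims, n)

# Small vs large bandwidth: spiky local detail vs heavy smoothing.
kde_spiky = gaussian_kde(pts, bw_method=0.1)
kde_smooth = gaussian_kde(pts, bw_method=0.8)

# Evaluate both density surfaces on a 50 x 50 grid over the study area.
grid = np.mgrid[0:5:50j, 0:5:50j].reshape(2, -1)
d_spiky = kde_spiky(grid)
d_smooth = kde_smooth(grid)
# Heavier smoothing flattens the surface, lowering the peak intensity.
print(d_spiky.max(), d_smooth.max())
```

Mapping either density array back onto the grid produces exactly the "spatially based histogram" described below.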
By transforming spatial point patterns of criminal incidents into a smooth image, kernel estimation visualizes areas of criminal activity and risk. It offers practical benefits in the spatial analysis of crime, the first being accessibility: kernel estimation allows analysts to visually simplify and examine complex point patterns of criminal incidents.
The value of kernel estimation lies in the fact that the map is displayed to the user as a fluid image, so densities of criminal activity are more easily verified and examined. If the user instead saw a map full of point data, with points often overlaying one another (data overload), it would be hard to read and examine. Kernel estimation also allows analysis of hotspots that cross administrative boundaries. As mentioned before, fixed boundaries (e.g., of a grid) make it possible to analyze sample data over time as well as space: density images can be compared for consecutive or corresponding time periods (e.g., the same month, or year-to-date comparisons in successive years). These provide a context for interpreting short-term changes in relation to long-term trends and seasonal patterns. Kernel-smoothed maps also help reveal the larger spatial context of changes over time.
Proximity-based (distance) methods rely on precise information on the proximity of either the nearest neighbor (nearest neighbor statistics) or a random point to the location of an observation. (Stehlíková, 2001)
Their properties are either derived or approximated analytically, or, more interestingly, based on a computational approach. The latter consists of simulating the location of the same number of points as in the dataset (e.g., the total number of burglaries in a given year) by randomly assigning locations, thus mimicking the null hypothesis of spatial randomness. For each of the simulated patterns, the value of the statistic (or statistics) can be computed, thus yielding a reference distribution to which the statistic for the observed pattern can be compared. This provides an intuitive and highly visual way to assess the degree of non-randomness in a point pattern. For example, this can be applied to the empirical cumulative distribution function for the nearest neighbor distances for each point, or to all the distances between points. (Anselin, et al., 2000)
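The computational approach just described can be sketched in a few lines. This is a minimal Monte Carlo illustration with a deliberately clustered synthetic pattern (the point counts, cluster location, and number of simulations are arbitrary): the observed mean nearest-neighbor distance is compared against a reference distribution built from randomly located point sets of the same size.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(5)

def mean_nn_distance(pts):
    """Mean distance from each point to its nearest neighbour."""
    d, _ = cKDTree(pts).query(pts, k=2)  # k=2: the first hit is the point itself
    return d[:, 1].mean()

# A deliberately clustered synthetic pattern: a tight group plus strays.
observed = np.vstack([rng.normal(0.5, 0.02, size=(80, 2)),
                      rng.uniform(0, 1, size=(20, 2))])
obs_stat = mean_nn_distance(observed)

# Reference distribution: the same number of points located at random,
# mimicking the null hypothesis of spatial randomness.
sims = np.array([mean_nn_distance(rng.uniform(0, 1, size=(100, 2)))
                 for _ in range(199)])
p = (1 + np.sum(sims <= obs_stat)) / (1 + len(sims))
print(f"mean NN distance {obs_stat:.4f}, pseudo p-value {p:.3f}")
```

A small pseudo p-value says the points lie systematically closer together than random placement would produce, i.e. clustering.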
Nearest neighbor statistics have been extended to test for clusters in space and time. For example, the Knox statistic (Knox, 1964) consists of counting how many pairs of events are closer in space and time than would be the case under randomness.
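A Knox-style count can be computed the same Monte Carlo way. In this sketch (hypothetical incidents with invented space and time thresholds), pairs close in both space and time are counted, and the reference distribution is built by shuffling event times over locations, which breaks any space-time interaction while preserving both marginal patterns:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical incidents: locations in a 10 x 10 km area, times over a year.
xy = rng.uniform(0, 10, size=(200, 2))
t = rng.uniform(0, 365, size=200)

# Pairwise spatial distances are fixed; only the times get shuffled.
ds = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)

def knox_count(times):
    """Count pairs closer than 1 km in space AND 7 days in time."""
    dt = np.abs(times[:, None] - times[None, :])
    return int(np.triu((ds < 1.0) & (dt < 7.0), k=1).sum())

knox = knox_count(t)

# Reference distribution under no space-time interaction.
sims = np.array([knox_count(rng.permutation(t)) for _ in range(499)])
p = (1 + np.sum(sims >= knox)) / (1 + len(sims))
print(f"Knox statistic = {knox}, pseudo p-value = {p:.3f}")
```

Since the simulated data here contain no interaction, a non-significant result is expected; on real serial-offence data an excess of close pairs would show up as a small p-value.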
The application of these methods to criminal activity is straightforward. The techniques discussed so far address so-called "general" levels of clustering (or, global spatial autocorrelation) in the sense of assessing the extent to which spatial randomness can be rejected. In many instances, it is interesting to locate "where" the clusters may be present. For example, one may be interested in finding out if the clusters center on particular locations of crime-inducing facilities, such as liquor stores or 24-hour convenience stores. Such tests are referred to as "focused" tests (Besag and Newell 1991) and relate the number of points in a cell (or counts of events) to the distance from a "putative source." Again, the general principle underlying these tests is that deviations from spatial randomness would yield a higher frequency of points close to the supposed source (putative source).
The previous paragraphs dealt with point pattern analysis, but besides point data there are also areal data, such as regions or census tracts.
A fundamental concept in the analysis of spatial autocorrelation for areal data is the spatial weights matrix. This is a square matrix of dimension equal to the number of observations, with each row and column corresponding to an observation. A wide range of criteria may be used to define neighbors, such as binary contiguity (common boundary) or distance bands (locations within a given distance of each other), or even general "social" distance. (Anselin, et al., 2000)
The spatial weights matrix is used to formalize a notion of locational similarity and is central to every test statistic. In practice, spatial weights are typically derived from the boundary files or coordinate data in a geographic information system (Can, 1996).
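As a minimal sketch of building such a matrix (with five invented centroid coordinates and an arbitrary distance band standing in for boundary-file output), a distance-band weights matrix with the usual row-standardization might look like:

```python
import numpy as np

# Centroids of five hypothetical areal units (e.g. census tracts).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                   [0.0, 1.0], [1.0, 1.0]])

# Distance-band criterion: units within 1.1 distance units are neighbours
# (d > 0 excludes each unit from being its own neighbour).
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
W = ((d > 0) & (d <= 1.1)).astype(float)

# Row-standardise so each row sums to one; the spatial lag Wz is then
# a weighted average of the neighbouring values.
W = W / W.sum(axis=1, keepdims=True)
print(W)
```

Binary contiguity would replace the distance criterion with a shared-boundary test, but the resulting matrix plays the same role in every statistic that follows.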
Viewed from a more technical standpoint, almost every test for "global" spatial autocorrelation can be expressed as a special case of a general cross-product or "gamma" statistic (Hubert 1985, 1987; Hubert, Golledge, and Costanzo 1981).
This statistic consists of a sum of cross-products between two sets of terms, one related to the similarity in value between two observations, the other to their similarity in location: Γ = Σi Σj aij wij. In this expression, the aij term corresponds to value similarity, such as a cross product xi xj or a squared difference (xi − xj)², while the wij are elements of a spatial weights matrix. Inference for this general class of statistics is based on permutation. Specifically, a reference distribution is constructed that simulates spatial randomness by arbitrarily rearranging the observed values over the available locations and recomputing the statistic for each of these random arrangements.
Classic test statistics for spatial autocorrelation are the join count statistic, Moran's I, and Geary's c (Cliff and Ord 1973). The join count statistic is appropriate when the data are binary, for example the presence (coded B for black) or absence (coded W for white) of an item by city block. The number of times neighboring spatial units both have B is called a BB join count.
The tests are based on the extent to which the observed number of BB joins (or, WW, BW) is compatible with a null hypothesis of spatial randomness. Similarly, when the data are variables measured on a continuous scale (such as crime rates or counts of homicides), Moran's I and Geary's c statistics measure the deviation from spatial randomness. Moran's I is a cross-product coefficient similar to a Pearson correlation coefficient and scaled to be less than one in absolute value. Positive values for Moran's I indicate positive spatial autocorrelation (clustering), while negative values suggest spatial outliers.
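Moran's I, together with the permutation inference described above, can be sketched in a self-contained example. The lattice, contiguity rule, and smooth test surface here are all invented for illustration: a value gradient over a 5 x 5 rook-contiguity lattice gives strong positive autocorrelation, which the permutation test should flag.

```python
import numpy as np

rng = np.random.default_rng(6)

def morans_i(z, W):
    """Moran's I = (N / S0) * (z'Wz) / (z'z) for centred z, S0 = sum of weights."""
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

# Binary rook-contiguity weights for a 5 x 5 lattice (shared edges).
n = 5
W = np.zeros((n * n, n * n))
for r in range(n):
    for c in range(n):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            if 0 <= r + dr < n and 0 <= c + dc < n:
                W[r * n + c, (r + dr) * n + (c + dc)] = 1.0

# A spatially clustered surface: values rise smoothly with row + column.
z = np.add.outer(np.arange(n), np.arange(n)).ravel().astype(float)
I_obs = morans_i(z, W)

# Permutation inference: shuffle the values over locations and recompute.
perms = np.array([morans_i(rng.permutation(z), W) for _ in range(999)])
p = (1 + np.sum(perms >= I_obs)) / (1 + len(perms))
print(f"Moran's I = {I_obs:.3f}, pseudo p-value = {p:.4f}")
```

Replacing the cross-product core with squared differences in the same permutation framework would give Geary's c instead.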
In contrast to Moran's I, Geary's c coefficient is based on squared deviations. Values of Geary's c less than one indicate positive spatial autocorrelation, while values larger than one suggest negative spatial autocorrelation. Adjustments to Moran's I to account for the variance instability in rates have been suggested in the epidemiological literature, for example, the Ipop statistic of Oden (1995). Extensions of Moran's I to a multivariate setting are outlined in Wartenberg (1985).
When variables are used in standardized form (that is, their mean is zero and standard deviation one), the degree of spatial autocorrelation in a dataset can be readily visualized by means of a special scatterplot, termed the Moran scatterplot in Anselin (1995, 1996). The Moran scatterplot is centered on the mean and shows the value of a variable (z) on the horizontal axis against its spatial lag (Wz, or Σj wij zj; i.e., a weighted average of the neighboring values) on the vertical axis. The four quadrants in the scatterplot correspond to locations where high values are surrounded by high values in the upper right (an above-mean z with an above-mean Wz), or low values are surrounded by low values in the lower left, both indicating positive spatial autocorrelation. The two other quadrants correspond with negative spatial autocorrelation, or high values surrounded by low values (high z, low Wz) and low values surrounded by high values (low z, high Wz). The slope of the linear regression line through the Moran scatterplot is Moran's I coefficient. Moreover, a map showing the locations that correspond to the four quadrants provides a summary view of the overall patterns in the data. Hence, this device provides an intuitive means to visualize the degree of spatial autocorrelation, not only in a traditional cross-sectional setting, but also across variables and over time (Anselin, 1998). Recent illustrative examples of the application of these concepts in homicide studies can be found in Sampson, Morenoff, and Earls (1999) and Cohen and Tita (1999).
An alternative perspective on spatial autocorrelation for data available at discrete locations (points, areas) is to consider these as sampling points for an underlying continuous surface in a geostatistical approach. For example, crime statistics by police station would be used to estimate a continuous crime surface for the whole city. The primary interest in this paradigm lies in spatial interpolation, or kriging. The measure of spatial autocorrelation is taken to be a function of the squared difference between the values for each pair of observations compared with the distance that separates them. Formally, this is carried out in a variogram (or, more precisely, a semi-variogram).
One visualization of the variogram consists of a scatterplot of the squared differences organized by distance band, possibly with a box plot for each distance band (a variogram cloud plot or variogram box plot) (Cressie, 1993). Another visualization focuses on each distance lag separately, in a spatially lagged scatterplot (Cressie, 1984). The mean or median in the variogram cloud plot for each distance band suggests an overall pattern for the change in spatial autocorrelation with distance, and a focus on outliers indicates pairs of observations that may unduly influence this central tendency (Anselin 1998, 1999a).
Local indicators of spatial association (LISA) statistics
The measures of spatial autocorrelation reviewed so far are general, or global, in the sense that the overall pattern in the data is summarized in a single statistic. Paralleling the focused tests of point pattern analysis, local indicators of spatial association (LISA) provide a measure of the extent to which the arrangement of values around a specific location deviates from spatial randomness. Closely related to the focused tests, the Gi and Gi* statistics of Getis and Ord (1992; Ord and Getis, 1995) measure the extent to which the concentration of high or low values within a given distance band around a location deviates from spatial randomness. These statistics are designed to find clusters of high or low values. They can be applied to each location in turn, or with increasing distance bands away from a given location. A general framework for LISA is outlined in Anselin (1995), where local forms are derived for several global statistics, such as the local Moran and local Geary statistics.
The local Moran is closely related to the Moran scatterplot and indicates the presence of local clusters or local spatial outliers. LISA statistics lend themselves well to visualization by means of a GIS, for example, in symbol maps that show the locations with significant local statistics. In addition, when combined with a Moran scatterplot, the locations with a significant local Moran can be classified in terms of the type of association they represent. Routines to estimate and test for spatial autocorrelation are found in a wide range of special-purpose as well as commercial software. Other recent reviews can be found in Legendre (1993) and Levine (1996). Most of these software implementations are specialized and contain only one or a few statistics.
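A minimal sketch of the local Moran, again on an invented four-location line with row-standardized weights, illustrates how each location receives its own statistic and how the global Moran's I emerges as their average:

```python
import numpy as np

# Invented values on four locations along a line
z_raw = np.array([10.0, 8.0, 3.0, 1.0])
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W /= W.sum(axis=1, keepdims=True)  # row-standardized weights

z = (z_raw - z_raw.mean()) / z_raw.std()

# Local Moran for location i: I_i = z_i * (Wz)_i (Anselin, 1995).
# Large positive I_i flags a local cluster (high-high or low-low);
# negative I_i flags a spatial outlier (high-low or low-high).
local_I = z * (W @ z)

# With this standardization, global Moran's I is the mean of the local values
global_I = local_I.mean()
```

Here all four local values are positive (the first two locations form a high-high cluster and the last two a low-low cluster), which is exactly the kind of pattern a symbol map of significant LISA statistics would highlight.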
Modern computational implementations of exploratory data analysis are based on the paradigm of dynamically linked windows, in which the user interacts with different "views" of the data on a computer screen. The views typically consist of standard statistical graphics such as histograms, box plots, and scatterplots, but increasingly include a map as well. The dynamic linking consists of allowing an analyst who uses a pointing device (mouse) to establish connections between data points in different graphs, highlight (brush) subsets of the data and rotate, cut through, and project high dimensional data. Geographical data can easily be included in this framework when viewed as x, y points in a standard scatterplot.
The techniques of exploratory analysis reviewed in the previous section are extremely useful in assessing the existence and location of nonrandom local patterns in spatial data. However, they are also limited by the lack of mechanisms to "explain" the observed patterns. EDA and ESDA are exploratory by nature. They "suggest" potential associations between variables and elicit hypotheses, but the formal testing of these hypotheses is left for confirmatory analysis, typically carried out by means of multivariate regression modeling.
In the specific context of criminal justice, regression analysis plays a crucial role in the attempts to explain the causes of criminal activity. Until recently, the role of space (and space-time) was not explicitly acknowledged in the methodology used in these studies, but it is central in a number of respects. For example, it is well known that urban crimes such as theft and burglary, as well as most categories of violent crimes, are likely to be spatially concentrated in low-income urban areas that have relatively high proportions of unemployed persons and racial minorities. This spatial concentration will tend to result in spatial autocorrelation, which runs counter to the usual assumption of independence in regression analysis. In addition, law enforcement efforts (Chambliss, 1994) and gang activity vary spatially, strongly suggesting the need for an explicit spatial perspective (Roncek, 1993) and the consideration of spatial heterogeneity (spatial structural change). A spatial perspective is further motivated by the findings of large-scale spatial differences for various crimes (for example, urban, suburban, and rural as reported in the Federal Bureau of Investigation's Uniform Crime Reports as well as the Bureau of Justice Statistics' semiannual National Crime Victimization Survey). This in turn has prompted a search for spatial mechanisms such as proximity and diffusion to explain these phenomena (Tolnay, Deane, and Beck 1996; Morenoff and Sampson 1997; Sampson, Morenoff, and Earls 1999).
The Challenge of Spatial Effects
In most of these studies, the regression analysis employs data for cross-sectional units, such as census tracts or counties. As is now increasingly recognized, in this instance specialized methods of spatial regression analysis (spatial econometrics) must be used to avoid potentially biased results and faulty inference (Anselin 1988; Anselin and Bera 1998). This is due to the presence of spatial effects, consisting of spatial dependence and spatial heterogeneity, which violate the basic assumptions underlying classical regression analysis. Statisticians have long been aware of the problems associated with analyzing spatial (geographical) data, but spatial statistical techniques did not disseminate into the empirical practice of the mainstream social sciences until recently (Anselin, 1999b).
The motivation for the explicit incorporation of spatial effects in regression models that explain criminal activity is twofold. On the one hand, crime and enforcement data are readily geocoded, but the spatial scale of observation does not necessarily match the spatial scale of the process under study. For example, the occurrence of certain types of crimes, say dealing in illicit drugs, may be explained by socioeconomic variables and land use data collected at the block level. However, if the illicit drug trading zone for a given group covers multiple blocks, the data for several units of observation will be correlated. Similarly, if unmodeled variables such as "social capital" or "sense of community" spill over across multiple units of observation, a spatial correlation of these "errors" will result. Hence, the concern with accounting for the presence of spatial autocorrelation in a regression model is driven by the fact that the analysis is based on spatial data for which the unit of observation is largely arbitrary (such as administrative units). The methodology focuses on making sure that the estimates and inference from the regression analysis (whether for spatial or aspatial models) are correct in the presence of spatial autocorrelation.
On the other hand, much recent theoretical work in urban sociology, economics, and criminology has emphasized concepts related to the "interaction" of agents, such as copycatting, social norms, neighbor-hood effects, diffusion, and other peer group effects. These theories focus on questions of how individual interactions can lead to emergent collective behavior and aggregate patterns. Here, the need for an explicit spatial model is driven by theoretical concerns and the interest lies in a correct specification of the form and range of interaction and the estimation of its strength.
Spatial statistical techniques
The two different motivations for consideration of spatial effects in regression models lead to methods to handle spatial dependence as a nuisance (data problems) versus substantive spatial dependence (theory driven). Formally, this results in techniques to model spatial dependence in the error terms of the regression model or to transform the variables in the model to eliminate spatial correlation (spatial filtering), versus methods to explicitly add a spatial interaction variable as one of the regressors in the model. Common to all methodological approaches is the need to rigorously express the notion of "neighbor effects," which is based on the concept of a spatial weights matrix, discussed previously. A spatially explicit variable takes the form of a "spatial lag" or spatially lagged dependent variable, which consists of a weighted average of the neighboring values. More precisely, the spatial lag of a dependent variable at location i, y_i, would be Σ_j w_ij y_j, where the weighted sum is over those "neighbors" j that have a nonzero value for the element w_ij in the weights matrix (or, in general, the weight is w_ij). For practical purposes, the elements of the spatial weights matrix are often row-standardized, which facilitates interpretation and comparison across models.
A typical specification of a linear regression equation that expresses substantive spatial interaction (or spatial autocorrelation) is the mixed regressive, spatial autoregressive model, or spatial lag model. This includes, in addition to the usual set of regressors (say, x_i, the regressive part), a spatially lagged dependent variable Σ_j w_ij y_j (the spatial autoregressive part), with a spatial autoregressive coefficient ρ. Formally, y_i = ρ Σ_j w_ij y_j + x_i β + ε_i.
The inclusion of a spatial lag term is similar to a temporal autoregressive term in time series analysis, although there are several important differences that require a specialized methodology for estimation and testing. The interpretation of the spatial lag model is best illustrated with a simple example. Say we were interested in explaining the crime rate by the usual socioeconomic variables as well as by a police intervention measure, and assume that the data are collected at the census-tract level. The spatial lag would capture the average crime rate for neighboring tracts. This measure of "potential" crime is one way to formalize the spatial interaction in the model. Therefore, the significance and value of the autoregressive coefficient have a direct interpretation as an indication of the strength of the spatial interaction. In our example, the estimate for ρ would suggest to what extent the crime rate in each census tract is "explained" by the average of the neighbors. There are two potential pitfalls in this interpretation. First, the spatial lag does not "explain" anything (similar to a time lag in time series), but instead is a proxy for the simultaneity in the whole system. This is best seen in a formal way, but for the sake of simplicity can be thought of as a spatial multiplier. After transforming the model to reduced form, so only "exogenous" variables remain on the right-hand side of the equation, it follows that the value of y at each location (e.g., the crime rate) depends not only on the explanatory variables for that location (the x_i), but also on these variables at all other locations, suitably adjusted to reflect the effect of distance decay. In our example, the presence of a spatial multiplier implies that a change in police intervention at one location (census tract) affects the crime rate not only at that location, but at all other locations in the system as well (suitably decayed), hence the notion of a multiplier.
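The spatial multiplier can be made concrete with a small numeric sketch. Assuming a hypothetical four-tract system on a line, ρ = 0.5, and a unit change in the policy variable in tract 0 (all numbers invented for illustration), the reduced form shows the effect propagating, with distance decay, to every tract:

```python
import numpy as np

n = 4
# Row-standardized line-contiguity weights (invented four-tract system)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W /= W.sum(axis=1, keepdims=True)

rho, beta = 0.5, 1.0  # illustrative coefficients

# Reduced form of the spatial lag model (error term omitted):
# y = (I - rho*W)^{-1} X beta, so a change in x in one tract shifts y everywhere
multiplier = np.linalg.inv(np.eye(n) - rho * W)

dx = np.array([1.0, 0.0, 0.0, 0.0])  # e.g., extra police effort in tract 0
dy = multiplier @ (beta * dx)        # resulting change in y at every tract
```

Every element of `dy` is nonzero, and the magnitudes shrink with distance from tract 0: the effect of the intervention is largest where it occurs but is transmitted, suitably decayed, throughout the system.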
The second problem is due to the use of aggregate entities, such as census tracts or counties, as observational units. The interpretation of the autoregressive term as an indication of "interaction" between units can easily lead to an "ecological fallacy." This follows from the fact that these units are not social agents themselves, but only aggregates (averages) of individual behavioral units. Drawing inferences for individual behavior from relations observed at the aggregate level can only be carried out under a strict set of assumptions (essentially imposing extreme homogeneity), which is clearly unwarranted in the current context.
An alternative interpretation is that the spatial lag model allows for filtering out the potentially confounding effect of spatial autocorrelation in the variable under consideration. The main motivation for this is to obtain the proper inference on the coefficients of the other covariates in the model (the β). For example, spatial autocorrelation of the lag variety may result from a mismatch between the spatial extent of the criminal activity and the census tract as the spatial unit of observation.
From an estimation point of view, the problem with this model is that the spatial lag term contains the dependent variable for neighboring observations, which in turn contains the spatial lag for their neighbors, and so on, leading to simultaneity (the spatial multiplier effect mentioned previously). This simultaneity results in a nonzero correlation between the spatial lag and the error term, which violates a standard regression assumption. Consequently, ordinary least squares (OLS) estimation will yield inconsistent (and biased) estimates, and inference based on this method will be flawed. Instead of OLS, specialized estimation methods must be employed that properly account for the spatial simultaneity in the model. These methods are based either on the maximum likelihood (ML) principle or on the application of instrumental variable (IV) estimation in a spatial two-stage least-squares approach.
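A bare-bones version of the spatial two-stage least-squares idea can be sketched as follows. The data are generated noiselessly from a known spatial lag model (weights, covariate, and coefficients all invented), and the spatially lagged exogenous variable WX serves as the instrument for the endogenous lag Wy. In this error-free setting the procedure recovers the true coefficients exactly; with real data it is only consistent:

```python
import numpy as np

n = 6
# Row-standardized contiguity weights for six locations on a line
Wb = np.zeros((n, n))
for k in range(n - 1):
    Wb[k, k + 1] = Wb[k + 1, k] = 1.0
W = Wb / Wb.sum(axis=1, keepdims=True)

# Noiseless data from y = rho*W*y + X*beta (coefficients invented)
x = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0])
X = np.column_stack([np.ones(n), x])
beta = np.array([2.0, 1.5])
rho = 0.4
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta)

# Spatial 2SLS: instrument the endogenous spatial lag Wy with Wx
Z = np.column_stack([X, W @ y])   # regressors; the Wy column is endogenous
H = np.column_stack([X, W @ x])   # instruments: exogenous X plus its lag
Zhat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]  # first-stage fitted values
theta = np.linalg.solve(Zhat.T @ Z, Zhat.T @ y)  # [beta0, beta1, rho]
```

The estimated vector `theta` matches the true values (2.0, 1.5, 0.4) because no error term was added; the point of the construction is that OLS on `Z` directly would, with a genuine error term, be inconsistent, whereas the instrumented estimator is not.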
In contrast to the lag model, there are a number of ways to incorporate the spatial autocorrelation into the structure of the regression model error term. The most commonly used models are based on spatial processes, such as a spatial autoregressive (SAR) or spatial moving average (SMA) process, in parallel to the time series convention.
The particular form for the process yields a non-diagonal covariance structure for the errors, with the value and sign of the off-diagonal elements corresponding to the "spatial correlation" (that is, the correlation between the error terms at two different locations). An interesting aspect of this correlation structure is the range of interaction that is implied. For a SAR process, every error term is correlated with every other error term, but the magnitude of the correlation follows a distance decay effect. In other words, the implied interaction is global, as in the spatial multiplier of the spatial lag model. In contrast, the SMA process yields local interaction, where only first and second order neighbors have a nonzero correlation. Since this pertains to the error terms in a model, or the "ignored" or "unmeasurable" effects, the two specifications also have different policy implications. For example, if there were an unmeasurable "neighborhood" effect in our model of crime, the SAR specification would imply that change in this effect in one location affects all the locations in the system, whereas in an SMA specification this change would only affect the immediate neighbors. However, more precisely, these measurement errors only pertain to the precision of the estimates, and "on average" their impact is zero on the predicted crime, in contrast to the spatial multiplier in the lag model, in which shocks pertaining to the regressor (X) are transmitted throughout the system.
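The contrast between the two error processes can be checked directly. For a hypothetical four-location line with λ = 0.5 and unit error variance (values invented for illustration), the implied covariance matrices are (I - λW)^{-1}(I - λW)^{-T} for the SAR process and (I + λW)(I + λW)^T for the SMA process:

```python
import numpy as np

n = 4
# Row-standardized line-contiguity weights (invented example)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W /= W.sum(axis=1, keepdims=True)
lam = 0.5  # illustrative spatial error coefficient

# SAR errors u = (I - lam*W)^{-1} e: the covariance is fully dense,
# so every pair of locations is correlated (global range of interaction)
A = np.linalg.inv(np.eye(n) - lam * W)
cov_sar = A @ A.T

# SMA errors u = e + lam*W*e: the covariance vanishes beyond
# second-order neighbors (local range of interaction)
B = np.eye(n) + lam * W
cov_sma = B @ B.T
```

Even for the most distant pair (locations 0 and 3), `cov_sar` is strictly positive, while `cov_sma` is exactly zero there yet nonzero for second-order neighbors. Note also that the diagonal of `cov_sar` is not constant: the spatial process induces heteroskedasticity, with border locations differing from interior ones.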
In space, the error variances are also heteroskedastic, which is not the case in the time domain. The heteroskedasticity is induced by the spatial process and will complicate specification testing (i.e., distinguishing "true" heteroskedasticity from that induced by a spatial process). This is an important distinction between the spatial error processes and their covariance structure and the time series counterpart. An alternative approach to handling spatial processes is to specify the magnitude of the spatial error covariance as a function of the distance that separates pairs of observations.
This "direct representation" approach is inspired by geostatistical modeling and lends itself well to spatial forecasting (or interpolation). In contrast to the spatial process models, there is no induced heteroskedasticity. However, for the direct representation approach to yield a valid covariance (e.g., to avoid negative variances), a number of restrictive assumptions must be satisfied.
The estimation of spatial error models falls under the generic category of regression models with nonspherical error variance. Technically, a form of generalized least squares will be applied, although in contrast to the time domain, there is no simple two-step estimation procedure. Instead, an explicit maximum likelihood approach or generalized moment technique must be followed.
In these methods, the coefficient of the spatial error process is considered a "nuisance" parameter in the sense that accounting for it improves the precision of the estimates for the regressors (β), but in and of itself it is of little interest.
Compared with spatial dependence, spatial effects in the form of spatial heterogeneity can be handled in a fairly straightforward way with standard econometric models. The resulting heteroskedasticity, varying coefficients, or structural instability is only distinct in the sense that the specification of the heterogeneity is in terms of spatial or regional differences (e.g., different crime rates in central city versus suburb). However, because spatial heterogeneity often occurs jointly with spatial dependence (or the two are observationally equivalent), explicit consideration of the latter is required in empirical applications. Examples of techniques that address spatial heterogeneity are spatial analysis of variance (Sokal et al. 1993), spatially varying coefficients as some form of hierarchical linear modeling in the spatial expansion method (Jones and Casetti 1992; Casetti 1997), locally different regression coefficients in the spatial adaptive filter (Foster and Gorr 1986; Gorr and Olligschlaeger 1994), geographically weighted regression (Brunsdon, Fotheringham, and Charlton 1996; McMillen and McDonald 1997), and the correction for spatial outliers by means of Bayesian techniques (LeSage, 1997, 1999).
When observations are available for a cross-section at different points in time, in the form of panel data, it becomes possible to model complex combinations of spatial heterogeneity and spatial dependence.
For example, different model coefficients can be specified for different subregions and/or different time periods; the spatial autoregressive coefficients can be allowed to vary over time, etc. The types of methods appropriate for addressing such models consist of seemingly unrelated regressions, error components, and Bayesian approaches, in conjunction with a spatial lag or spatial error dependence. Overviews of the methodological issues are given in Anselin (1988, 1990b, 1999b) and LeSage (1995).
In practice, the most important aspect of spatial modeling may well be specification testing. In fact, even if discovering spatial interaction of some form is not of primary interest, ignoring spatial lag or spatial error dependence when it is present creates serious model misspecification. Of the two spatial effects, ignoring lag dependence is the more serious offense, since, as an omitted variable problem, it results in biased and inconsistent estimates for all the coefficients in the model; and the inference derived from these estimates is flawed.
When spatial error dependence is ignored, the resulting OLS estimator remains unbiased, although it is no longer most efficient. Moreover, the estimates for the OLS coefficient standard errors will be biased, and, consequently, t-tests and measures of fit will be misleading.
Overview of Classification of Open Spaces
The violations against public order studied in this document always take place in public spaces; therefore, to fully assess the connections and dependencies between the violations and their locations, we need to understand the nature of the open public spaces in which they occur. There are many approaches to the classification of public spaces by various authors. Below are the approaches relevant to the study area and to the analysis methods used.
In general, public spaces can be classified by the following criteria:
accessibility
morphology
distance from housing
the ratio of paved areas
level (scale)
prevalent function
ownership
Open urban spaces are in general divided into private, semiprivate and public. (Blaha, et al., 1986)
The private ones are usually directly connected to housing, and access is restricted to certain individuals (e.g. house owners and their guests). Such spaces include gardens, balconies, loggias, rooftop gardens, etc.
Semiprivate open spaces are, according to Ruland (2003), usually separated from public spaces. According to Rakšányi (Jakušová, 2010), a space can also be classified as semiprivate if special conditions must be fulfilled to enter it, or if there are special rules on when and how it may be accessed.
Public open spaces are spaces, under the open sky or indoors, which are accessible to the public (Spitthöver, et al., 2002). According to Komrska, public space may also have limited access for maintenance and security reasons. Rakšányi: public spaces are areas where inhabitants and visitors can gather and communicate without any obstacles of distance, time or assets. (Jakušová, 2010)
By morphology we can divide open spaces into linear (e.g. streets, river banks, avenues, boulevards, shopping arcades, cycle paths etc.) and areal (e.g. squares, parks etc.).
By Distance from Housing (Laage, et al., 1977)
spaces adjoining housing
spaces within 100 m of housing
spaces within 10 to 15 minutes' walking distance of housing
By the Ratio of Paved Areas
with the dominance of unpaved areas (i.e. natural areas, e.g. parks, urban greenery, etc.)
with the dominance of paved areas
According to Giseke (2007):
zone level (e.g. park in residential area)
city level (e.g. city park)
city part level (e.g. forests and natural reservations)
local level (e.g. local park)
By Prevalent Function
The following classification is by Marques (The Rehabilitation of Mouteira Housing Quarter Through Public Space Design, 2009):
recreation and personal contact - mainly intended for young people and elders, e.g. playgrounds, sport facilities etc.
edge spaces - informal recreation, places where natural features form open spaces attractive to people
pedestrian spaces net - connections between different places, mainly intended for transportation
Classification by Hudeková (Verejné priestory a zeleň [Public Spaces and Greenery], 2010):
parks and pocket parks
other open spaces
public spaces, roads
historical open spaces
spaces with public service
production and industrial facilities
By Ownership
public ownership (accessible to all inhabitants and visitors)
private ownership (university complexes, community parks and gardens, corporate plazas), with set conditions of accessibility