Data mining is the process of extracting knowledge from the huge amounts of data stored in databases, data warehouses and other data repositories. Crime analysis is an application in which data mining plays an important role, both in predicting and in analysing crime. This paper presents a detailed study of clustering techniques and their role in crime applications. Such techniques help law enforcers in investigation and crime prediction.
Key words: Crime data mining, crime data analysis, clustering.
In recent years, the volume of crime has led to serious problems throughout the world. Nowadays criminals make extensive use of modern technologies and hi-tech methods, which enable them to commit crimes on an immense scale. Law enforcers have to meet the challenges of crime control effectively and maintain public law and order. Hence, the creation of a database of crimes and criminals is required. Data mining techniques have considerable influence in fields such as law enforcement, narcotics, cyber crime, human trafficking and high-tech crime. Crime data mining has been applied in law enforcement to retrieve criminal details and other useful information automatically, using a named-entity extraction method. In this method, each word is compared with the noun phrases, and a binary value, either zero or one, is generated to indicate whether the name matches.
Intelligence agencies and the University of Arizona collaborated on the COPLINK project, which applied crime data mining along two dimensions, crime types and security concerns, to analyse crimes and criminals and to address the law-enforcement problem of handling massive databases of police narrative records. Suspects may give false details to police investigators in order to confuse and spoil the proceedings of an investigation. During an investigation, a comparison is therefore needed to find the differences between real entities and deceptive entities. One distance measure is the Euclidean distance, which is calculated between pairs of real and deceptive entities and is used to detect deception. Hence, data mining techniques and clustering algorithms have been developed for better crime analysis, which in turn supports the prediction of future crimes.
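As an illustration of the distance computation mentioned above, the following minimal sketch computes the Euclidean distance between two entity records. It assumes that each entity has already been encoded as a vector of numeric features; the encoding and the sample values are hypothetical and are not taken from the cited work.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical numeric encodings of a real and a deceptive entity record.
real_entity = [0.0, 0.0, 0.0]
deceptive_entity = [3.0, 4.0, 0.0]

print(euclidean_distance(real_entity, deceptive_entity))  # prints 5.0
```

A smaller distance suggests that the two records may describe the same underlying entity.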
The organization of the paper is as follows. Section II discusses related research and applications in crime data analysis. Section III defines the role of data preprocessing in crime data mining. Section IV presents various clustering methods in the crime domain, and Section V presents the conclusion and future work.
II. Related work
Recent developments in crime control applications aim at adopting data mining techniques to aid the process of crime investigation. One of the earlier projects, COPLINK, brought together the Artificial Intelligence Lab of the University of Arizona and the police departments of Tucson and Phoenix to work on crime solving and criminal network analysis . Brown et al. proposed a framework for regional crime analysis (ReCAP), which was built to support crime analysis with both data fusion and data mining techniques. The data mining steps involved in crime investigation begin with the collection of crime data from multiple sources, such as police narrative records and criminal background information. The latter consists of previous investigation files and police arrest records, which are used to determine whether a suspect was involved in any earlier cases. If so, clues from past records featuring the suspect help the investigators to proceed with the case.
Using crime data mining techniques, the most relevant information has been extracted from the vast crime databases maintained by the NCRB (National Crime Records Bureau) in order to locate “crime hot-spots”. This helps law enforcers to predict crimes and to prevent them in the near future. Nath et al. proposed a k-means clustering technique with some enhancements to aid the identification of crime patterns. A semi-supervised learning technique for knowledge discovery has also been developed, which helps to increase predictive accuracy . J. S. de Bruin, K. Cocx, Kosters et al. applied clustering techniques to the analysis of crimes and criminal careers based on four salient factors: the nature, frequency, duration and severity of crime. The binary (BCS) and transformed (TCS) categorical methods are similarity-based methods used to compare corresponding attributes of real and deceptive entities in crime records. Ozgul et al. recently suggested a crime prediction model based on crime details such as the location, the date of the incident and the modus operandi, applied to unsolved terrorist events. An enhanced weighted clustering algorithm called Ak-mode, which consists of two phases, has been used to extract similar case subsets from large crime datasets.
III. Role of data preprocessing in crime data mining
Data preprocessing techniques are mainly used to produce high-quality mining results. Raw data are preprocessed before mining because they come in different formats, are collected from various sources, and are stored in databases and data warehouses. The major preprocessing steps in crime data mining are data cleaning, data integration, data transformation and data reduction.
Data cleaning: fill in missing crime data values; smooth noisy crime data; remove outliers from crime data; resolve inconsistent crime data.
Data integration: merge crime data from multiple data stores.
Data transformation: crime data normalization.
Data reduction: crime attribute subset selection; dimensionality reduction of crime attributes.
Data mining process.
Fig. Data preprocessing steps in crime data mining
Crime data are collected from different sources such as police narrative records, criminal profiles, case histories and log files. In the data cleaning step, missing values are filled in, noisy data are smoothed, outliers are removed and inconsistent data are resolved. The data integration step merges crime data from multiple stores. In the data transformation step, normalization and attribute construction are carried out to standardize the data; after normalization, values fall in the range 0.0 to 1.0. In the data reduction step, attribute subsets are selected from the crime dataset and its dimensionality is reduced. After preprocessing, the standardized data are passed to the mining process, and better results are obtained.
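The normalization to the range 0.0 to 1.0 described above can be sketched with min-max scaling. This is one common way to obtain that range and is an assumption here, since the text does not name the exact normalization formula; the sample attribute and its values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical numeric crime attribute: victim age in years.
victim_ages = [18, 25, 40, 62]
print(min_max_normalize(victim_ages))
```

Scaling every numeric attribute to the same range prevents attributes with large raw values from dominating distance computations in the later clustering step.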
IV. Application of clustering methods in Crime
Clustering methods play an important role in crime applications. The clustering techniques highlighted here are k-means clustering, the Ak-mode algorithm and other similarity methods. After preprocessing, the operational crime data are passed to the clustering techniques, which group crimes of a similar nature into clusters. In this process, many unsolved crimes are also grouped together. The next step is to identify the significant, or decisive, attribute. This may vary from case to case; for example, the age group of the victim may be the decisive attribute in a murder case.
K-means clustering is one of the basic partitioning clustering techniques. Objects representing similar crime cases are grouped together, and they are very dissimilar to the objects in other groups. The algorithm partitions the data into clusters based on their means. Initially, the number of clusters k is specified and the crime cases are grouped into k clusters. The mean of each cluster is calculated from the objects assigned to it, and each object is then reassigned to the cluster with the nearest mean. These iterations are repeated until convergence occurs. Through an iterative process of weighting attributes and crime types, future crime patterns can be detected by detectives or analysts. Unsolved crimes are clustered based on the decisive attribute, and the results are given to the investigators so that they can proceed with the case. K-means is applicable only to numerical attributes, not to categorical attributes.
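The k-means procedure described above can be sketched as follows. This is a plain, unenhanced version written only for illustration; it assumes that each crime case has already been reduced to a tuple of normalized numeric attributes, and the function name, parameters and sample data are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Partition points into k clusters; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct cases as initial means
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each case joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean from the cases assigned to it.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the mean of an empty cluster
        if new_centroids == centroids:  # convergence: the means no longer move
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical cases described by two numeric attributes.
cases = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(cases, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Because the means are arithmetic averages, the sketch only makes sense for numeric attributes, which is the limitation noted above.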
The Ak-mode clustering technique is used for categorical attributes. It has two phases: an attribute weighting phase and a clustering phase. The weight of each attribute is computed from its Information Gain Ratio (IGR). The attribute with the greatest weight is taken as the decisive attribute. The distance between two cases is computed from the differences between their categorical attribute values, and this distance gives the similarity measure. The analyst sets the threshold value α with the help of the computed similarity measures.
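The attribute weighting phase can be illustrated with a small sketch. It is not the published Ak-mode algorithm; it only shows, under stated assumptions, how an information gain ratio could be computed per categorical attribute and used as a weight in a mismatch-based distance. The gain ratio is computed against a class label (here the crime type), which is an assumption, and the attribute names and sample cases are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (base 2) of a list of categorical values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of an attribute about the labels, divided by its split information."""
    total = len(values)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

def weighted_distance(case_a, case_b, weights):
    """Sum of the weights of the attributes on which two cases disagree."""
    return sum(w for a, b, w in zip(case_a, case_b, weights) if a != b)

# Hypothetical cases: (weapon, time_of_day), with an assumed crime-type label.
cases = [("knife", "night"), ("knife", "day"), ("gun", "night"), ("gun", "day")]
crime_types = ["robbery", "robbery", "murder", "murder"]

weights = [gain_ratio([c[i] for c in cases], crime_types) for i in range(2)]
decisive = max(range(2), key=lambda i: weights[i])  # index of the decisive attribute
print(weights, decisive)
print(weighted_distance(cases[0], cases[2], weights))
```

In this toy data the weapon fully determines the crime type, so it receives the larger weight and becomes the decisive attribute. Pairs of cases whose weighted distance falls below an analyst-chosen threshold α would be treated as similar.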
Finally, the binary and transformed categorical similarity methods are discussed. In databases, attribute values are either numerical (quantitative) or categorical (qualitative). For numerical attributes, the difference between two attribute values is calculated directly as the difference between the two values. For categorical attributes, the comparison of two attribute values yields a binary value, 0 or 1: if the values match, the result is 1; otherwise it is 0. This method is called the binary categorical similarity (BCS) method. In the transformed categorical similarity (TCS) method, a similarity table is created for each attribute, and the difference between two attribute values is looked up from it. This difference gives the similarity measure. Hence, various clustering techniques are used to identify crime patterns, which helps crime analysts to proceed with their cases.
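The two categorical comparisons described above can be sketched as follows. The binary method follows the match rule given in the text. The similarity table used for the transformed method is entirely hypothetical, including its attribute and scores, and only illustrates the idea of looking up a graded similarity instead of a strict match.

```python
def binary_similarity(a, b):
    """BCS: 1 if the two categorical values match, 0 otherwise."""
    return 1 if a == b else 0

# Hypothetical similarity table for one attribute (hair colour).
# Keys are unordered pairs, so the lookup is symmetric.
HAIR_SIMILARITY = {
    frozenset(("black", "brown")): 0.7,
    frozenset(("brown", "blond")): 0.4,
    frozenset(("black", "blond")): 0.1,
}

def table_similarity(a, b, table):
    """TCS-style lookup: identical values score 1.0, unlisted pairs score 0.0."""
    if a == b:
        return 1.0
    return table.get(frozenset((a, b)), 0.0)

print(binary_similarity("black", "brown"))                  # prints 0
print(table_similarity("black", "brown", HAIR_SIMILARITY))  # prints 0.7
```

The binary comparison treats black and brown hair as a complete mismatch, whereas the table-based comparison records that they are close, which can matter when a deceptive record alters a value only slightly.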
V. Conclusion and future work
Crime data undergo various data preparation steps: the data are cleaned, inconsistent data are resolved and outliers are removed. Clustering of crime data objects is needed to identify crime patterns, which supports crime analysts and law enforcers in proceeding with investigations and helps to solve unsolved crimes faster. Similarity measures are an important factor in finding unsolved crimes that fit a crime pattern. K-means, Ak-mode and other similarity methods, such as the binary categorical and transformed categorical methods, are used to compute the similarity of attributes, which crime analysts and police need in order to solve unsolved crimes.
In future, some enhancements should be made to the existing algorithms to obtain more accurate results. Improvements in finding similar case subsets would be a good direction for solving crimes more easily. Finally, setting the threshold value without a crime analyst remains a challenge and may be an important task in future.