Data mining for Terrorist Tracking
After the 9/11 attacks, the US government invested billions of dollars in the use of technology to fight terrorism. Its initiative is to dig into the travel, communication, financial and other personal records of people to trace terrorist activities. Data mining, as the name suggests, is about searching databases and extracting the relevant information. US agencies are using data mining to track terrorists. Agency data mining projects such as Pathfinder, Terrorist Information Awareness (TIA) and the Multi-State Anti-Terrorism Information Exchange (MATRIX) dig into and analyze both private and public data to detect suspicious activity. Private sector data comes from industries such as banking, insurance and finance, where companies keep databases of customer information. Public sector data consists of the databases of government agencies such as Motor Vehicle and Social Security. By applying data mining technology to this data, the government can track activities that may relate to terrorism, such as phone communication, money transfers, immigration activity and email.
Although applying data mining to terrorist tracking has certain advantages, there are some issues that need to be considered. The first issue is data quality: data mining results can be significantly affected by inaccurate, insufficient or incomplete data, for example duplicate records, a lack of data standards, untimely updates and human error. The second issue is interoperability, which means using the same working standards across different databases and data mining software. Data sources such as banks, insurers, travel companies and legacy databases run on different database systems, while the government's initiatives for tracking terrorists need to search and analyze many databases simultaneously. The third issue is privacy: is the data collected or searched from multiple databases used for purposes other than those for which it was originally collected? This question concerns the security of an individual's personal information. Access to private data could allow the government to use it for other purposes, such as tracking petty criminals or monitoring individuals' transactions to find tax fraud.
This paper first introduces data mining technology. The next section discusses the government's data mining efforts and programs. It is followed by a section on the issues in using data mining techniques to detect terrorist activities, and then by the right approach to handling those issues.
Introduction to data mining
The goal of any data mining exercise is the extraction of meaningful intelligence, or knowledge from the patterns that emerge within a database after it has been cleaned, sorted and processed (Oscar H. Gandy, Jr, 2002). Data mining helps summarize important information. Organizations in sectors such as finance, banking and insurance use data mining tools to study historical as well as current data from many different dimensions or angles, and to make decisions that help increase revenue and sales and reduce costs. The data mining process consists of the following steps:
Identifying the data-
To identify the data, it is necessary to consider the purpose for which the data is collected. Before identifying the data sources, it is important to understand what kind of information is required to solve the particular business problem. For example, to study customer behavior, data from customer support databases and from the sales department can be considered.
Preparing the data-
Because the data is collected from many sources and loaded into the data model, inconsistencies between sources and errors such as data entry mistakes affect the quality of the data. Therefore, it is important to prepare consistent and clean data for the data model.
Building a model-
Many modeling techniques can be used to build the data model. This phase selects and applies modeling techniques and calibrates their parameters to optimal values (Linacre J.M. Rasch, 2001, p. 826-7). The choice of technique depends on the form of the data, as most modeling techniques have specific requirements about the form of data.
Evaluating the model-
This step consists of reviewing and analyzing the model and the steps taken to build it. The basic aim is to identify problems and issues that were not considered earlier in the process.
Deploying the model-
Deployment consists of querying the data model to generate reports that convert the source data into important information for the business.
Data mining for terrorist Tracking
After the terrorist attacks of September 11, 2001, the government looked for an advanced approach to gathering and analyzing intelligence. The government had previously used data mining to detect financial fraud, tax fraud and similar offenses, and it identified the technique's potential for tracking terrorism. After the 9/11 attacks the government invested in several projects that deal with the collection and analysis of private and public data to track future terrorism. The Government Accountability Office (GAO), at the request of Senator Akaka, released a report in May 2004 that reviews the various data mining initiatives of 52 executive branch agencies. Of the 199 data mining projects listed, 122 collect and store personal information, and 54 “mine” data from the private sector (Shannon R. Anderson).
Nowadays, databases have become necessary for storing data in all public services as well as businesses. These databases store information about every transaction and activity. Access to the government's public data, such as motor vehicle, social security and criminal records, is not enough for tracking terrorists. Therefore, it is important to access private or commercial data, where the chances of detecting terrorist or otherwise suspicious activity are higher. Examples of such private data are an insurance company's information about an individual's policy (the name on the policy, the credit card used for payment and so on) or a travel agency's information about an air ticket (the name on the ticket, the payment method used and so on). If this private or commercial information is combined with government data, data mining techniques can detect discrepancies: a person who traveled under different names can be detected through a shared credit card number, and a person staying at a hotel under a false name can be tracked more easily.
The government's data mining efforts consist of data integration, analysis and result output. In the data integration phase, data from federal agency databases (such as the FBI and Immigration), public databases (such as Motor Vehicle and Social Security) and private sector databases (such as finance companies, insurance companies, airlines and travel agencies) is integrated into a common data warehouse.
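The credit card discrepancy described above can be illustrated with a minimal sketch. The records, field names and function names are invented for illustration and do not come from any actual government or commercial system; the sketch only shows the idea of grouping records by a shared identifier and flagging identifiers that appear under more than one name.

```python
from collections import defaultdict

# Hypothetical, made-up records standing in for commercial travel data.
bookings = [
    {"name": "John Smith", "card": "4111-0000-0000-0001", "flight": "AA100"},
    {"name": "J. Smyth", "card": "4111-0000-0000-0001", "flight": "UA200"},
    {"name": "Mary Jones", "card": "4111-0000-0000-0002", "flight": "DL300"},
]


def names_per_card(records):
    """Group the distinct passenger names seen for each card number."""
    names = defaultdict(set)
    for rec in records:
        names[rec["card"]].add(rec["name"])
    return names


def cards_with_multiple_names(records):
    """Return only the cards that were used under more than one name."""
    grouped = names_per_card(records)
    return {card: sorted(n) for card, n in grouped.items() if len(n) > 1}


flagged = cards_with_multiple_names(bookings)
print(flagged)
```

A card that appears under two names is only a lead for a human analyst: a spelling variant or a shared family card would produce the same result.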
Fig: Data mining for terrorist tracking
Source: GAO-05-866 Data Mining
In the analysis phase, the data stored in the data warehouse is queried according to the type of information needed. Two types of queries are mostly used: pattern-based and subject-based. Pattern-based queries are used to find unusual or suspicious records that do not match predefined record patterns. Subject-based queries are used to search for information related to a known subject using a unique identifier such as a person's name, social security number or driver's license number. The results are displayed to agency personnel, who can analyze them from many dimensions or angles to obtain valuable information.
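The difference between the two query types can be sketched with a small in-memory SQLite table. The table layout, identifiers and threshold are assumptions made for illustration only, not the schema of any real program: the subject-based query starts from a known identifier, while the pattern-based query starts from a rule and returns whoever matches it.

```python
import sqlite3

# Hypothetical warehouse table with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (person_id TEXT, name TEXT, flight TEXT)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [
        ("P1", "A. Example", "F100"),
        ("P1", "A. Example", "F101"),
        ("P1", "A. Example", "F102"),
        ("P2", "B. Sample", "F100"),
    ],
)


def subject_query(conn, person_id):
    """Subject-based: fetch every record tied to one known identifier."""
    rows = conn.execute(
        "SELECT flight FROM bookings WHERE person_id = ? ORDER BY flight",
        (person_id,),
    )
    return [row[0] for row in rows]


def pattern_query(conn, max_bookings):
    """Pattern-based: find anyone whose booking count exceeds a threshold."""
    rows = conn.execute(
        "SELECT person_id, COUNT(*) FROM bookings "
        "GROUP BY person_id HAVING COUNT(*) > ? ORDER BY person_id",
        (max_bookings,),
    )
    return rows.fetchall()


print(subject_query(conn, "P2"))
print(pattern_query(conn, 2))
```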
Data mining programs
After 9/11 the government started several programs for tracking terrorists. Fourteen of these projects analyze intelligence information, and seven of the fourteen mine personal information (Shannon R. Anderson). These seven programs are listed and discussed next.
- TIA (Terrorist Information Awareness program)
- CAPPS II (Computer Assisted Passenger Prescreening System)
- MATRIX (The Multistate Anti-terrorism Information Exchange System)
- SEVIS (The Student and Exchange Visitor Information System)
- US-VISIT (The U.S. Visitor and Immigrant Status Indicator Technology program)
- NSEERS (The National Security Entry-Exit Registration System Program)
- Verity K2 Enterprise
1. TIA -
After 9/11, concerns about national security caused the Defense Advanced Research Projects Agency (DARPA) to start an Information Awareness Office (IAO). The IAO came out with the Total Information Awareness program, intended to support preemptive action, national security warning and national security decision making. The main intention of the TIA program is to build a centralized database that stores people's private transactional data, such as records of bank transactions, credit card purchases, flight bookings, e-mails and website visits. To identify suspicious activity, the government establishes standard patterns of what counts as suspicious, for example too many airline bookings, drastic changes in buying habits or large bank transactions, and then identifies these activities by running pattern-based queries against the huge database.
2. CAPPS II -
CAPPS II is a rule-based system implemented specifically for security and terrorist checks during air travel. The system uses passenger name record (PNR) information, which consists of the passenger's full name, birth date, home and business addresses, phone number, email address, credit card number and type of meal requested, the last of which can be used to infer religion. It also uses commercial information related to the passenger. On the basis of this information, the authorities select passengers who need additional screening. The system also compares the names of the passengers on each flight with the names of suspected terrorists to prevent foreign terrorists from entering the US.
3. MATRIX -
Like TIA and CAPPS II, the MATRIX program is a step taken by government in reaction to the September 11 attacks. MATRIX was developed by a Florida-based IT company called Seisint. It consists of a database that stores information gathered from various states, such as driver's license data, criminal records, vehicle registration records, photographs and fingerprints. Like TIA, MATRIX can run pattern-based queries, which “seek information about people, places, and things based on patterns of activity, none of the components of which might on its own arouse suspicion or be in any way improper” (Mary DeRosa, March 2004). MATRIX was actually used in the post-9/11 investigation, and several suspects were arrested on the basis of the suspect list it produced.
4. SEVIS -
The SEVIS program, started in late 2001, consists of an Internet database that stores information on and keeps track of foreign students and exchange visitors during their stay in the United States. The data about students includes name, address in both the native country and the US, date of birth, dependents, passport, university, type of study and so on. The primary purpose of this data was to track immigration violations, but it is now also used for detecting criminal and terrorist activities. The SEVIS program requires all universities and schools to provide information on their foreign students, such as whether they are attending classes and their progress in their studies, to identify suspicious activities.
5. US-VISIT -
This program requires US visitors and non-immigrants to be fingerprinted and photographed on entry at United States airports. The fingerprints and photographs are compared with the available information on terrorists.
6. NSEERS -
The National Security Entry-Exit Registration System registers entry and exit information and prescreens it for suspicious activity before the entry and exit of any non-immigrant visitor who is male, 18 years or older, and from a country considered high risk for terrorism. According to the Department of Homeland Security, “the program has collected detailed information about the background and purpose of an individual's visit to the United States, the periodic verification of their location and activities, and departure confirmation.”
7. Verity K2 Enterprise -
Verity K2 Enterprise mines data from the intelligence community and internet searches to identify foreign terrorists or U.S. citizens connected to foreign terrorism activities (GAO report). This program combines personal information, information from the private or commercial sector and information from several government agencies to mine data for detecting terrorist activities.
Issues in use of Data mining technique for Terrorist tracking
Although the US government claims that data mining is an effective technique for tracking terrorist activities, there are certain concerns that need to be considered. The following are the important issues in using data mining for terrorist tracking.
1. Data quality -
Data quality is one of the major challenges to getting effective output from data mining. Data quality refers to the correctness and completeness of the data. In most data mining programs for terrorist tracking, data is collected from different sources and integrated into a data warehouse. Each source database is designed to serve a specific purpose, so different databases represent the same data in different ways, which produces inconsistencies and errors. This lowers the data quality of the warehouse and leads data mining techniques to output wrong information. For example, the name of a particular person held in a government database and in an insurance company database can end up as two different records in the data warehouse, because the two databases represent the name differently or because the name was misspelled on entry. This can lead to wrong conclusions, as the person may be flagged as suspicious for apparently using two names. Due to the wide range of possible data inconsistencies and the sheer data volume, data quality is considered to be one of the biggest problems in data warehousing (Erhard Rahm and Hong Hai Do, Dec 2000).
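One common way to reduce the duplicate-name problem in this example is to normalize names before comparing them and to treat near-identical spellings as the same person. The following is a minimal sketch using Python's standard `difflib` module; the function names and the 0.85 similarity threshold are assumptions chosen for illustration, not values taken from any of the programs discussed here.

```python
import re
from difflib import SequenceMatcher


def normalize_name(raw):
    """Lower-case, reorder 'Last, First', and strip punctuation and extra spaces."""
    if "," in raw:
        last, first = raw.split(",", 1)
        raw = f"{first} {last}"
    cleaned = re.sub(r"[^a-z ]", "", raw.lower())
    return " ".join(cleaned.split())


def same_person(a, b, threshold=0.85):
    """Treat two names as one person when their normalized forms are similar enough.

    The threshold is an illustrative assumption and would need tuning on real data.
    """
    ratio = SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
    return ratio >= threshold


print(normalize_name("SMITH, John"))
print(same_person("SMITH, John", "John Smith"))
print(same_person("John Smith", "Jon Smith"))
print(same_person("John Smith", "Mary Jones"))
```

A looser threshold merges more true duplicates but also risks merging two different people, which is the opposite data quality error.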
2. Interoperability -
The interoperability issue is the dissimilarity, or lack of standardization, among the different databases and software used for data mining. Interoperability refers to the ability of a computer system and/or data to work with other systems or data using common standards or processes, and it is a critical part of the larger efforts to improve interagency collaboration and information sharing through e-government and homeland security initiatives (Jeffrey W. Seifert, October 2004). Some of the databases used for data mining still run on legacy systems while some companies use more advanced databases, which creates interoperability problems such as variations in database schema or structure.
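Schema variation of this kind is usually handled by mapping each source's field names and formats onto one common schema before the data is loaded. The sketch below assumes two hypothetical sources, a legacy motor vehicle system and an insurer, with invented column names and date formats; it is an illustration of the idea rather than a description of any real system.

```python
from datetime import datetime

# Hypothetical field mappings: source column name -> common warehouse column name.
FIELD_MAPS = {
    "legacy_dmv": {"LIC_NO": "license_number", "NM": "name", "DOB": "birth_date"},
    "insurer": {
        "driverLicense": "license_number",
        "fullName": "name",
        "dateOfBirth": "birth_date",
    },
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {"legacy_dmv": "%m/%d/%Y", "insurer": "%Y-%m-%d"}


def to_common_schema(source, record):
    """Rename fields and convert the birth date to ISO format (YYYY-MM-DD)."""
    mapped = {FIELD_MAPS[source][key]: value for key, value in record.items()}
    parsed = datetime.strptime(mapped["birth_date"], DATE_FORMATS[source])
    mapped["birth_date"] = parsed.date().isoformat()
    return mapped


dmv_row = {"LIC_NO": "D123", "NM": "John Smith", "DOB": "01/31/1970"}
insurer_row = {
    "driverLicense": "D123",
    "fullName": "John Smith",
    "dateOfBirth": "1970-01-31",
}

# Both rows describe the same person and become identical after mapping.
print(to_common_schema("legacy_dmv", dmv_row))
print(to_common_schema("insurer", insurer_row))
```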
3. Privacy -
When government investigation agencies collect and share information about individual citizens without any probable cause or suspicious act, every innocent American citizen becomes a terrorism suspect. Government counterterrorism programs gather personal data such as credit card information, e-mails sent and received, and phone calls, which seriously affects civil liberties. Some communities, such as the Muslim community, are disproportionately affected because the 9/11 attackers were Muslim and profiles are drawn accordingly. Because of a similarity in name, religion, color or appearance to a terrorist, innocent people are arrested or investigated for nothing. Following K. A. Taipale, the aim here is not to critique or suggest particular legal frameworks or new laws, nor to criticize specific structures or programs, but to try to identify core privacy concerns that might be addressed through the application of certain guiding principles both to the development of these technologies and to the policies governing their use (K. A. Taipale). There is also the threat of personal data being used for purposes other than terrorist tracking; for example, personal information can be used to find petty thieves or for other personal investigations. Efforts to fight terrorism can, at times, take on an acute sense of urgency and this urgency can create pressure on both data holders and officials who access the data (Jeffrey W. Seifert, October 2004).
Right approach for issues in Data mining use for terrorist tracking
Because a number of issues stand in the way of using data mining for counterterrorism, the right approach can help resolve them. Data quality issues arise from human error and from the mix of legacy and more advanced database systems, but it is difficult to avoid human error or to convert all legacy databases to advanced ones. Therefore, proper data cleaning (or data cleansing) techniques should be used to improve the data quality of the data warehouses. Data cleaning, also called data cleansing or scrubbing, deals with detecting and removing errors and inconsistencies from data in order to improve the quality of data (Erhard Rahm and Hong Hai Do, Dec 2000). The quality of the input data can be improved by a proper data cleaning technique that includes data analysis and tool support for removing dirty data. Interoperability problems, for their part, are due to schema differences among the source databases. Many organizations that use data mining for business intelligence use ETL (extract, transform, load) tools, which support both schema integration and data cleaning. Schema integration is done during the transformation process. Data transformations deal with schema/data translation and integration, and with filtering and aggregating data to be stored in the warehouse (Erhard Rahm and Hong Hai Do, Dec 2000). Therefore, data collected from different sources can be transformed and cleaned before being mined to track terrorists.
On privacy, it is important to ask which matters more: the security of the nation or the privacy of the individual? I think the security of the nation is most important, but the privacy of the individual also needs to be considered. I support the privacy advocates who argue in favor of creating clearer policies and exercising stronger oversight. The government should create guidelines and standards for the use of individuals' private information and needs to be more transparent about that use. US citizens should be informed about the policies and guidelines in place. In addition, it should be explained to citizens how their information is used and which information is used. The government should handle private data with adequate control and oversight, taking care that the private information is in safe hands and is used only for the purpose for which it was gathered. Patterns meant to identify terrorists should be designed carefully so that the chance of innocent people being caught is low. I think that with a proper approach from the government and the support of US citizens for data mining techniques in terrorist tracking, the privacy issue can be eliminated.
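A filtering step of the kind performed during transformation can be sketched as follows. The required fields, record layout and function names are hypothetical; the sketch drops incomplete records and exact duplicates, two of the simplest forms of dirty data, and real ETL tools do considerably more.

```python
# Hypothetical required fields for a warehouse record.
REQUIRED = ("name", "license_number")


def is_blank(value):
    """Return True when a field is missing or contains only whitespace."""
    return value is None or not str(value).strip()


def clean(records):
    """Drop incomplete records and exact duplicates, keeping first occurrences."""
    seen = set()
    result = []
    for rec in records:
        if any(is_blank(rec.get(field)) for field in REQUIRED):
            continue  # incomplete record
        # Compare on trimmed, lower-cased values so trivial variants collapse.
        key = tuple(str(rec[field]).strip().lower() for field in REQUIRED)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        result.append(rec)
    return result


raw = [
    {"name": "John Smith", "license_number": "D123"},
    {"name": "john smith ", "license_number": "D123"},
    {"name": "", "license_number": "D999"},
    {"name": "Mary Jones", "license_number": None},
    {"name": "Mary Jones", "license_number": "D456"},
]
cleaned = clean(raw)
print(cleaned)
```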
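The concern about innocent people being caught by a pattern can be made concrete with simple arithmetic. The sketch below uses entirely invented numbers (population size, number of real threats, hit rate and false alarm rate are all assumptions for illustration) to show how even a small false alarm rate can flag many innocent people when real threats are rare.

```python
def screening_outcome(population, true_threats, hit_rate, false_alarm_rate):
    """Return (true positives, false positives, precision) for a screening pattern."""
    innocent = population - true_threats
    true_pos = true_threats * hit_rate
    false_pos = innocent * false_alarm_rate
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


# All numbers below are invented for illustration only.
tp, fp, precision = screening_outcome(
    population=1_000_000, true_threats=10, hit_rate=0.9, false_alarm_rate=0.01
)
print(f"correctly flagged: {tp:.0f}, innocent people flagged: {fp:.0f}")
print(f"share of flagged people who are real threats: {precision:.4%}")
```

Under these assumed figures, nearly everyone the pattern flags is innocent, which is why careful pattern design and human review matter.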
This paper discussed the data mining technology used for detecting terrorist activities by digging into the public as well as private data of individuals. It described the data mining process and the data mining programs started for terrorist tracking. It then listed the main issues in the use of data mining technology and the right approach to overcoming those issues.
Data mining is a powerful tool for the US government and intelligence agencies to detect terrorist activities and prevent future terrorist attacks. The data mining programs started by the government have the capability to identify future terrorist and suspicious activities. Data quality, interoperability and privacy are the three main issues in the data mining technique. These issues can be overcome with the right approach and technology support. Therefore, I think the use of data mining technology should not be stopped, and it should continue to do its surveillance job for the nation.
References
Jeffrey W. Seifert (October 2004), Data mining and the search for security: Challenges for connecting the dots and databases, Congressional Research Service, Library of Congress, Washington, DC
K. A. Taipale, Data mining and domestic security: connecting the dots to make sense of data, Science and Technology Law Review
GAO Report (August 2005), Data Mining: Agencies Have Taken Key Steps to Protect Privacy in Selected Efforts, but Significant Compliance Issues Remain, United States Government Accountability Office
Shannon R. Anderson, Total Information Awareness and beyond: The Dangers of Using Data Mining Technology to Prevent Terrorism, Bill of Rights Defense Committee
Oscar H. Gandy, Jr. (July 2002), Data mining and surveillance in the post-9.11 environment, University of Pennsylvania, presentation to the Political Economy Section, IAMCR, Barcelona
Mary DeRosa (March 2004), Data Mining & Data Analysis for Counterterrorism, Center for Strategic & International Studies Report
Linacre J.M. Rasch (Fall 2001), Data Mining and Rasch Measurement CRISP-DM Measurement Transactions, p. 826-7
Erhard Rahm and Hong Hai Do (Dec 2000), Data Cleaning: Problems and Current Approaches, IEEE Computer Society Vol 23 No 4
Bhavani Thuraisingham, Data Mining, National Security, Privacy and Civil Liberties, The National Science Foundation, Arlington, VA, Volume 4, Issue 2
Diane M. Strong, Yang W. Lee, and Richard Y. Wang (May 1997), Data Quality in Context, Communications of the ACM, Vol. 40, No. 5, p. 103
Vladimir Estivill-Castro and Chris Clifton (2002), Privacy, Security, and Data Mining, Proceedings of the ICDM 2002 Workshop
Stephen E. Fienberg (2006), Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation, Vol. 21, No. 2, 143-154
Dianne Daniel (Jun 9, 2000), Data warehouse cleaning: getting things in order, Computing Canada; ABI/INFORM Global, pg. 13
Marc L Songini (Feb 2, 2004), ETL, Computerworld; ABI/INFORM Global, pg. 23