Defeating terrorism requires a more nimble intelligence apparatus that operates more actively within any country and makes use of advanced information technology. Data-mining and automated data-analysis techniques are powerful tools for intelligence and law enforcement officials fighting terrorism, but privacy concerns over the use of these tools have generated significant fear and controversy. These tools are too valuable to be rejected outright. On the other hand, embracing them without any guidelines or controls for their use poses a great risk that they, and the private information they analyse, will be misused. Policymakers must acquire a greater understanding of data-mining and automated data-analysis tools and craft policy that encourages responsible use and sets parameters for that use.
Key words - Terrorism, Data analysis, Information technology, False positives, Knowledge.
I INTRODUCTION
Defeating terrorism requires a more nimble intelligence apparatus that operates more actively within any country and makes use of advanced information technology. Data-mining and automated data-analysis techniques are powerful tools for intelligence and law enforcement officials fighting terrorism. But these tools also generate controversy and concern. They make analysis of data, including private data, easier and more powerful. Data mining and data analysis are simply too valuable to prohibit, but they should not be embraced without guidelines and controls for their use. Policymakers must acquire an understanding of data-mining and automated data-analysis tools so that they can craft policy that encourages responsible use and sets parameters for that use.
As almost everyone now recognizes, the fight against terrorism requires the government to find new approaches to intelligence gathering and analysis. At the same time, advances in technology provide new opportunities to collect and use information. "Data mining" is one technique with significant potential for use in countering terrorism. Data-mining and automated data-analysis techniques are not new; they are already being used effectively in the private sector and in government. They have generated concern and controversy, however, because they give the government far greater ability to use and analyse private information effectively. This makes private data a more attractive and powerful resource for the government and increases the potential for government intrusion on privacy. Recent high-profile government programs that would explore or employ data-mining and data-analysis techniques for counterterrorism have caused public concern and congressional action, but the debate has not always been fully informed. Resolving this debate intelligently and rationally is critical if we are to move forward in protecting both our security and our liberties.
Policy on data mining and related techniques that impact privacy should not rely solely on prohibition. Policymakers must make informed decisions about how to oversee and control government use of private information most effectively when using these techniques.
II WHAT IS DATA MINING?
A good description of what data mining does is: "discover useful, previously unknown knowledge by analysing large and complex" data sets. Data mining is one step in a broader "knowledge-discovery" process. Data mining itself is a relatively narrow process of using algorithms to discover predictive patterns in data sets. The process of applying or using those patterns to analyse data and make predictions is not data mining. A more accurate term for those analytical applications is "automated data analysis," which can include analysis based on pattern queries (the patterns can be developed from data mining or by methods other than data mining) or on less controversial subject-based queries. The term "data mining" is often used casually to refer both to actual data mining and to the application of automated data-analysis tools. Both sets of techniques are relevant to counterterrorism, and this paper addresses both. It is also important to understand what these terms do not include. Data mining and automated data-analysis tools are not for locating and retrieving pieces of data in databases that might have been hard to find.
One of the first problems with "data mining" is that there are varying understandings of what the term means.
"Data mining" actually has a relatively narrow meaning: it is a process that uses algorithms to discover predictive patterns in data sets. "Automated data analysis" applies models to data to predict behaviour, assess risk, determine associations, or do other types of analysis. The models used for automated data analysis can be based on patterns (from data mining or discovered by other methods) or subject based, which start with a specific known subject.
There are a number of common misconceptions about these techniques. For example, data mining and data analysis do not increase access to private data. Data mining and data analysis certainly can make private data more useful, but they can only operate on data that is already accessible.
Another myth is that data mining and data analysis require masses of data in one large database. In fact, data mining and analysis can be conducted using a number of databases of varying sizes. Although these techniques are powerful, it is a mistake to view data mining and automated data analysis as complete solutions to security problems. Their strength is as tools to assist analysts and investigators. They can automate some functions that analysts would otherwise have to perform manually, they can help prioritize attention and focus an inquiry, and they can even do some early analysis and sorting of masses of data. But in the complex world of counterterrorism, they are not likely to be useful as the only source for a conclusion or decision. When these techniques are used as more than an analytical tool, the potential for harm to individuals is far more significant.
Automated data-analysis techniques can be useful tools for counterterrorism in a number of ways. One initial benefit of the data-analysis process is to assist in the important task of accurate identification. Technologies that use large collections of identity information can help resolve whether two records represent the same or different people. Accurate identification not only is critical for determining whether a person is of interest for a terrorism-related investigation, it also makes the government better at determining when someone is not of interest, thereby reducing the chance that the government will inconvenience that person.
Subject-based "link analysis" uses public records or other large collections of data to find links between a subject - a suspect, an address, or other piece of relevant information - and other people, places, or things. This technique is already being used for, among other things, background investigations and as an investigatory tool in national security and law enforcement investigations. Pattern-based analysis may also have potential counterterrorism uses. Pattern-based queries take a predictive model or pattern of behaviour and search for that pattern in data sets. If models can be perfected, pattern-based searches could provide clues to "sleeper" cells made up of people who have never engaged in activity that would link them to known terrorists.
Perhaps the most significant concern with data mining and automated data analysis is that the government might get it wrong, and innocent people will be stigmatized and inconvenienced. This is the problem of "false positives" - when a process incorrectly reports that it has found what it is looking for. With these tools, a false positive could mean that, because of bad data or imperfect search models, a person is incorrectly identified as having a terrorist connection. A related concern is "mission creep" - the tendency to expand the use of a controversial technique beyond its original purposes. Use of controversial tools may be deemed acceptable given the potential harm of catastrophic terrorism, but there will then be a great temptation to expand their use to address other law enforcement or societal concerns ranging from the serious to the trivial.
As noted earlier, data mining and automated data-analysis tools are not for locating and retrieving hard-to-find records; that Google-type search function is important but separate. Instead, these tools find previously unknown knowledge through links, associations, and patterns in data. Nor are they used to discover just any knowledge; they are used to discover useful knowledge in data. It is possible to find an endless number of patterns and associations in masses of data; many will be statistically significant, but they will not have any real-world significance. An essential and sometimes extremely difficult aspect of data mining and automated data analysis is finding the patterns and associations that have value - the ones that actually mean something. There are two general ways to use automated data analysis: through subject-based queries or pattern-based queries. Subject-based queries start with a specific and known subject and search for more information. The subject could be an identity - a suspect, an airline passenger, or a name on a watch list, for example - or it could be something else specific, like a place or a telephone number. A subject-based query will seek more information about, and a more complete understanding of, the subject, such as activities a person has engaged in or links to other people, places, and things. It will also provide leads to other subjects that can be investigated. "Link analysis" is a type of subject-based query that is already in use in the private sector and in government. Subject-based queries are not "data mining," but they do fall into the category of automated data analysis.
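A subject-based query of the kind described above can be sketched in a few lines. The records, field names, and matching rule below are entirely hypothetical; this is a minimal illustration, not any agency's actual system:

```python
# Illustrative subject-based query: starting from one known subject,
# pull every record that shares an identifying attribute with it.
# All records and field names here are invented.

records = [
    {"name": "A", "address": "12 Elm St", "phone": "555-0101"},
    {"name": "B", "address": "12 Elm St", "phone": "555-0199"},
    {"name": "C", "address": "9 Oak Ave", "phone": "555-0101"},
    {"name": "D", "address": "9 Oak Ave", "phone": "555-0300"},
]

def subject_query(subject, records, fields=("address", "phone")):
    """Return records linked to the subject by any shared field value."""
    return [r for r in records
            if r is not subject and any(r[f] == subject[f] for f in fields)]

links = subject_query(records[0], records)
print([r["name"] for r in links])  # B shares the address, C shares the phone
```

Each linked record then becomes a new lead: running the same query on B or C expands the inquiry outward, which is exactly the "leads to other subjects" behaviour described in the text.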
Pattern-based queries involve identifying some predictive model or pattern of behaviour and searching for that pattern in data sets. These predictive models can be discovered through data mining, or they can come from outside knowledge-intelligence or expertise about a subject. However the patterns are obtained, the process involves looking for occurrences of these patterns of activity in data. Probably the most well-known use of pattern-based searching involves credit card fraud. Banks search databases of credit card transactions, some of which are known to be fraudulent, and determine, through data mining or otherwise, the patterns of fraudulent activity. A simple example of such a pattern is use of a stolen credit card for a small purchase at a gas station-done to confirm whether the card is valid-before making a very significant purchase. The banks then use these patterns to identify fraudulent activity in databases of ongoing credit card transactions and take steps to stop that activity. Another long-standing use of pattern-based queries is by the U.S. Treasury Department's Financial Crimes Enforcement Network (FinCEN) to detect money-laundering activity. FinCEN looks at databases of financial data and identifies patterns of previous known cases of money laundering. For example, money laundering often involves people injecting large amounts of money into the financial system in small increments, under the guise of an existing business, and then using that money to import overpriced goods, so that the money flows out of the United States. None of these steps independently would necessarily be suspicious, but the whole pattern is consistent with money laundering. FinCEN looks for these patterns in data that exists in a variety of databases and uses the information it collects in its enforcement activities.
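The card-fraud pattern described above - a small gas-station test purchase followed by a very large purchase - can be expressed as a simple pattern query. The transactions, thresholds, and time window below are invented for illustration; real fraud models are far more sophisticated:

```python
# Sketch of a pattern-based query over transaction data: flag cards where
# a small gas-station purchase is followed shortly by a large purchase.
# Transactions and thresholds are hypothetical.

from datetime import datetime, timedelta

txns = [
    {"card": "1111", "time": datetime(2024, 1, 1, 10, 0), "merchant": "gas", "amount": 2.00},
    {"card": "1111", "time": datetime(2024, 1, 1, 10, 20), "merchant": "electronics", "amount": 2400.00},
    {"card": "2222", "time": datetime(2024, 1, 1, 11, 0), "merchant": "grocery", "amount": 85.50},
]

def flag_test_then_spend(txns, window=timedelta(hours=1), small=5.00, large=1000.00):
    """Flag cards where a small gas purchase precedes a large purchase within the window."""
    flagged = set()
    for probe in txns:
        if probe["merchant"] == "gas" and probe["amount"] <= small:
            for later in txns:
                if (later["card"] == probe["card"]
                        and probe["time"] < later["time"] <= probe["time"] + window
                        and later["amount"] >= large):
                    flagged.add(probe["card"])
    return flagged

print(flag_test_then_spend(txns))  # {'1111'}
```

The pattern itself (small probe, then large spend) could come from data mining over labeled fraud cases or from expert knowledge; either way, applying it to live transactions, as here, is automated data analysis rather than data mining.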
Both subject-based and pattern-based queries have the potential to be useful in counterterrorism, but we are currently farther along in our ability to deploy subject-based queries effectively in the counterterrorism realm. Moreover, subject-based queries raise somewhat fewer policy difficulties because they are more like the kinds of inquiries that are common in intelligence and law enforcement practice; that is, they are developed from a particularized suspicion or reason for interest and seek additional information. Pattern-based queries are less familiar in the law enforcement and intelligence worlds in that they do not arise from a particular interest in a person, place, or thing. Instead, they seek information about people, places, and things based on patterns of activity, none of the components of which might on its own arouse suspicion or be in any way improper.
III WHY DATA MINING FOR COUNTERTERRORISM?
Although all traditional intelligence collection methods remain important, understanding the terrorists and predicting their actions requires us to rely more on making sense of many small pieces of information.
The September 11, 2001, attacks illustrate this point. Even in hindsight, we can see no single source - other than perhaps an extraordinarily well-placed human asset - that could have provided the full or even a large part of the picture of what was being planned. There were a number of clues, however, that, if recognized, combined, and analysed, might have given us enough to track down the terrorists and stop their plan. Therefore, although we must still focus on improving our ability to collect human and other traditional sources of intelligence, our edge now will come more from breadth of access to information and quality of analysis. For counterterrorism, we must be able to find a few small dots of data in a sea of information and make a picture out of them.
Data-mining and automated data-analysis techniques are not a complete solution. They are only tools, but they can be powerful tools for this new intelligence requirement. Although intuition and continual hypothesizing remain irreplaceable parts of the analytic process, these techniques can assist analysts and investigators by automating some low-level functions that they would otherwise have to perform manually. These techniques can help prioritize attention and provide clues about where to focus, thereby freeing analysts and investigators to engage in the analysis that requires human judgment. In addition, data mining and related techniques are useful tools for some early analysis and sorting tasks that would be impossible for human analysts. They can find links, patterns, and anomalies in masses of data that humans could never detect without this assistance. These can form the basis for further human inquiry and analysis.
One initial potential benefit of the data-analysis process is that the use of large databases containing identifying information assists in the important task of accurate identification. More information makes it far easier to resolve whether two or more records represent the same or different people. For example, an investigator might want to determine whether Hari Krishna boarding a plane is the same person as the H. Krishnan on a terrorist watch list or the H. Krishna that shared a residence with a suspected terrorist. If the government has only names, it is virtually impossible to resolve these identities for certain; if the government has a social security number, a date of birth, or an address, it is easier to make that judgment accurately. The task of identity resolution is far easier to perform when there are large data sets of identifying information to call on. Not incidentally, identity resolution also makes the government better at determining when a person in question is not the one suspected of terrorist ties, thereby potentially reducing inconvenience to that person.
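The identity-resolution point above - that additional attributes such as a date of birth or an address make matching far more reliable than a name alone - can be illustrated with a toy weighted-agreement score. The records, field weights, and scoring rule are assumptions invented for this sketch:

```python
# Toy identity-resolution score: two records that agree only on a similar
# name are ambiguous, but agreement on date of birth or address raises
# confidence sharply. Weights and records are hypothetical.

def match_score(a, b, weights=None):
    """Sum the weights of identifying fields on which two records agree."""
    weights = weights or {"dob": 4, "address": 3, "phone": 3}  # out of 10
    return sum(w for f, w in weights.items() if a.get(f) and a.get(f) == b.get(f))

passenger = {"name": "H. Krishna", "dob": "1970-03-02", "address": "14 Main St"}
watchlist = {"name": "H. Krishnan", "dob": "1970-03-02", "address": "14 Main St"}
sparse    = {"name": "H. Krishna"}  # name alone cannot resolve the identity

print(match_score(passenger, watchlist))  # 7 of 10: same dob and address
print(match_score(passenger, sparse))     # 0 of 10: nothing beyond a similar name
```

The same mechanism cuts the other way too: a high score can confirm that a traveler is *not* the person on the list, which is the inconvenience-reducing benefit the text describes.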
A relatively simple and useful data-analysis tool for counterterrorism is subject-based "link analysis." This technique uses aggregated public records or other large collections of data to find links between a subject-a suspect, an address, or other piece of relevant information-and other people, places, or things. This can provide additional clues for analysts and investigators to follow.
Link analysis is a tool that is available now and is used for, among other things, background checks of applicants for sensitive jobs and as an investigatory tool in national security and law enforcement investigations.
IV 9/11 ATTACKS
A hindsight analysis of the September 11 attacks provides an example of how simple, subject-based link analysis could be used effectively to assist investigations or analysis of terrorist plans. By using government watch-list information, airline reservation records, and aggregated public-record data, link analysis could have identified all 19 September 11 terrorists for follow-up investigation before September 11, 2001.
The links can be summarized as follows:
IV.1 DIRECT LINKS-WATCH LIST INFORMATION
Khalid Almihdhar and Nawaf Alhazmi, both hijackers of American Airlines (AA) Flight 77, which crashed into the Pentagon, appeared on a U.S. government terrorist watch list. Both used their real names to reserve their flights.
Ahmed Alghamdi, who hijacked United Airlines (UA) Flight 175, which crashed into the World Trade Center South Tower, was on an Immigration and Naturalization Service (INS) watch list for illegal or expired visas. He used his real name to reserve his flight.
IV.2 LINK ANALYSIS-ONE DEGREE OF SEPARATION
Two other hijackers used the same contact address for their flight reservations that Khalid Almihdhar listed on his reservation. These were Mohamed Atta, who hijacked AA Flight 11, which crashed into the World Trade Center North Tower, and Marwan Al Shehhi, who hijacked UA Flight 175.
Salem Alhazmi, who hijacked AA Flight 77, used the same contact address on his reservation as Nawaf Alhazmi.
The frequent flyer number that Khalid Almihdhar used to make his reservation was also used by hijacker Majed Moqed to make his reservation on AA Flight 77.
Hamza Alghamdi, who hijacked UA Flight 175, used the same contact address on his reservation as Ahmed Alghamdi used on his.
Hani Hanjour, who hijacked AA Flight 77, lived with both Nawaf Alhazmi and Khalid Almihdhar, a fact that searches of public records could have revealed.
IV.3 LINK ANALYSIS-TWO DEGREES OF SEPARATION
Mohamed Atta, already tied to Khalid Almihdhar, used a telephone number as a contact number for his reservation that was also used as a contact number by Waleed Alshehri, Wail Alshehri, and Abdulaziz Alomari, all from AA Flight 11, and by Fayez Ahmed and Mohand Alshehri, both from UA Flight 175.
Public records show that Hamza Alghamdi lived with Saeed Alghamdi, Ahmed Al Haznawi, and Ahmed Alnami, all hijackers of UA Flight 93, which crashed in Pennsylvania.
IV.4 LINK ANALYSIS-THREE DEGREES OF SEPARATION
Wail Alshehri was roommates with and shared a P.O. Box with Satam Al Suqami, an AA Flight 11 hijacker. Ahmed Al Haznawi lived with Ziad Jarrah, a UA Flight 93 hijacker.
Thus, if the government had started with watch list data and pursued links, it is at least possible that all of the hijackers would have been identified as subjects for further investigation. Of course, this example does not show the false positives-names of people with no connection to the terror attacks that might also have been linked to the watch list subjects.
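Computationally, the degree-of-separation expansion summarized above is a breadth-first search over a link graph: start from watch-list subjects and repeatedly add anyone who shares an address, phone number, or other identifier with someone already found. The graph below is hypothetical, not the actual hijacker data:

```python
# Breadth-first expansion from watch-list seeds over a hypothetical link
# graph. Edges represent shared addresses, phone numbers, and the like.

from collections import deque

links = {
    "seed1": {"p2", "p3"},
    "p2": {"seed1", "p4"},
    "p3": {"seed1"},
    "p4": {"p2", "p5"},
    "p5": {"p4"},
    "unrelated": set(),
}

def expand(seeds, links, max_degree=3):
    """Return every person within max_degree links of a seed, with their degree."""
    found = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if found[person] == max_degree:
            continue  # do not expand past the separation limit
        for neighbor in links.get(person, ()):
            if neighbor not in found:
                found[neighbor] = found[person] + 1
                queue.append(neighbor)
    return found

print(expand({"seed1"}, links))  # everyone except "unrelated", with degrees 0-3
```

The `max_degree` cutoff matters for exactly the false-positive reason the text raises: each additional degree of separation sweeps in more people with no real connection to the seeds.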
Pattern-based data analysis also has potential for counterterrorism in the longer term, if research on uses of those techniques continues. Data-mining research must find ways to identify useful patterns that can predict an extremely rare activity - terrorist planning and attacks. It must also identify how to separate the "signal" of a pattern from the "noise" of innocent activity in the data. One possible advantage of pattern-based searches - if they can be perfected - would be that they could provide clues to "sleeper" activity by unknown terrorists who have never engaged in activity that would link them to known terrorists. Unlike subject-based queries, pattern-based searches do not require a link to a known suspicious subject. Types of pattern-based searches that could prove useful include searches for particular combinations of lower-level activity that together are predictive of terrorist activity. For example, a pattern of a "sleeper" terrorist might be a person in the country on a student visa who purchases a bomb-making book and 50 medium-sized loads of fertilizer. Or, if the concern is that terrorists will use large trucks for attacks, automated data analysis might be conducted regularly to identify people who have rented large trucks, used hotels or drop boxes as addresses, and fall within certain age ranges or have other qualities that are part of a known terrorist pattern. Significant patterns in e-mail traffic might be discovered that could reveal terrorist activity and terrorist "ringleaders." Pattern-based searches might also be very useful in response and consequence management. For example, searches of hospital data for reports of certain combinations of symptoms, or of other databases for patterns of behaviour such as pharmaceutical purchases or work absenteeism, might provide an early signal of a terrorist attack using a biological weapon.
V THE PROCESS
Although there are obvious potential benefits of data-mining and automated data analysis techniques, it is important to have an understanding of the process used in those practices and the risks of error and intrusions on privacy.
V.1 GATHERING AND PROCESSING THE DATA
The first step for data mining and data analysis is identifying, gathering, and processing the data that will be analysed. Doing this requires first identifying what the analysis is intended to discover and the type of data that will be useful. This is not always a simple task. For data mining, researchers have developed "active learning" techniques that can find data that would be useful to collect. The data-mining process itself will often assist in identifying kinds of data that are not useful. One common myth about data mining and automated data analysis is that they require data to reside in one large database. Typically, data for data mining have been combined into a single database, called a data warehouse or data mart, for mining. There are advantages to this approach - it allows for more efficient searching and for easier standardization and cleansing of the data - but it is not necessary. Data mining can be conducted over a number of databases of varying sizes, provided that certain very low size thresholds are exceeded to provide statistical validity. The same is true for automated data analysis.
The final step in this first phase is transforming the data to make them useful. This is often referred to as "data aggregation." It involves gathering the data, "cleansing" them to eliminate redundant and other unusable data, and standardizing them to make searches more accurate. When done well, this process has a significant positive impact on the quality of the data-mining or data-analysis product because it reduces data errors such as false positives and false negatives.
One goal of transforming data for data mining is identity resolution - determining whether disparate identity records all represent one individual or different people. Some high-quality practices for cleansing and standardizing identity data have been developed, including "name standardization" and "address hygiene." Name standardization takes name data and recognizes alternate spellings, misspellings, language variations, and nicknames. Many names, like Viswanathan, can be spelled a number of different ways: Vishwanathan, Viswanath, or even Biswanath, among others. Name standardization causes Vishwanathan and the other alternatives to be treated as Viswanathan, making it possible to match names that might not otherwise appear to be the same. Address hygiene performs a similar function for address data. The more information that is introduced into the process of cleansing and standardizing identity data, the more effective that process becomes. For example, if all you have is three names that are similar, but not identical, you cannot say for sure that they refer to the same person. If for each name you have additional information - a social security number, address, or telephone number - you are more likely to be able to resolve whether the names represent the same person. All of this makes the data set far more accurate, which means later data searches will have fewer false-positive and false-negative results.
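A minimal sketch of the name-standardization step described above: map known spelling variants and transliterations to one canonical form before matching. The variant table here is a tiny invented sample; production systems rely on large curated dictionaries and phonetic matching algorithms:

```python
# Minimal name standardization: reduce known variants to a canonical form
# so that later matching compares canonical names rather than raw strings.
# The variant table is a small invented sample.

CANONICAL = {
    "vishwanathan": "viswanathan",
    "viswanath": "viswanathan",
    "biswanath": "viswanathan",
    "bill": "william",
    "liz": "elizabeth",
}

def standardize(name):
    """Lowercase, trim, and collapse known variants to one canonical spelling."""
    key = name.strip().lower()
    return CANONICAL.get(key, key)

print(standardize("Biswanath") == standardize("Viswanathan"))  # True
print(standardize("Bill") == standardize("William"))           # True
```

Run before a search, this step is what lets a query for one spelling retrieve records filed under another, which is precisely how standardization reduces both false negatives and false positives.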
V.2 FINDING SEARCH MODELS
To conduct an automated data analysis requires a search model. For pattern-based searching, finding and perfecting those models can be a very complex and difficult task. There are several ways to come up with the patterns on which a model is based. Models can be found through data-mining analysis, which is a "bottom-up" approach to finding a model in data: it starts with the data and looks for anomalies or patterns that indicate certain behaviour. With data mining, the process begins with researchers developing a data-mining algorithm. The algorithm is then applied to "training sets" of data, for which the correct answers are known, to find a model. "Top-down" data analysis can also be used to find models. This involves starting with a hypothesis about the model and determining whether it exists in the data. The hypothesis for a "top-down" analysis might come from an initial "bottom-up" review or from knowledge acquired elsewhere. Expertise or intelligence can be the source of a predictive model that will later be applied to data; that is, experts in relevant fields can develop a pattern to use in data-mining analysis. Whatever method is used to discover them, models must be useful - that is, they must be predictive when applied in real-world situations. In data-mining research, producing blind or poorly designed models that are meaningless is sometimes referred to as "data dredging" or "overfitting the model." A significant amount of data-mining research involves finding ways to avoid trivial, misleading, or irrelevant models. A major goal in research on data mining for counterterrorism, for example, is not only to identify terrorist "signatures," but also to find ways to separate those patterns of activity from all other "noise" in databases. Whether models are obtained from data mining or other processes, validating them is critical, and doing so adequately requires real-world testing or realistic simulations using the models.
Also, results should be continually tracked and analysed during use to confirm that they remain valid. An acceptable model would produce few false negatives while keeping false positives to a manageable number that minimally impacts the civil liberties of the innocent. Although automated data analysis using pattern-based predictive models has become relatively common in the private sector, developing these models for counterterrorism presents new and significant challenges for which additional research is necessary. Common commercial models are designed to find patterns that are broadly applicable among data points that are unrelated. For example, a retailer will look for broad patterns from unrelated customer purchase data that will predict future customer behaviour. This is "propositional" data - that is, data about unrelated instances - from a homogenous database of purchase information. For counterterrorism, on the other hand, the challenge is to find patterns in "relational" data - data in which the key facts are relationships between people, organizations, and activities - from a variety of different types and sources of data. This is because there are no broad patterns of terrorist activity; terrorism is too rare. Terrorists operate in loose networks, and effective models must find links among lower-level activities, people, organizations, and events that can allow inferences about higher-level clandestine organizations and activities. The data on these lower-level activities exist in different places, and it is the relationships between them that are important. Terrorist plots are rare and difficult to predict reliably, but the preparatory and planning activities in which terrorists engage can be identified. Detecting combinations of these low-level activities - such as illegal immigration, operating front businesses, money transfers, use of drop boxes and hotel addresses for commercial activities, and having multiple identities - could help predict terrorist plots.
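A toy version of the model-selection and validation concern discussed above: given a labeled training set, choose a score threshold that eliminates false negatives while keeping false positives manageable. The scores, labels, and candidate thresholds are all invented for this sketch:

```python
# Toy "bottom-up" model search: from labeled training data, pick the
# highest score threshold that still catches every positive case, then
# inspect the false-positive cost of that choice. Data are invented.

training = [  # (risk_score, is_positive_case)
    (0.1, False), (0.2, False), (0.3, False), (0.4, False),
    (0.5, False), (0.6, True), (0.7, False), (0.8, True), (0.9, True),
]

def error_rates(threshold, data):
    """Count (false positives, false negatives) for a given score threshold."""
    fp = sum(1 for s, pos in data if s >= threshold and not pos)
    fn = sum(1 for s, pos in data if s < threshold and pos)
    return fp, fn

# highest candidate threshold with zero false negatives on the training set
best = max(t for t in (0.5, 0.6, 0.7, 0.8) if error_rates(t, training)[1] == 0)
print(best, error_rates(best, training))  # 0.6 with one false positive left
```

Even this tiny example shows the trade-off the text describes: the threshold that catches every true case still flags an innocent instance, and a real model must be validated on fresh data, not just the training set, to show that this trade-off holds up.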
V.3 DECISION MAKING
The final stage of the data-mining and data-analysis process involves conducting the searches, interpreting the results, and making decisions about how to use those results. In the context of government use of these techniques for counterterrorism, very significant policy issues arise at this stage. A key issue is the degree to which decisions are made automatically, based on the results of automated data analysis. These techniques are most useful as tools to inform analysis and decisions made by humans, not to substitute for them. In the commercial realm, some steps are taken automatically - with little or no human intervention - based on results of automated data analysis. For example, a retailer might apply data-mining models to predict the buying interests of a particular shopper based on his past purchases and those of others in the retail database. In that case, an automatic recommendation might be sent to the shopper, without the intervention of an employee. Patterns developed from data mining are also sometimes used to automatically "trigger" creditworthiness decisions. In most cases, however, data-mining results will be used as "power tools" for humans engaged in analysis or investigation. Certainly in the government, where the stakes of any action taken can be quite high, the results of automated data analysis are most appropriately used to inform human analysis, focus resources, or inspire additional investigation. Indeed, in the complex world of counterterrorism, application of data-mining models and related techniques is likely to be useful at several stages of a multistage process of developing a complete picture out of many "dots." Analysts might use these techniques to evaluate the significance of leads or suspicions, to generate those leads, to structure or order an investigation, or to acquire additional information along the way.
But they are not likely to be useful as the only source for a decision or conclusion in investigations or analysis. The decision-making stage is also significant because it is where many legal, policy, procedural, or technical controls on the acquisition or use of private information could be imposed. An example of this type of control would be technology that allows access to private information only for certain individuals or after certain permissions have been obtained. Controls might also include a requirement of approval by a neutral third party, based on a standard, before a government employee may obtain private information.
VI FALSE POSITIVES
Perhaps the most significant concern with data mining and automated data analysis is that the government might get it wrong and innocent people will be stigmatized as "terrorists" simply because they engaged in unusual patterns of behaviour or have some innocent link to a suspected terrorist. A major challenge in the use of these techniques is addressing the possibility of bad data or imperfect search models that result in "false positives." If automated data analysis is conducted on vast sets of data gathered from a variety of sources, data quality is inevitably an issue because many records will contain incorrect or obsolete information. If the data are not corrected or "cleansed" before they become the basis for government data analysis, inaccurate or incomplete identification could result. This means either false negatives - a significant security issue - or false positives that incorrectly identify people as matches or links. Even if the data quality is adequate, there is an additional false-positive problem with pattern-based searches: if the data-mining model cannot separate the "noise" of innocent behaviour from the "signal" of terrorist activities, innocent behaviour will be viewed as suspicious. A critical issue is what the government does with false-positive results. If data mining and automated data analysis are used correctly as "power tools" for analysts and investigators - a way to conduct low-level tasks that will provide clues to assist analysts and investigators - false positives are less dangerous. Data-mining results will then lead only to more analysis or investigation, and false positives can be discovered before there are significant negative consequences for the individual. But the stakes are so high when fighting catastrophic terrorism that there will be great temptation for the government to use these techniques as more than an analytical tool. Government actors will want to take action based on the results of data-analysis queries alone.
Such action could include detention, arrest, or denial of a benefit. Even if the government later corrects its mistake, the damage to an individual's reputation may already be done, with longer-term negative consequences. Even when an error is identified, correcting it can be difficult. Procedures for correcting watch lists and similar records are often inadequate, and systems that give citizens a chance to seek redress for this kind of error either do not exist or are extremely difficult to use. In addition, if false-positive search results have been disseminated to other databases, they will be difficult to locate and correct. Although the technology exists to trace inaccurate data and correct cascading occurrences, it has not been a priority, and its implementation lags far behind the technology for collecting and analysing data.
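The capability described here, tracing a disseminated record so a later correction can "cascade" to every copy, can be sketched in a few lines. This is an illustrative design, not a description of any real system: the class names, the registry of downstream databases, and the record format are all assumptions made for the example.

```python
# Hypothetical sketch of cascading correction: a registry remembers which
# downstream databases received each record, so that a later correction
# can be pushed to every copy rather than leaving stale false positives.

class Registry:
    """Tracks dissemination of records to downstream databases."""

    def __init__(self):
        self.downstream = {}   # record_id -> set of database names holding it
        self.databases = {}    # database name -> {record_id: payload}

    def disseminate(self, record_id, payload, db_name):
        # Deliver a copy of the record and log where it went.
        self.databases.setdefault(db_name, {})[record_id] = payload
        self.downstream.setdefault(record_id, set()).add(db_name)

    def correct(self, record_id, new_payload):
        # Cascade the correction to every database that holds a copy.
        for db_name in self.downstream.get(record_id, ()):
            self.databases[db_name][record_id] = new_payload

reg = Registry()
reg.disseminate("w123", {"name": "J. Doe", "status": "match"}, "watchlist_a")
reg.disseminate("w123", {"name": "J. Doe", "status": "match"}, "agency_b")

# A false positive is identified; without the dissemination log, the stale
# "match" in agency_b's database would be hard to find and fix.
reg.correct("w123", {"name": "J. Doe", "status": "cleared"})
print(reg.databases["agency_b"]["w123"]["status"])  # cleared
```

The point of the sketch is the dissemination log itself: correcting the originating database is easy, but without a record of where copies went, the false positive lives on downstream.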
Research to solve the problem of false positives is really about perfecting the data-analysis process itself. One cause of false-positive results is "bad" or "dirty" data. Technology already exists that goes a long way toward resolving the problems of bad or incomplete data leading to faulty identification in large data sets; the key is that more information improves the fidelity of the data. As described in section III, given enough information, data-cleansing techniques such as name standardization and address hygiene make identity resolution highly effective. There is always room for improvement, though, and research on data cleansing continues. One of the goals of DARPA's TIA research was to find ways to increase the accuracy of analysis of nonconforming data from multiple sources.48 In addition, large data aggregators use algorithms that evaluate the historical accuracy of different data sources and use those evaluations to "score" the accuracy of a particular identity or other search result.
Eliminating false positives generated by a pattern-based data-mining model requires perfecting the model. A model must look for accurate patterns and be able to separate the "signal" of those patterns from the "noise" of innocent transactions in the data. Research on pattern-based data mining for counterterrorism must therefore include model accuracy as a primary goal.
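A minimal illustration of the cleansing techniques named above, name standardization and address hygiene feeding identity resolution, follows. The nickname and abbreviation tables are tiny assumed samples; production systems use far larger dictionaries and fuzzier matching, so this is a sketch of the idea, not of any actual aggregator's algorithm.

```python
import re
import unicodedata

# Assumed sample dictionaries; real cleansing systems use much larger ones.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}
STREET_ABBREVS = {"st": "street", "ave": "avenue", "rd": "road"}

def standardize_name(name: str) -> str:
    # Strip accents, lowercase, drop punctuation, expand common nicknames.
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.sub(r"[^a-z ]", "", name.lower()).split()
    return " ".join(NICKNAMES.get(t, t) for t in tokens)

def standardize_address(addr: str) -> str:
    # Lowercase, drop punctuation, expand street-type abbreviations.
    tokens = re.sub(r"[^a-z0-9 ]", "", addr.lower()).split()
    return " ".join(STREET_ABBREVS.get(t, t) for t in tokens)

def same_identity(rec_a: dict, rec_b: dict) -> bool:
    # Two records resolve to one identity if cleansed name and address agree.
    return (standardize_name(rec_a["name"]) == standardize_name(rec_b["name"])
            and standardize_address(rec_a["addr"]) == standardize_address(rec_b["addr"]))

a = {"name": "Bill O'Neil", "addr": "42 Oak St."}
b = {"name": "William ONeil", "addr": "42 Oak Street"}
print(same_identity(a, b))  # True
```

Without cleansing, these two records would count as different people, producing exactly the kind of faulty identification the text describes; with it, superficially different records resolve to one identity.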
VII CONCLUSION
1. Defeating terrorism requires a more nimble intelligence apparatus.
2. Data-mining and automated data-analysis techniques are powerful tools for intelligence and law enforcement officials fighting terrorism.
3. But privacy concerns with the use of these tools have generated significant fear and controversy.
4. These tools are too valuable to be rejected outright. On the other hand, embracing them without any guidelines or controls for their use poses a great risk that they, and the private information they analyse, will be misused.
5. Policymakers must acquire a greater understanding of data-mining and automated data-analysis tools and craft policy that encourages responsible use and sets parameters for that use.