
Data mining to combat terrorism and the roots of privacy concerns.

Publication: Ethics and Information Technology
Publication Date: 01-DEC-05

Article Excerpt
Abstract. Recently, there has been an intense debate in the US about the government's use of data mining in its fight against terrorism. Privacy concerns in fact led Congress to terminate the funding of TIA, a program for advanced information technology to be used in combating terrorism. The arguments put forward in this debate, more specifically those found in the main report and minority report by the TAPAC, established by the Secretary of Defense to examine the TIA issue, will be analysed to trace the deeper roots of this controversy. This analysis will in turn be used as a test case to examine the adequacy of the usual theoretical frameworks for these kinds of issues, in particular the notion of privacy. Whereas the dominant theoretical framing of the notion of privacy revolves around access to information, most of the core arguments in the debate do not fit into this kind of framework. The basic disagreements in the controversy are not about mere access; they involve both access and use. Furthermore, whereas the issue of access by itself refers to a more or less static situation, the real disagreements concern much more the organisational dynamics of the use of information, the mechanisms in the organisation that control these dynamics, and the awareness present within the organisation of the 'social risks' these dynamics represent. The bottom-line question is whether the assessment of these gives sufficient reason for trust.

Key words: data mining, ethics, privacy, risk, security, systems of subliminal enticement, terrorism

Introduction

Data mining is an emerging technology that is perceived as highly promising in a number of areas and that is increasingly used and developed. Recently, data mining techniques for combating terrorism have raised extensive discussion in the US. This discussion provides a good starting point for exploring some analytical categories with which to identify the main ethical issues connected with data mining in this context. Furthermore, such an analysis can produce clarifying observations where the discussion is confused.

Data mining techniques were and are involved in several governmental programs in the US, such as Terrorist Information Awareness (TIA, involving an integrated database system to identify potential foreign terrorists) and Multistate Anti-terrorism Information Exchange (MATRIX, connecting the databases of the participating states). A column by William Safire in the New York Times (1) triggered a public discussion on data mining under TIA and other programs. Its strong impact is illustrated by the fact that in January 2003 Congress imposed a moratorium on data mining under TIA as well as under similar programs until more information had been provided on these programs, and in September of the same year decided to terminate the funding of TIA.

In this article, various arguments in this controversy are analysed, in particular the arguments that can be found in the main and minority reports of the Technology and Privacy Advisory Committee (TAPAC), which was set up specifically to examine this matter. These arguments will be confronted with the common theoretical framing of such issues, particularly with respect to privacy concerns. It will turn out that the main origins of disagreement do not fit into the usual frameworks.

What is data mining?

Broadly conceived, data mining is a field of computer science that can be described as concerned with 'the extraction of implicit, previously unknown, and potentially useful information from data' (2) or 'extracting useful information from large data sets or databases'. (3) Whereas the term Knowledge Discovery in Databases (KDD) is commonly used to cover the whole trajectory from data preparation up to implementation, the term 'data mining' tends to be restricted to the actual extraction process itself. Since there is some confusion concerning the precise meaning of the term 'data mining', and since it is important to have a proper understanding of what kind of basic techniques are involved, a brief explanation is due regarding the meaning of a phrase like 'implicit information'.

Databases are constructed to answer specific questions ('queries'). A database is set up in such a way that the queries that are specific to it can be processed easily, by means of direct linkages between the items that the queries may connect. For instance, a company collects some personal data about its employees, such as name, address, marital status, salary, bank account, etc.; on entering the name (or personal identification number) of an employee, the administration database should then be able to answer queries like 'what is this person's address?' or 'what is this person's salary?' With some more effort, however, it may also be possible to get answers to slightly less obvious questions like 'which employee is living at this address?' or even 'which male employees are unmarried and frequently absent?' Of course, if these are not standard queries, some additional programming or combination of answers to standard queries will be necessary. Even more effort may be needed when answers are sought that require information from several separate databases. Information like this exists in the database only in an implicit form, in the sense that the database was not set up to answer such questions and is not structured in such a way as to find the answer in the most straightforward possible way (i.e., through a standard query). This, then, is a first type of activity that could be described as 'data mining': searching a (large) database (or a set of coupled databases) for items with a specific combination of characteristics that does not correspond to a standard query. Although in common language the term 'data mining' certainly seems appropriate here, this kind of activity is usually not included under the term 'data mining' in the computer science literature.
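The contrast between a standard query and a search for a non-standard combination of characteristics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the table layout, the sample employees, the helper names `build_employee_db`, `address_of` and `unmarried_men_often_absent`, and the cut-off of ten days for 'frequently absent' do not come from any real system.

```python
import sqlite3


def build_employee_db():
    """Create a small in-memory personnel database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "name TEXT PRIMARY KEY, address TEXT, sex TEXT, "
        "marital_status TEXT, salary INTEGER, days_absent INTEGER)"
    )
    rows = [
        ("Alice", "1 Main St", "F", "married", 52000, 2),
        ("Bob", "2 Oak Ave", "M", "unmarried", 48000, 15),
        ("Carol", "3 Elm Rd", "F", "unmarried", 61000, 12),
        ("Dave", "4 Pine Ln", "M", "married", 45000, 20),
        ("Erik", "5 Birch Way", "M", "unmarried", 50000, 1),
    ]
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def address_of(conn, name):
    """Standard query: the table is keyed on name, so this is a direct lookup."""
    row = conn.execute(
        "SELECT address FROM employees WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def unmarried_men_often_absent(conn, min_days_absent=10):
    """Non-standard query: combines characteristics the table was not keyed on.

    The threshold for 'frequently absent' is an arbitrary assumption.
    """
    cursor = conn.execute(
        "SELECT name FROM employees "
        "WHERE sex = 'M' AND marital_status = 'unmarried' AND days_absent >= ? "
        "ORDER BY name",
        (min_days_absent,),
    )
    return [row[0] for row in cursor.fetchall()]


conn = build_employee_db()
print(address_of(conn, "Alice"))
print(unmarried_men_often_absent(conn))
```

In a table this small the two queries cost about the same; the difference the text points to only becomes visible at scale, where the first query can use the key on `name` while the second has to inspect every row or rely on additional programming.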

A second mode of operation is that we let the computer itself search for (frequently occurring or otherwise significant) combinations of characteristics in a database or collection of databases. For instance, a supermarket database could be searched to find products that are often bought together. Or the police may want to map networks of criminals or criminal activities. For such searches, clustering and other algorithms exist or can be developed. This mode of operation is often called 'descriptive data mining', in contrast to the third type to be discussed next.
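The supermarket example can be sketched as a simple co-occurrence count. This is only one of the simplest possible approaches, not the clustering algorithms the text alludes to; the function name `frequent_pairs`, the sample baskets and the support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return product pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase records, one list per customer visit.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["beer", "crisps"],
    ["bread", "butter", "beer"],
    ["milk", "crisps"],
]

print(frequent_pairs(baskets, min_support=3))
```

No question about bread or butter is put to the data here; the pair emerges from the search itself, which is what distinguishes this mode from the query-driven search described earlier.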

A third mode of operation is that patterns are searched for and used with the aim of predicting certain characteristics. For instance, the police may be interested in characteristics that could be indicative of criminal activities. Such patterns may be discovered from a small subset of those activities that are known to be connected to criminality; the predictive value is then tested on a different subset of activities known to be connected to criminality; finally, the pattern may be used as indicative of the potential criminality of activities that were not already known to be so. These kinds of searches are called 'predictive data mining'.
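The three steps described here (discover a pattern on one subset, test it on a different subset, then apply it to new cases) can be sketched with a deliberately crude 'pattern': a single threshold on one numeric characteristic. The data, the feature and the function names are hypothetical, and the sketch assumes that the labelled subsets also contain cases known not to be of interest, which the text does not state explicitly. Real predictive data mining uses far richer models.

```python
def learn_threshold(train):
    """Pick the threshold that classifies the most training examples correctly.

    Each example is (value, label); a value at or above the threshold
    is predicted to be a positive case.
    """
    best_threshold, best_correct = None, -1
    for candidate in sorted({value for value, _ in train}):
        correct = sum((value >= candidate) == label for value, label in train)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def accuracy(threshold, labelled):
    """Fraction of labelled examples the threshold rule classifies correctly."""
    hits = sum((value >= threshold) == label for value, label in labelled)
    return hits / len(labelled)


def predict(threshold, value):
    """Apply the learned pattern to a case whose label is not known."""
    return value >= threshold


# Step 1: discover the pattern on one labelled subset (hypothetical data).
train = [(1, False), (2, False), (3, False), (7, True), (8, True), (9, True)]
threshold = learn_threshold(train)

# Step 2: test its predictive value on a different labelled subset.
held_out = [(2, False), (4, False), (6, True), (10, True)]
print(threshold, accuracy(threshold, held_out))

# Step 3: use the pattern as an indication for a case not already labelled.
print(predict(threshold, 8))
```

On this toy data the rule fits the first subset perfectly but misclassifies one of the four held-out cases, which is why the testing step matters before a pattern is treated as indicative of anything.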

Although the distinction between descriptive and predictive data mining frequently occurs in the computer science literature, the semantics again is not always entirely clear-cut. A 'descriptive' pattern such as products that are often bought together in a supermarket can be used to change the display, grouping those articles together; this could in a sense already...
