Data Mining

Read Complete Research Material

DATA MINING

Data Mining

Data Mining

Literature Review

Data mining refers to the statistical analysis techniques used to search through large amounts of data to discover trends or patterns. Data mining is an especially powerful tool in the examination and analysis of huge databases. With the advent of the Internet, vast amounts of data are accumulating. As well, the amount of data that can be generated from a single scientific experiment where stretches of DNA are affixed to a glass chip can be staggering. Visual inspection of the data is no longer sufficient to make a meaningful interpretation of the information. Computer-driven solutions are required. For example, to analyze the DNA chip data, the discipline of bioinformatics—essentially a data mining exercise—emerged in the 1990s as a powerful melding of biology and computer science. (Allen, 1994, pp. 40-42)

The collection of intelligence and the monitoring of the activities of a government or an organization also involves sifting through great amounts of data. Coded information can be inserted into data transmissions. If this information escapes detection, it can be used for undesirable purposes. The ability to extract the suspect information from the background of the other information is of tremendous benefit to security and intelligence agencies. (Allen, 1994, pp. 40-42)

An example of data mining that is of relevance to espionage, intelligence and security is the use of computer programs—such as the Carnivore program of the United States Federal Bureau of Investigation—to screen thousands of email messages or Web pages for suspicious or incriminating data. Another example is the screening of radio transmissions and television broadcasts for codes.

The formulas used in data mining are known as algorithms. Two common data mining algorithms are regression analysis and classification analysis. Regression analysis is used with numerical data (quantitative data). This analysis constructs a mathematical formula that describes the pattern of the data. The formula can be used to predict future behavior of data, and so is known as the predictive model of data mining. (Anand, 1993, pp. 19-24)

For example, from a database of terrorists who have corresponded using emails, predictions could be made as to who will send an email and to whom. This would aid efforts to intercept the transmission. This type of data mining is also referred to as text mining.

Data that is not numerical (i.e., colors, names, opinions) is called qualitative data. To analyze this information, classification analysis is best. This model of data mining is also known as the descriptive model.

The data mining process involves several steps:

Defining the problem.

Building the database.

Examining the data.

Preparing a model to be used to probe the data.

Testing the model.

Using the model.

Putting the results into action.

Database construction and model preparation—in essence the building of the framework for the mining exercise—requires about 90% of the data mining effort. If these fundamentals are done correctly, the use of the model will uncover the data that is of potential significance.

In July 2002, the Intelligence Technology Innovation Center, which is administered by the United States Central Intelligence Agency (CIA), pledged up to $8 million to the ...
Related Ads