I would like to take this opportunity to thank my research supervisor, family, and friends for their support and guidance, without which this research would not have been possible.
Abstract
Data mining, as we use the term, is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules. For the purposes of this research, we assume that the goal of data mining is to allow a corporation to improve its marketing, sales, and customer support operations through a better understanding of its customers. Keep in mind, however, that the data mining techniques and tools described here are equally applicable in fields ranging from law enforcement to radio astronomy, medicine, and industrial process control.
In fact, hardly any of the data mining algorithms were first invented with commercial applications in mind. The commercial data miner employs a grab bag of techniques borrowed from statistics, computer science, and machine learning research. The choice of a particular combination of techniques to apply in a particular situation depends on the nature of the data mining task, the nature of the available data, and the skills and preferences of the data miner. Data mining is largely concerned with building models. A model is simply an algorithm or set of rules that connects a collection of inputs (often in the form of fields in a corporate database) to a particular target or outcome.
Key words
Data Mining, Information Retrieval, and Data Warehousing.
Issues in Data Mining and Information Retrieval
Introduction
Data mining is the drilling down for lost data that has lain dormant, sometimes for years. Often a company has not been aware it possessed this data—usually because of decentralized database management, lack of relational database systems, or the existence of legacy systems with old and forgotten databases. The real value of the data lies in analyzing it to reveal or create relationships that have been previously undiscovered. Having huge banks of data is of no value whatsoever if you don't bother to evaluate it. Evaluation can relate to anything from sales records to seasonal correlations; it can be applied to any supplier-customer relationship, whether in the private or public sector or in industrial, commercial, or consumer markets. (Tukey, 50)
The results of data mining can be grouped as follows:
Association of events that can be correlated. A computer purchase, for example, is likely to involve the simultaneous purchase of a printer.
Sequences, in which one event leads to another. The purchase of a computer and printer may be followed by the purchase of a scanner.
Classification through the recognition of patterns. These can be based on any relevant data—income, sales, location, or even average summer rainfall! It all depends on how you see the data benefiting your business.
Forecasting. This is a natural extrapolation from the other results and can facilitate more accurate projections. Projected beer consumption, for example, could also be related to future consumption of peanuts or potato chips.
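The association result described above can be made concrete with two standard measures, support and confidence. The following is a minimal sketch, not a method taken from the sources cited here: the purchase baskets are invented for illustration, and the names `transactions`, `support`, and `confidence` are hypothetical.

```python
# Hypothetical purchase baskets, invented purely for illustration.
transactions = [
    {"computer", "printer"},
    {"computer", "printer", "scanner"},
    {"computer"},
    {"printer"},
    {"computer", "printer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also hold the consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), baskets) / base


# Rule under test: a computer purchase implies a printer purchase.
rule_support = support({"computer", "printer"}, transactions)
rule_confidence = confidence({"computer"}, {"printer"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, three of the five baskets contain both items and four contain a computer, so the rule has a support of 0.60 and a confidence of 0.75. Real data sets would normally also require minimum thresholds for both measures before a rule is treated as meaningful.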
In reality, successful data mining starts with data integration. The integration of disparate legacy systems and databases reduces ...