Data Mining


Introduction

Data mining is the science of extracting useful information from large data sets or databases. It is used to build empirical models that are derived not from the principles governing the underlying process or mechanism, but from the observed data itself, and it provides a description of those observations. It essentially aims to better understand the structure and important characteristics of the data, and to discover and extract patterns from data sets. It draws on many disciplines, such as database management, statistics, artificial intelligence, and data visualization. The extracted information can be used for information management, query processing, decision making, process control, and many other applications. Companies recognize the importance of data mining, the extraction of information from data, and of putting such information to commercial or industrial use in new and innovative ways to generate income and expand opportunities (Pyle, 1999).

Discussion

History of Data Mining

The generation of models from large amounts of data is not a recent phenomenon. Before models can be created, data must be collected. In China, Emperor Tang Yao is credited with ordering a census of crops in 2238 BC, and in Egypt the pharaoh Amasis organized a census of his population in the fifth century BC. It was only in the seventeenth century that people began to analyze data in search of common characteristics. In 1662, John Graunt published his book "Natural and Political Observations Made upon the Bills of Mortality", in which he analyzed mortality in London and tried to predict outbreaks of the bubonic plague. In 1763, Thomas Bayes showed that one can determine not only probabilities from observations derived from experience, but also the parameters relating to these probabilities. Presented for the particular case of the binomial distribution, this result was extended independently by Laplace, leading to a general formulation of Bayes' theorem. In 1805, Legendre published an essay on the method of least squares, which allows a set of data to be compared to a mathematical model.
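For reference, the two results mentioned above can be stated in modern notation (the notation here is ours, not that of the original essays). Bayes' theorem gives the probability of a hypothesis H in light of observed evidence E, and Legendre's least squares criterion chooses the model parameters that minimize the sum of squared differences between the data and the model:

\[
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
\qquad
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \bigl( y_i - f(x_i; \theta) \bigr)^2
\]

Here \(y_i\) are the observed data points, \(f(x_i; \theta)\) is the mathematical model being fitted, and \(\theta\) denotes its parameters.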

From 1919 to 1925, Ronald Fisher developed the analysis of variance as a tool for his project of medical statistical inference. The 1950s saw the advent of computer technology and machine computation. In the same period, methods and techniques emerged such as segmentation, neural networks, and genetic algorithms, followed in the 1960s by decision trees and the method of mobile centers; these techniques allowed researchers to exploit data and discover increasingly accurate models. In France, Jean-Paul Benzécri developed correspondence analysis in 1962. The advent ...
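To make one of these early techniques concrete, the following is a minimal sketch of the method of mobile centers, today better known as k-means clustering. The two-dimensional toy data, the choice of two clusters, and all names in the code are our own illustrative assumptions, not drawn from the historical sources.

import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Method of mobile centers (k-means): alternate between assigning
    each point to its nearest center and moving each center to the
    mean of the points assigned to it."""
    rng = np.random.default_rng(seed)
    # Initialize the centers by picking k distinct data points at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every center,
        # then the index of the nearest center for each point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its cluster;
        # keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving: converged
        centers = new_centers
    return centers, labels

# Toy data: two loose groups of 2-D points (illustrative only).
data = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
                 [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])
centers, labels = k_means(data, k=2)
print(centers)  # one center near (1, 1) and one near (5, 5); order may vary

Each iteration alternates an assignment step (every point joins its nearest center) with an update step (every center moves to the mean of its points), which is exactly the "mobile centers" idea: the centers migrate until they stop moving.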