The typical approach to churn prediction is to use a sufficiently large data set that contains both churning and non-churning customers. This set is analyzed to construct a classifier. The task of the classifier is to decide, given a customer's data, whether churn is more or less likely. Such classifiers are constructed using, for instance, neural networks, Bayesian statistics or decision trees built with heuristics such as CART or C4.5.
| | Actual Churners | Actual Non Churners |
|---|---|---|
| Predicted Churners | True Positive | False Positive |
| Predicted Non Churners | False Negative | True Negative |

Table 1: Churn prediction categories
The quality of the output is then measured in terms of sensitivity, specificity and accuracy. Table 1 shows the categorization of churn predictions. The sensitivity of a classifier is the number of records correctly predicted to be members of the class (true positives, in our case churners predicted as churners) divided by the total number of actual members of the class (true positives + false negatives). The specificity is the number of records correctly predicted not to be members of the class (true negatives, non churners predicted as non churners) divided by the number of all records that do not belong to the class (true negatives + false positives). Usually a Receiver Operating Characteristic (ROC) curve is used to plot sensitivity against 1 - specificity, the false positive rate. Accuracy is defined as the percentage of correct predictions among all predictions. These quality measures are used to adjust the parameters of the classifier until a reasonable quality of prediction is achieved. According to Domingos, an accuracy of about 90% is sufficient for a classifier to predict churning.
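As a minimal sketch, the three quality measures can be computed directly from the four counts of the churn prediction table, using the standard definitions sensitivity = TP / (TP + FN), specificity = TN / (TN + FP) and accuracy = (TP + TN) / total. The function name `churn_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not taken from any data set discussed here.

```python
def churn_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity and accuracy from confusion-table counts.

    tp: churners predicted as churners
    fp: non churners predicted as churners
    fn: churners predicted as non churners
    tn: non churners predicted as non churners
    """
    actual_churners = tp + fn
    actual_non_churners = tn + fp
    if actual_churners == 0 or actual_non_churners == 0:
        raise ValueError("both classes must be present to compute the measures")
    total = actual_churners + actual_non_churners
    return {
        "sensitivity": tp / actual_churners,
        "specificity": tn / actual_non_churners,
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts for 1000 customers, 100 of whom actually churned.
metrics = churn_metrics(tp=80, fp=30, fn=20, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.95 while the sensitivity is only 0.8, which suggests why accuracy is usually read together with the other two measures when churners are a small share of the customers.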
Regression is considered to be a good technique for identifying and predicting customer satisfaction. For each of the variables in a regression model the standard error rate is calculated using SPSS. The variables that are most significant for churn prediction under linear regression are then selected and a regression model is constructed. Since the prediction task in churn prognosis is to classify a customer as a churner or non churner, the prediction attribute takes only two values, and logistic regression techniques are therefore suitable. While linear regression models are useful for predicting continuous-valued attributes, logistic regression models are suitable for binary attributes. The logistic regression model is simply a non-linear transformation of a linear regression model. Its standard representation is referred to as the logistic function. The probability of churn is estimated with the function
$$\Pr[\text{churn}] = \frac{1}{1 + e^{-T}}$$
where T = a + BX. Here a is a constant term, X is the vector of predictor attributes and B is the coefficient vector for the predictor attributes. If T equals 0 the probability is 0.5, which means that the customer is equally likely to be a churner or a non churner. As T grows large and positive the probability approaches 1, so the customer becomes a more probable churner; as T becomes large and negative the probability of churn tends to 0.
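The behaviour of the logistic function at T = 0 and for large positive or negative T can be checked with a short sketch. The function name `churn_probability` and the coefficient values are hypothetical, not fitted values from any model; the two-branch evaluation is only a common way to avoid floating-point overflow for very negative T and gives the same result as 1 / (1 + e^(-T)).

```python
import math


def churn_probability(a, B, X):
    """Estimate Pr[churn] = 1 / (1 + exp(-T)) with T = a + B.X.

    a: constant term
    B: coefficients, one per predictor attribute
    X: predictor attribute values for one customer
    """
    if len(B) != len(X):
        raise ValueError("B and X must have the same length")
    T = a + sum(b * x for b, x in zip(B, X))
    # Mathematically identical branches; the second avoids exp() overflow
    # when T is very negative.
    if T >= 0:
        return 1.0 / (1.0 + math.exp(-T))
    e = math.exp(T)
    return e / (1.0 + e)


# Hypothetical coefficients for two predictor attributes.
a, B = -1.0, [0.5, 2.0]
print(churn_probability(a, B, [2.0, 0.0]))    # T = 0, probability 0.5
print(churn_probability(a, B, [10.0, 5.0]))   # large positive T, close to 1
print(churn_probability(a, B, [-10.0, -5.0])) # large negative T, close to 0
```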
ROC Curve
An ROC curve is a graphical representation of the trade off between the false negative and false positive rates ...