Prediction of Credit Risk
Introduction
Deciding whether an application for a credit card will be a good or a bad risk is a tricky issue for banking companies. If they are too lenient, they will suffer losses from too many defaults. If they are too stringent, they do not issue enough cards and therefore will not earn as much as would be possible (Weka, http://www.cs.waikato.ac.nz/~ml/weka/index.html). Credit risk scoring in the past has mainly been done using neural network methods (at least according to published papers; companies might use other tools as well, but not advertise this so as not to lose their perceived competitive advantage). A smaller number of papers have also investigated boosting-based methods. Machine learning algorithms need to produce well-calibrated probabilities or similar scores to allow for effective credit risk scoring.
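To make the calibration requirement concrete, the following is a minimal sketch of one standard way to obtain well-calibrated probabilities (Platt scaling), assuming scikit-learn is available; the classifier choice and the synthetic data are illustrative assumptions, not part of the competition setup.

    # A minimal sketch of probability calibration via Platt scaling,
    # assuming scikit-learn; the data here is synthetic, purely for
    # illustration.
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    base = GradientBoostingClassifier()                  # any base scorer
    calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
    calibrated.fit(X_train, y_train)
    scores = calibrated.predict_proba(X_test)[:, 1]      # calibrated P(bad risk)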
This competition has added one more dimension to this already tricky problem: time. The scenario modelled is more realistic than the usual cross-validation estimates derived from a single set of data. As the data collected is usually data from the past, possible attribute values and ranges may change over time, and so might the decision thresholds. The competition therefore provided labeled training data collected a few years earlier. Then, after a gap of one year, an unlabeled evaluation dataset spanning a full year was made available for a leaderboard comparison: participants could upload up to 15 solutions and have them ranked. The final prediction data was collected after another one-year gap following the evaluation data, and again comprises a full year's worth of data. One would therefore expect even more drift away from the training data to be present in this final data. In the following two sections we describe the preprocessing performed and various algorithms that were found to perform better than average.
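The time-gapped setup can be sketched as follows, assuming each application carries a year stamp; the column name and the concrete years are illustrative assumptions, not taken from the competition data.

    # A minimal sketch of the time-gapped evaluation scenario: train on
    # historical data, evaluate after a one-year gap, score finally after
    # another gap. Column name and years are hypothetical.
    import pandas as pd

    df = pd.DataFrame({"year": [2003, 2003, 2005, 2005, 2007],
                       "amount": [120, 80, 200, 50, 90]})

    train = df[df["year"] <= 2003]         # labeled historical training data
    evaluation = df[df["year"] == 2005]    # leaderboard set, one-year gap
    final = df[df["year"] == 2007]         # final set, another one-year gap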
Preprocessing
Most preprocessing described here was standard and straightforward. Unfortunately there was not enough time available to seriously address the problem of drift over the years, but at least one simple idea was investigated and appears to slightly improve predictions: categorical attributes like the SHOP ID can have values that are only present in the test data. These values were represented as missing values and treated specially at prediction time (Pfahringer & Holmes, 2004). Whenever an example contained a missing value for such an attribute, this value was replaced in turn with each of the most frequent values from the training data; a prior-probability threshold of 4% was used for this selection. The predictions for these imputed copies of the test example were then averaged, weighted by each value's prior probability in the training data. This method seems to improve on simply imputing the mode. Trying to predict the missing value, i.e. finding the most similar value in the training data, proved less beneficial than the weighted sum over the most frequent values, but still outperformed the mode as well. Again, for lack of time, this claim should be viewed as preliminary at best.
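The following is a minimal sketch of this weighted imputation, assuming a scoring function predict_fn that maps a feature dictionary to a probability of bad risk; all names (weighted_impute_predict, shop_id, the toy scorer) are illustrative assumptions, not the competition code.

    # A minimal sketch of weighted imputation for categorical values unseen
    # in training: impute each frequent training value in turn, predict, and
    # average the predictions weighted by the values' prior probabilities.
    from collections import Counter

    def weighted_impute_predict(predict_fn, example, attr, train_values,
                                threshold=0.04):
        """Score `example`, whose value for `attr` was unseen in training.

        Every training value of `attr` whose prior probability is at least
        `threshold` (4% here, as in the text) is imputed in turn, and the
        resulting predictions are averaged, weighted by those priors
        (renormalised over the retained values).
        """
        counts = Counter(train_values)
        total = sum(counts.values())
        priors = {v: c / total for v, c in counts.items()
                  if c / total >= threshold}
        score, weight = 0.0, 0.0
        for value, prior in priors.items():
            imputed = dict(example, **{attr: value})   # copy with value imputed
            score += prior * predict_fn(imputed)
            weight += prior
        return score / weight

    # Example: score a test record whose SHOP ID never occurred in training;
    # the lambda stands in for a trained model's probability output.
    train_shop_ids = ["A"] * 50 + ["B"] * 40 + ["C"] * 10 + ["D"] * 2
    x = {"shop_id": "UNSEEN", "amount": 120}
    p = weighted_impute_predict(lambda e: 0.3 if e["shop_id"] == "A" else 0.6,
                                x, "shop_id", train_shop_ids)

Note that in this sketch the rare value "D" falls below the 4% prior threshold and is excluded, which is why the remaining weights are renormalised before returning the averaged score.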