Categories

Versions

Cost-Sensitive Scoring (Model Simulator)

Synopsis

A new approach for cost-sensitive learning similar to the idea of Meta Cost (Domingo 1999) but without the need for creating multiple models and it still works for more than two classes.

Description

This operator implements a novel algorithm for cost-sensitive learning. It shares all the advantages of Meta Cost (Domingo 1999) without its disadvantages. The problem with Meta Cost is that it requires to create bagged models to work properly. This means that the models become more complex (users need to understand 10+ models) and training times are increased. This operator uses a similar idea but without this drawback.

The basic idea of this approach is that only one model is trained (instead of multiple models during the bagging learning). This model is used as input of this operator. Having only one model also makes it easier to understand for users. But we still need to create a distribution of confidence values for each data point to be predicted. The original Meta Cost was achieving this by using the multiple bagged models.

This algorithms, however, is following a different approach by generating a number of artificial data points close to the one to be predicted. This is a idea which recently became more popular as part of the LIME algorithm for explaining predictions of models. We then use the the confidences for those similar but generated data points as distribution for the confidences for the point to be predicted.

Those confidences are then averaged and used as input to calculate the expected cost exactly like it is done for the original Meta Cost. The prediction with lowest expected cost, i.e. confidences times cost / benefit for this prediction and the possible errors, is chosen as the ultimate prediction. This is not necessarily the class with the highest confidence, but that is the whole point of cost-sensitive learning.

Please note that for best results the confidences should be as close to probabilities as possible. Since this is not the case for most machine learning models, you can use the operator Rescale Confidences (Logistic) to achieve this in a post-processing step. Please note also that while the training time is not increased, the scoring time is slowed down though. This slow-down factor is depending on the number of artificial data points created. We made good experiences with 10 points for each prediction which would result in a slowdown of scoring times by a factor of 10x. However, since scoring is typically very fast, this is not a problem for most use cases.

Input

  • model (Model)

    This port expects a model which should be adapted for cost-sensitive scoring.

  • training set (Data table)

    This port expects an ExampleSet, this should be the same ExampleSet that was used to create the model. It is used to generate statistics which define the possible values and boundaries for the artificially generated data points.

Output

  • model (Model)

    The model which will now produce predictions based on the confidences and the supplied cost matrix.

  • optimal data (Data table)

    The original training data.

Parameters

  • classes A list of possible classes for the model to predict. The order of the classes in this list also defines the order of classes in the cost matrix. Range: list
  • cost_matrix The costs and benefits for errors and correct predictions. Costs should be positive numbers and gains / benefits should be negative numbers. Range: matrix
  • number_of_variants The number of variants created for each data point to be scored. You should use at least 5 variants and we made good experiences with values around 10. More is better but please be aware that this increases the scoring times by the same factor. Range: integer

Tutorial Processes

Cost Sensitive Scoring for Sonar

This process trains a Logistic Regression model on the Sonar data. The goal is to predict if a frequency spectrum of a sonar shows a rock or a mine. While it of course a problem if a boat hits a rock, it is less devastatiting than hitting a mine. We therefore set the costs of making the error of predicting that something is a mine when it really is rock only to 1. But we use a cost of 100 if we predict a rock for something which really is a mine.

If you run the process, you will see that the model only makes the error of predicting a rock for a true mine three times with those cost settings. While this is still not great, you should change the costs in the cost matrix of the operator back to 1 as well and rerun the process. In this case, we make the same type of error 17 times which clearly showed that assigning higher costs to this type of errors moved the models predictions away from it.