Create Lift Chart (RapidMiner Studio Core)
Synopsis
This operator generates a lift chart for the given model and ExampleSet based on the discretized confidences and a Pareto chart.
Description
The Create Lift Chart operator creates a lift chart based on a Pareto plot for the discretized confidence values of the given ExampleSet and model. The model is applied to the ExampleSet, and the lift chart is then produced. Note that any predicted label already present in the given ExampleSet is removed during the application of this operator. To produce reliable results, apply this operator to data that was not used to build the model; otherwise the resulting plot will be too optimistic.
The lift chart measures the effectiveness of models by calculating the ratio between the result obtained with a model and the result obtained without a model. The result obtained without a model is based on randomly selected records.
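The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the operator's implementation: the function name `lift_at_fraction`, the use of the overall target rate as the "without a model" baseline (the expected rate of a random selection), and the sample data are all assumptions made for the example.

```python
from typing import Sequence


def lift_at_fraction(confidences: Sequence[float],
                     is_target: Sequence[bool],
                     fraction: float) -> float:
    """Lift of the top `fraction` of records, ranked by model confidence.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    if len(confidences) != len(is_target) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    n = len(confidences)
    k = max(1, round(n * fraction))

    # Rank records from most to least confident for the target class.
    ranked = sorted(zip(confidences, is_target),
                    key=lambda pair: pair[0], reverse=True)

    # Result with the model: target rate among the k highest-ranked records.
    rate_with_model = sum(hit for _, hit in ranked[:k]) / k

    # Result without a model (assumption): the expected target rate of a
    # random selection, which equals the overall target rate.
    rate_without_model = sum(is_target) / n
    if rate_without_model == 0:
        raise ValueError("no target-class records; lift is undefined")

    return rate_with_model / rate_without_model


# Made-up sample data: 10 records, 3 of which belong to the target class.
conf = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
target = [True, True, False, True, False,
          False, False, False, False, False]

top20 = lift_at_fraction(conf, target, 0.2)
everything = lift_at_fraction(conf, target, 1.0)
```

A lift above 1 for a given fraction means that selecting records by model confidence finds target-class records at a higher rate than random selection would; selecting every record always gives a lift of 1.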
Input
- example set (Data Table)
This input port expects an ExampleSet. It is the output of the Generate Direct Mailing Data operator in the attached Example Process. The output of other operators can also be used as input.
- model (Model)
This input port expects a model. It is the output of the Naive Bayes operator in the attached Example Process. The output of other operators can also be used as input.
Output
- example set (Data Table)
The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
- model (Model)
The model that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same model in further operators or to view the model in the Results Workspace.
- lift pareto chart (Lift Pareto Chart)
For the given model and ExampleSet a lift chart is generated based on the discretized confidences and a Pareto chart. This lift chart is delivered through this port.
Parameters
- target_class
This parameter indicates the target class for which the lift chart should be produced. Range: string
- binning_type
This parameter indicates the binning type of the confidences. Range: selection
- number_of_bins
This parameter specifies the number of bins the confidence should be discretized into. This parameter is only available when the binning type parameter is set to 'simple' or 'frequency'. Range: integer
- size_of_bins
This parameter specifies the number of examples that each bin should contain when the confidence is discretized. This parameter is only available when the binning type parameter is set to 'absolute'. Range: integer
- automatic_number_of_digits
This parameter indicates if the number of digits should be automatically determined for the range names. Range: boolean
- number_of_digits
This parameter specifies the minimum number of digits to be used for the interval names. If this parameter is set to -1 then the minimal number is determined automatically. This parameter is only available when the automatic number of digits parameter is set to false. Range: integer
- show_bar_labels
This parameter indicates if the bars should display the size of the bin together with the amount of the target class in the corresponding bin. Range: boolean
- show_cumulative_labels
This parameter indicates if the cumulative line plot should display the cumulative sizes of the bins together with the cumulative amount of the target class in the corresponding bins. Range: boolean
- rotate_labels
This parameter indicates if the labels of the bins should be rotated. Range: boolean
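The binning and label parameters above can be illustrated with a small Python sketch. It rests on assumptions that this page does not confirm: 'simple' is read as equal-width bins over the confidence range [0, 1], 'frequency' as bins holding roughly equal numbers of examples, and 'absolute' as bins of a fixed number of examples. The function names and sample data are hypothetical, and the operator's actual binning may differ in detail.

```python
from typing import List, Sequence, Tuple

# (confidence for the target class, whether the example is in the target class)
Record = Tuple[float, bool]


def _ranked(records: Sequence[Record]) -> List[Record]:
    """Records sorted from highest to lowest confidence."""
    return sorted(records, key=lambda r: r[0], reverse=True)


def simple_bins(records: Sequence[Record], number_of_bins: int) -> List[List[Record]]:
    """Assumed meaning of 'simple': equal-width bins over [0, 1]."""
    bins: List[List[Record]] = [[] for _ in range(number_of_bins)]
    for conf, hit in records:
        index = min(int(conf * number_of_bins), number_of_bins - 1)
        bins[index].append((conf, hit))
    bins.reverse()  # highest-confidence bin first
    return bins


def frequency_bins(records: Sequence[Record], number_of_bins: int) -> List[List[Record]]:
    """Assumed meaning of 'frequency': bins with (nearly) equal counts."""
    ranked = _ranked(records)
    base, extra = divmod(len(ranked), number_of_bins)
    bins, start = [], 0
    for i in range(number_of_bins):
        size = base + (1 if i < extra else 0)
        bins.append(ranked[start:start + size])
        start += size
    return bins


def absolute_bins(records: Sequence[Record], size_of_bins: int) -> List[List[Record]]:
    """Assumed meaning of 'absolute': each bin holds size_of_bins examples."""
    ranked = _ranked(records)
    return [ranked[i:i + size_of_bins] for i in range(0, len(ranked), size_of_bins)]


def bar_and_cumulative_labels(bins: Sequence[Sequence[Record]]):
    """Per bin: ((target count, bin size), (cumulative target count, cumulative size))."""
    rows = []
    cum_hits = cum_size = 0
    for b in bins:
        hits = sum(hit for _, hit in b)
        cum_hits += hits
        cum_size += len(b)
        rows.append(((hits, len(b)), (cum_hits, cum_size)))
    return rows


# Made-up sample data: 10 examples, 3 in the target class.
records = list(zip(
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05],
    [True, True, False, True, False, False, False, False, False, False],
))

labels_absolute = bar_and_cumulative_labels(absolute_bins(records, 4))
sizes_frequency = [len(b) for b in frequency_bins(records, 3)]
bins_simple = simple_bins(records, 5)
```

Each row of `labels_absolute` pairs the values a bar label would show (target count and bin size) with the values a cumulative label would show (running totals of both), which mirrors the show_bar_labels and show_cumulative_labels descriptions.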
Tutorial Processes
Creating lift chart for direct mailing data
The Generate Direct Mailing Data operator is used for generating an ExampleSet with 10000 examples. The Split Validation operator is applied on this ExampleSet. The split ratio parameter is set to 0.7 and the sampling type parameter is set to 'shuffled sampling'. Here is an explanation of what happens inside the Split Validation operator.
The Split Validation operator provides a training data set through the training port of the training subprocess. This training data set is the input of the Naive Bayes operator, so the Naive Bayes classification model is trained on it. The Naive Bayes operator delivers the model as its output, which is connected to the model port of the training subprocess. The Split Validation operator then delivers this model at the model port of the testing subprocess.
In the testing subprocess, the model is provided as input at the model port of the Create Lift Chart operator. The Split Validation operator provides the testing data set through the test set port of the testing subprocess, and this testing data set is also provided as input to the Create Lift Chart operator. The Create Lift Chart operator generates a lift chart for the given model and ExampleSet based on the discretized confidences and a Pareto chart. The lift chart is provided to the Remember operator, which stores it in the object store.
The Apply Model operator is provided with the testing data set and the model. It applies the model to the testing data set and delivers the resulting labeled data set as output. This labeled data set is provided as input to the Performance operator, which evaluates the statistical performance of the model and generates a performance vector that holds information about various performance criteria.
Outside the Split Validation operator, the Recall operator is used for fetching the lift chart from the object store. The lift chart is delivered to the output and it can be seen in the Results Workspace.
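The hold-out step of this process (split ratio 0.7 with shuffled sampling) can be approximated outside RapidMiner with the following Python sketch. It only illustrates the idea of keeping the testing data separate from the training data; the function name `shuffled_split` and the `seed` argument are hypothetical, and the integers stand in for the 10000 generated examples.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffled_split(examples: Sequence[T],
                   split_ratio: float,
                   seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the examples, then split them into training and testing parts.

    Hypothetical helper; `seed` is an assumption added for reproducibility.
    """
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must be strictly between 0 and 1")

    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)

    # The first split_ratio share is used for training, the rest for testing.
    cut = round(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for the 10000 generated examples.
examples = list(range(10000))
training, testing = shuffled_split(examples, 0.7)
```

With these settings, 7000 examples go to training and 3000 to testing. Only the testing part would be passed to the lift chart computation, so the chart is not based on data that was used to build the model.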