Performance Binominal Classification (AI Studio Core)

Synopsis

This Operator statistically evaluates the strengths and weaknesses of a binary classification after a trained model has been applied to labeled data.

Description

A binary classification makes predictions where the outcome has two possible values: call them positive and negative. Moreover, the prediction for each Example may be right or wrong, leading to a 2x2 confusion matrix with 4 entries:

  • TP - the number of "true positives", positive Examples that have been correctly identified as positive
  • FP - the number of "false positives", negative Examples that have been incorrectly identified as positive
  • FN - the number of "false negatives", positive Examples that have been incorrectly identified as negative
  • TN - the number of "true negatives", negative Examples that have been correctly identified as negative
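
The four entries can be counted directly from pairs of label and prediction values. The following Python sketch is only an illustration of the definitions above, not the Operator's implementation; the function name `confusion_counts`, the `ConfusionCounts` type, and the toy data are assumptions made for this example.

```python
from typing import NamedTuple, Sequence


class ConfusionCounts(NamedTuple):
    tp: int  # positive Examples predicted as positive
    fp: int  # negative Examples predicted as positive
    fn: int  # positive Examples predicted as negative
    tn: int  # negative Examples predicted as negative


def confusion_counts(labels: Sequence[str], predictions: Sequence[str],
                     positive: str) -> ConfusionCounts:
    """Count the four confusion-matrix entries for a chosen positive class."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = fp = fn = tn = 0
    for label, prediction in zip(labels, predictions):
        if prediction == positive:
            if label == positive:
                tp += 1
            else:
                fp += 1
        else:
            if label == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


# Hypothetical toy data, invented for this illustration only.
labels = ["Mine", "Mine", "Rock", "Rock", "Mine", "Rock"]
predictions = ["Mine", "Rock", "Rock", "Mine", "Mine", "Rock"]
counts = confusion_counts(labels, predictions, positive="Mine")
print(counts)
```

Swapping the value passed as `positive` exchanges TP with TN and FP with FN, which is why the choice of positive class matters for criteria such as precision and recall.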

In the parameter section, numerous performance criteria are described, any of which can be calculated in terms of the above variables.
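
As a rough illustration of that statement, the Python sketch below evaluates several of the criteria from the formulas listed under Parameters. The function name `criteria` and the example counts are hypothetical, and the sketch assumes every denominator is non-zero; how the Operator itself reports undefined values is not covered here.

```python
def criteria(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Evaluate count-based criteria; assumes all denominators are non-zero."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)      # also called positive predictive value
    recall = tp / (tp + fn)         # also called sensitivity
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)            # negative predictive value
    # Expected accuracy for Cohen's kappa
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "accuracy": accuracy,
        "classification_error": (fp + fn) / n,
        "kappa": (accuracy - pe) / (1 - pe),
        "precision": precision,
        "recall": recall,
        "lift": precision / ((tp + fn) / n),
        "fallout": fp / (fp + tn),
        "f_measure": 2 * tp / (2 * tp + fp + fn),
        "specificity": specificity,
        "youden": recall + specificity - 1,
        "negative_predictive_value": npv,
        "psep": precision + npv - 1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
result = criteria(tp=40, fp=10, fn=5, tn=45)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these counts the expected accuracy is (50 * 45 + 50 * 55) / 100^2 = 0.5 and the observed accuracy is 0.85, so kappa = (0.85 - 0.5) / (1 - 0.5) = 0.7.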

If the model has a probabilistic scoring system where scores above a certain threshold are identified as positive, then the elements of the confusion matrix will depend on the threshold. To create an ROC graph and calculate the area under the curve (AUC), the threshold is varied and a point (x, y) is plotted for each threshold value:

  • y-axis - true positive rate = (True positive predictions)/(Number of positive Examples) = TP / (TP + FN)
  • x-axis - false positive rate = (False positive predictions)/(Number of negative Examples) = FP / (FP + TN)
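
The threshold sweep can be sketched in Python as follows. This is a minimal illustration, not the Operator's implementation: the names `roc_points` and `trapezoid_area` are hypothetical, an Example is treated as positive when its score is greater than or equal to the threshold (an assumption), and the area is approximated with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[str], scores: Sequence[float],
               positive: str) -> List[Tuple[float, float]]:
    """Return one (false positive rate, true positive rate) point per threshold."""
    n_pos = sum(1 for label in labels if label == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative Example")
    # A threshold above every score predicts nothing as positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for label, score in zip(labels, scores)
                 if score >= threshold and label == positive)
        fp = sum(1 for label, score in zip(labels, scores)
                 if score >= threshold and label != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for the positive class, invented for this illustration.
labels = ["pos", "pos", "neg", "pos", "neg", "neg"]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores, positive="pos")
area = trapezoid_area(points)
print(points)
print(area)
```

Each lower threshold can only add predicted positives, so the curve moves up or to the right from (0, 0) until it reaches (1, 1) at the lowest score.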

Differentiation

There are numerous performance Operators, and you should choose the one that is best suited to your problem.

Performance (Classification)

Choose this Operator when the label is nominal and it has more than two values.

Input

  • labeled data (Data table)

    This input port expects a labeled ExampleSet. Make sure the ExampleSet has both a label Attribute and a prediction Attribute, and that the label is of type binominal.

  • performance (Performance Vector)

    This input port expects a performance vector. You need to connect a performance vector to the input if you want to do multi-objective optimization.

Output

  • performance (Performance Vector)

    This output port delivers a performance vector -- a list of performance criterion values based on the label and prediction Attributes of the input ExampleSet. In the output, the performance criterion values from the input (if any) are combined with the values from this Operator; in case of overlap, the values from the input are overwritten.

  • example set (Data table)

    The ExampleSet that was given as input is passed through without changes.

Parameters

  • main_criterion

    The main criterion is used when performance vectors are compared, e.g., during parameter optimization or Attribute selection. If not selected, the main criterion is the first criterion in the output performance vector.

    If performance vectors are not compared, the main criterion is ignored.

    Range:
  • manually_set_positive_class

    Check this box to specify the positive class manually via the positive_class parameter. Otherwise, the positive class is derived from the label's internal mapping. It is recommended to manually set the desired positive class.

    Range:
  • positive_class

    Use this parameter to manually set the positive class.

    In rare cases the suggested values in the drop down menu may not match the actual label's values. In this case the correct positive class can be specified by typing its name manually instead of selecting it from the drop down menu.

    Range:
  • accuracy

    accuracy = (Correct predictions)/(Number of Examples) = (TP + TN) / (TP + FP + FN + TN)

    Range:
  • classification_error

    classification error = (Incorrect predictions)/(Number of Examples) = (FP + FN) / (TP + FP + FN + TN)

    Range:
  • kappa

    Cohen's kappa = (po - pe)/(1 - pe)

    where:

    po = observed accuracy = (TP + TN) / (TP + FP + FN + TN)

    pe = expected accuracy = [(TP + FP)(TP + FN) + (FN + TN)(FP + TN)] / [(TP + FP + FN + TN)^2]

    Range:
  • AUC (optimistic)

    When the ROC graph is plotted, before calculating the area under the curve (AUC), the predictions are sorted by score, from highest to lowest, and the graph is plotted Example by Example. If two or more Examples have the same score, the ordering is not well-defined; in this case, the optimistic version of AUC plots the positive Examples before plotting the negative Examples.

    Range:
  • AUC

    When the ROC graph is plotted, before calculating the area under the curve (AUC), the predictions are sorted by score, from highest to lowest, and the graph is plotted Example by Example. If two or more Examples have the same score, the ordering is not well-defined. The normal version of AUC calculates the area by taking the average of AUC (optimistic) and AUC (pessimistic).

    Range:
  • AUC (pessimistic)

    When the ROC graph is plotted, before calculating the area under the curve (AUC), the predictions are sorted by score, from highest to lowest, and the graph is plotted Example by Example. If two or more Examples have the same score, the ordering is not well-defined; in this case, the pessimistic version of AUC plots the negative Examples before plotting the positive Examples.

    Range:
  • precision

    precision = (True positive predictions)/(All positive predictions) = TP / (TP + FP)

    Range:
  • recall

    recall = (True positive predictions)/(Number of positive Examples) = TP / (TP + FN)

    Range:
  • lift

    lift is the ratio of two quantities, representing the improvement over random sampling:

    1. The probability of choosing a positive Example from the group of all positive predictions: TP / (TP + FP)

    2. The probability of choosing a positive Example from the group of all Examples: (TP + FN) / (TP + FP + FN + TN)

    lift = [TP / (TP + FP)] / [(TP + FN) / (TP + FP + FN + TN)]

    Range:
  • fallout

    fallout = (False positive predictions)/(Number of negative Examples) = FP / (FP + TN)

    Range:
  • f_measure

    F1 = 2 (precision * recall) / (precision + recall) = 2TP / (2TP + FP + FN)

    Range:
  • false_positive

    The number of false positive predictions: FP

    Range:
  • false_negative

    The number of false negative predictions: FN

    Range:
  • true_positive

    The number of true positive predictions: TP

    Range:
  • true_negative

    The number of true negative predictions: TN

    Range:
  • sensitivity

    sensitivity = recall = (True positive predictions)/(Number of positive Examples) = TP / (TP + FN)

    Range:
  • specificity

    specificity = (True negative predictions)/(Number of negative Examples) = TN / (TN + FP)

    Range:
  • youden

    Sometimes called informedness or DeltaP'.

    J = sensitivity + specificity - 1

    Range:
  • positive_predictive_value

    PPV = precision = (True positive predictions)/(All positive predictions) = TP / (TP + FP)

    Range:
  • negative_predictive_value

    NPV = (True negative predictions)/(All negative predictions) = TN / (TN + FN)

    Range:
  • psep

    Sometimes called markedness or DeltaP.

    psep = PPV + NPV - 1

    Range:
  • skip_undefined_labels

    When this parameter is true, Examples not belonging to a defined class are ignored.

    Range:
  • comparator_class

    The fully qualified classname of the PerformanceComparator implementation is specified here.

    Range:
  • use_example_weights

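    When this parameter is true, Example weights are used in the performance calculations, if possible. This parameter has no effect if no Attribute has the weight role.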

    Range:
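
The tie handling described for AUC (optimistic), AUC, and AUC (pessimistic) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Operator's implementation: the function name `auc_variants` and the toy data are hypothetical, and the area is accumulated step by step as the graph is plotted Example by Example.

```python
from typing import Sequence, Tuple


def auc_variants(labels: Sequence[str], scores: Sequence[float],
                 positive: str) -> Tuple[float, float, float]:
    """Return (optimistic, normal, pessimistic) AUC for the given scores."""
    n_pos = sum(1 for label in labels if label == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative Example")

    def area(positives_first: bool) -> float:
        def sort_key(item):
            label, score = item
            is_positive = label == positive
            # Within a tie, rank 0 is plotted before rank 1.
            tie_rank = 0 if is_positive == positives_first else 1
            return (-score, tie_rank)  # highest score first

        tp = 0
        columns = 0
        for label, _ in sorted(zip(labels, scores), key=sort_key):
            if label == positive:
                tp += 1        # step up by one positive Example
            else:
                columns += tp  # step right: add a column of height tp
        return columns / (n_pos * n_neg)

    optimistic = area(positives_first=True)
    pessimistic = area(positives_first=False)
    return optimistic, (optimistic + pessimistic) / 2, pessimistic


# Hypothetical data with one tie at score 0.5, invented for this illustration.
labels = ["pos", "neg", "pos", "neg"]
scores = [0.9, 0.5, 0.5, 0.1]
optimistic, normal, pessimistic = auc_variants(labels, scores, positive="pos")
print(optimistic, normal, pessimistic)
```

When no two Examples share a score, the two orderings coincide and all three values are equal; the gap between the optimistic and pessimistic values therefore indicates how much of the ranking depends on ties.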

Tutorial Processes

Separate mines from rocks

The Sonar data set contains 111 Examples obtained by bouncing sonar signals off a metal cylinder (a "mine") at various angles and under various conditions, and 97 Examples obtained from rocks under similar conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The data set contains signals obtained from a variety of different aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock.

Each Example has 60 Attributes in the range 0.0 to 1.0. Each Attribute represents the energy within a particular frequency band, integrated over a certain period of time. The integration aperture for higher frequencies occurs later in time, since these frequencies are transmitted later during the chirp.

In the first Tutorial Process, a predictive model is created to identify mines, based on the sonar signal. When you run the Process, the output is displayed in three steps:

1. The whole Sonar data set is displayed.

2. A subset of the Sonar data set is displayed, with predictions based on Neural Net.

3. An ROC graph is displayed in red, together with the threshold values in blue. To see the confusion matrix, click on "recall" or "false negative", where you will learn that the model discovers 90% of the mines, with 4 false negatives (mines that were identified as rocks).

Because the input of the Operator Performance (Binominal Classification) requires labeled data of type "binominal", the label for the original Sonar data must first be converted from "nominal" to "binominal" via the Operator Nominal to Binominal. This type conversion step is unnecessary if the final Operator is Performance (Classification), which accepts a nominal label as input.

Separate mines from rocks, with Cross Validation

A more realistic perspective on mine discovery is achieved by using Cross Validation. The second Tutorial Process is similar to the first Tutorial Process, but now 5 different versions of the Neural Net model are created, and the results are combined. The Operator Cross Validation takes the place of Split Data, and Performance (Binominal Classification) is part of the testing subprocess.

The output is again an ROC graph, but this time the lines on the graph have a spread which reflects the uncertainty in model building. If you click on "recall" to look at the confusion matrix, you will learn that the resultant model discovers 82% +/- 8% of the mines.