Smote Upsampling (Operator Toolbox)
Synopsis
This operator implements the Synthetic Minority Over-sampling Technique (SMOTE) as proposed by Chawla et al., Journal of Artificial Intelligence Research 16 (2002), 321-357.
Description
In the first step, the ExampleSet is filtered so that only examples of the minority class are considered. Next, the k nearest neighbours of each of these examples are determined. To create a synthetic example, the algorithm selects a random minority example and then a random one of its nearest neighbours. The new example is placed on the line between the two examples.
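The steps described above can be sketched in a few lines of Python. This is a minimal illustration for numeric attributes only, not the operator's actual implementation: the function name `smote_sketch`, the brute-force Euclidean neighbour search, and the uniformly random position on the connecting line are assumptions made for the example.

```python
import math
import random


def smote_sketch(minority, k, n_new, seed=0):
    """Create n_new synthetic rows from a list of numeric minority rows.

    Hypothetical helper for illustration; not part of the operator.
    """
    if len(minority) < 2 or k < 1:
        raise ValueError("need at least two minority examples and k >= 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Step 1: k nearest neighbours of every minority example (brute force).
    neighbours = []
    for i, row in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(row, minority[j]))
        neighbours.append(others[:k])

    # Step 2: repeatedly pick an example and one of its neighbours,
    # then place a new point on the line between them.
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random minority example
        j = rng.choice(neighbours[i])      # random nearest neighbour
        gap = rng.random()                 # assumed: uniform position in [0, 1)
        a, b = minority[i], minority[j]
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, k=2, n_new=5)
```

Because every synthetic point lies between two existing minority examples, the new points stay inside the region already covered by the minority class.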
Input
- exa (Data table)
The ExampleSet you want to upsample.
Output
- ups (Data table)
The original ExampleSet with the synthetic examples appended.
- ori (Data table)
The original ExampleSet.
Parameters
- number_of_neighbours SMOTE searches for the k nearest neighbours of each minority example. This parameter defines k, the number of neighbours to consider. Range:
- normalize If checked, a range transformation to [0,1] is performed so that attributes on different scales contribute comparably to the distance calculation. Range:
- equalize_classes If activated, as many new examples as are needed to balance the classes are created. Range:
- upsampling_size Defines the number of examples you want to create. Range:
- auto_detect_minority_class If activated, the class to upsample is the class with the fewest occurrences. Range:
- minority_class Defines the class you want to upsample. Range:
- round_integers If activated, values of integer attributes are rounded to the next integer. Range:
- nominal_change_rate Probability of changing a nominal value to the nominal value of its nearest neighbour. Range:
- use_local_random_seed This parameter indicates if a local random seed should be used. Range:
- local_random_seed If the use_local_random_seed parameter is checked, this parameter determines the local random seed. Range:
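To make some of these parameters more concrete, the following Python sketch shows one plausible reading of normalize, equalize_classes together with auto_detect_minority_class, and nominal_change_rate. The helper names are hypothetical, the class counts are made up for illustration, and the balancing rule assumes a two-class problem; the operator's internal behaviour may differ.

```python
import random
from collections import Counter


def range_normalize(rows):
    """Rescale every numeric column to [0, 1] (the idea behind `normalize`)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    # A constant column has span 0; use 1.0 instead to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def examples_needed(labels):
    """Return the least frequent label and how many new examples balance it
    against the most frequent one (assumed two-class setting)."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    return minority, max(counts.values()) - counts[minority]


def pick_nominal(own_value, neighbour_value, change_rate, rng):
    """Take the neighbour's nominal value with probability change_rate."""
    return neighbour_value if rng.random() < change_rate else own_value


# Illustrative counts only, not the real Sonar class distribution.
labels = ["Rock"] * 90 + ["Mine"] * 30
minority, needed = examples_needed(labels)
scaled = range_normalize([[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]])
```

Passing a seeded `random.Random` instance as `rng` corresponds to the idea of a local random seed: the same seed reproduces the same synthetic examples.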
Tutorial Processes
Use SMOTE on imbalanced Sonar
In this tutorial, we unbalance the Sonar data set with a Sample operator and then create synthetic examples to restore class balance.