You are viewing the RapidMiner Studio documentation for version 10.1.

SMOTE Upsampling (Operator Toolbox)

Synopsis

This operator implements the Synthetic Minority Over-sampling Technique (SMOTE) as proposed by Chawla et al., Journal of Artificial Intelligence Research 16 (2002), 321-357.

Description

In the first step, the ExampleSet is filtered so that only examples of the minority class are considered. Next, the k nearest neighbours of each of these examples are determined. The algorithm then selects a random minority example and, at random, one of its nearest neighbours. A new example is created that lies on the line between the two examples. This selection and creation step is repeated until the requested number of synthetic examples has been created.
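
The steps described above can be sketched in a few lines of Python. This is a minimal illustration for purely numeric attributes, not the operator's actual implementation: the function name smote_sketch, its arguments, the use of Euclidean distance and the uniformly random position on the connecting line are all assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, k, n_new, seed=None):
    """Create n_new synthetic points from a list of numeric minority examples.

    Hypothetical helper for illustration only; not the operator's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points

    # k nearest neighbours (Euclidean distance, an assumption) of every example
    neighbours = []
    for i, p in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(p, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # random minority example
        j = rng.choice(neighbours[i])     # random nearest neighbour of it
        gap = rng.random()                # position on the connecting line
        p, q = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return synthetic
```

Because every synthetic example is an interpolation between two existing minority examples, it always stays inside the region already covered by the minority class.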

Input

  • exa (Data Table)

    ExampleSet you want to upsample.

Output

  • ups (Data Table)

    The original ExampleSet with the attached synthetic examples.

  • ori (Data Table)

    The original ExampleSet.

Parameters

  • number_of_neighbours

    SMOTE searches for the k nearest neighbours of each example. This parameter defines k, the number of neighbours to consider.

  • normalize

    If checked, a range transformation to [0,1] is performed so that the distance calculation is not dominated by attributes with large value ranges.

  • equalize_classes

    If activated, as many new examples are created as are needed to balance the classes.

  • upsampling_size

    Defines the number of examples you want to create.

  • auto_detect_minority_class

    If activated, the class to upsample is the class with the fewest occurrences.

  • minority_class

    Defines the class you want to upsample.

  • round_integers

    Round integer attributes to the next integer.

  • nominal_change_rate

    Probability of changing a nominal value to the nominal value of its nearest neighbour.

  • use_local_random_seed

    Indicates whether a local random seed should be used.

  • local_random_seed

    If use_local_random_seed is checked, this parameter determines the local random seed.
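
To make some of the parameters above more concrete, the following Python sketch shows one plausible reading of normalize, auto_detect_minority_class together with equalize_classes, and nominal_change_rate. The helper names are hypothetical, and the exact behaviour of the operator is an assumption here. In particular, "balancing" is taken to mean raising the minority class to the size of the largest class.

```python
import random
from collections import Counter


def range_normalize(rows):
    """Rescale every numeric column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def minority_and_deficit(labels):
    """Return the least frequent class and how many examples it lacks
    compared with the most frequent class (assumed meaning of 'balance')."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    return minority, max(counts.values()) - counts[minority]


def pick_nominal(value, neighbour_value, change_rate, rng):
    """Take the neighbour's nominal value with probability change_rate."""
    return neighbour_value if rng.random() < change_rate else value
```

For example, minority_and_deficit(["a"] * 5 + ["b"] * 2) identifies "b" as the minority class and reports that 3 synthetic examples would be needed to reach the size of class "a".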

Tutorial Processes

Use SMOTE on imbalanced Sonar

In this tutorial we unbalance the Sonar data set with a Sample operator and then create synthetic examples to restore the class balance.