You are viewing the RapidMiner Studio documentation for version 8.0.

Generate Weight (Stratification) (RapidMiner Studio Core)

Synopsis

This operator distributes the specified total weight over all examples so that the weights sum to the same value for every label value.

Description

The Generate Weight (Stratification) operator divides the weight specified by the total weight parameter among all the examples. While doing so, it ensures that the sum of the example weights is the same for every label value. Each label value therefore has the same total influence regardless of how many examples carry it, which often improves the representation of label values that occur less frequently. Please study the attached Example Process for a better understanding.
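The rule can be written as a formula. This is a sketch under one stated assumption: the documentation fixes only the total and the per-label sums, so it is assumed here that each label value's share is split evenly among the examples that carry it. Let W be the total weight, K the number of distinct label values present, and n_c the number of examples with label value c. Example i, with label value y_i, then receives w_i, and both required sums follow:

```latex
w_i = \frac{W}{K \, n_{y_i}}, \qquad
\sum_{i:\, y_i = c} w_i = n_c \cdot \frac{W}{K \, n_c} = \frac{W}{K}, \qquad
\sum_{i} w_i = K \cdot \frac{W}{K} = W
```

Under this assumption, examples of a rare label value receive larger individual weights than examples of a frequent one.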

Input

  • example set input (Data Table)

    This input port expects an ExampleSet with a label attribute. In the attached Example Process, it is the output of the Retrieve operator.

Output

  • example set output (Data Table)

    The ExampleSet, with a weight assigned to every example, is returned through this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed to the output through this port without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • total_weight: This parameter specifies the total weight to be distributed over all the examples. Range: real

Tutorial Processes

Assigning weights so that the weight sums are equal per label

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can look at the ExampleSet. The label of this ExampleSet has two possible values: 'yes' and 'no'. The Generate Weight (Stratification) operator is then applied to this ExampleSet to assign weights, with the total weight parameter set to 10. The operator assigns weights to the examples such that:

  • the sum of all weights equals the total weight, and
  • the sum of weights is the same for each label value.

Thus, in this process, the weights of all examples should sum to 10, and the weight sum of the examples labeled 'no' should equal the weight sum of the examples labeled 'yes'. You can verify this by viewing the resulting ExampleSet in the Results Workspace.
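The check described in the tutorial can be reproduced outside RapidMiner with a short Python sketch. It is not RapidMiner's implementation. The function `stratified_weights` is hypothetical, and it assumes that each label value's share is split evenly among its examples. The label column is also an assumption: it stands in for the Golf data's 9 'yes' and 5 'no' examples, since only the counts matter here.

```python
from collections import Counter


def stratified_weights(labels, total_weight):
    """Hypothetical sketch: give each label value an equal share of
    total_weight, then split that share evenly among its examples."""
    counts = Counter(labels)
    share_per_label = total_weight / len(counts)
    return [share_per_label / counts[label] for label in labels]


# Assumed label column standing in for the Golf data: 9 'yes', 5 'no'.
golf_labels = ["yes"] * 9 + ["no"] * 5
weights = stratified_weights(golf_labels, total_weight=10)

# Sum the weights per label value to check that the sums are equal.
per_label = {}
for label, weight in zip(golf_labels, weights):
    per_label[label] = per_label.get(label, 0.0) + weight

print(f"total weight: {sum(weights):.4f}")
for label, weight_sum in sorted(per_label.items()):
    print(f"{label}: {weight_sum:.4f}")
```

With a total weight of 10 and two label values, each label value receives 5. Each 'yes' example therefore gets 5/9 (about 0.556) and each 'no' example gets 1.0.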