Weight by Deviation (RapidMiner Studio Core)

Synopsis

This operator calculates the relevance of attributes of the given ExampleSet based on the (normalized) standard deviation of the attributes.

Description

The Weight by Deviation operator calculates the weight of attributes with respect to the label attribute based on the (normalized) standard deviation of the attributes. The higher the weight of an attribute, the more relevant it is considered. The standard deviations can be normalized by the average, minimum, or maximum of the attribute. Please note that this operator can only be applied to ExampleSets with a numerical label.

The standard deviation measures how much variation or dispersion exists around the average (mean, or expected value). A low standard deviation indicates that the data points tend to be close to the mean, whereas a high standard deviation indicates that they are spread out over a wide range of values. It is computed as the square root of the variance.
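As a rough illustration of the definition above, the following Python sketch computes the standard deviation as the square root of the variance for two hypothetical attributes that share the same mean but differ in spread. It assumes the sample variance (dividing by n - 1); whether RapidMiner uses this or the population variance (dividing by n) is not stated on this page.

```python
import math
from typing import Sequence


def variance(values: Sequence[float]) -> float:
    """Sample variance: squared deviations from the mean, divided by n - 1.

    Assumption: the n - 1 (sample) denominator; a population variance
    would divide by n instead.
    """
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def standard_deviation(values: Sequence[float]) -> float:
    """The standard deviation is the square root of the variance."""
    return math.sqrt(variance(values))


# Two hypothetical attributes with the same mean (5.0) but different spread.
close = [4.0, 5.0, 5.0, 6.0]
spread = [0.0, 2.0, 8.0, 10.0]

print(standard_deviation(close))   # low: values cluster near the mean
print(standard_deviation(spread))  # high: values are spread out
```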

Input

  • example set (IOObject)

    This input port expects an ExampleSet. In the attached Example Process, it is the output of the Retrieve operator.

Output

  • weights (Average Vector)

    This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

  • example set (IOObject)

    The ExampleSet that was given as input is passed to the output through this port without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • normalize_weights: This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized to the range from 0 to 1. Range: boolean
  • sort_weights: This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter. Range: boolean
  • sort_direction: This parameter is only available when the sort weights parameter is set to true. This parameter specifies the sorting order of the attributes according to their weights. Range: selection
  • normalize: This parameter indicates if the standard deviation should be divided by the minimum, maximum, or average of the attribute. Range: selection
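To show how these parameters could interact, here is a minimal Python sketch of the weighting logic. It is not RapidMiner's implementation: the function `weight_by_deviation`, the option name "none" for skipping the division, the sample (n - 1) standard deviation, and the min-max scaling used for normalize_weights are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _std(values: Sequence[float]) -> float:
    # Assumption: sample standard deviation (n - 1 denominator).
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def weight_by_deviation(
    columns: Dict[str, Sequence[float]],
    normalize: str = "none",  # hypothetical options: "none", "average", "minimum", "maximum"
    normalize_weights: bool = False,
    sort_weights: bool = False,
    sort_direction: str = "ascending",
) -> List[Tuple[str, float]]:
    """Hypothetical sketch: weight each attribute by its (normalized) standard deviation."""
    weights: Dict[str, float] = {}
    for name, values in columns.items():
        deviation = _std(values)
        # Divide the deviation by the chosen statistic of the attribute.
        # Note: a zero average/minimum/maximum raises ZeroDivisionError here.
        if normalize == "average":
            deviation /= sum(values) / len(values)
        elif normalize == "minimum":
            deviation /= min(values)
        elif normalize == "maximum":
            deviation /= max(values)
        elif normalize != "none":
            raise ValueError(f"unknown normalize option: {normalize}")
        weights[name] = deviation

    if normalize_weights and weights:
        # Assumption: min-max scaling into [0, 1]; if all weights are equal,
        # they are mapped to 1.0.
        low, high = min(weights.values()), max(weights.values())
        span = high - low
        weights = {n: (w - low) / span if span > 0 else 1.0 for n, w in weights.items()}

    result = list(weights.items())
    if sort_weights:
        result.sort(key=lambda item: item[1], reverse=(sort_direction == "descending"))
    return result


data = {"a1": [1.0, 2.0, 3.0], "a2": [10.0, 10.0, 40.0], "a3": [5.0, 5.0, 5.0]}
print(weight_by_deviation(data, sort_weights=True))
```

With sort_weights enabled and the default ascending direction, the constant attribute a3 (deviation 0) comes first and a2 (the widest spread) comes last.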

Tutorial Processes

Calculating the attribute weights of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can also see the standard deviation of all attributes in the 'Statistics' column of the Meta Data View. The Weight by Deviation operator is applied to this ExampleSet to calculate the weights of the attributes. The normalize weights parameter is set to false, so the weights are not normalized. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', so the results are listed in ascending order of weight. You can verify this by viewing the results of this process in the Results Workspace. You can also see that these weights are exactly the same as the standard deviations of the attributes.
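The tutorial's configuration can be mimicked outside RapidMiner with a short Python sketch. The column values below are hypothetical stand-ins, not the actual 'Polynomial' data, and the use of the sample standard deviation (`statistics.stdev`) is an assumption about how the 'Statistics' column is computed. With no division of the deviation and no weight normalization, each weight is simply the attribute's standard deviation, listed in ascending order.

```python
import statistics

# Hypothetical stand-in columns; the real 'Polynomial' values are not reproduced here.
columns = {
    "a1": [0.5, 1.5, 2.5, 3.5],
    "a2": [2.0, 2.0, 2.0, 6.0],
    "a3": [9.0, 1.0, 5.0, 3.0],
}

# Tutorial settings: deviation not divided by any statistic, normalize_weights = false,
# sort_weights = true, sort_direction = 'ascending'.
weights = {name: statistics.stdev(values) for name, values in columns.items()}
ranking = sorted(weights.items(), key=lambda item: item[1])

for name, weight in ranking:
    print(f"{name}: {weight:.4f}")
```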