Weight by Relief (RapidMiner Studio Core)

Synopsis

This operator calculates the relevance of the attributes by Relief. The key idea of Relief is to estimate the quality of features according to how well their values distinguish between the instances of the same and different classes that are near each other.

Description

Relief is considered one of the most successful algorithms for assessing the quality of features because it is simple and effective. It measures the relevance of features by sampling examples and, for each sampled example, comparing the value of the current feature with its value in the nearest example of the same class and in the nearest example of a different class. A feature is rated highly when its values differ between nearby examples of different classes and agree between nearby examples of the same class. This version also works for data sets with multiple classes and for regression data sets. The resulting weights are normalized to the interval from 0 to 1 if the normalize weights parameter is set to true.
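The sampling and nearest-neighbor comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch of a basic Relief variant for numeric attributes and a nominal label, not the code of the Weight by Relief operator. The function name relief_weights, the range-scaled Manhattan distance, and the way the hypothetical arguments mirror the operator's number of neighbors, sample ratio and local random seed parameters are all assumptions. Regression labels and class-prior weighting for multiple classes are not covered.

```python
import random

def relief_weights(X, y, number_of_neighbors=1, sample_ratio=1.0, seed=None):
    """Basic Relief-style weights (illustrative sketch, not the operator's code)."""
    rng = random.Random(seed)  # a fixed seed reproduces the same sample
    n, d = len(X), len(X[0])

    # Assumption: differences are scaled by each attribute's value range.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    m = max(1, round(sample_ratio * n))   # number of sampled examples
    sample = rng.sample(range(n), m)
    w = [0.0] * d
    for i in sample:
        # All other examples, ordered from nearest to farthest.
        others = sorted((k for k in range(n) if k != i),
                        key=lambda k: dist(X[i], X[k]))
        hits = [k for k in others if y[k] == y[i]][:number_of_neighbors]
        misses = [k for k in others if y[k] != y[i]][:number_of_neighbors]
        for j in range(d):
            # Differing from same-class neighbors lowers the weight ...
            if hits:
                w[j] -= sum(diff(j, X[i], X[k]) for k in hits) / (len(hits) * m)
            # ... differing from other-class neighbors raises it.
            if misses:
                w[j] += sum(diff(j, X[i], X[k]) for k in misses) / (len(misses) * m)
    return w

# Toy data: attribute 0 separates the classes, attribute 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
print(relief_weights(X, y, seed=42))
```

On the toy data, the first attribute receives the larger weight because its values are close within a class and far apart between classes. Raw weights from this sketch can be negative, which is one reason a normalization step is useful afterwards.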

Input

  • example set (Data Table)

    This input port expects an ExampleSet. In the attached Example Process, it is the output of the Retrieve operator.

Output

  • weights (Attribute Weights)

    This port delivers the weights of the attributes with respect to the label attribute. Attributes with higher weights are considered more relevant.

  • example set (Data Table)

    The ExampleSet that was given as input is passed through this port to the output unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • normalize_weights

    This parameter indicates whether the calculated weights should be normalized. If set to true, all weights are normalized to the range from 0 to 1. Range: boolean

  • sort_weights

    This parameter indicates whether the attributes should be sorted according to their weights in the results. If this parameter is set to true, the sorting order is specified using the sort direction parameter. Range: boolean

  • sort_direction

    This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights. Range: selection

  • number_of_neighbors

    This parameter specifies the number of nearest neighbors used for the relevance calculation. Range: integer

  • sample_ratio

    This parameter specifies the ratio of examples to be used for determining the weights. Range: real

  • use_local_random_seed

    This parameter indicates whether a local random seed should be used for randomizing the examples of the sample. Using the same value of the local random seed produces the same sample. Changing the seed value changes the way examples are randomized, so the sample will contain a different set of examples. Range: boolean

  • local_random_seed

    This parameter specifies the local random seed. It is only available if the use local random seed parameter is set to true. Range: integer
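To illustrate how the normalize weights, sort weights and sort direction parameters interact, here is a minimal Python sketch. It assumes min-max scaling for the normalization step; the documentation only states that the weights end up in the range from 0 to 1, so the exact formula is an assumption. The function name normalize_and_sort and the attribute names in the usage line are hypothetical.

```python
def normalize_and_sort(weights, normalize_weights=True,
                       sort_weights=True, sort_direction="ascending"):
    """Post-process a {attribute name: raw weight} mapping (illustrative sketch)."""
    names = list(weights)
    values = [weights[name] for name in names]

    if normalize_weights:
        # Assumption: min-max scaling maps the smallest weight to 0
        # and the largest weight to 1.
        lo, hi = min(values), max(values)
        span = hi - lo
        values = [(v - lo) / span if span else 1.0 for v in values]

    pairs = list(zip(names, values))
    if sort_weights:
        # sort_direction only matters when sort_weights is true.
        pairs.sort(key=lambda p: p[1],
                   reverse=(sort_direction == "descending"))
    return pairs

# Hypothetical raw weights for three attributes.
print(normalize_and_sort({"a1": 0.8, "a2": -0.2, "a3": 0.3}))
```

With sort_direction set to "ascending", the least relevant attribute comes first in the result; "descending" puts the most relevant attribute first.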

Tutorial Processes

Calculating the attribute weights of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. The Weight by Relief operator is applied to it to calculate the weights of the attributes. All parameters are used with default values. The normalize weights parameter is set to true, so all weights are normalized to the range from 0 to 1. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', so the results are listed in ascending order of weight. You can verify this by viewing the results of this process in the Results Workspace.