You are viewing the RapidMiner Studio documentation for version 8.2.

Weight by Gini Index (RapidMiner Studio Core)

Synopsis

This operator calculates the relevance of the attributes of the given ExampleSet based on the Gini impurity index.

Description

The Weight by Gini Index operator calculates the weight of each attribute with respect to the label attribute by computing the Gini index of the class distribution that would result if the given ExampleSet were split according to that attribute. The Gini index is a measure of the impurity of an ExampleSet. The higher the weight of an attribute, the more relevant it is considered. Please note that this operator can only be applied to ExampleSets with a nominal label.
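The idea can be sketched outside RapidMiner in a few lines of Python. This is a minimal illustration, not the operator's actual implementation. It assumes that the weight of a nominal attribute is the reduction in Gini impurity (impurity of the label before the split minus the size-weighted impurity of the partitions after the split). The function names and the small data set are hypothetical.

```python
from collections import Counter, defaultdict


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_weight(values, labels):
    """Impurity reduction obtained by splitting on one nominal attribute.

    Assumption: weight = impurity(label) - weighted impurity of the partitions.
    """
    # Group the labels by the attribute value of each example.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)

    n = len(labels)
    impurity_after = sum(len(g) / n * gini_impurity(g) for g in groups.values())
    return gini_impurity(labels) - impurity_after


# Hypothetical toy data (not the actual 'Golf' data set).
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
wind = ["true", "false", "true", "false", "true", "false"]
play = ["no", "no", "yes", "yes", "no", "yes"]  # nominal label

weights = {
    "Outlook": gini_weight(outlook, play),
    "Wind": gini_weight(wind, play),
}
print(weights)
```

In this toy example Outlook separates the classes better than Wind, so it receives the higher weight.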

Input

  • example set (Data Table)

    This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output

  • weights (Attribute Weights)

    This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

  • example set (Data Table)

    The ExampleSet that was given as input is passed through this port to the output without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • normalize_weights

    This parameter indicates whether the calculated weights should be normalized. If set to true, all weights are normalized to the range 0 to 1. Range: boolean

  • sort_weights

    This parameter indicates whether the attributes should be sorted according to their weights in the results. If this parameter is set to true, the sorting order is specified using the sort direction parameter. Range: boolean

  • sort_direction

    This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights. Range: selection
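The effect of these three parameters can be illustrated with a short Python sketch. It is only an approximation of the behaviour described above. It assumes that normalization is a min-max rescaling (lowest weight becomes 0, highest becomes 1), which this page does not state explicitly. The helper names and the raw weight values are hypothetical.

```python
def normalize_weights(weights):
    """Rescale weights to the range 0 to 1 (assumed min-max rescaling)."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All weights equal: mapping them to 1.0 is an arbitrary choice here.
        return {name: 1.0 for name in weights}
    return {name: (w - lo) / (hi - lo) for name, w in weights.items()}


def sort_weights(weights, direction="ascending"):
    """Return (attribute, weight) pairs ordered by weight."""
    reverse = direction == "descending"
    return sorted(weights.items(), key=lambda item: item[1], reverse=reverse)


# Hypothetical raw weights, for illustration only.
raw = {"Outlook": 0.30, "Temperature": 0.05, "Humidity": 0.20, "Wind": 0.10}

normalized = normalize_weights(raw)           # normalize_weights = true
ranked = sort_weights(normalized, "ascending")  # sort_weights = true

for name, weight in ranked:
    print(f"{name}: {weight:.2f}")
```

With ascending order the least relevant attribute is listed first; passing "descending" would reverse the list.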

Tutorial Processes

Calculating the attribute weights of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. The Weight by Gini Index operator is applied to it to calculate the weights of the attributes. All parameters are left at their default values. The normalize weights parameter is set to true, so all weights are normalized to the range 0 to 1. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', so the results are listed in ascending order of weight. You can verify this by viewing the results of this process in the Results Workspace.