Select by Weights (RapidMiner Studio Core)
Synopsis
This operator selects only those attributes of an input ExampleSet whose weights satisfy the specified criterion with respect to the input weights.
Description
This operator selects only those attributes of an input ExampleSet whose weights satisfy the specified criterion with respect to the input weights. Input weights are provided through the weights input port. The criterion for attribute selection by weights is specified by the weight relation parameter.
Input
- example set (Data Table)
This input port expects an ExampleSet. In the attached Example Process it is the output of the Retrieve operator, but the output of other operators can also be used as input. The data must have meta data attached, because attributes are specified in the meta data. The Retrieve operator provides meta data along with the data.
- weights
This port expects the attribute weights. There are numerous operators that provide the attribute weights. The Weight by Correlation operator is used in the Example Process.
Output
- example set (Data Table)
The ExampleSet containing only the selected attributes is delivered through this port.
- original (Data Table)
The ExampleSet that was given as input is passed through this port unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
- weights
The attribute weights that were provided at the weights input port are delivered through this output port.
Parameters
- weight_relation
Only those attributes are selected whose weights satisfy this relation.
  - greater: Attributes whose weights are greater than the weight parameter are selected.
  - greater_equals: Attributes whose weights are greater than or equal to the weight parameter are selected.
  - equals: Attributes whose weights are equal to the weight parameter are selected.
  - less_equals: Attributes whose weights are less than or equal to the weight parameter are selected.
  - less: Attributes whose weights are less than the weight parameter are selected.
  - top_k: The k attributes with the highest weights are selected. k is specified by the k parameter.
  - bottom_k: The k attributes with the lowest weights are selected. k is specified by the k parameter.
  - all_but_top_k: All attributes other than the k attributes with the highest weights are selected. k is specified by the k parameter.
  - all_but_bottom_k: All attributes other than the k attributes with the lowest weights are selected. k is specified by the k parameter.
  - top_p%: The top p percent of attributes, ranked by highest weight, are selected. p is specified by the p parameter.
  - bottom_p%: The bottom p percent of attributes, ranked by lowest weight, are selected. p is specified by the p parameter.
- weight
This parameter is available only when the weight relation parameter is set to 'greater', 'greater equals', 'equals', 'less equals' or 'less'. It is the value against which the attribute weights are compared. Range:
- k
This parameter is available only when the weight relation parameter is set to 'top k', 'bottom k', 'all but top k' or 'all but bottom k'. It specifies the number of attributes, k, that the chosen relation refers to. Range:
- p
This parameter is available only when the weight relation parameter is set to 'top p%' or 'bottom p%'. It specifies the percentage of attributes to select. Range:
- deselect_unknown
This is an expert parameter. It indicates whether attributes whose weight is unknown should be removed from the ExampleSet. Range:
- use_absolute_weights
This is an expert parameter. It indicates whether the absolute values of the weights should be used for comparison. Range:
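The selection rules described by these parameters can be sketched in a few lines of Python. This is only an illustration of the documented semantics, not the operator's implementation: the function name select_by_weights and its signature are hypothetical, the defaults are the sketch's own, p is assumed to be given in percent with the attribute count rounded up, ties are broken by input order, and attributes with unknown weights are assumed to be kept when deselect_unknown is false.

```python
import math


def select_by_weights(weights, relation, weight=None, k=None, p=None,
                      use_absolute_weights=False, deselect_unknown=True):
    """Return the attribute names chosen by `relation` (illustrative sketch only).

    `weights` maps attribute name -> weight; None or NaN marks an unknown weight.
    """
    known, unknown = {}, []
    for name, value in weights.items():
        if value is None or math.isnan(value):
            unknown.append(name)
        else:
            known[name] = abs(value) if use_absolute_weights else value

    # Attribute names ordered from highest to lowest weight (stable for ties).
    ranked = sorted(known, key=known.get, reverse=True)
    n = len(ranked)

    thresholds = {
        "greater": lambda v: v > weight,
        "greater_equals": lambda v: v >= weight,
        "equals": lambda v: v == weight,
        "less_equals": lambda v: v <= weight,
        "less": lambda v: v < weight,
    }

    if relation in thresholds:
        chosen = {name for name, v in known.items() if thresholds[relation](v)}
    elif relation == "top_k":
        chosen = set(ranked[:k])
    elif relation == "bottom_k":
        chosen = set(ranked[max(n - k, 0):])
    elif relation == "all_but_top_k":
        chosen = set(ranked[k:])
    elif relation == "all_but_bottom_k":
        chosen = set(ranked[:max(n - k, 0)])
    elif relation in ("top_p%", "bottom_p%"):
        # Assumption: p is a percentage (0-100) and the count is rounded up.
        count = min(n, math.ceil(n * p / 100))
        chosen = set(ranked[:count]) if relation == "top_p%" else set(ranked[n - count:])
    else:
        raise ValueError(f"unknown relation: {relation}")

    if not deselect_unknown:
        chosen.update(unknown)  # assumption: unknown-weight attributes are kept
    # Report the selection in the original attribute order.
    return [name for name in weights if name in chosen]


# Toy weights, made up for illustration; "d" has an unknown weight.
toy = {"a": 0.9, "b": -0.5, "c": 0.1, "d": None}
print(select_by_weights(toy, "top_k", k=2))                             # ['a', 'c']
print(select_by_weights(toy, "top_k", k=2, use_absolute_weights=True))  # ['a', 'b']
```

The second call shows why use_absolute_weights matters: with signed weights, attribute "b" ranks last, but by magnitude it ranks second.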
Tutorial Processes
Selecting attributes from Sonar data set
The 'Sonar' data set is loaded using the Retrieve operator. The Weight by Correlation operator is applied to it to generate attribute weights. A breakpoint is inserted after this operator so that you can inspect the attributes and their weights. The Select by Weights operator is applied next. The 'Sonar' data set is provided at the example set input port, and the weights calculated by the Weight by Correlation operator are provided at the weights input port. The weight relation parameter is set to 'bottom k' and the k parameter is set to 4, so the 4 attributes with the lowest weights are selected. As you can see, 'attribute_57', 'attribute_17', 'attribute_30' and 'attribute_16' have the lowest weights, so these four attributes are selected. Note that the label attribute 'class' is selected as well. This is because attributes with special roles are selected irrespective of the weight condition.
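The behaviour seen in this tutorial, 'bottom k' selection with special attributes always retained, can be mimicked with a short Python sketch. The helper name bottom_k_with_special is hypothetical, and the attribute names and weights are toy values made up for illustration; they are not the actual Sonar weights.

```python
def bottom_k_with_special(weights, special_attributes, k):
    """Pick the k lowest-weighted regular attributes, then keep every special attribute."""
    # Sort regular attribute names by ascending weight and take the first k.
    lowest = sorted(weights, key=weights.get)[:k]
    # Special attributes (for example the label) are kept regardless of weight.
    return lowest + list(special_attributes)


# Toy weights for six regular attributes; 'class' plays the role of the label.
toy_weights = {"x1": 0.40, "x2": 0.05, "x3": 0.90, "x4": 0.10, "x5": 0.00, "x6": 0.20}
selected = bottom_k_with_special(toy_weights, ["class"], k=4)
print(selected)  # ['x5', 'x2', 'x4', 'x6', 'class']
```

The result contains five names even though k is 4, which mirrors the tutorial: four regular attributes plus the label.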