Weight by User Specification (RapidMiner Studio Core)
Synopsis
This operator assigns user-defined weights to the specified attributes. The attributes can be selected using regular expressions.
Description
The Weight by User Specification operator assigns user-defined weights to the selected attributes of the given ExampleSet. The higher the weight of an attribute, the more relevant it is considered. Unlike many other weighting operators, this operator can be applied to ExampleSets with either a nominal or a numerical label.
The name regex to weights parameter selects attributes and assigns weights to them. Attributes are selected through regular expressions, and multiple regular expressions can be used for different attribute selections. Note that the weights are set in the order in which the regular expressions are defined in the list, so a later entry can overwrite a weight set by an earlier one.
If the distribute weights parameter is set to true, the weight specified in the name regex to weights parameter is divided equally among all the attributes that match the regular expression. The default weight parameter specifies the weight of all attributes that do not match any regular expression. Please study the attached Example Process for more information.
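The rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the operator's implementation: the function name assign_weights, the attribute names, and the patterns are hypothetical, and the sketch assumes that a regular expression has to match the whole attribute name.

```python
import re

def assign_weights(attribute_names, regex_to_weight, default_weight=1.0, distribute=False):
    """Return {attribute name: weight} following the rules described above."""
    # Attributes that match no expression keep the default weight.
    weights = {name: default_weight for name in attribute_names}
    # Entries are applied in list order, so a later entry can overwrite an earlier one.
    for pattern, weight in regex_to_weight:
        # Assumption: the expression must match the entire attribute name.
        matched = [n for n in attribute_names if re.fullmatch(pattern, n)]
        if not matched:
            continue
        # With distribute=True the weight is split equally among the matches.
        share = weight / len(matched) if distribute else weight
        for name in matched:
            weights[name] = share
    return weights

# Hypothetical attributes and rules; the second rule overwrites the first for "att2".
names = ["att1", "att2", "att3", "id"]
rules = [("att.*", 3.0), ("att2", 6.0)]

plain = assign_weights(names, rules, default_weight=1.0)
split = assign_weights(names, rules, default_weight=1.0, distribute=True)
print(plain)
print(split)
```

In this sketch "att2" ends up with 6.0 in both runs because the second list entry is applied last, while "id" matches nothing and keeps the default weight.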
Input
- example set (Data Table)
This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.
Output
- weights (Attribute Weights)
This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.
- example set (Data Table)
The ExampleSet that was given as input is passed through this port to the output without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
- normalize_weights
This parameter indicates whether the calculated weights should be normalized. If set to true, all weights are normalized to the range from 0 to 1. Range: boolean
- sort_weights
This parameter indicates whether the attributes should be sorted according to their weights in the results. If this parameter is set to true, the sorting order is specified using the sort direction parameter. Range: boolean
- sort_direction
This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights. Range: selection
- name_regex_to_weights
This parameter is used for selecting attributes and assigning weights to them. The attributes are selected through regular expressions. Multiple regular expressions can be used for different attribute selections. Note that the weights are set in the order in which the regular expressions are defined in the list, so a later entry can overwrite a weight set by an earlier one. Range: list
- distribute_weights
If this parameter is set to true, the weight specified in the name regex to weights parameter is split and distributed equally among the attributes matching the corresponding regular expression. Range: boolean
- default_weight
This parameter specifies the weight of all attributes that do not match any regular expression. Range: real
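The effect of the normalize weights and sort weights parameters can be illustrated with a short Python sketch. The documentation only states that normalized weights lie between 0 and 1; the min-max scaling used here is an assumption, as are the function names, the attribute names "a", "b", "c", and the 'descending' direction value.

```python
def normalize_min_max(weights):
    """Rescale weights to [0, 1]. Min-max scaling is an assumption of this sketch."""
    low, high = min(weights.values()), max(weights.values())
    if high == low:
        # All weights are equal, so there is no spread to rescale; 1.0 is an arbitrary choice.
        return {name: 1.0 for name in weights}
    return {name: (w - low) / (high - low) for name, w in weights.items()}

def sort_by_weight(weights, direction="ascending"):
    """Return (name, weight) pairs ordered by weight."""
    return sorted(weights.items(), key=lambda item: item[1],
                  reverse=(direction == "descending"))

# Hypothetical raw weights.
raw = {"a": 4.0, "b": 1.0, "c": 2.5}
normalized = normalize_min_max(raw)
ranked = sort_by_weight(normalized, "descending")
print(normalized)
print(ranked)
```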
Tutorial Processes
Manually setting the attribute weights of the Golf data set
The 'Golf' data set is loaded using the Retrieve operator. The Weight by User Specification operator is applied to it to assign the attribute weights. The normalize weights parameter is set to false, so the weights will not be normalized. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', so the results will be in ascending order of the weights.
The name regex to weights parameter is used for assigning weights through regular expressions. Only one regular expression is defined in this process. It selects all attributes that have the letter 'i' in their names. The matching attributes (i.e. Humidity and Wind) are assigned the weight 4.0. All attributes that do not match any regular expression (i.e. Temperature and Outlook) are assigned the default weight, which is defined by the default weight parameter. In this process it is set to 1.0.
Run the process and you will see that the attributes that matched the regular expression get the corresponding weight (i.e. 4.0) and the remaining attributes get the default weight (i.e. 1.0). Now set the distribute weights parameter to true and run the process again. This time the weight 4.0 is split equally between the Wind and Humidity attributes, so each of them gets the weight 2.0. You can verify this by viewing the results of the process in the Results Workspace.
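The arithmetic of this tutorial can be checked with a small Python sketch. It is not the tutorial process itself: the pattern ".*i.*" is an assumed spelling of "contains the letter 'i'", whole-name matching is assumed, and the helper golf_weights is hypothetical.

```python
import re

# Regular attributes of the Golf data set as named in the tutorial.
regular_attributes = ["Outlook", "Temperature", "Humidity", "Wind"]
pattern = ".*i.*"            # assumed spelling of "name contains the letter 'i'"
weight, default_weight = 4.0, 1.0

# Assumption: the expression has to match the whole attribute name.
matched = [a for a in regular_attributes if re.fullmatch(pattern, a)]

def golf_weights(distribute):
    # Either every match gets the full weight, or the weight is split equally.
    share = weight / len(matched) if distribute else weight
    return {a: (share if a in matched else default_weight) for a in regular_attributes}

first_run = golf_weights(distribute=False)   # distribute weights = false
second_run = golf_weights(distribute=True)   # distribute weights = true

# Print the first run in ascending order of weight, as in the tutorial settings.
for name, w in sorted(first_run.items(), key=lambda item: item[1]):
    print(name, w)
```

Under these assumptions, Humidity and Wind receive 4.0 in the first run and 4.0 / 2 = 2.0 in the second, while Outlook and Temperature keep 1.0 in both.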