Work on Subset (RapidMiner Studio Core)

Synopsis

This operator selects a subset (one or more attributes) of the input ExampleSet and applies the operators in its subprocess on the selected subset.

Description

The Work on the Subset operator can be considered as the blend of the Select Attributes and Subprocess operator to some extent. The attributes are selected in the same way as selected by the Select Attributes operator and the subprocess of this operator works in the same way as the Subprocess operator works. A subprocess can be considered as small unit of a process where all operators and a combination of operators can be applied in a subprocess. That is why a subprocess can also be defined as a chain of operators that is subsequently applied. For more information about subprocess please study the Subprocess operator. Although the Work on Subset operator has similarities with the Select Attributes and Subprocess operators however, this operator provides some functionality that cannot be performed by the combination of the Select Attributes and Subprocess operator. Most importantly, this operator can merge the results of its subprocess with the input ExampleSet such that the original subset is overwritten by the subset received after processing of the subset in the subprocess. This merging can be controlled by the keep subset only parameter. This parameter is set to false by default. Thus merging is done by default. If this parameter is set to true, then only the result of the subprocess is returned by this operator and no merging is done. In such a case this operator behaves very similar to the combination of the Select Attributes and Subprocess operator. This can be understood easily by studying the attached Example Process.

This operator can also deliver the additional results of the subprocess if desired. This can be controlled by the deliver inner results parameter. Please note that this is a very powerful operator. It can be used to create new preprocessing schemes by combining it with other preprocessing operators. However, there are two major restrictions:

  • Since the result of the subprocess will be combined with the rest of the input ExampleSet, the number of examples is not allowed to be changed inside the subprocess.
  • The changes in the role of an attribute will not be delivered outside the subprocess.

Input

  • example set (IOObject)

    This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

  • example set (IOObject)

    The result of the subprocess will be combined with the rest of the input ExampleSet and delivered through this port. However if the keep subset only parameter is set to true then only the result of the subprocess will be delivered.

  • through

    This operator can also deliver the additional results of the subprocess if desired. This can be controlled by the deliver inner results parameter. This port is used for delivering the additional results of the subprocess.The Work on Subset operator can have multiple through ports. When one through port is connected, another through port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object passed at the first through port inside the subprocess of the Work on Subset operator is delivered at the first through port of the operator.

Parameters

  • attribute_filter_typeThis parameter allows you to select the attribute selection filter; the method you want to use for selecting attributes. It has the following options:
    • all: This option simply selects all the attributes of the ExampleSet. This is the default option.
    • single: This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameters panel.
    • subset: This option allows selection of multiple attributes through a list. All attributes of ExampleSet are present in the list; required attributes can be easily selected. This option will not work if meta data is not known. When this option is selected another parameter becomes visible in the Parameters panel.
    • regular_expression: This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameters panel.
    • value_type: This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have basic understanding of type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameters panel.
    • block_type: This option is similar in working to the value_type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameters panel.
    • no_missing_values: This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.
    • numeric value filter: When this option is selected another parameter (numeric condition) becomes visible in the Parameters panel. All numeric attributes whose all examples satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.
    Range: selection
  • attributeThe required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter attribute if the meta data is known. Range: string
  • attributesThe required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes. Range: string
  • regular_expressionThe attributes whose name match this expression will be selected. Regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously. Range: string
  • use_except_expressionIf enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameters panel. Range: boolean
  • except_regular_expressionThis option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (regular expression that was specified in the regular expression parameter). Range: string
  • value_typeThe type of attributes to be selected can be chosen from a drop down list. Range: selection
  • use_value_type_exception If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameters panel. Range: boolean
  • except_value_typeThe attributes matching this type will not be selected even if they match the previously mentioned type i.e. value type parameter's value. Range: selection
  • block_typeThe block type of attributes to be selected can be chosen from a drop down list. Range: selection
  • use_block_type_exception If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameters panel. Range: boolean
  • except_block_typeThe attributes matching this block type will be not be selected even if they match the previously mentioned block type i.e. block type parameter's value. Range: selection
  • numeric_conditionThe numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value of greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<' e.g. '<5' will not work, so use '< 5' instead. Range: string
  • include_special_attributesThe special attributes are attributes with special roles. Special attributes are those attributes which identify the examples. In contrast regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes selected irrespective of the conditions in the Select Attribute operator. If this parameter is set to true, Special attributes are also tested against conditions specified in the Select Attribute operator and only those attributes are selected that satisfy the conditions. Range: boolean
  • invert_selectionIf this parameter is set to true, it acts as a NOT gate, it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking of this parameter. After checking of this parameter 'att1' will be unselected and 'att2' will be selected. Range: boolean
  • name_conflict_handling This parameter decides how to handle a conflict with names when the Operator merges the Subset back to the ExampleSet. There are three possible behaviors:
    • error: The Operator will show an Error if there is any conflict.
    • keep new: If there is an conflict, the Operator will keep the one from the Subset.The other one will be deleted.
    • keep original: If there is an conflict, the Operator will keep the one which is not in the Subset. The other one will be deleted.
    Range: selection
  • role_conflict_handlingThis parameter decides how to handle a conflict with roles when the Operator merges the Subset back to the ExampleSet. There are three possible behaviors:
    • error: The Operator will show an Error if there is any conflict.
    • keep new: If there is an conflict, the Operator will keep the one from the Subset.The other one will be deleted.
    • keep original: If there is an conflict, the Operator will keep the one which is not in the Subset. The other one will be deleted.
    Range: selection
  • keep_subset_onlyThe Work on Subset operator can merge the results of its subprocess with the input ExampleSet such that the original subset is overwritten by the subset received after processing of the subset in the subprocess. This merging can be controlled by the keep subset only parameter. This parameter is set to false by default. Thus merging is done by default. If this parameter is set to true, then only the result of the subprocess is returned by this operator and no merging is done. Range: boolean
  • deliver_inner_resultsThis parameter indicates if the additional results (other than the input ExampleSet) of the subprocess should also be returned. If this parameter is set to true then the additional results are delivered through the through ports. Range: boolean
  • remove_rolesThis parameter decides if the role of the Special Attributes in the Subset will be removed by entering the Subset or not. Range: boolean

Tutorial Processes

Working on a subset of Golf data set

The 'Golf' data set is loaded using the Retrieve operator. Then the Work on Subset operator is applied on it. The attribute filter type parameter is set to subset. The attributes parameter is used for selecting the 'Temperature' and 'Humidity' attributes. Double-click on the Work on Subset operator to see its subprocess. All the operations in the subprocess will be performed only on the selected attributes i.e. the 'Temperature' and 'Humidity' attributes. The Normalize operator is applied in the subprocess. The attribute filter type parameter of the Normalize operator is set to 'all'. Please note that the Normalize operator will not be applied on 'all' the attributes of the input ExampleSet rather it would be applied on 'all' selected attributes of the input ExampleSet i.e. the 'Temperature' and 'Humidity' attributes. Run the process. You will see that the normalized 'Humidity' and 'Temperature' attribute are combined with the rest of the input ExampleSet. Now set the keep subset only parameter to true and run the process again. Now you will see that only the results of the subprocess are delivered by the Work on Subset operator. This Example Process just explains the basic usage of this operator. This operator can be used for creating new preprocessing schemes by combining it with other preprocessing operators.