You are viewing the RapidMiner Studio documentation for version 8.1.

Replace (Dictionary) (RapidMiner Studio Core)

Synopsis

This operator replaces substrings (in the values) of the selected nominal attributes of the first ExampleSet by using the dictionary specified by the second ExampleSet.

Description

This operator takes two ExampleSets as input and uses the second one as a dictionary for replacing substrings in the values of the selected nominal attributes of the first one. The second ExampleSet must have two nominal attributes that define the value mappings: the 'from' attribute (specified through the from attribute parameter) and the 'to' attribute (specified through the to attribute parameter). For every example in the second ExampleSet, a dictionary entry is created that maps the 'from' attribute value to the 'to' attribute value. This dictionary is then applied to the first ExampleSet: wherever a value of the 'from' attribute occurs in a selected nominal attribute, either as the whole value or as a substring, it is replaced by the corresponding value of the 'to' attribute. Please study the attached Example Process for a better understanding.
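
The dictionary lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions, not RapidMiner's implementation: both ExampleSets are modelled as lists of dicts, and the function names (build_dictionary, replace_with_dictionary), the attribute names and the sample data are all hypothetical.

```python
def build_dictionary(dictionary_rows, from_attribute, to_attribute):
    """Turn each row of the dictionary table into one (from, to) entry."""
    return [(row[from_attribute], row[to_attribute]) for row in dictionary_rows]


def replace_with_dictionary(rows, selected_attributes, entries):
    """Return copies of the rows with substrings replaced in the selected attributes."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        for name in selected_attributes:
            value = new_row[name]
            for old, new in entries:
                # str.replace substitutes every occurrence of the substring
                value = value.replace(old, new)
            new_row[name] = value
        result.append(new_row)
    return result


# Hypothetical data: the first table holds the values to change,
# the second table plays the role of the dictionary.
data = [{"city": "New York City"}, {"city": "York"}]
dictionary_rows = [{"from": "York", "to": "Amsterdam"}]

entries = build_dictionary(dictionary_rows, "from", "to")
cleaned = replace_with_dictionary(data, ["city"], entries)
```

The first row is matched as a substring and the second as a whole value, which mirrors the two cases named in the description.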

Input

  • example set input (Data Table)

    This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. The ExampleSet should have at least one nominal attribute, because this operator only replaces substrings in nominal values. The substrings in this ExampleSet are replaced by using the second ExampleSet.

  • dictionary (Data Table)

    This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input. This ExampleSet should have a 'from attribute' and 'to attribute' as specified in the description of this operator. These attributes will be used for substring replacements in the first ExampleSet.

Output

  • example set output (Data Table)

    The substrings of the selected nominal attributes of the first ExampleSet are replaced and the resultant ExampleSet is delivered through this port.

  • original (Data Table)

    The first ExampleSet that was given as input is passed through to this port unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

  • preprocessing model (Preprocessing Model)

    This port delivers the preprocessing model, which has the information regarding the parameters of this operator in the current process.

Parameters

  • create_view It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would be normally performed directly on the data will then be computed every time a value is requested and the result is returned without changing the data. Range: boolean
  • attribute_filter_type This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the required attributes. It has the following options:
    • all: This option simply selects all the attributes of the ExampleSet. This is the default option.
    • single: This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameters panel.
    • subset: This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameters panel.
    • regular_expression: This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameters panel.
    • value_type: This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of type hierarchy when selecting attributes through this option. When it is selected some other parameters (value type, use value type exception) become visible in the Parameters panel.
    • block_type: This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameters panel.
    • no_missing_values: This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.
    • numeric_value_filter: When this option is selected another parameter (numeric condition) becomes visible in the Parameters panel. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.
    Range: selection
  • attribute The desired attribute can be selected through this parameter. If the meta data is known, the attribute name can be selected from the drop down box of the attribute parameter. Range: string
  • attributes The required attributes can be selected through this parameter. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the substring replacement will take place; all other attributes will remain unchanged. Range: string
  • regular_expression The attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation for beginners. It is advisable to specify the regular expression through the edit and preview regular expression menu, which lets you try different expressions and preview the results simultaneously. Range: string
  • use_except_expression If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameters panel. Range: boolean
  • except_regular_expression This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter). Range: string
  • value_type The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file_path. Range: selection
  • use_value_type_exception If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameters panel. Range: boolean
  • except_value_type The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value of the value type parameter. One of the following types can be selected here: nominal, text, binominal, polynominal, file_path. Range: selection
  • block_type The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single_value'. Range: selection
  • use_block_type_exception If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameters panel. Range: boolean
  • except_block_type The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. Range: selection
  • numeric_condition The numeric condition for testing examples of numeric attributes is specified here. For example, the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<'; for example, '<5' will not work, so use '< 5' instead. Range: string
  • include_special_attributes The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. Range: boolean
  • invert_selection If this parameter is set to true, it acts as a NOT gate and reverses the selection: all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking it 'att1' will be unselected and 'att2' will be selected. Range: boolean
  • from_attribute This parameter specifies the name of the attribute of the second ExampleSet that specifies the substrings that should be replaced. Range: string
  • to_attribute This parameter specifies the name of the attribute of the second ExampleSet that specifies the replacements of the substrings. Range: string
  • use_regular_expressions This parameter specifies if the replacements should be treated as regular expressions. Range: boolean
  • convert_to_lowercase This parameter specifies if the strings should be converted to lower case. Range: boolean
  • first_match_only This parameter specifies if only the first match in the dictionary should be considered. If set to false, subsequent matches will be applied iteratively. Range: boolean
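
One possible reading of the use_regular_expressions and first_match_only parameters is sketched below in Python. It is an interpretation rather than the operator's actual code: it assumes that the 'from' values act as the patterns, that dictionary entries are applied in order, and that first_match_only stops after the first entry that matched. The function name replace_value and the sample entries are hypothetical. convert_to_lowercase is left out because the description does not say which strings it affects.

```python
import re


def replace_value(value, entries, use_regular_expressions=False,
                  first_match_only=False):
    """Apply dictionary entries to one string value (illustrative only)."""
    for old, new in entries:
        if use_regular_expressions:
            # Assumption: the 'from' value is the pattern, the 'to' value the replacement.
            pattern, repl = old, new
        else:
            # Literal matching: escape the pattern and insert the replacement verbatim.
            pattern, repl = re.escape(old), (lambda _match: new)
        value, count = re.subn(pattern, repl, value)
        if count and first_match_only:
            break  # stop after the first dictionary entry that matched
    return value


entries = [("a", "b"), ("b", "c")]
chained = replace_value("a", entries)                        # 'a' -> 'b' -> 'c'
single = replace_value("a", entries, first_match_only=True)  # 'a' -> 'b'
digits = replace_value("id42", [(r"\d+", "#")], use_regular_expressions=True)
```

With first_match_only disabled, the output of one entry can be picked up by a later entry, so the order of the dictionary rows matters in this sketch.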

Tutorial Processes

Replacing substrings by using a dictionary

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at this ExampleSet. This ExampleSet will be used as the first ExampleSet for the Replace (Dictionary) operator, so the substring replacements will be made in this ExampleSet.

The second ExampleSet is provided by the Subprocess operator. The operator chain inside the Subprocess operator generates a dictionary ExampleSet for this process. The explanation of this inner chain of operators is not relevant here. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that this ExampleSet has two nominal attributes, 'att1' and 'att2'.

The Replace (Dictionary) operator takes these two ExampleSets as input and makes substring replacements in the first ExampleSet by using the second ExampleSet. Have a look at the parameters of the Replace (Dictionary) operator. The attribute filter type parameter is set to 'all', thus substring replacements will be done in all attributes of the first ExampleSet. The from attribute and to attribute parameters are set to 'att1' and 'att2' respectively. Thus if the values of the 'att1' attribute (i.e. 'true' and 'false') are found in any attribute of the first ExampleSet, they will be replaced by the corresponding 'att2' attribute values (i.e. 'YES' and 'NO' respectively). All other parameters are used with default values.

Run the process and compare the resultant ExampleSet with the original ExampleSet. You can clearly see in the Wind attribute that the substrings 'true' and 'false' have been replaced by 'YES' and 'NO' respectively. Please note that this operator is a substring replacement tool, although it was used for value replacement in this process. If the 'att1' attribute had the value 'tr' instead of 'true', all occurrences of this substring would have been replaced by 'YES'. In that case the 'true' value in the Wind attribute would have been changed to 'YESue'.
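
The effect on the Wind attribute, including the 'tr' case, can be reproduced outside RapidMiner with a few lines of Python. This is a sketch using plain string replacement, not the operator itself; the helper apply_entries and the variable names are hypothetical, and only the two Wind values named in the tutorial are used.

```python
def apply_entries(value, entries):
    """Replace every occurrence of each 'from' substring with its 'to' value."""
    for old, new in entries:
        value = value.replace(old, new)
    return value


# The two values of the Wind attribute mentioned in the tutorial.
wind_values = ["true", "false"]

# Dictionary as used in the tutorial process: att1 -> att2.
tutorial_entries = [("true", "YES"), ("false", "NO")]
print([apply_entries(v, tutorial_entries) for v in wind_values])  # ['YES', 'NO']

# Same dictionary, but with 'tr' instead of 'true' in att1.
short_entries = [("tr", "YES"), ("false", "NO")]
print([apply_entries(v, short_entries) for v in wind_values])  # ['YESue', 'NO']
```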