Categories

Versions

You are viewing the RapidMiner Studio documentation for version 10.0 - Check here for latest version

Map (RapidMiner Studio Core)

Synopsis

This operator maps specified values of selected attributes to new values. This operator can be applied on both numerical and nominal attributes.

Description

This operator can be used to replace nominal values (e.g. replace the value 'green' by the value 'green_color') as well as numerical values (e.g. replace all values '3' by '-1'). But, one use of this operator can do mappings for attributes of only one type. A single mapping can be specified using the parameters replace what and replace by as in Replace operator. Multiple mappings can be specified through the value mappings parameter. Additionally, the operator allows defining a default mapping. This operator allows you to select attributes to make mappings in. This operator allows you to specify a regular expression. Attribute values of selected attributes that match this regular expression are mapped by the specified value mapping. Please go through the parameters and the Example Process to develop a better understanding of this operator.

Input

  • example set (Data Table)

    This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along-with data.

Output

  • example set (Data Table)

    The ExampleSet with value mappings is output of this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • attribute_filter_typeThis parameter allows you to select the attribute selection filter; the method you want to use for selecting attributes on which you want to apply mappings. It has the following options:
    • all: This option simply selects all the attributes of the ExampleSet. This is the default option.
    • single: This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameters panel. (Since RapidMiner 6.0.4 the Operator will fail if a selected Attribute is not in the ExampleSet)
    • subset: This option allows selection of multiple attributes through a list. All attributes of ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameters panel. (Since RapidMiner 6.0.4 the Operator will fail if a selected Attribute is not in the ExampleSet)
    • regular_expression: This option allows you to specify a regular expression for the attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameters panel.
    • value_type: This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The user should have a basic understanding of type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameters panel.
    • block_type: This option is similar in working to the value_type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameters panel.
    • no_missing_values: This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.
    • numeric_value_filter: When this option is selected another parameter (numeric condition) becomes visible in the Parameters panel. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.
    Range: selection
  • attributeThe required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter attribute if the meta data is known. Range: string
  • attributesThe required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list which is the list of selected attributes. Range: string
  • regular_expressionAttributes whose name match this expression will be selected. Regular expression is very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through edit and preview regular expression menu. This menu gives a good idea of regular expressions. This menu also allows you to try different expressions and preview the results simultaneously. This will enhance your concept of regular expressions. Range: string
  • use_except_expressionIf enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameters panel. Range: boolean
  • except_regular_expressionThis option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (expression that was specified in regular expression parameter). Range: string
  • value_typeThe type of attributes to be selected can be chosen from a drop down list. Range: selection
  • use_value_type_exceptionIf enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameters panel. Range: boolean
  • except_value_typeAttributes matching this type will be removed from the final output even if they matched the previously mentioned type i.e. value typeparameter's value. Range: selection
  • block_typeBlock type of attributes to be selected can be chosen from drop down list. Range: selection
  • use_block_type_exceptionIf enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameters panel. Range: boolean
  • except_block_typeAttributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. Range: selection
  • numeric_conditionNumeric condition for testing examples of numeric attributes is mention here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value of greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<' e.g. '<5' will not work, so use '< 5' instead. Range: string
  • include_special_attributesSpecial attributes are attributes with special roles which identify the examples. In contrast regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attribute operator. If this parameter is set to true, Special attributes are also tested against conditions specified in the Select Attribute operator and only those attributes are selected that satisfy the conditions. Range: boolean
  • invert_selectionIf this parameter is set to true, it acts as a NOT gate, it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example if attribute 'att1' is selected and attribute 'att2' is removed prior to selection of this parameter. After selection of this parameter 'att1' will be removed and 'att2' will be selected. Range: boolean
  • value_mappingsMultiple mappings can be specified through this parameter. If only a single mapping is required. It can be done using the parameters replace what and replace by as in the Replace operator. Old values and new values can be easily specified through this parameter. Multiple mappings can be defined for the same old value but only the new value corresponding to the first mapping is taken as replacement. Regular expressions can also be used here if the consider regular expressions parameter is set to true. Range:
  • replace_whatThis parameter specifies what is to be replaced. This can be specified using regular expressions. This parameter is useful only if single mapping is to be done. For multiple mappings use the value mappings parameter Range: string
  • replace_byRegions matching regular expression of the replace what parameter are replaced by the value of the replace by parameter.This parameter is useful only if single mapping is to be done. For multiple mappings use the value mappings parameter. Range: string
  • consider_regular_expressionsThis parameter enables matching based on regular expressions; old values(old values are original values, old values and 'replace what' represent the same thing) may be specified as regular expressions. If the parameter consider regular expressions is enabled, old values are replaced by the new values if the old values match the given regular expressions. The value corresponding to the first matching regular expression in the mappings list is taken as a replacement. Range: boolean
  • add_default_mappingIf set to true, all values that occur in the selected attributes of the ExampleSet but are not listed in the value mappings list are mapped to the value of the default value parameter. Range: boolean
  • default_valueThis parameter is only available if the add default mapping parameter is checked. If add default mapping is set to true and the default value is properly set, all values that occur in the selected attributes of the ExampleSet but are not listed in the value mappings list are replaced by the default value. This may be helpful in cases where only some values should be mapped explicitly and many unimportant values should be mapped to a default value (e.g. 'other'). Range: string

Tutorial Processes

Mapping multiple values

Focus of this Example Process is the use of the value mappings parameter and the default value parameter. Use of the replace what and replace by parameter can be seen in the Example Process of the Replace operator. Almost all other parameters of the Map operator are also part of the Select Attributes operator, their use can be better understood by studying the Attributes operator and it's Example Process.

The 'Golf' data set is loaded using the Retrieve operator. The Map operator is applied on it. 'Wind' and 'Outlook' attributes are selected for mapping. Thus, the effect of the Map operator will be limited to just these two attributes. Four value mappings are specified in the value mappings parameter. 'true', 'false', 'overcast' and 'sunny' are replaced by 'yes', 'no', 'bad' and 'good' respectively. The add default mappings parameter is set to true and 'other' is specified in the default value parameter. 'Wind' attribute has only two possible values i.e. 'true' and 'false'. Both of them were mapped in the mappings list. 'Outlook' attribute has three possible values i.e. 'sunny', 'overcast' and 'rain'. 'sunny' and 'overcast' were mapped in the mappings list but 'rain' was not mapped. As add default mappings parameter is set to true, 'rain' will be mapped to the default value i.e. 'other'.