Replace Missing Values (RapidMiner Studio Core)

Synopsis

This operator replaces missing values in examples of selected attributes by a specified replacement.

Description

This operator replaces missing values in examples of selected attributes by a specified replacement. Missing values can be replaced by the minimum, maximum or average value of that attribute. They can also be replaced by zero or by any user-specified replenishment value.
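As a rough illustration of the replacement functions described above, the following Python sketch applies them to a single column. It is not RapidMiner code: the function name replace_missing, the use of None to stand for a missing value, and the handling of a column with no known values are assumptions made for this example.

```python
from statistics import mean

FUNCTIONS = ("none", "minimum", "maximum", "average", "zero", "value")


def replace_missing(values, function, replenishment_value=None):
    """Return a copy of `values` in which None (missing) entries are replaced."""
    if function not in FUNCTIONS:
        raise ValueError(f"unknown function: {function!r}")
    present = [v for v in values if v is not None]
    if function == "none":
        return list(values)
    if function == "zero":
        fill = 0
    elif function == "value":
        fill = replenishment_value
    elif not present:
        # Assumption: with no known values there is nothing to compute from,
        # so the column is left unchanged.
        return list(values)
    elif function == "minimum":
        fill = min(present)
    elif function == "maximum":
        fill = max(present)
    else:  # average
        fill = mean(present)
    return [fill if v is None else v for v in values]


column = [2.0, None, 4.0, None, 3.0]
print(replace_missing(column, "minimum"))  # [2.0, 2.0, 4.0, 2.0, 3.0]
print(replace_missing(column, "average"))  # [2.0, 3.0, 4.0, 3.0, 3.0]
print(replace_missing(column, "value", replenishment_value=35))
```

The statistics are computed from the known values only, so the replacement never depends on the missing entries themselves.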

Input

  • example set (Data Table)

    This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. Meta data must be attached to the input data because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data.

Output

  • example set (Data Table)

    The ExampleSet with missing values replaced by the specified replacement is the output of this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

  • preprocessing model (Preprocessing Model)

    This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

  • create_view It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would be normally performed directly on the data will then be computed every time a value is requested and the result is returned without changing the data. Range: boolean
  • attribute_filter_type This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the attributes in which missing values are replaced. It has the following options:
    • all: This option simply selects all the attributes of the ExampleSet. This is the default option.
    • single: This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in Parameters panel.
    • subset: This option allows selection of multiple attributes through a list. All attributes of ExampleSet are present in the list; required attributes can be easily selected. This option will not work if meta data is not known. When this option is selected another parameter becomes visible in Parameters panel.
    • regular_expression: This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in Parameters panel.
    • value_type: This option allows selection of all the attributes of a particular type. Note that types are hierarchical; for example, the real and integer types both belong to the numeric type. A basic understanding of the type hierarchy is needed when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in Parameters panel.
    • block_type: This option is similar in working to value_type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to value_series block type. When this option is selected some other parameters (block type, use block type exception) become visible in Parameters panel.
    • no_missing_values: This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.
    • numeric_value_filter: When this option is selected another parameter (numeric condition) becomes visible in Parameters panel. All numeric attributes whose values satisfy the given numeric condition in every example are selected. Please note that all nominal attributes are also selected irrespective of the given numeric condition.
    Range: selection
  • attribute The required attribute can be selected with this parameter. If the meta data is known, the attribute name can be chosen from the drop-down box of the parameter. Range: string
  • attributes The required attributes can be selected with this parameter. It opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes. Range: string
  • regular_expression Attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but can be difficult for beginners. It is advisable to specify the regular expression through the edit and preview regular expression menu, which lets you try different expressions and preview the results simultaneously. Range: string
  • use_except_expression If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in Parameters panel. Range: boolean
  • except_regular_expression This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter). Range: string
  • value_type The type of attributes to be selected can be chosen from a drop-down list. Range: selection
  • use_value_type_exception If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in Parameters panel. Range: boolean
  • except_value_type Attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value of the value type parameter. Range: selection
  • block_type The block type of attributes to be selected can be chosen from a drop-down list. Range: selection
  • use_block_type_exception If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in Parameters panel. Range: boolean
  • except_block_type Attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. Range: selection
  • numeric_condition The numeric condition for testing examples of numeric attributes is specified here. For example, the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition, so conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed. Use a blank space after '>', '=' and '<'; for example, '<5' will not work, so use '< 5' instead. Range: string
  • invert_selection If this parameter is set to true, it acts as a NOT gate and reverses the selection: all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' was selected and attribute 'att2' was not selected before this parameter is enabled, then after enabling it 'att1' is unselected and 'att2' is selected. Range: boolean
  • include_special_attributes Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are delivered to the output port irrespective of the conditions in the Select Attributes operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Select Attributes operator and only those attributes that satisfy the conditions are selected. Range: boolean
  • default The function to apply to all columns that are not explicitly specified by the columns parameter.
    • none: If this option is selected, no function is applied by default i.e. missing values are not replaced by default.
    • minimum: If this option is selected, by default missing values are replaced by the minimum value of that attribute.
    • maximum: If this option is selected, by default missing values are replaced by the maximum value of that attribute.
    • average: If this option is selected, by default missing values are replaced by the average value of that attribute.
    • zero: If this option is selected, by default missing values are replaced by zero.
    • value: If this option is selected, by default missing values are replaced by the value specified in the replenishment value parameter.
    Range: selection
  • columns Different attributes can be given different replacement functions through this parameter. The function selected by the default parameter is applied to attributes that are not explicitly mentioned in the columns parameter. Range: list
  • replenishment_value This parameter specifies the value that replaces missing values when the 'value' function is selected. Range: string
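The syntax rules of the numeric condition parameter can be illustrated with a small Python sketch. It is only a model of the rules stated above, not the parser RapidMiner uses: the function names, the set of supported comparison operators and the representation of an ExampleSet as a dictionary of columns are assumptions, and the toy data contains no missing values because their treatment by the filter is not described here.

```python
import operator

# Assumption: these are the comparison operators the condition may use.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def parse_condition(condition):
    """Turn a condition such as '> 6 && < 11' into a predicate on one value."""
    if "&&" in condition and "||" in condition:
        raise ValueError("&& and || cannot be used together in one condition")
    if "||" in condition:
        combine, parts = any, condition.split("||")
    else:
        combine, parts = all, condition.split("&&")
    tests = []
    for part in parts:
        tokens = part.split()
        # A blank space is required between operator and number ('< 5').
        if len(tokens) != 2 or tokens[0] not in OPERATORS:
            raise ValueError(f"expected '<operator> <number>', got {part!r}")
        tests.append((OPERATORS[tokens[0]], float(tokens[1])))
    return lambda x: combine(op(x, bound) for op, bound in tests)


def select_by_numeric_condition(columns, condition):
    """Keep nominal columns and numeric columns satisfying the condition in every example."""
    test = parse_condition(condition)
    selected = []
    for name, values in columns.items():
        nominal = any(isinstance(v, str) for v in values)
        if nominal or all(test(v) for v in values):
            selected.append(name)
    return selected


# Hypothetical toy data: two numeric attributes and one nominal attribute.
columns = {
    "age": [7, 9, 10],
    "score": [3, 8, 12],
    "city": ["Oslo", "Rome", "Lima"],
}
print(select_by_numeric_condition(columns, "> 6 && < 11"))  # ['age', 'city']
```

In this sketch a condition written as '<5' or one that mixes && with || raises a ValueError, mirroring the restrictions listed for the parameter.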

Tutorial Processes

Replacing missing values of the Labor Negotiations data set

The focus of this process is to show the use of the default and columns parameters. All other parameters are for selection of attributes on which replacement is to be applied. For understanding these parameters please study the Example Process of the Select Attributes operator.
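To give an idea of how the attribute selection parameters interact, the following Python sketch combines the regular expression, except regular expression and invert selection parameters. It is a simplified model, not RapidMiner code: the function name select_attributes is hypothetical, the sketch assumes that an expression must match the whole attribute name, and Python's regular expression syntax may differ in details from the one RapidMiner uses.

```python
import re


def select_attributes(names, regular_expression,
                      except_regular_expression=None,
                      invert_selection=False):
    """Return the attribute names chosen by the (hypothetical) filter."""
    selected = []
    for name in names:
        # Assumption: the expression has to match the whole attribute name.
        keep = re.fullmatch(regular_expression, name) is not None
        if keep and except_regular_expression is not None:
            # Attributes matching the except expression are filtered out again.
            keep = re.fullmatch(except_regular_expression, name) is None
        # invert_selection acts as a NOT gate on the result.
        if keep != invert_selection:
            selected.append(name)
    return selected


names = ["wage-inc-1st", "wage-inc-2nd", "wage-inc-3rd", "working hours"]
print(select_attributes(names, r"wage-inc-.*"))
print(select_attributes(names, r"wage-inc-.*", except_regular_expression=r".*3rd"))
print(select_attributes(names, r"wage-inc-.*", except_regular_expression=r".*3rd",
                        invert_selection=True))
```

The last call returns exactly the attributes that the second call left out, which is the effect of the invert selection parameter.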

The 'Labor Negotiations' data set is loaded using the Retrieve operator. A breakpoint is inserted at this point so that you can view the data before the application of the Replace Missing Values operator. The Replace Missing Values operator is then applied to it. The attribute filter type parameter is set to 'no missing values' and the invert selection parameter is also checked, thus all attributes with missing values are selected. In the columns parameter the 'wage-inc-1st', 'wage-inc-2nd', 'wage-inc-3rd' and 'working hours' attributes are set to 'minimum', 'maximum', 'zero' and 'value' respectively. The minimum value of the 'wage-inc-1st' attribute is 2.000, thus its missing values are replaced with 2.000. The maximum value of the 'wage-inc-2nd' attribute is 7.000, thus its missing values are replaced with 7.000. Missing values of 'wage-inc-3rd' are replaced by 0. The replenishment value parameter is set to 35, thus missing values of the 'working hours' attribute are set to 35. The default parameter is set to 'average', thus missing values of all other attributes are replaced by the average value of that attribute.
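The interplay of the default and columns parameters in this process can be mimicked in a few lines of Python. This is only a sketch: the function replace_missing_values and the table below are made up for illustration and are not the real 'Labor Negotiations' data. Only the minimum of 2.0, the maximum of 7.0 and the replenishment value of 35 are taken from the description above; the 'duration' column is a hypothetical stand-in for "all other attributes". The sketch assumes every column has at least one known value.

```python
from statistics import mean

# Each function receives the known values and the replenishment value.
FUNCTIONS = {
    "minimum": lambda present, value: min(present),
    "maximum": lambda present, value: max(present),
    "average": lambda present, value: mean(present),
    "zero": lambda present, value: 0,
    "value": lambda present, value: value,
}


def replace_missing_values(table, default="average", columns=None,
                           replenishment_value=None):
    """Replace None entries per column; `columns` overrides `default`."""
    columns = columns or {}
    result = {}
    for name, values in table.items():
        function = columns.get(name, default)
        if function == "none":
            result[name] = list(values)
            continue
        present = [v for v in values if v is not None]
        fill = FUNCTIONS[function](present, replenishment_value)
        result[name] = [fill if v is None else v for v in values]
    return result


# Hypothetical toy table; None marks a missing value.
table = {
    "wage-inc-1st": [2.0, None, 4.5],
    "wage-inc-2nd": [7.0, 3.0, None],
    "wage-inc-3rd": [None, 2.0, 5.0],
    "working hours": [40, None, 38],
    "duration": [1, None, 3],
}
columns = {
    "wage-inc-1st": "minimum",
    "wage-inc-2nd": "maximum",
    "wage-inc-3rd": "zero",
    "working hours": "value",
}
result = replace_missing_values(table, default="average", columns=columns,
                                replenishment_value=35)
for name, values in result.items():
    print(name, values)
```

Columns without missing values pass through unchanged in this sketch, so restricting the operator to attributes that contain missing values, as the process does with the inverted 'no missing values' filter, yields the same data.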