Select Attributes (RapidMiner Studio Core)

Synopsis

This Operator selects a subset of Attributes of an ExampleSet and removes the other Attributes.

Description

The Operator provides several filter types to make Attribute selection easy, for example direct selection of Attributes, selection by a regular expression, or selection of only those Attributes without missing values. See the attribute filter type parameter for a detailed description of the different filter types.

The invert selection parameter reverses the selection. By default, special Attributes (Attributes with roles such as id, label or weight) are ignored by the selection and always remain in the output ExampleSet. The include special attributes parameter changes this behavior.

Only the selected Attributes are delivered to the output port. The rest is removed from the ExampleSet.
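
The interplay of the filter, invert selection and include special attributes can be sketched in a few lines of Python. This is only an illustrative model of the behavior described above, not RapidMiner code: the Attribute class, the select_attributes function and the sample Attribute names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Attribute:
    name: str
    role: Optional[str] = None  # None means a regular Attribute (assumed model)


def select_attributes(
    attributes: List[Attribute],
    matches: Callable[[Attribute], bool],
    invert_selection: bool = False,
    include_special_attributes: bool = False,
) -> List[Attribute]:
    """Return the Attributes that would remain in the output."""
    kept = []
    for att in attributes:
        # Special Attributes bypass the filter unless explicitly included.
        if att.role is not None and not include_special_attributes:
            kept.append(att)
            continue
        # Keep on a match, or on a non-match when the selection is inverted.
        if matches(att) != invert_selection:
            kept.append(att)
    return kept


attributes = [
    Attribute("id", role="id"),
    Attribute("label", role="label"),
    Attribute("att1"),
    Attribute("att2"),
]


def is_att1(att: Attribute) -> bool:
    return att.name == "att1"


def names(result: List[Attribute]) -> List[str]:
    return [att.name for att in result]
```

With these sample Attributes, the default settings keep id, label and att1. Enabling include special attributes keeps only att1, because id and label are then tested against the filter as well.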

Differentiation

Select by <...> Operators

There are several Operators which select Attributes according to different input. For example, the Select by Weights Operator selects Attributes whose weights match a specified criterion. The Select by Random Operator selects a random subset of Attributes. The Remove Attribute Range Operator removes a range of Attributes according to the index of the Attributes. The Remove Useless Attributes Operator removes Attributes which can be considered useless according to some specified criteria. The Remove Correlated Attributes Operator removes Attributes which are correlated to each other.

Work on Subset

This Operator is a combination of the Select Attributes Operator and the Subprocess Operator. It applies the Operators in its inner process to an ExampleSet with only the Attributes which are selected by the attribute filter type. The inner result is merged back to the whole input ExampleSet.

Forward Selection

This is an implementation of the forward selection feature selection method. It selects the most relevant Attributes according to a model which is trained inside the Operator. For details see the documentation of the Forward Selection Operator.

Backward Elimination

This is an implementation of the backward elimination feature selection method. It selects the most relevant Attributes according to a model which is trained inside the Operator. For details see the documentation of the Backward Elimination Operator.

Filter Examples

This Operator does not select Attributes but filters (selects) Examples. It thus performs the same kind of operation as Select Attributes, applied to Examples instead of Attributes.

Input

  • example set (Data Table)

    This input port expects the ExampleSet from which you want to select Attributes.

Output

  • example set (Data Table)

    The ExampleSet with only the selected Attributes is delivered to this output port.

  • original (Data Table)

    The ExampleSet that was given as input is passed through to this output port unchanged.

Parameters

  • attribute_filter_type

    This parameter selects the Attribute filter, that is, the method used for selecting Attributes. It has the following options:

    • all: This option selects all the Attributes of the ExampleSet; no Attributes are removed. This is the default option.
    • single: This option allows the selection of a single Attribute. The required Attribute is selected by the attribute parameter.
    • subset: This option allows the selection of multiple Attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known, all Attributes are present in the list and the required ones can easily be selected.
    • regular_expression: This option allows you to specify a regular expression for the Attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except regular expression.
    • value_type: This option allows selection of all the Attributes of a particular type. It should be noted that types are hierarchical. For example, the real and integer types both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception and except value type.
    • block_type: This option allows the selection of all the Attributes of a particular block type. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception, except block type.
    • no_missing_values: This option selects all Attributes of the ExampleSet which do not contain a missing value in any Example. Attributes that have even a single missing value are removed.
    • numeric_value_filter: This option selects all numeric Attributes for which every Example matches a given numeric condition. The condition is specified by the numeric condition parameter. Note that all nominal Attributes are also selected, irrespective of the given numeric condition.
    Range:
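
As an illustration of the no_missing_values option described above, the following Python sketch keeps only the columns without any missing entry. It is a hypothetical model, not RapidMiner code: the column names are invented, and treating None and NaN as "missing" is an assumption made for the example.

```python
import math
from typing import Dict, List, Optional, Union

Value = Optional[Union[float, str]]


def is_missing(value: Value) -> bool:
    # Assumption: None and floating-point NaN both represent a missing value.
    return value is None or (isinstance(value, float) and math.isnan(value))


def attributes_without_missing(columns: Dict[str, List[Value]]) -> List[str]:
    """Names of the Attributes that have a value in every Example."""
    return [
        name
        for name, values in columns.items()
        if not any(is_missing(v) for v in values)
    ]


# Hypothetical ExampleSet with three Examples and three Attributes.
columns = {
    "age": [22.0, None, 35.0],
    "sex": ["m", "f", "f"],
    "fare": [7.25, 71.28, float("nan")],
}
selected = attributes_without_missing(columns)
```

Here only sex survives, since age and fare each contain one missing value.
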
  • attribute

    The required Attribute can be selected from this option. The Attribute name can be selected from the drop down box of the parameter if the meta data is known.

    Range:
  • attributes

    The required Attributes can be selected from this option. This opens a new window with two lists. All Attributes are present in the left list. They can be shifted to the right list, which is the list of selected Attributes that will make it to the output port.

    Range:
  • regular_expression

    Attributes whose names match this expression will be selected. The expression can be specified through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

    Range:
  • use_except_expression

    If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.

    Range:
  • except_regular_expression

    This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

    Range:
  • value_type

    This option allows you to select a value type of Attribute. One of the following types can be chosen: nominal, numeric, integer, real, text, binominal, polynominal, file_path, date_time, date, time.

    Range:
  • use_value_type_exception

    If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.

    Range:
  • except_value_type

    The Attributes matching this type will be removed from the final output even if they match the type previously selected by the value type parameter. One of the following types can be selected here: nominal, numeric, integer, real, text, binominal, polynominal, file_path, date_time, date, time.

    Range:
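
The combination of a hierarchical value type with an exception can be sketched as follows. Only the fact that integer and real belong to numeric is taken from this page. Every other parent link in the table below is an assumption made for illustration, as are the function and Attribute names.

```python
from typing import Dict, List, Optional

# Child type -> parent type. Only integer/real -> numeric is stated in the
# text above; the remaining links are assumptions for this sketch.
PARENT: Dict[str, Optional[str]] = {
    "nominal": None,
    "numeric": None,
    "integer": "numeric",
    "real": "numeric",
    "text": "nominal",         # assumed
    "binominal": "nominal",    # assumed
    "polynominal": "nominal",  # assumed
    "file_path": "nominal",    # assumed
    "date_time": None,         # assumed
    "date": "date_time",       # assumed
    "time": "date_time",       # assumed
}


def is_a(value_type: str, ancestor: str) -> bool:
    """True if value_type equals ancestor or descends from it."""
    current: Optional[str] = value_type
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


def select_by_value_type(
    attribute_types: Dict[str, str],
    value_type: str,
    except_value_type: Optional[str] = None,
) -> List[str]:
    selected = []
    for name, att_type in attribute_types.items():
        if not is_a(att_type, value_type):
            continue
        # The exception removes Attributes even though they matched value_type.
        if except_value_type is not None and is_a(att_type, except_value_type):
            continue
        selected.append(name)
    return selected


# Hypothetical regular Attributes and their value types.
types = {"att1": "real", "att2": "integer", "att3": "polynominal"}
numeric_only = select_by_value_type(types, "numeric")
numeric_without_integer = select_by_value_type(types, "numeric", "integer")
```

Selecting numeric keeps att1 and att2. Adding integer as the exception leaves only att1.
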
  • block_type

    This option allows you to select a block type of Attribute. One of the following types can be chosen: single_value, value_series, value_series_start, value_series_end, value_matrix, value_matrix_start, value_matrix_end, value_matrix_row_start.

    Range:
  • use_block_type_exception

    If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.

    Range:
  • except_block_type

    The Attributes matching this block type will be removed from the final output even if they match the block type previously selected by the block type parameter. One of the following block types can be selected here: single_value, value_series, value_series_start, value_series_end, value_matrix, value_matrix_start, value_matrix_end, value_matrix_row_start.

    Range:
  • numeric_condition

    The numeric condition used by the numeric_value_filter filter type. A numeric Attribute is kept if all Examples match the specified condition for this Attribute. For example, the numeric condition '> 6' keeps all numeric Attributes whose value is greater than 6 in every Example. Conditions can be combined: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition, so conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed. Nominal Attributes are always kept, regardless of the specified numeric condition.

    Range:
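
The rules for numeric conditions can be made concrete with a small Python sketch. It is a hypothetical re-implementation, not the parser RapidMiner uses: the function names are invented, and the set of comparison operators is an assumption (only >, < and <= appear in the examples on this page; >= is added by analogy). Missing values are not handled.

```python
import operator
import re
from typing import Callable, Sequence

# Assumed operator set; see the note above.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_TERM = re.compile(r"\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*")


def parse_numeric_condition(condition: str) -> Callable[[float], bool]:
    has_and = "&&" in condition
    has_or = "||" in condition
    if has_and and has_or:
        raise ValueError("&& and || cannot be used together in one condition")
    separator = "||" if has_or else "&&"
    tests = []
    for term in condition.split(separator):
        match = _TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"cannot parse term: {term!r}")
        compare = _OPS[match.group(1)]
        threshold = float(match.group(2))
        # Bind compare/threshold as defaults so each lambda keeps its own pair.
        tests.append(lambda value, c=compare, t=threshold: c(value, t))
    combine = any if has_or else all
    return lambda value: combine(test(value) for test in tests)


def keep_attribute(values: Sequence, condition: str, is_numeric: bool = True) -> bool:
    """A numeric Attribute is kept only if every Example matches."""
    if not is_numeric:
        return True  # nominal Attributes are always kept
    test = parse_numeric_condition(condition)
    return all(test(v) for v in values)
```

For instance, keep_attribute([7, 8, 10], "> 6 && < 11") is true, while the same condition rejects an Attribute with the values [7, 8, 12] because one Example falls outside the range.
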
  • include_special_attributes

    Special Attributes are Attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Custom roles can also be assigned to Attributes. By default, all special Attributes are delivered to the output port irrespective of the conditions in the Select Attributes Operator. If this parameter is set to true, special Attributes are also tested against the conditions specified in the Select Attributes Operator, and only those Attributes that match the conditions are selected.

    Range:
  • invert_selection

    If this parameter is set to true, the selection is reversed. In that case all Attributes matching the specified condition are removed and the other Attributes remain in the output ExampleSet. Special Attributes are kept independent of the invert selection parameter as long as the include special attributes parameter is not set to true. If it is set to true, the condition is also applied to the special Attributes, and the selection is reversed if this parameter is checked.

    Range:

Tutorial Processes

Selecting Attributes from the Titanic Data Sample

This tutorial Process shows the basic usage of the Select Attributes Operator. First the 'Titanic' data is retrieved from the Samples folder. The first Select Attributes Operator selects a subset of the Attributes. The subset is specified by the attributes parameter.

The original output port is connected to the input port of the second Select Attributes Operator. There, only nominal Attributes are selected.

Different usages of the Select Attributes Operator

This tutorial Process demonstrates different usages of the Select Attributes Operator. A demo ExampleSet is created inside a Subprocess Operator. It has 3 special Attributes (id, label, weight) and 5 regular Attributes (att1, att2, att3, att4, att5). Different attribute types are also used (numeric: id; binominal: label; numeric: weight; real: att1, att2, att4, att5; nominal: att3). After the Subprocess Operator a Breakpoint is inserted to allow inspection of the demo ExampleSet.

Next several Select Attributes Operators are used to show the different attribute filter types and the combinations with the parameters invert selection and include special attributes.

See the comments in the process for more details.

Selecting Attributes by using a regular expression

This tutorial Process illustrates the usage of a regular expression to select Attributes from the Labor-Negotiations data sample. The regular expression specified is: w.*|.*y.*

This means that all Attributes whose name starts with a 'w' (w.*) or (|) contains a 'y' (.*y.*) match the expression. The following Attributes of the Labor-Negotiations data set match this expression:

wage-inc-1st, wage-inc-2nd, wage-inc-3rd, working-hours, standby-pay, statutory-holidays, longterm-disability-assistance.

The use except expression parameter is also set to true. Thus Attributes that match the condition in the except regular expression parameter will be removed. The specified except regular expression is: .*[0-9].*. This means all Attributes whose name contains a digit are removed.

Finally, the following four Attributes are selected: working-hours, standby-pay, statutory-holidays, longterm-disability-assistance. In addition, the special Attribute class is kept.
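
The two-step regular expression filter from this tutorial can be reproduced outside RapidMiner with Python's re module. This is a sketch under two assumptions: that an expression must match the whole Attribute name (which is what the reading of w.* as "starting with a 'w'" implies), and that duration and pension are acceptable stand-ins for Attributes that do not match. The list below is not the complete Attribute list of the data set.

```python
import re

names = [
    "wage-inc-1st",
    "wage-inc-2nd",
    "wage-inc-3rd",
    "working-hours",
    "standby-pay",
    "statutory-holidays",
    "longterm-disability-assistance",
    "duration",  # assumed example of a non-matching name
    "pension",   # assumed example of a non-matching name
]

include = re.compile(r"w.*|.*y.*")  # the regular expression parameter
exclude = re.compile(r".*[0-9].*")  # the except regular expression parameter

# Step 1: keep names that match the regular expression as a whole.
matched = [n for n in names if include.fullmatch(n)]

# Step 2: drop names that also match the except regular expression.
selected = [n for n in matched if not exclude.fullmatch(n)]

print(selected)
```

The first step yields the seven names listed above, and the second step removes the three wage-inc Attributes because their names contain a digit.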

For more details about regular expressions see the configuration of the regular expression parameter.