Select Attributes (RapidMiner Studio Core)

Synopsis

This operator selects which attributes of an ExampleSet are kept and which are removed. Use it when not all attributes of an ExampleSet are required.

Description

It is often necessary to select attributes before applying other operators, especially with large and complex data sets. The Select Attributes operator lets you select the required attributes conveniently, and it provides several filter types to make the selection easy. Only the selected attributes are delivered at the output port; the rest are removed from the ExampleSet.

Input

  • example set (Data Table)

    This input port expects an ExampleSet. In the attached Example Process it is the output of the Retrieve operator, but the output of other operators can also be used as input. Meta data must be attached to the data, because the attributes are specified in the meta data. The Retrieve operator provides meta data along with the data.

Output

  • example set (Data Table)

    The ExampleSet containing only the selected attributes is delivered through this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed through this port unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • attribute_filter_type: This parameter selects the attribute selection filter, that is, the method used for selecting attributes. It has the following options:
    • all: This option selects all the attributes of the ExampleSet; no attributes are removed. This is the default option.
    • single: This option allows the selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameters panel.
    • subset: This option allows the selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list, so the required attributes can be selected easily. This option will not work if the meta data is not known. When this option is selected another parameter (attributes) becomes visible in the Parameters panel.
    • regular_expression: This option allows you to specify a regular expression for the attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameters panel.
    • value_type: This option allows the selection of all the attributes of a particular type. Note that types are hierarchical; for example, the real and integer types both belong to the numeric type. You should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameters panel.
    • block_type: This option works like the value_type option, but it selects all the attributes of a particular block type. Note that block types may be hierarchical; for example, the value_series_start and value_series_end block types both belong to the value_series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameters panel.
    • no_missing_values: This option selects all the attributes of the ExampleSet that do not contain a missing value in any example. Attributes that have even a single missing value are removed.
    • numeric_value_filter: When this option is selected another parameter (numeric condition) becomes visible in the Parameters panel. All numeric attributes whose examples all satisfy the given numeric condition are selected. Note that all nominal attributes are also selected, irrespective of the given numeric condition.
    Range: selection
  • attribute: The required attribute is selected with this parameter. If the meta data is known, the attribute name can be chosen from the drop-down box of the parameter. Range: string
  • attributes: The required attributes are selected with this parameter. It opens a new window with two lists. All attributes are present in the left list and can be moved to the right list, which holds the selected attributes that will reach the output port; all other attributes are removed. Range: string
  • regular_expression: The attributes whose names match this expression are selected. Regular expressions are a very powerful tool, but they need a detailed explanation for beginners. It is a good idea to specify the regular expression through the edit and preview regular expression menu. This menu lets you try different expressions and preview the results at the same time, which helps you build an understanding of regular expressions. Range: string
  • use_except_expression: If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameters panel. Range: boolean
  • except_regular_expression: This parameter allows you to specify a second regular expression. Attributes matching this expression are filtered out even if they match the first expression (the one specified in the regular expression parameter). Range: string
  • value_type: The type of attributes to be selected can be chosen from a drop-down list. One of the following types can be chosen: nominal, numeric, integer, real, text, binominal, polynominal, file_path, date_time, date, time. Range: selection
  • use_value_type_exception: If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameters panel. Range: boolean
  • except_value_type: The attributes matching this type are removed from the final output even if they match the type given in the value type parameter. One of the following types can be selected here: nominal, numeric, integer, real, text, binominal, polynominal, file_path, date_time, date, time. Range: selection
  • block_type: The block type of attributes to be selected can be chosen from a drop-down list. One of the following types can be chosen: single_value, value_series, value_series_start, value_series_end, value_matrix, value_matrix_start, value_matrix_end, value_matrix_row_start. Range: selection
  • use_block_type_exception: If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameters panel. Range: boolean
  • except_block_type: The attributes matching this block type are removed from the final output even if they match the block type given in the block type parameter. One of the following block types can be selected here: single_value, value_series, value_series_start, value_series_end, value_matrix, value_matrix_start, value_matrix_end, value_matrix_row_start. Range: selection
  • numeric_condition: The numeric condition used for testing the examples of numeric attributes is specified here. For example, the numeric condition '> 6' keeps all nominal attributes and all numeric attributes that have a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||. Range: string
  • include_special_attributes: Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are delivered to the output port irrespective of the conditions in the Select Attributes operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Select Attributes operator, and only those that satisfy the conditions are selected. Range: boolean
  • invert_selection: If this parameter is set to true, it acts as a NOT gate and reverses the selection: all the attributes that would have been selected are removed, and the attributes that would have been removed are selected. For example, if attribute 'att1' is selected and attribute 'att2' is removed before this parameter is enabled, then after enabling it 'att1' is removed and 'att2' is selected. Range: boolean
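
The rules given for the numeric_condition parameter can be sketched in Python. This is only an illustration of the behaviour described above, not RapidMiner's implementation. The function names (parse_condition, select_by_numeric_condition) and the sample columns are hypothetical, only the comparison operators >, >=, < and <= are handled, and missing values are ignored.

```python
import operator
import re

# Comparison operators handled by this sketch (an assumption, not the full set).
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def parse_condition(condition):
    """Turn a condition such as '> 6 && < 11' into a test function for one value."""
    if "&&" in condition and "||" in condition:
        # The parameter description forbids mixing && and ||.
        raise ValueError("&& and || cannot be used together in one condition")
    combine = any if "||" in condition else all
    terms = []
    for part in re.split(r"&&|\|\|", condition):
        match = re.fullmatch(r"\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*", part)
        if match is None:
            raise ValueError(f"unsupported term: {part!r}")
        terms.append((_OPS[match.group(1)], float(match.group(2))))
    return lambda value: combine(op(value, bound) for op, bound in terms)


def _is_numeric(values):
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def select_by_numeric_condition(columns, condition):
    """Keep nominal columns, plus numeric columns whose values all pass the condition."""
    test = parse_condition(condition)
    kept = []
    for name, values in columns.items():
        if not _is_numeric(values) or all(test(v) for v in values):
            kept.append(name)
    return kept


# Hypothetical data: two numeric attributes and one nominal attribute.
columns = {
    "age": [7, 9, 10],
    "score": [3, 8, 12],
    "city": ["Oslo", "Rome", "Lima"],
}

print(select_by_numeric_condition(columns, "> 6 && < 11"))  # ['age', 'city']
```

Here 'score' is dropped because one of its values (3) fails the condition, while 'city' is kept because it is nominal.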

Tutorial Processes

Selecting attributes by specifying regular expressions matching their names

In this Example Process the Labor-Negotiations ExampleSet is loaded using the Retrieve operator, and the Select Attributes operator is then applied to it. Have a look at the Parameters panel of the Select Attributes operator; its settings are explained step by step below.

At the bottom of the Parameters panel the include special attributes parameter is set to true. This means that special attributes are also checked against the given conditions and appear in the output only if they pass them. The only special attribute in this ExampleSet is 'class', so it reaches the output port only if it satisfies the conditions.

The regular expression parameter is set to w.*|.*y.*

  • w.* matches all attribute names that start with the letter 'w'. wage-inc-1st, wage-inc-2nd, wage-inc-3rd and working-hours satisfy this condition.
  • .*y.* matches all attribute names that contain a 'y'. standby-pay, statutory-holidays and longterm-disability-assistance satisfy this condition.
  • | is the logical OR operator, so an attribute is selected if its name starts with 'w' or contains a 'y'.

The following attributes of the Labor-Negotiations data set satisfy this expression: wage-inc-1st, wage-inc-2nd, wage-inc-3rd, working-hours, standby-pay, statutory-holidays, longterm-disability-assistance.

The use except expression parameter is also set to true, which means that attributes satisfying the expression in the except regular expression parameter are removed. The except regular expression parameter is set to .*[0-9].* which matches any attribute whose name contains a digit. Three attributes satisfy this condition: wage-inc-1st, wage-inc-2nd, wage-inc-3rd. These three attributes therefore do not reach the output port even though they match the expression in the regular expression parameter.

Finally we are left with the following four attributes: working-hours, standby-pay, statutory-holidays, longterm-disability-assistance. These four attributes reach the output port. Notice that the invert selection parameter is not set to true. If it were, all attributes other than these four would reach the output port instead.
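
The selection logic of this tutorial can be reproduced outside RapidMiner with a short Python sketch. It is an illustration under assumptions rather than the operator's actual code: the helper name select_by_regex is hypothetical, full-name matching is assumed, and the attribute list contains only the names mentioned above plus two further names ('duration', 'pension') that are assumed here to stand in for the non-matching attributes. Because include special attributes is set to true, 'class' is tested like any other attribute.

```python
import re


def select_by_regex(names, expression, except_expression=None, invert=False):
    """Return the attribute names kept by a regex filter with an optional exception."""
    kept = []
    for name in names:
        # An attribute is selected if its whole name matches the expression ...
        selected = re.fullmatch(expression, name) is not None
        # ... unless it also matches the except expression.
        if selected and except_expression is not None:
            selected = re.fullmatch(except_expression, name) is None
        # invert flips the decision, like the invert selection parameter.
        if selected != invert:
            kept.append(name)
    return kept


# Partial, illustrative attribute list (see the assumptions in the text above).
attribute_names = [
    "wage-inc-1st", "wage-inc-2nd", "wage-inc-3rd", "working-hours",
    "standby-pay", "statutory-holidays", "longterm-disability-assistance",
    "class", "duration", "pension",
]

result = select_by_regex(attribute_names, r"w.*|.*y.*", r".*[0-9].*")
print(result)
# ['working-hours', 'standby-pay', 'statutory-holidays', 'longterm-disability-assistance']
```

Calling the same helper with invert=True returns every name in the list except these four, which mirrors the effect of the invert selection parameter.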