Filter Examples (RapidMiner Studio Core)

Synopsis

This Operator selects which Examples of an ExampleSet are kept and which Examples are removed.

Description

The Operator returns those Examples that match the given condition. The conditions are defined by the user. Several pre-defined conditions also exist as advanced options.

Differentiation

Select Attributes

Filter Examples may reduce the number of Examples in an ExampleSet, but it has no effect on the number of Attributes. To select Attributes, use the Select Attributes Operator.

Filter Example Range

The Filter Example Range Operator selects Examples that lie within a specified index range, i.e. a range of row numbers.

Input

  • example set input (Data Table)

    This input port expects an ExampleSet on which the defined filter will be applied.

Output

  • example set output (Data Table)

    This port outputs an ExampleSet containing only the Examples that satisfy the specified condition.

  • original (Data Table)

    The ExampleSet that was given as input is passed through without changes.

  • unmatched example set (Data Table)

    An ExampleSet containing only those Examples that did not meet the specified condition.
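
The relationship between the three output ports can be illustrated with a minimal Python sketch. This is not RapidMiner code: the function name filter_examples, the dictionary-based rows and the sample values are assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, object]


def filter_examples(
    examples: List[Example], condition: Callable[[Example], bool]
) -> Tuple[List[Example], List[Example], List[Example]]:
    """Return (matched, original, unmatched), mirroring the three output ports."""
    matched = [e for e in examples if condition(e)]
    unmatched = [e for e in examples if not condition(e)]
    # The original set is passed through unchanged.
    return matched, list(examples), unmatched


# Illustrative rows only; not taken from a real data set.
rows = [
    {"Outlook": "sunny", "Temperature": 85},
    {"Outlook": "rain", "Temperature": 65},
    {"Outlook": "overcast", "Temperature": 72},
]

matched, original, unmatched = filter_examples(
    rows, lambda e: e["Temperature"] > 70
)
```

Every input Example ends up in exactly one of the matched and unmatched lists, so together they always add up to the original.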

Parameters

  • filters

    This is the default parameter for defining filter conditions via the 'Add Filters...' dialog window. It is also available when the 'custom_filters' condition class is selected. It allows the definition of custom filter conditions. A condition consists of an Attribute, a comparison function and a value to match. Further conditions can be added with the "Add Entry" button. Several conditions can be joined by either "Match all" or "Match any".
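
The difference between "Match all" and "Match any" can be sketched in Python as follows. The tuple layout (Attribute, comparison function, value), the helper name matches and the sample rows are hypothetical and only mirror the description above; they are not part of RapidMiner.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# A condition is modelled as (Attribute name, comparison function, value to match).
Condition = Tuple[str, Callable[[Any, Any], bool], Any]


def matches(example: Dict[str, Any], conditions: List[Condition], match_all: bool = True) -> bool:
    """Evaluate every condition and join the results with AND or OR."""
    results = [compare(example[attr], value) for attr, compare, value in conditions]
    return all(results) if match_all else any(results)


conditions = [
    ("Outlook", operator.eq, "sunny"),
    ("Temperature", operator.gt, 80),
]

# Illustrative rows only.
rows = [
    {"Outlook": "sunny", "Temperature": 85},
    {"Outlook": "sunny", "Temperature": 72},
    {"Outlook": "rain", "Temperature": 65},
]

match_all_rows = [r for r in rows if matches(r, conditions, match_all=True)]
match_any_rows = [r for r in rows if matches(r, conditions, match_all=False)]
```

With these rows, "Match all" keeps only the first row, while "Match any" also keeps the second one because its Outlook value is "sunny".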

  • condition_class

    This parameter only appears when 'show advanced parameters' is activated. Otherwise the default selection 'custom_filters' is used. Various predefined conditions are available for filtering Examples. Examples matching the selected condition are passed to the output port; all others are removed. The available conditions are:

    • all: If this option is selected, no Examples are removed.
    • correct_predictions: If this option is selected, only those Examples are returned where the prediction is correct. This option requires that the ExampleSet has two Attributes with the special roles Label and Prediction. An Example matches when the values of its label Attribute and prediction Attribute are the same.
    • wrong_predictions: This option is the same as the correct_predictions option, but with the reversed result. Those Examples are matched where the prediction is not the same as the label.
    • no_missing_attributes: If this option is selected, only those Examples are matched that have no missing values. Missing values (null values) are shown as '?' in RapidMiner.
    • missing_attributes: If this option is selected, only those Examples are matched that have missing values. Missing values are shown as '?' in RapidMiner.
    • no_missing_labels: If this option is selected, only those Examples are matched that do not have a missing value in the special Attribute with the label role.
    • missing_labels: If this option is selected, only those Examples are matched that have a missing value in the special Attribute with the label role.
    • attribute_value_filter: If this option is selected, a condition can be entered as a string in the parameter_string field. It works like the default filters parameter; the details are explained below in the parameter_string description. The benefit of declaring the filter as a string is increased flexibility when using macros.
    • expression: With this option, expressions can be defined that offer more functions for writing matching conditions. How expressions can be used to filter Examples is explained below in the parameter_expression description.
    • custom_filters: This option is the same as the default filters parameter.
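
The predefined conditions can be read as simple predicates over a single Example. The following Python sketch illustrates that reading; it is not RapidMiner code. Missing values are modelled as None, the keys "label" and "prediction" stand in for the special roles, and the sketch checks every value of a row for the missing-value conditions. Whether special Attributes are included in that check is not stated above, so treat this as an assumption.

```python
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]

# One predicate per predefined condition class; None represents a missing value ('?').
CONDITIONS: Dict[str, Callable[[Example], bool]] = {
    "all": lambda e: True,
    "correct_predictions": lambda e: e["label"] == e["prediction"],
    "wrong_predictions": lambda e: e["label"] != e["prediction"],
    "no_missing_attributes": lambda e: all(v is not None for v in e.values()),
    "missing_attributes": lambda e: any(v is None for v in e.values()),
    "no_missing_labels": lambda e: e["label"] is not None,
    "missing_labels": lambda e: e["label"] is None,
}

# Illustrative rows only.
rows: List[Example] = [
    {"age": 30, "label": "yes", "prediction": "yes"},
    {"age": None, "label": "no", "prediction": "yes"},
    {"age": 41, "label": None, "prediction": "no"},
]


def apply_condition(name: str) -> List[Example]:
    """Keep the rows that match the named condition class."""
    return [r for r in rows if CONDITIONS[name](r)]
```

For instance, apply_condition("correct_predictions") keeps only the first row, and apply_condition("missing_labels") keeps only the third.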
  • parameter_string

    This parameter is available when 'attribute_value_filter' is selected as the condition class. The condition consists of an Attribute name, followed by a comparison function and a value to match. For numerical Attributes the comparison functions are >, <, <=, >= and =, and the value to match has to be a number. Nominal Attributes can be compared with = and != against an arbitrary string, which may also be a regular expression. Multiple conditions can be linked by a logical OR (||) or a logical AND (&&). Missing values can be written as '?' for numerical Attributes and as '\?' for nominal Attributes.

  • parameter_expression

    This parameter is available when 'expression' is selected as the condition class. Expressions can be entered as a string or built with the expression builder dialog. The expression needs to evaluate to a boolean value and should include one or more Attributes. This option is useful for building more complex matching conditions, for example ones that include mathematical calculations or text manipulation.

  • invert_filter

    If this parameter is set to true, the selected condition is inverted: all matching Examples are removed from the output, and the Examples that do not match the condition are kept.
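
The effect of invert_filter can be sketched in a few lines of Python. The helper name keep and the sample rows are hypothetical; the sketch only models the filtered output described above.

```python
from typing import Any, Callable, Dict


def keep(example: Dict[str, Any], condition: Callable[[Dict[str, Any]], bool],
         invert_filter: bool = False) -> bool:
    """An Example is kept when the condition result differs from the invert flag."""
    return condition(example) != invert_filter


# Illustrative rows only.
rows = [{"Temperature": 85}, {"Temperature": 65}, {"Temperature": 72}]


def warm(example: Dict[str, Any]) -> bool:
    return example["Temperature"] > 70


normal_output = [r for r in rows if keep(r, warm)]
inverted_output = [r for r in rows if keep(r, warm, invert_filter=True)]
```

With the flag set, exactly those rows that would otherwise have been removed are kept.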


Tutorial Processes

Filter Examples using custom filters

This tutorial Process shows how to filter the Golf ExampleSet by combining two filter conditions.

Filter Examples using attribute value filter

This tutorial Process uses the advanced condition class attribute_value_filter to define a condition string. The first statement uses the regular expression .*n.* to match all Examples where the value of the Outlook Attribute contains the letter 'n'. The second statement matches the Examples where the Temperature Attribute is greater than 70. Both conditions are combined with an OR statement (||).
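
The logic of this filter can be approximated in Python as shown below. This is a sketch, not the tutorial Process itself: the rows are illustrative rather than the actual Golf data, and the sketch assumes that the regular expression has to match the whole nominal value, which is why .*n.* is needed to express "contains the letter 'n'".

```python
import re
from typing import Any, Dict


def condition(example: Dict[str, Any]) -> bool:
    """Outlook matches the regular expression .*n.* OR Temperature is above 70."""
    # Assumption: the regular expression must match the entire value.
    outlook_matches = re.fullmatch(r".*n.*", example["Outlook"]) is not None
    return outlook_matches or example["Temperature"] > 70


# Illustrative rows only.
rows = [
    {"Outlook": "sunny", "Temperature": 85},
    {"Outlook": "rain", "Temperature": 65},
    {"Outlook": "overcast", "Temperature": 64},
    {"Outlook": "overcast", "Temperature": 83},
]

kept = [r for r in rows if condition(r)]
```

The second row is kept only because of the Outlook condition, and the fourth only because of the Temperature condition; the third row satisfies neither and is removed.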

Filter Examples using expression

This tutorial Process loads the Titanic data and uses an expression string to select all passengers whose name contains "Miss." and who are younger than 30, as well as all male passengers.
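
The selection logic described for this Process can be sketched in Python as follows. The Attribute names Name, Age and Sex, the value "Male" and the passenger rows are assumptions made for illustration; they are not taken from the tutorial Process, and the handling of missing ages is left out.

```python
from typing import Any, Dict


def selected(passenger: Dict[str, Any]) -> bool:
    """(Name contains "Miss." AND Age < 30) OR passenger is male."""
    young_miss = "Miss." in passenger["Name"] and passenger["Age"] < 30
    return young_miss or passenger["Sex"] == "Male"


# Made-up rows for illustration; not actual Titanic records.
rows = [
    {"Name": "Doe, Miss. Jane", "Age": 22, "Sex": "Female"},
    {"Name": "Doe, Miss. Anna", "Age": 45, "Sex": "Female"},
    {"Name": "Roe, Mr. John", "Age": 50, "Sex": "Male"},
    {"Name": "Roe, Mrs. Mary", "Age": 28, "Sex": "Female"},
]

selected_rows = [r for r in rows if selected(r)]
```

The age limit applies only to the "Miss." part of the condition, so male passengers are selected regardless of their age.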