Loop Attribute Subsets (RapidMiner Studio Core)

Synopsis

This operator iterates over its subprocess for all possible combinations of regular attributes in the input ExampleSet. Optionally, the minimum and maximum number of attributes in a combination can be specified by the user.

Description

The Loop Attribute Subsets operator is a nested operator, i.e. it has a subprocess. The subprocess executes n times, where n is the number of possible combinations of the regular attributes in the given ExampleSet. The user can restrict the minimum and maximum number of attributes in a combination through the respective parameters, in which case n changes accordingly. For example, if an ExampleSet has three regular attributes a, b and c, the subprocess executes 7 times, once for each attribute combination: {a}, {b}, {c}, {a,b}, {a,c}, {b,c} and {a,b,c}. Please study the attached Example Process for more information.
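The enumeration for the three-attribute example can be reproduced outside RapidMiner. The following is a minimal Python sketch, not the operator's actual implementation; the helper name attribute_subsets is hypothetical, and the order in which the operator itself visits the combinations is not specified here.

```python
from itertools import combinations

def attribute_subsets(attributes):
    """Yield every non-empty combination of the given attribute names.

    Hypothetical helper for illustration; it mimics the set of
    combinations described above, smallest subsets first.
    """
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            yield subset

subsets = list(attribute_subsets(["a", "b", "c"]))
for subset in subsets:
    print(subset)

# Three attributes give 2**3 - 1 = 7 non-empty combinations.
print(len(subsets))
```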

This operator can be useful in combination with the Log operator and, for example, a performance evaluation operator. In contrast to brute force feature selection, which performs a similar task, this iterative approach needs much less memory and can be performed on larger data sets.
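The loop-and-log pattern described above might look roughly like the following Python sketch. It is an illustration under stated assumptions rather than RapidMiner code: evaluate is a hypothetical stand-in for a performance evaluation operator (it only returns the subset size), and the list named log plays the role of the Log table. The generator produces one combination at a time, so only the current subset and the logged rows are held in memory.

```python
from itertools import combinations

def iterate_subsets(attributes):
    # Generator: combinations are produced one at a time, not all at once.
    for size in range(1, len(attributes) + 1):
        yield from combinations(attributes, size)

def evaluate(subset):
    # Hypothetical placeholder for a performance evaluation step.
    # It returns the subset size so the example stays self-contained.
    return len(subset)

log = []  # stands in for the Log table: one row per iteration
for iteration, subset in enumerate(iterate_subsets(["a", "b", "c"]), start=1):
    log.append({
        "iteration": iteration,
        "attributes": ", ".join(subset),
        "performance": evaluate(subset),
    })

for row in log:
    print(row)
```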

Input

  • example set (IOObject)

    This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

  • example set (IOObject)

    The ExampleSet that was given as input is delivered through this port without any modifications.

Parameters

  • use_exact_number

    If this parameter is set to true, the subprocess is executed only for combinations of a specified length, i.e. a specified number of attributes. The length is given by the exact_number_of_attributes parameter. Range: boolean

  • exact_number_of_attributes

    This parameter determines the exact number of attributes to be used for the combinations. Range: integer

  • min_number_of_attributes

    This parameter determines the minimum number of attributes to be used for the combinations. Range: integer

  • limit_max_number

    If this parameter is set to true, the subprocess is executed only for combinations that have at most m attributes, where m is given by the max_number_of_attributes parameter. Range: boolean

  • max_number_of_attributes

    This parameter determines the maximum number of attributes to be used for the combinations. Range: integer
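
To see how these parameters change the number of iterations n, the following Python sketch counts the combinations for a given number of regular attributes. It is a rough model, not the operator's code. The function name count_iterations and the default values are assumptions, as is the rule that use_exact_number takes precedence over the minimum and maximum settings.

```python
from math import comb

def count_iterations(n_attributes,
                     use_exact_number=False,
                     exact_number_of_attributes=1,
                     min_number_of_attributes=1,
                     limit_max_number=False,
                     max_number_of_attributes=1):
    """Count subprocess executions under the assumed parameter semantics.

    Defaults are illustrative assumptions, chosen so that the default
    call covers all non-empty combinations.
    """
    if use_exact_number:
        # Assumption: the exact size overrides the min/max settings.
        return comb(n_attributes, exact_number_of_attributes)
    upper = n_attributes
    if limit_max_number:
        upper = min(max_number_of_attributes, n_attributes)
    return sum(comb(n_attributes, k)
               for k in range(min_number_of_attributes, upper + 1))

print(count_iterations(4))  # all non-empty combinations of 4 attributes
print(count_iterations(4, use_exact_number=True, exact_number_of_attributes=2))
print(count_iterations(4, min_number_of_attributes=2,
                       limit_max_number=True, max_number_of_attributes=3))
```

Under these assumptions, four attributes give 15 iterations with no limits, 6 iterations when exactly two attributes are required, and 10 iterations when the size is restricted to two or three attributes.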

Tutorial Processes

Introduction to the Loop Attribute Subsets operator

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet before the Loop Attribute Subsets operator is applied. You can see that the ExampleSet has four regular attributes. The Loop Attribute Subsets operator is applied to this ExampleSet with default values for all parameters. As no limit is applied to the minimum and maximum number of attributes in a combination, the subprocess of this operator executes for all possible combinations of the four regular attributes. Have a look at the subprocess of the Loop Attribute Subsets operator. The Log operator is applied there to store the names of the attributes of each iteration in the Log table. Execute the process and switch to the Results Workspace. Check the Table View of the Log results. You will see the names of the attributes of each iteration. As there are 4 attributes, there are 2^4 - 1 = 15 possible non-empty combinations.