Extract Macro (RapidMiner Studio Core)

Synopsis

This operator defines a single macro that subsequent operators of the current process can reference as %{macro_name} in their parameter values. The macro value is derived from the input ExampleSet. A macro can be thought of as a value that is available to every operator of the current process that runs after the macro has been defined. This operator can also be used to redefine an existing macro.

Description

This operator defines a single macro that can be used in the parameter values of subsequent operators of the current process. Once the macro has been defined, its value can be used in the parameters of later operators by writing %{macro_name}, where 'macro_name' is the name given to the macro when it was defined. In the Extract Macro operator, this name is specified by the macro parameter. Every occurrence of %{macro_name} in a parameter value is replaced by the macro's value. This operator can also be used to redefine an existing macro.
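The substitution rule can be illustrated with a minimal Python sketch. The function expand_macros and the plain dictionary used as the macro table are hypothetical stand-ins, not RapidMiner's implementation; the sketch also raises KeyError for an undefined name, which is a choice made here rather than documented RapidMiner behavior.

```python
import re

# Matches a macro reference of the form %{name}.
MACRO_PATTERN = re.compile(r"%\{([^}]+)\}")


def expand_macros(value: str, macros: dict) -> str:
    """Replace every %{name} in a parameter value with the macro's current value.

    Raises KeyError if a referenced macro is not in the table (sketch-only choice).
    """
    return MACRO_PATTERN.sub(lambda match: macros[match.group(1)], value)


macros = {"avg_temp": "73.571"}
print(expand_macros("Temperature > %{avg_temp}", macros))  # Temperature > 73.571
```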

This operator sets the value of a single macro from properties of the input ExampleSet, such as the number of examples or the number of attributes. The macro can also be set to a specific data value of the ExampleSet, or to a statistic computed over an attribute, e.g. the average, minimum or maximum value of that attribute. These options are explained in the parameter descriptions and the attached Example Processes. The Set Macro operator can also define a macro, but it does not derive the macro value from an input ExampleSet.

Macros

A macro can be considered a value that is available to all operators of the current process that run after it has been defined. When using macros, make sure that the operators execute in the correct order: a macro must be defined before it can be used in parameter values. Macros are one of the advanced topics of RapidMiner; please study the attached Example Processes to develop a better understanding of them.
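Why the order matters can be shown with a small Python sketch that runs hypothetical 'define' and 'use' steps in sequence against one shared macro table. The step format and the run_in_order function are invented for illustration; the sketch treats a reference to a not-yet-defined macro as an error (KeyError), which is an assumption of the sketch rather than a description of RapidMiner's exact behavior.

```python
import re

MACRO_PATTERN = re.compile(r"%\{([^}]+)\}")


def run_in_order(steps):
    """Execute steps sequentially and return the expanded parameter strings.

    ("define", name, value) stores a macro; ("use", parameter) expands the
    %{...} references in parameter using only the macros defined so far.
    """
    macros = {}
    expanded = []
    for step in steps:
        if step[0] == "define":
            _, name, value = step
            macros[name] = value
        elif step[0] == "use":
            _, parameter = step
            expanded.append(MACRO_PATTERN.sub(lambda m: macros[m.group(1)], parameter))
    return expanded


correct_order = [("define", "avg_temp", "73.571"), ("use", "Temperature > %{avg_temp}")]
print(run_in_order(correct_order))  # ['Temperature > 73.571']

wrong_order = [("use", "Temperature > %{avg_temp}"), ("define", "avg_temp", "73.571")]
try:
    run_in_order(wrong_order)
except KeyError as missing:
    print("undefined macro:", missing)  # undefined macro: 'avg_temp'
```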

There are also some predefined macros:

  • %{process_name}: will be replaced by the name of the process (without path and extension)
  • %{process_file}: will be replaced by the file name of the process (with extension)
  • %{process_path}: will be replaced by the complete absolute path of the process file
  • Several other short macros also exist, e.g. %{a} for the number of times the current operator was applied.
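As a rough illustration of how the three path-related macros differ, the Python sketch below derives them from a process file path with pathlib. The function name and the example path are hypothetical, and the .rmp extension is only assumed for the example; the point is the difference between the name without extension, the file name with extension, and the absolute path.

```python
from pathlib import Path


def predefined_macros(process_file_path: str) -> dict:
    """Derive the three path-related macros from a process file path."""
    path = Path(process_file_path)
    return {
        "process_name": path.stem,            # name without path and extension
        "process_file": path.name,            # file name with extension
        "process_path": str(path.resolve()),  # complete absolute path
    }


# Hypothetical process location, used only for illustration.
for name, value in predefined_macros("processes/golf_filter.rmp").items():
    print(f"%{{{name}}} -> {value}")
```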

Please note that other operators, like many of the loop operators (e.g. Loop Values, Loop Attributes), also add specific macros.

Input

  • example set input (Data Table)

    This input port expects an ExampleSet. The macro value will be extracted from this ExampleSet.

Output

  • example set output (Data Table)

    The ExampleSet that was given as input is passed to the output through this port without any changes. This is usually used to reuse the same ExampleSet in further operators. Connecting this port is not required: the macro value is set even if the port is left unconnected.

Parameters

  • macro: This parameter specifies the name of the macro. Succeeding operators of the current process can access the macro by writing %{macro_name}, where 'macro_name' is the name specified by this parameter. Range: string
  • macro_type: This parameter indicates how the input ExampleSet is used to define the macro.
    • number_of_examples: The macro value is set to the total number of examples in the input ExampleSet.
    • number_of_attributes: The macro value is set to the total number of attributes in the input ExampleSet.
    • data_value: The macro value is set to the value of the specified attribute at the specified index. The attribute is specified by the attribute name parameter and the index by the example index parameter.
    • statistics: The macro value is set to the result of applying the selected statistical operation to the specified attribute. The attribute is specified by the attribute name parameter and the statistical operation is selected with the statistics parameter.
    Range: selection
  • statistics: This parameter is only available when the macro type parameter is set to 'statistics'. It selects the statistical operation to be applied to the attribute specified by the attribute name parameter. Range: selection
  • attribute_name: This parameter is only available when the macro type parameter is set to 'statistics' or 'data value'. It selects the required attribute. Range: string
  • attribute_value: This parameter is only available when the macro type parameter is set to 'statistics' and the statistics parameter is set to 'count'. It specifies a particular value of the attribute selected by the attribute name parameter; the macro value is set to the number of occurrences of this value in that attribute. Range: string
  • example_index: This parameter is only available when the macro type parameter is set to 'data value'. It specifies the index of the example whose value is read from the attribute given by the attribute name parameter, and from the attributes given in the optional additional macros parameter. Range: integer
  • additional_macros: This parameter is only available when the macro type parameter is set to 'data value'. This optional parameter lets you define any number of additional macros. Note that the value of the example index parameter is used for all macros in this list. Range: list
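The parameter combinations above can be summarized in a minimal Python sketch. It models an ExampleSet as a hypothetical list of dictionaries, one per example, and supports only the statistics named in this description (average, min, max and count). The function signature, the 1-based example index and the counting of attributes from the first example's keys are assumptions of the sketch, not RapidMiner's API. As with the operator's output port, the input is returned unchanged and the macro is set regardless of what the caller does with the return value.

```python
from statistics import mean


def extract_macro(example_set, macros, macro, macro_type, attribute_name=None,
                  statistics=None, attribute_value=None, example_index=None,
                  additional_macros=()):
    """Set macros[macro] from example_set and return example_set unchanged.

    example_set: list of dicts, one per example (hypothetical stand-in).
    example_index: treated as 1-based in this sketch (assumption).
    additional_macros: (macro_name, attribute_name) pairs read at the same index.
    """
    if macro_type == "number_of_examples":
        value = len(example_set)
    elif macro_type == "number_of_attributes":
        value = len(example_set[0]) if example_set else 0
    elif macro_type == "data_value":
        row = example_set[example_index - 1]
        value = row[attribute_name]
        # The same example index is used for every additional macro.
        for extra_macro, extra_attribute in additional_macros:
            macros[extra_macro] = str(row[extra_attribute])
    elif macro_type == "statistics":
        column = [row[attribute_name] for row in example_set]
        if statistics == "average":
            value = mean(column)
        elif statistics == "min":
            value = min(column)
        elif statistics == "max":
            value = max(column)
        elif statistics == "count":
            value = column.count(attribute_value)
        else:
            raise ValueError(f"statistic not covered by this sketch: {statistics}")
    else:
        raise ValueError(f"unknown macro type: {macro_type}")
    macros[macro] = str(value)  # macro values are stored as text
    return example_set  # passed through unchanged, like the output port


rows = [  # hypothetical examples, not taken from any RapidMiner sample data
    {"Outlook": "sunny", "Temperature": 85},
    {"Outlook": "rain", "Temperature": 70},
    {"Outlook": "sunny", "Temperature": 72},
]
macros = {}
extract_macro(rows, macros, "n_sunny", "statistics", attribute_name="Outlook",
              statistics="count", attribute_value="sunny")
print(macros)  # {'n_sunny': '2'}
```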

Tutorial Processes

Introduction to the Extract Macro operator

This is a very basic process that demonstrates the use of macros and the Extract Macro operator. The 'Golf' data set is loaded using the Retrieve operator, and the Extract Macro operator is applied to it. The macro is named 'avg_temp'. The macro type parameter is set to 'statistics', the statistics parameter is set to 'average' and the attribute name parameter is set to 'Temperature'. The value of the avg_temp macro is therefore set to the average of the Temperature attribute over all 14 examples of the 'Golf' data set, which is 73.571. Wherever %{avg_temp} is used in parameter values in this process, it is replaced by the value of the avg_temp macro, i.e. 73.571. Note that the output port of the Extract Macro operator is not connected to any other operator, yet the avg_temp macro is still created.
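The arithmetic behind 73.571 can be checked with a few lines of Python. This assumes the Temperature column of the 'Golf' data set holds the 14 values of the classic weather data set listed in the code; their mean reproduces the value quoted in the paragraph above, but the list is typed in by hand and not read from RapidMiner.

```python
from statistics import mean

# Assumed Temperature values of the 14 'Golf' examples (classic weather data).
golf_temperatures = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

avg_temp = mean(golf_temperatures)  # 1030 / 14
print(len(golf_temperatures), round(avg_temp, 3))  # 14 73.571
```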

The 'Golf-Testset' data set is loaded using the Retrieve operator, and the Filter Examples operator is applied to it. Its condition class parameter is set to 'attribute value filter' and its parameter string parameter is set to 'Temperature > %{avg_temp}'. Note the use of the avg_temp macro. When this process is run, %{avg_temp} is replaced by the value of the avg_temp macro, i.e. 73.571. Thus only those examples of the 'Golf-Testset' data set whose Temperature value is greater than the average Temperature of the 'Golf' data set (i.e. 73.571) reach the output port. You can verify this by inspecting the results in the Results Workspace.
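The two steps Filter Examples performs here, expanding the macro and then evaluating the condition for each example, can be mimicked in Python. The function filter_greater_than handles only a single 'attribute > value' condition and is a hypothetical simplification of the attribute value filter; the example rows are made up for illustration and are not the 'Golf-Testset' data.

```python
import re

MACRO_PATTERN = re.compile(r"%\{([^}]+)\}")


def filter_greater_than(rows, condition, macros):
    """Keep rows satisfying 'attribute > value' after expanding %{...} macros."""
    expanded = MACRO_PATTERN.sub(lambda m: macros[m.group(1)], condition)
    attribute, threshold = [part.strip() for part in expanded.split(">", 1)]
    return [row for row in rows if row[attribute] > float(threshold)]


test_rows = [{"Temperature": 85}, {"Temperature": 70}, {"Temperature": 74}]  # made up
kept = filter_greater_than(test_rows, "Temperature > %{avg_temp}", {"avg_temp": "73.571"})
print(kept)  # [{'Temperature': 85}, {'Temperature': 74}]
```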

Redefining a macro using the Extract Macro operator

The focus of this Example Process is to show how macros can be redefined using the Extract Macro operator. This process is almost the same as the first Example Process. The only difference is that after the avg_temp macro has been defined, the same macro is redefined using the 'Golf' data set and a second Extract Macro operator. The 'Golf' data set is loaded again and provided to this second Extract Macro operator, whose macro parameter is set to 'avg_temp' and whose macro type parameter is set to 'number of examples'. As the avg_temp macro already exists, no new macro is created; the existing macro is redefined. As the 'Golf' data set has 14 examples, avg_temp is redefined as 14. Thus the Filter Examples operator compares the Temperature values of the 'Golf-Testset' data set with 14 instead of 73.571. This can be verified by inspecting the results in the Results Workspace. Please note that the value a macro has when it is used depends on the order in which the operators are executed.
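Because every operator in the process works against the same set of macros, the second Extract Macro simply overwrites the first value, and Filter Examples, which runs afterwards, sees only the newest one. A minimal Python sketch of this last-write-wins behavior, using a dictionary as a hypothetical macro table and assuming the 'Golf' Temperature column holds the 14 values of the classic weather data set:

```python
from statistics import mean

# Assumed Temperature values of the 14 'Golf' examples (classic weather data).
golf_temperatures = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

macros = {}
# First Extract Macro: statistics / average (rounded here only for readability).
macros["avg_temp"] = str(round(mean(golf_temperatures), 3))
first_value = macros["avg_temp"]
# Second Extract Macro: same macro name, number of examples.
macros["avg_temp"] = str(len(golf_temperatures))

# Filter Examples executes after both, so it sees the redefined value.
condition = "Temperature > %{avg_temp}".replace("%{avg_temp}", macros["avg_temp"])
print(first_value, "->", condition)  # 73.571 -> Temperature > 14
```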

Use of Extract Macro in complex preprocessing

This Example Process is also part of the RapidMiner tutorial. It is included here to show the usage of the Extract Macro operator in complex preprocessing. This process will cover a number of concepts of macros including redefining macros, the macro of the Loop Values operator and the use of the Extract Macro operator. This process starts with a subprocess which is used to generate data. What is happening inside this subprocess is not relevant to the use of macros, so it is not discussed here. A breakpoint is inserted after this subprocess so that you can view the ExampleSet. You can see that the ExampleSet has 12 examples and 2 attributes: 'att1' and 'att2'. 'att1' is nominal and has 3 possible values: 'range1', 'range2' and 'range3'. 'att2' has real values.

The Loop Values operator is applied to the ExampleSet. It iterates over the values of the specified attribute (i.e. att1) and applies the inner operators to the given ExampleSet. In each iteration, the current value is available through the macro named by the iteration macro parameter, which is set to 'loop_value'; the current value can therefore be accessed by writing %{loop_value} in parameter values. As att1 has 3 possible values, Loop Values iterates 3 times, once for each possible value of att1.
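The iteration pattern of Loop Values can be sketched in Python: for every distinct value of the chosen attribute, the value is written into the iteration macro and the inner operators (here a callback) run once. The loop_values function and its arguments are hypothetical; visiting values in order of first appearance is a choice of this sketch, not a statement about the order RapidMiner uses.

```python
def loop_values(example_set, attribute, macros, body, iteration_macro="loop_value"):
    """Call body once per distinct value of attribute, exposing it as a macro."""
    results = []
    # dict.fromkeys keeps the first occurrence of each value, in order.
    for value in dict.fromkeys(row[attribute] for row in example_set):
        macros[iteration_macro] = str(value)
        results.append(body(example_set, macros))
    return results


# Hypothetical rows with a nominal att1 attribute.
rows = [{"att1": "range1"}, {"att1": "range2"}, {"att1": "range1"}, {"att1": "range3"}]
seen = loop_values(rows, "att1", {}, lambda example_set, macros: macros["loop_value"])
print(seen)  # ['range1', 'range2', 'range3']
```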

Here is an explanation of what happens inside the Loop Values operator. It receives an ExampleSet as input, and the Filter Examples operator is applied to it. The condition class parameter is set to 'attribute value filter' and the parameter string is set to 'att1 = %{loop_value}'. Note the use of the loop_value macro here: only those examples are selected where the value of att1 equals the value of the loop_value macro. A breakpoint is inserted here so that you can view the selected examples. Then the Aggregation operator is applied to the selected examples. It is configured to take the average of the att2 values of the selected examples and stores this average in a new ExampleSet, in an attribute named 'average(att2)'. A breakpoint is inserted here so that you can see this average. The Extract Macro operator is applied to the new ExampleSet to store the average in a macro named 'current_average'. The originally selected examples are passed to the Generate Attributes operator, which generates a new attribute named 'att2_abs_avg' defined by the expression 'abs(att2 - eval(%{current_average}))'. Note the use of the current_average macro here: for every example, the absolute difference between att2 and the macro's value is stored in the new 'att2_abs_avg' attribute. The resulting ExampleSet is delivered at the output of the Loop Values operator. A breakpoint is inserted here so that you can see the ExampleSet with the 'att2_abs_avg' attribute. This output is fed to the Append operator in the main process, which merges the results of all the iterations into a single ExampleSet that is shown in the Results Workspace at the end of the process.
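The inner chain of one iteration, together with the final Append step, can be sketched in Python. Each comment names the RapidMiner operator the lines stand in for; the function names, the list-of-dictionaries ExampleSet and the sample rows are hypothetical, and the rows are not the data produced by the tutorial's data-generating subprocess.

```python
from statistics import mean


def one_iteration(example_set, macros):
    """Filter, aggregate, extract a macro and generate att2_abs_avg for one value."""
    # Filter Examples with parameter string 'att1 = %{loop_value}' (rows copied).
    selected = [dict(row) for row in example_set if row["att1"] == macros["loop_value"]]
    # Aggregation: a new one-example ExampleSet holding average(att2).
    aggregated = [{"average(att2)": mean(row["att2"] for row in selected)}]
    # Extract Macro: store that single value in the current_average macro.
    macros["current_average"] = str(aggregated[0]["average(att2)"])
    # Generate Attributes: att2_abs_avg = abs(att2 - eval(%{current_average})).
    for row in selected:
        row["att2_abs_avg"] = abs(row["att2"] - float(macros["current_average"]))
    return selected


def loop_and_append(example_set):
    """Loop Values over att1, then Append all iteration results into one list."""
    macros = {}
    appended = []
    for value in dict.fromkeys(row["att1"] for row in example_set):
        macros["loop_value"] = value
        appended.extend(one_iteration(example_set, macros))
    return appended


rows = [  # hypothetical data, not the tutorial's generated ExampleSet
    {"att1": "range1", "att2": 1.0},
    {"att1": "range2", "att2": -2.0},
    {"att1": "range1", "att2": 3.0},
]
for row in loop_and_append(rows):
    print(row)
```

Copying each selected row keeps the input ExampleSet unchanged, so the new attribute only appears in the appended result.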

Here is what you see when you run the process:

  • The ExampleSet generated by the Generate Data subprocess.
  • The process then enters the Loop Values operator and iterates 3 times:
    • Iteration 1: the ExampleSet where the 'att1' value equals the current value of the loop_value macro, i.e. 'range1'; the average of the 'att2' values for the selected examples, which is -1.161; and the ExampleSet with the 'att2_abs_avg' attribute.
    • Iteration 2: the ExampleSet where the 'att1' value equals the current value of the loop_value macro, i.e. 'range2'; the average of the 'att2' values for the selected examples, which is -1.656; and the ExampleSet with the 'att2_abs_avg' attribute.
    • Iteration 3: the ExampleSet where the 'att1' value equals the current value of the loop_value macro, i.e. 'range3'; the average of the 'att2' values for the selected examples, which is 1.340; and the ExampleSet with the 'att2_abs_avg' attribute.
  • Finally, the process leaves the Loop Values operator and the Append operator merges the final ExampleSets of all three iterations into a single ExampleSet, which you can see in the Results Workspace.