Generate Aggregation (RapidMiner Studio Core)

Synopsis

This operator generates a new attribute by performing the specified aggregation function on every example of the selected attributes.

Description

This operator can be considered to be a blend of the Generate Attributes operator and the Aggregate operator. This operator generates a new attribute which consists of a function of several other attributes. These 'other' attributes can be selected by the attribute filter type parameter and other associated parameters. The aggregation function is selected through the aggregation function parameter. Several aggregation functions are available e.g. count, minimum, maximum, average, mode etc. The attribute name parameter specifies the name of the new attribute. If you think this operator is close to your requirement but not exactly what you need, have a look at the Aggregate and the Generate Attributes operators because they perform similar tasks.

Differentiation

Aggregate

This operator performs the aggregation functions known from SQL. It provides a lot of functionalities in the same format as provided by the SQL aggregation functions. SQL aggregation functions and GROUP BY and HAVING clauses can be imitated using this operator.

Generate Attributes

It is a very powerful operator for generating new attributes from existing attributes. It even supports regular expressions and conditional statements for specifying the new attributes

Input

example set input (Data Table)
This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

example set output (Data Table)
The ExampleSet with the additional attribute generated after applying the specified aggregation function is output of this port.
original (Data Table)
The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute_nameThe name of the resulting attribute is specified through this parameter. Range: string
attribute_filter_typeThis parameter allows you to select the attribute selection filter; the method you want to use for selecting the required attributes. It has the following options:
- all: This option simply selects all the attributes of the ExampleSet. This is the default option.
- single: This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameters panel.
- subset: This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameters panel.
- regular_expression: This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameters panel.
- value_type: This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameters panel.
- block_type: This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameters panel.
- no_missing_values: This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.
- numeric value filter: When this option is selected another parameter (numeric condition) becomes visible in the Parameters panel. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.
Range: selection
attributeThe desired attribute can be selected from this option. The attribute name can be selected from the drop down box of attribute parameter if the meta data is known. Range: string
attributesThe required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list which is the list of selected attributes on which the conversion from nominal to numeric will take place; all other attributes will remain unchanged. Range: string
regular_expressionThe attributes whose name matches this expression will be selected. Regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions. This menu also allows you to try different expressions and preview the results simultaneously. This will enhance your concept of regular expressions. Range: string
use_except_expressionIf enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameters panel. Range: boolean
except_regular_expressionThis option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (expression that was specified in the regular expression parameter). Range: string
value_typeThe type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file_path. Range: selection
use_value_type_exception If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameters panel. Range: boolean
except_value_typeThe attributes matching this type will be removed from the final output even if they matched the previously mentioned type i.e. value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file_path. Range: selection
block_typeThe block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single_value' Range: selection
use_block_type_exception If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameters panel. Range: boolean
except_block_typeThe attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. Range: selection
numeric_conditionThe numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value of greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<' e.g. '<5' will not work, so use '< 5' instead. Range: string
include_special_attributesThe special attributes are attributes with special roles which identify the examples. In contrast regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. Range: boolean
invert_selectionIf this parameter is set to true, it acts as a NOT gate, it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking of this parameter. After checking of this parameter 'att1' will be unselected and 'att2' will be selected. Range: boolean
aggregation_functionThis parameter specifies the function for aggregating the values of the selected attribute. Numerous options are available e.g. average, variance, standard deviation, count, minimum, maximum, sum, mode, median and product. Range: selection
keep_allThis parameter indicates if all old attributes should be kept. If this parameter is set to false then all the selected attributes (i.e. attributes that are used for aggregation) are removed. Range: boolean
ignore_missingsThis parameter indicates if missing values should be ignored and if the aggregation function should be only applied on existing values. If this parameter is not set to true the aggregated value will be a missing value in the presence of missing values in the selected attribute. Range: boolean
ignore_missing_attributesNormally an error is shown when the attribute filter doesn't match any attributes of the ExampleSet. If this parameter is set to true, that situation will be ignored. Range: boolean

Tutorial Processes

Generating an attribute having average of real attributes of Sonar data set

The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has one nominal and sixty real attributes. The Generate Aggregation operator is applied on this ExampleSet to generate a new attribute from the real attributes of the ExampleSet.

The attribute name parameter is set to 'Average' thus the new attribute will be named 'Average'. The attribute filter type parameter is set to 'value type' and the value type parameter is set to 'real', thus the new attribute will be created from real attributes of the ExampleSet. The aggregation function parameter is set to 'average', thus the new attribute will be average of the selected attributes.

The resultant ExampleSet can be seen in the Results Workspace. You can see that there is a new attribute named 'Average' in the ExampleSet that has the average value of the attribute_1, attribute_2, ..., attribute_60 attributes.