Generate Attributes (RapidMiner Studio Core)

Synopsis

This operator constructs new user-defined attributes using mathematical expressions.

Description

The Generate Attributes operator constructs new attributes from the attributes of the input ExampleSet and arbitrary constants using mathematical expressions. The attribute names of the input ExampleSet can be used as variables in these expressions. When the operator is applied, the expressions are evaluated on each example, with the variables filled in from that example's attribute values. The operator therefore not only creates new columns for the new attributes but also fills those columns with the corresponding values. If a variable is undefined in an expression, the entire expression becomes undefined and '?' is stored in its place.
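
As a rough illustration of the per-example evaluation described above, the following Python sketch (not RapidMiner code) applies one expression to every example and stores a missing value when any referenced variable is undefined. The function name generate_attribute, the dict-based rows and the use of None to stand for '?' are assumptions made for this sketch only.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]


def generate_attribute(
    rows: List[Row],
    new_name: str,
    variables: List[str],
    expression: Callable[..., float],
) -> List[Row]:
    """Return copies of the rows with one new attribute added to each."""
    result = []
    for row in rows:
        # Fill the expression's variables from this example's attribute values.
        values = [row.get(name) for name in variables]
        if any(value is None for value in values):
            # An undefined variable makes the whole expression undefined ('?').
            new_value = None
        else:
            new_value = expression(*values)
        result.append({**row, new_name: new_value})
    return result


# Hypothetical two-example data set; the second example has a missing 'a'.
rows = [{"a": 1.0, "b": 2.0}, {"a": None, "b": 5.0}]
out = generate_attribute(rows, "a_plus_b", ["a", "b"], lambda a, b: a + b)
```

In this sketch the first example receives a value for 'a_plus_b', while the second receives a missing value because 'a' is undefined there. The input rows are left unchanged.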

Attribute names must meet the following restrictions for this operator to work properly:

  • Attribute names containing dashes '-' or other special characters, or having the same name as a constant (e.g. 'e' or 'pi'), must be placed in square brackets, e.g. '[weird-name]' or '[pi]'.
  • Attribute names containing square brackets or backslashes must be placed in square brackets and the square brackets and backslashes inside the name must be escaped, e.g. '[a\\tt\[1\]]' for an attribute 'a\tt[1]'.
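
The two quoting rules above can be sketched as a small Python helper that turns a raw attribute name into the form used inside an expression. This is only an illustration: the helper name quote_attribute_name is hypothetical, the set of constant names contains only the two mentioned above, and treating every name that is not a plain identifier as "special" is an assumption of this sketch.

```python
import re

# Only the constants named in the text above; the real list is longer.
CONSTANT_NAMES = {"e", "pi"}

# Assumption: a name made of letters, digits and underscores needs no brackets.
PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_attribute_name(name: str) -> str:
    """Return the attribute name as it would be written in an expression."""
    # Escape backslashes first so the escapes added for brackets stay intact.
    escaped = (
        name.replace("\\", "\\\\")
        .replace("[", "\\[")
        .replace("]", "\\]")
    )
    needs_brackets = (
        escaped != name
        or name in CONSTANT_NAMES
        or PLAIN_NAME.fullmatch(name) is None
    )
    return f"[{escaped}]" if needs_brackets else name


quoted = quote_attribute_name(r"a\tt[1]")
```

Applied to the attribute 'a\tt[1]' from the second rule, the helper produces the bracketed and escaped form given there.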

If you want to apply this operator but the attribute names of your ExampleSet do not fulfill the conditions above, you can rename the attributes with the Rename operator before applying the Generate Attributes operator. When renaming several attributes that follow a certain schema, the Rename by Replacing operator might prove useful.

A large number of operations and functions is supported, which allows you to write rich expressions. For a list of operations and functions and their descriptions, open the Edit Expression dialog. Complicated expressions can be created by combining multiple operations and functions. Parentheses can be used to nest operations.

This operator also supports various constants (for example 'INFINITY', 'PI' and 'e'). Again, a complete list is available in the Edit Expression dialog. You can also use strings in operations, but string values should be enclosed in double quotes (").

Input

  • example set (Data Table)

    This input port expects an ExampleSet. It is the output of the Rename operator in the attached Example Process. The output of other operators can also be used as input.

Output

  • example set (Data Table)

    The ExampleSet with the new attributes is delivered through this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed through this port unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the results workspace.

Parameters

  • function_descriptions

    The list of functions for generating new attributes is provided here. Range:

  • keep_all

    If set to true, all the original attributes are kept, otherwise they are removed from the output ExampleSet. Range: boolean
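
The effect of keep_all on a single example can be pictured with the following minimal Python sketch. The function apply_keep_all and the dict representation of an example are hypothetical, and the sketch does not address what happens when a generated attribute has the same name as an original one.

```python
from typing import Dict


def apply_keep_all(
    original: Dict[str, float],
    generated: Dict[str, float],
    keep_all: bool,
) -> Dict[str, float]:
    """Combine original and generated attributes according to keep_all."""
    if keep_all:
        # Original attributes are kept alongside the generated ones.
        return {**original, **generated}
    # Otherwise only the generated attributes remain in the output.
    return dict(generated)


# Hypothetical example with one original and one generated attribute.
original = {"wage": 4.5}
generated = {"wage_doubled": 9.0}
kept = apply_keep_all(original, generated, keep_all=True)
dropped = apply_keep_all(original, generated, keep_all=False)
```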

Tutorial Processes

Generating attributes through different function descriptions

The 'Labor-Negotiations' data set is loaded using the Retrieve operator.

Now have a look at the Generate Attributes operator's parameters. The keep all parameter is checked, so all attributes of the 'Labor-Negotiations' data set are kept along with the attributes generated by the Generate Attributes operator.

Click on the Edit List button of the function descriptions parameter to have a look at the function descriptions defined for generating new attributes. 18 new attributes are generated. There might be better ways of generating these attributes, but here they are written to explain the usage of the different types of functions available in the Generate Attributes operator. Please read the function description of each attribute and then look at the values of the corresponding attribute in the Results Workspace to understand it completely. Here is a description of the attributes created by this operator:

  • The 'average wage-inc' attribute takes the sum of the wage-inc-1st, wage-inc-2nd and wage-inc-3rd attribute values and divides it by 3, which gives the average of the wage increments. There are better ways of doing this; the example only illustrates some basic functions.
  • The 'neglected worker bool' attribute is a boolean attribute, i.e. it has only two possible values, '0' and '1'. It was created to show the usage of logical operations like 'AND' and 'OR'. The attribute takes the value '1' if three conditions are satisfied. First, the working-hours attribute has a value of 35 or more. Second, the education-allowance attribute is not equal to 'yes'. Third, the vacation attribute has the value 'average' OR 'below-average'. If any of these conditions is not satisfied, the new attribute gets the value '0'.
  • The 'logarithmic attribute' attribute shows the usage of the base-10 logarithm and natural logarithm functions.
  • The 'trigno attribute' attribute shows the usage of various trigonometric functions like sine and cosine.
  • The 'rounded average wage-inc' attribute uses the avg function to take the average of the wage increments and then uses the round function to round the resulting values.
  • The 'vacations' attribute uses the replaceAll function to replace all occurrences of the value 'generous' with 'above-average' in the 'vacation' attribute.
  • The 'deadline' attribute shows the usage of the If-then-Else and Date functions. It takes the value of the current date plus 25 days if the class attribute has the value 'good'. Otherwise it stores the current date plus 10 days.
  • The 'shift complete' attribute shows the usage of the If-then-Else, random, floor and missing functions. It has the values of the shift-differential attribute, but without missing values. Missing values are replaced with a random number between 0 and 25.
  • The 'remaining_holidays' attribute stores the difference between 15 and the statutory-holidays attribute value.
  • The 'remaining_holidays_percentage' attribute uses the 'remaining_holidays' attribute to find the percentage of remaining holidays. It was created to show that attributes created in this Generate Attributes operator can be used to generate new attributes in the same Generate Attributes operator.
  • The 'constants' attribute was created to show the usage of constants like 'e' and 'PI'.
  • The 'cut' attribute shows the usage of the cut function. To specify a string, place it in double quotes (""), as in the last term of this attribute's expression. To specify the name of an attribute, do not place it in quotes. The first term of the expression cuts the first two characters of the 'class' attribute values, because the attribute name is not placed in quotes. The last term selects the first two characters of the string 'class'. Since the first two characters of the string 'class' are 'cl', 'cl' is appended to the end of this attribute's values. The middle term concatenates a blank space between the results of the first and last terms.
  • The 'index' attribute shows the usage of the index function. If the 'class' attribute has the value 'no', 1 is stored because 'o' is at index 1 (positions are counted from 0). If the 'class' attribute has the value 'yes', -1 is stored because 'o' is not present in this value.
  • The 'date constants' attribute shows the usage of the date constants. It shows the date of the 'deadline' attribute in full format, but only the time is selected for display.
  • The 'macro' attribute shows how to use macros in functions.
  • The 'macro eval' attribute shows how to use macros that contain a number. The macro function %{} always returns a string, so to obtain the number you have to use the eval function or the parse function.
  • The 'expression eval' attribute shows the usage of the eval function. If a string contains an expression, for example one coming from a macro %{expression}, you can evaluate this expression by using the eval function.
  • The 'macro with attribute' attribute shows the usage of the #{} function. If a macro contains the name of an attribute, you can use this attribute in your expression by writing #{attribute_macro}, where attribute_macro is the macro containing the attribute name. Using eval(%{attribute_macro}) would lead to the same result, but the #{} function fails when the macro does not contain an attribute name, while eval(%{attribute_macro}) evaluates whatever is contained in the macro.
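
To make a few of the tutorial attributes more concrete, the Python sketch below mirrors the logic of 'neglected worker bool', 'index' and the macro-related attributes. It is an analogy under stated assumptions, not the tutorial's actual expressions: all function names, the macro table and the example row are hypothetical, missing values are not handled, and Python's built-in eval merely stands in for the operator's eval function.

```python
from typing import Dict


def neglected_worker_bool(
    working_hours: float, education_allowance: str, vacation: str
) -> int:
    """1 only if all three tutorial conditions hold, otherwise 0."""
    return int(
        working_hours >= 35
        and education_allowance != "yes"
        and vacation in ("average", "below-average")
    )


def index_of(text: str, search: str) -> int:
    """Position of the first match, counted from 0, or -1 if absent."""
    return text.find(search)


# Hypothetical macros and example row, used for illustration only.
MACROS: Dict[str, str] = {"number_macro": "5", "attribute_macro": "duration"}
EXAMPLE: Dict[str, float] = {"duration": 3.0}


def macro_text(name: str) -> str:
    """Like %{name}: the macro content is always returned as a string."""
    return MACROS[name]


def evaluate(expression_text: str, example: Dict[str, float]):
    """Like eval(...): evaluates whatever the string contains.

    Python's eval is a stand-in here and is not safe for untrusted input.
    """
    return eval(expression_text, {"__builtins__": {}}, dict(example))


def attribute_from_macro(name: str, example: Dict[str, float]) -> float:
    """Like #{name}: fails unless the macro holds an attribute name."""
    attribute_name = MACROS[name]
    if attribute_name not in example:
        raise KeyError(f"macro '{name}' does not name an attribute")
    return example[attribute_name]


as_text = macro_text("number_macro")           # still a string
as_number = evaluate(as_text, EXAMPLE)         # now a number
via_eval = evaluate(macro_text("attribute_macro"), EXAMPLE)
via_lookup = attribute_from_macro("attribute_macro", EXAMPLE)
```

In this sketch both routes to the attribute value agree when the macro holds an attribute name. When it holds a number instead, the lookup raises an error while the evaluation simply returns that number, which matches the difference between #{} and eval(%{}) described above.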