Categories

Versions

You are viewing the RapidMiner Studio documentation for version 8.2 - Check here for latest version

De-Pivot (RapidMiner Studio Core)

Synopsis

This operator transforms the ExampleSet by converting the examples of the selected attributes (usually attributes that measure the same characteristic) into examples of a single attribute.

Description

This operator is usually used when your ExampleSet has multiple attributes that measure the same characteristic (may be at different time intervals) and you want to merge these observations into a single attribute without loss of information. If the original ExampleSet has n examples and k attributes that measure the same characteristic, after application of this operator the ExampleSet will have k x n examples. The k attributes will be combined into one attribute. This attribute will have n examples of each of the k attributes. This can be easily understood by studying the attached Example Process.

In other words, this operator converts an ExampleSet by dividing examples which consist of multiple observations (at different times) into multiple examples, where each example covers one point in time. An index attribute is added in the ExampleSet, which denotes the actual point in time the example belongs to after the transformation.

The keep missings parameter specifies whether an example should be kept, even if it has missing values for all series at a certain point in time. The create nominal index parameter is only applicable if only one time series per example exists. Instead of using a numeric index, then the names of the attributes representing the single time points are used as index attribute values.

Input

  • example set input (Data Table)

    This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input.

Output

  • example set output (Data Table)

    The selected attributes are converted into examples of a new attribute and the resultant ExampleSet is output of this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • attribute_nameThis parameter maps a number of source attributes onto result attributes. The attribute name parameter is used for specifying the group of attributes that you want to combine and the name of the new attribute. The attributes of a group are selected through a regular expression. There can be numerous groups with each group having multiple attributes. Range: list
  • index_attributeThis parameter specifies the name of the newly created index attribute. The index attribute is used for differentiating between examples of different attributes of a group after the transformation. Range: string
  • create_nominal_indexThe create nominal index parameter is only applicable if only one time series per example exists. Instead of using a numeric index, then the names of the attributes representing the single time points are used as index attribute values. Range: boolean
  • keep_missingsThe keep missings parameter specifies whether an example should be kept, even if it has missing values for all series at a certain point in time. Range: boolean

Tutorial Processes

Merging multiple attributes that measure the same characteristic into a single attribute

This process starts with the Subprocess operator which delivers an ExampleSet. The subprocess is used for creating a sample ExampleSet therefore it is not important to understand what is going on inside the subprocess. A breakpoint is inserted after the subprocess so that you can have a look at the ExampleSet. You can see that the ExampleSet has 14 examples and it has two attributes i.e. 'Morning' and 'Evening'. These attributes measure the temperature of an area in morning and evening respectively. We want to convert these attributes into a single attribute but we still want to be able to differentiate between morning and evening temperatures.

The De-Pivot operator is applied on this ExampleSet to perform this task. The attribute name parameter is used for specifying the group of attributes that you want to combine and the name of the new attribute. The attributes of a group are selected through a regular expression. There can be numerous groups with each group having multiple attributes. In our case, there is only one group which has all the attributes of the ExampleSet (i.e. both 'Morning' and 'Evening' attributes). The new attribute is named 'Temperatures' and the regular expression: ' .* ' is used for selecting all the attributes of the ExampleSet. The index attribute is used for differentiating between examples of different attributes of a group after transformation. The name of the index attribute is set to 'Time'. The create nominal index parameter is also set to true so that the resultant ExampleSet is more self-explanatory.

Execute the process and have a look at the resultant ExampleSet. You can see that there are 28 examples in this ExampleSet. The original ExampleSet had 14 examples, and 2 attributes were grouped, therefore the resultant ExampleSet has 28 (i.e. 14 x 2) examples. There are 14 examples from the Morning attribute and 14 examples of the Evening attribute in the 'Temperatures' attribute. The 'Time' attribute explains whether an example measures morning or evening temperature.