You are viewing the RapidMiner Studio documentation for version 9.7.

Extract Aggregates (Time Series)

Synopsis

This operator calculates a set of aggregated values of one or more time series.

Description

This operator calculates descriptive features (e.g. sum, mean, min, max, ...) of the distribution of the values of one or more time series. The calculated features are provided as an ExampleSet at the features output port of the operator.

Depending on the parameter add time series name, the ExampleSet will have either one example with attributes for all combinations of time series and features, or n examples, one per time series. The features to be calculated can be selected individually. In combination with the Process Windows operator, this operator can be used to calculate features of windows of time series as preparation for a general machine learning problem.
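
The two output layouts can be illustrated with a minimal Python sketch. It is not the operator's implementation: the function name extract_aggregates, the "." separator between series name and feature name, and the restriction to four features are assumptions made for illustration only.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def extract_aggregates(series, add_time_series_name=True):
    """Mimic the two output layouts of the features port.

    series: dict mapping a time series name to a list of numbers.
    Returns a list of examples, each example being a dict of attributes.
    """
    features = {"sum": sum, "mean": mean, "min": min, "max": max}
    if add_time_series_name:
        # One example; every attribute name is prefixed with the series name.
        # The "." separator is an assumption, not taken from the documentation.
        example = {}
        for name, values in series.items():
            for feature, func in features.items():
                example[f"{name}.{feature}"] = func(values)
        return [example]
    # One example per time series, plus a "time series" attribute (role: id).
    examples = []
    for name, values in series.items():
        example = {"time series": name}
        for feature, func in features.items():
            example[feature] = func(values)
        examples.append(example)
    return examples


data = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]}
wide = extract_aggregates(data, add_time_series_name=True)
long = extract_aggregates(data, add_time_series_name=False)
print(len(wide), len(wide[0]))  # 1 example, 2 x 4 = 8 attributes
print(len(long), len(long[0]))  # 2 examples, 4 + 1 = 5 attributes
```

With two time series and four features, the first layout yields 1 example with 8 attributes, the second 2 examples with 5 attributes each.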

By default, invalid values (missing, positive infinity and negative infinity) are included in the calculation of the aggregated values. See the parameter descriptions for how the calculation of each individual feature handles invalid values. Select the parameter ignore invalid values to ignore invalid values instead.
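
The following Python sketch illustrates the behaviour documented for the sum, mean, min and max parameters. It assumes that a missing value is represented as NaN; the function name aggregate and the results for an empty series (sum 0.0, everything else missing) are assumptions of this sketch, not documented behaviour.

```python
import math


def is_invalid(value):
    """Invalid means missing (NaN here), positive infinity or negative infinity."""
    return math.isnan(value) or math.isinf(value)


def aggregate(values, ignore_invalid_values=False):
    """Return sum, mean, min and max following the documented rules."""
    if ignore_invalid_values:
        values = [v for v in values if not is_invalid(v)]

    # Plain float addition already follows the documented rules for the sum:
    # NaN propagates, a single infinity dominates, and +inf plus -inf is NaN.
    total = 0.0
    for v in values:
        total += v

    # The mean is missing as soon as one value is missing or infinite.
    if not values or any(is_invalid(v) for v in values):
        avg = math.nan
    else:
        avg = total / len(values)

    # Min and max skip missing values but take infinities into account.
    present = [v for v in values if not math.isnan(v)]
    return {
        "sum": total,
        "mean": avg,
        "min": min(present) if present else math.nan,
        "max": max(present) if present else math.nan,
    }


with_inf = aggregate([1.0, 2.0, math.inf])        # sum and max are +inf, mean is missing
with_nan = aggregate([1.0, math.nan, 3.0])        # sum and mean are missing, min 1.0, max 3.0
both_inf = aggregate([math.inf, -math.inf, 1.0])  # sum is missing, min is -inf
cleaned = aggregate([1.0, math.nan, 3.0, math.inf], ignore_invalid_values=True)
print(with_inf, with_nan, both_inf, cleaned, sep="\n")
```

With ignore_invalid_values set to true, the last call only sees the values 1.0 and 3.0.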

This operator works only on numerical time series.

Input

  • example set (IOObject)

    The ExampleSet which contains the time series data as attributes.

Output

  • features (IOObject)

    The ExampleSet which contains the calculated aggregates as attributes. Depending on the parameter add time series name, the ExampleSet will have either one example with attributes for all combinations of time series and features, or n examples, one per time series.

  • original (IOObject)

    The ExampleSet that was given as input is passed through without changes.

Parameters

  • attribute_filter_type

    This parameter selects the filter used to choose the time series attributes, i.e. the attributes which hold the time series values. Only numeric attributes can be selected as time series attributes. The different filter types are:

    • all: This option selects all attributes of the ExampleSet to be time series attributes. This is the default option.
    • single: This option allows the selection of a single time series attribute. The required attribute is selected by the attribute parameter.
    • subset: This option allows the selection of multiple time series attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known, all attributes are present in the list and the required ones can easily be selected.
    • regular_expression: This option allows you to specify a regular expression for the time series attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except regular expression.
    • value_type: This option allows selection of all the attributes of a particular type to be time series attributes. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception, except value type.
    • block_type: This option allows the selection of all the attributes of a particular block type to be time series attributes. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception, except block type.
    • no_missing_values: This option selects all attributes of the ExampleSet as time series attributes which do not contain a missing value in any example. Attributes that have even a single missing value are not selected.
    • numeric_value_filter: All numeric attributes whose examples all match a given numeric condition are selected as time series attributes. The condition is specified by the numeric condition parameter.
  • attribute

    The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

  • attributes

    The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. They can be shifted to the right list, which is the list of selected time series attributes.

  • regular_expression

    Attributes whose names match this expression will be selected. The expression can be specified through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

  • use_except_expression

    If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.

  • except_regular_expression

    This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

  • value_type

    This option allows you to select an attribute type. One of the following types can be chosen: numeric, integer, real.

  • use_value_type_exception

    If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.

  • except_value_type

    The attributes matching this type will be removed from the final output even if they matched the previously selected type, specified by the value type parameter. One of the following types can be selected here: numeric, integer, real.

  • block_type

    This option allows you to select an attribute block type. One of the following types can be chosen: value_series, value_series_start, value_series_end.

  • use_block_type_exception

    If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.

  • except_block_type

    The attributes matching this block type will be removed from the final output even if they matched the previously selected block type, specified by the block type parameter. One of the following block types can be selected here: value_series, value_series_start, value_series_end.

  • numeric_condition

    The numeric condition used by the numeric_value_filter filter type. A numeric attribute is selected if all examples match the specified condition for this attribute. For example, the numeric condition '> 6' will keep all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||.

  • invert_selection

    If this parameter is set to true, the selection is reversed. In that case all attributes not matching the specified condition are selected as time series attributes. Special attributes are not selected, independent of the invert selection parameter, as long as the include special attributes parameter is not set to true. If it is set to true, the condition is also applied to the special attributes and the selection is reversed if this parameter is checked.

  • include_special_attributes

    Special attributes are attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Also custom roles can be assigned to attributes. By default special attributes are not selected as time series attributes irrespective of the filter conditions. If this parameter is set to true, special attributes are also tested against conditions specified and those attributes are selected that match the conditions.

  • add_time_series_name

    If this parameter is set to true the name of the time series attribute is added as a prefix to the name of the feature attributes. The resulting ExampleSet will have one example and n attributes, with n = <number of time series> x <number of features>. If this parameter is set to false, an additional attribute named time series is added with the name of the time series. The resulting ExampleSet will have n examples and m+1 attributes, with n = <number of time series> and m = <number of features>. The role of the time series attribute is set to id.

  • sum

    If this parameter is set to true, the sum of the values of the time series is calculated. If invalid values aren't ignored, the sum is missing if any time series value is missing. The sum is positive/negative infinity if there is at least one positive/negative infinity value in the time series. If there are both positive and negative infinity values in the time series, the sum is missing.

  • mean

    If this parameter is set to true, the mean of the values of the time series is calculated. If invalid values aren't ignored, the mean is missing if any time series value is missing, positive or negative infinity.

  • geometric_mean

    If this parameter is set to true, the geometric mean of the values of the time series is calculated. If invalid values aren't ignored, the geometric mean is missing if any time series value is missing or negative infinity. The geometric mean is positive infinity if there is at least one positive infinity value in the time series.

  • first_quartile

    If this parameter is set to true, the first quartile of the values of the time series is calculated. If invalid values aren't ignored, these values are listed in the same way as finite values for the determination of the first quartile.

  • median

    If this parameter is set to true, the median of the values of the time series is calculated. If invalid values aren't ignored, these values are listed in the same way as finite values for the determination of the median.

  • third_quartile

    If this parameter is set to true, the third quartile of the values of the time series is calculated. If invalid values aren't ignored, these values are listed in the same way as finite values for the determination of the third quartile.

  • min

    If this parameter is set to true, the minimum of the values of the time series is calculated. If invalid values aren't ignored, negative and positive infinity are taken into account for the determination of the minimum, while missing values are ignored.

  • max

    If this parameter is set to true, the maximum of the values of the time series is calculated. If invalid values aren't ignored, negative and positive infinity are taken into account for the determination of the maximum, while missing values are ignored.

  • std_deviation

    If this parameter is set to true, the standard deviation of the values of the time series is calculated. If invalid values aren't ignored, the standard deviation is missing if any time series value is missing, positive or negative infinity.

  • kurtosis

    If this parameter is set to true, the kurtosis of the values of the time series is calculated. If invalid values aren't ignored, the kurtosis is missing if any time series value is missing, positive or negative infinity.

  • skewness

    If this parameter is set to true, the skewness of the values of the time series is calculated. If invalid values aren't ignored, the skewness is missing if any time series value is missing, positive or negative infinity.

  • ignore_invalid_values

    If this parameter is set to true, invalid values (missing, positive and negative infinity) are ignored in the calculation of the features.
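
To close the parameter list, the semantics of the numeric_condition parameter described above can be sketched in Python. This is only an illustration under assumptions: the function names parse_condition and select_attributes are hypothetical, and the set of supported comparison operators (>, >=, <, <=) is assumed, since the documentation only shows >, < and <= in its examples.

```python
import operator
import re

# Comparison operators accepted by this sketch (an assumption).
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_TERM = re.compile(r"^\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_condition(condition):
    """Turn a condition such as '> 6 && < 11' into a predicate on one value."""
    if "&&" in condition and "||" in condition:
        raise ValueError("&& and || cannot be used together in one numeric condition")
    if "||" in condition:
        parts, combine = condition.split("||"), any
    else:
        parts, combine = condition.split("&&"), all
    tests = []
    for part in parts:
        match = _TERM.match(part)
        if match is None:
            raise ValueError(f"cannot parse term: {part!r}")
        tests.append((_OPS[match.group(1)], float(match.group(2))))
    return lambda value: combine(op(value, bound) for op, bound in tests)


def select_attributes(example_set, condition):
    """Keep the attributes for which every example matches the condition."""
    matches = parse_condition(condition)
    return [
        name
        for name, values in example_set.items()
        if all(matches(v) for v in values)
    ]


data = {"a": [7.0, 8.0, 10.0], "b": [7.0, 12.0], "c": [1.0, 2.0]}
print(select_attributes(data, "> 6"))          # attributes a and b
print(select_attributes(data, "> 6 && < 11"))  # attribute a only
print(select_attributes(data, "<= 5 || < 0"))  # attribute c only
```

A condition that mixes && and ||, such as '(> 0 && < 2) || (>10 && < 12)', raises a ValueError in this sketch.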

Tutorial Processes

Extract Aggregates of the Lake Huron data set

In this tutorial process, the sum, mean, minimum and maximum of the Lake Surface of Lake Huron are calculated.

Extracting aggregates of windows of the Lake Huron data set

In this tutorial process, the Process Windows operator is used to loop over windows of size 30 of the Lake Huron data set. For each window, the Extract Aggregates operator is used to calculate some features of the window. The calculated features are then provided to the output port of the inner subprocess. Because the parameters create horizon (labels) and add last index in window attribute are set to true, an attribute holding the horizon value (a single value, because the horizon width is 1) and an attribute holding the last Date (the indices attribute of the time series) in the window are added to the features ExampleSet.

The Append operator is used to append the features of all windows to one ExampleSet. This can be used to train a machine learning model.
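
The idea of this tutorial process can be sketched in plain Python. This is not the RapidMiner process itself: the function name windowed_features, the step size of 1, the attribute names "last index in window" and "horizon", and the placement of the horizon directly after the window are assumptions, and the data is synthetic rather than the Lake Huron data set.

```python
def windowed_features(indices, values, window_size=30, step_size=1):
    """Compute one feature example per window, with a horizon of width 1.

    indices: index values of the time series (for example dates).
    values:  the numeric time series, same length as indices.
    """
    examples = []
    # The last window must leave room for one horizon value after it.
    last_start = len(values) - window_size - 1
    for start in range(0, last_start + 1, step_size):
        window = values[start:start + window_size]
        examples.append({
            "last index in window": indices[start + window_size - 1],
            "sum": sum(window),
            "mean": sum(window) / len(window),
            "min": min(window),
            "max": max(window),
            # Label for a later learner: the value right after the window.
            "horizon": values[start + window_size],
        })
    # Collecting all examples in one list plays the role of the Append operator.
    return examples


# Synthetic series with 40 values; the indices stand in for the Date attribute.
indices = list(range(40))
values = [float(i) for i in range(40)]
rows = windowed_features(indices, values, window_size=30)
print(len(rows))  # 10 windows fit into 40 values with a horizon of 1
print(rows[0])
```

Each resulting example holds the features of one window together with its horizon value, so the list as a whole has the shape of a training set for a supervised learner.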