You are viewing the RapidMiner Studio documentation for version 9.9.

Moving Average Filter (Time Series)

Synopsis

This operator applies a moving aggregation filter on values of one or more time series attributes.

Description

The aggregation method specifies how the time series values within a window of the filter are aggregated.

For the mean aggregation method, a filtered value is calculated as the weighted sum of the values in a window around it. The weights of the filter are defined by the filter type, and the window by the parameters filter size left, filter size right and filter size.
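
The weighted sum can be sketched in Python for the simple filter, assuming the time series is a plain list of floats. The function `weighted_moving_filter` is a hypothetical helper for illustration, not part of RapidMiner, and invalid values are left to ordinary float arithmetic here rather than to explicit rules.

```python
import math

def weighted_moving_filter(values, weights, left, right):
    """Weighted sum over `left` values before and `right` values after each position.

    `weights` must contain left + right + 1 entries, ordered from the leftmost
    to the rightmost position of the window. Positions whose window would
    extend past either end of the series are left as NaN (missing).
    """
    if len(weights) != left + right + 1:
        raise ValueError("weights must have left + right + 1 entries")
    n = len(values)
    out = [math.nan] * n
    for i in range(left, n - right):
        window = values[i - left : i + right + 1]
        out[i] = sum(w * v for w, v in zip(weights, window))
    return out

# Simple filter: every weight equals 1 / (left + right + 1).
left, right = 1, 2
equal_weights = [1 / (left + right + 1)] * (left + right + 1)
print(weighted_moving_filter([4.0, 8.0, 6.0, 2.0, 10.0, 4.0], equal_weights, left, right))
# [nan, 5.0, 6.5, 5.5, nan, nan]
```

With filter size left = 1 and filter size right = 2, the window has 4 values, so the first position and the last two positions have no complete window and stay missing.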

If the aggregation method is mean, several filter types are available. The binom and spencers 15 points filter types use symmetric windows, while for the simple filter the sizes of the left and right sides of the window can be specified individually.
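
The weights of the three filter types can be generated as exact fractions in a short Python sketch. The function names are hypothetical, and the Spencer coefficients are the classic published 15-point weights (numerators over 320); that RapidMiner uses exactly these coefficients is an assumption.

```python
from fractions import Fraction
from math import comb

def simple_weights(left, right):
    """Equal weights 1 / (left + right + 1) for the simple filter."""
    size = left + right + 1
    return [Fraction(1, size)] * size

def binom_weights(q):
    """Coefficients of (1/2 + s/2)^(2q): C(2q, k) / 4^q for k = 0..2q."""
    return [Fraction(comb(2 * q, k), 4 ** q) for k in range(2 * q + 1)]

def spencer_15_weights():
    """Classic Spencer 15-point weights (assumed to match RapidMiner), divided by 320."""
    numerators = [-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3]
    return [Fraction(n, 320) for n in numerators]

print([str(w) for w in binom_weights(2)])
# ['1/16', '1/4', '3/8', '1/4', '1/16'], i.e. [1, 4, 6, 4, 1] / 16
```

Every set of weights sums to one, so a constant series passes through unchanged. The Spencer weights are symmetric but include negative values at the outer positions.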

If a missing value is in the filter window, the resulting filtered value is also missing. If one or more positive infinity values are in the filter window, the resulting filtered value is positive infinity; the same holds for negative infinity values. If the window contains both positive and negative infinity values, the resulting filtered value is missing.
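
These rules can be checked explicitly before the normal aggregation runs. In the Python sketch below, `combine_invalid` is a hypothetical helper that returns the value the rules dictate, or `None` when the window holds only finite values. An explicit check is used because plain float arithmetic would not always follow the rules: with Spencer's negative outer weights, a positive infinity multiplied by a negative weight becomes negative infinity.

```python
import math

def combine_invalid(window):
    """Apply the invalid-value rules to one filter window.

    Returns NaN (missing), +inf or -inf when the rules decide the result,
    or None when every value is finite and the normal aggregation applies.
    """
    has_nan = any(math.isnan(v) for v in window)
    has_pos_inf = any(v == math.inf for v in window)
    has_neg_inf = any(v == -math.inf for v in window)
    if has_nan or (has_pos_inf and has_neg_inf):
        return math.nan  # a missing value, or both infinities, gives missing
    if has_pos_inf:
        return math.inf
    if has_neg_inf:
        return -math.inf
    return None  # all finite: aggregate normally

print(combine_invalid([1.0, math.inf, 2.0]))  # inf
print(-3 / 320 * math.inf)  # -inf: why a plain weighted sum is not enough
```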

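For values whose filter window extends beyond the start or end of the time series, the resulting filtered values are set to missing values. Both this behavior and the handling of invalid values described above change when the ignore invalid values parameter is set to true.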

This operator works only on numerical time series.

Input

  • example set (Data Table)

    The ExampleSet which contains the time series data as attributes.

Output

  • example set (Data Table)

    The ExampleSet after applying the moving average filter. If overwrite attributes is true, the original time series attributes are overwritten; otherwise, new attributes containing the filtered values are added. The names of the new attributes are formed by appending the postfix specified by the new attributes postfix parameter to the names of the original attributes. Other attributes are not changed.

Parameters

  • attribute_filter_type

    This parameter selects the method used to choose the attributes that hold the time series values. Only numeric attributes can be selected as time series attributes. The different filter types are:

    • all: This option selects all attributes of the ExampleSet to be time series attributes. This is the default option.
    • single: This option allows the selection of a single time series attribute. The required attribute is selected by the attribute parameter.
    • subset: This option allows the selection of multiple time series attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known, all attributes are present in the list and the required ones can easily be selected.
    • regular_expression: This option allows you to specify a regular expression for the time series attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except regular expression.
    • value_type: This option allows the selection of all attributes of a particular type as time series attributes. Note that types are hierarchical; for example, real and integer both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception and except value type.
    • block_type: This option allows the selection of all attributes of a particular block type as time series attributes. Note that block types may be hierarchical; for example, value_series_start and value_series_end both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception and except block type.
    • no_missing_values: This option selects as time series attributes all attributes of the ExampleSet that do not contain a missing value in any example. Attributes with even a single missing value are not selected.
    • numeric_value_filter: All numeric attributes whose examples all match a given numeric condition are selected as time series attributes. The condition is specified by the numeric condition parameter.
  • attribute

    The time series attribute to use when attribute filter type is single. If the meta data is known, the attribute name can be selected from the drop-down box of the parameter.

  • attributes

    The time series attributes to use when attribute filter type is subset. This opens a new window with two lists: all attributes are present in the left list and can be shifted to the right list, which holds the selected time series attributes.

  • regular_expression

    Attributes whose names match this expression are selected as time series attributes. The expression can be entered through the edit and preview regular expression menu, which gives an overview of regular expression syntax and lets you try different expressions while previewing the results.

  • use_except_expression

    If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.

  • except_regular_expression

    This option allows you to specify a regular expression. Attributes matching this expression are filtered out even if they match the first expression (the expression specified by the regular expression parameter).

  • value_type

    This option allows you to select a type of attribute. One of the following types can be chosen: numeric, integer, real.

  • use_value_type_exception

    If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.

  • except_value_type

    Attributes matching this type are removed from the final output even if they matched the type previously selected by the value type parameter. One of the following types can be selected here: numeric, integer, real.

  • block_type

    This option allows you to select a block type of attribute. One of the following block types can be chosen: value_series, value_series_start, value_series_end.

  • use_block_type_exception

    If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.

  • except_block_type

    Attributes matching this block type are removed from the final output even if they matched the block type previously selected by the block type parameter. One of the following block types can be selected here: value_series, value_series_start, value_series_end.

  • numeric_condition

    The numeric condition used by the numeric value filter type. A numeric attribute is selected if all examples match the specified condition for this attribute. For example, the numeric condition '> 6' keeps all numeric attributes that have a value greater than 6 in every example. Conditions can be combined, as in '> 6 && < 11' or '<= 5 || < 0', but && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||.

  • invert_selection

    If this parameter is set to true, the selection is reversed: all attributes not matching the specified condition are selected as time series attributes. Special attributes are never selected, regardless of this parameter, as long as the include special attributes parameter is false. If include special attributes is true, the condition is also applied to the special attributes, and their selection is reversed as well when this parameter is checked.

  • include_special_attributes

    Special attributes are attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Custom roles can also be assigned to attributes. By default, special attributes are not selected as time series attributes, irrespective of the filter conditions. If this parameter is set to true, special attributes are also tested against the specified conditions, and those that match are selected.

  • overwrite_attributes

    This parameter indicates if the original time series attributes are overwritten by the resulting time series. If this parameter is set to false, the resulting new time series are added as new attributes to the ExampleSet. The name of these new attributes will be the name of the original time series with a postfix added. The postfix is specified by the parameter new attributes postfix.

  • new_attributes_postfix

    If overwrite attributes is false, this parameter specifies the postfix which is added to the names of the original time series to create the new attribute names.

  • aggregation_method

    The aggregation method defines how the values within the rolling windows are aggregated. The possible aggregation methods are:

    • mean: This method applies a moving average filter, also called moving mean, rolling average, rolling mean or running average. For this method, different weighting methods are available, selected by the filter type parameter.
    • median: This method applies a moving median filter. The time series values within the specified rolling windows (filter size) are aggregated by calculating the median.
    • maximum: This method applies a moving maximum filter. The time series values within the specified rolling windows (filter size) are aggregated by calculating the maximum.
    • minimum: This method applies a moving minimum filter. The time series values within the specified rolling windows (filter size) are aggregated by calculating the minimum.
    • variance: This method applies a moving variance filter. The time series values within the specified rolling windows (filter size) are aggregated by calculating the variance.
    • standard deviation: This method applies a moving standard deviation filter. The time series values within the specified rolling windows (filter size) are aggregated by calculating the standard deviation.
  • filter_type

    The filter type defines the weights of the filter. Possible filter types are:

    • simple: The filter includes filter size left values to the left of the current value and filter size right values to the right of it. All weights have the same value, 1 / (filter size left + filter size right + 1). This filter is also called moving average, moving mean, rolling average, rolling mean or running average.
    • binom: Symmetric filter with q = filter size values on each side of the current value. The weights follow the expansion of the binomial expression (1/2 + s/2)^(2q), so the k-th weight is C(2q, k) / 4^q. For example, for q = 2 the weights are [1/16, 4/16, 6/16, 4/16, 1/16]. For larger values of q, the weights approximate a normal (Gaussian) curve.
    • spencers 15 points: Spencer's 15-point moving average is a special symmetric filter with a fixed window of 15 values, used to smooth mortality statistics for the construction of life tables.
  • filter_size_left

    This parameter defines the size of the left side of the filter window for the simple filter type, that is, the number of values to the left of the current value that are included in the filtering. The filter window has a total size of filter size left + filter size right + 1.

  • filter_size_right

    This parameter defines the size of the right side of the filter window for the simple filter type, that is, the number of values to the right of the current value that are included in the filtering. The filter window has a total size of filter size left + filter size right + 1.

  • filter_size

    This parameter defines the size of the filter window for the binom filter type, that is, the number of values on each side of the current value that are included in the filtering. Hence the filter window has a total size of 2 * filter size + 1.

  • ignore_invalid_values

    If this parameter is set to true, invalid values (missing, positive infinity and negative infinity) are ignored when calculating the filtered values. A filtered value is then missing only if all values in the moving window are invalid. Selecting this parameter also gives valid values at the beginning and end of the time series, because the corresponding filtered values are based only on the valid part of the moving window.

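
The effect of ignore invalid values on a weighted mean can be sketched in Python. The documentation does not say how the weights are adjusted when some window values are skipped, so this sketch assumes the weights of the remaining valid values are rescaled to sum to one; it also assumes non-negative weights, as in the simple and binom filters. The name `filter_ignoring_invalid` is hypothetical.

```python
import math

def filter_ignoring_invalid(values, weights, left, right):
    """Weighted moving mean that skips invalid values (NaN, +inf, -inf)
    and window positions that fall outside the series.

    Assumption: the weights of the remaining valid values are rescaled to
    sum to one. Intended for non-negative weights (simple, binom).
    """
    out = []
    for i in range(len(values)):
        weighted_sum = 0.0
        total_weight = 0.0
        for offset, w in zip(range(-left, right + 1), weights):
            j = i + offset
            if 0 <= j < len(values) and math.isfinite(values[j]):
                weighted_sum += w * values[j]
                total_weight += w
        # Missing only if no valid value fell inside the window.
        out.append(weighted_sum / total_weight if total_weight > 0 else math.nan)
    return out

binom_q1 = [0.25, 0.5, 0.25]  # binom weights for filter size 1
print(filter_ignoring_invalid([2.0, math.nan, 4.0, math.inf, 6.0], binom_q1, 1, 1))
# [2.0, 3.0, 4.0, 5.0, 6.0]
```

Unlike the default behavior, the first and last positions now receive values, computed from the part of the window that lies inside the series.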

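
The numeric condition used by the numeric value filter type can be illustrated with a small Python parser. This is a hypothetical sketch, not RapidMiner's implementation: the operator set beyond the documented >, < and <= is an assumption, and the names `parse_numeric_condition` and `attribute_selected` are illustrative only.

```python
import operator
import re

# Assumed operator set; the documentation's examples only show >, < and <=.
_OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
        ">": operator.gt, "<": operator.lt, "=": operator.eq}
_ATOM = re.compile(r"\s*(>=|<=|!=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*$")

def parse_numeric_condition(condition):
    """Turn a condition such as '> 6 && < 11' into a predicate on one value."""
    if "&&" in condition and "||" in condition:
        raise ValueError("&& and || cannot be combined in one condition")
    joiner, combine = ("&&", all) if "&&" in condition else ("||", any)
    tests = []
    for part in condition.split(joiner):
        match = _ATOM.match(part)
        if match is None:
            raise ValueError(f"cannot parse condition part {part!r}")
        op, threshold = _OPS[match.group(1)], float(match.group(2))
        # Default arguments bind the current op/threshold to this lambda.
        tests.append(lambda v, op=op, t=threshold: op(v, t))
    return lambda v: combine(test(v) for test in tests)

def attribute_selected(column, condition):
    """An attribute is selected only if every example satisfies the condition."""
    predicate = parse_numeric_condition(condition)
    return all(predicate(v) for v in column)

print(attribute_selected([7, 8, 10], "> 6 && < 11"))  # True
print(attribute_selected([7, 11], "> 6 && < 11"))     # False
```

Rejecting any condition that contains both && and || mirrors the documented restriction, so a string like '(> 0 && < 2) || (>10 && < 12)' raises an error instead of being evaluated.
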
Tutorial Processes

Simple and Binom Moving Average Filter

This tutorial process demonstrates the usage of, and the difference between, the simple and the binom moving average filters. Simple and binom moving average filters of size 1 and size 5 are applied to the Lake Huron data set from the Samples/Time Series folder. The resulting filtered time series can be compared in the Results view.

Different Aggregation Methods

This tutorial process compares different aggregation methods. Moving Mean, Median, Min, Max, Variance and Standard Deviation Filters of size 1 are applied to the Copper Price data set from the Samples/Time Series folder. The resulting filtered time series can be compared in the Results view.
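
A comparison of aggregation methods with filter size 1 can be approximated in a few lines of Python on a toy series (not the Copper Price data). `moving_aggregate` and `AGGREGATIONS` are hypothetical names. The window is assumed to be symmetric with `size` values on each side, the mean is unweighted as in the simple filter, and the variance is assumed to be the sample (n - 1) estimator, which the documentation does not specify.

```python
import math
import statistics

AGGREGATIONS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "minimum": min,
    "maximum": max,
    # Assumption: sample (n - 1) variance and standard deviation.
    "variance": statistics.variance,
    "standard deviation": statistics.stdev,
}

def moving_aggregate(values, method, size=1):
    """Aggregate a symmetric window of `size` values on each side of each position.

    Positions without a complete window are left as NaN (missing).
    """
    agg = AGGREGATIONS[method]
    out = [math.nan] * len(values)
    for i in range(size, len(values) - size):
        out[i] = agg(values[i - size : i + size + 1])
    return out

series = [3.0, 1.0, 4.0, 1.0, 5.0]
for method in AGGREGATIONS:
    print(method, moving_aggregate(series, method))
```

On this series the median filter returns 3.0, 1.0 and 4.0 for the three interior positions, while the maximum filter returns 4.0, 4.0 and 5.0, which shows how differently the methods react to the same window.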