This page documents RapidMiner Studio version 10.0.

STL Decomposition (Time Series)

Synopsis

This operator performs an STL decomposition of time series data.

Description

STL stands for "Seasonal and Trend decomposition using Loess". It splits a time series into trend, seasonal and remainder components. First, a Loess interpolation (seasonal smoothing) smooths the cyclic sub-series (after removing the current trend estimate) to determine the seasonal component. Next, another Loess interpolation (lowpass smoothing) smooths the estimated seasonal component. In a final step, the deseasonalized series is smoothed again (trend smoothing) to estimate the trend component. This process is repeated several times to improve the accuracy of the component estimates.
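
As a rough illustration of the cyclic sub-series mentioned above, the following Python sketch groups the values of a series by their position within the seasonal cycle. The function name `cyclic_subseries` is hypothetical and not part of RapidMiner, and the sketch shows only the grouping step, not the Loess smoothing applied to each sub-series.

```python
def cyclic_subseries(values, seasonality):
    """Group values by their position within the seasonal cycle.

    Sub-series k contains the values at positions k, k + seasonality,
    k + 2 * seasonality, and so on.
    """
    # Assumption of this sketch: a cycle needs at least two positions.
    if seasonality < 2:
        raise ValueError("seasonality must be at least 2")
    return [values[k::seasonality] for k in range(seasonality)]


# Two years of illustrative quarterly data (seasonality 4): each
# sub-series holds the values of one quarter across the years.
quarterly = [10, 20, 30, 40, 11, 21, 31, 41]
subseries = cyclic_subseries(quarterly, 4)
```

Each of the four resulting lists would then be smoothed on its own during the seasonal smoothing step.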

To perform the STL decomposition, the seasonality of the data (e.g. 12 for monthly data with a pattern recurring every year) has to be known. One advantage of the STL decomposition is that the seasonal component can change over time; the rate of change is controlled by the parameter seasonal width. Robust iterations (see the parameters default robust calculations and robust iterations) reduce the effect of outliers on the calculation of the trend and seasonal components. In addition, the result of the STL decomposition is defined for all points of the time series data.

The operator provides default values for all parameters used by the different Loess interpolation algorithms. The user can also set some or all of these parameters to further tune the STL decomposition. See the description of the parameters for details.

The STL decomposition is not defined if a time series contains invalid values (missing values, positive or negative infinity). If a time series contains such invalid values, all three components are returned as missing.
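
A minimal Python sketch of such a validity check might look as follows. It assumes that missing values are represented as NaN, which is a choice made for this example and not a statement about RapidMiner's internal representation; the function name `has_invalid_values` is hypothetical.

```python
import math


def has_invalid_values(series):
    """Return True if the series contains NaN or +/- infinity.

    Assumption of this sketch: missing values are encoded as NaN.
    """
    return any(math.isnan(v) or math.isinf(v) for v in series)


# A series with a missing value would yield missing components.
clean = [1.0, 2.5, 3.0]
broken = [1.0, float("nan"), 3.0]
```

Checking a series this way before decomposing it makes it easier to tell an all-missing result caused by invalid input from other problems.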

This operator works only on numerical time series.

Differentiation

Classic Decomposition

The Classic Decomposition also splits the time series into trend, seasonal and remainder components. Its methods for determining the components are simpler: the trend is often over-smoothed, so that rapid rises and falls are smoothed out, and the seasonal component does not change in magnitude over time. The trend and remainder components have missing values at the beginning and end of the series, because a Moving Average Filter is used to determine the trend component. On the other hand, the Classic Decomposition is capable of performing a multiplicative decomposition.

Fast Fourier Transformation

The Fast Fourier Transformation operator transforms time series into the frequency space. This can be used to identify prominent oscillations, for example to identify a possible seasonal component in the data.

Input

  • example set (Data Table)

    The ExampleSet which contains the time series data as attributes.

Output

  • decomposition (Data Table)

    ExampleSet containing the decomposed time series. The original time series and the trend, the seasonal and the remainder components are calculated for every selected time series and provided in this ExampleSet.

  • original (Data Table)

    The ExampleSet that was given as input is passed through without changes.
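
STL is an additive decomposition, so the three components should sum to the original series. The following Python sketch uses made-up illustrative numbers (not taken from any real data set) and a hypothetical helper name to show that relationship.

```python
def remainder_component(original, trend, seasonal):
    """Remainder of an additive decomposition: original - trend - seasonal."""
    return [o - t - s for o, t, s in zip(original, trend, seasonal)]


# Illustrative values only.
original = [12, 15, 11]
trend = [10, 11, 12]
seasonal = [1, 3, -2]
remainder = remainder_component(original, trend, seasonal)

# Adding the three components back together recovers the original.
rebuilt = [t + s + r for t, s, r in zip(trend, seasonal, remainder)]
```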

Parameters

  • attribute_filter_type

    This parameter selects the filter, i.e. the method used to select the attributes that hold the time series values. Only numeric attributes can be selected as time series attributes. The different filter types are:

    • all: This option selects all attributes of the ExampleSet to be time series attributes. This is the default option.
    • single: This option allows the selection of a single time series attribute. The required attribute is selected by the attribute parameter.
    • subset: This option allows the selection of multiple time series attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known, all attributes are present in the list and the required ones can easily be selected.
    • regular_expression: This option allows you to specify a regular expression for the time series attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except expression.
    • value_type: This option allows selection of all the attributes of a particular type to be time series attributes. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception, except value type.
    • block_type: This option allows the selection of all the attributes of a particular block type to be time series attributes. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception, except block type.
    • no_missing_values: This option selects all attributes of the ExampleSet as time series attributes which do not contain a missing value in any example. Attributes that have even a single missing value are not selected.
    • numeric_value_filter: All numeric attributes whose examples all match a given numeric condition are selected as time series attributes. The condition is specified by the numeric condition parameter.
    Range:
  • attribute

    The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

    Range:
  • attributes

    The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. They can be shifted to the right list, which is the list of selected time series attributes.

    Range:
  • regular_expression

    Attributes whose names match this expression will be selected. The expression can be specified through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

    Range:
  • use_except_expression

    If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.

    Range:
  • except_regular_expression

    This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (expression that was specified in regular expression parameter).

    Range:
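
The interplay of the regular expression and the except regular expression can be sketched in Python as follows. This is only an illustration: the function name `select_by_regex` is hypothetical, the sketch assumes that an expression has to match the whole attribute name, and Python's `re` syntax may differ in details from the regular expression dialect used by RapidMiner.

```python
import re


def select_by_regex(names, expression, except_expression=None):
    """Keep names matching `expression` but not `except_expression`.

    Assumption of this sketch: expressions must match the full name.
    """
    selected = []
    for name in names:
        if re.fullmatch(expression, name) is None:
            continue  # does not match the first expression
        if except_expression is not None and re.fullmatch(except_expression, name):
            continue  # filtered out by the exception
        selected.append(name)
    return selected


# Hypothetical attribute names.
attributes = ["sales_north", "sales_south", "cost"]
chosen = select_by_regex(attributes, r"sales_.*", r".*south")
```
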
  • value_type

    This option selects the value type of the attributes. One of the following types can be chosen: numeric, integer, real.

    Range:
  • use_value_type_exception

    If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.

    Range:
  • except_value_type

    Attributes matching this type are removed from the final output even if they match the type previously selected by the value type parameter. One of the following types can be selected here: numeric, integer, real.

    Range:
  • block_type

    This option selects the block type of the attributes. One of the following types can be chosen: value_series, value_series_start, value_series_end.

    Range:
  • use_block_type_exception

    If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.

    Range:
  • except_block_type

    Attributes matching this block type are removed from the final output even if they match the block type previously selected by the block type parameter. One of the following block types can be selected here: value_series, value_series_start, value_series_end.

    Range:
  • numeric_condition

    The numeric condition used by the numeric_value_filter filter type. A numeric attribute is selected if all examples match the specified condition for this attribute. For example, the numeric condition '> 6' keeps all numeric attributes having a value greater than 6 in every example. Conditions can be combined: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||.

    Range:
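
The rules for numeric conditions can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not RapidMiner's parser: the function names are hypothetical, and the set of supported comparison operators (`<`, `<=`, `>`, `>=`) is assumed for the example.

```python
import operator
import re

# Assumed operator set for this sketch.
_OPERATORS = {"<": operator.lt, "<=": operator.le,
              ">": operator.gt, ">=": operator.ge}
_PART = re.compile(r"\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*")


def parse_condition(condition):
    """Turn a condition such as '> 6 && < 11' into a test function."""
    if "&&" in condition and "||" in condition:
        raise ValueError("&& and || cannot be used together")
    if "||" in condition:
        parts, combine = condition.split("||"), any
    else:
        parts, combine = condition.split("&&"), all
    checks = []
    for part in parts:
        match = _PART.fullmatch(part)
        if match is None:
            raise ValueError("cannot parse condition part: %r" % part)
        checks.append((_OPERATORS[match.group(1)], float(match.group(2))))

    def test(value):
        return combine(op(value, bound) for op, bound in checks)

    return test


def attribute_matches(values, condition):
    """An attribute is selected only if every example matches."""
    test = parse_condition(condition)
    return all(test(v) for v in values)
```

For instance, `attribute_matches([7, 8, 10], "> 6 && < 11")` accepts the attribute, while a single value outside the range rejects it.
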
  • invert_selection

    If this parameter is set to true, the selection is reversed: all attributes not matching the specified condition are selected as time series attributes. Special attributes are not selected, independent of the invert selection parameter, as long as the include special attributes parameter is not set to true. If it is set to true, the condition is also applied to the special attributes and the selection is reversed if this parameter is checked.

    Range:
  • include_special_attributes

    Special attributes are attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Also custom roles can be assigned to attributes. By default special attributes are not selected as time series attributes irrespective of the filter conditions. If this parameter is set to true, special attributes are also tested against conditions specified and those attributes are selected that match the conditions.

    Range:
  • has_indices

    This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.

    Range:
  • indices_attribute

    If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

    Range:
  • sort_time_series

    If this parameter is selected, the input time series is sorted according to the selected indices attribute before the time series operation is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.

    Keep in mind that the index values still need to be unique. If the values are non-unique, a corresponding User Error is thrown.

    The data set provided at the original output port will be the sorted input time series.

    Range:
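
The sorting and uniqueness requirements can be illustrated with a small Python sketch. The function name `sort_time_series` and the use of `ValueError` in place of RapidMiner's User Error are assumptions made for this example.

```python
def sort_time_series(indices, values):
    """Sort values by their index values; index values must be unique."""
    if len(set(indices)) != len(indices):
        # Stands in for the User Error described above.
        raise ValueError("index values must be unique")
    order = sorted(range(len(indices)), key=indices.__getitem__)
    return [indices[i] for i in order], [values[i] for i in order]


# Illustrative unsorted series.
sorted_indices, sorted_values = sort_time_series([3, 1, 2], [30.0, 10.0, 20.0])
```
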
  • seasonality

    The length of one seasonal pattern of the seasonal component. For example, for a seasonal pattern which occurs every year the seasonality is 4 (for quarterly data), 12 (for monthly data) or 52 (for weekly data). For a pattern which occurs every hour the seasonality is 60 (for minutely data) or 3600 (for secondly data).

    Range:
  • default_robust_calculations

    This parameter defines whether the decomposition uses default settings for robust iterations to handle outliers. With the default settings, the number of inner iterations is set to 1 and the number of robust iterations to 15.

    Range:
  • inner_iterations

    This parameter defines the number of inner iterations performed to improve the accuracy of the estimation of the decomposition components.

    Range:
  • robust_iterations

    This parameter defines the number of robust (outer) iterations performed to reduce the effect of outliers on the estimation of the trend and the seasonal component. Can be set to 0 if no outliers are expected in the data.

    Range:
  • seasonal_smoothing_settings

    This parameter defines which settings of the seasonal smoothing are set by the user and which are set to default values by the operator. The seasonal smoothing has 3 parameters, the seasonal width, the seasonal degree and the seasonal jump. See the description of the individual parameters on their effects and their default settings.

    • default: Only seasonal width has to be specified.
    • periodic: This option constrains the seasonal component to be exactly periodic. All three parameters are set by the operator.
    • width and degree: seasonal width and seasonal degree have to be specified, seasonal jump is set to the default value.
    • width and jump: seasonal width and seasonal jump have to be specified, seasonal degree is set to the default value.
    • all: All three parameters have to be specified.
    Range:
  • seasonal_width

    The width of the Loess smoother used to determine the seasonal component. Has to be odd and larger than 2. If seasonal width is even, it is automatically increased by one. A large seasonal width reduces the rate of change of the seasonal component over time. Always has to be specified, unless the seasonal smoothing settings parameter is set to periodic. In this case the seasonal width is set to 100 times the length of the time series.

    Range:
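
The width rule described above (larger than 2, even values increased by one) can be written as a short Python sketch. The function name `effective_width` is hypothetical; the same rule is stated for the trend width and the lowpass width.

```python
def effective_width(width):
    """Return the width actually used by the smoother.

    Widths have to be larger than 2; even widths are increased by one.
    """
    if width <= 2:
        raise ValueError("width has to be larger than 2")
    return width + 1 if width % 2 == 0 else width
```

For example, a requested width of 12 would result in an effective width of 13, while 7 is used unchanged.
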
  • seasonal_degree

    The degree of the polynomial used in the Loess smoothing. Has to be 0, 1, or 2 and defaults to 1. If the seasonal smoothing settings parameter is set to periodic, the degree is set to 0.

    Range:
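
To make the roles of width and degree more concrete, here is a simplified Python sketch of a single Loess estimate on an equally spaced series. It is not the operator's implementation: the function names are hypothetical, the bandwidth choice is a simplification, tricube weights are assumed, and only degree 0 (local weighted mean) and degree 1 (local weighted line) are covered.

```python
def tricube(u):
    """Tricube weight: 1 at distance 0, falling to 0 at distance 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_at(ys, position, width, degree=1):
    """Smoothed value at index `position` of an equally spaced series.

    Simplified sketch: uses a window of `width` consecutive points
    around `position` and supports only degree 0 and degree 1.
    """
    if degree not in (0, 1):
        raise ValueError("this sketch supports degree 0 and 1 only")
    n = len(ys)
    q = min(width, n)
    # Window of q consecutive points, centred on `position` where possible.
    start = min(max(position - q // 2, 0), n - q)
    xs = range(start, start + q)
    # Bandwidth just beyond the farthest point, so all weights are positive.
    h = max(position - start, start + q - 1 - position) + 1
    ws = [tricube((x - position) / h) for x in xs]
    sw = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / sw
    mean_y = sum(w * ys[x] for w, x in zip(ws, xs)) / sw
    if degree == 0:
        return mean_y
    # Weighted least-squares line through the window.
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (ys[x] - mean_y) for w, x in zip(ws, xs))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (position - mean_x)


# Illustrative series: a straight line, which a degree-1 fit reproduces.
line = [2 * i + 1 for i in range(10)]
smoothed = [loess_at(line, i, width=5, degree=1) for i in range(len(line))]
```

A larger width averages over more neighbours and therefore smooths more strongly, while a higher degree lets the local fit follow slopes in the data.
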
  • seasonal_jump

    The number of points skipped between successive Loess evaluations. Has to be larger than 0 and defaults to 10% of the seasonal width (rounded up).

    Range:
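
The default jump can be computed as in the following Python sketch. The function name `default_jump` is hypothetical; integer arithmetic is used so that the rounding up is not affected by floating-point error. The same default rule is stated for the trend jump and the lowpass jump.

```python
def default_jump(width):
    """10% of the width, rounded up (ceiling division by 10)."""
    return -(-width // 10)
```

With a width of 13, for example, 10% is 1.3, which is rounded up to a jump of 2.
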
  • trend_smoothing_settings

    This parameter defines which settings of the trend smoothing are set by the user and which are set to default values by the operator. The trend smoothing has 3 parameters, the trend width, the trend degree and the trend jump. See the description of the individual parameters on their effects and their default settings.

    • default: For all three parameters the default values are used.
    • flat: The trend component is forced to be flat. All three parameters are set by the operator.
    • linear: The trend component is forced to be linear. All three parameters are set by the operator.
    • width: Only trend width has to be specified.
    • degree: Only trend degree has to be specified.
    • jump: Only trend jump has to be specified.
    • width and degree: trend width and trend degree have to be specified, trend jump is set to the default value.
    • width and jump: trend width and trend jump have to be specified, trend degree is set to the default value.
    • degree and jump: trend degree and trend jump have to be specified, trend width is set to the default value.
    • all: All three parameters have to be specified.
    Range:
  • trend_width

    The width of the Loess smoother used to determine the trend component. Has to be odd and larger than 2. If trend width is even, it is automatically increased by one. A larger trend width increases the smoothing effect on the trend component. If the trend smoothing settings parameter is set to flat or linear, the trend width is automatically set to 100 times seasonality times the length of the time series. If it is not specified by the user, the trend width defaults to floor(1.5 x seasonality / (1 - 1.5 / seasonal width) + 0.5).

    Range:
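
The default trend width formula can be evaluated with a short Python sketch. The function name `default_trend_width` is hypothetical, and the sketch implements only the formula itself; whether an even result is subsequently increased by one is not covered here.

```python
import math


def default_trend_width(seasonality, seasonal_width):
    """floor(1.5 * seasonality / (1 - 1.5 / seasonal_width) + 0.5)"""
    return math.floor(1.5 * seasonality / (1 - 1.5 / seasonal_width) + 0.5)


# Monthly data (seasonality 12) with a seasonal width of 7:
# 18 / (1 - 1.5 / 7) is about 22.9, so the default is floor(23.4) = 23.
monthly_default = default_trend_width(12, 7)
```
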
  • trend_degree

    The degree of the polynomial used in the Loess smoothing. Has to be 0, 1, or 2 and defaults to 1. If the trend smoothing settings parameter is set to flat or linear, the trend degree is automatically set to 0 (flat) or 1 (linear).

    Range:
  • trend_jump

    The number of points skipped between successive Loess evaluations. Has to be larger than 0 and defaults to 10% of the trend width (rounded up).

    Range:
  • lowpass_smoothing_settings

    This parameter defines which settings of the lowpass smoothing are set by the user and which are set to default values by the operator. The lowpass smoothing has 3 parameters, the lowpass width, the lowpass degree and the lowpass jump. See the description of the individual parameters on their effects and their default settings.

    • default: For all three parameters the default values are used.
    • width: Only lowpass width has to be specified.
    • degree: Only lowpass degree has to be specified.
    • jump: Only lowpass jump has to be specified.
    • width and degree: lowpass width and lowpass degree have to be specified, lowpass jump is set to the default value.
    • width and jump: lowpass width and lowpass jump have to be specified, lowpass degree is set to the default value.
    • degree and jump: lowpass degree and lowpass jump have to be specified, lowpass width is set to the default value.
    • all: All three parameters have to be specified.
    Range:
  • lowpass_width

    The width of the Loess smoother used to smooth (and with that remove) the seasonal components from the time series data. Has to be odd and larger than 2. If lowpass width is even, it is automatically increased by one. If it is not specified by the user, the lowpass width defaults to seasonality.

    Range:
  • lowpass_degree

    The degree of the polynomial used in the Loess smoothing. Has to be 0, 1, or 2 and defaults to 1.

    Range:
  • lowpass_jump

    The number of points skipped between successive Loess evaluations. Has to be larger than 0 and defaults to 10% of the lowpass width (rounded up).

    Range:

Tutorial Processes

STL Decomposition of the monthly milk production data set

In this process, the STL Decomposition operator is used to split the monthly milk production data set into trend, seasonal and remainder components.