RapidMiner Studio documentation, version 10.0

Lag (Time Series)

Synopsis

This operator performs a time series lag transformation on one or more attributes.

Description

Individual attributes can be lagged separately with different lag values by the parameter individual lags. In addition, a default lag for a set of attributes can be specified. If the parameter add default lag is checked, the set of attributes can be selected by the attribute filter. The default lag value for these attributes is given by the parameter default lag.

If the parameter overwrite attributes is selected, the lagged attributes overwrite the original ones. If not selected, new attributes are added to the ExampleSet (the names of the new attributes are in the form <attribute-name> - <lag>).

Lag values can also be negative. A negative lag shifts the attribute in the opposite direction, which is equivalent to lagging all other attributes by the absolute value of that lag.

New attributes with a negative lag are named in the form <attribute-name> + <lag>.
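The lag rule ("Example i will contain the value of Example i-lag") and the naming scheme can be illustrated with a minimal Python sketch. This is not the operator's implementation: the helper names `lag_series` and `lagged_name` are hypothetical, missing values are represented by `None`, and the sketch assumes that the name of a negatively lagged attribute shows the absolute value of the lag after the `+` sign.

```python
from typing import List, Optional, Sequence


def lag_series(values: Sequence[float], lag: int) -> List[Optional[float]]:
    """Position i receives values[i - lag]; out-of-range positions become None (missing)."""
    n = len(values)
    return [values[i - lag] if 0 <= i - lag < n else None for i in range(n)]


def lagged_name(attribute: str, lag: int) -> str:
    """Build '<attribute-name> - <lag>' or '<attribute-name> + <lag>' for negative lags."""
    # Assumption: the absolute lag value is shown after the sign.
    sign = "-" if lag >= 0 else "+"
    return f"{attribute} {sign} {abs(lag)}"


levels = [10.0, 11.0, 12.0, 13.0]
print(lagged_name("level", 2), lag_series(levels, 2))    # level - 2 [None, None, 10.0, 11.0]
print(lagged_name("level", -1), lag_series(levels, -1))  # level + 1 [11.0, 12.0, 13.0, None]
```

With a positive lag the missing values appear at the start of the series; with a negative lag they appear at the end.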

If the parameter extend exampleset is selected, the resulting ExampleSet is extended by n examples where n is the sum of the maximum lag and the absolute value of the minimum lag (if there is a negative lag) specified. Attributes that are not selected for lagging are filled with missing values.
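As a worked example of the extension size n, the following Python sketch computes n from a list of lag values. The function name `extension_size` is hypothetical, and the handling of the case where every lag is negative (the maximum lag is then counted as zero) is an assumption, since the text above does not cover it.

```python
def extension_size(lags):
    """Number of examples added: maximum lag plus |minimum lag| if a negative lag exists."""
    # Assumption: a maximum lag below zero contributes nothing to n.
    n = max(max(lags), 0)
    min_lag = min(lags)
    if min_lag < 0:
        n += abs(min_lag)
    return n


print(extension_size([1, 3, -2]))  # 3 + |-2| = 5
print(extension_size([2, 4]))      # no negative lag, so n = 4
```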

This operator works on all attributes (independent of type or role).

Input

  • example set input (Data Table)

    The ExampleSet which contains the time series data as attributes.

Output

  • example set output (Data Table)

    The ExampleSet after applying the lag transformation. If overwrite attributes is true, the original time series attributes are overwritten. Otherwise, new attributes with the lagged values are added. The names of the new attributes are in the form <attribute-name> - <lag>, or <attribute-name> + <lag> if the lag is negative. If the parameter extend exampleset is selected, the resulting ExampleSet is extended by n examples, where n is the sum of the maximum lag and the absolute value of the minimum lag (if there is a negative lag) specified. Attributes that are not selected for lagging are filled with missing values.

  • original (Data Table)

    The ExampleSet that was given as input is passed through without changes.

Parameters

  • individual_lags

    The lag attributes can be selected by the drop down menu if the meta data is known. For each attribute an integer lag value has to be specified. If overwrite attributes is not selected, the same attribute can be lagged more than one time with different lag values.

  • individual_attribute

    The lag attribute can be selected by the drop down menu if the meta data is known. It can also be typed in manually.

  • lag

    This parameter defines the lag for the individual attribute. Example i will contain the value of Example i-lag. The first lag examples are filled with missing values. The lag value can be positive or negative.

  • add_default_lag

    If this parameter is selected, a default lag for a set of attributes can be specified. The set of attributes that receive the default lag can be selected by the attribute filter. The default lag value for these attributes is given by the parameter default lag.

  • attribute_filter_type

    This parameter selects the filter type, that is, the method used to select the attributes to which the default lag is applied. The different filter types are:

    • all: This option selects all attributes of the ExampleSet. This is the default option.
    • single: This option allows the selection of a single attribute. The required attribute is selected by the attribute parameter.
    • subset: This option allows the selection of multiple attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known all attributes are present in the list and the required ones can easily be selected.
    • regular_expression: This option allows you to specify a regular expression for the attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except regular expression.
    • value_type: This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception, except value type.
    • block_type: This option allows the selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception, except block type.
    • no_missing_values: This option selects all attributes of the ExampleSet that do not contain a missing value in any example. Attributes that have even a single missing value are not selected.
    • numeric_value_filter: All numeric attributes whose examples all match a given numeric condition are selected. The condition is specified by the numeric condition parameter.
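The hierarchical matching described for the value_type filter can be sketched in Python as follows. This is only an illustration under stated assumptions: the `PARENT` table contains just the real/integer/numeric relationship named above, the helper names are hypothetical, and the attribute names and the "nominal" type in the usage example are made up.

```python
# Partial type hierarchy: only the relationship stated in the text above.
PARENT = {"real": "numeric", "integer": "numeric"}


def is_a(value_type, selected_type):
    """True if value_type equals selected_type or descends from it."""
    current = value_type
    while current is not None:
        if current == selected_type:
            return True
        current = PARENT.get(current)
    return False


def filter_by_value_type(attribute_types, value_type, except_value_type=None):
    """Select attribute names by type, optionally removing an excepted type."""
    selected = []
    for name, attr_type in attribute_types.items():
        if not is_a(attr_type, value_type):
            continue
        if except_value_type is not None and is_a(attr_type, except_value_type):
            continue
        selected.append(name)
    return selected


# Hypothetical attributes for illustration only.
types = {"temperature": "real", "count": "integer", "city": "nominal"}
print(filter_by_value_type(types, "numeric"))             # ['temperature', 'count']
print(filter_by_value_type(types, "numeric", "integer"))  # ['temperature']
```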
  • attribute

    The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

  • attributes

    The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. They can be shifted to the right list, which is the list of selected time series attributes.

  • regular_expression

    Attributes whose names match this expression will be selected. The expression can be specified through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

  • use_except_expression

    If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.

  • except_regular_expression

    This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (expression that was specified in regular expression parameter).
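The interplay of the regular expression and the except regular expression can be sketched in Python. This is a hedged illustration, not the operator's code: the function name `filter_by_regex` is hypothetical, the sketch assumes that an expression must match the whole attribute name, and Python's `re` syntax may differ in details from the regular expression dialect used by RapidMiner. The attribute names are made up.

```python
import re


def filter_by_regex(names, regular_expression, except_regular_expression=None):
    """Keep names matching the first expression unless they match the except expression."""
    selected = []
    for name in names:
        # Assumption: the expression has to match the entire attribute name.
        if not re.fullmatch(regular_expression, name):
            continue
        if except_regular_expression is not None and re.fullmatch(
            except_regular_expression, name
        ):
            continue
        selected.append(name)
    return selected


# Hypothetical attribute names for illustration only.
names = ["sensor_1", "sensor_2", "sensor_2_raw", "label"]
print(filter_by_regex(names, r"sensor_.*"))             # ['sensor_1', 'sensor_2', 'sensor_2_raw']
print(filter_by_regex(names, r"sensor_.*", r".*_raw"))  # ['sensor_1', 'sensor_2']
```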

  • value_type

    This option allows you to select an attribute type.

  • use_value_type_exception

    If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.

  • except_value_type

    Attributes matching this type are removed from the final output even if they match the type previously selected by the value type parameter.

  • block_type

    This option allows you to select an attribute block type.

  • use_block_type_exception

    If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.

  • except_block_type

    Attributes matching this block type are removed from the final output even if they match the block type previously selected by the block type parameter.

  • numeric_condition

    The numeric condition used by the numeric condition filter type. A numeric attribute is selected if all examples match the specified condition for this attribute. For example, the numeric condition '> 6' will keep all numeric attributes that have a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||.
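The rule that && and || cannot be mixed can be illustrated with a small Python sketch that evaluates such conditions. It is not the parser used by the operator: the function names are hypothetical, only the operators >, >=, <, and <= are supported (>= is an assumption, since it does not appear in the examples above), and missing values are not handled.

```python
import operator
import re

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
TERM = re.compile(r"\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*")


def parse_condition(condition):
    """Return a function value -> bool for conditions such as '> 6 && < 11'."""
    has_and = "&&" in condition
    has_or = "||" in condition
    if has_and and has_or:
        raise ValueError("&& and || cannot be used together in one numeric condition")
    parts = condition.split("&&" if has_and else "||") if (has_and or has_or) else [condition]
    tests = []
    for part in parts:
        match = TERM.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse condition term: {part!r}")
        tests.append((OPS[match.group(1)], float(match.group(2))))
    combine = any if has_or else all
    return lambda value: combine(op(value, bound) for op, bound in tests)


def attribute_matches(values, condition):
    """An attribute is selected only if every example satisfies the condition."""
    test = parse_condition(condition)
    return all(test(value) for value in values)


print(attribute_matches([7, 8, 10], "> 6 && < 11"))  # True
print(attribute_matches([7, 12], "> 6 && < 11"))     # False: 12 is not below 11
print(attribute_matches([3, -1], "<= 5 || < 0"))     # True
```

Passing '(> 0 && < 2) || (>10 && < 12)' to `parse_condition` raises a `ValueError` in this sketch, mirroring the restriction described above.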

  • invert_selection

    If this parameter is set to true, the selection is reversed. In that case all attributes not matching the specified condition are selected. Special attributes are not selected, regardless of this parameter, as long as the include special attributes parameter is not set to true. If it is set to true, the condition is also applied to the special attributes, and the selection is reversed if this parameter is checked.

  • include_special_attributes

    Special attributes are attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Also custom roles can be assigned to attributes. By default special attributes are not selected irrespective of the filter conditions. If this parameter is set to true, special attributes are also tested against conditions specified and those attributes are selected that match the conditions.

  • default_lag

    This parameter defines the default lag for the attributes selected by the attribute filter. All selected attributes will be lagged with this default lag value. Example i will contain the value of Example i-lag. The first lag examples are filled with missing values. The lag value can be positive or negative.

  • overwrite_attributes

    This parameter indicates whether the original time series attributes are overwritten by the lagged time series. If this parameter is set to false, the lagged time series are added as new attributes to the ExampleSet. The names of these new attributes will be <attribute-name> - <lag>, or <attribute-name> + <lag> if the lag is negative.

    Note that selecting this parameter can increase runtime (it requires copying the input ExampleSet to ensure that there are no data leaks).

  • extend_exampleset

    This parameter indicates whether the ExampleSet should be extended by n examples, where n is the sum of the maximum lag and the absolute value of the minimum lag (if there is a negative lag) specified. Attributes that are not selected for lagging are filled with missing values.

    Note that selecting this parameter can increase runtime (it requires copying the input ExampleSet to ensure that there are no data leaks).


Tutorial Processes

Lagging Lake Huron Data Set

This tutorial process demonstrates lagging on the Lake Huron data set.

Lagging options demonstrated on the Golf data set

This tutorial process showcases the different lagging options on the Golf data set.

Positive and Negative lags demonstrated on the Titanic data set

This tutorial process demonstrates how to combine positive and negative lags on attributes of the Titanic Training data set.