RapidMiner Studio documentation, version 9.3
Windowing (Time Series)
Synopsis
This operator converts one or more time series to a windowed ExampleSet with the windowed values and, if enabled, the horizon values as attributes.
Description
This operator converts time series data into a windowed ExampleSet that can be processed, for example, with standard machine learning methods. For each time series attribute, window size attributes are created, holding the values of the corresponding window. If create horizon (labels) is selected, horizon size additional attributes are also created, holding the horizon values of the corresponding window.
If has indices is selected, the last index value of the time series in the corresponding window is also added. If has indices is not selected, an attribute named Window id with the number of the corresponding window (starting from 0) is added. See the description of the has indices parameter for details.
This operator works on all time series (numerical, nominal and time series with date time values).
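The windowing described above can be illustrated with a minimal Python sketch. It is not the operator's implementation: the function name window_series is hypothetical, and the sketch assumes that the oldest value in a window receives the largest suffix, that the newest value receives "- 0", and that incomplete trailing windows are dropped.

```python
def window_series(values, window_size, step_size=1, name="value"):
    """Return one dict per window, keyed by the generated attribute names.

    Assumptions (not confirmed by the documentation): the oldest value in a
    window gets the suffix "- (window_size - 1)", the newest gets "- 0",
    and windows that would run past the end of the series are dropped.
    """
    if window_size > len(values):
        raise ValueError("window size must not exceed the series length")
    rows = []
    starts = range(0, len(values) - window_size + 1, step_size)
    for window_id, start in enumerate(starts):
        row = {"Window id": window_id}  # numbering starts from 0
        for offset in range(window_size):
            suffix = window_size - 1 - offset
            row[f"{name} - {suffix}"] = values[start + offset]
        rows.append(row)
    return rows


# Six values, window size 3, step size 2: windows start at positions 0 and 2.
rows = window_series([10, 11, 12, 13, 14, 15], window_size=3, step_size=2, name="level")
for row in rows:
    print(row)
```

Each returned dict corresponds to one example of the windowed ExampleSet, with Window id standing in for the case where has indices is not selected.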
Input
- example set (IOObject)
The ExampleSet which contains the time series data as attributes.
Output
- windowed example set (IOObject)
The windowed ExampleSet. For each time series attribute, window size attributes are created. In addition, one attribute is created for each value in the horizon. If an indices attribute is specified, an additional attribute containing the index value of the last value in the window is created. Otherwise, an attribute named Window id with the number of the corresponding window (starting from 0) is added.
- original (IOObject)
The ExampleSet that was given as input is passed through without changes.
Parameters
- attribute_filter_type
This parameter selects the filter used to choose the attributes that hold the time series values. The available filter types are:
- all: This option selects all attributes of the ExampleSet to be time series attributes. This is the default option.
- single: This option allows the selection of a single time series attribute. The required attribute is selected by the attribute parameter.
- subset: This option allows the selection of multiple time series attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known, all attributes are present in the list and the required ones can easily be selected.
- regular_expression: This option allows you to specify a regular expression for the time series attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except expression.
- value_type: This option allows the selection of all attributes of a particular type as time series attributes. Note that types are hierarchical. For example, the real and integer types both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception and except value type.
- block_type: This option allows the selection of all attributes of a particular block type as time series attributes. Note that block types may be hierarchical. For example, the value_series_start and value_series_end block types both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception and except block type.
- no_missing_values: This option selects all attributes of the ExampleSet as time series attributes which do not contain a missing value in any example. Attributes that have even a single missing value are not selected.
- numeric_value_filter: All numeric attributes whose examples all match a given numeric condition are selected as time series attributes. The condition is specified by the numeric condition parameter.
- attribute
The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- attributes
The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. They can be shifted to the right list, which is the list of selected time series attributes.
- regular_expression
Attributes whose names match this expression will be selected. The expression can be specified through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.
- use_except_expression
If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.
- except_regular_expression
This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).
- value_type
This option selects the value type of the attributes to be used.
- use_value_type_exception
If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.
- except_value_type
Attributes matching this type are removed from the final output even if they match the type previously selected by the value type parameter.
- block_type
This option selects the block type of the attributes to be used.
- use_block_type_exception
If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.
- except_block_type
Attributes matching this block type are removed from the final output even if they match the block type previously selected by the block type parameter.
- numeric_condition
The numeric condition used by the numeric_value_filter filter type. A numeric attribute is selected if all of its examples match the specified condition. For example, the numeric condition '> 6' keeps all numeric attributes whose value is greater than 6 in every example. Conditions can be combined: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition. Conditions such as '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||.
- invert_selection
If this parameter is set to true, the selection is reversed: all attributes not matching the specified condition are selected as time series attributes. Special attributes are not selected, regardless of the invert selection parameter, as long as the include special attributes parameter is not set to true. If it is set to true, the condition is also applied to the special attributes, and the selection is reversed if this parameter is checked.
- include_special_attributes
Special attributes are attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Also custom roles can be assigned to attributes. By default special attributes are not selected as time series attributes irrespective of the filter conditions. If this parameter is set to true, special attributes are also tested against conditions specified and those attributes are selected that match the conditions.
- has_indices
This parameter indicates whether an index attribute is associated with the time series. If this parameter is set to true, an additional attribute is added that contains the last index value in the corresponding window and is named Last <indices attribute> in window. The role of this attribute is set to the ID role. If this parameter is set to false, an attribute named Window id is added with the number of the corresponding window (starting from 0).
- indices_attribute
If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- window_size
The number of values in one window. The windowed ExampleSet will contain one attribute per value in the window. The attributes are named <time series attribute name> - i, with i running from (window size - 1), ..., 0. The window size has to be smaller than or equal to the length of the time series.
- no_overlapping_windows
If this parameter is set to true, the step size is determined automatically so that windows and horizons do not overlap. The step size is set to window size + horizon size + horizon offset.
- step_size
The step size between the first values of two consecutive windows. For example, with a window size of 10 and a step size of 2, the first window holds the values 0, ..., 9, the second window the values 2, ..., 11, and so on. If no overlapping windows is set to true, the step size is determined automatically from window size, horizon size and horizon offset.
- create_horizon_(labels)
If this parameter is set to true, one or more label attributes are created, containing the horizon of the windows. The parameters horizon attribute, horizon size and horizon offset define the horizon.
- horizon_attribute
If the parameter create horizon (labels) is set to true, this parameter defines the attribute holding the horizon. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- horizon_size
The number of values taken as the horizon. For each value in the horizon, an attribute is created, named <time series attribute name> + i (horizon), with i running from 1, ..., horizon size. If the horizon size is one, the role of the created attribute is set to label. If the size is larger (and thus more than one attribute is created), the roles of the attributes are set to Horizon + i, with i running from 1, ..., horizon size.
- horizon_offset
The offset between the windows and their corresponding horizons. If the offset is 0, the horizon is taken from the values immediately following the window. Otherwise, the horizon starts horizon offset values after the end of the window.
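The interplay of window size, horizon size, horizon offset and step size can be sketched in Python as follows. This is an illustration under stated assumptions, not the operator's code: the function names effective_step_size and window_with_horizon are hypothetical, the horizon is taken from the same series as the window, and windows whose horizon would run past the end of the series are assumed to be dropped.

```python
def effective_step_size(window_size, horizon_size, horizon_offset,
                        step_size=1, no_overlapping_windows=False):
    """Step size as described for the no overlapping windows parameter."""
    if no_overlapping_windows:
        return window_size + horizon_size + horizon_offset
    return step_size


def window_with_horizon(values, window_size, horizon_size, horizon_offset=0,
                        step_size=1, no_overlapping_windows=False, name="value"):
    """Build window and horizon attributes for a single series.

    Assumption: a window is only emitted if its complete horizon fits
    inside the series.
    """
    step = effective_step_size(window_size, horizon_size, horizon_offset,
                               step_size, no_overlapping_windows)
    span = window_size + horizon_offset + horizon_size
    rows = []
    start = 0
    while start + span <= len(values):
        row = {}
        for k in range(window_size):
            row[f"{name} - {window_size - 1 - k}"] = values[start + k]
        # The horizon begins horizon_offset values after the end of the window.
        horizon_start = start + window_size + horizon_offset
        for i in range(1, horizon_size + 1):
            row[f"{name} + {i} (horizon)"] = values[horizon_start + i - 1]
        rows.append(row)
        start += step
    return rows


# Twelve values 0..11, window size 3, horizon size 2, offset 1, no overlap:
# the step size becomes 3 + 2 + 1 = 6, so windows start at positions 0 and 6.
rows = window_with_horizon(list(range(12)), window_size=3, horizon_size=2,
                           horizon_offset=1, no_overlapping_windows=True, name="x")
for row in rows:
    print(row)
```

With an offset of 1, the value directly after each window (3 and 9 in this example) is skipped and belongs to neither the window nor the horizon.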
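As a further illustration of the numeric_value_filter option and its numeric condition parameter described earlier, the following Python sketch evaluates such conditions against attribute columns. It is a hedged approximation: the helper names parse_condition and select_attributes are hypothetical, only the comparison operators >, >=, < and <= are handled (the full set supported by the operator is not stated here), and the attribute data is represented as a plain dict of lists.

```python
import operator
import re

# Assumption: only these four comparison operators are supported.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_TERM = re.compile(r"^\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_condition(condition):
    """Turn a condition such as '> 6 && < 11' into a predicate on one value."""
    if "&&" in condition and "||" in condition:
        raise ValueError("&& and || cannot be used together in one condition")
    if "||" in condition:
        parts, combine = condition.split("||"), any
    else:
        parts, combine = condition.split("&&"), all
    tests = []
    for part in parts:
        match = _TERM.match(part)
        if match is None:
            raise ValueError(f"cannot parse term: {part!r}")
        tests.append((_OPS[match.group(1)], float(match.group(2))))
    return lambda x: combine(op(x, bound) for op, bound in tests)


def select_attributes(columns, condition):
    """Keep the attributes whose every value satisfies the condition."""
    matches = parse_condition(condition)
    return [name for name, values in columns.items()
            if all(matches(v) for v in values)]


columns = {"a": [7, 8, 10], "b": [7, 12, 9], "c": [1, 2, 3]}
print(select_attributes(columns, "> 6 && < 11"))
print(select_attributes(columns, "> 6"))
```

Attribute b is rejected by '> 6 && < 11' because a single value (12) violates the condition, mirroring the rule that all examples must match.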
Tutorial Processes
Windowing of the Lake Huron data set
Simple windowing of the Lake Huron data set.