You are viewing the RapidMiner Studio documentation for version 10.0.
Process Windows (Time Series)
Synopsis
This operator creates windows from the input time series ExampleSet and loops over its inner subprocess for each of the windows.
Description
A window, with the size defined by the window size parameter, is created from the input time series and provided at the inner windowed example set port of the operator. Any operators can be inserted into the subprocess to work with the windowed ExampleSet. The results can then be provided to the output port. The output port is a port extender, which means that a new output port is created every time you connect one of the ports.
If the parameter create horizon (labels) is set to true, additional attributes are added to ExampleSets provided at the output ports. See the description of the parameter for more details. For the next iteration, the window is shifted by the number of values defined by the step size parameter.
The described behavior is the default, example based windowing. It can be changed to time based windowing or custom windowing by changing the unit parameter. For time based windowing, the windowing parameters are specified in time durations/periods. For custom windowing, an additional ExampleSet has to be provided to the new "custom windows" input port. It holds the start (and optionally the stop) values of the windows. For more details, see the unit parameter and the description of the corresponding parameters.
Expert settings (for example no overlapping windows, a value for the horizon offset, the empty window handling, the add last index in window attribute parameter, ...) can be enabled by selecting the expert settings parameter.
This operator works on all time series (numerical, nominal and time series with date time values).
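The default example based loop described above can be pictured with a short Python sketch. It is not RapidMiner's implementation: `process_windows` and `inner` are hypothetical names, the time series is a plain list, the inner subprocess is modeled as a function, and the sketch assumes that only complete windows are processed. The returned list plays the role of the collection delivered at an output port.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_windows(
    series: Sequence[T],
    window_size: int,
    step_size: int,
    inner: Callable[[Sequence[T]], R],
) -> List[R]:
    """Apply `inner` to every complete example based window and collect the results."""
    if window_size < 1 or step_size < 1:
        raise ValueError("window_size and step_size must be positive")
    if window_size > len(series):
        raise ValueError("window_size must not exceed the length of the time series")
    results: List[R] = []
    # Each iteration shifts the window by step_size values; incomplete windows are not processed.
    for start in range(0, len(series) - window_size + 1, step_size):
        results.append(inner(series[start:start + window_size]))
    return results


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
means = process_windows(values, window_size=3, step_size=2, inner=lambda w: sum(w) / len(w))
print(means)  # [2.0, 4.0]
```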
Input
- example set (Data Table)
The ExampleSet which contains the time series data as attributes.
- custom windows (Data Table)
The example set which contains the start (and stop) values of the custom windows. Only needs to be connected if the parameter unit is set to custom.
- input (IOObject)
This port is a port extender, which means that if a port is connected, a new input port is created. Any IOObject can be connected to the port; it is passed to the corresponding inner input port in each iteration.
Output
- output (IOObject)
This port is a port extender, which means that if a port is connected, a new output port is created. The port collects every result that is provided by the inner process and returns a collection of the results of all iterations. If the parameter create horizon (labels) or add last index in window attribute is set to true and the connected IOObject is an ExampleSet, additional attributes are added to the ExampleSets. See the description of the corresponding parameters for details.
Parameters
- attribute_filter_type
This parameter selects the filter type used for the time series attribute selection, i.e. the method by which the attributes holding the time series values are selected. The different filter types are:
- all: This option selects all attributes of the ExampleSet to be time series attributes. This is the default option.
- single: This option allows the selection of a single time series attribute. The required attribute is selected by the attribute parameter.
- subset: This option allows the selection of multiple time series attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known, all attributes are present in the list and the required ones can easily be selected.
- regular_expression: This option allows you to specify a regular expression for the time series attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except expression.
- value_type: This option allows selection of all the attributes of a particular type to be time series attributes. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception, except value type.
- block_type: This option allows the selection of all the attributes of a particular block type to be time series attributes. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception, except block type.
- no_missing_values: This option selects all attributes of the ExampleSet as time series attributes which do not contain a missing value in any example. Attributes that have even a single missing value are not selected.
- numeric_value_filter: All numeric attributes whose examples all match a given numeric condition are selected as time series attributes. The condition is specified by the numeric condition parameter.
- attribute
The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- attributes
The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. They can be shifted to the right list, which is the list of selected time series attributes.
- regular_expression
Attributes whose names match this expression will be selected. The expression can be specified through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.
- use_except_expression
If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.
- except_regular_expression
This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).
- value_type
This option allows you to select a type of attribute.
- use_value_type_exception
If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.
- except_value_type
The attributes matching this type will be removed from the final output even if they match the type previously selected by the value type parameter.
- block_type
This option allows you to select a block type of attribute.
- use_block_type_exception
If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.
- except_block_type
The attributes matching this block type will be removed from the final output even if they match the block type previously selected by the block type parameter.
- numeric_condition
The numeric condition used by the numeric condition filter type. A numeric attribute is selected if all examples match the specified condition for this attribute. For example, the numeric condition '> 6' will keep all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||.
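The rules for numeric conditions can be illustrated with a hypothetical parser. `parse_numeric_condition` and `attribute_matches` are illustrative names. The supported comparison operators (<, >, <=, >=, =) are an assumption of this sketch and may not match exactly what RapidMiner accepts.

```python
import operator
import re
from typing import Callable, Iterable, List

# Assumed operator set for this sketch; "<=" and ">=" are listed before "<" and ">"
# so that the regex alternation tries the two-character symbols first.
_ATOM = re.compile(r"^\s*(<=|>=|<|>|=)\s*(-?\d+(?:\.\d+)?)\s*$")
_OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt, "=": operator.eq}


def parse_numeric_condition(text: str) -> Callable[[float], bool]:
    """Turn a condition such as '> 6 && < 11' or '<= 5 || < 0' into a predicate."""
    if "&&" in text and "||" in text:
        raise ValueError("&& and || cannot be combined in one numeric condition")
    joiner = "&&" if "&&" in text else "||"
    checks: List[Callable[[float], bool]] = []
    for part in text.split(joiner):
        found = _ATOM.match(part)
        if found is None:
            raise ValueError(f"cannot parse condition part: {part!r}")
        op, bound = _OPS[found.group(1)], float(found.group(2))
        # Default arguments bind the current op/bound instead of the loop variables.
        checks.append(lambda x, op=op, bound=bound: op(x, bound))
    if joiner == "&&":
        return lambda x: all(check(x) for check in checks)
    return lambda x: any(check(x) for check in checks)


def attribute_matches(values: Iterable[float], condition: str) -> bool:
    """An attribute is selected only if every one of its values satisfies the condition."""
    predicate = parse_numeric_condition(condition)
    return all(predicate(v) for v in values)


print(attribute_matches([7, 8, 10], "> 6 && < 11"))  # True
print(attribute_matches([7, 12], "> 6 && < 11"))  # False
```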
- invert_selection
If this parameter is set to true, the selection is reversed. In that case, all attributes not matching the specified condition are selected as time series attributes. Special attributes are selected independently of the invert selection parameter as long as the include special attributes parameter is not set to true. If it is, the condition is also applied to the special attributes, and their selection is reversed if this parameter is checked.
- include_special_attributes
Special attributes are attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Also custom roles can be assigned to attributes. By default special attributes are selected as time series attributes irrespective of the filter conditions. If this parameter is set to true, special attributes are also tested against conditions specified and those attributes are selected that match the conditions.
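The interaction between invert selection and include special attributes can be summarized in a small hypothetical function. The attribute table is reduced to a mapping from attribute name to role (None for regular attributes). `matches` stands for whichever filter type is configured. Both are simplifications for illustration.

```python
from typing import Callable, Dict, List, Optional


def select_time_series_attributes(
    roles: Dict[str, Optional[str]],
    matches: Callable[[str], bool],
    invert_selection: bool = False,
    include_special_attributes: bool = False,
) -> List[str]:
    """Hypothetical sketch of the selection rules for regular and special attributes."""
    selected: List[str] = []
    for name, role in roles.items():
        if role is not None and not include_special_attributes:
            # By default, special attributes are selected regardless of the filter and its inversion.
            selected.append(name)
        elif matches(name) != invert_selection:
            # XOR: keep matches normally, keep non-matches when the selection is inverted.
            selected.append(name)
    return selected


def starts_with_t(name: str) -> bool:
    return name.startswith("t")


roles = {"id": "id", "temperature": None, "pressure": None}
print(select_time_series_attributes(roles, starts_with_t, invert_selection=True))
# ['id', 'pressure']
```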
- has_indices
This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.
- indices_attribute
If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- sort_time_series
If this parameter is selected, the input time series will be sorted according to the selected indices attribute before the time series operation is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.
Keep in mind that the indices values still need to be unique. If the values are not unique, a corresponding User Error is thrown.
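The sorting and uniqueness requirements can be expressed as a short check. `prepare_series` is a hypothetical helper that works on parallel lists of index values and time series values. It raises ValueError wherever the operator would throw a User Error.

```python
from typing import List, Sequence, Tuple, TypeVar

V = TypeVar("V")


def prepare_series(
    indices: Sequence[float], values: Sequence[V], sort_time_series: bool
) -> Tuple[List[float], List[V]]:
    """Hypothetical sketch of the sort_time_series behaviour and the uniqueness check."""
    pairs = list(zip(indices, values))
    if sort_time_series:
        pairs.sort(key=lambda pair: pair[0])
    elif any(a[0] > b[0] for a, b in zip(pairs, pairs[1:])):
        raise ValueError("the time series is not sorted by its indices attribute")
    # Indices must be unique in both cases; in sorted order, duplicates are neighbours.
    if any(a[0] == b[0] for a, b in zip(pairs, pairs[1:])):
        raise ValueError("the indices values are not unique")
    return [p[0] for p in pairs], [p[1] for p in pairs]


print(prepare_series([3, 1, 2], ["c", "a", "b"], sort_time_series=True))
# ([1, 2, 3], ['a', 'b', 'c'])
```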
- expert_settings
This parameter can be selected to show expert settings for a more detailed configuration of the operator. The expert settings are: windows defined, custom start point, custom end point, date format, no overlapping windows, horizon offset, empty window handling and add last index in window attribute.
- unit
The mode in which windows are defined. It defines the unit of the window parameters (window size, step size, horizon size and horizon offset).
- example based: The window parameters are specified in number of examples. This is the default option.
- time based: The window parameters are specified in time durations/periods (units ranging from milliseconds to years).
- custom: An additional ExampleSet has to be provided to the new "custom windows" input port. It holds the start (and optionally the stop) values of the windows.
- windows_defined
This parameter defines the point from which the windows are defined. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- from start: The first window will start at the first example of the input data set. The following windows are set up according to the window parameters.
- from end: The last window will end at the last example of the input data set. The previous windows are set up according to the window parameters.
- custom start: The first window will start at the custom start point provided by the parameter custom start point / custom start time. The following windows are set up according to the window parameters.
- custom end: The last window will end at the custom end point provided by the parameter custom end point / custom end time. The previous windows are set up according to the window parameters.
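The difference between defining windows from the start and from the end can be sketched for example based windowing. `window_starts` is a hypothetical helper. The sketch treats windows as half-open index ranges, keeps only complete windows, and treats the custom point of the "custom end" mode as the exclusive end index of the last window. These are assumptions of the sketch, not documented operator behavior.

```python
from typing import List


def window_starts(
    n_examples: int,
    window_size: int,
    step_size: int,
    windows_defined: str = "from start",
    custom_point: int = 0,
) -> List[int]:
    """Start indices of example based windows [start, start + window_size)."""
    if windows_defined in ("from start", "custom start"):
        first = 0 if windows_defined == "from start" else custom_point
        # Walk forward from the first window; trailing examples may stay unused.
        return list(range(first, n_examples - window_size + 1, step_size))
    if windows_defined in ("from end", "custom end"):
        end = n_examples if windows_defined == "from end" else custom_point
        if not window_size <= end <= n_examples:
            raise ValueError("the end point must leave room for one window inside the data")
        # Walk backward from the last window; leading examples may stay unused.
        return sorted(range(end - window_size, -1, -step_size))
    raise ValueError(f"unknown windows_defined mode: {windows_defined!r}")


print(window_starts(10, 3, 3))  # [0, 3, 6]  -> example 9 is not covered
print(window_starts(10, 3, 3, "from end"))  # [1, 4, 7]  -> example 0 is not covered
```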
- custom_start_point
If the parameter windows defined is set to custom start and the unit is set to example based, this parameter defines the custom point from which the windows start. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- custom_end_point
If the parameter windows defined is set to custom end and the unit is set to example based, this parameter defines the custom point where the windows end. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- custom_start_time
If the parameter windows defined is set to custom start and the unit is set to time based, this parameter defines the custom date time point from which the windows start.
The date time format used to interpret the string provided in this parameter is defined by the parameter date format. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- custom_end_time
If the parameter windows defined is set to custom end and the unit is set to time based, this parameter defines the custom date time point where the windows end.
The date time format used to interpret the string provided in this parameter is defined by the parameter date format. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- date_format
Date format used for the custom start time and custom end time parameters. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- window_size
The number of values in one window. The ExampleSet provided at the windowed example set port of the inner subprocess will have window size number of examples. The window size has to be smaller than or equal to the length of the time series.
- window_size_time
The time duration/period of one window. The ExampleSet provided at the windowed example set port of the inner subprocess will contain all examples which are in the corresponding window. The window size time has to be smaller than or equal to the time duration of the time series.
- no_overlapping_windows
If this parameter is set to true, the parameter step size is determined automatically so that windows and horizons don't overlap. The step size is set to window size + horizon size + horizon offset. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- step_size
The step size between the first values of two consecutive windows. For example, with a window size of 10 and a step size of 2, the first window has the values 0, ..., 9, the second window the values 2, ..., 11 and so on. If no overlapping windows is set to true, the step size is automatically determined from window size, horizon size and horizon offset.
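The step size example can be reproduced in a few lines. `effective_step_size` and `window_ranges` are hypothetical helpers. `window_ranges` lists inclusive (first, last) index pairs of complete windows and ignores horizons, which is a simplification.

```python
from typing import List, Tuple


def effective_step_size(
    step_size: int,
    window_size: int,
    horizon_size: int = 0,
    horizon_offset: int = 0,
    no_overlapping_windows: bool = False,
) -> int:
    """With no_overlapping_windows, the step is window size + horizon size + horizon offset."""
    if no_overlapping_windows:
        return window_size + horizon_size + horizon_offset
    return step_size


def window_ranges(n_examples: int, window_size: int, step: int) -> List[Tuple[int, int]]:
    """Inclusive (first, last) indices of every complete window (horizons are ignored here)."""
    return [(s, s + window_size - 1) for s in range(0, n_examples - window_size + 1, step)]


print(window_ranges(14, 10, effective_step_size(2, 10)))  # [(0, 9), (2, 11), (4, 13)]
print(effective_step_size(2, 10, horizon_size=1, no_overlapping_windows=True))  # 11
```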
- step_size_time
The step size (in units of time) between the start points of two consecutive windows. For example, with a window size of 1 week and a step size of 2 days, the first window contains the days 0, ..., 6, the second window the days 2, ..., 8 and so on. If no overlapping windows is set to true, the step size time is automatically determined from window size time, horizon size time and horizon offset time.
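A time based variant can be sketched with Python's datetime module. `time_windows` is hypothetical. It assumes sorted timestamps and half-open windows [start, start + window size) beginning at the first timestamp, and it stops once a window would end after the last timestamp. The operator's exact boundary handling may differ.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def time_windows(
    timestamps: Sequence[datetime], window_size: timedelta, step_size: timedelta
) -> List[Tuple[datetime, List[datetime]]]:
    """(window start, timestamps inside [start, start + window_size)) for each window."""
    if step_size <= timedelta(0):
        raise ValueError("step_size must be positive")
    windows: List[Tuple[datetime, List[datetime]]] = []
    if not timestamps:
        return windows
    start, last = timestamps[0], timestamps[-1]
    while start + window_size <= last:
        end = start + window_size
        windows.append((start, [t for t in timestamps if start <= t < end]))
        start += step_size
    return windows


# Ten daily values, 1 week windows, 2 day steps (as in the parameter description).
days = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(10)]
for start, members in time_windows(days, timedelta(weeks=1), timedelta(days=2)):
    print(start.date(), [d.day for d in members])
# 2024-01-01 [1, 2, 3, 4, 5, 6, 7]
# 2024-01-03 [3, 4, 5, 6, 7, 8, 9]
```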
- create_horizon_(labels)
If this parameter is set to true, one or more attributes are added to all ExampleSets which are provided at the output port of the inner subprocess. They contain the values of the horizon window, which is defined by the parameters horizon attribute, horizon size and horizon offset. Objects provided at the output ports that are not ExampleSets are not changed.
- horizon_attribute
If the parameter create horizon (labels) is set to true, this parameter defines the attribute holding the horizon. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- horizon_size
The number of values taken as the horizon. For each value in the horizon, an attribute is created, named <time series attribute name> + i (horizon), with i running from 1, ..., horizon size. If the horizon size is one, the role of the created attribute is set to label. If the size is larger (and thus more than one attribute is created), the roles of the attributes are set to Horizon + i, with i running from 1, ..., horizon size.
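The naming and role rules for horizon attributes translate directly into code. `horizon_attributes` is hypothetical. The spelling of names and roles is taken literally from the pattern in the text (<name> + i (horizon), Horizon + i) and may not match the operator's output character for character.

```python
from typing import List, Tuple


def horizon_attributes(ts_attribute: str, horizon_size: int) -> List[Tuple[str, str]]:
    """(attribute name, role) pairs for the horizon values of one time series attribute."""
    names = [f"{ts_attribute} + {i} (horizon)" for i in range(1, horizon_size + 1)]
    if horizon_size == 1:
        # A single horizon value becomes the label.
        return [(names[0], "label")]
    return [(name, f"Horizon + {i}") for i, name in enumerate(names, start=1)]


print(horizon_attributes("Level", 1))  # [('Level + 1 (horizon)', 'label')]
print(horizon_attributes("Level", 2))
# [('Level + 1 (horizon)', 'Horizon + 1'), ('Level + 2 (horizon)', 'Horizon + 2')]
```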
- horizon_size_time
The time duration/period taken as the horizon. An attribute per example in the horizon window is created and added to the result ExampleSets. Hence, the horizon window with the most examples (maximum number of horizon values) defines how many attributes are added. For windows with fewer examples, the remaining attributes are filled with missing values.
The names of the new attributes are <time series attribute name> + i (horizon), with i running from 1, ..., maximum number of horizon values. If the maximum number of horizon values is one, the role of the created attribute is set to label. If it is larger, the roles of the attributes are set to Horizon + i, with i running from 1, ..., maximum number of horizon values.
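Because time based horizons can hold different numbers of examples, the horizon values have to be padded. The following is a minimal sketch with a hypothetical `pad_horizons` helper, using NaN as a stand-in for RapidMiner's missing values.

```python
import math
from typing import List, Sequence


def pad_horizons(horizons: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pad every horizon to the length of the longest one, filling with NaN."""
    width = max((len(h) for h in horizons), default=0)
    return [list(h) + [math.nan] * (width - len(h)) for h in horizons]


print(pad_horizons([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]))
# [[1.0, 2.0, 3.0], [4.0, nan, nan], [5.0, 6.0, nan]]
```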
- horizon_offset
The offset between the windows and their corresponding horizons. If the offset is 0, the horizon is taken from the values directly following the window. Otherwise, the horizon starts horizon offset values after the end of the window. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
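For example based windowing, the horizon position follows from window start, window size and horizon offset. `horizon_indices` is a hypothetical helper that returns the example indices of the horizon for a window covering [window_start, window_start + window_size).

```python
from typing import List


def horizon_indices(
    window_start: int, window_size: int, horizon_size: int, horizon_offset: int = 0
) -> List[int]:
    """Indices of the horizon values that belong to one window."""
    first = window_start + window_size + horizon_offset  # offset 0: directly after the window
    return list(range(first, first + horizon_size))


print(horizon_indices(0, 10, 2))  # [10, 11]
print(horizon_indices(0, 10, 2, horizon_offset=3))  # [13, 14]
```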
- horizon_offset_time
The offset (in time units) between the windows and their corresponding horizons. If the offset is 0, the horizon starts directly after the window. Otherwise, the horizon starts the time duration provided by this parameter after the end of the window. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- windows_stop_definition
Defines whether the ends of the custom windows are defined by the start of the next window (windows span the whole index range) or by an additional attribute.
- from next window start: The ends of the windows are defined by the start of the next window (windows span the whole index range). Training windows end at the start of the next horizon window (or the next training window, if there are no horizon windows). Horizon windows end at the start of the next training window. Be aware that the last value of the start definition values (the last value of the horizon start attribute, or the last value of the window start attribute if there are no horizon windows) is only used as the end of the final window.
- from attribute: The ends of the windows are defined by additional attribute(s) in the custom window example set. The attribute names have to be provided by the parameters window stop attribute and horizon stop attribute.
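For the case without horizon windows, the "from next window start" rule can be sketched as follows. `windows_from_next_start` and `assign_to_windows` are hypothetical. The sketch assumes the start values are already in ascending order and that a boundary value belongs to the window it starts (half-open intervals). The variant with horizon windows is not covered.

```python
from typing import List, Sequence, Tuple


def windows_from_next_start(starts: Sequence[float]) -> List[Tuple[float, float]]:
    """Each window runs from its start value to the next one; n start values give n - 1 windows."""
    return list(zip(starts[:-1], starts[1:]))


def assign_to_windows(
    indices: Sequence[float], windows: Sequence[Tuple[float, float]]
) -> List[List[float]]:
    """Index values falling into each half-open window [start, stop)."""
    return [[i for i in indices if start <= i < stop] for start, stop in windows]


windows = windows_from_next_start([0, 5, 12, 20])
print(windows)  # [(0, 5), (5, 12), (12, 20)]
print([len(w) for w in assign_to_windows(range(20), windows)])  # [5, 7, 8]
```

The last start value (20) only closes the final window, which mirrors the note about the last start definition value.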
- window_start_attribute
This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the start values for the custom training windows.
The window start attribute, window stop attribute, horizon start attribute and horizon stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.
- window_stop_attribute
This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the end values for the custom training windows.
The window start attribute, window stop attribute, horizon start attribute and horizon stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.
- horizon_start_attribute
This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the start values for the custom horizon windows.
The window start attribute, window stop attribute, horizon start attribute and horizon stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.
- horizon_stop_attribute
This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the stop values for the custom horizon windows.
The window start attribute, window stop attribute, horizon start attribute and horizon stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.
- empty_window_handling
This parameter defines how empty windows (windows which do not contain an Example) are handled. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- add empty exampleset: Empty windows will be added as an empty ExampleSet, or a row with missing values.
- skip: Empty windows will be skipped completely in the processing. If horizon windows are created as well and either the training or the horizon window is empty, the processing for both windows is skipped.
- fail: A user error is thrown if an empty window occurs.
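The three empty window handling modes can be compared in a small hypothetical dispatcher. `handle_empty_windows` is illustrative only. Windows are plain lists, the inner subprocess is a function, and ValueError stands in for the user error.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

R = TypeVar("R")

MODES = ("add empty exampleset", "skip", "fail")


def handle_empty_windows(
    windows: Sequence[Sequence[float]],
    inner: Callable[[Sequence[float]], R],
    empty_window_handling: str = "add empty exampleset",
    horizons: Optional[Sequence[Sequence[float]]] = None,
) -> List[R]:
    """Run `inner` on each window, treating empty windows according to the chosen mode."""
    if empty_window_handling not in MODES:
        raise ValueError(f"unknown mode: {empty_window_handling!r}")
    results: List[R] = []
    for number, window in enumerate(windows):
        # With horizons, a window pair counts as empty if either part is empty.
        horizon_empty = horizons is not None and len(horizons[number]) == 0
        if len(window) == 0 or horizon_empty:
            if empty_window_handling == "skip":
                continue  # skip both the training and the horizon window
            if empty_window_handling == "fail":
                raise ValueError(f"window {number} is empty")
        # "add empty exampleset": the (possibly empty) window is still passed on.
        results.append(inner(window))
    return results


windows = [[1.0, 2.0], [], [3.0]]
print(handle_empty_windows(windows, len))  # [2, 0, 1]
print(handle_empty_windows(windows, len, "skip"))  # [2, 1]
```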
- add_last_index_in_window_attribute
If this parameter is set to true, an additional attribute is added to all ExampleSets which are provided at the output port of the inner subprocess. If the parameter has indices is true, the additional attribute is named Last <indices attribute> in window and contains the last index value in the corresponding window. If has indices is false, the additional attribute is called Window id and contains the number of the corresponding window (starting from 0). Objects provided at the output ports that are not ExampleSets are not changed. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
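The naming rule for the extra attribute can be sketched with result rows represented as dictionaries. `last_index_attribute` and `annotate` are hypothetical helpers. Passing `indices_attribute=None` corresponds to has indices being false. The index values in the usage line are made up.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def last_index_attribute(
    window_number: int, window_index_values: Sequence[Any], indices_attribute: Optional[str] = None
) -> Tuple[str, Any]:
    """(name, value) of the extra attribute for one window."""
    if indices_attribute is not None:
        return f"Last {indices_attribute} in window", window_index_values[-1]
    return "Window id", window_number  # window numbers start at 0


def annotate(
    rows: List[Dict[str, Any]],
    window_number: int,
    window_index_values: Sequence[Any],
    indices_attribute: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Copy of the result rows of one window with the extra attribute added to every row."""
    name, value = last_index_attribute(window_number, window_index_values, indices_attribute)
    return [{**row, name: value} for row in rows]


print(annotate([{"mean": 4.2}], 3, [10, 11, 12], "Date"))
# [{'mean': 4.2, 'Last Date in window': 12}]
```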
- enable_parallel_execution
This parameter enables the parallel execution of the subprocess. Please disable the parallel execution if you run into memory problems.
Tutorial Processes
Extracting aggregates of windows of the Lake Huron data set
In this tutorial process, the Process Windows operator is used to loop over windows of size 30 of the Lake Huron data set. For each window, the Extract Aggregates operator is used to calculate some features of the window. The calculated features are then provided to the output port of the inner subprocess. Because the parameters create horizon (labels) and add last index in window attribute are set to true, an attribute holding the horizon value (a single attribute, as the horizon size is 1) and an attribute holding the last Date (the indices attribute of the time series) in the window are added to the features ExampleSet.
The Append operator is used to append the features of all windows to one ExampleSet. This can be used to train a machine learning model.
Analyzing asynchronous log data with time based windowing
In this tutorial process, the Process Windows operator is used to create 1 min windows of asynchronous (fictitious) log data.
For each 1 min window, all log messages received are analyzed and the most frequent one is computed, to get some insight into the behavior of the underlying process.
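The core idea of this tutorial, finding the most frequent message per 1 min window, can be sketched outside RapidMiner. `most_frequent_per_window` and the log entries below are made-up illustrations, not the tutorial's data. In this sketch, windows are non-overlapping and aligned to the first log entry, and windows without messages do not appear in the result.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Sequence, Tuple


def most_frequent_per_window(
    logs: Sequence[Tuple[datetime, str]], width: timedelta = timedelta(minutes=1)
) -> Dict[datetime, str]:
    """Most frequent message in each non-overlapping window [origin + k*width, origin + (k+1)*width)."""
    if not logs:
        return {}
    origin = min(t for t, _ in logs)
    counts: Dict[datetime, Counter] = {}
    for t, message in logs:
        start = origin + ((t - origin) // width) * width  # timedelta // timedelta is an int
        counts.setdefault(start, Counter())[message] += 1
    return {start: c.most_common(1)[0][0] for start, c in sorted(counts.items())}


t0 = datetime(2024, 1, 1, 12, 0, 0)
logs = [
    (t0, "ok"), (t0 + timedelta(seconds=20), "warn"), (t0 + timedelta(seconds=40), "ok"),
    (t0 + timedelta(seconds=70), "error"), (t0 + timedelta(seconds=90), "error"),
    (t0 + timedelta(seconds=100), "ok"),
]
for start, message in most_frequent_per_window(logs).items():
    print(start.time(), message)
# 12:00:00 ok
# 12:01:00 error
```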
Use Extract Peaks and Custom Windowing to analyze industrial sensor data
In this tutorial process, an artificial sensor data set is created, with a few jumps added to the numerical values. These jumps are meant to indicate changes in the underlying industrial process.
The tutorial process demonstrates how to use the time series operators Differentiate and Extract Peaks to find these jumps and convert them into custom windows, and how to use Process Windows with custom windowing to analyze the windows between the jumps independently of each other.
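The general idea of turning jumps into custom windows can be sketched with a first difference and a simple threshold. This is a stand-in, not the Differentiate and Extract Peaks operators: `jump_positions` and `segments_between_jumps` are hypothetical, the threshold is an assumed tuning value, and the sample values are made up.

```python
from typing import List, Sequence, Tuple


def jump_positions(values: Sequence[float], threshold: float) -> List[int]:
    """Indices i whose first difference |values[i] - values[i - 1]| exceeds the threshold."""
    return [i for i in range(1, len(values)) if abs(values[i] - values[i - 1]) > threshold]


def segments_between_jumps(values: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Half-open (start, stop) index ranges between jumps; each jump starts a new window."""
    bounds = [0] + jump_positions(values, threshold) + [len(values)]
    return list(zip(bounds[:-1], bounds[1:]))


sensor = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 0.0, 0.1]
for start, stop in segments_between_jumps(sensor, threshold=2.0):
    segment = sensor[start:stop]
    print((start, stop), round(sum(segment) / len(segment), 2))
# (0, 3) 1.0
# (3, 6) 5.03
# (6, 8) 0.05
```

Each segment is then analyzed independently, which corresponds to running the inner subprocess once per custom window.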