You are viewing the RapidMiner Studio documentation for version 10.0.
Sliding Window Validation (Time Series)
Synopsis
This operator performs a sliding window validation for a machine learning model trained on time-dependent input data.
Description
The operator creates sliding windows from the input data. In each validation step the training window is provided at the inner training set port of the Training subprocess. The size of the training window is defined by the parameter training window size. The training window can be used to train a machine learning model which has to be provided to the model port of the Training subprocess.
The test window of the input data is provided at the inner test set port of the Testing subprocess. Its size is defined by the parameter test window size. The model trained in the Training subprocess is provided at the model port of the Testing subprocess, where it can be applied to the test set. The performance of this prediction can then be evaluated, and the resulting performance vector has to be provided to the performance port of the Testing subprocess. For the next validation fold, the training and test windows are shifted by the number of values defined by the parameter step size.
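The example-based windowing described above can be sketched in plain Python. This is not RapidMiner code or API: the function `sliding_windows` is hypothetical, and it assumes that the test window starts directly after the training window (no test window offset) and that windows which do not fit completely into the series are dropped.

```python
def sliding_windows(n_examples, train_size, test_size, step_size):
    """Return (training, test) index ranges for example-based windowing.

    Hypothetical sketch: the test window starts directly after the training
    window (no test window offset), and windows that would run past the end
    of the series are dropped.
    """
    if step_size < 1:
        raise ValueError("step size must be at least 1")
    windows = []
    start = 0
    while start + train_size + test_size <= n_examples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += step_size  # shift both windows by the step size
    return windows


for train, test in sliding_windows(n_examples=20, train_size=10, test_size=3, step_size=2):
    print(f"train {train.start}..{train.stop - 1}, test {test.start}..{test.stop - 1}")
```

For 20 examples this yields four folds (training on 0..9, 2..11, 4..13 and 6..15), and each test window starts right after its training window, so the model is never evaluated on data that precedes its training window.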
The behavior described above is the default, example based windowing. It can be changed to time based windowing or custom windowing with the unit parameter. For time based windowing, the window parameters are specified as time durations/periods. For custom windowing, an additional ExampleSet has to be provided at the "custom windows" input port. It holds the start values (and optionally the stop values) of the windows. For more details, see the unit parameter and the descriptions of the corresponding parameters.
Expert settings (for example no overlapping windows or empty window handling) can be enabled with the expert settings parameter.
The sliding window validation ensures that the machine learning model built in the Training subprocess is always evaluated on examples that come after the training window.
If the model output port of the Sliding Window Validation operator is connected, a final window with the same size as the training windows, but ending at the last example of the input series is used to train a final model. This final model is provided at the model output port.
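The placement of the final training window can be sketched as follows. The helper `final_training_window` is hypothetical, and the sketch assumes example-based windowing with examples indexed 0 to n-1.

```python
def final_training_window(n_examples, train_size):
    """Index range of the window used to train the final model.

    It has the same size as the training windows and ends at the last
    example of the series.
    """
    if not 0 < train_size <= n_examples:
        raise ValueError("training window size must be between 1 and the series length")
    return range(n_examples - train_size, n_examples)


final = final_training_window(n_examples=20, train_size=10)
print(final.start, final.stop - 1)  # 10 19
```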
This operator works on all time series (numerical, nominal and time series with date time values).
Input
- example set (Data Table)
This input port receives an ExampleSet to apply the sliding window validation.
- custom windows (Data Table)
The example set which contains the start (and stop) values of the custom windows. Only needs to be connected if the parameter unit is set to custom.
Output
- model (Model)
If the model output port of the Sliding Window Validation operator is connected, a final window with the same size as the training windows, but ending at the last example of the input series is used to train a final model. This final model is provided at the model output port.
- example set (Data Table)
The ExampleSet that was given as input is passed through without changes.
- test result set (Data Table)
All test set ExampleSets, appended to one ExampleSet.
- performance (Performance Vector)
This is an expandable port. You can connect any performance vector (the result of a Performance operator) to the performance port of the inner Testing subprocess. The performance output port delivers the average of the performances over all folds of the validation.
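Averaging the per-fold performances can be sketched like this. Each fold's performance vector is modeled as a plain dict (a hypothetical stand-in, not the RapidMiner PerformanceVector class), and a plain arithmetic mean with equal weight per fold is assumed.

```python
from statistics import mean


def average_performance(fold_results):
    """Average every performance criterion over all validation folds.

    fold_results holds one dict per fold, mapping a criterion name to its
    value (a hypothetical stand-in for the fold's performance vector).
    """
    criteria = fold_results[0].keys()
    return {name: mean(fold[name] for fold in fold_results) for name in criteria}


folds = [{"rmse": 2.0, "mae": 1.0}, {"rmse": 4.0, "mae": 3.0}, {"rmse": 3.0, "mae": 2.0}]
print(average_performance(folds))  # {'rmse': 3.0, 'mae': 2.0}
```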
Parameters
- has_indices
This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.
- indices_attribute
If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- sort_time_series
If this parameter is selected, the input time series is sorted according to the selected indices attribute before the time series operation is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.
Keep in mind that the index values still need to be unique. If the values are not unique, a corresponding User Error is thrown.
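The sorting and uniqueness rules can be sketched as a small validation step. The function `prepare_series` is hypothetical, the series is modeled as a list of (index value, value) tuples, and Python's `ValueError` stands in for RapidMiner's User Error.

```python
def prepare_series(rows, sort_time_series):
    """Validate (and optionally sort) a series of (index_value, value) rows.

    Index values must be unique in any case; unsorted input is only
    accepted when sorting is enabled. ValueError stands in for a User Error.
    """
    indices = [index for index, _ in rows]
    if len(set(indices)) != len(indices):
        raise ValueError("index values are not unique")
    if sort_time_series:
        return sorted(rows, key=lambda row: row[0])
    if indices != sorted(indices):
        raise ValueError("time series is not sorted by the indices attribute")
    return rows


print(prepare_series([(3, "c"), (1, "a"), (2, "b")], sort_time_series=True))
# [(1, 'a'), (2, 'b'), (3, 'c')]
```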
- expert_settings
This parameter can be selected to show expert settings for a more detailed configuration of the operator. The expert settings are: windows defined, custom start point, custom end point, date format, no overlapping windows, and empty window handling.
- unit
The mode in which the windows are defined. It defines the unit of the window parameters (training window size, step size, test window size and test window offset).
- example based: The window parameters are specified in number of examples. This is the default option.
- time based: The window parameters are specified as time durations/periods (units ranging from milliseconds to years).
- custom: An additional ExampleSet has to be provided at the "custom windows" input port. It holds the start values (and optionally the stop values) of the windows.
- windows_defined
This parameter defines the reference point from which the windows are set up. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- from start: The first window will start at the first example of the input data set. The following windows are set up according to the window parameters.
- from end: The last window will end at the last example of the input data set. The previous windows are set up according to the window parameters.
- custom start: The first window will start at the custom start point provided by the parameter custom start point / custom start time. The following windows are set up according to the window parameters.
- custom end: The last window will end at the custom end point provided by the parameter custom end point / custom end time. The previous windows are set up according to the window parameters.
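For example based windowing, the four options can be sketched as computing the start index of every window. The function `window_starts` is hypothetical; `window_span` is the combined length of training and test window (no test window offset assumed), and the custom end point is assumed to be an inclusive index.

```python
def window_starts(n_examples, window_span, step_size, windows_defined="from start", point=None):
    """Start indices of all windows for example-based windowing.

    window_span is the combined length of training and test window.
    For "custom start"/"custom end", point is the custom start/end point;
    the end point is assumed to be an inclusive index.
    """
    if step_size < 1:
        raise ValueError("step size must be positive")
    if windows_defined in ("from start", "custom start"):
        first = 0 if windows_defined == "from start" else point
        # step forward until a window would run past the last example
        return list(range(first, n_examples - window_span + 1, step_size))
    if windows_defined in ("from end", "custom end"):
        end = n_examples - 1 if windows_defined == "from end" else point
        # step backward from the last window, then restore chronological order
        return list(range(end - window_span + 1, -1, -step_size))[::-1]
    raise ValueError(f"unknown option: {windows_defined}")


print(window_starts(20, window_span=13, step_size=2))                              # [0, 2, 4, 6]
print(window_starts(20, window_span=13, step_size=2, windows_defined="from end"))  # [1, 3, 5, 7]
```

With "from end" the last window covers examples 7 to 19 and therefore ends exactly at the last example, whereas "from start" leaves the trailing examples unused.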
- custom_start_point
If the parameter windows defined is set to custom start and the unit is set to example based, this parameter defines the custom point from which the windows start. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- custom_end_point
If the parameter windows defined is set to custom end and the unit is set to example based, this parameter defines the custom point where the windows end. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- custom_start_time
If the parameter windows defined is set to custom start and the unit is set to time based, this parameter defines the custom date time point from which the windows start.
The date time format used to interpret the string provided in this parameter is defined by the parameter date format. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- custom_end_time
If the parameter windows defined is set to custom end and the unit is set to time based, this parameter defines the custom date time point where the windows end.
The date time format used to interpret the string provided in this parameter is defined by the parameter date format. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- date_format
Date format used for the custom start time and custom end time parameters. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- training_window_size
The number of values in the training window. The ExampleSet provided at the training set port of the Training subprocess will have training window size examples. The training window size has to be smaller than or equal to the length of the time series.
- training_window_size_time
The time duration/period of the training window.
The example set provided at the training set port of the Training subprocess will have all examples which are in the corresponding training window.
The training window size time has to be smaller than or equal to the time duration of the time series.
- no_overlapping_windows
If this parameter is set to true, the parameter step size is determined automatically so that the training and test windows do not overlap. The step size is set to training window size + test window size. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
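The automatic step size can be sketched for example based windowing. The function `non_overlapping_windows` is hypothetical and, as a simplifying assumption, ignores the test window offset.

```python
def non_overlapping_windows(n_examples, train_size, test_size):
    """(training, test) index ranges when no overlapping windows is enabled.

    The step size is not a free parameter here: it is set to
    training window size + test window size.
    """
    step_size = train_size + test_size
    windows = []
    for start in range(0, n_examples - step_size + 1, step_size):
        train = range(start, start + train_size)
        test = range(start + train_size, start + step_size)
        windows.append((train, test))
    return windows


for train, test in non_overlapping_windows(n_examples=20, train_size=6, test_size=2):
    print(list(train), list(test))
```

With 20 examples this gives two folds (training on 0..5 and 8..13, testing on 6..7 and 14..15); no example is used in more than one window, and examples 16 to 19 are not covered by a complete window.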
- step_size
The step size between the first values of two consecutive windows. E.g. with a training window size of 10 and a step size of 2, the first training window has the values from 0, ..., 9, the second training window the values from 2, ..., 11 and so on. If no overlapping windows is set to true the step size is automatically determined depending on training window size and test window size.
- step_size_time
The step size (in units of time) between the start points of two consecutive windows. E.g. with a training window size of 1 week and a step size of 2 days, the first training window has the days from 0, ..., 6, the second training window the days from 2, ..., 8 and so on. If no overlapping windows is set to true the step size time is automatically determined depending on training window size time, test window size time and test window offset time.
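The time based example above (a training window of 1 week and a step size of 2 days) can be reproduced with Python's `datetime`. The function `time_windows` is hypothetical; it assumes half-open [start, end) windows, a test window that starts directly after the training window, and fixed-length durations (calendar units such as months would need calendar-aware arithmetic).

```python
from datetime import date, timedelta


def time_windows(series_start, series_end, train_length, test_length, step):
    """Half-open [start, end) intervals of time-based training and test windows.

    series_end is exclusive. Fixed-length timedelta durations are assumed.
    """
    if step <= timedelta(0):
        raise ValueError("step size time must be positive")
    windows = []
    start = series_start
    while start + train_length + test_length <= series_end:
        train_end = start + train_length
        windows.append(((start, train_end), (train_end, train_end + test_length)))
        start += step
    return windows


days = [date(2023, 1, 1) + timedelta(days=i) for i in range(14)]
windows = time_windows(days[0], days[-1] + timedelta(days=1),
                       train_length=timedelta(weeks=1), test_length=timedelta(days=2),
                       step=timedelta(days=2))
for (train_start, train_end), _ in windows:
    # day numbers (relative to the first day) covered by the training window
    covered = [(d - days[0]).days for d in days if train_start <= d < train_end]
    print(covered)
```

The first training window covers days 0 to 6 and the second days 2 to 8, matching the description of the step size time parameter.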
- test_window_size
The number of values in the test window. The ExampleSet provided at the test set port of the Testing subprocess will have test window size examples. The test window size has to be smaller than or equal to the length of the time series.
- test_window_size_time
The time duration/period of the test window.
The ExampleSet provided at the test set port of the Testing subprocess will have the examples in the corresponding test windows. It will have an attribute holding the original time series values in the test window (attribute name is the name of the time series attribute parameter), and an attribute holding the values in the test window, forecasted by the forecast model from the Training subprocess (attribute name is forecast of <time series attribute>). In addition, the ExampleSet has an attribute with the forecast position, ranging from 1 to maximum number of test values. If the parameter has indices is set to true the ExampleSet has also an attribute holding the last index value of the training window.
- windows_stop_definition
Defines whether the ends of the custom windows are defined by the start of the next window (the windows span the whole index range) or by additional attributes.
- from next window start: The end of each window is defined by the start of the next window (the windows span the whole index range). Training windows end at the start of the next test window. Test windows end at the start of the next training window. Be aware that the last value of the start definition values (the last value of the test window start attribute) is only used as the end of the final window.
- from attribute: The ends of the windows are defined by additional attribute(s) in the custom window ExampleSet. The attribute names have to be provided by the parameters training window stop attribute and test window stop attribute.
- training_window_start_attribute
This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the start values for the custom training windows.
The training window start attribute, training window stop attribute, test window start attribute and test window stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.
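The data type rule for the custom window attributes can be sketched as a small check. The function `custom_windowing_mode`, the type name strings and the returned labels are all hypothetical illustrations, not RapidMiner identifiers.

```python
def custom_windowing_mode(window_attribute_types, indices_type):
    """Decide how custom windows are interpreted from the attribute types.

    window_attribute_types maps each configured start/stop attribute to its
    type name; the type names and return labels are illustrative strings.
    """
    types = set(window_attribute_types.values())
    if len(types) != 1:
        raise ValueError(f"window attributes have different types: {sorted(types)}")
    (attribute_type,) = types
    if attribute_type == "integer":
        return "example based"  # integer values are interpreted as example positions
    if attribute_type != indices_type:
        raise ValueError(f"window attributes must have the indices type {indices_type!r}")
    return "index based"  # values are interpreted on the indices attribute


print(custom_windowing_mode({"train_start": "date_time", "test_start": "date_time"},
                            indices_type="date_time"))  # index based
```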
- training_window_stop_attribute
This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the end values for the custom training windows.
The training window start attribute, training window stop attribute, test window start attribute and test window stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.
- test_window_start_attribute
This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the start values for the custom test windows.
The training window start attribute, training window stop attribute, test window start attribute and test window stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.
- test_window_stop_attribute
This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the stop values for the custom test windows.
The training window start attribute, training window stop attribute, test window start attribute and test window stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.
- empty_window_handling
This parameter defines how empty windows (windows which do not contain an Example) will be handled. It is an expert setting and hence it is only shown if the parameter expert settings is selected.
- add empty exampleset: Empty windows will be added as an empty ExampleSet, or a row with missing values.
- skip: Empty windows will be skipped completely in the processing. If either the training or the test window is empty, the processing for both windows is skipped.
- fail: A user error is thrown if an empty window occurs.
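The three options can be sketched as a filter over the (training, test) pairs of all folds. The function `handle_empty_windows` is hypothetical; windows are modeled as plain lists of examples, and `ValueError` stands in for the user error.

```python
def handle_empty_windows(window_pairs, empty_window_handling):
    """Apply an empty window handling mode to (training, test) example lists."""
    options = ("add empty exampleset", "skip", "fail")
    if empty_window_handling not in options:
        raise ValueError(f"unknown option: {empty_window_handling}")
    kept = []
    for train, test in window_pairs:
        if train and test:
            kept.append((train, test))
        elif empty_window_handling == "add empty exampleset":
            kept.append((train, test))  # keep the fold, including its empty window
        elif empty_window_handling == "skip":
            continue  # drop both windows of this fold
        else:
            raise ValueError("empty window encountered")  # option "fail"
    return kept


pairs = [(["a", "b"], ["c"]), ([], ["d"]), (["e"], ["f"])]
print(len(handle_empty_windows(pairs, "skip")))  # 2
```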
- enable_parallel_execution
This parameter enables the parallel execution of the inner processes. Please disable the parallel execution if you run into memory problems.
Tutorial Processes
Validate the performance of a GBT trained to forecast the gas price
In this process the Sliding Window Validation operator is used to validate the performance of a GBT trained to predict the price of gas 24 hours in the future.
See the comments in the process for details.
Use time based windowing to train and test on complete months of a daily Sales data set
In this tutorial process a fictive Sales data set with daily entries is created. The Sliding Window Validation operator with time based windowing is used to train a Multi-Horizon Forecast Model (with a linear regression as the inner model) on three months of the input data and to validate the model on the data of the following months.
Use custom windowing to define your own training and test windows
In this process an ExampleSet holding the fictive dates of fiscal quarters of a company is created. This ExampleSet is used as custom windows for the Sliding Window Validation operator to define custom training and test windows.
On a fictive Sales data set, the Sliding Window Validation operator trains a Multi-Horizon Forecast Model (linear regression as the inner model) on the custom training window and evaluates its performance on the following custom test window.