RapidMiner Studio documentation for version 9.7

Forecast Validation (Time Series)

Synopsis

This operator performs a sliding window validation for a machine learning model trained on time-dependent input data.

Description

The operator creates sliding windows from the input data. In each validation step the training window is provided at the inner training set port of the Training subprocess. The size of the training window is defined by the parameter training window size. The first training window starts with the first Example of the input data. The training window can be used to train a machine learning model which has to be provided to the model port of the Training subprocess.

The test window of the input data is provided at the inner test set port of the Testing subprocess. Its size is defined by the parameter test window size. The test window always starts with the first Example after the corresponding training window. The model trained in the Training subprocess is provided at the model port of the Testing subprocess, where it can be applied to the test set. The performance of this prediction can be evaluated, and the resulting performance vector has to be provided to the performance port of the Testing subprocess.

For the next validation fold, the training and the test windows are shifted by the number of values defined by the parameter step size. If the parameter no overlapping windows is set to true, the step size is set automatically so that neither the training windows nor the test windows of different folds overlap (step size = training window size + test window size).

The sliding window validation ensures that the machine learning model built in the Training subprocess is always evaluated on Examples that come after the training window.
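
The window logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the operator's implementation: the function name sliding_windows and its arguments are hypothetical and merely mirror the parameter names, Examples are represented by zero-based positions, and the sketch assumes that a fold is only produced when both windows fit completely into the data (the description does not state how a trailing partial window is handled).

```python
from typing import Iterator, Tuple


def sliding_windows(
    n_examples: int,
    training_window_size: int,
    test_window_size: int,
    step_size: int = 1,
    no_overlapping_windows: bool = False,
) -> Iterator[Tuple[range, range]]:
    """Yield (training positions, test positions) for each validation fold."""
    if no_overlapping_windows:
        # From the description: step size = training window size + test window size.
        step_size = training_window_size + test_window_size
    if step_size < 1:
        raise ValueError("step_size must be at least 1")

    start = 0  # the first training window starts with the first Example
    # Assumption: only folds whose windows fit completely are produced.
    while start + training_window_size + test_window_size <= n_examples:
        train_end = start + training_window_size
        # The test window starts directly after the training window.
        yield range(start, train_end), range(train_end, train_end + test_window_size)
        start += step_size


folds = list(sliding_windows(20, training_window_size=10, test_window_size=2, step_size=2))
for train, test in folds:
    print(f"train {train.start}..{train.stop - 1}, test {test.start}..{test.stop - 1}")
```

With 20 Examples, a training window size of 10, a test window size of 2 and a step size of 2, this sketch yields five folds. The first fold trains on positions 0 to 9 and tests on positions 10 and 11, so every test position lies after its training window.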

If the model output port of the Sliding Window Validation operator is connected, a final execution of the Training subprocess is performed with all input Examples. The machine learning model built in this iteration is provided at the model output port.

This operator works on all time series (numerical, nominal and time series with date time values).

Input

  • example set (IOObject)

    This input port receives the ExampleSet on which the sliding window validation is performed.

Output

  • model (Model)

    This port delivers the prediction model trained on the whole ExampleSet. Connect this port only if you really need the model: if it is not connected, the final training run is skipped, which improves runtime.

  • example set (IOObject)

    The ExampleSet that was given as input is passed through without changes.

  • test result set (IOObject)

    All test set ExampleSets, appended to one ExampleSet.

  • performance (Performance Vector)

    This is an expandable port. You can connect any performance vector (result of a Performance operator) to the result port of the inner Testing subprocess. The performance output port delivers the average of the performances over all folds of the validation.
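
How the four outputs relate to the folds can be illustrated with a small Python sketch. It is a sketch under stated assumptions rather than the operator's actual behavior: validate, train_mean and evaluate_mae are hypothetical names, the "model" is simply the mean of the training window, the performance is the mean absolute error, folds are run sequentially, and only complete windows are used.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def train_mean(window: Sequence[float]) -> float:
    """Stand-in for the Training subprocess: the model is the window mean."""
    return mean(window)


def evaluate_mae(model: float, window: Sequence[float]) -> Tuple[float, List[float]]:
    """Stand-in for the Testing subprocess: apply the model, return (MAE, predictions)."""
    predictions = [model for _ in window]
    return mean(abs(y - p) for y, p in zip(window, predictions)), predictions


def validate(values, training_window_size, test_window_size, step_size, need_model=False):
    performances: List[float] = []
    test_results: List[Tuple[float, float]] = []  # (actual, predicted), appended over all folds
    start = 0
    while start + training_window_size + test_window_size <= len(values):
        split = start + training_window_size
        model = train_mean(values[start:split])
        performance, predictions = evaluate_mae(model, values[split:split + test_window_size])
        performances.append(performance)
        test_results.extend(zip(values[split:split + test_window_size], predictions))
        start += step_size
    if not performances:
        raise ValueError("the windows do not fit into the input data")
    # The final model is only trained when it is actually needed (model port connected).
    final_model: Optional[float] = train_mean(values) if need_model else None
    # Outputs: model, unchanged input, appended test results, average performance.
    return final_model, values, test_results, mean(performances)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
model, passthrough, test_results, avg_performance = validate(
    series, training_window_size=4, test_window_size=2, step_size=2, need_model=True
)
print(model, avg_performance, test_results)
```

In this toy run there are two folds, each with a mean absolute error of 3.0, so the averaged performance is 3.0, and the final model trained on all eight values is their mean, 4.5.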

Parameters

  • has_indices

    This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.

  • indices_attribute

    If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be an attribute of value type date, date_time or numeric. The attribute name can be selected from the drop-down box of the parameter if the metadata is known.

  • training_window_size

    The number of values in the training window. The ExampleSet provided at the training set port of the Training subprocess will have training window size number of examples. The training window size has to be smaller than or equal to the length of the time series.

  • no_overlapping_windows

    If this parameter is set to true, the parameter step size is determined automatically so that no training or test windows overlap. The step size is set to training window size + test window size.

  • step_size

    The step size between the first values of two consecutive windows. For example, with a training window size of 10 and a step size of 2, the first training window contains the values 0, ..., 9, the second training window the values 2, ..., 11, and so on. If no overlapping windows is set to true, the step size is determined automatically from the training window size and the test window size.

  • test_window_size

    The number of values in the test window. The ExampleSet provided at the test set port of the Testing subprocess will have test window size number of examples. The test window size has to be smaller than or equal to the length of the time series.

  • enable_parallel_execution

    This parameter enables the parallel execution of the inner processes. Please disable the parallel execution if you run into memory problems.

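
The interplay of the window size parameters and the step size can be checked with a short Python sketch. The helper names effective_step_size and training_window_starts are hypothetical and not part of RapidMiner. The sketch also assumes that a window pair is only counted when it fits completely into the time series, which the parameter descriptions do not state.

```python
from typing import List


def effective_step_size(
    training_window_size: int,
    test_window_size: int,
    step_size: int,
    no_overlapping_windows: bool,
) -> int:
    """Return the step size that is actually used between consecutive windows."""
    if no_overlapping_windows:
        # The step size is determined automatically in this case.
        return training_window_size + test_window_size
    return step_size


def training_window_starts(
    n_examples: int,
    training_window_size: int,
    test_window_size: int,
    step_size: int,
    no_overlapping_windows: bool = False,
) -> List[int]:
    """Return the zero-based start position of every training window."""
    # Both window sizes have to be smaller than or equal to the series length.
    if training_window_size > n_examples or test_window_size > n_examples:
        raise ValueError("window sizes must not exceed the length of the time series")
    step = effective_step_size(
        training_window_size, test_window_size, step_size, no_overlapping_windows
    )
    if step < 1:
        raise ValueError("step size must be at least 1")
    # Assumption: the last fold is the last one whose test window still fits.
    last_start = n_examples - training_window_size - test_window_size
    return list(range(0, last_start + 1, step))


starts = training_window_starts(16, training_window_size=10, test_window_size=2, step_size=2)
print(starts)
print(training_window_starts(16, 10, 2, 2, no_overlapping_windows=True))
```

For a series of 16 values this gives training windows starting at positions 0, 2 and 4, matching the step size example above (values 0 to 9, then 2 to 11). With no overlapping windows enabled, the step size becomes 12 and only the window starting at position 0 remains.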

Tutorial Processes

Validate the performance of a GBT trained to forecast the gas price

In this process the Sliding Window Validation operator is used to validate the performance of a GBT trained to predict the price of gas 24 hours in the future.

See the comments in the process for details.