
Forecast Validation (Time Series)

Synopsis

This operator performs a validation of a forecast model, which predicts the future values of a time series.

Description

The operator creates sliding windows from the input time series, which is specified by the time series attribute parameter. In each validation step, the training window is provided at the inner training set port of the Training subprocess. Its size is defined by the parameter window size. The training window can be used to train a forecast model (e.g. an ARIMA model, built by the ARIMA operator), which has to be provided to the model port of the Training subprocess.

The inner test set port of the Testing subprocess contains the values of the test window. Its size is defined by the parameter horizon size. The forecast model of the Training subprocess is used to predict these values. In contrast to the Cross Validation operator, the number of values that have to be forecasted by the forecast model is always equal to the horizon size. Thus, the forecasted values are already added to the ExampleSet provided at the test set port; an additional Apply Forecast operator is not necessary. The attribute holding the test window values has the label role, while the attribute holding the forecasted values has the prediction role. Thus, a Performance operator (e.g. Performance (Regression)) can be used to calculate the performance of the forecast.

For the next validation fold, the training and the test windows are shifted by a number of values defined by the parameter step size. If the parameter no overlapping windows is set to true, the step size is set to a value so that neither the training windows nor the test windows overlap (step size = window size + horizon size).
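The windowing logic can be illustrated with a short, stand-alone sketch. The following Python snippet is not part of RapidMiner and does not use its API; it only mimics how the fold boundaries could be derived from the window size, horizon size and step size parameters, including the no overlapping windows rule:

    def forecast_validation_folds(series_length, window_size, horizon_size,
                                  step_size, no_overlapping_windows=False):
        """Yield (training indices, test indices) for each validation fold."""
        if no_overlapping_windows:
            # step size is chosen so that consecutive training and test windows
            # do not overlap: step size = window size + horizon size
            step_size = window_size + horizon_size
        start = 0
        while start + window_size + horizon_size <= series_length:
            train = list(range(start, start + window_size))
            test = list(range(start + window_size, start + window_size + horizon_size))
            yield train, test
            start += step_size

    # Example: series of length 30, window size 10, horizon size 5, step size 5
    for train_idx, test_idx in forecast_validation_folds(30, 10, 5, 5):
        print(train_idx[0], "-", train_idx[-1], "->", test_idx[0], "-", test_idx[-1])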

The Forecast Validation operator delivers the forecast model of the last fold, which was trained on the last training window in the time series. It also delivers all test set ExampleSets, appended into one ExampleSet, and the averaged Performance Vector.
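As a purely illustrative, hypothetical sketch of how these outputs could be assembled (the column names follow the naming described under the horizon size parameter, and a naive "repeat the last training value" forecast stands in for a real forecast model), the appended test result set and the averaged performance might look like this in Python with pandas:

    import pandas as pd

    series = [float(i) for i in range(30)]            # toy time series
    window_size, horizon_size, step_size = 10, 5, 5

    fold_frames, fold_rmse = [], []
    start = 0
    while start + window_size + horizon_size <= len(series):
        train = series[start:start + window_size]
        test = series[start + window_size:start + window_size + horizon_size]
        forecast = [train[-1]] * horizon_size         # naive stand-in for a trained model
        fold_frames.append(pd.DataFrame({
            "forecast position": range(1, horizon_size + 1),
            "value": test,                            # label role
            "forecast of value": forecast,            # prediction role
        }))
        rmse = (sum((t - f) ** 2 for t, f in zip(test, forecast)) / horizon_size) ** 0.5
        fold_rmse.append(rmse)
        start += step_size

    test_result_set = pd.concat(fold_frames, ignore_index=True)   # "test result set" port
    average_rmse = sum(fold_rmse) / len(fold_rmse)                 # averaged performance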

This operator works only on numerical time series.

Input

  • example set (Data Table)

    The ExampleSet which contains the time series data as an attribute.

Output

  • model (Model)

    The forecast model of the last fold, which was trained on the last training window in the time series.

  • example set (Data Table)

    The ExampleSet that was given as input is passed through without changes.

  • test result set (Data Table)

    All test set ExampleSets, appended to one ExampleSet.

  • performance (Performance Vector)

This is an expandable port. You can connect any performance vector (result of a Performance operator) to the result port of the inner Testing subprocess. The performance output port delivers the average of the performances over all folds of the validation.

Parameters

  • time_series_attribute

The time series attribute (numerical) holding the time series values for which the forecast model shall be built. The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

    Range:
  • has_indices

    This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.

    Range:
  • indices_attribute

    If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

    Range:
  • window_size

The number of values in the training window. The ExampleSet provided at the training set port of the Training subprocess will have window size number of examples. The window size has to be smaller than or equal to the length of the time series.

    Range:
  • no_overlapping_windows

If this parameter is set to true, the parameter step size is determined automatically so that neither the training windows nor the test windows overlap. The step size is set to window size + horizon size.

    Range:
  • step_size

The step size between the first values of two consecutive windows. E.g. with a window size of 10 and a step size of 2, the first window has the values from 0, ..., 9, the second window the values from 2, ..., 11 and so on. If no overlapping windows is set to true, the step size is automatically determined from the window size and the horizon size.

    Range:
  • horizon_size

The number of values in the test window. The ExampleSet provided at the test set port of the Testing subprocess will have horizon size number of examples. It will have an attribute holding the original time series values in the test window (the attribute name is the name of the time series attribute parameter), and an attribute holding the values in the test window forecasted by the forecast model from the Training subprocess (the attribute name is forecast of <time series attribute>). In addition, the ExampleSet has an attribute with the forecast position, ranging from 1 to horizon size. If the parameter has indices is set to true, the ExampleSet also has an attribute holding the last index value of the training window.

    Range:
  • enable_parallel_execution

    This parameter enables the parallel execution of the inner processes. Please disable the parallel execution if you run into memory problems.

    Range:

Tutorial Processes

Validate the performance of an ARIMA model for Lake Huron

In this process the Forecast Validation operator is used to validate the performance of an ARIMA model for the Lake Huron data set. The ARIMA model is trained on a training window with a size of 20. This model is used to forecast the next 5 (horizon size) values of the time series. The forecasted values are compared to the original ones to calculate the performance of the forecast model.

The step size is set to 5, so the training and test windows are shifted by 5 in each validation fold.
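The same idea can be reproduced outside of RapidMiner. The following Python sketch is only an approximation of the tutorial process: it assumes the statsmodels package is available, uses a synthetic random-walk series as a stand-in for the Lake Huron data, and picks an arbitrary ARIMA order. It trains on windows of 20 values, forecasts 5 values ahead, shifts by 5, and averages the root mean squared error over all folds:

    # Minimal, hypothetical re-implementation of the tutorial process in Python.
    # Requires: pip install statsmodels numpy
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(size=98)) + 580.0   # stand-in for the lake level values

    window_size, horizon_size, step_size = 20, 5, 5
    rmse_per_fold = []

    start = 0
    while start + window_size + horizon_size <= len(series):
        train = series[start:start + window_size]
        test = series[start + window_size:start + window_size + horizon_size]

        # order chosen for illustration only; short windows may trigger
        # convergence warnings
        fit = ARIMA(train, order=(1, 0, 1)).fit()
        forecast = fit.forecast(steps=horizon_size)

        rmse_per_fold.append(np.sqrt(np.mean((test - forecast) ** 2)))
        start += step_size

    print("RMSE per fold:", [round(r, 3) for r in rmse_per_fold])
    print("Averaged RMSE:", round(float(np.mean(rmse_per_fold)), 3))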