Z-Score Peak Transformation (Time Series)
Synopsis
This operator performs a Z-Score Peak Transformation for one or more time series attributes.
A peak transformation detects peaks in the time series and outputs an indicator series (and optionally a peaked series) as the result. The meaning of the indicator series and the actual peak detection algorithm are described below.
The maximum number n of peaks to be extracted is defined by the parameter number of peaks; the type of peaks to be detected is defined by the parameter peak types.
The indicator time series consists of the following flag values:
- (0) no peak,
- (1) maximum,
- (-1) minimum
The operator provides the original time series, the indicator time series and (if the parameter add peaked series is selected) the peaked time series at the peak transformed example set output port. The peaked time series has all values set to missing where there is no peak (that is, where the indicator series is 0).
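The relationship between the indicator series and the peaked series can be illustrated with a minimal Python sketch. The function name `peaked_series` is hypothetical, and NaN is used here as a stand-in for a missing value; the operator itself works on ExampleSets, not Python lists.

```python
import math

def peaked_series(values, indicator):
    """Keep a value where the indicator is non-zero, otherwise mark it as missing (NaN)."""
    return [v if flag != 0 else math.nan for v, flag in zip(values, indicator)]

values = [1.0, 5.0, 1.0, -3.0]
indicator = [0, 1, 0, -1]
peaked = peaked_series(values, indicator)
```

Only the second and fourth values survive in `peaked`; the other positions are NaN.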
The Z-Score peak detection algorithm checks for each data point whether it deviates from a moving average by more than a given threshold and, if so, flags it as a peak. The size of the moving window is defined by the parameter lag. A point above the average is flagged as a positive peak (1); a point below it is flagged as a negative peak (-1). By default the average is the mean of the data in the window and the deviation is measured in units of its standard deviation (this is called the z-score). Alternatively, the more robust measures median and interquartile range (IQR) can be used. In addition, an influence factor determines how strongly previous peaks influence the z-score. This algorithm was originally proposed by Jean-Paul van Brakel (https://stackoverflow.com/a/22640362/4940080). A heuristic (see parameter use heuristics) can be used to determine values for the parameters.
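The following Python sketch illustrates the algorithm as described in the Stack Overflow answer cited above. It is not the operator's actual implementation: the function name `zscore_peaks` is hypothetical, the population standard deviation is an assumption, and the robust variant and the handling of invalid values are omitted.

```python
from statistics import mean, pstdev

def zscore_peaks(y, lag, threshold, influence):
    """Return one flag per value in y: 1 (maximum), -1 (minimum) or 0 (no peak)."""
    signals = [0] * len(y)
    # Influence-adjusted copy of the series; the moving statistics are computed on it.
    filtered = list(y[:lag])
    avg, std = mean(filtered), pstdev(filtered)
    for i in range(lag, len(y)):  # the first `lag` points cannot be scored
        if abs(y[i] - avg) > threshold * std:
            signals[i] = 1 if y[i] > avg else -1
            # A peak enters the moving window only with weight `influence`.
            filtered.append(influence * y[i] + (1 - influence) * filtered[i - 1])
        else:
            filtered.append(y[i])
        window = filtered[i - lag + 1:i + 1]
        avg, std = mean(window), pstdev(window)
    return signals

series = [1.0, 1.2, 0.8, 1.0, 1.2, 0.8, 1.0, 6.0, 1.0, 1.2, 0.8, -4.0, 1.0]
flags = zscore_peaks(series, lag=5, threshold=3.5, influence=0.0)
```

With an influence of 0, a flagged value is replaced in the window by the previous filtered value, so the spike at 6.0 does not inflate the statistics used to score the later dip at -4.0.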
If a peak is detected, the high-low amplitude of the peak is calculated. To do so, the minimum and maximum values in the whole peak area (plus 1 slice to the left and right of the peak area) are determined. The high-low amplitude is the difference between this maximum and minimum. The operator only returns the n peaks with the largest high-low amplitude.
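The amplitude ranking can be sketched in Python as follows. This is an illustration under assumptions, not the operator's code: a peak area is taken to be a contiguous run of equal non-zero flags, "1 slice" is interpreted as one data point on each side, and all function names are hypothetical.

```python
def peak_areas(indicator):
    """Return (start, end, flag) for each contiguous run of equal non-zero flags (end inclusive)."""
    areas, start = [], None
    for i, flag in enumerate(indicator):
        if start is not None and flag != indicator[start]:
            areas.append((start, i - 1, indicator[start]))
            start = None
        if start is None and flag != 0:
            start = i
    if start is not None:
        areas.append((start, len(indicator) - 1, indicator[start]))
    return areas

def high_low_amplitude(values, start, end):
    """Max minus min over the peak area, extended by one point on each side."""
    window = values[max(start - 1, 0):min(end + 1, len(values) - 1) + 1]
    return max(window) - min(window)

def keep_largest(values, indicator, n):
    """Keep only the n peak areas with the largest high-low amplitude."""
    ranked = sorted(peak_areas(indicator),
                    key=lambda a: high_low_amplitude(values, a[0], a[1]),
                    reverse=True)
    result = [0] * len(indicator)
    for start, end, flag in ranked[:n]:
        for i in range(start, end + 1):
            result[i] = flag
    return result

values = [1.0, 1.0, 6.0, 1.0, 1.0, 3.0, 1.0, -5.0, 1.0]
indicator = [0, 0, 1, 0, 0, 1, 0, -1, 0]
kept = keep_largest(values, indicator, n=2)
```

In this example the three peaks have amplitudes 5, 2 and 6, so the small peak at 3.0 is dropped when n is 2.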
This operator works only on numerical time series.
- example set (Data Table)
The ExampleSet which contains the time series data as attributes.
- peak transformed example set (Data Table)
The ExampleSet containing the results of the peak transformation. It contains the original time series, the peak indicator time series (peak flag values (-1,0,+1)) for the selected attributes and optionally the peaked time series.
- original (Data Table)
The ExampleSet that was given as input is passed through without changes.
This parameter selects the filter type, that is, the method used to select the attributes which hold the time series values. Only numeric attributes can be selected as time series attributes. The different filter types are:
- all: This option selects all attributes of the ExampleSet to be time series attributes. This is the default option.
- single: This option allows the selection of a single time series attribute. The required attribute is selected by the attribute parameter.
- subset: This option allows the selection of multiple time series attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known all attributes are present in the list and the required ones can easily be selected.
- regular_expression: This option allows you to specify a regular expression for the time series attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except expression.
- value_type: This option allows selection of all the attributes of a particular type to be time series attributes. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception, except value type.
- block_type: This option allows the selection of all the attributes of a particular block type to be time series attributes. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception, except block type.
- no_missing_values: This option selects all attributes of the ExampleSet as time series attributes which do not contain a missing value in any example. Attributes that have even a single missing value are not selected.
- numeric_value_filter: All numeric attributes whose examples all match a given numeric condition are selected as time series attributes. The condition is specified by the numeric condition parameter.
The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. They can be shifted to the right list, which is the list of selected time series attributes.
Attributes whose names match this expression will be selected. The expression can be specified through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.
If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.
This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (expression that was specified in regular expression parameter).
This option allows you to select a type of attribute. One of the following types can be chosen: numeric, integer, real.
If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.
The attributes matching this type will be removed from the final output even if they matched the previously selected type, specified by the value type parameter. One of the following types can be selected here: numeric, integer, real.
This option allows you to select a block type of attribute. One of the following types can be chosen: value_series, value_series_start, value_series_end.
If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.
The attributes matching this block type will be removed from the final output even if they matched the previously selected block type, specified by the block type parameter. One of the following block types can be selected here: value_series, value_series_start, value_series_end.
The numeric condition used by the numeric condition filter type. A numeric attribute is selected if all examples match the specified condition for this attribute. For example, the numeric condition '> 6' will keep all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition: conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||.
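The semantics of such a condition can be illustrated with a small Python sketch. The parser below is hypothetical and only covers the comparison operators >, >=, < and <= (the set of operators the operator actually supports is an assumption); it is not the parser used by the operator.

```python
import operator
import re

_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def parse_condition(text):
    """Turn a condition such as '> 6 && < 11' into a predicate on a single number."""
    if "&&" in text and "||" in text:
        raise ValueError("&& and || cannot be used together in one condition")
    joiner, combine = ("||", any) if "||" in text else ("&&", all)
    tests = []
    for part in text.split(joiner):
        match = re.fullmatch(r"\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*", part)
        if match is None:
            raise ValueError(f"cannot parse condition part: {part!r}")
        tests.append((_OPS[match.group(1)], float(match.group(2))))
    return lambda x: combine(op(x, bound) for op, bound in tests)

def attribute_matches(values, condition):
    """An attribute is selected only if every one of its values satisfies the condition."""
    check = parse_condition(condition)
    return all(check(v) for v in values)

selected = attribute_matches([7, 8, 10], "> 6 && < 11")
rejected = attribute_matches([7, 12], "> 6 && < 11")
```

Here `selected` is true because all three values lie between 6 and 11, while `rejected` is false because 12 violates the upper bound.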
If this parameter is set to true, the selection is reversed. In that case all attributes not matching the specified condition are selected as time series attributes. Special attributes are not selected, regardless of the invert selection parameter, as long as the include special attributes parameter is not set to true. If it is set to true, the condition is also applied to the special attributes and the selection is reversed if this parameter is checked.
Special attributes are attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Also custom roles can be assigned to attributes. By default special attributes are not selected as time series attributes irrespective of the filter conditions. If this parameter is set to true, special attributes are also tested against conditions specified and those attributes are selected that match the conditions.
This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.
If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
If this parameter is selected, the input time series will be sorted according to the selected indices attribute before the time series operation is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.
Keep in mind that the index values still need to be unique. If the values are non-unique, a corresponding User Error is thrown.
The data set provided at the original output port will be the sorted input time series.
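The checks described above can be sketched in Python as follows. The function name `prepare_series` and its argument `sort_first` are hypothetical, and a `ValueError` stands in for the operator's User Error.

```python
def prepare_series(indices, values, sort_first):
    """Validate the index values and optionally sort the series by them."""
    if len(set(indices)) != len(indices):
        raise ValueError("index values must be unique")
    pairs = list(zip(indices, values))
    if sort_first:
        pairs.sort(key=lambda pair: pair[0])
    elif any(a[0] > b[0] for a, b in zip(pairs, pairs[1:])):
        raise ValueError("time series is not sorted by its index")
    return [i for i, _ in pairs], [v for _, v in pairs]

idx, vals = prepare_series([3, 1, 2], [30.0, 10.0, 20.0], sort_first=True)
```

Calling the same function with `sort_first=False` on this unsorted input raises an error instead of silently reordering the data.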
Maximum number of peaks to be detected. If the Z-Score peak detection algorithm detects more peaks, only the largest (in terms of high-low amplitude of the peaks) are kept. Be aware that this maximum applies either to each peak type separately or to both types combined (see parameter peak types).
This parameter defines the types (maximum/minimum) to be detected by the peak detection algorithm. n is the value of the number of peaks parameter.
- only maxima: Only maximum peaks are detected. (maximal number of peaks is n)
- only minima: Only minimum peaks are detected. (maximal number of peaks is n)
- maxima and minima separately: Both maximum and minimum peaks are detected. The number of peaks is counted for each type separately (so that the maximal number of peaks is 2n)
- maxima and minima combined: Both maximum and minimum peaks are detected. The number of peaks is counted for both types combined (so that the maximal number of peaks is n)
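How the four options interact with the limit n can be summarized in a short Python sketch. The representation of a peak as a (flag, amplitude) pair and the function name `select_peaks` are assumptions made for illustration only.

```python
def select_peaks(peaks, n, peak_types):
    """Select peaks given as (flag, amplitude) pairs according to the peak types option."""
    def largest(candidates):
        return sorted(candidates, key=lambda p: p[1], reverse=True)[:n]

    maxima = [p for p in peaks if p[0] == 1]
    minima = [p for p in peaks if p[0] == -1]
    if peak_types == "only maxima":
        return largest(maxima)
    if peak_types == "only minima":
        return largest(minima)
    if peak_types == "maxima and minima separately":
        return largest(maxima) + largest(minima)  # up to 2n peaks
    if peak_types == "maxima and minima combined":
        return largest(maxima + minima)           # up to n peaks
    raise ValueError(f"unknown peak types option: {peak_types!r}")

peaks = [(1, 5.0), (1, 2.0), (-1, 6.0), (-1, 1.0), (1, 3.0)]
separate = select_peaks(peaks, 2, "maxima and minima separately")
combined = select_peaks(peaks, 2, "maxima and minima combined")
```

With n set to 2, the separate option returns four peaks (two of each type), whereas the combined option returns only the two largest peaks overall.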
If selected, the parameters lag, threshold, influence and robust measures are determined by a heuristic:
- influence is set to 0.0 and robust measures is set to false.
- lag is set to sqrt(<length of time series>).
- threshold is set to the average of (percentile(90) - mean) / (2 x std) (only maxima), (mean - percentile(10)) / (2 x std) (only minima) or (percentile(90) - percentile(10)) / (2 x std) (both peak types) over all selected time series.
Be aware that this is only a rough heuristic; for optimized results the parameters have to be adapted to your data.
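The heuristic can be written down as a short Python sketch. Several details are assumptions, since the description does not specify them: the lag is rounded to the nearest integer, percentiles are interpolated with the inclusive method of `statistics.quantiles`, and the population standard deviation is used. The function name `heuristic_parameters` is hypothetical.

```python
import math
from statistics import mean, pstdev, quantiles

def heuristic_parameters(series_list, peak_types="both"):
    """Derive lag, threshold, influence and robust measures from the selected series."""
    lag = round(math.sqrt(len(series_list[0])))
    ratios = []
    for series in series_list:
        deciles = quantiles(series, n=10, method="inclusive")
        p10, p90 = deciles[0], deciles[-1]
        avg, std = mean(series), pstdev(series)
        if peak_types == "only maxima":
            spread = p90 - avg
        elif peak_types == "only minima":
            spread = avg - p10
        else:  # both peak types
            spread = p90 - p10
        ratios.append(spread / (2 * std))
    return {"lag": lag, "threshold": mean(ratios),
            "influence": 0.0, "robust measures": False}

ramp = [float(v) for v in range(1, 101)]
params = heuristic_parameters([ramp])
```

For the 100-point ramp used here, the lag comes out as 10 and the threshold is roughly 1.37.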
The size of the window of previous data points that are considered for the peak detection. As a result the points in the first window can't be scored. The less the data changes over time, the larger the lag can be. For more volatile time series, a smaller lag is better suited.
Value of the Z-Score above which a point is flagged as a peak. The threshold represents the number of standard deviations by which a point must deviate from the moving average to be flagged as a peak.
The (relative) influence previous peaks have on the calculation of the Z-Score. If set to zero, they are completely ignored. An influence of 0 is therefore the most robust option (but assumes stationarity). If the data is expected to return to a normal value after a peak, an influence close to zero is appropriate.
If selected, the more robust median and interquartile range (IQR) are used to calculate the Z-Score of a point. Otherwise the mean and standard deviation are used.
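The difference between the two pairs of measures can be seen in a small Python sketch. The function name `center_and_spread` is hypothetical, and the quartile interpolation method as well as the absence of any rescaling of the IQR are assumptions; the operator may compute these values differently.

```python
from statistics import mean, median, pstdev, quantiles

def center_and_spread(window, robust):
    """Return the (center, spread) pair used to score a point against the window."""
    if robust:
        q1, _, q3 = quantiles(window, n=4, method="inclusive")
        return median(window), q3 - q1
    return mean(window), pstdev(window)

window = [1.0, 2.0, 3.0, 4.0, 100.0]
robust_center, robust_spread = center_and_spread(window, robust=True)
plain_center, plain_spread = center_and_spread(window, robust=False)
```

The single outlier 100.0 pulls the mean to 22.0 and inflates the standard deviation, while the median (3.0) and the IQR (2.0) are unaffected by it.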
If selected, the peaked series will be added, which contains the actual values for the detected peaks and missing values for non-peak areas.
If selected, invalid values (missing, positive infinity and negative infinity) are ignored by the peak detection algorithm.
This tutorial process demonstrates the basic usage of the Z-Score Peak Transformation operator. The example is taken directly from the original presentation of the algorithm (https://stackoverflow.com/a/22640362/4940080).
It also shows the effect of the influence parameter. The influence factor of the second operator is set to 0.9. As a result, the high values of the first peak enter the calculation of the mean and the z-score, so the second and third spikes in the data are no longer flagged as peaks.
Detecting Peaks on artificial time series
This tutorial process demonstrates the usage of the Z-Score Peak Transformation operator on a more complex time series. An artificial time series data set is created. Several types of time series signals are combined (two oscillations, three normally distributed peaks, a trend and noise).
The Z-Score Peak Transformation operator is used to detect the 4 highest peaks (minima and maxima) in the time series. The first two normally distributed peaks are correctly identified, as well as a smaller peak from the oscillation. In addition, the rising trend at the end of the series is classified as a peak, because the influence is set to 0 and the Z-Score Peak Transformation operator expects a stationary time series.