RapidMiner Studio documentation, version 10.1
Highest Peak Transformation (Time Series)
Synopsis
This operator performs a Highest Peak Transformation for one or more time series attributes.
Description
A peak transformation detects peaks in the time series and outputs an indicator series (and optionally a peaked series) as the result. The meaning of the indicator series and the actual peak detection algorithm are described below.
The maximum number n of peaks to be extracted is defined by the parameter number of peaks; the type of peaks to be detected is defined by the parameter peak types.
The indicator time series consists of the following flag values:
- (0) no peak
- (1) maximum
- (-1) minimum
The operator provides the original time series, the indicator time series and (if the parameter add peaked series is selected) the peaked time series at the peak transformed example set output port. In the peaked time series, all values are set to missing where there is no peak (indicator series is 0).
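The relation between the three series can be illustrated with a minimal Python sketch. This is not the operator's implementation; the function name `peaked_series` is hypothetical, and NaN is assumed here as a stand-in for a missing value.

```python
import math

def peaked_series(values, indicator):
    """Keep a value where the indicator flags a peak (1 or -1), else use NaN as 'missing'."""
    return [v if flag != 0 else math.nan for v, flag in zip(values, indicator)]

# Illustrative data: one maximum peak at index 1, one minimum peak at index 3.
values = [1.0, 5.0, 2.0, -4.0, 0.5]
indicator = [0, 1, 0, -1, 0]
peaked = peaked_series(values, indicator)
```

In this sketch only the values at the flagged positions survive; every other position becomes missing.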
The highest peak detection algorithm checks for extrema in the time series and adds all values around an extremum to the peak as long as the relative change between two consecutive values is larger than the minimum change. The parameter sloppy values defines how many values are allowed that do not fulfill this condition. Note that only values above/below (for maximum/minimum) the average are considered to be peak values. A heuristic (see parameter use heuristics) can be used to determine values for the parameters sloppy values and minimum change.
The exact peak detection procedure is as follows:
- 1. Find the global extremum in the current Area (the start Area is the whole time series). The method only considers values above/below the average as candidates for an extremum. The current Area is skipped if no value above/below the average exists in the Area.
- 2. Find the left and right end of the peak
- 3. Add current peak to the result
- 4. Repeat steps 1.-3. for the Areas left and right of the current Area.
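The recursive structure of these steps can be sketched in Python for maximum peaks. This is an illustration under assumptions, not the operator's code: the names `find_peaks` and `find_bounds` are hypothetical, the average is assumed to be taken over the whole series, and "left and right" is read as the parts of the current Area on either side of the detected peak.

```python
def find_peaks(values, find_bounds, start=0, end=None, average=None):
    """Recursively collect (left, top, right) index triples for maximum peaks.

    find_bounds(values, top, start, end, average) -> (left, right) is a
    hypothetical helper that returns the inclusive ends of the peak around `top`.
    The current Area is the half-open index range [start, end).
    """
    if end is None:
        end = len(values)
    if average is None:
        average = sum(values) / len(values)  # assumption: average of the whole series
    if start >= end:
        return []
    # Step 1: global maximum of the current Area; it must lie above the average.
    top = max(range(start, end), key=lambda i: values[i])
    if values[top] <= average:
        return []  # no candidate above the average, so the Area is skipped
    # Step 2: find the left and right end of the peak.
    left, right = find_bounds(values, top, start, end, average)
    # Steps 3 and 4: record the peak and repeat for the Areas on both sides.
    return (find_peaks(values, find_bounds, start, left, average)
            + [(left, top, right)]
            + find_peaks(values, find_bounds, right + 1, end, average))
```

Passing the boundary search as a function keeps the sketch independent of how the peak ends are determined. Minimum peaks would mirror the comparisons.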
The procedure to find the left and right end of the peak is as follows:
- 1. Check if the next value left/right of the last value fulfills the peak condition:
- The next value has to be above/below (maximum/minimum) the average. If it is not, the search for peak values is stopped
- The relative change (decrease/increase for maximum/minimum) between last value and next value has to be larger than the minimum change per step (see the description of the parameter minimum change for more details).
- Up to sloppy values values are tolerated for which the relative change is not larger than the minimum change
- 2. If the peak condition is fulfilled, update last value to the next value
- 3. Repeat steps 1.-2. until the peak condition is no longer fulfilled
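A possible reading of this boundary search, for one side of a maximum peak, is sketched below in Python. The function name `expand_side` is hypothetical. Two details are assumptions that the description leaves open: a tolerated sloppy value is treated as part of the peak (so last value moves on to it), and the sloppy allowance is a single budget per side that is not reset.

```python
def expand_side(values, top, step, average, minimum_change, sloppy_values):
    """Walk from index `top` in direction `step` (-1 = left, +1 = right).

    Returns the index of the last value that still belongs to the maximum peak.
    """
    last = top
    sloppy_left = sloppy_values
    nxt = last + step
    while 0 <= nxt < len(values):
        if values[nxt] <= average:
            break  # the next value must stay above the average
        change = (values[last] - values[nxt]) / (values[last] - average)
        if change <= minimum_change:
            if sloppy_left == 0:
                break  # no sloppy values left, the peak ends here
            sloppy_left -= 1  # tolerate this value (assumption: it joins the peak)
        last = nxt
        nxt = last + step
    return last

# Illustrative data with a maximum at index 3 and an assumed average of 4.6.
series = [1, 2, 6, 10, 7, 6.9, 3, 1]
left_end = expand_side(series, 3, -1, 4.6, 0.1, 0)
right_end_strict = expand_side(series, 3, +1, 4.6, 0.1, 0)
right_end_sloppy = expand_side(series, 3, +1, 4.6, 0.1, 1)
```

With no sloppy values the right end stops before the nearly flat step from 7 to 6.9; allowing one sloppy value extends the peak by that step. For minimum peaks the comparison against the average is mirrored.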
If a peak is detected, the high-low amplitude of the peak is calculated. For this, the minimum and maximum values in the whole peak area (and 1 slice left and right of the peak area) are determined. The high-low amplitude is the difference between this maximum and minimum. The operator only returns the n highest peaks in terms of the high-low amplitude of the peaks.
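The amplitude ranking can be sketched as follows in Python. The function names are hypothetical, and a peak is assumed to be represented by its inclusive (left, right) index pair; "1 slice left and right" is read as one additional value on each side, clipped at the series boundaries.

```python
def high_low_amplitude(values, left, right):
    """Max minus min over the peak area extended by one value on each side."""
    lo = max(left - 1, 0)
    hi = min(right + 1, len(values) - 1)
    window = values[lo:hi + 1]
    return max(window) - min(window)

def keep_highest(values, peaks, n):
    """Return the n peaks (as (left, right) pairs) with the largest amplitude."""
    ranked = sorted(peaks,
                    key=lambda p: high_low_amplitude(values, p[0], p[1]),
                    reverse=True)
    return ranked[:n]

series = [0, 1, 4, 1, 0, 2, 9, 2, 0]
candidates = [(2, 2), (6, 6)]
best = keep_highest(series, candidates, 1)
```

Here the peak at index 6 has the larger high-low amplitude and is the only one kept for n = 1.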
This operator works only on numerical time series.
Input
- example set (Data Table)
The ExampleSet which contains the time series data as attributes.
Output
- peak transformed example set (Data Table)
The ExampleSet containing the results of the peak transformation. It contains the original time series, the peak indicator time series (peak flag values (-1,0,+1)) for the selected attributes and optionally the peaked time series.
- original (Data Table)
The ExampleSet that was given as input is passed through without changes.
Parameters
- attribute_filter_type
This parameter selects the filter that determines which attributes hold the time series values. Only numeric attributes can be selected as time series attributes. The different filter types are:
- all: This option selects all attributes of the ExampleSet to be time series attributes. This is the default option.
- single: This option allows the selection of a single time series attribute. The required attribute is selected by the attribute parameter.
- subset: This option allows the selection of multiple time series attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known all attributes are present in the list and the required ones can easily be selected.
- regular_expression: This option allows you to specify a regular expression for the time series attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except expression.
- value_type: This option allows selection of all the attributes of a particular type to be time series attributes. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception, except value type.
- block_type: This option allows the selection of all the attributes of a particular block type to be time series attributes. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception, except block type.
- no_missing_values: This option selects all attributes of the ExampleSet as time series attributes which do not contain a missing value in any example. Attributes that have even a single missing value are not selected.
- numeric_value_filter: All numeric attributes whose examples all match a given numeric condition are selected as time series attributes. The condition is specified by the numeric condition parameter.
- attribute
The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- attributes
The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. They can be shifted to the right list, which is the list of selected time series attributes.
- regular_expression
Attributes whose names match this expression will be selected. The expression can be specified through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.
- use_except_expression
If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.
- except_regular_expression
This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (expression that was specified in regular expression parameter).
- value_type
This option allows you to select an attribute type. One of the following types can be chosen: numeric, integer, real.
- use_value_type_exception
If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.
- except_value_type
The attributes matching this type will be removed from the final output even if they matched the previously selected type, specified by the value type parameter. One of the following types can be selected here: numeric, integer, real.
- block_type
This option allows you to select an attribute block type. One of the following types can be chosen: value_series, value_series_start, value_series_end.
- use_block_type_exception
If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.
- except_block_type
The attributes matching this block type will be removed from the final output even if they matched the previously selected block type, specified by the block type parameter. One of the following block types can be selected here: value_series, value_series_start, value_series_end.
- numeric_condition
The numeric condition used by the numeric condition filter type. A numeric attribute is selected if all examples match the specified condition for this attribute. For example the numeric condition '> 6' will keep all numeric attributes having a value of greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||.
- invert_selection
If this parameter is set to true, the selection is reversed: all attributes not matching the specified condition are selected as time series attributes. Special attributes are not selected, independent of the invert selection parameter, as long as the include special attributes parameter is not set to true. If it is, the condition is also applied to the special attributes and the selection is reversed if this parameter is checked.
- include_special_attributes
Special attributes are attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Also custom roles can be assigned to attributes. By default special attributes are not selected as time series attributes irrespective of the filter conditions. If this parameter is set to true, special attributes are also tested against conditions specified and those attributes are selected that match the conditions.
- has_indices
This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.
- indices_attribute
If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- sort_time_series
If this parameter is selected, the input time series will be sorted according to the selected indices attribute before the time series operation is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.
Keep in mind that the index values still need to be unique. If the values are non-unique, a corresponding User Error is thrown.
The data set provided at the original output port will be the sorted input time series.
- number_of_peaks
Maximum number of peaks to be detected. If the highest peak detection algorithm detects more peaks, only the largest (in terms of high-low amplitude of the peaks) are kept. Be aware that this maximum number applies either to each peak type separately or to both types combined (see parameter peak types).
- peak_types
This parameter defines the types (maximum/minimum) of peaks to be detected by the peak detection algorithm. n is the value of the number of peaks parameter.
- only maxima: Only maximum peaks are detected. (maximal number of peaks is n)
- only minima: Only minimum peaks are detected. (maximal number of peaks is n)
- maxima and minima separately: Both maximum and minimum peaks are detected. The number of peaks is counted for each type separately (so that the maximal number of peaks is 2n)
- maxima and minima combined: Both maximum and minimum peaks are detected. The number of peaks is counted for both types combined (so that the maximal number of peaks is n)
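How the four peak types options interact with the number of peaks n can be sketched in Python. The function `limit_peaks` is hypothetical, and each detected peak is assumed to be represented only by its high-low amplitude for simplicity.

```python
def limit_peaks(maxima, minima, n, peak_types):
    """Trim detected peaks to the allowed count.

    maxima / minima: lists of high-low amplitudes, one number per detected peak.
    Returns (kept_maxima, kept_minima), each sorted from largest to smallest.
    """
    top_max = sorted(maxima, reverse=True)
    top_min = sorted(minima, reverse=True)
    if peak_types == "only maxima":
        return top_max[:n], []
    if peak_types == "only minima":
        return [], top_min[:n]
    if peak_types == "maxima and minima separately":
        return top_max[:n], top_min[:n]  # up to 2n peaks in total
    if peak_types == "maxima and minima combined":
        tagged = [(a, "max") for a in top_max] + [(a, "min") for a in top_min]
        best = sorted(tagged, key=lambda t: t[0], reverse=True)[:n]  # up to n in total
        return ([a for a, kind in best if kind == "max"],
                [a for a, kind in best if kind == "min"])
    raise ValueError(f"unknown peak_types: {peak_types!r}")

separate = limit_peaks([5.0, 9.0, 2.0], [7.0, 1.0], 2, "maxima and minima separately")
combined = limit_peaks([5.0, 9.0, 2.0], [7.0, 1.0], 2, "maxima and minima combined")
```

With n = 2, the separate mode keeps two maxima and two minima, while the combined mode keeps only the two largest peaks overall.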
- use_heuristics
If selected, the parameters sloppy values and minimum change are determined by a heuristic (n = <length of time series>):
sloppy values is set to sqrt(n / 2.0).
minimum change is set to the average over all selected time series (at maximum 0.5) of:
- only maximum: (percentile(90) - mean) / (std * 0.1 * n)
- only minimum: (mean - percentile(10)) / (std * 0.1 * n)
- both peak types: (percentile(90) - percentile(10)) / (std * 0.1 * n)
Be aware that this is only a rough heuristic; for optimized results the parameters have to be adapted to your data.
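The heuristic can be approximated with the Python standard library as shown below. This is a sketch under assumptions: the function name and the `peak_types` values "maxima", "minima" and "both" are hypothetical, the population standard deviation and the default percentile method of `statistics.quantiles` are used (the operator's exact definitions are not stated), the 0.5 limit is applied to the averaged value, and no rounding of sloppy values is performed.

```python
import math
import statistics

def heuristic_parameters(series_list, peak_types="both"):
    """Return (sloppy_values, minimum_change) for equally long time series."""
    n = len(series_list[0])
    sloppy_values = math.sqrt(n / 2.0)
    ratios = []
    for series in series_list:
        mean = statistics.fmean(series)
        std = statistics.pstdev(series)  # assumption: population standard deviation
        deciles = statistics.quantiles(series, n=10)  # 9 cut points: 10th ... 90th percentile
        p10, p90 = deciles[0], deciles[-1]
        if peak_types == "maxima":
            spread = p90 - mean
        elif peak_types == "minima":
            spread = mean - p10
        else:
            spread = p90 - p10
        # Note: a constant series has std == 0 and would raise ZeroDivisionError here.
        ratios.append(spread / (std * 0.1 * n))
    minimum_change = min(statistics.fmean(ratios), 0.5)  # assumption: cap applied to the average
    return sloppy_values, minimum_change

ramp = [float(i) for i in range(50)]
sloppy, change = heuristic_parameters([ramp], "maxima")
```

For a series of 50 values the sketch yields sloppy values = sqrt(25) = 5. The computed minimum change depends on the percentile and standard deviation conventions chosen, so it should be treated as a starting point only.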
- sloppy_values
Allowed number of sloppy values (values for which the relative change is smaller than the minimum change) until the end of the current peak is reached. The number of sloppy values should be increased for noisy data.
- minimum_change
Threshold on the relative change between last and next value to count the next value as a peak value.
The relative change is calculated as the decrease / increase (maximum / minimum) between next value and last value divided by the distance of last value to the average:
relative change = (lastValue - nextValue) / (lastValue - average)
If the relative change is larger than the minimum change, the next value is counted as a peak value. As the minimum change is the threshold per slice, the parameter defines how sharp a peak has to be in order to be detected. If you expect wide peaks in your data, decrease the minimum change.
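A short Python example shows that the same formula gives a positive relative change for both peak types, because numerator and denominator change sign together. The function name `relative_change` and the sample numbers are illustrative only.

```python
def relative_change(last_value, next_value, average):
    """(lastValue - nextValue) / (lastValue - average), as defined above."""
    return (last_value - next_value) / (last_value - average)

# Maximum peak: the series falls from 10 to 7 while the average is 4.
change_at_maximum = relative_change(10.0, 7.0, 4.0)   # (10 - 7) / (10 - 4)

# Minimum peak: the series rises from -2 to 1 while the average is 4.
change_at_minimum = relative_change(-2.0, 1.0, 4.0)   # (-2 - 1) / (-2 - 4)
```

Both steps cover half of the distance between last value and the average, so with a minimum change of, for example, 0.1, both next values would count as peak values.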
- add_peaked_series
If selected, the peaked series is added. It contains the actual values for the detected peaks and missing values for non-peak areas.
- ignore_invalid_values
If selected, invalid values (missing, positive and negative infinity) are ignored by the peak detection algorithm.
Tutorial Processes
Detecting peaks in an artificial time series
This tutorial process demonstrates the basic usage of the Highest Peak Transformation operator. An artificial time series data set is created by combining several types of time series signals (two oscillations, three normally distributed peaks, a trend and noise).
The Highest Peak Transformation operator is used to detect the 4 highest peaks (minima and maxima) in the time series. The 3 normally distributed peaks are correctly identified. In addition, a part of the trend is identified as the fourth peak in the time series.