RapidMiner Studio documentation, version 10.0
Extract Peaks (Time Series)
Synopsis
This operator performs a peak transformation on one or more time series attributes and provides the properties of the peaks as features.
Description
This operator performs a peak transformation on one or more time series. The method to detect the peaks can be selected by the parameter peak detection method. For a detailed description of the peak detection algorithm, please see the help text of the corresponding Peak Transformation operators.
For each detected peak in the time series, the following features are extracted:
- peak_<nr>_value: The value of the extremum (minimum / maximum) of the peak
- peak_<nr>_extremum_position: The position (series index) of the extremum
- peak_<nr>_width: The width of the peak (in units of the indices attribute; for date-time indices attributes, this is milliseconds)
- peak_<nr>_high_low_amplitude: The amplitude between the highest and lowest value in the peak (and one slice left and right of the peak)
- peak_<nr>_type: The type (minimum / maximum) of the peak
If the parameter add index features is selected, additional features are extracted:
- peak_<nr>_left_position: The position (series index) of the left side of the peak
- peak_<nr>_center: The center position (series index) of the peak
- peak_<nr>_right_position: The position (series index) of the right side of the peak
If an indices attribute is provided, additional features are extracted:
- peak_<nr>_extremum_index_value: The value of the indices attribute at the extremum
- peak_<nr>_left_index_value: The value of the indices attribute at the left side of the peak (only extracted if parameter add index features is selected)
- peak_<nr>_center_index_value: The value of the indices attribute at the center of the peak (only extracted if parameter add index features is selected)
- peak_<nr>_right_index_value: The value of the indices attribute at the right side of the peak (only extracted if parameter add index features is selected)
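The basic features listed above can be illustrated with a minimal Python sketch. It is not the operator's implementation: the function name `peak_features`, the inclusive `left`/`right` bounds, and the choice to measure the width as `right - left` in index positions are assumptions made for illustration only.

```python
from typing import Dict, Sequence, Union


def peak_features(
    values: Sequence[float], left: int, right: int, is_maximum: bool
) -> Dict[str, Union[float, int, str]]:
    """Illustrative features for one peak spanning values[left..right] (inclusive)."""
    data = list(values)
    segment = data[left:right + 1]
    extremum = max(segment) if is_maximum else min(segment)
    # Position (series index) of the first occurrence of the extremum in the peak.
    extremum_position = left + segment.index(extremum)
    # The amplitude also looks one slice to the left and right of the peak,
    # clipped to the bounds of the series.
    lo = max(left - 1, 0)
    hi = min(right + 1, len(data) - 1)
    extended = data[lo:hi + 1]
    return {
        "value": extremum,
        "extremum_position": extremum_position,
        "width": right - left,  # assumption: width measured in index positions
        "high_low_amplitude": max(extended) - min(extended),
        "type": "maximum" if is_maximum else "minimum",
    }


features = peak_features([1, 2, 5, 9, 4, 2, 1], left=2, right=4, is_maximum=True)
```

Here the peak covers positions 2 to 4, so the amplitude is computed over positions 1 to 5.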
The extracted features for every detected peak are provided as an ExampleSet at the features output port of the operator. In addition, the original time series and the peaked time series (all non-peak values are set to missing) are provided at the peaked output port of the operator.
Depending on the parameter add time series name, the ExampleSet will have either one example with attributes for all combinations of time series and features, or n examples, one per time series. In combination with the Process Windows operator, this operator can be used to calculate features of windows of time series as a preparation for a general machine learning problem.
This operator works only on numerical time series.
Input
- example set (Data Table)
The ExampleSet which contains the time series data as attributes.
Output
- features (Data Table)
The ExampleSet which contains the calculated peak features as attributes. Depending on the parameter add time series name, the ExampleSet will have either one example with attributes for all combinations of time series and features, or n examples, one per time series.
- original (Data Table)
The ExampleSet that was given as input is passed through without changes.
- peaked (Data Table)
This data set contains the original time series and the peaked time series. The peaked time series contains the values of the original time series for the areas where a peak was detected and missing values for non-peak areas. It can be used to compare the peaked time series and the original time series data.
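As a rough illustration of the peaked output, the following Python sketch keeps the original values inside detected peak areas and marks everything else as missing. The function name `peaked_series`, the use of `NaN` as the missing value, and the inclusive `(left, right)` ranges are assumptions for this sketch, not part of the operator.

```python
import math
from typing import List, Sequence, Tuple


def peaked_series(
    values: Sequence[float], peak_ranges: Sequence[Tuple[int, int]]
) -> List[float]:
    """Copy values inside the given inclusive (left, right) ranges; NaN elsewhere."""
    peaked = [math.nan] * len(values)
    for left, right in peak_ranges:
        for i in range(left, right + 1):
            peaked[i] = values[i]
    return peaked


peaked = peaked_series([1.0, 2.0, 9.0, 3.0, 1.0], [(1, 3)])
```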
Parameters
- attribute_filter_type
This parameter selects the filter type, that is, the method used to select the attributes which hold the time series values. Only numeric attributes can be selected as time series attributes. The different filter types are:
- all: This option selects all attributes of the ExampleSet to be time series attributes. This is the default option.
- single: This option allows the selection of a single time series attribute. The required attribute is selected by the attribute parameter.
- subset: This option allows the selection of multiple time series attributes through a list (see parameter attributes). If the meta data of the ExampleSet is known, all attributes are present in the list and the required ones can easily be selected.
- regular_expression: This option allows you to specify a regular expression for the time series attribute selection. The regular expression filter is configured by the parameters regular expression, use except expression and except expression.
- value_type: This option allows selection of all the attributes of a particular type to be time series attributes. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The value type filter is configured by the parameters value type, use value type exception, except value type.
- block_type: This option allows the selection of all the attributes of a particular block type to be time series attributes. It should be noted that block types may be hierarchical. For example value_series_start and value_series_end block types both belong to the value_series block type. The block type filter is configured by the parameters block type, use block type exception, except block type.
- no_missing_values: This option selects all attributes of the ExampleSet as time series attributes which do not contain a missing value in any example. Attributes that have even a single missing value are not selected.
- numeric_value_filter: All numeric attributes whose examples all match a given numeric condition are selected as time series attributes. The condition is specified by the numeric condition parameter.
- attribute
The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- attributes
The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. They can be shifted to the right list, which is the list of selected time series attributes.
- regular_expression
Attributes whose names match this expression will be selected. The expression can be specified through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.
- use_except_expression
If enabled, an exception to the first regular expression can be specified. This exception is specified by the except regular expression parameter.
- except_regular_expression
This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (expression that was specified in regular expression parameter).
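The interplay of the regular expression and the except expression can be sketched in Python as follows. This is only an illustration: the function name `select_by_regex` is hypothetical, and the sketch assumes that an expression has to match the whole attribute name, which may differ from the operator's behavior.

```python
import re
from typing import List, Optional, Sequence


def select_by_regex(
    names: Sequence[str], expression: str, except_expression: Optional[str] = None
) -> List[str]:
    """Keep names matching `expression` unless they also match `except_expression`."""
    include = re.compile(expression)
    exclude = re.compile(except_expression) if except_expression is not None else None
    selected = []
    for name in names:
        if include.fullmatch(name) is None:
            continue  # does not match the first expression
        if exclude is not None and exclude.fullmatch(name) is not None:
            continue  # filtered out by the exception
        selected.append(name)
    return selected


selected = select_by_regex(
    ["sensor_1", "sensor_2", "sensor_ref", "date"], r"sensor_.*", r"sensor_ref"
)
```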
- value_type
This option allows you to select a type of attribute. One of the following types can be chosen: numeric, integer, real.
- use_value_type_exception
If enabled, an exception to the selected type can be specified. This exception is specified by the except value type parameter.
- except_value_type
The attributes matching this type will be removed from the final output even if they matched the previously selected type, specified by the value type parameter. One of the following types can be selected here: numeric, integer, real.
- block_type
This option allows you to select a block type of attribute. One of the following types can be chosen: value_series, value_series_start, value_series_end.
- use_block_type_exception
If enabled, an exception to the selected block type can be specified. This exception is specified by the except block type parameter.
- except_block_type
The attributes matching this block type will be removed from the final output even if they matched the previously selected block type, specified by the block type parameter. One of the following block types can be selected here: value_series, value_series_start, value_series_end.
- numeric_condition
The numeric condition used by the numeric condition filter type. A numeric attribute is selected if all examples match the specified condition for this attribute. For example the numeric condition '> 6' will keep all numeric attributes having a value of greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (>10 && < 12)' are not allowed because they use both && and ||.
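The rules for numeric conditions can be illustrated with a small Python sketch. It is not the operator's parser: the function names `parse_condition` and `attribute_matches` are hypothetical, and only the comparison operators `>`, `>=`, `<` and `<=` are supported here as an assumption.

```python
import operator
import re
from typing import Callable, List, Sequence

_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}
_TERM = re.compile(r"\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*")


def parse_condition(condition: str) -> Callable[[float], bool]:
    """Turn a condition such as '> 6 && < 11' into a predicate on one value."""
    if "&&" in condition and "||" in condition:
        raise ValueError("&& and || cannot be used together in one condition")
    if "||" in condition:
        parts, combine = condition.split("||"), any
    else:
        parts, combine = condition.split("&&"), all
    tests: List[Callable[[float], bool]] = []
    for part in parts:
        match = _TERM.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse term: {part!r}")
        op, bound = _OPS[match.group(1)], float(match.group(2))
        # Bind op and bound as defaults so each lambda keeps its own values.
        tests.append(lambda v, op=op, bound=bound: op(v, bound))
    return lambda v: combine(test(v) for test in tests)


def attribute_matches(values: Sequence[float], condition: str) -> bool:
    """An attribute is selected only if every example matches the condition."""
    predicate = parse_condition(condition)
    return all(predicate(v) for v in values)
```

With this sketch, `attribute_matches([7, 8, 10], "> 6 && < 11")` selects the attribute, while a single value outside the interval rejects it.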
- invert_selection
If this parameter is set to true, the selection is reversed. In that case all attributes not matching the specified condition are selected as time series attributes. Special attributes are not selected, regardless of the invert selection parameter, as long as the include special attributes parameter is not set to true. If it is set to true, the condition is also applied to the special attributes, and the selection is reversed if this parameter is checked.
- include_special_attributes
Special attributes are attributes with special roles. These are: id, label, prediction, cluster, weight and batch. Also custom roles can be assigned to attributes. By default special attributes are not selected as time series attributes irrespective of the filter conditions. If this parameter is set to true, special attributes are also tested against conditions specified and those attributes are selected that match the conditions.
- has_indices
This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.
- indices_attribute
If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.
- sort_time_series
If this parameter is selected, the input time series will be sorted according to the selected indices attribute before the time series operation is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.
Keep in mind that the indices values still need to be unique. If the values are not unique, a corresponding User Error is thrown.
The data set provided at the original output port will be the sorted input time series.
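The sorting and uniqueness rules can be summarized in a short Python sketch. The function name `prepare_series` and the use of `ValueError` in place of the operator's User Error are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def prepare_series(
    indices: Sequence[float], values: Sequence[float], sort_time_series: bool
) -> Tuple[List[float], List[float]]:
    """Return (indices, values) in index order, or raise if the rules are violated."""
    if len(set(indices)) != len(indices):
        raise ValueError("indices values must be unique")
    pairs = list(zip(indices, values))
    if sort_time_series:
        pairs.sort(key=lambda pair: pair[0])
    elif any(a[0] > b[0] for a, b in zip(pairs, pairs[1:])):
        raise ValueError("time series is not sorted; enable sort_time_series")
    return [i for i, _ in pairs], [v for _, v in pairs]


sorted_indices, sorted_values = prepare_series([3, 1, 2], [30.0, 10.0, 20.0], True)
```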
- peak_detection_method
This parameter defines which peak detection algorithm is used to detect the peaks from which the features are extracted.
- z-score: The Z-Score Peak detection algorithm is used. Please have a look into the help text of the Z-Score Peak Transformation operator for details of the algorithm.
- highest-peak: The Highest Peaks detection algorithm is used. Please have a look into the help text of the Highest Peak Transformation operator for details of the algorithm.
- number_of_peaks
Maximum number of peaks to be detected. If the peak detection algorithm detects more peaks, only the largest (in terms of high-low amplitude of the peaks) are kept. Be aware that this maximum applies either to each peak type separately or to both types combined (see parameter peak types).
- peak_types
This parameter defines the types (maximum/minimum) to be detected by the peak detection algorithm. n is the value of the number of peaks parameter.
- only maxima: Only maximum peaks are detected. (maximal number of peaks is n)
- only minima: Only minimum peaks are detected. (maximal number of peaks is n)
- maxima and minima separately: Both maximum and minimum peaks are detected. The number of peaks is counted for each type separately (so that the maximal number of peaks is 2n)
- maxima and minima combined: Both maximum and minimum peaks are detected. The number of peaks is counted for both types combined (so that the maximal number of peaks is n)
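How the number of peaks parameter interacts with the four peak types can be sketched in Python as follows. The function name `limit_peaks` and the representation of a peak as a `(type, high_low_amplitude)` tuple are assumptions for this sketch; the operator's internal ordering of the kept peaks is not specified here.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[str, float]  # (type, high_low_amplitude); illustrative representation


def limit_peaks(peaks: Sequence[Peak], n: int, peak_types: str) -> List[Peak]:
    """Keep at most n peaks (or n per type), preferring the largest amplitudes."""

    def top(candidates: Sequence[Peak]) -> List[Peak]:
        return sorted(candidates, key=lambda peak: peak[1], reverse=True)[:n]

    maxima = [peak for peak in peaks if peak[0] == "maximum"]
    minima = [peak for peak in peaks if peak[0] == "minimum"]
    if peak_types == "only maxima":
        return top(maxima)
    if peak_types == "only minima":
        return top(minima)
    if peak_types == "maxima and minima separately":
        return top(maxima) + top(minima)  # up to 2n peaks
    if peak_types == "maxima and minima combined":
        return top(list(peaks))  # up to n peaks
    raise ValueError(f"unknown peak types option: {peak_types!r}")


detected = [
    ("maximum", 5.0), ("minimum", 4.0), ("maximum", 3.0),
    ("minimum", 1.0), ("maximum", 2.0),
]
combined = limit_peaks(detected, 2, "maxima and minima combined")
separately = limit_peaks(detected, 2, "maxima and minima separately")
```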
- use_heuristics
If selected, the specific configuration parameters for the peak detection algorithm are determined by a heuristic. For details about the heuristic used, see the help text of the use heuristics parameter of the corresponding Peak Transformation algorithm.
For the z-score peak detection algorithm these are the parameters lag, threshold, influence and robust measures.
For the highest peaks peak detection algorithm these are the parameters sloppy values and minimum change.
Be aware that this is only a rough heuristic; for optimized results, the parameters have to be adapted to your data.
- lag
Parameter for the z-score peak detection algorithm: The size of the window of previous data points that are considered for the peak detection. As a result the points in the first window can't be scored. The less the data changes over time, the larger the lag can be. For more volatile time series, a smaller lag is better suited.
- threshold
Parameter for the z-score peak detection algorithm: Value of the Z-Score above which a point is flagged as a peak. The threshold is the number of standard deviations above which a point is flagged as a peak.
- influence
Parameter for the z-score peak detection algorithm: The (relative) influence previous peaks have on the calculation of the Z-Score. If set to zero, they are completely ignored. An influence of 0 is therefore the most robust option (but assumes stationarity). If the data is expected to return to a normal value after a peak, an influence close to zero is appropriate.
- robust_measures
Parameter for the z-score peak detection algorithm: If selected, the more robust median and interquartile range (IQR) are used to calculate the Z-Score of a point. Otherwise the mean and standard deviation are used.
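To make the roles of lag, threshold, influence and robust measures more concrete, here is a minimal Python sketch of a smoothed z-score peak detection in the commonly published form. It is an illustration under assumptions, not the operator's implementation: the function name `zscore_signals` is hypothetical, the robust variant simply substitutes median and IQR for mean and standard deviation without any rescaling, and the details of the Z-Score Peak Transformation operator may differ.

```python
import statistics
from typing import List, Sequence


def zscore_signals(
    y: Sequence[float], lag: int, threshold: float, influence: float,
    robust: bool = False,
) -> List[int]:
    """Return 1 (maximum), -1 (minimum) or 0 (no peak) per point; needs lag >= 2."""
    signals = [0] * len(y)
    filtered = list(y)  # series in which flagged points are damped by `influence`
    for i in range(lag, len(y)):  # the first `lag` points cannot be scored
        window = filtered[i - lag:i]
        if robust:
            center = statistics.median(window)
            q1, _, q3 = statistics.quantiles(window, n=4)
            spread = q3 - q1  # interquartile range
        else:
            center = statistics.fmean(window)
            spread = statistics.pstdev(window)
        if abs(y[i] - center) > threshold * spread:
            signals[i] = 1 if y[i] > center else -1
            # A flagged point enters later windows only with weight `influence`.
            filtered[i] = influence * y[i] + (1 - influence) * filtered[i - 1]
    return signals


series = [1, 1, 1.1, 1, 0.9, 1, 1, 1.1, 1, 0.9, 5, 1, 1]
signals = zscore_signals(series, lag=5, threshold=3.5, influence=0.0)
```

With an influence of 0, the spike at position 10 does not distort the statistics used to score the following points.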
- ignore_invalid_values
If this parameter is set to true, invalid values (missing, positive and negative infinity) are ignored in the calculation of the features.
- sloppy_values
Parameter for the highest peaks peak detection algorithm: Allowed number of sloppy values (values for which the relative change is smaller than the minimum change) until the end of the current peak is reached. The number of sloppy values should be increased for noisy data.
- minimum_change
Parameter for the highest peaks peak detection algorithm: Threshold on the relative change between last and next value to count the next value as a peak value.
The relative change is calculated as the decrease / increase (maximum / minimum) between next value and last value divided by the distance of last value to the average:
relative change = (lastValue - nextValue) / (lastValue - average)
If the relative change is larger than the minimum change the next value is counted as a peak value. As the minimum change is the threshold per slice, the parameter defines how sharp a peak has to be, to be detected. If you expect wide peaks in your data, decrease the minimum change.
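The relative change formula can be checked with a short Python sketch. The function names `relative_change` and `is_peak_value` are hypothetical, and the guard against a last value equal to the average is an assumption added to keep the sketch safe.

```python
def relative_change(last_value: float, next_value: float, average: float) -> float:
    """(lastValue - nextValue) / (lastValue - average), as in the formula above."""
    if last_value == average:
        raise ValueError("relative change is undefined when lastValue equals average")
    return (last_value - next_value) / (last_value - average)


def is_peak_value(
    last_value: float, next_value: float, average: float, minimum_change: float
) -> bool:
    """The next value counts as a peak value if the relative change is large enough."""
    return relative_change(last_value, next_value, average) > minimum_change


# Maximum: dropping from 10 to 8 with an average of 2 gives 2 / 8 = 0.25.
change_max = relative_change(10.0, 8.0, 2.0)
# Minimum: rising from -6 to -4 with an average of 2 gives -2 / -8 = 0.25.
change_min = relative_change(-6.0, -4.0, 2.0)
```

Because numerator and denominator change sign together, the same expression covers both maxima and minima.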
- add_index_features
If selected, additional features containing information about the index of the left, center and right side of the peaks are added.
- add_time_series_name
If this parameter is set to true the name of the time series attribute is added as a prefix to the name of the feature attributes. The resulting ExampleSet will have one example and n attributes, with n = <number of time series> x <number of features>. If this parameter is set to false, an additional attribute named time series is added with the name of the time series. The resulting ExampleSet will have n examples and m+1 attributes, with n = <number of time series> and m = <number of features>. The role of the time series attribute is set to id.
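The two resulting layouts can be sketched in Python with plain dictionaries. The function name `layout_features` is hypothetical, and the `.` separator between the time series name and the feature name is an assumption; the source only states that the name is added as a prefix.

```python
from typing import Dict, List


def layout_features(
    features_by_series: Dict[str, Dict[str, float]], add_time_series_name: bool
) -> List[Dict[str, object]]:
    """Return the feature table as a list of rows (one dict per example)."""
    if add_time_series_name:
        # One example; every attribute name is prefixed with the series name.
        row: Dict[str, object] = {}
        for series_name, features in features_by_series.items():
            for feature_name, value in features.items():
                row[f"{series_name}.{feature_name}"] = value  # separator is assumed
        return [row]
    # One example per series, plus a 'time series' attribute holding the name.
    return [
        {"time series": series_name, **features}
        for series_name, features in features_by_series.items()
    ]


example = {
    "temperature": {"peak_1_value": 9.0, "peak_1_width": 2.0, "peak_1_extremum_position": 3.0},
    "pressure": {"peak_1_value": 4.0, "peak_1_width": 1.0, "peak_1_extremum_position": 7.0},
}
wide = layout_features(example, add_time_series_name=True)
long = layout_features(example, add_time_series_name=False)
```

For two time series with three features each, the first layout has one example with six attributes and the second has two examples with four attributes.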
Tutorial Processes
Peak Features with Z-Score Peak Detection
This tutorial process demonstrates the basic usage of the Extract Peaks operator. The example is directly taken from the original presentation of the Z-Score Peak Detection algorithm.
Extracting peak features on artificial time series
This tutorial process demonstrates the usage of the Extract Peaks operator to extract features of maxima and minima in a time series.
An artificial time series data set is created. Several types of time series signals are combined (two oscillations, three normally distributed peaks, a trend and noise). In addition, a 'Date' attribute with daily data is added.
The Extract Peaks operator is used to detect the 4 highest peaks (minima and maxima) in the time series, using the highest peaks peak detection method. Besides the standard peak features (value, width, extremum_position, high_low_amplitude and type), index features are also extracted (left_position, center_position, right_position). As the 'Date' attribute is selected as the indices attribute, the index_value features (extremum_index_value, left_index_value, center_index_value, right_index_value) are extracted as well. They contain the corresponding dates from the indices attribute. In addition, the width feature is calculated in milliseconds.