Extract Performance (RapidMiner Studio Core)
Synopsis
This operator derives a performance measure (in the form of a performance vector) from the given ExampleSet.
Description
This operator generates a performance vector from the properties of the given ExampleSet. These properties include the number of examples and the number of attributes of the input ExampleSet. A specific data value of the input ExampleSet can also be used as the value of the performance vector, as can statistical properties of an attribute such as its average, minimum or maximum value. The parameters and the attached Example Process illustrate each of these options.
Input
- example set (Data Table)
This input port expects an ExampleSet. The performance vector value will be extracted from this ExampleSet.
Output
- performance (Performance Vector)
This port delivers a performance vector. A performance vector is a list of performance criteria values.
- example set (Data Table)
The ExampleSet that was given as input is passed through this port unchanged. This is usually used to reuse the same ExampleSet in further operators.
Parameters
- performance_type: This parameter indicates how the input ExampleSet should be used to define the performance vector.
  - number_of_examples: If this option is selected, the performance vector value is set to the total number of examples in the input ExampleSet.
  - number_of_attributes: If this option is selected, the performance vector value is set to the total number of attributes in the input ExampleSet.
  - data_value: If this option is selected, the performance vector value is set to the value of the specified attribute at the specified index. The attribute is specified using the attribute name parameter and the index is specified using the example index parameter.
  - statistics: If this option is selected, the performance vector value is set to the value obtained by applying the selected statistical operation to the specified attribute. The attribute is specified using the attribute name parameter and the statistical operation is selected using the statistics parameter.
- statistics: This parameter is only available when the performance type parameter is set to 'statistics'. It selects the statistical operation to be applied to the attribute specified by the attribute name parameter. Range: selection
- attribute_name: This parameter is only available when the performance type parameter is set to 'statistics' or 'data value'. It selects the required attribute. Range: string
- attribute_value: This parameter is only available when the performance type parameter is set to 'statistics' and the statistics parameter is set to 'count'. It specifies a particular value of the attribute given by the attribute name parameter. The performance vector value will be set to the number of occurrences of this value in that attribute. Range: string
- example_index: This parameter is only available when the performance type parameter is set to 'data value'. It selects the index of the example whose value for the attribute specified by the attribute name parameter is used. Range: integer
- optimization_direction: This parameter indicates whether the performance value should be minimized or maximized. Range: selection
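The way these parameters interact can be sketched in plain Python. This is only an illustration of the behaviour described above, not RapidMiner code: the function names `extract_performance` and `is_better`, the list-of-dicts stand-in for an ExampleSet, the 0-based example index, the rule that every column counts as an attribute, and the direction names 'minimize' and 'maximize' are all assumptions made for the sketch. Only the statistics named on this page (average, min, max, count) are covered.

```python
from statistics import mean


def extract_performance(rows, performance_type, attribute_name=None,
                        example_index=None, statistics=None,
                        attribute_value=None):
    """Return one performance value from rows (a list of dicts, one per example)."""
    if performance_type == "number_of_examples":
        return float(len(rows))
    if performance_type == "number_of_attributes":
        # Assumption: every column counts as an attribute.
        return float(len(rows[0])) if rows else 0.0
    if performance_type == "data_value":
        # Assumption: 0-based index, following Python rather than the operator.
        return float(rows[example_index][attribute_name])
    if performance_type == "statistics":
        column = [row[attribute_name] for row in rows]
        if statistics == "average":
            return float(mean(column))
        if statistics == "min":
            return float(min(column))
        if statistics == "max":
            return float(max(column))
        if statistics == "count":
            # Number of occurrences of attribute_value (a string) in the column.
            return float(sum(1 for value in column
                             if str(value) == attribute_value))
        raise ValueError(f"unsupported statistics: {statistics!r}")
    raise ValueError(f"unsupported performance_type: {performance_type!r}")


def is_better(candidate, current, optimization_direction):
    """Compare two performance values under the chosen direction."""
    if optimization_direction == "minimize":
        return candidate < current
    if optimization_direction == "maximize":
        return candidate > current
    raise ValueError(f"unsupported direction: {optimization_direction!r}")


# Hypothetical three-example data set, used only for illustration.
rows = [
    {"Outlook": "sunny", "Temperature": 85},
    {"Outlook": "overcast", "Temperature": 83},
    {"Outlook": "sunny", "Temperature": 72},
]

print(extract_performance(rows, "number_of_examples"))
print(extract_performance(rows, "statistics",
                          attribute_name="Outlook",
                          statistics="count",
                          attribute_value="sunny"))
```

In this sketch the parameters that are "only available" for a given performance type are simply the keyword arguments read by that branch, which mirrors the conditional availability described in the list above.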
Tutorial Processes
Introduction to the Extract Performance operator
This is a very basic process that demonstrates the use of the Extract Performance operator. The 'Golf' data set is loaded using the Retrieve operator and the Extract Performance operator is applied to it. The performance type parameter is set to 'statistics', the statistics parameter is set to 'average' and the attribute name parameter is set to 'Temperature'. Thus the value of the resultant performance vector will be the average of the values of the Temperature attribute. The average of the Temperature attribute over all 14 examples of the 'Golf' data set is 73.571. The resultant performance vector and the 'Golf' data set can be seen in the Results Workspace, where the value of the performance vector is shown as 73.571.
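The arithmetic behind this result can be checked by hand. The short Python sketch below assumes that the 'Golf' sample data carries the Temperature values of the widely reproduced play-golf weather table; the values are that assumption and are not taken from this page. Under that assumption the 14 values sum to 1030, and 1030 / 14 rounds to 73.571.

```python
# Temperature values assumed from the classic play-golf weather table
# (an assumption; this page does not list the values itself).
temperature = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

# The 'average' statistic is the arithmetic mean of the attribute.
average = sum(temperature) / len(temperature)

print(len(temperature), round(average, 3))  # prints: 14 73.571
```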