Weight by PCA (RapidMiner Studio Core)
Synopsis
This operator creates attribute weights for the ExampleSet by using a component created by PCA. It behaves exactly as if a PCA model were given to the Weight by Component Model operator.
Description
The Weight by PCA operator generates attribute weights of the given ExampleSet using a component created by the PCA. The component is specified by the component number parameter. If the normalize weights parameter is not set to true, exact values of the selected component are used as attribute weights. The normalize weights parameter is usually set to true to spread the weights between 0 and 1. The attribute weights reflect the relevance of the attributes with respect to the class attribute. The higher the weight of an attribute, the more relevant it is considered.
Principal Component Analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated attributes into a set of values of uncorrelated attributes called principal components. The number of principal components is less than or equal to the number of original attributes. This transformation is defined in such a way that the first principal component's variance is as high as possible (accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it should be orthogonal to (uncorrelated with) the preceding components.
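As a rough illustration of the procedure described above, the following Python sketch computes principal components for a small made-up data set. It runs power iteration with deflation on the sample covariance matrix. This is not a description of how RapidMiner implements PCA: the function names, the data values and the choice of power iteration are all assumptions made for the example.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def principal_components(cov, iterations=500):
    """Return (variance, unit vector) pairs, largest variance first.

    Power iteration with deflation; assumes distinct, nonzero eigenvalues.
    """
    d = len(cov)
    work = [row[:] for row in cov]
    pairs = []
    for _component in range(d):
        v = [1.0 / (i + 1) for i in range(d)]  # arbitrary start vector
        for _step in range(iterations):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives its variance.
        variance = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        pairs.append((variance, v))
        # Deflate: subtract the direction just found so the next pass
        # converges to the next-largest component.
        work = [[work[i][j] - variance * v[i] * v[j] for j in range(d)]
                for i in range(d)]
    return pairs

# Hypothetical data: 8 examples, 3 correlated attributes.
rows = [
    [2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 0.8], [1.9, 2.2, 1.1],
    [3.1, 3.0, 0.2], [2.3, 2.7, 1.0], [1.0, 1.1, 1.6], [1.5, 1.6, 1.2],
]
cov = covariance_matrix(rows)
components = principal_components(cov)
for number, (variance, vector) in enumerate(components, start=1):
    print(f"PC{number}: variance={variance:.4f}, vector={vector}")
```

The component vectors come out mutually orthogonal, their variances are in decreasing order, and the variances sum to the total variance of the original attributes (the trace of the covariance matrix). The sign of each vector is arbitrary, so another PCA implementation may report the same component with all signs flipped.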
Input
- example set (Data Table)
This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.
Output
- weights (Attribute Weights)
This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.
- example set (Data Table)
The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
- normalize_weights
This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized in the range from 0 to 1. Range: boolean
- sort_weights
This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter. Range: boolean
- sort_direction
This parameter is only available when the sort weights parameter is set to true. This parameter specifies the sorting order of the attributes according to their weights. Range: selection
- component_number
This parameter specifies the number of the component that should be used as attribute weights. Range: integer
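The interplay of the four parameters can be sketched in Python as follows. This is a hypothetical illustration, not RapidMiner code: the function name, the attribute names and the component values are invented, component numbers are taken to start at 1, and normalization is assumed to be a min-max scaling of the absolute weights onto the range from 0 to 1, which the description above does not spell out.

```python
def weight_by_component(names, components, component_number=1,
                        normalize_weights=False, sort_weights=False,
                        sort_direction="ascending"):
    """Turn one component's entries into (attribute, weight) pairs."""
    # Assumption: component_number is 1-based, so 1 selects the first component.
    weights = list(components[component_number - 1])
    if normalize_weights:
        # Assumption: min-max scaling of absolute values onto [0, 1].
        magnitudes = [abs(w) for w in weights]
        low, high = min(magnitudes), max(magnitudes)
        span = high - low
        weights = [(m - low) / span if span else 1.0 for m in magnitudes]
    pairs = list(zip(names, weights))
    if sort_weights:
        # sort_direction only matters when sort_weights is true.
        pairs.sort(key=lambda pair: pair[1],
                   reverse=(sort_direction == "descending"))
    return pairs

# Hypothetical attribute names and placeholder component values.
names = ["a1", "a2", "a3"]
components = [
    [0.25, -0.75, 0.5],   # stands in for PC1
    [0.5, 0.5, 0.0],      # stands in for PC2
]

# With normalization and sorting off, the weights are the component values.
raw = weight_by_component(names, components, component_number=1)

# With both on, weights lie in [0, 1] and are ordered by size.
scaled = weight_by_component(names, components, component_number=1,
                             normalize_weights=True, sort_weights=True,
                             sort_direction="descending")
print(raw)
print(scaled)
```

Under the scaling assumed here, the sign of a component entry is discarded when normalization is on, so an attribute with a large negative entry receives a high weight.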
Tutorial Processes
Calculating the attribute weights of the Sonar data set by PCA
The 'Sonar' data set is loaded using the Retrieve operator. The PCA operator is applied to it, with the dimensionality reduction parameter set to 'none'. A breakpoint is inserted here so that you can inspect the components created by the PCA operator. Look at the EigenVectors generated by the PCA operator, especially 'PC1', because the Weight by Component Model operator will use it as weights. The Weight by Component Model operator is applied next. The ExampleSet and Model ports of the PCA operator are connected to the corresponding ports of the Weight by Component Model operator. The normalize weights and sort weights parameters are set to false, so the weights are exactly the values of the selected component. The component number parameter is set to 1, so 'PC1' is used as attribute weights. The weights can be seen in the Results Workspace, where you can verify that they are exactly the same as the values of 'PC1'.
In the second operator chain, the Weight by PCA operator is applied to the 'Sonar' data set to perform exactly the same task. Its parameters are set to the same values as those of the Weight by Component Model operator. As can be seen in the Results Workspace, exactly the same weights are generated here.