Generalized Hebbian Algorithm (RapidMiner Studio Core)

Synopsis

This operator implements the Generalized Hebbian Algorithm (GHA), an iterative method for computing principal components. The number of principal components to compute can be specified manually.

Description

The Generalized Hebbian Algorithm (GHA) is a linear feedforward neural network model for unsupervised learning with applications primarily in principal components analysis. From a computational point of view, it can be advantageous to solve the eigenvalue problem by iterative methods which do not need to compute the covariance matrix directly. This is useful when the ExampleSet contains many attributes (hundreds or even thousands).
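
The description above does not spell out the update rule. As a sketch, the form of the GHA usually given in the literature (Sanger's rule) is shown below; whether this operator uses exactly this variant is an assumption. The symbols are introduced here for illustration: x is one (centered) example with n attributes, W is the k-by-n weight matrix whose rows converge to the first k principal directions, y is the network output, \eta is the learning rate, and LT[.] keeps the lower-triangular part of a matrix, including the diagonal.

```latex
y = W x, \qquad
\Delta W = \eta \left( y\, x^{\top} - \operatorname{LT}\!\left[ y\, y^{\top} \right] W \right)

% Component-wise, for output i and attribute j:
\Delta w_{ij} = \eta \, y_i \left( x_j - \sum_{k \le i} w_{kj} \, y_k \right)
```

Each update uses only the current example x and the current weights, which is why the covariance matrix never has to be formed explicitly.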

Principal Component Analysis (PCA) is an attribute reduction procedure. It is useful when you have obtained data on a number of attributes (possibly a large number of attributes), and believe that there is some redundancy in those attributes. In this case, redundancy means that some of the attributes are correlated with one another, possibly because they are measuring the same construct. Because of this redundancy, you believe that it should be possible to reduce the observed attributes into a smaller number of principal components (artificial attributes) that will account for most of the variance in the observed attributes.

Principal Component Analysis is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated attributes into a set of values of uncorrelated attributes called principal components. The number of principal components is less than or equal to the number of original attributes. This transformation is defined in such a way that the first principal component's variance is as high as possible (accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it should be orthogonal to (uncorrelated with) the preceding components.
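
The properties described above (uncorrelated components, ordered by decreasing variance) can be checked with a small NumPy sketch. It uses the direct covariance-matrix route rather than the GHA, purely to illustrate what principal components are. The data is synthetic and every name in it is made up for this example: the second attribute is deliberately a noisy multiple of the first, so the three attributes carry roughly two attributes' worth of information.

```python
import numpy as np

# Synthetic data (an assumption for illustration): attribute 2 is largely
# redundant with attribute 1, attribute 3 is independent of both.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
X = np.column_stack([
    latent[:, 0],
    2.0 * latent[:, 0] + 0.1 * rng.normal(size=1000),
    latent[:, 1],
])

# Center the data and form the covariance matrix explicitly.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder to descending variance.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the principal directions (an orthogonal transformation).
scores = Xc @ eigvecs
score_cov = np.cov(scores, rowvar=False)

# Share of the total variance carried by each principal component.
explained = eigvals / eigvals.sum()
print(np.round(explained, 4))
```

In this sketch the covariance matrix of `scores` is diagonal up to rounding error, which is the "uncorrelated" property, and the first two entries of `explained` account for nearly all of the variance, which is what makes dropping the last component reasonable.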

Input

example set (Data Table)
This input port expects an ExampleSet. In the attached Example Process it is the output of the Retrieve operator, but the output of other operators can also be used as input. Meta data must be attached to the data, because the attributes are specified in the meta data; the Retrieve operator provides meta data along with the data. Please note that this operator cannot handle nominal attributes; it works on numerical attributes.

Output

example set (Data Table)
The Generalized Hebbian Algorithm is performed on the input ExampleSet and the resultant ExampleSet is delivered through this port.
original (Data Table)
The ExampleSet that was given as input is passed through this port to the output unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
preprocessing model (GHA Model)
This port delivers the GHA model.

Parameters

number_of_components
The number of components to keep is specified by the number of components parameter. If set to -1, the number of principal components in the resultant ExampleSet is equal to the number of attributes in the original ExampleSet. Range: integer
number_of_iterations
This parameter specifies the number of iterations to apply the update rule. Range: integer
learning_rate
This parameter specifies the learning rate of the GHA. Range: real
use_local_random_seed
This parameter indicates if a local random seed should be used for randomization. Range: boolean
local_random_seed
This parameter specifies the local random seed. It is available only if the use local random seed parameter is set to true. Range: integer

Tutorial Processes

Dimensionality reduction of the Polynomial data set using the GHA operator

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes. The Generalized Hebbian Algorithm operator is applied to the 'Polynomial' data set. The number of components parameter is set to 3, so the resultant ExampleSet will be composed of 3 principal components. All other parameters are used with default values. Run the process and you will see that the ExampleSet that had 5 attributes has been reduced to an ExampleSet with 3 principal components.
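
Outside RapidMiner, the same 5-to-3 reduction can be approximated with a short NumPy sketch. This is not the operator's actual implementation. It assumes the textbook GHA update (Sanger's rule), assumes the data is centered first, and assumes the random seed is used only to initialize the weights. The helpers `fit_gha` and `apply_gha` are hypothetical, the data is a synthetic stand-in for the 'Polynomial' data set, and the iteration count and learning rate are illustrative values rather than the operator's defaults.

```python
import numpy as np

def fit_gha(X, number_of_components, number_of_iterations, learning_rate, seed):
    """Hypothetical helper: estimate principal directions with Sanger's rule."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)            # assumption: data is centered before training
    Xc = X - mean
    n_attributes = X.shape[1]
    if number_of_components == -1:   # -1 keeps as many components as attributes
        number_of_components = n_attributes
    rng = np.random.default_rng(seed)
    # Small random initial weights; one row per component.
    W = rng.normal(scale=0.1, size=(number_of_components, n_attributes))
    for _ in range(number_of_iterations):
        for x in Xc:                 # one update per example, no covariance matrix
            y = W @ x
            W += learning_rate * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return mean, W

def apply_gha(X, mean, W):
    """Hypothetical helper: project examples onto the learned components."""
    return (np.asarray(X, dtype=float) - mean) @ W.T

# Synthetic stand-in for a 5-attribute ExampleSet with unequal variances.
rng = np.random.default_rng(0)
stds = np.array([2.0, 1.5, 1.0, 0.5, 0.25])
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))       # random orthogonal mixing
original = (rng.normal(size=(500, 5)) * stds) @ Q.T

mean, W = fit_gha(original, number_of_components=3,
                  number_of_iterations=30, learning_rate=0.002, seed=1)
reduced = apply_gha(original, mean, W)

print(original.shape, reduced.shape)   # (500, 5) (500, 3)
```

The three results loosely mirror the operator's output ports: `reduced` corresponds to the transformed ExampleSet, `original` is left untouched, and the pair `(mean, W)` plays the role of the preprocessing model that can be applied to new data.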
