Independent Component Analysis (RapidMiner Studio Core)
Synopsis
This operator performs Independent Component Analysis (ICA) on the given ExampleSet using the FastICA algorithm of Hyvärinen and Oja.
Description
Independent component analysis (ICA) is a general-purpose statistical technique. It linearly transforms observed random data into components that are as statistically independent from each other as possible and that also have "interesting" (non-Gaussian) distributions. Such a representation appears to capture the essential structure of the data in many applications, including feature extraction. ICA is used to reveal hidden factors that underlie sets of random variables or measurements. ICA is superficially related to principal component analysis (PCA) and factor analysis, but it is a more powerful technique. Because it exploits higher-order statistics rather than only second-order correlations, it can find the underlying factors or sources in cases where these classical methods fail completely. This operator implements the FastICA algorithm of A. Hyvärinen and E. Oja. The FastICA algorithm has most of the advantages of neural algorithms: it is parallel, distributed, computationally simple, and requires little memory.
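The following minimal NumPy sketch makes the FastICA idea concrete. It is not the operator's actual implementation. It assumes the classic recipe: first center and whiten the data. Then find each unmixing vector w with the fixed-point update w <- E{z g(w^T z)} - E{g'(w^T z)} w, using g(u) = tanh(alpha * u) (the log cosh contrast). Components are extracted one at a time (deflation), with Gram-Schmidt orthogonalization against the ones already found. The function names, the toy square-wave and uniform sources, and the mixing matrix are all hypothetical.

```python
import numpy as np

def whiten(X, n_components=None):
    """Center X (samples x attributes) and whiten it so its covariance is the identity."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variances first
    return Xc @ (eigvecs[:, order] / np.sqrt(eigvals[order]))

def fastica_deflation(Z, n_components, alpha=1.0, max_iter=200, tol=1e-4, seed=0):
    """Hypothetical one-unit FastICA fixed-point iteration, extracting components one at a time."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, Z.shape[1]))
    for p in range(n_components):
        w = rng.standard_normal(Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            u = Z @ w                               # projection of every sample onto w
            g = np.tanh(alpha * u)                  # g = G' for G(u) = log(cosh(alpha*u)) / alpha
            g_prime = alpha * (1.0 - g ** 2)        # derivative of g
            w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
            w_new -= W[:p].T @ (W[:p] @ w_new)      # Gram-Schmidt against earlier components
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol  # same direction up to sign
            w = w_new
            if converged:
                break
        W[p] = w
    return Z @ W.T                                  # estimated independent components

# Hypothetical toy data: two non-Gaussian sources mixed linearly.
rng = np.random.default_rng(42)
t = np.linspace(0, 8, 2000)
S = np.column_stack([np.sign(np.sin(3 * t)), rng.uniform(-1, 1, t.size)])
A = np.array([[1.0, 0.5], [0.7, 1.0]])              # hypothetical mixing matrix
X = S @ A.T                                         # observed data, samples x attributes
Y = fastica_deflation(whiten(X), n_components=2)
```

Each column of Y should match one of the original sources, up to sign, scale and order. These ambiguities are inherent to ICA.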
Input
- example set input (Data Table)
This input port expects an ExampleSet. In the attached Example Process, it is the output of the Retrieve operator. The output of other operators can also be used as input. Meta data must be attached to the input data because attributes are specified in their meta data; the Retrieve operator provides meta data along with the data. Please note that this operator cannot handle nominal attributes; it works only on numerical attributes.
Output
- example set output (Data Table)
Independent Component Analysis is performed on the input ExampleSet, and the resulting ExampleSet is delivered through this port.
- original (Data Table)
The ExampleSet that was given as input is passed to the output through this port without any changes. This is usually used to reuse the same ExampleSet in further operators or to view it in the Results Workspace.
- preprocessing model (Preprocessing Model)
This port delivers the preprocessing model, which contains information about the parameters of this operator in the current process.
Parameters
- dimensionality_reduction: This parameter indicates which type of dimensionality reduction (reduction in the number of attributes) should be applied.
- none: If this option is selected, no dimensionality reduction is performed.
- fixed_number: If this option is selected, only a fixed number of components is kept. The number of components to keep is specified by the number_of_components parameter.
- number_of_components: This parameter is only available when the dimensionality_reduction parameter is set to 'fixed_number'. It specifies the number of components to keep. Range: integer
- algorithm_type: This parameter specifies the type of algorithm to be used.
- parallel: If this option is selected, all components are extracted simultaneously.
- deflation: If this option is selected, the components are extracted one at a time.
- function: This parameter specifies the functional form of the G function used in the approximation of negentropy. Range: selection
- alpha: This parameter specifies the constant alpha, in the range [1, 2], used in the approximation of negentropy. Range: real
- row_norm: This parameter indicates whether the rows of the data matrix should be standardized beforehand. Range: boolean
- max_iteration: This parameter specifies the maximum number of iterations to perform. Range: integer
- tolerance: This parameter specifies a positive scalar giving the tolerance at which the unmixing matrix is considered to have converged. Range: real
- use_local_random_seed: This parameter indicates whether a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization. Range: boolean
- local_random_seed: This parameter specifies the local random seed. It is only available if the use_local_random_seed parameter is set to true. Range: integer
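The following plain-Python sketch shows what several of the parameters above control inside a FastICA iteration. It is a sketch under stated assumptions, not the operator's code, and all helper names are hypothetical. The selection values of the function parameter are not listed above, so the two contrast functions shown are an assumption. They are the ones commonly used with FastICA: log cosh (where alpha applies) and a Gaussian-type exp function. The convergence test is one common way to apply tolerance. The seeded starting vector mirrors the effect of use_local_random_seed and local_random_seed.

```python
import math
import random

def logcosh_contrast(u, alpha=1.0):
    """Return (g, g') for G(u) = log(cosh(alpha * u)) / alpha; alpha must lie in [1, 2]."""
    if not 1.0 <= alpha <= 2.0:
        raise ValueError("alpha must be in the range [1, 2]")
    t = math.tanh(alpha * u)
    return t, alpha * (1.0 - t * t)

def exp_contrast(u):
    """Return (g, g') for G(u) = -exp(-u**2 / 2); alpha plays no role here."""
    e = math.exp(-u * u / 2.0)
    return u * e, (1.0 - u * u) * e

def has_converged(w_old, w_new, tol=1e-4):
    """Unit vectors count as converged when |<w_old, w_new>| is within tol of 1 (sign is irrelevant)."""
    dot = sum(a * b for a, b in zip(w_old, w_new))
    return abs(abs(dot) - 1.0) < tol

def initial_unit_vector(dim, seed=None):
    """Random starting vector; a fixed seed reproduces the same start, hence the same result."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]

print(logcosh_contrast(0.0, alpha=1.5))   # (0.0, 1.5)
print(exp_contrast(0.0))                  # (0.0, 1.0)
print(has_converged([1.0, 0.0], [-1.0, 0.0]))  # True
print(initial_unit_vector(3, seed=7) == initial_unit_vector(3, seed=7))  # True
```

In this sketch, max_iteration would cap how often the fixed-point update repeats before has_converged returns True.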
Tutorial Processes
Dimensionality reduction of the Sonar data set using the Independent Component Analysis operator
The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet, which has 60 attributes. The Independent Component Analysis operator is then applied to the 'Sonar' data set, with the dimensionality_reduction parameter set to 'fixed_number' and the number_of_components parameter set to 10. The resulting ExampleSet is therefore composed of 10 components (artificial attributes). You can view it in the Results Workspace and verify that it has only 10 attributes. Please note that these attributes are not original attributes of the 'Sonar' data set; they were created by the ICA procedure.
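The tutorial's reduction from 60 attributes to 10 components can be mimicked outside RapidMiner. The NumPy sketch below is hypothetical and does not use the real Sonar data; instead, it builds a synthetic 500 x 60 array. It whitens the data while keeping only the 10 leading principal directions, which is the assumed role of number_of_components here. It then runs the parallel (symmetric) FastICA update with the log cosh contrast. It illustrates the technique and is not the operator's implementation.

```python
import numpy as np

def sym_decorrelate(W):
    """Symmetric decorrelation W <- (W W^T)^(-1/2) W, so the rows of W stay orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fastica_parallel(X, n_components, alpha=1.0, max_iter=200, tol=1e-4, seed=0):
    """Hypothetical helper: reduce X (samples x attributes) to n_components independent components."""
    # Center, then whiten while keeping only the n_components leading principal directions.
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = np.argsort(eigvals)[::-1][:n_components]
    Z = Xc @ (eigvecs[:, top] / np.sqrt(eigvals[top]))  # samples x n_components, identity covariance
    # Seeded random start; all rows of W are updated simultaneously ("parallel").
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(alpha * (Z @ W.T))                   # g(w_i^T z) for every sample and component
        G_prime = alpha * (1.0 - G ** 2)
        W_new = sym_decorrelate(G.T @ Z / Z.shape[0] - G_prime.mean(axis=0)[:, None] * W)
        # Converged when every new row points along its old row (up to sign).
        lim = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if lim < tol:
            break
    return Z @ W.T

# Hypothetical stand-in for a 60-attribute ExampleSet (not the real Sonar data).
rng = np.random.default_rng(1)
sources = rng.laplace(size=(500, 10))
mixing = rng.standard_normal((60, 10))
X = sources @ mixing.T + 0.01 * rng.standard_normal((500, 60))
components = fastica_parallel(X, n_components=10)
print(X.shape, "->", components.shape)  # (500, 60) -> (500, 10)
```

The reduced data are whitened and W stays orthogonal, so the 10 output columns are uncorrelated with unit variance whether or not the iteration converges. The fixed-point iterations are what push them beyond decorrelation toward independence.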