You are viewing the RapidMiner Studio documentation for version 10.0.

Self-Organizing Map (RapidMiner Studio Core)

Synopsis

This operator performs a dimensionality reduction of the given ExampleSet based on a self-organizing map (SOM). The user can specify the required number of dimensions.

Description

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps are different from other artificial neural networks in the sense that they use a neighborhood function to preserve the topological properties of the input space. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. The model was first described as an artificial neural network by Teuvo Kohonen, and is sometimes called a Kohonen map.

Like most artificial neural networks, SOMs operate in two modes: training and mapping. Training builds the map from the input examples; mapping places a new input vector onto the trained map. A self-organizing map consists of components called nodes or neurons. Each node has a weight vector of the same dimension as the input data vectors and a position in the map space. The nodes are usually arranged with regular spacing in a hexagonal or rectangular grid. The self-organizing map therefore describes a mapping from a higher-dimensional input space to a lower-dimensional map space. During training, the weight vector of the node closest to an input example, and those of its neighbors on the grid, are moved toward that example; this neighborhood update is what preserves the topology of the input space. To place a vector from the data space onto the map, first find the node whose weight vector is closest to that vector (the best matching unit); the vector is then represented by that node's position in the map space.
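The mapping step just described can be sketched in a few lines of plain Python. This is a minimal illustration, not RapidMiner's implementation: it assumes a rectangular grid and Euclidean distance, and the helper names `grid_positions`, `best_matching_unit` and `map_vector` are hypothetical.

```python
import itertools
import math


def grid_positions(net_size, dims):
    """All node positions of a rectangular grid with net_size nodes per edge."""
    return list(itertools.product(range(net_size), repeat=dims))


def best_matching_unit(x, weights):
    """Index of the node whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))


def map_vector(x, weights, positions):
    """Place x on the map: return the grid position of its closest node."""
    return positions[best_matching_unit(x, weights)]


# Tiny example: a 2x2 map (net_size=2, dims=2) over 3-dimensional input data.
positions = grid_positions(2, 2)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = [
    [0.0, 0.0, 0.0],  # node at (0, 0)
    [0.0, 1.0, 0.0],  # node at (0, 1)
    [1.0, 0.0, 0.0],  # node at (1, 0)
    [1.0, 1.0, 1.0],  # node at (1, 1)
]
print(map_vector([0.9, 0.1, 0.0], weights, positions))  # (1, 0)
```

Returning the winning node's grid position, rather than its weight vector, is what turns a 3-dimensional input into a 2-dimensional one here.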

Although this type of network structure is often considered related to feed-forward networks, with the nodes visualized as being attached, the SOM architecture is fundamentally different in arrangement and motivation.

Input

  • example set input (Data Table)

    This input port expects an ExampleSet. In the attached Example Process it is the output of the Retrieve operator; the output of other operators can also be used as input. The input must carry meta data, because the attributes are specified in their meta data; the Retrieve operator provides meta data along with the data. Note that this operator cannot handle nominal attributes; it works only on numerical attributes.

Output

  • example set output (Data Table)

    The ExampleSet produced by the SOM-based dimensionality reduction of the given ExampleSet is delivered through this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed to the output through this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

  • preprocessing model (Preprocessing Model)

    This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

  • return_preprocessing_model: This parameter indicates whether the preprocessing model should be returned. Range: boolean
  • number_of_dimensions: This parameter specifies the number of dimensions to keep, i.e. the number of attributes of the resultant ExampleSet. Range: integer
  • net_size: This parameter specifies the size of the SOM net by setting the length of every edge of the net. In total, the net contains net_size to the power of number_of_dimensions nodes (for example, 900 nodes for a net size of 30 and 2 dimensions). Range: integer
  • training_rounds: This parameter specifies the number of training rounds. Range: integer
  • learning_rate_start: This parameter specifies the strength of the adaptation in the first round. The strength decreases every round until it reaches learning_rate_end in the last round. Range: real
  • learning_rate_end: This parameter specifies the strength of the adaptation in the last round. Starting from learning_rate_start in the first round, the strength decreases to this value in the last round. Range: real
  • adaption_radius_start: This parameter specifies the radius of the sphere around a stimulus in the first round. The radius decreases every round, from adaption_radius_start in the first round to adaption_radius_end in the last round. Range: real
  • adaption_radius_end: This parameter specifies the radius of the sphere around a stimulus in the last round. The radius decreases every round, from adaption_radius_start in the first round to this value in the last round. Range: real
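The relation between net_size and the number of nodes, and the per-round decrease of the learning rate and the adaption radius, can be illustrated with a small Python sketch. The documentation only states that these values decrease every round from the start value to the end value; the linear interpolation used here is an assumption, the function names `node_count` and `schedule` are hypothetical, and the example values are illustrative rather than the operator's defaults.

```python
def node_count(net_size, number_of_dimensions):
    """Nodes in the net: net_size to the power of number_of_dimensions."""
    return net_size ** number_of_dimensions


def schedule(start, end, training_rounds):
    """Per-round values going from start (first round) to end (last round).

    ASSUMPTION: linear interpolation; the documentation only states that the
    value decreases every round from the start value to the end value.
    """
    if training_rounds == 1:
        return [start]
    return [start + (end - start) * r / (training_rounds - 1)
            for r in range(training_rounds)]


print(node_count(30, 2))       # 900
print(schedule(10.0, 1.0, 4))  # [10.0, 7.0, 4.0, 1.0]
learning_rates = schedule(0.8, 0.01, 5)  # decreasing from 0.8 to 0.01
```

Because the node count grows exponentially with number_of_dimensions, a large net_size combined with more than two or three dimensions quickly produces a very large net.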

Tutorial Processes

Dimensionality reduction of the Sonar data set using the Self-Organizing Map operator

The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet; it has 60 attributes. The Self-Organizing Map operator is then applied to the 'Sonar' data set with the number_of_dimensions parameter set to 2, so the resultant ExampleSet is composed of 2 dimensions (artificial attributes). You can see the resultant ExampleSet in the Results Workspace and verify that it has only 2 attributes. Note that these attributes are not original attributes of the 'Sonar' data set; they were created by the SOM procedure.
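To make the tutorial's steps concrete outside RapidMiner, the following self-contained Python sketch trains a small SOM on 60-attribute data and replaces each example by 2 new attributes. It is an illustration, not the operator's implementation: the data are random numbers standing in for the 'Sonar' set, and the definition of a round (one pass over all examples), the linear decay, the Gaussian neighborhood and the function names `train_som` and `som_reduce` are all assumptions. The keyword names mirror the operator's parameters, but the values are illustrative, not the operator's defaults.

```python
import itertools
import math
import random


def train_som(data, number_of_dimensions=2, net_size=5, training_rounds=10,
              learning_rate_start=0.8, learning_rate_end=0.01,
              adaption_radius_start=3.0, adaption_radius_end=0.5, seed=0):
    """Train a SOM and return (grid positions, node weight vectors).

    ASSUMPTIONS (not RapidMiner's documented behavior): one round is one pass
    over all examples in random order, the learning rate and radius decrease
    linearly, and the neighborhood is a Gaussian whose width is the radius.
    """
    rng = random.Random(seed)
    positions = list(itertools.product(range(net_size), repeat=number_of_dimensions))
    # Initialize every node's weight vector with a copy of a random example.
    weights = [list(rng.choice(data)) for _ in positions]
    for r in range(training_rounds):
        frac = r / max(training_rounds - 1, 1)  # 0 in the first round, 1 in the last
        rate = learning_rate_start + (learning_rate_end - learning_rate_start) * frac
        radius = adaption_radius_start + (adaption_radius_end - adaption_radius_start) * frac
        for x in rng.sample(data, len(data)):
            # Best matching unit: the node whose weight vector is closest to x.
            bmu = min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
            for pos, w in zip(positions, weights):
                d = math.dist(pos, positions[bmu])  # distance on the map grid
                h = math.exp(-d * d / (2 * radius * radius))  # neighborhood strength
                for k in range(len(w)):
                    w[k] += rate * h * (x[k] - w[k])  # pull the node toward x
    return positions, weights


def som_reduce(data, positions, weights):
    """Replace each example by the grid coordinates of its best matching unit."""
    reduced = []
    for x in data:
        bmu = min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
        reduced.append(list(positions[bmu]))
    return reduced


# Random stand-in for a 60-attribute ExampleSet (NOT the Sonar data).
rng = random.Random(42)
data = [[rng.random() for _ in range(60)] for _ in range(40)]
positions, weights = train_som(data, number_of_dimensions=2)
reduced = som_reduce(data, positions, weights)
print(len(reduced), len(reduced[0]))  # 40 2
```

In this sketch the two new attributes are the grid coordinates of each example's best matching node, so they take integer values between 0 and net_size - 1; the exact values RapidMiner writes into its artificial attributes may differ.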