Map Clustering on Labels (RapidMiner Studio Core)

Synopsis

This operator converts the cluster attribute into a prediction attribute.

Description

The Map Clustering on Labels operator expects a clustered ExampleSet and a cluster model as input. Using these inputs, it estimates a mapping between the given clustering and a prediction: it matches the given clusters against the given labels and estimates the best-fitting cluster/label pairs. The resulting ExampleSet has a prediction attribute that is derived from the cluster attribute.
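The idea of estimating best-fitting cluster/label pairs can be sketched outside RapidMiner. The Python snippet below is only an illustration under stated assumptions: it tries every one-to-one assignment of label values to clusters and keeps the one with the most agreements. The function name map_clusters_to_labels, the label values 'pos' and 'neg', and the brute-force search are all hypothetical; the operator's actual algorithm is not described here and may differ.

```python
from itertools import permutations


def map_clusters_to_labels(clusters, labels):
    """Return a dict cluster -> label with the most agreements.

    Illustrative helper (not a RapidMiner API): tries every one-to-one
    assignment of label values to clusters, so it only suits a small
    number of clusters.
    """
    cluster_ids = sorted(set(clusters))
    label_values = sorted(set(labels))
    if len(cluster_ids) > len(label_values):
        raise ValueError("more clusters than label values")

    # Count how often each (cluster, label) pair occurs together.
    counts = {}
    for cluster, label in zip(clusters, labels):
        counts[(cluster, label)] = counts.get((cluster, label), 0) + 1

    best_mapping, best_hits = None, -1
    for perm in permutations(label_values, len(cluster_ids)):
        hits = sum(counts.get(pair, 0) for pair in zip(cluster_ids, perm))
        if hits > best_hits:
            best_mapping, best_hits = dict(zip(cluster_ids, perm)), hits
    return best_mapping


# Hypothetical cluster assignments and labels for six examples.
clusters = ["cluster_0", "cluster_0", "cluster_1",
            "cluster_1", "cluster_1", "cluster_0"]
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]

mapping = map_clusters_to_labels(clusters, labels)
predictions = [mapping[c] for c in clusters]
print(mapping)
print(predictions)
```

In this made-up data, 'cluster_0' co-occurs mostly with 'pos' and 'cluster_1' mostly with 'neg', so that pairing is chosen and every example receives the label paired with its cluster.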

Input

example set (Data Table)
This input port expects a clustered ExampleSet. It is the output of the K-Means operator in the attached Example Process.
cluster model (Centroid Cluster Model)
This input port expects a cluster model. It is the output of the K-Means operator in the attached Example Process.

Output

example set (Data Table)
The ExampleSet with the new prediction attribute, which is derived from the cluster attribute, is delivered through this port.
cluster model (Centroid Cluster Model)
The cluster model that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same cluster model in further operators or to view the cluster model in the Results Workspace.

Tutorial Processes

Introduction to the Map Clustering on Labels operator

The 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is used only for visualization and comparison, not for building the clusters. Besides the label attribute, the 'Ripley-Set' has two real attributes: 'att1' and 'att2'. The K-Means operator is applied on this data set with default values for all parameters. Run the process and you will see that the K-Means operator creates two new attributes. The id attribute identifies each example uniquely. The cluster attribute shows which cluster each example belongs to. As parameter k was set to 2, only two clusters are possible. That is why each example is assigned to either 'cluster_0' or 'cluster_1'.

This clustered ExampleSet and the cluster model are provided as input to the Map Clustering on Labels operator. The resulting ExampleSet can be seen in the Results Workspace. You can see that the ExampleSet now has a prediction attribute. You can also observe that the values of this attribute have been derived from the cluster attribute.
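To make the last observation concrete, the following Python sketch shows how a prediction column could be derived from a cluster column once a cluster-to-label mapping is known, and how it might be compared with the label. Everything here is assumed for illustration: the rows are invented rather than taken from the 'Ripley-Set', the label values 'pos' and 'neg' are hypothetical, the mapping is given by hand, and the column name 'prediction(label)' is an assumption about how the prediction attribute is named.

```python
# Invented rows that mimic a clustered ExampleSet (not real 'Ripley-Set' data).
rows = [
    {"id": 1, "att1": 0.2, "att2": 0.7, "label": "pos", "cluster": "cluster_0"},
    {"id": 2, "att1": 0.3, "att2": 0.6, "label": "pos", "cluster": "cluster_0"},
    {"id": 3, "att1": -0.5, "att2": 0.1, "label": "neg", "cluster": "cluster_1"},
    {"id": 4, "att1": -0.4, "att2": 0.2, "label": "pos", "cluster": "cluster_1"},
]

# Assumed outcome of the mapping step: each cluster is paired with one label.
mapping = {"cluster_0": "pos", "cluster_1": "neg"}

# Derive the prediction purely from the cluster value of each row.
for row in rows:
    row["prediction(label)"] = mapping[row["cluster"]]

# Fraction of rows whose derived prediction agrees with the label.
agreement = sum(r["prediction(label)"] == r["label"] for r in rows) / len(rows)
print(agreement)
```

Because the prediction depends only on the cluster value, two examples in the same cluster always receive the same prediction, even when their labels differ (as for the rows with id 3 and 4 above).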