(AI Studio Core)

Synopsis

This operator implements k-Means following the accelerated algorithm of C. Elkan. It creates a cluster attribute if one is not already present.

Description

Compared with the standard implementation of k-means, this implementation is much faster in many cases, especially for data sets with many attributes and a high value of k, but it also requires additional memory. For more information, see the paper: C. Elkan, "Using the Triangle Inequality to Accelerate k-Means", Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
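The speed-up comes from the triangle inequality named in the paper title: for a point x with current closest centroid c, any other centroid c' with d(c, c') >= 2 d(x, c) cannot be closer to x, so d(x, c') need not be computed. The following Python sketch illustrates only this pruning rule within a single assignment pass. It is not the operator's implementation; the full algorithm additionally keeps per-point distance bounds between iterations, which is where the extra memory goes. The function names `dist` and `assign_with_pruning` are made up for this illustration, and Euclidean distance is assumed.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centroids):
    """Assign each point to its nearest centroid.

    Distance computations are skipped whenever the triangle inequality
    proves that a centroid cannot be closer than the current best one.
    Returns (assignments, skipped), where skipped counts the avoided
    point-to-centroid distance computations.
    """
    k = len(centroids)
    # Centroid-to-centroid distances are computed once per pass.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    assignments, skipped = [], 0
    for p in points:
        best, best_d = 0, dist(p, centroids[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(p, c_best), then
            # d(p, c_j) >= d(p, c_best), so c_j cannot be strictly closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(p, centroids[j])
            if d < best_d:
                best, best_d = j, d
        assignments.append(best)
    return assignments, skipped


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.2, 9.9)]
    cts = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0)]
    print(assign_with_pruning(pts, cts))
```

The pruned pass returns the same assignments as a brute-force nearest-centroid search; only the number of distance computations differs. The saving grows with the number of attributes (each distance is more expensive) and with k (more candidates can be skipped), which matches the cases the description highlights.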

Input

  • example set (IOObject)

    This is an example set input port.

Output

  • cluster model (Centroid Cluster Model)

  • clustered set (Data Table)

Parameters

  • add cluster attribute

    If enabled, a cluster id is generated as a new special attribute directly in this operator; otherwise this operator does not add an id attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

  • add as label

    If true, the cluster id is stored in an attribute with the special role 'label' instead of 'cluster'.

  • remove unlabeled

    Delete the unlabeled examples.

  • k

    The number of clusters which should be detected.

  • determine good start values

    Determine the first k centroids using the k-means++ heuristic described in "k-means++: The Advantages of Careful Seeding" by David Arthur and Sergei Vassilvitskii, 2007.

  • measure types

    The measure type.

  • mixed measure

    Select measure.

  • nominal measure

    Select measure.

  • numerical measure

    Select measure.

  • divergence

    Select divergence.

  • kernel type

    The kernel type.

  • kernel gamma

    The kernel parameter gamma.

  • kernel sigma1

    The kernel parameter sigma1.

  • kernel sigma2

    The kernel parameter sigma2.

  • kernel sigma3

    The kernel parameter sigma3.

  • kernel degree

    The kernel parameter degree.

  • kernel shift

    The kernel parameter shift.

  • kernel a

    The kernel parameter a.

  • kernel b

    The kernel parameter b.

  • max runs

    The maximal number of runs of k-Means with random initialization that are performed.

  • max optimization steps

    The maximal number of iterations performed for one run of k-Means.

  • use local random seed

    Indicates if a local random seed should be used.

  • local random seed

    Specifies the local random seed.
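
The k-means++ heuristic referenced by the determine good start values parameter can be sketched in Python as follows. This is a minimal illustration of the seeding idea from the cited paper, not the operator's code: the first centroid is drawn uniformly at random, and each further centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The names `sq_dist` and `kmeans_pp_seeds` are hypothetical, squared Euclidean distance is assumed, and the `seed` argument only mimics the role of a local random seed.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=None):
    """Pick k initial centroids from points using k-means++ sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # A private generator keeps results reproducible for a fixed seed.
    rng = random.Random(seed)
    centroids = [points[rng.randrange(len(points))]]
    # Squared distance from each point to its nearest chosen centroid.
    d2 = [sq_dist(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a chosen centroid; pick uniformly.
            nxt = points[rng.randrange(len(points))]
        else:
            # Points far from all current centroids are more likely picks.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(nxt)
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
    print(kmeans_pp_seeds(data, k=3, seed=1992))
```

Calling the function twice with the same seed returns the same centroids, which is the kind of reproducibility a fixed local random seed is meant to provide.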