X-Means (AI Studio Core)

Synopsis

Clustering using X-Means. This operator implements the algorithm published by Dan Pelleg and Andrew Moore.

Description

X-Means is a clustering algorithm that estimates the number of centroids using a heuristic. It begins with a minimum set of centroids and then iteratively explores whether using more centroids makes sense according to the data. Whether a cluster is split into two sub-clusters is determined by the Bayesian Information Criterion (BIC), which balances the trade-off between precision and model complexity. Original publication: "X-means: Extending K-means with Efficient Estimation of the Number of Clusters" by Dan Pelleg and Andrew Moore, Proceedings of the Seventeenth International Conference on Machine Learning, 2000.
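The split decision described above can be illustrated with a minimal Python sketch. It is not the operator's implementation: the function names (sq_dist, mean, bic, two_means, should_split) are hypothetical, the BIC uses a simplified spherical Gaussian model with one shared variance, and the 2-means step uses a deterministic farthest-point start purely to keep the example reproducible.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of numbers)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def bic(clusters):
    """BIC score (higher is better) for a list of non-empty clusters.

    Assumption: clusters are spherical Gaussians sharing one variance.
    """
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    d = len(clusters[0][0])
    sse = 0.0
    for c in clusters:
        center = mean(c)
        sse += sum(sq_dist(p, center) for p in c)
    # Guard against a zero variance when all points coincide.
    variance = max(sse / (d * (n - k)), 1e-12)
    log_likelihood = (
        sum(len(c) * math.log(len(c) / n) for c in clusters)
        - 0.5 * n * d * math.log(2 * math.pi * variance)
        - sse / (2 * variance)
    )
    # Free parameters: k - 1 mixing weights, k * d centroid
    # coordinates, and one shared variance.
    free_params = (k - 1) + k * d + 1
    return log_likelihood - 0.5 * free_params * math.log(n)


def two_means(points, max_steps=100):
    """Split points into two clusters with Lloyd's algorithm."""
    first = points[0]
    second = max(points, key=lambda p: sq_dist(p, first))
    centroids = [first, second]
    for _ in range(max_steps):
        clusters = [[], []]
        for p in points:
            nearest = min((0, 1), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        if not clusters[0] or not clusters[1]:
            break
        new_centroids = [mean(c) for c in clusters]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters


def should_split(points):
    """Return True if two sub-clusters score a higher BIC than one cluster."""
    if len(points) < 4:
        return False
    children = two_means(points)
    if not children[0] or not children[1]:
        return False
    return bic(children) > bic([points])
```

A full X-Means run would apply such a test to every current cluster and stop once no split improves the score or the maximum number of clusters is reached.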

Input

  • example set (IOObject)

    This is an example set input port.

Output

  • cluster model (Centroid Cluster Model)

  • clustered set (Data Table)

Parameters

  • add cluster attribute

    If enabled, a cluster id is generated as new special attribute directly in this operator, otherwise this operator does not add an id attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

  • add as label

    If true, the cluster id is stored in an attribute with the special role 'label' instead of 'cluster'.

  • remove unlabeled

    Delete the unlabeled examples.

  • k min

    The minimal number of clusters which should be detected.

  • k max

    The maximal number of clusters which should be detected.

  • determine good start values

    Determine the first k centroids using the K-Means++ heuristic described in "k-means++: The Advantages of Careful Seeding" by David Arthur and Sergei Vassilvitskii, 2007.

  • measure types

    The measure type.

  • mixed measure

    Select measure.

  • nominal measure

    Select measure.

  • numerical measure

    Select measure.

  • divergence

    Select divergence.

  • kernel type

    The kernel type.

  • kernel gamma

    The kernel parameter gamma.

  • kernel sigma1

    The kernel parameter sigma1.

  • kernel sigma2

    The kernel parameter sigma2.

  • kernel sigma3

    The kernel parameter sigma3.

  • kernel degree

    The kernel parameter degree.

  • kernel shift

    The kernel parameter shift.

  • kernel a

    The kernel parameter a.

  • kernel b

    The kernel parameter b.

  • clustering algorithm

    Clustering algorithm.

  • max runs

    The maximal number of runs of k-Means with random initialization that are performed.

  • max optimization steps

    The maximal number of iterations performed for one run of k-Means.

  • use local random seed

    Indicates if a local random seed should be used.

  • local random seed

    Specifies the local random seed.
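
The K-Means++ seeding named by the determine good start values parameter, together with the effect of a fixed random seed, can be sketched as follows. This is a minimal illustration in Python, not the operator's code: the function kmeans_pp_seeds and its seed argument are hypothetical names, and squared Euclidean distance is assumed as the measure.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of numbers)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=None):
    """Pick k starting centroids from points using k-means++ style seeding.

    The first centroid is drawn uniformly at random. Each further
    centroid is drawn with probability proportional to its squared
    distance from the nearest centroid chosen so far.
    """
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # All points coincide with a chosen centroid: fall back
            # to a uniform draw.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_pp_seeds(data, 2, seed=7))
```

Because points that are already centroids get weight zero, the seeds tend to spread across the data, and calling the function twice with the same seed returns the same centroids.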