Sample (Kennard-Stone) (RapidMiner Studio Core)

Synopsis

This operator creates a sample from the given ExampleSet using the Kennard-Stone algorithm. The size of the sample can be specified on an absolute or relative basis.

Description

The Sample (Kennard-Stone) operator performs Kennard-Stone sampling. The algorithm works as follows:

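  • Find the two most separated points in the ExampleSet and select them.
  • For each remaining candidate point, find the smallest distance to any already selected point.
  • Select the candidate point with the largest of these smallest distances, then repeat the previous two steps until the sample has the requested size.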
Because the two starting points are always the same, the algorithm is deterministic: it always returns the same result for the same input. This implementation reduces the number of iterations by maintaining a list of candidates for the largest smallest distance. Note that, due to the way the algorithm works, the number of examples in the sample may not exactly match the specified number.
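The selection steps above can be sketched in Python as follows. This is a minimal illustration, not the operator's implementation: it assumes Euclidean distance over purely numerical attributes, breaks ties by the lowest example index, and, instead of the operator's candidate list, keeps an array of each candidate's smallest distance to the selected points and updates it incrementally after each pick. The function names kennard_stone and euclidean are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: Euclidean distance over numerical attributes.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of up to k points, in the order they are selected."""
    n = len(points)
    if k <= 0 or n == 0:
        return []
    if n == 1:
        return [0]

    # Step 1: exhaustive search for the two most separated points.
    best_d, first, second = -1.0, 0, 1
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(points[i], points[j])
            if d > best_d:
                best_d, first, second = d, i, j
    selected = [first, second]
    remaining = set(range(n)) - set(selected)

    # Step 2: smallest distance from each candidate to any selected point.
    min_dist = {c: min(euclidean(points[c], points[s]) for s in selected)
                for c in remaining}

    # Step 3: pick the candidate with the largest smallest distance, repeat.
    while len(selected) < min(k, n):
        # Ties are broken by the lowest index (an assumption of this sketch).
        nxt = max(remaining, key=lambda c: (min_dist[c], -c))
        selected.append(nxt)
        remaining.discard(nxt)
        del min_dist[nxt]
        # Only the newly selected point can lower a candidate's smallest distance.
        for c in remaining:
            min_dist[c] = min(min_dist[c], euclidean(points[c], points[nxt]))

    return selected[:k]


pts = [[0.0], [1.0], [2.0], [3.0], [10.0]]
print(kennard_stone(pts, 3))  # [0, 4, 3]
```

In the usage example, points 0 and 10 are the most separated pair, and point 3 is chosen next because its distance to the nearer of the two (3.0) is the largest. Calling the function again on the same data returns the same indices, which reflects the deterministic behavior described above. Unlike the operator, this sketch always returns exactly min(k, n) indices.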

The sampling operators are similar in principle to the Filter Examples operator: they take an ExampleSet as input and deliver a subset of it as output. The difference is that the Filter Examples operator selects examples on the basis of specified conditions, whereas the Sample operators focus on the number of examples and the class distribution in the resulting sample. Many Sample operators draw their samples randomly; the Sample (Kennard-Stone) operator, in contrast, selects examples deterministically, as described above. The number of examples in the sample can be specified on an absolute or relative basis, depending on the setting of the sample parameter.

Input

  • example set input (IOObject)

    This input port expects an ExampleSet. In the attached Example Process, it is the output of the Retrieve operator.

Output

  • example set output (IOObject)

    The sample produced by applying the Kennard-Stone algorithm to the input ExampleSet is delivered through this port.

  • original (IOObject)

    The ExampleSet that was given as input is passed to the output through this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view it in the Results Workspace.

Parameters

  • sample: This parameter determines how the amount of data is specified.
    • absolute: If the sample parameter is set to 'absolute', the sample size is given as an absolute number of examples. The required number of examples is specified in the sample_size parameter.
    • relative: If the sample parameter is set to 'relative', the sample size is given as a fraction of the total number of examples in the input ExampleSet. The required ratio is specified in the sample_ratio parameter.
    Range: selection
  • sample_size: This parameter specifies the number of examples that should be sampled. It is only available when the sample parameter is set to 'absolute'. Range: integer
  • sample_ratio: This parameter specifies the fraction of examples that should be sampled. It is only available when the sample parameter is set to 'relative'. Range: real
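The interplay of the sample, sample_size and sample_ratio parameters can be illustrated with a small helper that turns them into a target number of examples. This is a hypothetical sketch: the function resolve_sample_size is not part of RapidMiner, and both the rounding of the relative fraction to the nearest whole example and the capping at the input size are assumptions, not documented behavior of the operator.

```python
import math
from typing import Optional


def resolve_sample_size(n_examples: int, sample: str,
                        sample_size: Optional[int] = None,
                        sample_ratio: Optional[float] = None) -> int:
    """Hypothetical helper: translate the sample parameters into a target count."""
    if sample == "absolute":
        if sample_size is None:
            raise ValueError("sample_size is required when sample is 'absolute'")
        target = sample_size
    elif sample == "relative":
        if sample_ratio is None or not 0.0 <= sample_ratio <= 1.0:
            raise ValueError("sample_ratio must lie between 0 and 1")
        # Assumption: the fraction is rounded to the nearest whole example.
        target = math.floor(n_examples * sample_ratio + 0.5)
    else:
        raise ValueError(f"unknown sample mode: {sample!r}")
    # Assumption: a sample cannot contain more examples than the input ExampleSet.
    return max(0, min(target, n_examples))


print(resolve_sample_size(150, "absolute", sample_size=15))   # 15
print(resolve_sample_size(150, "relative", sample_ratio=0.1))  # 15
```

With 150 input examples, sample_size = 15 in 'absolute' mode and sample_ratio = 0.1 in 'relative' mode request the same sample size.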

Tutorial Processes

Kennard-Stone sampling of the Iris data set

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet, which has 150 examples. The Sample (Kennard-Stone) operator is then applied to the ExampleSet with the sample parameter set to 'absolute' and the sample_size parameter set to 15, so the resulting sample has only 15 examples. This ExampleSet can be seen in the Results Workspace.