Wrapper Split Validation (RapidMiner Studio Core)

Synopsis

A simple validation method to check the performance of a feature weighting or selection wrapper.

Description

This operator evaluates the performance of feature weighting algorithms including feature selection. The first inner operator is the weighting algorithm to be evaluated itself. It must return an attribute weights vector which is applied on the data. Then a new model is created using the second inner operator and a performance is retrieved using the third inner operator. This performance vector serves as a performance indicator for the actual algorithm. This implementation is described for the RandomSplitValidationChain.

Input

  • example set in (Data Table)

    This input port expects an ExampleSet. Subsets of this ExampleSet will be used as training and testing data sets.

Output

  • performance vector out (Performance Vector)

    The Model Evaluation subprocess must return a Performance Vector in each iteration. This is usually generated by applying the model and measuring its performance. Please note that the statistical performance calculated by this estimation scheme is only an estimate (instead of an exact calculation) of the performance which would be achieved with the model built on the complete delivered data set.

  • attribute weights out (Attribute Weights)

    The Attribute Weighting subprocess must return an attribute weights vector in each iteration. Please note that the attribute weights vector built on the complete input ExampleSet is delivered from this port.

Parameters

  • split_ratioRelative size of the training set. Range:
  • sampling_typeThe Wrapper Split Validation operator can use several types of sampling for building the subsets. Following options are available:
    • linear_sampling: The linear sampling simply divides the ExampleSet into partitions without changing the order of the examples i.e. subsets with consecutive examples are created.
    • shuffled_sampling: The shuffled sampling builds random subsets of the ExampleSet. Examples are chosen randomly for making subsets.
    • stratified_sampling: The stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. For example, in the case of a binominal classification, stratified sampling builds random subsets such that each subset contains roughly the same proportions of the two values of class labels.
    • automatic: The automated mode uses stratified sampling per default. If it isn't applicable, e.g., if the ExampleSet doesn't contain a nominal label, shuffled sampling will be used instead.
    Range:
  • use_local_random_seedThis parameter indicates if a local random seed should be used for randomizing examples of a subset. Using the same value of the local random seed will produce the same subsets. Changing the value of this parameter changes the way examples are randomized, thus subsets will have a different set of examples. This parameter is available only if shuffled, stratified or automatic sampling is selected. It is not available for linear sampling because it requires no randomization, examples are selected in sequence. Range:
  • local_random_seedThis parameter specifies the local random seed. This parameter is available only if the use local random seed parameter is set to true. Range: