Categories

Versions

(AI Studio Core)

Synopsis

This operator selects the most relevant attributes of the given ExampleSet. Two deterministic greedy feature selection algorithms 'forward selection' and 'backward elimination' are used for feature selection.

Description

Feature selection i.e. the question for the most relevant features for classification or regression problems, is one of the main data mining tasks. A wide range of search methods have been integrated into Altair RapidMiner including evolutionary algorithms. For all search methods we need a performance measurement which indicates how well a search point (a feature subset) will probably perform on the given data set.

A deterministic algorithm is an algorithm which, in informal terms, behaves predictably. Given a particular input, it will always produce the same output, and the underlying machine will always pass through the same sequence of states.

A greedy algorithm is an algorithm that follows the problem solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. On some problems, a greedy strategy may not produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution.

This operator realizes the two deterministic greedy feature selection algorithms 'forward selection' and 'backward elimination'. However, we have added some enhancements to the standard algorithms which are described below:

Forward Selection

Create an initial population with n individuals where n is the number of attributes in the input ExampleSet. Each individual will use exactly one of the features. Evaluate the attribute sets and select only the best k. For each of the k attribute sets do: If there are j unused attributes, make j copies of the attribute set and add exactly one of the previously unused attributes to the attribute set. As long as the performance improved in the last p iterations go to step 2

Backward Elimination

Start with an attribute set which uses all features. Evaluate all attribute sets and select the best k. For each of the k attribute sets do: If there are j attributes used, make j copies of the attribute set and remove exactly one of the previously used attributes from the attribute set. As long as the performance improved in the last p iterations go to step 2

The parameter k can be specified by the keep best parameter, the parameter p can be specified by the generations without improval parameter. These parameters have default values 1 which means that the standard selection algorithms are used. Using other values increases the runtime but might help to avoid local extrema in the search for the global optimum.

Another unusual parameter is the maximum number of generations parameter. This parameter bounds the number of iterations to this maximum of feature selections / de-selections. In combination with the generations without improval parameter, this allows several different selection schemes (which are described for forward selection, backward elimination works analogous):

maximum number of generations = m and generations without improval = p:

Selects maximal m features. The selection stops if no performance improvement was measured in the last p generations.

maximum number of generations = -1 and generations without improval = p:

Tries to selects new features until no performance improvement was measured in the last p generations.

maximum number of generations = m and generations without improval = -1:

Selects maximal m features. The selection stops is not stopped until all combinations with maximal m were tried. However, the result might contain fewer features than these.

maximum number of generations = -1 and generations without improval = -1:

Test all combinations of attributes (brute force, this might take a very long time and should only be applied to small attribute sets).

Differentiation

Optimize Selection (Evolutionary)

This is also an attribute set reduction operator but it uses a genetic algorithm for this purpose.

Input

  • example set in (Data table)

    This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

  • through (IOObject)

    This operator can have multiple through ports. When one input is connected with the through port, another through port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The Object supplied at the first through port of this operator is available at the first through port of the nested chain (inside the subprocess). Do not forget to connect all inputs in correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output

  • example set out (Data table)

    The feature selection algorithm is applied on the input ExampleSet. The resultant ExampleSet with reduced attributes is delivered through this port.

  • weights (Attribute Weights)

    The attribute weights are delivered through this port.

  • performance (Performance Vector)

    This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

  • selection directionThis parameter specifies which of the 'forward selection' and 'backward elimination' algorithms should be used.
  • limit generations without improvalThis parameter indicates if the optimization should be aborted if this number of generations showed no improvement. If unchecked, always the maximal number of generations will be used.
  • generations without improvalThis parameter is only available when the limit generations without improval parameter is set to true. This parameter specifies the stop criterion for early stopping i.e. it stops after n generations without improvement in the performance. n is specified by this parameter.
  • limit number of generationsThis parameter indicates if the number of generations should be limited to a specific number.
  • keep bestThe best n individuals are kept in each generation where n is the value of this parameter.
  • maximum number of generationsThis parameter is only available when the limit number of generations parameter is set to true. This parameter specifies the number of generations after which the algorithm should be terminated.
  • normalize weightsThis parameter indicates if the final weights should be normalized. If set to true, the final weights are normalized such that the maximum weight is 1 and the minimum weight is 0.
  • use local random seedThis parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.
  • local random seedThis parameter specifies the local random seed and is only available if the use local random seed parameter is set to true.
  • user result individual selectionIf this parameter is set to true, it allows the user to select the final result individual from the last population.
  • show population plotterThis parameter determines if the current population should be displayed in the performance space.
  • plot generationsThis parameter is only available when the show population plotter parameter is set to true. The population plotter is updated in these generations.
  • constraint draw rangeThis parameter is only available when the show population plotter parameter is set to true. This parameter determines if the draw range of the population plotter should be constrained between 0 and 1.
  • draw dominated pointsThis parameter is only available when the show population plotter parameter is set to true. This parameter determines if only points which are not Pareto dominated should be drawn on the population plotter.
  • population criteria data fileThis parameter specifies the path to the file in which the criteria data of the final population should be saved.
  • maximal fitnessThis parameter specifies the maximal fitness. The optimization will stop if the fitness reaches this value.

Tutorial Processes

Feature reduction of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes other then the label attribute. The Optimize Selection operator is applied on the ExampleSet which is a nested operator i.e. it has a subprocess. It is necessary for the subprocess to deliver a performance vector. This performance vector is used by the underlying feature reduction algorithm. Have a look at the subprocess of this operator. The Split Validation operator has been used there which itself is a nested operator. Have a look at the subprocesses of the Split Validation operator. The Support Vector Machine operator is used in the 'Training' subprocess to train a model. The trained model is applied using the Apply Model operator in the 'Testing' subprocess. The performance is measured through the Performance operator and the resultant performance vector is used by the underlying algorithm. Run the process and switch to the Results Workspace. You can see that the ExampleSet that had 5 attributes has now been reduced to 2 attributes.