Optimize Parameters (Evolutionary) (RapidMiner Studio Core)

Synopsis

This operator finds the optimal values of the selected parameters of the operators in its subprocess. It uses an evolutionary computation approach.

Description

This operator finds the optimal values for a set of parameters using an evolutionary approach, which is often more appropriate than a grid search (as in the Optimize Parameters (Grid) operator) or a greedy search (as in the Optimize Parameters (Quadratic) operator) and often leads to better results. This is a nested operator, i.e. it has a subprocess. It executes its subprocess multiple times to find optimal values for the specified parameters.

This operator delivers the optimal parameter values through the parameter port. This parameter set can also be written into a file with the Write Parameters operator and then read in another process using the Read Parameters operator. The performance vector for the optimal parameter values is delivered through the performance port. Any additional results of the subprocess are delivered through the result ports.

Other parameter optimization schemes are also available in RapidMiner. The Optimize Parameters (Evolutionary) operator might be useful if the best ranges and dependencies are not known at all. Another operator that works similarly to the parameter optimization operators is the Loop Parameters operator. In contrast to the optimization operators, the Loop Parameters operator simply iterates through all parameter combinations. This might be especially useful for plotting purposes.

Differentiation

Optimize Parameters (Grid)

The Optimize Parameters (Grid) operator executes its subprocess for all combinations of the selected values of the parameters and then delivers the optimal parameter values.
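
For contrast with the evolutionary approach, the exhaustive behaviour described above can be sketched in a few lines of Python. This is only an illustration, not RapidMiner code: `grid_search`, `toy_objective`, and the value lists are hypothetical names and numbers, and the objective merely stands in for the performance delivered by the subprocess (higher is assumed to be better).

```python
from itertools import product


def grid_search(objective, grid):
    """Evaluate the objective for every combination of the listed values."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)  # stands in for one subprocess execution
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical performance surface with its peak at C = 10, gamma = 0.5.
    return -((params["C"] - 10.0) ** 2) - (params["gamma"] - 0.5) ** 2


# Hypothetical value lists; 4 x 3 = 12 combinations are evaluated.
grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.1, 0.5, 1.0]}
best_params, best_score = grid_search(toy_objective, grid)
print(best_params)
```

The number of evaluations is the product of the list lengths, so the cost grows quickly as more parameters or values are added.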

Input

  • input (IOObject)

    This operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first input port of this operator is available at the first input port of the nested chain (inside the subprocess). Do not forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output

  • performance (Performance Vector)

    This port delivers the Performance Vector for the optimal values of the selected parameters. A Performance Vector is a list of performance criteria values.

  • parameter (Parameter Set)

    This port delivers the optimal values of the selected parameters. This optimal parameter set can be written into a file with the Write Parameters operator. The written parameter set can be read in another process using the Read Parameters operator.

  • result (IOObject)

    Any additional results of the subprocess are delivered through the result ports. This operator can have multiple outputs. When one result port is connected, another result port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first result port of the subprocess is delivered at the first result port of the operator. Do not forget to connect all outputs in the correct order. Make sure that you have connected the right number of ports.

Parameters

  • edit_parameter_settings: The parameters are selected through the edit parameter settings menu. You can select the parameters and their possible values through this menu. This menu has an Operators window which lists all the operators in the subprocess of this operator. When you click on any operator in the Operators window, all parameters of that operator are listed in the Parameters window. You can select any parameter through the arrow keys of the menu. The selected parameters are listed in the Selected Parameters window. Select only those parameters for which you want to find optimal values. This operator searches for optimal values within the specified range, so a range should be specified for every selected parameter. When you click on any selected parameter (a parameter in the Selected Parameters window), the Grid/Range option is enabled. This option allows you to specify the range of values of the selected parameter. The Min and Max fields specify the lower and upper bounds of the range respectively. The steps and scale options are disabled for this operator. Note that only numerical parameters are displayed, since this operator does not support non-numerical parameters. Range: menu
  • error_handling: This parameter allows you to select the method for handling errors occurring during the execution of the inner process. It has the following options:
    • fail_on_error: In case an error occurs, the execution of the process will fail with an error message.
    • ignore_error: In case an error occurs, the error will be ignored and the execution of the process will continue with the next iteration.
    Range: selection
  • max_generations: This parameter specifies the number of generations after which the algorithm should be terminated. Range: integer
  • use_early_stopping: This parameter enables early stopping. If it is not set to true, the maximum number of generations is always performed. Range: boolean
  • generations_without_improval: This parameter is only available when the use early stopping parameter is set to true. It specifies the stop criterion for early stopping, i.e. the algorithm stops after n generations without improvement in the performance, where n is the value of this parameter. Range: integer
  • specify_population_size: This parameter indicates whether the size of the population should be specified explicitly. If it is not set to true, one individual per example of the given ExampleSet is used. Range: boolean
  • population_size: This parameter is only available when the specify population size parameter is set to true. It specifies the population size, i.e. the number of individuals per generation. Range: integer
  • keep_best: This parameter specifies if the best individual should survive. Retaining the best individuals of a generation unchanged in the next generation is called elitism or elitist selection. Range: boolean
  • mutation_type: This parameter specifies the type of the mutation operator. Range: selection
  • selection_type: This parameter specifies the selection scheme of this evolutionary algorithm. Range: selection
  • tournament_fraction: This parameter is only available when the selection type parameter is set to 'tournament'. It specifies the fraction of the current population which should be used as tournament members. Range: real
  • crossover_prob: This parameter specifies the probability for an individual to be selected for crossover. Range: real
  • use_local_random_seed: This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization. Range: boolean
  • local_random_seed: This parameter specifies the local random seed. It is only available if the use local random seed parameter is set to true. Range: integer
  • show_convergence_plot: This parameter indicates if a dialog with a convergence plot should be drawn. Range: boolean
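
To make the interplay of these parameters more concrete, the following Python sketch shows one way an evolutionary search over numeric ranges might be organised. It is not RapidMiner's implementation. The keyword arguments reuse the parameter names listed above, but the default values, the Gaussian mutation, the uniform crossover, and the helper names (`evolve`, `toy_objective`, `bounds`) are assumptions made purely for illustration, and the objective stands in for the performance returned by the subprocess (higher is assumed to be better).

```python
import random


def evolve(objective, bounds, max_generations=50, population_size=20,
           use_early_stopping=True, generations_without_improval=10,
           keep_best=True, tournament_fraction=0.25, crossover_prob=0.9,
           local_random_seed=1):
    """Illustrative evolutionary search; defaults are NOT the operator's defaults."""
    rng = random.Random(local_random_seed)  # same seed, same randomization
    names = list(bounds)

    def random_individual():
        return {n: rng.uniform(*bounds[n]) for n in names}

    def mutate(individual):
        # Assumed mutation type: Gaussian noise, clipped to the Min/Max range.
        child = {}
        for n in names:
            low, high = bounds[n]
            value = individual[n] + rng.gauss(0.0, 0.1 * (high - low))
            child[n] = min(high, max(low, value))
        return child

    def tournament(scored):
        # A fraction of the population competes; the fittest member wins.
        size = max(1, round(tournament_fraction * len(scored)))
        return max(rng.sample(scored, size), key=lambda pair: pair[0])[1]

    population = [random_individual() for _ in range(population_size)]
    best_score, best_individual, stale = float("-inf"), None, 0

    for _ in range(max_generations):
        scored = [(objective(ind), ind) for ind in population]
        generation_score, generation_best = max(scored, key=lambda pair: pair[0])
        if generation_score > best_score:
            best_score, best_individual, stale = generation_score, dict(generation_best), 0
        else:
            stale += 1
        if use_early_stopping and stale >= generations_without_improval:
            break  # n generations without improvement

        # Elitism: carry the best individual over unchanged.
        next_population = [dict(best_individual)] if keep_best else []
        while len(next_population) < population_size:
            parent = tournament(scored)
            if rng.random() < crossover_prob:
                other = tournament(scored)
                # Assumed crossover: each value is taken from either parent.
                parent = {n: parent[n] if rng.random() < 0.5 else other[n]
                          for n in names}
            next_population.append(mutate(parent))
        population = next_population

    return best_individual, best_score


def toy_objective(params):
    # Hypothetical performance surface with its peak at C = 10, gamma = 0.5.
    return -((params["C"] - 10.0) ** 2) - (params["gamma"] - 0.5) ** 2


# Hypothetical Min/Max ranges for two numeric parameters.
bounds = {"C": (0.001, 100.0), "gamma": (0.001, 1.5)}
best, score = evolve(toy_objective, bounds)
print(best, score)
```

Because every individual is clipped to its Min/Max range, the search never leaves the bounds entered in the edit parameter settings menu, and a fixed seed makes a run repeatable.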

Tutorial Processes

Finding optimal values of parameters of the SVM operator through the Optimize Parameters (Evolutionary) operator

The 'Weighting' data set is loaded using the Retrieve operator. The Optimize Parameters (Evolutionary) operator is applied on it. Have a look at the Edit Parameter Settings parameter of the Optimize Parameters (Evolutionary) operator. You can see in the Selected Parameters window that the C and gamma parameters of the SVM operator are selected. Click on the SVM.C parameter in the Selected Parameters window; you will see that the range of the C parameter is set from 0.001 to 100000. Now click on the SVM.gamma parameter in the Selected Parameters window; you will see that the range of the gamma parameter is set from 0.001 to 1.5. In every iteration of the subprocess, the value of the C and/or gamma parameters of the SVM (LibSVM) operator is changed in search of optimal values.

Have a look at the subprocess of the Optimize Parameters (Evolutionary) operator. First, the data is split into two equal partitions using the Split Data operator. The SVM (LibSVM) operator is applied on one partition. The resultant classification model is applied to both partitions using two Apply Model operators. The statistical performance of the SVM model on both the testing and training partitions is measured using the Performance (Classification) operators. At the end, the Log operator is used to store the required results.

The log parameter of the Log operator stores five things. The iterations of the Optimize Parameters (Evolutionary) operator are counted by the apply-count of the SVM operator. This is stored in a column named 'Count'. The value of the classification error parameter of the Performance (Classification) operator that was applied on the Training partition is stored in a column named 'Training Error'. The value of the classification error parameter of the Performance (Classification) operator that was applied on the Testing partition is stored in a column named 'Testing Error'. The value of the C parameter of the SVM (LibSVM) operator is stored in a column named 'SVM C'. The value of the gamma parameter of the SVM (LibSVM) operator is stored in a column named 'SVM gamma'. Also note that the stored information will be written into a file as specified in the filename parameter.

At the end of the process, the Write Parameters operator is used for writing the optimal parameter set in a file. This file can be read using the Read Parameters operator to use these parameter values in another process.

Run the process and turn to the Results Workspace. You can see that the optimal parameter set has approximately the following values: SVM.C = 56462 and SVM.gamma = 0.115. Now have a look at the values saved by the Log operator to verify these values. Switch to Table View to see the stored values in tabular form. You can see that the minimum Testing Error is 0.064 (in the 20th iteration). The values of the C and gamma parameters for this iteration are the same as given in the optimal parameter set.
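
The verification step above amounts to finding the logged row with the smallest 'Testing Error' and comparing its parameter values with the optimal parameter set. A minimal Python sketch of that lookup follows. The column names match those configured in the Log operator, but the rows are invented placeholders, not the tutorial's actual output, and `log_rows` is a hypothetical name.

```python
# Placeholder rows only; a real run would produce its own logged values.
log_rows = [
    {"Count": 1, "Training Error": 0.30, "Testing Error": 0.35, "SVM C": 10.0, "SVM gamma": 1.20},
    {"Count": 2, "Training Error": 0.20, "Testing Error": 0.25, "SVM C": 500.0, "SVM gamma": 0.80},
    {"Count": 3, "Training Error": 0.05, "Testing Error": 0.10, "SVM C": 2000.0, "SVM gamma": 0.20},
    {"Count": 4, "Training Error": 0.02, "Testing Error": 0.15, "SVM C": 9000.0, "SVM gamma": 0.05},
]

# The row with the lowest testing error identifies the best iteration.
best_row = min(log_rows, key=lambda row: row["Testing Error"])
print(best_row["Count"], best_row["SVM C"], best_row["SVM gamma"])
```

Note that in these placeholder rows the lowest training error and the lowest testing error occur in different iterations, which is why the comparison is made on the 'Testing Error' column.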