You are viewing the RapidMiner Studio documentation for version 9.2.

Optimize Parameters (Grid) (Concurrency)

Synopsis

This Operator finds the optimal values of the selected parameters for the Operators in its subprocess.

Description

The Optimize Parameters (Grid) Operator is a nested Operator. It executes the subprocess for all combinations of the selected parameter values and then delivers the optimal parameter values through the parameter set port. The performance vector for the optimal parameter values is delivered through the performance port and the associated model (if any) through the model port. Any additional results of the best run are delivered through the output ports. Which parameter values are optimal is decided from the performance value delivered to the inner performance port.
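The behaviour described above can be sketched outside RapidMiner in plain Python. This is only an illustration, not the Operator's implementation: `grid_search`, `evaluate`, and the toy parameters `x` and `y` are hypothetical names, and the sketch assumes that a larger performance value is better.

```python
from itertools import product


def grid_search(param_values, evaluate):
    """Run evaluate() for every combination and keep the best result.

    param_values: dict mapping a parameter name to its list of candidate values.
    evaluate: hypothetical stand-in for the subprocess; it takes a dict of
              parameter settings and returns a performance value
              (assumption: higher is better).
    """
    names = list(param_values)
    best_params, best_performance = None, None
    for combo in product(*(param_values[name] for name in names)):
        params = dict(zip(names, combo))
        performance = evaluate(params)
        # Keep the first combination that strictly improves on the best so far.
        if best_performance is None or performance > best_performance:
            best_params, best_performance = params, performance
    return best_params, best_performance


def toy_performance(p):
    # Illustrative objective with its maximum at x = 2, y = -1.
    return -((p["x"] - 2) ** 2) - (p["y"] + 1) ** 2


best, perf = grid_search({"x": [0, 1, 2, 3], "y": [-2, -1, 0]}, toy_performance)
print(best, perf)
```

Here `best` plays the role of the parameter set and `perf` the role of the performance vector for the best run.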

The entire configuration of this Operator is done through the edit parameter settings parameter. A complete description of this parameter can be found in the Parameters section.

This Operator returns an optimal parameter set which can also be written to a file with the Write Parameters Operator. This parameter set can be read in another process using the Read Parameters Operator and then be applied using the Set Parameters Operator.

The inner performance port can be used to log the performance of the inner subprocess. A log is created automatically to capture the number of the run, the parameter settings and the main criterion or all criteria of the delivered performance vector, depending on the parameter log all criteria. This can be disabled by deselecting log performance. The inner performance port is also used to determine the best model by comparing the fitness of the performance of the different iterations.

Please note that this Operator has two modes: synchronized and non-synchronized. The mode depends on the setting of the synchronize parameter. In non-synchronized mode, all parameter combinations are generated and the subprocess is executed for each combination. In synchronized mode, no combinations are created; instead, the parameter values are treated as a list of combinations, so the first value of every parameter is used in the first run, the second value of every parameter in the second run, and so on. When iterating over a single parameter, there is no difference between the two modes. In synchronized mode, the number of possible values must be the same for all parameters. For example, two boolean parameters A and B (both with true/false as possible settings) produce four combinations in non-synchronized mode (t/t, f/t, t/f, f/f) and two combinations in synchronized mode (t/t, f/f).
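The difference between the two modes can be illustrated with a short Python sketch. The function `parameter_combinations` is a hypothetical helper written for this illustration; it models the two modes as a Cartesian product and an element-wise pairing, and the order in which it lists combinations is not meant to match the order used by the Operator.

```python
from itertools import product


def parameter_combinations(values_a, values_b, synchronize):
    """Return the (A, B) settings that would be tried in each mode."""
    if synchronize:
        # Synchronized: the i-th value of A is paired with the i-th value of B,
        # so both lists must have the same length.
        if len(values_a) != len(values_b):
            raise ValueError("synchronized mode needs equally long value lists")
        return list(zip(values_a, values_b))
    # Non-synchronized: every value of A is combined with every value of B.
    return list(product(values_a, values_b))


bools = [True, False]
non_sync = parameter_combinations(bools, bools, synchronize=False)
sync = parameter_combinations(bools, bools, synchronize=True)
print(len(non_sync), non_sync)
print(len(sync), sync)
```

For the two boolean parameters from the example, the sketch yields four settings without synchronization and two with it.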

If the synchronize parameter is not set to true, selecting a large number of parameters and/or a large number of steps (or possible values of parameters) results in a huge number of combinations. For example, if you select 3 parameters and 25 steps for each parameter, each parameter takes 26 values and the total number of combinations is 17576 (i.e. 26 x 26 x 26). The subprocess is executed for every one of these combinations, which will take a lot of time. Always limit the parameters and their steps carefully.
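The arithmetic can be checked with a few lines of Python. The helper `grid_size` is hypothetical and assumes, as in the tutorial below where 10 steps give 11 values, that a range with a given number of steps yields one more value than steps.

```python
def grid_size(steps_per_parameter):
    """Number of subprocess runs in non-synchronized mode.

    A range divided into `steps` steps yields steps + 1 values
    (both bounds are included), and the counts multiply.
    """
    total = 1
    for steps in steps_per_parameter:
        total *= steps + 1
    return total


print(grid_size([25, 25, 25]))  # 26 x 26 x 26 = 17576
print(grid_size([10, 10]))      # 11 x 11 = 121
```

Adding a fourth parameter with 25 steps would multiply the count by another 26, which is why the number of runs grows so quickly.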

Differentiation

Other parameter optimization schemes are also available. The Optimize Parameters (Evolutionary) Operator might be useful if the best ranges and dependencies are not known at all. Another Operator that works similarly to this one is the Loop Parameters Operator. In contrast to the optimization Operators, Loop Parameters simply iterates through all parameter combinations without selecting a best one. This might be especially useful for plotting and logging purposes.

Optimize Parameters (Evolutionary)

The Optimize Parameters (Evolutionary) Operator finds the optimal values for a set of parameters using an evolutionary approach. This is often more appropriate than a grid search (as in the Optimize Parameters (Grid) Operator) or a greedy search (as in the Optimize Parameters (Quadratic) Operator) and can lead to better results. The Optimize Parameters (Evolutionary) Operator might be useful if the best ranges and dependencies are not known at all.

Optimize Parameters (Quadratic)

The Optimize Parameters (Quadratic) Operator finds the optimal values using a quadratic interaction model. First it runs the same iterations as the Optimize Parameters (Grid) Operator. From the collected pairs of parameter set and performance, it tries to calculate a new parameter set that might lie in between the given grid lines. The result is either the best performance from the original runs or the one from the newly calculated parameter set.

Tutorial Processes

Finding optimal values of parameters of the SVM Operator

The 'Weighting' data set is loaded using the Retrieve Operator. The Optimize Parameters (Grid) Operator is applied on it. Have a look at the Edit Parameter Settings parameter of the Optimize Parameters (Grid) Operator. You can see in the Selected Parameters window that the C and gamma parameters of the SVM Operator are selected. Click on the SVM.C parameter in the Selected Parameters window and you will see that the range of the C parameter is set from 0.001 to 100000. 11 values are selected (in 10 steps) logarithmically. Now click on the SVM.gamma parameter in the Selected Parameters window and you will see that the range of the gamma parameter is set from 0.001 to 1.5. Again, 11 values are selected (in 10 steps) logarithmically. Each of the 2 parameters has 11 possible values, thus there are 121 (i.e. 11 x 11) combinations. The subprocess will be executed for all combinations of these values, thus it will iterate 121 times. In every iteration, the values of the C and/or gamma parameters of the SVM (LibSVM) Operator are changed. The value of the C parameter is 0.001 in the first iteration. The value is increased logarithmically until it reaches 100000 in the last iteration. Similarly, the value of the gamma parameter is 0.001 in the first iteration. The value is increased logarithmically until it reaches 1.5 in the last iteration.

Have a look at the subprocess of the Optimize Parameters (Grid) Operator. First the data is split into two equal partitions using the Split Data Operator. The SVM (LibSVM) Operator is applied on one partition. The resultant classification model is applied on the second partition using an Apply Model Operator. The statistical performance of the SVM model on the testing partition is measured using the Performance (Classification) Operator. The nested Operator also logs the performance and parameters for each iteration.

Run the process and turn to the Results View. You can see that the optimal parameter set has the following values: SVM.C = 398.107 and SVM.gamma = 0.001. Now have a look at the values logged by the Optimize Parameters (Grid) Operator to verify these values. You can see that the minimum Testing Error is 0.02 (in the 8th iteration). The values of the C and gamma parameters for this iteration are the same as given in the optimal parameter set.
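The logarithmic grids used in this tutorial can be approximated in Python. This is a sketch under one assumption: that the logarithmic scale places the values at equal ratios between Min and Max (geometric spacing). The helper `log_grid` is hypothetical and not part of RapidMiner.

```python
def log_grid(minimum, maximum, steps):
    """Return steps + 1 values from minimum to maximum, evenly spaced on a log scale."""
    # Assumption: consecutive values differ by a constant ratio.
    ratio = (maximum / minimum) ** (1.0 / steps)
    return [minimum * ratio ** i for i in range(steps + 1)]


c_values = log_grid(0.001, 100000, 10)
gamma_values = log_grid(0.001, 1.5, 10)

print([round(v, 3) for v in c_values])
print([round(v, 3) for v in gamma_values])
print(len(c_values) * len(gamma_values))  # number of combinations: 11 x 11
```

Under this assumption the eighth C value is about 398.107, which agrees with the optimal SVM.C value reported for the tutorial process.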

Parameters

  • edit_parameter_settings The parameters are selected through the edit parameter settings menu. You can select the parameters and their possible values through this menu. The menu has an Operators window which lists all the Operators in the subprocess of this Operator. When you click on any Operator in the Operators window, all parameters of that Operator are listed in the Parameters window. You can select any parameter through the arrow keys of the menu. The selected parameters are listed in the Selected Parameters window. Select only those parameters over which you want to iterate the subprocess. This Operator iterates through parameter values in the specified range, so a range should be specified for every selected parameter. When you click on a parameter in the Selected Parameters window, the Grid/Range and Value List options are enabled. These options allow you to specify the values of the selected parameter. The Min and Max fields specify the lower and upper bounds of the range respectively. As not all values within this range can be checked, the steps field allows you to specify how many steps the range is divided into. Finally, the scale option allows you to select the pattern of these values. Alternatively, you can specify the values in the form of a list. Range: menu
  • error_handling This parameter allows you to select the method for handling errors occurring during the execution of the inner process. It has the following options:
    • fail_on_error: In case an error occurs, the execution of the process will fail with an error message.
    • ignore_error: In case an error occurs, the error will be ignored and the execution of the process will continue with the next iteration.
    Range: selection
  • log_performance This parameter will only be visible if the inner performance port is connected. If it is connected, the main criterion of the performance vector will be automatically logged with the parameter set if this parameter is set to true. Range: boolean
  • log_all_criteria This parameter allows for more logging. If set to true, all performance criteria will be logged. Range: boolean
  • synchronize This Operator has two modes: synchronized and non-synchronized. They depend on the setting of this parameter. If it is set to false, all parameter combinations are generated and the inner Operators are applied for each combination. If it is set to true, no combinations are created but the parameter values are treated as a list of combinations. For the iteration over a single parameter there is no difference between both modes. Please note that the number of parameter possibilities must be the same for all parameters in the synchronized mode. Range: boolean
  • enable_parallel_execution This parameter enables the parallel execution of the subprocess. Please disable the parallel execution if you run into memory problems. Range: boolean
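The two options of the error_handling parameter can be pictured with the following Python sketch. It is not how the Operator is implemented: `run_iterations`, `subprocess`, and `flaky` are hypothetical names, and a Python exception stands in for an error in the inner process.

```python
def run_iterations(settings, subprocess, error_handling="fail_on_error"):
    """Run a hypothetical subprocess once per setting.

    error_handling mirrors the two documented options:
    - "fail_on_error": the first error aborts the whole run.
    - "ignore_error": the failing iteration is skipped and the loop continues.
    """
    results = []
    for setting in settings:
        try:
            results.append((setting, subprocess(setting)))
        except Exception:
            if error_handling == "fail_on_error":
                raise
            # ignore_error: drop this iteration and continue with the next one.
            continue
    return results


def flaky(x):
    # Illustrative subprocess that fails for one particular setting.
    if x == 0:
        raise ZeroDivisionError("cannot evaluate x = 0")
    return 1.0 / x


print(run_iterations([2, 0, 4], flaky, error_handling="ignore_error"))
```

With `error_handling="fail_on_error"` the same call raises the `ZeroDivisionError` instead of returning the two successful results.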

Input

  • input (IOObject)

    This Operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The Object supplied at the first input port of this Operator is available at the first input port of the nested chain (inside the subprocess). Do not forget to connect all inputs in correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output

  • performance (Performance Vector)

    This port delivers the Performance Vector for the optimal values of the selected parameters. A Performance Vector is a list of performance criteria values.

  • model (Model)

    This port delivers the Model for the optimal values of the selected parameters.

  • parameters (Parameter Set)

    This port delivers the optimal values of the selected parameters. This optimal parameter set can also be written to a file with the Write Parameters operator. The written parameter set can be read in another process using the Read Parameters operator.

  • output (IOObject)

    Any results of the subprocess are delivered through the output ports. This Operator can have multiple outputs. When one output port is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The Object delivered at the first output port of the subprocess is delivered at the first output port of the Operator. Do not forget to connect all outputs in the correct order. Make sure that you have connected the right number of ports.