Branch (RapidMiner Studio Core)

Synopsis

This operator contains two subprocesses but executes only one of them at a time, depending on a condition. It is similar to an 'if-then-else' statement, where one of two options is selected depending on the result of the specified condition. An understanding of subprocesses is important for using this operator effectively.

Description

The Branch operator tests the condition specified in the parameters (mostly through the condition type and condition value parameters) on the object supplied at the condition input port. If the condition is satisfied, the first subprocess, i.e. the 'Then' subprocess, is executed; otherwise the second subprocess, i.e. the 'Else' subprocess, is executed.
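The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the if-then-else semantics, not RapidMiner code: the function `branch` and its arguments `condition`, `then_subprocess` and `else_subprocess` are hypothetical names introduced for this example.

```python
from typing import Any, Callable, Tuple

# Hypothetical stand-in for a subprocess: it receives the condition object
# and the tuple of input objects, and returns whatever it delivers.
Subprocess = Callable[[Any, Tuple[Any, ...]], Any]


def branch(condition_object: Any,
           inputs: Tuple[Any, ...],
           condition: Callable[[Any], bool],
           then_subprocess: Subprocess,
           else_subprocess: Subprocess) -> Any:
    """Test the condition on the condition object and run exactly one subprocess."""
    if condition(condition_object):
        return then_subprocess(condition_object, inputs)
    return else_subprocess(condition_object, inputs)


# Usage: only one of the two callables is ever invoked.
calls = []
result = branch(
    condition_object=5,
    inputs=("data",),
    condition=lambda obj: obj > 3,
    then_subprocess=lambda obj, inp: calls.append("Then") or "then result",
    else_subprocess=lambda obj, inp: calls.append("Else") or "else result",
)
print(result, calls)
```

The point of the sketch is that the subprocess that is not selected is never executed at all, which matches the description of the operator.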

A good understanding of subprocesses in RapidMiner is needed to understand this operator completely. A subprocess introduces a process within a process. Whenever a subprocess is reached during process execution, the entire subprocess is executed first. Once the subprocess execution is complete, the flow returns to the parent process. A subprocess can be considered a small unit of a process: as in a process, any operator or combination of operators can be applied in a subprocess. That is why a subprocess can also be defined as a chain of operators that is applied in sequence. For more detail about subprocesses, please study the Subprocess operator.

Double-click the Branch operator to go inside and view the subprocesses. The subprocesses are then shown in the same Process View. Here you can see two subprocesses: 'Then' and 'Else'. The 'Then' subprocess is executed if the condition specified in the parameters evaluates to true. The 'Else' subprocess is executed if the condition evaluates to false. To go back to the parent process, click the blue up-arrow button in the Process View toolbar. This works the way files and folders work in operating systems: subprocesses can contain subprocesses just as folders can contain folders.

The Branch operator is similar to the Select Subprocess operator because both have multiple subprocesses but execute only one of them at a time. The Select Subprocess operator can have more than two subprocesses, and the subprocess to be executed is specified in the parameters. In contrast, the Branch operator has only two subprocesses, and the subprocess to be executed depends on the result of the condition specified in the parameters. The condition is specified through the condition type and condition value parameters.

Macros can be provided in the condition value parameter, so the subprocess to be executed can be controlled by macros. If this operator is placed inside a Loop operator, it is executed multiple times. The true power of this operator comes into play when it is used with other operators such as the various Macro and Loop operators. For example, if this operator is placed inside a Loop operator and the condition value parameter is controlled by a macro, it can be used to change the process setup dynamically. This can be useful for testing different layouts.
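As a rough illustration of a macro-controlled condition value inside a loop, consider the following Python sketch. It assumes the usual `%{macro_name}` macro syntax; the helper names `expand_macros` and `run_loop`, the macro name `threshold`, and the score and threshold values are all hypothetical and chosen only for the example.

```python
import re
from typing import Dict, List


def expand_macros(text: str, macros: Dict[str, str]) -> str:
    """Replace every %{name} in text with the value stored for that macro."""
    return re.sub(r"%\{(\w+)\}", lambda m: macros[m.group(1)], text)


def run_loop(score: float, thresholds: List[float]) -> List[str]:
    """For each loop iteration, record which subprocess the branch would pick."""
    macros: Dict[str, str] = {}
    chosen = []
    for threshold in thresholds:
        # The (hypothetical) loop sets the macro before the branch is reached.
        macros["threshold"] = str(threshold)
        # The condition value parameter contains the macro, not a fixed number.
        condition_value = float(expand_macros("%{threshold}", macros))
        chosen.append("Then" if score > condition_value else "Else")
    return chosen


print(run_loop(64.29, [50, 60, 70]))
```

Because the condition value is resolved anew in every iteration, the same branch can select a different subprocess each time the loop runs.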

Input

  • condition

    Any object can be supplied at this port. The condition specified in the parameters is tested on this object. If the condition is satisfied, the 'Then' subprocess is executed; otherwise the 'Else' subprocess is executed.

  • input

    The Branch operator can have multiple inputs. When one input is connected, another input port becomes available, ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first input port of the Branch operator is available at the first input port of the nested chain (inside the subprocess). Don't forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Output

  • input

    The Branch operator can have multiple outputs. When one output is connected, another output port becomes available, ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first output port of the subprocess is delivered at the first output port of the Branch operator. Don't forget to connect all outputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Parameters

  • condition_type: The type of condition is selected through this parameter. Range: selection
  • condition_value: The value of the selected condition type is specified through this parameter. The condition type and condition value parameters together specify the condition statement. This condition will be tested on the object provided at the condition input port. Range:
  • io_object: This parameter is only available when the condition type parameter is set to 'input exists'. This parameter specifies the class of the object which should be checked for existence. Range: selection
  • return_inner_output: This parameter indicates if the outputs of the inner subprocess should be delivered through this operator. Range: boolean
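How the condition type, condition value and io object parameters combine into one test can be sketched as follows. This is a simplified Python model, not the operator's implementation: the function `condition_holds` is hypothetical, only the two condition types named in this document are covered, and the strict "greater than" comparison follows the wording of the tutorial below rather than a verified specification.

```python
from typing import Any, Optional


def condition_holds(condition_type: str,
                    condition_object: Any,
                    condition_value: Optional[str] = None,
                    io_object: Optional[type] = None) -> bool:
    """Evaluate a simplified branch condition on the condition object."""
    if condition_type == "input exists":
        # io_object names the class whose presence is checked.
        return io_object is not None and isinstance(condition_object, io_object)
    if condition_type == "min performance value":
        if condition_value is None:
            raise ValueError("condition_value is required for this condition type")
        # Assumption: the performance is represented as a plain number here.
        return float(condition_object) > float(condition_value)
    raise ValueError(f"unsupported condition type: {condition_type}")


print(condition_holds("min performance value", 64.29, condition_value="70"))
print(condition_holds("input exists", "some object", io_object=str))
```

Note that the condition value is passed as text in the sketch, which is what allows a macro to be substituted into it before the comparison is made.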

Tutorial Processes

Applying different subprocesses on Golf data set depending upon the performance value

The 'Golf' data set is loaded using the Retrieve operator. The Default Model operator is applied on it. The resultant model is applied on the 'Golf-Testset' data set through the Apply Model operator. The performance of this model is measured by the Performance operator. A breakpoint is inserted here so that you can have a look at this performance vector. You can see that its accuracy value is 64.29%. It is provided at the condition port of the Branch operator. Thus the condition specified in the parameters of the Branch operator will be tested on this performance vector. The 'Golf' data set is also provided to the Branch operator (through the first input port).

Now have a look at the subprocesses of the Branch operator. The 'Then' subprocess simply connects the condition port to the input port without applying any operator. Thus, if the condition specified in the parameters is true, the condition object, i.e. the performance vector, is delivered by the Branch operator. The 'Else' subprocess does not use the object at the condition port. Instead, it applies the K-NN operator to the object at the first input port, i.e. the 'Golf' data set. Thus, if the condition is false, the resultant K-NN model is delivered by the Branch operator.

Now have a look at the parameters of the Branch operator. The condition type parameter is set to 'min performance value' and the condition value parameter is set to 70. Thus the condition is true if the performance value of the performance vector is greater than 70.

Overall, in this process the Default Model is trained on the 'Golf' data set. If its performance on the 'Golf-Testset' data set is more than 70%, the performance vector is delivered; otherwise the K-NN model trained on the 'Golf' data set is delivered. Because the accuracy measured here is 64.29%, the condition is false and the K-NN model is delivered.
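The decision made in this tutorial process can be traced with a small Python sketch. The functions `train_knn` and `run_tutorial` are hypothetical stand-ins for the operators involved; no real model is trained, and the accuracy of 64.29 is simply the value reported above.

```python
from typing import Union


def train_knn(data_set: str) -> str:
    """Stand-in for the K-NN operator: returns a label instead of a real model."""
    return f"k-NN model trained on {data_set}"


def run_tutorial(accuracy: float,
                 training_data: str = "Golf",
                 min_performance: float = 70.0) -> Union[float, str]:
    """Mimic the tutorial: deliver the performance or fall back to K-NN."""
    if accuracy > min_performance:
        # 'Then': the condition object (the performance) is passed through.
        return accuracy
    # 'Else': the condition object is ignored and the first input is used.
    return train_knn(training_data)


print(run_tutorial(64.29))
```

With the reported accuracy the 'Else' path is taken; calling `run_tutorial` with a value above 70 would return the performance value instead.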