You are viewing the RapidMiner Studio documentation for version 9.8.

Subprocess (RapidMiner Studio Core)

Synopsis

This operator introduces a process within a process. Whenever a Subprocess operator is reached during process execution, the entire subprocess is executed first. Once the subprocess has finished, control returns to the parent process. A subprocess can be considered a small unit of a process: as in a process, any operator or combination of operators can be applied in a subprocess. That is why a subprocess can also be defined as a chain of operators that are applied one after another.

Description

Double-click the Subprocess operator to go inside the subprocess. The subprocess is then shown in the same Process View. To go back to the parent process, click the blue up-arrow button in the Process View toolbar. This works the way files and folders work in an operating system: subprocesses can contain subprocesses, just as folders can contain folders. The order of execution for nested subprocesses is the same as a depth-first search through a tree structure. When a Subprocess operator is reached, all operators inside it are executed; the execution flow then returns to the parent process, and the operator located after the Subprocess operator (in the parent process) is executed. The attached Example Process illustrates this behavior.
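The depth-first execution order described above can be illustrated with a small Python model. This is only a sketch of the ordering rule, not RapidMiner code: the `Operator` and `Subprocess` classes, the `execution_order` helper, and all operator names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    """A leaf operator (hypothetical model, not the RapidMiner API)."""
    name: str

    def execute(self, trace: List[str]) -> None:
        # Record that this operator ran.
        trace.append(self.name)


@dataclass
class Subprocess(Operator):
    """An operator that holds an ordered list of inner operators."""
    children: List[Operator] = field(default_factory=list)

    def execute(self, trace: List[str]) -> None:
        # Run every inner operator, recursing into nested subprocesses,
        # before control returns to the parent.
        for child in self.children:
            child.execute(trace)


def execution_order(root: Operator) -> List[str]:
    """Return the names of the leaf operators in the order they run."""
    trace: List[str] = []
    root.execute(trace)
    return trace


# Illustrative process: a subprocess nested inside another subprocess.
main = Subprocess("Main", [
    Operator("Retrieve"),
    Subprocess("Outer", [
        Operator("A"),
        Subprocess("Inner", [Operator("B"), Operator("C")]),
        Operator("D"),
    ]),
    Operator("After"),
])

order = execution_order(main)
print(order)  # ['Retrieve', 'A', 'B', 'C', 'D', 'After']
```

In this model, "After" runs only once everything inside "Outer" (including the nested "Inner") has finished, which mirrors the rule stated above.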

A subprocess can be considered a simple operator chain with an arbitrary number of inner operators. The operators are applied in sequence, and the output of each is used as input for the operator that follows it. The input of the Subprocess operator is used as input for the first operator in it, and the output of the last operator in the subprocess is used as the output of the Subprocess operator. Subprocesses make a process more manageable, but don't forget to connect all inputs and outputs in the correct order. Also make sure that the right number of ports is connected at all levels of the chain.
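The chaining rule in the paragraph above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: each step is a plain function that takes a tuple of port values and returns a tuple of port values, and the names `run_chain`, `drop_missing`, and `count_rows` are hypothetical. It covers only how data is handed from one operator to the next, not how RapidMiner displays results.

```python
from functools import reduce
from typing import Callable, Sequence, Tuple

Ports = Tuple[object, ...]
Step = Callable[[Ports], Ports]


def run_chain(steps: Sequence[Step], inputs: Ports) -> Ports:
    """Feed `inputs` to the first step and each step's outputs to the next.

    The outputs of the last step become the outputs of the whole chain.
    """
    return reduce(lambda ports, step: step(ports), steps, inputs)


def drop_missing(ports: Ports) -> Ports:
    # Hypothetical inner operator: remove None entries from a list of rows.
    (rows,) = ports
    return ([row for row in rows if row is not None],)


def count_rows(ports: Ports) -> Ports:
    # Hypothetical inner operator: report how many rows remain.
    (rows,) = ports
    return (len(rows),)


result = run_chain([drop_missing, count_rows], ([1, None, 3],))
print(result)  # (2,)
```

Swapping the two steps changes the result, which is why the order of the inner operators and of their port connections matters.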

Subprocesses are useful in many ways. They give structure to the entire process, reduce its complexity, and make it easier to understand and modify. Many operators have a subprocess as an integral part, e.g. the X-Validation operator, which is also shown in the attached Example Process. Note that connecting the input of a Subprocess directly to its output without applying any operator in between, or using an empty Subprocess, gives no results.

Input

  • input (IOObject)

    The Subprocess operator can have multiple inputs. When one input is connected, another input port becomes available, ready to accept another input (if any). The order of inputs remains the same: the object supplied at the first input port of the subprocess is available at the first input port of the nested chain (inside the subprocess). Connect all inputs in the correct order, and make sure that the right number of ports is connected at all levels of the chain.

Output

  • output (IOObject)

    The Subprocess operator can have multiple outputs. When one output is connected, another output port becomes available, ready to deliver another output (if any). The order of outputs remains the same: the object delivered at the first output port of the subprocess is delivered at the first output of the outer process. Connect all outputs in the correct order, and make sure that the right number of ports is connected at all levels of the chain.
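The port behavior described for inputs and outputs (a new free port appears whenever one is connected, and positions are preserved) can be modeled roughly as follows. The `ExtendingPorts` class is a hypothetical illustration written for this page, not part of RapidMiner; the data set names are taken from the tutorial process below.

```python
from typing import List, Tuple


class ExtendingPorts:
    """Hypothetical model of a port list that always keeps one free port."""

    def __init__(self) -> None:
        self._objects: List[object] = []  # connected objects, in port order

    @property
    def port_count(self) -> int:
        # All connected ports plus one spare port waiting for a connection.
        return len(self._objects) + 1

    def connect(self, obj: object) -> int:
        """Connect `obj` to the next free port and return its 1-based number."""
        self._objects.append(obj)
        return len(self._objects)

    def inner_view(self) -> Tuple[object, ...]:
        # The nth connected object is seen at the nth port on the other side.
        return tuple(self._objects)


inputs = ExtendingPorts()
print(inputs.port_count)            # 1
print(inputs.connect("Golf"))       # 1
print(inputs.connect("Purchases"))  # 2
print(inputs.port_count)            # 3
print(inputs.inner_view())          # ('Golf', 'Purchases')
```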

Tutorial Processes

Using subprocesses to structure a process

The 'Golf' data set is loaded using the Retrieve operator. It is connected to the first input of the Subprocess operator. Double-click the Subprocess operator to see what is inside this subprocess. The first input of the subprocess is connected to a Decision Tree operator. The output of the Decision Tree operator is delivered to the first output port. Now, go back to the main process. You will see that the first output port of the Subprocess operator is connected to the first result port. This explains the result 'Tree(decision tree(golf))' in the Results Workspace. This is how it works: the Golf data set enters the subprocess through the first input port, the Decision Tree operator is applied to it in the subprocess, and the resulting model is delivered to the results via the first output port of the subprocess.

In the main process, the Purchases data set is loaded using the Retrieve operator. It is connected to the second input port of the Subprocess operator. Double-click the Subprocess operator to see what is inside this subprocess. The second input port of the subprocess is connected directly to the second output port without applying any operator. Now, go back to the main process. You will see that the second output port of the Subprocess operator is connected to the second result port. However, because no operator is applied to the Purchases data set in the subprocess, it fails to produce any result (not even the result of the Retrieve operator is shown in the Results Workspace). This explains why there are three results in the Results Workspace even though four outputs are connected to the result ports in the main process.

In the subprocess, the Iris data set is loaded using the Retrieve operator. It is connected to the Decision Tree operator and the resultant model is attached to the third output port of the subprocess, which is in turn attached to the third results port in the main process. This explains the result 'Tree (decision tree (Iris))' in the Results Workspace.

In the subprocess, the Weighting data set is loaded using the Retrieve operator. It is connected to the X-Validation operator, and the resultant Performance Vector is attached to the fourth output port of the subprocess, which is in turn attached to the fourth results port in the main process. This explains the result 'performanceVector (Performance)' in the Results Workspace. The X-Validation operator itself is composed of a subprocess; double-click the X-Validation operator and you will see the subprocess within this operator. An explanation of what goes on inside X-Validation would be a diversion here. This operator was added only to show how various operators can be composed of a subprocess. To learn more about the X-Validation operator, read its description.

Note: This Example Process only highlights different aspects of the Subprocess operator. It may not be very useful in real scenarios. The Example Process of the Performance operator is also a good example of how the Subprocess operator is used.