
Loop Batches (RapidMiner Studio Core)

Synopsis

This operator creates batches from the input ExampleSet and executes its subprocess on each of these batches. This can be useful for applying operators to very large data sets that are stored in a database.

Description

This operator groups the examples of the input ExampleSet into batches of the specified size and executes the inner operators on each batch in turn. This can be useful for very large data sets that cannot be loaded into memory and must be handled in the database. In such cases, preprocessing, model application, and other tasks can be performed on each batch, and the results can be written into a database table (using the Write Database or Update Database operators). Note that the output of this operator is not composed of the results of the subprocess. In fact, the subprocess does not need to deliver any output, since it operates on a subset view of the input ExampleSet. This operator therefore returns the input ExampleSet without any modifications. The results of the subprocess are not directly accessible; they can be written into a database or a file during the execution of the process. The results of the last batch can be accessed using the Remember and Recall operators.
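The behaviour described above can be illustrated with a small Python analogue. This is only a sketch under stated assumptions, not RapidMiner's implementation: the function `loop_batches`, the list-of-dicts stand-in for an ExampleSet, and the in-memory SQLite table `scored` are all hypothetical. It shows the two key points: the operator's output is the unmodified input, and per-batch results survive only through a side effect such as a database write.

```python
import sqlite3
from typing import Callable, Dict, List

# Hypothetical stand-in for an ExampleSet: a list of examples (attribute -> value).
Example = Dict[str, int]


def loop_batches(examples: List[Example], batch_size: int,
                 subprocess: Callable[[List[Example]], None]) -> List[Example]:
    """Run `subprocess` on consecutive batches and return the input unchanged."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(examples), batch_size):
        # Python slices are copies; the real operator passes a subset view instead.
        subprocess(examples[start:start + batch_size])  # return value is ignored
    return examples  # output is the unmodified input, not the subprocess results


# Results are kept only through a side effect: here an in-memory SQLite table
# standing in (hypothetically) for Write Database / Update Database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scored (id INTEGER, score INTEGER)")


def score_and_write(batch: List[Example]) -> None:
    rows = [(ex["id"], ex["x"] * 2) for ex in batch]  # toy "model application"
    conn.executemany("INSERT INTO scored VALUES (?, ?)", rows)
    conn.commit()


data = [{"id": i, "x": i} for i in range(1, 11)]
result = loop_batches(data, 4, score_and_write)
```

In this sketch, 10 examples with a batch size of 4 give three subprocess calls (on 4, 4 and 2 examples, assuming the last batch holds the remainder). All 10 scored rows end up in the table, while `result` is simply the original `data` list.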

Input

  • example set (Data Table)

    This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

  • example set (Data Table)

    The ExampleSet that was given as input is delivered through this port without any modifications.

Parameters

  • batch_size: This parameter specifies the number of examples in each batch. Range: integer
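The relationship between the number of examples, batch_size and the resulting batches can be sketched as follows. This is an illustrative assumption rather than documented behaviour: it supposes that batches are filled in example order and that the final batch holds whatever remains, so the number of iterations is the ceiling of examples / batch_size. The helper `batch_bounds` is hypothetical.

```python
from typing import List, Tuple


def batch_bounds(n_examples: int, batch_size: int) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first, last) example positions for each batch.

    Assumption: batches are consecutive and the last one may be smaller.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    bounds = []
    for start in range(0, n_examples, batch_size):
        end = min(start + batch_size, n_examples)  # clip the final batch
        bounds.append((start + 1, end))
    return bounds


print(batch_bounds(150, 50))  # [(1, 50), (51, 100), (101, 150)]
print(batch_bounds(150, 40))  # [(1, 40), (41, 80), (81, 120), (121, 150)]
```

With 150 examples and a batch size of 50 this gives exactly 3 batches. A batch size that does not divide the example count evenly (for example 40) would, under the stated assumption, leave a smaller final batch.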

Tutorial Processes

Introduction to the Loop Batches operator

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet before the Loop Batches operator is applied. You can see that the ExampleSet has 150 examples. The Loop Batches operator is applied to this ExampleSet with the batch size parameter set to 50. Given that there are 150 examples and the batch size is 50, there will be 3 (i.e. 150/50) iterations. Have a look at the subprocess of the Loop Batches operator. The Remember operator is applied there to store the examples of each iteration into the object table as an ExampleSet. A breakpoint is inserted before the Remember operator so that you can have a look at the examples of each iteration. When the process is executed, you will see three iterations of the Loop Batches operator. The first iteration contains the examples id_1 to id_50, and the subsequent iterations contain id_51 to id_100 and id_101 to id_150. At the end, the Recall operator is used to fetch the object stored by the Remember operator. The Recall operator can only fetch the examples of the last batch, because each batch overwrites the one stored before it.
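The tutorial's Remember/Recall behaviour can be mimicked in plain Python to show why only the last batch is recovered. This is a hypothetical simulation, not RapidMiner code: the dictionary `object_store` and the functions `remember` and `recall` merely stand in for the operators of the same name, and the examples are reduced to their id values id_1 to id_150.

```python
from typing import Dict, List

# Hypothetical stand-in for the object table used by Remember/Recall.
object_store: Dict[str, List[str]] = {}


def remember(name: str, obj: List[str]) -> None:
    object_store[name] = obj  # storing under the same name overwrites the old object


def recall(name: str) -> List[str]:
    return object_store[name]


ids = [f"id_{i}" for i in range(1, 151)]  # 150 examples, as in the Iris data set
batch_size = 50
iterations = 0
for start in range(0, len(ids), batch_size):
    batch = ids[start:start + batch_size]
    iterations += 1
    print(iterations, batch[0], batch[-1])  # first and last id of this batch
    remember("batch", batch)

last = recall("batch")
print(len(last), last[0], last[-1])  # only the third batch survives
```

The loop prints three iterations covering id_1 to id_50, id_51 to id_100 and id_101 to id_150. The final recall returns the 50 examples from id_101 to id_150, because each call to `remember` replaced the previously stored batch.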