Loop Collection (RapidMiner Studio Core)
SynopsisThis operator iterates over a collection of objects. It is a nested operator and its subprocess executes once for each object of the given collection.
Objects can be grouped into a collection using the Collect operator. In the Process View, collections are indicated by double lines. The Loop Collection operator loops over its subprocess once for every object in the input collection. The output of this operator is also a collection, any additional results of the subprocess can also be delivered through its output ports (as collections). If the unfold parameter is set to true then the output will be the union of all elements of the input collections.
Collections can be useful when you want to apply the same operations on a number of objects. The Collect operator will allow you to collect the required objects into a single collection, the Loop Collection operator will allow you to iterate over all collections and finally you can separate the input objects from collection by individually selecting the required element by using the Select operator.
- collection (Collection)
This input port expects a collection. It is the output of the Collect operator in the attached Example Process.
- output (Collection)
This operator can have multiple outputs. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object supplied at the first output port of the subprocess of the Loop Collection operator is delivered through the first output port of this operator. The objects are delivered as collections.
- set_iteration_macroThis parameter specifies if a macro should be defined for the loop. The macro value will increment after every iteration. The name and start value of the macro can be specified by the macro name and macro start value parameters respectively. Range: boolean
- macro_nameThis parameter is only available when the set iteration macro parameter is set to true. This parameter specifies the name of the macro. Range: string
- macro_start_valueThis parameter is only available when the set iteration macro parameter is set to true. This parameter specifies the starting value of the macro. The value of the macro increments after every iteration of the loop. Range: integer
- unfoldThis parameter specifies whether collections received at the input ports should be unfolded. If the unfold parameter is set to true then the output will be the union of all elements of the input collections. Range: boolean
Introduction to collections
This Example Process explains a number of important ideas related to collections. This Example Process shows how objects can be collected into a collection, then some preprocessing is applied on the collection and finally individual elements of the collection are separated as required.
The 'Golf' and 'Golf-Testset' data sets are loaded using the Retrieve operator. Both ExampleSets are provided as inputs to the Subprocess operator. The subprocess performs some preprocessing on the ExampleSets and then returns them through its output ports. The first output port returns the preprocessed 'Golf' data set which is then used as training set for the Decision Tree operator. The second output port delivers the preprocessed 'Golf-Testset' data set which is used as testing set for the Apply Model operator which applies the Decision Tree model. The performance of this model is measured and it is connected to the results port. The training and testing ExampleSets can also be seen in the Results Workspace.
Now have a look at the subprocess of the Subprocess operator. First of all, the Collect operator combines the two ExampleSets into a single collection. Note the double line output of the Collect operator which indicates that the result is a collection. Then the Loop Collection operator is applied on the collection. The Loop Collection operator iterates over the elements of the collection and performs some preprocessing (renaming an attribute in this case) on them. You can see in the subprocess of the Loop Collection operator that the Rename operator is used for changing the name of the Temperature attribute to 'New Temperature'. It is important to note that this renaming is performed on both ExampleSets of the collection. The resultant collection is supplied to the Multiply operator which generates two copies of the collection. The first copy is used by the Select operator (with index parameter = 1) to select the first element of collection i.e. the preprocessed 'Golf' data set. The second copy is used by the second Select operator (with index parameter = 2) to select the second element of the collection i.e. the preprocessed 'Golf-Testset' data set.