RapidMiner Developers documentation, version 9.6
Creating Super Operators
RapidMiner has two types of operators: normal operators and super operators. A super operator contains one or more subprocesses. You started by implementing a normal operator, but sometimes an operator relies on the execution of other operators, and sometimes the user should be able to define those operators. Take Cross Validation as an example: the user specifies the learner and the way performance is measured, and the operator then executes these subprocesses as needed.
Assume you need an operator that loops over values. The Loop Values operator in RapidMiner Studio loops over the values of an attribute, but you want an operator that loops over the values in a given range, with a given step size, without creating an attribute for this purpose. Instead, build a super operator that re-executes its inner operators for each step of the range. To do this, create a new class, but this time extend the OperatorChain class. As with a simple operator, you must implement a constructor. The empty class looks like this:
public class LoopValuesRange extends OperatorChain {

    public LoopValuesRange(OperatorDescription description) {
        super(description, "Executed Process");
    }
}
In contrast to the simple operator, you must give the super constructor the names of the subprocesses you are going to create inside your super operator. The number of names you pass to the super constructor determines the number of created subprocesses. If you want to follow the naming convention, start each word uppercase and use blanks to separate words. Later, you might access these subprocesses by index to execute them. But first, define some ports to pass data to the super operator.
Using the PortPairExtender for super operators
You learned earlier how to use the PortPairExtender to create throughput ports for a simple operator. You also need this class to pass data from the super operator to the subprocess and back. Do it in a general way so that the user can pass any number and any type of object to the inner process. You might know this behavior from the Loop operator of RapidMiner Studio. The code for adding this PortPairExtender looks like this:
private final PortPairExtender inputPortPairExtender =
        new PortPairExtender("input", getInputPorts(), getSubprocess(0).getInnerSources());
In addition to the PortPairExtender, there is also a PortExtender. Use the PortPairExtender to get an equal number of input and output ports. Take a close look at the PortPairExtender constructor. In addition to the name, you must specify which input ports the extender should attach to. The getInputPorts method delivers the input ports of the current operator (so the port extender is attached on the left side of the operator box). The paired ports are added to the inner sources of the first subprocess. Then, you can access the subprocesses via the getSubprocess method.
If you are familiar with RapidMiner’s integrated super operators like Loop, you know that there are always input ports on the left and output ports on the right of the subprocess. To distinguish these ports from the input and output ports of the super operator, RapidMiner calls them inner sources and inner sinks. In fact, an inner source is technically an output port for the super operator (because it delivers data to this port). The inner sink is an input port for the super operator from where it can retrieve the output of the subprocesses. To deliver outputs from the loop, you could add the following second variant of the PortPairExtender to collect the outputs from all iterations and pass them as a collection to the output of our super operator:
private final CollectingPortPairExtender outExtender =
        new CollectingPortPairExtender(
                "output", getSubprocess(0).getInnerSinks(), getOutputPorts());
This results in an operator with extendable input ports on its left side and collecting output ports on its right side.
To make a PortExtender work, you must initialize it during construction of the operator. Simply add the following lines in the constructor:
inputPortPairExtender.start();
outExtender.start();
To have proper meta data available at the output ports, add some rules:
getTransformer().addRule(inputPortPairExtender.makePassThroughRule());
You must add a rule defining when the subprocess’ meta data is to be transformed. The ordering of the rule definition is crucial because if the meta data isn’t forwarded to the inner ports, there is nothing the meta data transformation of the inner operators can do. This line adds the rule:
getTransformer().addRule(new SubprocessTransformRule(getSubprocess(0)));
Next, you need a rule to pass the meta data from the inner sinks to the output ports:
getTransformer().addRule(outExtender.makePassThroughRule());
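Why the order of the three rules matters can be illustrated with a small model. The sketch below is plain Java without any RapidMiner dependency. The class RuleOrderSketch, its port names, and the string "ExampleSet" used as meta data are all hypothetical stand-ins, and each rule is reduced to copying a value from one port to the next. It is only meant to show that a rule cannot forward meta data that an earlier rule has not yet delivered.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical model of ordered meta data rules; not the RapidMiner transformer. */
final class RuleOrderSketch {

    // Each rule copies the meta data from one (made-up) port name to the next.
    static final Consumer<Map<String, String>> INPUT_TO_INNER_SOURCE =
            ports -> copy(ports, "input", "innerSource");
    static final Consumer<Map<String, String>> SUBPROCESS_TRANSFORM =
            ports -> copy(ports, "innerSource", "innerSink");
    static final Consumer<Map<String, String>> INNER_SINK_TO_OUTPUT =
            ports -> copy(ports, "innerSink", "output");

    private RuleOrderSketch() {}

    private static void copy(Map<String, String> ports, String from, String to) {
        if (ports.containsKey(from)) {
            ports.put(to, ports.get(from));
        }
    }

    /** Applies the rules in the given order and returns what arrives at the output port. */
    static String outputMetaData(List<Consumer<Map<String, String>>> rules) {
        Map<String, String> ports = new HashMap<>();
        ports.put("input", "ExampleSet");
        for (Consumer<Map<String, String>> rule : rules) {
            rule.accept(ports);
        }
        return ports.get("output"); // null if nothing reached the output
    }

    /** Same order as in the text: input pass-through, subprocess, output pass-through. */
    static String documentedOrder() {
        return outputMetaData(Arrays.asList(
                INPUT_TO_INNER_SOURCE, SUBPROCESS_TRANSFORM, INNER_SINK_TO_OUTPUT));
    }

    /** Subprocess rule registered first: it runs before the inner sources have meta data. */
    static String subprocessRuleFirst() {
        return outputMetaData(Arrays.asList(
                SUBPROCESS_TRANSFORM, INPUT_TO_INNER_SOURCE, INNER_SINK_TO_OUTPUT));
    }
}
```

In this model, documentedOrder() delivers the meta data to the output, while subprocessRuleFirst() delivers nothing, because the subprocess rule runs before the inner source has received anything.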
The minimal setup of the doWork() method looks like this:
@Override
public void doWork() throws OperatorException {
    outExtender.reset();
    inputPortPairExtender.passDataThrough();
    getSubprocess(0).execute();
    outExtender.collect();
}
First, it resets the CollectingPortPairExtender, then it passes the data from the input ports of the super operator to the inner sources. Next, it executes the subprocess, and finally it collects the outputs.
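The position of the collect call starts to matter once the subprocess runs more than once. The following sketch models this in plain Java, assuming that reset starts an empty collection and that each collect call appends whatever currently sits at the inner sink. The class CollectingOutputSketch and its methods are hypothetical and much simpler than the RapidMiner class; strings stand in for the delivered objects.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified stand-in for a collecting output extender.
 * Hypothetical class for illustration only, not the RapidMiner implementation.
 */
final class CollectingOutputSketch {

    private List<String> collected = new ArrayList<>();
    private String innerSink; // last result delivered by the subprocess

    /** Starts a fresh, empty collection at the output. */
    void reset() {
        collected = new ArrayList<>();
        innerSink = null;
    }

    /** Models the subprocess delivering a result to the inner sink. */
    void deliver(String result) {
        innerSink = result;
    }

    /** Appends whatever currently sits at the inner sink to the collection. */
    void collect() {
        if (innerSink != null) {
            collected.add(innerSink);
        }
    }

    /** Returns a copy of the collected results. */
    List<String> output() {
        return new ArrayList<>(collected);
    }

    /** Runs a loop and collects either after every execution or once at the end. */
    static List<String> run(int iterations, boolean collectEveryIteration) {
        CollectingOutputSketch out = new CollectingOutputSketch();
        out.reset();
        for (int i = 0; i < iterations; i++) {
            out.deliver("result " + i); // stands in for executing the subprocess
            if (collectEveryIteration) {
                out.collect();
            }
        }
        if (!collectEveryIteration) {
            out.collect();
        }
        return out.output();
    }
}
```

Under these assumptions, run(3, true) yields three entries, one per iteration, whereas run(3, false) yields only the result of the last iteration. For a single execution, as in the minimal doWork() above, both variants behave the same.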
This minimal version executes the subprocess only once. To loop over a range of values, extend it as follows.
Add four parameters - the start value, the end value, the step size for the range, and a field where you can enter the macro name (which contains the current value during the execution of the loop).
Then, adapt the doWork() method. Loop over the values in the given range, define the macro value in each iteration and execute the subprocess in each iteration.
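One detail to watch when looping over a range of doubles: adding the step size repeatedly accumulates rounding error, so the number of iterations can differ from what the parameters suggest. The following sketch is plain Java with no RapidMiner dependency. The class RangeSteps and its methods are hypothetical names chosen for illustration. It compares a loop that accumulates the step size with one driven by an integer counter, and shows a two-decimal rounding that could be applied before the value is stored in a macro.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the RapidMiner API. */
final class RangeSteps {

    private RangeSteps() {}

    /** Values visited by a loop that repeatedly adds the step size. */
    static List<Double> byAccumulation(double start, double end, double step) {
        requirePositive(step);
        List<Double> values = new ArrayList<>();
        for (double v = start; v < end; v += step) {
            values.add(v);
        }
        return values;
    }

    /** Values computed from an integer counter as start + k * step. */
    static List<Double> byIndex(double start, double end, double step) {
        requirePositive(step);
        // The small tolerance (an arbitrary choice) keeps a quotient that is
        // marginally above a whole number from adding an extra iteration.
        long count = (long) Math.ceil((end - start) / step - 1e-9);
        List<Double> values = new ArrayList<>();
        for (long k = 0; k < count; k++) {
            values.add(start + k * step);
        }
        return values;
    }

    /** Rounds to two decimals and converts to a string, e.g. for a macro value. */
    static String macroValue(double value) {
        return Double.toString(Math.round(value * 100) / 100.0);
    }

    private static void requirePositive(double step) {
        if (!(step > 0)) {
            // a step of 0 (or NaN) would never reach the end of the range
            throw new IllegalArgumentException("step size must be greater than 0");
        }
    }
}
```

With start 0, end 1 and step 0.1, byAccumulation visits eleven values, because the running sum reaches 0.9999999999999999 rather than 1.0 and still passes the comparison with the end value. byIndex visits ten. Rounding for display hides the drift (macroValue(0.30000000000000004) gives "0.3") but does not change the number of iterations.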
In the final result, the operator has the parameters that define the value range of the loop. Within the subprocess you can read the macro value (the current value in the loop) and print it with the first simple operator that you created at the beginning.
The log entries show that, in each iteration, the value of the macro changes and the subprocess is executed.
In the end, the super operator looks like this:
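import java.util.List;

import com.rapidminer.operator.OperatorChain;
import com.rapidminer.operator.OperatorDescription;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.operator.ports.CollectingPortPairExtender;
import com.rapidminer.operator.ports.PortPairExtender;
import com.rapidminer.operator.ports.metadata.SubprocessTransformRule;
import com.rapidminer.parameter.ParameterType;
import com.rapidminer.parameter.ParameterTypeDouble;
import com.rapidminer.parameter.ParameterTypeString;

/**
 * Example for a super operator, loops over values given by a range and step size.
 */
public class LoopValuesRange extends OperatorChain {

    public static final String PARAMETER_START = "start";
    public static final String PARAMETER_END = "end";
    public static final String PARAMETER_STEP_SIZE = "step size";
    public static final String PARAMETER_MACRO_NAME = "iteration macro";

    // passes the operator's inputs on to the inner sources of the subprocess
    private final PortPairExtender inputPortPairExtender =
            new PortPairExtender("input", getInputPorts(), getSubprocess(0).getInnerSources());

    // gathers the results at the inner sinks into collections at the operator's outputs
    private final CollectingPortPairExtender outExtender =
            new CollectingPortPairExtender("output",
                    getSubprocess(0).getInnerSinks(), getOutputPorts());

    /**
     * Constructor
     * @param description
     */
    public LoopValuesRange(OperatorDescription description) {
        super(description, "Executed Process");
        inputPortPairExtender.start();
        outExtender.start();
        // rule order matters: inputs first, then the subprocess, then the outputs
        getTransformer().addRule(inputPortPairExtender.makePassThroughRule());
        getTransformer().addRule(new SubprocessTransformRule(getSubprocess(0)));
        getTransformer().addRule(outExtender.makePassThroughRule());
    }

    @Override
    public void doWork() throws OperatorException {
        outExtender.reset();
        double start = getParameterAsDouble(PARAMETER_START);
        double end = getParameterAsDouble(PARAMETER_END);
        double stepsize = getParameterAsDouble(PARAMETER_STEP_SIZE);
        String macro = getParameterAsString(PARAMETER_MACRO_NAME);
        if (stepsize <= 0) {
            // the parameter allows 0, which would make the loop below run forever
            throw new OperatorException("The step size must be greater than 0.");
        }
        // note: repeatedly adding the step size accumulates floating-point rounding error
        for (double i = start; i < end; i += stepsize) {
            // expose the current value, rounded to two decimals, as a macro
            getProcess().getMacroHandler().addMacro(macro,
                    Double.toString(Math.round(i * 100) / 100.0));
            // deliver the inputs to the inner sources for this iteration
            inputPortPairExtender.passDataThrough();
            getSubprocess(0).execute();
            // add this iteration's results to the output collections
            outExtender.collect();
        }
    }

    @Override
    public List<ParameterType> getParameterTypes() {
        List<ParameterType> types = super.getParameterTypes();
        types.add(new ParameterTypeDouble(PARAMETER_START,
                "start value of the value range",
                Integer.MIN_VALUE, Integer.MAX_VALUE, 0, false));
        types.add(new ParameterTypeDouble(PARAMETER_END,
                "end value of the value range",
                Integer.MIN_VALUE, Integer.MAX_VALUE, 1, false));
        types.add(new ParameterTypeDouble(PARAMETER_STEP_SIZE,
                "step size of the value range",
                0, Integer.MAX_VALUE, 0.1, false));
        types.add(new ParameterTypeString(PARAMETER_MACRO_NAME,
                "This parameter specifies the name of the macro which holds "
                        + "the current value of the selected range in each iteration.",
                "loop_value"));
        return types;
    }
}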