Execute Script (RapidMiner Studio Core)
Synopsis
This operator executes Java code and/or Groovy scripts. This basically means that users can write their own operators directly within the process by specifying Java code and/or a Groovy script which will be interpreted and executed during the process runtime.
Description
This is a very powerful operator because it allows you to write your own script. Because writing scripts can be time-consuming and error-prone, this operator should be used only if the task you want to perform cannot be performed by existing RapidMiner operators.
Groovy is an agile and dynamic language for the Java Virtual Machine. It builds upon the strengths of Java but has additional power features inspired by languages like Python, Ruby and Smalltalk. Groovy integrates well with all existing Java classes and libraries because it compiles straight to Java bytecode so you can use it anywhere you can use Java. For a complete reference of Groovy scripts please refer to http://groovy.codehaus.org/.
In addition to the usual scripting code elements from Groovy, the RapidMiner scripting operator defines some special scripting elements:
- If the standard imports parameter is set to true, all important types like Example, ExampleSet, Attribute, Operator etc., as well as the most important Java types such as collections, are automatically imported and can be used directly within the script. Hence, there is no need to import them in your script. However, you can import any other class you want and use it in your script.
- The current operator (the scripting operator for which you define the script) is referenced by operator. Example: operator.log("text")
- All operator methods like log (see above) that access the input or the complete process can be used directly by prefixing them with operator. Example: operator.getProcess()
- Input of the operator can be retrieved via the input method getInput(Class) of the surrounding operator. Example: ExampleSet exampleSet = operator.getInput(ExampleSet.class)
- You can iterate over examples with the following construct: for (Example example : exampleSet) { ... }
- You can retrieve example values with the following shortcut. For non-numeric values: String value = example["attribute_name"]; For numeric values: double value = example["attribute_name"];
- You can set example values with the following shortcut. For non-numeric values: example["attribute_name"] = "value"; For numeric values: example["attribute_name"] = 5.7;
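The scripting elements listed above can be combined into one short script. The following is a minimal sketch, not a script shipped with RapidMiner. It assumes that the standard imports parameter is set to true, that an ExampleSet is connected to the first input port, and that this ExampleSet contains a non-numeric attribute named "city" and a numeric attribute named "price". Both attribute names are hypothetical and chosen only for illustration.

```groovy
// Assumption: standard imports is true, so ExampleSet and Example need no import.
// Assumption: the input ExampleSet has the hypothetical attributes "city" and "price".

// Retrieve the input via the surrounding operator.
ExampleSet exampleSet = operator.getInput(ExampleSet.class);

// Write a message to the log through the operator reference.
operator.log("Number of examples: " + exampleSet.size());

// Iterate over all examples and use the value shortcuts.
for (Example example : exampleSet) {
    String city = example["city"];     // non-numeric value
    double price = example["price"];   // numeric value

    example["city"] = city.trim();     // set a non-numeric value
    example["price"] = price * 2;      // set a numeric value
}

// Deliver the modified ExampleSet at the output port.
return exampleSet;
```

The sketch does not handle missing values. If the data may contain them, check the retrieved values before calling methods on them.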
Please study the attached Tutorial Processes for a better understanding. Please note that scripts written for this operator may access Java code. Such scripts may hence become incompatible with future releases of RapidMiner.
Input
- input
The Script operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any).
Output
- output
The Script operator can have multiple outputs. When one output is connected, another output port becomes available which is ready to deliver another output (if any).
Parameters
- script
The script to be executed is specified through this parameter. Range:
- standard_imports
If the standard imports parameter is set to true, all important types like Example, ExampleSet, Attribute, Operator etc., as well as the most important Java types such as collections, are automatically imported and can be used directly within the script. Hence, there is no need to import them in your script. However, you can import any other class you want and use it in your script. Range: boolean
Tutorial Processes
Iterating over attributes for changing the attribute names to lower case
The 'Purchases' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. Note that the names of all attributes of the ExampleSet are in upper case letters. The Script operator is applied on the ExampleSet. The script changes the attribute names to lower case letters. This can be verified by viewing the results in the Results Workspace.
Here is a brief description of what happens in the script. First the input of the operator is retrieved via the input method getInput(Class). Then the for loop iterates over all attributes and uses the toLowerCase() method to change the names of the attributes to lower case letters. At the end, the modified ExampleSet is returned.
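The steps described above can be sketched as follows. This is a reconstruction from the description, not necessarily the verbatim script stored in the tutorial process. It assumes that the standard imports parameter is set to true, so that ExampleSet and Attribute are available without explicit imports.

```groovy
// Retrieve the ExampleSet connected to the input port.
ExampleSet exampleSet = operator.getInput(ExampleSet.class);

// Rename every attribute to its lower case form.
for (Attribute attribute : exampleSet.getAttributes()) {
    String name = attribute.getName();
    attribute.setName(name.toLowerCase());
}

// Deliver the modified ExampleSet.
return exampleSet;
```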
Please note that this is a very simple script. It was included here just to introduce you to the workings of this operator. This operator can be used to perform very complex tasks.
Iterating over all examples for changing the attribute values to upper case
The 'Purchases' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. Note that the values of all attributes of the ExampleSet are in lower case letters. The Script operator is applied on the ExampleSet. The script changes the attribute values to upper case letters. This can be verified by viewing the results in the Results Workspace.
Here is a brief description of what happens in the script. First the input of the operator is retrieved via the input method getInput(Class). Then the outer for loop iterates over all attributes and stores the name of the current attribute in a string variable. Then the inner for loop iterates over all examples and changes the value of the current attribute from lower to upper case using the toUpperCase() method. At the end, the modified ExampleSet is returned.
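A possible form of this script is sketched below. It is a reconstruction from the description, not necessarily the verbatim tutorial script, and it assumes that the standard imports parameter is set to true. The isNominal() guard and the null check are additions that are not mentioned in the description. They are included on the assumption that upper-casing only makes sense for non-numeric, non-missing values.

```groovy
// Retrieve the ExampleSet connected to the input port.
ExampleSet exampleSet = operator.getInput(ExampleSet.class);

// Outer loop: visit every attribute and remember its name.
for (Attribute attribute : exampleSet.getAttributes()) {
    // Added guard (assumption): skip attributes that do not hold text values.
    if (!attribute.isNominal()) {
        continue;
    }
    String name = attribute.getName();

    // Inner loop: convert the value of this attribute in every example.
    for (Example example : exampleSet) {
        String value = example[name];
        // Added check (assumption): leave the value untouched if none was returned.
        if (value != null) {
            example[name] = value.toUpperCase();
        }
    }
}

// Deliver the modified ExampleSet.
return exampleSet;
```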
Please note that this is a very simple script. It was included here just to introduce you to the workings of this operator. This operator can be used to perform very complex tasks.
Subtracting mean of numerical attributes from attribute values
The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. Note the values of the 'Temperature' and 'Humidity' attributes. The Script operator is applied on the ExampleSet. The script subtracts the mean of each numerical attribute from all values of that attribute. This can be verified by viewing the results in the Results Workspace.
Here is a brief description of what happens in the script. First the input of the operator is retrieved via the input method getInput(Class). Then the outer for loop iterates over all attributes and stores the name of the current attribute in a string variable and the mean of this attribute in a variable of type double. Then the inner for loop iterates over all examples and subtracts the mean from the example's current value of that attribute. At the end, the modified ExampleSet is returned.
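One way to write such a script is sketched below. It is a reconstruction from the description, not necessarily the verbatim tutorial script. It assumes that the mean is obtained from the attribute statistics of the ExampleSet, using recalculateAllAttributeStatistics() and getStatistics(attribute, Statistics.AVERAGE). The description does not say how the mean is computed, so this choice is an assumption. Statistics is imported explicitly in case it is not covered by the standard imports.

```groovy
// Explicit import in case Statistics is not part of the standard imports.
import com.rapidminer.example.Statistics;

// Retrieve the ExampleSet connected to the input port.
ExampleSet exampleSet = operator.getInput(ExampleSet.class);

// Assumption: the mean is taken from the attribute statistics,
// which have to be calculated before they can be queried.
exampleSet.recalculateAllAttributeStatistics();

// Outer loop: visit every numerical attribute.
for (Attribute attribute : exampleSet.getAttributes()) {
    if (attribute.isNumerical()) {
        String name = attribute.getName();
        double mean = exampleSet.getStatistics(attribute, Statistics.AVERAGE);

        // Inner loop: subtract the mean from the value in every example.
        for (Example example : exampleSet) {
            double value = example[name];
            example[name] = value - mean;
        }
    }
}

// Deliver the modified ExampleSet.
return exampleSet;
```

After the script has run, each numerical attribute, such as 'Temperature' and 'Humidity' in the 'Golf' data set, should have a mean of approximately zero.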
Please note that this is a very simple script. It was included here just to introduce you to the workings of this operator. This operator can be used to perform very complex tasks.