You are viewing the RapidMiner Python documentation for version 9.9.

Custom operators

On this page we explain how you can integrate your Python code even more tightly into RapidMiner processes by creating custom operators with the Python Learner and Python Transformer operators. You can then share these custom operators with colleagues who are not experienced Python coders.

The Python Learner operator

With Python Learner, you can create a Python-based model that is compatible with RapidMiner's model interfaces. Models created with Python Learner (and with custom operators derived from it) can be applied with the Apply Model operator, trained and evaluated inside RapidMiner's Cross-Validation operators, fine-tuned with the Optimize operators, and so on.

When you drag a new Python Learner from the Operators panel in RapidMiner Studio onto the canvas, you get an operator that looks like other learners (same operator color and input/output ports). It also comes with a few predefined parameters, which you can edit. See the list of supported parameter types below.

The operator information panel also shows the capabilities of this Python based learner.

To edit the parameter definitions and capabilities, click the gear icon in the Parameters panel. A JSON editor opens where you can add and modify these definitions. The JSON you write here is validated, and you get warnings if something is incorrect. If you Apply an invalid configuration, the operator's input ports disappear from the canvas.

To implement a Python learner, you define two functions, rm_train and rm_apply. As their names suggest, rm_train runs when your Python Learner operator trains a model, and rm_apply runs when the model created with Python Learner is applied, for example with Apply Model. We recommend studying the provided tutorial processes for more hints on implementing these functions.
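The following is a minimal sketch of the two entry points, not the template that ships with the operator. It assumes that rm_train receives the training data as a pandas DataFrame plus a dict of operator parameters and returns any picklable model object, and that rm_apply receives that model plus the data to score. The exact signatures, the hard-coded label column name "label", the prediction column name and the majority-class logic are all illustrative assumptions; check the tutorial processes for the real conventions.

```python
import pandas as pd

# Assumption: the label column is called "label". A real learner should read
# the label from the example set's metadata instead of hard-coding it.
LABEL = "label"


def rm_train(data, parameters):
    """Train a trivial majority-class model.

    Assumed signature: a pandas DataFrame with the training examples and a
    dict of the operator's parameter values (unused in this sketch).
    """
    majority = data[LABEL].mode().iloc[0]
    # Any picklable object can act as the model in this sketch.
    return {"majority": majority}


def rm_apply(model, data):
    """Score new data by predicting the stored majority class for every row.

    Assumed signature: the object returned by rm_train and a DataFrame to score.
    Assumption: predictions go into a column named prediction(<label>).
    """
    scored = data.copy()
    scored["prediction(" + LABEL + ")"] = model["majority"]
    return scored


if __name__ == "__main__":
    train = pd.DataFrame({"x": [1, 2, 3], "label": ["a", "b", "b"]})
    model = rm_train(train, {})
    print(rm_apply(model, pd.DataFrame({"x": [4, 5]})))
```

Keeping the model a plain dict makes the train/apply split easy to see: everything rm_apply needs must be stored in the object that rm_train returns.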

The Python Learner operator does not yet support forecasting models.

The Python Transformer operator

You can think of Python Transformer as an Execute Python operator with user-defined parameters and an arbitrary number of input and output ports.

You define the operator's name, its parameters (with their types and default values), and its input and output ports by clicking the gear icon in the operator's Parameters panel and editing the JSON definition in the editor that opens. This works much like the editor described above for Python Learner, except that a Transformer takes no list of capabilities but requires inputs and outputs. The JSON you write here is validated, and you get warnings if something is incorrect. If you Apply an invalid configuration, the operator's input ports disappear from the canvas. See the list of supported parameter types below.

In order for the code to execute as expected, you have to follow the same convention as for Execute Python: your main entry point will be the rm_main function, and the number and order of the function parameters and return values will correspond to the operator's input and output ports.

If your Python Transformer based custom operator needs multiple inputs or outputs, you must define them explicitly in the inputs and outputs sections of the JSON definition. Users accustomed to the dynamic ports of Execute Python may find this counterintuitive.
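Here is a minimal sketch of an rm_main for a Transformer whose JSON definition declares two inputs and two outputs. It assumes, as for Execute Python, that the input data frames arrive in port order and that a tuple of return values maps onto the output ports in order. It further assumes that the operator's parameters are passed as a dict in a final parameters argument; that argument and the key_attribute parameter are hypothetical, so check the sample code of a new Python Transformer for the exact calling convention.

```python
import pandas as pd


def rm_main(left, right, parameters):
    """Join two inputs and report row counts.

    Inputs (in port order): left, right. Assumption: operator parameters
    arrive as a dict; "key_attribute" is a hypothetical string parameter
    that would be declared in the operator's JSON definition.
    """
    key = parameters["key_attribute"]
    merged = left.merge(right, on=key, how="inner")
    summary = pd.DataFrame(
        {
            "input": ["left", "right", "merged"],
            "rows": [len(left), len(right), len(merged)],
        }
    )
    # Two return values -> two output ports, in this order.
    return merged, summary


if __name__ == "__main__":
    a = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]})
    b = pd.DataFrame({"id": [2, 3, 4], "y": ["p", "q", "r"]})
    merged, summary = rm_main(a, b, {"key_attribute": "id"})
    print(merged)
    print(summary)
```

Because the ports are fixed by the JSON definition, the function signature and the number of returned values have to stay in sync with it by hand.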

The sample parameter configuration and code that appear when you drag a new Python Transformer to the canvas illustrate all of the points above.

Supported parameter types

Here's a list of supported parameter types for Python Learner and Python Transformer, which you can use in the parameters list of your operator parameter configuration JSON:

Type in JSON | Parameter appearance
string | string in a textbox
category | single-choice dropdown
boolean | checkbox
integer | integer in a textbox
real | floating-point number in a textbox

Each parameter definition has the following attributes, which are represented as key-value pairs in the JSON object describing the parameter:

Attribute | Mandatory? | Description
name | yes | the parameter name shown on the operator parameter panel
type | yes | the parameter type (see the table above for supported types)
categories | only if type is category | the choices shown in the parameter dropdown, displayed in the order provided by the user; must be a list of values
optional | no | if set to true, the operator is executed even if the parameter value is empty
value | only if optional is false or not provided | the default value of the parameter
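The rules in the two tables above can be expressed as a small checker. This is a hypothetical helper for reviewing a parameters list before pasting it into the JSON editor, not RapidMiner's own validator: it reads the tables literally, while the editor performs its own checks and may apply rules or defaults that are not listed here.

```python
import json

# Supported values of "type", per the parameter type table.
SUPPORTED_TYPES = {"string", "category", "boolean", "integer", "real"}


def check_parameter(param):
    """Return a list of problems found in one parameter definition (a dict)."""
    problems = []
    name = param.get("name")
    if not name:
        problems.append("missing 'name'")
    label = name or "<unnamed>"
    ptype = param.get("type")
    if ptype is None:
        problems.append(f"{label}: missing 'type'")
    elif ptype not in SUPPORTED_TYPES:
        problems.append(f"{label}: unsupported type {ptype!r}")
    if ptype == "category" and not isinstance(param.get("categories"), list):
        problems.append(f"{label}: 'categories' must be a list")
    # Read literally, the table makes 'value' mandatory unless optional is true.
    if param.get("optional") is not True and "value" not in param:
        problems.append(f"{label}: 'value' is required when not optional")
    return problems


def check_parameters(text):
    """Parse a JSON array of parameter definitions and check each entry."""
    problems = []
    for param in json.loads(text):
        problems.extend(check_parameter(param))
    return problems
```

For example, check_parameters('[{"name": "q", "type": "real"}]') reports that q has neither a default value nor "optional": true.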

Here is an example parameters list that uses each of the supported parameter types:

"parameters": [
 {
  "name": "1st_parameter",
  "type": "string",
  "optional": true
 },
 {
  "name": "2nd_parameter",
  "type": "integer",
  "value": 100
 },
 {
  "name": "3rd_parameter",
  "type": "category",
  "categories": [
    "Category A",
    "Category B",
    "Category C",
    "Default Category"
  ],
  "value": "Default Category"
 },
 {
  "name": "4th_parameter",
  "type": "boolean"
 },
 {
  "name": "5th_parameter",
  "type": "real",
  "value": 3.1415
 },
 {
  "name": "6th_parameter",
  "type": "string",
  "optional": true
 }
]

Environment handling in custom Python operators

As with the Execute Python operator, you can uncheck the use default Python parameter and specify which environment to use. For Python Learner, the model is applied using the same environment that was used for training.

If the model is applied on another machine (e.g. on RapidMiner AI Hub), make sure that the same Python environment, with the same name, is available there; otherwise the execution will either fail or produce unexpected results.
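One way to catch such mismatches early is to record the training environment inside the model and compare it when the model is applied. The sketch below is a hypothetical pattern, not a RapidMiner feature: rm_train could store environment_fingerprint([...]) in the model object, and rm_apply could compare it with the current fingerprint and raise an error on any difference. It uses only the standard library (importlib.metadata, Python 3.8 or later).

```python
import sys
from importlib import metadata


def environment_fingerprint(packages):
    """Record the Python minor version and the installed versions of packages.

    Packages that are not installed are recorded as None.
    """
    fingerprint = {"python": "%d.%d" % sys.version_info[:2]}
    for name in packages:
        try:
            fingerprint[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            fingerprint[name] = None
    return fingerprint


def environment_mismatches(trained, current):
    """Return the sorted keys whose values differ between two fingerprints."""
    keys = trained.keys() | current.keys()
    return sorted(k for k in keys if trained.get(k) != current.get(k))


if __name__ == "__main__":
    # Hypothetical use: in rm_train, model["env"] = environment_fingerprint([...]);
    # in rm_apply, compare model["env"] with a freshly computed fingerprint.
    saved = environment_fingerprint(["pandas"])
    print("mismatches:", environment_mismatches(saved, environment_fingerprint(["pandas"])))
```

Failing loudly on a mismatch is usually preferable to silently scoring with different package versions, which is the "unexpected results" case described above.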

Sharing and distributing custom operators

When you are happy with how a Learner or Transformer you created behaves, the next step could be sharing it with others on your project. Both operators have a Save button on their parameters panel.

When you click Save and specify a location in your project or repository, a .pyop descriptor file is created.

Users can then drag this .pyop file to the canvas in RapidMiner Studio, which creates a Learner or Transformer containing all your code and parameter definitions, under the name you gave your custom operator. This operator is not editable, which ensures that the code you wrote executes as you intended (provided the Python environment it uses is present on the machine running the RapidMiner process).

One drawback of this way of sharing is that operators cannot be updated once the .pyop descriptor has been dragged to the canvas and an operator has been created from it. If you need to be able to update these operators, distribute your custom operators as an extension instead. To do this, right-click the folder containing your .pyop files and select Create Extension... Enter the details in the dialog that appears; it also lists the custom operators that will be compiled into your new extension. Click Create Extension.

Once the extension is created, you can distribute it as any other extension. When you want to update your operators, you create a new version of the extension and redistribute it to all users.

Note: the created extension will depend on the Python Scripting extension version 9.9 or later, so each user has to have that extension installed as well.