RapidMiner Developers documentation for version 9.6
Python Scripting Extension
RapidMiner provides the Python Scripting extension, which includes the Execute Python operator. It enables you to run Python code within a RapidMiner process.
ExampleSets are handled as pandas DataFrame objects.
The extension supports a variety of Python environment management tools, including the popular Anaconda distribution and virtualenvwrapper.
Installation and configuration
The required installation and configuration steps depend on where you want to install the extension.
Once those steps are complete, you should have an environment capable of running any of the tutorial processes provided with the Execute Python operator.
Usage
Here are some of the key features of the extension. Make sure to explore the tutorial processes provided with the Execute Python operator as well.
To execute your code inside RapidMiner, structure it so that it declares an rm_main function as its main entry point. The number and order of the input parameters and return values of rm_main correspond to the input and output ports of the Execute Python operator.
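As a minimal sketch of this port mapping, the script below assumes an Execute Python operator with two connected input ports and two output ports. The column names (customer_id, amount) and the argument names are hypothetical and chosen only for illustration.

```python
import pandas as pd


def rm_main(customers, orders):
    # Each argument is a pandas DataFrame; the first argument comes from the
    # first input port, the second argument from the second input port.
    order_totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

    # Attach the per-customer total; customers without orders get 0.0.
    enriched = customers.merge(order_totals, on="customer_id", how="left")
    enriched["amount"] = enriched["amount"].fillna(0.0)

    # Two return values: the first goes to the first output port,
    # the second to the second output port.
    return enriched, order_totals
```

Because rm_main is an ordinary Python function, it can also be called directly with two DataFrames to check its behavior outside RapidMiner.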
Running scripts
You can execute your Python code in two ways: edit it in-line with the built-in script editor, which offers basic syntax highlighting but none of the advanced features of a Python IDE, or specify a script file in the script file parameter of the Execute Python operator. If your script is stored in a location accessible over the internet (such as GitHub), you can also read the script file directly from there with the help of the Open File operator.
Running notebooks
You can also execute ipynb notebooks with Execute Python. In this case, use the script file parameter of the operator to point to your notebook. The same considerations on how to structure code apply to notebooks as to Python scripts.
If you have tagged your notebook cells, you can use selective tag-based execution to pick which cells to exclude from the execution. Alternatively, you can specify which cells to execute by providing a regular expression.
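To make the idea of tag-based selection concrete, here is a rough sketch of how cells could be filtered by their tags. It is not the extension's implementation. It relies only on the notebook file format, where each cell stores its tags under metadata.tags, and it assumes the regular expression is matched against tag names. The function name select_code_cells and the sample tags are hypothetical.

```python
import re


def select_code_cells(notebook, exclude_tags=(), include_pattern=None):
    """Return the source of the code cells that pass the tag filters."""
    selected = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        tags = cell.get("metadata", {}).get("tags", [])
        if any(tag in exclude_tags for tag in tags):
            continue  # cell carries an excluded tag
        if include_pattern is not None and not any(
            re.search(include_pattern, tag) for tag in tags
        ):
            continue  # no tag matches the regular expression
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        selected.append(source if isinstance(source, str) else "".join(source))
    return selected


# A tiny in-memory notebook; a real file could be read with json.load().
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {"cell_type": "code", "metadata": {"tags": ["setup"]},
         "source": ["import pandas as pd\n"]},
        {"cell_type": "code", "metadata": {"tags": ["plot", "debug"]},
         "source": ["print('debug plot')"]},
        {"cell_type": "code", "metadata": {"tags": ["rm_main"]},
         "source": ["def rm_main(data):\n", "    return data"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

without_debug = select_code_cells(notebook, exclude_tags={"debug"})
only_entry_point = select_code_cells(notebook, include_pattern=r"^rm_")
```

Here without_debug keeps the setup and rm_main cells, while only_entry_point keeps just the cell whose tag starts with rm_.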
Fine-tuning the execution
Python environments are a great way to eliminate package dependency pollution and interference between different projects. If you work this way, you will probably have several Python environments in use at the same time.
To customize the Python environment used in one specific Execute Python operator, all you need to do is uncheck use default Python in the operator parameters, and provide your desired Python environment there. The same options are available as in the RapidMiner Studio preferences (see the installation and configuration chapter above).
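When several environments are in play, it can help to confirm which interpreter a given operator actually used. The sketch below assumes an Execute Python operator with no connected input ports and one output port. It returns a small table that describes the running interpreter, and the column names key and value are arbitrary.

```python
import platform
import sys

import pandas as pd


def rm_main():
    # Report the interpreter and library versions of the environment
    # that executed this operator.
    info = pd.DataFrame(
        {
            "key": ["executable", "python_version", "pandas_version"],
            "value": [sys.executable, platform.python_version(), pd.__version__],
        }
    )
    return info
```

Running the same script in two operators configured with different environments should produce different executable paths.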
Using RapidMiner macros
Macros added to the Python code inline with the %{myMacro} syntax are resolved before the script is executed, both for an inline script and for one provided as a script file. As a consequence, such code only runs inside RapidMiner; anywhere else, the unresolved macro produces a syntax error.
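The following sketch illustrates why such a script is only valid after macro resolution. It imitates the substitution with a plain text replacement, which is an assumption about the mechanism rather than a description of the extension's code. The macro name row_limit and the helper functions are hypothetical.

```python
import ast
import re

# Script text as it might be written in the operator.
# %{row_limit} is a hypothetical macro.
SCRIPT_WITH_MACRO = """
def rm_main(data):
    return data.head(%{row_limit})
"""


def substitute_macros(source, macros):
    """Replace every %{name} occurrence with the macro value as plain text."""
    return re.sub(r"%\{(\w+)\}", lambda m: str(macros[m.group(1)]), source)


def is_valid_python(source):
    """Check whether the text parses as Python, without executing it."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


raw_is_valid = is_valid_python(SCRIPT_WITH_MACRO)
resolved = substitute_macros(SCRIPT_WITH_MACRO, {"row_limit": 5})
resolved_is_valid = is_valid_python(resolved)
```

The raw text fails to parse because %{row_limit} is not Python syntax, whereas the resolved text contains data.head(5) and parses cleanly.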
Another, more Pythonic way to handle macros is to check the enable macros parameter on your Execute Python operator. You then add an extra parameter to your rm_main function, through which the macros are accessible during execution. This allows you not only to read macro values, but also to define new macros or overwrite the values of existing ones.
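A minimal sketch of this approach follows. It assumes that the extra parameter comes last and behaves like a dictionary mapping macro names to string values. The macro names row_limit and rows_returned are hypothetical.

```python
import pandas as pd


def rm_main(data, macros):
    # Assumption: macros is dictionary-like, with macro names as keys
    # and macro values as strings.
    row_limit = int(macros["row_limit"])  # read an existing macro
    result = data.head(row_limit)

    # Define a new macro (or overwrite an existing one) for later operators.
    macros["rows_returned"] = str(len(result))
    return result
```

Unlike the inline %{...} syntax, this script is valid Python on its own, so it can be tested outside RapidMiner by passing an ordinary dict as the second argument.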
Running on Server
Execute Python operators work on RapidMiner Server as they do in Studio, with only a few special considerations.
When using an environment manager such as Anaconda, it is good practice to have the same environments, with the same names, installed on Studio as well as on Server. To help with the Server side, we have created the Python Environment Manager.
When opening an Execute Python operator in RapidMiner Studio, only the local Python environments will be listed, never the ones present on RapidMiner Server, even if the process was opened from a Server repository.