RapidMiner Studio 7.2 comes with a reworked license framework. This article explains the limits it introduces and how to handle them. Once Studio is installed, you can inspect the active license limits via the Help > Manage Licenses menu in the RapidMiner Studio application.
What are logical processor limits
Some RapidMiner Studio operators can run complex computations in parallel. Each parallel execution creates a new thread, which can be processed by a logical processor. The number of logical processors, and thus the maximum number of threads, is limited by your license and can be set in the RapidMiner Studio Preferences (Number of threads).
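The relationship between the preference and the license can be pictured with a short sketch. It is illustrative only and does not show how RapidMiner Studio is implemented: the function names, the `requested` and `license_limit` parameters, and the extra clamp to the machine's processor count are all assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def effective_thread_count(requested, license_limit):
    """Return the number of worker threads to use.

    requested:     value taken from a "Number of threads" style preference
                   (hypothetical parameter).
    license_limit: maximum number of logical processors the license allows
                   (hypothetical parameter).
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    # Never exceed the license or the hardware, and never drop below one.
    return max(1, min(requested, license_limit, available))


def run_parallel(tasks, requested, license_limit):
    """Run zero-argument callables on a pool sized by the limit."""
    workers = effective_thread_count(requested, license_limit)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the submitted tasks.
        return list(pool.map(lambda task: task(), tasks))
```

In this sketch, requesting eight threads under a license that allows one logical processor yields a single worker, so the tasks run one after another.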
What are data limits
RapidMiner Studio processes data in the form of Example Sets. An Example Set is simply a table made up of attributes (columns) and examples (rows). With the newly introduced data limit, operators and views can only process a certain number of rows, which is determined by the installed license.
How do these affect my process
The behavior of the data row limit handling depends on the execution mode of RapidMiner Studio.
UI mode
If RapidMiner Studio is executed in UI mode and an Operator receives an Example Set that exceeds the allowed number of data rows, the Process is paused and a corresponding information bubble is shown. At this point you can either downsample your data or upgrade your current license. See the What can I do if I hit a limit section for more information.
The number of rows displayed in the Results view is also determined by the installed license. If a displayed result contains more rows than allowed, a corresponding warning banner is shown and the rows that exceed the limit are not processed.
Command line mode
If RapidMiner Studio is executed in command line mode and an Operator receives an Example Set that exceeds the allowed number of data rows, the process is stopped with the following User error:
Your license only permits up to x rows of data in a process, however the input data contained y rows.
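The command line behavior amounts to a hard check on the row count. The following is a minimal sketch of such a check, not RapidMiner code: the `UserError` class, the `check_row_limit` function, and its parameters are hypothetical names, and only the message text is taken from the error shown above.

```python
class UserError(Exception):
    """Stand-in for the User error reported in command line mode (hypothetical)."""


def check_row_limit(row_count, license_row_limit):
    """Stop with an error if the input has more rows than the license allows."""
    if row_count > license_row_limit:
        raise UserError(
            f"Your license only permits up to {license_row_limit} rows of data "
            f"in a process, however the input data contained {row_count} rows."
        )
```

Input that is exactly at the limit passes the check; only row counts above the limit stop the process.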
What can I do if I hit a limit
There are several ways to comply with the data row limit:
Upgrade your license
If you want to leverage the full power of RapidMiner Studio, you may want to upgrade your license. Take a look at the product comparison page to find the license that fits your needs.
Downsample via data limit bubble
In UI mode, RapidMiner Studio shows an information bubble that lets you downsample the data.
If you click Downsample data, the first n rows are passed to the next operator, where n is the maximum number of data rows permitted by the active license. This choice is remembered for the current process, so any other Operator that hits the limit, as well as any re-execution of the process, will also use the downsample strategy. Whenever automatic downsampling is applied, an additional notification banner is shown in the process panel.
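The "first n rows, remembered per process" behavior can be modeled in a few lines. This is a sketch under stated assumptions rather than the actual implementation: `ProcessRun`, `deliver`, and `user_accepts` are hypothetical names, rows are modeled as a plain list, and the pause is modeled as an exception.

```python
class ProcessRun:
    """Models one process together with its remembered downsample choice."""

    def __init__(self, license_row_limit):
        self.license_row_limit = license_row_limit
        self.downsample_accepted = False  # remembered for this process only

    def deliver(self, rows, user_accepts=False):
        """Return the rows that would be passed on to the next operator."""
        if len(rows) <= self.license_row_limit:
            return rows  # within the limit, nothing to do
        if user_accepts:
            # Stands in for clicking "Downsample data" in the bubble.
            self.downsample_accepted = True
        if not self.downsample_accepted:
            # Stands in for the paused process in UI mode.
            raise RuntimeError("process paused: data row limit exceeded")
        # Keep only the first n rows, n being the license limit.
        return rows[: self.license_row_limit]
```

Once `deliver` has been called with `user_accepts=True`, later calls on the same `ProcessRun` truncate oversized input without asking again, which mirrors the remembered choice described above. Note that taking the first n rows keeps whatever order the data arrived in, so the result is not a random sample.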
Use Studio Operators
The Sample, Sample (Stratified), Sample (Kennard-Stone), and Filter Example Range Operators can also be used to downsample the data to fit the license limit. Simply add one of these operators after every operator that generates or retrieves more data than your license allows. This approach is especially useful when you want to control the downsampling strategy and store it as part of the process.
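To see why the choice of operator matters, it helps to compare a range filter with a random sample on plain data. The sketch below is a conceptual illustration outside RapidMiner: the two functions are hypothetical stand-ins, not the operators themselves, and the 1-based inclusive range is an assumption made for the example.

```python
import random


def filter_example_range(rows, first, last):
    """Keep rows first..last (1-based, inclusive), in their original order."""
    return rows[first - 1 : last]


def random_sample(rows, n, seed=0):
    """Draw n rows without replacement; a fixed seed makes the draw repeatable."""
    if n >= len(rows):
        return list(rows)
    rng = random.Random(seed)
    return rng.sample(rows, n)
```

A range filter always returns the same leading block of rows, which is predictable but can be biased if the data is sorted. A random sample spreads the selection across the whole table, and fixing the seed keeps the result stable between runs.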