Using Radoop Connection Overrides

Often, in advanced process design, you may encounter situations where some aspect of the process execution on the Hadoop cluster needs to be fine-tuned for an optimal outcome. This can be anything from adjusting timeout values for certain parts of the process, to specifying which YARN queue should be utilized.

One possible solution is to create a separate Radoop connection for each "tweak" and split the process into "modules", each placed in its own Radoop Nest with its own connection. This is suboptimal: it clutters your list of Radoop connections and makes the process hard to share, especially when RapidMiner AI Hub is used for collaboration.

To solve this problem, we have introduced connection overrides, which let you tweak most connection settings and any advanced parameters during Radoop process execution, on a per-nest-operator basis.

Overrides can be defined and enabled on the following operators:

  • Radoop Nest
  • Subprocess (Radoop)
  • Single Process Pushdown (Radoop)
  • SparkRM (Radoop)

Overrides only take effect inside these operators. Nesting these operators is also supported.
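As a rough mental model of how scoped and nested overrides could combine, the following Python sketch layers override dictionaries on top of a base connection. It is not Radoop code: the setting names (`yarn_queue`, `hive_timeout_s`) and the `effective_settings` helper are hypothetical, and the rule that the innermost operator wins on a conflict is an assumption, not something stated above.

```python
from collections import ChainMap

# Hypothetical base connection settings. These keys are illustrative only
# and are not real Radoop parameter names.
base_connection = {"yarn_queue": "default", "hive_timeout_s": 300}

# Overrides defined on an outer nest operator and on an operator nested in it.
outer_overrides = {"yarn_queue": "analytics"}
inner_overrides = {"hive_timeout_s": 1800}


def effective_settings(base, *override_layers):
    """Return the settings visible inside the innermost operator.

    override_layers are ordered from outermost to innermost.
    ASSUMPTION: on a conflict, the innermost layer wins.
    """
    # ChainMap looks keys up left to right, so the innermost layer goes first
    # and the base connection goes last.
    return dict(ChainMap(*reversed(override_layers), base))


inside_outer = effective_settings(base_connection, outer_overrides)
inside_inner = effective_settings(base_connection, outer_overrides, inner_overrides)
```

In this model the base connection is never modified; each operator only sees a merged view, which mirrors the statement that overrides have no effect outside the operators that define them.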

Creating connection overrides

To illustrate this feature, consider an example: your process contains a Hive operator that runs for a long time and would time out with the default settings defined in your Radoop connection.

  1. To create a connection override, take the offending operator and wrap it in a suitable nest operator. In our example, we use the Subprocess (Radoop) operator.

  2. Click the Connection Override button in the Parameters panel to open the Connection Overrides dialog.

  3. Set the connection overrides as needed. A blue icon appears next to each tab that contains overrides, so you can find them easily. An Undo button, shown below each overridden parameter together with the original value, lets you reset the override.

Keeping track of connection overrides

To avoid browsing through each operator to find what was overridden, you can use the following convenience features to get an overview:

  • The Problems panel shows a list of all enabled overrides in your process. Double-clicking a message opens the Connection Overrides dialog.

  • In the Connection Overrides dialog, the tooltip of the blue icon next to each tab that contains overrides shows a list of all overrides.

Sharing connection overrides

Connection overrides are tied to the process design, not to the Radoop connection, and as such are portable across different Radoop connections. To share a process together with its connection overrides, simply share the process as usual. The recipient can immediately use the overrides with their own Radoop connection.
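The portability described here can be pictured with a small Python sketch: the same set of overrides, stored with the process, is applied to whichever connection the person running the process uses. This is only an illustration. The `apply_overrides` helper and all setting names and values are hypothetical and do not correspond to actual Radoop parameters or APIs.

```python
def apply_overrides(connection, overrides):
    """Return a new settings dict; the original connection is left untouched."""
    # Later keys win, so override values replace the connection's own values.
    return {**connection, **overrides}


# Overrides travel with the process (hypothetical key and value).
process_overrides = {"hive_timeout_s": 1800}

# Two different connections (hypothetical): the author's and the recipient's.
author_connection = {
    "cluster": "author-cluster",
    "yarn_queue": "default",
    "hive_timeout_s": 300,
}
recipient_connection = {
    "cluster": "recipient-cluster",
    "yarn_queue": "etl",
    "hive_timeout_s": 600,
}

author_effective = apply_overrides(author_connection, process_overrides)
recipient_effective = apply_overrides(recipient_connection, process_overrides)
```

Each user keeps their own cluster and queue settings, while the overridden value is the same for both because it comes from the shared process rather than from either connection.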

You can also export the connection, including the overrides, as a new Radoop connection, e.g. for testing purposes or easier sharing. Click the Save As Connection... button at the bottom of the Connection Overrides dialog, specify a name for the new connection, and click OK.

To save your fine-tuned connection to a project or AI Hub Repository, first use the Save as Connection… action, then use the Export… button on the Manage Radoop Connections… dialog.