Installing RapidMiner Radoop on RapidMiner AI Hub

This documentation assumes that RapidMiner AI Hub is deployed using a containerized deployment method and that a working Radoop connection is available in a repository or project, as described in Configuring Radoop connections. In other cases, please consult the previous version of this documentation.

Prerequisites

The following requirements must be met before using RapidMiner Radoop in RapidMiner AI Hub:

  • RapidMiner Radoop Extension installed and tested in RapidMiner Studio.
  • A working Radoop connection to a Hadoop cluster in RapidMiner Studio, stored in a repository or project. See Configuring RapidMiner Radoop Connections to learn how to create it.
  • The same version of the RapidMiner Radoop extension installed in RapidMiner AI Hub. (Containerized deployments ship with a bundled Radoop extension, so you only need to ensure that the versions match.)
  • A valid license for RapidMiner Radoop installed in RapidMiner AI Hub. You can obtain your license from your RapidMiner Account portal.

Installing RapidMiner Radoop on RapidMiner AI Hub and the connected Job Agent(s)

Because the Radoop extension is already in place in AI Hub in a containerized deployment, the only step required is to install the Radoop license obtained above.

To do this, log in to AI Hub as an administrator, open the Administration --> Manage licenses page, click the Install license action, and paste your Radoop license key.

Using Radoop connections with RapidMiner AI Hub

Using Radoop connections with RapidMiner AI Hub is as easy as it is with RapidMiner Studio, but there are some caveats, which are discussed in detail below. The Radoop connection used by a RapidMiner process executed in RapidMiner AI Hub must be in the same repository or project as that process.

Important note: Radoop processes are not supported in RapidMiner AI Hub web services.

Managing multiple Hadoop users with RapidMiner AI Hub executions

When multiple users run Radoop processes in RapidMiner AI Hub, it is natural to expect that each job created on the Hadoop cluster by Radoop runs as the respective individual user, for auditability.

It is also expected that such clusters are secured using Kerberos and that keytabs are used for authentication, with each user having their own keytab.

By using RapidMiner AI Hub's vault to securely store these keytabs for each user, it is possible to create a connection that uses each user's own keytab directly from the vault.

To do this, the connection manager or administrator setting up the connection for other users must edit the exported Radoop connection, then click Set injected parameters on the Security tab and select the Kerberos keytab parameter to be injected from RapidMiner AI Hub.

Note: the RapidMiner AI Hub injection option is only available when the Radoop connection is stored in a RapidMiner AI Hub project. The legacy repository is not supported.

Note: administrators must ensure that each user has a valid keytab stored in that user’s vault in RapidMiner AI Hub. This task can be done using RapidMiner AI Hub’s REST APIs, and it is much easier when automated with a script. Please contact our support team to request a sample script if needed.
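
The per-user preparation part of such a script might look like the sketch below. It is not the sample script offered by support, and it makes several assumptions: keytabs are stored locally as `<username>.keytab`, they are base64-encoded only so that the binary content can travel in a text payload, and the function names are hypothetical. The actual REST call to the vault is deliberately left to a caller-supplied `upload` function, because the endpoint and payload format are not described here.

```python
import base64
from pathlib import Path
from typing import Callable, Dict, List


def load_keytabs(keytab_dir: str) -> Dict[str, str]:
    """Map each username to its base64-encoded keytab.

    Assumption: keytabs are stored as <username>.keytab in keytab_dir.
    """
    keytabs = {}
    for path in sorted(Path(keytab_dir).glob("*.keytab")):
        keytabs[path.stem] = base64.b64encode(path.read_bytes()).decode("ascii")
    return keytabs


def push_keytabs(
    keytabs: Dict[str, str],
    upload: Callable[[str, str], None],
) -> List[str]:
    """Call upload(username, encoded_keytab) for every user.

    `upload` is a hypothetical placeholder for the real vault REST call.
    Returns the usernames whose upload raised an exception, so that the
    administrator can retry or investigate them.
    """
    failed = []
    for user, encoded in keytabs.items():
        try:
            upload(user, encoded)
        except Exception as exc:
            print(f"upload failed for {user}: {exc}")
            failed.append(user)
    return failed
```

Collecting failures instead of stopping at the first error lets one run cover all users and report exactly who still lacks a keytab.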

Using Radoop Proxy with RapidMiner AI Hub executions

Radoop Proxy is automatically disabled when a process is executed on RapidMiner AI Hub. In a typical setup, RapidMiner AI Hub runs inside the secure zone, so there is no need to route the traffic through the Radoop Proxy.

If this is not the case, and the RapidMiner AI Hub instance does need Radoop Proxy to access the Hadoop cluster, the Radoop connection needs to be adapted to support this scenario:

  1. Open the Manage Radoop Connections window and edit the original Radoop connection that was exported to a repository or project.

  2. On the RapidMiner AI Hub tab, check Force Radoop Proxy on AI Hub.

  3. Save, then Export the connection to a repository or project.

Note: the Radoop connection and the Radoop Proxy connection must be in the same repository or project, and both need to be located on the same AI Hub where the execution will take place.
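
The same-repository requirement can be checked before a process is scheduled. The following is a minimal sketch under one assumption: that entries are addressed by absolute locations of the form `//<repository-or-project>/<folders>/<entry>`. The function names and the example locations are hypothetical.

```python
def repository_of(location: str) -> str:
    """Return the repository or project name of an absolute location.

    Assumption: locations look like //<repository-or-project>/<folders>/<entry>.
    """
    if not location.startswith("//"):
        raise ValueError(f"not an absolute repository location: {location!r}")
    name = location[2:].split("/", 1)[0]
    if not name:
        raise ValueError(f"missing repository name in: {location!r}")
    return name


def same_repository(*locations: str) -> bool:
    """True if all given locations belong to one repository or project."""
    return len({repository_of(loc) for loc in locations}) == 1


# Hypothetical example locations
radoop_conn = "//analytics-project/Connections/hadoop-prod"
proxy_conn = "//analytics-project/Connections/radoop-proxy"
print(same_repository(radoop_conn, proxy_conn))
```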