You are viewing the RapidMiner Radoop documentation for version 7.6.

What’s New in RapidMiner Radoop 7.4?

This page describes the new features of RapidMiner Radoop 7.4 as well as its enhancements and bug fixes.

Update / migration

The update is available through the RapidMiner Marketplace.

Introducing SparkRM

RapidMiner Radoop 7.4 introduces SparkRM (available with the “Enterprise” license). With SparkRM, any operator or process that exists in RapidMiner Studio can be run in parallel in a Hadoop environment, with Spark as the execution framework.

The user-defined Subprocess (i.e., visually defined code) in the new SparkRM meta-operator can contain any in-memory RapidMiner operator, including those from extensions. The operator encapsulates that subprocess and pushes it to Hadoop, where it is automatically executed inside Spark, potentially on multiple Hadoop nodes. Beforehand, the input data provided to the SparkRM operator is partitioned (by the values of an attribute, linearly, or randomly) and distributed to the Hadoop nodes. The RapidMiner subprocess is then run on each of those partitions. After execution, the results are merged if they form a coherent dataset, or returned as a collection otherwise.
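The partition, run, and merge flow described above can be sketched in plain Python. This is only a conceptual illustration and not the Radoop or Spark API: the names `partition_rows` and `run_pushdown`, the mode strings, and the rule used to decide whether results are "coherent" (all results are row lists with identical columns) are assumptions made for the example.

```python
from collections import defaultdict
import random


def partition_rows(rows, mode, num_partitions=2, attribute=None, seed=0):
    """Split rows into partitions by attribute value, linearly, or randomly."""
    if mode == "attribute":
        groups = defaultdict(list)
        for row in rows:
            groups[row[attribute]].append(row)
        return list(groups.values())
    if mode == "linear":
        # Ceiling division; at least 1 so an empty input yields no partitions.
        size = max(1, -(-len(rows) // num_partitions))
        return [rows[i:i + size] for i in range(0, len(rows), size)]
    if mode == "random":
        rng = random.Random(seed)
        parts = [[] for _ in range(num_partitions)]
        for row in rows:
            parts[rng.randrange(num_partitions)].append(row)
        return [p for p in parts if p]
    raise ValueError(f"unknown partitioning mode: {mode}")


def run_pushdown(rows, subprocess, **partition_args):
    """Run subprocess on each partition, then merge or collect the results."""
    parts = partition_rows(rows, **partition_args)
    results = [subprocess(part) for part in parts]
    # Assumed coherence rule: every result is a list of dict rows,
    # and all rows share the same set of columns.
    row_lists = all(
        isinstance(r, list) and all(isinstance(x, dict) for x in r)
        for r in results
    )
    if row_lists:
        schemas = {frozenset(row) for r in results for row in r}
        if len(schemas) <= 1:
            return [row for r in results for row in r]  # merged dataset
    return results  # returned as a collection


rows = [
    {"region": "EU", "sales": 10},
    {"region": "US", "sales": 7},
    {"region": "EU", "sales": 5},
]


def total_per_partition(part):
    # Stand-in for the user-defined subprocess: one summary row per partition.
    return [{"region": part[0]["region"], "total": sum(r["sales"] for r in part)}]


merged = run_pushdown(rows, total_per_partition, mode="attribute", attribute="region")
counts = run_pushdown(rows, len, mode="linear", num_partitions=2)
```

Here `merged` is a single dataset with one row per region, because every partition produced rows with the same columns, while `counts` stays a collection with one plain number per partition. In SparkRM the partitions are processed on the cluster rather than in a local loop.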

SparkRM opens up a variety of new use cases that Radoop can now solve natively on Hadoop, especially those that need an extension, such as text analytics, process mining, and time series analytics or forecasting. For a more detailed guide, check the SparkRM: Process Pushdown section in the documentation.

Support for Hadoop user impersonation (“proxy” user)

RapidMiner Radoop 7.4 now also supports Hadoop user impersonation, which significantly simplifies Radoop connection setup and management when connecting to a Hadoop cluster through RapidMiner Server. A Radoop connection on RapidMiner Server can be defined using the credentials (password or keytab) of a Hadoop “proxy” super-user. When a RapidMiner Studio user logs in to RapidMiner Server, she is authenticated using her RapidMiner credentials. Once she is logged in, whenever she runs a Radoop job, the super-user impersonates the RapidMiner user, and the job runs with the rights and privileges granted to that same user in Hadoop.

This approach reduces administrative work because a single Radoop connection in RapidMiner Server can be shared by multiple users, which is especially useful in multi-user installations. For details on the configuration, see the guide Using Hadoop user impersonation in the Radoop connection.
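The impersonation flow can be modeled with a small Python sketch. It is a toy model and not Radoop or Hadoop code: the `Cluster` class, the `run_job` method, the user names, and the table names are all hypothetical. In a real cluster, which users a super-user may impersonate is typically governed by Hadoop's `hadoop.proxyuser.<superuser>.hosts` and `hadoop.proxyuser.<superuser>.groups` properties; the `proxy_rules` dictionary below only stands in for that idea.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Hypothetical: which end users each proxy super-user may impersonate.
    proxy_rules: dict = field(default_factory=dict)
    # Hypothetical: per-user privileges, reduced to readable table names.
    privileges: dict = field(default_factory=dict)

    def run_job(self, authenticated_as, table, impersonate=None):
        """Return the user the job ran as, or raise PermissionError."""
        effective = authenticated_as
        if impersonate is not None:
            allowed = self.proxy_rules.get(authenticated_as, set())
            if impersonate not in allowed:
                raise PermissionError(
                    f"{authenticated_as} may not impersonate {impersonate}"
                )
            effective = impersonate
        # The privilege check uses the impersonated user, not the super-user.
        if table not in self.privileges.get(effective, set()):
            raise PermissionError(f"{effective} may not read {table}")
        return effective


cluster = Cluster(
    proxy_rules={"radoop_proxy": {"alice", "bob"}},
    privileges={"alice": {"sales"}, "bob": {"hr"}},
)

# One shared connection (the super-user), two different effective users.
ran_as = cluster.run_job("radoop_proxy", "sales", impersonate="alice")

try:
    cluster.run_job("radoop_proxy", "sales", impersonate="bob")
    bob_denied = False
except PermissionError:
    bob_denied = True
```

Both jobs use the same super-user connection, yet only the job submitted on behalf of `alice` can read the `sales` table. That is the property that lets one Radoop connection on RapidMiner Server serve several users without widening anyone's access.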

Enhancements and bug fixes

The following pages describe the enhancements and bug fixes in RapidMiner Radoop 7.4 releases: