What’s New in RapidMiner Radoop 7.5?

This page describes the new features of RapidMiner Radoop 7.5 as well as its enhancements and bug fixes.

Update / migration

Please note that RapidMiner Radoop 7.5 is not backwards compatible: it requires RapidMiner Studio 7.5 and, optionally, RapidMiner Server 7.5. The update is available through the RapidMiner Marketplace.

More improvements in performance

RapidMiner Radoop 7.5 comes with an important performance improvement. A new mechanism manages Spark containers more efficiently for in-Hadoop data preparation (using Hive-on-Spark), which greatly reduces the latency of most ETL processes. Additionally, the new parallel Loop and Loop Attributes operators are available for Radoop, so you can now run multiple jobs in parallel in the Hadoop cluster!

New tutorial

A new tutorial has been added to RapidMiner Studio to give beginners an easy start. The tutorial explains how to create full processes and run them in a Hadoop environment. The tasks include simple examples of data ingestion and data movement, data preparation, and modeling. The new SparkRM operator, which lets you run any RapidMiner operator in Hadoop, is also explained.

Better Machine Learning in-Hadoop

The SparkRM operator, introduced in 7.4 to run any regular RapidMiner Studio operator in Hadoop, now includes the Bootstrapping algorithm. Using Bootstrapping with SparkRM makes it easy to create ensemble models. Together with Combine Models, for instance, you can build Random Forest models by training multiple Decision Trees in parallel, ending up with a model that is not originally included in Spark MLlib.
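The idea behind combining Bootstrapping with Combine Models can be illustrated outside of RapidMiner. The following is a minimal, standard-library Python sketch, not Radoop or SparkRM code: it draws bootstrap samples, trains one simple model per sample in parallel, and combines the models by majority vote. All function names are hypothetical, one-split decision stumps stand in for full Decision Trees, and a local thread pool stands in for parallel execution on the cluster. It shows only the bootstrap-and-vote part of the approach.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows):
    """Fit a one-split tree on (x, label) rows.

    Every observed x is tried as a threshold; the first threshold with
    the fewest training errors is kept.
    """
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def train_bagged_ensemble(rows, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples and combine them by vote."""
    rng = random.Random(seed)
    samples = [bootstrap_sample(rows, rng) for _ in range(n_models)]
    # Each model depends only on its own sample, so training can run in parallel.
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(train_stump, samples))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    # Toy data: small values are "low", large values are "high".
    data = [(x, "low") for x in range(10)] + [(x, "high") for x in range(10, 20)]
    predict = train_bagged_ensemble(data)
    print(predict(0), predict(19))
```

Because each model is trained only on its own bootstrap sample, the training step has no shared state, which is what makes it a natural fit for parallel execution.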

Support for separate databases for UDFs and temporary tables

Previous versions of RapidMiner Radoop needed a default database in which both user-defined functions (UDFs) and temporary tables could be stored. As an improvement for secured environments, Radoop now supports keeping the two in separate databases. That way, permissions can be stricter: UDFs only need to be read, while temporary data can be sandboxed in its own database to be read, written, and deleted as needed.
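The effect on permissions can be made concrete with a small model. The sketch below is purely illustrative Python and does not reflect Radoop's actual configuration or any Hadoop authorization API: the class, the function, and the database names are hypothetical. It computes which privileges a user needs on each database, given where UDFs and temporary tables live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseLayout:
    """Hypothetical description of where Radoop objects are stored."""
    udf_db: str   # database holding the user-defined functions
    temp_db: str  # database holding the temporary tables


# Privileges as described in the text: UDFs are only read,
# temporary data is read, written and deleted.
UDF_PRIVILEGES = frozenset({"read"})
TEMP_PRIVILEGES = frozenset({"read", "write", "delete"})


def required_privileges(layout):
    """Return a mapping of database name to the privileges needed on it."""
    privileges = {}
    for db, needed in ((layout.udf_db, UDF_PRIVILEGES),
                       (layout.temp_db, TEMP_PRIVILEGES)):
        # A shared database accumulates the privileges of both roles.
        privileges.setdefault(db, set()).update(needed)
    return privileges


if __name__ == "__main__":
    shared = DatabaseLayout(udf_db="default", temp_db="default")
    separated = DatabaseLayout(udf_db="radoop_udfs", temp_db="radoop_tmp")
    print(required_privileges(shared))
    print(required_privileges(separated))
```

With a single shared database, the user needs read, write, and delete privileges on the database that also holds the UDFs. With separate databases, the UDF database needs only read access.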

Enhancements and bug fixes

The following pages describe the enhancements and bug fixes in RapidMiner Radoop 7.5 releases: