RapidMiner Radoop Compatibility

Supported Hadoop distributions

RapidMiner Radoop works with most popular Hadoop distributions. Refer to the provider's documentation for information on configuring the Hadoop cluster. The supported distributions are:

For CDH distributions, we support only the minor versions that Cloudera itself supports. For HDInsight and Amazon EMR, operators related to model scoring are not available, because these distributions do not run Hive on Java 11.

Supported data warehouse systems (DWS)

RapidMiner Radoop supports the following data warehouse infrastructure:

  • Hive 3.x (for model scoring, Hive must run on a Java 11 JVM so that it can load the Radoop UDFs)

Supported Spark versions

RapidMiner Radoop supports the following Spark versions:

  • Apache Spark 3.x (only the Scala 2.12 distribution is supported, running on a Java 11 JVM)

Supported Java versions

On the Hadoop cluster, RapidMiner Radoop requires Oracle JDK 11 or OpenJDK 11. The cluster nodes should have at least 32 GB of RAM. On the machine running the extension itself (within either RapidMiner Studio or AI Hub), RapidMiner Radoop requires Oracle Java 11 or OpenJDK 11.
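To confirm that a given machine meets the Java 11 requirement, a small pre-flight check can be run with the JVM in question. The following is a minimal sketch, not part of Radoop: the class name JavaVersionCheck and its methods are hypothetical. It checks only the JVM that executes it. It does not check which JDK the Hadoop services are configured to use, and it does not check the 32 GB RAM recommendation.

```java
// Hypothetical pre-flight check (not a Radoop tool): verifies that the JVM
// executing this class is Java 11, the version Radoop requires.
public class JavaVersionCheck {

    /** Extracts the major version from a java.version string such as "11.0.21" or "1.8.0_392". */
    public static int majorVersion(String javaVersion) {
        // Split on the separators used by both the old and the new version-string schemes.
        String[] parts = javaVersion.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        // Java 8 and earlier report "1.x", so the major version is the second component.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Returns true if the given java.version string denotes Java 11. */
    public static boolean isJava11(String javaVersion) {
        return majorVersion(javaVersion) == 11;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        if (isJava11(version)) {
            System.out.println("OK: running on Java " + version);
        } else {
            System.out.println("Unsupported Java version for Radoop: " + version);
            System.exit(1);
        }
    }
}
```

Compile and run it with the JDK you want to check (for example `javac JavaVersionCheck.java` followed by `java JavaVersionCheck`). The check parses the `java.version` string instead of calling `Runtime.version()`. This lets the same class run on an older JVM such as Java 8 and report it as unsupported.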

RapidMiner extension compatibility

RapidMiner Radoop is not compatible with the Parallel Processing Extension, which must be disabled when using Radoop. To disable it, select the Extensions > Manage Extensions... menu item and clear the checkbox for the Parallel Processing Extension.