What's New in RapidMiner Radoop 10.2.0?

Released: August 15, 2023

New Features

  • Support for Cloudera Spark3 distribution
  • Integration with Radoop Proxy 2.0
  • Submission of Spark applications via Radoop Proxy using the cluster's spark-submit CLI
  • Installation of Radoop Proxy via Cloudera Manager as a CSD and Parcel
  • Configuration of Radoop Proxy via Cloudera Manager as a cluster service
  • Removal of password-based authentication: kerberized connections now accept only keytabs. Existing connections that use passwords must be updated to use keytabs to remain functional
  • Incorporation of Altair Unit Licensing for Radoop. Individual jobs submitted to the Hadoop cluster do not consume additional Altair Units
  • Extension of the Radoop Connection Test to cover DNS and reverse DNS lookups
  • Additional Radoop Connection Test health checks for Radoop Proxy
  • Improvements to the Radoop Connection Import Wizard:
    • Radoop Proxy is now mandatory, so the wizard selects the relevant checkboxes for Studio and AI Hub usage automatically
    • The wizard now relies on the Cloudera Spark3 configuration and the Spark Assembly location
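The DNS and reverse DNS checks added to the Radoop Connection Test matter because kerberized Hadoop clusters generally expect forward and reverse lookups to agree. The sketch below illustrates what such a round-trip check verifies; it is not Radoop's implementation. It assumes hostnames are given as fully qualified domain names, and the function name `dns_round_trip` is hypothetical. The resolver functions are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Tuple


def dns_round_trip(
    hostname: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> Tuple[bool, str]:
    """Hypothetical check: forward-resolve a host, reverse-resolve its IP, compare names.

    Assumes `hostname` is a fully qualified domain name (FQDN).
    """
    try:
        ip = resolve(hostname)  # forward lookup: name -> IP
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed for {hostname}: {exc}"
    try:
        name = reverse(ip)  # reverse lookup: IP -> name
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup failed for {ip}: {exc}"
    # DNS names are case-insensitive; ignore a trailing root dot as well.
    if name.rstrip(".").lower() != hostname.rstrip(".").lower():
        return False, f"{hostname} -> {ip} -> {name} (mismatch)"
    return True, f"{hostname} -> {ip} -> {name}"
```

Running it against every cluster node (for example, `dns_round_trip("node1.example.com")` with a hypothetical hostname) reports either a failed lookup or a forward/reverse mismatch, which are the situations such a connection test is meant to surface.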

Bugfixes

  • [Fixed] Radoop Proxy name is not decoded correctly, causing failures when it contains whitespace
  • [Fixed] Radoop connection based on the new connection framework throws NPE
  • [Fixed] Deadlock during Radoop process validation
  • [Fixed] HDFS test uses old proxy connection name after renaming proxy connection
  • [Fixed] Radoop Nest does not deliver connection output
  • [Fixed] Malfunction in Spark PushDown process extraction
  • [Fixed] PoissonDistribution causes ClassCastException on Spark PushDown
  • [Fixed] Default SparkScript migrated to Python 3
  • [Fixed] Removed unused Mahout related code from Radoop
  • [Fixed] NPE thrown in PushDown MapAccumulator

Compatibility

Radoop 10.2.0 requires RapidMiner Studio and RapidMiner AI Hub version 10.2.0 (or later) to function properly. Additionally, because of the Java 11 requirement on the cluster side, Radoop 10.2.0 requires Hive 3.x running on a Java 11 JVM. Both a Java 11 JVM and Spark 3.x must be available on all computational nodes. Model scoring functionality is not expected to work on Hadoop clusters that do not support Hive on Java 11, such as EMR 5.x, EMR 6.x, and HDInsight 4.x and 5.x.
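The requirements above can be summarized as a pre-flight checklist. The following is a minimal sketch, not a RapidMiner tool: the `Environment` and `Node` classes and the `compatibility_issues` function are hypothetical, and how the version numbers are gathered from a real cluster is left out. It reads the note literally, treating "Java 11" as exactly Java major version 11 and "3.x" as major version 3.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]
MIN_CLIENT_VERSION: Version = (10, 2, 0)  # minimum Studio / AI Hub version


@dataclass
class Node:
    """Hypothetical description of one computational node."""
    name: str
    java_major: Optional[int]   # None if no JVM is available
    spark_major: Optional[int]  # None if Spark is not available


@dataclass
class Environment:
    """Hypothetical snapshot of the client and cluster versions."""
    studio_version: Version
    ai_hub_version: Version
    hive_major: int
    hive_java_major: int  # Java major version of the JVM running Hive
    nodes: List[Node]


def compatibility_issues(env: Environment) -> List[str]:
    """Return the unmet requirements; an empty list means all stated checks pass."""
    issues: List[str] = []
    # Tuples compare element by element, so (10, 1, 0) < (10, 2, 0).
    if env.studio_version < MIN_CLIENT_VERSION:
        issues.append("RapidMiner Studio must be 10.2.0 or later")
    if env.ai_hub_version < MIN_CLIENT_VERSION:
        issues.append("RapidMiner AI Hub must be 10.2.0 or later")
    if env.hive_major != 3:
        issues.append("Hive 3.x is required")
    if env.hive_java_major != 11:
        issues.append("Hive must run on a Java 11 JVM (model scoring is not expected to work)")
    # Every computational node needs both a Java 11 JVM and Spark 3.x.
    for node in env.nodes:
        if node.java_major != 11:
            issues.append(f"{node.name}: Java 11 JVM not available")
        if node.spark_major != 3:
            issues.append(f"{node.name}: Spark 3.x not available")
    return issues
```

Collecting every problem into a list, rather than stopping at the first failure, mirrors how a connection test report lets an administrator fix all issues in one pass.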