What’s New in RapidMiner Radoop 7.3?

This page describes the new features of RapidMiner Radoop 7.3 as well as its enhancements and bug fixes.

Update / migration

Please note that RapidMiner Radoop 7.3 is not backwards compatible: it requires RapidMiner Studio 7.3 and, if you use RapidMiner Server, RapidMiner Server 7.3. The update is available through the RapidMiner Marketplace.

Introducing Radoop Proxy

RapidMiner Radoop can now connect to Hadoop clusters in secure networks through Radoop Proxy, a new component introduced with RapidMiner Server 7.3. When RapidMiner Server is installed on an edge node of the Hadoop cluster, Radoop Proxy can forward all traffic between RapidMiner Radoop and the Hadoop cluster components. This reduces the number of open ports that Radoop requires from more than ten to only two, which makes it significantly easier to connect Radoop to many secure Hadoop clusters.

Radoop Proxy requires both the selected port and the port that follows it. For example, if port 1081 is selected, port 1082 must be available as well.
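
As a rough illustration of this port requirement, the sketch below derives the two ports from a selected port and checks whether both accept TCP connections from the client machine. The function names, the timeout, and the host name in the usage comment are assumptions made for this example; they are not part of Radoop.

```python
import socket


def proxy_ports(selected_port):
    """Return the selected port and the one following it."""
    return (selected_port, selected_port + 1)


def unreachable_ports(host, selected_port, timeout=3.0):
    """Return the required ports on `host` that do not accept a TCP connection."""
    failed = []
    for port in proxy_ports(selected_port):
        try:
            # Open and immediately close a plain TCP connection.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except (OSError, OverflowError):
            # OSError: refused, timed out, or host not resolvable.
            # OverflowError: port number outside the valid 0-65535 range.
            failed.append(port)
    return failed


# Usage (hypothetical edge node host name):
#   unreachable_ports("edge-node.example.com", 1081)
# An empty list means both 1081 and 1082 accepted a connection.
```

A successful TCP connection only shows that the ports are open; it does not confirm that Radoop Proxy itself is the service listening on them.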

Spark 2.0 support

RapidMiner Radoop 7.3 adds support for Spark 2.0, which Hadoop distribution vendors are starting to include in their distributions. Your Radoop processes run without any change, whichever supported Spark version your cluster uses. Please note that you may need to adjust the Spark Version setting of your Radoop connection.

Cloudera Manager and Ambari integration

Do you use Cloudera Manager or Ambari to manage your Hadoop cluster? If so, you can quickly create the connection in Radoop by providing the URL and the credentials of the cluster manager service, which lets Radoop import most of the connection details automatically.

Enhancements and bug fixes

The following improvements are part of RapidMiner Radoop 7.3.

Enhancements

  • RapidMiner Server can now act as a secure proxy that forwards the Radoop calls from Studio to the internal Hadoop services.
  • Added an option to import Radoop connections using a cluster manager.
  • Added support for Spark 2.0. Please note that Spark 2.0.1 is not supported because of a Spark bug.
  • Studio and Server no longer need to be restarted after radoop_connections.xml is modified; changes are applied on the next process run.
  • Failed test application logs are now included in the zip file created by Extract Logs.
  • When using a SOCKS Proxy (SocksSocketFactory), a cluster-side change (making hadoop.rpc.socket.factory.class.default final) is no longer necessary.
  • Apply Model and Single Process Pushdown now explicitly warn about an unsupported Java version on the cluster (Java 7 and earlier).
  • Advanced Hadoop and Hive parameters on Connection Dialog now appear in alphabetical order.
  • The jar files in the optionally specified Additional Libraries Directory of the Radoop connection are now sorted alphabetically on the classpath.
  • The import configuration wizard now also supports compressed tar files.
  • Connection test now warns if the client does not support the required bitlength.
  • Connections on the connection panel and Hive objects on Hadoop Data View can now be deleted by pressing Backspace on OS X.
  • Added quickfixes to insert a Nominal to Numerical operator before learners.
  • Hive JDBC connection pool size can now be adjusted via a Preference setting.
  • Import configuration wizard now recognizes Hive nosasl authentication and sets the JDBC postfix automatically.
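
To illustrate why the alphabetical classpath ordering listed above matters: when the same class appears in more than one jar, a Java class loader uses the first match on the classpath, so a fixed ordering makes the outcome predictable. The minimal sketch below mimics such an ordering. The function name and the plain case-sensitive sort are assumptions made for this example, and Radoop's actual implementation may differ in detail.

```python
import os


def classpath_from_directory(lib_dir):
    """Build a classpath string from the jar files in lib_dir, sorted by file name."""
    jars = sorted(
        name for name in os.listdir(lib_dir)
        if name.endswith(".jar")
    )
    # Join the full paths with the platform's path separator (":" or ";").
    return os.pathsep.join(os.path.join(lib_dir, name) for name in jars)
```

Under this assumed ordering, giving a jar a name that sorts early is one way to make its classes take precedence over those in later jars.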

Bug fixes

  • BUGFIX: With HDFS encryption, KMS no longer triggers a popup asking for credentials when Studio has been running for longer than the ticket lifetime
  • BUGFIX: Import in Hadoop Data View now shows the details if an error occurs
  • BUGFIX: Connections imported from configuration files with High Availability no longer fail because of the generated dummy address
  • BUGFIX: Apply Model no longer logs a message for each input row, which avoids the resulting decrease in performance
  • BUGFIX: Apply Model no longer fails with a timeout after 15 minutes ("RapidMiner libraries upload timed out")
  • BUGFIX: Read Database no longer logs progress with WARNING log level
  • BUGFIX: Invalid Hive table name error now has a proper quickfix that points to changing the tablename parameter
  • BUGFIX: Single Process Pushdown's GC monitoring feature now reports the correct error message if the process was killed by the monitor
  • BUGFIX: Hive on Tez no longer restarts the application killed by the Job Kill test
  • BUGFIX: Exploring a large table on Hadoop Data View no longer starts a MapReduce job
  • BUGFIX: Error retrieval after a failed Single Process Pushdown operator no longer throws ConcurrentModificationException
  • BUGFIX: Quickfix for adding a Multiply operator inside the Radoop Nest now correctly adds Radoop's Multiply
  • BUGFIX: Warning for a non-Radoop operator inside the Nest is now attached to the concerned operator instead of the Nest
  • BUGFIX: Single Process Pushdown no longer shows incorrect metadata on the output in case of unsupported characters in the attribute name