You are viewing the RapidMiner Radoop documentation for version 7.6.

What’s New in RapidMiner Radoop 7.4?

Enhancements and bug fixes

The following improvements are part of RapidMiner Radoop 7.4.

Enhancements

Added user impersonation (proxy user) capabilities: a superuser can now impersonate the RapidMiner Server user on the cluster
Added SparkRM operator for parallel process pushdown onto the cluster
Radoop Proxy is now disabled when a process runs through RapidMiner Server
Added Spark 2.1 support (new option in the Spark Version list)
Type Conversion now supports an attribute filter, making it easy to convert multiple (or all) attributes at once
Single Process Pushdown no longer warns that certain operators may not work properly
If a Hive connection error occurs, more details may now appear in the Log panel
Textfile is now the default input format for all Spark operators instead of Parquet (this can improve performance and reduces the risk of hitting the 2 GB partition limit)
Annotations of data sets inside Radoop Nest are now preserved through Store and Retrieve operators (they are stored in the Hive metadata)
Single Process Pushdown no longer runs its subprocess a second time if a well-known process error occurs
Add Noise now has a local random seed parameter
Generate Data now lets you define the number of partitions of the output data set; by default, this number is calculated heuristically
Generate Data now lets you specify the file format of the output, and Textfile is now the default instead of Parquet
When running on RapidMiner Server, the JBoss configuration and log directories are now the primary locations for the radoop_connections.xml file and the log files
When closing, Studio now waits for any temporary tables that are still being dropped
The Log panel now reports when a submitted Spark job has been waiting for free resources for several minutes
When LDAP is used for Hive (the Hive Principal field is empty), Kerberos settings are ignored for the Hive connection
A specific error message is now shown if a Hive-on-Spark job times out
Some core operators no longer show a design-time warning when used inside a Radoop Nest

Bug fixes

BUGFIX: Fixed issues with Kerberos ticket renewal in long-running Studio sessions
BUGFIX: Fixed accesswhitelist option in Radoop connections
BUGFIX: Connection import from Cloudera Manager no longer fails if cluster name contains a space (like Cloudera Quickstart)
BUGFIX: Unsupported attribute filter types (block_type, no_missing_values, numeric_value_filter) can no longer be selected for Radoop operators
BUGFIX: Single Process Pushdown now returns the missing values correctly for integer, nominal and date attributes
BUGFIX: Single Process Pushdown no longer loses the attribute roles when creating an in-memory example set on an IOObject input port
BUGFIX: Single Process Pushdown no longer overwrites attributes when "canonical" names collide (e.g. when two attribute names only differ in case)
BUGFIX: Single Process Pushdown no longer fails with "getNominalMapping() is not supported" when the input Hive table is in PARQUET format and has TINYINT or SMALLINT columns (see HIVE-14294).
BUGFIX: Single Process Pushdown and Generate Data now clean up the temporary tables on their output
BUGFIX: Fixed misleading Hive connection error (TTransportException: SASL authentication not complete)
BUGFIX: Fixed potential issues caused by reusing Hive connections with different properties
BUGFIX: Import from Amazon S3 dialog now only lists supported file formats
BUGFIX: The "Replace with applicable Radoop operator" quickfix now adds the multiclass Decision Tree Radoop operator instead of the old binominal version
BUGFIX: Changes to Radoop Proxy settings used by already established connections are now applied properly without a restart
