You are viewing the RapidMiner Radoop documentation for version 7.6.

What’s New in RapidMiner Radoop 7.4?

Enhancements and bug fixes

The following improvements are part of RapidMiner Radoop 7.4.

Enhancements

  • Added user impersonation (proxy user) capabilities: a superuser can now impersonate the RapidMiner Server user on the cluster
  • Added SparkRM operator for parallel process pushdown onto the cluster
  • Radoop Proxy is now disabled when running a process through RapidMiner Server
  • Added Spark 2.1 support (new option in the Spark Version list)
  • Type Conversion now supports an attribute filter, making it easy to convert multiple (or all) attributes at once
  • Single Process Pushdown no longer warns that certain operators may not work properly
  • In case of a Hive connection error, more details may now be shown in the Log
  • Textfile replaces Parquet as the default input format for all Spark operators (this can improve performance and reduces the risk of hitting the 2 GB partition limit)
  • Annotations of data sets inside Radoop Nest are now preserved across a Store and Retrieve operator pair (they are stored in the Hive metadata)
  • Single Process Pushdown no longer tries to run its subprocess a second time if it fails with a well-known process error
  • Add Noise now has a local random seed parameter
  • Generate Data now lets you define the number of partitions of the output data set; by default, this number is calculated heuristically
  • Generate Data now lets you specify the file format of the output; Textfile replaces Parquet as the default
  • When running on RapidMiner Server, the JBoss configuration and log directories are now the primary locations for the radoop_connections.xml file and the log files, respectively
  • When closing, Studio now waits for any temporary tables that are still being dropped
  • The Log panel now reports when a submitted Spark job has been waiting for free resources for several minutes
  • When LDAP is used for Hive (the Hive Principal field is empty), Kerberos settings are ignored for the Hive connection
  • A specific error message is now shown when a Hive-on-Spark job times out
  • Some core operators no longer show a design-time warning when used inside a Radoop Nest
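To illustrate what a local random seed means for Add Noise, here is a minimal Python sketch. It is not Radoop code: the `add_noise` function and the process-wide generator `_GLOBAL_RNG` are hypothetical stand-ins, and Gaussian noise is assumed. The point is that an operator with its own seeded generator produces the same output no matter how much randomness other operators consumed before it.

```python
import random
from typing import List, Optional

# Hypothetical process-wide generator, standing in for a global process seed.
_GLOBAL_RNG = random.Random(2001)

def add_noise(values: List[float], std_dev: float,
              local_seed: Optional[int] = None) -> List[float]:
    """Add Gaussian noise to each value (illustrative sketch only)."""
    # With a local seed, the operator gets its own fresh generator, so its
    # output does not depend on draws made earlier from the shared generator.
    rng = random.Random(local_seed) if local_seed is not None else _GLOBAL_RNG
    return [v + rng.gauss(0.0, std_dev) for v in values]

data = [1.0, 2.0, 3.0]
a = add_noise(data, 0.1, local_seed=42)
_GLOBAL_RNG.random()  # some other operator consumes global randomness
b = add_noise(data, 0.1, local_seed=42)
print(a == b)  # True
```

Without `local_seed`, both calls would draw from the shared generator and would generally produce different noise.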

Bug fixes

  • BUGFIX: Fixed issues with Kerberos ticket renewal in long-running Studio sessions
  • BUGFIX: Fixed the accesswhitelist option in Radoop connections
  • BUGFIX: Connection import from Cloudera Manager no longer fails if cluster name contains a space (like Cloudera Quickstart)
  • BUGFIX: Unsupported attribute filter types (block_type, no_missing_values, numeric_value_filter) can no longer be selected for Radoop operators
  • BUGFIX: Single Process Pushdown now correctly returns missing values for integer, nominal and date attributes
  • BUGFIX: Single Process Pushdown no longer loses attribute roles when creating an in-memory example set on an IOObject input port
  • BUGFIX: Single Process Pushdown no longer overwrites attributes when their "canonical" names collide (e.g. when two attribute names differ only in case)
  • BUGFIX: Single Process Pushdown no longer fails with "getNominalMapping() is not supported" when the input Hive table is in PARQUET format and has TINYINT or SMALLINT columns (see HIVE-14294)
  • BUGFIX: Fixed an issue where Single Process Pushdown and Generate Data did not clean up temporary tables on their output
  • BUGFIX: Fixed misleading Hive connection error (TTransportException: SASL authentication not complete)
  • BUGFIX: Fixed potential issues caused by reusing Hive connections with different properties
  • BUGFIX: Import from Amazon S3 dialog now only lists supported file formats
  • BUGFIX: The "Replace with applicable Radoop operator" quickfix now adds the multiclass Decision Tree Radoop operator instead of the old binominal version
  • BUGFIX: Changes to Radoop Proxy settings used by already established connections are now applied properly without a restart
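The canonical-name collision fixed above can be sketched as follows. This is a hypothetical Python illustration, not Radoop's actual scheme: it assumes the canonical name is the lower-cased attribute name (Hive column names are case-insensitive), and the `canonical_names` function and its numeric-suffix rule are invented here to show how colliding names can be kept apart instead of overwriting each other.

```python
from typing import Dict, List

def canonical_names(names: List[str]) -> Dict[str, str]:
    """Map each attribute name to a unique lower-case column name.

    Hypothetical sketch: "Age" and "age" share the canonical name "age",
    so later colliding names get a numeric suffix instead of overwriting.
    """
    used = set()
    mapping: Dict[str, str] = {}
    for name in names:
        base = name.lower()  # assumed canonical form
        candidate = base
        suffix = 1
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping

print(canonical_names(["Age", "age", "income"]))
# {'Age': 'age', 'age': 'age_1', 'income': 'income'}
```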