You are viewing the RapidMiner Radoop documentation for version 7.6
What’s New in RapidMiner Radoop 7.4?
Enhancements and bug fixes
The following improvements are part of RapidMiner Radoop 7.4.
Enhancements
- Added user impersonation (proxy user) capabilities: a superuser can now impersonate the RapidMiner Server user on the cluster
- Added SparkRM operator for parallel process pushdown onto the cluster
- Radoop Proxy is now disabled when running a process through Server
- Added Spark 2.1 support (new option in the Spark Version list)
- Type Conversion now supports an attribute filter, which makes it easy to convert multiple (or all) attributes
- Single Process Pushdown no longer warns for certain operators that they may not work properly
- In case of a Hive connection error, the Log may now show more details
- Textfile is now the default input format for all Spark operators instead of Parquet (this sometimes improves performance and reduces the risk of hitting the 2GB partition limit)
- Annotations of data sets inside Radoop Nest are now kept even after a Store and a Retrieve operator (stored in Hive metadata)
- Single Process Pushdown no longer tries to run its subprocess a second time if there is a well-known process error
- Add Noise now has a local random seed parameter
- Generate Data now allows defining the number of partitions of the output data set; by default, this number is calculated using heuristics
- Generate Data now allows specifying the file format of the output, and Textfile is now the default instead of Parquet
- When running on Server, the JBoss configuration and log directories are the primary paths for the radoop_connections.xml and log files
- When Studio is closed, it now waits while temporary tables are being dropped
- The Log panel reports when a submitted Spark job has been waiting for free resources for minutes
- When LDAP is used for Hive (empty Hive Principal field), Kerberos settings are ignored in the Hive connection
- A specific error message is shown if there is a timeout in a Hive-on-Spark job
- There is no design-time warning now for some core operators when they are used inside a Radoop Nest
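
To illustrate why the 2GB partition limit and the partition count of an output data set matter, here is a minimal Python sketch that estimates how many partitions are needed to keep each one below a size cap. It is only a back-of-the-envelope illustration: the function name `min_partitions`, the `safety` margin, the 2 GiB constant, and the assumption of evenly sized partitions are all hypothetical, and this is not the heuristic that Generate Data actually uses.

```python
import math

# Assumed cap per partition: 2 GiB, standing in for the "2GB partition limit"
# mentioned in the notes. The exact limit on a given cluster may differ.
LIMIT_BYTES = 2 * 1024**3


def min_partitions(total_bytes, limit_bytes=LIMIT_BYTES, safety=0.5):
    """Return the smallest partition count that keeps every partition
    below limit_bytes * safety, assuming the data is split evenly.

    safety is a hypothetical margin (0 < safety <= 1) that leaves headroom
    for skewed partitions; it is not a Radoop or Spark parameter.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if not 0 < safety <= 1:
        raise ValueError("safety must be in (0, 1]")
    target = limit_bytes * safety
    # Always use at least one partition, even for an empty data set.
    return max(1, math.ceil(total_bytes / target))


# A 10 GiB data set with the default 50% margin (1 GiB per partition).
print(min_partitions(10 * 1024**3))
```

With real data, partitions are rarely the same size, so a margin well below 1.0 is the more cautious choice in this sketch.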
Bug fixes
- BUGFIX: Fixed issues with Kerberos ticket renewal in long-running Studio sessions
- BUGFIX: Fixed accesswhitelist option in Radoop connections
- BUGFIX: Connection import from Cloudera Manager no longer fails if cluster name contains a space (like Cloudera Quickstart)
- BUGFIX: Unsupported attribute filter types (block_type, no_missing_values, numeric_value_filter) can no longer be selected for Radoop operators
- BUGFIX: Single Process Pushdown now returns the missing values correctly for integer, nominal and date attributes
- BUGFIX: Single Process Pushdown no longer loses attribute roles when creating an in-memory example set on an IOObject input port
- BUGFIX: Single Process Pushdown no longer overwrites attributes when "canonical" names collide (e.g. when two attribute names only differ in case)
- BUGFIX: Single Process Pushdown no longer fails with "getNominalMapping() is not supported" when the input Hive table is in PARQUET format and has TINYINT or SMALLINT columns (see HIVE-14294)
- BUGFIX: Single Process Pushdown and Generate Data now clean up the temporary tables on their output
- BUGFIX: Fixed misleading Hive connection error (TTransportException: SASL authentication not complete)
- BUGFIX: Fixed potential issues caused by reusing Hive connections with different properties
- BUGFIX: Import from Amazon S3 dialog now only lists supported file formats
- BUGFIX: Replace with applicable Radoop operator quickfix now adds the multiclass Decision Tree Radoop operator, and not the old binominal version
- BUGFIX: Changes to Radoop Proxy settings used by already established connections are now properly applied without a restart
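
The "canonical" name collision fixed in Single Process Pushdown can be checked for ahead of time. The following Python sketch groups attribute names by a canonical form and reports the groups that collide. The rule Radoop uses to derive canonical names is not described in these notes; lowercasing is an assumption here, motivated by Hive treating column names case-insensitively, and the function name `find_canonical_collisions` is hypothetical.

```python
from collections import defaultdict


def find_canonical_collisions(names, canonical=str.lower):
    """Group attribute names by their canonical form and return only the
    groups that contain more than one original name.

    canonical defaults to lowercasing, which is an assumption for this
    sketch and not necessarily the rule Radoop applies.
    """
    groups = defaultdict(list)
    for name in names:
        groups[canonical(name)].append(name)
    return {key: originals for key, originals in groups.items()
            if len(originals) > 1}


# "Age"/"age" and "ID"/"Id" differ only in case, so they collide.
collisions = find_canonical_collisions(["Age", "age", "Income", "ID", "Id"])
for key, originals in collisions.items():
    print(key, "<-", originals)
```

Renaming the reported attributes before they enter the pushdown avoids relying on how collisions are resolved.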