What’s New in RapidMiner Radoop 2.3?
This page describes the enhancements delivered in RapidMiner Radoop 2.3.
Data authorization with Apache Sentry
RapidMiner Radoop 2.3 integrates with Apache Sentry. Sentry provides data authorization in many Hadoop environments and lets you control access to data tables, and even to individual table columns, in Hive or Impala. With RapidMiner Radoop's support for Apache Sentry, you can make authorization part of your data security setup on Hadoop.
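To make the table-level and column-level access control described above more concrete, here is a minimal sketch that builds the kind of SQL statements an administrator might run in Hive or Impala on a Sentry-enabled cluster. It is not part of RapidMiner Radoop: the helper `sentry_read_grants` and the role, group, table, and column names are all hypothetical, and the exact statements supported depend on your Sentry, Hive, and Impala versions.

```python
def sentry_read_grants(role, group, table, columns=None):
    """Return SQL statements granting `group` read access to `table` via `role`.

    All names are hypothetical and are inserted as-is, so this sketch must not
    be fed untrusted input.
    """
    statements = [
        f"CREATE ROLE {role}",
        f"GRANT ROLE {role} TO GROUP {group}",
    ]
    if columns:
        # Column-level access: one grant per column the role may read.
        for column in columns:
            statements.append(
                f"GRANT SELECT({column}) ON TABLE {table} TO ROLE {role}"
            )
    else:
        # Table-level access: the role may read every column.
        statements.append(f"GRANT SELECT ON TABLE {table} TO ROLE {role}")
    return statements


# Hypothetical example: analysts may read only two columns of a sales table.
for statement in sentry_read_grants("analyst", "analysts", "sales", ["region", "amount"]):
    print(statement + ";")
```

Calling the helper without a column list produces a single table-wide `SELECT` grant instead of one grant per column.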
Linear Regression
This release extends the set of machine learning models in RapidMiner Radoop with a linear regression algorithm based on MLlib, Apache Spark's machine learning library. The new algorithm lets you train a linear regression model on this high-performance compute framework, using the full distributed computing power of a Hadoop cluster running Spark.
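The following sketch illustrates, in plain Python, why a linear regression fit lends itself to distributed computation: each data partition can be reduced to a handful of sums, and those sums can be merged before solving for the model. It is an illustration only, under the assumption of a single numeric feature and ordinary least squares. It is not the algorithm used by MLlib or by RapidMiner Radoop, and the names `Stats`, `partition_stats`, and `fit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sums that are sufficient to fit y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other):
        # Combining two partitions only requires adding their sums.
        return Stats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def partition_stats(rows):
    """Summarize one partition of (x, y) rows; this step can run in parallel."""
    stats = Stats()
    for x, y in rows:
        stats.n += 1
        stats.sx += x
        stats.sy += y
        stats.sxx += x * x
        stats.sxy += x * y
    return stats


def fit(partitions):
    """Merge the per-partition summaries and solve the least-squares equations."""
    total = Stats()
    for rows in partitions:
        total = total.merge(partition_stats(rows))
    denominator = total.n * total.sxx - total.sx ** 2
    if denominator == 0:
        raise ValueError("x has no variance (or there is no data); slope is undefined")
    slope = (total.n * total.sxy - total.sx * total.sy) / denominator
    intercept = (total.sy - slope * total.sx) / total.n
    return slope, intercept


# Two hypothetical partitions of the same data set.
slope, intercept = fit([[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]])
print(slope, intercept)
```

Because only the small `Stats` summaries travel between workers, the amount of data exchanged does not grow with the number of rows, which is the property a cluster framework such as Spark can exploit.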
Enhancements and bug fixes
The following improvements are part of RapidMiner Radoop 2.3.
Enhancements
- Improved Kerberos mapping rules (starting from 2.3.1)
- Adds username authorization to Radoop connections on the Server (starting from 2.3.1)
- Adds support for overwriting attributes in Generate Attributes (starting from 2.3.1)
- Improved S3 and HDFS data import (starting from 2.3.1)
- Improved stopping of Spark jobs
- Improved Spark decision tree models with simpler splits
- New tooltip help for Radoop settings
- Improved user permission handling (with or without Sentry)
- Adds optional ORC data format support for Spark machine learning algorithms
- Enables concurrent execution of RapidMiner Radoop processes on RapidMiner Server (if security settings are the same)
- Adds "invalidate metadata" option to Radoop Nest when using Impala
Bug fixes
- BUGFIX: Fixed data import for Hadoop 1.x (starting from 2.3.1)
- BUGFIX: Fixed general Apply Model for Hive 1.1+ (starting from 2.3.1)
- BUGFIX: Fixed general Apply Model for boolean types (starting from 2.3.1)
- BUGFIX: Fixed reading date types in Read Database (starting from 2.3.1)
- BUGFIX: There are no longer concurrency issues when the same user executes multiple jobs on the same cluster
- BUGFIX: There are no longer Kerberos caching problems when using both secure and unsecure connections or multiple secure connections
- BUGFIX: Fixed possible nominal mapping error for machine learning operators
- BUGFIX: Fixed Naive Bayes model update
- BUGFIX: Fixed boolean values for ORC file format
- BUGFIX: Fixed classification performance when a confidence attribute is not present
- BUGFIX: Fixed Declare Missing Values when using a nominal value for numerical attributes
- BUGFIX: Fixed in-Hadoop model apply for Kernel Naive Bayes
- BUGFIX: Fixed Hive table list caching issue (Append, Table Management operators)
- BUGFIX: Fixed possible nominal mapping error during regression model apply
- BUGFIX: Fixed possible nominal mapping error during decision tree training
- BUGFIX: Fixed heuristic memory allocation for the Spark driver application (prevents negative value)
- BUGFIX: Fixed in-Hadoop model apply errors (missing values, attribute order, cluster model NullPointer)
- BUGFIX: Fixed Spark model nominal mappings
- BUGFIX: Fixed metadata validation