You are viewing the RapidMiner Radoop documentation for version 7.6.

What’s New in RapidMiner Radoop 2.3?

This page describes the enhancements delivered in RapidMiner Radoop 2.3.

Data authorization with Apache Sentry

RapidMiner Radoop 2.3 now integrates with Apache Sentry. Sentry implements data authorization in many Hadoop environments and lets you control access to tables, and even to individual table columns, in Hive or Impala. With Sentry support in RapidMiner Radoop, you can make authorization part of the data security for your Hadoop initiatives.

Linear Regression

This release extends the set of machine learning models in RapidMiner Radoop with a linear regression algorithm based on MLlib, Apache Spark's machine learning library. The new algorithm lets you train a linear regression model on this high-performance compute framework, using the full distributed computation power of a Hadoop cluster running Spark.

Enhancements and bug fixes

The following improvements are part of RapidMiner Radoop 2.3.

Enhancements

Improved Kerberos mapping rules (starting from 2.3.1)
Adds username authorization to Radoop connections on RapidMiner Server (starting from 2.3.1)
Adds support for overwriting attributes in Generate Attributes (starting from 2.3.1)
Improved S3 and HDFS data import (starting from 2.3.1)
Improved stopping of Spark jobs
Improved Spark decision tree models with simpler splits
New tooltip help for Radoop settings
Improved user permission handling (with or without Sentry)
Adds optional ORC data format support for Spark machine learning algorithms
Enables concurrent execution of RapidMiner Radoop processes on RapidMiner Server (if security settings are the same)
Adds an "invalidate metadata" option to Radoop Nest when using Impala

Bug fixes

BUGFIX: Fixed data import for Hadoop 1.x (starting from 2.3.1)
BUGFIX: Fixed general Apply Model for Hive 1.1+ (starting from 2.3.1)
BUGFIX: Fixed general Apply Model for boolean types (starting from 2.3.1)
BUGFIX: Fixed reading date types in Read Database (starting from 2.3.1)
BUGFIX: There are no longer concurrency issues when the same user executes multiple jobs on the same cluster
BUGFIX: Kerberos caching problems no longer occur when using both secure and unsecured connections, or multiple secure connections
BUGFIX: Fixed possible nominal mapping error for machine learning operators
BUGFIX: Fixed Naive Bayes model update
BUGFIX: Fixed boolean values for ORC file format
BUGFIX: Fixed classification performance when a confidence attribute is not present
BUGFIX: Fixed Declare Missing Values when using a nominal value for numerical attributes
BUGFIX: Fixed in-Hadoop model apply for Kernel Naive Bayes
BUGFIX: Fixed Hive table list caching issue (Append, Table Management operators)
BUGFIX: Fixed possible nominal mapping error during regression model apply
BUGFIX: Fixed possible nominal mapping error during decision tree training
BUGFIX: Fixed heuristic memory allocation for the Spark driver application (prevents negative value)
BUGFIX: Fixed in-Hadoop model apply errors (missing values, attribute order, cluster model NullPointer)
BUGFIX: Fixed Spark model nominal mappings
BUGFIX: Fixed metadata validation
