New Features in RapidMiner Studio 6.5

This page describes the new features of RapidMiner Studio 6.5 as well as its enhancements and bug fixes.

Introducing Get-More-Open-Core business model with three software editions

With the release of RapidMiner Studio 6.5, RapidMiner introduces a Get-More-Open-Core business model, which offers two free editions and one commercial edition of RapidMiner Studio. The model starts from an open core approach, delivering a free open source edition and a commercial edition of the software, then adds a second free edition that offers more than the basics to users who join the RapidMiner Community. The purpose of the free Community edition is to foster the mutual learning, innovation, and richness that result from the synergies of a thriving community. The three RapidMiner Studio editions are:

  • RapidMiner Studio Basic: Free predictive analytics client software with no registration required. The open source code under the OSI-certified AGPL license is also available for free download.

  • RapidMiner Studio Community: Free predictive analytics client software plus, with email registration, community features and benefits.

  • RapidMiner Studio Professional: Predictive analytics client software plus community and commercial features and benefits.

For complete details, please see the full comparison of RapidMiner Studio editions.

The new Get-More-Open-Core model also introduces an improved onboarding experience. Users are no longer required to register before downloading the software. Instead, after download and installation, users can register from within RapidMiner Studio. Although not required, registration provides users with the benefits and shared knowledge of the vibrant RapidMiner user community.

Expression engine now offers clearer interface, simpler syntax, and significant performance gains

The RapidMiner expression parser and engine are used in many operators throughout the platform (for example, Generate Attributes and Filter Examples). The expression editor interface has a completely new look. The interface now arranges functions and the input values that can be used inside expressions (such as attributes and macros) in a more straightforward dialog. As you build an expression, the editor documents the behavior, arguments, and return types of the available selections.

In addition to an updated user interface, RapidMiner Studio 6.5 introduces simplified syntax for expression building. Specifically, the behavior of macros has become much more intuitive.

There are two main use cases for macros—to substitute a literal constant and to substitute an attribute name. RapidMiner Studio 6.4 (and earlier) replaced macros before evaluating an expression, making it difficult to use macros that represented strings that needed escaping or quoting.

The macro expression %{} is now a regular function, evaluated just as any other function, and results in the string value of the macro. The new eval("expression") function allows you to evaluate strings as expressions. To resolve the value of a macro as something other than a string (for example, as a number), use eval(%{macro}). If a macro references an attribute by name, treat it as an expression, written either as eval(%{macro}) or with the shorthand #{macro}. (This shorthand is only applicable for attribute names.)
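The difference between the two macro behaviors can be illustrated with a small Python sketch. This is a toy model, not RapidMiner code: the macro table, the function names, the example row, and the use of Python's ast.literal_eval as a stand-in parser are all assumptions made for illustration. It only shows why pasting macro text into an expression before parsing is fragile, and why returning the macro as a string and evaluating it on request is more predictable.

```python
import ast

# Hypothetical macro table; names and values are invented for this sketch.
MACROS = {"threshold": "0.5", "greeting": 'say "hi"', "column": "price"}


def substitute_before_parsing(expression: str) -> str:
    """Toy model of the older behavior: paste macro text into the expression first."""
    for name, value in MACROS.items():
        expression = expression.replace("%{" + name + "}", value)
    return expression


def is_parsable(expression: str) -> bool:
    """Report whether the stand-in parser accepts the expression text."""
    try:
        ast.literal_eval(expression)
        return True
    except (SyntaxError, ValueError):
        return False


def macro_value(name: str) -> str:
    """Toy model of the newer behavior: a macro reference yields its string value."""
    return MACROS[name]


def evaluate_string(text: str, row=None):
    """Toy stand-in for eval(...): resolve an attribute name, else parse a literal."""
    if row is not None and text in row:
        return row[text]  # the string names an attribute of the current row
    return ast.literal_eval(text)  # otherwise treat it as a literal such as 0.5


if __name__ == "__main__":
    pasted = substitute_before_parsing('"%{greeting}"')
    print(pasted, is_parsable(pasted))             # the pasted quotes break the literal
    print(macro_value("greeting"))                 # the string survives untouched
    print(evaluate_string(macro_value("threshold")))
    print(evaluate_string(macro_value("column"), row={"price": 19.99}))
```

In this sketch, macro_value plays the role of %{macro}, and evaluate_string(macro_value(...)) plays the role of eval(%{macro}); passing a row stands in for the attribute lookup that the #{macro} shorthand performs.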

Attribute names containing non-alphanumeric characters were previously hard to use in expressions; they can now be written as [<attribute name>]. Also, several pre-defined macros (for example, %{execution_count}) now have more meaningful names. Useful constants, such as date expressions, are now accessible from the user interface. RapidMiner Studio 6.5 also removes earlier inconsistencies in, for example, date functions and complex numbers.

For compatibility, expressions created with earlier versions of RapidMiner can be imported—and are functional—in RapidMiner Studio 6.5. To use the new syntax, you must update pre-version 6.5 processes to the latest compatibility level using the corresponding control at the bottom of the Parameters panel.

Finally, the expression engine has been completely rewritten, achieving a roughly 2.5x performance improvement.

Improved pre-flight check and runtime error messages

New warning and error messages clearly identify configuration issues that prevent execution. Previously, RapidMiner Studio displayed potential process setup problems in the Problems view, where they could easily be overlooked. RapidMiner Studio 6.5 now issues a warning prior to execution if it detects an issue that is likely to break the process. These warnings locate the problem and include clear instructions on how to fix the process setup so that it is syntactically correct. You have the option to ignore the warnings and continue to run the process; depending on the issue, the process may finish or it may fail. Runtime error messages are redesigned to highlight the problematic operator or control (for example, a missing parameter). Errors prevent successful execution and require correction.

New RapidMiner users will likely benefit most from this feature. Intermediate and advanced users should benefit as well, since a user at any level can, for example, accidentally forget to complete a connection.

Hive Connector

Users can now connect RapidMiner Studio and RapidMiner Server to a Hadoop cluster. This requires a Professional edition of RapidMiner Studio. Using the Read Hive operator, a process can read tables from Apache Hive (the most common SQL engine/database layer in Hadoop). Users can even run their own SQL scripts against Hive and use the result as an input for their RapidMiner process. The connection is simple to set up and supports LDAP authentication.

It is important to note that pushing the computation down to Hadoop, and therefore designing distributed, scalable processes on terabytes or petabytes of data, is still only possible with the RapidMiner Radoop product. Now, however, users with a RapidMiner Studio Professional license can prototype processes that use data residing in their Hadoop cluster. With this feature, users can easily build a RapidMiner process that uses aggregated data from Hadoop as its source, without manually moving and transforming the data or writing custom export scripts.

Enhancements and bug fixes

The following pages describe the enhancements and bug fixes in RapidMiner Studio 6.5 releases: