What's new in RapidMiner Studio 9.7

Projects

RapidMiner Studio and Server, as well as JupyterHub, now support the concept of projects, enabling you to structure and isolate your work and allowing multiple users to collaborate while maintaining a consistent state across the entire project.

On top of that, projects are versioned, providing the following cool features:

  • Linear backup: you can always revert to a past state (nothing is lost, no matter what you do).
  • Each snapshot (project version) is fully consistent, so it's easy to answer compliance questions like "which process trained this model".
  • Traceability: snapshots log who did what, when and why (through user-written comments).
  • There's a Git server used as the version control backend. This also enables storing files of arbitrary types like .py or .csv, making your projects whole.
  • Direct Git access for everyone who works with Git, e.g. Python coders. This allows seamless, two-way integration of projects between Studio users and those coders.

Local repositories created with RapidMiner Studio 9.7 or later can also store any type of file you may have on your computer (.py, .jpeg, .pdf, etc.).

HDF5 as the new file format

RapidMiner ExampleSets are now written to disk in a new file format: HDF5. This well-established format ensures stability and performance when storing large amounts of data. It also means that Python and RapidMiner Studio can exchange data more easily and quickly than ever before.

Improvements to our time series support

  • New operator Integrate to integrate time series with different methods (cumulative sum, left and right Riemann sums, trapezoidal rule).
  • The Lag operator now lets you specify negative lags, as well as a default lag for a set of attributes (selected by an attribute subset selector).
  • Unfortunately, due to parameter key incompatibilities, the old version of the Lag operator had to be deprecated; a new version with the same name but a different operator key has been added.
  • Added options to the Fast Fourier Transformation to use padding and to calculate the frequency of the amplitude value.
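
The integration methods named for the Integrate operator can be illustrated with a minimal Python sketch. This is not the operator's implementation: the function name `integrate`, the `dt` spacing parameter, the method labels, and the choice to start the running integral at 0 are all assumptions made for illustration, and the series is assumed to be evenly spaced.

```python
from itertools import accumulate


def integrate(values, dt=1.0, method="trapezoidal"):
    """Running integral of an evenly spaced series (illustrative only).

    `dt` is the assumed constant spacing between consecutive values.
    """
    if method == "cumulative_sum":
        # Plain running total of the values, ignoring the spacing.
        return list(accumulate(values))
    if method == "left_riemann":
        # Each interval contributes its left endpoint times dt.
        steps = [v * dt for v in values[:-1]]
    elif method == "right_riemann":
        # Each interval contributes its right endpoint times dt.
        steps = [v * dt for v in values[1:]]
    elif method == "trapezoidal":
        # Each interval contributes the mean of both endpoints times dt.
        steps = [(a + b) / 2 * dt for a, b in zip(values, values[1:])]
    else:
        raise ValueError(f"unknown method: {method}")
    # Assumption: the integral at the first time stamp is 0.
    return [0.0] + list(accumulate(steps))


series = [1.0, 2.0, 4.0]
for name in ("cumulative_sum", "left_riemann", "right_riemann", "trapezoidal"):
    print(name, integrate(series, method=name))
```

For an increasing series such as this one, the left Riemann sum underestimates and the right Riemann sum overestimates the area, with the trapezoidal rule falling halfway between them.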

Improvements in our guided machine learning features

Auto Model:

  • Some processes (e.g. SVM, FLM, or weight calculations) now use Target Encoding (a new operator) instead of one-hot encoding, which reduces memory usage and run times.
  • You can submit multiple Auto Model jobs to RapidMiner Server and use its repository to load the results.
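
To see why target encoding can reduce memory usage compared with one-hot encoding, consider the following Python sketch. It uses plain mean encoding for a numeric (0/1) target; the Target Encoding operator itself may work differently (for example with smoothing), and the function names `target_encode` and `one_hot` are hypothetical.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category by the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    return [means[category] for category in categories]


def one_hot(categories):
    """Create one 0/1 column per distinct category."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


colors = ["red", "blue", "red", "green"]
labels = [1, 0, 0, 1]

encoded = target_encode(colors, labels)
dummies = one_hot(colors)
print(encoded)          # one numeric column, regardless of the number of categories
print(len(dummies[0]))  # one column per distinct category
```

Target encoding always yields a single numeric column, whereas the one-hot representation grows with the number of distinct values, which is where the savings for high-cardinality attributes come from.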

Model Ops:

  • Repositories on RapidMiner Server and RapidMiner Studio can be used as storage locations for deployed models (also known as "deployment locations").
  • Unused and ID columns are now kept in the results after scoring.

Updated H2O library

The H2O library, which we use for providing some of our popular learners, has been updated to the latest stable version (3.30.0.1 to be precise). This update will increase stability and performance for Gradient Boosted Trees, Logistic Regression, Deep Learning and Generalized Linear Model operators.

In addition, some enhancements were implemented:

  • Gradient Boosted Trees now support monotonicity constraints.
  • Deep Learning now exposes model weights on a separate output port.
  • Model training can be fine-tuned using expert parameters. All parameters provided by H2O are supported.

A note on backwards compatibility: to ensure a smooth transition to the new H2O library version, models trained using previous RapidMiner Studio versions will be applied using the old implementation. All new H2O-based models will be trained and applied using the new library version.

This means that old models retrained with this version may produce slightly different results (e.g. model performance) than with the previous one.

Enhancements and bug fixes

The following pages describe the enhancements and bug fixes in RapidMiner Studio 9.7 releases: