What's New in RapidMiner Studio 9.9.0?
Released: March 24th, 2021
The following sections describe what's new in RapidMiner Studio 9.9.0:
New Features
- Data is the central piece in any RapidMiner process. The way RapidMiner internally deals with data has fundamentally changed in this release with the new Data Core (codename Belt). Its new columnar table representation provides a quantum leap in processing speed and memory efficiency for RapidMiner processes. Multiple operators already use it internally and it becomes fully available now for extension developers to create fast and efficient operators.
- Added a Set Positive Value operator for the new Data Core which can make nominal attributes binominal or change the positive value of binominal attributes
Enhancements
- Replaced the Rename by Example Values operator by a new and improved version
- Replaced the Rename operator by a new one that can additionally handle a renaming dictionary
- Replaced the Sort operator by one that can sort by multiple attributes (currently already part of the Operator Toolbox extension)
- Improved the FP-Growth operator so that it only works with explicitly defined positive values (either via binominal attributes or the positive value parameter) for items in dummy coded columns
- Improved memory consumption of Cross Validation in certain circumstances
- The operators Read CSV and Read Excel were improved to use the new data core
- Pivot now supports Least and Mode aggregations for numerical attributes as well
- Annotate now adds the annotations to the meta data as well
- Added a warning when trying to run a process on an AI Hub with a lower feature version than the current Studio version
- Added a reason when displaying incompatible extensions in the dialog after startup to show why an extension failed to load. Details available via tooltip.
- Upgraded integrated Chromium to version 84
- Improved some metadata transformation w.r.t. nominal value sets
- The splashscreen no longer shows duplicate extension icons during startup if more than one copy of an extension is installed
- Visualizations now also support Least and Mode aggregations for numerical attributes
- Improved concurrent execution in some corner cases
- Deprecated the Exchange Roles operator
- Model viewer for Gradient Boosted Tree models now respects the Number format settings in Studio preferences
- Auto Model uses new clustering algorithms which no longer require one-hot encoding on the data set and therefore reduce the memory footprint for data sets with nominal columns with many values. As a result, users can no longer specify the minimum number of clusters in the X-Means case (automatic determination of the optimal number of clusters). The minimum is now fixed at 2.
- Time Series: Added the option to ignore invalid values to the Moving Average Filter operator: invalid values (missing, positive and negative infinity) are now ignored when calculating the filtered value
- This also results in valid values at the beginning and end of the filtered time series
- As the Classic Decomposition and the Function and Seasonal Component Forecast operators are based on the Moving Average Filter, they now also have the "ignore invalid values" option
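The effect of the "ignore invalid values" option on the Moving Average Filter can be illustrated with a minimal sketch. It is not the operator's implementation: the class and method names are hypothetical, and it assumes an equally weighted, centered window, whereas the real operator may use other weightings. With the option enabled, missing (NaN) and infinite values are skipped and window positions outside the series are treated the same way, which is why the first and last entries also receive a value.

```java
import java.util.Arrays;

class MovingAverageSketch {

    /**
     * Centered moving average over a window of (2 * halfWidth + 1) positions.
     * Hypothetical helper, not part of RapidMiner.
     *
     * If ignoreInvalid is false, any NaN, infinite, or out-of-range position in
     * the window makes the result NaN. If it is true, such positions are skipped
     * and the average is taken over the remaining finite values.
     */
    static double[] filter(double[] series, int halfWidth, boolean ignoreInvalid) {
        double[] result = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            double sum = 0.0;
            int count = 0;
            boolean complete = true;
            for (int j = i - halfWidth; j <= i + halfWidth; j++) {
                if (j < 0 || j >= series.length || !Double.isFinite(series[j])) {
                    complete = false; // position is outside the series or invalid
                    continue;
                }
                sum += series[j];
                count++;
            }
            if (ignoreInvalid) {
                result[i] = count > 0 ? sum / count : Double.NaN;
            } else {
                result[i] = complete ? sum / count : Double.NaN;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] series = {1.0, 2.0, 3.0, Double.NaN, 5.0, 6.0, 7.0};
        // Without the option: NaN at both ends and around the missing value.
        System.out.println(Arrays.toString(filter(series, 1, false)));
        // With the option: every position has a finite value.
        System.out.println(Arrays.toString(filter(series, 1, true)));
    }
}
```

In this sketch the strict variant only produces values where the full window is valid, while the lenient variant averages whatever finite values the window contains.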
Bugfixes
- Fixed Data Table reading/writing when LFS light checkout is enabled
- Fixed a problem where an uncaught exception could go through when using date/time attributes with values in the far future/past
- Fixed an uncaught exception that could occur when a process run via Execute Process failed and the user opened it via the popup, fixed the problem, and ran it directly
- Fixed wrong attribute weights for Random Forest regression
- Fixed error in Store operator when used after application of k-Means model
- Fixed issue that Save dialogs did not accept any selection if a wildcard (.*) filter was provided (e.g. for Write Document)
- Fixed Pivot meta data column names not matching the real data
- Fixed missing text for the file restoring confirm dialog in projects
- Fixed an issue that could cause Studio startup to silently fail
- Fixed a possible error during startup w.r.t. port preconditions on some operators
- Fixed a bug that could cause project creation to not show an error and appear to do nothing
- Removed the check for preprocessing models in model deployments for custom models. This check had been causing certain grouped models to fail if they contained models that are technically not preprocessing models (e.g. PCA).
- Time Series: Fixed a bug for the Lag operator, which caused original data to be changed at preceding ports as well
- Time Series: Fixed some small errors in the description of two tutorial processes for Sliding Window Validation
- Time Series: Fixed an error that occurred in time-based windowing when the end of the last window was equal to the last timestamp in the input data. This affects all windowing operators (Windowing, Process Windows, Forecast Validation, Sliding Window Validation).
- Cloud Connectivity: File browser now adds the correct path separator character on Windows, and resolves macros properly for AWS, Azure, and Google Cloud file operators
Development
New Data Core
- ExampleSet and ExampleSetMetaData are officially deprecated! From now on, any new operators should be built using Belt Tables (com.rapidminer.belt.table.Table). Existing operators that use ExampleSets will continue to work for the time being. See the following resources for help:
Tables/ExampleSets are now retrieved as IOTable from the non-legacy Repositories, with TableMetaData as meta data. Code similar to the following will no longer work:
    IOObjectEntry dataEntry = dataLoc.locateData();
    if (!ExampleSet.class.isAssignableFrom(dataEntry.getObjectClass())) {
        return false;
    }
    MetaData metaData = dataEntry.retrieveMetaData();
    if (!(metaData instanceof ExampleSetMetaData)) {
        return false;
    }
    ...
    IOObject ioObject = dataEntry.retrieveData(null);
    if (!(ioObject instanceof ExampleSet)) {
        return false;
    }
    ExampleSet exampleSet = (ExampleSet) ioObject;
    ...
and should be replaced by
    IOObjectEntry dataEntry = dataLoc.locateData();
    if (!IODataTable.class.isAssignableFrom(dataEntry.getObjectClass())) {
        return false;
    }
    MetaData metaData = dataEntry.retrieveMetaData();
    ExampleSetMetaData esMD = BeltConversionTools.asExampleSetMetaDataOrNull(metaData);
    if (esMD == null) {
        return false;
    }
    ...
    IOObject ioObject = dataEntry.retrieveData(null);
    ExampleSet exampleSet = BeltConversionTools.asExampleSetOrNull(ioObject);
    if (exampleSet == null) {
        return false;
    }
    ...
The MetaData at ports can now be TableMetaData. All meta data transformations will continue to work, since Port#getMetaData() automatically transforms TableMetaData to ExampleSetMetaData. However, that method has been deprecated and should be replaced by Port#getMetaData(ExampleSetMetaData.class) or Port#getMetaDataAsOrNull(ExampleSetMetaData.class), which automatically convert to the desired class if possible. The new methods are analogous to those for data: for example, Port#getAnyDataOrNull(), which was already deprecated in 9.4, should be replaced by Port#getDataAsOrNull(ExampleSet.class), which automatically converts to the desired class if possible.
While nothing has changed for the data methods at the ports, more operators deliver IOTable instead of ExampleSet to ports as of 9.9. The operators Read CSV and Read Excel were improved to use the new data core; if you use the corresponding classes CSVExampleSource or ExcelExampleSource in any form, please use CSVTableSource and ExcelTableSource in the future.
Extension logging I18N
Logging now also supports i18n! To use it, follow one of these steps:
- for a RapidMiner Extension: add a LogMessagesXYZ.properties file next to your existing UserErrorMessages.properties and similar files. This is only respected by Studio 9.9+ and ignored by earlier Studio versions.
- when using the logging module, simply register your LogMessagesXYZ.properties via com.rapidminer.tools.I18N#registerLoggingBundle(ResourceBundle)
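As a rough illustration of the second step, the sketch below builds a ResourceBundle from properties content and resolves a message from it using only the Java standard library. The class name, the message key, and the message text are hypothetical, and the sketch assumes java.text.MessageFormat-style placeholders such as {0}. The registration call itself is shown only as a comment because it requires RapidMiner on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class LoggingBundleSketch {

    // Hypothetical content of a LogMessagesXYZ.properties file; key and text are made up.
    static final String PROPERTIES =
            "com.example.my_extension.load_failed=Could not load resource {0}: {1}\n";

    /** Builds a bundle from the inline properties text above. */
    static ResourceBundle loadBundle() {
        try {
            return new PropertyResourceBundle(new StringReader(PROPERTIES));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up a key and fills in its placeholders (assumes MessageFormat syntax). */
    static String format(ResourceBundle bundle, String key, Object... args) {
        return MessageFormat.format(bundle.getString(key), args);
    }

    public static void main(String[] args) {
        ResourceBundle bundle = loadBundle();
        // With RapidMiner on the classpath, the bundle would be registered here:
        // com.rapidminer.tools.I18N.registerLoggingBundle(bundle);
        System.out.println(format(bundle, "com.example.my_extension.load_failed",
                "model.bin", "file not found"));
    }
}
```

In a real module the bundle would more likely be loaded from the properties file on the classpath, for example via ResourceBundle.getBundle, rather than from an inline string.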