What's New in RapidMiner Studio 9.3.0?
Released: May 28th, 2019
The following describes the new features, enhancements, bug fixes, and development changes in RapidMiner Studio 9.3.0:
New Features
Completely reworked how connections (JDBC, as well as any other connections like Twitter, Amazon S3, Dropbox, etc.) work:
- Connections are now self-contained and stored per repository. This means that when you create a connection, everything you need to use it will become part of the connection entry in the repository.
- Connection settings can now be injected on the fly through so-called Sources for values. Injectable settings range from credentials to URLs (or parts of URLs) and other parameters. Initially, only Macro and RM Server Vault are available as Sources, but the list will grow over time because any extension can add its own Sources.
- Have a central DB connection where each user should use their own credentials? Create the single connection template on RM Server, indicate that the credentials are injected, and then use our new RM Server Vault as a Source where each user can securely store their credentials!
- You can now easily share a connection with your colleagues via a Server.
- Connections also work on any execution node, without you having to manually add the JDBC driver to each node yourself.
- To sum it all up, connections are now vastly more powerful than before. They are no longer necessarily statically defined, but instead they can be dynamically altered during runtime to grab the latest credentials, tokens, etc. Of course, you can still put everything that is needed into the connection and be done with it.
- Not all features of the new connections are accessible through a UI. For extremely advanced and powerful features like chaining different value providers for injection (e.g. Server Vault → CyberArk → DB) or using (injectable) placeholders that build up values of other keys, administrators can create the connection manually (it's a ZIP archive, after all). They can create the configuration JSON to suit their needs, and then upload the ZIP to RM Server. This, together with the injection mechanism, makes connection templates a reality and lets admins manage connections at scale, using command-line tools to build and distribute them.
- The entire mechanism for connections and their Sources is highly extensible, and new Sources and connection types can easily be added by extensions. We expect a whole host of new connections and Sources to become available over the next few months.
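To make the idea of injected values and placeholders more concrete, here is a minimal conceptual sketch in Python. It is not RapidMiner's configuration JSON or API: the dictionary layout, the `inject_from` and `key` fields, the `${...}` placeholder syntax, and the `resolve` helper are all hypothetical and exist only to illustrate how a template could be turned into concrete values at runtime.

```python
import re

# Hypothetical Sources: plain dicts standing in for a Macro source and a vault.
# These are NOT RapidMiner's Macro or Server Vault APIs.
macro_source = {"db_host": "db.example.com"}
vault_source = {"db_user": "alice", "db_password": "s3cret"}

# Hypothetical connection template: static values, values injected from a
# named Source, and one value built from placeholders that refer to other keys.
template = {
    "host": {"inject_from": "macro", "key": "db_host"},
    "port": "5432",
    "user": {"inject_from": "vault", "key": "db_user"},
    "password": {"inject_from": "vault", "key": "db_password"},
    "url": "jdbc:postgresql://${host}:${port}/sales",
}


def resolve(template, sources):
    """Return concrete connection values for a (hypothetical) template."""
    resolved = {}
    # First pass: copy static values and look up injected ones in their Source.
    for name, value in template.items():
        if isinstance(value, dict):
            resolved[name] = sources[value["inject_from"]][value["key"]]
        else:
            resolved[name] = value
    # Second pass: fill ${key} placeholders from the values resolved above.
    pattern = re.compile(r"\$\{(\w+)\}")
    return {
        name: pattern.sub(lambda m: resolved[m.group(1)], value)
        for name, value in resolved.items()
    }


connection = resolve(template, {"macro": macro_source, "vault": vault_source})
print(connection["url"])  # jdbc:postgresql://db.example.com:5432/sales
```

Because the credentials live in the Source rather than in the template, the same template can be shared while each user resolves it against their own stored values.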
Auto Model can now be executed on RapidMiner Server instead of locally
- Users can select whether the execution should happen locally in RapidMiner Studio or whether processes should be pushed to a connected Server instead. The latter lets you close RapidMiner Studio and fetch the results later from the Server instance.
- Jobs can be added to any queue the user has access to.
- Results will be stored on the Server and can be loaded back into Auto Model after completion. Loading of partial results is supported as well.
- If RapidMiner Studio is kept open while the execution happens on the Server, results will be loaded dynamically, and the progress is shown. The execution of all remote processes can also be stopped in this case.
Time Series Analysis features:
- New Default Forecast Model
  - Always predicts the same forecast value for all future values
  - Can be used as a baseline model to compare other forecast models against
- New operator Default Forecast
  - Trains a Default Forecast Model
  - The forecast value can be calculated as last value, mean in window, median in window, or mode in window
  - Last value and mode in window can even be used to create a forecast model for nominal time series
- New Function and Seasonal Forecast Model
  - Predicts future values by evaluating a polynomial function to forecast the trend of a time series
  - Adds or multiplies the values of the seasonal component to the forecasted trend values
- New operator Function and Seasonal Component Forecast
  - Trains a Function and Seasonal Forecast Model
  - The operator performs a decomposition (Classic or STL Decomposition) to determine the trend and seasonal components of the input time series
  - A polynomial function is fitted to the trend component
  - The function and the seasonal component are provided as the Function and Seasonal Forecast Model at the model output port
- New operator Autocorrelation / Autocovariance
  - Calculates dependency functions (autocorrelation function, autocovariance, partial autocorrelation function) for an input time series
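The baseline idea behind a default forecast and the autocorrelation function can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the operators' implementation: the function names, the `method`, `window`, and `horizon` parameters, and the choice of the common biased sample estimator for the autocorrelation are assumptions made for this example.

```python
from statistics import mean, median, mode


def default_forecast(series, method="last value", window=3, horizon=2):
    """Return `horizon` copies of a single baseline value derived from `series`."""
    recent = series[-window:]  # the last `window` observations
    if method == "last value":
        value = series[-1]
    elif method == "mean in window":
        value = mean(recent)
    elif method == "median in window":
        value = median(recent)
    elif method == "mode in window":
        value = mode(recent)  # also works for nominal (string) values
    else:
        raise ValueError(f"unknown method: {method}")
    return [value] * horizon


def autocorrelation(series, lag):
    """Sample autocorrelation at a non-negative lag (biased estimator)."""
    n = len(series)
    mu = sum(series) / n
    variance = sum((x - mu) ** 2 for x in series)
    covariance = sum(
        (series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag)
    )
    return covariance / variance


print(default_forecast([1, 2, 3, 4, 5], "mean in window"))        # [4, 4]
print(default_forecast(["sunny", "rain", "rain"], "mode in window"))  # ['rain', 'rain']
print(autocorrelation([1, 2, 3, 4, 5], 0))                        # 1.0
```

Since every future value receives the same forecast, any more elaborate model should at least beat this baseline to justify its complexity.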
Enhancements
- Write Excel now supports creating multiple sheets. Sheet names can be specified via the sheet names parameter
- Write Excel now supports collections of example sets as input
- Added Close all other results action for Result tabs, found in the right-click popup menu
- Improved handling of mandatory parameters that were not set
- Meta data from repository entries loaded by the Retrieve operator is now annotated with the repository location
- Added forward macro checkbox to Schedule Process which allows you to forward all current macros from the calling process to the scheduled process
- Write Database now defaults to a batch size of 100
- The operators Map, Replace and Rename by Replacing now have a more convenient regex dialog that can store the replacement value as well
- Added new function under Advanced functions named attribute(Nominal attribute_name) to the expression parser. This function evaluates the input and retrieves the value of the attribute with the name specified by the (resolved) input.
- Added a new option Insert as attribute for inserting macros in the UI of the expression parser (e.g. for Generate Attributes).
- Improved meta data for Nominal to Binomial for attributes where the nominal mapping is not clearly defined
- Explain Predictions now offers the calculation of model-specific global weights based on the level of support and contradiction each attribute value contributes to the local explanations
- Turbo Prep now uses the new visualizations for its Charts view
- Auto Model now tracks more run times, including the time needed for scoring 1,000 rows and training the model on 1,000 rows in addition to the total process execution run time. The overview table also shows small badges pointing out the best and fastest models
- Auto Model now offers to save all results at the end of a local execution. Those results can be loaded instead of re-running the modeling
- Auto Model now offers a list of recent data sets as well as a list of recent results as part of the first step
- Auto Model now offers to override the selection of columns for text processing
- Auto Model now shows the number of created models, the number of evaluated feature sets, and the number of generated features during a run
- Auto Model now shows the importance of all attributes for each model in addition to the model-independent global weights in the General section of the results
- Visualizations: Bubble charts (Scatter with a size column) can now display more than 5,000 data points
- Visualizations: Scatter3D now also supports a numerical color column
- Visualizations: Scatter Matrix now also supports a numerical or date_time color column
- Visualizations: Added the highly requested color group option to line/bar/column/area/streamgraph plots. Each distinct value in this column becomes an individual plot element, to allow for easy logical grouping of data without pivoting. The column can be of any type.
- Visualizations: Aggregation group-by now also supports numerical columns. Each distinct number is converted to a category
- Visualizations: If the group-by column is numerical or date-time, the groups are now sorted in ascending order
- Visualizations: X-Axis column and aggregation group-by column are now linked, i.e. changing one also changes the other. This makes switching between aggregation/no aggregation more intuitive and easier to follow
- Moving Average Filter now offers to specify the left and right side of the simple filter individually instead of being symmetric
- Improved operator help for Loop Examples
- Added a positive class parameter to Performance (Binominal Classification) which lets the user manually decide what the positive class is.
- Visualizations: Heatmaps with aggregation enabled can now also be grouped by two columns at the same time, resulting in a 2D table-like structure with cells for each value combination of the two group-by columns. If you want to plot multiple value columns, you can still group by a single column as before.
- Copied/pasted operators that have references to other copied operators will now correctly update their parameters.
- When replacing an operator in place, parameters that are shared between both operators are kept.
- Repository entry copies are now simply enumerated at the end of their name instead of suddenly starting with "Copy of". This will make finding the copy in large repositories much more straightforward
- Repository entries can now be directly copied in-place without having to select a target folder first
- Updated default Oracle JDBC driver class
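As an illustration of the Moving Average Filter enhancement listed above, the following Python sketch shows what specifying the left and right side of a simple filter individually can mean. The `moving_average` function and its `left` and `right` parameters are hypothetical names, and returning `None` where the window does not fit is an assumption for this example rather than the operator's documented behavior.

```python
def moving_average(values, left=2, right=0):
    """Simple moving average over `left` earlier and `right` later neighbours."""
    smoothed = []
    for i in range(len(values)):
        # Assumption: positions without a complete window yield None.
        if i - left < 0 or i + right >= len(values):
            smoothed.append(None)
            continue
        window = values[i - left : i + right + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


data = [1, 2, 3, 4, 5]
print(moving_average(data, left=2, right=0))  # [None, None, 2.0, 3.0, 4.0]
print(moving_average(data, left=1, right=1))  # [None, 2.0, 3.0, 4.0, None]
```

With `right=0` the filter only looks backwards, which avoids using future values when smoothing a series that will later be forecast.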
Bugfixes
- Fixed a rare bug in the Log operator where a process seemingly did not stop when it was done
- Fixed a rare bug that could freeze the UI
- Switching tabs is now only possible with a left-click
- Fixed schema retrieval in the parameters for some databases (e.g. MySQL)
- Fixed rare exception in automatic sparsity detection when creating example sets via the new data core
- Fixed error that could occur when starting Studio in relation to Academy Global Search entries
- Fixed error message display in expression property dialog for very long errors
- Fixed Real to Integer when encountering infinity values
- Fixed a bug in Compare ROC that deleted prediction/confidence columns in the input example set in some cases
- Fixed handling of non-finite values for integer and real column grouping attributes in Pivot
- Fixed UI becoming broken when the macro sort order in the Context panel was changed, an empty macro was already in the context, and the user tried adding another macro
- Fixed a problem that could result in Studio endlessly starting when switching between Win32 and Win64 versions on the same machine
- Fixed links to educational materials in Auto Model and Turbo Prep
- Fixed rare bug which could occur for Automatic Feature Engineering if feature generation was enabled with high complexity settings in combination with H2O models
- Visualizations: OS X 10.11 will now have working HTML5 visualizations again
- Visualizations: Fixed matrix data (e.g. correlation matrix) visualizations showing the wrong chart type
- Visualizations: Fixed Scatter3D dots sometimes not being displayed
- Fixed rare cases in which the correct exception was not thrown in Extract Aggregates, Extract Mode and Extract Coefficients (Polynomial Fit)
- Fixed expected input for the inner 'model' port of Forecast Validation
- Fixed run-time problems in Replace Missing Values (Series)
- Fixed the Retrieve operator to update output meta data after a repository entry was removed or created
- Remove Unused Values now also sorts mappings that do not have unused values
- Link button icons no longer look pixelated on macOS
- Visualizations: Wordcloud now takes actual number of distinct words into account for the limit check, instead of also counting words that do not actually occur
- Dialog about cancelling Progress threads with dependent tasks is now shown in front of the Progress dialog
- Progress threads are no longer displayed in the Progress dialog after they are done
Development
- Added SwingTools#setPrompt(String, JTextComponent) method which can be used to set a prompt in a text field (gray help text displayed when the field is empty)
- Added com.rapidminer.gui.actions.CopyStringToClipboardAction which can be used to copy any dynamically supplied string to the system clipboard
- Added com.rapidminer.gui.ProgressThread#setDependencyPopups method to prevent popups that ask about aborting Progress threads with dependent tasks