You are viewing the RapidMiner Studio documentation for version 9.9.

The Design View

See also the video introduction to the RapidMiner GUI.

RapidMiner Studio is a visual workflow designer for predictive analytics that brings data science and machine learning to everyone on the analytics team.

When you're working on a new project of any kind, often the first step will be to go to a whiteboard, where you will plan the workflow and identify the key steps on the way to your goal. If you're a data scientist, the workflow will usually include one or more of the following steps:

  • Import data
  • Prepare data
  • Build a model
  • Validate the model
  • Apply the model
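As a rough illustration of these five steps outside of RapidMiner Studio, the sketch below walks through them in plain Python. The data set, the function names, and the deliberately trivial threshold "model" are all hypothetical and chosen only to keep the example self-contained; none of this is RapidMiner code.

```python
import csv
import io

# Hypothetical inline data set: hours studied and whether the exam was passed.
# The last row has a missing value on purpose.
RAW = """hours,passed
1,0
2,0
3,0
4,1
5,1
6,1
,1
"""


def import_data(text):
    """Import: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare_data(rows):
    """Prepare: drop rows with missing values and convert strings to numbers."""
    return [
        (float(r["hours"]), int(r["passed"]))
        for r in rows
        if r["hours"] and r["passed"]
    ]


def build_model(examples):
    """Build: pick the threshold on hours that misclassifies the fewest examples."""
    candidates = sorted({x for x, _ in examples})

    def errors(threshold):
        return sum((x >= threshold) != bool(y) for x, y in examples)

    return min(candidates, key=errors)


def apply_model(threshold, hours):
    """Apply: predict 1 (passed) if hours reaches the threshold, else 0."""
    return int(hours >= threshold)


def validate_model(threshold, examples):
    """Validate: fraction of held-out examples that are predicted correctly."""
    correct = sum(apply_model(threshold, x) == y for x, y in examples)
    return correct / len(examples)


rows = import_data(RAW)
examples = prepare_data(rows)
train, test = examples[::2], examples[1::2]  # simple alternating split
model = build_model(train)
accuracy = validate_model(model, test)
prediction = apply_model(model, 4.5)
print(f"threshold={model}, accuracy={accuracy:.2f}, prediction={prediction}")
```

Each function plays the role of one step on the whiteboard, and the output of one step is the input of the next.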

RapidMiner Studio implements your whiteboard workflow in software, in the Design View. The Design View includes numerous panels.

  • Data, processes, and results are stored in the Repository.
  • The essential elements of every workflow are called Operators.
  • Operators are connected via ports. The output of the first is passed as input to the second.
  • A connected set of Operators that help you to transform and analyze your data is called a process.
  • The behavior of an Operator can be modified by changing its parameters.
  • The behavior of an Operator can be understood by reading the Help.

Each of these terms will be examined in more detail below.

The default view

Process

Process: A connected set of Operators that help you to transform and analyze your data.

Also known as: flow, program, pipeline, diagram

Your goal is to create a finished process, a connected set of Operators that produces a result. For example, your process might read a data set and build a predictive model. When you have connected all your Operators and set their parameters, press the Run button (the arrow icon) at the top of the user interface, and the results will be displayed in the Results View.

As discussed in Run a Process, there is more than one way to run your process.

As your processes grow in size, you will need some way to manage their complexity.

To save your process to a Repository, select File > Save Process from the main menu.

You can easily share a process by first exporting it to an XML file:

  • To export the process, select File > Export Process. The export dialog lets you save the file as .rmp or .xml; the two formats are identical (both are XML).
  • To import the process, select File > Import Process.
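Because an exported process is plain XML, it can be inspected with any XML parser. The following is a minimal sketch using Python's standard library. The sample content and the element and attribute names in it (operator, connect, class, name, from_op, to_op) are assumptions made for illustration; open one of your own exported files to check its actual structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the content of an exported .rmp file.
# Element and attribute names are assumptions, not a documented schema.
SAMPLE_RMP = """<process version="9.9.000">
  <operator class="process" name="Process">
    <process expanded="true">
      <operator class="read_excel" name="Read Excel"/>
      <operator class="store" name="Store"/>
      <connect from_op="Read Excel" from_port="output" to_op="Store" to_port="input"/>
      <connect from_op="Store" from_port="through" to_port="result 1"/>
    </process>
  </operator>
</process>
"""


def summarize_process(xml_text):
    """Return the operator names and the connections found in the XML text."""
    root = ET.fromstring(xml_text)
    operators = [op.get("name") for op in root.iter("operator")]
    # A connection without a target operator is assumed to end at a results port.
    connections = [
        (c.get("from_op"), c.get("to_op"), c.get("to_port"))
        for c in root.iter("connect")
    ]
    return operators, connections


operators, connections = summarize_process(SAMPLE_RMP)
for name in operators:
    print("operator:", name)
for source, target, port in connections:
    print("connection:", source, "->", target or "(results)", "via", port)
```

To try this on a real file, read its text with `open(path, encoding="utf-8").read()` and pass it to `summarize_process`.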

Ports

To build a process, you must connect the output from each Operator to the input of the next via a port. To connect two ports, click on them. Hover over a port to see a tooltip with additional information. When connecting two Operators, make sure that the output port of the first is compatible with the input port of the second; otherwise you will get an error message. The input and output ports for each Operator are described in the Operator Help, and a complete list of ports is given in the Getting Started Glossary.
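The idea of port compatibility can be sketched in a few lines of Python. This is a conceptual model only, not RapidMiner's implementation: the Operator class, the connect function, the port names, and the data type labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """Toy model of an Operator: named input and output ports, plus parameters."""
    name: str
    inputs: dict    # input port name -> type of data the port expects
    outputs: dict   # output port name -> type of data the port delivers
    parameters: dict = field(default_factory=dict)


def connect(source, out_port, target, in_port):
    """Connect an output port to an input port, rejecting incompatible types."""
    produced = source.outputs[out_port]
    expected = target.inputs[in_port]
    if produced != expected:
        raise TypeError(
            f"{source.name}.{out_port} delivers {produced}, "
            f"but {target.name}.{in_port} expects {expected}"
        )
    return (source.name, out_port, target.name, in_port)


# Hypothetical Operators; port names and type labels are illustrative.
reader = Operator("Read Excel", inputs={}, outputs={"out": "ExampleSet"})
tree = Operator(
    "Decision Tree",
    inputs={"tra": "ExampleSet"},
    outputs={"mod": "Model"},
    parameters={"maximal_depth": 10},
)
apply_model = Operator(
    "Apply Model",
    inputs={"mod": "Model", "unl": "ExampleSet"},
    outputs={"lab": "ExampleSet"},
)

# Compatible connections succeed.
links = [
    connect(reader, "out", tree, "tra"),
    connect(tree, "mod", apply_model, "mod"),
]

# An incompatible connection is rejected with an error message.
try:
    connect(tree, "mod", apply_model, "unl")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The check happens at connection time, before anything runs, which mirrors the point made above: an incompatible pair of ports is reported as an error instead of silently passing the wrong kind of data.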

If you want to see the results in the Results View, you must connect the last Operator in a process to the results port ("res") on the right side of the Process Panel.

Hint: double-click on an output port, and it will be connected to the next available results ("res") port.

The following example shows a simple process, where the data from an Excel file is (1) read, (2) stored in the Repository, and (3) displayed in the Results View.

A simple process

The screenshot below shows a more complex process, generated by Auto Model.

A more complex process

Repository

Repository: your central data storage entity. It holds connections, data, processes and results, either locally or remotely.

Also known as: folder, workspace, project

When working with RapidMiner Studio, you need a place to save your work. The Repository can be used to store:

  • connections
  • data
  • processes
  • results
  • arbitrary files (as of RapidMiner Studio 9.7)

Note that as of RapidMiner 9.7, the concept of Repository has several distinct types:

  • Legacy Repository: For RapidMiner Studio 9.6 and earlier, this was the standard Repository, with no version control and a limited set of file types. You cannot create a legacy Repository, but you can still use any legacy Repositories you created previously, or migrate them to a new Repository. A remote Repository on RapidMiner AI Hub is still of this type.
  • Remote Repository: Each instance of RapidMiner AI Hub had and still has a unique central Repository, with the same characteristics as RapidMiner Studio's legacy Repository: no version control and a limited set of file types.
  • Repository (local): As of RapidMiner Studio 9.7, a local Repository supports arbitrary file types, but not version control. A Repository of this type has no counterpart on RapidMiner AI Hub. When you double-click a non-RapidMiner file in a Repository, it opens in the standard application for that file type. When you drag a non-RapidMiner file into the Process Panel, an Open File Operator is created. Extensions, such as the Python Scripting extension, may provide additional functionality, so that dragging a Python script into the Process Panel creates an Execute Python Operator.
  • Project: As of RapidMiner Studio 9.7, a Project supports both version control and arbitrary file types. It behaves in the same way as a Repository, but with the addition of version control. As discussed in the Projects documentation for RapidMiner Studio and RapidMiner AI Hub, a Project always has both a local component and a server component, and the two are regularly synchronized.

Repositories are the natural place to store your connections and processes (File > Save Process); whether you save your data, results, or other files in a Repository depends on your use case.

To start with, your data probably lives in a file or a database. RapidMiner Studio provides numerous Operators to help you import your data, e.g., Read Excel or Read Database. To launch the Import Data wizard, click on Import Data in the Repository Panel, or select File > Import Data from the main menu.

Given the data and the process, your results can always be regenerated, but there might be good reasons to store the results in a Repository:

  • If the result is a complex model based on a large data set, regenerating it will take time.
  • If you are running a process on RapidMiner AI Hub, you will need to store the results (using the Store Operator), because RapidMiner AI Hub has no equivalent to the Results View in RapidMiner Studio.

Bundled Repositories

For both new and experienced users, the Repositories bundled with RapidMiner Studio are an invaluable resource. They provide a wide array of sample data sets and sample processes, together with explanatory notes.

  • The Training Resources Repository is a large set of data sets and processes designed as a companion to the training lessons and courses at the online RapidMiner Academy. We encourage you to take advantage of these free courses to practice your skills.
  • The Samples Repository contains additional data sets and processes, including Time Series examples under Samples > Time Series.
  • The Community Samples Repository is a special collection of data sets and processes published by our top users in the RapidMiner User Community, not by the RapidMiner team. Each process contains a header naming the author, giving a brief explanation of the purpose, and linking to the thread from which it originated. We encourage users to read the online conversation that accompanies each process before using it, because processes in this Repository do not necessarily run as published.

Operators

Operators: The elements of a process. Each Operator takes input and creates output, depending on the choice of parameters.

Also known as: function, formula, node

To use RapidMiner Studio effectively, you have to learn about its Operators. RapidMiner Studio includes hundreds of Operators, and therefore a large part of the task is learning how to find what you need. As so often with search, there are two major strategies: hierarchical search and keyword search. The RapidMiner Community is also a source of support.

To verify that the Operator you have found has the functionality you expect, read the Help.

Once you've found the Operator you want, there are at least 3 ways of getting it into the Process Panel.

  1. Drag-and-drop the Operator

  2. Double-click the Operator

  3. Right-click the Operator, and choose Insert Operator from the context menu.

Hierarchical search

The hierarchy of folders in the Operators Panel reflects a typical data science workflow:

  • Data Access
  • Blending
  • Cleansing
  • Modeling
  • Scoring
  • Validation
  • Utility
  • Extensions

By opening these folders and their subfolders, you will get some insight into what's available.

This same hierarchy can be examined on the docs website, which includes the Help for each Operator.

Keyword search

The alternative is keyword search. Although the Operators Panel includes a search field, the recommended procedure is to use the global search, in the upper right corner of the user interface. The global search finds not just Operators, but data and processes from the Repository, extensions from the Marketplace, and even actions you can take from the menu!

Hint: when you hover over an Operator displayed by the global search, the Help for that Operator is displayed immediately in the Help Panel. If you first maximize the Help Panel, you can quickly scan the Help pages for all the Operators that appear in your search.

Community search (Wisdom of Crowds)

If you've started building a process, and you're looking for hints, the "Wisdom of Crowds" can be helpful. The "Wisdom of Crowds" is an opt-in recommender system, based on the usage pattern of other RapidMiner users. It predicts which Operators you might need, based on the Operators that are already included in your process. To activate it, click on the button that says Activate Wisdom of Crowds. You can activate it or deactivate it at any time via the menu item Settings > Preferences > Recommender > Enable operator recommendations.
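How the "Wisdom of Crowds" computes its recommendations is not described here. To give an intuition for what a usage-based recommender does, the sketch below implements a simple co-occurrence approach in Python. The usage data, the recommend function, and the scoring rule are hypothetical and are not RapidMiner's algorithm.

```python
from collections import Counter

# Hypothetical usage data: the Operators used in four other users' processes.
PROCESSES = [
    ["Read Excel", "Select Attributes", "Decision Tree", "Apply Model"],
    ["Read Excel", "Replace Missing Values", "Decision Tree"],
    ["Read CSV", "Select Attributes", "Decision Tree", "Apply Model", "Performance"],
    ["Read CSV", "Normalize", "k-Means"],
]


def recommend(current, processes, top_n=3):
    """Suggest Operators that often appear alongside those already in use.

    Each known process votes for the Operators it contains that are missing
    from the current process; the weight of its vote is the number of
    Operators it shares with the current process.
    """
    current = set(current)
    scores = Counter()
    for ops in processes:
        overlap = len(current & set(ops))
        if overlap == 0:
            continue  # unrelated process, no vote
        for op in set(ops) - current:
            scores[op] += overlap
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [op for op, _ in ranked[:top_n]]


suggestions = recommend(["Read Excel", "Decision Tree"], PROCESSES)
print(suggestions)
```

The more Operators a stored process shares with yours, the more its remaining Operators count toward the suggestions.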

If you still can't find what you are looking for, the RapidMiner Community can probably help. RapidMiner's data science team actively contributes.

Parameters

Parameters: Options for configuring the behavior of an Operator.

The content of the Parameters Panel is context-dependent. Select any Operator that is displayed in the Process Panel, and the Parameters Panel displays the options for configuring that Operator. Because RapidMiner Studio includes many Operators, each with its own unique functionality, the range of parameters is also quite diverse. By default, RapidMiner Studio will show you only the more commonly used parameters. To see all of the available parameters, click Show advanced parameters.

To understand the parameters, you need to learn more about the Operator; reading the Help for that Operator is a good place to start. Alternatively, hover over the information icon 🛈 next to the parameter of interest, and a help text is displayed.

Help

Help: Displays a help text for the current Operator.

The content of the Help Panel is also context-dependent. Select any Operator that is displayed in the Process Panel, and the Help Panel displays a help text for that Operator. The Help Panel provides useful background information, including:

  • An overview of the Operator, its purpose, and its functionality
  • A description of the Operator's input and output ports
  • A description of the Operator's parameters
  • One or more examples, in the form of a Tutorial Process

Within the Help Panel, clicking on an example immediately opens the associated Tutorial Process in RapidMiner Studio, so that you can examine a relevant application.

All of the Operator help texts provided within RapidMiner Studio are also available online.

Reconfiguring the Design View

To restore the Design View to the default panel setup, select View > Restore Default View.

To optimize your screen real estate, you might consider reorganizing the panels. Notice first that you can right-click the tab connected with any panel, and select one of the following:

  • Detach - The panel is detached from RapidMiner Studio.
  • Maximize - The panel fills the entire space allotted to panels.
  • Close - The panel is removed from the user interface.

If you need more space to read the Help, for example, you can maximize the Help Panel, then click the panel tab a second time to restore the user interface to its original state. If you don't really need a panel, you can close it. A closed panel can be restored via the menu items under View > Show Panel.

Additional configuration is available via drag and drop:

  • Panels can be resized
  • Panels can be moved
  • Panels can be displayed as tabs.

The table below summarizes the available panels. Most of them are not displayed by default; one example is the XML Panel, which displays an XML representation of your process. The default panels are Help, Operators, Parameters, Process, and Repository. To display a panel from this list, select it from the menu under View > Show Panel.

Panel               Description
App Objects         Simulate a RapidMiner AI Hub App environment
Background Monitor  Manage background processes and results
Context             Advanced process settings and macros
Data editor         Offers spreadsheet-like data manipulation
Help                Documentation for the selected Operator
Log                 View recorded events
Macros              Live overview of defined macros
Operators           All Operators available to add to your process
Overview            A zoomed out overview for huge processes
Parameters          Configure Operator behavior in your process
Problems            View potential problems in your process
Process             Create and design your process here
Repository          Manage your data and processes
Resource Monitor    Displays the RAM currently used
Result History      A history of all process results this session
AI Hub Monitor      Processes running on RapidMiner AI Hub
Snapshot History    Show the Snapshot History and status of Projects under version control
Tree                A tree representation of your process
XML                 An XML representation of your process
