You are viewing the RapidMiner Studio documentation for version 2024.0.
Connect to your data
Introduction
To be effective as a data science tool, Altair AI Studio has to first connect to your data.
- If the data is in a file on your computer, Altair AI Studio has to read the file format.
- If the data is in a database, Altair AI Studio has to connect to that database, and know the language of that database (SQL / NoSQL).
- If the data is in the cloud, Altair AI Studio has to connect to the cloud service and know its API.
- If the data is imported from or exported to another software tool, for example Python or Tableau, Altair AI Studio has to know about that tool.
- If the connection is via a proxy or a self-signed SSL certificate, Altair AI Studio has to navigate that hurdle.
The good news is that Altair AI Studio supports a wide range of file formats, databases, cloud services, and other software tools, either natively or via extensions.
Methods
Read more: Database best practices
When processing remote data using Altair AI Studio or Altair AI Hub, there are, broadly speaking, two different approaches:
1. You can download the data from the remote repository and process it in AI Studio / AI Hub, before writing it back again, using operators such as Read Amazon S3, Read Google Storage, Write Database, and Write Salesforce.
2. You can process the data within the remote repository, using the repository's native tools:
   - The Execute SQL operator executes SQL directly in a remote database.
   - The In-Database Processing extension allows you to create native AI Studio processes that are automatically translated into the query language of the remote repository, whether it be Google BigQuery, Oracle, Snowflake, or any other of the supported services.
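The difference between the two approaches can be illustrated outside of AI Studio. The following is a minimal Python sketch, not AI Studio code: an in-memory SQLite database stands in for the remote repository, and the table `sales` and both function names are assumptions made up for this illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a remote database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)


def totals_after_download(conn):
    """Approach 1: fetch every row, then aggregate on the client side."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Approach 2: the database aggregates; only the result is transferred."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query).fetchall())


print(totals_after_download(conn))
print(totals_in_database(conn))
```

Both functions return the same totals. What differs is how much data leaves the database, which is usually the deciding factor when the remote table is large.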
Connection Objects
When the connection to your data occurs over a network, you must first create a connection object. A connection object enables the connection to a database, cloud, or email service. All connection objects are stored in a repository, in the Connections subfolder.
From now on, we'll simply call them connections, remembering however that they have similarities to other objects in the repository. You can, for example, drag a database connection into the Process Panel to Retrieve it, before connecting the output to the Read Database Operator.
To create a connection, right-click on the Connections folder, and select Create Connection. The Create connection dialog opens, and you can configure your connection. If you're connecting to an SQL database:
1. Choose the Connection Type (Database), Repository (where the connection will be stored) and Connection Name.
2. Press Create and the Edit Connection dialog opens.
3. Under the Setup tab, select the Database System and fill in User, Password, Host, Port, and (optionally) the Database name.
4. Press Test connection. Once it's working, Save the connection. The connection will appear in the Connections subfolder of the repository you selected in step (1).
You can view the connection details at any time by double-clicking on the connection in the Repository Panel, or by right-clicking on the connection and choosing Open or Edit.
Injected parameters: sharing connections
Connection objects can be shared.
Suppose that a group of users has access to the same database, and they collaborate on Altair AI Hub. Can they share the database connection, without sharing their usernames and passwords? The answer is yes!
The solution is to build the connection as a template, where all the common parameters are pre-filled, and all the parameters unique to each user are injected. The values of the injected parameters are not stored in the connection object, but retrieved from an external source every time the connection is used. Possible external sources include macros and secure storage on Altair AI Hub.
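The template idea can be sketched in a few lines of Python. This is a conceptual model only, not the way AI Studio stores connections: the class `ConnectionTemplate`, its fields, and the example host are hypothetical, and raising an error on a missing value is a choice made for this sketch rather than documented product behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionTemplate:
    """Hypothetical stand-in for a shared connection object."""
    name: str
    common: dict  # pre-filled values shared by all users
    injected_keys: list = field(default_factory=list)  # values left blank

    def resolve(self, source):
        """Merge the common values with values read from `source` at use time."""
        missing = [key for key in self.injected_keys if key not in source]
        if missing:
            # Assumption of this sketch: a missing injected value is an error.
            raise KeyError(f"missing injected values: {missing}")
        params = dict(self.common)
        params.update({key: source[key] for key in self.injected_keys})
        return params


template = ConnectionTemplate(
    name="shared_db",
    common={"host": "db.example.com", "port": 5432},
    injected_keys=["user", "password"],
)

# Each user brings their own source; the template stores no credentials.
alice_source = {"user": "alice", "password": "alice-secret"}
params = template.resolve(alice_source)
print(params)
```

The point of the design is that `template` can be shared freely, because the user-specific values only exist in the external source and in the resolved result.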
To create a connection in an Altair AI Hub repository, or to copy a connection to an Altair AI Hub repository, a user has to belong to the connection manager group. See Sharing and permissions.
In outline, assuming the database credentials will be securely stored on Altair AI Hub, the whole process of using a connection template might proceed as follows. We'll call the user with the connection manager role the admin.
1. Within Altair AI Studio, the admin creates a connection in an Altair AI Hub repository. While it's possible to create a connection in a local repository, that connection will only provide macros as an injection source.
2. While editing the connection, the admin presses the button Set injected parameters and selects the parameters whose values will be left blank until later (e.g. User and Password). The admin must also choose Altair AI Hub as the source of the injected values.
3. To set the injected values, a user must connect to the web interface of Altair AI Hub. Either click the link displayed in the Edit connection dialog or connect directly to the web interface, then navigate to Repository > Connections, and identify the connection by name. A warning says: This connection has missing values. The user clicks the link, fills in his or her own username and password, and presses the button Save in Altair AI Hub, where the credentials are securely saved. Step (3) needs to be repeated by each individual user.
For more details, read the Altair AI Hub documentation Create connections and Usage and injection.
Macros as a source of injected parameters
Within Altair AI Studio, using values from process macros for your connection settings is immediately possible. When editing a connection, press Set injected parameters and choose which parameters should get values from macros. The macro name then needs to match the parameter key to be able to inject that value. The parameter key can be found in the information next to the parameter.
Configuring the macro source is optional. Without a prefix, the macro name has to match the parameter key exactly. If a prefix is configured, the macro name has to consist of the prefix, followed by an underscore (_), followed by the parameter key. For example, with the prefix myprefix, the parameter key user requires the macro name myprefix_user.
The macro name to use is shown when you set the injection, as well as in the view and edit dialogs. Give your macro this name so that its value is injected into the connection.
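The naming rule can be written down as a small Python sketch. The helpers `macro_name` and `inject_from_macros` are hypothetical names used only for illustration and are not part of any AI Studio API; the dictionary `macros` stands in for the macros defined in a process.

```python
def macro_name(parameter_key, prefix=None):
    """Return the macro name expected for a given parameter key."""
    if prefix:
        return f"{prefix}_{parameter_key}"
    return parameter_key


def inject_from_macros(parameter_keys, macros, prefix=None):
    """Collect values for the injected parameters from a dict of macros."""
    values = {}
    for key in parameter_keys:
        name = macro_name(key, prefix)
        if name in macros:
            values[key] = macros[name]
    return values


# Stand-in for process macros (assumed values).
macros = {"myprefix_user": "alice", "user": "bob"}

print(macro_name("user"))
print(macro_name("user", prefix="myprefix"))
print(inject_from_macros(["user"], macros, prefix="myprefix"))
```

With the prefix myprefix, the lookup picks the macro myprefix_user and ignores the macro user; without a prefix it would be the other way round.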
Placeholders
Placeholders can be used inside any configuration parameter's value to reference other parameters. It is possible to concatenate placeholders and free text. Nesting of placeholders is not supported.
Since the syntax for placeholders is the same as for macros, it is important to make the context clear:
- The context for macros is processes.
- The context for placeholders is connections.
A placeholder can access parameter values from the current tab as well as from any other tab. To find out the key of a field you want to reference via placeholder in a different field, look at the information tooltip of the original field. The Full key is what you're looking for.
To use this placeholder in another field, reference the full key there, preceded by a percent sign (%) and enclosed in curly brackets ({}), like this:
%{db_config.database}
If a placeholder cannot be resolved, it is simply replaced with an empty string, but still counts as an injected value and will not fail the process execution.
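The resolution rules described above (concatenation with free text, no nesting, an empty string for anything unresolved) can be mimicked with a short Python sketch. The function `resolve_placeholders` is a hypothetical illustration, not the implementation used by AI Studio, and the key `db_config.missing` is invented to show the unresolved case.

```python
import re

# Matches %{full.key}; braces inside the key are excluded, so no nesting.
PLACEHOLDER = re.compile(r"%\{([^{}]+)\}")


def resolve_placeholders(value, parameters):
    """Replace each %{full.key} in `value`; unknown keys become ''."""
    return PLACEHOLDER.sub(
        lambda match: str(parameters.get(match.group(1), "")), value
    )


parameters = {"db_config.database": "sales"}

# Placeholder concatenated with free text.
print(resolve_placeholders("schema_%{db_config.database}_v2", parameters))
# Unresolvable placeholder: replaced by an empty string, no error raised.
print(repr(resolve_placeholders("%{db_config.missing}", parameters)))
```

The substitution runs in a single pass, so a value that itself contains placeholder syntax is not resolved a second time.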
The JDBC-based database connections use this mechanism to create the URL from the parameters.
Without parameter information, the URL consists of several placeholders and a double colon. Once the parameters are set, the placeholders are replaced by their values.
You can use placeholders in the same way to configure dynamic parameter values of your own.
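As a rough illustration of how a URL could be assembled from placeholders, consider the following Python sketch. The template string and the keys `db_config.host` and `db_config.port` are assumptions made for this example (only `db_config.database` appears in the text above); the actual URL template and full keys are the ones shown in the connection dialog.

```python
import re

# Hypothetical URL template; the real one is shown in the connection dialog.
URL_TEMPLATE = (
    "jdbc:postgresql://%{db_config.host}:%{db_config.port}/%{db_config.database}"
)


def build_url(template, parameters):
    """Fill %{full.key} placeholders; unset parameters become ''."""
    return re.sub(
        r"%\{([^{}]+)\}",
        lambda match: str(parameters.get(match.group(1), "")),
        template,
    )


parameters = {
    "db_config.host": "db.example.com",  # assumed key and value
    "db_config.port": 5432,              # assumed key and value
    "db_config.database": "sales",
}

print(build_url(URL_TEMPLATE, parameters))
```

With all three parameters set, the sketch prints jdbc:postgresql://db.example.com:5432/sales. With no parameters set, only the fixed parts of the template remain.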