You are viewing the RapidMiner Hub documentation for version 10.0.

Upgrade from AI Hub 9

For a more streamlined approach to migration, including a migration script, see the docker compose-based approach described in Upgrade from 9.10.11 to 10.0.0.

The document that follows describes the migration in more detail, but it does not include a migration script.

You need to be at least on AI Hub version 9.10.4 and use a containerized setup.

This document outlines the migration from AI Hub version 9.10.4 or later (minimum requirement). Migration is disabled by default: the required migration properties need to be enabled explicitly for each application (AI Hub/Server, Job Agent, Scoring Agent). The updated deployment descriptor files come with reasonable defaults that work out of the box. However, some old data sources, such as the old home directories of AI Hub and of the Job Agents, must be accessible to the new setup. This means that the old data and the old database must remain available and running during migration.

Please read this documentation in full before starting!

The AI Hub 10 migration steps do not alter existing data. Instead, they copy data from your old instance and its database to the new instance and its new database. A failed migration therefore does not affect your existing installation, and you can roll back to the state before AI Hub 10 by simply starting up the old instance.

Keep in mind that migration might take quite some time, because data is actually being copied. The time needed depends on the number and size of your Projects, as well as on the number of jobs, schedules and queues.

This guide does not cover how to set up new Job Agents. However, if you previously had multiple queues with multiple Job Agents or Scoring Agents attached to them, you need to re-create the same setup in the new deployment by adapting the deployment's descriptor files, e.g. the docker-compose.yml and .env files.

The following sections outline steps that we strongly advise you to complete before starting the migration to AI Hub/Server 10.

Preparation steps useful in AI Hub 9

AI Hub 10 drops legacy web services, the legacy Repository and Web Apps. Only Scoring Agents, Projects and Dashboards are available. We recommend performing some pre-migration steps in your current installation before actually upgrading to version 10. This makes the later migration process easier and more automated.

The following sections outline steps you should perform upfront in your old instance. Doing so also gives you a fresh, cleaned-up baseline for the migration and for your future work with AI Hub, so that you can take full advantage of collaboration with Projects.

If you choose not to do the steps pre-migration, post-migration will be more complex.

Moving to Projects from Legacy Repository

Projects are the new default for managing your assets and data. Please refer to the Projects section for how to create Projects and add contents to them. This section outlines two approaches for moving to Projects before starting the major version upgrade to AI Hub 10:

  1. utilizing RapidMiner Studio or
  2. using AI Hub/Server legacy repository dump functionality.

For each of these approaches, your users need to later re-create any vault entries for injected parameters of Connections! Legacy repository and Project vaults are different systems.

In contrast to the legacy repository, Projects are self-contained and designed to encapsulate a use case or a project. Please think about how your current legacy repository structure might fit into this concept, and keep in mind that cross-Project sharing of assets or data is not supported. In addition, access is granted per Project and not on a file and folder basis.

We recommend against moving all contents to a single Project!

There's also an extension available for RapidMiner Studio which helps you to set up a proper Project structure. It's called Projects on Marketplace.

With RapidMiner Studio

  • Create the desired Project structure in AI Hub 9 and assign access rights
  • Connect from RapidMiner Studio 9 to your legacy repository
  • Clone the Projects you've created in RapidMiner Studio 9
  • In RapidMiner Studio 9, copy desired legacy repository contents needed for your Projects to the respective newly cloned Project and create Snapshots to push the contents to AI Hub/Server

Keep in mind that users who have assigned injected values for Connections need to enter their credentials again in the Project's contents interface. This step is optional at this point, because it can also be done after the migration has succeeded.

In addition, you might need to look into each process and ensure that it's using the correct paths for storing and retrieving assets.

With Server dump functionality

This approach has the major downside of not dumping Connections, but it can still serve to move other assets and data to a Project. Please keep in mind the conceptual changes and the notion of a Project described above.

  • Create the desired Project structure in AI Hub 9 and assign access rights
  • Go to the Repository web interface of AI Hub/Server 9 and
    • Click Download in the right menu
    • Click ZIP dump, which creates a ZIP file containing the contents of the folder you are currently browsing
  • Go to the desired Project's contents page, click Add Content, select the previously downloaded ZIP file and click the Add button

As outlined above, Connections are not dumped, as this would pose a security risk. Please re-create them manually inside RapidMiner Studio 9 or copy them over as outlined in the previous section on moving to Projects with RapidMiner Studio.

In addition, you might need to look into each process and ensure that it's using the correct paths for storing and retrieving assets.

Moving to Scoring Agents

After moving to Projects, you can create Deployments from the AI Hub/Server web interface when browsing the contents of a Project and add them to your Scoring Agents.

Please refer to the Scoring Agents documentation for how to create and deploy deployments.

Moving existing web services to Scoring Agents affects how your process takes input and produces output. For further information, please consult the example creation of a deployment and its process inside the Scoring Agents section.

Moving to Dashboards

Dashboards are backed by Grafana and can point to any Scoring Agent instance. Please refer to the Dashboards and Scoring Agents sections for how to create deployments and wire them into Dashboards.

Requirements

  • Your old instance has at least version 9.10.4
  • Your old instance is preferably on a platform/docker setup
  • Ensure that your environment has enough disk space left, because existing source migration data is not altered, but copied
    • Projects data including large files
    • Jobs data from Job Agents
    • Deployment data from Scoring Agents
  • Old home directories and old databases are still available, because migration needs to access them
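Because migration copies data rather than moving it, it can help to estimate the required space beforehand. The following is a minimal sketch, not part of AI Hub: the function names and the safety factor are assumptions made for illustration, and the paths you pass in depend entirely on where your old home directories and new volumes live.

```python
import os
import shutil


def directory_size(path):
    """Return the total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so that linked data is not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def has_enough_space(source_dirs, target_dir, safety_factor=1.2):
    """Check whether target_dir has room for a copy of all source_dirs.

    safety_factor is an arbitrary margin chosen for this sketch.
    Returns (enough, required_bytes, free_bytes).
    """
    required = int(sum(directory_size(d) for d in source_dirs) * safety_factor)
    free = shutil.disk_usage(target_dir).free
    return free >= required, required, free
```

Call has_enough_space with the old home directories of AI Hub, the Job Agents and the Scoring Agents as source_dirs and the location backing the new volumes as target_dir. The result is only an estimate: database contents are not included.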

What data is migrated?

Before you start the migration, review the following table, which outlines what data is migrated to AI Hub 10.

Application | Data | Description
Job Agent | Jobs | The jobs directory (logs and error files distributed to AI Hub/Server for the web interface on demand) of the old home directory will be copied to the new home directory. Existing files will be overwritten.
Job Agent | ID file | The .id.properties of the old home directory will be copied to the new home directory. Existing files will be overwritten.
Scoring Agent | Deployments | The deployments directory of the old home will be copied to the home directory. Existing files will be overwritten.
AI Hub/Server | Permissions | The Permissions for Queues and Projects are migrated. The existence of users and groups is verified in the keycloak service.
AI Hub/Server | Queues | The Queues are migrated and created in the new system with the migrated permissions. If a Queue with the same name already exists, the Queue will not be migrated. If no suitable owner is found, the admin will become the owner.
AI Hub/Server | Projects | The Projects are migrated and copied into the home directory into the new system. If a Project with the same name already exists, the Project will not be migrated. If no suitable owner is found, the admin will become the owner.
AI Hub/Server | Injectable Values | The injectable values are migrated if the secret used for encrypting them is valid.
AI Hub/Server | Schedules | The schedules are migrated.
AI Hub/Server | Job Agents | The job agents saved in the old database table JOBSERVICE_JOB_AGENT will be migrated into the new database table.
AI Hub/Server | Jobs | The jobs residing in table JOBSERVICE_JOB will be migrated except for running jobs. Pending jobs will be scheduled. All related job errors, contexts and logs will also be migrated.
AI Hub/Server | Archived Jobs | All jobs residing in the database table A_JOBSERVICE_JOB will be migrated. All related job errors, contexts and logs will also be migrated.
AI Hub/Server | Users | All local users will be migrated and stored inside Keycloak. External users will not be migrated.
AI Hub/Server | Groups | All groups will be migrated. If the group is mirrored, there will be additional logging as to what external group the local group was mirrored to.

Users and Groups

While migrating users and groups, you may encounter additional logging if a user or group was externally managed, that is, by Keycloak or LDAP. External users are not migrated. External groups are still migrated, and the corresponding LDAP/SAML groups are logged so that you can replicate them manually.

At the end of a successful user and group migration, a result file named migrated_users.csv is created inside the rapidminer-server-home directory. This CSV file includes all users and their "new", temporary passwords. For security reasons old passwords are not migrated. When migrated users log into AI Hub/Server for the first time, they will then be asked to change their password.

What data is NOT migrated?

Legacy Repository contents including vault entries, Web Services and Web Apps are not migrated, because support for them has been dropped. Please refer to the prior section on how to move to the latest concepts with Scoring Agents, Projects and Dashboards.

Start migration

To ensure consistency, it's advised to follow these steps in your old instance.

  1. In the running AI Hub/Server 9 instance
    1. Pause all schedules so that no new jobs are submitted anymore
    2. Ensure that no jobs are running or pending; if there are any, stop them manually on the Executions page
  2. Shut down AI Hub 9 instance
  3. Make a proper backup
  4. Ensure that your environment has enough disk space, because existing source migration data is not altered, but copied
    1. Projects data including large files
    2. Jobs data from Job Agents
    3. Deployment data from Scoring Agents

Mirroring deployments and application descriptors

The default deployment descriptors of AI Hub 10, such as docker-compose.yml, assume one Job Agent and one Scoring Agent.

If you added further Job Agents or Scoring Agents in your old instance, your new version 10 deployment descriptor and environment need to reflect those changes to fully mirror the old behavior. Please change the deployment descriptor files accordingly.

You can look up any property inside the respective settings tables for AI Hub/Server, Job Agents and Scoring Agents.

Example

Depending on your previous AI Hub setup, the required environment variables may differ. Please refer to the migration property tables for detailed information about each of them; the example below should serve as a good starting point.

Apply the migration environment variables to your deployment's descriptor file according to the application you want to migrate.

AI Hub/Server

Please ensure that the old home directory is properly bound/mounted to /migration/old-home in the deployment descriptor file, e.g. rm-server-home-vol:/migration/old-home when coming from the default docker-compose 9.x setup.

In this example, PostgreSQL was used in the old instance. Please refer to the database properties section for other database types. Your old database probably needs to join the new setup's network in order to be accessible via <address-or-service-name-of-old-database>.

SPRING_PROFILES_ACTIVE=default,migration10,migration10-postgres
MIGRATION_DATABASE_DATASOURCE_URL=jdbc:postgresql://<address-or-service-name-of-old-database>:5432/rm_server
MIGRATION_DATABASE_DATASOURCE_USERNAME=<your-old-database-username>
MIGRATION_DATABASE_DATASOURCE_PASSWORD=<your-old-database-password>

Job Agent

If you have multiple Job Agents, you need to migrate all of them.

Please ensure that the old home directory is properly mounted to /migration/old-home in the deployment descriptor file.

JOB_AGENT_MIGRATION_ENABLE_PRE_X_MIGRATION=true

Scoring Agent

If you have multiple Scoring Agents, you need to migrate all of them.

Please ensure that the old home directory is properly mounted to /migration/old-home in the deployment descriptor file.

RTS_MIGRATION_ENABLE_PRE_X_MIGRATION=true

Advanced properties

The following table outlines in detail the properties that need to be set in your deployment to invoke the migration steps required for AI Hub/Server 10, Job Agents and Scoring Agents. You may need these properties to extend the example.

AI Hub 10 only supports containerized setups. You need to make your old home directory available at the default location outlined in the following table and ensure that user 2011 has access rights to it. You can also change the default location; in that case, the mounted volume or host bind needs to point to the changed location instead.

In addition, you need to set the MIGRATION_ENABLE_PRE_X_MIGRATION property to true.

Application | Property | Default | Description
AI Hub/Server | MIGRATION_ENABLE_PRE_X_MIGRATION | false (true in migration10 profile) | If migration tasks from pre X installations should be carried out
AI Hub/Server | MIGRATION_ENABLE_PRE_X_DATABASE | false (true in migration10-dbtype profiles) | If migration tasks which need a connection to an old database should be carried out
AI Hub/Server | MIGRATION_ENABLE_PRE_X_SAMPLE_MIGRATION | false | If migration of the sample projects should be carried out
AI Hub/Server | MIGRATION_PRE_X_HOME_DIR | /migration/old-home | The home dir of pre AI Hub X installations
AI Hub/Server | MIGRATION_PRE_X_REPOSITORY_DIRECTORY | /migration/old-home/data/repositories | The dir where repositories have been located inside the old home dir of pre AI Hub X installations
AI Hub/Server | MIGRATION_DATABASE_DATASOURCE | none | The datasource configuration of the old AI Hub database. See documentation of main application database
AI Hub/Server | MIGRATION_DATABASE_JPA | none | The JPA configuration of the old AI Hub database. See documentation of main application database
AI Hub/Server | MIGRATION_QUARTZ | Pre X default values | The Quartz scheduler configuration of the old AI Hub database. If changes were done in the old scheduler configuration they need to be transposed to here as well
Job Agent | JOB_AGENT_MIGRATION_ENABLE_PRE_X_MIGRATION | false | If migration tasks from pre X installations should be carried out
Job Agent | JOB_AGENT_MIGRATION_PRE_X_HOME_DIR | /migration/old-home | The home dir of pre AI Hub X installations
Scoring Agent | RTS_MIGRATION_ENABLE_PRE_X_MIGRATION | false | If migration tasks from pre X installations should be carried out
Scoring Agent | RTS_MIGRATION_PRE_X_HOME_DIR | /migration/old-home | The home dir of pre AI Hub X installations

AI Hub/Server

When using a deployed Docker image, migration is performed at the start of the application. The migration uses the current home directory of AI Hub, which is set by the AIHUB_HOME_DIR environment variable. Migration tasks that handle the migration from a pre AI Hub X home directory are only executed when the MIGRATION_ENABLE_PRE_X_MIGRATION environment variable is set to true. The default directory of MIGRATION_PRE_X_HOME_DIR can be used as a mount point for the old home directory. Migration tasks that use the old database are only carried out if MIGRATION_ENABLE_PRE_X_DATABASE is set to true.

Database

For migration steps, which require the old database, the connection details can be defined with the known Spring Data values. You can set these values as environment variables before starting the migration. Because the database configuration varies between systems, no default values for the connection properties are provided.

If any changes were made to the Hibernate configuration of the pre X AI Hub installation, make sure to carry them over to the MIGRATION_DATABASE_JPA_PROPERTIES_HIBERNATE configuration.

Property | Description
MIGRATION_DATABASE_DATASOURCE_URL | The complete URL of the data source, e.g. jdbc:mysql://localhost:1456/velox?useSSL=false&useUnicode=yes&characterEncoding=UTF-8&allowPublicKeyRetrieval=true
MIGRATION_DATABASE_DATASOURCE_USERNAME | The username of the database user
MIGRATION_DATABASE_DATASOURCE_PASSWORD | The password of the database user
MIGRATION_DATABASE_JPA_PROPERTIES_HIBERNATE_DIALECT | The Hibernate dialect of the old system, e.g. org.hibernate.dialect.MySQL57InnoDBDialect

Please ensure that the old database joins the deployment’s network in order to be accessible, otherwise migration will fail.
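Before starting the migration, you may want to verify that the old database is actually reachable. The following is a minimal sketch, not an AI Hub tool: it only checks that a TCP connection can be opened, the function name can_reach is made up for illustration, and the check is only meaningful when run from inside the new deployment's network (for example from a container attached to it).

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and connects;
        # the socket is closed again when the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For the PostgreSQL example, host would be the address or service name of the old database and port would be 5432. A successful connection does not prove that the credentials or the JDBC URL are correct.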

Migrating from AI Hub/Server 9 instances that use Oracle as their database is not yet supported.

Depending on your old database, you need to enable different profiles with the SPRING_PROFILES_ACTIVE environment variable.

Always provide the default and migration10 profiles and, in addition, the profile for your database type. Please see the example for a full migration setup.

Property | Profile name
Main migration invoked | migration10
Postgres | migration10-postgres
MySQL 5.7 | migration10-mysql-5-7
MySQL 8 | migration10-mysql-8
MSSQL 8 | migration10-mssql-8
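The rule for composing SPRING_PROFILES_ACTIVE can be sketched as a small helper. This is an illustration only: the dictionary keys and the function name are made up for this example, while the profile names are taken from the table above.

```python
# Profile names as listed in the table above; the keys are
# hypothetical labels chosen for this sketch.
DATABASE_PROFILES = {
    "postgres": "migration10-postgres",
    "mysql-5.7": "migration10-mysql-5-7",
    "mysql-8": "migration10-mysql-8",
    "mssql-8": "migration10-mssql-8",
}


def spring_profiles_active(database_type):
    """Build the SPRING_PROFILES_ACTIVE value for a migration run."""
    try:
        db_profile = DATABASE_PROFILES[database_type]
    except KeyError:
        # Oracle, for example, is not supported for migration.
        raise ValueError(f"No migration profile for {database_type!r}") from None
    # Always: default, then the main migration profile, then the database profile.
    return ",".join(["default", "migration10", db_profile])
```

For a PostgreSQL source database, spring_profiles_active("postgres") yields default,migration10,migration10-postgres, which matches the value used in the example.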

Users, groups and permissions

The migration of users, groups and permissions requires the new Keycloak instance. This step uses the service account of aihub-backend.

You need to make sure that it has the realm-management -> manage-users role (for creating groups and users during migration). Please refer to the special roles section for more information.

By default, the role should be applied to the service account.

Job Agent

When using a deployed Docker image, migration is performed at the start of the application. The migration uses the current home directory of the Job Agent, which is set by the JOBAGENT_HOME_DIR environment variable. Migration tasks that handle the migration from a pre AI Hub X home directory are only executed when the JOB_AGENT_MIGRATION_ENABLE_PRE_X_MIGRATION environment variable is set to true. The default directory of JOB_AGENT_MIGRATION_PRE_X_HOME_DIR can be used as a mount point for the old home directory.

Scoring Agent

When using a deployed Docker image, migration is performed at the start of the application. The migration uses the current home directory of the Scoring Agent, which is set by the SCORING_AGENT_HOME_DIR environment variable. Migration tasks that handle the migration from a pre AI Hub X home directory are only executed when the RTS_MIGRATION_ENABLE_PRE_X_MIGRATION environment variable is set to true. The default directory of RTS_MIGRATION_PRE_X_HOME_DIR can be used as a mount point for the old home directory.

Post migration

When the new upgraded version has started up and migration has been completed, there might be some steps you need to perform manually.

Legacy repository contents

If you have not moved to Projects in your old instance before starting the migration process, you'll notice that there are no legacy repository contents available in your new instance. Those are not migrated as support for legacy Repository has been dropped.

It’s advised to move to Projects before migration, but you can also migrate contents manually afterwards by using two different RapidMiner Studio versions, although this manual migration involves more work.

Please read through the Moving to Projects from legacy repository section to see the differences between Projects and the legacy repository on a conceptual level.

Depending on your setup and the availability of your old instance in parallel to the new instance, steps needed slightly differ, but you can achieve the same result by performing the steps sequentially.

During the migration of legacy repository contents, please keep in mind that RapidMiner Studio 10 can only connect to AI Hub 10 instances and RapidMiner Studio 9 can only connect to AI Hub 9 instances. This means that you need to have different RapidMiner Studio versions installed on your system to perform the manual migration step.

  • In RapidMiner Studio 9 and with the old AI Hub/Server 9 instance being up and running
    • Create a new Repository in RapidMiner Studio 9 (local, not connected to an AI Hub/Server instance)
    • Connect to the old AI Hub 9 instance
    • Copy contents to the freshly created Repository, so you have everything on your local disk
  • Create necessary Projects in AI Hub/Server 10
  • Switch to RapidMiner Studio 10 and
    • check out desired Projects
    • copy contents from (local) repository to the Project

Connection injected parameters and vault entries

If you have been working with Projects and created Connections containing injected parameters, your existing vault entries are migrated seamlessly.

If you have been working mostly with the legacy repository, where Connections reside in the top-level global /Connections folder, then you moved to Projects either before or after the major version upgrade. If you used the dump approach, your Connections are most likely not present, and you need to re-create them in RapidMiner Studio 10 with your AI Hub/Server 10 instance running. If your Connections were moved to Projects beforehand, you need to ask users to add their values for any injected parameters manually. They should notice quickly anyway, because process execution fails if required values are missing from the Project's vault.

Additional configuration and system settings

The System Settings web interface of RapidMiner AI Hub/Server 9 has been dropped in version 10. Its main responsibility was to configure the in-application execution of Web Services and Web Apps, along with some parameters of the Server configuration itself. Starting with version 10, the behavior of AI Hub/Server is configured through environment variables. This means that properties containing rapidanalytics are no longer supported.

Please refer to the list of available environment variables in the configuration section of AI Hub/Server 10 to find the ones that match the system settings you explicitly changed.

In contrast to the aforementioned properties, RapidMiner Studio related properties like rapidminer.python_scripting.python_binary or rapidminer.proxy.mode can be modified for AI Hub/Server, Job Agents and Scoring Agents by changing their rapidminer.properties file inside the respective application's home directory or volume.

Scoring Agents provide web service functionality. For securing them or allowing anonymous access, please refer to the Scoring Agent's documentation section.

Moving to Keycloak and adapting mirror groups

If you have been on a native installation of AI Hub/Server 9 with direct LDAP or SAML integration, check the migration.log file for further instructions; for example, mappings of mirror groups are not created automatically. Please also look into the users and groups section and cross-check whether it matches your setup.

If you're already on the platform docker-based setup, then Keycloak migration is done automatically to adapt to changes in special roles and clients. You should cross-check configuration of externalized Identity Provider integration nevertheless.

Clean up deployment descriptor files

After the migration has succeeded, you can remove all migration-related environment variables from your deployment descriptor file, e.g. SPRING_PROFILES_ACTIVE or any environment variable starting with MIGRATION_. They are no longer needed.

In addition, you can shut down your old instance and old database.
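For setups that keep these settings in a .env file, the clean-up of environment variables described above can be sketched as follows. This is an illustrative helper, not part of AI Hub. It assumes simple KEY=value lines, and it makes two choices of its own: it also removes the JOB_AGENT_MIGRATION_ and RTS_MIGRATION_ variables used in this guide, and it strips only the migration10 profiles from SPRING_PROFILES_ACTIVE instead of deleting the whole line, so that other profiles survive. Settings that live directly in docker-compose.yml need to be edited by hand.

```python
# Prefixes of the migration-only variables used in this guide.
MIGRATION_PREFIXES = ("MIGRATION_", "JOB_AGENT_MIGRATION_", "RTS_MIGRATION_")


def clean_env_lines(lines):
    """Return the given .env lines without migration-only settings."""
    cleaned = []
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep and key.startswith(MIGRATION_PREFIXES):
            continue  # drop migration-only variables
        if sep and key == "SPRING_PROFILES_ACTIVE":
            # Keep non-migration profiles; drop the line if none remain.
            kept = [p.strip() for p in value.split(",")
                    if p.strip() and not p.strip().startswith("migration10")]
            if not kept:
                continue
            line = "SPRING_PROFILES_ACTIVE=" + ",".join(kept)
        cleaned.append(line)
    return cleaned
```

Read the file with splitlines(), pass the result to clean_env_lines, review the output, and only then write it back. Keep a copy of the original file until the new instance has been verified.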