You are viewing the RapidMiner Deployment documentation for version 9.8.

Docker image for RapidMiner Server

The documentation below describes the following Docker images:

  • RapidMiner Server (rapidminer/rapidminer-server:latest)
  • RapidMiner Server with Python Environment Manager support (rapidminer/rapidminer-server:latest-python)

These images are available for download on Docker Hub.

Description

This image contains a RapidMiner Server instance, which lets you collaborate and deploy to production, using a scalable architecture. It needs at least one connected Job Agent to be able to execute RapidMiner processes.

For available versions, please see the image tags on Docker Hub.

We created a self-contained deployment for trial and testing purposes, containing an embedded Job Agent and a database. You should only use this to get a feel for RapidMiner Server and its features. Read the specifics of a self-contained deployment below.

Bootstrap procedure

RapidMiner Server persists the user configuration in a database and on the filesystem in the RapidMiner home directory. The image can be used with or without an existing persistent configuration. We implemented an optional initialization procedure and a mandatory startup procedure that is executed on every container startup.

  • Starting without an existing RapidMiner home volume (empty mount point) and with an empty database triggers the initialization procedure, which:

    • Initializes the mounted empty volume with the default RapidMiner Home folder content
    • If provided, checks the externally provided JDBC drivers (/jdbc-drivers volume mount) and the JDBC_JAR_FILENAME and JDBC_DRIVER_CLASS parameters.
    • Configures the JDBC driver module and Data Source definition in RapidMiner Server / JBoss, based on the optionally provided JDBC settings and/or the DBTYPE parameter. We ship default JDBC drivers for PostgreSQL, Microsoft SQL and Oracle databases, but not for MySQL. If you prefer versions other than the shipped default JDBC drivers, you can use the JDBC driver configuration mentioned above to provide your own JDBC driver.
    • Executes the RapidMiner database initialization using its default content (during the RapidMiner Server / JBoss startup using the Hibernate technology)
  • Providing an existing (not empty) RapidMiner home volume and initialized RapidMiner database content will skip the initialization procedure (e.g. skips the JDBC driver initialization, doesn't load the default RapidMiner Home directory and default database content) and continue with the startup procedure:

    • Optionally configure the database connection (host, port, username and password) if provided using external parameters
    • Optionally configure the ActiveMQ credentials (username and password) if provided using external parameters
    • Execute the externally mounted optional bootstrap scripts (/bootstrap.d volume mount)
    • Start RapidMiner Server / JBoss.
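
The branching described above can be summarized in a short sketch. It is an illustrative model only, not the image's actual entrypoint script; the function name plan_bootstrap and the step labels are hypothetical. This page does not describe what happens in a mixed state (for example, an empty home volume together with an initialized database), so the sketch reports that case as unspecified rather than guessing.

```python
from typing import List

# Step labels paraphrase the bullets above; they are not real log messages.
INIT_STEPS = [
    "populate the empty home volume with default RapidMiner Home content",
    "check externally provided JDBC driver settings, if any",
    "configure the JDBC driver module and Data Source",
    # Per the text, this one actually happens during server startup.
    "initialize the database with default content",
]

STARTUP_STEPS = [
    "configure database connection from external parameters, if provided",
    "configure ActiveMQ credentials from external parameters, if provided",
    "run optional scripts mounted under /bootstrap.d",
    "start RapidMiner Server / JBoss",
]


def plan_bootstrap(home_is_empty: bool, db_is_empty: bool) -> List[str]:
    """Return the ordered steps for one container start (hypothetical model)."""
    if home_is_empty and db_is_empty:
        # Fresh deployment: optional initialization, then mandatory startup.
        return INIT_STEPS + STARTUP_STEPS
    if not home_is_empty and not db_is_empty:
        # Existing deployment: initialization is skipped entirely.
        return list(STARTUP_STEPS)
    # Assumption: mixed states are treated as unspecified here.
    raise ValueError("mixed state: behaviour is not described on this page")


for step in plan_bootstrap(home_is_empty=True, db_is_empty=True):
    print(step)
```

Calling plan_bootstrap(False, False) returns only the startup steps, which mirrors the "skip the initialization procedure" path.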

Configuration

  • Volumes
    • /persistent-rapidminer-home: volume mount which stores the RapidMiner home folder, including all the configuration files, extensions, licenses, logs and repository data. See the data persistence chapter below for more details.
    • /bootstrap.d: volume mount for optional startup-time configuration scripts that are executed after the initialization phase, but before the RapidMiner Server (JBoss) startup.
    • /jdbc-drivers: volume to mount externally provided JDBC driver jar files (e.g. for MySQL database)
  • Ports:
    • ports 1081 and 1082, used for Radoop Proxy communication
    • port 5672, used for ActiveMQ communication
    • port 8080, used for the web interface
  • Environment variables:
    • DBTYPE: if you start the container without an existing database and initialized RapidMiner Home folder, set this variable to one of the supported database platforms (mysql, pgsql, mssql or oracle) so that the initialization procedure runs for the required database type. Default value: "pgsql". If you select MySQL as the database type, you also need to provide the JDBC driver jar file, which is not packaged with our components because of its licensing conditions.
    • DBHOST, DBPORT, DBUSER, DBPASS, DBSCHEMA: set these variables to configure RapidMiner Server to connect to an external database. If the provided database is empty, it will be initialized with an initial RapidMiner Server database.
    • JDBC_JAR_FILENAME: if you mount an external JDBC driver jar file to the /jdbc-drivers mount point, then you should provide its filename (just the filename, not the full path) during the initialization procedure of the container in order to pick up and use that driver.
    • JDBC_DRIVER_CLASS: this parameter can optionally be provided together with JDBC_JAR_FILENAME. If set, this Java class is specified in the RapidMiner Server DataSource definition. If you do not specify anything, our bootstrap procedure tries to guess the Java class name based on the provided JDBC jar file.
    • BROKER_ACTIVEMQ_USERNAME and BROKER_ACTIVEMQ_PASSWORD: credentials for the ActiveMQ service used for communication between the Server and its connected Job Agents
    • JOBSERVICE_AUTH_SECRET: authentication secret used for communication between the Server and its connected Job Agents. You have to provide the same secret for the Job Agents to enable communication between them. A Base64 encoded string is expected here.
    • PROXY_INTERNALPROXIES, PROXY_HTTP_PORT and PROXY_HTTPS_PORT: if RapidMiner Server runs behind a reverse proxy, set these values to tell it about the proxy and its ports. RapidMiner Server uses them to build external URLs (e.g. redirects).
    • INTERACTIVE_MODE: setting this variable to "1" will start an interactive bash shell without starting the RapidMiner Server process. The server can be configured, plugins can be installed and afterwards the RapidMiner Server process can be started manually. Intended for troubleshooting purposes only.
    • TZ: time zone specification based on the TZ database format (e.g. America/New_York)
    • PA_BASE_URL: the HTTP URL used to access the Platform Admin component. Only available in the -python versions of the image.
    • SSO_PUBLIC_URL, SSO_IDP_REALM, SSO_CLIENT_ID, SSO_CLIENT_SECRET: RapidMiner Identity and Security configuration. Filled automatically by the init service.
    • SERVER_MAX_MEMORY: the maximum amount of memory that RapidMiner Server can use.
    • LEGACY_REST_BASIC_AUTH_ENABLED: allows HTTP basic authentication on Server REST endpoints. Defaults to false. Enabling it should only be considered for legacy reasons as it lowers security.
    • JUPYTER_URL_SUFFIX, GRAFANA_URL_SUFFIX: the external URL suffixes where JupyterHub and Dashboards are accessible.
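
To show how the volumes, ports and environment variables listed above fit together, here is a minimal sketch that assembles a docker run argument list without executing it. The helper name docker_run_command, the host path, the database host and the credentials are all hypothetical placeholders; only the image name, the mount point, the ports and the variable names come from this page.

```python
import shlex
from typing import Dict, List

IMAGE = "rapidminer/rapidminer-server:latest"  # image name from this page


def docker_run_command(home_dir: str, env: Dict[str, str], ports: List[int]) -> List[str]:
    """Assemble a `docker run` argument list (the command is not executed)."""
    cmd = ["docker", "run", "--detach"]
    # Persist the RapidMiner home folder on the host.
    cmd += ["--volume", f"{home_dir}:/persistent-rapidminer-home"]
    for port in ports:
        # Publish each container port on the same host port.
        cmd += ["--publish", f"{port}:{port}"]
    for name, value in sorted(env.items()):
        cmd += ["--env", f"{name}={value}"]
    cmd.append(IMAGE)
    return cmd


command = docker_run_command(
    home_dir="/srv/rapidminer-home",  # hypothetical host path
    env={
        "DBTYPE": "pgsql",
        "DBHOST": "db.example.internal",  # hypothetical database host
        "DBPORT": "5432",
        "DBUSER": "rmserver",  # hypothetical credentials
        "DBPASS": "change-me",
        "DBSCHEMA": "rmserver",
        "TZ": "America/New_York",
    },
    ports=[8080, 5672],  # web interface and ActiveMQ
)
print(shlex.join(command))
```

Printing the command instead of running it makes it easy to review the settings first. In a real deployment, passwords would normally be supplied through an env file or a secrets mechanism rather than on the command line.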

Data persistence

The RapidMiner home directory stores all the data and configuration connected with the RapidMiner Server image.

To make this data persistent, make sure to start the container with a volume mounted on the mount point /persistent-rapidminer-home.

  • If the mounted volume is empty, then a default configuration and data content will be propagated to it for use by RapidMiner Server during the initialization phase.
  • If the volume contains data from any previous executions, then the server will be started with that data.

This volume will contain all the configuration files, extensions, licenses, logs and repository data. After the first execution (with a mounted empty volume), the following data can be edited:

  • Extensions can be installed by adding them to the folder <volume>/resources/extensions
  • Licenses can be installed in <volume>/resources/licenses
  • The configuration can be tuned via files stored in <volume>/configuration
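
As an illustration of the first item, the following sketch copies an extension jar into the extensions folder of an already initialized volume. The function name install_extension and the example paths are hypothetical. The sketch deliberately refuses to create the folder itself, on the assumption that a missing folder means the volume has not yet been initialized by a first container start.

```python
import shutil
from pathlib import Path


def install_extension(volume: Path, extension_jar: Path) -> Path:
    """Copy an extension jar into <volume>/resources/extensions."""
    target_dir = volume / "resources" / "extensions"
    if not target_dir.is_dir():
        # Assumption: the folder exists once the volume has been initialized.
        raise FileNotFoundError(f"{target_dir} not found; was the volume initialized?")
    # copy2 keeps file metadata and returns the path of the new file.
    return Path(shutil.copy2(extension_jar, target_dir))


# Example call with hypothetical paths:
# install_extension(Path("/srv/rapidminer-home"), Path("my-extension.jar"))
```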

Good to know

  • RapidMiner Server requires at least 8GB of memory. On Windows hosts, please make sure that the Docker Engine is configured to run with enough memory.
  • The licenses mount point should be a standard RapidMiner licenses folder, containing the license files in subfolders named rapidminer-server, rapidminer-studio, radoop.
  • To mount volumes on a Windows system you should pay attention to the Windows-specific Docker volume mount settings:
    • Make sure the drive is shared in the Docker settings
    • If using docker-compose, consider setting the environment variable "COMPOSE_CONVERT_WINDOWS_PATHS=1"
    • Make sure that Docker can read and write the mounted files and folders

Self-contained deployments

This is intended only for quick single-user trials on one’s own computer. Data persistence is not implemented in these deployments. For all other purposes start from one of our deployment templates.

Self-contained deployments use an embedded Job Agent and database to get the job done. To start a self-contained deployment, you need to specify the following environment variables:

  • EMBEDDED_DATABASE: set this to "1" in order to start the embedded PostgreSQL database server in the container. Note that data persistence using an embedded database server is not implemented.
  • BUNDLED_JOB_AGENT: set this to "1" in order to start the bundled Job Agent
  • JOBAGENT_QUEUE_ACTIVEMQ_USERNAME, JOBAGENT_QUEUE_ACTIVEMQ_PASSWORD: credentials for the ActiveMQ service used for communication between the Server and the bundled Job Agent
  • JOBAGENT_AUTH_SECRET: used to configure the bundled Job Agent's authentication secret. A Base64 encoded string is expected here.
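
The variables above can be collected into one environment mapping, sketched below under stated assumptions. The helper names new_auth_secret and self_contained_env are hypothetical, and the 32-byte secret length is an arbitrary choice, since this page only says that a Base64 encoded string is expected. Reusing the same ActiveMQ credentials and the same secret for the Server and the bundled Job Agent is also an assumption, based on the requirement that the Job Agents must be given the same secret as the Server.

```python
import base64
import secrets
from typing import Dict


def new_auth_secret(num_bytes: int = 32) -> str:
    """Return cryptographically random bytes encoded as a Base64 string."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def self_contained_env(mq_user: str, mq_password: str) -> Dict[str, str]:
    """Build the environment for a trial-only, self-contained container."""
    auth_secret = new_auth_secret()
    return {
        "EMBEDDED_DATABASE": "1",
        "BUNDLED_JOB_AGENT": "1",
        # Server side of the ActiveMQ connection.
        "BROKER_ACTIVEMQ_USERNAME": mq_user,
        "BROKER_ACTIVEMQ_PASSWORD": mq_password,
        # Bundled Job Agent side; assumed to match the Server's credentials.
        "JOBAGENT_QUEUE_ACTIVEMQ_USERNAME": mq_user,
        "JOBAGENT_QUEUE_ACTIVEMQ_PASSWORD": mq_password,
        # The same secret on both sides so the Job Agent can authenticate.
        "JOBSERVICE_AUTH_SECRET": auth_secret,
        "JOBAGENT_AUTH_SECRET": auth_secret,
    }


env = self_contained_env("amq-user", "amq-pass")  # hypothetical credentials
for name in sorted(env):
    print(name)
```

Only the variable names are printed so that the generated secret does not end up in terminal history or logs.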