Technology Overview

This page gives background on the deployment technology behind the RapidMiner platform and provides examples of common deployment administration tasks, such as creating deployments, scaling up and down, and adding or removing deployable components.

We publish our components as sets of Docker images (also referred to as services throughout this document). These images are the building blocks for both single and multiple machine deployments.

Single machine deployments

To deploy the RapidMiner platform on a single machine, we use docker-compose.

To deploy using docker-compose you need to have:

  • a target (physical or virtual) machine with Docker installed, and Linux as the preferred host operating system
  • a docker-compose.yml deployment description file that describes the components (services), the networking configuration, the volumes and the environment variables
  • an optional but recommended .env configuration file, where the required parameters can be defined and then referenced from the docker-compose.yml file (this makes deployment configuration easier and less error-prone)

In the docker-compose section we provide docker-compose.yml and .env files for the most common use cases. These templates should work out of the box.
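Before starting a deployment, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of the published templates: check_deployment_dir and the REQUIRED_TOOLS override are hypothetical names introduced here for illustration. It only checks that the tools are on the PATH and that the files exist in the given directory.

```shell
#!/bin/sh
# Illustrative pre-flight check (hypothetical helper, not shipped with the templates).
# Usage: check_deployment_dir /path/to/deployment
check_deployment_dir() {
    dir="${1:-.}"
    missing=0

    # Both CLI tools must be on the PATH. REQUIRED_TOOLS is an assumed
    # override that lets you adapt the list to your environment.
    for tool in ${REQUIRED_TOOLS:-docker docker-compose}; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool"
            missing=1
        fi
    done

    # docker-compose.yml is required.
    if [ ! -f "$dir/docker-compose.yml" ]; then
        echo "missing file: $dir/docker-compose.yml"
        missing=1
    fi

    # .env is optional but recommended, so its absence is only reported.
    if [ ! -f "$dir/.env" ]; then
        echo "note: no .env file in $dir (optional, but recommended)"
    fi

    return "$missing"
}
```

Calling check_deployment_dir . from the directory where you downloaded the files returns a non-zero status if a tool or the docker-compose.yml file is missing.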

Starting and stopping a deployment

To start such a deployment you can use the docker-compose command line tool (from the directory where you downloaded the above files):

  • Start the whole platform (all services):

    docker-compose up -d

  • Start only selected services (e.g. the Postgres database, the RapidMiner Server and a Job Agent):

    docker-compose up -d rm-postgresql-svc rm-server-svc rm-server-job-agent-svc

Here are some examples of how to stop and restart services:

  • Stop every service:

    docker-compose down

  • Restart one of the services (e.g. the RapidMiner Server instance, which is called rm-server-svc) to reload its configuration:

    docker-compose restart rm-server-svc
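After a restart it is usually worth confirming that the service came back up. The sketch below wraps the restart command shown above with docker-compose ps and docker-compose logs. The function name restart_and_inspect and the COMPOSE variable are assumptions made for this example, and the 50-line log tail is an arbitrary choice.

```shell
#!/bin/sh
# Illustrative wrapper (hypothetical helper): restart a service, then show
# its state and the most recent log lines.
COMPOSE="${COMPOSE:-docker-compose}"

restart_and_inspect() {
    service="$1"
    if [ -z "$service" ]; then
        echo "usage: restart_and_inspect SERVICE" >&2
        return 2
    fi

    # Restart the existing container(s) of the service.
    "$COMPOSE" restart "$service" || return 1

    # Show the container state for that service.
    "$COMPOSE" ps "$service"

    # Print the last 50 log lines to spot startup errors.
    "$COMPOSE" logs --tail=50 "$service"
}

# Example (run from the directory containing docker-compose.yml):
#   restart_and_inspect rm-server-svc
```

Note that, according to the Docker Compose documentation, restart does not apply changes made to docker-compose.yml itself; for those, docker-compose up -d recreates the affected containers.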

Scaling a deployment

Scaling typically means changing the number of Job Agent containers in the deployment.

  • Scale up or down one of the services (e.g. the RapidMiner Job Agent services):

    docker-compose up --scale rm-server-job-agent-svc=5 -d
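To confirm that scaling had the intended effect, you can compare the requested replica count with the containers that Compose reports for the service. This is a sketch under assumptions: scale_job_agents and the COMPOSE variable are hypothetical names, and the check simply counts the container IDs printed by docker-compose ps -q.

```shell
#!/bin/sh
# Illustrative helper (hypothetical): scale the Job Agent service and verify
# the resulting container count.
COMPOSE="${COMPOSE:-docker-compose}"

scale_job_agents() {
    replicas="$1"

    # Accept only non-negative integers.
    case "$replicas" in
        ''|*[!0-9]*)
            echo "replica count must be a non-negative integer" >&2
            return 2
            ;;
    esac

    # Same command as above, with the replica count as a parameter.
    "$COMPOSE" up -d --scale "rm-server-job-agent-svc=$replicas" || return 1

    # Count the container IDs Compose reports for the service.
    actual=$("$COMPOSE" ps -q rm-server-job-agent-svc | grep -c . || true)
    echo "requested=$replicas actual=$actual"

    # Succeed only if the counts match.
    [ "$actual" -eq "$replicas" ]
}

# Example:
#   scale_job_agents 5
```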

Customizing a deployment

For each Docker image there can be a list of environment variables that must be set in order to start a container based on that image. We publish the full image reference for each RapidMiner Docker image.

Our published templates are already configured so that the defined services work together. The configuration parameters (environment variables) are externalized to a .env configuration file, where you can fine-tune them. There are two kinds of variables:

  • "global" configuration variables, which are used by multiple services (such as AUTH_SECRET, which is used by the RapidMiner Server, Job Agent and JupyterHub services)
  • "service-specific" configuration variables (like the memory parameters for the Job Agent)
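Because a missing or empty variable in .env can prevent a container from starting, a quick check before bringing the deployment up may save time. The sketch below is an assumption-laden illustration: missing_env_vars is a hypothetical helper, AUTH_SECRET comes from the text above, and JOB_AGENT_MEMORY_MB is an invented placeholder for a service-specific variable. Consult the image reference for the real variable names.

```shell
#!/bin/sh
# Illustrative helper (hypothetical): print every variable name that is not
# defined with a non-empty value in the given .env file.
# Usage: missing_env_vars ENV_FILE NAME...
missing_env_vars() {
    env_file="$1"
    shift

    for name in "$@"; do
        # A variable counts as set if a line starts with NAME= followed by
        # at least one character. Comment lines never match.
        if ! grep -Eq "^${name}=.+" "$env_file"; then
            echo "$name"
        fi
    done
}

# Example .env layout (values and the second name are placeholders):
#   AUTH_SECRET=change-me          # global: shared by several services
#   JOB_AGENT_MEMORY_MB=2048       # service-specific: Job Agent only
#
# Example call:
#   missing_env_vars .env AUTH_SECRET JOB_AGENT_MEMORY_MB
```

An empty output means that all listed variables are present with a value.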

Upgrading a deployment

To upgrade a deployment, edit your docker-compose.yml to reference the newer version of the services you plan to upgrade, then recreate the impacted services by running docker-compose up -d for them (see above); a plain docker-compose restart keeps the existing containers and does not pick up the new image. Be sure to check the image reference for any new configuration parameters that might be needed.
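The upgrade steps can be sketched as a small helper, assuming the image versions in docker-compose.yml have already been edited. upgrade_services and the COMPOSE variable are hypothetical names. The sketch validates the file with docker-compose config -q, pulls the new images, and then lets docker-compose up -d recreate the containers whose image changed.

```shell
#!/bin/sh
# Illustrative helper (hypothetical): apply an upgrade after the image
# versions in docker-compose.yml have been edited.
# Usage: upgrade_services SERVICE...
COMPOSE="${COMPOSE:-docker-compose}"

upgrade_services() {
    if [ "$#" -eq 0 ]; then
        echo "usage: upgrade_services SERVICE..." >&2
        return 2
    fi

    # Validate the edited compose file without printing it.
    "$COMPOSE" config -q || return 1

    # Download the newer images for the selected services.
    "$COMPOSE" pull "$@" || return 1

    # Recreate the containers whose image or configuration changed.
    "$COMPOSE" up -d "$@"
}

# Example:
#   upgrade_services rm-server-svc rm-server-job-agent-svc
```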

Besides the docker-compose CLI tool, you can use the Docker Deployment Manager, a web UI that we ship with our cloud images, to interact with your docker-compose-based deployments.

Multiple machine deployments

For multiple machine deployments, we recommend Kubernetes as the orchestration technology. Our Docker images are ready to deploy to any Kubernetes cluster.

We tested our example configuration with these Kubernetes services:

In the Kubernetes templates section we provide example deployment configurations and tutorials, but the final deployment depends on your requirements.

For each template we provide:

  • the proposed volume definitions, where you can adjust the storage size required by each component
  • the service definitions, which are either internal services used by other services (e.g. the database service) or external ones that are exposed to users (e.g. the RapidMiner Server web UI)
  • the deployment configurations, which are container definitions very similar to the ones used in the docker-compose.yml file above, including the required environment variables

Kubernetes deployment process

Based on the object definitions proposed in the templates section, you can deploy the RapidMiner Server Platform to a Kubernetes cluster:

  • Make sure that the connection to your Kubernetes cluster is working
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
  • Create and check the volumes
$ kubectl apply -f volumes.yaml
persistentvolumeclaim/pgvolume-claim created
persistentvolumeclaim/rmsvolume-claim created
$ kubectl get pvc
$ kubectl get pv
  • Create and check services
$ kubectl apply -f services.yaml
service/rapidminer-server-amq-svc created
service/postgres-svc created
service/rapidminer-server-svc created
$ kubectl get svc
NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
postgres-svc                ClusterIP      10.152.183.3     <none>        5432/TCP         72s
rapidminer-server-amq-svc   ClusterIP      10.152.183.128   <none>        5672/TCP         72s
rapidminer-server-svc       LoadBalancer   10.152.183.252   ******        8080:30661/TCP   72s
  • Deploy services
$ kubectl apply -f database.yaml
pod/database created
$ kubectl apply -f rapidminer-server.yaml
pod/rapidminer-server created
$ kubectl apply -f job-agent.yaml
deployment.apps/job-agent created
  • Check the running pods
$ kubectl get pod
NAME                                          READY   STATUS    RESTARTS   AGE
pod/database                                  1/1     Running   0          41m
pod/job-agent-556b49567b-5cm8n                1/1     Running   0          44s
pod/job-agent-556b49567b-6585h                1/1     Running   0          44s
pod/job-agent-556b49567b-zk44g                1/1     Running   0          44s
pod/rapidminer-server                         1/1     Running   0          40m
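
Instead of polling kubectl get pod by hand, you can block until the objects created above are ready, and scale the Job Agents in the same way as in the docker-compose case. The following is a sketch under assumptions: the function names and the KUBECTL variable are hypothetical, the resource names (pod/database, pod/rapidminer-server, deployment/job-agent) are taken from the output above, and the 300s default timeout is an arbitrary choice.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names) for the example deployment above.
KUBECTL="${KUBECTL:-kubectl}"

# Block until the database and server pods are Ready and the Job Agent
# deployment has finished rolling out.
wait_for_platform() {
    wait_timeout="${1:-300s}"

    "$KUBECTL" wait --for=condition=Ready \
        pod/database pod/rapidminer-server \
        --timeout="$wait_timeout" || return 1

    "$KUBECTL" rollout status deployment/job-agent --timeout="$wait_timeout"
}

# Change the number of Job Agent pods managed by the deployment.
scale_job_agent_deployment() {
    replicas="$1"
    case "$replicas" in
        ''|*[!0-9]*)
            echo "replica count must be a non-negative integer" >&2
            return 2
            ;;
    esac
    "$KUBECTL" scale deployment/job-agent --replicas="$replicas"
}

# Example:
#   wait_for_platform 120s
#   scale_job_agent_deployment 5
```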