Kubernetes

Our Docker images can be deployed to any Kubernetes cluster. This page provides example deployment configurations and a walkthrough; adapt them to the requirements of your own environment.

The following guide requires a running Kubernetes cluster. We tested our example configuration with these Kubernetes services:

Deployment architecture and definition

In our example, we deploy a PostgreSQL database server, RapidMiner Server, and some Job Agents on Kubernetes.

To deploy RapidMiner Server on Kubernetes, you need to define the services, volumes and pods.

Volumes

Our example configuration uses two persistent volumes:

  1. A volume for the PostgreSQL database data storage
  2. A volume for the RapidMiner Home of the RapidMiner Server

To define the volumes, you can apply the following Kubernetes Object Configuration YAML file.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgvolume-claim
  labels:
    app: database
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rmsvolume-claim
  labels:
    app: rapidminer-server
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
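
The claims above do not name a storage class, so they rely on the cluster's default one. If your cluster has no default storage class, a claim like this usually stays in the Pending state. As a minimal sketch, the RapidMiner Home claim could name a class explicitly. The class name `standard` is an assumption for illustration only; use one that `kubectl get storageclass` lists on your cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rmsvolume-claim
  labels:
    app: rapidminer-server
spec:
  # Hypothetical class name; replace with a storage class that exists on your cluster
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```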

Services

To deploy the example configuration, we specify three Kubernetes Service Endpoints:

  1. The ActiveMQ service endpoint is an internal endpoint that is used by the Job Agents (port: 5672).
  2. The database service endpoint is an internal endpoint that RapidMiner Server uses to connect to the database (port: 5432).
  3. The RapidMiner Server service endpoint represents the public web interface of RapidMiner Server (port: 8080).

Note: the definition of the public endpoint may differ between Kubernetes clusters. Public cloud providers support the LoadBalancer service type, but MicroK8s requires you to set up an Ingress to enable public access.

To define the service endpoints, you can apply the following Kubernetes Object Configuration YAML file:

kind: Service
apiVersion: v1
metadata:
  name: rapidminer-server-amq-svc
  labels:
    app: rapidminer-server-amq-svc
    role: server
spec:
  ports:
  - port: 5672
    targetPort: amq
  selector:
    app: rapidminer-server
    role: server
---
kind: Service
apiVersion: v1
metadata:
  name: postgres-svc
  labels:
    app: database
spec:
  ports:
  - port: 5432
    targetPort: postgresport
  selector:
    app: database
---
kind: Service
apiVersion: v1
metadata:
  name: rapidminer-server-svc
  labels:
    app: rapidminer-server-svc
    role: server
spec:
  ports:
  - port: 8080
    targetPort: rmswebui
  selector:
    app: rapidminer-server
    role: server
  type: LoadBalancer
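
For clusters without LoadBalancer support, such as the MicroK8s case mentioned in the note above, public access could be provided by an Ingress instead. The following is a sketch, not a tested configuration. It assumes an ingress controller is already running in the cluster and that the cluster offers the `networking.k8s.io/v1` API (Kubernetes 1.19 or later); older clusters use a v1beta1 Ingress API with a different backend syntax. The name `rapidminer-server-ingress` is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  # Hypothetical name, chosen for this example
  name: rapidminer-server-ingress
spec:
  rules:
  - http:
      paths:
      # Route all HTTP traffic to the RapidMiner Server web interface
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rapidminer-server-svc
            port:
              number: 8080
```

Depending on the ingress controller, you may also need to set `spec.ingressClassName`. With an Ingress in place, the `rapidminer-server-svc` service no longer needs the LoadBalancer type.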

Pods / Containers

Our example configuration defines the following three workloads:

  • The database pod contains the PostgreSQL container. The pgvolume-claim is used as its persistent volume. We also define a subPath to ensure an empty mount point for the postgres container.
kind: Pod
apiVersion: v1
metadata:
  name: database
  labels:
    app: database
spec:
  containers:
  - name: database
    image: postgres:9.6
    ports:
    - name: postgresport
      containerPort: 5432
    env:
    - name: POSTGRES_DB
      value: rmsdb
    - name: POSTGRES_USER
      value: rmsdbuser
    - name: POSTGRES_PASSWORD
      value: rmsdbpassword
    volumeMounts:
    - name: pgvolume
      mountPath: /var/lib/postgresql/data
      subPath: postgres
  volumes:
  - name: pgvolume
    persistentVolumeClaim:
      claimName: pgvolume-claim
  • The RapidMiner Server container is defined with the following configuration. The environment variables are based on our Docker image documentation. The rmsvolume-claim provides the persistent RapidMiner Home folder. We also define a subPath on the volume to ensure an empty mount point on the first startup, so that the RapidMiner Server container can initialize the RapidMiner Home folder.
kind: Pod
apiVersion: v1
metadata:
  name: rapidminer-server
  labels:
    app: rapidminer-server
    role: server
spec:
  containers:
  - name: rapidminer-server
    image: rapidminer/rapidminer-server:9.3.0
    ports:
    - name: rmswebui
      containerPort: 8080
    - name: amq
      containerPort: 5672
    env:
    - name: JOBSERVICE_QUEUE_ACTIVEMQ_USERNAME
      value: amq-user
    - name: JOBSERVICE_QUEUE_ACTIVEMQ_PASSWORD
      value: amq-pass
    - name: JOBSERVICE_AUTH_SECRET
      value: c29tZS1hdXRoLXNlY3JldAo=
    - name: DBHOST
      value: postgres-svc
    - name: DBSCHEMA
      value: rmsdb
    - name: DBUSER
      value: rmsdbuser
    - name: DBPASS
      value: rmsdbpassword
    volumeMounts:
    - name: rmsvolume
      mountPath: /persistent-rapidminer-home
      subPath: rapidminer-home
  volumes:
  - name: rmsvolume
    persistentVolumeClaim:
      claimName: rmsvolume-claim
  • The Job Agent containers are deployed using a Kubernetes Deployment object, which provides replication and starts three instances in our example.
kind: Deployment
apiVersion: apps/v1
metadata:
  name: job-agent
  labels:
    app: job-agent
    role: execution
spec:
  replicas: 3
  selector:
    matchLabels:
      app: job-agent
  template:
    metadata:
      labels:
        app: job-agent
        role: execution
    spec:
      containers:
      - name: job-agent
        image: rapidminer/rapidminer-execution-jobagent:9.3.0
        env:
        - name: RAPIDMINER_SERVER_HOST
          value: rapidminer-server-svc
        - name: RAPIDMINER_SERVER_PORT
          value: '8080'
        - name: JOBAGENT_QUEUE_ACTIVEMQ_URI
          value: failover:(tcp://rapidminer-server-amq-svc:5672)
        - name: JOBAGENT_QUEUE_ACTIVEMQ_USERNAME
          value: amq-user
        - name: JOBAGENT_QUEUE_ACTIVEMQ_PASSWORD
          value: amq-pass
        - name: JOBAGENT_AUTH_SECRET
          value: c29tZS1hdXRoLXNlY3JldAo=
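
The definitions above store passwords as plain-text environment values, which keeps the example short. As a hedged alternative, the database password could be kept in a Kubernetes Secret and referenced from the pod. The sketch below repeats the database pod with only the POSTGRES_PASSWORD entry changed; the Secret name `rms-db-credentials` and its key `password` are assumptions made for this example.

```yaml
# Secret holding the database password (name and key are hypothetical)
apiVersion: v1
kind: Secret
metadata:
  name: rms-db-credentials
type: Opaque
stringData:
  password: rmsdbpassword   # replace with your own value
---
# Database pod as defined above, reading POSTGRES_PASSWORD from the Secret
kind: Pod
apiVersion: v1
metadata:
  name: database
  labels:
    app: database
spec:
  containers:
  - name: database
    image: postgres:9.6
    ports:
    - name: postgresport
      containerPort: 5432
    env:
    - name: POSTGRES_DB
      value: rmsdb
    - name: POSTGRES_USER
      value: rmsdbuser
    - name: POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          name: rms-db-credentials
          key: password
    volumeMounts:
    - name: pgvolume
      mountPath: /var/lib/postgresql/data
      subPath: postgres
  volumes:
  - name: pgvolume
    persistentVolumeClaim:
      claimName: pgvolume-claim
```

The same `secretKeyRef` pattern could be applied to DBPASS in the RapidMiner Server pod and to the ActiveMQ and auth secret values shared with the Job Agents.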

Deployment process

Based on the object definitions shown above, you can deploy RapidMiner Server on a Kubernetes cluster together with its database and Job Agents:

  • Make sure that the connection to your Kubernetes Cluster is working
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
  • Create and check the volumes
$ kubectl apply -f volumes.yaml
persistentvolumeclaim/pgvolume-claim created
persistentvolumeclaim/rmsvolume-claim created
$ kubectl get pvc
$ kubectl get pv
  • Create and check services
$ kubectl apply -f services.yaml
service/rapidminer-server-amq-svc created
service/postgres-svc created
service/rapidminer-server-svc created
$ kubectl get svc
NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
postgres-svc                ClusterIP      10.152.183.3     <none>        5432/TCP         72s
rapidminer-server-amq-svc   ClusterIP      10.152.183.128   <none>        5672/TCP         72s
rapidminer-server-svc       LoadBalancer   10.152.183.252   ******        8080:30661/TCP   72s
  • Deploy the database, RapidMiner Server, and Job Agents
$ kubectl apply -f database.yaml
pod/database created
$ kubectl apply -f rapidminer-server.yaml
pod/rapidminer-server created
$ kubectl apply -f job-agent.yaml
deployment.apps/job-agent created
  • Check the running pods
$ kubectl get pod
NAME                                          READY   STATUS    RESTARTS   AGE
database                                      1/1     Running   0          41m
job-agent-556b49567b-5cm8n                    1/1     Running   0          44s
job-agent-556b49567b-6585h                    1/1     Running   0          44s
job-agent-556b49567b-zk44g                    1/1     Running   0          44s
rapidminer-server                             1/1     Running   0          40m
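
The individual `kubectl apply -f` calls above could also be bundled into a single step. As a sketch, a `kustomization.yaml` placed in the same directory as the five files used in this guide lists them as resources, and `kubectl apply -k .` (available from kubectl 1.14) then applies them together. This file is not part of our example configuration; it is an optional convenience.

```yaml
# kustomization.yaml: bundles the example files from this guide
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - volumes.yaml
  - services.yaml
  - database.yaml
  - rapidminer-server.yaml
  - job-agent.yaml
```

After applying, the same `kubectl get pvc`, `kubectl get svc`, and `kubectl get pod` checks shown above can be used to verify the result.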