You are viewing the RapidMiner Hub documentation for version 2024.0.

Kubernetes

The provided Docker Images are ready to deploy to any Kubernetes Cluster.

Please review the configuration below according to your environment and requirements.

The following guide requires a running Kubernetes cluster.

Our example configuration was tested in the following Kubernetes services:

Deployment architecture and definition

This tutorial covers a multi-container deployment on Kubernetes with the following components:

  • Real-Time Scoring Agent,
  • Real-Time Scoring Web UI,
  • a frontend proxy, and
  • a cron container.

To deploy Real-Time Scoring on Kubernetes, you need to define the services, volumes and deployments.

Volumes

The Volumes configuration uses four persistent volumes, similar to those described in the Docker-based deployment section:

  1. A volume for uploaded files storage
  2. A volume for cron log files storage
  3. A volume for license data storage
  4. A volume for the deployments of the Real-Time Scoring

To define the volumes, you can apply the following Kubernetes Object Configuration YAML file.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rapidminer-uploaded-pvc
  labels:
    app: rapidminer-webui
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rapidminer-cron-log-pvc
  labels:
    app: rapidminer-cron
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rts-licenses-pvc
  labels:
    app: rapidminer-rts
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100M

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rts-deployments-pvc
  labels:
    app: rapidminer-rts
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Services

To deploy the example configuration, 3 Kubernetes Service Endpoints are defined:

  1. The public proxy service endpoint represents the public web interface of the proxy container (port: 443).
  2. The private WebUI service endpoint represents the private web interface of the WebUI container (port: 81, as set in the service definition below).
  3. The private Real-Time Scoring service endpoint represents the private web interface of the Real-Time Scoring container (port: 8090).

Note:

  • The public endpoint definition may differ between Kubernetes clusters. Public cloud providers support the LoadBalancer type, but the MicroK8S implementation requires setting up an Ingress to enable public access.
  • When testing in MiniKube, the annotation block and the type: LoadBalancer line can be omitted. Please read the Notices about minikube.
  • It is strongly recommended to use a valid certificate. The sample service definition contains recommended settings to set up an AWS load balancer for HTTPS offloading with AWS Certificate Manager. For usage in a protected network, or for testing (e.g. MiniKube or MicroK8S), the annotation block can be omitted and NodePort can be used for all the services.

To define the service endpoints, you can apply the following Kubernetes Object Configuration YAML file:

kind: Service
apiVersion: v1
metadata:
  name: rapidminer-proxy
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:XX-XXXX-X:XXXXXXXXXXXX:certificate/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: "ELBSecurityPolicy-TLS-1-2-2017-01"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "false"
    service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "Name=rapidminer-rts-elb"
  labels:
    app: real-time-scoring-webui
    role: webui
spec:
  type: LoadBalancer
  ports:
  - name: rts-proxyhttp
    port: 443
    protocol: TCP
    targetPort: rts-proxy-http
  selector:
    app: real-time-scoring-webui
    role: webui

---

kind: Service
apiVersion: v1
metadata:
  name: real-time-scoring-webui
  labels:
    app: real-time-scoring-webui
    role: webui
spec:
  ports:
  - name: rts-webuiport
    port: 81
    protocol: TCP
    targetPort: rts-webuiport
  selector:
    app: real-time-scoring-webui
    role: webui

---

kind: Service
apiVersion: v1
metadata:
  name: real-time-scoring-agent
  labels:
    app: real-time-scoring-agent
    role: real-time-scoring
spec:
  ports:
  - name: rts-scoreport
    port: 8090
    protocol: TCP
    targetPort: rts-scoreport
  selector:
    app: real-time-scoring-agent
    role: real-time-scoring
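
For clusters where public access goes through an Ingress (the MicroK8S case mentioned in the notes above), the following is a minimal sketch of what such an object could look like. It is not part of the tested example configuration: the Ingress name and the host name are hypothetical, it assumes an ingress controller is already running in the cluster, and it assumes the rapidminer-proxy service is defined without the AWS annotations and without type: LoadBalancer.

```shell
#!/bin/sh
# Print an Ingress manifest that routes a host name to the rapidminer-proxy Service.
# Assumptions (not from the original guide): the Ingress name
# "rapidminer-proxy-ingress", the host name passed as $1, and an ingress
# controller that is already installed in the cluster.
render_proxy_ingress() {
  host="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rapidminer-proxy-ingress
spec:
  rules:
  - host: ${host}
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rapidminer-proxy
            port:
              number: 443
EOF
}
```

A possible usage is render_proxy_ingress rts.example.com | kubectl apply -f - (the host name is a placeholder). Service port 443 forwards to the proxy container's rts-proxy-http port, so the controller talks plain HTTP to the backend. TLS termination on the Ingress itself would need an additional tls section, which is left out of this sketch.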

Deployments (Pods, Containers)

The containers are deployed using the Kubernetes Deployment object type, which provides replication. In this example, one replica of each type is started.

The environment variables are defined based on the Docker Image documentation.

The example configuration defines the following two deployments:

  • The Real-Time Scoring Agent pod is defined with the following configuration. The rts-deployments-pvc is used to persist the scoring deployments.

Because sharing volumes between Kubernetes pods can be difficult to set up and maintain, the example configuration below downloads the licensing information from the WebUI at container startup.

Please review the resource limits and adjust them to your hardware capabilities.

To constrain a pod to a particular worker node, first add a label to that node, then reference the label in the nodeSelector property of the deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: real-time-scoring-agent
  labels:
    app: real-time-scoring-agent
    role: real-time-scoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: real-time-scoring-agent
  template:
    metadata:
      labels:
        app: real-time-scoring-agent
        role: real-time-scoring
    spec:
      containers:
      - name: real-time-scoring-agent
        image: rapidminer/rapidminer-execution-scoring:latest
        ports:
        - name: rts-scoreport
          containerPort: 8090
        env:
        - name: WAIT_FOR_LICENSES
          value: "1"
        - name: MANAGEMENT_API_ENDPOINT
          value: "http://real-time-scoring-webui:81/"
        resources:
          requests:
            memory: "2G"
            cpu: "1"
          limits:
            memory: "32G"
            cpu: "1"
        volumeMounts:
        - name: rts-deployments-pv
          mountPath: /rapidminer-scoring-agent/home/deployments
      volumes:
      - name: rts-deployments-pv
        persistentVolumeClaim:
          claimName: rts-deployments-pvc
#      nodeSelector:
#        node-label-name: label-value-of-worker-node-where-rts-may-started

  • The Real-Time Scoring WebUI pod is defined with the following configuration. The rapidminer-uploaded-pvc, rapidminer-cron-log-pvc, and rts-licenses-pvc are used to persist the uploaded files, logs, and licenses.

Because sharing volumes between Kubernetes pods can be difficult to set up and maintain, the example pod configuration below contains 3 containers. Kubernetes always schedules the containers of a pod on the same worker node, so they can share volumes.

The resource limits are included for reference; these containers are not resource intensive.

To control which worker node Kubernetes starts the pod on, first add a label to a worker node of the cluster, then reference the label in the nodeSelector property of the deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: real-time-scoring-webui
  labels:
    app: real-time-scoring-webui
    role: webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: real-time-scoring-webui
  template:
    metadata:
      labels:
        app: real-time-scoring-webui
        role: webui
    spec:
      containers:
      - name: rapidminer-cron
        image: rapidminer/rapidminer-real-time-scoring-cron:latest
        resources:
          requests:
            memory: "100M"
            cpu: "0.5"
          limits:
            memory: "200M"
            cpu: "0.5"
        volumeMounts:
        - name: rapidminer-uploaded-pv
          mountPath: /rapidminer/uploaded/
        - name: rapidminer-cron-log-pv
          mountPath: /var/log/
        - name: rts-licenses-pv
          mountPath: /rapidminer/rts_home/licenses/
      - name: real-time-scoring-webui
        image: rapidminer/rapidminer-real-time-scoring-webui:latest
        ports:
        - name: rts-webuiport
          containerPort: 81
        resources:
          requests:
            memory: "200M"
            cpu: "0.5"
          limits:
            memory: "500M"
            cpu: "0.5"
        volumeMounts:
        - name: rapidminer-uploaded-pv
          mountPath: /var/www/html/uploaded
      - name: rapidminer-proxy
        image: rapidminer/rapidminer-real-time-scoring-proxy:latest
        ports:
        - name: rts-proxy-http
          containerPort: 80
        resources:
          requests:
            memory: "200M"
            cpu: "1"
          limits:
            memory: "200M"
            cpu: "1"
        volumeMounts:
        - name: rapidminer-uploaded-pv
          mountPath: /rapidminer/uploaded
          readOnly: true
      volumes:
      - name: rapidminer-uploaded-pv
        persistentVolumeClaim:
          claimName: rapidminer-uploaded-pvc
      - name: rapidminer-cron-log-pv
        persistentVolumeClaim:
          claimName: rapidminer-cron-log-pvc
      - name: rts-licenses-pv
        persistentVolumeClaim:
          claimName: rts-licenses-pvc
#      nodeSelector:
#        node-label-name: label-value-of-worker-node-where-rts-may-started

Deployment process

Based on the object definitions shown above, Real-Time Scoring can be deployed with all of its components on a Kubernetes cluster as follows:

  • Make sure that the connection to your Kubernetes Cluster is working
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Create and check the volumes
$ kubectl apply -f volumes.yml
persistentvolumeclaim/rapidminer-uploaded-pvc created
persistentvolumeclaim/rapidminer-cron-log-pvc created
persistentvolumeclaim/rts-licenses-pvc created
persistentvolumeclaim/rts-deployments-pvc created
$ kubectl get pv,pvc
  • Create and check services
$ kubectl apply -f services.yml
service/rapidminer-proxy created
service/real-time-scoring-webui created
service/real-time-scoring-agent created
$ kubectl get svc
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
rapidminer-proxy          ClusterIP   10.103.149.61    <none>        443/TCP,80/TCP   115s
real-time-scoring-agent   ClusterIP   10.104.163.156   <none>        8090/TCP         115s
real-time-scoring-webui   ClusterIP   10.98.219.140    <none>        80/TCP           115s
  • Create Deployments
$ kubectl apply -f real-time-scoring-agent.yml
deployment.apps/real-time-scoring-agent created
$ kubectl apply -f real-time-scoring-webui.yml
deployment.apps/real-time-scoring-webui created
  • Check the running deployments
$ kubectl get pod
NAME                                       READY   STATUS    RESTARTS   AGE
real-time-scoring-agent-85c57b9675-6l2fv   1/1     Running   0          6m6s
real-time-scoring-webui-66799d6b74-7c8j9   3/3     Running   0          6m6s
  • Check the logs of a running Real-Time Scoring Agent container/pod (replace the pod names with the ones shown in the output of your get pod command above)
$ kubectl logs -f real-time-scoring-agent-85c57b9675-6l2fv
...
[INFO] Waiting for license synchronization.... Please upload your licenses on the Web UI
[INFO] Waiting for license synchronization.... Please upload your licenses on the Web UI
...

For the real-time-scoring-webui pod, the command is slightly different: because the pod contains 3 containers, the container must be specified in the command too:

$ kubectl logs -f real-time-scoring-webui-66799d6b74-7c8j9 -c rapidminer-proxy
...
[entrypoint.sh] Mandatory file missing, waiting...
[entrypoint.sh] Starting nginx...
2019/09/02 15:07:46 [warn] 18#18: "ssl_stapling" ignored, issuer certificate not found for certificate "/rapidminer/uploaded/certs/validated_cert.crt"
nginx: [warn] "ssl_stapling" ignored, issuer certificate not found for certificate "/rapidminer/uploaded/certs/validated_cert.crt"
$ kubectl logs -f real-time-scoring-webui-66799d6b74-7c8j9 -c real-time-scoring-webui
...
[Mon Sep 02 15:07:11.778314 2019] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
$ kubectl logs -f real-time-scoring-webui-66799d6b74-7c8j9 -c rapidminer-cron
...
[entrypoint.sh] Starting cron...
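
The apply steps above can also be collected into one small script. This is a sketch, not an official installer: the function name and the 300 second timeout are arbitrary choices, and the file names are the ones used in this guide.

```shell
#!/bin/sh
# Apply the manifests in the order used above, then wait for both Deployments.
# The 300s timeout is an arbitrary value chosen for this sketch.
deploy_rts() {
  for manifest in volumes.yml services.yml \
                  real-time-scoring-agent.yml real-time-scoring-webui.yml; do
    kubectl apply -f "$manifest" || return 1
  done
  for deployment in real-time-scoring-agent real-time-scoring-webui; do
    kubectl rollout status "deployment/${deployment}" --timeout=300s || return 1
  done
}
```

Calling deploy_rts from the directory that contains the four files stops at the first failing step. A finished rollout only means the pods are running; the agent still waits for license synchronization, as its log output above shows.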

From here, how you connect to the Web UI depends on your installation:

  • When deploying to a cloud provider's Kubernetes cluster, you will see a new LoadBalancer in your resources list,
  • With MicroK8S you have to define an Ingress,
  • With MiniKube please look at the end of the Notices about minikube section.

Please note that by default the proxy container works on port 443 with a self-signed certificate, so you will see a warning when opening the Web UI for the first time. You can bypass the warning and the communication between your browser and the proxy will still be encrypted, but it is strongly recommended to replace this certificate with a trusted one.
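
A quick command-line check of the Web UI could look like the sketch below. The function name and the host are placeholders. The -k option of curl skips certificate verification, which is only appropriate while the default self-signed certificate is still in place.

```shell
#!/bin/sh
# Print the HTTP status code returned by the Web UI behind the proxy.
# -k disables certificate verification (self-signed certificate only),
# -s silences progress output, -o /dev/null discards the response body.
webui_status() {
  host="$1"
  curl -k -s -o /dev/null -w '%{http_code}\n' "https://${host}/rts-admin/"
}
```

For example, webui_status rts.example.com prints the status code of the /rts-admin/ page. Once a trusted certificate is installed, the -k option should be removed so that certificate problems are reported again.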

Limitations

  • Setting replicas to more than 1 is currently not supported.
  • After a new certificate is deployed, the reverse proxy should be reloaded with the following command:
    kubectl exec -it `kubectl get pods | grep webui | awk '{print $1}'` -c rapidminer-proxy -- /etc/init.d/nginx reload
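
As an alternative to filtering pod names with grep, the pod can be selected by the app label from the deployment definition. This is a sketch that assumes the single-replica setup described in this guide; the function name is hypothetical.

```shell
#!/bin/sh
# Reload nginx in the rapidminer-proxy container, selecting the pod by label
# instead of grepping pod names. Assumes exactly one webui pod exists.
reload_proxy() {
  pod=$(kubectl get pods -l app=real-time-scoring-webui \
        -o jsonpath='{.items[0].metadata.name}') || return 1
  [ -n "$pod" ] || { echo "no webui pod found" >&2; return 1; }
  kubectl exec "$pod" -c rapidminer-proxy -- /etc/init.d/nginx reload
}
```

Run reload_proxy after the new certificate has been deployed.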

Notices about minikube

In the default MiniKube installation, the cluster resources are limited to a very low level. Using the following commands, you can raise these limits permanently:

minikube config set memory 16384
minikube config set cpus 8
minikube config set disk-size 200000MB

If you are using a Linux workstation with Docker installed, you can start MiniKube with the vm-driver none option. In that case, all cluster services and deployed objects run on your existing Docker engine. To set this permanently, use the following command:

minikube config set vm-driver none

When using the vm-driver none option, the minikube API server can be bound to your host:

minikube start --apiserver-ips 127.0.0.1 --apiserver-name localhost

The configuration above takes effect after you run minikube delete and then minikube start.

Minikube does not provide an external load balancer out of the box, so please modify the services.yml file:

  • remove the complete "annotations:" block
  • remove the "type: LoadBalancer" lines
  • add a line "type: NodePort" right after the line "spec:" in every service definition
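
The three modifications listed above can be scripted, for example with awk. This is a sketch that assumes the two-space indentation used by services.yml in this guide; the function name and the output file name in the usage note are hypothetical.

```shell
#!/bin/sh
# Rewrite a services manifest for minikube as described in the list above.
# Relies on the two-space YAML indentation used by services.yml in this guide.
to_nodeport() {
  awk '
    /^  annotations:/       { skip = 1; next }  # drop the annotations key
    skip && /^    /         { next }            # and every line nested under it
                            { skip = 0 }        # any other line ends the block
    /^  type: LoadBalancer/ { next }            # drop the LoadBalancer type
                            { print }           # keep all remaining lines
    /^spec:/                { print "  type: NodePort" }  # add NodePort per Service
  ' "${1:-services.yml}"
}
```

A possible usage is to_nodeport services.yml > services-minikube.yml, followed by kubectl apply -f services-minikube.yml. Writing to a separate file keeps the original definition intact for cloud deployments.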

To find the exposed ports run the following commands after the deployment process is done:

minikube service list
|-------------|-------------------------|----------------------------|
|  NAMESPACE  |          NAME           |           URL              |
|-------------|-------------------------|----------------------------|
| default     | rapidminer-proxy        | http://10.103.149.61:31871 |
| default     | real-time-scoring-agent | http://10.103.149.61:31488 |
| default     | real-time-scoring-webui | http://10.103.149.61:30274 |
|-------------|-------------------------|----------------------------|

Alternatively:

$ kubectl get services | grep proxy
rapidminer-proxy          ClusterIP   10.103.149.61    <none>        443/TCP,80/TCP   8m2s

From the output above, you can open the Web UI in your browser at the following URLs:

  • https://10.103.149.61:443/rts-admin/ (using the HTTPS protocol)
  • http://10.103.149.61:80/rts-admin/ (using the HTTP protocol)

(If you are using the default ports 80 and 443, they can be omitted from the URLs.)