You are viewing the RapidMiner Hub documentation for version 2024.0.
Kubernetes
The provided Docker Images are ready to deploy to any Kubernetes Cluster.
Review the configuration below and adapt it to your environment and requirements.
The following guide requires a running Kubernetes cluster.
Our example configuration was tested on the following Kubernetes services:
- Amazon Elastic Kubernetes Service (Amazon EKS)
- Azure Kubernetes Service (AKS)
- MiniKube (please read the Notices about minikube section)
- MicroK8S
Deployment architecture and definition
This tutorial covers a multi-container deployment on Kubernetes with the following components:
- the Real-Time Scoring Agent,
- the Real-Time Scoring Web UI,
- a frontend proxy, and
- a cron container.
To deploy Real-Time Scoring on Kubernetes, you need to define the services, volumes and deployments.
Volumes
The volume configuration uses four persistent volumes, similar to those described in the Docker-based deployment section:
- A volume for uploaded files
- A volume for cron log files
- A volume for license data
- A volume for the Real-Time Scoring deployments
To define the volumes, you can apply the following Kubernetes object configuration YAML file (named volumes.yml in the deployment process below):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rapidminer-uploaded-pvc
  labels:
    app: rapidminer-webui
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rapidminer-cron-log-pvc
  labels:
    app: rapidminer-cron
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rts-licenses-pvc
  labels:
    app: rapidminer-rts
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100M
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rts-deployments-pvc
  labels:
    app: rapidminer-rts
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Services
To deploy the example configuration, 3 Kubernetes Service Endpoints are defined:
- The public proxy service endpoint represents the public web interface of the proxy container (port: 443).
- The private WebUI service endpoint represents the private web interface of the WebUI container (port: 81).
- The private Real-Time Scoring service endpoint represents the private web interface of the Real-Time Scoring container (port: 8090).
Note:
- The public endpoint definition may differ between Kubernetes clusters. Public cloud providers support the LoadBalancer service type, but MicroK8S requires an Ingress to enable public access.
- When testing on MiniKube, the annotation block and the type: LoadBalancer line should be removed. Please read the Notices about minikube section.
- It is strongly recommended to use a valid certificate. The sample service definition contains recommended settings for an AWS load balancer that performs HTTPS offloading with AWS Certificate Manager. For use in a protected network or for testing (e.g. MiniKube or MicroK8S), the annotation block can be omitted and type: NodePort can be used for all services.
To define the service endpoints, you can apply the following Kubernetes object configuration YAML file (named services.yml in the deployment process below):
kind: Service
apiVersion: v1
metadata:
  name: rapidminer-proxy
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:XX-XXXX-X:XXXXXXXXXXXX:certificate/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: "ELBSecurityPolicy-TLS-1-2-2017-01"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "false"
    service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "Name=rapidminer-rts-elb"
  labels:
    app: real-time-scoring-webui
    role: webui
spec:
  type: LoadBalancer
  ports:
    - name: rts-proxyhttp
      port: 443
      protocol: TCP
      targetPort: rts-proxy-http
  selector:
    app: real-time-scoring-webui
    role: webui
---
kind: Service
apiVersion: v1
metadata:
  name: real-time-scoring-webui
  labels:
    app: real-time-scoring-webui
    role: webui
spec:
  ports:
    - name: rts-webuiport
      port: 81
      protocol: TCP
      targetPort: rts-webuiport
  selector:
    app: real-time-scoring-webui
    role: webui
---
kind: Service
apiVersion: v1
metadata:
  name: real-time-scoring-agent
  labels:
    app: real-time-scoring-agent
    role: real-time-scoring
spec:
  ports:
    - name: rts-scoreport
      port: 8090
      protocol: TCP
      targetPort: rts-scoreport
  selector:
    app: real-time-scoring-agent
    role: real-time-scoring
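For MicroK8S, where an Ingress is needed for public access (see the note above), the following is a minimal sketch rather than a tested configuration. It assumes Kubernetes 1.19 or later (for the networking.k8s.io/v1 Ingress API), the MicroK8S ingress addon enabled with microk8s enable ingress, and a hypothetical Ingress name, rapidminer-proxy-ingress. It routes all paths to port 443 of the rapidminer-proxy service, which forwards to the proxy container's HTTP port; TLS settings for the Ingress itself are not shown.

```yaml
# Hypothetical Ingress for MicroK8S (name is an assumption, not from the guide).
# Assumes Kubernetes >= 1.19 and the ingress addon: microk8s enable ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rapidminer-proxy-ingress
spec:
  # If the Ingress is not picked up, set spec.ingressClassName to the class
  # created by your ingress addon (list them with: kubectl get ingressclass).
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                # Service port 443 maps to the proxy container's HTTP port.
                name: rapidminer-proxy
                port:
                  number: 443
```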
Deployments (Pods, Containers)
The containers are deployed using Kubernetes Deployment objects, which provide replication; in this example, each Deployment starts one replica.
The environment variables are defined based on the Docker Image documentation.
The example configuration defines the following 2 deployments:
- The Real-Time Scoring Agent pod is defined with the following configuration. The rts-deployments-pvc volume claim provides persistence for the scoring deployments.
Because sharing volumes between Kubernetes pods can be difficult to set up and maintain, the example configuration below is set up to download the license information from the WebUI at container startup.
Please review the resource requests and limits and adjust them to your hardware capabilities.
To constrain the pod to run only on a particular worker node, first add a label to that node, then set the nodeSelector property in the deployment accordingly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: real-time-scoring-agent
  labels:
    app: real-time-scoring-agent
    role: real-time-scoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: real-time-scoring-agent
  template:
    metadata:
      labels:
        app: real-time-scoring-agent
        role: real-time-scoring
    spec:
      containers:
        - name: real-time-scoring-agent
          image: rapidminer/rapidminer-execution-scoring:latest
          ports:
            - name: rts-scoreport
              containerPort: 8090
          env:
            - name: WAIT_FOR_LICENSES
              value: "1"
            - name: MANAGEMENT_API_ENDPOINT
              value: "http://real-time-scoring-webui:81/"
          resources:
            requests:
              memory: "2G"
              cpu: "1"
            limits:
              memory: "32G"
              cpu: "1"
          volumeMounts:
            - name: rts-deployments-pv
              mountPath: /rapidminer-scoring-agent/home/deployments
      volumes:
        - name: rts-deployments-pv
          persistentVolumeClaim:
            claimName: rts-deployments-pvc
      # nodeSelector:
      #   node-label-name: label-value-of-worker-node-where-rts-may-started
- The Real-Time Scoring WebUI pod is defined with the following configuration. The rapidminer-uploaded-pvc, rapidminer-cron-log-pvc and rts-licenses-pvc volume claims provide persistence for the uploaded files, logs, and licenses.
Because sharing volumes between Kubernetes pods can be difficult to set up and maintain, the example pod configuration below contains three containers. Kubernetes always schedules all containers of a pod onto the same worker node, so they can share these volumes.
The resource limits are included for reference; these containers are not resource intensive.
To control which worker node Kubernetes starts the pod on, first add a label to a worker node of the cluster, then set the nodeSelector property in the deployment accordingly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: real-time-scoring-webui
  labels:
    app: real-time-scoring-webui
    role: webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: real-time-scoring-webui
  template:
    metadata:
      labels:
        app: real-time-scoring-webui
        role: webui
    spec:
      containers:
        - name: rapidminer-cron
          image: rapidminer/rapidminer-real-time-scoring-cron:latest
          resources:
            requests:
              memory: "100M"
              cpu: "0.5"
            limits:
              memory: "200M"
              cpu: "0.5"
          volumeMounts:
            - name: rapidminer-uploaded-pv
              mountPath: /rapidminer/uploaded/
            - name: rapidminer-cron-log-pv
              mountPath: /var/log/
            - name: rts-licenses-pv
              mountPath: /rapidminer/rts_home/licenses/
        - name: real-time-scoring-webui
          image: rapidminer/rapidminer-real-time-scoring-webui:latest
          ports:
            - name: rts-webuiport
              containerPort: 81
          resources:
            requests:
              memory: "200M"
              cpu: "0.5"
            limits:
              memory: "500M"
              cpu: "0.5"
          volumeMounts:
            - name: rapidminer-uploaded-pv
              mountPath: /var/www/html/uploaded
        - name: rapidminer-proxy
          image: rapidminer/rapidminer-real-time-scoring-proxy:latest
          ports:
            - name: rts-proxy-http
              containerPort: 80
          resources:
            requests:
              memory: "200M"
              cpu: "1"
            limits:
              memory: "200M"
              cpu: "1"
          volumeMounts:
            - name: rapidminer-uploaded-pv
              mountPath: /rapidminer/uploaded
              readOnly: true
      volumes:
        - name: rapidminer-uploaded-pv
          persistentVolumeClaim:
            claimName: rapidminer-uploaded-pvc
        - name: rapidminer-cron-log-pv
          persistentVolumeClaim:
            claimName: rapidminer-cron-log-pvc
        - name: rts-licenses-pv
          persistentVolumeClaim:
            claimName: rts-licenses-pvc
      # nodeSelector:
      #   node-label-name: label-value-of-worker-node-where-rts-may-started
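Both Deployment manifests above contain a commented-out nodeSelector. As a sketch of the enabled form, assume a worker node has been labeled with a hypothetical label rts-node=scoring; the fragment below, merged into the real-time-scoring-agent Deployment, would then restrict the pod to nodes carrying that label, and the same nodeSelector block works for the real-time-scoring-webui Deployment. The lowered memory limit is an illustrative value for a smaller node, not a sizing recommendation from the guide.

```yaml
# Fragment to merge into the real-time-scoring-agent Deployment.
# Prerequisite (hypothetical label; use your own key and value):
#   kubectl label nodes <worker-node-name> rts-node=scoring
spec:
  template:
    spec:
      # Schedule the pod only on nodes that carry this label.
      nodeSelector:
        rts-node: scoring
      containers:
        - name: real-time-scoring-agent
          resources:
            requests:
              memory: "2G"
              cpu: "1"
            limits:
              memory: "8G"  # illustrative value only, lowered from 32G
              cpu: "1"
```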
Deployment process
Based on the object definitions shown above, Real-Time Scoring can be deployed on a Kubernetes cluster with all of its components:
- Make sure that the connection to your Kubernetes Cluster is working
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
- Create and check the volumes
$ kubectl apply -f volumes.yml
persistentvolumeclaim/rapidminer-uploaded-pvc created
persistentvolumeclaim/rapidminer-cron-log-pvc created
persistentvolumeclaim/rts-licenses-pvc created
persistentvolumeclaim/rts-deployments-pvc created
$ kubectl get pv,pvc
- Create and check services
$ kubectl apply -f services.yml
service/rapidminer-proxy created
service/real-time-scoring-webui created
service/real-time-scoring-agent created
$ kubectl get svc
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
rapidminer-proxy          ClusterIP   10.103.149.61    <none>        443/TCP,80/TCP   115s
real-time-scoring-agent   ClusterIP   10.104.163.156   <none>        8090/TCP         115s
real-time-scoring-webui   ClusterIP   10.98.219.140    <none>        80/TCP           115s
- Create Deployments
$ kubectl apply -f real-time-scoring-agent.yml
deployment.apps/real-time-scoring-agent created
$ kubectl apply -f real-time-scoring-webui.yml
deployment.apps/real-time-scoring-webui created
- Check the running deployments
$ kubectl get pod
NAME                                       READY   STATUS    RESTARTS   AGE
real-time-scoring-agent-85c57b9675-6l2fv   1/1     Running   0          6m6s
real-time-scoring-webui-66799d6b74-7c8j9   3/3     Running   0          6m6s
- Check the logs of a running Real-Time Scoring Agent container/pod (replace the pod names with the ones shown in your kubectl get pod output above):
$ kubectl logs -f real-time-scoring-agent-85c57b9675-6l2fv
...
[INFO] Waiting for license synchronization.... Please upload your licenses on the Web UI
[INFO] Waiting for license synchronization.... Please upload your licenses on the Web UI
...
For the real-time-scoring-webui pod, the container must also be specified in the command (with -c), because the pod contains three containers:
$ kubectl logs -f real-time-scoring-webui-66799d6b74-7c8j9 -c rapidminer-proxy
...
[entrypoint.sh] Mandatory file missing, waiting...
[entrypoint.sh] Starting nginx...
2019/09/02 15:07:46 [warn] 18#18: "ssl_stapling" ignored, issuer certificate not found for certificate "/rapidminer/uploaded/certs/validated_cert.crt"
nginx: [warn] "ssl_stapling" ignored, issuer certificate not found for certificate "/rapidminer/uploaded/certs/validated_cert.crt"
$ kubectl logs -f real-time-scoring-webui-66799d6b74-7c8j9 -c real-time-scoring-webui
...
[Mon Sep 02 15:07:11.778314 2019] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
$ kubectl logs -f real-time-scoring-webui-66799d6b74-7c8j9 -c rapidminer-cron
...
[entrypoint.sh] Starting cron...
How you connect to the Web UI from here depends on your installation:
- When deploying to a cloud provider's Kubernetes cluster, a new load balancer appears in your resource list.
- With MicroK8S, you have to define an Ingress.
- With MiniKube, see the end of the Notices about minikube section.
Please note that by default the proxy container uses a self-signed certificate on port 443, so you will see a browser warning when you open the Web UI for the first time. You can bypass the warning, and the communication between your browser and the proxy will still be encrypted, but it is strongly recommended to replace this certificate with a trusted one.
Limitations
- Setting replicas to more than 1 is currently not supported.
- After a new certificate is deployed, the reverse proxy should be reloaded with the following command:
kubectl exec -it `kubectl get pods | grep webui | awk '{print $1}'` -c rapidminer-proxy -- /etc/init.d/nginx reload
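Related to the single-replica limitation: with the default RollingUpdate strategy, Kubernetes starts the replacement pod before stopping the old one during an update, so two pods can briefly run side by side, and on a multi-node cluster the new pod may fail to attach a ReadWriteOnce volume that is still attached to the old pod's node. Setting the Deployment strategy to Recreate is one way to avoid this. This is a general Kubernetes option, not a setting taken from the RapidMiner example, so treat the fragment below as a sketch and test it in your environment.

```yaml
# Fragment for either Deployment above (top-level spec).
# Recreate terminates the existing pod before the replacement is created,
# so at most one replica runs at a time during an update.
spec:
  replicas: 1
  strategy:
    type: Recreate
```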
Notices about minikube
In the default MiniKube installation, cluster resources are limited to a very low level. The following commands raise these limits permanently:
minikube config set memory 16384
minikube config set cpus 8
minikube config set disk-size 200000MB
If you are using a Linux workstation with Docker installed, you can start MiniKube with the vm-driver none option. In that case, all cluster services and deployed objects run on your existing Docker engine. To set this permanently, use the following command:
minikube config set vm-driver none
When using the vm-driver none option, the minikube API server can be bound to your host:
minikube start --apiserver-ips 127.0.0.1 --apiserver-name localhost
The configuration above takes effect after you run minikube delete followed by minikube start.
Minikube does not provide a load balancer out of the box, so modify the services.yml file as follows:
- remove the complete "annotations:" block
- remove the "type: LoadBalancer" lines
- add a line "type: NodePort" right after the "spec:" line in every service definition
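Applying the three steps above to the services.yml file gives roughly the following. It is a sketch derived from the example definitions, not a separately tested configuration. Because no nodePort values are set, Kubernetes assigns the node ports automatically from the cluster's node port range (30000-32767 by default).

```yaml
# services.yml for minikube: annotations removed, type: LoadBalancer removed,
# type: NodePort added right after each spec: line.
kind: Service
apiVersion: v1
metadata:
  name: rapidminer-proxy
  labels:
    app: real-time-scoring-webui
    role: webui
spec:
  type: NodePort
  ports:
    - name: rts-proxyhttp
      port: 443
      protocol: TCP
      targetPort: rts-proxy-http
  selector:
    app: real-time-scoring-webui
    role: webui
---
kind: Service
apiVersion: v1
metadata:
  name: real-time-scoring-webui
  labels:
    app: real-time-scoring-webui
    role: webui
spec:
  type: NodePort
  ports:
    - name: rts-webuiport
      port: 81
      protocol: TCP
      targetPort: rts-webuiport
  selector:
    app: real-time-scoring-webui
    role: webui
---
kind: Service
apiVersion: v1
metadata:
  name: real-time-scoring-agent
  labels:
    app: real-time-scoring-agent
    role: real-time-scoring
spec:
  type: NodePort
  ports:
    - name: rts-scoreport
      port: 8090
      protocol: TCP
      targetPort: rts-scoreport
  selector:
    app: real-time-scoring-agent
    role: real-time-scoring
```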
To find the exposed ports, run one of the following commands after the deployment process is done:
minikube service list
|-------------|-------------------------|----------------------------|
|  NAMESPACE  |          NAME           |            URL             |
|-------------|-------------------------|----------------------------|
| default     | rapidminer-proxy        | http://10.103.149.61:31871 |
| default     | real-time-scoring-agent | http://10.103.149.61:31488 |
| default     | real-time-scoring-webui | http://10.103.149.61:30274 |
|-------------|-------------------------|----------------------------|
Alternatively:
$ kubectl get services | grep proxy
rapidminer-proxy          ClusterIP   10.103.149.61    <none>        443/TCP,80/TCP   8m2s
From the output above, you can open the Web UI in your browser on the following URLs:
- https://10.103.149.61:443/rts-admin/ (using https protocol)
- http://10.103.149.61:80/rts-admin/ (using http protocol)
(because 80 and 443 are the default HTTP and HTTPS ports, they can be omitted from the URLs)