You are viewing the RapidMiner Go documentation for version 9.9.

RapidMiner Job Container

Job Containers (JCs) are the back-end components of RapidMiner Go that execute CPU-heavy computations such as model training and prediction. The default docker-compose-services starts only one Job Container, on the same host as RapidMiner Go, but in a production environment multiple Job Containers should be started on separate machines. Load balancing between JC instances is handled by the AMQ service. A JC instance performs only one job at a time, so the next job in the queue is picked up by the JC instance that becomes idle first.
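The dispatch behavior described above can be illustrated with a small simulation. This is only a sketch of the "first idle JC takes the next job" rule, not the way the AMQ service is implemented. The function name assign_jobs, the job durations, and the assumption that all jobs are already queued at time 0 are hypothetical.

```python
import heapq


def assign_jobs(job_durations, num_containers):
    """Simulate first-idle dispatch where each container runs one job at a time.

    Returns a list of (job_index, container_index, start_time) tuples.
    Assumes every job is already waiting in the queue at time 0.
    """
    # Min-heap of (time_when_idle, container_index); all containers are idle at t=0.
    idle_at = [(0, c) for c in range(num_containers)]
    heapq.heapify(idle_at)

    schedule = []
    for job, duration in enumerate(job_durations):
        # The container that becomes idle first takes the next queued job.
        free_time, container = heapq.heappop(idle_at)
        schedule.append((job, container, free_time))
        heapq.heappush(idle_at, (free_time + duration, container))
    return schedule


if __name__ == "__main__":
    for job, container, start in assign_jobs([5, 2, 4, 1], num_containers=2):
        print(f"job {job} -> JC {container} at t={start}")
```

With two containers, the third job starts on the container that finished the short second job, while the fourth job waits for the first container to become idle.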

Licensing

Job Containers depend on the license file in the licenses/rapidminer-go-on-prem directory. If the file is not present, the JC will not start. This folder is automatically mounted into the file system of every RapidMiner Go and Job Container instance, so there is no need to copy it manually.

Configuration using environment variables

A Job Container is a Spring Boot application. It currently has a single valid Spring profile value: broker-amq.

Table of default environment variables:

Environment variable name    Description
JOB_QUEUE                    AMQ job queue name
JOB_STATUS_QUEUE             AMQ status queue name
JOB_COMMAND_TOPIC            AMQ topic name
AMQ_URL                      AMQ URL
AMQ_USERNAME                 AMQ username
AMQ_PASSWORD                 AMQ password
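When troubleshooting a deployment, it can help to print which of these variables are actually set in the environment a Job Container would see. The following is a minimal sketch, not part of RapidMiner Go; the helper name describe_amq_settings and the "<unset>" placeholder are hypothetical. Only the variable names come from the table above.

```python
import os

# Variable names taken from the table above.
JC_VARIABLES = [
    "JOB_QUEUE",
    "JOB_STATUS_QUEUE",
    "JOB_COMMAND_TOPIC",
    "AMQ_URL",
    "AMQ_USERNAME",
    "AMQ_PASSWORD",
]


def describe_amq_settings(env=None):
    """Return a dict of the JC variables as seen in env (default: os.environ).

    Unset or empty variables are reported as "<unset>", and the password
    value is masked so that the result is safe to print or log.
    """
    env = os.environ if env is None else env
    report = {}
    for name in JC_VARIABLES:
        value = env.get(name)
        if not value:
            report[name] = "<unset>"
        elif name == "AMQ_PASSWORD":
            report[name] = "***"
        else:
            report[name] = value
    return report


if __name__ == "__main__":
    for name, value in describe_amq_settings().items():
        print(f"{name}={value}")
```

A variable reported as "<unset>" is not necessarily an error, since the deployment may fall back to a default value.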

Multiple Job Containers and per-user job limits

Multiple Job Container instances can be run by increasing the JOB_CONTAINERS variable in the .env file. In this case, make sure the host machine has enough available RAM to allocate to these instances. The default value of MEMORY_PER_JOB_CONTAINER requires 4 GB per Job Container. For instance, running 2 JCs with the default memory settings requires 4 + 4 * 2 = 12 GB of RAM in total.
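The RAM estimate above can be written as a small helper for other values of JOB_CONTAINERS. This is a sketch under assumptions: the function name total_ram_gb is hypothetical, and the fixed 4 GB term (base_gb) is taken from the leading 4 in the example formula, which this page does not explain further.

```python
def total_ram_gb(job_containers, memory_per_jc_gb=4, base_gb=4):
    """Estimate the host RAM in GB needed for a number of Job Containers.

    memory_per_jc_gb mirrors the 4 GB per JC required by the default
    MEMORY_PER_JOB_CONTAINER setting. base_gb is the fixed term from the
    example formula (assumed here to cover everything except the JCs).
    """
    if job_containers < 1:
        raise ValueError("at least one Job Container is required")
    return base_gb + memory_per_jc_gb * job_containers


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} JC(s): {total_ram_gb(n)} GB")
```

For 2 JCs with the default settings, the helper reproduces the 12 GB figure from the example.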

With multiple JCs available, you can also increase AUTOMODELER_EXECUTION_QUEUE_LIMIT_PER_USER in the AutoModeler settings. If this setting is equal to the number of JCs, one user's jobs can run in parallel on all JCs, so another user who submits a job later has to wait until those JCs finish their current jobs. By decreasing the queue limit, you can restrict every user to a fraction of the JCs, preserving execution resources for other concurrent users.
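The effect of the per-user limit can be sketched with a simplified model. It assumes the limit caps how many of one user's jobs run at the same time, which matches the description above but may not reflect every detail of the real setting. The function name jobs_started_now and the user names are hypothetical.

```python
from collections import Counter


def jobs_started_now(queued_users, num_containers, limit_per_user):
    """Return the indices of queued jobs that can start immediately.

    queued_users lists the submitting user of each queued job, in queue
    order. All containers are assumed idle, and each user may occupy at
    most limit_per_user containers at once.
    """
    running = Counter()
    started = []
    for index, user in enumerate(queued_users):
        if len(started) == num_containers:
            break  # every container is busy
        if running[user] < limit_per_user:
            running[user] += 1
            started.append(index)
    return started


if __name__ == "__main__":
    queue = ["alice", "alice", "bob"]
    print(jobs_started_now(queue, num_containers=2, limit_per_user=2))
    print(jobs_started_now(queue, num_containers=2, limit_per_user=1))
```

With a limit of 2 on 2 JCs, both of the first user's jobs start and the second user waits. With a limit of 1, the second user's job starts right away on the remaining JC.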