RapidMiner Legacy documentation for version 9.8
Job Agents
On start, Job Agents spawn a configurable number of Job Containers as separate system processes. Job Agents are responsible for redirecting incoming jobs from their assigned queue to the locally spawned Job Containers via REST communication. The spawned Job Containers serve solely as execution units for RapidMiner processes. Job Containers are kept alive as long as the Job Agent application runs and automatically shut down when the managing Job Agent shuts down.
This page outlines how to configure a Job Agent. Please refer to the architecture page for an overview of the Job Agent and Job Container structure.
Configuration
Agent Properties
You can alter the configuration of the Job Agent by changing {homeDir}/config/agent.properties. Each property has a
comment which explains what effect the configuration has on the agent and the spawned Job Containers. Besides
settings like the number of spawned Job Containers or the maximum amount of memory used per Job Container, it is also
possible to configure more complex behavior, outlined in the following sub-sections.
Container ports
When Job Containers are spawned as separate system processes during the Job Agent's startup, they are bound to
system ports. This is necessary because the Job Agent communicates with them via REST endpoints, e.g. to redirect jobs
or to retrieve a job's latest status. The Job Agent will use successive ports beginning
from the defined starting port. The last port is determined by the number of configured Job Containers, e.g. if 1000
is defined as starting port and four Job Containers should be spawned in total, the Job Agent will bind the ports
1000, 1001, 1002 and 1003. Job Containers only listen locally and are unreachable from anywhere
other than localhost/127.0.0.1.
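The port assignment described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Job Agent's actual implementation; the function name container_ports and its parameters are hypothetical.

```python
def container_ports(starting_port: int, container_count: int) -> list[int]:
    """Return the successive ports used for the given number of Job Containers.

    Hypothetical helper: it only mirrors the rule "start at the configured
    port and count upwards, one port per Job Container".
    """
    if container_count < 1:
        raise ValueError("container_count must be at least 1")
    return list(range(starting_port, starting_port + container_count))


# Starting port 1000 with four Job Containers, as in the example above.
print(container_ports(1000, 4))  # [1000, 1001, 1002, 1003]
```

When choosing a starting port, make sure the whole resulting range is free on the machine running the Job Agent.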
Container restart policies
By default, Job Containers will run indefinitely and not restart after a job has been executed. With this behavior,
it's possible to execute large numbers of jobs nearly instantly. A possible downside is that jobs might have an effect
on each other when run sequentially. To overcome this, it is possible to assign restart policies to Job Containers.
Supported restart policies are: run indefinitely, terminate after a configurable number of executed processes, and restart
on a regular basis via cron expression. When a restart is invoked, the currently active job execution will be finished
before the Job Containers are restarted. To change this behavior, it's possible to set the jobagent.restart.timeout
property. The Job Agent will then forcibly kill the Job Containers once the execution time exceeds this timeout,
regardless of whether a job is still running.
Container caching for Projects
When a process from a Project is executed by a Job Container, the Job Container will first download the corresponding project files in order to use them during execution. After the process finishes, those temporary working files are deleted. If process execution changes project files, they are automatically added as a new Snapshot.
Because Job Containers need to download project files, process execution time might increase, especially for large
projects. In order to reduce this initial download time, each Job Container caches
already downloaded Projects by applying a caching strategy. In the agent.properties file, this behavior can be adjusted
by changing the value of the jobagent.container.repository.caching.strategy property. By default, a Job Container
will keep two Projects in cache and replace the least recently used one when a new Project needs to be downloaded. For
more details about which strategies exist and how they can be configured to fit your needs, please have a look
at the descriptions of the properties provided in the agent.properties file.
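To illustrate the default behavior (two cached Projects, least recently used one replaced), here is a minimal Python sketch of a least-recently-used cache. It is not the Job Container's implementation; the class ProjectCache, its methods, and the fake_download function are hypothetical names used only for this example.

```python
from collections import OrderedDict


class ProjectCache:
    """Hypothetical LRU cache that keeps at most `capacity` projects."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._projects = OrderedDict()  # oldest entry first, newest last

    def get(self, name, download):
        """Return cached project files, downloading them on a cache miss."""
        if name in self._projects:
            # Cache hit: mark the project as most recently used.
            self._projects.move_to_end(name)
            return self._projects[name]
        files = download(name)
        self._projects[name] = files
        if len(self._projects) > self.capacity:
            # Evict the least recently used project.
            self._projects.popitem(last=False)
        return files

    def cached(self):
        """Return cached project names, least recently used first."""
        return list(self._projects)


downloads = []


def fake_download(name):
    """Stand-in for the real project download; records each call."""
    downloads.append(name)
    return f"files of {name}"


cache = ProjectCache(capacity=2)
for project in ["A", "B", "A", "C"]:
    cache.get(project, fake_download)

print(cache.cached())  # ['A', 'C']
print(downloads)       # ['A', 'B', 'C']
```

In this run, Project B is evicted when C arrives because A was used more recently, and A is downloaded only once.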
Graceful Job Agent shutdown
By default, Job Agents wait for all job executions to finish before shutting down.
This can be avoided by setting the Job Agent's jobagent.shutdown.timeout property.
Container Properties
You can add properties to a Job Container depending on when you need them to be available. In general, Job Containers reference their properties at two different points in time:
- on start and
- when a job is going to be executed.
On start
When a new Job Container has been spawned by a Job Agent, the execution context defined in
{homeDir}/config/rapidminer/.RapidMiner is copied to the Job Container so that it can use it during execution.
You can place your own configuration files into this directory if you need them for your extensions.
You can also use the Central resource management to synchronize the execution context from the RapidMiner Server home folder.
It is also possible to add additional properties during Job Container Studio initialization. This is particularly useful if you need to provide extension properties which are already required during Job Container start, e.g. when operators are registered:
- Use jobagent.container.initWithProperties.enabled to enable or disable this feature; it is disabled by default
- Use jobagent.container.initWithProperties.location to set an absolute location to a properties file; it defaults to rapidminer-init.properties in the {homeDir}/config/rapidminer/ folder
Those property files are not automatically synchronized and might need to be adapted for each Job Agent instance you've deployed.
On queuing a new job
When you submit a job to a queue, it is picked up by the Job Agent responsible for this queue. Afterwards,
it is forwarded via REST to a Job Container managed by this Job Agent. Whenever this happens, the properties file
{homeDir}/config/rapidminer/rapidminer.properties is read by the Job Agent and its contents are piped into the
job so that the Job Container can use them as system properties. They are therefore also exposed to extensions during
execution. Remember that properties are overwritten for new jobs. This means that changing the file between executions
results in different property values being propagated to the Job Container for different jobs.
This file can also be used to provide custom properties (e.g. extension properties) for a Job Container that are not already required during Job Container start.
Container JVM arguments
Job Containers are started by their Job Agent with a default set of JVM arguments, e.g. something like -XX:+UseG1GC.
To add additional arguments which will be passed to the Job Container, edit the agent.properties file and add new
properties by specifying them similar to jobagent.container.jvmCustomOptions = -Dnew.property=new -Danother.property=another.
This will pass -Dnew.property=new -Danother.property=another to each Job Container spawned by a Job Agent.
Please note that the entire value of the jvmCustomOptions property will be passed to the Job Container start
arguments. Any error in this value might prevent the Job Container from spawning correctly.
If necessary, it's also possible to override all default JVM arguments, although we highly advise against it. In certain
use cases this might still be feasible and needed. To override them, you need to edit the agent.properties file
and define something similar to jobagent.container.jvmProperties = Dtest.property1=test1,Dtest.property2=test2. Ensure
that any default argument which you need is still present. All JVM default arguments are printed in the agent.log
when the Job Agent starts.
Please note that for the jvmProperties property there are no leading hyphens and that properties are separated by commas.
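The difference between the two value formats can be made concrete with a short Python sketch. It assumes that each comma-separated jvmProperties entry ends up as one JVM argument with a leading hyphen added, and that jvmCustomOptions is split on whitespace; both are assumptions made for illustration, not a description of the Job Agent's actual parsing. The function names are hypothetical.

```python
def jvm_properties_to_args(value: str) -> list[str]:
    """Turn a jvmProperties-style value into JVM arguments.

    Assumption: entries are comma-separated, written without a leading
    hyphen, and a hyphen is prepended to each entry.
    """
    return ["-" + item.strip() for item in value.split(",") if item.strip()]


def jvm_custom_options_to_args(value: str) -> list[str]:
    """Turn a jvmCustomOptions-style value into JVM arguments.

    Assumption: entries already carry their hyphens and are separated
    by whitespace.
    """
    return value.split()


print(jvm_properties_to_args("Dtest.property1=test1,Dtest.property2=test2"))
# ['-Dtest.property1=test1', '-Dtest.property2=test2']

print(jvm_custom_options_to_args("-Dnew.property=new -Danother.property=another"))
# ['-Dnew.property=new', '-Danother.property=another']
```

Under these assumptions, a hyphen accidentally included in a jvmProperties entry would be doubled, which is one way a malformed value could keep a Job Container from starting.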
Resources
To enable correct execution of RapidMiner processes, the Job Agent uses various external resources like JDBC drivers, RapidMiner extensions, custom Java libraries, and RapidMiner Server licenses.
These resources are stored within the {homeDir}/resources/ folder of the Job Agent.
Central resource management
The Job Agent's external resources are centrally managed and automatically synchronized from the RapidMiner Server instance the Job Agent is connected to. By default, the Execution context is also synchronized to all Job Agents.
Resource management
Centrally managed Job Agent resources are stored in the resources/ folder of the RapidMiner Server home folder.
Both RapidMiner Server and all connected Job Agents use the same set of resources.
To install a new centrally managed resource or manage an existing one, do the following:
- Select the resource type you want to update (JDBC, Extensions, Custom libraries)
- From the table below locate the path of the resource type you want to update
- Update the resource type by adding or removing content from the selected folder
- Restart RapidMiner Server. All connected Job Agents will automatically synchronize the new resource configuration from RapidMiner Server.
Type | Path |
---|---|
JDBC | <rapidminer-server-home>/resources/jdbc/ |
Extensions | <rapidminer-server-home>/resources/extensions/ |
Custom libraries | <rapidminer-server-home>/resources/libs/ |
Licenses | Licenses are automatically synced to connected Job Agents on license installation via the RapidMiner Server UI |
Execution context | <rapidminer-server-home>/.RapidMiner/ |
Automatic synchronization
By default, all resources and the execution context are automatically synchronized from the RapidMiner Server instance after the Job Agent has been started. The resources are also synchronized after a restart of the RapidMiner Server instance.
The Job Agent downloads the resources to the type-specific resource folder and overwrites any existing files. These type-specific resource folders are:
Type | Path |
---|---|
JDBC | <jobagent-home>/resources/jdbc/ |
Extensions | <jobagent-home>/resources/extensions/ |
Custom libraries | <jobagent-home>/resources/libs/ |
Licenses | <jobagent-home>/resources/licenses/ |
Execution context | <jobagent-home>/config/rapidminer/.RapidMiner |
Individual resource management
To set up a Job Agent with individual resources that differ from the centrally managed resource set, disable the automatic synchronization by setting jobagent.sync.enabled = false in the config/agent.properties file.
After a restart, the Job Agent will only use the resources that are already available in the respective resource folders.
To install a new local resource or manage an existing one, do the following:
- Shut down the Job Agent
- Locate the path of the resource type you want to update
- Update the resource type by adding or removing content from the selected folder
- Restart the Job Agent
Resource types
JDBC
JDBC connections can be defined in the {homeDir}/resources/jdbc/jdbc_properties.xml file.
Extensions
Extensions are provided from the {homeDir}/resources/extensions/ directory.
Custom libraries
Custom libraries are Java libraries which can be used in a RapidMiner process, for example within the
Execute Script operator. You can add these libraries to the {homeDir}/resources/libs/ folder, and they are then
automatically available for execution.
Don't confuse custom libraries with JDBC drivers or extensions.
Licenses
The Job Agent licenses are installed in the {homeDir}/resources/licenses/ directory.
You can define the number of spawned Job Containers (jobagent.container.count) for each Job Agent and the memory per Job Container (jobagent.container.memoryLimit) in the {homeDir}/config/agent.properties file.
Keep in mind that these settings need to comply with your current server license.
Execution context
The Execution context for each Job Container is copied over from {homeDir}/config/rapidminer/.RapidMiner during Job Container start.