You are viewing the RapidMiner Server documentation for version 9.3.
Installation guide
This guide walks you through installing RapidMiner Server as a High Availability cluster in a Linux environment. It covers a first-time installation of RapidMiner Server High Availability with no existing data.
Terminology
In this guide we'll use the following terminology:
- Installation directory: The directory where you installed RapidMiner Server on a node.
- Shared home directory: The RapidMiner Server home directory that is accessible to all nodes in the cluster via the same path.
Test RapidMiner Server High Availability installation
Be sure to test your RapidMiner Server High Availability installation thoroughly before deploying to production.
- Set up and test RapidMiner Server High Availability in your staging environment before deploying to a production environment.
- Test RapidMiner Server High Availability with identical data (repositories, users, extensions) to your production instance.
Accessing a RapidMiner Server High Availability installation
When the installation is completed, the URL of RapidMiner Server will be the URL of the load balancer; this machine should be identified as RapidMiner Server by the DNS. The remaining machines do not need to be publicly accessible to your users.
Provision the shared database, shared filesystem, and ActiveMQ broker
Provision the shared database
Set up the shared database server and make sure that your database allows enough concurrent connections.
With many RapidMiner Server nodes connecting to the same database the default connection limit might be quickly exceeded.
For PostgreSQL, for example, the default limit is 100 connections. To increase the limit, edit the postgresql.conf file, increase the value of max_connections, then restart PostgreSQL.
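As a rough sizing aid, a suitable limit can be estimated from the number of nodes. The sketch below is an illustration only, not a RapidMiner formula: the per-node connection count and the headroom for other clients are hypothetical inputs that you need to replace with values observed in your own deployment.

```shell
# Estimate a max_connections value for the shared database.
# All three inputs are assumptions for illustration only:
#   $1 = number of RapidMiner Server nodes
#   $2 = connections each node may open (hypothetical pool size)
#   $3 = headroom for admin tools, backups and other clients
estimate_max_connections() {
  nodes=$1
  per_node=$2
  headroom=$3
  echo $(( nodes * per_node + headroom ))
}

# Example: 4 nodes, 40 connections each, 20 spare
estimate_max_connections 4 40 20
```

With these example numbers the estimate is 180, which is above the PostgreSQL default of 100, so max_connections would need to be raised.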
Provision the shared filesystem
Set up the shared NFS filesystem and make sure RapidMiner Server nodes can access it and have full read and write permissions.
Provision ActiveMQ broker
Although the RapidMiner Server cluster will function with a single instance of ActiveMQ, we highly recommend clustering it as well, because high availability depends on each component being highly available. You don't want ActiveMQ to be the single point of failure. For the sake of completeness, both a single-node setup and a clustered setup are outlined below.
Single node ActiveMQ setup
- Download and install ActiveMQ.
Currently only ActiveMQ version 5.14.5 has been tested and is officially supported, but feel free to test newer 5.x versions.
If you're using GNU/Linux, ActiveMQ packages should be provided by your distribution. You can install them with your package manager and start the application with a service manager such as init or systemd.
- Configure the ActiveMQ broker user that will be used by RapidMiner Server and the Job Agents:
  - Open <activemq-conf-dir>/users.properties and add a new broker user and password (e.g., the user "brokerUser" with password "brokerP4ssw0rd"):
    admin=admin
    brokerUser=brokerP4ssw0rd
  - Open <activemq-conf-dir>/groups.properties and add the new user to the users group:
    admins=admin
    users=brokerUser
  - Write down the new user's credentials. They are needed to configure the connection from RapidMiner Server and from the Job Agents to the broker.
- Start ActiveMQ.
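A quick sanity check of the two files edited above can catch a user that was added to users.properties but not to the users group. This is a minimal sketch: the function name is hypothetical, and it assumes entries are written without spaces around "=" and that group members are comma-separated.

```shell
# Succeeds only if the user has a password entry in users.properties
# and is listed in the "users" group in groups.properties.
#   $1 = ActiveMQ configuration directory
#   $2 = broker user name
broker_user_configured() {
  conf_dir=$1
  user=$2
  # A password entry looks like: brokerUser=brokerP4ssw0rd
  grep -q "^${user}=" "$conf_dir/users.properties" || return 1
  # A group entry looks like: users=brokerUser or users=other,brokerUser
  grep -Eq "^users=(.*,)?${user}(,.*)?\$" "$conf_dir/groups.properties"
}

# Usage (replace the placeholder with your configuration directory):
#   broker_user_configured <activemq-conf-dir> brokerUser && echo "broker user is set up"
```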
Clustered ActiveMQ setup
- Download and install ActiveMQ on all machines that will serve as ActiveMQ instances.
- Configure the instances as a cluster. To do so, follow any of the clustering setups described in the ActiveMQ documentation.
- It is advised to use the Shared File System Master Slave setup, as your clustered setup already has a shared file system for the RapidMiner Server home directory.
- Please make sure that all instances share the same broker user credentials (see "Single node ActiveMQ setup" on how to set up credentials).
- Start all instances.
Prepare a headless installation
To install RapidMiner Server on the nodes we will use the headless installation option. A detailed description is given on the headless installation documentation page. Here is a short overview of how to prepare the headless installation:
- Download the RapidMiner Server installer on a machine with a UI
- Start the installer and choose the "Install RapidMiner Server on a headless machine" option
- Go through the installer steps and use configuration values appropriate for the clustered setup of RapidMiner Server
  - Use the reachable hostname/IP address load_balancer_address of the load balancer for the server host name
  - Make sure to disable bundled Job Agents
  - Do not enable the Radoop proxy
- Finally, generate the installation XML file and store it on your disk. This file will be used to install RapidMiner Server on the nodes.
Prepare the first RapidMiner Server node
- Provision the infrastructure of the first RapidMiner Server node. You can automate this by using a configuration management tool such as Chef or Puppet or by spinning up identical virtual machine snapshots.
- Make sure the filesystem of your RapidMiner Server node supports UTF-8. If not, add the following statements to the /etc/environment configuration file:
  LC_ALL=en_US.UTF-8
  LANG=en_US.UTF-8
- Mount the shared home directory.
  For example, let's assume your RapidMiner Server home directory is /var/rapidminer/application-data/rapidminer-server/ and your shared home directory is available as an NFS export called rapidminer-san:/rapidminer-server-home. Add the following line to /etc/fstab on each cluster node:
  rapidminer-san:/rapidminer-server-home /var/rapidminer/application-data/rapidminer-server/ nfs lookupcache=pos,noatime,intr,rsize=32768,wsize=32768 0 0
  Then mount it:
  mkdir -p /var/rapidminer/application-data/rapidminer-server/
  sudo mount -a
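After mounting, it is worth confirming that the node can actually read and write the shared home directory. The following is a minimal sketch with a hypothetical function name. It only performs a write-then-read probe as the current user and does not verify that the path is an NFS mount, so run it as the user that will run RapidMiner Server.

```shell
# Write a small probe file into the directory, read it back, and clean up.
#   $1 = shared home directory to test
check_shared_home() {
  dir=$1
  probe="$dir/.ha-write-test-$$"
  if [ ! -d "$dir" ]; then
    echo "$dir is not a directory" >&2
    return 1
  fi
  # Braces keep the shell's own redirection error quiet on failure.
  if { echo probe > "$probe"; } 2>/dev/null && [ "$(cat "$probe")" = "probe" ]; then
    rm -f "$probe"
    echo "$dir is readable and writable"
  else
    echo "$dir is not writable" >&2
    return 1
  fi
}

# Usage on a cluster node:
#   check_shared_home /var/rapidminer/application-data/rapidminer-server/
```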
- Make sure all nodes have synchronized clocks and identical timezone configuration. Here are some examples of how to do this:
  Red Hat Enterprise Linux or CentOS:
  sudo yum install ntp
  sudo service ntpd start
  sudo tzselect
  Ubuntu:
  sudo apt-get install ntp
  sudo service ntp start
  sudo dpkg-reconfigure tzdata
Install RapidMiner Server on the first node
Once the infrastructure for the first RapidMiner Server node is available and meets all the node requirements, you can start installing RapidMiner Server.
Install RapidMiner Server
- Download the RapidMiner Server installer and extract it
- Upload the headless installation XML file to the node
- Run the headless installation:
  cd <rapidminer-server-installer>
  ./bin/rapidminer-server-installer <file_name>.xml
Adapt configuration
After the installation has finished, you need to adjust a few settings to configure RapidMiner Server for High Availability.
First, adapt the execution.properties configuration file to enable cluster mode. The file can be found in the <shared home>/configuration/ folder.
Enable clustered mode for RapidMiner Server:
rapidminer.server.isClustered = true
Configure the load balancer URL as the RapidMiner Server URL:
rapidminer.server.protocol = http
rapidminer.server.host = <load_balancer_address>
rapidminer.server.port = <port>
Disable the embedded ActiveMQ broker and point to the external broker:
jobservice.queue.activemq.embeddedBroker.enabled = false
jobservice.queue.activemq.uri = failover:(tcp://172.31.21.116:61616,tcp://172.31.21.112:61616)
jobservice.queue.activemq.username = brokerUser
jobservice.queue.activemq.password = brokerP4ssw0rd
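The same failover URI is needed again later in the Job Agent configuration, so it can help to generate it once from your list of broker hosts instead of typing it twice. The helper below is a hypothetical convenience, not part of RapidMiner Server or ActiveMQ. It only assembles the string in the format shown above.

```shell
# Build a failover URI from a port and one or more broker hosts.
#   $1 = broker port, remaining arguments = broker host names or IPs
build_failover_uri() {
  port=$1
  shift
  uri=""
  for host in "$@"; do
    # Prefix a comma only when uri already holds an entry.
    uri="${uri:+$uri,}tcp://$host:$port"
  done
  printf 'failover:(%s)\n' "$uri"
}

# Example with the two broker addresses used in this guide
build_failover_uri 61616 172.31.21.116 172.31.21.112
```

The example prints failover:(tcp://172.31.21.116:61616,tcp://172.31.21.112:61616), which matches the value used for jobservice.queue.activemq.uri.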
Next, update the scheduler.properties configuration file to enable a clustered scheduler. The file is located in the same folder as the execution.properties file. Add the following lines:
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 10000
Edit the standalone.conf file located in the <install directory>/bin/ folder.
Look for
JAVA_OPTS="$JAVA_OPTS -Djboss.server.log.dir=$RAPIDMINER_SERVER_HOME/log"
and change it to a new log folder that matches the instance name. For example:
JAVA_OPTS="$JAVA_OPTS -Djboss.server.log.dir=$RAPIDMINER_SERVER_HOME/log/instance1"
Also, add a new line that points the Execution Backend to the localhost right next to the other JAVA_OPTS lines. For example:
JAVA_OPTS="$JAVA_OPTS -Dexecution-backend-url=http://localhost:8080/executions"
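Because the log folder has to be changed on every node, the edit can be scripted. The function below is a sketch with a hypothetical name. It assumes standalone.conf still contains the stock log directory line exactly as shown above, and it leaves the file unchanged if that line has already been modified. It does not add the execution-backend-url line.

```shell
# Append an instance name to the JBoss log directory in standalone.conf.
#   $1 = path to standalone.conf
#   $2 = instance name, e.g. instance1
set_instance_log_dir() {
  conf_file=$1
  instance=$2
  # Match the stock setting up to its closing quote and insert "/<instance>".
  sed 's|\(-Djboss\.server\.log\.dir=\$RAPIDMINER_SERVER_HOME/log\)"|\1/'"$instance"'"|' \
    "$conf_file" > "$conf_file.tmp" &&
    cat "$conf_file.tmp" > "$conf_file" &&  # overwrite in place to keep permissions
    rm -f "$conf_file.tmp"
}

# Usage (replace the placeholder with your installation directory):
#   set_instance_log_dir "<install directory>/bin/standalone.conf" instance2
```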
Add the RapidMiner Server node to the load balancer
- Start the first RapidMiner Server node
- Open the Web UI of RapidMiner Server at http(s)://<load_balancer_address>:<port> and log in as admin
- Make sure everything works fine (e.g. extensions are loaded, server logs can be inspected, etc.)
Install additional RapidMiner Server nodes
Once the first RapidMiner Server node is up and running, you can add more nodes to the cluster. There are two ways you can add more nodes: either manually or with a snapshot of the first node. Both are described below. The manual option requires a little more effort though.
Add nodes manually
To add nodes manually:
- Provision the infrastructure for additional nodes, and then repeat the headless installation steps described in the section above.
  You do not need to adapt the whole configuration again. However, each RapidMiner Server headless installation overwrites the shared configuration folder of the initial installation. Go to the <shared home> folder and restore the backup configuration every time the headless installation has finished. For example:
  cd <rapidminer-server-installer>
  ./bin/rapidminer-server-installer <file_name>.xml
  ###
  # wait for installation to finish
  ###
  cd /var/rapidminer/application-data/rapidminer-server/
  # delete newly created configuration and replace initial config
  rm -rf configuration/
  mv configuration_backup_9.1.0_2018-11-08_14-40-42/ configuration/
- Configure a new log folder in the file <install directory>/bin/standalone.conf, as described in the section above.
- Once the installation is finished and the initial configuration is restored, you can make the new node available as an endpoint by adding its IP address and port 8080 to the load balancer.
- Start the new RapidMiner Server node
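The restore step in the manual procedure above has to be repeated for every additional node, and the backup folder name changes each time. The sketch below picks the most recently modified backup folder and moves it back into place. The function name is hypothetical, and the configuration_backup_* naming pattern is inferred from the single example above, so check the folder names in your shared home before relying on it.

```shell
# Replace the freshly written configuration with the newest backup.
#   $1 = shared home directory
restore_latest_config() {
  shared_home=$1
  # List backup folders newest first and keep the first entry.
  backup=$(ls -dt "$shared_home"/configuration_backup_* 2>/dev/null | head -n 1)
  if [ -z "$backup" ]; then
    echo "no configuration backup found in $shared_home" >&2
    return 1
  fi
  rm -rf "$shared_home/configuration"
  mv "$backup" "$shared_home/configuration"
}

# Usage after the headless installer has finished:
#   restore_latest_config /var/rapidminer/application-data/rapidminer-server
```

Because the function deletes the current configuration folder, run it only after confirming that the installer has finished and that the backup it selects is the one you expect.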
Add nodes from snapshot
If you are running RapidMiner Server in a virtual infrastructure or in the Cloud, we recommend creating a snapshot of the initial node, then adding new nodes from the snapshot.
To do so:
- Shut down RapidMiner Server on the initial node
- Create a snapshot of the virtual instance
- Restart the initial RapidMiner Server node once the snapshot has been created
- Create a new node from the newly created snapshot
- SSH to the new cluster node and configure a new log folder in the <install directory>/bin/standalone.conf file as described in the section above.
- Add the new node to the load balancer
- Start the new RapidMiner Server node
Install Job Agents
Each Job Agent should be installed on a dedicated machine. You can download the Job Agent ZIP file from RapidMiner Server's web interface, or you can call the REST API. We recommend the second approach, because you don't have to upload the ZIP file via SSH to your dedicated Job Agent machine. Using the second approach, proceed as follows:
- SSH to the machine on which the JobAgent will run.
- Download the JobAgent ZIP file:
  - Obtain a token (the value of the idToken field) for a user who is allowed to access the JobAgent download route, e.g. the admin user:
    curl -u admin:PASSWORD http(s)://<load_balancer_address>:<port>/api/rest/tokenservice
  - Download the ZIP for a queue QUEUENAME. The default queue is named DEFAULT. Be aware that names are case sensitive.
    curl -H "Authorization: Bearer TOKEN_FROM_REQUEST_ABOVE" http(s)://<load_balancer_address>:<port>/executions/queues/QUEUENAME/agent --output /path/to/save/location/JobAgent.zip
- Unzip the ZIP file to your preferred location. For example:
  unzip /path/to/save/location/JobAgent.zip -d /path/to/extract/location
- Adjust the properties in the home/config/agent.properties file to your needs. The ActiveMQ broker URI should point to the ActiveMQ cluster that you've already configured in the execution.properties file of the shared RapidMiner Server home directory. The uri property represents a set of available ActiveMQ instances with their default port 61616. For example:
  jobagent.queue.activemq.uri = failover:(tcp://172.31.21.116:61616,tcp://172.31.21.112:61616)
  jobagent.queue.activemq.username = brokerUser
  jobagent.queue.activemq.password = brokerP4ssw0rd
- (Optional) Add extensions or JDBC drivers.
- Start the JobAgent.
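The token and download requests above can be combined so that the token does not have to be copied by hand. This is a minimal sketch with hypothetical function names. It assumes the token service returns JSON containing an idToken string field, as described above, and it extracts that field with sed rather than a full JSON parser.

```shell
# Read a JSON document on stdin and print the value of its "idToken" field.
extract_id_token() {
  sed -n 's/.*"idToken"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Request a token, then download the JobAgent ZIP for one queue.
#   $1 = base URL of the load balancer, $2 = user, $3 = password,
#   $4 = queue name (case sensitive), $5 = path of the ZIP file to write
download_job_agent() {
  base_url=$1
  user=$2
  password=$3
  queue=$4
  target=$5
  token=$(curl -s -u "$user:$password" "$base_url/api/rest/tokenservice" | extract_id_token)
  if [ -z "$token" ]; then
    echo "could not obtain a token from $base_url" >&2
    return 1
  fi
  # -f makes curl return an error on HTTP failures instead of saving the error page.
  curl -s -f -H "Authorization: Bearer $token" \
    "$base_url/executions/queues/$queue/agent" --output "$target"
}

# Usage (replace the placeholders with your own values):
#   download_job_agent "http://<load_balancer_address>:<port>" admin PASSWORD DEFAULT /path/to/save/location/JobAgent.zip
```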
Congratulations!
That's it! RapidMiner Server is accessible in High Availability mode from a URL like this: http(s)://<load_balancer_address>:<port>