You are viewing the RapidMiner Server documentation for version 9.7.

Requirements

This page describes the requirements for RapidMiner Server in a high-availability environment, including potential problems for the components of such a setup.

System requirements

In a high-availability environment, there are at least two RapidMiner Server nodes. The system requirements for each of these nodes are identical to the system requirements for RapidMiner Server outside of a high-availability environment.

Because the MySQL database engine can deadlock under high load, MySQL is not supported as the operational database for a RapidMiner Server High Availability setup.

Component requirements

RapidMiner Server High Availability consists of a cluster of components, each running on a dedicated machine and connected to the others over a high-speed LAN.

Each component has specific requirements, but only the load balancer needs to have a publicly accessible URL. The URL of RapidMiner Server is the URL of the load balancer; this is the machine that is identified as RapidMiner Server in the DNS.

The remaining machines (RapidMiner Server nodes, Job Agent nodes, shared database, shared file system, and ActiveMQ broker) do not need to be publicly accessible to your users.

RapidMiner Server node

Note the following requirements for the RapidMiner Server nodes:

  • Each RapidMiner Server node must run on a dedicated machine. The machine can be virtual or physical.
  • The nodes must be connected on a high-speed LAN (high bandwidth, low latency).
  • The nodes need not be identical, but for consistent performance we recommend that they be as similar as possible.
  • All nodes must run the same version of RapidMiner Server.
  • All nodes must have synchronized clocks (for example, using NTP) and be configured with the same timezone.
  • All nodes must connect to the ActiveMQ broker.
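
The version, clock, and time zone requirements above lend themselves to an automated pre-flight check. The following is a minimal sketch in Python and is not part of RapidMiner Server: the `NodeInfo` record, the `check_nodes` function, and the 1-second skew threshold are hypothetical, and how you collect each node's version, time zone, and clock reading is left to your own tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeInfo:
    """Hypothetical record of facts gathered from one node."""
    name: str
    version: str          # RapidMiner Server version reported by the node
    timezone: str         # for example "UTC" or "Europe/Berlin"
    epoch_seconds: float  # node clock, sampled at roughly the same instant


def check_nodes(nodes: List[NodeInfo], max_skew_seconds: float = 1.0) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(nodes) < 2:
        problems.append("a high-availability setup needs at least two nodes")
    versions = {n.version for n in nodes}
    if len(versions) > 1:
        problems.append(f"version mismatch: {sorted(versions)}")
    timezones = {n.timezone for n in nodes}
    if len(timezones) > 1:
        problems.append(f"timezone mismatch: {sorted(timezones)}")
    if nodes:
        clocks = [n.epoch_seconds for n in nodes]
        skew = max(clocks) - min(clocks)
        if skew > max_skew_seconds:
            problems.append(
                f"clock skew of {skew:.1f}s exceeds {max_skew_seconds:.1f}s"
            )
    return problems
```

The same check could be run against the Job Agent nodes, which have matching version, clock, and time zone requirements.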

Job Agent node

Note the following requirements for the Job Agent nodes:

  • Each Job Agent node must run on a dedicated machine. The machine can be virtual or physical.
  • The nodes must be connected on a high-speed LAN (high bandwidth, low latency).
  • All Job Agent nodes must run the same version, matching the RapidMiner Server version.
  • All nodes must have synchronized clocks (for example, using NTP) and be configured with the same timezone.
  • All nodes must connect to the ActiveMQ broker.
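
To confirm that a node can reach the ActiveMQ broker over the network, a simple TCP probe is often enough as a first test. The sketch below is an assumption-laden example, not a RapidMiner tool: the function name `can_reach_broker` is hypothetical, and the default port 61616 is ActiveMQ's standard OpenWire port, which your broker may or may not use. It only verifies that a TCP connection can be opened, not that messaging works.

```python
import socket


def can_reach_broker(host: str, port: int = 61616, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Running this probe from every RapidMiner Server node and every Job Agent node, with the host name of your own broker, gives a quick view of which machines still lack connectivity.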

Load balancer

We do not bundle a load balancer for RapidMiner Server High Availability. You can use the load balancer of your choice.

  • Your load balancer must support sticky sessions.
  • Your load balancer should run on a dedicated machine.
  • Your load balancer must have a high-speed LAN connection to the RapidMiner Server nodes.
  • For best performance, we highly recommend terminating SSL (HTTPS) at your load balancer and using plain HTTP between the load balancer and the RapidMiner Server nodes.

We recommend using HAProxy, which supports all required features out of the box. The load balancer page describes how to set up HAProxy as a load balancer.
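
If the term "sticky sessions" is unfamiliar, the toy model below may help. It is only an illustration of the concept in Python, not how HAProxy or any other load balancer is implemented: the `StickyRouter` class and the node names are hypothetical. The idea is that the first request of a session is assigned to a node, and every later request of that session goes to the same node. Real load balancers typically track the session with a cookie or a similar mechanism.

```python
import itertools
from typing import Dict, List


class StickyRouter:
    """Toy model: a session is pinned to the node that first served it."""

    def __init__(self, nodes: List[str]) -> None:
        self._round_robin = itertools.cycle(nodes)
        self._pinned: Dict[str, str] = {}

    def route(self, session_id: str) -> str:
        # New sessions get the next node in round-robin order;
        # known sessions always return to the node they were pinned to.
        if session_id not in self._pinned:
            self._pinned[session_id] = next(self._round_robin)
        return self._pinned[session_id]


router = StickyRouter(["node-a", "node-b"])
first = router.route("session-1")
# A later request from the same session reaches the same node.
assert router.route("session-1") == first
```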

Shared database

A shared database is used to store configuration data and other metadata.

  • The shared database must run on a dedicated machine.
  • The shared database must be available to all RapidMiner Server nodes via a high-speed LAN (it must be in the same physical data center).
  • Any database that RapidMiner Server otherwise supports can be used, except MySQL.

Shared file system

RapidMiner Server High Availability requires a high-performance shared file system, such as a SAN, NAS, RAID server, or file server optimized for I/O.

  • The shared file system must run on a dedicated machine.
  • The file system must be available to all cluster nodes via a high-speed LAN (it must be in the same physical data center).
  • The shared file system should be accessible via NFS as a single mount point.
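
A quick way to check that a node can actually use the shared mount is to write a small file and read it back. The following Python sketch assumes nothing about RapidMiner Server itself: the function name `probe_shared_mount` and the marker file name are hypothetical, and the mount point is whatever path you chose for the NFS mount. It confirms read and write access from one node only; it does not measure performance or prove that all nodes see the same files.

```python
import os
import uuid


def probe_shared_mount(mount_point: str) -> bool:
    """Write a marker file under mount_point, read it back, then remove it."""
    marker = os.path.join(mount_point, f".ha-probe-{uuid.uuid4().hex}")
    token = uuid.uuid4().hex
    try:
        with open(marker, "w", encoding="utf-8") as fh:
            fh.write(token)
        with open(marker, "r", encoding="utf-8") as fh:
            return fh.read() == token
    except OSError:
        # Missing directory, read-only mount, permission problems, and so on.
        return False
    finally:
        # Best-effort cleanup; ignore the error if the file was never created.
        try:
            os.remove(marker)
        except OSError:
            pass
```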

ActiveMQ broker

RapidMiner Server High Availability requires a remote ActiveMQ broker, because the bundled broker is not active in a High Availability setup.

  • The ActiveMQ broker must run on a dedicated machine.
  • Currently, only ActiveMQ version 5.14.5 has been tested and is officially supported. Newer 5.x versions may work, but you need to test them yourself.
  • You can use a standalone ActiveMQ installation or a clustered installation.
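
The version guidance above can be turned into a small check for deployment scripts. This is a minimal sketch in Python under stated assumptions: the function names and the three labels ("supported", "untested", "unsupported") are hypothetical and chosen for illustration. Only 5.14.5 is treated as supported, newer 5.x versions are flagged as untested, and everything else is flagged as unsupported.

```python
from typing import Tuple

# The only version the documentation lists as tested and officially supported.
TESTED_VERSION = (5, 14, 5)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '5.14.5' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def classify_activemq_version(text: str) -> str:
    """Classify an ActiveMQ version string using hypothetical labels."""
    version = parse_version(text)
    if version == TESTED_VERSION:
        return "supported"
    if version[0] == 5 and version > TESTED_VERSION:
        # Newer 5.x release: may work, but needs your own testing.
        return "untested"
    return "unsupported"
```

For example, `classify_activemq_version("5.14.5")` returns `"supported"`, while a newer 5.x version string returns `"untested"`.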