You are viewing the RapidMiner legacy documentation for version 9.8.
Load balancer
There are several solutions for load balancing traffic between different instances of the same application (for example the nginx load balancer in its commercial version, or Elastic Load Balancing). This guide presents HAProxy, the current open-source go-to solution for load balancing with support for session stickiness. Similar to nginx, it uses a single-process, event-driven model, so it has a low memory footprint and can handle a large number of concurrent requests.
This article covers how to set up HAProxy to load balance between two RapidMiner Server instances, but SSL configuration is not covered by this guide.
Setup
Be sure that you follow the steps outlined in this article. The load balancer should be a dedicated machine which is only responsible for redirecting traffic and load balancing several RapidMiner Server instances. In this setup we'll assume that you use an Ubuntu machine and that SSL configuration will not be done within the load balancer but within an additional reverse proxy.
Install haproxy with the package manager of your distribution. For Ubuntu there's a dedicated repository to install the haproxy package:

    sudo add-apt-repository ppa:vbernat/haproxy-1.8
    sudo apt-get update
    sudo apt-get install -y haproxy
After the installation, the HAProxy configuration can be found at /etc/haproxy/haproxy.cfg. The default configuration is split into two sections: global and defaults. If you want to change the user which runs the HAProxy process or adapt some logging behaviour, you can do this in those sections. See the HAProxy documentation for more details. For our basic setup we'll skip those and just define two additional sections: frontend and backend. The frontend section defines where HAProxy receives incoming traffic. The backend section defines where HAProxy redirects and load balances that traffic to.

Add the frontend section to your haproxy.cfg:

    frontend localnodes
        bind *:80
        mode http
        default_backend rapidminerservers
In this example setup, HAProxy listens for requests on all network interfaces (*) on port 80, but only for the HTTP protocol. The frontend section serves as traffic input. All incoming traffic on this port is load balanced between the nodes defined in the backend section rapidminerservers (traffic output).

Add the backend section to your haproxy.cfg:

    backend rapidminerservers
        mode http
        balance roundrobin
        option forwardfor
        http-request set-header X-Forwarded-Port %[dst_port]
        http-request add-header X-Forwarded-Proto https if { ssl_fc }
        option httpchk HEAD / HTTP/1.1\r\nHost:localhost
        cookie RAPIDMINER_SRV prefix
        server rapidminerserver1 ip-address-of-first-instance:8080 cookie rapidminerserver1 check
        server rapidminerserver2 ip-address-of-second-instance:8080 cookie rapidminerserver2 check

mode http: Pass HTTP requests to the servers listed.

balance roundrobin: Use the round-robin strategy for load distribution.

option forwardfor: Add the X-Forwarded-For header so that RapidMiner Server instances can get the client's actual IP address. Without this, RapidMiner Server instances would see every incoming request as coming from the load balancer's IP address.

http-request set-header X-Forwarded-Port %[dst_port]: Manually add the X-Forwarded-Port header so that RapidMiner Server instances know which port to use when redirecting.

option httpchk HEAD / HTTP/1.1\r\nHost:localhost: Set the health check HAProxy uses to test whether the RapidMiner Server instances are still responding. If an instance fails to respond without error, it is removed from the rotation. The check sends a HEAD request using HTTP/1.1 with the Host header set.

http-request add-header X-Forwarded-Proto https if { ssl_fc }: Add the X-Forwarded-Proto header and set it to "https" if the "https" scheme is used instead of "http" (detected via ssl_fc). Similar to the forwarded-port header, this helps RapidMiner Server instances determine which scheme to use when sending redirects.

cookie RAPIDMINER_SRV prefix: Add a unique session identifier. This enables sticky sessions.

server rapidminerserver1 ip-address-of-first-instance:8080 cookie rapidminerserver1 check: Add the RapidMiner Server instances for HAProxy to balance traffic between. Set each instance's IP address and port (RapidMiner Server's default port is 8080). The cookie parameter takes a value that identifies the server (here the server name) and tells HAProxy to always re-use the same server for a session (stickiness). The check parameter tells HAProxy to health check the server.
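How round-robin balancing and cookie-based stickiness interact can be illustrated with a small toy model. This is only a simplified sketch, not HAProxy's actual implementation: it assumes each server has a distinct cookie value (the hypothetical rm1 and rm2, with made-up addresses) and that the session cookie reaches the client as the server value, a "~" delimiter, and the application's own cookie value, which is how HAProxy's prefix mode is documented to behave.

```python
from itertools import cycle


class StickyRoundRobin:
    """Toy model of `balance roundrobin` combined with a prefixed session cookie."""

    def __init__(self, servers):
        # servers maps a (hypothetical) cookie value to a backend address
        self.servers = dict(servers)
        self._next = cycle(self.servers)  # endless rotation over the cookie values

    def route(self, session_cookie=None):
        """Return (cookie_value, address) for one request.

        session_cookie is the cookie as the client sends it, e.g. "rm1~abc123".
        """
        if session_cookie and "~" in session_cookie:
            prefix = session_cookie.split("~", 1)[0]
            if prefix in self.servers:
                # Known prefix: stay on the server that started the session
                return prefix, self.servers[prefix]
        # No usable cookie: pick the next server in the rotation
        choice = next(self._next)
        return choice, self.servers[choice]


lb = StickyRoundRobin({"rm1": "10.0.0.1:8080", "rm2": "10.0.0.2:8080"})
first = lb.route()            # new client
second = lb.route()           # another new client
sticky = lb.route("rm1~abc")  # returning client with a prefixed cookie
print(first, second, sticky)
```

New clients alternate between the two servers, while a request that carries a cookie with a known prefix keeps going to the same server without advancing the rotation.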
Ensure that the load balancer can reach all RapidMiner Server instances on port 8080.

(Optional) Add a statistics website to monitor traffic and load balancing. Adjust your haproxy.cfg and add:

    listen stats
        bind *:1936
        stats enable
        stats uri /
        stats hide-version
        stats auth someUser:somePassword
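Besides the HTML page, HAProxy's statistics endpoint can also return CSV when ";csv" is appended to the stats URI. As a rough sketch of how that output might be used for monitoring, the following helper extracts the status of each server. The function name parse_haproxy_stats and the sample text are made up for illustration; real output contains many more columns than the three shown here.

```python
import csv
import io


def parse_haproxy_stats(csv_text):
    """Map (proxy name, server name) to the reported status, e.g. "UP" or "DOWN"."""
    # The header row starts with "# "; strip it so DictReader sees plain column names
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("# ")))
    return {(row["pxname"], row["svname"]): row["status"] for row in reader}


# Abbreviated, made-up sample; real output has many more columns
sample = (
    "# pxname,svname,status\n"
    "rapidminerservers,rapidminerserver1,UP\n"
    "rapidminerservers,rapidminerserver2,DOWN\n"
)
print(parse_haproxy_stats(sample))
```

With the stats section configured as in this guide, the CSV text would be fetched from port 1936 of the load balancer at the path /;csv, using the configured stats user and password.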
(Optional) If you want to add another RapidMiner Server instance to HAProxy, add a line such as

    server rapidminerserverX ip-address-of-instance:8080 cookie rapidminerserverX check

to the backend section, giving each server its own cookie value.

The load balancer is ready to serve after you've started the service with sudo service haproxy start. Depending on the distribution you use, you might need to set ENABLED=1 in the /etc/default/haproxy config file. HAProxy will then load balance between all configured RapidMiner Server instances. If you've configured the stats listener, you can now visit the load balancer's IP address on port 1936 and log in with the user someUser and the password somePassword to monitor HAProxy.
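To check from the load balancer machine that each RapidMiner Server instance is reachable on port 8080, a probe similar to the configured health check can be run by hand. The following is a minimal sketch using only the Python standard library. The function name is_healthy is hypothetical, and treating 2xx and 3xx responses as healthy is an assumption modelled on HAProxy's default handling of HTTP checks.

```python
import http.client


def is_healthy(host, port=8080, timeout=3.0):
    """Send `HEAD /` with a Host header and report whether the reply is 2xx or 3xx."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/", headers={"Host": "localhost"})
        return 200 <= conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or malformed reply
        return False
    finally:
        conn.close()


# Example call (replace the placeholder with a real address):
# is_healthy("ip-address-of-first-instance")
```

Calling is_healthy once per configured instance gives a quick reachability overview. An instance that returns False here would most likely also be marked as down by HAProxy's own check.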