You are viewing the RapidMiner Server documentation for version 9.3.

Load balancer

Several solutions exist for load balancing traffic between instances of the same application, for example the load balancer in the commercial version of nginx or Elastic Load Balancing. This guide uses HAProxy, a widely used open-source load balancer that supports session stickiness. Like nginx, it uses a single-process, event-driven model, so it has a low memory footprint and can handle a large number of concurrent requests.

This article covers how to set up HAProxy to load balance between two RapidMiner Server instances, but SSL configuration is not covered by this guide.

Setup

Follow the steps in this article in order. The load balancer should run on a dedicated machine whose only job is to distribute traffic across the RapidMiner Server instances. This setup assumes an Ubuntu machine and that SSL is configured in an additional reverse proxy, not in the load balancer.

  1. Install haproxy with the package manager of your distribution. For Ubuntu there's a dedicated repository to install the haproxy package:

      sudo add-apt-repository ppa:vbernat/haproxy-1.8
      sudo apt-get update
      sudo apt-get install -y haproxy
    
  2. After the installation, the HAProxy configuration can be found at /etc/haproxy/haproxy.cfg. The default configuration is split into two sections: global and defaults. To change the user that runs the HAProxy process or to adapt the logging behaviour, edit those sections; see the HAProxy documentation for details. This basic setup leaves them unchanged and defines two additional sections: frontend and backend. The frontend section defines where HAProxy receives incoming traffic. The backend section defines the servers to which HAProxy forwards and load balances that traffic.

  3. Add the frontend section to your haproxy.cfg:

     frontend localnodes
         bind *:80
         mode http
         default_backend rapidminerservers
    

    In this example setup, HAProxy listens on all network interfaces (*) on port 80 and handles the traffic as HTTP. The frontend section serves as the traffic input: all incoming traffic on this port is load balanced between the nodes defined in the backend section rapidminerservers (the traffic output).

  4. Add the backend section to your haproxy.cfg:

     backend rapidminerservers
         mode http
         balance roundrobin
         option forwardfor
         http-request set-header X-Forwarded-Port %[dst_port]
         http-request add-header X-Forwarded-Proto https if { ssl_fc }
         option httpchk HEAD / HTTP/1.1\r\nHost:localhost
         cookie RAPIDMINER_SRV prefix
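         # The cookie keyword expects a value; rm1 and rm2 are example values (any distinct strings work)
         server rapidminerserver1 ip-address-of-first-instance:8080 cookie rm1 check
         server rapidminerserver2 ip-address-of-second-instance:8080 cookie rm2 check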
    
    • mode http: Handles the traffic as HTTP and passes HTTP requests to the servers listed.
    • balance roundrobin: Uses the round-robin strategy for load distribution.
    • option forwardfor: Adds the X-Forwarded-For header so RapidMiner Server instances can see the client's actual IP address. Without it, every incoming request would appear to come from the load balancer's IP address.
    • http-request set-header X-Forwarded-Port %[dst_port]: Sets the X-Forwarded-Port header so that RapidMiner Server instances know which port to use when redirecting.
    • http-request add-header X-Forwarded-Proto https if { ssl_fc }: Adds the X-Forwarded-Proto header with the value "https" when the client connection to HAProxy uses SSL/TLS (which is what ssl_fc tests). Like the X-Forwarded-Port header, this helps RapidMiner Server instances determine which scheme to use when sending redirects.
    • option httpchk HEAD / HTTP/1.1\r\nHost:localhost: Defines the health check HAProxy uses to test whether the RapidMiner Server instances are still responding: a HEAD request for / over HTTP/1.1 with the Host header set to localhost. An instance that repeatedly fails the check is marked as down and receives no traffic until it passes again.
    • cookie RAPIDMINER_SRV prefix: Enables cookie-based session persistence (sticky sessions) based on the cookie RAPIDMINER_SRV, to which HAProxy prefixes the identifier of the selected server.
    • server rapidminerserver1 ip-address-of-first-instance:8080 cookie <value> check: Adds a RapidMiner Server instance for HAProxy to balance traffic to. Set its IP address and port (RapidMiner Server's default port is 8080). The check keyword tells HAProxy to health check the server. The cookie keyword, followed by a value that is unique per server, lets HAProxy route a session back to the same server (stickiness).

    Ensure that the load balancer can reach all RapidMiner Server instances on port 8080.
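
Before starting HAProxy, it can help to confirm that the edited file parses and that each instance answers on port 8080. The following is a minimal shell sketch under stated assumptions: the configuration sits at the default path from step 2, curl is installed on the load balancer, and the helper names (validate_config, instance_url, probe_instance) are hypothetical and not part of HAProxy or RapidMiner Server.

```shell
#!/bin/sh
# Assumption: the configuration lives at the default path named in step 2.
CFG="${CFG:-/etc/haproxy/haproxy.cfg}"

# Parse the configuration file without starting the proxy.
# -c runs HAProxy in check mode; -f names the configuration file.
validate_config() {
    if ! command -v haproxy >/dev/null 2>&1; then
        echo "haproxy is not installed" >&2
        return 127
    fi
    haproxy -c -f "$1"
}

# Hypothetical helper: build the URL used to probe one instance.
# Arguments: <address> [port]; the port defaults to 8080.
instance_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-8080}"
}

# Send a HEAD request to an instance, similar to the httpchk line above.
# Succeeds if any HTTP response arrives within 5 seconds.
probe_instance() {
    curl --silent --head --max-time 5 --output /dev/null "$(instance_url "$1")"
}
```

For example, validate_config "$CFG" && probe_instance ip-address-of-first-instance succeeds only if the file parses and the first instance responds. The probe checks reachability only; it does not inspect the HTTP status code.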

  5. (Optional) Add a statistics website to monitor traffic and load balancing. Adjust your haproxy.cfg and add:

     listen stats
         bind *:1936
         stats enable
         stats uri /
         stats hide-version
         stats auth someUser:somePassword
    
  6. (Optional) To add another RapidMiner Server instance to HAProxy, add a line of the form server rapidminerserverX ip-address-of-instance:8080 cookie <unique-value> check to the backend section.
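
A small shell sketch of step 6, assuming the naming scheme rapidminerserverX from the backend section above. HAProxy's cookie server keyword expects a value, so the sketch gives each server a hypothetical value rmX. The function name server_line and the address 10.0.0.13 used below are placeholders.

```shell
#!/bin/sh
# Print one backend "server" line for an additional RapidMiner Server instance.
# Arguments: <index> <address> [port]; the port defaults to 8080.
server_line() {
    index="$1"
    address="$2"
    port="${3:-8080}"
    # rm<index> is an assumed per-server cookie value (any unique string works).
    printf '    server rapidminerserver%s %s:%s cookie rm%s check\n' \
        "$index" "$address" "$port" "$index"
}
```

Calling server_line 3 10.0.0.13 prints a line for a third instance that can be pasted into the backend section. HAProxy has to be restarted or reloaded (for example with sudo service haproxy reload) before it picks up the change.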

  7. Start the service with sudo service haproxy start (use restart instead of start if the service is already running). Depending on your distribution, you might first need to set ENABLED=1 in the /etc/default/haproxy config file. HAProxy then load balances between all configured RapidMiner Server instances. If you've configured the stats section, you can visit the load balancer's IP address on port 1936 and log in with the user someUser and the password somePassword to monitor HAProxy.
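
To check from the command line whether the instances are seen as up, the stats page can also be fetched as CSV by appending ;csv to the stats URI. The sketch below assumes the stats section from step 5 (URI /, port 1936, the example credentials) and that curl and awk are available. The helper names are hypothetical. The status column is located by its header name rather than by a fixed position.

```shell
#!/bin/sh
# Read HAProxy stats CSV on stdin and print "<proxy>/<server> <status>"
# for every server row (FRONTEND and BACKEND summary rows are skipped).
server_status() {
    awk -F, '
        NR == 1 {
            # The header line starts with "# "; strip it, then find "status".
            sub(/^# /, "", $1)
            for (i = 1; i <= NF; i++) if ($i == "status") col = i
            next
        }
        col && NF > 1 && $2 != "FRONTEND" && $2 != "BACKEND" { print $1 "/" $2, $col }
    '
}

# Fetch the CSV export from the stats listener of the load balancer in $1.
# someUser:somePassword are the example credentials from step 5.
fetch_stats() {
    curl --silent --user someUser:somePassword "http://$1:1936/;csv"
}
```

A typical call is fetch_stats ip-address-of-load-balancer | server_status, which prints one line per configured RapidMiner Server instance together with the status HAProxy reports for it.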