Load balancing with HAProxy
What is HAProxy?
HAProxy is an HTTP proxy: it receives an incoming HTTP request from a client (most likely an upstream web server), selects a free Mongrel instance, forwards the request to it, receives the response, and passes it back to wherever the request came from.
Why HAProxy?
Rails is not designed to run concurrently in multiple threads within the same process. To handle a large number of requests and fully utilize the available hardware, a Rails application must run in several processes. The RubyWorks stack uses a web server called Mongrel to execute application code and, by default, is configured to run four Mongrel processes.
Something must distribute the incoming traffic among the Mongrels. In the RubyWorks stack, HAProxy plays this role.
A number of sources recommend using Apache’s mod_proxy_balancer for the same purpose. HAProxy, however, has a big advantage over it.
Remember that Mongrel is essentially a single-threaded process. More precisely, each incoming request is assigned to a separate thread, but just before entering Rails code there is a lock that lets only one thread through at a time.
Imagine that one of your four Mongrels gets stuck for 5 minutes on an occasional request that takes a long time to execute. With a simple round-robin load balancer, every fourth request during those 5 minutes goes to the busy Mongrel instance, where it waits on the Rails lock until the long-running request finishes.
HAProxy, on the other hand, can be configured to send no more than one request at a time to each Mongrel. In other words, it always picks a Mongrel instance that is not busy with something else, and if all of them are busy, it holds the request in its queue until one becomes free.
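A hedged sketch of how this limit might be expressed in the listen section of haproxy.conf: the per-server maxconn parameter caps the number of concurrent requests HAProxy sends to that server. The server names (mongrel_3002 and so on) are hypothetical; the ports follow the 3002-3005 range used by the RubyWorks stack.

```haproxy
# Hypothetical excerpt from a listen section.
# maxconn 1 allows at most one in-flight request per Mongrel;
# any further requests wait in HAProxy's queue until a Mongrel is free.
server mongrel_3002 127.0.0.1:3002 maxconn 1
server mongrel_3003 127.0.0.1:3003 maxconn 1
server mongrel_3004 127.0.0.1:3004 maxconn 1
server mongrel_3005 127.0.0.1:3005 maxconn 1
```

With maxconn 1, a Mongrel stuck on a long request simply receives no new work until it finishes, instead of collecting every fourth request.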
Configuration
The HAProxy configuration is in the /etc/rails/haproxy.conf file. It consists of three sections, which start with the keywords global, defaults and listen.
The global section contains process-wide settings, such as the user and group to run under (rails/rails) and the syslog facility to send logs to (local0 on 127.0.0.1).
The listen section defines a service that receives requests on port 3001 and distributes them among downstream servers (the Mongrel processes) running on ports 3002-3005.
The defaults section contains settings that apply to every listen section. By default, the RubyWorks stack has only one listen section, but there may be several, for example if multiple Rails applications run on the same server.
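The layout described above might look roughly like the following. This is a hedged sketch reconstructed from the values mentioned in this section, not a copy of the file RubyWorks installs; the proxy name rails, the server names, and the settings not mentioned above (mode http, log global, balance roundrobin) are assumptions, and other options present in the real file are omitted.

```haproxy
# Hypothetical sketch of /etc/rails/haproxy.conf (abridged).
global
    # Run as user rails, group rails
    user  rails
    group rails
    # Send logs to syslog at 127.0.0.1, facility local0
    log   127.0.0.1 local0
    # debug

defaults
    # Assumption: HTTP mode, logging inherited from the global section
    mode       http
    log        global
    # Drop a request if Mongrel takes more than 30000 ms to answer
    srvtimeout 30000

# Proxy name "rails" is an assumption; listens on port 3001 on all interfaces
listen rails :3001
    balance roundrobin
    # One server line per Mongrel; maxconn 1 means one request at a time
    server mongrel_3002 127.0.0.1:3002 maxconn 1
    server mongrel_3003 127.0.0.1:3003 maxconn 1
    server mongrel_3004 127.0.0.1:3004 maxconn 1
    server mongrel_3005 127.0.0.1:3005 maxconn 1
```

A second Rails application would get its own listen section on a different port, sharing the same global and defaults settings.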
A sysadmin should read and understand every setting in /etc/rails/haproxy.conf (the file contains some helpful comments). In practice, the timeout settings are the ones most likely to affect an application.
The srvtimeout 30000 parameter in the defaults section makes HAProxy drop a request if Mongrel takes more than 30 seconds (30000 ms) to process it. If some of your requests take longer than that, you will have to increase the value.
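For example, to let requests run for up to two minutes, the defaults section might be changed as below. The value 120000 is purely illustrative; pick one comfortably above your slowest legitimate request, and restart HAProxy afterwards so that it rereads the file.

```haproxy
defaults
    # Server timeout in milliseconds: 120000 ms = 2 minutes (illustrative value)
    srvtimeout 120000
```

Newer HAProxy releases spell this setting timeout server; srvtimeout is the older form used in this file.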
Notes on process management and troubleshooting
The HAProxy process (/usr/bin/haproxy) is launched by runit, which executes this file:
/var/service/haproxy/run
To control the HAProxy process, run the following commands:
- Start: sudo monit start haproxy
- Stop: sudo monit stop haproxy
- Restart: sudo monit restart haproxy
The process ID of a running HAProxy process is stored in /var/service/haproxy/supervise/pid. If HAProxy is not running, the pid file is empty.
HAProxy sends its logs over the network (UDP) to the syslog daemon on localhost, using facility local0. Instructions on setting up syslog to write HAProxy logs to a file are on the Installation continued page.
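To turn on verbose logging, uncomment the debug directive in the global section of /etc/rails/haproxy.conf and restart HAProxy. Note that the extra debug output goes to the process's standard output, not to syslog.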
