Process management with runit
What is runit?
runit is a program to manage (start, stop and keep running) processes. It was intended as a replacement for the standard Linux init, although the RubyWorks stack uses it as an add-on instead.
Init (or whatever is the native way to launch daemon processes on a particular platform) is used to launch runit during the boot sequence, and then runit takes care of starting all other processes comprising the RubyWorks stack.
runit is a nice, uniform way to control services. It works the same way on all flavors of Linux, and doesn’t need actual services to be able to run in detached mode (Mongrel is notoriously weak in this area).
It also shortens boot time quite noticeably, especially on minimalist Linux configurations, by starting all services simultaneously, whereas init does it one process at a time.
As mentioned above, at boot time init has to start one process, /usr/sbin/runsvdir.
On RedHat and CentOS, the RubyWorks installation adds this line to /etc/inittab:
On Debian and Ubuntu, runsvdir is started by
runsvdir then looks at the directory /var/service/. Every subdirectory there is a service, in other words, a process that runit has to start and manage.
For every service subdirectory in /var/service/, runsvdir spawns a service controller process (/usr/sbin/runsv), which executes the run script in the service’s subdirectory.
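The directory convention can be explored from a script. The following is a minimal sketch, not part of runit: list_services is a hypothetical helper that prints every subdirectory containing an executable run script, and it only approximates what runsvdir itself considers a service. The demo uses a scratch directory rather than /var/service/.

```shell
#!/bin/sh
# list_services DIR: print the name of every subdirectory of DIR that
# contains an executable run script. (Hypothetical helper, not a runit tool.)
list_services() {
    for dir in "$1"/*/; do
        [ -x "${dir}run" ] && basename "$dir"
    done
    return 0
}

# Demo against a scratch directory laid out like /var/service/.
demo=$(mktemp -d)
mkdir "$demo/haproxy" "$demo/mongrel_3002"
printf '#!/bin/sh\nexec sleep 1\n' > "$demo/haproxy/run"
chmod +x "$demo/haproxy/run"
list_services "$demo"    # mongrel_3002 has no run script, so only haproxy is listed
rm -rf "$demo"
```

On a machine with the stack installed, list_services /var/service would be the equivalent call.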
For example, HAProxy is launched by running /var/service/haproxy/run. Open that file in a text editor, and you will note that it starts HAProxy in foreground mode, using exec, a shell builtin that replaces the shell process with the program specified in its arguments. The end result is an haproxy process, which is a child of a runsv process, which is a child of runsvdir:
init (process #1)
  runsvdir (runit's main process)
    runsv haproxy (HAProxy service controller)
      haproxy (the service itself)
    runsv mongrel_3002 (service controller for Mongrel on port 3002)
      rails_mongrel -p 3002 (Mongrel service)
    ... (and so on)
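The reason the service ends up as a direct child of runsv, with no wrapper shell in between, is that exec keeps the process ID of the shell it replaces. Here is a small sketch of that behaviour. The script body is hypothetical: a second sh that prints its own PID stands in for a real foreground daemon such as HAProxy.

```shell
#!/bin/sh
# Hypothetical run script body, kept in a variable so the sketch is self-contained.
# The inner "sh -c" stands in for a foreground daemon command.
run_script='
echo "$$"                 # PID of the shell that runs the script
exec sh -c "echo \$\$"    # exec replaces that shell; the new program reports its PID
'
pids=$(sh -c "$run_script")
echo "$pids"              # two lines carrying the same PID
```

Without exec, the second line would show a different PID, because the shell would fork a child and stay around as its parent.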
Observe that actual services always have a parent controller process (runsv). If a service process (haproxy in the example above) ever terminates, its controller receives a SIGCHLD signal, and promptly responds to it by spawning another one, so the haproxy service is kept alive.
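The restart behaviour can be illustrated with a toy loop. This is only a sketch of the idea and not how runsv is implemented: supervise is a hypothetical function that blocks in wait instead of handling SIGCHLD, and it stops after a fixed number of starts so that the demo terminates.

```shell
#!/bin/sh
# supervise CMD MAX: run CMD as a child, restart it whenever it exits, and
# stop after MAX starts so the demo ends (a real supervisor loops forever).
supervise() {
    cmd=$1
    max=$2
    starts=0
    while [ "$starts" -lt "$max" ]; do
        starts=$((starts + 1))
        sh -c "$cmd" &        # start the service as a child process
        wait "$!"             # block until the child terminates
        echo "child exited with status $?, start count $starts"
    done
}

# A "service" that dies immediately with status 3 is started twice.
supervise 'exit 3' 2
```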
Another thing worth mentioning is that every service has a supervise directory, which has a number of files used by runit itself, but also some that are for the benefit of external programs and scripts.
Thus, the process id (PID) of the currently running haproxy process can be obtained by reading the contents of the /var/service/haproxy/supervise/pid file, and the status of the haproxy service (“up” or “down”) is in
Runit has a command-line interface to control services. It is called sv, and can be used like this:
  sv up /var/service/haproxy   #bring up haproxy service by running /var/service/haproxy/run
  sv start haproxy             #same as 'sv up haproxy', but then wait for 7 seconds and fail if haproxy isn't running
  sv kill haproxy              #terminate HAProxy harshly, by sending it SIGKILL
See man sv for many more options.
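Alongside sv, the supervise/pid file mentioned earlier can be read directly from a script. The sketch below assumes the file holds a single PID. service_pid is a hypothetical helper, and the demo runs against a scratch directory instead of a real service directory.

```shell
#!/bin/sh
# service_pid DIR: print the PID recorded in DIR/supervise/pid, or fail if the
# file is missing, empty, or names a process that is no longer alive.
service_pid() {
    pidfile="$1/supervise/pid"
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    kill -0 "$pid" 2>/dev/null || return 1   # signal 0 only tests for existence
    echo "$pid"
}

# Demo against a scratch directory laid out like a service directory.
demo=$(mktemp -d)
mkdir -p "$demo/supervise"
echo "$$" > "$demo/supervise/pid"    # pretend the current shell is the service
service_pid "$demo"
rm -rf "$demo"
```

On an installed stack the call would be service_pid /var/service/haproxy. A failing exit status is a hint that the service is down or is in the middle of being restarted.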
One thing you should keep in mind is that in the RubyWorks stack, running sv stop haproxy from the command line doesn’t really achieve the desired effect, because services are constantly monitored by a program called monit. As soon as it notices that HAProxy is down, it will ask runit to restart it.
Monit itself is the only RubyWorks service that should be started and stopped by
Reading recent error messages through ps
runsvdir has a useful feature called ‘ps logging’. If any process under its control writes anything to its stderr, it appears in runsvdir’s ps entry. So, to have a quick glance at what’s currently going on in the stack, run a command like this: ps -ef | grep runsvdir, and you may see something like this:
[root@vmtest haproxy]# ps -ef | grep runsvdir
root 3153 1 0 02:09 ? 00:00:01 runsvdir -P /var/service log: aproxy' process is not running? 'haproxy' trying to restart?monit: pidfile `/var/service/haproxy/supervise/pid' does not contain a valid pidnumber?monit: pidfile `/var/service/haproxy/supervise/pid' does not contain a valid pidnumber?'haproxy' start: /usr/bin/sv?monit: pidfile `/var/service/haproxy/supervise/pid' does not contain a valid pidnumber?'haproxy' process is running with pid 4081?...
The ps log has a fixed length, and runsvdir adds a dot to the end of it every 15 minutes. If nothing interesting (i.e., worth reporting in stderr) happens for a long time in any service, eventually you see just a long line of dots.
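Because the ps log arrives as one long line, it can be easier to read when split into separate messages. The sketch below assumes, based on the sample output above, that the text after "log: " is the log and that "?" marks the boundary between messages. split_pslog is a hypothetical helper, and the sample string is a shortened copy of that output.

```shell
#!/bin/sh
# split_pslog: read ps lines on stdin, keep the text after the first "log: ",
# print one message per line, and drop empty lines and lines made only of dots.
split_pslog() {
    while IFS= read -r line; do
        case $line in
            *"log: "*) printf '%s\n' "${line#*log: }" ;;
        esac
    done | tr '?' '\n' | sed '/^\.*$/d'
}

# Shortened sample of a runsvdir ps entry.
sample="root 3153 1 0 02:09 ? 00:00:01 runsvdir -P /var/service log: 'haproxy' trying to restart?'haproxy' process is running with pid 4081?..."
printf '%s\n' "$sample" | split_pslog
```

On a live system the input could come from ps -ef | grep '[r]unsvdir' instead. The bracket in the pattern keeps grep from matching its own command line.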