If you care about your application's performance, you have to move extra work into the background when handling requests. One such task is collecting performance or business metrics. In this post I’ll show you how to avoid potential problems with threaded background workers.
I was working on the chillout client, which collects metrics from ActiveRecord creations. Initially the code sent the collected metrics during the request. That was simpler, but it slowed down the application's response to the customer and made the response time depend on the availability of the metrics endpoint. So I had the idea to start a background worker thread responsible for sending them. Since everything worked like a charm in development, a deployment was inevitable. Then things started to get hairy.
My production application was running on Unicorn, configured to preload the application code. In that setting the Unicorn master process boots the application and, once the code is loaded, forks into several application workers.
The problem with the fork call is that only the thread that called it survives:
Inside the child process, only one thread exists. It is made from a copy of the thread that called fork in the parent.
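A quick way to see this for yourself is a small script that starts a thread, forks, and asks the child whether the thread is still alive. This is only a sketch: the helper name `alive_in_child?` is made up for illustration, and it assumes a platform where `fork` is available (so not Windows).

```ruby
# Reports what `thread.alive?` returns inside a forked child process.
# The child sends its answer back to the parent through a pipe.
def alive_in_child?(thread)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(thread.alive? ? "1" : "0")
    writer.close
  end
  writer.close
  answer = reader.read
  reader.close
  Process.wait(pid)
  answer == "1"
end

# Stands in for a worker thread started while the application boots.
background = Thread.new { sleep }

puts "alive in parent: #{background.alive?}"                # prints "alive in parent: true"
puts "alive in child:  #{alive_in_child?(background)}"      # prints "alive in child:  false"
```

The thread object is still there in the child, but the thread behind it is gone.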
This means that under any forking server (e.g. Unicorn, Phusion Passenger) our background thread will be dead in every worker process, provided it was started before the process forked. You may think:
I know, I’ll use after_fork hook.
And this might be a solution for you and your specific web server. It definitely isn’t a solution when you don’t want to be tied to a particular deployment option or to explicitly support every web server's own hooks.
The other possibility is to start our worker thread lazily when it’s actually needed for the first time. A naive implementation may look like this:
    class MetricClient
      def initialize
        @queue = Queue.new
      end

      def enqueue(metric)
        start_worker unless worker_running?
        @queue << metric
      end

      def worker_running?
        @worker_thread && @worker_thread.alive?
      end

      def start_worker
        @worker_thread = Thread.new do
          worker = Worker.new(@queue)
          worker.run
        end
      end
    end
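The `Worker` class used above isn't shown in this post. To make the example easier to follow, here is a minimal sketch of what such a worker could look like. Everything in it is an assumption for illustration: the injectable `sender` callable, the `:stop` shutdown marker, and the error handling are not taken from the real chillout code.

```ruby
# Hypothetical worker: takes metrics off the queue one by one and hands
# them to a sender. The real worker would talk to the metrics endpoint.
class Worker
  def initialize(queue, sender = ->(metric) { puts "sending #{metric.inspect}" })
    @queue = queue
    @sender = sender
  end

  def run
    loop do
      metric = @queue.pop       # blocks until something is enqueued
      break if metric == :stop  # made-up shutdown marker for this sketch
      begin
        @sender.call(metric)
      rescue StandardError => e
        # A failing endpoint should not kill the worker thread.
        warn "metric dropped: #{e.message}"
      end
    end
  end
end
```

Because `Queue#pop` blocks while the queue is empty, the worker thread sleeps between metrics instead of spinning.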
Now that we have a lazy loading mechanism we’re good to deploy anywhere, right? Wrong! As soon as we deploy to a threaded server (e.g. Puma) we’ll encounter another problem.
With a threaded web server, several requests are served concurrently in one process. Each of the threads serving a request will race to start the background worker, but we want only one instance of the worker to exist. Thus we have to make the worker-starting code thread-safe:
    class MetricClient
      def initialize
        @queue = Queue.new
        @worker_mutex = Mutex.new
      end

      def enqueue(metric)
        ensure_worker_running
        @queue << metric
      end

      def ensure_worker_running
        # Fast path: skip the lock when the worker is already alive.
        return if worker_running?
        @worker_mutex.synchronize do
          # Check again: another thread may have started the worker
          # while we were waiting for the lock.
          return if worker_running?
          start_worker
        end
      end

      def worker_running?
        @worker_thread && @worker_thread.alive?
      end

      def start_worker
        @worker_thread = Thread.new do
          worker = Worker.new(@queue)
          worker.run
        end
      end
    end
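If you want to convince yourself that the locking does its job, you can hammer the client from many threads and count how often a worker gets started. The sketch below is an instrumented copy of the class, not the real client: `CountingMetricClient`, the `workers_started` counter, and the stand-in worker that simply discards metrics are all made up for this check.

```ruby
# Copy of the thread-safe client with a counter for started workers.
class CountingMetricClient
  attr_reader :workers_started

  def initialize
    @queue = Queue.new
    @worker_mutex = Mutex.new
    @workers_started = 0
  end

  def enqueue(metric)
    ensure_worker_running
    @queue << metric
  end

  private

  def ensure_worker_running
    return if worker_running?
    @worker_mutex.synchronize do
      return if worker_running?
      start_worker
    end
  end

  def worker_running?
    @worker_thread && @worker_thread.alive?
  end

  def start_worker
    @workers_started += 1 # only touched while holding the mutex
    # Stand-in worker: pops metrics and throws them away.
    @worker_thread = Thread.new { loop { @queue.pop } }
  end
end

# Enqueues from many threads at once and returns the number of workers started.
def workers_started_under_load(threads: 20, metrics_per_thread: 100)
  client = CountingMetricClient.new
  producers = threads.times.map do |t|
    Thread.new { metrics_per_thread.times { |i| client.enqueue([t, i]) } }
  end
  producers.each(&:join)
  client.workers_started
end

puts workers_started_under_load # prints 1
```

The second `worker_running?` check inside `synchronize` is what keeps the count at one: threads that lost the race find the worker already alive once they get the lock.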
Now we’re good to go on any forking or threaded web server. We’re covered even in the rare case of a web server forking into threaded workers (does it actually exist?). Life is good.
The case of BufferedLogger
There’s one peculiar thing left. If you use a logger in your worker thread and it is the BufferedLogger from Rails, you’ll be surprised to find that some of your messages don’t get logged. It’s a known and apparently solved issue. If you have to support apps that didn’t get the fix, just remember to explicitly call flush on the logger.
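One way to do that without tying the worker to a particular logger class is to flush only when the logger supports it. This is a sketch under assumptions: `log_from_worker` is a hypothetical helper, and the `respond_to?` guard is there because the standard library `Logger` has no `flush` method.

```ruby
require "logger"

# Logs a message from the worker thread and pushes it out right away
# when the logger is a buffering one that exposes `flush`.
def log_from_worker(logger, message)
  logger.info(message)
  logger.flush if logger.respond_to?(:flush)
end

# With a plain Logger the flush step is simply skipped.
log_from_worker(Logger.new($stdout), "metric sent")
```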