Monitoring Sidekiq queues with middlewares

Sidekiq, like Rack, has a concept of middlewares: a list of wrappers around its processing logic that you can use to add custom behavior.
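To make the "list of wrappers" idea concrete, here is a toy chain in plain Ruby. It is only a sketch of the concept, not how Sidekiq implements it: TinyChain and Tracer are names I made up for illustration. Each middleware receives the worker, the job and the queue, and calls yield to pass control further down.

```ruby
# Toy middleware chain, for illustration only.
# TinyChain and Tracer are hypothetical names, not Sidekiq classes.
class TinyChain
  def initialize
    @entries = []
  end

  def add(klass, *args)
    @entries << klass.new(*args)
  end

  # Wraps the job block in every middleware; the first one added is outermost.
  def invoke(worker, job, queue, &job_block)
    wrapped = @entries.reverse.inject(job_block) do |inner, middleware|
      -> { middleware.call(worker, job, queue, &inner) }
    end
    wrapped.call
  end
end

class Tracer
  def initialize(name, log)
    @name = name
    @log = log
  end

  def call(_worker, _job, _queue)
    @log << "#{@name} before"
    yield # hand over to the next middleware, or to the job itself
  ensure
    @log << "#{@name} after"
  end
end

log = []
chain = TinyChain.new
chain.add Tracer, "outer", log
chain.add Tracer, "inner", log
chain.invoke(nil, { "class" => "EmptyJob" }, "default") { log << "job runs" }

puts log
# outer before
# inner before
# job runs
# inner after
# outer after
```

Because the "after" part sits in an ensure clause, it runs even when the job raises, which is exactly what a monitoring middleware needs.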

In chillout we use a Sidekiq middleware to collect and send a number of metrics:

  • how long did it take to process a job

    Obviously, it is nice to notice when a certain job starts to run much slower than usual.

  • how long did it take between scheduling a job and starting a job

    This tells you whether your Sidekiq workers are saturated. Ideally the number should be around 1-2ms, which means you are processing everything as it comes in, with no delay.

    Depending on what your application does, a delay of a second or two might be good enough as well. But if the number keeps growing, you have a problem: maybe you need more machines or threads, or you just need to investigate a temporary issue.

    If it is one job causing you problems, check out your options in "Handle sidekiq processing when one job saturates your workers and the rest queue up".

    I used to think that the number of unprocessed jobs is a good metric, but I think this one is better. It doesn’t matter if you have 1 or 10_000 jobs waiting if you can start all of them very quickly, because you have enough workers and the jobs are processed very quickly.

    The delay before processing is a better indicator than queue size because, from the size alone, you don’t know whether you have 1000 jobs which take 10ms each or 1 job which takes 10 minutes to finish. All you care about is the effect on other jobs waiting in the queues.

  • did it finish successfully or with a failure

    So that one can monitor the failure rate.

  • queue and job names

    To have granular metrics per job and per queue.
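The queue size versus delay argument is easy to check with a bit of arithmetic. The sketch below is a simplification under my own assumptions: a single FIFO queue, all jobs already waiting, and a hypothetical helper delay_for_new_job that hands each job to whichever thread frees up first. Durations are in milliseconds.

```ruby
# Hypothetical helper: how long (in ms) would a job enqueued right now wait
# before a thread picks it up, given the durations (ms) of the jobs queued
# ahead of it? Assumes one FIFO queue and identical threads.
def delay_for_new_job(queued_durations_ms, threads)
  free_at = Array.new(threads, 0)
  queued_durations_ms.each do |duration|
    earliest = free_at.index(free_at.min) # the thread that frees up first
    free_at[earliest] += duration
  end
  free_at.min
end

many_fast = Array.new(1000, 10) # 1000 jobs, 10 ms each
one_slow  = [600_000]           # 1 job, 10 minutes

puts delay_for_new_job(many_fast, 1)  # => 10000  (10 seconds)
puts delay_for_new_job(many_fast, 10) # => 1000   (1 second)
puts delay_for_new_job(one_slow, 1)   # => 600000 (10 minutes)
puts delay_for_new_job(one_slow, 2)   # => 0      (the second thread is free)
```

A queue of 1000 jobs costs the next job a second with ten threads, while a queue of a single slow job costs it ten minutes with one thread. The size of the queue says very little; the delay says it all.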

The code is very simple and middlewares are nicely explained in the Sidekiq documentation, so if you want to build your own logging or monitoring, it’s not hard.

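class SidekiqMonitor
  def initialize(options)
    @client = options.fetch(:client)
  end

  def call(_worker, job, queue)
    started = Time.now.utc
    success = false
    yield # run the job and any middlewares further down the chain
    success = true
  ensure
    # runs on success and on failure, so failed jobs are measured too
    enqueue(queue, job, started, success)
  end

  def enqueue(queue, job, started, success)
    finished = Time.now.utc
    @client.enqueue(
      SidekiqJobMeasurement.new(job, queue, started, finished, success)
    )
  end
end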

class SidekiqJobMeasurement
  attr_reader :retriable, :queue, :started,
    :finished, :delay, :duration, :success

  def initialize(job, queue, started, finished, success)
    @class     = job["class"].to_s
    @retriable = job["retry"].to_s
    @queue     = queue
    @started   = started.utc
    @finished  = finished.utc
    # enqueued_at is treated as epoch seconds; delay and duration are in ms
    enqueued_at = job["enqueued_at"]
    @delay = 1000.0 * (@started.to_f - enqueued_at)
    @duration = 1000.0 * (@finished.to_f - @started.to_f)
    @success = success.to_s
  end
end

Sidekiq.server_middleware.add SidekiqMonitor,
  client: client
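The only thing the middleware asks of the client is an enqueue method that accepts a measurement. If you want to play with the numbers before wiring up a real metrics backend, something like the following might do. It is a sketch under my own assumptions: InMemoryClient is a hypothetical class, and Measurement is a stand-in Struct exposing a few of the same readers as SidekiqJobMeasurement, so that the example runs on its own.

```ruby
# Stand-in for SidekiqJobMeasurement so this sketch is self-contained.
Measurement = Struct.new(:queue, :delay, :duration, :success, keyword_init: true)

# Hypothetical client that keeps measurements in memory instead of sending them.
class InMemoryClient
  attr_reader :measurements

  def initialize
    @measurements = []
  end

  def enqueue(measurement)
    @measurements << measurement
  end

  # Share of failed jobs per queue, between 0.0 and 1.0.
  # success is a string ("true"/"false"), as in SidekiqJobMeasurement.
  def failure_rate_by_queue
    @measurements.group_by(&:queue).transform_values do |ms|
      ms.count { |m| m.success == "false" }.fdiv(ms.size)
    end
  end

  # Largest observed delay (ms) per queue.
  def max_delay_by_queue
    @measurements.group_by(&:queue).transform_values { |ms| ms.map(&:delay).max }
  end
end

client = InMemoryClient.new
client.enqueue(Measurement.new(queue: "default", delay: 1.5, duration: 20.0, success: "true"))
client.enqueue(Measurement.new(queue: "default", delay: 900.0, duration: 25.0, success: "false"))
client.enqueue(Measurement.new(queue: "mailers", delay: 2.0, duration: 5.0, success: "true"))

p client.failure_rate_by_queue # => {"default"=>0.5, "mailers"=>0.0}
p client.max_delay_by_queue    # => {"default"=>900.0, "mailers"=>2.0}
```

A real client would more likely buffer the measurements and ship them somewhere in the background, so that the middleware adds as little time as possible to each job.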

Testing middlewares is also easy:

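# Assumes Minitest with Mocha; the test class name is illustrative.
class SidekiqMonitorTest < Minitest::Test
  def setup
    @client = mock("Client")
    Sidekiq::Testing.server_middleware.add SidekiqMonitor,
      client: @client
  end

  def teardown
    # assumed cleanup: unregister the middleware added in setup
    Sidekiq::Testing.server_middleware.remove SidekiqMonitor
  end

  class EmptyJob
    include Sidekiq::Worker
    def perform; end
  end

  def test_enqueues_stats
    @client.expects(:enqueue).with do |measurement|
      SidekiqJobMeasurement === measurement
    end
    Sidekiq::Testing.inline! { EmptyJob.perform_async }
  end

  class ErrorJob
    Doh = Class.new(StandardError)
    include Sidekiq::Worker
    def perform
      raise Doh
    end
  end

  def test_enqueues_stats_even_on_failure
    @client.expects(:enqueue).with do |measurement|
      SidekiqJobMeasurement === measurement &&
        measurement.success == "false"
    end
    Sidekiq::Testing.inline! do
      assert_raises(ErrorJob::Doh) do
        ErrorJob.perform_async
      end
    end
  end
end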
