Sidekiq, similarly to Rack, has a concept of middlewares. A list of wrappers around its processing logic that you can use to include custom behavior.

In chillout we use it to collect and send a number of metrics:

  • how long did it take to process a job

    Obviously it is nice to notice when a certain jobs starts to work much slower than usually.

  • how long did it take between scheduling a job and starting a job

    This is useful to know if your Sidekiq workers are not saturated. Ideally the numbers should be around 1-2ms, which means you are processing everything as it comes and have no delay.

    Depending on what your application does a second or two of a delay might be good enough as well. But if the number is getting higher it means you are having problems and maybe you need more machines, threads or just investigate a temporary issue.

    If it is one job causing you problems, check out your options in Handle sidekiq processing when one job saturates your workers and the rest queue up.

    I used to think that number of unprocessed jobs is a good metric, but I think this is better. I doesn’t matter if you have 1 or 10_000 jobs waiting if you can start all of them very quickly because you have enough workers and the jobs are processed very quickly.

    The delay before processing is a better indicator than queue size. Because you don’t know if you have 1000 jobs which take 10ms each, or 1 job which takes 10 minutes to finish. And all you care about is the effect on other jobs waiting in queues.

  • did it finish successfully or with a failure

    So that one can monitor a failure rate

  • queue and job names

    To have granular metrics per jobs and queues.

The code is very simple and nicely explained in Sidekiq documentation so if you want to build your own logging or monitoring, it’s not hard.

class SidekiqMonitor
  def initialize(options)
    @client = options.fetch(:client)
  end

  def call(_worker, job, queue)
    started = Time.now.utc
    success = false
    yield
    success = true
  ensure
    enqueue(queue, job, started, success)
  end

  def enqueue(queue, job, started, success)
    finished = Time.now.utc
    @client.enqueue(SidekiqJobMeasurement.new(
      job,
      queue,
      started,
      finished,
      success
    ))
  end
end

class SidekiqJobMeasurement
  attr_reader :retriable, :queue, :started,
    :finished, :delay, :duration, :success

  def initialize(job, queue, started, finished, success)
    @class     = job["class"].to_s
    @retriable = job["retry"].to_s
    @queue     = queue
    @started   = started.utc
    @finished  = finished.utc
    enqueued_at = job["enqueued_at"]
    @delay = 1000.0 * (@started.to_f - enqueued_at)
    @duration = 1000.0 * (@finished.to_f - @started.to_f)
    @success = success.to_s
  end
end

Sidekiq.server_middleware.add SidekiqMonitor,
  client: client

Effect (click to enlarge):

Testing middlewares is also easy:

  def setup
    @client = mock("Client")
    Sidekiq::Testing.server_middleware.add SidekiqMonitor,
      client: client
  end

  def teardown
    Sidekiq::Testing.server_middleware.clear
  end

  class EmptyJob
    include Sidekiq::Worker
    def perform; end
  end

  def test_enqueues_stats
    @client.expects(:enqueue).with do |measurement|
      SidekiqJobMeasurement === measurement
    end
    Sidekiq::Testing.inline! { EmptyJob.perform_async }
  end

  class ErrorJob
    Doh = Class.new(StandardError)
    include Sidekiq::Worker
    def perform
      raise Doh
    end
  end

  def test_enqueues_stats_even_on_failure
    @client.expects(:enqueue).with do |measurement|
      SidekiqJobMeasurement === measurement &&
        measurement.success == "false"
    end
    Sidekiq::Testing.inline! do
      assert_raises(ErrorJob::Doh) do
        ErrorJob.perform_async
      end
    end
  end