Multi tenant applications with horizontal sharding and Rails Event Store

Horizontal sharding was introduced in Rails 6.1 (just the API, with later additions to support automatic shard selection). This lets us easily build multi-tenant applications with each tenant’s data kept in a separate database. In this post I will explore how to build such an app with separate event store data for each tenant.

Application idea

Let’s sketch some non-functional requirements for our sample application.

  • First: all tenant data is separated into a shard database,
  • Second: tenant management, shared data, cache & queues use a single shared database (the admin application),
  • Third: embrace async, all event handlers will be async and implemented using Solid Queue,
  • Fourth: each tenant uses a separate domain.

And one last thing … the tenant database setup will be static, defined in the config/database.yml file. This means that adding a new tenant requires database setup, a config file update & an application deployment.

So after generating a new Rails 8 application let’s make it work.

Database configuration

Rails documentation provides a very good description of how to configure your application to use horizontal sharding.

What you have to do is to define all shards (remember the primary one for the default shard, which we will use for the admin data) in each of the application environments.

Our application’s config/database.yml file will look like this:

default: &default
  adapter: sqlite3
  pool: 5
  timeout: 5000

development:
  primary:
    <<: *default
    database: storage/development.sqlite3
    migrations_paths: db/migrate
  cache:
    <<: *default
    database: storage/development_cache.sqlite3
    migrations_paths: db/cache_migrate
  queue:
    <<: *default
    database: storage/development_queue.sqlite3
    migrations_paths: db/queue_migrate
  cable:
    <<: *default
    database: storage/development_cable.sqlite3
    migrations_paths: db/cable_migrate
  arkency:
    <<: *default
    database: storage/development.arkency.sqlite3
    migrations_paths: db/shards
  railseventstore:
    <<: *default
    database: storage/development.railseventstore.sqlite3
    migrations_paths: db/shards

Remember to specify migration paths, or you’ll end up with your schema messed up.

Database schema

To create/setup/migrate the database you use the typical Rails database tasks.

By default the Rails tasks run for all shards. If you want to run a task on a single shard, append : and the shard name to the task name, as in this example:

bin/rails db:migrate:<shard_name>

To generate migrations for a specific shard use the --database parameter to specify the shard where the generated migration should be executed:

bin/rails generate migration XXXX --database=<shard_name>

Define models

Because we have a mix of ActiveRecord models, some reaching the primary shard for tenant & shared data, and others used only to read from or write to specific shards (just the tenant’s business data), we need to define different abstract base classes for these two kinds of model classes.

Mark your “default” base class with primary_abstract_class and set it to always connect to the primary database (default shard).

class ApplicationRecord < ActiveRecord::Base
  primary_abstract_class
  connects_to database: { writing: :primary, reading: :primary }
end

For the sharded models we have to define the base class as:

class ShardRecord < ApplicationRecord
  self.abstract_class = true

  connects_to shards: {
    arkency: { writing: :arkency, reading: :arkency },
    railseventstore: { writing: :railseventstore, reading: :railseventstore }
  }
end

Remember to add an entry here when a new tenant is defined.
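
Since every tenant shard maps to a database configuration of the same name, the hash passed to connects_to could also be derived from a single list of tenant names, which leaves one place to edit when a tenant is added. Here is a minimal sketch in plain Ruby. TENANT_SHARDS and shard_connections are hypothetical names used for illustration; they are not part of Rails or of the sample application.

```ruby
# Hypothetical list of tenant shard names; each one is assumed to match
# a key defined in config/database.yml.
TENANT_SHARDS = %i[arkency railseventstore].freeze

# Builds a hash in the shape expected by `connects_to shards:`, where every
# shard reads from and writes to the database configuration of the same name.
def shard_connections(names)
  names.to_h { |name| [name, { writing: name, reading: name }] }
end

p shard_connections(TENANT_SHARDS)
```

In ShardRecord this could then be written as connects_to shards: shard_connections(TENANT_SHARDS), assuming the constant and the helper are loaded before the model class.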

Rails Event Store setup

Having defined the shards and an abstract base class that allows us to connect to a selected shard (how it is selected will be explained below), we can set up our RailsEventStore::Client instance.

There are three things to notice in the RES setup that are not typical.

1st: Event repository

We have to define the RES repository as an instance of RubyEventStore::ActiveRecord::EventRepository with a specific model_factory. ShardRecord should be used here as the abstract base class. This allows RES to connect to a specific shard database.

2nd: Asynchronous dispatcher

The RES documentation recommends RailsEventStore::AfterCommitAsyncDispatcher if you want to handle domain events asynchronously. However, since Rails introduced enqueue_after_transaction_commit in ActiveJob::Base, all we need is:

class ApplicationJob < ActiveJob::Base
  self.enqueue_after_transaction_commit = true
end

and then just RubyEventStore::ImmediateAsyncDispatcher with RailsEventStore::ActiveJobScheduler is enough.

… which is very convenient, as RailsEventStore::AfterCommitAsyncDispatcher does not check what database it is connected to :(

3rd: Store shard in domain event’s metadata

The app uses an automatic shard selector & resolver (a Rails feature). However, this works only within the scope of a web request. Asynchronously executed application jobs, scripts, etc. need to select the valid shard manually. That’s why the current shard is stored in each domain event’s metadata.

This information is used in the event handlers. First, to connect to the same shard as the handled domain event. Second, to set up the RES client’s metadata so that all domain events published by the event handler keep the shard information. This is implemented in the ShardedHandler module.

module ShardedHandler
  def perform(event)
    shard = event.metadata[:shard] || :default
    ActiveRecord::Base.connected_to(role: :writing, shard: shard.to_sym) do
      Rails
        .configuration
        .event_store
        .with_metadata(shard: shard) { super }
    end
  end
end
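
The to_sym call above is easier to follow once you see what a JSON round trip does to a symbol value. The sketch below uses only Ruby’s standard json library and a made-up metadata hash. It does not touch RES itself, and json_round_trip is a hypothetical helper, so treat it as an illustration of the serialization behaviour rather than of how RES restores metadata.

```ruby
require "json"

# Serializes a hash to JSON and parses it back, similar in spirit to what
# happens to a payload that travels through a JSON-serialized job queue.
def json_round_trip(metadata)
  JSON.parse(JSON.generate(metadata), symbolize_names: true)
end

restored = json_round_trip({ shard: :arkency })
shard = restored[:shard] || :default

puts shard.inspect        # a String: JSON has no symbol type
puts shard.to_sym.inspect # converted back to a Symbol, as in ShardedHandler
```

Calling to_sym on either a String or a Symbol is safe, so the handler also works when the fallback :default is used.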

RES setup

The complete setup of RailsEventStore::Client looks like this:

Rails.configuration.to_prepare do
  Rails.configuration.event_store = RailsEventStore::Client.new(
    repository: RubyEventStore::ActiveRecord::EventRepository.new(
      model_factory: RubyEventStore::ActiveRecord::WithAbstractBaseClass.new(ShardRecord),
      serializer: JSON,
    ),
    dispatcher: RubyEventStore::ComposedDispatcher.new(
      RubyEventStore::ImmediateAsyncDispatcher.new(
        scheduler: RailsEventStore::ActiveJobScheduler.new(serializer: JSON)
      ),
      RubyEventStore::Dispatcher.new
    ),
    request_metadata: ->(env) do
      request = ActionDispatch::Request.new(env)
      { remote_ip: request.remote_ip, request_id: request.uuid, shard: ShardRecord.current_shard.to_s }
    end,
  )
end

Automatic shard selection

The Rails documentation describes how the framework lets you define automatic database/shard selection.

First generate an initializer file using:

bin/rails g active_record:multi_db

In the sample application we need to modify it to match our needs.

Rails.application.configure do
  config.active_record.shard_selector = { lock: false, class: "ShardRecord" }
  config.active_record.shard_resolver = ->(request) { Tenant.find_by(host: request.host)&.shard || :default }
end

We set shard_selector to use the ShardRecord class as the source of information about defined shards. Setting lock: false allows the application to switch between shards within a request (we then need to handle the switching manually; the Rails documentation describes how).

Rails documentation advises:

For tenant based sharding, lock should always be true to prevent application code from mistakenly switching between tenants.

However, we have the requirement to keep shared data in a separate database and only keep the tenant’s business data in its shard. That’s why we allow switching shards.

The shard_resolver is very simple here: it uses the request’s host name to find the tenant in the shared database and then calls its shard method to find out which shard the tenant’s data should be stored in or read from.
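
The lookup-with-fallback logic of the resolver can be tried out in plain Ruby. In the sketch below, Tenant is a Struct standing in for the ActiveRecord model, and the host names, the TENANTS list and resolve_shard are all hypothetical, made up for illustration.

```ruby
# Stand-in for the Tenant model; in the application this is an ActiveRecord
# model stored in the shared (primary) database.
Tenant = Struct.new(:host, :shard)

# Hypothetical tenant data, for illustration only.
TENANTS = [
  Tenant.new("arkency.example.test", :arkency),
  Tenant.new("res.example.test", :railseventstore)
].freeze

# Mirrors the shard_resolver lambda: find the tenant by host and fall back
# to :default when no tenant matches.
def resolve_shard(host)
  TENANTS.find { |tenant| tenant.host == host }&.shard || :default
end

puts resolve_shard("arkency.example.test").inspect
puts resolve_shard("unknown.example.test").inspect
```

The safe navigation operator (&.) is what keeps an unknown host from raising: find returns nil, shard is never called, and the || :default fallback applies.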

Event handlers

Rails handles shard selection for us during a web request. But as I’ve mentioned before, this will not work in asynchronously processed application jobs, that is, in event handlers. RailsEventStore already defines helper modules for async handlers: RailsEventStore::AsyncHandler to handle domain events asynchronously and RailsEventStore::CorrelatedHandler to ensure traceability by setting correlation & causation ids for published events. By defining (above) the ShardedHandler module we ensure that the event handler code is executed using a valid shard connection and that each published domain event still has the shard information in its metadata.

class LogVisitsByIp < ApplicationJob
  prepend ShardedHandler
  prepend RailsEventStore::CorrelatedHandler
  prepend RailsEventStore::AsyncHandler

  def perform(event)
   ...
  end
end

By prepending these three modules to the event handler we keep its code clean, free from infrastructure “plumbing”.
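
The order of the prepend calls matters, because the module prepended last runs first. The sketch below shows this with stand-in modules that only record when they run. The Fake* names are hypothetical and are not the real RES modules. In the real handler this ordering means AsyncHandler, which as far as I understand rebuilds the event from the job payload, runs before ShardedHandler reads event.metadata.

```ruby
# Stand-ins that only record the order in which they run.
module FakeShardedHandler
  def perform(log)
    log << :sharded
    super
  end
end

module FakeCorrelatedHandler
  def perform(log)
    log << :correlated
    super
  end
end

module FakeAsyncHandler
  def perform(log)
    log << :async
    super
  end
end

class FakeHandler
  # Same prepend order as in LogVisitsByIp.
  prepend FakeShardedHandler
  prepend FakeCorrelatedHandler
  prepend FakeAsyncHandler

  def perform(log)
    log << :handler
  end
end

p FakeHandler.new.perform([]) # => [:async, :correlated, :sharded, :handler]
```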

Using SolidQueue

Because application jobs are executed using a tenant’s shard connection and we want to avoid separate queues for each tenant, the queue connections have to be defined explicitly.

This is set in the config/environments/<env>.rb files. For each shard (including default) we have to define the database that SolidQueue will connect to.

  ...
  config.active_job.queue_adapter = :solid_queue
  config.solid_queue.connects_to = {
    shards: {
      default: { writing: :queue },
      arkency: { writing: :queue },
      railseventstore: { writing: :queue }
    }
  }
  ...

Summary

The complete sample application for this post can be found in the RES examples repository. You might also like my previous post, which shows a different way of separating data in a Rails application.
