Comparison of approaches to multitenancy in Rails apps
Multitenancy means serving multiple independent customers from one app. Pretty typical for SaaS model. You can implement it on several different levels:
- Row level - you put a
tenant_idcolumn into every DB table and filter by
tenant_idin every query.
- Schema level - for every tenant you create a separate namespaced set of tables inside one database. Easily achievable with PostgreSQL schemas. See next paragraph on how this relates to MySQL.
- Database level - you setup a whole new DB for every tenant.
Here’s how they compare to each other:
|Tenant setup time
|⚡️ Just create a record
|🐢 Slower (need to create schema, create tables)
|🐌 Even slower + possible operational overhead
|Leaking data between tenants
|💥 If you forget a
|👍 Get a couple things right and you’ll be fine
|✅ You’d need to try hard to get one
tenant_id columns and filters all over the code
|Need shared tables or merging data across tenants
|👍 Can still be done in SQL
|🚫 In-app only, cannot do in SQL
|Running DB migrations
|👍 Standard Rails
|🛠 Occasionally at odds with Rails assumptions
|👍 Not really
|👍 Not really
|❓ What if pricing depends on the # of DBs?
|👍 Occasionally. You have an ever growing number of db tables.
|🛠 You now have a lot of databases
tenant_id keys everywhere
|🌴 an exotic PG feature & stateful
|🌍 Pretty much anywhere
|⚠️ Are you on a managed DB? Double check if all features and ops possible
|⚠️ Got rights to create databases? Can be done on the fly?
|Cost of switching
|⚡️ Just set a variable
|⚡️ Set the
search_path for the current db connection
|🐢 You need to establish a separate db connection
|Extract a single tenant’s data
|Per-tenant DB structure customizations
|✅ Even more possible
MySQL vs PostgreSQL schemas
MySQL has no feature like PostgreSQL schemas, but MySQL databases can be used in a similar way. You don’t need to establish another connection to change the database in MySQL - you can switch via the
use statement, similarly to what you’d do with PostgreSQL’s
set search_path. You can also similarly mix data from different databases by prefixing the table names.
The drawback is that in MySQL you need to make sure there’s no name collisions with other DBs. You also need to have create-database privileges to setup a new tenant. This can be a substantial difference if you don’t fully control the DB server. In case of PostgreSQL you only need the privilege to create new schemas inside your existing DB (and name collisions are constrained to it). This can work fine even on managed databases.
Quick reasons to pick one or another
|A lot of tenants?
|A lot of low-value tenants? (like abandoned accounts or free tiers)
|Less tenants and they’re high-value?
|schema-level more viable
|Anxious about data isolation? (ensuring no data leaks between tenants)
|Customers might require more data isolation for legal reasons?
|consider schema-level or even db-level
|On a managed or cloud hosted database?
|if you wanna go for schema-level make sure it all works for you
|Multitenantizing an existing single-tenant code base?
|schema-level might be easier to introduce
|row-level more viable
|Need to combine a lot of data across tenants
|schema-level possible, but row-level is a safer bet
|Some customers may have exceptional performance/capacity requirements
|consider enabling db-level
These three options don’t constitute the whole spectrum of approaches. For example:
- Even if you have db-level separation you can still choose whether to share application servers between tenants - which makes them run in the same process. If you don’t, you achieve even higher level of separation, which most people wouldn’t call multitenant.
- Even if a DB engine doesn’t facilitate namespaces (like PG schemas), it can still be done manually by prefixing table names like
tenant_123_users. Reportedly, this is how WordPress.com works.
- In row-level approach you can employ Row Level Security and achieve a higher level of isolation, but this can have implications on reusing db connections. Docs for PostgreSQL.
- With schema-level approach, you can start sharding larger numbers of schemas into multiple db servers - e.g. when reaching performance limits or when a particular tenant has higher data isolation requirements.
- Hybrid approach. It’s also possible to implement row-level multitenancy and still store the data in separate schemas/DBs (for some or all tenants). This way it’s easier to migrate one way or the other according to security/scaling needs.