Multitenancy with Postgres schemas: key concepts explained
… and check why 5600+ Rails engineers read also this
Multitenancy with Postgres schemas: key concepts explained
PostgreSQL schemas let you hold multiple instances of the same set of tables inside a single database. They’re essentially namespaces for tables. Sounds like a compelling way to implement multitenancy — at least in specific cases. But if you’re like me, you’d rather know exactly how it works before you rush to implement it. Let’s explain some basic concepts first: schema itself, search_path and session.
PostgreSQL schema
As I said before schema is basically a collection of tables (and other db “objects”). Schema is not the most fortunate name for it, as it can be confused with db schema in the sense of db structure. Namespace would be a more adequate name. Actually namespace is how PG calls it in some contexts - e.g. in pg_namespace
catalog.
Let’s open psql
and play a little.
Even before you create any schemas yourself, there’s already one schema there - the default schema, named public
. You can see it by listing all schemas with the psql command \dn
.
Alternatively, if you run select * from pg_namespace;
you’ll also see all schemas — along with some internal ones.
public
schema is where all your tables live by default - if you don’t specify any specific schema. So if you create a table CREATE TABLE things (name text)
it just ends up in the public schema, which you’ll see when listing all the tables with \dt
(of course provided you haven’t changed the search_path
- see next section).
my-db::DATABASE=> \dt
List of relations
Schema | Name | Type | Owner
--------+--------+-------+----------------
public | things | table | ydhtoowbnonxqk
(1 row)
Now let’s create a new schema: CREATE SCHEMA tenant_1
. You should now see another entry if you list all existing schemas with \dn
:
my-db::DATABASE=> \dn
List of schemas
Name | Owner
----------+----------------
public | ydhtoowbnonxqk
tenant_1 | ydhtoowbnonxqk
Now we can create a table inside this schema. You can do it by prefixing the table name with the schema name and a dot: CREATE TABLE tenant_1.things (name text)
.
You won’t see it when listing all tables with \dt
(again, provided you haven’t yet changed the search_path
). To list the table, run \dt
with an additional param:
my-db::DATABASE=> \dt tenant_1.*
List of relations
Schema | Name | Type | Owner
----------+--------+-------+----------------
tenant_1 | things | table | ydhtoowbnonxqk
So at this moment we should have two tables named things
which live in separate namespaces (schemas). To interact with them, you just prefix the table name with the schema name and a dot:
SELECT * FROM tenant_1.things;
SELECT * FROM public.things;
SELECT * FROM things; -- will query the default schema
To get rid of the schema, run DROP SCHEMA tenant_1;
. It’ll fail in this situation though, because there are tables in it. To remove it together with the tables, run DROP SCHEMA tenant_1 CASCADE;
.
PostgreSQL search_path
So far we accessed other schemas by using their fully qualified name: schema_name.table_name
. If we skip the schema name, the default schema is used — public
. Now, search_path
is a Postgres session variable that determines which schema is the default one. Let’s check its value:
my-db::DATABASE=> SHOW search_path; search_path
-----------------
"$user", public
As you can see, it has a couple comma-separated values. It works similarly to PATH
in a shell — if you try to access a table, Postgres first looks for it in the first schema listed in the search_path
— which is "$user"
. If it cannot find it there, it looks in the second one — public
. If it’s not there, then we get an error.
Now "$user"
is a special value that makes it actually look for a schema named after the user. Personally I’ve never used it. It’s just there by default. The ability to use multiple schemas also looks like a feature I’d rather not use, but sometimes you have to - e.g. to handle Postgres extensions - more here.
To change the search path, run: SET search_path = tenant_1
. If you now run SELECT * FROM things
, it will access tenant_1.things
.
To get back, you typically do SET search_path = public
— or whatever is your default. Sidenote: in Rails you can set your default schema via schema_search_path
option in database.yml
.
A good question to ask: what is the scope and lifetime of this variable. This is a session variable, i.e. it affects the current Postgres session and is discarded when the session closes.
Sidenote: there’s a variant — SET LOCAL
which works for a transaction instead of a session, but personally I’ve never had to use it. Another sidenote: search_path
resolution is actually more complex — apart from aforementioned session variable, it can be also permanently set for the whole DB or role.
PostgreSQL session
Now it makes sense to explain what exactly a Postgres session is — when it’s initiated and closed.
Postgres session, depending on context, might be referred to as Postgres connection or backend.
Under normal circumstances, whenever you establish a new connection to Postgres (e.g. via psql
or from Rails), a new Postgres session is instantiated. It’s closed when the connection closes. That perhaps explains the interchangeable usage of session and connection.
Actually when you get hold of a DB connection in Rails, you don’t always get a new DB connection — because there’s a connection pool (read more here). Connection pool is there to limit the number of DB connections Rails app can make. From the perspective of Postgres sessions, a crucial fact is that the DB connection is not necessarily closed if a Rails request releases it — it stays in the pool and might be used by a different request later. The consequence is that if you SET
a Postgress session variable in one request, it’ll be set in the next one — if it happens to run on the same DB connection. That’s why you have to make sure you always SET the desired search_path before you do any DB work in a request or background job.
Every session gets a separate OS process on the DB server, which you can check yourself by running e.g. ps aux | grep postgres
. You can also see the sessions by querying SELECT * FROM pg_stat_activity
— there’s a lot of useful data in it.
Now it’s worth saying that even if Rails makes an actual new connection to the DB, it usually means a new connection on the DB server, but not always — it depends on what stands inbetween the Rails app and the DB server. If you happen to have PgBouncer in your stack, sessions are managed by it. If PgBouncer runs in any mode different than session mode, you can even end up mixing tenant’s data. — read more here.
If you want to know what session you’re currently on, you may use pg_backend_pid()
. It’s basically the PID of this session’s backend (OS process).
p ActiveRecord::Base.connection.execute("select pg_backend_pid()").first
Feel like giving a comment? Reply under this tweet.