Multitenancy with Postgres schemas: key concepts explained

… and check why 5600+ Rails engineers read also this

Multitenancy with Postgres schemas: key concepts explained

PostgreSQL schemas let you hold multiple instances of the same set of tables inside a single database. They’re essentially namespaces for tables. Sounds like a compelling way to implement multitenancy — at least in specific cases. But if you’re like me, you’d rather know exactly how it works before you rush to implement it. Let’s explain some basic concepts first: schema itself, search_path and session.

PostgreSQL schema

As I said before schema is basically a collection of tables (and other db “objects”). Schema is not the most fortunate name for it, as it can be confused with db schema in the sense of db structure. Namespace would be a more adequate name. Actually namespace is how PG calls it in some contexts - e.g. in pg_namespace catalog.

Let’s open psql and play a little.

Even before you create any schemas yourself, there’s already one schema there - the default schema, named public. You can see it by listing all schemas with the psql command \dn.

Alternatively, if you run select * from pg_namespace; you’ll also see all schemas — along with some internal ones.

public schema is where all your tables live by default - if you don’t specify any specific schema. So if you create a table CREATE TABLE things (name text) it just ends up in the public schema, which you’ll see when listing all the tables with \dt (of course provided you haven’t changed the search_path - see next section).

my-db::DATABASE=> \dt
            List of relations
 Schema |  Name  | Type  |     Owner
--------+--------+-------+----------------
 public | things | table | ydhtoowbnonxqk
(1 row)

Now let’s create a new schema: CREATE SCHEMA tenant_1. You should now see another entry if you list all existing schemas with \dn:

my-db::DATABASE=> \dn
     List of schemas
   Name   |     Owner
----------+----------------
 public   | ydhtoowbnonxqk
 tenant_1 | ydhtoowbnonxqk

Now we can create a table inside this schema. You can do it by prefixing the table name with the schema name and a dot: CREATE TABLE tenant_1.things (name text).

You won’t see it when listing all tables with \dt (again, provided you haven’t yet changed the search_path). To list the table, run \dt with an additional param:

my-db::DATABASE=> \dt tenant_1.*
             List of relations
  Schema  |  Name  | Type  |     Owner
----------+--------+-------+----------------
 tenant_1 | things | table | ydhtoowbnonxqk

So at this moment we should have two tables named things which live in separate namespaces (schemas). To interact with them, you just prefix the table name with the schema name and a dot:

SELECT * FROM tenant_1.things;
SELECT * FROM public.things;
SELECT * FROM things;  -- will query the default schema

To get rid of the schema, run DROP SCHEMA tenant_1;. It’ll fail in this situation though, because there are tables in it. To remove it together with the tables, run DROP SCHEMA tenant_1 CASCADE;.

PostgreSQL search_path

So far we accessed other schemas by using their fully qualified name: schema_name.table_name. If we skip the schema name, the default schema is used — public. Now, search_path is a Postgres session variable that determines which schema is the default one. Let’s check its value:

my-db::DATABASE=> SHOW search_path;                                                                                                                        search_path
-----------------
 "$user", public

As you can see, it has a couple comma-separated values. It works similarly to PATH in a shell — if you try to access a table, Postgres first looks for it in the first schema listed in the search_path — which is "$user". If it cannot find it there, it looks in the second one — public. If it’s not there, then we get an error.

Now "$user" is a special value that makes it actually look for a schema named after the user. Personally I’ve never used it. It’s just there by default. The ability to use multiple schemas also looks like a feature I’d rather not use, but sometimes you have to - e.g. to handle Postgres extensions - more here.

To change the search path, run: SET search_path = tenant_1. If you now run SELECT * FROM things, it will access tenant_1.things.

To get back, you typically do SET search_path = public — or whatever is your default. Sidenote: in Rails you can set your default schema via schema_search_path option in database.yml.

A good question to ask: what is the scope and lifetime of this variable. This is a session variable, i.e. it affects the current Postgres session and is discarded when the session closes.

Sidenote: there’s a variant — SET LOCAL which works for a transaction instead of a session, but personally I’ve never had to use it. Another sidenote: search_path resolution is actually more complex — apart from aforementioned session variable, it can be also permanently set for the whole DB or role.

PostgreSQL session

Now it makes sense to explain what exactly a Postgres session is — when it’s initiated and closed.

Postgres session, depending on context, might be referred to as Postgres connection or backend.

Under normal circumstances, whenever you establish a new connection to Postgres (e.g. via psql or from Rails), a new Postgres session is instantiated. It’s closed when the connection closes. That perhaps explains the interchangeable usage of session and connection.

Actually when you get hold of a DB connection in Rails, you don’t always get a new DB connection — because there’s a connection pool (read more here). Connection pool is there to limit the number of DB connections Rails app can make. From the perspective of Postgres sessions, a crucial fact is that the DB connection is not necessarily closed if a Rails request releases it — it stays in the pool and might be used by a different request later. The consequence is that if you SET a Postgress session variable in one request, it’ll be set in the next one — if it happens to run on the same DB connection. That’s why you have to make sure you always SET the desired search_path before you do any DB work in a request or background job.

Every session gets a separate OS process on the DB server, which you can check yourself by running e.g. ps aux | grep postgres. You can also see the sessions by querying SELECT * FROM pg_stat_activity — there’s a lot of useful data in it.

Now it’s worth saying that even if Rails makes an actual new connection to the DB, it usually means a new connection on the DB server, but not always — it depends on what stands inbetween the Rails app and the DB server. If you happen to have PgBouncer in your stack, sessions are managed by it. If PgBouncer runs in any mode different than session mode, you can even end up mixing tenant’s data. — read more here.

If you want to know what session you’re currently on, you may use pg_backend_pid(). It’s basically the PID of this session’s backend (OS process).

p ActiveRecord::Base.connection.execute("select pg_backend_pid()").first

Feel like giving a comment? Reply under this tweet.

You might also like