Multitenancy with Postgres schemas: key concepts explained
PostgreSQL schemas let you hold multiple instances of the same set of tables inside a single database. They’re essentially namespaces for tables. Sounds like a compelling way to implement multitenancy — at least in specific cases. But if you’re like me, you’d rather know exactly how it works before you rush to implement it. Let’s explain some basic concepts first: schema itself, search_path and session.
As I said before schema is basically a collection of tables (and other db “objects”). Schema is not the most fortunate name for it, as it can be confused with db schema in the sense of db structure. Namespace would be a more adequate name. Actually namespace is how PG calls it in some contexts - e.g. in
psql and play a little.
Even before you create any schemas yourself, there’s already one schema there - the default schema, named
public. You can see it by listing all schemas with the psql command
Alternatively, if you run
select * from pg_namespace; you’ll also see all schemas — along with some internal ones.
public schema is where all your tables live by default - if you don’t specify any specific schema. So if you create a table
CREATE TABLE things (name text) it just ends up in the public schema, which you’ll see when listing all the tables with
\dt (of course provided you haven’t changed the
search_path - see next section).
my-db::DATABASE=> \dt List of relations Schema | Name | Type | Owner --------+--------+-------+---------------- public | things | table | ydhtoowbnonxqk (1 row)
Now let’s create a new schema:
CREATE SCHEMA tenant_1. You should now see another entry if you list all existing schemas with
my-db::DATABASE=> \dn List of schemas Name | Owner ----------+---------------- public | ydhtoowbnonxqk tenant_1 | ydhtoowbnonxqk
Now we can create a table inside this schema. You can do it by prefixing the table name with the schema name and a dot:
CREATE TABLE tenant_1.things (name text).
You won’t see it when listing all tables with
\dt (again, provided you haven’t yet changed the
search_path). To list the table, run
\dt with an additional param:
my-db::DATABASE=> \dt tenant_1.* List of relations Schema | Name | Type | Owner ----------+--------+-------+---------------- tenant_1 | things | table | ydhtoowbnonxqk
So at this moment we should have two tables named
things which live in separate namespaces (schemas). To interact with them, you just prefix the table name with the schema name and a dot:
SELECT * FROM tenant_1.things; SELECT * FROM public.things; SELECT * FROM things; -- will query the default schema
To get rid of the schema, run
DROP SCHEMA tenant_1;. It’ll fail in this situation though, because there are tables in it. To remove it together with the tables, run
DROP SCHEMA tenant_1 CASCADE;.
So far we accessed other schemas by using their fully qualified name:
schema_name.table_name. If we skip the schema name, the default schema is used —
search_path is a Postgres session variable that determines which schema is the default one. Let’s check its value:
my-db::DATABASE=> SHOW search_path; search_path ----------------- "$user", public
As you can see, it has a couple comma-separated values. It works similarly to
PATH in a shell — if you try to access a table, Postgres first looks for it in the first schema listed in the
search_path — which is
"$user". If it cannot find it there, it looks in the second one —
public. If it’s not there, then we get an error.
"$user" is a special value that makes it actually look for a schema named after the user. Personally I’ve never used it. It’s just there by default. The ability to use multiple schemas also looks like a feature I’d rather not use, but sometimes you have to - e.g. to handle Postgres extensions - more here.
To change the search path, run:
SET search_path = tenant_1. If you now run
SELECT * FROM things, it will access
To get back, you typically do
SET search_path = public — or whatever is your default. Sidenote: in Rails you can set your default schema via
schema_search_path option in
A good question to ask: what is the scope and lifetime of this variable. This is a session variable, i.e. it affects the current Postgres session and is discarded when the session closes.
Sidenote: there’s a variant —
SET LOCAL which works for a transaction instead of a session, but personally I’ve never had to use it. Another sidenote:
search_path resolution is actually more complex — apart from aforementioned session variable, it can be also permanently set for the whole DB or role.
Now it makes sense to explain what exactly a Postgres session is — when it’s initiated and closed.
Postgres session, depending on context, might be referred to as Postgres connection or backend.
Under normal circumstances, whenever you establish a new connection to Postgres (e.g. via
psql or from Rails), a new Postgres session is instantiated. It’s closed when the connection closes. That perhaps explains the interchangeable usage of session and connection.
Actually when you get hold of a DB connection in Rails, you don’t always get a new DB connection — because there’s a connection pool (read more here). Connection pool is there to limit the number of DB connections Rails app can make. From the perspective of Postgres sessions, a crucial fact is that the DB connection is not necessarily closed if a Rails request releases it — it stays in the pool and might be used by a different request later. The consequence is that if you
SET a Postgress session variable in one request, it’ll be set in the next one — if it happens to run on the same DB connection. That’s why you have to make sure you always SET the desired search_path before you do any DB work in a request or background job.
Every session gets a separate OS process on the DB server, which you can check yourself by running e.g.
ps aux | grep postgres. You can also see the sessions by querying
SELECT * FROM pg_stat_activity — there’s a lot of useful data in it.
Now it’s worth saying that even if Rails makes an actual new connection to the DB, it usually means a new connection on the DB server, but not always — it depends on what stands inbetween the Rails app and the DB server. If you happen to have PgBouncer in your stack, sessions are managed by it. If PgBouncer runs in any mode different than session mode, you can even end up mixing tenant’s data. — read more here.
If you want to know what session you’re currently on, you may use
pg_backend_pid(). It’s basically the PID of this session’s backend (OS process).
p ActiveRecord::Base.connection.execute("select pg_backend_pid()").first
Feel like giving a comment? Reply under this tweet.