How to start using Arrays in Rails with PostgreSQL
… and check why 5600+ Rails engineers read also this
How to start using Arrays in Rails with PostgreSQL
So far we covered a lot of PostgreSQL goodness. We’ve already talked about using uuid or storing hashes in our Rails applications with PostgreSQL database. Now is the time to do something in the middle of these topics (more complex than uuid
, but easier than hstore
) - we want to store list of simple values under one attribute. How can we do that? You may think “arrays” right now and you are correct. Let’s see how we can achieve that.
A little bit of theory
Arrays are ordered, integer-indexed collections of any object. They are great for storing collection of elements even with different types.
First normal form (1NF) is a relation’s property in a relational database. It requires every attribute to have domain that can consist of only atomic values so value of each attribute is always single value from that domain.
For some wise reasons we pursue to normalize our databases so that no duplicated data is stored there. So should we consider combining relational databases with arrays as something that breaks 1NF
? It seems so.
However there are cases when we might want to have redundant data, that’s the place where arrays suit the best.
Postgres
We’ll start with empty database and create there example table:
➜ Arkency-Blog git:(master) ✗ psql -d postgres
psql (9.3.5, server 9.3.4)
Type "help" for help.
postgres=# \dt
No relations found.
postgres=# CREATE TABLE arrays_example(
name text,
values integer[]
);
CREATE TABLE
We can put some data now:
postgres=# INSERT INTO arrays_example VALUES('numbers', '{1, 2, 3}');
INSERT 0 1
postgres=# SELECT * FROM arrays_example;
name | values
----------+---------
one-type | {1,2,3}
(1 row)
postgres=# SELECT name FROM arrays_example WHERE values[1] = 1;
name
---------
numbers
(1 row)
Official Postgres documentation provides a lot of useful examples to start working on SQL level with database.
Rails
In the last blogpost about hstore
we showed how to enable particular extension. This time is different (maybe easier), because array
is Postgres’ data type, not an extension so there’s no need to enable that, because it’s accessible out of the box!
Migration
In two previous articles (mentioned in the introduction of this article) we created Book
model and appropriate SQL schema. Let’s stick to that and extend it a little bit:
rails g migration add_subjects_to_book subjects:text
And the migration file:
class AddSubjectsToBook < ActiveRecord::Migration
def change
add_column :books, :subjects, :text, array:true, default: []
end
end
We can check it now:
2.1.2 :001 > b = Book.create
(0.2ms) BEGIN
SQL (2.0ms) INSERT INTO "books" ("created_at", "updated_at") VALUES ($1, $2) RETURNING "id" [["created_at", "2014-10-17 08:21:17.870437"], ["updated_at", "2014-10-17 08:21:17.870437"]]
(0.5ms) COMMIT
=> #<Book id: "39abef75-56af-4ad5-8065-6b4d58729ee0", title: nil, created_at: "2014-10-17 08:21:17", updated_at: "2014-10-17 08:21:17", description: {}, metadata: {}, subjects: []>
2.1.2 :002 > b.subjects.class
=> Array
Model manipulation
Now is the time to add some subjects for books and then query them. Please keep in mind that all of the following examples are executed Rails 4.2.0.beta1
environment.
2.1.2 :003 > b.subjects << 'education'
=> ["education"]
2.1.2 :004 > b.save!
(0.2ms) BEGIN
SQL (0.5ms) UPDATE "books" SET "subjects" = $1, "updated_at" = $2 WHERE "books"."id" = '39abef75-56af-4ad5-8065-6b4d58729ee0' [["subjects", "{education}"], ["updated_at", "2014-10-17 08:23:35.657137"]]
(1.0ms) COMMIT
=> true
2.1.2 :005 > Book.first.subjects
Book Load (0.9ms) SELECT "books".* FROM "books" ORDER BY "books"."id" ASC LIMIT 1
=> ["education"]
2.1.2 :006 > b.subjects.push 'business'
=> ["education", "business"]
2.1.2 :007 > b.save!
(0.1ms) BEGIN
SQL (0.4ms) UPDATE "books" SET "subjects" = $1, "updated_at" = $2 WHERE "books"."id" = '39abef75-56af-4ad5-8065-6b4d58729ee0' [["subjects", "{education,business}"], ["updated_at", "2014-10-17 08:24:25.883010"]]
(0.9ms) COMMIT
=> true
2.1.2 :008 > Book.first.subjects
Book Load (0.5ms) SELECT "books".* FROM "books" ORDER BY "books"."id" ASC LIMIT 1
=> ["education", "business"]
2.1.2 :009 > b.subjects += ['history']
=> ["education", "business", "history"]
2.1.2 :010 > b.save!
(0.2ms) BEGIN
SQL (0.4ms) UPDATE "books" SET "subjects" = $1, "updated_at" = $2 WHERE "books"."id" = '39abef75-56af-4ad5-8065-6b4d58729ee0' [["subjects", "{education,business,history}"], ["updated_at", "2014-10-17 18:53:12.755711"]]
(0.9ms) COMMIT
=> true
2.1.2 :011 > Book.first.subjects
Book Load (0.4ms) SELECT "books".* FROM "books" ORDER BY "books"."id" ASC LIMIT 1
=> ["education", "business", "history"]
Caveats
In previous versions of Rails we may encounter some weird behavior:
2.1.2 :012 > b.subjects << 'art'
=> ["education", "business", "history", "art"]
2.1.2 :013 > b.save!
(0.2ms) BEGIN
(0.1ms) COMMIT
=> true
2.1.2 :014 > Book.first.subjects
Book Load (0.6ms) SELECT "books".* FROM "books" ORDER BY "books"."id" ASC LIMIT 1
=> ["education", "business", "history"]
What happened here? Why subjects array wasn’t updated?
Dirty tracking
ActiveModel::Dirty
module provides a way to track changes in your objects. Sometimes our record does not know that underlying object properties have been changed and that’s why we have to point this explicitly.
2.1.2 :015 > b.subjects_will_change!
=> ["education", "business", "history"]
2.1.2 :016 > b.subjects << 'art'
=> ["education", "business", "history", "art"]
2.1.2 :017 > b.save!
(0.2ms) BEGIN
SQL (2.8ms) UPDATE "books" SET "subjects" = $1, "updated_at" = $2 WHERE "books"."id" = '39abef75-56af-4ad5-8065-6b4d58729ee0' [["subjects", ["education", "business", "history", "art"]], ["updated_at", Fri, 17 Oct 2014 19:14:52 UTC +00:00]]
(1.0ms) COMMIT
=> true
2.1.2 :018 > Book.first.subjects
Book Load (0.6ms) SELECT "books".* FROM "books" ORDER BY "books"."id" ASC LIMIT 1
=> ["education", "business", "history", "art"]
And everything went as we wanted. So if you have any problem with updating properties of some enclosed object you can indicate that this particular object will be changed by your operations and then safely save it with new properties that will be updated:
2.1.2 :019 > b.subjects << 'finances'
=> ["education", "business", "history", "finances"]
2.1.2 :020 > b.changed?
=> false
2.1.2 :021 > b.subjects_will_change!
=> ["education", "business", "history", "finances"]
2.1.2 :022 > b.changed?
=> true
2.1.2 :023 > b.subjects_changed?
=> true
2.1.2 :024 > b.changes
=> {"subjects"=>[["education", "business", "history"], ["education", "business", "history", "finances"]]}
2.1.2 :025 > b.save!
(0.2ms) BEGIN
SQL (3.1ms) UPDATE "books" SET "subjects" = $1, "updated_at" = $2 WHERE "books"."id" = '39abef75-56af-4ad5-8065-6b4d58729ee0' [["subjects", ["education", "business", "history", "finances"]], ["updated_at", Fri, 17 Oct 2014 19:21:25 UTC +00:00]]
(1.0ms) COMMIT
=> true
Querying
PostgreSQL have a bunch of useful array methods that you can leverage in your Rails applications.
2.1.2 :026 > Book.where("'history' = ANY (subjects)")
Book Load (0.5ms) SELECT "books".* FROM "books" WHERE ('history' = ANY (subjects))
=> #<ActiveRecord::Relation [#<Book id: "39abef75-56af-4ad5-8065-6b4d58729ee0", title: nil, created_at: "2014-10-17 08:21:17", updated_at: "2014-10-17 19:21:25", description: {}, metadata: {}, subjects: ["education", "business", "history", "finances"]>]>
2.1.2 :027 > Book.where("subjects @> ?", '{finances}')
Book Load (0.5ms) SELECT "books".* FROM "books" WHERE (subjects @> '{finances}')
=> #<ActiveRecord::Relation [#<Book id: "39abef75-56af-4ad5-8065-6b4d58729ee0", title: nil, created_at: "2014-10-17 08:21:17", updated_at: "2014-10-17 19:21:25", description: {}, metadata: {}, subjects: ["education", "business", "history", "finances"]>]>
Summary
After reading all of these three articles you should be PostgreSQL trouper. You can now have flexible and relational database at the same time. There are a lot of topics worth to read, but not covered in any of these blogposts. I hope you find our tutorials useful.
Off-topic
While I was researching arrays in Postgres I found an interesting thing, that I wasn’t aware of before:
Character Types
Name | Descirption |
---|---|
character varying(n) , varchar(n) |
variable-length with limit |
character(n) , char(n) |
fixed-length, blank padded |
text |
variable unlimited length |
Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.