How to persist hashes in Rails applications with PostgreSQL

… and check why 5600+ Rails engineers read also this

How to persist hashes in Rails applications with PostgreSQL

Our recent blogpost about using UUID in Rails projects with PostgreSQL was appreciated so we decided to publish a tutorial about storing hashes in our favorite database. Let’s see how it can be achieved.

Sometimes we may need to not only store plain attributes like string, integer or boolean but also more complex objects. For now let’s talk about hashes.

hstore

HStore is a key value store within Postgres database. You can use it similar to how you would use a hash in Ruby application, though it’s specific to a table column in the database. Sometimes we might need to combine relational databases’ features with flexibility of No-SQL ones at the same time without having two separate data stores.

If you need to think about possible use cases, please note that hashes in other languages are called dictionaries, so it’s already a hint how to and why use them.

SQL

Firstly, let’s see how it looks like in plain SQL, everything according to official PostgreSQL documentation.

  Dev  psql
psql (9.3.5)
Type "help" for help.

postgres=# CREATE DATABASE hstore_example;
CREATE DATABASE

postgres=# \c hstore_example
You are now connected to database "hstore_example" as user "postgres".

hstore_example=# CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION

hstore_example=# SELECT 'company => arkency, blogpost => hstore'::hstore;
                   hstore
--------------------------------------------
 "company"=>"arkency", "blogpost"=>"hstore"
(1 row)

hstore_example=#

Pretty simple. We created example DB, enabled extension and selected some key-value structure as a hstore.

Setup

In previous blogpost about UUIDs we already showed how to enable particular extension. That’s what we gonna do again:

rails g migration enable_hstore_extension
class EnableHstoreExtension < ActiveRecord::Migration
  def change
    enable_extension 'hstore'
  end
end

Let’s reuse an existing example:

rails g migration add_description_to_books
class AddDescriptionToBooks < ActiveRecord::Migration
  def change
    add_column :books, :description, :hstore, default: {}, null: false
  end
end

And now we can play with that:

b = Book.create
#   (0.2ms)  BEGIN
#  SQL (0.7ms)  INSERT INTO "books" ("created_at", "updated_at")
#   VALUES ($1, $2)
#   RETURNING "id"  [["created_at", "2014-10-10 10:42:49.968435"], ["updated_at", "2014-10-10 10:42:49.968435"]]
#   (0.9ms)  COMMIT
# => #<Book id: "75fd5620-7a09-4ae1-88b4-935385a4e970", title: nil, created_at: "2014-10-10 10:42:49", updated_at: "2014-10-10 10:42:49", description: {}>

b.description
# => {}

b.description.class
# => Hash

b.description['en'] = 'Eccentric duck'
# => "Eccentric duck"

b.save
#   (0.2ms)  BEGIN
#  SQL (0.4ms)  UPDATE "books" SET "description" = $1, "updated_at" = $2
#   WHERE "books"."id" = '75fd5620-7a09-4ae1-88b4-935385a4e970'
#   [["description", "\"en\"=>\"Eccentric duck\""], ["updated_at", "2014-10-10 10:43:35.869645"]]
#   (0.9ms)  COMMIT
# => true

b = Book.first
#  Book Load (0.8ms)  SELECT  "books".* FROM "books"  ORDER BY "books"."id" ASC LIMIT 1
# => #<Book id: "75fd5620-7a09-4ae1-88b4-935385a4e970", title: nil, created_at: "2014-10-10 10:42:49", updated_at: "2014-10-10 10:43:35", description: {"en"=>"Eccentric duck"}>

b.description
# => {"en"=>"Eccentric duck"}

b.description['en']
# => "Eccentric duck"

Awesomeness

The hstore datatype can be indexed with one of two types: GiST or GIN.

In choosing which index type to use, GiST or GIN, consider these performance differences:

  • GIN index lookups are about three times faster than GiST
  • GIN indexes take about three times longer to build than GiST
  • GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled
  • GIN indexes are two-to-three times larger than GiST indexes
  • GIN indexes are not lossy for standard queries, but their performance depends logarithmically on the number of unique words
  • GiST indexes are lossy because each document is represented in the index by a fixed-length signature and it is necessary to check the actual table row to eliminate such false matches.

As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update. Specifically, GiST indexes are very good for dynamic data and fast if the number of unique words (lexemes) is under 100,000, while GIN indexes will handle 100,000+ lexemes better but are slower to update.

So let’s create an another migration:

rails g migration add_index_for_description_in_books
class AddIndexForDescriptionInBooks < ActiveRecord::Migration
  def change
    add_index :books, :description, name: 'books_description_idx', using: :gin
    # :gist is default
  end
end

I don’t have to explain why we should use indexes and why they are important when it comes to querying performance. If you want to use hstore selectors, you should definitely create these indexes.

Here they are:

Return the value from column description for key en:

"description -> 'en'"

Does the specified column description contain a key en:

"description ? 'en'"

Does the specified column description contain a value of Eccentric duck for key en:

"description @> 'en -> Eccentric duck'"

How to use them:

Create example book:

b = Book.create(description: { en: 'Eccentric duck', pl: 'Kaczka dziwaczka' })
#   (0.1ms)  BEGIN
#  SQL (0.2ms)  INSERT INTO "books" ("created_at", "description", "updated_at") VALUES ($1, $2, $3) RETURNING "id"  [["created_at", "2014-10-10 11:21:02.294080"], ["description", "\"en\"=>\"Eccentric duck\", \"pl\"=>\"Kaczka dziwaczka\""], ["updated_at", "2014-10-10 11:21:02.294080"]]
#   (0.9ms)  COMMIT
# => #<Book id: "3ba701d2-15b9-43c3-88b6-56410b176b36", title: nil, created_at: "2014-10-10 11:21:02", updated_at: "2014-10-10 11:21:02", description: {"en"=>"Eccentric duck", "pl"=>"Kaczka dziwaczka"}>

Find a book with particular polish description:

Book.where("description -> 'pl' = 'Kaczka dziwaczka'")
#  Book Load (0.5ms)  SELECT "books".* FROM "books" WHERE (description -> 'pl' = 'Kaczka dziwaczka')
# => #<ActiveRecord::Relation [#<Book id: "3ba701d2-15b9-43c3-88b6-56410b176b36", title: nil, created_at: "2014-10-10 11:21:02", updated_at: "2014-10-10 11:21:02", description: {"en"=>"Eccentric duck", "pl"=>"Kaczka dziwaczka"}>]>

Find a book containing ‘duck’ in english description:

Book.where("description @> 'en=>duck'")
#  Book Load (0.3ms)  SELECT "books".* FROM "books" WHERE (description @> 'en=>duck')
# => #<ActiveRecord::Relation [#<Book id: "c484009e-a8e1-4534-8127-2032a92f9bc1", title: nil, created_at: "2014-10-10 10:39:58", updated_at: "2014-10-10 10:40:27", description: {"en"=>"duck"}>]>

Find all books with english descriptions provided:

Book.where("description ? 'en'")
#  Book Load (0.3ms)  SELECT "books".* FROM "books" WHERE (description ? 'en')
# => #<ActiveRecord::Relation [#<Book id: "c484009e-a8e1-4534-8127-2032a92f9bc1", title: nil, created_at: "2014-10-10 10:39:58", updated_at: "2014-10-10 10:40:27", description: {"en"=>"duck"}>, #<Book id: "75fd5620-7a09-4ae1-88b4-935385a4e970", title: nil, created_at: "2014-10-10 10:42:49", updated_at: "2014-10-10 10:54:06", description: {"en"=>"false", "pl"=>"1", "short"=>"{:en=>\"Duck\", :pl=>\"Kaczka\"}"}>, #<Book id: "3ba701d2-15b9-43c3-88b6-56410b176b36", title: nil, created_at: "2014-10-10 11:21:02", updated_at: "2014-10-10 11:21:02", description: {"en"=>"Eccentric duck", "pl"=>"Kaczka dziwaczka"}>]>

Problematic issues

Note the update statement stringifies our hash:

(...) ["description", "\"en\"=>\"Eccentric duck\""] (...)

In that case we may expect:

b.description['en'] = false
# => false

b.description['pl'] = 1
# => 1

b.save
#   (0.2ms)  BEGIN
#  SQL (0.3ms)  UPDATE "books" SET "description" = $1, "updated_at" = $2 WHERE "books"."id" = '75fd5620-7a09-4ae1-88b4-935385a4e970'  [["description", "\"en\"=>\"false\", \"pl\"=>\"1\""], ["updated_at", "2014-10-10 10:51:19.683650"]]
#   (0.8ms)  COMMIT
# => true

b = Book.first
#  Book Load (0.3ms)  SELECT  "books".* FROM "books"  ORDER BY "books"."id" ASC LIMIT 1
# => #<Book id: "75fd5620-7a09-4ae1-88b4-935385a4e970", title: nil, created_at: "2014-10-10 10:42:49", updated_at: "2014-10-10 10:51:19", description: {"en"=>"false", "pl"=>"1"}>

b.description['pl']
# => "1"

b.description['pl'].class
# => String

b.description['en']
# => "false"

b.description['en'].class
# => String

Although it may not be a common case to store other types than string in a dictionary, it is still worth to have in mind that hstore supports only string data.

The more possible case may be storing nested dictionaries:

b.description['short'] = { en: 'Duck', pl: 'Kaczka' }
# => {:en=>"Duck", :pl=>"Kaczka"}

b.save
#   (0.1ms)  BEGIN
#  SQL (0.2ms)  UPDATE "books" SET "description" = $1, "updated_at" = $2 WHERE "books"."id" = '75fd5620-7a09-4ae1-88b4-935385a4e970'  [["description", "\"en\"=>\"false\", \"pl\"=>\"1\", \"short\"=>\"{:en=>\\\"Duck\\\", :pl=>\\\"Kaczka\\\"}\""], ["updated_at", "2014-10-10 10:54:06.442615"]]
#   (0.9ms)  COMMIT
# => true

b = Book.first
#  Book Load (0.3ms)  SELECT  "books".* FROM "books"  ORDER BY "books"."id" ASC LIMIT 1
# => #<Book id: "75fd5620-7a09-4ae1-88b4-935385a4e970", title: nil, created_at: "2014-10-10 10:42:49", updated_at: "2014-10-10 10:54:06", description: {"en"=>"false", "pl"=>"1", "short"=>"{:en=>\"Duck\", :pl=>\"Kaczka\"}"}>

b.description['short']
# => "{:en=>\"Duck\", :pl=>\"Kaczka\"}"

b.description['short'].class
# => String

Ough! We have a problem, again. So is there any solution for more complex cases?

json

In addition to hstore, json is a full document datatype. That means it supports nested objects and more datatypes. It also has more operators that are very well described in documentation. If you are using JSON somewhere in your application already and want to store it directly in database, then the JSON datatype is perfect choice.

Migration

rails g migration add_metadata_to_books
class AddMetadataToBooks < ActiveRecord::Migration
  def change
    add_column :books, :metadata, :json, default: {}, null: false
  end
end

Then do whatever you want:

metadata = { pages: 400, published: false, isbn: SecureRandom.uuid }
# => {:pages=>400, :published=>false, :isbn=>"35280581-7169-48ce-88fa-9e85de5df778"}

Book.create(metadata: metadata)
#   (0.1ms)  BEGIN
#  SQL (1.0ms)  INSERT INTO "books" ("created_at", "metadata", "updated_at") VALUES ($1, $2, $3) RETURNING "id"  [["created_at", "2014-10-10 12:56:42.473753"], ["metadata", "{\"pages\":400,\"published\":false,\"isbn\":\"35280581-7169-48ce-88fa-9e85de5df778\"}"], ["updated_at", "2014-10-10 12:56:42.473753"]]
#   (1.1ms)  COMMIT
# => #<Book id: "00449e9f-4b03-423d-a8a9-4bcce6a92df4", title: nil, created_at: "2014-10-10 12:56:42", updated_at: "2014-10-10 12:56:42", description: {}, metadata: {"pages"=>400, "published"=>false, "isbn"=>"35280581-7169-48ce-88fa-9e85de5df778"}>

b = Book.last;

b.metadata
#  Book Load (0.7ms)  SELECT  "books".* FROM "books"  ORDER BY "books"."id" DESC LIMIT 1
# => {"pages"=>400, "published"=>false, "isbn"=>"e29d7a86-da0e-4f89-8dcd-f8321a5f9980"}

b.metadata.class
# => Hash

b.metadata['pages'].class
# => Fixnum

b.metadata['published'].class
# => FalseClass

However you should know that when searching for some records, you should stringify search parameters:

Book.where("metadata->>'published' = ?", 'false')
#  Book Load (0.6ms)  SELECT "books".* FROM "books" WHERE (metadata->>'published' = 'false')
# => #<ActiveRecord::Relation [#<Book id: "00449e9f-4b03-423d-a8a9-4bcce6a92df4", title: nil, created_at: "2014-10-10 12:56:42", updated_at: "2014-10-10 12:56:42", description: {}, metadata: {"pages"=>400, "published"=>false, "isbn"=>"35280581-7169-48ce-88fa-9e85de5df778"}>]>
Book.where("metadata->>'pages' = ?", '400')

MongoDB

I believe you see the power of using PostgreSQL with ActiveRecord in your Rails projects. If you still wonder whether MongoDB will be better choice for you needs, you should definitely check JSONB support introduced in 9.4 version of postgres, which is real data type with binary storage and indexing, a structured format for storing json.

With the new JSONB data type for PostgreSQL, users no longer have to choose between relational and non-relational data stores: they can have both at the same time. JSONB supports fast lookups and simple expression search queries using Generalized Inverted Indexes (GIN). Multiple new support functions enables users to extract and manipulate JSON data, with a performance which matches or surpasses the most popular document databases. With JSONB, table data can be easily integrated with document data for a fully integrated database environment.

You can also read PostgreSQL Outperforms MongoDB in New Round of Tests

Heroku

If you’re heroku user sometimes you may encounter the following error during rake db:migrate task:

No such file or directory - pg_dump -i -s -x -O -f /app/db/structure.sql

pg_dump command is not present on Heroku’s environment, which is not needed in production, but it’s nice to have this dumped SQL structure in development. You can get around it by turning of this feature when not needed:

Rake::Task['db:structure:dump'].clear unless Rails.env.development?

Summary

I believe after that article you see the benefits of using store and json types in Rails projects. A lot of flexibility combined with relational database give us almost unlimited room for improving our storage patterns.

Resources

You might also like