How to store emoji in a Rails app with a MySQL database
… and check why 5600+ Rails engineers read also this
How to store emoji in a Rails app with a MySQL database
Suppose you have a Rails app and you are storing your data in a MySQL database. You were requested to add emoji support to your application. Probably you are thinking:
Oh, that’s simple, it’s just unicode, isn’t it?
The answer is: no. Unfortunately, MySQL’s utf8
character set allows to store only a subset of Unicode characters - only those characters that consist of one to three bytes. Inserting characters that require 4 bytes would result in corrupted data in your database.
Problems with utf8
character set
Look at this example:
mysql> SET NAMES utf8;
Query OK, 0 rows affected (0,00 sec)
mysql> INSERT INTO messages (message) VALUES ('What a nice emoji😀!');
Query OK, 1 row affected, 1 warning (0,00 sec)
mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x98\x80!' for column 'message' at row 1 |
+---------+------+---------------------------------------------------------------------------+
1 row in set (0,00 sec)
mysql> SELECT message FROM messages;
+-------------------+
| message |
+-------------------+
| What a nice emoji |
+-------------------+
1 row in set (0,00 sec)
As you can see, using utf8
character set is not enough. You are getting a warning and your data is truncated at the first 4-bytes unicode character.
utf8mb4
to the rescue
MySQL 5.5.3 introduced new character set - utf8mb4
that maps to real UTF-8 and fully support all Unicode characters, including 4-bytes emoji. It is fully backward compatible, so there should be no data loss during migrating your database. You just need to convert your tables to the new character set and change your connection’s settings. You can do it in migration:
class ConvertDatabaseToUtf8mb4 < ActiveRecord::Migration
def change
# for each table that will store unicode execute:
execute "ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
# for each string/text column with unicode content execute:
execute "ALTER TABLE table_name CHANGE column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
end
end
Please notice the VARCHAR(191)
fragment. There is one important thing you should know - when switching to utf8mb4
charset, the maximum length of a column or index key is the same as in utf8
charset in terms of bytes. This means it is smaller in terms of characters, since the maximum length of a character in utf8mb4
is four bytes, instead of three in utf8
. The maximum index length of InnoDB storage engine is 767 bytes, so if you are indexing your VARCHAR
columns, you would need to change their length to 191 instead of 255.
You should also change your database.yml
and add encoding and (optionally) collation keys:
production:
# ...
encoding: utf8mb4
collation: utf8mb4_bin
Now you are ready to handle emoji 👍
Rails, why you don’t like utf8mb4
?
After changing character set, you may experience the Mysql2::Error: Specified key was too long; max key length is 767 bytes: CREATE UNIQUE INDEX
error when performing rake db:migrate
task. It is related to the InnoDB maximum index length described in previous section. There is a fix for schema_migrations
table in Rails 4+, however you still can experience this error on tables created by yourself. As far as I am concerned this is still not fixed in Rails 4.2. You can resolve this issue in two ways:
- You can monkey patch
ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter::NATIVE_DATABASE_TYPES
using the following initializer:
# config/initializers/mysql_utf8mb4_fix.rb
require 'active_record/connection_adapters/abstract_mysql_adapter'
module ActiveRecord
module ConnectionAdapters
class AbstractMysqlAdapter
NATIVE_DATABASE_TYPES[:string] = { :name => "varchar", :limit => 191 }
end
end
end
- You can also switch to
DYNAMIC
MySQL table format add addROW_FORMAT=DYNAMIC
to yourCREATE TABLE
calls when creating new tables in migrations (that would increase the maximum key length from 767 bytes to 3072 bytes):
create_table :table_name, options: 'ROW_FORMAT=DYNAMIC' do |t|
# ...
end
You wouldn’t experience this issues when using PostgreSQL, but sometimes you just have to support legacy application that uses MySQL and migrating data to other RDBMS may not be an option.
Would you like to continue learning more?
If you enjoyed the article, subscribe to our newsletter so that you are always the first one to get the knowledge that you might find useful in your everyday Rails programmer job.
Content is mostly focused on (but not limited to) Ruby, Rails, Web-development and refactoring Rails applications.