How we migrated from globalize to hstore_translate
As part of an international company (Leadformance), we soon had to deal with some translated contents. At some point, we wanted to offer our clients to customize the content for several languages. This means translations stored in the database.
Separated Translations Tables
Back in 2011, when we had to implement this feature, it was obvious that we will go with globalize (we will go with globalize-accessors as well to ease the thing).
This was working pretty fine.
Here is how the model looked like:
class Post < ActiveRecord::Base
translates :title, :text
globalize_accessors :locales => [:en, :fr], :attributes => [:title, :text]
endCreate
To create a record we just had to do:
Post.create(title: "Title", text: "Text")
(1.0ms) BEGIN
SQL (4.2ms) INSERT INTO "posts" ("created_at", "updated_at") VALUES ($1, $2) RETURNING "id" [["created_at", "2014-05-25 17:31:44.589125"], ["updated_at", "2014-05-25 17:31:44.589125"]]
SQL (2.2ms) INSERT INTO "post_translations" ("created_at", "locale", "post_id", "text", "title", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id" [["created_at", "2014-05-25 17:31:44.612382"], ["locale", "en"], ["post_id", 1], ["text", "Text"], ["title", "Title"], ["updated_at", "2014-05-25 17:31:44.612382"]]
(1.1ms) COMMIT
=> #<Post id: 1, created_at: "2014-05-25 17:31:44", updated_at: "2014-05-25 17:31:44">Or, if we wanted to create several languages in a row:
Post.create(title_fr: "Titre", text_fr: "Texte", title_en: "Title", text_en: "Text")
(3.8ms) BEGIN
SQL (3.8ms) INSERT INTO "posts" ("created_at", "updated_at") VALUES ($1, $2) RETURNING "id" [["created_at", "2014-05-25 17:39:42.736275"], ["updated_at", "2014-05-25 17:39:42.736275"]]
SQL (3.2ms) INSERT INTO "post_translations" ("created_at", "locale", "post_id", "text", "title", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id" [["created_at", "2014-05-25 17:39:42.760307"], ["locale", "fr"], ["post_id", 2], ["text", "Texte"], ["title", "Titre"], ["updated_at", "2014-05-25 17:39:42.760307"]]
SQL (1.4ms) INSERT INTO "post_translations" ("created_at", "locale", "post_id", "text", "title", "updated_at") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id" [["created_at", "2014-05-25 17:39:42.767345"], ["locale", "en"], ["post_id", 2], ["text", "Text"], ["title", "Title"], ["updated_at", "2014-05-25 17:39:42.767345"]]
(1.2ms) COMMIT
=> #<Post id: 2, created_at: "2014-05-25 17:39:42", updated_at: "2014-05-25 17:39:42">(Notice that this was generating 3 INSERTs for a single record)
Update
If we wanted to update the translations, the gems handled that for us as well:
post.title_fr = "Mon titre français"
=> "Mon titre français"
post.save
(3.6ms) BEGIN
SQL (2.0ms) UPDATE "posts" SET "updated_at" = $1 WHERE "posts"."id" = 2 [["updated_at", "2014-05-25 17:56:34.915778"]]
SQL (3.4ms) UPDATE "post_translations" SET "title" = $1, "updated_at" = $2 WHERE "post_translations"."id" = 2 [["title", "Mon titre français"], ["updated_at", "2014-05-25 17:56:34.957760"]]
(1.4ms) COMMIT
=> trueincludes
To avoid the n+1 queries, we had to put the includes:
Post.includes(:translations)
Post Load (1.4ms) SELECT "posts".* FROM "posts"
Post::Translation Load (3.0ms) SELECT "post_translations".* FROM "post_translations" WHERE "post_translations"."post_id" IN (2, 3)
=> #<ActiveRecord::Relation [#<Post id: 2, created_at: "2014-05-25 17:31:44", updated_at: "2014-05-25 17:31:44">, #<Post id: 3, created_at: "2014-05-25 17:39:42", updated_at: "2014-05-25 17:39:42">]>Sometimes, we would also wanted to perform some queries directly tied to the translations.
For example, to retrieve all Posts having a title translated in french, we would have done this:
Post.includes(:translations).
where(post_translations: { locale: "fr" }).
where("post_translations.title IS NOT NULL")
SQL (1.6ms) SELECT "posts"."id" AS t0_r0, "posts"."created_at" AS t0_r1, "posts"."updated_at" AS t0_r2, "post_translations"."id" AS t1_r0, "post_translations"."post_id" AS t1_r1, "post_translations"."locale" AS t1_r2, "post_translations"."created_at" AS t1_r3, "post_translations"."updated_at" AS t1_r4, "post_translations"."title" AS t1_r5, "post_translations"."text" AS t1_r6 FROM "posts" INNER JOIN "post_translations" ON "post_translations"."post_id" = "posts"."id" WHERE "post_translations"."locale" = 'fr' AND (post_translations.title IS NOT NULL)
=> #<ActiveRecord::Relation [#<Post id: 2, created_at: "2014-05-25 17:39:42", updated_at: "2014-05-25 17:39:42">]>hstore to the rescue
Though, as we are using PostgreSQL, we thought we weren’t using the real potential of it and we could use something even clearer. I’m talking about hstore.
It makes it clearer simply because your fields are now in the same table than your other data.
As per the documentation, the hstore data type is for storing sets of key/value pairs within a single PostgreSQL value.
This is exactly what we need right?
For this purpose, we’re gonna use hstore-translate written by the awesome Rob Worley.
The great thing with hstore-translate is that it makes the thing really easy to switch from globalize + globalize-accessors to it as there are sharing the same API.
That also means that you cannot use globalize AND hstore-translate at the same time.
The model now looks like this:
class Post < ActiveRecord::Base
translates :title, :text
endWe don’t need to setup the accessible locales in the model as it is part of hstore-translate itself.
Create
To create the records, it works exactly the same way than previously but:
Post.create(title: "Title", text: "Text")
(3.0ms) BEGIN
SQL (1.7ms) INSERT INTO "posts" ("created_at", "text_translations", "title_translations", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id" [["created_at", "2014-05-25 18:51:07.848169"], ["text_translations", "\"en\"=>\"Text\""], ["title_translations", "\"en\"=>\"Title\""], ["updated_at", "2014-05-25 18:51:07.848169"]]
(1.1ms) COMMIT
=> #<Post id: 1, title_translations: {"en"=>"Title"}, text_translations: {"en"=>"Text"}, created_at: "2014-05-25 18:51:07", updated_at: "2014-05-25 18:51:07">
Post.create(title_en: "Title", text_en: "Text", title_fr: "Titre", text_fr: "Texte")
(1.0ms) BEGIN
SQL (2.2ms) INSERT INTO "posts" ("created_at", "text_translations", "title_translations", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id" [["created_at", "2014-05-25 18:48:50.470176"], ["text_translations", "\"en\"=>\"Text\",\"fr\"=>\"Texte\""], ["title_translations", "\"en\"=>\"Title\",\"fr\"=>\"Titre\""], ["updated_at", "2014-05-25 18:48:50.470176"]]
(1.2ms) COMMIT
=> #<Post id: 2, title_translations: {"en"=>"Title", "fr"=>"Titre"}, text_translations: {"en"=>"Text", "fr"=>"Texte"}, created_at: "2014-05-25 18:48:50", updated_at: "2014-05-25 18:48:50">Update
post.title_fr = "Titre Français"
=> "Titre Français"
post.save
(3.5ms) BEGIN
SQL (3.8ms) UPDATE "posts" SET "title_translations" = $1, "updated_at" = $2 WHERE "post_hstores"."id" = 2 [["title_translations", "\"en\"=>\"Title\",\"fr\"=>\"Titre Français\""], ["updated_at", "2014-05-25 20:05:17.925179"]]
(1.0ms) COMMIT
=> trueIt’s now faster as it’s tied to only one table, but it’s also easier to query:
Post.where("defined(title_translations, ?)", 'fr')
Post Load (1.4ms) SELECT "posts".* FROM "posts" WHERE (defined(title_translations, 'fr'))
=> #<ActiveRecord::Relation [#<Post id: 2, title_translations: {"en"=>"Title", "fr"=>"Titre"}, text_translations: {"en"=>"Text", "fr"=>"Texte"}, created_at: "2014-05-25 18:48:50", updated_at: "2014-05-25 18:48:50">]>Or even (provided by the gem):
Post.with_title_translation("Titre Français", "fr")
Post Load (3.1ms) SELECT "posts".* FROM "posts" WHERE ("title_translations" @> hstore('fr', 'Titre Français'))
=> #<ActiveRecord::Relation [#<Post id: 2, title_translations: {"en"=>"Title", "fr"=>"Titre Français"}, text_translations: nil, created_at: "2014-05-25 19:42:32", updated_at: "2014-05-25 20:13:54">]>Bonus Point
We can do something like this as well:
Post.where("defined(title_translations, ?)", 'fr').
pluck("title_translations -> 'fr'")
(1.5ms) SELECT title_translations -> 'fr' FROM "posts" WHERE (defined(title_translations, 'fr'))
=> ["Titre"]Migration
The migration has been really painless thanks to the shared API.
I’ve issued a Pull Request to ease the migration based on what we encountered ourselves.
If you are coming from globalize, you’ll probably have to migrate your data.
If so, here is an example of what we used to make it happen.
execute <<-SQL
UPDATE posts
SET
title_translations = translations.title
FROM (
SELECT
hstore(
array_agg(locale ORDER BY locale),
array_agg(post_translations.title ORDER BY locale)
) AS title,
post_id
FROM post_translations
GROUP BY post_id
) translations
WHERE translations.post_id = posts.id;
SQLConclusion
We are really happy about this choice to change our translation mechanism. It reduces the overhead of having the data related to a single record shared across several tables.
We are now able to provide to our clients more fields to translate and this is possible without impacting the performances.
Thanks for reading.