How to represent a world tree structure
In our main product (BRIDGE), a Store Locator, the SEO has to be really good. As such, we have to store a world tree structure in the database.
Use Case
A point of interest is attached to a world_node_id
which has to be a leaf.
In a perfect world, every country would share the same structure, but that’s definitely not the case.
Just to take a few, in France, we have 2 sub-levels:
-
the region (
Rhône-Alpes
) -
the department (
Savoie
)
In USA, there is only one sub-level (New-York
).
The United Kingdom, well… it’s complicated
So what if we need to efficiently retrieve the country name for all the points of interest in a single query?
Bonus: What if we need to count the number of points of interest per country or attached node/level?
The MySQL
way
Let’s take as an example how it would look like with awesome_nested_set
(the gem we were actually using).
Schema
This is how the structure would look like:
-
parent_id
retrieves the closest parent. -
depth
identifies how deep we are in the tree. -
rgt
andlft
are there to query the relationship over the nodes.
rgt
and lft
are used in a lot of queries like for ancestors
here, descendants
here or even leaf
here.
JOIN
s vs CTE
s
Back to our use case.
As a node is only aware about its closest parent, we need some JOIN
s and for the case of France it would look like this:
But as the number of levels is different over the countries, we can’t do that.
This is where a RECURSIVE CTE
would be helpful (This is one of the reason I PostgreSQL).
The WITH RECURSIVE
part will create a temporary table with the point_of_interest_id
and the country_id
as an output.
We will then be able to use that table as any other table into another query.
That works, but at some point, we had to admit it was painful.
Slow is slow
As seen before, awesome_nested_set
maintains a rgt
and lft
for every node.
This has a pretty negative impact when it comes to add more nodes to the tree. This because we have to rebuild all this stuff to make it usable.
The PostgreSQL way
What could be better than a ltree
datatype to represent a tree?
The gem ltree_hierarchy
(again wrote by the awesome Rob Worley ) has been built to answer that question.
ltree_hierarchy
shares the same API with awesome_nested_set
.
Let’s see how it works and how it changes the thing up.
Schema
The structure would now look like this:
-
tree_path
, as its name suggests, will store the whole path of the current node. -
parent_id
will only store the closest parent.
tree_path
changes everything.
JOIN
s the return
Back to the use case, here is a query that retrieves all the country names for every point of interest:
This is almost equivalent to the query that we would do for France in the previous case, but this time it’s compatible with everything.
Fast is fast
As we don’t have to compute and maintain the rgt
/lft
columns for every node when we update a single node, it’s really faster to populate the tree.
Benchmarks
I wanted to measure the impact on a script that adds 1200+ nodes in the existing tree.
With awesome_nested_set
, the script runs in 8 minutes.
With ltree_hierarchy
, the script runs in 1 minutes.
For this test, I only switched the library from awesome_nested_set
to ltree_hierarchy
.
8 times faster… not bad.
Conclusion
Few months after the switch from awesome_nested_set
to ltree_hierarchy
and so far so good.
We were massively caching the tree structure to answer faster to some queries. Every time we were performing an update in the tree we had to rebuild the cache.
The main problem is that we were not able to easily retrieve the informations we needed.
SQL query + Cache together aren’t good friends, are they?
We came back to the basics and we just got rid of the extra cache layer. We were also able to provide some JSON out of the database to reduce the impact of not relying on a caching layer anymore.
Note: the 14th of October, Rob transfered the ownership of his 2 gems (ltree_hierarchy
and hstore_translate
) to Leadformance.
The idea of these 2 gems has been discussed internally when we were exchanging about how to improve the things.
Out of these discussions, he developed the idea and open sourced them (faster than me ).
Rob.