SQL: Hierarchic Data
Today we’re going to cover a somewhat textbook exercise when working with SQL databases: Representing hierarchic data (as a tree) and implementing a query to flatten the tree.
SQL databases are great for many things - and terrible for others. Many projects start of with “just a simple SQL database” but eventually have to handle data which is not well suited for an SQL database. Lately I’ve been working on such a project where the need arose to represent a tree/graph in an SQL database.
Goal
Lets assume we want to store a tree structure / hierarchic data in an SQL database:
.
└── John
├── Peter
│ ├── Bob
│ ├── Carmen
│ └── Ella
├── Kim
│ ├── Steve
│ └── Lisa
│ ├── Alexander
│ ├── Sara
│ └── Daniel
│ ├── Kurt
│ └── Pascal
└── Martin
└── David
Our goal is to store & retrieve this data from an SQL database. We want to design a query which allows to retrieve the flattened tree structure:
root
root->John
root->John->Kim
root->John->Martin
root->John->Peter
root->John->Kim->Lisa
root->John->Kim->Steve
root->John->Martin->David
root->John->Peter->Bob
root->John->Peter->Carmen
root->John->Peter->Ella
root->John->Kim->Lisa->Alexander
root->John->Kim->Lisa->Daniel
root->John->Kim->Lisa->Sara
root->John->Kim->Lisa->Daniel->Kurt
root->John->Kim->Lisa->Daniel->Pascal
Implementation
A standard SQL database does not provide any means to store tree structures. The simplest approach, which will be outlined in this post, consists of recording the parent of each element.
Preparation
We’ll start off by creating a table named groups
:
CREATE TABLE IF NOT EXISTS groups
(
'id' INTEGER PRIMARY KEY AUTOINCREMENT,
'name' TEXT NOT NULL,
'parent' INTEGER,
FOREIGN KEY(parent) REFERENCES groups(id)
);
The name
column records the name of each element in the tree while the parent
column records the parent element. As the parent
has to reference another (existing) element from the same table we’ll setup a foreign key accordingly.
Once the table is created, we’ll add the data corresponding to the tree structure illustrated above:
INSERT INTO groups (id, name, parent)
VALUES
(1, 'root', null),
(2, 'John', 1),
(3, 'Peter', 2),
(4, 'Kim', 2),
(5, 'Martin', 2),
(6, 'Bob', 3),
(7, 'Carmen', 3),
(8, 'Ella', 3),
(9, 'Steve', 4),
(10, 'Lisa', 4),
(11, 'Alexander', 10),
(12, 'Sara', 10),
(13, 'Daniel', 10),
(14, 'Kurt', 13),
(15, 'Pascal', 13),
(16, 'David', 5);
Our table now looks like this:
+----+-----------+--------+
| id | name | parent |
+----+-----------+--------+
| 1 | root | |
| 2 | John | 1 |
| 3 | Peter | 2 |
| 4 | Kim | 2 |
| 5 | Martin | 2 |
| 6 | Bob | 3 |
| 7 | Carmen | 3 |
| 8 | Ella | 3 |
| 9 | Steve | 4 |
| 10 | Lisa | 4 |
| 11 | Alexander | 10 |
| 12 | Sara | 10 |
| 13 | Daniel | 10 |
| 14 | Kurt | 13 |
| 15 | Pascal | 13 |
| 16 | David | 5 |
+----+-----------+--------+
Query
Now we’re ready to design our query.
Inexperienced programmers would likely resort to querying the raw table data and flattening the tree on the client side. However, there is no need for this. With a simple recursive CTE (Common Table Expression) we can flatten the tree directly on the SQL server:
WITH RECURSIVE group_hierarchy AS (
SELECT id,
name,
parent,
'root' AS path
FROM groups
WHERE parent IS NULL
UNION ALL
SELECT
e.id,
e.name,
e.parent,
group_hierarchy.path || '->' || e.name
FROM groups e, group_hierarchy
WHERE e.parent = group_hierarchy.id
)
SELECT * FROM group_hierarchy;
Once we run the query, the following table will be returned:
+----+-----------+--------+---------------------------------------+
| id | name | parent | path |
+----+-----------+--------+---------------------------------------+
| 1 | root | | root |
| 2 | John | 1 | root->John |
| 4 | Kim | 2 | root->John->Kim |
| 5 | Martin | 2 | root->John->Martin |
| 3 | Peter | 2 | root->John->Peter |
| 10 | Lisa | 4 | root->John->Kim->Lisa |
| 9 | Steve | 4 | root->John->Kim->Steve |
| 16 | David | 5 | root->John->Martin->David |
| 6 | Bob | 3 | root->John->Peter->Bob |
| 7 | Carmen | 3 | root->John->Peter->Carmen |
| 8 | Ella | 3 | root->John->Peter->Ella |
| 11 | Alexander | 10 | root->John->Kim->Lisa->Alexander |
| 13 | Daniel | 10 | root->John->Kim->Lisa->Daniel |
| 12 | Sara | 10 | root->John->Kim->Lisa->Sara |
| 14 | Kurt | 13 | root->John->Kim->Lisa->Daniel->Kurt |
| 15 | Pascal | 13 | root->John->Kim->Lisa->Daniel->Pascal |
+----+-----------+--------+---------------------------------------+
And there you have it: Tree flattening directly on the SQL server!
The query above can be modified to fit your needs. For example, the separator (->
) can be replaced for something else. Retrieving only the flattened path is also quite easy by modifying the SELECT
statement at the bottom of the query accordingly.