SQL: Hierarchic Data

Today we’re going to cover a somewhat textbook exercise when working with SQL databases: Representing hierarchic data (as a tree) and implementing a query to flatten the tree.

SQL databases are great for many things - and terrible for others. Many projects start of with “just a simple SQL database” but eventually have to handle data which is not well suited for an SQL database. Lately I’ve been working on such a project where the need arose to represent a tree/graph in an SQL database.

Goal

Lets assume we want to store a tree structure / hierarchic data in an SQL database:

.
└── John
    ├── Peter
    │   ├── Bob
    │   ├── Carmen
    │   └── Ella
    ├── Kim
    │   ├── Steve
    │   └── Lisa
    │       ├── Alexander
    │       ├── Sara
    │       └── Daniel
    │           ├── Kurt
    │           └── Pascal
    └── Martin
        └── David

Our goal is to store & retrieve this data from an SQL database. We want to design a query which allows to retrieve the flattened tree structure:

root
root->John
root->John->Kim
root->John->Martin
root->John->Peter
root->John->Kim->Lisa
root->John->Kim->Steve
root->John->Martin->David
root->John->Peter->Bob
root->John->Peter->Carmen
root->John->Peter->Ella
root->John->Kim->Lisa->Alexander
root->John->Kim->Lisa->Daniel
root->John->Kim->Lisa->Sara
root->John->Kim->Lisa->Daniel->Kurt
root->John->Kim->Lisa->Daniel->Pascal

Implementation

A standard SQL database does not provide any means to store tree structures. The simplest approach, which will be outlined in this post, consists of recording the parent of each element.

Preparation

We’ll start off by creating a table named groups:

CREATE TABLE IF NOT EXISTS groups
(
    'id' INTEGER PRIMARY KEY AUTOINCREMENT,
    'name' TEXT NOT NULL,
    'parent' INTEGER,

    FOREIGN KEY(parent) REFERENCES groups(id)
);

The name column records the name of each element in the tree while the parent column records the parent element. As the parent has to reference another (existing) element from the same table we’ll setup a foreign key accordingly.

Once the table is created, we’ll add the data corresponding to the tree structure illustrated above:

INSERT INTO groups (id, name, parent)
VALUES
    (1, 'root', null),
    (2, 'John', 1),
    (3, 'Peter', 2),
    (4, 'Kim', 2),
    (5, 'Martin', 2),
    (6, 'Bob', 3),
    (7, 'Carmen', 3),
    (8, 'Ella', 3),
    (9, 'Steve', 4),
    (10, 'Lisa', 4),
    (11, 'Alexander', 10),
    (12, 'Sara', 10),
    (13, 'Daniel', 10),
    (14, 'Kurt', 13),
    (15, 'Pascal', 13),
    (16, 'David', 5);

Our table now looks like this:

+----+-----------+--------+
| id |   name    | parent |
+----+-----------+--------+
| 1  | root      |        |
| 2  | John      | 1      |
| 3  | Peter     | 2      |
| 4  | Kim       | 2      |
| 5  | Martin    | 2      |
| 6  | Bob       | 3      |
| 7  | Carmen    | 3      |
| 8  | Ella      | 3      |
| 9  | Steve     | 4      |
| 10 | Lisa      | 4      |
| 11 | Alexander | 10     |
| 12 | Sara      | 10     |
| 13 | Daniel    | 10     |
| 14 | Kurt      | 13     |
| 15 | Pascal    | 13     |
| 16 | David     | 5      |
+----+-----------+--------+

Query

Now we’re ready to design our query.

Inexperienced programmers would likely resort to querying the raw table data and flattening the tree on the client side. However, there is no need for this. With a simple recursive CTE (Common Table Expression) we can flatten the tree directly on the SQL server:

WITH RECURSIVE group_hierarchy AS (
  SELECT    id,
            name,
            parent,
            'root' AS path
  FROM groups
  WHERE parent IS NULL

  UNION ALL

SELECT
    e.id,
    e.name,
    e.parent,
    group_hierarchy.path || '->' || e.name
  FROM groups e, group_hierarchy
  WHERE e.parent = group_hierarchy.id
)
SELECT * FROM group_hierarchy;

Once we run the query, the following table will be returned:

+----+-----------+--------+---------------------------------------+
| id |   name    | parent |                 path                  |
+----+-----------+--------+---------------------------------------+
| 1  | root      |        | root                                  |
| 2  | John      | 1      | root->John                            |
| 4  | Kim       | 2      | root->John->Kim                       |
| 5  | Martin    | 2      | root->John->Martin                    |
| 3  | Peter     | 2      | root->John->Peter                     |
| 10 | Lisa      | 4      | root->John->Kim->Lisa                 |
| 9  | Steve     | 4      | root->John->Kim->Steve                |
| 16 | David     | 5      | root->John->Martin->David             |
| 6  | Bob       | 3      | root->John->Peter->Bob                |
| 7  | Carmen    | 3      | root->John->Peter->Carmen             |
| 8  | Ella      | 3      | root->John->Peter->Ella               |
| 11 | Alexander | 10     | root->John->Kim->Lisa->Alexander      |
| 13 | Daniel    | 10     | root->John->Kim->Lisa->Daniel         |
| 12 | Sara      | 10     | root->John->Kim->Lisa->Sara           |
| 14 | Kurt      | 13     | root->John->Kim->Lisa->Daniel->Kurt   |
| 15 | Pascal    | 13     | root->John->Kim->Lisa->Daniel->Pascal |
+----+-----------+--------+---------------------------------------+

And there you have it: Tree flattening directly on the SQL server!

The query above can be modified to fit your needs. For example, the separator (->) can be replaced for something else. Retrieving only the flattened path is also quite easy by modifying the SELECT statement at the bottom of the query accordingly.

comments powered by Disqus