Table of Contents
Hierarchical (Recursive) Queries in Valentina SQL
Learn about how to use recursive queries in Valentina SQL for managing hierarchical data
Introduction
This article covers possible tasks and SQL queries on self-recursive tables:
- Solving hierarchical storage problems with the help of standard SQL and/or SQL extensions
- How Valentina SQL solves problems in one line of code instead of the 20-30 lines in other databases
- How Valentina self recursive tables are faster, better than other solutions
Task Description
Let`s assume that we have a table with a self-recursive link: some record(s) as the parent record(s), and child-records having pointers to a parent record. Such table contains a hierarchy of records.
In the example below, we use a common relational way to link records with the help of a PTR field pointing to the ID field of a parent record. The top record has the value of the PTR field equal to NULL; that means there is no parent to such record. It is possible however to have several root records in the table.
Such a hierarchical organization can reflect the organization of human relationships, threaded forums and anything else that can be described as a tree-structure.
CREATE TABLE t1 ( id LONG, ptr LONG, f1 String(20) ) INSERT INTO t1 VALUES ( 1, NULL, 'Root' ) INSERT INTO t1 VALUES ( 2, 1, 'Branch1' ) INSERT INTO t1 VALUES ( 3, 1, 'Branch2' ) INSERT INTO t1 VALUES ( 4, 3, 'SubBranch1' ) INSERT INTO t1 VALUES ( 5, 2, 'SubBranch2' )
EXISTING SQL Solutions
Single Query for Some Recursion Level
Child-records ON Nth Level
It is easy to see that getting records ON LEVEL 1 of a hierarchy is simple.
Inner joins produce a table with N + N fields of the original T1 table. Note that L0.* values are parents for L1.* ones - row-by-row. Each result row consists of parent and child values. Single-join queries produce a list of all possible pairs in a parent-child relationship.
To get a list of child records for record with ID = 2 we should do the following:
SELECT L1.* FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr WHERE L0.id = 2
Getting records on any Nth level requires more complex queries. With the previous single join, we get direct relationships only (parent-child). To get 3-level relations, we should use two nested joins. And again, each row looks like “parent fields - child fields - grandchild fields” (L0.* - L1.* - L2.*). Look at the queries for the next levels 2 and 3:
SELECT L2.* FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr JOIN T1 L2 ON L1.id = L2.ptr WHERE L1.id = 2
SELECT L3.* FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr JOIN T1 L2 ON L1.id = L2.ptr JOIN T1 L3 ON L2.id = L3.ptr WHERE L2.id = 2
You can see that each next level requires one more JOIN. It is important to note that this way you cannot write a query that works for ANY number of levels.
Child-records UP TO Nth Level
Let`s assume that we want to select not only the children but their parents too. In other words, we need the row for the grandparent record, then the row for the parent record, then for the child one and so on… N nested joins give us all this info but in a single row - “L0.*, L1.*, L2.*”. The solution can be as simple as N UNION operations:
SELECT L1.* FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr JOIN T1 L2 ON L1.id = L2.ptr WHERE L1.id = 2 UNION ALL SELECT L2.* FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr JOIN T1 L2 ON L1.id = L2.ptr WHERE L1.id = 2
Again - you can only write a query for a (fixed) particular relation level only.
Stored Procedure Way
Descendants
There is a possible procedure (MS-SQL) to find all children for a passed ID:
CREATE PROCEDURE sp1 @id INT AS DECLARE @level INT /* Creating temporary table for tree storage*/ CREATE TABLE #temp_tree ( level INT, id INT NOT NULL PRIMARY KEY, ptr INT, f1 CHAR(20) ) INSERT #temp_tree (level, id, ptr, f1) SELECT 0, id, ptr, f1 FROM t1 WHERE id = @id SELECT @level = 0 WHILE 1 = 1 BEGIN INSERT #temp_tree (level, id, ptr, f1) SELECT @level + 1, t1.id, t1.ptr, t1.f1 FROM t1, #temp_tree tt WHERE t1.ptr = tt.id AND tt.level = @level IF @@rowcount = 0 BREAK SELECT @level = @level + 1 END SELECT id, ptr, f1, level FROM #temp_tree DROP TABLE #temp_Tree GO sp1 2 GO
id | ptr | f1 | level |
---|---|---|---|
2 | 1 | Branch1 | 0 |
5 | 2 | SubBranch2 | 1 |
Ancestors
There is a possible procedure (MS-SQL) to find all ancestors for passed ID.
CREATE PROCEDURE sp1 @id INT AS DECLARE @level INT /* Creating temporary table for tree storage*/ CREATE TABLE #temp_tree ( level INT, id INT NOT NULL PRIMARY KEY, ptr INT, f1 CHAR(20) ) INSERT #temp_tree (level, id, ptr, f1) SELECT 0, id, ptr, f1 FROM t1 WHERE id = @id SELECT @level = 0 WHILE 1 = 1 BEGIN INSERT #temp_tree (level, id, ptr, f1) SELECT @level + 1, t1.id, t1.ptr, t1.f1 FROM t1, #temp_tree tt WHERE tt.ptr = t1.id AND tt.level = @level IF @@rowcount = 0 BREAK SELECT @level = @level + 1 END SELECT id, ptr, f1, level FROM #temp_tree DROP TABLE #temp_Tree GO sp1 5
id | ptr | f1 | level |
---|---|---|---|
1 | NULL | Root | 2 |
2 | 1 | Branch1 | 1 |
5 | 2 | SubBranch2 | 0 |
Limitations of the Procedural Way
Performance
- Produces a temporary table.
- Each row in this table is the result of a join operation.
Reusability
If there are several hierarchical tables you will have to duplicate such a procedure for every hierarchical table. In other words, the solution is not generic.
Recursive CTE
CTE, or Common Table Expressions are only one way to work with recursive queries. This is way of SQL 1999 standard. See wikipedia for details.
Descendants
Let's find descendants for all the levels. One way to resolve this is via recursive CTE:
WITH w1( id, ptr, f1, level ) AS ( SELECT id, ptr, f1, 0 AS level FROM t1 WHERE ptr IS NULL UNION ALL SELECT t1.id, t1.ptr, t1.f1, level + 1 FROM t1 JOIN w1 ON t1.ptr = w1.id ) SELECT * FROM w1
id | ptr | f1 | level |
---|---|---|---|
1 | NULL | Root | 0 |
2 | 1 | Branch1 | 1 |
3 | 1 | Branch2 | 1 |
4 | 3 | SubBranch1 | 2 |
5 | 2 | SubBranch2 | 2 |
Ancestors
A similar approach may be used to find the ancestors:
WITH w1( id, ptr, f1, level ) AS ( SELECT id, ptr, f1, 0 AS level FROM t1 WHERE id = 4 UNION ALL SELECT t1.id, t1.ptr, t1.f1, level + 1 FROM t1 JOIN w1 ON w1.ptr = t1.id ) SELECT * FROM w1
id | ptr | f1 | level |
---|---|---|---|
4 | 3 | SubBranch1 | 0 |
3 | 1 | Branch2 | 1 |
1 | NULL | Root | 2 |
Limitations of Recursive CTE
- PERFORMANCE: it is obvious that each step produces one more temporary table, UNIONs are execute, which requires sorting and distincts, records copied between that temporary tables. See the details of this e.g. on Postgre SQL page.
- There are some limitations for using CTE recursive definitions
(For example you can not use recursive references within subquery (MSSQL-2005))
- Moreover, assume you have to find all ancestors for some record, but the path is too long (say 100 and more ancestors). So you should increase the recursion limit, which implies as for the first example, that this solution is not generic.
- And finally, this kind of query looks ugly and produces a lot of joins.
CONNECT BY (Oracle, ...)
Oracle have introduced CONNECT BY syntax. See wikipedia for details.
SELECT select_list FROM table_expression [ WHERE ... ] [ START WITH start_expression ] CONNECT BY { PRIOR parent_expr = child_expr | child_expr = PRIOR parent_expr } [ ORDER SIBLINGS BY column1 [ ASC | DESC ] [, column2 [ ASC | DESC ] ] ... [ GROUP BY ... ] [ HAVING ... ] ...
Example:
SELECT LEVEL, LPAD (' ', 2 * (LEVEL - 1)) || ename "employee", empno, mgr "manager" FROM emp START WITH mgr IS NULL CONNECT BY PRIOR empno = mgr;
Limitations of Oracle
- The rule
START WITH … CONNECT BY
is somehow in mess syntax. Look below to compare with Valentina solution, which we think is more elegant.
- There is the way to go UP and DOWN, but there is no way to search WIDE.
- It seems there is no way to exclude root-records.
NEW Solution in Valentina DB
Valentina introduces a new way of solving this kind of tasks. Valentina provides:
- very simple, natural SQL queries with non verbose syntax
- queries simple to write and to read
- queries effective and fast on many reasons, e.g. because indexes of original table can be used easily
- queries can express every aspect of a recursive task
This is the syntax of Valentina SQL commands for a recursive task:
vext_recursive_table -- v4.1 : {ANCESTORS | BROTHERS | DESCENDANTS} OF vext_root_objects [{TO|ON} LEVEL uint_or_var] USING link_name | search_condition [WITH {ROOT | ROOTS}] [WITH {COLUMN | COLUMNS} vext_pseudo_column, ... ] // added IN v5.0 vext_root_objects : UINT | variable_name | ( search_condition ) vext_pseudo_column : {LEVEL | IS_LEAF} [[AS] IDENT] uint_or_var : UINT | variable_name
You can see that this command can search:
- DOWN by recursion starting from a given parent record to find its child-records.
- UP by recursion to find parent-records.
- WIDE by recursion to find brother-records.
NOTE: a brother of X-record is a record Y, which is a child of the same parent-record.
Link Instead of Table
Valentina command specifies not a self-recursive table, but a Link, which establishes this recursion. It is obvious that the usage of Valentina links is a cleaner and more efficient way to work with hierarchical data. The same table can exist with more than one Link, each one of which defines its own hierarchy of objects. Since the Link must be recursive, it specifies the table explicitly.
Meanwhile, you can have no recursive link for some reason. In this case you can specify some recursive condition like “t1.id=t1.ptr” which will be used to build a temporary recursive link for this query.
Result is a Virtual Table
The result of this command is a new virtual table, which has the same fields as the original one, and contains the specified records. In the best case this command does not even produce a temporary table. You can consider these commands as a particular kind of VIEW.
TO/ON
This command allows you to search records ON a given recursion level only or collect all records up to a given level. If you do not specify a LEVEL at all then all the recursive records will be collected in the result.
WITH ROOT(S)
The WITH ROOT(S) suffix will include into the result the root record(s) from which we start the recursive search.
Specification of Roots
Finally, this command gives you a very powerful way to specify a root record, using its RecID or Primary Key, or a set of root records, using a search condition of any complexity in the WHERE clause.
Example:
For our example we should define a link explicitly. You may choose any kind of link - RDB (Foreign Key based), ObjectPtr or BinaryLink. We use the RDB link in this example, because there are ID and PTR fields already, which looks like Primary and Foreign keys.
ALTER TABLE t1 ADD CONSTRAINT PK PRIMARY KEY (id); ALTER TABLE t1 ADD CONSTRAINT l1 FOREIGN KEY (ptr) REFERENCES t1(id) ON DELETE CASCADE;
Brothers
Find brothers for record 2
BROTHERS OF 2 ON LEVEL 1 USING l1
id | ptr | f1 |
---|---|---|
3 | 1 | Branch2 |
Find cousins for record 4
BROTHERS OF 4 ON LEVEL 2 USING l1
id | ptr | f1 |
---|---|---|
5 | 2 | SubBranch2 |
Ancestors/Descendants
Find all Ancestors of record 4:
ANCESTORS OF 4 TO LEVEL 2 USING l1
id | ptr | f1 |
---|---|---|
3 | 1 | Branch2 |
1 | NULL | Root |
Find Ancestors of level 1 for record 4:
ANCESTORS OF 4 ON LEVEL 1 USING l1
id | ptr | f1 |
---|---|---|
3 | 1 | Branch2 |
Find Ancestors of level 2 for record 4:
ANCESTORS OF 4 ON LEVEL 2 USING l1
id | ptr | f1 |
---|---|---|
1 | NULL | Root |
Even - find Ancestors for records 4 and 5:
ANCESTORS OF (RecID IN(4,5)) ON LEVEL 1 USING l1
id | ptr | f1 |
---|---|---|
2 | 1 | Branch1 |
3 | 1 | Branch2 |
Find subset of Ancestors for records 4 and 5:
SELECT * FROM (ANCESTORS OF (RecID IN(4,5)) ON LEVEL 1 USING l1) WHERE f1 = 'Branch2'
id | ptr | f1 |
---|---|---|
3 | 1 | Branch2 |
Given that looking for descendants looks very similar, it is omitted here.
Conclusion
You may see that the Valentina approach:
- has a natural and clean syntax, with much less text to write.
- is easier both to write and read a query.
- has a unified syntax for all 3 possible kinds of queries: descendants, ancestors, brothers.
- is much more flexible: needs only a few parameters to control any possible aspect of a recursive query.
- uses much less resources (disk and RAM) because there is no need to produce intermediate JOIN tables.
- is much faster because of the foregoing
- is much faster because implemented on the low level of Valentina Kernel instead of using SQL tricks with recursive CTE.
See Also
- Valentina SQL Reference: HIERARCHICAL (RECURSIVE) QUERIES for a description of the exact syntax of these commands and examples.