Switch to: V9V8V7V6V5

Hierarchical (Recursive) Queries in Valentina SQL

Learn about how to use recursive queries in Valentina SQL for managing hierarchical data

Introduction

This article covers possible tasks and SQL queries on self-recursive tables:

  • Solving hierarchical storage problems with the help of standard SQL and/or SQL extensions
  • How Valentina SQL solves problems in one line of code instead of the 20-30 lines of other databases
  • How Valentina self recursive tables are faster, better than other solutions

Task Description

Assume we have a table with a self-recursive link: some record(s) as the parent record(s), and child-records having pointers to a parent record. This table contains a hierarchy of records.

In the example below, we use a common relational way to link records with the help of a PTR field pointing to the ID field of a parent record. The top record has the value of the PTR field equal to NULL; this means there is no parent to this record. It is possible however to have several root records in the table.

Such a hierarchical organization can mirror the organization of human relationships, threaded forums and anything else that can be described as a tree-structure.

CREATE TABLE t1 ( id LONG, ptr LONG, f1 String(20) )
 
INSERT INTO t1 VALUES ( 1, NULL, 'Root' )
INSERT INTO t1 VALUES ( 2, 1, 'Branch1' )
INSERT INTO t1 VALUES ( 3, 1, 'Branch2' )
INSERT INTO t1 VALUES ( 4, 3, 'SubBranch1' )
INSERT INTO t1 VALUES ( 5, 2, 'SubBranch2' )

EXISTING SQL Solutions

Single Query for Some Recursion Level

Child-records ON Nth Level

It is easy to see that getting records ON LEVEL 1 of a hierarchy is simple.

Inner joins produce a table with N + N fields of the original T1 table. Note that L0.* values are parents for L1.* ones - row-by-row. Each result row consist of parent and child values. Single-join queries produce a list of all possible pairs in a parent-child relationship.

To get a list of child records for record with ID = 2 we do the following:

SELECT L1.* 
FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr 
WHERE L0.id = 2

Getting records on any Nth level requires more complex queries. With the previous single join, we get direct relationships only (parent-child). To get 3-level relations, use two nested joins. And again, each row looks like “parent fields - child fields - grandchild fields” (L0.* - L1.* - L2.*). See queries for the next levels 2 and 3:

SELECT L2.* 
FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr
           JOIN T1 L2 ON L1.id = L2.ptr  
WHERE L1.id = 2
SELECT L3.*
FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr
           JOIN T1 L2 ON L1.id = L2.ptr  
           JOIN T1 L3 ON L2.id = L3.ptr  
WHERE L2.id = 2

You can see that each next level requires one more JOIN. It is important to note that in this way you cannot write a query that works for ANY number of levels.

Child-records UP TO Nth Level

Assume we want to select not only the children but their parents, too. In other words, we need the row for the grandparent record, then the row for parent record, then for child and so on… N nested joins give us all this info but in a single row - “L0.*, L1.*, L2.*”. The solution can be as simple as N UNION operations:

    SELECT L1.* 
    FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr
               JOIN T1 L2 ON L1.id = L2.ptr  
    WHERE L1.id = 2
 
UNION ALL
 
    SELECT L2.* 
    FROM T1 L0 JOIN T1 L1 ON L0.id = L1.ptr
               JOIN T1 L2 ON L1.id = L2.ptr  
    WHERE L1.id = 2

Again - you can only write a query for a (fixed) particular relation level only.

Stored Procedure Way

Descendants

There is a possible procedure (MS-SQL) to find all children for a passed ID:

CREATE PROCEDURE sp1 @id int
AS
	DECLARE @level int
 
	/* Creating temporary table for tree storage*/
	CREATE TABLE #temp_tree
	(
		level int,
		id int NOT NULL PRIMARY KEY,
		ptr int,
		f1 char(20)
	)
 
	INSERT #temp_tree (level, id, ptr, f1)
	SELECT 0, id, ptr, f1 FROM t1 WHERE id = @id
	SELECT @level = 0
 
	WHILE 1 = 1
	BEGIN
		INSERT #temp_tree (level, id, ptr, f1)
		SELECT 
			@level + 1, t1.id, t1.ptr, t1.f1	
		FROM 
			t1, #temp_tree tt
		WHERE 
				t1.ptr = tt.id
			AND tt.level = @level
 
		IF @@rowcount = 0 BREAK
		SELECT @level = @level + 1
	END
 
	SELECT id, ptr, f1, level
	FROM #temp_tree
	DROP TABLE #temp_Tree
GO
 
sp1 2
go
id ptr f1 level
2 1 Branch1 0
5 2 SubBranch2 1

Ancestors

There is a possible procedure (MS-SQL) to find all ancestors for passed ID.

CREATE PROCEDURE sp1 @id int
AS
	DECLARE @level int
 
	/* Creating temporary table for tree storage*/
	CREATE TABLE #temp_tree
	(
		level int,
		id int NOT NULL PRIMARY KEY,
		ptr int,
		f1 char(20)
	)
 
	INSERT #temp_tree (level, id, ptr, f1)
	SELECT 0, id, ptr, f1 FROM t1 WHERE id = @id
	SELECT @level = 0
 
	WHILE 1 = 1
	BEGIN
		INSERT #temp_tree (level, id, ptr, f1)
		SELECT 
			@level + 1, t1.id, t1.ptr, t1.f1	
		FROM 
			t1, #temp_tree tt
		WHERE 
				tt.ptr = t1.id
			AND tt.level = @level
 
		IF @@rowcount = 0 BREAK
		SELECT @level = @level + 1
	END
 
	SELECT id, ptr, f1, level
	FROM #temp_tree
	DROP TABLE #temp_Tree
GO
 
sp1 5
id ptr f1 level
1 NULL Root 2
2 1 Branch1 1
5 2 SubBranch2 0

Limitations of the Procedural Way

Performance
  1. Produces a temporary table.
  2. Each row in this table is the result of a join operation.
Reusability

If there are several hierarchical tables then you would need to duplicate such a procedure for every hierarchical table, in other words: the solution is not generic.

Recursive CTE

CTE, or Common Table Expressions are one way to work with recursive queries. This is way of SQL 1999 standard. See wikipedia for details.

Descendants

Let's find descendants for all levels. One way to resolve this is via recursive CTE:

WITH w1( id, ptr, f1, level ) AS 
(		SELECT 
			id, 
			ptr, 
			f1,
			0 AS level
		FROM 
			t1 
		WHERE 
			ptr IS NULL
	UNION ALL 
		SELECT 
			t1.id, 
			t1.ptr, 
			t1.f1,
			level + 1
		FROM 
			t1 JOIN w1 ON t1.ptr = w1.id 
) 
SELECT * FROM w1
id ptr f1 level
1 NULL Root 0
2 1 Branch1 1
3 1 Branch2 1
4 3 SubBranch1 2
5 2 SubBranch2 2

Ancestors

A similar approach might be used to find ancestors:

WITH w1( id, ptr, f1, level ) AS 
(		SELECT 
			id, 
			ptr, 
			f1,
			0 AS level
		FROM 
			t1 
		WHERE 
			id = 4
	UNION ALL 
		SELECT 
			t1.id, 
			t1.ptr, 
			t1.f1,
			level + 1
		FROM 
			t1 JOIN w1 ON w1.ptr = t1.id
) 
SELECT * FROM w1
id ptr f1 level
4 3 SubBranch1 0
3 1 Branch2 1
1 NULL Root 2

Limitations of Recursive CTE

  • PERFORMANCE: it is obvious that each step produces one more temporary table, UNIONs are execute, which require sorting and distincts, records copied between that temporary tables. See details of this e.g. on Postgre SQL page.
  • There are some limitations for using CTE recursive definitions

(For example you can not use recursive references within subquery (MSSQL-2005))

  • Moreover, assume you have to find all ancestors for some record, but the path is too long (say 100 and more ancestors). So you should increase the recursion limit, which implies as for the first example, that this solution is not generic.
  • And finally, this kind of query looks ugly and produces a lot of joins.

CONNECT BY (Oracle, ...)

Oracle have introduced CONNECT BY syntax. See wikipedia for details.

SELECT select_list
FROM table_expression
[ WHERE ... ]
[ START WITH start_expression ]
CONNECT BY { PRIOR parent_expr = child_expr | child_expr = PRIOR parent_expr }
[ ORDER SIBLINGS BY column1 [ ASC | DESC ] [, column2 [ ASC | DESC ] ] ...
[ GROUP BY ... ]
[ HAVING ... ]
...

Example:

SELECT LEVEL, LPAD (' ', 2 * (LEVEL - 1)) || ename "employee", empno, mgr "manager"
FROM emp 
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr;

Limitations of Oracle

  • Rule START WITH … CONNECT BY is somehow in mess syntax. Look below to compare to Valentina solution, which we think is more elegant.
  • There is way go UP and DOWN, but there is no way to search WIDE.
  • It seems there is no way to exclude root-records.

NEW Solution in Valentina DB

Valentina introduces a new way of solving this kind of tasks. Valentina provides

  • very simple, natural SQL queries with non verbose syntax
  • queries are simple to write and to read
  • queries are very effective and fast on many reasons, e.g. because indexes of original table can be used easy
  • queries can express every aspect of a recursive task

This is the syntax of Valentina SQL commands for a recursive task:

vext_recursive_table         -- v4.1
    :   {ANCESTORS | BROTHERS | DESCENDANTS} OF vext_root_objects 
        [{TO|ON} LEVEL uint_or_var] 
        USING link_name | search_condition
        [WITH {ROOT | ROOTS}]
        [WITH {COLUMN | COLUMNS} vext_pseudo_column, ... ]          // added IN v5.0
 
vext_root_objects
    :   UINT
    |   variable_name
    |   ( search_condition )
 
vext_pseudo_column
    :  {LEVEL | IS_LEAF} [[AS] IDENT]
 
uint_or_var
    :   UINT
    |   variable_name

You can see that this command can search:

  • DOWN by recursion starting from a given parent record to find its child-records.
  • UP by recursion to find parent-records.
  • WIDE by recursion to find brother-records.

NOTE: a brother of X-record is a record Y, which is child of the same parent-record.

Link Instead of Table

Valentina command specifies not a self-recursive table, but a Link, which establishes this recursion. It is obvious that the usage of Valentina links is a cleaner and more efficient way to work with hierarchical data. The same table can exists with more than one Link, each of which defines its own hierarchy of objects. Since the Link must be recursive, it specifies the table explicitly.

Meanwhile, you can have no recursive link for some reason. In this case you can specify some recursive condition like “t1.id=t1.ptr” which will be used to build a temporary recursive link for this query.

Result is a Virtual Table

The result of this command is a new virtual table, which have the same fields as the original table, and contains the specified records. In the best case this command does not even produce a temporary table. You can consider these commands as a particular kind of VIEW.

TO/ON

This command allows you to search records ON a given recursion level only or collect all records up to a given level. If you do not specify a LEVEL at all then all recursive records will be collected into the result.

WITH ROOT(S)

The WITH ROOT(S) suffix will include into the result the root record(s) from which we start the recursive search.

Specification of Roots

Finally, this command gives you a very powerful way to specify a root record, using its RecID or Primary Key, or a set of root records, using a search condition of any complexity in the WHERE clause.

Example:

For our example we should define a link explicitly. You may choose any kind of link - RDB (Foreign Key based), ObjectPtr or BinaryLink. For this example we use the RDB link, because there are ID and PTR fields already, which look like Primary and Foreign keys.

ALTER TABLE t1 ADD CONSTRAINT PK PRIMARY KEY (id);
ALTER TABLE t1 ADD CONSTRAINT l1 FOREIGN KEY (ptr) REFERENCES t1(id) ON DELETE CASCADE;

Brothers

Find brothers for record 2

BROTHERS OF 2 ON LEVEL 1 USING l1
id ptr f1
3 1 Branch2

Find cousins for record 4

BROTHERS OF 4 ON LEVEL 2 USING l1
id ptr f1
5 2 SubBranch2

Ancestors/Descendants

Find all Ancestors of record 4:

ANCESTORS OF 4 TO LEVEL 2 USING l1
id ptr f1
3 1 Branch2
1 NULL Root

Find Ancestors of level 1 for record 4:

ANCESTORS OF 4 ON LEVEL 1 USING l1
id ptr f1
3 1 Branch2

Find Ancestors of level 2 for record 4:

ANCESTORS OF 4 ON LEVEL 2 USING l1
id ptr f1
1 NULL Root

Even - find Ancestors for records 4 and 5:

ANCESTORS OF (RecID IN(4,5)) ON LEVEL 1 USING l1
id ptr f1
2 1 Branch1
3 1 Branch2

Find subset of Ancestors for records 4 and 5:

SELECT * 
FROM (ANCESTORS OF (RecID IN(4,5)) ON LEVEL 1 USING l1)  
WHERE f1 = 'Branch2'
id ptr f1
3 1 Branch2

As looking for descendants looks very similar, it is omitted here.

Conclusion

You may see that the Valentina approach:

  • has a natural and clean syntax, with much less text to write.
  • is easier both to write and read a query.
  • has a unified syntax for all 3 possible kinds of queries: descendants, ancestors, brothers.
  • is much more flexible: needs only a few parameters to control any possible aspect of a recursive query.
  • uses much less resources (disk and RAM) because there is no need to produce intermediate JOIN tables.
  • is much faster because of above and
  • is much faster because implemented on the low level of Valentina Kernel instead of using SQL tricks with recursive CTE.

See Also