visit
Most modern DBMSs (Database Management System) are relational, i.e., they represent data in the form of a two—dimensional table in which there are rows (records) and columns (fields of records). But in practice, we often come across a different data organization, namely hierarchical.
Hierarchical queries are an important tool in the field of data management, allowing you to work effectively with hierarchical data structures such as organizational trees, category trees, genealogical structures, etc. These queries play a key role in identifying relationships between data and extracting information in the form of tree structures.
But such tools appeared in the SQL-1999 standard. However, by that time, Oracle already had its own CONNECT BY
operator. Despite this, in SQL 1999, the syntax of recursive queries is completely different from the CONNECT BY
syntax in Oracle and uses the WITH
keyword. The implementation of recursive queries in other DBMS is somewhat late, so it appeared in MS SQL Server only in version 2005.
id
– ID,
name
- is the name of the employee,
manager_id
- is the ID of the manager (refers to the ID of another record in the same table)
create table test_employees (
id number primary key,
name varchar2(32),
manager_id number
);
Let's fill the table with test data according to the following tree:
insert into test_employees(id, name, manager_id)
values (1, 'Henry Anderson', null);
insert into test_employees(id, name, manager_id)
values (2, 'Julia Miller', 1);
insert into test_employees(id, name, manager_id)
values (3, 'Mark White', 1);
insert into test_employees(id, name, manager_id)
values (4, 'Paul Scott', 2);
insert into test_employees(id, name, manager_id)
values (5, 'Daniel Hill', 2);
insert into test_employees(id, name, manager_id)
values (6, 'Maria Lopez', 3);
insert into test_employees(id, name, manager_id)
values (7, 'Monika Moore', 3);
insert into test_employees(id, name, manager_id)
values (8, 'Alex Smith', 4);
insert into test_employees(id, name, manager_id)
values (9, 'Brandon Brown', 4);
insert into test_employees(id, name, manager_id)
values (10, 'Elizabeth Jackson', 5);
insert into test_employees(id, name, manager_id)
values (11, 'Kevin Green', 6);
insert into test_employees(id, name, manager_id)
values (12, 'Jessica Johnson', 7);
insert into test_employees(id, name, manager_id)
values (13, 'Lisa Williams', 7);
insert into test_employees(id, name, manager_id)
values (14, 'Angela Martinez', 7);
The optional START WITH
operator tells the Oracle where to start the loop, i.e. which row (or strings) will be the root. The condition can be almost anything, you can even use functions or internal queries: manager_id is null
, or manager_id = 1
, or even name like 'Henry%'
.
The CONNECT BY
operator is required. It establishes a relationship between the parent and child elements of the hierarchy. In the condition of the CONNECT BY
operator, it is absolutely necessary to use the unary PRIOR
operator, which refers to the previous record.
How does it work? Oracle finds the first record that satisfies the condition in START WITH
, and starts looking for the next one. At the same time, that first entry can be accessed via PRIOR
. If we did everything correctly, then Oracle will search for records in which the field for storing information about the parent (manager_id
) will contain a value equal to the id of our first record.
This way, all descendants of the root record will be found. And since the process is recursive, a similar search will continue with each row found until all the descendants are found. It is also possible to use the rownum pseudo column, in which the rows are numbered starting from 1 in the order in which they are issued. And the pseudo-column LEVEL
, which shows the level in the hierarchy, will also be very useful to us. So, the 1st record will have level 1, its descendants level 2, descendants of descendants - 3, etc.
select level, te.id, te.manager_id, te.name
from test_employees te
start with te.manager_id is null
connect by prior te.id = te.manager_id
order siblings by te.id
Using the siblings
keyword, we say that you need to sort only within one level of the hierarchy. This will become more clear if you remove all unnecessary fields in the request and add margins:
select lpad(' ', 3 * level) || te.name as Tree
from test_employees te
start with te.manager_id is null
connect by prior te.id = te.manager_id
order siblings by te.id
TREE
Henry Anderson
Julia Miller
Paul Scott
Alex Smith
Brandon Brown
Daniel Hill
Elizabeth Jackson
Mark White
Maria Lopez
Kevin Green
Monika Moore
Jessica Johnson
Lisa Williams
Angela Martinez
NOCYCLE
is a parameter for excluding cyclic references. Use this parameter along with the CONNECT_BY_ISCYCLE
pseudocolumn to see which rows contain the loop.
create table test_employees (
id integer primary key,
name text,
manager_id integer
);
with recursive employees(id, manager_id, name) as (
select te.id, te.manager_id, te.name
from test_employees te
where te.manager_id is null
union all
select te.id, te.manager_id, te.name
from employees e, test_employees te
where te.manager_id = e.id
)
select id, manager_id, name from employees e;
The non-recursive part is calculated. For UNION
(but not UNION ALL
), duplicate rows are discarded. All remaining rows are included in the result of the recursive query and are also placed in a temporary worktable. The non-recursive part is an analog of START WITH
for an oracle query.
with recursive employees(id, level, manager_id, name) as (
select te.id, 1, te.manager_id, te.name
from test_employees te
where te.manager_id is null
union all
select te.id, e.level + 1, te.manager_id, te.name
from employees e, test_employees te
where te.manager_id = e.id
)
select id, level, manager_id, name from employees e;
The decision between Oracle and PostgreSQL to process hierarchical data depends on a number of factors.