307 reads

Hierarchical Queries: Comparative Analysis in Oracle and PostgreSQL

by Andrei RogalenkoDecember 27th, 2023

Too Long; Didn't Read

A guide about hierarchical querying in Oracle & PostgreSQL, comparing syntax, efficiency, & suitability for diverse data structures.

People Mentioned

Company Mentioned

featured image - Hierarchical Queries: Comparative Analysis in Oracle and PostgreSQL

Most modern DBMSs (Database Management System) are relational, i.e., they represent data in the form of a two—dimensional table in which there are rows (records) and columns (fields of records). But in practice, we often come across a different data organization, namely hierarchical.

Hierarchical queries are an important tool in the field of data management, allowing you to work effectively with hierarchical data structures such as organizational trees, category trees, genealogical structures, etc. These queries play a key role in identifying relationships between data and extracting information in the form of tree structures.

The problem is that data with a hierarchical structure is very poorly represented in the relational model. The SQL-92 standard does not have the means to process them.

But such tools appeared in the SQL-1999 standard. However, by that time, Oracle already had its own CONNECT BY operator. Despite this, in SQL 1999, the syntax of recursive queries is completely different from the CONNECT BY syntax in Oracle and uses the WITH keyword. The implementation of recursive queries in other DBMS is somewhat late, so it appeared in MS SQL Server only in version 2005.

As well as in syntax, there are differences in terminology. In Oracle, commonly discussed queries are called “hierarchical,” while all others are “recursive.” The essence of this does not change; I will use both.

In this article, we will focus on the consideration of hierarchical queries in two popular database management systems - Oracle and PostgreSQL. Both platforms have unique features and approaches to processing hierarchical data, and our introduction to them will allow the reader to better understand the context of their application.

A hierarchical structure can be represented as a set of nodes, where each node is connected to one or more nodes at a higher level. Important terms include parents, children, leaves, and roots.Examples of hierarchical structures include company organizational trees, family family trees, category trees in online stores, class hierarchies in educational institutions, and more.

Implementation in Oracle

For demonstration, we will use a test employee table consisting of 3 fields:

id – ID,

name - is the name of the employee,

manager_id - is the ID of the manager (refers to the ID of another record in the same table)

create table test_employees (
	id number primary key,	
	name varchar2(32),
	manager_id number
);

Let's fill the table with test data according to the following tree:

insert into test_employees(id, name, manager_id)
values (1, 'Henry Anderson', null);

insert into test_employees(id, name, manager_id)
values (2, 'Julia Miller', 1);

insert into test_employees(id, name, manager_id)
values (3, 'Mark White', 1);

insert into test_employees(id, name, manager_id)
values (4, 'Paul Scott', 2);

insert into test_employees(id, name, manager_id)
values (5, 'Daniel Hill', 2);

insert into test_employees(id, name, manager_id)
values (6, 'Maria Lopez', 3);

insert into test_employees(id, name, manager_id)
values (7, 'Monika Moore', 3);

insert into test_employees(id, name, manager_id)
values (8, 'Alex Smith', 4);

insert into test_employees(id, name, manager_id)
values (9, 'Brandon Brown', 4);

insert into test_employees(id, name, manager_id)
values (10, 'Elizabeth Jackson', 5);

insert into test_employees(id, name, manager_id)
values (11, 'Kevin Green', 6);

insert into test_employees(id, name, manager_id)
values (12, 'Jessica Johnson', 7);

insert into test_employees(id, name, manager_id)
values (13, 'Lisa Williams', 7);

insert into test_employees(id, name, manager_id)
values (14, 'Angela Martinez', 7);

In Oracle, hierarchical queries appeared in version 8, long before the standard appeared. Therefore, a completely different syntax is still used.

The optional START WITH operator tells the Oracle where to start the loop, i.e. which row (or strings) will be the root. The condition can be almost anything, you can even use functions or internal queries: manager_id is null, or manager_id = 1, or even name like 'Henry%'.

The CONNECT BY operator is required. It establishes a relationship between the parent and child elements of the hierarchy. In the condition of the CONNECT BY operator, it is absolutely necessary to use the unary PRIOR operator, which refers to the previous record.

How does it work? Oracle finds the first record that satisfies the condition in START WITH, and starts looking for the next one. At the same time, that first entry can be accessed via PRIOR. If we did everything correctly, then Oracle will search for records in which the field for storing information about the parent (manager_id) will contain a value equal to the id of our first record.

This way, all descendants of the root record will be found. And since the process is recursive, a similar search will continue with each row found until all the descendants are found. It is also possible to use the rownum pseudo column, in which the rows are numbered starting from 1 in the order in which they are issued. And the pseudo-column LEVEL, which shows the level in the hierarchy, will also be very useful to us. So, the 1st record will have level 1, its descendants level 2, descendants of descendants - 3, etc.

select level, te.id, te.manager_id, te.name 
from test_employees te
start with te.manager_id is null
connect by prior te.id = te.manager_id
order siblings by te.id

Using the siblings keyword, we say that you need to sort only within one level of the hierarchy. This will become more clear if you remove all unnecessary fields in the request and add margins:

select lpad('     ', 3 * level) || te.name as Tree
from test_employees te
start with te.manager_id is null
connect by prior te.id = te.manager_id
order siblings by te.id

TREE
   Henry Anderson
      Julia Miller
         Paul Scott
            Alex Smith
            Brandon Brown
         Daniel Hill
            Elizabeth Jackson
      Mark White
         Maria Lopez
            Kevin Green
         Monika Moore
            Jessica Johnson
            Lisa Williams
            Angela Martinez

NOCYCLE is a parameter for excluding cyclic references. Use this parameter along with the CONNECT_BY_ISCYCLE pseudocolumn to see which rows contain the loop.

Implementation in PostgreSQL

We use the same test table.

create table test_employees (
	id integer primary key,	
	name text,
	manager_id integer
);

Queries to fill the table with data will look exactly the same as in Oracle.

with recursive employees(id, manager_id, name) as (
	select te.id, te.manager_id, te.name
	from test_employees te
	where te.manager_id is null

	union all
	
	select te.id, te.manager_id, te.name
	from employees e, test_employees te
	where  te.manager_id = e.id
	)
select id, manager_id, name from employees e;

Calculating a recursive query:

The non-recursive part is calculated. For UNION (but not UNION ALL), duplicate rows are discarded. All remaining rows are included in the result of the recursive query and are also placed in a temporary worktable. The non-recursive part is an analog of START WITH for an oracle query.
While the worksheet is not empty, the following steps are repeated:
1. The recursive part is calculated so that the recursive reference to the query itself refers to the current contents of the worksheet. For UNION (but not UNION ALL), duplicate rows and rows duplicating previously received ones are discarded. All remaining rows are included in the result of the recursive query and are also placed in a temporary intermediate table.
2. The contents of the worktable are replaced with the contents of the intermediate table, and then the intermediate table is cleared.

What are the other differences from the Oracle? There are no amenities like level, but they can be done on their own:

with recursive employees(id, level, manager_id, name) as (
	select te.id, 1, te.manager_id, te.name
	from test_employees te
	where te.manager_id is null

	union all
	
	select te.id, e.level + 1, te.manager_id, te.name
	from employees e, test_employees te
	where  te.manager_id = e.id
	)
select id, level, manager_id, name from employees e;

Recommendations for choosing between Oracle and PostgreSQL for hierarchical data

The decision between Oracle and PostgreSQL to process hierarchical data depends on a number of factors.

Data structure: If the data structure is a complex hierarchy with a large number of nesting levels, Oracle can provide more efficient query execution due to its optimized implementation.
The scale of the project: In the case of large enterprises with high performance and support requirements, Oracle can provide a complete set of tools and functionality to handle complex hierarchies.
Flexibility and extensibility: Open source-oriented PostgreSQL may be a more flexible solution for projects that need the freedom to adapt to changes in hierarchical data.

In general, the capabilities of PostgreSQL are about the same as in Oracle, although, in my opinion, it is a little less convenient and transparent.