SQL Hierarchical Queries without Recursion and Cursors (The life before CTE in SQL Server)
…And Why We Need That Now! (A Modern Use Case)
Remember those old times when we had to write scripts to find the ancestors or descendants of a particular node in a hierarchical tree structure stored within the RDBMS table. Recall that scripting circus we went through using cursors and/or recursive functions, which possibly triggered various side effects and performance issues.
In our case, as high traffic was expected in the application, we were totally against the use of cursors or recursions in our project back then. Still, we couldn’t find any online or offline solutions or any sort of suggestions from others for a better alternative. However, I had an intuition that it could be possible and decided to give it a try anyway. Eventually created a simple and elegant solution using a ‘while’ loop and a temporary table with identity column, which saved us a lot of time, effort and of course some extra CPU cycles.
After the introduction of Common Type Expression (CTE) which was released with Microsoft SQL Server 2005 version, a temporary named result set mechanism, I never had to use the above-mentioned legacy solution. But surprisingly something happened recently in a project and I had to resort to the same classical solution once again. Actually, that inspired me to share the original script here after a decade and half. I guess it might be helpful for the people who are facing similar issues with corrupted hierarchical data.
Let’s start from the beginning. Check the below multiple organizations tree structure diagram
The Pre-SQL Server 2005 era
For all those who started their career before SQL2005 mostly had nightmares about writing hierarchical queries? For instance, we had a table to manage multi-level organizations (or customers in a multi-level marketing application) where each row has parent organization id column which might be null for root organization rows. We were asked to write the script to fetch all organizations under a given organization id which was passed as the input parameter. We did not want to use the cursors and recursive functions for obvious performance issues. Only option left was to use the middle layer programming language to process it and cache it or send it back to database for additional retrieval processing or updates etc. Of course, it won’t be a recommended solution for a high traffic application or where frequent updates happening with the hierarchical data.
Recursive functions and cursors
Using a recursive table function and cursor, we can iterate through all rows and load data to the table variable or temp table to use in subsequent queries. But that needs multiple iterations depending upon the total levels and also may need expensive cursors and unwanted locks on data. This add extra load on the processor and memory thereby causing significant performance issues. So, what was the workaround?
A better solution using a while loop and a table variable or temp table
Sounds impossible? Well not at all. There is a quite easy and simple trick.
Let’s go by above example:
Created a table to hold multi-level organization structure with parent child relationship as given in the above diagram where four root organizations and their descendant organizations were inserted with special naming pattern for readability. Check the GitHub repository given below to see the scripts in proper readable form.
[https://github.com/mailsanish/HierarchicalQueries]
Create/Insert Script:
Result:
The “identity” magic with @table variable inside a while loop
Assume the input is the org id for which we need to find all the child organizations up to the last level.
Script:
Result:
The Logic:
The solution is simple, the table variable (or temporary table) is created with an identity column (1,1), and is pre-populated with the starting value which was passed as the param value. And when the child elements selected (based on the parent id same as the above first value) are inserted into the temp table, the identity column values created automatically in the order (seed value as 1 and increment by 1). Then we loop again by picking one by one using a manual counter which point to the current id in the temp table and next level child elements are being added and this continues till the counter crosses the max value of identity field value (when no more rows to process).
We can use temporary #table instead of @table variable. Even we can use global temporary ##table if needed. But make the decision wisely based on the project requirements as each one has its own cons and pros.
Post SQL Server 2005 Era (Introduction of CTE)
But from SQL Server 2005 onwards, with the introduction of CTE (Common Table Expression) we have a straight forward solution. It is quite easy to find descendants and ancestors easily.
Result:
Finding ancestors instead of descendants
If we need to find all the ancestors of a particular organization and not descendants, that can be easily achieved by a small change.
Just interchange Id and ParentId in the script using CTE.
Result:
Notes:
The similar solution using the classic while loop like the one above is uploaded to the GitHub repository. Please download it from there in case you are interested.
[https://github.com/mailsanish/HierarchicalQueries]
Solution in Oracle Database
Similar solution exists in oracle by using CONNECT BY PRIOR and START WITH keywords. Comparing with SQL Server, we may need a temporary table to populate the result set to use in subsequent queries where in SQL Server CTE can be used directly in following queries.
Oracle Script:
Notes:
Script to find the ancestors in Oracle DB is uploaded to the GitHub repository given below. Also, the scripts to create table and insert demo data are available there.
[https://github.com/mailsanish/HierarchicalQueries]
Other Databases
Similar solution exists in MySQL and other related databases. And incase if there is no CTE like solutions available, we can use the above mentioned technique by simply using basic while loop and temporary table and it will do the magic for us.
Do we need the legacy script now?
Using CTE for hierarchical queries in latest SQL Server versions saves a few extra lines and also has a better readability. But if we need to work with pre-2005 versions, the first script would be our best option.
But there is another scenario I stumbled upon in the recent past, where I found the old method really handy. It helped us to identify the corrupted data in the organization tree structure. An organization was wrongly updated with a parent id which is actually the id of child organization, that means, it ended up in cyclic dependency. I was using CTE within the stored procedure to find the ancestors of a given organization id. It started to fail with following error message:
“The statement terminated. The maximum recursion 100 has been exhausted before statement completion.”
CTE failed with the above error but didn’t say where exactly it failed. That made it hard to identify which entry or entries caused the issue. The data in the table were huge and it was hard to identify by checking manually by following each organization ancestry. Hence, after weighing different options, I thought of getting help from my old savior by making some tweaks here and there. After that the while loop looked like below (deliberately kept some subqueries for better understanding):
Result:
To test the above script, change the parent of B1 from NULL to 12, which is the id of one of the descendants. And we will get this:
Notes:
The above script was to identify the invalid entry while searching descendants. Similarly, changes were made to find invalid ancestors by tweaking the script in slightly different way. Please find the script for that (in case you need) in the GitHub repository. [https://github.com/mailsanish/HierarchicalQueries]
Final Words
After 18 years now, it may look like a low-level solution, however back then it was a life saver for all of us which we used in various projects. And as I mentioned just above it is really a handy tool to easily identify the corrupted data in a hierarchical data while CTE simply fails with an error. Moreover, all we need are just a while loop and a temporary table with an identity column. At least it saved us weeks of labor. Smart work over hard work :)
Please feel free to share your thoughts.
Links
Complete scripts available on the Git Repository.
GitHub: https://github.com/mailsanish/HierarchicalQueries
LinkedIn: https://www.linkedin.com/in/sanishabraham/
Medium: https://medium.com/@sanish.abraham/sql-hierarchical-queries-without-recursion-and-cursors-the-life-before-cte-in-sql-server-bf7745e382de
Keywords:
SQLServer, Oracle, Hierarchical Queries, Without Cursor and Recursion, While loop, Temporary table, Table variable, Identity column, CTE, Finding Circular Reference, Cyclic Dependency, Cyclic Complexity, Finding invalid data in parent child relationship table, Multi-level table structure
Comments
Post a Comment