Recursive CTE in SQL Server with examples

Recursive CTE in SQL Server with examples

Table of contents

After understanding the basics of CTE, Multiple CTEs, and Nested CTEs in the previous post of this CTE series, we will be exploring Recursive CTE in this post. Recursion takes place when we invoke the block of code inside itself to perform repeated procedural loops. Likewise, Recursive CTE is used to perform repeated procedural loops, the recursive query will call itself until the query satisfies the clause in which condition to terminate the recursion. This makes it clear that Recursive CTE refers to itself and is helpful where we are dealing with hierarchical structures or where the pattern is repeating like employees hierarchy in the company, railway connections, applications navigational menus, etc.

Let's create a simple Recursive query to display rows containing 1 to 10.

with counter(previous_value) as (
  select
    1                            -- anchor part

  union all            

  select
    previous_value + 1
  from counter                    -- recursive part
  where previous_value < 10        -- termination clause
)

select
  *
from counter;                    -- invocation with outer query

recursive cte example generate number from 1 to 10.PNG

The above SQL snippet showcases the simple Recursive CTE implementation. Let's analyze each part of the above script in detail. Recursive CTE is similar to normal CTE with the exception that we must give a list of columns inside the parentheses, so we can refer to the column value and use it as the recursive part of the query. Our CTE named Counter has only one column named previous_value. In the above SQL snippets, we can see the commented words for some lines, mentioning what it is called. These are also essential elements of the Recursive CTE: an anchor member, a recursive member, and a termination clause.

As the name suggests, an anchor member is something like an anchor point or a starting point to start with. Like in our snippet, we wanted to show a number between 1 to 10, in which case the starting point or anchor point is number 1. In the above SQL snippet, we expressed the anchor member as "select 1".

Then comes the recursive member which tells us what to do in each step. In our case, after expressing the anchor member and considering number 1 as starting point, we will add another row with a number that is incrementing by 1 than the most recently generated row. The below snippets depict recursive members in our original snippet above generating a number between 1 to 10. "select previous_value + 1 from counter" At this point, the CTE Counter refers to itself (FROM Counter).

After this comes the termination clause part, wherein the condition mentioned helps us to eventually stop our recursion, otherwise, it will be an endless loop and never stops. In our case "where previous_value < 10 " means that we will continue recursion until we reach number 10 and then terminate.

Do remember that only UNION ALL is permitted in a recursive CTE. To remove the duplicate values, we can use DISTINCT in the invocation part or outer query. And we can have n number of columns in our recursive CTE, but the only thing is both the anchor part and recursive part should have the same number of columns.

MAXRECURSION

In the above SQL snippet, we tried to generate numbers between 1 to 10, but want if we wanted to generate numbers from say 1 to 150. our CTE will terminate with an error stating that "The statement terminated. The maximum recursion 100 has been exhausted before statement completion." This is because, as mentioned in the error statement by default SQL only allows a maximum recursion depth of 100. We can change the default setting of the recursion depth with the MAXRECURSION n option, so we can use this option when we are sure of the computations and depth we require. Let's illustrate by modifying the previous SQL snippet and generating numbers from 1 to 150 this time.

with counter(previous_value) as (
  select
    1                            

  union all            

  select
    previous_value + 1
  from counter                    
  where previous_value < 150        
)

select
  *
from counter
option (MAXRECURSION 150);

The part "OPTION (MAXRECURSION 150)" in the invocation or outer query part tells SQL Server to override the default recursion depth and set it to 150. However, do note that valid values for the integers are between 0 and 32767. Here 0 value means that there is no limit on the depth of recursion and the query ay run infinitely which is risky, but you may require it to write a very deep recursion.

Let's see one practical illustration of recursive CTE for a hierarchical / tree-based structure with our classic example. We have a company with an organizational hierarchy in place, i.e. each employee has a superior (except for the boss). An employee has precisely one superior, but one superior may have multiple subordinates. To represent the structure in the table format, we will have another column or attribute for the table employee named superior_id, containing the id of the employee who is their superior and reports to. It is also a classic example of a self-join case. Let's create a dataset for the same.

drop table if exists employee;
create table employee (
    id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    superior_id INT
);
insert into employee (id, first_name, last_name, superior_id) values (1, 'Rollo', 'Peris', NULL);
insert into employee (id, first_name, last_name, superior_id) values (2, 'Christi', 'Ploughwright', NULL);
insert into employee (id, first_name, last_name, superior_id) values (3, 'Alon', 'Manntschke', 1);
insert into employee (id, first_name, last_name, superior_id) values (4, 'Fabiano', 'Duchart', 1);
insert into employee (id, first_name, last_name, superior_id) values (5, 'Gustie', 'MacCostigan', 2);
insert into employee (id, first_name, last_name, superior_id) values (6, 'Valentine', 'Shutler', 3);
insert into employee (id, first_name, last_name, superior_id) values (7, 'Kaitlynn', 'Dowrey', 3);
insert into employee (id, first_name, last_name, superior_id) values (8, 'Bonnie', 'Pamphilon', 4);
insert into employee (id, first_name, last_name, superior_id) values (9, 'Wayland', 'Richel', 5);
insert into employee (id, first_name, last_name, superior_id) values (10, 'Shena', 'Mattessen', 5);
insert into employee (id, first_name, last_name, superior_id) values (11, 'Al', 'Broy', 6);
insert into employee (id, first_name, last_name, superior_id) values (12, 'Wallas', 'Rofe', 7);
insert into employee (id, first_name, last_name, superior_id) values (13, 'Suellen', 'Burtonshaw', 7);
insert into employee (id, first_name, last_name, superior_id) values (14, 'Hillard', 'Tisor', 8);
insert into employee (id, first_name, last_name, superior_id) values (15, 'Lolly', 'Vasiliev', 9);
insert into employee (id, first_name, last_name, superior_id) values (16, 'Gaylord', 'Sivyour', 10);
insert into employee (id, first_name, last_name, superior_id) values (17, 'Fitzgerald', 'Trask', 10);
insert into employee (id, first_name, last_name, superior_id) values (18, 'Aveline', 'Mouget', 12);
insert into employee (id, first_name, last_name, superior_id) values (19, 'Alys', 'Rydeard', 13);
insert into employee (id, first_name, last_name, superior_id) values (20, 'Brianna', 'Vannozzii', 13);
insert into employee (id, first_name, last_name, superior_id) values (21, 'Sibylla', 'Ferfulle', 14);
insert into employee (id, first_name, last_name, superior_id) values (22, 'Gwynne', 'Okie', 15);
insert into employee (id, first_name, last_name, superior_id) values (23, 'Cortney', 'Mellhuish', 20);
insert into employee (id, first_name, last_name, superior_id) values (24, 'Nikos', 'Muscroft', 20);
insert into employee (id, first_name, last_name, superior_id) values (25, 'Lorinda', 'Barsham', 21);
insert into employee (id, first_name, last_name, superior_id) values (26, 'Hephzibah', 'Keat', 22);
insert into employee (id, first_name, last_name, superior_id) values (27, 'Patrice', 'Reardon', 22);
insert into employee (id, first_name, last_name, superior_id) values (28, 'Gregorius', 'Larkkem', 24);
insert into employee (id, first_name, last_name, superior_id) values (29, 'Kaylyn', 'Gaiger', 23);
insert into employee (id, first_name, last_name, superior_id) values (30, 'Constantina', 'Jane', 19);

Now let's see how to use a recursive query for the hierarchical structure like the employee table, we will draw their path to the boss.

with employee_tree (id, first_name, last_name, superior_id, path) as (
    select
        id,
        first_name,
        last_name,
        superior_id,
        cast(first_name +' '+last_name as nvarchar(max)) as path
    from employee
    where superior_id is null

    union all

    select 
        e.id,
        e.first_name,
        e.last_name,
        e.superior_id,
        t.path + N' -> '+ e.first_name + N' '+ e.last_name
    from employee e
    join employee_tree t
        on e.superior_id = t.id
)

select * from employee_tree;

recursive cte with hierarchal tree based structure.PNG

In the anchor part of the employee_tree, CTE we started with the anchor point "BOSS" i.e employee who doesn't have anyone superior, and hence their superior_id was null. With getting superiors or boss in the anchor part, in the recursive part, we perform the join between the employee table and employee_tree table with union all to club the result set of both the part. As you can see in the image above which the output of the employee_tree SQL snippet rows 1 and 2 did not have superior_id has only their names as the path value, indicating they are the boss or the most superior in the company. The rest of the rows have path values from their boss's name to themselves including any superiors in between. This represents the path of the superior to the subordinate employee.

Summary

  1. Can have Multiple CTEs some of which can be recursive CTE.

  2. Only UNION ALL is permitted in the recursive CTE, to only return unique values for the columns, you can use distinct in our outer part or invocation query.

  3. Both the anchor part and recursive part should have the same number of columns and data types.

  4. In case of string columns want to do any operation on string data type like concatenation etc, it is better to cast those string columns as nvarchar(max) in both the part of the query

  5. Valid integer value for the option of maxrecursion is between 0 and 32767.