SQL Server Nested CTE with examples

Table of contents

After understanding the basics of CTE, multiple CTEs, and the explanation in the previous post, in this post we will be exploring Nested CTEs in this post. As discussed, we can say that CTE is a named temporary set of rows similar to subqueries. Like subqueries, you can't put one CTE definition inside another CTE definition. The syntax structure for the CTE is as follows

with cte_expression_1 as (
    cte_query_1
),
cte_expression_2 as (
    cte_query_2 -- (in the FROM clause refers cte_expression_1)
)
select from cte_epression_2 -- outer query

As mentioned in the syntax above, the second CTE definition cte_expression_2 uses the first CTE definition cte_expression_1. That's why there is a comment with cte_query in the second CTE definition cte_query_2. This means that, once a CTE is defined we can freely use it in subsequent CTEs. This is helpful in use cases where it's impossible to use one aggregate function inside another like finding the average maximum, average minimum, an average of an average, group comparisons for reports, etc.

Let's create a table and seed it with data for illustrations. We are having a scenario wherein the cab service company wants to know the average maximum distance their driver rides in each city they provide the service.

drop table if exists city
create table city(
    id int identity(1,1) primary key,
    name varchar(30) not null,
    country_code varchar(5)
);

insert into city (name, country_code) values ('Fanhu', 'USA');
insert into city (name, country_code) values ('Goodlands', 'USA');
insert into city (name, country_code) values ('Furong', 'USA');
insert into city (name, country_code) values ('Guxian', 'JPN');
insert into city (name, country_code) values ('Ar Rahad', 'OMN');
insert into city (name, country_code) values ('Uitenhage', 'USA');
insert into city (name, country_code) values ('Yanglinshi', 'JPN');
insert into city (name, country_code) values ('Mengjia', 'JPN');

drop table if exists driver
create table driver(
    id int identity(1,1) primary key,
    first_name varchar(20),
    last_name varchar(20),
    vehicle_id int,
    city_id int
);

insert into driver (first_name, last_name, vehicle_id, city_id) values ('Cicely', 'Heaker', 1, 5);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Brandy', 'Rizzone', 2, 2);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Hewitt', 'Foden', 3, 2);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Freddie', 'Rohfsen', 4, 8);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Johann', 'McRill', 5, 1);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Arabella', 'MacKeague', 6, 4);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Jonie', 'Greenan', 7, 8);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Alysa', 'Hanselman', 8, 2);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Vale', 'Heathwood', 9, 1);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Kennedy', 'Headon', 10, 7);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Ugo', 'Rapper', 11, 1);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Laverna', 'Harkin', 12, 3);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Washington', 'Kingdon', 13, 8);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Rogers', 'Ferronet', 14, 7);
insert into driver (first_name, last_name, vehicle_id, city_id) values ('Raleigh', 'Norgan', 15, 5);

drop table if exists daily_ride
create table daily_ride(
    date date,
    driver_id int,
    amount_earned decimal(6,2),
    ride_distance decimal(5,2)
);

insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-11', 8, 198.41, 29.67);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-09', 10, 65.82, 23.39);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-03', 8, 187.77, 5.77);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-28', 13, 273.41, 41.34);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-02', 9, 102.92, 94.51);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-02', 1, 74.85, 38.8);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-06', 4, 312.21, 70.7);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-19', 1, 148.68, 59.91);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-21', 14, 167.77, 35.76);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-24', 9, 223.94, 7.49);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-16', 9, 5.88, 6.5);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-19', 13, 38.31, 25.31);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-24', 4, 33.94, 71.32);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-14', 13, 71.26, 94.04);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-23', 8, 152.09, 1.5);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-06', 6, 206.48, 8.88);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-20', 7, 261.01, 97.81);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-03', 1, 119.72, 6.63);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-13', 10, 77.95, 8.86);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-06', 14, 102.9, 36.22);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-17', 7, 263.06, 63.85);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-05', 6, 125.6, 83.41);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-18', 1, 80.19, 81.05);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-18', 13, 349.35, 60.93);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-18', 3, 230.82, 35.77);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-31', 11, 156.39, 4.28);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-03', 10, 376.38, 72.98);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-05', 7, 114.81, 14.06);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-23', 1, 11.3, 75.65);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-21', 5, 398.57, 11.91);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-28', 9, 178.44, 66.02);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-22', 13, 266.44, 12.16);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-27', 13, 382.47, 37.16);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-28', 13, 327.74, 79.3);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-14', 3, 74.7, 26.2);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-03', 9, 157.97, 36.15);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-22', 13, 212.57, 23.22);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-13', 1, 301.57, 36.76);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-30', 5, 31.8, 73.93);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-12', 3, 396.49, 30.46);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-21', 2, 222.17, 57.34);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-12', 3, 363.4, 41.21);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-12', 9, 227.96, 11.48);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-21', 5, 77.59, 71.0);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-14', 8, 275.88, 97.34);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-16', 5, 365.88, 96.7);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-14', 4, 271.33, 59.66);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-18', 3, 383.73, 79.69);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-08', 6, 245.05, 79.13);
insert into daily_ride (date, driver_id, amount_earned, ride_distance) values ('2021-03-14', 6, 328.01, 91.68);

With all the tables and data-set ready, let's dive in further. ' We have table name city, driver, and daily_rides which have respective records about them. Our cab service company wants to know the average total maximum distance traveling for each city they operate in.

with total_distance_ride as (
    select
        sum(r.ride_distance) as total_distance,
        r.driver_id,
        d.city_id
    from daily_ride r
    join driver d
        on r.driver_id = d.id
    join city c
        on d.city_id = c.id
    group by r.driver_id, d.city_id
), max_ride_distance as (
    select
        max(total_distance) as max_distance,
        city_id
    from total_distance_ride
    group by city_id
)

select avg(max_distance) as avg_max_distance, city_id from max_ride_distance group by city_id

nested cte example finding average total maximum distance travelled.PNG

As you see the above image which shows the output of the above Nested CTE finds the average total maximum distance of ride occurs in each city. In the first CTE, we find the total distance driven in each city for each date. In the second CTE, we take this total_distance which is the sum of the ride_distance for that date and city and choose the maximum amongst each city_id. Finally, in the outer query, we return the average maximum ride distance across all the cities. After finding the average maximum distance across those cities, No company wants to find the number of days where the average maximum amount earned exceeded 50 USD. For this we first need to find the maximum amount earned on a given date for each city, then with the previous result set, we need to calculate the average maximal amount across all the cities for a given date. finally, we count the number of dates where the average maximum amount exceeded 50 USD.

with max_amount_earned as (
    select
    max(r.amount_earned) as max_earned,
    r.date,
    d.city_id
    from daily_ride r
    join driver d
        on r.driver_id = d.id
    join city c
        on d.city_id = c.id
    group by r.date, d.city_id
), average_maximal_amount as (
    select
        avg(max_earned) as day_average,
        date
    from max_amount_earned
    group by date
)

select count(date) as number_of_days from average_maximal_amount where day_average > 50;

-- output 
number_of_days
--------------
22

(1 row affected)


Completion time: 2022-03-19T09:00:10.2015488+05:30

The above snippet returns the count of days where the average maximum amount earned was more than 50 USD. Now one last example wherein we will be using nested CTE for comparing two groups of the result set. We want to know the average earning of the cities of the USA and other country codes.

with grouping as (
    select 
        id as city_id,
        case when
            country_code = N'USA'
                then N'US Cities'
            else N'Other'
        end as group_name
    from city
), total_amount_earned as (
    select
        r.driver_id,
        group_name,
        sum(r.amount_earned) as total_earned
    from daily_ride r
    join driver d 
        on r.driver_id = d.id
    join grouping g
        on g.city_id = d.city_id
    group by r.driver_id, group_name
)

select
    group_name,
    avg(total_earned) as average_total_earned
from total_amount_earned
group by group_name order by group_name desc;

nested cte with grouping example.PNG

With the above image, showing the output, we can that the script above returned the average total amount earned based on our city grouping. In the first CTE - grouping, we just only segregated and clubbed cities as per our grouping requirement based on country_code with help of the case statement. We labeled cities having country_code value 'USA' as 'US Cities' and the rest other as 'Other'. In the second CTE - total_amount_earned we computed total_earned which depicts the sum of amount_earned by the driver and combined it with the grouping label we created in the first CTE. Then in the outer query, we computed the group-level average as the average total earned.

Summary

  1. Like Subqueries we cant define CTE inside CTE. But we can refer to the previously defined CTE in subsequent CTEs in the FROM clause.

  2. Nested CTEs are more useful in cases where we need to compute aggregate functions on several levels and also to compare two or more grouping of rows.

  3. We can use a subquery in the outer query part of the CTE script.