After looking at how to set Window Frame for the Window Functions in the previous post, we will discuss Analytic or Valued Window Functions. Up until now, we had seen aggregate windows functions. Analytic functions are adequately identical to aggregate functions and the syntax is also the same, the only difference between Analytic and Aggregate Window Functions is, the latter is used for cumulative operation for all the rows in the window frame, while analytic functions refer to single rows within the window frame.
The lead function gives access to the data of the following rows or the row at a given offset that follows the current row in the same result set without the use of self-join.
Syntax
Syntax and explanations are given below, LEAD ( scalar_expression [ , offset ] , [ default ] )
OVER ( [ partition_by_clause ] order_by_clause )
scalar_expression: The value to be returned, can be a column name or an expression of any type which returns a single value. It cannot be an analytic function.
offset: The number of rows following the current row from which value is to be obtained, default is 1. An offset can be a column, subquery, or expression returning a positive integer. Negative integers and analytic functions are not allowed.
default: it is a value that can be used when the offset is outside of the scope of the partition. The default value is null if not specified and it has to be of compatible data type as scalar_expression.
partition_clause: it makes the partitioning or grouping of the dataset as per the criteria given.
order_by: it sorts the data either ascendingly or descendingly, default is ascending order. If we mention partition by sub-clause, then it will sort the data set in the partitioning group as per requirement.
Let's prepare a table and dataset for the demonstration.
CREATE TABLE dbo.daily_revenue(
date date NOT NULL,
amount numeric(10, 2) NOT NULL
)
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-11-22' AS Date), CAST(6660.18 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-11-03' AS Date), CAST(8431.62 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-11-11' AS Date), CAST(1588.48 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-12-01' AS Date), CAST(7769.18 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-07-24' AS Date), CAST(9273.53 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-09-30' AS Date), CAST(2094.20 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2022-02-12' AS Date), CAST(7923.16 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-11-11' AS Date), CAST(45646.48 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-12-01' AS Date), CAST(466.18 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-07-24' AS Date), CAST(9546.53 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-06-17' AS Date), CAST(7026.66 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-12-27' AS Date), CAST(4168.42 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-12-27' AS Date), CAST(4687.42 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-12-27' AS Date), CAST(6796.52 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-11-15' AS Date), CAST(1172.48 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-05-16' AS Date), CAST(7693.74 AS Numeric(10, 2)))
INSERT dbo.daily_revenue (date, amount) VALUES (CAST(N'2021-12-02' AS Date), CAST(5484.18 AS Numeric(10, 2)))
SQL Lead function without a default value.
Let's do the first demonstration using the dataset we prepared above. First, we will see the basic example of the Lead function without the default part in the clause.
SELECT
date,
amount,
LEAD(amount) OVER(ORDER BY date ASC) as lead_amount
FROM daily_revenue;
In the above result set of the query, we can see that value of the next date amount is returned in the column named lead_amount, and in the last row, the lead_amount is NULL.
SQL calculates Deltas using the Lead function
The LEAD() function is very useful for the use case of calculating deltas like calculating the difference in current-day sales and next-day sales or current-date revenue and next-date revenue. We will see one example below
SELECT
date,
amount,
LEAD(amount) OVER(ORDER BY date ASC) as lead_amount,
LEAD(amount) OVER(ORDER BY date ASC) - amount as difference_amount
FROM daily_revenue;
As you can see in the above image, with the above query, we returned to date, amount, lead_amount I.e. amount of the next day, and difference_amount which is the difference between the next date's amount and the current row amount.
SQL lead example with offset and default value
In the above previous two examples, we haven’t used the default port of the Lead window function. We will just modify the query used in the first example to include the default port, and also we will add the offset value. In the earlier lead function, we didn’t mention offset value and it still worked, it is to be noted that, if an offset is not mentioned then the default is considered which is 1.
SELECT
date,
amount,
LEAD(amount, 1, 0) OVER(ORDER BY date) as lead_amount
FROM daily_revenue;
Here we have mentioned the offset value and default value in the Lead function. An offset value is explained above, the default value provided will be returned in cases where we get a NULL value. However, the default value and the scalar value should be compatible data-type.
SQL Lead function with PARTITOIN BY and ORDER BY
To demonstrate the Lead function with PARTITION BY and ORDER BY sub-clause, let's create another table and seed it with some data.
CREATE TABLE weekly_sales(
month TINYINT,
week TINYINT,
amount DECIMAL(10,2)
);
INSERT INTO weekly_sales(month, week, amount) VALUES
(1, 1, 35000.00),
(1, 2, 16543.15),
(1, 3, 54453.54),
(1, 4, 45378.12),
(2, 1, 37485.87),
(2, 2, 78567.00),
(2, 3, 54378.00),
(2, 4, 45378.00),
(3, 1, 38753.63),
(3, 2, 23787.03),
(3, 3, 37896.00),
(3, 4, 97855.00);
Do select all queries on the weekly_sales table we create above, to get acquainted with the dataset we are working on. We want to analyze data for the monthly sales, we need partitioning on column month. Below is an example query,
SELECT
month,
week,
amount,
LEAD(amount, 1, 0) OVER(PARTITION BY month ORDER BY month, week) as next_week_sales_amount
FROM weekly_sales;
In the above image of the output sets, we can see that three partitions for months 1, 2 and 3 are made. The lead function takes the next value within the month-wise partition and returns the default value in case of NULL.
We used ascending sort criteria on columns month and week. Let's change it to descending sorting.
SELECT
month,
week,
amount,
LEAD(amount, 1, 0) OVER(PARTITION BY month ORDER BY month, week desc) as next_week_sales_amount
FROM weekly_sales;
SQL Lead function with CTE example
From the weekly_sales dataset we created above, suppose we want to get net monthly sales and compare it with next month's sales amount for any analysis.
;WITH monthly_sales as (
SELECT
month,
SUM(amount) as monthly_amount
FROM weekly_sales
GROUP BY month
)
select
month,
monthly_amount,
LEAD(monthly_amount, 1, 0) OVER(ORDER BY month) as next_month_amount
from monthly_sales;
In the above query, we used CTE to find the total sales for each month, using SUM and GROUP BY clauses on the month column. In the outer query, we used the rows returned by CTE to compute monthly sales for the comparison.
Summary
The LEAD function gives access to values of the following or next rows as per the offset mentioned, if not mentioned, then the default offset is considered as 1.
PARTITION BY clause sets the local boundary of the data set as per the mentioned specified conditions. If not mentioned, it considers the whole result sets for the boundary.
For out-of-the-frame data, the LEAD function returns a NULL value like the LAG function and we can mention the default value if we want it as a third parameter while calling the function.
In next post, we will understand LAG window function.