SQL Server FIRST_VALUE Window Function
In the previous post, we explored SQL Server LAG Window Function with different example use cases. In this post, we will be understanding FIRST_VALUE Window Function, its syntax, and examples. As the name suggests, FIRST_VALUE returns the first value in an ordered result set. The FIRST_VALUE Function like other window functions will apply to the result set of the query output, distributing the rows as per grouping/partitioning conditions mentioned, and ordering each partition set as per the criteria mentioned. If no partition sub-clause is mentioned, then the function considers the whole query output as a group and applies to order criteria on it, before executing the window function.
Syntax :
FIRST_VALUE ( [scalar_expression ] ) OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )
scalar_expression: The value to be returned, can be a column name, subquery, or an expression of any type which returns a single value. It cannot be an analytic function.
partition_clause: it makes the partitioning or grouping of the result-set returned by the FROM clause as per the criteria mentioned in the OVER PARTITION BY sub-clause. And if not mentioned then, the function treats all result-set of the query as a single grouping set to perform the operations.
order_by: it sorts the rows inside the grouping set as per the criteria mentioned.
Let's create a table and example insert statement, to use for the demonstration below.
CREATE TABLE product_sales (
date DATE,
product_id INT,
category_id TINYINT,
quantity INT
);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-04', 1, 2, 2);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-05', 2, 4, 72);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-07', 3, 4, 68);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-08', 4, 1, 65);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-07', 5, 2, 37);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-08', 6, 3, 32);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-06', 7, 1, 71);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-07', 8, 1, 90);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-05', 9, 2, 66);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-05', 10, 4, 33);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-08', 11, 1, 79);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-08', 12, 1, 49);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-07', 13, 1, 96);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-03', 14, 3, 17);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-04', 15, 3, 27);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-05', 16, 1, 31);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-06', 17, 3, 99);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-05', 18, 1, 60);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-04', 19, 1, 4);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-05', 20, 2, 88);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-06', 21, 2, 27);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-05', 22, 4, 8);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-07', 23, 4, 81);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-05', 24, 2, 82);
INSERT INTO product_sales (date, product_id, category_id, quantity) VALUES ('2022-03-05', 25, 2, 91);
SQL First_value over query result-set example
Firstly, we will be doing a simple example without partitioning our query result-set. We wanted to find the least ordered category.
SELECT
product_id,
category_id,
quantity,
FIRST_VALUE(category_id) OVER(ORDER BY quantity ASC) least_ordered_category
FROM product_sales;
In the above query, we just wanted to return the least ordered category_id with all other columns of the table. Hence we didn't add any sub-clause for the partitioning query result set, we mentioned order by sub-clause with the sort criteria as ascending on column quantity so will get the lowest quantity row first, hence it returned category_id 2 as the least_ordered because it was least with quantity only 2.
SQL first_value over result-set with partition
Unlike the previous example above, we want to return all columns of the tables along with product_id which was the least sold item, in each category. In simpler terms, we have to return the product_id of the products which are having the lowest sales in each category along with the table column.
SELECT
date,
product_id,
category_id,
quantity,
FIRST_VALUE(product_id) OVER(PARTITION BY category_id ORDER BY quantity ASC) as least_sold_product_in_category
FROM product_sales;
The colored shades denote the partitioning set.
As in the image above the query result, the set will be distributed into four partition sets based on each category. For category_id 1 the least_sold_product_id returned is 19. And you can see that the quantity count ordered for product_id 19 is 4, and the rest of all the products of category_id 1 has the count ordered quantity more than the count quantity of product_id 19'. Similarly, for other categories, the least_sold_product is returned.
Now to make it a little realistic, we want to return the date, product_id, category_id, quantity, and product_id of the product which had the lowest sale quantity in their categories and for that date. Simply earlier we just partitioned the query result-set obtained from the FROM clause based on categories as we wanted to make groupings based on categories. Now we also want to know the lowest sales of the product for each category along with the date.
SELECT
date,
product_id,
category_id,
quantity,
FIRST_VALUE(product_id) OVER(PARTITION BY category_id, date ORDER BY quantity ASC) as least_sold_product_in_category_and_date
FROM product_sales ORDER BY category_id, date;
The colored shades denote the partitioning set.
As you can see in the table snapshot above, the first row returned 19 as product_id which had low quantity count sales for category 1 and the date 2022-03-04. In the immediate next row, you can see that it has returned 16 as product_id which had a low quantity count for category 1 (same as above) and the date 2022-03-05. A date is different from its previous row, it was out of scope. hence it returned 16 because product_id 16 has a low quantity order count for category 1 and product_id 18 had a 60 order count for the same date. The query result set is further partitioned based on the category_id and date in the partition by sub-clause. This helps in calculating date-wise which product had low order counts for each category.
Summary:
The FIRST_VALUE function returns the first value in the partitioned ordered query result set.
PARTITION BY sub-clause sets the local boundary of the data set as per the mentioned specified conditions. If not mentioned, it considers the whole result sets for the boundary.
ORDER BY sub-clause of the OVER clause is required.