SQL Join on NULL column value

It happens many times that, while designing the table structure, we allow NULL for some of the columns and then later on we get the need to perform the join on Null column values to retrieve the required result-set. One example to think of is the sales table and branch table. In the branch table, we have records of the branches of the company with their name, branch number, and branch location. And in the sales table, each of the sales records is maintained along with the branch number which made the sales. But some orders in the orders table have NULL values in the branch_number column, and similarly, some rows of the branch table have branch_number values as NULL. The below script creates the table and inserts the data for further illustrations.

drop table if exists branch
create table branch(
    id int,
    branch_number int ,
    name varchar(50) not null,
    location varchar(100) not null,
);
create  nonclustered index IX_branch_branch_number  
  on dbo.branch (branch_number, name, location); 

drop table if exists sales
create table sales(
    id int primary key,
    sales_date date not null,
    total_amount decimal(10,2) not null,
    customer_id int not null,
    branch_number int -- we permitted null values
);

insert into branch (id, branch_number, name, location) values (1, 101, 'Douglas-Block', 'Dengyue');
insert into branch (id, branch_number, name, location) values (2, 102, 'Veum-Jones', 'Coyah');
insert into branch (id, branch_number, name, location) values (3, NULL, 'Hansen and Sons', 'Halimpu');
insert into branch (id, branch_number, name, location) values (4, 104, 'Braun-Kemmer', 'Saint Louis');
insert into branch (id, branch_number, name, location) values (5, 105, 'Beatty, Bernhard and Schmitt', 'Dalmacio Vélez Sársfield');
insert into branch (id, branch_number, name, location) values (6, 106, 'Kub, Zboncak and Berge', 'Plyussa');

insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (1, '2021-03-06', 9103.52, 7, 101);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (2, '2021-03-03', 6513.04, 6, 101);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (3, '2021-03-04', 5018.44, 8, NULL);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (4, '2021-03-05', 12187.21, 1, NULL);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (5, '2021-03-03', 8951.13, 8, 102);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (6, '2021-03-04', 2109.0, 8, 105);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (7, '2021-03-05', 4745.6, 7, 104);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (8, '2021-03-05', 5326.41, 7, 106);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (9, '2021-03-08', 3672.56, 5, NULL);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (10, '2021-03-04', 4038.51, 7, 103);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (11, '2021-03-04', 11874.09, 5, 104);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (12, '2021-03-07', 8632.73, 6, NULL);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (13, '2021-03-02', 14909.73, 5, 102);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (14, '2021-03-02', 4014.49, 7, 105);
insert into sales (id, sales_date, total_amount, customer_id, branch_number) values (15, '2021-03-04', 6320.8, 2.3, 106);

Now if we wanted to see the branch details for each sale in the sales table, we need to perform an inner join on both the sales and branch tables.

select * from sales;
select * from branch;

select 
    s.id as sales_id,
    s.sales_date,
    s.total_amount,
    s.customer_id,
    b.name as branch_name,
    b.branch_number,
    b.location
from sales s
join branch b
    on s.branch_number = b.branch_number;

joining on null columns doesnt return rows having null values.PNG

The above image showing the result set of the query doesn't have any data regarding order IDs 3, 4, 9, and 12 as all of these ids have null values. There are ways to join on NULL values, one of which is making those columns NOT NULL and adding default values to them. But what if, we don't have access or permission to change the table structure in the database? We had to make it a workout with what we had.

Using ISNULL OR COALESCE function:

If we cant alter the table structure of the database, then we have to convert those NULL to NON-NULL values using a function like ISNULL or COALESCE. ISNULL function takes two parameters, first is the value of the expression we want to check if it's NULL or NON-NULL and if it's NULL, then it returns the second value we provided for the NULL. While COALESCE returns the first NON-NULL value in the list provided.

select ISNULL(NULL, NULL, 0);
select COALESCE(NULL, NULL, NULL, 1);

The first statement using the ISNULL function returns 0 and the second statement returns 1. Now let's again perform the join operation with the help of these functions for the null column values and see if we get the expected output.

select 
    s.id as sales_id,
    s.sales_date,
    s.total_amount,
    s.customer_id,
    b.name as branch_name,
    b.branch_number,
    b.location
from sales s
join branch b
    on ISNULL(s.branch_number, 0) = ISNULL(b.branch_number, 0);

-- using coalesce
select 
    s.id as sales_id,
    s.sales_date,
    s.total_amount,
    s.customer_id,
    b.name as branch_name,
    b.branch_number,
    b.location
from sales s
join branch b
    on COALESCE(s.branch_number, 0) = COALESCE(b.branch_number, 0);

joining on null column values using coalesce or isnull function.PNG

With the above image, we can see that both statements have returned the required result-set as we wanted. But We have used COALESCE or ISNULL function to get our results. This usage of the function on joining column disables the use of indexes in some databases it does disables index seek. And with a heavy increase in the data set of these tables, the computation cost will rise and will have a performance hit due to computing scaler operation. we can see the flow in the execution plan.

Using OR and IS NULL operator

The below query returns the same result set as the above queries and doesn't add extra computation cost and also enables the index to seek for us on the branch table. While creating the branch table before in the first SQL Snippet, we mentioned branch table columns in the non-clustered index we created.

select
    s.id as sales_id,
    s.sales_date,
    s.total_amount,
    s.customer_id,
    b.name as branch_name,
    b.branch_number,
    b.location
from sales s
join branch b
    on (s.branch_number = b.branch_number OR (s.branch_number is null and b.branch_number is null));

joining on null column values using OR and IS NULL Operator.PNG

The above query doesn't use any function like the previous two statements above and the left side of the operands mentioning column name is as it is, so it can take the advantage of the indexes. In the above query statement, our branch table performs an index seek retrieval operation. The above approach avoids extra computations and is also performance-friendly for large recordsets. The queries which can take the advantage of the indexes play an important role where the dataset is large and performance is very important.