Overview of Transaction and Concurrency
Any Relational Database System has a concept called Transaction. The transaction can be defined as a set of logical operations that has to be performed in a single session instance. In simpler terms, a Transaction is a set of SQL statements that needed to be executed as one single unit. We all know and read that Transactions should be atomic. It means that it should lead to two results, either it must commit to success or roll back on an error. It must not be half-commit or partial commit or rollback. Now, the problem of concurrency occurs when these multiple transactions are running on multiple session instances. Like many users are trying to access the same data set of the same object, some for reading the data, some for manipulating the data, etc at the same time. These concurrent reads write operations can lead to data inconsistency and unexpected results. Let's see a simple example of the lost updates problem.
DROP TABLE inventory_stocks;
CREATE TABLE inventory_stocks(
product_id INT IDENTITY(1,1),
stock INT
);
INSERT INTO inventory_stocks(stock) VALUES (400); -- lets insert one row only for simplicity
SELECT * FROM inventory_stocks;
Let's try to simulate an example of the Lost Update case with two transactions running concurrently. In transaction - 1 we are processing the purchase of the product. Some user has purchased 100 units of the stock of product id 1, we need to process this purchase by decrement the stock by the required amount and updating the stock. While transaction 1 is doing this process, let's say another user who had made a purchase previously, wanted to return the product. Transaction - 2 is executing the said process.
-- first transction one is buying the product
DECLARE @current_stock INT, @decrement INT
BEGIN TRANSACTION;
SELECT @current_stock = stock
FROM inventory_stocks
WHERE product_id = 1; -- getting current stock of the product id = 1
PRINT 'Current Stock Count -> '+CONVERT(VARCHAR(10), @current_stock);
-- 100 stocks are bought so reduce stock by 100
SET @current_stock = @current_stock - 100;
-- forced delay
waitfor delay '00:00:20';
-- updating the stock
UPDATE inventory_stocks SET stock = @current_stock WHERE product_id = 1;
-- result for display after purchase operation
SELECT * FROM inventory_stocks WHERE product_id = 1;
COMMIT TRANSACTION;
Second Transaction process:
-- second transction other user is returning the product he had bought earlier.
DECLARE @current_stock INT;
BEGIN TRANSACTION;
-- getting current stock of the product 1
SELECT @current_stock = stock
FROM inventory_stocks WHERE product_id = 1;
PRINT 'Read current_stock begore adding new stock due to returns -> '+CONVERT(VARCHAR(10), @current_stock);
-- increasing dtock due to return of prodcts
SET @current_stock = @current_stock + 20;
UPDATE inventory_stocks SET stock = @current_stock WHERE product_id = 1;
SELECT * FROM inventory_stock;
COMMIT TRANSACTION;
With both the transactions running concurrently having access to and updating the same dataset in the same table, running their operations, leading to lost updates. In this case, when you run both transactions in new tabs simultaneously we will see that we had lost the update of the purchase return stock. The stock which was returned by transaction - 2 is not reflected and hence lost. Some of the other problems that occur with concurrency in SQL Server are namely,
Dirty Reads.
Lost Updates.
Non-Repeatable Reads.
Phantom Reads.
And to overcome the above concurrency problem SQL server provides different types of Transaction Isolation Levels and they are namely,
Read Uncommitted.
Read Committed.
Repeatable Read.
Snapshot.
Serializable.
Depending upon the Transaction Isolation Levels we choose for the transaction, we get varying degrees of performance and concurrency issues. There are 2 types of approaches for tackling the concurrency in SQL Server,
Pessimistic Approach
Optimistic Approach
By default, the SQL Server follows the Pessimistic Approach to tackle concurrency, by acquiring the locks on the resources and isolation levels required as per ANSI. This leads to reading operation blocking write operations and writes operation blocking read operations. Another approach is the Optimistic Approach, wherein it remembers the value of the data at the start of the transaction and verifies that no other transactions have modified the same resource before committing the transaction, it is done by implementing the row versioning model. This approach doesn't require locking and it was first implemented in SQL Server 2005 version.
The row versioning model uses the row versioning technique, whereby it stores the copies of all previously committed versions of data rows as long as any transaction needs access to it in tempdb. The space used to store the previous versions of the updated data set is called a version store. In this model of row versioning implementation, the read operation doesn't block the write operation, and the write operation doesn't block the read operation. However, do note that the write operation will still acquire locks on resources and block other write operations. We have two variants of the snapshot-based isolations
- Snapshot Isolation
2. Read Committed Snapshot Isolation
and they both rely on row versioning, and not locking. The term 'snapshot' refers to a set of rows that are valid for the point in time when an operation is being performed. In cases, wherein the resource is locked, due to modification of data and it is still pending and not yet completed, the read operation retrieves the required resources (row sets) from the version store. The snapshot of the data retrieved is as it existed at the time when the current statement started. RCSI prevents dirty reads but does not prevent non-repeatable reads and phantom reads. However, SI prevents dirty reads, non-repeatable read and phantom read as the reader retrieves a snapshot of the data, as it existed at the time when **current transaction ** started. But as said earlier, the write operations will still block other write operations and all locking-related attributes are still relevant to the row versioning model approach. SQL Server keeps all the old versions of the row that a transaction is modifying. If multiple update requests are made to the same row, then those multiple older versions of the rows are maintained as long as any transactions that might need to access them. These multiple older versions of the rows are chained together as a linked list. Due to this, every data modification has an extra workload of managing and maintaining older versions along with increased usage of tempdb.
Now, let's discuss the snapshot-based isolation levels in detail. As stated above, we have two variants of snapshot-based isolations, both levels follow a row versioning model to maintain the snapshots,
Snapshot Isolation (SI) - it retrieves the committed data as of the beginning of the current transaction
Read Committed Snapshot Isolation (RCSI) - it retrieves committed data as of the beginning of the current statement
Snapshot Isolation (SI):
Enabling SI:
with the alter statement below, we can enable SI on the required database
ALTER DATABASE SET ALLOW_SNAPSHOT_ISOLATION ON;
When altering the database to allow SI, the presence of any active transaction which is doing any data modification, will block the alter statement and it will be a deferred operation.
Understanding the working of SI
In the SI variant, any data reads will be the most recent committed version, as of the beginning of the current transaction. This prevents dirty reads, non-repeatable reads, and phantom reads. we had two transactions, the first transaction was for the purchase of the product by a user session, and the second transaction was for the return of the purchased product by another user session. We will simulate the first transaction and in the second transaction, we will try to read the data set and retrieve the value, to understand better how the SI variant works.
-- first transaction, where in user purchases 100 units of stocks of the item having product_id as 1.
DECLARE @current_stock INT;
BEGIN TRANSACTION;
SELECT @current_stock = stock
FROM inventory_stocks
WHERE product_id = 1;
PRINT '@current_stock before the purchase in T1 is -> '+CONVERT(VARCHAR(10), @current_stock);
-- update the stock by decrementing the stock unit by 100 due to purchase
SET @current_stock = @current_stock - 100;
UPDATE inventory_stocks
SET stock = @current_stock
WHERE product_id = 1;
PRINT 'now updated stock for the product_id = 1';
waitfor delay '00:00:8'; -- for simulating time
SELECT @current_stock = stock
FROM inventory_stocks
WHERE product_id = 1;
PRINT '@current_stock after the purchase in T1 is -> '+CONVERT(VARCHAR(10), @current_stock);
COMMIT TRANSACTION;
-- second transaction, where we try to read the value of the stock of the product_id 1 and understand the modifications
SET TRANSACTION ISOLATION
LEVEL SNAPSHOT
DECLARE @current_stock INT;
BEGIN TRANSACTION;
SELECT @current_stock = stock
FROM inventory_stocks
WHERE product_id = 1;
PRINT '@current_stock before the purchase in T1 is -> '+CONVERT(VARCHAR(10), @current_stock);
waitfor delay '00:00:20' -- waiting for transction 1 to commit.
PRINT 'waited for transaction 1 to commit, now reading again';
SELECT @current_stock = stock
FROM inventory_stocks
WHERE product_id = 1;
PRINT '@current_stock after transction 1 is committed -> '+CONVERT(VARCHAR(10), @current_stock);
COMMIT TRANSACTION;
SELECT stock
FROM inventory_stocks
WHERE product_id = 1; -- after transaction 2 is committed
As we can see in the above image, In transaction 1 we did a stock update stating user purchases 100 units of the product having product_id = 1 and in transaction 2 we did a SELECT query at the different intervals to read the current stock of the product, This gives us a better understanding of the working of SI isolation variant. In transaction 1 we started with the initial stock and then we had an update stock operation to simulate reduce in stock. The output of the SI isolation variant can be understood with transaction 2 execution. In transaction 2, we first read the current stock of the product which returns 400. We simulated a delay for 20 seconds to let transaction 1 commit fully. Then after the commit of transaction 1, we again tried to read the stock of the same product and it returned 400. After committing transaction - 2 we perform a read operation again for the stock of the product and this time it returned 300. This shows that the SI isolation variant retrieves the most recent committed data set as of the beginning of the current transaction.
Now let's simulate two updates for the same row, setting transaction isolation level snapshot.
-- transaction 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
UPDATE inventory_stocks SET stock = 1000
WHERE product_id = 1;
waitfor delay '00:00:10';
COMMIT TRANSACTION;
-- transaction 2
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
UPDATE inventory_stocks SET stock = 2000
WHERE product_id = 1;
waitfor delay '00:00:10';
COMMIT TRANSACTION;
The first transaction will commit successfully, but the second transaction will return an error and the transaction will abort, stating that "Transaction aborted due to update conflict". Always be aware that conflicts can happen when working with SI and we must handle the conflicts appropriately.
Read Committed Snapshot Isolation:
Enabling RCSI:
With alter statement below, we enable RCSI
ALTER DATABASE SET READ_COMMITTED_SNAPSHOT ON
to avoid blocking, use the below command
ALTER DATABASE SET READ_COMMITTED_SNAPSHOT ON WITH NO_WAIT
Understanding the working of RCSI
RCSI is statement-level snapshot isolation. Any queries will get the most recent committed values as of the beginning of the current statement, unlike the SI isolation variant which is transaction-based. To demonstrate the example we did in the SI part, let's have two transactions. In transaction - 1, we will be performing an update on a product having product_id = 1 and in the transaction - 2 we will be reading the stock value before updating stock in the transaction - 1 and after performing the update from transaction 1.
-- transaction 1 for updating the stock( reducing stock by 20)
DECLARE @current_stock INT;
BEGIN TRANSACTION;
SELECT @current_stock = stock
FROM inventory_stocks
WHERE product_id = 1;
PRINT '@current_stock before update -> '+CONVERT(VARCHAR(10), @current_stock);
UPDATE inventory_stocks SET stock = @current_stock - 20 WHERE product_id = 1;
SELECT @current_stock = stock
FROM inventory_stocks
WHERE product_id = 1;
PRINT '@current_stock after update -> '+CONVERT(VARCHAR(10), @current_stock);
waitfor delay '00:00:10';
COMMIT TRANSACTION;
-- transaction 2 to track the read of the stocks
DECLARE @current_stock INT;
BEGIN TRANSACTION;
SELECT @current_stock = stock
FROM inventory_stocks
WHERE product_id = 1;
PRINT '@current_stock before update -> '+CONVERT(VARCHAR(10), @current_stock);
waitfor delay '00:00:20';
SELECT @current_stock = stock
FROM inventory_stocks
WHERE product_id = 1;
PRINT '@current_stock after update t1 -> '+CONVERT(VARCHAR(10), @current_stock);
COMMIT TRANSACTION;
Executing the two transactions above at different sessions in the management studio, we can see that after the update of stock in transaction 1 and committed, transaction 2 shows updated stock at the second SELECT query within the transaction. RCSI introduces great concurrency as stated earlier, read operations do not block write operations, and write operations do not block read operations. And no SET options are required to be mentioned to take advantage of RCSI, unlike the SI variant.
Summary of snapshot-based isolation levels
SI and RCSI, both variants follow the row versioning model with some key differences in how to enable the isolations and their effects.
SI retrieves' the most recent committed data as of the beginning of the transaction and RCSI retrieves the most recent committed data as of the beginning of the statement.
SI can result in an update conflict that can lead to a rollback or abort of the transaction. No possibility of update conflict in RCSI.
In the SI variant, each statement gets the data that was committed before the start of the transaction, while in RCSI each statement gets the data that was committed before the start of the statement. It picks the most recent committed changes.