Developing Modifications that Survive Concurrency

You can create a database under the assumption that SQL looks after all the problems of concurrency. It will probably work fine under test conditions. Then, in the production environment, it starts losing data in subtle ways that defy repetition. It is every database developer's nightmare. In an excerpt from his acclaimed book, Alex explains why it happens, and how you can avoid such problems.

Just like queries, modifications that work perfectly well in the isolated world of the test database can suddenly start misbehaving, intermittently, when run in a production environment under conditions of concurrent access. There are a number of different problems that might occur when “competing” connections try to simultaneously update the same data, some of the most common of which are:

  • Lost modifications, a.k.a. lost updates – such problems occur when modifications performed by one connection are overwritten by another. Typically such problems occur silently; no errors are raised.
  • Resource contention errors – such as deadlocks and lock timeouts
  • Primary key and unique constraint violations – such problems occur when different modifications attempt to insert one and the same row.

The sort of situation in which a lost update or another error can occur, in other words when the result of the operation is dependent on the sequence or timing of other events, is known as a race condition. It is the job of the defensive programmer to guard against potential race conditions in their software. The most common solutions for such problems include:

  • Serializing modifications against the same data, so that race conditions do not occur.
  • Detecting and handling errors caused by concurrency
  • Rewriting code so that it better withstands race conditions or avoids them altogether.

We shall discuss a few, all-too-common, examples that demonstrate the sort of problems that can arise, and then show different ways of solving them. Of course, it is possible that the problems that you encounter with concurrent modifications will be different from those described here. However, the basic approaches described for solving such problems are very likely to be useful in your situation.

Understanding Lost Modifications

Lost modifications can occur when multiple connections modify the same row of data. For example, one connection (A) reads a row of data, with the intent to update that row, based on certain criteria. Meanwhile, after A has read the data, but before it updates it, a second connection (B) updates the same row. Connection A then performs its update, potentially causing the update made by B to be ‘lost’.

The classic example of a lost update involves reading data into a user form, for subsequent modification. For example, an employee loads data pertaining to a particular customer into a user form, in order to update their address. Meanwhile, an automated process updates the same row, assigning to the customer a new status, such as “loyal customer”. If the employee submits the address update and the application updates the whole row, rather than just the column that was changed, then the customer could be reset back to their old status, and the effect of the automated process will be lost.

Aside from having a more intelligent logic attached to the form, so only the column modified is updated in the database (we’ll discuss this in more detail shortly), there are essentially two concurrency control approaches that we can use in order to avoid such ‘lost updates’:

Optimistic approach: even though we have selected a row, other sessions can modify it, but we optimistically assume that this will not happen. When the selected row is updated, we have logic in place that will test to see if the row has been modified by someone else, since it was queried. If it has, then the employee would get a message saying that the row has been changed, and asking if we still want to make the requested change. This approach is preferable when selected rows are rarely modified, or when a typical modification takes a lot of time, such as modifying data via on-screen forms.

Pessimistic approach: Here, we pessimistically assume that rows will get modified by another process, between reading them and updating them, unless we do something to prevent it. When the employee selects a row, or list of rows, the system makes sure that nobody else can modify those rows. With this approach, the automated process would be blocked until the employee had made the address update (and we’d need logic in place to allow it to retry). This approach is most useful when selected rows are very likely to be modified and/or the modification does not take much time. Typically this approach is not used when users modify data via on-screen forms.

In the example we’ve used here, the two modifying transactions do not, from the database’s perspective, overlap. The automated update starts and finishes before the employee’s address update has started. In such circumstances, and given user forms are involved, we’re likely to take an optimistic approach to concurrency control.

In cases where only automated processes are involved, and we have quick transactions attempting to change the same rows of data (almost) simultaneously, we are also at risk of lost updates, and are likely to adopt a pessimistic approach to concurrency control in order to avoid them.

Let’s take a look at some fairly typical examples of when a ‘lost update’ can occur.

Non-overlapping Modifications

From the point of view of the database engine, the modifications in this example do not overlap; they occur at different times. Still, the second modification overwrites the changes made by the first one, and some information is lost.

Suppose that a ticket has been created in our bug tracking system to report that a very important business report for our Californian office has suddenly stopped working. Listing 1 shows the table that stores tickets. We have already used a table named Tickets in previous chapters of my book; make sure to create a new database to run the examples, or, at the very least, make sure to drop the table Tickets if it exists.

Listing 1: Creating the dbo.Tickets table
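The original listing is not reproduced here; the following is a minimal sketch of what such a table might look like, inferred from the surrounding narrative. Only the Problem and AssignedTo columns are referenced later in the text; the remaining column names and all data types are assumptions, and the book's actual listing may differ.

```sql
-- Illustrative sketch only: a Tickets table with a primary key plus
-- five modifiable columns (the text later counts five such columns).
CREATE TABLE dbo.Tickets
    (
      TicketID INT NOT NULL
                   CONSTRAINT PK_Tickets PRIMARY KEY ,
      AssignedTo VARCHAR(30) NOT NULL ,
      Description VARCHAR(100) NOT NULL ,
      Problem VARCHAR(500) NOT NULL ,
      Status VARCHAR(20) NOT NULL ,
      Priority INT NOT NULL
    ) ;
```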

Of course, in real life this table would have more columns, such as Priority, and possibly some columns would have different types, and there would be some constraints. However, as usual in this book all the details that are not relevant to this simple example are omitted. Listing 2 shows the ticket that was created.

Listing 2: The ticket in dbo.Tickets table, reporting a problem with the TPS report
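As a sketch, assuming the illustrative schema above with columns TicketID, AssignedTo, Description, Problem, Status, and Priority (the book's exact columns and values may differ), the ticket might be created like this:

```sql
-- Hypothetical reconstruction; all values are illustrative only.
INSERT INTO dbo.Tickets
        ( TicketID, AssignedTo, Description, Problem, Status, Priority )
VALUES  ( 123, 'Developers',
          'TPS report for California has stopped working',
          'Report fails with no data', 'Open', 1 ) ;
```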

This is a very important ticket, so two developers – let’s call them Arne and Brian – immediately start troubleshooting. Brian starts the bug tracking GUI and opens the ticket. In the meantime, Arne starts his investigation and quickly realizes that one of the tables used in the report is empty; possibly it has been accidentally truncated. He opens the same ticket in his on-screen form in the bug-tracking GUI and immediately updates the ticket, describing the likely cause of the problem. He also reassigns the ticket to the DBA team. The resulting SQL is shown in Listing 3.

Listing 3: The SQL that was issued by Arne’s bug tracking form
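The generated SQL is not reproduced here; under the same illustrative schema, the form's full-row update might look like the following. Note that it writes back every column, not just the ones Arne actually changed:

```sql
-- Hypothetical reconstruction of the form-generated UPDATE.
UPDATE  dbo.Tickets
SET     AssignedTo = 'DBA Team' ,    -- Arne's reassignment
        Description = 'TPS report for California has stopped working' ,
        Problem = 'Source table is empty, possibly truncated' ,  -- Arne's finding
        Status = 'Open' ,
        Priority = 1
WHERE   TicketID = 123 ;
```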

Meanwhile, Brian decided to start by ascertaining whether it was just the report that had failed, or whether it was also affecting their Ohio office. He runs the report for Ohio and gets the same problem so, from his onscreen view of the ticket, which he opened before Arne made his update, Brian updates the Problem field to reflect this. The resulting SQL is shown in Listing 4.

Listing 4: The SQL that was issued by Brian’s bug tracking form
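Again under the illustrative schema, Brian's form writes back all the values it read before Arne's update, so the generated statement might look like this, silently overwriting Arne's changes:

```sql
-- Hypothetical reconstruction; every column is written back with the
-- stale values Brian's form read before Arne's update.
UPDATE  dbo.Tickets
SET     AssignedTo = 'Developers' ,    -- stale: undoes Arne's reassignment
        Description = 'TPS report for California has stopped working' ,
        Problem = 'Also fails for the Ohio office' ,  -- Brian's only real change
        Status = 'Open' ,
        Priority = 1
WHERE   TicketID = 123 ;
```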

The changes saved by Arne were completely lost. Clearly, our bug tracking system is susceptible to lost updates, and so has a big problem. There are two approaches to this issue that we must consider in order to prevent the lost update:

  • Writing logic into the client/data access layer so that only the changed columns are updated in the database, not the entire row
  • Using concurrency control logic

Let’s consider each in turn.

Only Updating Changed Columns

In this simple example, the problem is pretty blatant: the SQL generated by the user form updates all the fields from the screen, not just the one Problem field that Brian modified. In this case, the problem could be solved by designing a better data access layer that only updates those columns that were modified in the form.

Nevertheless, this is only a partial solution and will not be adequate in every case. If Brian, in addition to recording that the TPS report for Ohio was also not working, had suggested an interim solution such as, “temporarily expose yesterday’s TPS report”, then Arne’s much more sensible solution would have been overwritten, regardless.

Listing 5: Brian proposes a poor solution, overwriting a much better one suggested by Arne.

Furthermore, updating only changed columns, while feasible, is far from an ideal solution. Let’s count how many different UPDATE statements would be required in order to modify only the columns that were actually updated on the screen. There are five columns that may be modified, which gives us a total of 2^5 = 32 different update combinations. Should we generate UPDATE commands on the fly? Should we wrap 32 updates in stored procedures? Surely developing all this code manually is out of the question and, although generating such code would be quite easy, neither choice seems very attractive.

Using Concurrency Control Logic

Ultimately, any system that has the potential for ‘update conflicts’, which could result in lost updates, needs some concurrency control logic in place, either to prevent such conflicts from occurring, or to determine what should happen when they do.

In previous chapters of my book, we discussed the use of isolation levels to mediate the potential interference of concurrent transactions. Unfortunately, in our bug tracking example, isolation levels alone will not help us. Although from a user’s point of view the problem is caused by “concurrent updates of the database”, from the database’s perspective the modifying transactions never overlap. The basic format of the example was:

  1. Session 1 queries data into form
  2. Session 2 queries same data into form
  3. Session 2 starts transaction to update data
  4. Session 2 completes transaction to update data
  5. Session 1 starts transaction to update data
  6. Session 1 completes transaction to update data
  7. Session 2’s update is lost

Although the example was run in the default READ COMMITTED isolation level, the result would have been the same using any of the other transaction isolation levels. In order for isolation levels to have any effect, the transactions must overlap and, in order for that to happen, we’d need to adopt a pessimistic approach and start the transactions much earlier, as soon as the data was queried into the form, essentially locking the data from that point. As discussed earlier, this pessimistic approach is often not feasible in situations where data is held in user forms for a long time; to do so would inevitably grind the whole system to a halt. So, when the bug tracking system opens ticket number 123 for both Arne and Brian, it should not keep the transactions open after their screen forms have been rendered.

If it is possible to start the transactions earlier, then there may be some cases where higher isolation levels, such as SERIALIZABLE, or certainly SNAPSHOT (as we will discuss shortly), can help. Note, though, that we cannot always prevent lost updates in this manner. In our previous example, we would simply be in danger of reversing the problem, and losing Brian’s, rather than Arne’s, update.

If you wish to implement a pessimistic approach, without locking resources as soon as the data is queried, then the situation is difficult. Unfortunately, there is no built-in mechanism to implement pessimistic concurrency control for longer than the lifetime of a transaction. If we need to implement such an approach, we need to roll it out ourselves. For more information on how to accomplish this, refer to the book “Expert SQL Server 2005 Development” by Adam Machanic with Hugo Kornelis and Lara Rubbelke, where the author shows how to roll out your own locks, persist them in a table, and use triggers to verify if rows to be modified are locked.

A more straightforward approach, for examples such as this, is to implement optimistic concurrency control, where we “optimistically assume” that the rows won’t be modified in the time between querying them and updating them. Of course, with no control logic, the ‘conflicting update’ just proceeds and a lost update occurs, as we saw. However, with proper application of the optimistic approach, we’d have logic in place that raised a warning and prevented the conflicting update from proceeding. So, in our previous example, at the point Brian tried to update the system, the form data would be refreshed and Brian would get a warning that the data had changed since he queried it, and his update would not proceed.

Optimistic Concurrency Control to Detect and Prevent Lost Updates

Let’s take a look at three examples of how to implement optimistic concurrency control in our bug tracking example.

Saving the original values

To detect lost updates, our code needs to “remember” the values of the columns before they were modified, and submit those old values along with the modified ones. The following, rather large, stored procedure performs the update only if no columns were changed.

Listing 6: The stored procedure modifies the ticket only if it has not been changed.
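The original procedure is not reproduced here; a sketch of the approach, against the same illustrative six-column schema used earlier (column names, types, and the error message are assumptions), might look as follows:

```sql
-- Illustrative sketch: the caller submits the values originally read
-- into the form alongside the new values, and the UPDATE proceeds only
-- if nothing has changed in the meantime.
CREATE PROCEDURE dbo.UpdateTicket
    @TicketID INT ,
    -- original values, as loaded into the form
    @OldAssignedTo VARCHAR(30) ,
    @OldDescription VARCHAR(100) ,
    @OldProblem VARCHAR(500) ,
    @OldStatus VARCHAR(20) ,
    @OldPriority INT ,
    -- new values, as submitted from the form
    @NewAssignedTo VARCHAR(30) ,
    @NewDescription VARCHAR(100) ,
    @NewProblem VARCHAR(500) ,
    @NewStatus VARCHAR(20) ,
    @NewPriority INT
AS
BEGIN
    UPDATE  dbo.Tickets
    SET     AssignedTo = @NewAssignedTo ,
            Description = @NewDescription ,
            Problem = @NewProblem ,
            Status = @NewStatus ,
            Priority = @NewPriority
    WHERE   TicketID = @TicketID
            -- proceed only if every column still holds the value
            -- that was originally read into the form
            AND AssignedTo = @OldAssignedTo
            AND Description = @OldDescription
            AND Problem = @OldProblem
            AND Status = @OldStatus
            AND Priority = @OldPriority ;
    IF @@ROWCOUNT = 0
        RAISERROR ( 'Ticket has been changed or deleted by another user',
                    16, 1 ) ;
END ;
```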

As you can see by the size of this procedure, it takes a significant amount of code, both on the server and on the client, to implement this approach. Still, let’s see how it works. We’ll rerun our bug tracking example (Listings 3 and 4) using this stored procedure. First, we need to delete and reinsert test data.

Listing 7: Deleting modified test data.

To restore the test data, rerun Listing 2. Arne’s update, which was originally performed by Listing 3, is now submitted using the UpdateTicket stored procedure, as shown in Listing 8.

Listing 8: Using the UpdateTicket stored procedure to save Arne’s changes.

Brian’s update from Listing 5 is also submitted via the same stored procedure, which detects a lost update, as shown in Listing 9.

Listing 9: Stored procedure detects a lost update and does not save Brian’s changes.

Although this approach works in detecting and preventing lost updates, there is a more efficient one, using the ROWVERSION column.

Using ROWVERSION

A ROWVERSION column in a table is simply a column with a data type of ROWVERSION, which contains a number that auto-increments every time the row is modified. In other words, there is no way to modify a row without incrementing its ROWVERSION column. We can use this feature to detect and prevent lost updates.

In the simplest case, where we load a single row into a screen form, we can retrieve the ROWVERSION along with other columns. When we save the modified data in the database, we can match the saved ROWVERSION against the current ROWVERSION of the row that we are going to modify. If the ROWVERSION value has changed, then the row must have been modified since we read it.

In order to demonstrate this approach, we first need to add a ROWVERSION column to the Tickets table, as shown in Listing 10.

Listing 10: Adding a ROWVERSION column to the Tickets table
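A minimal version of this change (the column name Version is an assumption, chosen to match the @version parameter mentioned later in the chapter) might be:

```sql
-- ROWVERSION columns are populated and incremented automatically;
-- no values are supplied on INSERT or UPDATE.
ALTER TABLE dbo.Tickets
    ADD Version ROWVERSION ;
```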

To populate the changed table, simply rerun the scripts in Listings 7 and 2. Listing 11 shows how to modify our UpdateTicket stored procedure to use the new ROWVERSION column. It compares the ROWVERSION of the row to be modified against the original ROWVERSION value, passed as a parameter, and modifies the row only if these ROWVERSION values match.

Listing 11: The UpdateTicket stored procedure saves changes only if the saved ROWVERSION matches the current ROWVERSION of the row being modified
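A sketch of this version of the procedure, assuming a ROWVERSION column named Version and the same illustrative schema as before (BINARY(8) is used for the parameter, which is how ROWVERSION values are typically passed around):

```sql
-- Illustrative sketch: one 8-byte version value replaces the whole set
-- of "old value" parameters from the previous approach.
ALTER PROCEDURE dbo.UpdateTicket
    @TicketID INT ,
    @Version BINARY(8) ,    -- ROWVERSION value read along with the data
    @NewAssignedTo VARCHAR(30) ,
    @NewDescription VARCHAR(100) ,
    @NewProblem VARCHAR(500) ,
    @NewStatus VARCHAR(20) ,
    @NewPriority INT
AS
BEGIN
    UPDATE  dbo.Tickets
    SET     AssignedTo = @NewAssignedTo ,
            Description = @NewDescription ,
            Problem = @NewProblem ,
            Status = @NewStatus ,
            Priority = @NewPriority
    WHERE   TicketID = @TicketID
            AND Version = @Version ;  -- only if the row is unchanged
    IF @@ROWCOUNT = 0
        RAISERROR ( 'Ticket has been changed or deleted by another user',
                    16, 1 ) ;
END ;
```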

Listing 12 shows how our new UpdateTicket stored procedure works in our bug tracking example.

Listing 12: Detecting and preventing lost updates with ROWVERSION.

The stored procedure successfully saves Arne’s changes, because the row has not been changed between the time when he read the data into the bug tracker GUI, and the time when he updated the ticket.

However, when we invoke the stored procedure to save Brian’s changes, our UpdateTicket stored procedure detects that ticket 123 has been modified since Brian initially queried the data, as indicated by the fact that the value of the ROWVERSION column has changed, so the attempt to save Brian’s changes fails and a lost update is averted.

Up to now, all the cases we’ve discussed involved displaying information for the user and having the user perform some changes. Typically, in such cases, we do not keep the transaction open between the time we read a row and the time we modify it, so the only built-in mechanism to detect lost updates was the ROWVERSION.

If, however, the data is modified programmatically, and quickly, then we can afford to keep the transaction open between the time we read a row and the time we modify it. In such cases, we can use Snapshot isolation to detect and prevent lost updates.

Using Snapshot Isolation level

In this example, we’ll prevent a lost update using the SNAPSHOT isolation level. Before running the example, we need to establish some test data, as shown in Listing 13.

Listing 13: Adding test data.

Suppose that we have a process that reads tickets one-by-one, determines if they are eligible for removal from the system, and deletes those that are. Listing 14 mimics the case where this automated process has opened a transaction and read ticket number 123.

Listing 14: Opening transaction and reading ticket number 123.
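A sketch of what this script might look like, assuming the illustrative Tickets schema used earlier (the commented-out statements at the bottom are executed later, as the text describes):

```sql
-- Sketch only. Requires that SNAPSHOT isolation has been enabled for
-- the database beforehand, e.g. (database name is hypothetical):
--   ALTER DATABASE MyTestDB SET ALLOW_SNAPSHOT_ISOLATION ON ;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT ;
BEGIN TRANSACTION ;
SELECT  TicketID ,
        Status
FROM    dbo.Tickets
WHERE   TicketID = 123 ;
-- ...the process decides here that the ticket is eligible for removal...
-- highlight and execute later:
--DELETE FROM dbo.Tickets WHERE TicketID = 123 ;
--COMMIT ;
```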

At roughly the same time, another connection modifies the same ticket, as shown in Listing 15 (which should be run from a different tab).

Listing 15: Ticket number 123 is modified.

Clearly the situation has changed and the ticket should not be deleted. Highlight the commented DELETE statement in Listing 14 and execute it. Fortunately, under SNAPSHOT isolation, the potential lost update is detected and prevented, as shown in Listing 16.

Listing 16: A lost update is prevented.

The initial transaction, to retrieve and then delete the ticket, fails. Note that when we started this transaction, we did nothing to prevent other connections from modifying the ticket. Instead, we chose to detect the potential problem and handle it. This is yet another typical example of optimistic concurrency control.

Note that the error message explicitly suggests that we should “Retry the transaction or change the isolation level for the update/delete statement”. However, we need to be very careful when we consider such recommendations. We need to determine which action makes sense on a case-by-case basis. Here, we do not want to change the isolation level because SNAPSHOT isolation did a very good job in detecting an error that we want to avoid. Should we retry the transaction? Maybe, but not automatically: we should consider retrying the transaction only after taking into account the new changes. In this particular case, the reopened ticket 123 should stay in the system.

As we have seen, SNAPSHOT isolation is very useful for detecting lost updates in this case. However, SNAPSHOT isolation detects lost updates only for the duration of the transaction and so using this approach will not help if the transactions do not overlap, as was the case in our first example in this article (Listings 1 to 5).

Before moving on, please make sure that Snapshot isolation is disabled for your test database, as subsequent examples will run in normal, READ COMMITTED mode.

Pessimistic Concurrency Control to Prevent Lost Updates

Let’s switch our attention now to ways in which we can implement pessimistic concurrency control, to prevent lost updates. This approach is appropriate when many short transactions are attempting to simultaneously modify the same rows. We’ll discuss two approaches:

  • Using the UPDLOCK hint

  • Using sp_getapplock

In these examples, the data is read and modified by a program, without any human interaction, and in a very short time. In such cases it is feasible to read and modify within the same transaction.

Again, these approaches only help us deal with concurrency for the duration of a transaction; they cannot prevent lost updates when transactions do not overlap. As such, they usually should not be used when users open screen forms to edit data and save their modifications at different times, because from the database’s point of view these modifications are not concurrent.

Serializing updates with UPDLOCK hint

We’ll rerun the ticket deletion/archive example using UPDLOCK hint instead of SNAPSHOT isolation level. First, rerun the script from Listing 13 to restore the modified data to its original state. Next, in one SSMS tab, retrieve the ticket 123 in a transaction under READ COMMITTED isolation level and using the UPDLOCK hint, as shown in Listing 17.

Listing 17: Reading ticket 123 with the UPDLOCK hint.
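Against the same illustrative schema, this script might be sketched as follows; the UPDLOCK hint makes the SELECT acquire update locks that persist for the life of the transaction:

```sql
-- Sketch of the UPDLOCK variant of the archiving transaction.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED ;
BEGIN TRANSACTION ;
SELECT  TicketID ,
        Status
FROM    dbo.Tickets WITH ( UPDLOCK )
WHERE   TicketID = 123 ;
-- uncomment, highlight and execute later:
--DELETE FROM dbo.Tickets WHERE TicketID = 123 ;
--COMMIT ;
```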

As in the previous example, modify the ticket being archived (deleted) in another tab, as per Listing 15.

Unlike in the previous example, this time the modification does not complete; it stays in lock-waiting state, as it is blocked by our outstanding transaction in the first tab. At the beginning of the transaction in the first tab, we selected data for ticket 123, and the UPDLOCK hint guarantees that this data cannot be modified by other connections for the life of the transaction, though it can still be read.

Return to the first tab and uncomment and run the DELETE statement, in order to delete ticket 123 and commit the transaction. The second tab will now finish too, but the row that was targeted by the UPDATE no longer exists, so it could not be updated.

As we have seen, the UPDLOCK hint has prevented the second update from modifying ticket 123. This is typical of pessimistic concurrency control solutions.

The UPDLOCK hint is best-suited to cases where our modifications are simple and short. In this example, we were dealing with a single row modification, and UPDLOCK hint works perfectly well. However, if we need to touch multiple rows, maybe in more than one table, and we hold locks for the duration of a long transaction, then our modifications are very prone to deadlocks (as demonstrated in Chapter 9 of my book).

The need to modify multiple rows in multiple tables in one transaction is very common. For example, saving a screen form with a customer’s order may result in inserting or updating rows in Orders, OrderItems, and OrderComments tables. In such cases, we can still use the locks that are implicitly acquired as the transaction progresses, and we can use UPDLOCK hints to get a better control over locking. This approach can work but is complex, as we often have to consider many possible combinations of different modifications, all occurring at the same time.

There is a simpler alternative in such cases: at the very beginning of our transaction, we can explicitly acquire one application lock for the whole Order object, which spans several rows in the involved tables, Orders, OrderItems, and OrderComments. Let’s see how it works.

Using sp_getapplock to prevent collisions

In this example, our transactions will explicitly acquire an application lock, using sp_getapplock. This effectively serializes modifications, because only one connection can hold an exclusive application lock on the same resource. Other modifications to the same data will be forced to wait for that lock to be released, so there will be no collisions whatsoever. This is an example of pessimistic concurrency control, used to its fullest extent.

Note that application locks are different from other locks in that:

  • The resource they lock is not a row or a page or a table but a name, as will be demonstrated in the following example.
  • They are acquired explicitly, rather than implicitly

Note that when transactions commit or rollback, all application locks are released, so they must be acquired in the context of an outstanding transaction, after we have explicitly started the transaction.

To demonstrate this approach, we first need to restore the modified data to its original state (Listing 13). Next, from one SSMS tab, begin a transaction, acquire an application lock, and start archiving the ticket 123, as shown in Listing 18.

Listing 18: Begin a transaction and acquire an application lock.
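A sketch of this script, with an assumed resource name and timeout (the 10-second timeout matches the behavior described below); sp_getapplock returns a negative value if the lock cannot be acquired:

```sql
-- Sketch only; resource name is an assumption.
BEGIN TRANSACTION ;
DECLARE @RetCode INT ;
EXEC @RetCode = sp_getapplock
    @Resource = 'Ticket123' ,    -- locks a *name*, not a row
    @LockMode = 'Exclusive' ,
    @LockOwner = 'Transaction' ,
    @LockTimeout = 10000 ;       -- wait up to 10 seconds
IF @RetCode < 0
BEGIN
    RAISERROR ( 'Failed to acquire lock', 16, 1 ) ;
    ROLLBACK ;
END
SELECT  TicketID ,
        Status
FROM    dbo.Tickets
WHERE   TicketID = 123 ;
-- uncomment, highlight and execute later:
--DELETE FROM dbo.Tickets WHERE TicketID = 123 ;
--COMMIT ;
```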

After running the script, uncomment and highlight the DELETE and COMMIT commands at the bottom, but do not execute them just yet. In a second tab, we’ll attempt to acquire an exclusive application lock and modify the same ticket, as shown in Listing 19.

Listing 19: Begin a transaction, attempt to acquire an application lock and modify the ticket being archived, if the application lock has been acquired
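A sketch of the second session; the resource name must match the one used by the archiving transaction, and the error messages correspond to the two outcomes described below (the Status update itself is illustrative):

```sql
-- Sketch only; error handling and the modification are illustrative.
BEGIN TRANSACTION ;
DECLARE @RetCode INT ;
EXEC @RetCode = sp_getapplock
    @Resource = 'Ticket123' ,
    @LockMode = 'Exclusive' ,
    @LockOwner = 'Transaction' ,
    @LockTimeout = 10000 ;
IF @RetCode < 0
BEGIN
    RAISERROR ( 'Failed to acquire lock', 16, 1 ) ;
    ROLLBACK ;
END
ELSE IF NOT EXISTS ( SELECT * FROM dbo.Tickets WHERE TicketID = 123 )
BEGIN
    -- the lock was acquired, but the ticket was deleted while we waited
    RAISERROR ( 'Ticket not found', 16, 1 ) ;
    ROLLBACK ;
END
ELSE
BEGIN
    UPDATE  dbo.Tickets
    SET     Status = 'Reopened'
    WHERE   TicketID = 123 ;
    COMMIT ;
END
```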

Immediately return to the first tab and run the highlighted DELETE statement and COMMIT; once the lock is released, the script in Listing 19 will raise a ‘Ticket not found’ error. If we wait longer than 10 seconds before running this DELETE, then Listing 19 will instead raise a ‘Failed to acquire lock’ error. Either way, lost updates have been prevented.

This proves that if all modifications that wish to modify a ticket are programmed to acquire the corresponding application lock before touching it, then lost updates cannot occur. However, this approach only works if all modifications are programmed to acquire application locks. A failure to acquire an application lock, whether by accident or deliberately, bypasses our protection, and as such may result in lost updates or other problems, such as deadlocks.

To demonstrate this, restore the original data, comment out the command that invokes sp_getapplock, in Listing 19, and then rerun the same example, as follows:

  • In Listing 18, make sure that the DELETE and COMMIT commands are commented out.
  • Run Listing 18.
  • In a second tab, run Listing 19.
  • Return to Listing 19, uncomment the DELETE and COMMIT commands at the bottom, highlight them, and execute them.

When the DELETE completes, you’ll find that ticket number 123 is gone, which means that we’ve suffered a lost update. In short, sp_getapplock is only useful when it is consistently used by all relevant modifications. If such consistency is not possible, we will need to use other methods.

T-SQL Patterns that Fail High Concurrency Stress Tests

In many cases, our T-SQL code works perfectly well when we execute it from one connection at a time, but intermittently fails when it runs in production systems, under high concurrency.

In this section, we’ll examine the following two common T-SQL patterns and prove that they are generally unreliable under concurrent loads:

  • IF EXISTS(…) THEN
  • UPDATE … IF (@@ROWCOUNT = 0) BEGIN

We’ll then examine a third technique, MERGE, which is robust under concurrency.

The most important lesson to be learned is that if our code is supposed to run under high concurrency, then we need to stress test under such loads, and against realistic data volumes. If our production table has about 10M rows, we should not run our tests against a tiny table of just 100 rows.

Important Note: If any of these scripts in this section run for too long on your server, and you cancel them, make sure to close the tabs or rollback the transactions. Otherwise, you could end up with an outstanding transaction holding locks, and subsequent examples may not work as expected.

Problems with IF EXISTS(…) THEN

The IF EXISTS(…) THEN pattern, as follows, is quite common and yet it frequently fails under high concurrency.

IF EXISTS ( -- enter some condition here
          )
BEGIN ;
  -- perform some action here
END ;

Before we prove that the technique will cause our optimistic concurrency solution (using the ROWVERSION column) to fail under heavy concurrent loads, let’s first examine a much simpler example, which demonstrates the general problem with this pattern.

May Cause Data Integrity Issues under Concurrent Access

To keep the example as simple and short as possible, we’ll use a table with just four columns, as shown in Listing 20.

Listing 20: Create and populate the WebPageStats table
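A sketch of what this listing might look like; the column names are assumptions, and the Version column is included because the optimistic-concurrency example later in this section relies on a ROWVERSION:

```sql
-- Illustrative sketch: a four-column stats table plus one seed row.
CREATE TABLE dbo.WebPageStats
    (
      ID INT NOT NULL
             CONSTRAINT PK_WebPageStats PRIMARY KEY ,
      NumVisits INT NOT NULL ,
      NumAdClicks INT NOT NULL ,
      Version ROWVERSION
    ) ;

INSERT  INTO dbo.WebPageStats ( ID, NumVisits, NumAdClicks )
VALUES  ( 1, 0, 0 ) ;
```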

We’ll INSERT or UPDATE rows in a loop using the following simple logic, as expressed in Listing 21: if a row with given ID exists, update it; otherwise insert a new one. Cut and paste this code into two tabs, switch each tab into text mode, and run the code simultaneously in each tab.

Listing 21: Inserting or updating rows in a loop
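A sketch of such a loop, assuming the illustrative WebPageStats schema (the range of IDs and the commented-out UPDLOCK hint are assumptions; the commented hint and the explicit transactions are the pieces the experiment below toggles):

```sql
-- Sketch of the test harness; run it simultaneously from two tabs,
-- starting from a table that contains only the seed row.
DECLARE @i INT ;
SET @i = 2 ;
WHILE @i < 1000
BEGIN
    BEGIN TRANSACTION ;
    IF EXISTS ( SELECT  *
                FROM    dbo.WebPageStats --WITH ( UPDLOCK )
                WHERE   ID = @i )
        BEGIN
            UPDATE  dbo.WebPageStats
            SET     NumVisits = NumVisits + 1
            WHERE   ID = @i ;
        END
    ELSE
        BEGIN
            -- if both connections pass the EXISTS check for the same @i,
            -- both reach this INSERT and one gets a PK violation
            INSERT  INTO dbo.WebPageStats ( ID, NumVisits, NumAdClicks )
            VALUES  ( @i, 1, 0 ) ;
        END
    COMMIT ;
    SET @i = @i + 1 ;
END ;
```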

You should see PRIMARY KEY violations. You may be wondering if our pessimistic technique, using UPDLOCK, would help us out here: unfortunately it won’t. To try this out, uncomment the hint, comment out all BEGIN/COMMIT/ROLLBACK TRANSACTION commands, and re-run the test. You will still see PK violations. The UPDLOCK does not help as there is no row to be locked if the NOT EXISTS is true. So if both connections simultaneously check for existence of the same row, both will find it does not exist (so they won’t acquire a U lock), and both will try to INSERT the row, leading to the violation.

The most important point to remember is that code that performs perfectly in single-user test cases may behave very differently when multiple processes are attempting to access and modify the same data. The defensive programmer must test on a case-by-case basis, and test as many different scenarios as possible. With that in mind, I encourage you to play with this simple example a little bit, exploring how small changes affect the behavior of our code under high concurrency. For example:

  • Increase the isolation level in one or both tabs, and see how that affects the behavior.
  • Run different scripts in the tabs, such as with commented hint in one tab and uncommented in another, and see what happens.

May Break Optimistic Concurrency Solutions

Having demonstrated how unreliable the IF EXISTS pattern may be when it executes under high concurrency, let’s now prove that it will cause our optimistic concurrency solution to fail, under similarly high concurrency.

We’ll develop a stored procedure to update the WebPageStats table and then execute it in rapid succession, from two connections. Of course, we could do the same thing with the UpdateTicket procedure, but these examples involve loops and are quite large, so I decided to use the narrower WebPageStats table just to keep them shorter.

Listing 22 shows the UpdateWebPageStats stored procedure, which will detect any version mismatches when it saves changes.

Listing 22: Create the dbo.UpdateWebPageStats stored procedure
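A sketch of such a procedure, assuming the WebPageStats table has a ROWVERSION column named Version (parameter names and return codes are assumptions). Note where the version check sits, because that placement is exactly what the stress test below exploits:

```sql
-- Illustrative sketch: the caller reads the current Version value
-- first, then passes it in along with the increments to apply.
CREATE PROCEDURE dbo.UpdateWebPageStats
    @ID INT ,
    @version BINARY(8) ,      -- ROWVERSION value as originally read
    @NumVisitsDelta INT ,
    @NumAdClicksDelta INT
AS
BEGIN
    IF EXISTS ( SELECT  *
                FROM    dbo.WebPageStats
                WHERE   ID = @ID
                        AND Version = @version )
        BEGIN
            -- the version is NOT re-tested inside the UPDATE itself:
            -- between the EXISTS check and this statement, another
            -- connection may already have modified the row
            UPDATE  dbo.WebPageStats
            SET     NumVisits = NumVisits + @NumVisitsDelta ,
                    NumAdClicks = NumAdClicks + @NumAdClicksDelta
            WHERE   ID = @ID ;
            RETURN 0 ;    -- success
        END
    RETURN 1 ;            -- version mismatch; caller re-reads and retries
END ;
```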

Of course, before testing how the stored procedure works under concurrency, we should make sure that it works without it. Testing the stored procedure without concurrency is left as an exercise for the reader.

The following two scripts will invoke the UpdateWebPageStats stored procedure multiple times in loops. Running these two scripts simultaneously from two connections will expose it to high concurrency, and we shall see how it holds up.

The first script, in Listing 23, increments the column NumVisits for a single row, and does so 10,000 times, in a loop. Cut-and-paste this code into a tab, but do not run it yet.

Listing 23: A loop that invokes UpdateWebPageStats to increment NumVisits for one and the same row 10,000 times in a loop

Our second script, in Listing 24, increments another column, NumAdClicks, also 10,000 times in a loop. Cut-and-paste it into a second tab and run both scripts simultaneously.

Listing 24: A loop that invokes UpdateWebPageStats to increment NumAdClicks for the same row 10,000 times in a loop

These scripts may take some time to complete. When both scripts finish, we would expect both NumVisits and NumAdClicks to have the same value, 10,000. However, this is not the case, as Listing 25 demonstrates. Each time we run these two scripts we will get different numbers but, every time, neither column will have the expected value of 10,000.

Listing 25: NumVisits and NumAdClicks should both be 10000, but they do not have the expected values
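The check itself is a simple query; a sketch, assuming the same hypothetical row:

```sql
SELECT  NumVisits ,
        NumAdClicks
FROM    dbo.WebPageStats
WHERE   WebPageID = 1 ;
```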

As we can see, NumVisits and NumAdClicks do not have the expected value of 10,000. This means that many updates were lost. How could that happen? Suppose that both connections retrieve the version at approximately the same time, and then invoke the same stored procedure at approximately the same time. Clearly in both executions the condition in the IF statement evaluates as TRUE. As a result, both executions will enter the branch with the UPDATE command.

The two UPDATE commands will execute one after another, and the second will overwrite the changes of the first, because the ROWVERSION value is not tested again in the actual UPDATE statement. Adding this test to the UPDATE will not, by itself, help. If we do that, the first execution will increment the ROWVERSION value, and the second will not update the row at all, because the condition (version = @version) in the WHERE clause will evaluate to FALSE. However, the procedure will still return 0 to indicate success to the caller, even though the requested update was not made, so the caller will not retry the update.
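A more robust variant, not shown in the original listings, both re-tests the version in the UPDATE itself and reports the outcome, so that the caller knows to re-read the row and retry. A hedged sketch of the body of such a procedure:

```sql
-- re-test the version as part of the UPDATE itself
UPDATE  dbo.WebPageStats
SET     NumVisits = @NumVisits ,
        NumAdClicks = @NumAdClicks
WHERE   WebPageID = @WebPageID
        AND version = @version ;

-- if no row qualified, the row changed under us; tell the caller
IF @@ROWCOUNT = 0
    RETURN 1 ;  -- concurrent change detected: re-read and retry
RETURN 0 ;      -- success
```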

Stress Testing the UPDATE … IF (@@ROWCOUNT = 0) Pattern

Another common approach is first to UPDATE an existing row that matches the search criteria and, if there is no matching row, then to INSERT a new one. This approach is also unreliable.

In order to demonstrate this, we need to modify our loop from Listing 21 so that it uses the UPDATE …IF (@@ROWCOUNT = 0) BEGIN pattern, as shown in Listing 26.

Listing 26: A loop that uses the UPDATE … IF (@@ROWCOUNT = 0) pattern
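A hedged sketch of such a loop, using the same hypothetical table as before:

```sql
DECLARE @i INT ,
    @WebPageID INT ;

SET @i = 0 ;
WHILE @i < 10000
    BEGIN ;
        -- pick an ID one higher than the current maximum
        SET @WebPageID = ( SELECT   MAX(WebPageID)
                           FROM     dbo.WebPageStats
                         ) + 1 ;
        BEGIN TRY ;
            UPDATE  dbo.WebPageStats
            SET     NumVisits = NumVisits + 1
            WHERE   WebPageID = @WebPageID ;
            IF ( @@ROWCOUNT = 0 )
                BEGIN ;
                    -- no matching row existed; two connections can both
                    -- reach this INSERT with the same @WebPageID
                    INSERT  INTO dbo.WebPageStats ( WebPageID, NumVisits )
                    VALUES  ( @WebPageID, 1 ) ;
                END ;
        END TRY
        BEGIN CATCH ;
            SELECT ERROR_MESSAGE() ;  -- PRIMARY KEY violations surface here
        END CATCH ;
        SET @i = @i + 1 ;
    END ;
```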

When we run the script in Listing 26 simultaneously from two tabs, we get PRIMARY KEY violations, just as when we ran the script in Listing 21 in our previous example.

In short, the UPDATE … IF (@@ROWCOUNT = 0) pattern is also unreliable under high concurrency. As before, we can (and should!) try out different isolation levels and hints. For example, I encourage you to add a WITH (SERIALIZABLE) hint to the UPDATE command and see what happens. This is left as an advanced exercise for the reader.

Stress Testing the MERGE Command

If we are running SQL Server 2008, we can use the MERGE command to implement the same logic, i.e. UPDATE rows if they exist, otherwise INSERT them. In the context of our loop, MERGE may also intermittently fail but, with the help of a hint, we can make it complete without a single error. Let’s modify the script in Listing 26 to use the MERGE command, as shown in Listing 27.

Listing 27: Implement our loop using the MERGE command
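A hedged sketch of that loop, rewritten around MERGE (table and column names assumed as before; the commented-out hint is the one referred to below):

```sql
DECLARE @i INT ,
    @WebPageID INT ;

SET @i = 0 ;
WHILE @i < 10000
    BEGIN ;
        SET @WebPageID = ( SELECT   MAX(WebPageID)
                           FROM     dbo.WebPageStats
                         ) + 1 ;
        BEGIN TRY ;
            -- uncomment the hint for the second run
            MERGE dbo.WebPageStats /* WITH ( HOLDLOCK ) */ AS target
            USING ( SELECT @WebPageID AS WebPageID ) AS source
            ON target.WebPageID = source.WebPageID
            WHEN MATCHED THEN
                UPDATE SET NumVisits = target.NumVisits + 1
            WHEN NOT MATCHED THEN
                INSERT ( WebPageID, NumVisits )
                VALUES ( source.WebPageID, 1 ) ;
        END TRY
        BEGIN CATCH ;
            SELECT ERROR_MESSAGE() ;
        END CATCH ;
        SET @i = @i + 1 ;
    END ;
```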

When we run this script in two tabs at the same time, we should get PRIMARY KEY violations. As usual, if we cancel a query, we must make sure to commit or roll back the outstanding transaction in that tab.

Next, uncomment the hint in both tabs and rerun the scripts; in this particular case, with the help of the HOLDLOCK hint, MERGE holds up under high concurrency perfectly well. Of course, this does not mean that we can always use this new command without stress testing. However, it means that we should at least consider using it whenever we INSERT or UPDATE under high concurrency.

For example, we can consider rewriting our UpdateWebPageStats stored procedure using the MERGE command, as well as exposing this new version of the procedure to the same thorough testing. This is left as an advanced exercise.

One final comment: in the examples in this article we only stress test how one stored procedure runs from multiple connections. In real life, this might not be good enough. If we have two different stored procedures modifying the same table, and if it is possible that these different modules will try to modify the same data concurrently, then we need to include such cases in our stress testing.

Creating New Objects May Hurt Concurrency

In some cases, when we create an index, an indexed view, or a trigger, we may introduce serious issues, such as blocking or deadlocks. Let me provide an example of how creating an indexed view increases the probability of blocking and deadlocks. Consider the table, ChildTable, shown in Listing 28.

Listing 28: Creating the ChildTable table.
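The original listing is not reproduced in this excerpt; the exact schema is an assumption. A minimal sketch of a table that supports the scenario below (two rows that will roll up into the same row of the indexed view) might be:

```sql
CREATE TABLE dbo.ChildTable
    (
      ChildID INT NOT NULL PRIMARY KEY ,
      ParentID INT NOT NULL ,
      Amount INT NOT NULL
    ) ;

-- two rows sharing the same ParentID (sample data, assumed)
INSERT  INTO dbo.ChildTable ( ChildID, ParentID, Amount )
VALUES  ( 1, 1, 100 ) ;
INSERT  INTO dbo.ChildTable ( ChildID, ParentID, Amount )
VALUES  ( 2, 1, 200 ) ;
```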

Let’s subject our table to concurrent modification. In one tab, run the script in Listing 29.

Listing 29: The modification to run in the first tab
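A sketch of what this first modification might look like, given the assumed schema; note that the transaction is deliberately left open:

```sql
BEGIN TRANSACTION ;
UPDATE  dbo.ChildTable
SET     Amount = Amount + 1
WHERE   ChildID = 1 ;
-- no COMMIT: the transaction stays open until we roll it back below
```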

In the second tab, run the script in Listing 30.

Listing 30: The modification to run in the second tab
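A sketch of the second modification, which touches a different row of the table:

```sql
UPDATE  dbo.ChildTable
SET     Amount = Amount + 1
WHERE   ChildID = 2 ;  -- a different row, but the same ParentID as ChildID = 1
```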

The second modification completes right away. Return to the first tab and roll back the transaction. As we have seen, these two modifications do not block each other. However, what happens if we create an indexed view based on our table, as shown in Listing 31?

Listing 31: Create the indexed view
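The view definition is not reproduced in this excerpt; a hedged sketch of an indexed view that aggregates ChildTable by ParentID, under the assumed schema, might be:

```sql
CREATE VIEW dbo.TotalAmountsByParent
WITH SCHEMABINDING
AS
SELECT  ParentID ,
        SUM(Amount) AS TotalAmount ,
        COUNT_BIG(*) AS NumRows  -- required in an indexed view with GROUP BY
FROM    dbo.ChildTable
GROUP BY ParentID ;
GO
-- the unique clustered index is what materializes the view
CREATE UNIQUE CLUSTERED INDEX TotalAmountsByParent_CI
ON dbo.TotalAmountsByParent ( ParentID ) ;
```

Since both rows in our sample share the same ParentID, both modifications now also have to maintain the same row of this materialized view.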

Rerun the script in Listing 29, followed by the script in Listing 30. This time the script in Listing 30 will not complete; it will be blocked by the script in Listing 29, because both modifications also need to modify the same row in the indexed view, so the script in Listing 30 must wait for an exclusive lock on the view. Return to the first tab and roll back or commit the transaction to release the locks, and the script in Listing 30 will complete right away.

Similarly, creating new indexes or triggers may affect concurrent modifications. This means that if we stress test modules to determine how they handle concurrency, we may need to repeat stress testing when we add new indexes, indexed views, or triggers.

Of course, not all indexed views, indexes and so on will cause such blocking, and there are no general rules, which is why I stress the need to test on a case-by-case basis.

Conclusion

We have seen that when modifications run concurrently from multiple connections, we may end up with inconsistent results or errors.  We also investigated two T-SQL patterns that are in common use, and yet can fail under high concurrency, resulting either in lost updates or in blocking or deadlocks.

We have investigated several approaches, both pessimistic and optimistic, for avoiding lost updates and, for SQL Server 2008 users, demonstrated how MERGE can improve the robustness of our code.

The most important point of this article is this: our modules need to be concurrency-proof. We need to expose our modules to concurrency during stress testing, expose vulnerabilities in our code and proactively fix them.

Hopefully, this article, like the entire book, has served not only to provide several techniques that will make your code more robust, but also as an eye-opener as to just what situations your database code has to contend with, when deployed on a live, production system. I haven’t covered every possible case of what can go wrong; that would be impossible. Hopefully, however,  the common cases that have been covered will prove useful in making your code more robust; when a defensive programmer becomes aware of a frailty in one case, he or she knows that very careful testing will be needed in other, similar, cases.

More generally, however, I hope I’ve convinced you that we, as SQL Server programmers, need to be proactive and creative in our testing. After all, “a hard drill makes an easy battle“.

Alex’s book ‘Defensive Database Programming with SQL Server’ is now available to buy from Amazon.

For more articles like this, sign up to the fortnightly Simple-Talk newsletter.

  • Bob Barrows

    Another solution …
    … to the “lost modifications” problem that was not mentioned is to not allow updates at all. Instead, use a related table into which modifications are added as new rows. This seems to be the ideal solution for your ticket example.

  • Jacob Wagner

    pre 2008
    Is there a good option for dealing with the concurrency issue in SQL 2005 (i.e. before MERGE)?

  • Alex K

    answers
    @Bob,

    I agree that storing all the versions of every row is highly useful in many cases. However, in many cases changes should not be saved if the row being modified has been changed by someone else. For instance, if I were Brian and I were prompted about Arne’s changes, I might decide not to save my pathetic “solution” at all. What do you think?

    @Jacob,

    I would recommend using sp_getapplock. However, MERGE is not always completely safe to use under high concurrency either – in some cases it does not hold up.

    Another option is to eliminate concurrency, accumulate modifications in some buffer, and save them in batches.

  • Anonymous

    excellent effort to explain the core issue
    Thanks Alex for taking time to write it up.

  • Alexander Oss

    Transactions Must Include All Pertinent Statements, & More
    I was aghast when I saw the example of the IF EXISTS() problem, as I was sure that when inside a transaction with SERIALIZABLE this approach could be safely taken. I tested it, and indeed I saw the predicted primary key violations.

    However, I then realized that a critical statement wasn’t taking place inside the transaction: the selection of the new primary key value. If you move the calculation of @WebPageID immediately after the BEGIN TRANSACTION, you no longer get any primary key violations.

    But… you do get deadlock errors in the second execution context. How can that happen, if all this activity is on a single table? My guess was insufficient locking granularity, and indeed, after adding WITH (ROWLOCK) to all statements in the transaction, no more errors were generated.

    Summary: SERIALIZABLE + ROWLOCK avoids errors. But at what cost…?

  • Alexander Oss

    Ooops @ Self
    Uh, actually, I did eventually start getting deadlock errors again, even with ROWLOCK. Why???

  • Alexander Oss

    Transactions Must Include All Pertinent Statements, & More, Take Two
    Okay, last babble. Two changes eliminate all errors: moving the assignment of @WebPageID inside the transaction, and adding UPDLOCK hints to both select statements (no SERIALIZABLE, no ROWLOCK). Here’s my version:

    SET NOCOUNT ON ;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED ;

    DECLARE @WebPageID INT ,
        @MaxWebPageID INT ;

    SET @WebPageID = 0 ;
    SET @MaxWebPageID = ( SELECT    MAX(WebPageID)
                          FROM      dbo.WebPageStats
                        ) + 1000000 ;

    WHILE @WebPageID < @MaxWebPageID
        BEGIN ;
            BEGIN TRY ;
                BEGIN TRANSACTION ;

                SET @WebPageID = ( SELECT   MAX(WebPageID)
                                   FROM     dbo.WebPageStats WITH ( UPDLOCK )
                                 ) + 1 ;

                IF EXISTS ( SELECT  *
                            FROM    dbo.WebPageStats WITH ( UPDLOCK )
                            WHERE   WebPageID = @WebPageID )
                    BEGIN ;
                        UPDATE  dbo.WebPageStats
                        SET     NumVisits = 1
                        WHERE   WebPageID = @WebPageID ;
                    END ;
                ELSE
                    BEGIN ;
                        INSERT  INTO dbo.WebPageStats ( WebPageID, NumVisits )
                        VALUES  ( @WebPageID, 2 ) ;
                    END ;

                COMMIT TRANSACTION ;
            END TRY
            BEGIN CATCH ;
                SELECT ERROR_MESSAGE() ;
                ROLLBACK TRANSACTION ;
            END CATCH ;
        END ;

  • AlexK

    you are getting conversion deadlocks
    Alexander,

    Looks like you are getting conversion deadlocks, when two connections both have shared locks on the same resource and are trying to promote them to exclusive. That can happen on one table all right.