Robyn Page’s SQL Server Cursor Workbench

The topic of cursors is the ultimate "hot potato" in the world of SQL Server. Everyone has a view on when they should, and mainly should not, be used. By example and testing, Robyn Page proves that, when handled with care, cursors are not necessarily a "bad thing". This article coined a phrase, 'Quirky Update', that has since established itself as an industry term. We now feature a new revised version of the old classic, with help from Phil Factor.

The purpose of this series of workshops is to encourage you to take a practical approach to SQL skills. I always find I learn things much quicker by trying things out and experimenting. Please don’t just run the samples: make changes, alter the data, look for my mistakes, and try to see if there are different ways of doing things. Please feel free to criticize or disagree with what I say, if you can back this up. This workbench on cursors is not intended to tell you the entire story, as a tutorial might, but the details in Books Online (BOL) should make much more sense after you’ve tried things out for yourself!

What are cursors for?

Cursors were created to bridge the ‘impedance mismatch’ between the ‘record-based’ culture of conventional programming and the set-based world of the relational database.

They had a useful purpose in allowing existing applications to change from ISAM or KSAM databases, such as dBase II, to SQL Server with the minimum of upheaval. DBLIB and ODBC make extensive use of them to ‘spoof’ simple file-based data sources.

Relational database programmers won’t need them but, if you have an application that understands only the process of iterating through resultsets, like flicking through a card index, then you’ll probably need a cursor.

Where would you use a Cursor?

A simple example of an application for which cursors can provide a good solution is one that requires running totals. A cumulative graph of monthly sales to date is a good example, as is a cashbook with a running balance.

We’ll try four different approaches to getting a running total…

Why not try these different approaches, with tables of different sizes and see how long the routines take? (I demonstrate a suitable test-rig shortly).

Is there a quicker or more elegant solution?
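To give a flavour of the approaches before you time them, here is a minimal sketch of the explicit-cursor version of a running balance. The cashbook table and its columns are invented for illustration; the real tests use a much larger table.

-- A minimal sketch of the cursor approach to a running balance.
-- The cashbook table and column names are invented for illustration.
DECLARE @CashBook TABLE (Entry_ID INT IDENTITY(1,1) PRIMARY KEY, Amount MONEY)
INSERT INTO @CashBook (Amount) SELECT 100.00
INSERT INTO @CashBook (Amount) SELECT -25.50
INSERT INTO @CashBook (Amount) SELECT 60.00
INSERT INTO @CashBook (Amount) SELECT -10.00

DECLARE @Results TABLE (Entry_ID INT, Amount MONEY, Balance MONEY)
DECLARE @Entry_ID INT, @Amount MONEY, @Balance MONEY
SELECT @Balance = 0

DECLARE RunningBalance CURSOR LOCAL FAST_FORWARD FOR
    SELECT Entry_ID, Amount FROM @CashBook ORDER BY Entry_ID
OPEN RunningBalance
FETCH NEXT FROM RunningBalance INTO @Entry_ID, @Amount
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @Balance = @Balance + @Amount
    INSERT INTO @Results (Entry_ID, Amount, Balance)
        SELECT @Entry_ID, @Amount, @Balance
    FETCH NEXT FROM RunningBalance INTO @Entry_ID, @Amount
END
CLOSE RunningBalance
DEALLOCATE RunningBalance

SELECT Entry_ID, Amount, Balance FROM @Results ORDER BY Entry_ID

The WHILE-loop version does much the same thing, but finds the ‘next’ row for itself rather than letting the cursor engine do it.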

Global Cursors

If you are doing something really complicated with a listbox, or scrolling through a rapidly-changing table whilst making updates, a GLOBAL cursor could be a good solution, but it is very much geared towards traditional client-server applications, because cursors have a lifetime only of the connection. Each ‘client’ therefore needs its own connection. The GLOBAL cursors defined in a connection will be implicitly deallocated at disconnect.

Global cursors can be passed to and from stored procedures and referenced in triggers. They can be assigned to local cursor variables, so a global cursor can be passed as a parameter to a number of stored procedures. Here is an example, though one struggles to think of anything useful to do in a short example.
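The following sketch shows only the mechanics of handing a cursor from a stored procedure back to its caller via a CURSOR VARYING OUTPUT parameter; the procedure name and the choice of sys.objects as a row source are invented for illustration.

-- A sketch of passing a cursor out of a stored procedure and fetching from it
-- in the caller. The procedure name and the use of sys.objects are invented.
CREATE PROCEDURE spOpenNameList
    @list CURSOR VARYING OUTPUT
AS
BEGIN
    SET @list = CURSOR STATIC READ_ONLY FOR
        SELECT name FROM sys.objects ORDER BY name
    OPEN @list
END
GO
DECLARE @c CURSOR
DECLARE @name SYSNAME
EXECUTE spOpenNameList @list = @c OUTPUT
FETCH NEXT FROM @c INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @name          -- each row could just as easily be handed to another procedure
    FETCH NEXT FROM @c INTO @name
END
CLOSE @c
DEALLOCATE @c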

Transact-SQL cursors are efficient when contained in stored procedures and triggers. This is because everything is compiled into one execution plan on the server and there is no overhead of network traffic whilst fetching rows.

Are Cursors Slow?

So what really are the performance differences? Let’s set up a test-rig. We’ll give each routine an increasingly large cashbook to work on, up to two million rows, and give it a task that doesn’t disturb SSMS/Query Analyzer too much with a large result, so that we measure just the performance of each algorithm. We’ll put the timings into a table that we can load into Excel and run a pivot on to do the analysis.
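The harness itself need be no more elaborate than this sort of thing, repeated for each routine and each table size (the table and method names here are just placeholders):

-- A minimal sketch of the timing harness: one row of timings per method, per size.
-- #Timings and the method name are placeholders for illustration.
CREATE TABLE #Timings (TheMethod VARCHAR(40), TheRows INT, Elapsed_ms INT)

DECLARE @Start DATETIME, @Rows INT
SELECT @Rows = 10000              -- repeat for 100,000, 1,000,000, 2,000,000 rows
SELECT @Start = GETDATE()
-- ...run the routine under test here (cursor, WHILE loop, quirky update, subquery)...
INSERT INTO #Timings (TheMethod, TheRows, Elapsed_ms)
    SELECT 'Cursor', @Rows, DATEDIFF(ms, @Start, GETDATE())

SELECT TheMethod, TheRows, Elapsed_ms FROM #Timings   -- copy this into Excel for the pivot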

We’ll calculate the average balance, and the highest and lowest balance so as to check that the results of each method agree.

Now, which solution is going to be the best?

[Figure: 339-QuirkyData.jpg – the raw timing data for each method and table size]

The raw data is shown here. What I have not shown is the check of the results, which shows that every solution gave consistent results.

The iterative and cursor solutions both give similar results since, under the covers, they are doing similar things. They are dramatically faster than the ‘correlated subquery’ and ‘group by’ methods, as one would expect.

You will see from the graph that we couldn’t even attempt the correlated subquery methods under a ‘production’ table size. It would have taken too long.

Conclusion? If you don’t feel confident about using ‘Quirky Update’ (and it is easy to mess up, so you have to test it rigorously), then running totals are best done iteratively, either with the cursor or the WHILE loop. The WHILE loop is more intuitive, but there is no clear reason in favour of one or the other. For almost all work in SQL Server, set-based algorithms work far faster than iterative solutions, but there is a group of problems where this isn’t so. This is one of them. For a good example of another, see the Phil Factor Speed Phreak Challenge #6 – The Stock Exchange Order Book State problem.

[Figure: 339-QuirkyGraph.jpg – graph of the timings for each method as the table grows]

Cursor Variables

If you change the cursor type in the definition routine above, you’ll notice that @@CURSOR_ROWS returns different values (a minimal illustration follows the list below).

  • A negative value less than -1 means the cursor is being populated asynchronously, and the value is the number of rows currently in the keyset. A value of -1 means the cursor is dynamic.
  • A 0 means that no cursor has been opened, or no rows qualified for the last-opened cursor, or the last-opened cursor has been closed or deallocated.
  • A positive integer represents the number of rows in the cursor.
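For example, the same query reports quite different values depending on the cursor type (a throwaway temporary table is used here purely for illustration):

-- @@CURSOR_ROWS reported for two different cursor types over the same small table.
CREATE TABLE #Demo (i INT PRIMARY KEY)
INSERT INTO #Demo (i) SELECT 1
INSERT INTO #Demo (i) SELECT 2
INSERT INTO #Demo (i) SELECT 3

DECLARE Static_Cursor CURSOR STATIC FOR SELECT i FROM #Demo
OPEN Static_Cursor
SELECT @@CURSOR_ROWS AS StaticRows      -- 3: a static cursor is fully populated on OPEN
CLOSE Static_Cursor
DEALLOCATE Static_Cursor

DECLARE Dynamic_Cursor CURSOR DYNAMIC FOR SELECT i FROM #Demo
OPEN Dynamic_Cursor
SELECT @@CURSOR_ROWS AS DynamicRows     -- -1: a dynamic cursor never reports a total
CLOSE Dynamic_Cursor
DEALLOCATE Dynamic_Cursor

DROP TABLE #Demo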

The most important types of cursor are:

FORWARD_ONLY
You can only move forward in sequence through the data source, and changes made to the underlying data source appear as the rows are fetched.
DYNAMIC
Similar to FORWARD_ONLY, but you can access the data in any order.
STATIC
Rows are returned as ‘read only’ without showing changes to the underlying data source. The data may be accessed in any order.
KEYSET
The membership and order of the rows are fixed when the cursor is opened. Changes that other users make to existing rows in the underlying data appear as you fetch them, but insertions do not appear.

Cursor Optimization

  • Use them only as a last resort. Set-based operations are usually fastest (but not always-see above), then a simple iteration, followed by a cursor
  • Make sure that the cursor’s SELECT statement contains only the rows and columns you need
  • To avoid the overhead of locks, use read-only cursors rather than updatable cursors, whenever possible
  • Static and keyset cursors cause a temporary table to be created in TEMPDB, which can prove to be slow
  • Use FAST_FORWARD cursors, whenever possible, and choose FORWARD_ONLY cursors if you need an updatable cursor and you only need to FETCH NEXT (a minimal pattern is sketched below).
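Putting those tips together, the basic read-only pattern looks something like this; sys.objects is just a convenient row source, and the per-row work is whatever you need it to be:

-- The basic read-only pattern: a LOCAL FAST_FORWARD cursor over just the
-- rows and columns needed, with the per-row work inside the loop.
DECLARE @Name SYSNAME
DECLARE NameList CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.objects WHERE type = 'U'
OPEN NameList
FETCH NEXT FROM NameList INTO @Name
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @Name               -- the per-row work goes here
    FETCH NEXT FROM NameList INTO @Name
END
CLOSE NameList
DEALLOCATE NameList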

Questions

  1. What is the fastest way of calculating a running total in SQL Server? Does that depend on the size of the table?
  2. What does it suggest if the @@CURSOR_ROWS variable returns -1?
  3. What is the scope of a cursor?
  4. When might you want locking in a cursor? Which would you choose?
  5. Why wouldn’t the use of a cursor be a good idea for scrolling through a table in a web-based application?

Acknowledgements

Thanks to Nigel Rivett, Phil Factor and Adam Machanic for their ideas. Thanks to Phil Factor for revising this to give a more comprehensive test harness.

  • Anonymous

    excellent
    We have to read any article posted by Robin Page as it falls in the category of very informative and example ridden piece. Any way this article was also very informative and examples are real. Keep up the excellent work.

  • Anonymous

    thanks for the info..
    nicely packaged i.e., good background and history… simple examples and nice approach to proof.. i feel like i learned something.. thanks!

  • Anonymous

    What is Recursive cursors ?
    Realy Good artical, please Can u Give more information about Recursive cursors?

  • Anonymous

    Very helpful
    Excellent workbench, especially the stats on the different approaches,

    Thanks

  • Anonymous

    UPDATE method
    The UPDATE method is not reliable because it assumes the rows will be updated in a particular order, but there is no such guarantee.

  • Robyn Page

    re: Update Method
    Well spotted! I agree. I’ve been worrying a bit about that but I felt that pinning down the update order for certain would have complicated clarity of the script (it can be done, I’m told!). I experimented and couldn’t get a table variable to update in any other order than the order in which the records were inserted into it, so I felt it was safe enough to include. Can anyone get it to update in a different order?

  • Anonymous

    good work
    Robin this what i would like to say about u.

    “Give a man a fish, you feed him for a day. Teach a man to fish, you feed him for a life.”

  • Anonymous

    Excellent
    this really excellent

  • Anonymous

    SQL 2005, CTE’s, Ranking/Windowing Functions
    Where do you see CTE’s and Ranking/Windowing Functions (SQL 2005) fitting into the mix?

  • jacob.sebastian

    Memory Tables v/s cursors
    I had been wondering if a memory table (using a table variable) is a better choice than a cursor.

    Do you have an idea whether a memory table would give better performance?

  • SHATCHARD

    What a crock!
    Although Ms Page may be quite an accomplished developer, she must have been ON THE SET the day I learned about SETS

    What about GROUP BY for aggregations?

    Also, GIVE CURSORS a wide berth. I have been optimising SQL stored procedures for 5 years and by far the biggest performance and locking issues come from cursors.

    There is nearly always a SET based alternative to cursors. Its the clash of the procedural programming mind against the database/SET mind.

    Nearly all database activities are quicker when done in bulk compared to iterating a loop.

    If you must iterate across a resultset either farm that out to a Business Rules Layer in a procedural language or create an @ temporary table variable to hold your iteration data. The locking for the iteration will be only on a temporary table not the actual data tables.

  • Paker

    Tempoary Tables, Table Variables, and Cursors
    I recently had cause to try processing the same data using each of the subject approaches – by far the best result was achieved using a cursor.

    I’ve also had experience in using running totals in a report – while this can be done at least two ways using set theory, using a cursor improved performance by more than an order of magnitude.

    Shatchard, I don’t feel that you are being particularly helpful – there are typically multiple ways to achieve a desired result, and the variety of configurations that people use mean that experiment and experience on a particular target platform are worth at least as much as the dogma you present so doggedly.

    Ms. Page has provided huge service to the user community, and is definitely on my ‘must read’ list.

    I am not aware of any similar contribution on your part – and certainly did not find it in your posting. A well worked out example on the problems of misusing cursors would have served us much better than what you provided.

  • Anonymous

    Cursors vs Set Based operations
    I also feel it’s misleading to promote cursors for use in some of these operations (arguably, running totals are a valid use, although this can also be achieved by aggregation) or by using incrementing variables (I also shy away from loops where possible).

    Cursors have aggressive locking models, spread out over a (typically) longer period of time than an aggregate query/stream would take to execute, which can lead to serious problems in a high transaction volume (or high user load) environment – it is however quite likely that they can perform faster (execution time only, not necessarily resource cost) in test environments.

    Adding up the sum of column x in table y using a cursor will always scale to a far lesser degree than using SUM() and GROUP BY, to give a simple example – it might be quicker to use the cursor on a single run, but try 500-1000 concurrent executions.

    To be honest, I’d go further than SHatchard and say that these days, there is no excuse for embedding this sort of logic in SQL Server at all – if you need to perform complex operations of this type, you need a middle tier. Then you get the option of scaling out, rather than up…

  • SHATCHARD

    Dillusional
    In answer to Paker, of course, there’s always many ways to do anything. Indeed you can drive up a motorway backwards on the wrong side of the road. You could also jump out of an aeroplane without a parachute. However, it’s one thing having the option to do it; it’s a whole different ball game to RECOMMEND it.

    There’s usually one right way to do something for that particular system and environment, and in my experience no system has ever been improved with the use of cursors.

    My comments derive from ACTUALLY solving performance problems in the REAL world, comparing different methods and approaches for tackling the same problem, and my conclusion is always the same – avoid cursors.

    Actually I’m doing myself out of a job here, carry on using cursors, use as many as you can then I can earn huge consultancy fees going round clearing them up when your systems break down.

  • Tony Davis

    re: what a crock, dillusional etc.
    Yes, there are always a lot of options, but the article does not just say “here are all the options and one of them is using a cursor”. It does try to provide a measure of the performance of each solution on a reasonably-large database. I agree with “anonymous” that this is just a standalone performance test, and the next step is to test the *scalability* of each of these solutions and find out where each one breaks down. How can we do that?

    Nor does the article say “go ahead and use cursors wherever you like, you’ll be fine”. It actually says “Use them only as a last resort. Set-based operations are usually fastest”.

    It really is valuable to have Shatchard and others share their real experiences with using cursors — but how can we test this out for ourselves? These workbenches are all about test cases and hard data, so that readers can find out for themselves what solution will work for their system.

  • Anonymous

    double-entry accounting?
    Does anyone else think that the sample table is fundamentally flawed, as it uses the same sign for the income and outgo values? I’d think you’d either indicate flow direction through sign, or have separate income and outgo columns.

    Also, how much is the string comparison in the CASE costing relative to the total times?

    Lastly, would performance be enhanced (especially if using Group By set logic) with an index of (et, amount)?

  • Robyn Page

    Re: Double-entry Accounting
    Yes, originally I used the same convention for the sample data that you suggested, but I looked up a cashbook example in an accounting textbook and this was the way the original cashbook entries were made. When I originally wrote the workbench, I did a complete worked example of a double-entry cashbook but took it out again as it wasn’t directly relevant to the main issue of whether there were any classic accounting calculations for which a cursor is necessary. I just left the stub.
    On the last two points, I’d very much hope that you’d tell me! The workbench approach would, I hope, allow you to put this to the test. I’d expect that a covering non-clustered index would be an improvement, as one would expect SQL Server to be able to get the data from the index leaf and not to have to access the data at all. My guess is that it would favour the set-based approaches.

  • Robyn Page

    Re: SHatchard and the Crock
    Many thanks to SHatchard, Paker, (and the anonymous author of ‘Cursors vs Set Based operations’) for their contributions. I agree with quite a lot of what they say. I’m a little surprised, though, that SHatchard interpreted an article that starts by stating ‘Relational database programmers won’t need them’, as a defense of the use of Cursors.
    I always use a set-based solution wherever I can, and have never had any need to put a cursor-based solution into a production system. (I will happily use an iterative solution occasionally). However, I’m just keen to keep an open mind and adopt an experimental approach to the issue. Do cursors ALWAYS have an aggressive locking strategy even if you specify a read-only cursor? Are there any common problems that are better solved by an iterative or cursor-based strategy?

    Surely we should put this sort of issue to the test?

    I would be very interested in SHatchard’s better GROUP BY solution, if he’d be prepared to send it to the editor, and would like to do some comparative timings with the solutions I used, and add this to the article.

  • Anonymous

    Where to use cursors
    I wholeheartedly agree that cursors should generally be avoided, but they can be a tool in your belt, even if little-used. If scalability is a concern, then cursors are probably not the best solution. However, there are times when they are, in fact, the best tool for the job.

    The main place I use cursors is in administrative scripts – compiling fragmentation data for all tables for all databases, for instance. The DMVs in SQL 2005 finally offer more ways to use set-based solutions to get the desired results, but cursors are still perfectly fine for this use, since scalability is not a problem.

    If you are working with very large amounts of data, you can use cursors to split up very large operations so as to avoid disk thrashing and excessive use of tempdb.

    To simply throw out cursors completely is like tossing out your hammer because you have access to a pile driver.

  • SHATCHARD

    The crux of my argument
    If i may re-iterate the last point from my first reply

    “If you must iterate across a resultset either farm that out to a Business Rules Layer in a procedural language or create an @ temporary table variable to hold your iteration data. The locking for the iteration will be only on a temporary table not the actual data tables”

    I was not saying that you never need to loop or iterate with T-SQL. I was saying 2 things

    a) For loops required for production then a Business Layer would normally be the most efficient. Pass a resultset back to the layer and iterate there.

    b) If you must do a loop in T-SQL then here is an example of looping using a temporary table variable (for SQL 2K/2005)

    SET NOCOUNT ON
    DECLARE @tab TABLE (id INT)
    DECLARE @iCurrentRow INT
    DECLARE @vName SYSNAME

    INSERT INTO @tab(id)
    SELECT id FROM master.dbo.sysobjects

    WHILE EXISTS (SELECT 1 FROM @tab)
    BEGIN
    SELECT @iCurrentRow=id FROM @tab

    SELECT @vName=name from master.dbo.sysobjects WHERE id=@iCurrentRow
    PRINT @vName

    DELETE FROM @tab WHERE id=@iCurrentRow
    END

    You dump the data you want use in your loop into the temporary table, thus moving the locking away from the actual tables to the temporary table.

    My argument is that an article designed to explain cursors would actually lead to encouraging people to try them when actually there are much better (and efficient) alternatives.

    As for a GROUP BY example, hmmmm, I think it was during my first day of learning about SQL that it dawned on me “ah SQL is designed for aggregations!”.

  • Anonymous

    Temp Tables
    The whole point of having the locking on the base tables is for data integrity. Using a temp table circumvents the locking mechanism and can lead to integrity issues when more than 1 person is using a system.

    Still waiting for a “running total” implementation using GROUP BY …
    (preferably one that can be used in SQL2000, so no custom aggregate functions)

  • Anonymous

    Set vs. Procedural
    I, for one, don’t understand the distinction people make all the time between “set” and “procedural” in regards to DBMS’s.

    The simple fact is that the execution of the query is ultimately procedural. Either we program the procedure ourselves, or we describe our query and let the optimizer generate a procedure for us. The reason “set”-based solutions are usually faster is because the optimizer usually comes up with a very good procedure — especially considering that it can leverage parallelism and (just about) as much memory as it wants.

    If you look at the execution plans generated, you can see very clearly that SQL Server is simply converting “set” operations into “procedural” operations.

    Take, for example, the correlated subquery and self-join execution plans. You will probably see a “nested loop” operation and later a “stream aggregate” operation. Not only are nested loops inefficient, but it generates a huge dataset that it then iterates through to calculate the aggregates.

    The most efficient way is to pass through the data as few times as possible, which is what happens with the “quirky update”, iterative, and cursor solutions. I am guessing that the iterative and cursor solutions are slower because of the implicit transactions that surround the DML being executed within the loops — the update method makes 2 passes through the data, but only incurs the overhead of 2 transactions. You might want to try enclosing the iterative and cursor methods in a transaction to see if the elapsed times come closer to the time of the update method.

    I believe the “quirky update” method does indeed rely on the details of the execution plan, so in more complex systems an index or statistics change could alter the execution plan and subsequently break the calculation. Also, because of its quirky nature, this method would probably be confusing to anyone trying to maintain the code in the future.

    When using cursors, two key things to keep in mind are locking and transactions. Be sure you understand them and use them carefully! The availability of your system and the integrity of your data depend on them.

    As far as the example from b) above, the proposed method accesses @tab many more times than it needs to. It scans @tab from top to bottom in the first SELECT statement of the loop and then scans @tab from top to bottom (due to no index) in order to execute the DELETE at the end of the loop. It performs these scans for every record in the table — not the best use of system resources.

    Please understand that I am not trying to provoke anybody here — just trying to offer an honest analysis of the methods presented.

    J. Steele

    (Both my previous posts were marked “Anonymous” even though I had filled in the Name textbox – must I sign in for “nonanonymous” posting?)

  • Anonymous

    Other uses for Cursors
    I’ve got to weigh in on this to get a reaction to the following. Here is a real world example of using cursors that I’ve constructed. I have a table that contains a 2K varchar field (among other variables). The content of this field is a stylized stored procedure with one or more ‘replaceable’ variables. The purpose of this table of stored procedures is to perform data validation in a myriad of ways and process order. So, the processing sequence goes something like this. With a cursor, identify the validation records to run based on several parameters passed into the stored procedure. The results order from the cursor is based on an ordinal value associated with the record. For each record in the cursor set, do a search / replace on the replaceable fields. Construct a dynamic query where I have a standardized Insert into <table> [the select query from the cursor record with the ‘where’ variables updated from passed in parameter(s)]. I iterate through until no more validation routines need to be processed. This routine is called as part of a bigger data import process. Different data file formats trigger different validation requirements hence the passed in variables.

    Why not do this outside the sql system? Or for that matter using sql stored procedures. Because the scenarios for validation change with enough frequency that test/release cycles to release binary code make it impractical. The current method allows the user to insert a new validation to be used in a file load within minutes of identifying the need. There is an interface that guides the end user department in constructing these validations thus allowing them to be self sufficient without requiring support / assistance from IT. It’s not always about faster execution.

    The discussion has been centered on iterating through tables for some result, but a SQL system can be a complete application in itself. As alluded to in various previous threads, what are you trying to achieve and are you smart about it? This article allows those without years of seat experience to see something tangible and build upon what was presented. Many thanks to Ms. Page for taking the time to outline ideas for others to digest and expand their breadth of knowledge. Thanks to those with their counter points as well.

  • Anonymous

    flawed comparisons
    Several things stand out on this performance comparison.
    The example using the “SET” technique included an unused (but populated) field Et in the table variable @cb; this table variable was missing a primary key, and was missing a SET NOCOUNT ON command.
    The example using the cursor did not have this useless field created and populated, and had a SET NOCOUNT ON command. A primary key on the table variable used in the cursor is not possible.
    By making some minor tweaks to the “SET” technique (adding a primary key to @cb, NOCOUNT, getting rid of the useless field), I had to change the measure to milliseconds – performance was 15 to 20 times better than the cursor method.
    So, if this article is an attempt to compare approaches, it is misleading. It does demonstrate that cursors can be almost as slow as poorly thought out set based solutions. However, that wasn’t the message the author intended, and I’m still in the group of developers that view cursors as the technique of last resort.

  • Robyn Page

    re: Flawed comparisons

    Thanks very much for the contribution. It has certainly given food for thought.

    Several points:

    ET (EntryType) is indeed used in all the methods. One wants the comparison to be fair.

    NOCOUNT was ON for all the timings.

    The introduction of the primary key on the table variable on the ‘Quirky Update’ approach slows the performance very slightly on my test system.

    Changing the index on the test table to a primary key clustered or a compound (covering) index made little difference to the timings.

    Timings were all in seconds as the worst performers were running into several minutes.

    The ‘message that the author intended’ about cursors was quite clearly stated at the beginning of the article ‘Relational database programmers won’t need them’, but I would want to keep an open mind, and to put matters to the test. I’d particularly like to hear of any application for which a cursor provides the best approach.

  • Anonymous

    Possible cursor application
    How about a scenario where you want to retrieve a threshold based on a percentage of a total?

    Say you are a non-profit with a large pool of benefactors, and you want to view a “Top X%” list – where the percentage represents a portion of the total sum of contributions for a time period. So if you wanted a “Top 75%” list, you would start with your largest contributor and in descending order keep a running total until you have a sum that meets or exceeds 75% of your total. The amount that pushed the running total past 75% would then be used as the threshold for the list.

    As a small example, say you received $100 total this month, with contributions of the following amounts: $55, $22, $5, $4, $4, $3, $2, $2, $2, $1

    If you asked for “Top 50%”, you would get a threshold of $55, since the first contribution pushed the running total over $50, and the list would contain 1 record.

    If you asked for “Top 75%”, you would get a threshold of $22, and the list would contain 2 records.

    Seems like an odd type of request, but when dealing with large numbers of entities with widely varying values, it can be an effective way to view significant records.

    Sure, this could be implemented outside of the database – but that would require pumping a lot of data outside of the SQL Server process.

    Any thoughts on (MSSQL2000) implementations that might be more effective than a cursor or iteration?

  • Robyn Page

    Re: Possible Cursor Application

  • Granted

    Cursors
    Yes, as a rule, cursors are bad. However, there are places where they can perform better than set based operations. If you check out Itzik Ben-Gan’s book, Inside TSQL Querying, on page 234 & 235, he shows a method for generating Row Numbers for data (if you’re not using Row numbers all over the place… read the book, please) that results in a single scan of the data as opposed to n-squared number of scans for a set based operation.
    Just remember, in a 500 page book on TSQL, he shows a single example where cursors are better.
    Great article Robyn.

  • Anonymous

    Possible Cursor Applications
    I too use cursors rarely. However, I have 3 scenarios where they make sense to me.

    First, I have used cursors when I am doing inserts where I need to grab the @@identity value for each row inserted and use it somewhere else. I will generate a temp table for the cursor to operate on. This table includes an id column where I will store the @@identity value. For each record in the cursor, I do the insert and store the @@identity value. After the cursor finishes, I can then use the id values for updating other tables (with set operations).

    Second, I will use a cursor if I need to execute a stored procedure using values from each record. This may be a different procedure for each record (as someone else also discussed here), or it may be a procedure that can’t be reduced to a set operation.

    Third, I could see a cursor being used in a case where simplicity is more important than performance, probably in code that does not run very often. Cursor code may be a little verbose, but it can be very simple to understand and maintain. Robyn’s running total example demonstrates this – the set based version is (to me anyway) more complex than the cursor based version. If I was going to run this once a week, I would opt for the simplest code, not the most efficient.

  • scott2718281828

    Unnecessary Cursor Applications
    If you want to capture the identity values from multiple insertions, try the new OUTPUT clause in SQL 2005. It can list all the new identity values created. (If you still insist on doing it with a cursor, consider using SCOPE_IDENTITY() rather than @@IDENTITY.)
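    For example, something like this (the table names are invented):

    -- Rough sketch: capture every new identity value from a multi-row INSERT (SQL 2005).
    -- dbo.Orders and dbo.StagingOrders are invented names.
    DECLARE @NewIds TABLE (OrderID INT)

    INSERT INTO dbo.Orders (CustomerName)
    OUTPUT inserted.OrderID INTO @NewIds (OrderID)
    SELECT CustomerName FROM dbo.StagingOrders

    -- @NewIds now holds one row per inserted identity value,
    -- ready for set-based updates of the other tables.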

    The second and third use for cursors mentioned above don’t show any reason why cursors should be chosen over using a table variable. Table variables are no more complicated than cursors, and don’t have the locking issues.

    My biggest gripe about cursors is that the default cursor is the most expensive scrollable updatable type, and too many people don’t bother to find out how to use them intelligently. If they defaulted to LOCAL FAST_FORWARD, and you had to figure out the proper options to do anything else, there would probably be fewer complaints about their performance.

  • Anonymous

    Possibly Unnecessary Cursor Applications
    Thanx for the info scott2718281828 – I didn’t know about the new output clause – it looks like it might help me in my first application. (I guess I missed that part of the “What’s New” section of BOL!)

    My stuff doesn’t typically have scope issues, so I have just used @@identity. However, I think scope_identity() is probably safer, so that future changes don’t break my current code.

    To your second point – I don’t see that table variables are a substitute for cursors. When I (rarely) use a cursor, it always operates on a temp table or a table variable to avoid locking issues. But I don’t see how I could use a table variable *instead* of a cursor.

    I certainly don’t find table variables complex – I was talking about complex set-based queries. I think there are cases where, after the developer considers the balance of performance needs, development time, developer skill set, and maintainability, that a cursor will be the right choice.

    I don’t think that cursors are ‘evil’ and should be avoided at all costs. If you have a need that can more easily be met with a cursor (within required performance parameters), then use a cursor. The ‘iteration’ technique described in Robyn’s article seems like a way to avoid using a cursor just for the sake of avoiding using a cursor. I would only use it only if there was a demonstrated performance boost (though, I must say, it does look like a lot less typing).

  • Anonymous

    Cursor vs Table Variable
    I have found table variables adequate to my needs where I need to iterate through data row by row.

    For example, in SQL 2005, I have a database that exists solely for the purpose of sending automated e-mails to staff and customers based on events in other databases. (E.g.: “Your order has shipped”.)

    sp_send_dbmail is the easiest way to send these e-mails (in the set up I have), and iterating through it with different parameters for each exec command is how I have it operate. This allows for completely automatic operation on a scheduled basis.

    I could do this with a cursor, but I have found that the ability to perform set-based operations on the table variable before the iteration starts is much more efficient (faster, less server resources).

    For example, I can do things like:

    update @Emails
    set attachments = replace(attachments, ‘c:’, ‘… UNC resource …’)

    (I’m not including the actual UNC path, since I’m not interested in providing server names on an open forum. I’m sure you can figure out how it could be made to work.)

    Doing that once, as well as a dozen other clean-up tasks, based on user-input, etc., as part of a single update statement that works on the whole table variable, is much faster than stepping through a cursor (or table variable) and updating each row one at a time.

    I’m not familiar with any way to update a cursor in a similar manner. So far as I know, you have to operate on them one row at a time. (Correct me if I’m wrong on that.)

    Then, using an ID column in the table variable, I can step through it with:

    declare @Row int, @Rows int

    select @row = 1, @rows = count(*)
    from @emails

    while @row <= @rows
    begin
    (code to run on each row, including exec command on sp_send_dbmail)
    select @row = @row + 1
    end

    This solution seems to me to be the best of iterative and set solutions. Allows an exec command that can’t take a set-solution, allows set commands that are more efficient than row-by-row updates.

    It’s possible I’m missing some options, using this, that I’m not aware of. In that case, I would definitely appreciate comments on them. But I have found that this approach to anything I have to iterate through is very efficient and fast.

    Of course, avoiding iteration in favor of set-based solutions is generally much better, but there are a few things where I find stepping through to be necessary.

  • Anonymous

    Cursor vs Table Variable
    Your procedure to send mail is written exactly the way I would have done it, except that I would use a cursor instead of the home-made iteration code. I certainly agree that as much as possible should be done with set operations, and that the cursor/iterator should operate on a table var (or temp table if necessary).

    The home-made iterator must ‘select’ from the table var once on each iteration, and it must have a field that can be used to select records in sequence. A cursor does not require the extra field. And, since cursors are a built-in part of Transact-SQL, I would presume (hope!) that the smart folks who built the database engine would have put in some optimizations to reduce the number of reads. In other words, since the home-made version selects one record at a time, the db must scan the table (or an index) to find the row, and repeat this for each row in the table. My hope is that with a cursor, the query engine will be smart enough to grab the whole thing at once (or at least more than one row at a time).

    Does this actually happen? I don’t know. My previous uses of cursor were on applications that were run infrequently, so performance tuning wasn’t needed. I am now working on one where performance will be an issue, so I think I will do some testing to see which technique is faster.

  • Anonymous

    Shatchard’s iterative example with a @local table … and a grouse :o)
    If Shatchard’s table variable is holding a large number of rows, performance will be absolutely, unabashedly miserable. The table variable would benefit from having a clustered primary key so that the whole table does not have to be scanned every time a row is fetched or deleted.

    As long as I’m grousing, the people who say that cursors always have aggressive locking are either just being inflammatory, or they haven’t done their schoolwork. The type of locking is specific to the type of cursor as well as any locking hints in the cursor definition.

    If people really want to take exception to what the article is saying, they should at least provide measurable counterexamples. Usually the people who say “always” and “never” seem to be the first to dismiss good, solid evidence, especially when it flies in the face of their dogma.

  • Robyn Page

    Re: Shatchard’s iterative example with a @local table
    Absolutely right. A well designed cursor-based method will perform as well as, if not slightly better than, an iterative solution, and the optimization is done for you.
    Dogma has no place in IT, wherever it can be replaced by understanding. We should try things out, experiment, and challenge preconceptions. Please keep grousing!

  • Anonymous

    Update order for table @cb
    Hi Robyn:

    Excellent article and just what I was looking for. I run a transaction log and needed a running balance for reports. When I tried out your example, the only thing I changed was this statement in the quirky set technique:

    UPDATE @cb

    SET @total = total = @total + amount FROM @cb

    I added an ORDER BY for my specific case because obviously, I wanted the balance by date like a typical register:

    UPDATE @cb

    SET @total = total = @total + amount FROM @cb order by [DateColumn]

    That worked great. I’ve just bookmarked you and will be coming back in the future. Thanks again.

  • Anonymous

    “Cursors are bad” is dogma
    Cursors certainly have their issues, locking being most likely the worst as mentioned by several people.

    But when I hear that they are “always bad”, I suspect a newbie is talking.

    In this article:
    http://www.sql-server-performance.com/dp_no_cursors.asp

    …I ran the code supplied by the author, the way it was intended by the author to be run, and I found his “better” way than a cursor is 30% slower!!

  • Anonymous

    They’re a compensation for bad design
    I’m lucky enough to control the whole application that my firm uses. In the example given, I would simply add a running total column and whack a trigger on the table. This may sound obtuse, but the point is that most developers look at the data retrieval without questioning how the data gets into the tables and whether their structure is correct. If you have to use cursors for accessing multiple lines of data then something is usually wrong with the data and how it’s stored.

  • Anonymous

    wow great article
    you’re hot!

  • Anonymous

    A use for update cursors

    With the premise that I am not a DBA but a programmer who has to work on a database, today I worked on the following situation:

    I wrote a function RandomPercentage that is used in a script for masking client data for demo purposes.

    update tblProducts
    set prodPrice = prodPrice * dbo.RandomPercentage(25,10,rand(),rand())

    When I ran this it (apparently) evaluated dbo.RandomPercentage once, and then applied it to every row. The only solution I found that worked, applying a different percentage to every row, was to use an update cursor.

    As a by-the-by, I found googling around that while there are many people who ask how cursors work, for 9 out of 10 responses, the reply is that a data set could do that for you faster blah blah.. in-fact, it took a very long time to find an example of an “update cursor”.

  • DaveWasEre

    Good well thought out article
    I’ve been using cursors for many years without any of the problems that Shatchard has.

    Robyn is absolutely correct they are a good tool to have and there are times when you have to use them.
    I would also say to always use temporary tables in memory for 2 reasons.
    1. Cut down the rows being processed.
    2. Hold only the data columns required to do your updates in an unlocked table.

    As in the example, I have to re-calculate totals on a huge database periodically. Users make errors on accounts and the corrections sometimes have to be applied several months later. The accounts gain interest every month so all the transactions, interest and totals have to be re-calculated. The only way to do this is using cursors, as you have to know where you are in the transaction history.

    I can see why Shatchard has a problem with locking but any update script can have a locking issue and if used with a bit of sense cursors get the job done.

  • Jeff Moden

    Really???
    One poster above wrote…
    _________________________________________________
    I added an ORDER BY for my specific case because obviously, I wanted the balance by date like a typical register:

    UPDATE @cb

    SET @total = total = @total + amount FROM @cb order by [DateColumn]

    That worked great.
    __________________________________________________

    I’d like to know how you avoided the error by the word “Order”. Are you actually using SQL Server?

  • Shane Stone

    Ordering Of Records with Update Method
    I realise it has been a couple of years since the original question was posed, but I thought I’d add some comments about the UPDATE method and the ordering thereof.

    It does indeed work non-deterministically IN GENERAL. i.e. You don’t know which record order it is going to use as it accumulates. You really only see this behaviour on extremely large data sets on multi-processor machines, but it is important to know how the solution scales.

    With current versions of SQL Server at least, there are things you can do to ensure that it works deterministically.

    1) The only index defined on the table being updated should be a clustered primary key.
    2) The primary key order should be the order in which you want to update the records.
    3) You should use OPTION (MAXDOP 1) to prevent SQL Server trying to process different parts of the table in parallel. (Put together, the rules look like the sketch below.)
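    The combined pattern looks something like this (the table and column names are assumed – the table has a RunningTotal column and a clustered primary key on Entry_ID):

    -- Sketch: the quirky update constrained by the three rules above.
    -- dbo.CashBook is assumed to have a RunningTotal column and only a
    -- clustered primary key on Entry_ID, in the order the rows should accumulate.
    DECLARE @Total MONEY
    SELECT @Total = 0

    UPDATE dbo.CashBook
    SET @Total = RunningTotal = @Total + Amount
    FROM dbo.CashBook
    OPTION (MAXDOP 1)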

    That said, the UPDATE method is really only applicable to simple tasks. For more complex behaviours where some knowledge of past records is required, CURSORS are definitely a better option. Robyn’s tips on how to optimise them are an ideal starting point.

  • Jeff Moden

    Complex tasks are not a problem…
    Shane Stone wrote:
    “That said, the UPDATE method is really only applicable to simple tasks. For more complex behaviours where some knowledge of past records is required, CURSORS are definitely a better option.”

    Absolutely NOT true! 😉 What type of million row “more complex task” would you like to see the Quirky Update work on?

    • Phil Factor

      Re: Complex tasks are not a problem…
      Quite so. I have a parser working in SQL using ‘Quirky Update’. It is much faster than the cursor version and even maintains several ‘stacks’. I haven’t come across a task that is too complex for this method. I do agree with Robyn and Shane that cursor methods are sometimes misrepresented by people who don’t set them up properly, but even when one does cursors properly, ‘quirky Updates’ usually have the edge.

  • Paul_Ch

    Cursors are bad right????????????????????
    Cursors are supposed to bad choices for databases
    right??????
    You go for cursors only when your set theory is not strong……

  • Lee

    Is It Kosher?
    One of my first SQL Server assignments when I arrived nine years ago at my present job was to find a method for auditing transactions in our job-tracking software, a COTS product named “HEAT”. Because we couldn’t exactly open up the delivered executables and perform custom-code bypass surgery as we saw fit, I didn’t see much of an alternative to using triggers. It’s like one of those old-fashioned coffee percolators: you don’t have to know what’s going on in the pot to see the data as it bubbles through.

    The real problem came once I realized that it was necessary for the triggers on different tables to communicate with each other. As an Oracle refugee, I remember being somewhat aghast at the limited options afforded by SQL Server trigger code — SQL Server may be competitive with Oracle in myriad ways, but trigger features are not among them. Oracle features not just AFTER triggers, but also true BEFORE triggers (rather than the cheap, imitation INSTEAD OF triggers). They also give you a row-by-row execution option, i.e., it doesn’t have to be set-based. Best of all, you can declare an external package and use it to stash information that can be accessed by other triggers — sort of like using a global bulletin board.

    But none of this was available. I considered keeping work tables dedicated to trigger communication, and that may have been the best solution. But it’s not the one I chose.

    I discovered what I thought at the time to be an easier solution: global cursors. I used global cursors as I would have used a package declaration in Oracle: as my global bulletin board. E.g., a positive test for the existence of a global cursor told the trigger that some sort of related operation was already in progress in another trigger; and, if there was a need to know more, the cursor could be opened and information pulled from it. E.g., from one trigger, you could pass a new IDENTITY value resulting from an INSERT into another trigger like this:

    DECLARE c_histseq CURSOR GLOBAL
    FOR
    SELECT [local variable] histseq
    FOR READ ONLY

    And as a given operation finished up on its tasks, it would close and deallocate the cursor.

    We moved on to another job-tracking product about a year ago, so now our HEAT triggers are adorned with cobwebs and sit sullenly dormant in one of our OBE folder. We never ran into any problems with the triggers and cursors, but perhaps that was just luck.

    A couple of quick takes on other comments:

    To “What a Crock!” — if you have to run each row in a row set through a stored procedure and are not free to recode the procedure, it seems to me a cursor is necessary.

    Simulating a cursor by iteratively and incrementally walking a set of rows in a table is not generally an improvement over cursors, as others have noted. I discovered that empirically several years ago, and it was corroborated by one of the big guns at PASS (Ben-Gan perhaps?) last November. He also said that Microsoft could improve cursor performance very easily if they wanted to do so, but refuse because they want to discourage database programmers from thinking only in terms of row-by-row logic. Their instincts may be correct. Having lived in the Oracle world, I observed that PL/SQL is so powerful, many excellent PL/SQL programmers actually never employ set-based logic. Microsoft may be trying to avoid such a situation.

  • BuggyFunBunny

    Why SQLServer is different from Oracle
    — Microsoft may be trying to avoid such a situation.

    I’d argue this difference lies as much with each DB’s legacy as anything else. Oracle (and DB2, all versions) came up through the COBOL world, whereas SQLServer didn’t. I’ve seen lots of SQLServer used just as iteratively in java, but those folks don’t actually realize what they’re doing!