Click here to monitor SSC
Av rating:
Total votes: 104
Total comments: 52


Robyn Page
Robyn Page's SQL Server Cursor Workbench
24 January 2007

/*The topic of cursors is the ultimate "hot potato" in the world of SQL Server. Everyone has a view on when they should and mainly should not be used. By example and testing Robyn Page proves that, when handled with care, cursors are not necessarily a "bad thing". This article coined a phrase, 'Quirky Update', that has since established itself as the industry-term. We now feature a new revised version of the old classic, with help from Phil Factor.

/*

The purpose of this series of workshops is to try to encourage you to take a practical approach to SQL skills. I always find I learn things much quicker by trying things out and experimenting. Please don't just run the samples, but make changes, alter the data, look for my mistakes, try to see if there are different ways of doing things. Please feel free to criticize or disagree with what I say, if you can back this up. This workbench on cursors is not intended to tell you the entire story, as a tutorial might, but the details on BOL should make much more sense after you've tried things out for yourself!

Contents

  What are cursors for?

  Where would you use a cursor?

  Global cursors

  Are Cursors Slow?

  Cursor Variables

  Cursor Optimisation

  Questions

  Acknowledgements

What are cursors for?

Cursors were created to bridge the 'impedence mismatch' between the 'record- based' culture of conventional programming and the set-based world of the relational database.


They had a useful purpose in allowing existing applications to change from ISAM or KSAM databases, such as DBaseII, to SQL Server with the minimum of upheaval. DBLIB and ODBC make extensive use of them to 'spoof' simple file-based data sources.


Relational database programmers won't need them but, if you have an application that understands only the process of iterating through resultsets, like flicking through a card index, then you'll probably need a cursor.

Where would you use a Cursor?

An simple example of an application for which cursors can provide a good solution is one that requires running totals. A cumulative graph of monthly sales to date is a good example, as is a cashbook with a running balance.
We'll try four different approaches to getting a running total...

*/

/*so lets build a very simple cashbook */

CREATE TABLE #cb (        cb_ID INT IDENTITY(1,1),--sequence of entries 1..n

   Et VARCHAR(10), --entryType

   amount money)--quantity

INSERT INTO #cb(et,amount) SELECT 'balance',465.00

INSERT INTO #cb(et,amount) SELECT 'sale',56.00

INSERT INTO #cb(et,amount) SELECT 'sale',434.30

INSERT INTO #cb(et,amount) SELECT 'purchase',20.04

INSERT INTO #cb(et,amount) SELECT 'purchase',65.00

INSERT INTO #cb(et,amount) SELECT 'sale',23.22

INSERT INTO #cb(et,amount) SELECT 'sale',45.80

INSERT INTO #cb(et,amount) SELECT 'purchase',34.08

INSERT INTO #cb(et,amount) SELECT 'purchase',78.30

INSERT INTO #cb(et,amount) SELECT 'purchase',56.00

INSERT INTO #cb(et,amount) SELECT 'sale',75.22

INSERT INTO #cb(et,amount) SELECT 'sale',5.80

INSERT INTO #cb(et,amount) SELECT 'purchase',3.08

INSERT INTO #cb(et,amount) SELECT 'sale',3.29

INSERT INTO #cb(et,amount) SELECT 'sale',100.80

INSERT INTO #cb(et,amount) SELECT 'sale',100.22

INSERT INTO #cb(et,amount) SELECT 'sale',23.80

/* You don't actually need a cursor. You can get a running total using a correlated subquery */

SELECT [Entry Type]=Et, amount,

[balance after transaction]=(

       SELECT SUM(--the correlated subquery

                      CASE WHEN total.Et='purchase'

                       THEN -total.amount

                       ELSE total.amount

                       END)

FROM #cb total WHERE total.cb_id <= #cb.cb_id )

FROM #cb ORDER BY #cb.cb_id

--or you can do this simple inner join and group-by clause if you don't

--like correlated subqueries

SELECT [Entry Type]=MIN(#cb.Et), [amount]=MIN (#cb.amount),

[balance after transaction]=

SUM(CASE WHEN total.Et='purchase'

                       THEN -total.amount

                       ELSE total.amount

                       END)

FROM #cb total INNER JOIN #cb ON  total.cb_id <= #cb.cb_id

GROUP BY #cb.cb_id ORDER BY #cb.cb_id

--and here is a very different technique that takes advantege

--of the quirky behavionr of SET in an UPDATE command in SQL Server

DECLARE @cb TABLE(cb_ID INT,--sequence of entries 1..n

        Et VARCHAR(10), --entryType

        amount money,--quantity

        total money)

DECLARE @total money

SET @total = 0

INSERT INTO @cb(cb_id,Et,amount,total)

     SELECT cb_id,Et,CASE WHEN Et='purchase'

                       THEN -amount

                       ELSE amount

                       END,0 FROM #cb order by cb_id

UPDATE @cb

          SET @total = total = @total + amount FROM @cb

SELECT [Entry Type]=Et, [amount]=amount,

                [balance after transaction]=total FROM @cb ORDER BY cb_id

-- or you can give up trying to do it a set-based way and

-- iterate through the table

DECLARE @ii INT, @iiMax INT, @CurrentBalance money

DECLARE @Runningtotals TABLE (cb_id INT, Total money)

SELECT @ii=MIN(cb_id), @iiMax=MAX(cb_id),@CurrentBalance=0 FROM #cb

WHILE @ii<=@iiMax

       BEGIN

       SELECT  @currentBalance=@currentBalance

                       +CASE WHEN Et='purchase'

                       THEN -amount

                       ELSE amount

                       END FROM #cb WHERE cb_ID=@ii

       INSERT INTO @runningTotals(cb_id, Total) SELECT @ii,@currentBalance

       SELECT @ii=@ii+1

      END

SELECT[Entry Type]=Et,amount,total

FROM #cb INNER JOIN @Runningtotals r ON #cb.cb_id=r.cb_id

/*

or alternatively you can use......

----------....A CURSOR!!!

the use of a cursor will normally involve a DECLARE, OPEN, several

FETCHs, a CLOSE and a DEALLOCATE

*/

SET Nocount ON

DECLARE @Runningtotals TABLE (cb_id INT, Et VARCHAR(10), --entryType

                                       amount money, Total money)

DECLARE @CurrentBalance money, @Et VARCHAR(10), @amount money

--Declare the cursor

--declare current_line  cursor -- SQL-92 syntax--only scroll forward

DECLARE current_line CURSOR fast_forward--SQL Server only--only scroll forward

FOR

       SELECT Et,amount

       FROM #cb ORDER BY cb_id

FOR READ ONLY

--now we open the cursor to populate any temporary tables (in the case of

-- cursors) etc..

--Cursors are unusual because they can be made GLOBAL to the connection.

OPEN current_line

--fetch the first row

FETCH NEXT FROM current_line

INTO @Et,@amount

WHILE @@FETCH_STATUS = 0--whilst all is well

       BEGIN

       SELECT @CurrentBalance = COALESCE(@CurrentBalance,0)

       +CASE WHEN @Etyle="COLOR: blue">='purchase'

                       THEN -@amount

                       ELSE @amount

                       END

       INSERT INTO @Runningtotals (Et, amount,Total)

               SELECT @Et,@Amount,@CurrentBalance

    -- This is executed as long as the previous fetch succeeds.

       FETCH NEXT FROM current_line

               INTO @Et,@amount

       END

SELECT  [Entry Type]=Et,amount,Total FROM @Runningtotals ORDER BY cb_id

CLOSE current_line--Do not forget to close  when its result set is not needed.

--especially a global updateable cursor!

DEALLOCATE current_line

-- although the Cursor code looks bulky and complex, on small tables it will

-- execute just as quickly as a simple iteration, and will be faster with tables

-- of any size if you forget to put an index on the table through which you're

-- iterating!

-- The first two solutions are faster with small tables but slow down

-- exponentially as the table size grows.

/* here is the result of all the routines above

Entry Type amount                balance after transaction

---------- --------------------- -------------------------

balance    465.00                465.00

sale       56.00                 521.00

sale       434.30                955.30

purchase   20.04                 935.26

purchase   65.00                 870.26

sale       23.22                 893.48

sale       45.80                 939.28

purchase   34.08                 905.20

purchase   78.30                 826.90

purchase   56.00                 770.90

sale       75.22                 846.12

sale       5.80                  851.92

purchase   3.08                  848.84

sale       3.29                  852.13

sale       100.80                952.93

sale       100.22                1053.15

sale       23.80                 1076.95

*/

--Why not try these different approaches, with tables of different

--sizes and see how long the routines take? (I demonstrate a suitable test-rig

-- shortly).

-- Is there a quicker or more elegant solution?

Global Cursors


If you are doing something really complicated with a listbox, or scrolling through a rapidly-changing table whilst making updates, a GLOBAL cursor could be a good solution, but is is very much geared for traditional client-server applications, because cursors have a lifetime only of the connection. Each 'client' therefore needs their own connection. The GLOBAL cursors defined in a connection will be implicitly deallocated at disconnect.


Global Cursors can be passed too and from stored procedure and referenced in triggers. They can be assigned to local variables. A global cursor can therefore be passed as a parameter to a number of stored procedures Here is an example, though one is struggling to think of anything useful in a short example*/

CREATE PROCEDURE spReturnEmployee (

      @EmployeeLastName VARCHAR(20),

      @MyGlobalcursor CURSOR VARYING OUTPUT

)

AS

BEGIN

   SET NOCOUNT ON

   SET @MyGlobalcursor =  CURSOR STATIC FOR

   SELECT lname, fname FROM pubs.dbo.employee

               WHERE lname = @EmployeeLastName

   OPEN @MyGlobalcursor

END

.

DECLARE @FoundEmployee CURSOR,

               @LastName VARCHAR(20),

               @FirstName VARCHAR(20)

EXECUTE spReturnEmployee  'Lebihan', @FoundEmployee OUTPUT

--see if anything was found

--note we are careful to check the right cursor!

IF CURSOR_STATUS('variable', '@FoundEmployee') = 0

     SELECT 'no such employee'

ELSE

     BEGIN

     FETCH NEXT FROM @FoundEmployee INTO @LastName, @FirstName

     SELECT @FirstName+' '+@LastName

     END

CLOSE @FoundEmployee

DEALLOCATE @FoundEmployee


/*Transact-SQL cursors are  efficient when contained in stored procedures and triggers. This is because everything is compiled into one execution plan on the server and there is no overhead of network traffic whilst fetching rows.

Are Cursors Slow?

So what really are the performance differences? Let's set up a test-rig. We'll give each routine an increasingly big cashbook to work on up to 2 million rows, and give it a task that doesn't disturb SSMS/Query analyser too much with a large result, so we can measure just the performance of each algorithm. We'll put the timings into a table that we can put into excel and run a pivot on to do the analysis.
We'll calculate the average balance, and the highest and lowest balance so as to check that the results of each method agree.
Now, which solution is going to be the best?*/

------------------------------------------------------------------------

-- Test harness

------------------------------------------------------------------------

--declare the local variables

DECLARE @ii INT, @iiMax INT, @CurrentBalance MONEY

DECLARE @Et VARCHAR(10), @amount MONEY

 

--and clean up the harness from the last run. 

IF EXISTS (SELECT  * FROM tempdb.INFORMATION_SCHEMA.TABLES

   WHERE TABLE_NAME LIKE '#Events[_]%')     DROP TABLE #Events

IF EXISTS (SELECT  * FROM tempdb.INFORMATION_SCHEMA.TABLES

   WHERE TABLE_NAME LIKE '#TableSizes[_]%') DROP TABLE #TableSizes

IF EXISTS (SELECT  * FROM tempdb.INFORMATION_SCHEMA.TABLES

   WHERE TABLE_NAME LIKE '#TempCB[_]%')     DROP TABLE #TempCB

IF EXISTS (SELECT  * FROM tempdb.INFORMATION_SCHEMA.TABLES

   WHERE TABLE_NAME LIKE '#RunningTotals[_]%') DROP TABLE #RunningTotals

 

--Firstly, we'll make a table to record our results for each table size

CREATE TABLE #Events (Event_ID INT IDENTITY(1,1),--sequence of Events 1..n

     [Event] VARCHAR(50) NOT NULL, --The Event we're recording

     [method] VARCHAR(50) NOT NULL, --the algorithm we are using

     TableSize INT NOT NULL,

     Time DATETIME DEFAULT GETDATE())--The moment that it happened

 

CREATE TABLE #tempcb (cb_ID INT,--sequence of entries 1..n

         Et VARCHAR(10), --entryType

         amount MONEY,--quantity

         total MONEY)

DECLARE @total MONEY

 

CREATE TABLE  #Runningtotals (cb_id INT, Total MONEY)

 

 

--now, we'll have a table of the table sizes that we want

CREATE TABLE #TableSizes  (TableSize_ID INT IDENTITY(1,1),--TableSizes to try out with

     TableSize INT NOT NULL

)

--and fill them with the table sizes we'll be using. (Change to taste)

INSERT INTO #tablesizes (TableSize)

     SELECT 20 UNION SELECT 200 UNION  SELECT 2000 UNION

         SELECT 20000 union select 200000 union select 2000000

 

DECLARE @tablesizeRow INT,@maxTableSizeRow INT,@TableSize INT

SELECT  @tablesizeRow = MIN(TableSize_ID),

        @maxTableSizeRow = MAX(TableSize_ID)

FROM    #Tablesizes

 

WHILE @tablesizeRow <= @maxTableSizeRow

  BEGIN

   --firstly, get the number of rows

    SELECT  @TableSize = tablesize

    FROM    #tablesizes

    WHERE   TableSize_ID = @TablesizeRow

    SELECT  @TablesizeRow = @TablesizeRow + 1    

   --Delete the cashbook!

    IF EXISTS

     ( SELECT  * FROM    tempdb.INFORMATION_SCHEMA.TABLES

         WHERE   TABLE_NAME LIKE '#cb[_]%' ) DROP TABLE #cb

      --create a new randomly-generated cashbook table.

    CREATE TABLE #cb

      (

       cb_ID INT IDENTITY(1, 1),

       Et VARCHAR(10), --entryType

       amount MONEY

      )--quantity

    INSERT  INTO #cb

        (et, amount)  SELECT  'balance', 465.00

    SELECT  @ii = 0

    WHILE @ii <= @TableSize

      BEGIN

        INSERT  INTO #cb (et, amount)

                SELECT  CASE WHEN RAND() < 0.5 THEN 'sale'

                             ELSE 'purchase'

                        END, CAST(RAND() * 180.00 AS MONEY)

        SELECT  @ii = @ii + 1

      END

  --and put an index on it

    CREATE CLUSTERED INDEX idxcbid ON #cb (cb_id)

    CREATE INDEX covering ON #cb (cb_id, et, amount)

 

--first try the correlated subquery approach...

    IF @TableSize < 200000 --they run out of steam pretty soon.

      BEGIN

        INSERT  INTO #Events (method, event, tablesize)

           SELECT  'Correlated Subquery', 'start', @TableSize

 

        SELECT  'correlated subquery', MIN(g.balance),

                 AVG(g.balance), MAX(g.balance)

        FROM    (SELECT [balance] =

                   (SELECT SUM(--the correlated subquery

                      CASE WHEN total.Et = 'purchase' THEN -total.amount

                           ELSE total.amount

                      END)

                    FROM   #cb total WHERE  total.cb_id <= #cb.cb_id

                    )

                 FROM   #cb) g

 

--then we'll do the 'group by and inner join'

        INSERT  INTO #Events

          (method, event, tablesize) SELECT 'group by', 'start', @TableSize

 

        SELECT  'Group by...', MIN(f.balance), AVG(f.balance), MAX(f.balance)

        FROM (SELECT 

                  [balance] = SUM(CASE WHEN total.Et = 'purchase'

                                  THEN -total.amount ELSE total.amount

                                  END)

              FROM   #cb total INNER JOIN #cb ON total.cb_id <= #cb.cb_id

              GROUP BY #cb.cb_id

              ) f

      END ---end of the slow section

 

-- Now let's try the "quirky" technique using SET

    INSERT  INTO #Events (method, event, tablesize)

            SELECT  'Quirky Update', 'start', @TableSize

 

    SET @total = 0

    TRUNCATE TABLE #TEMPCB

    INSERT  INTO #TempCb

            (cb_id, Et, amount, total)

            SELECT  cb_id, Et,

                    CASE WHEN Et = 'purchase' THEN -amount

                         ELSE amount END, 0

            FROM    #cb  order by cb_id

    UPDATE  #TempCb

    SET     @total = total = @total + amount

    SELECT  'quirky Update', MIN(Total), AVG(Total), MAX(Total)

    FROM    #TempCb

-- now the simple iterative solution

 

    INSERT  INTO #Events (method, event, tablesize)

            SELECT  'Iterative Solution', 'start',@TableSize

 

    SELECT  @ii = MIN(cb_id), @iiMax = MAX(cb_id),

            @CurrentBalance = 0

    FROM    #cb

    TRUNCATE TABLE #RunningTotals

    WHILE @ii <= @iiMax

      BEGIN

        SELECT  @currentBalance = @currentBalance

           + CASE WHEN Et = 'purchase' THEN -amount ELSE amount END

        FROM    #cb  WHERE   cb_ID = @ii

        INSERT  INTO #runningTotals (cb_id, Total)

                SELECT  @ii, @currentBalance

        SELECT  @ii = @ii + 1

      END

    SELECT  'iterative method', MIN(Total), AVG(Total), MAX(Total)

    FROM    #Runningtotals      

-- now the simple iterative solution

    INSERT  INTO #Events  (method, event, tablesize)

            SELECT  'Cursor Solution', 'start', @TableSize

 

--Declare the cursor

--declare current_line  cursor -- SQL-92 syntax

                                        ---scroll forward only

 

    DECLARE current_line CURSOR fast_forward--SQL Server only

                                       ---scroll forward

      FOR SELECT  cb_id, Et, amount

          FROM    #cb

          ORDER BY cb_id

      FOR READ ONLY

    TRUNCATE TABLE #Runningtotals

--now we open the cursor to populate any temporary tables 

    OPEN current_line

--fetch the first row

    FETCH NEXT FROM current_line INTO @ii, @Et, @amount

    SELECT  @currentBalance = 0

    WHILE @@FETCH_STATUS = 0--whilst all is well

      BEGIN

        SELECT  @CurrentBalance = COALESCE(@CurrentBalance,0)

                 + CASE WHEN @Et = 'purchase'

                        THEN -@amount ELSE @amount END

        INSERT  INTO #runningTotals (cb_id, Total)

                SELECT  @ii, @currentBalance

    -- This is executed as long as the previous fetch succeeds.

        FETCH NEXT FROM current_line INTO @ii, @Et, @amount

      END

 

    CLOSE current_line--Do not forget to close

    DEALLOCATE current_line

    SELECT  'cursor method', MIN(Total), AVG(Total), MAX(Total)

    FROM    #Runningtotals

      INSERT  INTO #Events (method, event, tablesize)

            SELECT  'Test Run', 'End', 0

  END

--Now it is all done, get a report of the findings

SELECT  method, Tablesize,

        DATEDIFF(ms, Time,

                 (SELECT time FROM #events [next] WHERE next

                          . event_ID = this . Event_ID + 1

                 )  ) [Duration(ms)]

FROM    #Events AS [this] WHERE   event = 'start'

 

------------------------------------------------------------------------

-- End of Test harness

------------------------------------------------------------------------

/*

The raw data is shown here. What I have not shown is the check of the results, which shows that every solution gave consistent results.

 

The iterative and cursor solution both give similar results since, under the covers, they are doing similar things. They are dramatically faster than the 'correlated subquery' and 'group by' methods as one would expect.

 

You will see from the graph that we couldn't even attempt the correlated subquery methods under a 'production' table size. It would have taken too long.

 

Conclusion?  If you don't feel confident about using 'Quirky Update' (and it is easy to mess-up, so you have to test it rigorously), then Running totals are best done iteratively, either by the cursor or the WHILE loop. The WHILE loop is more intuitive, but there is no clear reason in favour of one or the other. For almost all work in SQL Server, set-based algorithms work far faster than iterative solutions, but there are a group of problems where this isn't so. This is one of them. For a good example of another one, see Phil Factor Speed Phreak Challenge #6 - The Stock Exchange Order Book State problem

 

*/

 

Cursor Variables

--@@CURSOR_ROWS         The number of rows in the cursor

--@@FETCH_STATUS Boolean value, success or failure of most recent fetch

---2 if a keyset FETCH returns a deleted row
So here is a test harness just to see what the two variables will
give at various points. Try changing the cursor type to see what
@@Cursor_Rows and @@Fetch_Status returns. It works on our temporary

Table

*/

--Declare the cursor

DECLARE @Bucket INT

--declare current_line  cursor--we only want to scroll forward

DECLARE current_line CURSOR keyset --we scroll about (no absolute fetch)

/* TSQL extended cursors can be specified

[LOCAL or GLOBAL] [FORWARD_ONLY or SCROLL] [STATIC, KEYSET, DYNAMIC

or FAST_FORWARD]

[READ_ONLY, SCROLL_LOCKS or OPTIMISTIC]

[TYPE_WARNING]*/

FOR    SELECT 1 FROM #cb

SELECT @@FETCH_STATUS, @@CURSOR_ROWS

OPEN current_line

--fetch the first row

FETCH NEXT --NEXT , PRIOR, FIRST, LAST, ABSOLUTE n or RELATIVE n

               FROM current_line INTO @bucket

WHILE @@FETCH_STATUS = 0--whilst all is well

       BEGIN

       SELECT @@FETCH_STATUS, @@CURSOR_ROWS

       FETCH NEXT FROM current_line INTO @Bucket

       END

CLOSE current_line

DEALLOCATE current_line /*

if you change the cursor type definition routine above you'll notice that @@CURSOR_ROWS returns different values

a negative value >1 is the number of rows currently in the keyset.

If it is -1 The cursor is dynamic.

A 0 means that no cursors are open or no rows qualified for the last opened cursor or the last-opened cursor is closed or deallocated.

a positive integer represents the number of rows in the cursor the most important type of cursors are...

 

FORWARD_ONLY

       you can only go forward in sequence from data source, and changes made

       to the underlying data source appear instantly.

DYNAMIC

       Similar to FORWARD_ONLY, but You can access data using any order.

STATIC

       Rows are returned as 'read only' without showing changes to the underlying

      data source. The data may be accessed in any order.

KEYSET

       A dynamic data set with changes made to the underlying data appearing

       instantly, but insertions do not appear.

Cursor Optimization

  . Use them only as a last resort. Set-based operations are usually fastest

    (but not always-see above), then a simple iteration, followed by a cursor

  . Make sure that the cursor's SELECT statement contains only the

    rows and columns you need

  . To avoid the overhead of locks, Use READ ONLY cursors rather than

    updatable cursors, whenever possible.

  . , static and keyset cursors cause a temporary table to be

    created in TEMPDB, which can prove to be slow

  . Use FAST_FORWARD cursors, whenever possible, and choose  FORWARD_ONLY

    cursors if you need updatable cursor and you only need to FETCH NEXT.

Questions

    1/ What is the fastest way of calculating a running total in

        SQL Server? Does that depend on the size of the table?

    2/ what does it suggest if the @@CURSOR_ROWS variable returns a -1?

    3/ What is the scope of a cursor?

    4/ When might you want locking in a cursor? Which would you choose?

    5/ Why wouldn't the use of a cursor be a good idea for scrolling

        through a table in a web-based application?

Acknowledgements

 Thanks to Nigel Rivett, Phil Factor and Adam Machanic for their ideas. Thanks to Phil Factor for revising this to give a more comprehensive test harness

Note: the source to all this can be downloaded from the speechbubble at the top of the article

*/



This article has been viewed 52707 times.
Robyn Page

Author profile: Robyn Page


Robyn Page has worked as a consultant with Enformatica and USP Networks with a special interest in the provision of broadcast services over IP intranets. She was also a well known actress, being most famous for her role as Katie Williams, barmaid and man-eater in the Television Series Family Affairs, when she was nominated as 'Most sexy newcomer' at the British Soap awards. She is currently having a career break to raise a young family.

Search for other articles by Robyn Page

Rate this article:   Avg rating: from a total of 104 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: excellent
Posted by: Anonymous (not signed in)
Posted on: Thursday, January 25, 2007 at 3:35 AM
Message: We have to read any article posted by Robin Page as it falls in the category of very informative and example ridden piece. Any way this article was also very informative and examples are real. Keep up the excellent work.

Subject: thanks for the info..
Posted by: Anonymous (not signed in)
Posted on: Thursday, January 25, 2007 at 4:48 PM
Message: nicely packaged i.e., good background and history... simple examples and nice approach to proof.. i feel like i learned something.. thanks!

Subject: What is Recursive cursors ?
Posted by: Anonymous (not signed in)
Posted on: Friday, January 26, 2007 at 1:40 AM
Message: Realy Good artical, please Can u Give more information about Recursive cursors?

Subject: Very helpful
Posted by: Anonymous (not signed in)
Posted on: Friday, January 26, 2007 at 4:33 AM
Message: Excellent workbench, especially the stats on the different approaches,

Thanks

Subject: Very helpful
Posted by: Anonymous (not signed in)
Posted on: Friday, January 26, 2007 at 5:18 AM
Message: Excellent workbench, especially the stats on the different approaches,

Thanks

Subject: UPDATE method
Posted by: Anonymous (not signed in)
Posted on: Monday, January 29, 2007 at 1:16 PM
Message: The UPDATE method is not reliable because it assumes the rows will be updated in a particular order, but there is no such guarantee.

Subject: re: Update Method
Posted by: Robyn Page (view profile)
Posted on: Tuesday, January 30, 2007 at 1:00 PM
Message: Well spotted! I agree. I've been worrying a bit about that but I felt that pinning down the update order for certain would have complicated clarity of the script (it can be done, I'm told!). I experimented and couldn't get a table variable to update in any other order than the order in which the records were inserted into it, so I felt it was safe enough to include. Can anyone get it to update in a different order?

Subject: good work
Posted by: Anonymous (not signed in)
Posted on: Friday, February 02, 2007 at 12:44 AM
Message: Robin this what i would like to say about u.

"Give a man a fish, you feed him for a day. Teach a man to fish, you feed him for a life."

Subject: Excellent
Posted by: Anonymous (not signed in)
Posted on: Friday, February 02, 2007 at 1:18 AM
Message: this really excellent

Subject: SQL 2005, CTE's, Ranking/Windowing Functions
Posted by: Anonymous (not signed in)
Posted on: Monday, February 05, 2007 at 10:59 PM
Message: Where do you see CTE's and Ranking/Windowing Functions (SQL 2005) fitting into the mix?

Subject: Memory Tables v/s cursors
Posted by: jacob.sebastian (view profile)
Posted on: Tuesday, February 06, 2007 at 12:30 AM
Message: I had been wondering if a memory table (using a table variable) is a better choice than a cursor.

do you have an idae whether a memory table would give better performance?

Subject: What a crock!
Posted by: SHATCHARD (view profile)
Posted on: Tuesday, February 06, 2007 at 3:46 AM
Message: Although Ms Page may be quite an accomplished developer, she must have been ON THE SET the day I learned about SETS

What about GROUP BY for aggregations?

Also, GIVE CURSORS a wide berth. I have been optimising SQL stored procedures for 5 years and by far the biggest performance and locking issues come from cursors.

There is nearly always a SET based alternative to cursors. Its the clash of the procedural programming mind against the database/SET mind.

Nearly all database activities are quicker when done in bulk compared to iterating a loop.

If you must iterate across a resultset either farm that out to a Business Rules Layer in a procedural language or create an @ temporary table variable to hold your iteration data. The locking for the iteration will be only on a temporary table not the actual data tables.




Subject: Tempoary Tables, Table Variables, and Cursors
Posted by: Paker (view profile)
Posted on: Tuesday, February 06, 2007 at 6:38 AM
Message: I recently had cause to try processing the same data using each of the subject approaches - by far the best result was achieved using a cursor.

I've also had experience in using running totals in a report - while this can be done at least two ways using set theory, using a cursor improved performance by more than an order of magnitude.

Shatchard, I don't feel that you are being particularly helpful - there are typically multiple ways to achieve a desired result, and the variety of configurations that people use mean that experiment and experience on a particular target platform are worth at least as much as the dogma you present so doggedly.

Ms. Page has provided huge service to the user community, and is definitely on my 'must read' list.

I am not aware of any similar contribution on your part - and certainly did not find it in your posting. A well worked out example on the problems of misusing cursors would have served us much better than what you provided.

Subject: Cursors vs Set Based operations
Posted by: Anonymous (not signed in)
Posted on: Tuesday, February 06, 2007 at 7:04 AM
Message: I also feel it's misleading to promote cursors for use in some of these operations (arguably, running totals are a valid use, although this can also be achieved by aggregation) or by using incrementing variables (I also shy away from loops where possible).

Cursors have aggressive locking models, spread out over a (typically) longer period of time than an aggregate query/stream would take to execute, which can lead to serious problems in a high transaction volume (or high user load) environment - it is however quite likely that they can perform faster (execution time only, not necessarily resource cost) in test environments.

Adding up the sum of column x in table y using a cursor will always scale to a far lesser degree than using SUM() and GROUP BY, to give a simple example - it might be quicker to use the cursor on a single run, but try 500-1000 concurrent executions.

To be honest, I'd go further than SHatchard and say that these days, there is no excuse for embedding this sort of logic in SQL Server at all - if you need to perform complex operations of this type, you need a middle tier. Then you get the option of scaling out, rather than up...

Subject: Dillusional
Posted by: SHATCHARD (view profile)
Posted on: Tuesday, February 06, 2007 at 7:30 AM
Message: In answer to Paker, of course, theres always many ways to do anything. Indeed you can drive up a motorway backwards on the wrong side of the road. You could also jump out of an aeroplane without a parachute. However, theres one thing having the option to do it, it's a whole different ball game to RECOMMEND it.

Theres usually one right way to do something for that particular system and environment and in my experience no system has ever been improved with the use of cursors.

My comments derive from ACTUALLY solving performance problems in the REAL world; comparing different methods and approaches for tackling the same problem and my conclusion is the always the same - avoid cursors.

Actually I'm doing myself out of a job here, carry on using cursors, use as many as you can then I can earn huge consultancy fees going round clearing them up when your systems break down.



Subject: re: what a crock, dillusional etc.
Posted by: Tony Davis (view profile)
Posted on: Tuesday, February 06, 2007 at 8:50 AM
Message: Yes, there are always a lot of options, but the article does not just say "here are all the options and one of them is using a cursor". It does try to provide a measure of the performance of each solution on a reasonably-large database. I agree with "anonymous" that this is just a standalone performance test, and the next step is to test the *scalability* of each of these solutions and find out where each one breaks down. How can we do that?

Nor does the article say "go ahead and use cursors wherever you like, you'll be fine". It actually says "Use them only as a last resort. Set-based operations are usually fastest".

It's really is valuable to have Shatchard and others share their real experiences with using cursors -- but how can we test this out for ourselves? These workbenches are all about test cases and hard data, so that readers can find out for themselves what solution will work for their system.

Subject: double-entry accounting?
Posted by: Anonymous (not signed in)
Posted on: Tuesday, February 06, 2007 at 10:06 AM
Message: Does anyone else think that the sample table is fundamentally flawed, as it uses the same sign for the income and outgo values? I'd think you'd either indicate flow direction through sign, or have seperate income and outgo columns.

Also, how much is the string comparison in the CASE costing relative to the total times?

Lastly, would performance be enhanced (especially if using Group By set logic) with an index of (et, amount)?

Subject: Re: Double-entry Accounting
Posted by: Robyn Page (view profile)
Posted on: Tuesday, February 06, 2007 at 3:28 PM
Message: Yes, originally I used the same convention for the sample data that you suggested, but I looked up a cashbook example in an accounting textbook and this was the way the original cashbook entries were made. When I originally wrote the workbench, I did a complete worked example of a double-entry cashbook but took it out again as it wasn't directly relevant to the main issue of whether there were any classic accounting calculations for which a cursor is necessary. I just left the stub.
On the last two points, I'd very much hope that you'd tell me! The workbench approach would, I hope, allow you to put this to the test. I'd expect that a covering non-clustered index would be an improvement, as one would expect SQL Server to be able to get the data from the index leaf and not to have to access the data at all. My guess is that it would favour the set-based approaches.

Subject: Re: SHatchard and the Crock
Posted by: Robyn Page (view profile)
Posted on: Tuesday, February 06, 2007 at 4:06 PM
Message: Many thanks to SHatchard, Paker, (and the anonymous author of 'Cursors vs Set Based operations') for their contributions. I agree with quite a lot of what they say. I'm a little surprised, though, that SHatchard interpreted an article that starts by stating 'Relational database programmers won't need them', as a defense of the use of Cursors.
I always use a set-based solution wherever I can, and have never had any need to put a cursor-based solution into a production system. (I will happily use an iterative solution occasionally). However, I'm just keen to keep an open mind and adopt an experimental approach to the issue. Do cursors ALWAYS have an aggressive locking strategy even if you specify a read-only cursor? Are there any common problems that are better solved by an iterative or cursor-based strategy?

Surely we should put this sort of issue to the test?

I would be very interested in SHatchard's better GROUP BY solution, If he'd be prepared to send it to the editor, and would like to do some comparative timings with the solutions I used, and add this to the article.


Subject: Where to use cursors
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 07, 2007 at 2:23 PM
Message: I wholeheartedly agree that cursors should generally be avoided, but they can be a tool in your belt, even if little-used. If scalability is a concern, then cursors are probably not the best solution. However, there are times when they are, in fact, the best tool for the job.

The main place I use cursors is in administrative scripts - compiling fragmentation data for all tables for all databases, for instance. The DMVs in SQL 2005 finally offer more ways to use set-based solutions to get the desired results, but cursors are still perfectly fine for this use, since scalability is not a problem.

If you are working with very large amounts of data, you can use cursors to split up very large operations so as to avoid disk thrashing and excessive use of tempdb.

To simply throw out cursors completely is like tossing out your hammer because you have access to a pile driver.

Subject: The crux of my argument
Posted by: SHATCHARD (view profile)
Posted on: Wednesday, February 07, 2007 at 5:03 PM
Message: If i may re-iterate the last point from my first reply

"If you must iterate across a resultset either farm that out to a Business Rules Layer in a procedural language or create an @ temporary table variable to hold your iteration data. The locking for the iteration will be only on a temporary table not the actual data tables"

I was not saying that you never need to loop or iterate with T-SQL. I was saying 2 things

a) For loops required for production then a Business Layer would normally be the most efficient. Pass a resultset back to the layer and iterate there.

b) If you must do a loop in T-SQL then here is an example of looping using a temporary table variable (for SQL 2K/2005)

SET NOCOUNT ON
DECLARE @tab TABLE (id INT)
DECLARE @iCurrentRow INT
DECLARE @vName SYSNAME

INSERT INTO @tab(id)
SELECT id FROM master.dbo.sysobjects

WHILE EXISTS (SELECT 1 FROM @tab)
BEGIN
SELECT @iCurrentRow=id FROM @tab

SELECT @vName=name from master.dbo.sysobjects WHERE id=@iCurrentRow
PRINT @vName

DELETE FROM @tab WHERE id=@iCurrentRow
END

You dump the data you want use in your loop into the temporary table, thus moving the locking away from the actual tables to the temporary table.

My argument is that an article designed to explain cursors would actually lead to encouraging people to try them when actually there are much better (and efficient) alternatives.

As for a GROUP BY example, hmmmm, I think it was during my first day of learning about SQL that it dawned on me "ah SQL is designed for aggregations!".

Subject: Temp Tables
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 07, 2007 at 6:47 PM
Message: The whole point of having the locking on the base tables is for data integrity. Using a temp table circumvents the locking mechanism and can lead to integrity issues when more than 1 person is using a system.

Still waiting for a "running total" implementation using GROUP BY ...
(preferably one that can be used in SQL2000, so no custom aggregate functions)

Subject: Set vs. Procedural
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 07, 2007 at 10:00 PM
Message: I, for one, don't understand the distinction people make all the time between "set" and "procedural" in regards to DBMS's.

The simple fact is that the execution of the query is ultimately procedural. Either we program the procedure ourselves, or we describe our query and let the optimizer generate a procedure for us. The reason "set"-based solutions are usually faster is because the optimizer usually comes up with a very good procedure -- especially considering that it can leverage parallelism and (just about) as much memory as it wants.

If you look at the execution plans generated, you can see very clearly that SQL Server is simply converting "set" operations into "procedural" operations.

Take, for example, the correlated subquery and self-join execution plans. You will probably see a "nested loop" operation and later a "stream aggregate" operation. Not only are nested loops inefficient, but it generates a huge dataset that it then iterates through to calculate the aggregates.

The most efficient way is to pass through the data as few times as possible, which is what happens with the "quirky update", iterative, and cursor solutions. I am guessing that the iterative and cursor solutions are slower because of the implicit transactions that surround the DML being executed within the loops -- the update method makes 2 passes through the data, but only incurs the overhead of 2 transactions. You might want to try enclosing the iterative and cursor methods in a transaction to see if the elapsed times come closer to the time of the update method.

I believe the "quirky update" method does indeed rely on the details of the execution plan, so in more complex systems an index or statistics change could alter the execution plan and subsequently break the calculation. Also, because of its quirky nature, this method would probably be confusing to anyone trying to maintain the code in the future.

When using cursors, two key things to keep in mind are locking and transactions. Be sure you understand them and use them carefully! The availability of your system and the integrity of your data depend on them.

As far as the example from b) above, the proposed method accesses @tab many more times than it needs to. It scans @tab from top to bottom in the first SELECT statement of the loop and then scans @tab from top to bottom (due to no index) in order to execute the DELETE at the end of the loop. It performs these scans for every record in the table -- not the best use of system resources.

Please understand that I am not trying to provoke anybody here -- just trying to offer an honest analysis of the methods presented.

J. Steele

(Both my previous posts were marked "Anonymous" even though I had filled in the Name textbox - must I sign in for "nonanonymous" posting?)

Subject: Other uses for Cursors
Posted by: Anonymous (not signed in)
Posted on: Friday, February 09, 2007 at 1:25 PM
Message: I’ve got to weigh in on this to get a reaction to the following. Here is a real world example of using cursors that I’ve constructed. I have a table that contains a 2K varchar field (among other variables). The content of this field is a stylized stored procedure with one or more ‘replaceable’ variables. The purpose of this table of stored procedures is to perform data validation in a myriad of ways and process order. So, the processing sequence goes something like this. With a cursor, identify the validation records to run based on several parameters passed into the stored procedure. The results order from the cursor is based on an ordinal value associated with the record. For each record in the cursor set, do a search / replace on the replaceable fields. Construct a dynamic query where I have a standardized Insert into <table> [the select query from the cursor record with the ‘where’ variables updated from passed in parameter(s)]. I iterate through until no more validation routines need to be processed. This routine is called as part of a bigger data import process. Different data file formats trigger different validation requirements hence the passed in variables.

Why not do this outside the sql system? Or for that matter using sql stored procedures. Because the scenarios for validation change with enough frequency that test/release cycles to release binary code make it impractical. The current method allows the user to insert a new validation to be used in a file load within minutes of identifying the need. There is an interface that guides the end user department in constructing these validations thus allowing them to be self sufficient without requiring support / assistance from IT. It’s not always about faster execution.

The discussion has been centered on iterating through tables for some result but a SQL system can be a complete application in itself. As eluded to in various previous threads, what are you trying to achieve and are you smart about it. This article allows those without years of seat experience to see something tangible and build upon what was presented. Many thanks to Ms. Page for taking the time to outline ideas for others to digest and expand there breadth of knowledge. Thanks to those with their counter points as well.

Subject: flawed comparisons
Posted by: Anonymous (not signed in)
Posted on: Monday, February 12, 2007 at 7:45 PM
Message: Several things stand out on this performance comparison.
The example using the "SET" technique included an unused (but populated) field Et in the table variable @cb, this table variable was missing a primary key, and was missing a SET Nocount ON command.
The example using the cursor did not have this useless field created and populated, had a SET Nocount ON command. A primary key on the table variable used in the cursor is not possible.
By making some minor tweaks to the "SET" technique (adding a primary key to @cb, NOCOUNT, getting rid of the useless field), I next had to change to measure to milliseconds - performance was 15 to 20 times better than the cursor method.
So, if this article is an attempt to compare approaches, it is misleading. It does demonstrate that cursors can be almost as slow as poorly thought out set based solutions. However, that wasn't the message the author intended, and I'm still in the group of developers that view cursors as the technique of last resort.

Subject: re: Flawed comparisons
Posted by: Robyn Page (view profile)
Posted on: Tuesday, February 13, 2007 at 2:59 PM
Message:

Thanks very much for the contribution. It has certainly given food for thought.

Several points:

ET (EntryType) is indeed used in all the methods. One wants the comparison to be fair.

NOCOUNT was ON for all the timings.

The introduction of the primary key on the table variable on the 'Quirky Update' approach slows the performance very slightly on my test system.

Changing the index on the test table to a primary key clustered or a compound (covering) index made little difference to the timings.

Timings were all in seconds as the worst performers were running into several minutes.

The 'message that the author intended' about cursors was quite clearly stated at the beginning of the article 'Relational database programmers won't need them', but I would want to keep an open mind, and to put matters to the test. I'd particularly like to hear of any application for which a cursor provides the best approach.


Subject: Possible cursor application
Posted by: Anonymous (not signed in)
Posted on: Thursday, February 15, 2007 at 8:10 PM
Message: How about a scenario where you want to retrieve a threshold based on a percentage of a total?

Say you are a non-profit with a large pool of benefactors, and you want to view a "Top X%" list - where the percentage represents a portion of the total sum of contributions for a time period. So if you wanted a "Top 75%" list, you would start with your largest contributor and in descending order keep a running total until you have a sum that meets or exceeds 75% of your total. The amount that pushed the running total past 75% would then be used as the threshold for the list.

As a small example, say you received $100 total this month, with contributions of the following amounts: $55, $22, $5, $4, $4, $3, $2, $2, $2, $1

If you asked for "Top 50%", you would get a threshold of $55, since the first contribution pushed the running total over $50, and the list would contain 1 record.

If you asked for "Top 75%", you would get a threshold of $22, and the list would contain 2 records.

Seems like an odd type of request, but when dealing with large numbers of entities with widely varying values, it can be an effective way to view significant records.

Sure, this could be implemented outside of the database - but that would require pumping a lot of data outside of the SQL Server process.

Any thoughts on (MSSQL2000) implementations that might be more effective than a cursor or iteration?

Subject: Re: Possible Cursor Application
Posted by: Robyn Page (view profile)
Posted on: Saturday, February 17, 2007 at 11:39 AM
Message:
SET nocount ON

DECLARE
@Donors TABLE (donor VARCHAR(10),Amount money)
INSERT INTO @Donors (donor,amount) VALUES('Bill',55)
INSERT INTO @Donors (donor,amount) VALUES('Fred',22)
INSERT INTO @Donors (donor,amount) VALUES('Alice',5)
INSERT INTO @Donors (donor,amount) VALUES('Tom',4)
INSERT INTO @Donors (donor,amount) VALUES('May',4)
INSERT INTO @Donors (donor,amount) VALUES('Joan',3)
INSERT INTO @Donors (donor,amount) VALUES('Britney',2)
INSERT INTO @Donors (donor,amount) VALUES('Henry',2)
INSERT INTO @Donors (donor,amount) VALUES('George',2)
INSERT INTO @Donors (donor,amount) VALUES('Humph',1)

DECLARE @RankedDonors TABLE (donor VARCHAR(10),Amount money,
percentage numeric (8,2), cumulativePercent numeric (8,2))

--OK, we inserted them in rank order but let's be certain
--by inserting them into a new table and calculate the
--percentages at the same time
INSERT INTO @RankedDonors (Donor, Amount,Percentage)
SELECT Donor, Amount,[percentage]=(amount/total)*100 FROM
@donors CROSS JOIN
(SELECT [total]=SUM(amount) FROM @Donors) f
ORDER BY amount DESC

--now update this to give the cumulative percentage
DECLARE @RunningPercentage numeric (8,2)
UPDATE @RankedDonors
SET @RunningPercentage=cumulativePercent
=COALESCE(@RunningPercentage,0)+percentage

--and get the two top donors who breached the 75% limit
SELECT * FROM @RankedDonors
WHERE cumulativePercent <=
(
SELECT MIN(cumulativePercent)
FROM @RankedDonors
WHERE CumulativePercent > 75)

Subject: Some thoughts on cursors
Posted by: Anonymous (not signed in)
Posted on: Tuesday, February 20, 2007 at 2:00 PM
Message: http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx

Subject: Cursors
Posted by: Granted (view profile)
Posted on: Wednesday, February 21, 2007 at 7:09 AM
Message: Yes, as a rule, cursors are bad. However, there are places where they can perform better than set based operations. If you check out Itzik Ben-Gan's book, Inside TSQL Querying, on page 234 & 235, he shows a method for generating Row Numbers for data (if you're not using Row numbers all over the place... read the book, please) that results in a single scan of the data as opposed to n-squared number of scans for a set based operation.
Just remember, in a 500 page book on TSQL, he shows a single example where cursors are better.
Great article Robyn.

Subject: Possible Cursor Applications
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 21, 2007 at 7:19 AM
Message: I too use cursors rarely. However, I have 3 scenarios where they make sense to me.

First, I have used cursors when I doing inserts where I need to grab the @@identity value for each row inserted and use it somewhere else. I will generate a temp table for the cursor to operate on. This table includes an id column where I will store the @@identity value. For each record in the cursor, I do the insert and store the @@identity value. After the cursor finishes, I can then use the id values for updating other tables (with set operations).

Second, I will use a cursor if I need to execute a stored procedure using values from each record. This may be a different procedure for each record (as someone else also discussed here), or it may be a procedure that can't be reduced to a set operation.

Third, I could see a cursor being used in a case where simplicity is more important than performance, probably in code that does not run very often. Cursor code may be a little verbose, but it can be very simple to understand and maintain. Robyn's running total example demonstrates this - the set based version is (to me anyway) more complex than the cursor based version. If I was going to run this once a week, I would opt for the simplest code, not the most efficient.


Subject: Unnecessar Cursor Applications
Posted by: scott2718281828 (view profile)
Posted on: Wednesday, February 21, 2007 at 1:13 PM
Message: If you want to capture the identity values from multiple insertions, try the new OUTPUT clause in SQL 2005. It can list all the new identity values created. (If you still insist on doing it with a cursor, consider using SCOPE_IDENTITY() rather than @@IDENTITY.)

The second and third use for cursors mentioned above don't show any reason why cursors should be chosen over using a table variable. Table variables are no more complicated than cursors, and don't have the locking issues.

My biggest gripe about cursors is that the default cursor is the most expensive scrollable updatable type, and too many people don't bother to find out how to use them intelligently. If they defaulted to LOCAL FAST_FORWARD, and you had to figure out the proper options to do anything else, there would probably be fewer complaints about their performance.

Subject: Possibly Unnecessary Cursor Applications
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 21, 2007 at 2:34 PM
Message: Thanx for the info scott2718281828 - I didn't know about the new output clause - it looks like it might help me in my first application. (I guess I missed that part of the "What's New" section of BOL!)

My stuff doesn't typically have scope issues, so I have just used @@identity. However, I think scope_identity() is probably safer, so that future changes don't break my current code.

To your second point - I don't see that table variables are a substitute for cursors. When I (rarely) use a cursor, it always operates on a temp table or a table variable to avoid locking issues. But I don't see how I could use a table variable *instead* of a cursor.

I certainly don't find table variables complex - I was talking about complex set-based queries. I think there are cases where, after the developer considers the balance of performance needs, development time, developer skill set, and maintainability, that a cursor will be the right choice.

I don't think that cursors are 'evil' and should be avoided at all costs. If you have a need that can more easily be met with a cursor (within required performance parameters), then use a cursor. The 'iteration' technique described in Robyn's article seems like a way to avoid using a cursor just for the sake of avoiding using a cursor. I would only use it only if there was a demonstrated performance boost (though, I must say, it does look like a lot less typing).


Subject: Cursor vs Table Variable
Posted by: Anonymous (not signed in)
Posted on: Thursday, February 22, 2007 at 2:49 PM
Message: I have found table variables adequate to my needs where I need to iterate through data row by row.

For example, in SQL 2005, I have a database that exists solely for the purpose of sending automated e-mails to staff and customers based on events in other databases. (E.g.: "Your order has shipped".)

sp_send_dbmail is the easiest way to send these e-mails (in the set up I have), and iterating through it with different parameters for each exec command is how I have it operate. This allows for completely automatic operation on a scheduled basis.

I could do this with a cursor, but I have found that the ability to perform set-based operations on the table variable before the iteration starts is much more efficient (faster, less server resources).

For example, I can do things like:

update @Emails
set attachments = replace(attachments, 'c:\', '... UNC resource ...')

(I'm not including the actual UNC path, since I'm not interested in providing server names on an open forum. I'm sure you can figure out how it could be made to work.)

Doing that once, as well as a dozen other clean-up tasks, based on user-input, etc., as part of a single update statement that works on the whole table variable, is much faster than stepping through a cursor (or table variable) and updating each row one at a time.

I'm not familiar with any way to update a cursor in a similar manner. So far as I know, you have to operate on them one row at a time. (Correct me if I'm wrong on that.)

Then, using an ID column in the table variable, I can step through it with:

declare @Row int, @Rows int

select @row = 1, @rows = count(*)
from @emails

while @row <= @rows
begin
(code to run on each row, including exec command on sp_send_dbmail)
select @row = @row + 1
end

This solution seems to me to be the best of iterative and set solutions. Allows an exec command that can't take a set-solution, allows set commands that are more efficient than row-by-row updates.

It's possible I'm missing some options, using this, that I'm not aware of. In that case, I would definitely appreciate comments on them. But I have found that this approach to anything I have to iterate through is very efficient and fast.

Of course, avoiding iteration in favor of set-based solutions is generally much better, but there are a few things where I find stepping through to be necessary.


Subject: Cursor vs Table Variable
Posted by: Anonymous (not signed in)
Posted on: Thursday, February 22, 2007 at 4:10 PM
Message: Your procedure to send mail is written exactly the way I would have done it, except that I would use a cursor instead of the home-made iteration code. I certainly agree that as much as possible should be done with set operations, and that the cursor/iterator should operate on a table var (or temp table if necessary).

The home-made iterator must 'select' from the table var once on each iteration, and it must have a field that can be used to select records in sequence. A cursor does not require the extra field. And, since cursors are a built-in part of Transact-SQL, I would presume (hope!) that the smart folks who built the database engine would have put in some optimizations to reduce the number of reads. In other words, since the home-made version selects one record at a time, the db must scan the table (or an index) to find the row, and repeat this for each row in the table. My hope is that with a cursor, the query engine will be smart enough to grab the whole thing at once (or at least more than one row at a time).

Does this actually happen? I don't know. My previous uses of cursor were on applications that were run infrequently, so performance tuning wasn't needed. I am now working on one where performance will be an issue, so I think I will do some testing to see which technique is faster.

Subject: Shatchard's iterative example with a @local table ... and a grouse :o)
Posted by: Anonymous (not signed in)
Posted on: Thursday, March 08, 2007 at 12:24 AM
Message: If Shatchard's table variable is holding a large number of rows, performance will be absolutely, unabashedly miserable. The table variable would benefit from having a clustered primary key so that the whole table does not have to be scanned every time a row is fetched or deleted.

As long as I'm grousing, the people who say that cursors always have aggressive locking are either just being inflammatory, or they haven't done their schoolwork. The type of locking is specific to the type of cursor as well as any locking hints in the cursor definition.

If people really want to take exception to what the article is saying, they should at least provide measureable counterexamples. Usually the people who say "always" and "never" seem to be the first to dismiss good, solid evidence, especially when it flies in the face of their dogma.

Subject: Re: Shatchard's iterative example with a @local table
Posted by: Robyn Page (view profile)
Posted on: Thursday, March 08, 2007 at 3:16 AM
Message: Absolutely right. A well designed cursor-based method will perform as well, of not slightly better, than an iterative solution, and the optimization is done for you.
Dogma has no place in IT, wherever it can be replaced by understanding. We should try things out, experiment, and challenge preconceptions. Please keep grousing!

Subject: Update order for table @cb
Posted by: Anonymous (not signed in)
Posted on: Friday, March 16, 2007 at 11:48 AM
Message: Hi Robyn:

Excellent article and just what I was looking for. I run a transaction log and needed running balance for reports. When I tried out your example, the only thing I changed was this statement in the quirky set technique:

UPDATE @cb

SET @total = total = @total + amount FROM @cb


I added an ORDER BY for my specific case because obviously, I wanted the balance by date like a typical register:

UPDATE @cb

SET @total = total = @total + amount FROM @cb order by [DateColumn]


That worked great. I've just bookmarked you and will be coming back in the future. Thanks again.

Subject: "Cursors are bad" is dogma
Posted by: Anonymous (not signed in)
Posted on: Wednesday, March 21, 2007 at 10:29 AM
Message: Cursors certainly have their issues, locking being most likely the worst as mentioned by several people.

But when I hear that they are "always bad", I suspect a newbie is talking.

In this article:
http://www.sql-server-performance.com/dp_no_cursors.asp

...I ran the code supplied by the author, and the way it was intended by the author to be ran, I found his "better" way than a cursor is 30% slower!!

Subject: They're a compensation for bad design
Posted by: Anonymous (not signed in)
Posted on: Thursday, April 12, 2007 at 7:48 AM
Message: I'm lucky enbough to control the whole application that my firm uses. In the example given, I would simply add a running total column and whack a trigger on the table. This may sound obtuse, but the point is that most developers look at the data retrieval without questioning how the data gets into the tables and whether their structure is correct. If you have to use cursors for accessing multiple lines of data then something is usually wrong with the data and how it's stored.

Subject: wow great article
Posted by: Anonymous (not signed in)
Posted on: Tuesday, February 26, 2008 at 11:45 AM
Message: you're hot!

Subject: A use for update cursors
Posted by: Anonymous (not signed in)
Posted on: Monday, April 21, 2008 at 11:14 AM
Message:
With the premis that I am not a DBA but a programmer who has to work on a database, today I worked on the following situation:

I wrote a function RandomPercentage that is used in a script for masking client data for demo purposes.

update tblProducts
set prodPrice = prodPrice * dbo.RandomPercentage(25,10,rand(),rand())

when I ran this it (apparently) evalutated dbo.RandomPercentage once, and then applied it to every row. The only solution that I found to apply a different percentage to every row that worked was to use an update cursor.

As a by-the-by, I found googling around that while there are many people who ask how cursors work, for 9 out of 10 responses, the reply is that a data set could do that for you faster blah blah.. in-fact, it took a very long time to find an example of an "update cursor".


Subject: Good well thought out article
Posted by: DaveWasEre (view profile)
Posted on: Friday, June 05, 2009 at 4:39 AM
Message: I've been using cursors for many years without any of the problems that Shatchard has.

Robyn is absolutely correct they are a good tool to have and there are times when you have to use them.
I would also say to always use temporary tables in memory for 2 reasons.
1. Cut down the rows being processed.
2. Hold only the data columns required to do your updates in an unlocked table.

As in the example I have to re-calculate totals on a hugh database periodically. Users make errors on accounts and the corrections sometimes have to be applied several months later. The accounts gain interest every month so all the transactions, interest and totals have to be re-calculated. The only way to do this is using cursors as you have to know where you are in the transaction history.

I can see why Shatchard has a problem with locking but any update script can have a locking issue and if used with a bit of sense cursors get the job done.

Subject: Really???
Posted by: Jeff Moden (view profile)
Posted on: Tuesday, April 06, 2010 at 11:32 PM
Message: One poster above wrote...
_________________________________________________
I added an ORDER BY for my specific case because obviously, I wanted the balance by date like a typical register:

UPDATE @cb

SET @total = total = @total + amount FROM @cb order by [DateColumn]

That worked great.
__________________________________________________

I'd like to know how you avoided the error by the word "Order". Are you actually using SQL Server?

Subject: Ordering Of Records with Update Method
Posted by: Shane Stone (view profile)
Posted on: Friday, April 23, 2010 at 2:34 AM
Message: I realise it has been a couple of years since the original question wwas posed, but I thought I'd add some comments about the UPDATE method and the ordering thereof.

It does indeed work non-deterministically IN GENERAL. i.e. You don't know which record order it is going to use as it accumulates. You really only see this behaviour on extremely large data sets on multi-processor machines, but it is important to know how the solution scales.

With current versions of SQL Server at least, there are things you can do to ensure that it works deterministically.

1) The only index defined on the table being updated should be a clustered primary key.
2) The primary key order should be the order in which you want to update the records.
3) You should use OPTION(MAXDOP 1) to prevent SQL Server trying to process different parts of the table in parallel.

That said, the UPDATE method is really only applicable to simple tasks. For more complex behaviours where some knowledge of past records is required, CURSORS are definitely a better option. Robyn's tips on how to optimise them are an ideal starting point.

Subject: Ordering Of Records with Update Method
Posted by: Shane Stone (view profile)
Posted on: Friday, April 23, 2010 at 4:53 AM
Message: I realise it has been a couple of years since the original question wwas posed, but I thought I'd add some comments about the UPDATE method and the ordering thereof.

It does indeed work non-deterministically IN GENERAL. i.e. You don't know which record order it is going to use as it accumulates. You really only see this behaviour on extremely large data sets on multi-processor machines, but it is important to know how the solution scales.

With current versions of SQL Server at least, there are things you can do to ensure that it works deterministically.

1) The only index defined on the table being updated should be a clustered primary key.
2) The primary key order should be the order in which you want to update the records.
3) You should use OPTION(MAXDOP 1) to prevent SQL Server trying to process different parts of the table in parallel.

That said, the UPDATE method is really only applicable to simple tasks. For more complex behaviours where some knowledge of past records is required, CURSORS are definitely a better option. Robyn's tips on how to optimise them are an ideal starting point.

Subject: Complex tasks are not a problem...
Posted by: Jeff Moden (view profile)
Posted on: Sunday, June 13, 2010 at 6:27 PM
Message: Shane Stone wrote:
"That said, the UPDATE method is really only applicable to simple tasks. For more complex behaviours where some knowledge of past records is required, CURSORS are definitely a better option."

Absolutely NOT true! ;-) What type of million row "more complex task" would you like to see the Quirky Update work on?

Subject: Re: Complex tasks are not a problem...
Posted by: Phil Factor (view profile)
Posted on: Monday, June 14, 2010 at 8:51 AM
Message: Quite so. I have a parser working in SQL using 'Quirky Update'. It is much faster than the cursor version and even maintains several 'stacks'. I haven't come across a task that is too complex for this method. I do agree with Robyn and Shane that cursor methods are sometimes misrepresented by people who don't set them up properly, but even when one does cursors properly, 'quirky Updates' usually have the edge.

Subject: Cursors are bad right????????????????????
Posted by: Paul_Ch (view profile)
Posted on: Friday, September 10, 2010 at 11:50 PM
Message: Cursors are supposed to bad choices for databases
right??????
You go for cursors only when your set theory is not strong......

Subject: Re: Cursors are bad right?
Posted by: Phil Factor (view profile)
Posted on: Saturday, September 11, 2010 at 10:46 AM
Message: This is not entirely true.
In the recent SQL Speed Phreak contest, ...
http://ask.sqlservercentral.com/questions/17499/phil-factor-speed-phreak-challenge-6-the-stock-exchange-order-book-state-problem
...the problem was such that no set-based solution was possible. A cursor-based solution was the only TSQL entrant. A WHILE loop would have sufficed but the cursor solution was faster, surprisingly. I've often come across problems like this where no set-based solution is possible.


Subject: Is It Kosher?
Posted by: Lee (view profile)
Posted on: Monday, September 20, 2010 at 7:24 AM
Message: One of my first SQL Server assignments when I arrived nine years ago at my present job was to find a method for auditing transactions in our job-tracking software, a COTS product named "HEAT". Because we couldn't exactly open up the delivered executables and perform custom-code bypass surgery as we saw fit, I didn't see much of an alternative to using triggers. It's like one of those old-fashioned coffee percolators: you don't have to know what's going on in the pot to see the data as it bubbles through.

The real problem came once I realized that it was necessary for the triggers on different tables to communicate with each other. As an Oracle refugee, I remember being somewhat aghast at the limited options afforded by SQL Server trigger code -- SQL Server may be competitive with Oracle in myriad ways, but trigger features are not among them. Oracle features not just AFTER triggers, but also true BEFORE triggers (rather than the cheap, imitation INSTEAD OF triggers). They also give you a row-by-row execution option, i.e., it doesn't have to be set-based. Best of all, you can declare an external package and use it to stash information that can be accessed by other triggers -- sort of like using a global bulletin board.

But none of this was available. I considered keeping work tables dedicated to trigger communication, and that may have been the best solution. But it's not the one I chose.

I discovered what I thought at the time to be an easier solution: global cursors. I used global cursors as I would have used a package declaration in Oracle: as my global bulletin board. E.g., a positive test for the existence of a global cursor told the trigger that some sort of related operation was already in progress in another trigger; and, if there was a need to know more, the cursor could be opened and information pulled from it. E.g., from one trigger, you could pass a new IDENTITY value resulting from an INSERT into another trigger like this:

DECLARE c_histseq CURSOR GLOBAL
FOR
SELECT [local variable] histseq
FOR READ ONLY

And as a given operation finished up on its tasks, it would close and deallocate the cursor.

We moved on to another job-tracking product about a year ago, so now our HEAT triggers are adorned with cobwebs and sit sullenly dormant in one of our OBE folder. We never ran into any problems with the triggers and cursors, but perhaps that was just luck.

A couple of quick takes on other comments:

To "What a Crock!" -- if you have to run each row in a row set through a stored procedure and are not free to recode the procedure, it seems to me a cursor is necessary.

Simulating a cursor by interatively and incrementally walking a set of rows in a table is not generally an improvement over cursors, as others have noted. I discovered that empirically several years ago, and it was corroborated by one of the big guns at PASS (Ben-Gan perhaps?) last November. He also said that Microsoft could improve cursor performance very easily if they wanted to do so, but refuse because they want to discourage database programmers from thinking only in terms of row-by-row logic. Their instincts may be correct. Having lived in the Oracle world, I observed that PL/SQL is so powerful, many excellent PL/SQL programmers actually never employ set-based logic. Microsoft may be trying to avoid such a situation.

Subject: Why SQLServer is different from Oracle
Posted by: BuggyFunBunny (view profile)
Posted on: Monday, September 20, 2010 at 1:38 PM
Message: -- Microsoft may be trying to avoid such a situation.

I'd argue this difference lies as much with each DB's legacy as anything else. Oracle (and DB2, all versions) came up through the COBOL world, whereas SQLServer didn't. I've seen lots of SQLServer used just as iteratively in java, but those folks don't actually realize what they're doing!

 










Phil Factor
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL Server... Read more...



 View the blog
What's the Point of Using VARCHAR(n) Anymore?
 The arrival of the (MAX) data types in SQL Server 2005 were one of the most popular feature for the... Read more...

SQL Source Control: The Development Story
 Often, there is a huge difference between software being easy to use, and easy to develop. When your... Read more...

How to Import Data from HTML pages
 It turns out that there are plenty of ways to get data into SQL Server from websites, whether the data... Read more...

SQL Scripts Manager: An Appreciation
 SQL Scripts Manager is Simple-Talk's present to its readers. William Brewer was an enthusiastic... Read more...

Hosted Team Foundation Server 2010 Review
 Team Foundation Server (TFS) has expanded its remit to support the whole software development process,... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Creating CSV Files Using BCP and Stored Procedures
 Nigel Rivett demonstrates some core techniques for extracting SQL Server data into CSV files, focussing... Read more...

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk