26 July 2007

RBAR: ‘Row By Agonizing Row’

Remi Gregoire describes the vice of RBAR database programming, 'Row By Agonising Row', and illustrates how the effects of RBAR can sometimes be felt only years after an application is released, when the database supporting the application grows.

RBAR is an acronym for ‘Row By Agonising Row’. The term was coined by Jeff Moden, a very regular poster on SQLServerCentral.com.

If, like me, you have not had the valuable experience of working in a Very Large Database (VLDB) environment, you may have done some ‘Row By Agonising Row’ (RBAR) programming without seeing too many bad results. But as time goes by and the databases grow larger, the consequences of this style of coding become increasingly felt: you start getting help calls about the applications because they are getting slower by the minute, or even, apparently, not working at all.

RBAR is a consequence of coding in a strictly procedural way, rather than in a set-based way. It is different from poor coding; it is the result of adopting a mindset that one always has to tell the computer, step by step, how to do something rather than, in a set-based approach, merely specifying the result one is aiming for. With relational databases, one tells the database what one wants, not how to do it, row by agonising row.

RBAR in Front-end code

When I first joined a company, my very first job was to do programming work on projects that used Access as the front-end and SQL Server as the back-end. I found several examples of RBAR processing in the front-end code. There was almost no back-end code on the server apart from views and a handful of simple stored procedures.

The team of programmers were all beginners, and they believed that they had programmed a system that was working correctly. It did the job that they wanted it to do, correctly and in a timely manner. It gave the output they needed. That, however, was a few years and 500,000 inserts before my arrival. The basic logic in every function was something like this:

Open every table we might conceivably need to read from or write to, download all columns and all rows, then loop through the lot to do the job.

The code certainly worked when there were 1,000 rows in the database tables. Now, with nearly 1 million rows, the loops inside the loops inside the loops… nested 19 deep, took more and more time, both on the server and in the application, and wore on the nerves of the people who used it. The application was being forced to process the data in a particular way that was fine for a small data set but ridiculous for a large one.

My solution back then (4-5 years ago) was to reduce to a minimum the number of rows that were downloaded from tables on the server. Since I did not want to mess too much with this critical part of the application, I stopped my optimisation there. This minor alteration made the application run almost 10 times faster. I thought that this was not bad for a junior and, since everyone else was happy, I didn’t push the envelope further.

Now fast forward three or four more years and 400,000 more inserts into the various tables. The performance problem is as bad as, if not worse than, it was when I first joined.

A simple example may give you an idea. We have a ‘Bill of Work’ system where four buttons are present on a form: ‘Bill hours’, ‘Bill the rest’, ‘Print’, and ‘Cancel’. The ‘Cancel’ button just changes a few columns in the base table and deletes a few rows in a handful of other tables. Simple. However, because of the way the system was designed, it took eleven seconds to run this process on one of the larger Bills of Work, and around six seconds for the rest. This was a part of the application I had never touched at all, but I clearly remember that, when I first joined, this process took no more than a second. I would now consider even this to be extremely slow, but nobody at the time saw it as a problem.

I got back into this code last week to see what I could do with this button. The first thing that I noticed was that six recordsets were being opened at the very start of the function. Those recordsets then downloaded four of the biggest tables in the database. This part, by itself, took around five seconds.

There was a two-way logic in this function depending on which flags were set in the Bill. Depending on those flags, three, and only three, of the tables were ever used in a given run; you would never require all six. I therefore moved each table download into its respective IF statement, which shaved a few seconds off the time. I then started to analyse the code. Four of the six tables were opened merely to delete rows. The code looped through all the rows, one row at a time, checking whether the ID matched the ID of the current Bill of Work; when it did, it deleted the row. Seeing this, I deleted about 100 lines of code and inserted 4 lines of my own which read something like this:
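In essence, each client-side loop was replaced by one set-based DELETE per table. A sketch of the idea, with hypothetical table and column names:

```sql
-- One set-based DELETE per detail table, instead of a client-side loop
-- (table and column names are hypothetical)
DELETE FROM dbo.BillOfWorkHours WHERE BillOfWorkID = @BillID;
DELETE FROM dbo.BillOfWorkParts WHERE BillOfWorkID = @BillID;
DELETE FROM dbo.BillOfWorkNotes WHERE BillOfWorkID = @BillID;
DELETE FROM dbo.BillOfWorkAudit WHERE BillOfWorkID = @BillID;
```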

(The logic was more complex than that, but you get the idea.)

After I deleted the code that downloaded those 4 tables, I made another test run and was pleased to see that the code now ran in less than 2 seconds.
“That’s great,” I thought, “a five-fold increase in speed. Let’s go for ten-fold now.”
I changed the rest of the code so that the updates were done in a single batch rather than row by row. This shaved about 1.5 seconds off the remaining time.
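Batching the updates like this boils down to issuing one set-based UPDATE instead of touching each row through a recordset. A sketch, with hypothetical table and column names:

```sql
-- One statement updates every affected column of the Bill in a single batch
-- (table and column names are hypothetical)
UPDATE dbo.BillOfWork
SET    Status      = 'Cancelled',
       CancelledOn = GETDATE()
WHERE  BillOfWorkID = @BillID;
```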

Having done that, I moved all that logic into a stored procedure on the server, and changed the front-end code to call that stored procedure. After a quick check of the execution plans and, as a result, adding two more indexes, this procedure now runs in less than 0.10 seconds (this also includes auditing and a re-query of the Bill of Work to display the changes). My final version takes about a quarter of the lines of code to do the exact same thing. And now that the code is server-side, a change to this procedure will not require a redeployment of the application.
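Wrapped up server-side, the whole Cancel operation becomes a single procedure call from the front-end. A minimal sketch, with hypothetical object names:

```sql
-- Minimal sketch of the server-side Cancel procedure
-- (procedure, table and column names are hypothetical)
CREATE PROCEDURE dbo.usp_CancelBillOfWork
    @BillID int
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
        UPDATE dbo.BillOfWork
        SET    Status = 'Cancelled'
        WHERE  BillOfWorkID = @BillID;

        DELETE FROM dbo.BillOfWorkHours WHERE BillOfWorkID = @BillID;
        DELETE FROM dbo.BillOfWorkParts WHERE BillOfWorkID = @BillID;
    COMMIT TRANSACTION;
END
```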

The conclusion is that SQL Server was designed to work with sets of data. It is more efficient at deleting all rows where FK = 10 than at deleting them one at a time (first the row where PK = 1, then the one where PK = 2, and so on). The set-based approach will be faster than RBAR over 99.99% of the time. I know of only a couple of exceptions where RBAR is at worst as fast as set-based and, at best, outruns the hell out of it.
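To make the comparison concrete, here is that delete expressed both ways (table and column names hypothetical):

```sql
-- Set-based: one statement handles every matching row
DELETE FROM dbo.Detail WHERE FK = 10;

-- RBAR: one statement (and often one client round trip) per row
DELETE FROM dbo.Detail WHERE PK = 1;
DELETE FROM dbo.Detail WHERE PK = 2;
-- ...and so on, row by agonising row
```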

RBAR in a trigger

So what is RBAR in a trigger? Let’s say that you want to set a flag to true on all newly inserted rows. The RBAR code could look something like this:
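A sketch of such a trigger, with hypothetical table and column names; note that it reads a single value from the INSERTED pseudo-table into a variable:

```sql
-- Broken RBAR trigger: with a multi-row insert, only one EmployeeID
-- (typically the last) survives the variable assignment
-- (table and column names are hypothetical)
CREATE TRIGGER trg_SetNewFlag ON dbo.Employees
AFTER INSERT
AS
BEGIN
    DECLARE @ID int;
    SELECT @ID = EmployeeID FROM INSERTED;

    UPDATE dbo.Employees
    SET    IsNew = 1
    WHERE  EmployeeID = @ID;
END
```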


So, happy in the knowledge that it seems to have worked, you go home and have a good night. The next morning you come in to find the boss waiting in your parking space. He’s fuming because the code you put into production is not working. After arguing that you tested the code and it was working yesterday, you ask him to show you what query he ran. He shows you this query:
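A multi-row insert of this sort (sketched here with hypothetical names) is all it takes to expose the bug:

```sql
-- Insert several rows in one statement: the trigger fires once,
-- not once per row (table and column names are hypothetical)
INSERT INTO dbo.Employees (EmployeeName)
SELECT 'Alice' UNION ALL
SELECT 'Bob'   UNION ALL
SELECT 'Carol';
```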

To your dismay, you see that only one row, the last, is updated by the trigger. No matter how many rows that insert statement supplies, only one, the last, will be updated. You go back, check your work, and realize that, since you set the variable only once and do a single update, you can’t possibly update more than one row. So you come up with this solution:
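A cursor-based rewrite, sketched with hypothetical table and column names, loops over every row of INSERTED:

```sql
-- RBAR trigger, take two: a cursor walks INSERTED one row at a time,
-- issuing one UPDATE per row (table and column names are hypothetical)
CREATE TRIGGER trg_SetNewFlag ON dbo.Employees
AFTER INSERT
AS
BEGIN
    DECLARE @ID int;

    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT EmployeeID FROM INSERTED;

    OPEN cur;
    FETCH NEXT FROM cur INTO @ID;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Employees
        SET    IsNew = 1
        WHERE  EmployeeID = @ID;

        FETCH NEXT FROM cur INTO @ID;
    END
    CLOSE cur;
    DEALLOCATE cur;
END
```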


This works perfectly, you think, and so you push it into production. Six months later, you again find your boss waiting in your parking spot. Your company has recently merged with a huge corporation with over 100,000 employees, and the new company has decided to merge their data into your system and use your systems from now on. However, when they run a statement to insert all of their employees’ information, the server hangs for several minutes without any response. They kill the query and try again and again, with the same results.

You retest your trigger with a few rows and it seems to work fine. Then you realize that, in order to update a single row, the server first has to do a SELECT FROM INSERTED (which is actually read from the logs, and is a costly operation), then an update. This works fine for a few rows, but with an insert of 100,000 rows the server actually has to run over 200,000 queries to produce the correct results and finish the insert. I will not cover all the other administrative pitfalls in this article (such as file growth or tempdb), but there are quite a few to avoid.

This insertion operation is not small by any standards, but it should not take several minutes, or even seconds, to run. You do some research and find a way to do it in a single UPDATE statement that should speed things up a lot:
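The set-based version joins INSERTED back to the base table and updates every new row in one statement. A sketch, with hypothetical table and column names:

```sql
-- Set-based trigger: one UPDATE flags every newly inserted row,
-- whether the insert supplied 1 row or 100,000
-- (table and column names are hypothetical)
CREATE TRIGGER trg_SetNewFlag ON dbo.Employees
AFTER INSERT
AS
BEGIN
    UPDATE e
    SET    e.IsNew = 1
    FROM   dbo.Employees AS e
    INNER JOIN INSERTED  AS i
            ON i.EmployeeID = e.EmployeeID;
END
```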

Again, you see that the results are correct with this version of the trigger. But what about the speed? I will enable the second trigger and run a 10-row insert, and I’ll let you judge the difference in the execution plans of each trigger:

Unfortunately, even with a modest insert of 10,000 rows, the final run time is just unacceptable. In a VLDB environment, this trigger would have been a bottleneck for the application right from the start because, even at one row, the trigger still has to do two operations instead of one (if you don’t count declaring and destroying the variable as an operation). Now, if you have the courage, or a very powerful machine, I will let you try the RBAR trigger at 1 million rows. I won’t attempt this on my machine because it would take a few days to run (assuming I don’t run out of RAM first).

RBAR Bad, Set-based good!

To conclude, I have given you a little taste here of what RBAR processing can do to a server. I’m using an archaic P III 1.3 GHz with 512 MB of RAM for these tests. Even though it’s a 10-year-old machine, it’s still powerful enough to run fairly complex statements at a very good speed, and I’m sure most of you will get much better results than I did, even on your personal notebooks.

I often see questions about this subject on the forums of SQLServerCentral.com, and I just can’t seem to get the point across that any type of RBAR is usually a very bad idea if you want any kind of performance out of your application. The RBAR concept applies to cursors and to any looping technique, wherever the code sits, whether on the server or on a client machine, as was the case for me at my first job (here I’m only referring to client code that interacts with the server from within the loop). I’m not saying, by any means, that looping should be banned from programming, but rather that programming a database generally requires you to specify accurately the result you want, rather than telling the database how to go about getting it, row by agonising row.





Remi operates from Sherbrooke, Quebec, Canada. His formal training as a web programmer specialised in database systems. He has been working with SQL Server 2000 for three years and has been helping out on SQLServerCentral.com for about the same amount of time. He is currently rebuilding a company's system, moving from Access ADPs + SQL Server 2000 to C# + SQL Server 2005.


  • Eric Russell

    When I’m interviewing…
    When I’m interviewing someone for a database developer position, I will ask:
    “How much experience have you had developing T-SQL cursors?”
    How they answer this question can be very revealing.

  • Mal Daughtree

    When I was interviewed…
Yup, there is a better way. My mission in life: educate people to the set-based way.
    Remi, Excellent article, should be compulsory reading for those who believe otherwise.

  • Jesse

    “Jeff Modem” should be “Jeff Moden” (oops -now fixed Ed:)

    I don’t care what you say about me, as long as you say something about me, and as long as you spell my name right. –George Cohan

  • GSquared

    The only case I know of where RBAR is faster than set-based, is running totals. I’ve seen several tests on this, and several articles on several pages, and I’ve tried it out myself. But other than that kind of case, where data depends on the data in the rows preceding it in some specific order, set-based is definitely the way to go.

  • Kilo Bravo

    RBAR: ‘Row By Agonizing Row’
How many times have you seen management throw hardware at the problem? They do so, of course, because it is usually cheaper to do that than to rewrite RBAR code into set-based code.

  • Ken3

    Re: RBAR
Not even running totals are better in RBAR. They can be done quicker, and set-based, with the ‘quirky update’ trick described by Robyn Page in her Cursor Workbench: http://www.simple-talk.com/sql/learn-sql-server/robyn-pages-sql-server-cursor-workbench/

  • RgR’us

    Running totals
    There are 2 things about running totals :

1 – The quirky update method is very fast and will, at worst, be only a tad faster than the cursor (from memory).

2 – Do you really need to do running totals on the server? I know the answer will quite often be yes, especially where there is no application to present the data. But I’ve seen more than my share of users who think they need to do this stuff on the server when the application could do it more easily and save the server some CPU cycles.

  • Bubba01

Excellent article – should be a required read for all procedural developers
This should be a required read for all developers/programmers who worked with any non-SQL based language before working with SQL. I recently worked for a company that in its wisdom decided DBAs and other SQL-savvy developers were a waste of money because the VB guys with the company knew SQL and ADO. This arrogance/ignorance has led to performance problems and very bad design, for which clients are now paying the price. The company officials seem to have finally realized the error in their thinking; however, sadly, it's now too late. The DB design is so convoluted in many areas, and the program so extensive, that a major overhaul is the only way to get the DB design as it should be, and doing that would assuredly break the program and require a rewrite of the front end as well as the back end.

    Procedural developers, those who code in OOP languages like C, VB and the ‘.Net’ languages should never be allowed to make SQL/DB related decisions until they have received proper training from a certified SQL/DBA. So many of the mistakes in the current system could have been easily avoided with the most basic of training/education in SQL/Set based development. So sad, so very sad.

  • Joe Celko

    Running Totals
    You might want to look at the SUM() OVER() function in the SQL-99 standards. A running total is easy to write, but you should test on your product.

  • Jeff Moden

    So THAT’s what you look like!

    Very nice article on RBAR, Remi. And for those that don’t know… RBAR is pronounced “ree-bar”… kinda like the metal rods they stick in cement… an appropriate analogy (stuck in cement), too, I think 😉

  • Randy In Marin

It still “depends”. If the set-based query will hold too many locks and adversely affect existing processes (e.g., blocking or deadlocks), breaking up the query into smaller transactions with a cursor might be a valid option. The new set-based query might be perfection, but it still has to contend with existing code – which can be a bit less than perfect.

  • Chris

    OOP Languages
    Good point but VB is not an OOP Language. VB.Net is but not any version prior to that.

    “Procedural developers, those who code in OOP languages like C, VB and the ‘.Net’ languages should never be allowed to make SQL/DB related decisions until they have received proper training from a certified SQL/DBA.”

  • Jeff Moden

    No, it doesn’t “depend”
    Breaking up large transactions to smaller transactions doesn’t mean you have to resort to either a cursor or any form of RBAR. Yes, you can use a loop of sorts to limit the number of rows for a transaction, but the number of rows never needs to be just one and you certainly don’t need a cursor to limit the number of rows.

  • Rich

    I went back to IT to help out on SqlSvr databases.
They had two guys who knew everything. They were refreshing some JDE sales order and general ledger tables to a Business Objects database (100,000 rows by 216 columns). During the four-hour refreshes they experienced many locking issues, and customers' complaints fell on deaf ears. When I suggested four hours was a bit much for a refresh job, the answer was always 'we don't have time for that', and they squashed my suggestions.
After going to the IT Director with my ideas, he gave me carte blanche to do whatever it took. By adding one index, and copying ONLY changed rows, the job now takes less than one minute. The former database manager is no longer here and works as a consultant!