Click here to monitor SSC

Tony Davis is an Editor with Red Gate Software, based in Cambridge (UK), specializing in databases, and especially SQL Server. He edits articles and writes editorials for both the Simple-talk.com and SQLServerCentral.com websites and newsletters, with a combined audience of over 1.5 million subscribers. You can sample his short-form writing at either his Simple-Talk.com blog or his SQLServerCentral.com author page. As the editor behind most of the SQL Server books published by Red Gate, he spends much of his time helping others express what they know about SQL Server. He is also the lead author of the book, SQL Server Transaction Log Management. In his spare time, he enjoys running, football, contemporary fiction and real ale.

SQL Code Reuse: teaching a dog new tricks

Published 1 September 2011 5:42 pm

Developers, by every natural instinct and training, strive to make their code reusable and generic. Dissuading them from doing so, in certain circumstances, is akin to trying to persuade a dog not to fetch a stick. However, when Gail Shaw commented on Twitter last week that "For the record, code reuse in SQL.is not always a good thing", it was more than a causal remark; it was borne out of bitter experience.

The problem is that, in the absence of their usual armory O-O techniques such as encapsulation and inheritance, the price of making database code easier to maintain, by such obvious methods, can be high. The "generic" views, stored procedures and functions that result, may seem elegant and reusable, but can destroy performance, because it is tough for the query optimizer to produce an efficient execution plan. It hurts to make SQL code generic.

At some point, nearly every SQL Programmer gets infected with the feverish idea of passing table names to stored procedures. "Hey, why write scores of procedures to do this process on each table when I can write a generic, reusable procedure that does it on any table!" Bad idea; behind every stored procedure is an execution plan and a stored procedure designed to work with "any table" will result in a generic execution plan that will perform very poorly for a majority of tables. It is far better if they are tailored for specific tables and specific needs.

Another typical example is where the logic for some seemingly-complex calculation has been "abstracted" into a monstrous, but reusable, view, which performs tortuous aggregations and multiple joins, executes appalling slowly, acquires numerous long-held locks and causes severe blocking in the database. Often, such twisted logic can be replaced by simple, easily optimized SQL statements. Granted, it isn’t "reusable" and flaunts the ‘DRY’ (Don’t repeat yourself) principle, but it is relatively easy to write and will often perform orders of magnitude faster.

User-defined Functions (UDFs) are another favorite mechanism for promoting code reuse, and are often even more problematic. In-line logic is always much faster, even if to the sensitive developer it has the look of hippos doing line-dancing. Memories of the overuse of UDFs can make any seasoned DBA flinch. If you ever bump into Grant Fritchey at a community event, buy him a beer and ask him about the case of the application with multi-statement UDFs that called other multi-statement UDFs in an attempt at enforcing inheritance in a database. Also ask him how well it scaled beyond a single-user and a single row.

Should SQL Server simply get better at adopting and supporting such basic and universally-accepted programming practices as putting logic in a function? Probably, yes, but in the meantime, we must measure any code reuse in the database against the likely performance penalty.

Perhaps the most effective form of code reuse is via constraints, though it requires lateral thinking to extend this beyond simple data rules. Functions can be used, but extra care and effort is required to write them as inline functions; in-line code or calculated columns will always outperform UDFs. Stored procedure use is to be actively encourage; just don’t try to make them generic.

On Simple-Talk we’ve published a lot about execution plans, query optimization and performance. We believe that, once a developer is aware of the process, they are better able to judge that fine balancing point in the compromise between performance and maintainability. Even better, we hope we’ve also given a glimpse of an alternative path to those goals, by means of intelligent database design. A neat trick, if you can do it.

Cheers,

Tony.

8 Responses to “SQL Code Reuse: teaching a dog new tricks”

  1. dlambert says:

    Great thoughts. I think the critical pivot point here is whether you’re a DB-centric thinker or a code-centric thinker. The bulk of the code reuse and sophisticated OO techniques have a place in one or the other of these worlds, but it’s not clear to me that you really want them in both places at the same time.

    If you’re a DB-centric guy, then the DB, including its stored procs, is the single source of truth and knowledge in your application. You should be able to support any number of applications connected to this DB, and each of them can be designed and coded as they see fit because your DB-centric model ensures data integrity and consistent execution of business rules. In this case, you’d really rather have your app let the DB handle business logic in order to ensure a consistent experience for the users, so the app ends up being pretty simple with respect to business rules.

    In a more code-centric model, you still have a solid DB design with FK relationships and so on, but a lot of the reuse scenarios you’re talking about move into the code. If you choose to use stored procs for CRUD operations, there’s even a decent chance that you can just generate most of them (which is a form of reuse). At this point, there may still be higher-order business functions in the DB, but they’re going to be the exception rather than the norm.

    I really believe that if you try to emphasize “full” business logic in both places, you end up with the potential for conflicting interpretations of business rules and a great many maintenance headaches. It’s better to pick one to lead and the other to follow, in my opinion.

  2. paschott says:

    Definitely remember the “one view to bind them” view we had in one of our systems. Everything was based on that view. Views on top of views which led to some horrible maintenance and performance. It probably sounded good at the time, but it also led to a new policy about DB Code and Design reviews – while still in development of the product. We have a similar problem with UDFs, but I can see why in that case. Trying to ship code quickly, we have a need to repeat the process. With SQL 2000 we didn’t have a great way to get the “first” value for different datasets. We can re-write to use some OUTER APPLY commands, but that is a bit painful even if it does perform better. We have a LOT of code that uses those functions and would need to be replaced.

    As I’m unlikely to run into Grant anytime soon (except online), I can only hope that he can find a way to blog about his experience. Sounds like a great story.

  3. BuggyFunBunny says:

    – Should SQL Server simply get better at adopting and supporting such basic and universally-accepted programming practices as putting logic in a function?

    No. The point of SQL, despite Codd’s objection that it never really did it, is to be a data/logic language. As such, it’s really a 4 or 5 GL. Using SED or Awk (on *nix or through windows ports) one can produce reams of perfectly accurate SQL adapted to each table/join/etc. Hell, I’ve done it in SQL alone.

    – If you ever bump into Grant Fritchey at a community event, buy him a beer and ask him about the case of the application with multi-statement UDFs that called other multi-statement UDFs in an attempt at enforcing inheritance in a database. Also ask him how well it scaled beyond a single-user and a single row.

    If ever there were an archetype of coders who shouldn’t be let near a RDBMS, this is they. Read Celko’s “Thinking in Sets”, again. As Codd/Date/Pascal/I have been saying for years: a row is a business rule, and code which implements logic is merely bloating the data compares with excess verbiage; cut out the middleman and put the logic in the data.

    – Perhaps the most effective form of code reuse is via constraints, though it requires lateral thinking to extend this beyond simple data rules.

    Ah. My previous comment was made as I read. I think we agree.

    And, once again: the future is multi-core/processor/SSD machines. Which is to say (nearly) cost free joins. Read Celko’s current piece, and clean up your schemas. You’ll be doing humanity a favour by making redundant a cast of thousands of muddle headed coders; let them write pretty, pretty input GUIs and that’s all.

    And now that I’ve read the other comments (wouldn’t want knowledge to get in my way), a Tip of The Hatlo Hat to dlambert.

  4. Phil Factor says:

    Developers have recenty discovered the virtues of DRY. (don’t repeat yourself) Great! Hopefully now they will ‘get’ the whole point of the normalisation process. In C# you do this with arcane manipulation of objects, and we do it through the normalisation process . I use pretty well bracket-infested inline expressions but damme sir, I only do it in one place.

  5. sciszewski says:

    We switched from using User Defined Functions to encapsulate code to inline code in our stored procedures and were able to speed them up by orders of magnitude! Similarly, using a UDF within a SQL statement should be avoided, unless there’s no other way to do what you need.

    That’s been my experience.

  6. acuenca says:

    Thanks for the article, It’s very interesting.

  7. Ross Presser says:

    Nitpick: One who ignores a principle is not “flaunting” it. He is “flouting” it.

    http://public.wsu.edu/~brians/errors/flaunt.html

  8. greentian says:

    The first step to code ‘reuse’ is a clear, consistent, and normalized database design. With that established you can get a lot of code ‘reuse’ out of generating the standard code for the CRUD procedures and other standard procedures that you might need. Why write code you don’t need to write. The metadata of the data model, plus any rules that you decided on, provide the structure for all of the generated procedures.

    We certainly do not need the deep, heavy, programming layers. The database code should ideally be the ‘assembler language’ of a database application. As clean and fast as possible. But tools can help manage the code.

    One tool I believe in is a preprocessor. This allows you to create macros, include predefined constants, and so forth that can be inserted into the inline code. Just think of this as saving youself effort retyping the same thing over and over. Your source control objects are, of course, the preprocessed versions of the code.

    Alex Kuznetsov (http://freecomputerbooks.com/Defensive-Database-Programming-with-SQL-Server.html) is a strong proponent of preprocessing code.

Leave a Reply