Click here to monitor SSC

Tony Davis is an Editor with Red Gate Software, based in Cambridge (UK), specializing in databases, and especially SQL Server. He edits articles and writes editorials for both the Simple-talk.com and SQLServerCentral.com websites and newsletters, with a combined audience of over 1.5 million subscribers. You can sample his short-form writing at either his Simple-Talk.com blog or his SQLServerCentral.com author page. As the editor behind most of the SQL Server books published by Red Gate, he spends much of his time helping others express what they know about SQL Server. He is also the lead author of the book, SQL Server Transaction Log Management. In his spare time, he enjoys running, football, contemporary fiction and real ale.

Do Scalar UDFs give SQL Server a Bad Name?

Published 13 November 2009 4:24 am

Many developers seem to regard SQL Server as if it were a science-fiction alien planet where unsuspecting crew-members in blue jumpers occasionally die horribly; everything is suddenly unsafe, and potentially malicious: nothing really works properly and so any serious code should be kept well away from it. Is this developer ignorance, or is their fear justified?

The answer is, of course, somewhere in between; but SQL Server certainly does itself no favors by tolerating, and avoiding, features with ‘known problems’, rather than fixing them. It continues to ignore issues such as the performance of scalar UDFs. The problem is essentially that the optimizer deals with them very inefficiently. It generally can’t decompose and rearrange the function as if it were inline, so the function gets executed once for every single row returned. This often has a huge performance impact.

Very recently, I had a conversation with a developer who had encountered exactly this problem during a commercial development project. It was a very familiar tale. As he said it to me, he was just a .NET guy, not a “SQL Server expert”, but it was clear that he was simply astonished that something as fundamental as encapsulating reusable logic in a function could cause such drastic performance problems in SQL Server.

The problem has been known ever since UDFs were introduced but it has never been resolved, despite an apparently reasonable and supported Connect request. The reason for Microsoft dragging its feet on this issue appears to be that the workaround is relatively straightforward. It sometimes takes a bit of thinking and ingenuity but, in most cases, a developer can turn a scalar UDF into an inline UDF, as documented by Itzik Ben-Gan, among others.

Seasoned Database Developers just know about the problem and avoid it. One can blame developers that fall foul of the dangers of UDFs for not making the effort to understand how SQL Server works before using it, but on the other hand, if SQL Server cannot make even these basic concessions to universally-accepted ideas of sound programming practices, then maybe it can’t complain about the feeling of apprehension amongst developers who use it only occasionally. UDFs are not the only flawed feature that adds to the perception that SQL Server is quirky. For years, error handling was another, lack of good documentation tools, and so on.

It isn’t even a good idea for seasoned Database Developers to just know about the problem and avoid it. Problems such as UDFs and SELECT..INTO get mythologized, and DBAs start to flinch from even the most harmless usage of flawed parts of the language and avoid features long after they are fixed. Given how persistent such myths can be, even if Microsoft fixed scalar UDFs tomorrow it could still be another 10 years before the “functions don’t work in SQL Server” myth worked its way out of the system, and developers and DBAs were convinced that they were safe to use again.

Cheers,

Tony

9 Responses to “Do Scalar UDFs give SQL Server a Bad Name?”

  1. timothyawiseman@gmail.com says:

    A beautiful post, and I could not agree more. One of the main functions of a good compiler (or query-optimizer as the case may be) is to optimize the code provided as much as possible. This becomes even more significant in a largely declarative language like T-SQL.

  2. paschott says:

    I can definitely see both sides of this one. When used with caution, UDF’s are great. They allow us to add calculations that can be re-used very easily. I know that I really appreciated getting them. The downsides are pretty much everything you mention here – they don’t scale up. They work reasonably well for a small data set, but when you need to run the “Age” function on every row in your multi-million row data set, it is painful. Add in the fact that the estimated query plan often ignores these calculations and you’ve got even more pain. (Thanks to Grant Fritchey for those reminders @ PASS.)

    I’d still argue that there’s a lack of good documentation tools. The DB Diagram tool isn’t horrible, but I don’t like the fact that it actually has to save those changes to the database in order to work. I’d much prefer to make the changes and commit when I’m ready.

    Nice tie-in to your prior post, by the way. Hopefully MS will address that Connect ticket sooner rather than later.

  3. PocketJoshua says:

    Tony your description of how these misconceptions and lack of understanding gets created and perpetuated is spot on. Additionally, thanks for the link to Itzik’s article. It may be a few years old but it is still excellent content.

    My user name link goes to more comments on this post and a personal tribute to Itzik’s impact on my career.

    Joshua
    {my apologies for the double post but editing does not seem possible}

  4. Granted says:

    Excellent thoughts. Unfortunately,the momentum is towards more and more marginalization of the database platform through ORM tools and stuff like Microsoft Dynamics. That’s even as these tools are found to do horrible things to the database, making performance, maintenance, data integrity and recoverability either difficult or impossible. Relational data stores with structured data still have a very well defined purpose within the majority of organizations. That means that these database systems, warts & all, are going to continue to be created and grow. Rather than trying harder and harder to hide them away or ignore them, we need to get developers to embrace the fact that they’re there, not going away, and that they need to be dealt with on the terms of the database system. As long as the trend is towards attempting to turn the relational database into an object persistance layer, we’re going to see more and more database problems. Unfortunately, like with the UDF’s, this is not going to be seen as an implimentation issue, which it is, but as a problem with the database system.

    The question I have is, how do we get off that path? Except for attempting to educate developers, which just doesn’t seem to work very well (I’m not knocking developers here, but rather our ability to reach them), I don’t have a good suggestion.

  5. BuggyFunBunny says:

    @Granted:
    As long as the trend is towards attempting to turn the relational database into an object persistance layer, we’re going to see more and more database problems.

    The question I have is, how do we get off that path? Except for attempting to educate developers, which just doesn’t seem to work very well (I’m not knocking developers here, but rather our ability to reach them), I don’t have a good suggestion.

    The use of BCNF datastores reduces the need for such UDF or as object persistence. How to get there?

    I have been seeing a way out for a few years, but the path is bumpy. The multi-core/processor machine with SSD as prime/sole storage is as near as we’ll ever likely get to a “perfect” platform for BCNF datastores. Recently (last few months, literally) storage vendors have been hedging by offering, or being forced to offer, hybrid arrays with a few “cache” SSD fronting plain vanilla HDD. Lots cheaper than a pure SSD array, and not as functional in a BCNF environment.

    That bad news said, the cost/performance curve continues to move the SSD toward HDD; what the ignorant don’t accept is that the bloat of object/xml datastores need these behemoth HDD, while normalized relational datastores do not. The SSD vendors seem to have bifurcated between “Enterprise” and “consumer”, with the Enterprise focusing on COBOL mainframe shops; I strongly suspect this is where the pressure to build hybrid arrays comes from. These folks are never going to refactor a VSAM file dump into DB2/Oracle/etc., so a pure SSD array will be largely wasted; the same is true of the object/xml zealots.

    On the other hand, smaller value added resellers, particularly those that ship a machine configured for their vertical, have a motivation: wring the most performance out of the fewest resources with the least support headache. They have a better mouse trap staring them in the face, and they’ll take it. In this sense, SQLServer is more likely the database of choice than DB2 or Oracle. Windows, too, alas.

    So, platforms like Oslo, all of which I’ll characterize as having the goal of generating the UI from the database (taking Date’s command that a row is a business rule), as they gain mindshare will inevitably shift focus back to the datastore as control. It won’t happen soon, I don’t think. But the burden of bytes and code imposed by the object/xml crowd will eventually cause collapse. It isn’t happenstance that none of the Object Databases has thrived. Morphing a Relational Database into an act-alike of an OODMBS will only make matters worse for the implementations.

    There is a better way, clearly. Getting authorization from The Powers That Be will be the issue.
    Once the pendulum starts to swing toward the pure SSD array, we can use it to build (or refactor to) BCNF datastores that blow the doors off anything else. Rational managers (there must be some, right?) will sign on the dotted line.

  6. Matt51F1 says:

    What I would like to know is where do these third-party software vendors find their developers who know so little about programming efficiency into a query.

    Surely even the most inexperienced of programmer knows that a UDF is meant to be something small, simple and intended to replace repetitious code that may need to be run many, many times – and yet, most seem to not understand this concept.

    There is one vendor we deal with here in Australia that ships UDF’s to us that, at best, use mismatched variables (sending a numeric, receiving it as a varchar, comparing the received variable implicitly with a numeric) and, at worst, have functions that average 300 lines and even run to 800 lines in length with extensive use of WITH(NOLOCK) in the multiple-join queries that are often part of cursors-within-cursors inside the function.

    When a problem arises with the application, they then have the hide to suggest that it is our installation that is the problem.

    I have rewritten some of the UDF’s but I’m not about to do all of the work they should be doing and allowing them to charge us for the privilege.

    I’ve even described the UDF’s to a couple of MVP’s and they visibly cringe before even seeing the code…. :(

    What is the difference between a developer and a computer?
    You only need to punch the information into a computer once.
    (addendum: … if you have a half-decent DBA :)

  7. Anonymous says:

    Tools such as Visual Basic (VB) and Visual FoxPro succeeded and became wildly popular because they provided…

Leave a Reply