

Time for a rethink on SQL CLR?

Published 29 September 2011 12:10 pm

It is almost seven years since Microsoft announced the sensational news that, with their new SQL Server 2005, you could do data manipulation in Visual Basic or C# as well as SQL. The marketing people got overexcited, stabbing out clichés such as "new era" and "revolutionary". However, they had run off barking in a different direction to the technologists, missing the whole value of SQL CLR for providing specialized extensions to SQL Server.

Unsurprisingly, the technology was misunderstood and many DBAs have never recovered from their initial feelings of dread, still shuddering at the merest mention of the term "CLR function". In the opening passage of his recent review of SQL#, a suite of very useful SQL CLR functions for string manipulation, regular expressions, and more, Grant uses terms such as "throwing your arms up", "screaming", "crying out loud", "nuts" and "stupid", which seems a pretty fair assessment of the attitude of many DBAs toward CLR functions in SQL Server.

Several factors have added to its bad reputation. Some of it is misuse: using SQL CLR for functionality that belongs in the application tier, rather than the database. Security concerns have compounded the antipathy toward CLR routines. Much harm was also done by poor implementation, mainly on the part of the coder, but also on the part of Microsoft and SQL Server.

The worst problem is the complexity of writing effective CLR routines for high performance and scalability. The key to success is the constant streaming of data, using IEnumerable methods and the SqlDataReader, so that SQL Server never has to process too much data at once. The use, instead, of the ADO.NET DataSet, and various custom user collections, can place huge memory demands on SQL Server, involving heavy multi-page memory allocations in the non-buffer pool region. This issue, especially in 32-bit versions of SQL Server, with access to limited amounts of Virtual Address Space, was the cause of paging problems, and of specific errors such as "App Domain marked for unload due to memory pressure", or "701" errors, where SQL Server fails to allocate a contiguous region of VAS for an operation.
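To make the streaming idea concrete, here is a minimal sketch of a streaming CLR table-valued function (the list-splitting logic is purely illustrative, not a recommendation). SQL Server enumerates the IEnumerable lazily and calls the fill-row method once per item, so the full result set is never materialized in a DataSet or collection:

    using System.Collections;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions
    {
        // Streaming table-valued function: SQL Server pulls one item at a
        // time from the IEnumerable, so the result set is never buffered
        // in the CLR host's memory.
        [SqlFunction(FillRowMethodName = "FillSplitRow",
                     TableDefinition = "Item NVARCHAR(4000)")]
        public static IEnumerable SplitList(SqlString input, SqlString delimiter)
        {
            if (input.IsNull || delimiter.IsNull || delimiter.Value.Length == 0)
                yield break;

            foreach (string item in input.Value.Split(new[] { delimiter.Value[0] }))
            {
                yield return item;   // handed to SQL Server row by row
            }
        }

        // Called once per yielded object to populate the output column.
        public static void FillSplitRow(object obj, out SqlString item)
        {
            item = new SqlString((string)obj);
        }
    }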

Unfortunately, SQL Server does not necessarily make it easy for the SQL CLR programmer. SQL CLR functions cannot use a SqlDataReader over the context connection, so the efficient IEnumerable-based streaming simply isn't available, and the programmer is forced to use less efficient techniques, or to find ways around the problem, which ultimately means classifying your assembly as EXTERNAL_ACCESS, or possibly even UNSAFE.
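By way of illustration, one such workaround looks roughly like this: the table-valued function opens an ordinary loopback connection rather than the context connection, which is why the assembly must be catalogued as EXTERNAL_ACCESS. The connection string and the dbo.Customers table below are hypothetical:

    using System.Collections;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions
    {
        // Workaround sketch: a TVF cannot stream from a SqlDataReader on the
        // context connection, so open a regular loopback connection instead.
        // This requires the assembly to be marked EXTERNAL_ACCESS.
        [SqlFunction(FillRowMethodName = "FillNameRow",
                     TableDefinition = "CustomerName NVARCHAR(200)")]
        public static IEnumerable StreamCustomerNames()
        {
            // Connection string and table name are illustrative assumptions.
            using (var conn = new SqlConnection(
                "Data Source=(local);Initial Catalog=MyDb;Integrated Security=SSPI;"))
            {
                conn.Open();
                using (var cmd = new SqlCommand("SELECT Name FROM dbo.Customers", conn))
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        yield return reader.GetString(0);   // one row at a time
                    }
                }
            }
        }

        public static void FillNameRow(object obj, out SqlString customerName)
        {
            customerName = new SqlString((string)obj);
        }
    }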

If you’re someone like Adam Machanic, you persevere. His blog, and various presentations, detail his journey through writing custom IEnumerable classes to enable streaming, creating multiple threads in the SQL CLR to help deal with issues such as "rude aborts" and evolving all this into a powerful "query parallelizer" component.

If all of this is enough to discourage the rest of us from writing our own SQL CLR components, it still doesn't mean we should deny ourselves access to the useful functionality and performance that they can provide. Professionally-written and properly-tested SQL CLR libraries, such as SQL#, may cause us to refine or qualify the general anti-CLR position. Thorough testing by the likes of the author of SQL#, Solomon Rutzky, has shown that for functions that require heavy computational power, SQL CLR routines have obvious performance advantages over T-SQL. The SQL CLR gives us a "turbo-charged cursor": any type of calculation that lends itself naturally to cursor-based logic, such as the running total calculation, will run faster as a SQL CLR routine.
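As a rough sketch of that "turbo-charged cursor" idea, a CLR stored procedure can read ordered rows over the context connection and pipe each running-total row back as soon as it is computed, so only one row's worth of data is held in memory at a time. The dbo.Transactions table and its columns are invented for this example:

    using System.Data;
    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    public partial class StoredProcedures
    {
        // Sketch of a "turbo-charged cursor": read ordered rows over the
        // context connection and send each running-total row to the client
        // as soon as it is computed, holding only one row in memory.
        [SqlProcedure]
        public static void RunningTotal()
        {
            var row = new SqlDataRecord(
                new SqlMetaData("TransactionId", SqlDbType.Int),
                new SqlMetaData("RunningTotal", SqlDbType.Decimal, (byte)18, (byte)2));

            using (var conn = new SqlConnection("context connection=true"))
            using (var cmd = new SqlCommand(
                "SELECT TransactionId, Amount FROM dbo.Transactions ORDER BY TransactionId",
                conn))
            {
                conn.Open();

                decimal total = 0m;
                SqlContext.Pipe.SendResultsStart(row);

                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        total += reader.GetDecimal(1);
                        row.SetInt32(0, reader.GetInt32(0));
                        row.SetDecimal(1, total);
                        SqlContext.Pipe.SendResultsRow(row);
                    }
                }

                SqlContext.Pipe.SendResultsEnd();
            }
        }
    }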

For accessing important .NET libraries, such as the Regex functions, SQL CLR assemblies are essential. There are a host of specialized requirements where CLR can replace the clunky and antiquated OLE Automation techniques, and there is the added value of allowing new complex data types such as vectors and coordinates. Yes, the technology has its quirks, but CLR libraries are worth having. In fact, unless you have kicked out the Hierarchy functions and geospatial functions, you probably already have CLR libraries installed!
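A minimal Regex wrapper, for instance, is only a few lines of C#. The function below is a simple illustration, not SQL#'s implementation:

    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions
    {
        // Minimal scalar wrapper over .NET's Regex engine. Marking the
        // function deterministic and precise lets SQL Server treat it much
        // like a deterministic built-in function.
        [SqlFunction(IsDeterministic = true, IsPrecise = true)]
        public static SqlBoolean RegexIsMatch(SqlString input, SqlString pattern)
        {
            if (input.IsNull || pattern.IsNull)
                return SqlBoolean.Null;

            return Regex.IsMatch(input.Value, pattern.Value,
                                 RegexOptions.CultureInvariant);
        }
    }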

I’d be interested to hear if, and how, you’re currently using SQL CLR, or at least if your attitude towards it is softening.

Cheers,

Tony.

14 Responses to “Time for a rethink on SQL CLR?”

  1. Rowland says:

    To me the ability to extend SQL Server in this way is natural — akin to Oracle’s Pro*C. But it was always over-hyped… a bit silly, really. Unless there’s a compelling case (RegEx or other string mashing) I don’t allow its use on production servers, due to scary things like security, but also mundane things like the pain it adds to deployment.

  2. Keith Rowley says:

    Seems to me that SQL CLR is another area that is starting to benefit from well-written frameworks of code provided to make a complicated problem much simpler, much as jQuery is doing for JavaScript coding.

  3. DGPerson says:

    Once again tunnel vision is crippling the mind(s) of the folks in the SQL Server universe.
    That vision causes individuals to believe that one tool is the only tool: C# coders think C# is the center of the universe; T-SQL coders think T-SQL is. But neither is true.
    C# and the CLR can serve you if you use them to edit strings, where T-SQL generally sucks. T-SQL serves when you need to read/write data. The trouble comes when tunnel vision leads one to use C# and the CLR to manipulate data in tables, or T-SQL to chop strings.

    So don’t drive nails with screwdrivers or drive screws into wood using a hammer. The right tool for the job is the way to think of it.

  4. cjlotz says:

    We’ve been using SQL CLR functions successfully for close to a year now. We were able to considerably optimize a function that had used T-SQL to parse text and extract phrases for dynamic translation, by rewriting it as a SQL CLR function using RegEx. Once parsed and read from the relevant tables, we also keep a cache of the different phrases per language as a thread-safe Dictionary in C#. This makes subsequent lookups very fast. By setting up SQL CLR triggers on the tables that contain the phrases, we are able to clear the relevant cached entries when changes are made to the specific translations.

    I do however feel that the installation and administration of these functions can sometimes turn out to be a bit complex and a hassle. We sometimes run into scenarios where the functions are deployed incorrectly and we are still unable to determine precisely why as the behaviour is quite random.

    So I agree, for the right job, there’s nothing that beats the power and speed of C# running within SQL CLR functions. To us this is just another tool in the hands of a development team which, if used for the right purposes, can considerably improve the maintenance and performance of getting data out of your database.

  5. srutzky says:

    Hi Tony and thanks for the positive mentions :) .

    Definitely some good points made here but I would like to clarify something regarding use of IEnumerable.

    The restriction of not being able to stream data out while at the same time streaming the data in from a DataReader is only an issue for User-Defined Functions; this is not an issue for CLR Stored Procedures. While I greatly prefer the interface of TVFs over Procs regarding the issue of usability, I have found that the vast majority of TVFs (that I have done at least) have not needed to read a result set from the DB. So the restriction does not really come into play all that often. And when it does, if the results to be returned are too large to fit into memory, there is always the option of switching to a Stored Procedure, as it can release each ResultsRow as it is created so that you only really have one row’s worth of data in memory at a time.

  6. Alex_Kuznetsov says:

    Those of us who aren’t tied to a particular language or framework can consider quite a few other alternatives:

    1. Can we do it on the client? Someone straight out of college can, for example, quickly implement running totals, and unit test them with mocks, without having to set up the database for unit tests. If we make this choice, heavy calculations will slow down some client, while other users experience fast response times, and our team enjoys our weekend. Also anyone can maintain that code.

    2. Can we upgrade the server? Changes in agile environments are very common, and they may render super smart CLR optimizations useless, but better hardware stays with us for the next project even if we kill the whole current system.

    3. Can we use another RDBMS? If the business logic needs fast cursors, why don’t we use Oracle – its cursors are usually as fast as set based commands. If the business logic needs rich string manipulations, why can’t we use Postgres – it natively supports Perl and Python as languages for database objects.

    Last, but not least: do we have to use an unpopular technology? Unpopular technologies usually have more problems, because fewer people use them. As such, we are more likely to encounter a show-stopper bug and switch to some other approach anyway, after having wasted some time with this one.

    How do we know CLR will not be abandoned or completely reworked in the next release? The time spent on learning CLR is the time not spent learning some more common technology, more likely to be used in the next project. Should we need to hire another team member, we are less likely to find someone already familiar with it. If we ask someone to learn CLR, they are less likely to reuse that skill in their next project, as compared to Java or Python or C# skills. Developers always want to try out and learn new things, but in my (very limited, of course) experience I do not know of anyone who voluntarily asked to try out CLR.

  7. Kevin Pullin says:

    SQL CLR is a lifesaver, both in terms of performance and usability, in areas where tsql falls flat on its face. We use it for extremely basic things that you’d think would be natively exposed: integer/long parsing, time zone conversions, simple string manipulation, regexes, etc, etc.

    Additionally, avoiding xpath for XML parsing is a huge benefit – one offending tsql XML statement takes 35 minutes to complete, whereas a re-implementation in SQL CLR takes 3 seconds (plus it’s simpler for us .NET devs to understand, debug, and write tests against :] ).

    As always, use the right tool for the job. Sometimes that’s set based SQL statements and other times it’s plain old imperative code.

  8. BuggyFunBunny says:

    Alex:
    1. Can we do it on the client?

    Why? The whole point of RDBMS is to really do what OO proponents merely brag about: data and method in One Place. The excuse that servers can’t do this is folly. With the tsunami of high-core/SSD machines overwhelming the industry (and, you’ll note the diminishing power of “client” machines, i.e. iStuff and such), there’s ever more reason to build systems as Dr. Codd suggested.

    And don’t get me started on the disaster that is all things xml.

  9. paschott says:

    I tend to agree with Rowland – generally over-hyped. I’ve played with SQL# before and appreciated that recent article because it reminded me of the project and prompted me to take another look at it. I can definitely see a real use for a good portion of that functionality in our apps, but it’s a hard sell because of the possible security risks. Even though the functionality is useful, I still haven’t found really compelling arguments to use CLR so far. However, I may be revisiting that in the future as we try to increase performance. I definitely don’t want to rule out a potential tool on the basis of unknown security risks. I’m actually more concerned that we’d slow things down than that we’d introduce a security risk. :-)

    I’ll give this another mention to our teams and see what we may be able to do with the idea. It may lead to nothing at the moment, but it’s always worth considering if the benefits outweigh the negatives.

  10. Alex_Kuznetsov says:

    @BuggyFunBunny: I am not so sure about the following: “The whole point of RDBMS is to really do what OO proponents merely brag about: data and method in One Place.”

    As a practitioner, I use an RDBMS as a component of useful solutions for my clients, when it is cheaper to use than other tools. Regardless of any ideology, many things are much cheaper to accomplish elsewhere, outside of the RDBMS, for lots of reasons. For example, heavy calculations are usually cheaper to deliver on the client because (I might be missing some reasons):

    1. Less skilled developers can develop solutions on the client, and the required skills are easier to find. Have you ever tried to hire or train an expert in CLR?

    2. Unit testing not involving databases is way easier, TDD is _MUCH_ easier. This seriously ups the quality of software and the overall robustness of the solution.

    3. To scale out, I can just add another app server, like Google does. Scaling up a database server may be much more expensive/complex. With all due respect to Dr Codd and his work, I think Google’s founders made just a little bit more money off their approach than Dr Codd made off his theory.

    4. Migration to another RDBMS is much easier and cheaper if the database only does what it does best: store and retrieve data, enforce constraints, ensure isolation, provide backups/recovery.

    5. I am spreading my risks: any problems with one particular algorithm do not affect all database users.

    6. I am keeping developers happy – they usually prefer not to be involved with CLR. You know what? They are usually right, they are usually more productive with other approaches. Also I am not so sure CLR will be around, and not dramatically changed, in the next release. So the investment of time and effort in learning it might not realize enough profit to justify itself.

  11. Henrik Staun Poulsen says:

    We use it for calculations that should have been in SQL Server:

    Median
    LinearRegression
    Skewness
    FirstQuartile
    WeibullLeastSquares

    It works quite well for us.

  12. BuggyFunBunny says:

    Alex:
    ” heavy calculations are usually cheaper to deliver on the client because”
    Old style clients, perhaps. Near future, and many current, not even close. And there’s the issue of data integrity: left to the client (and its coders) one is left with the chance (and a good one) that not all client code will do the same thing. And cheaper, by what measure? Do you include the direct I/O cost of the data transfer for a specific calc? And the cost to maintain the infrastructure to carry out the transfer? And so on. Don’t assume that divide and conquer is always true for all times and all places. There was a time when the cpu was nicely divided into discrete (best of breed) components. Not so much any more. Consolidation is the inarguable trend. Cloud is just another example.

    And here’s a wonderfully cranky exegesis: http://ora-00001.blogspot.com/2011/07/mythbusters-stored-procedures-edition.html

    “4. Migration to another RDBMS is much easier and cheaper if the database only does what it does best: store and retrieve data, enforce constraints”
    If one defines catalogs to high normal form (BCNF or even 5NF), there’s not much left. Now, display calcs (pretty stuff) is another issue. Sure that belongs on the client. To quote Pascal: “a row is a business rule”.

    It hasn’t happened yet, alas, but the high core-count/SSD machine will spell the end of most client-side coding. And the reason is obvious: high-NF databases (heavily joined) will no longer be strapped by HDD latencies. With high NF, one gets a significant reduction in data footprint, in addition to the integrity constraints being the “only code” in the datastore. It really is the future. Oh, and SwisSQL does a good job of migrating, or so they say; I’ve not used it in anger. In all my years, not once has a production application abandoned one engine for another, so I’ve never bought that argument.

  13. DeafProgrammer says:

    I have been working with databases since 1988, first as a statistical clerk, and was then promoted to programmer in 1995. I admit that, unfortunately, the three-tier architecture is sometimes adopted because the database design is so bad that I couldn’t write a simple set-based query against it. Designing a well-normalised database encourages database people to develop business rules within the database structure itself: constraints, stored procedures, set-based queries, indexes and the like. I’ve seen so many badly designed databases that work well enough within a three-tier architecture; the downside of that environment is the high cost of maintaining the system.

  14. srutzky says:

    @Alex_Kuznetsov: “If we ask someone to learn CLR, they are less likely to reuse that skill in their next project, as compared to Java or Python or C# skills.” and “If the business logic needs rich string manipulations, why can’t we use Postgres – it natively supports Perl and Python as languages for database objects.”

    Hi Alex. There might be a small misunderstanding here as to what exactly CLR is. CLR is not a language; it is the framework that runs .Net languages such as C# and VB.Net, basically the same thing as the JVM is for Java. So in terms of the skills that people are reusing, both C# and VB.Net can be used to create functions, stored procedures, aggregates, etc. in SQL Server (much like how you would use Perl or Python in PostgreSQL). When people talk about SQLCLR they are simply referring to the ability to run C# and VB.Net code from within the database. Hence people are not learning CLR separately, and Microsoft is not about to discontinue the CLR (which would mean throwing away C#, VB.Net, and some others like IronPython, etc.).

    —————-

    @paschott: “I can definitely see a real use for a good portion of that functionality in our apps, but it’s a hard sell because of the possible security risks.”

    Hi paschott. May I ask what security risks you are referring to? CLR procedures and functions execute with the same security context as regular T-SQL procedures and functions, and if the assembly is marked as SAFE then there isn’t much damage that it can do. If the assembly is marked as EXTERNAL_ACCESS (so that it can access disk, network, etc.) then commands that reach outside of the database use the context of the login that is set as the “Log On As” account for the SQL Server process, which ideally should be a restricted account that can’t do much damage anyway. But there is a HUGE security advantage to using CLR code for disk and network access, since the code can only access what it is programmed to access. The alternative that many people go with is to enable xp_cmdshell, which also runs in the security context of the “Log On As” account BUT there is no restriction on what can be called. It is much more likely that someone can mess with a bat/cmd script being called from xp_cmdshell than that they would be able to mess with the code inside of an assembly. Also, by using CLR instead of xp_cmdshell you can better separate security, such that only Login (or Role) A can execute a proc to delete some files and only Login (or Role) B can execute a proc to FTP. By using xp_cmdshell to do both operations, both Logins/Roles would need access to xp_cmdshell so that they can call their respective bat/cmd scripts. So as far as I can see, the security risks with CLR code are the same as with T-SQL (for SAFE code) and less than with xp_cmdshell (for EXTERNAL_ACCESS code).

    Take care,
    Solomon…
