24 April 2007

Choosing between CLR and T-SQL stored procedures: a simple benchmark

In this short article, Greg Larsen performance benchmarks alternative CLR-based and T-SQL stored procedure solutions for retrieving operating system file information. He also assesses other factors to be considered when choosing the correct implementation for your system, such as ease of development and maintenance.

In my previous article (Building my First SQL Server 2005 CLR) I wrote about my experiences coding and implementing my first SQL Server 2005 CLR procedure. The idea was to:

  • Provide a “learn through my pain” tutorial for people considering adoption of CLR functionality
  • Demonstrate what I considered a realistic practical usage of CLR procedures, in replacing an unsupported extended stored procedure (SP) called xp_getfiledetails. I’d used this XP in SQL 2000 to allow me to obtain various bits of operating system information for a physical disk file.

A lot of interesting points came out of the subsequent discussion on this article. At the time, I felt that a potential drawback of my new CLR-based solution was that for the first time I would have objects in my database for which the source code was stored outside of the database. This misconception was corrected by Adam Machanic. Adam pointed out a couple of ways to get your source code stored in SQL Server, one of those being to use the ALTER ASSEMBLY T-SQL command, like so:

  ALTER ASSEMBLY xp_getfiledetails
  ADD FILE FROM 'c:\temp\xp_getfiledetails.cs'

After issuing the above command the source code for my CLR can be viewed by using the sys.assemble_files catalog view. With the CLR and source code stored in SQL Server I can move my database and the CLR and source code will move along with the database.

Additionally the waters were “muddied further” by comments from Phil Factor where he offered a T-SQL solution that accomplished exactly the same thing, thus removing the need for introducing CLR functionality in the first place – although, this alternative solution was supported using OLE Automation.

So, when faced with two apparently viable solutions for a problem – one CLR-based and one T-SQL-based – how does one choose which one to use? As a starting point, I decided to perform a simple performance benchmark.

Retrieving file information using a CLR procedure

I will not present again here my original xp_getfiledetails CLR procedure solution. However, the C# code is included in the code download for this article, and is described in full in my original article.

Retrieving file information using T-SQL and OLE Automation

The basis for my T-SQL code solution was provided by Phil Factor. I modified Phil’s code slightly so that the T-SQL code performs and produces exactly the same results as my CLR-based solution. Both the CLR and the T-SQL procedures produce the same results as the deprecated xp_getfiledetails extended stored procedure provided with SQL Server 2000. If you want to follow along, grab the code for the tsql_getfiledetails_OLE stored procedure, and create it in your test database.

If you plan to execute the above stored procedure you will need to enable OLE Automation. To do that you can run the following code:

sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'Ole Automation Procedures', 1;
GO
RECONFIGURE;
GO

Benchmarking the solutions

In my benchmark testing I ran these two processes on a standalone SQL Server 2005 machine. I recorded and compared the elapsed time for each process to return file information.

Here is the simple test harness I used to drive this performance testing:

SET nocount ON

DECLARE @maxsize INT
DECLARE @x INT
DECLARE @starttime1 datetime
DECLARE @stoptime1 datetime
DECLARE @starttime2 datetime
DECLARE @stoptime2 datetime

CREATE TABLE #temp (
    [Alternate Name] VARCHAR(100),
    [Size] INT,
    [Creation Date] CHAR(8),
    [Creation Time] CHAR(6),
    [Last Written Date] CHAR( 8),
    [Last Written Time] CHAR(6),
    [Last Accessed Date] CHAR(8),
    [Last Accessed Time] CHAR(6),
    [Attributes] INT
    )

SET @x = 0 
SET @starttime1 = GETDATE()
WHILE @x < 20
    BEGIN 
    SET @x = @x + 1
    INSERT INTO #temp 
       EXEC master.dbo.tsql_getfiledetails_OLE 'C:\temp\longfilename.txt'
    END

SET @stoptime1 = GETDATE()
SET @x = 0 
SET @starttime2 = GETDATE()

WHILE @x < 20
    BEGIN 
    SET @x = @x + 1
    INSERT INTO #temp EXEC master.dbo.xp_getfiledetails 'C:\temp\longfilename.txt'
END

SET @stoptime2 = GETDATE()
SELECT DATEDIFF(ms,@starttime1, @stoptime1) ole_duration, 
    DATEDIFF(ms,@starttime2,@stoptime2) clr_duration 

DROP TABLE #temp

As you can see, this code has two WHILE loops. The first loop executes the tsql_getfiledetailes_OLE T-SQL stored procedure twenty times, and the second loop does the same for the CLR store procedure. In each case, we return the details of the same file (c:\temp\longfilename.txt).

I capture the start and end time for each WHILE loop and then execute a SELECT statement that calculates and displays the duration, in milliseconds, for each process.

When I ran this code on my standalone SQL Server, I got the following output:

ole_duration clr_duration
------------ ------------
533 16

Clearly, the CLR-based procedure is much faster at returning file information from the operating system than the T-SQL OLE automation procedure. From this test we can conclude that OLE Automation has more overhead than the different methods used in C# for returning file information.

T-SQL or CLR?

At first glance it may seem obvious that you’d choose to implement the CLR-based procedure in your system, over the T-SQL procedure. However, is raw speed the only consideration?

On reflection, it probably isn’t too surprising that the OLE T-SQL solution is slower. OLE automation is generally used for activities such as printing, sending email, saving stuff to files and so on – slow processes by definition. In our shop, the process that uses xp_getfiledetails only calls the routine once, and runs as a batch application. Therefore the extra overhead of using the T-SQL solution is really not an issue.

More important considerations in our case are ease of development and ease of maintenance and from this standpoint we’d probably end up choosing the T-SQL solution regardless of the slower performance.

On the other hand, if you where building a real-time solution that was traversing a directory tree and summarizing the size of each file so it could present a total size of the directory, then probably the CLR solution would be more appropriate.

Conclusion

From my benchmark testing, it is clear that the CLR method of is much more efficient at returning operating system file information. If your goal is to optimize the performance of returning file information then the CLR approach is the way to go.

However, while benchmark testing is an important element in deciding your approach, it is also clear that speed of code execution is not the only consideration when deciding whether to implement programming logic using a CLR or T-SQL. Most shops will probably find it easier to build, debug and maintain a T-SQL procedure over a CLR-based procedure. One needs to weigh all the benefits and drawbacks of any solution prior to implementation.

For additional information on how to weigh the strengths and weakness of CLR over T-SQL you might consider reading the following white paper from Microsoft:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sqlclrguidance.asp

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

Downloads

This post has been viewed 56717 times – thanks for reading.

Tags: , , , , , ,

  • Rate
    [Total: 32    Average: 3.1/5]
  • Share

Greg Larsen

View all articles by Greg Larsen

  • Anonymous

    CLR vs OLE
    Did you restart the server (clear caches) and reverse the execution order?

    If you’re running on the same tables, naturally the 2nd set would be faster, working off of the results of the first.

    But, since CLR is supposed tobe closer to metal than tsql, I’m not surprised.

  • Anonymous

    After your getfilesdetails SP is run, sp_send_dbmail is failing
    hiya,

    dont know if you have seen this happen before but we were using your code for getfiledetails and straight after running a sp_send_dbmail for DatabaseMail.

    If the sendmail has a query embedded, it fails with the below error but if we run the sendmail on its on it works, so it isnt the sendmail code. Its almost like the getfiledetails code is breaking it.

    If I run the sendmail with no query it works fine.

    I have tried various queries and they all fail.

    Msg 22050, Level 16, State 1, Line 0
    Error formatting query, probably invalid parameters
    Msg 14661, Level 16, State 1, Procedure sp_send_dbmail, Line 478
    Query execution failed: Error Intiailizing COM . CoInitialize failed with Hresult: 0x80010106

    Any help with this would be much appreciated.

    thanks
    wendy