Click here to monitor SSC

Phil Factor's Phrenetic Phoughts

Simple-Talk columnist
The wilder shores of Transact SQL    Phil on Twitter   Phil on SQL Server Central  Phil on BOS

Using XML to pass lists as parameters in SQL Server

Published Thursday, January 05, 2012 2:32 AM

(Updated 14th Jan 2012, and again 26th Jan 2012)

Every so often, the question comes up on forums of how to pass a list as a parameter to a SQL procedure or function. There was a time that I used to love this question because one could spread so much happiness and gratitude by showing how to parse a comma-delimited list into a table. Jeff Moden has probably won the laurels for the ultimate list-parsing TSQL function, the amazing ‘DelimitedSplit8K’. Erland Sommarskog has permanently nailed the topic with a very complete coverage of the various methods for using lists in SQL Server on his website.. With Table-Valued parameters, of course, the necessity for having any lists in SQL Server is enormously reduced, though it still crops up.

Element-based XML seems, on the surface, to provide a built-in way of handling lists as parameters. No need for all those ancillary functions for splitting lists into tables, one might think. Yes, indeed, but be careful of the XQuery syntax that you use, as we'll see.

 Let’s just take the very simplest example of taking a list of integers and turning it into a table of integers. One can use a simple element-based XML list based on a fragment like this…

      DECLARE @XMLlist XML
     
SELECT  @XMLList = '<list><i>2</i><i>4</i><i>6</i><i>8</i><i>10</i><i>15</i><i>17</i><i>21</i></list>'

…to give a table like this ..

IDs
-----------
2
4
6
8
10
15
17
21
 
(8 row(s) affected)

..by using this TSQL expression

  SELECT x.y.value('.','int') AS IDs
        
FROM @XMLList.nodes('/list/i') AS x ( y )

It isn’t as intuitive as a simple comma-delimited list to be sure, but it is bearable. In small doses, this shredding of element-based XML lists works fine, but,<added 26th January> using this particular syntax for getting the value of the element </added 26th January> one soon notices the sigh from the server as the length of the list starts to increases into four figures. Try it on a list of a hundred thousand integers and you’ll have time to eat a sandwich.

OK. Let’s do a few measurements and do a graph.

Jeff Moden’s Split function, scales beautifully as shown by the red line, and it is difficult to measure it, it is so quick. The XML eShred technique by contrast, using the element-based list, and this XQuery syntax, exhibits horrible scaling. This sort of curve is enough to strike terror into the developer. Just as a comparison, I did the process of taking a list and turning it into a table by creating a VALUES expression from the list. Even this took longer than Jeff’s function, presumably because of the overhead of compiling the expression. Of course, the comparison is rather unfair because this third approach , the ‘Using Values Script’ approach, does no validation, but still one wonders what on earth the XML expression is DOING all that time. It must be gazing out the window, reading the paper and scratching itself, one assumes. No. Actually, the CPU is warming up on the task, so it involves frantic activity.

Operations involving XML can be startlingly fast in SQL Server, but this particular critter seems startlingly slow once one gets to a three-figure number in the list that you’re turning into a relation that can then be used in SQL Expressions. Imagine this getting loose in a production system when someone starts passing more values in the list and the database suddenly slows down. How would you track the problem down?

<added 26th January>The big problem here is one of the expression. As pointed out by a reader of the blog, if you modify the expression slightly to

  SELECT x.y.value('.','int') AS IDs
        
FROM @XMLList.nodes('/list/i/text()') AS x ( y )

The shredding suddenly goes like a rocket</added 26th January>.

There is a second, and more wordy version of the simple XML list, The attribute-based list. Here

   DECLARE @XMLlist XML
   SELECT  
@XMLList = '<list><y i="2" /><y i="4" /><y i="6" /><y i="8" /><y i="10" /><y i="15" /><y i="17" /><y i="21" /></list>'

…to give exactly the same table ..

IDs
-----------
2
4
6
8
10
15
17
21
 
(8 row(s) affected)

by using a very similar syntax like this...

   SELECT x.y.value('@i','int')
      
FROM @XMLList.nodes('list/y') AS x( y )

'Eh? What difference could this make?', you're wondering. Well, rather a lot, it turns out. Thanks to a reader comment on the first version of this blog by wBob, we can reveal that this version scales as well as Jeff's amazing 'split' function. I must admit to being rather surprised at the difference in performance of the two. By using the Attribute-based list, or the modified XQuery expression,  you can cheerfully pass lists of substantial length to procedures without worrying, it seems. It reminded me of an argument I had some years ago with a distinguished Database Developer who was advising us all not to use XML-based lists because of their performance problems. I'd been using attribute-based  lists simply because I copied Bob Beauchamin's example code in 'The Book', and hadn't hit any problem. We couldn't convince each other. Perhaps we'd hit this performance difference.

<added 26th January>We can even improve on this. as Sviridov Konstantin has pointed out, you can tweak it to the point that it goes around twice the speed of Jeff's TSQL split function by using syntax like this...

   SELECT x.y.value('.','int')
      
FROM @XMLList.nodes('list/y/@i') AS x( y )

Which will give a performance comparison with the the TSQL split function. These modified XML shreds must be the fastest non-CLR method of  transferring list-based data as a variable in SQL.

</added 26th January>XML in SQL Server seems to remind me of the old poem by Henry Wadsworth Longfellow. (1807-1882)

There was a little girl, who had a little curl
Right in the middle of her forehead,
And when she was good, she was very, very good,
But when she was bad she was horrid.'.

Just as a comparison, here is the horrid XML performing against the first attribute-based XML Shred, and the TSQL split function

As always, one should measure everything you can, and take nothing for granted if you have to deliver a scalable application.

Here is the simple code I used to do the metrics. The results were simply pasted into excel and graphed.

--in order to record the timings, we prepare a log.
DECLARE @log TABLE (
            
Log_Id INT IDENTITY(1, 1),TheeVent VARCHAR( 2000 ),
            
[Values] INT, CPU FLOAT DEFAULT  @@CPU_BUSY,
            
DateAndTime DATETIME DEFAULT GETDATE()) ;
--define working variables
DECLARE @List VARCHAR(MAX),@XMLList XML,@AttributeBasedXMLList XML,
        
@buildScript VARCHAR(MAX), @XMLbuildScript NVARCHAR(MAX),
        
@ii INT, @Values INT;
--and table variables to receive the results/relations
DECLARE @ByteBucket TABLE (TheNumeral INT)
DECLARE @ByteBucket2 TABLE (TheNumeral INT)
DECLARE @ByteBucket3 TABLE (TheNumeral INT)
DECLARE @ByteBucket4 TABLE (TheNumeral INT)
DECLARE @ByteBucket5 TABLE (TheNumeral INT)
DECLARE @ByteBucket6 TABLE (TheNumeral INT)
SET NOCOUNT ON
SELECT
@Values=1 --start with one item in the list
WHILE @Values<=3000
  
BEGIN
  
--build up the list with random integers
  
SELECT @List='1',@ii=@Values
  
WHILE @ii>1
    
BEGIN
    SELECT
@List=@List+','+CONVERT(VARCHAR(3),CONVERT(INT,RAND()*100))
    
SELECT @ii=@ii-1 --  .. to the required length
    
END
  
--and pre-prepare the XML List and VALUES script
  
SELECT @XMLList='<list><i>'+REPLACE(@List,',','</i><i>')+'</i></list>'
  
SELECT @BuildScript='SELECT x FROM (values ('+REPLACE(@List,',','),(')+'))d(x)'
  
--and pre-prepare the XML List and VALUES script  
SELECT @XMLBuildScript='SELECT @AttributeBasedXMLList = ( SELECT x AS "@x" FROM (values ('
      
+REPLACE(@List,',','),(')+'))d(x) FOR XML PATH(''y''), ROOT(''root''), TYPE )'
EXEC sp_executesql @XMLBuildScript, N'@AttributeBasedXMLList XML OUT',
                    
@AttributeBasedXMLList OUT
  
--try doing the delimited list function
  
INSERT INTO @log (TheEvent, [Values]) SELECT 'Using Split function',@Values
  
INSERT INTO @ByteBucket (TheNumeral)
  
SELECT item FROM dbo.DelimitedSplit8K (@list,',')
  
--use the XML Shred trick
  
INSERT INTO @log (TheEvent, [Values]) SELECT 'Element-based XML',@Values
  
INSERT INTO @ByteBucket2 (TheNumeral)
  
SELECT x.y.value('.','int') AS IDs
        
FROM @XMLList.nodes('/list/i') AS x ( y )
  
--use the XML Shred trick
  
INSERT INTO @log (TheEvent, [Values]) SELECT 'Element-based XML (text())',@Values
  
INSERT INTO @ByteBucket6 (TheNumeral)
  
SELECT x.y.value('.','int') AS IDs
        
FROM @XMLList.nodes('/list/i/text()') AS x ( y )
  
--try the VALUES method
  
INSERT INTO @log (TheEvent, [Values]) SELECT 'Using Values script',@Values
  
INSERT INTO @ByteBucket3 (TheNumeral)
    
EXECUTE (@BuildScript)
  
--use the XML Shred trick
  
INSERT INTO @log (TheEvent, [Values]) SELECT 'Attribute-based XML',@Values
  
INSERT INTO @ByteBucket4 (TheNumeral)
    
SELECT x.y.value('@x','int')
      
FROM @AttributeBasedXMLList.nodes('root/y') AS x(y)
  
INSERT INTO @log (TheEvent, [Values]) SELECT 'finished',@Values
  
--use the XML Shred trick
  
INSERT INTO @log (TheEvent, [Values]) SELECT 'Attribute-based XML (@)',@Values
  
INSERT INTO @ByteBucket5 (TheNumeral)
    
SELECT x.y.value('.','int')
      
FROM @AttributeBasedXMLList.nodes('root/y/@x') AS x(y)
  
INSERT INTO @log (TheEvent, [Values]) SELECT 'finished',@Values
  
SELECT @Values=@Values+100
END      

--Yes, we need to check that they all agree!
SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket
UNION ALL
SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket2
UNION ALL
SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket3
UNION ALL
SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket4
UNION ALL
SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket5
UNION ALL
SELECT COUNT(*), SUM(TheNumeral) FROM @ByteBucket6

SELECT
  
TheStart.[Values],
  
MAX(CASE WHEN TheStart.TheeVent ='Using Split function'
          
THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime )
          
ELSE 0
      
END) AS [Using Split function],
  
MAX(CASE WHEN TheStart.TheeVent ='Element-based XML'
          
THEN DATEDIFF( ms, TheStart.DateAnddTime, Theend.DateAndTime )
          
ELSE 0
      
END) AS [Element-based XML],
  
MAX(CASE WHEN TheStart.TheeVent ='Using Values script'
          
THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime )
          
ELSE 0
      
END) AS [Using Values script],
  
MAX(CASE WHEN TheStart.TheeVent ='Attribute-based XML'
          
THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime )
          
ELSE 0
      
END) AS [XML Attributes],*/
  
MAX(CASE WHEN TheStart.TheeVent ='Element-based XML (text())'
          
THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime )
          
ELSE 0
      
END) AS [Element-based XML (text())],
  
MAX(CASE WHEN TheStart.TheeVent ='Attribute-based XML (@)'
          
THEN DATEDIFF( ms, TheStart.DateAndTime, Theend.DateAndTime )
          
ELSE 0
      
END) AS [Attribute-based XML (@)]
FROM   @log TheStart
  
INNER JOIN @log Theend
    
ON Theend.Log_Id = TheStart.Log_Id + 1
WHERE TheStart.TheEvent<>'finished'
GROUP BY TheStart.[Values];

SELECT
  
TheStart.[Values],
  
MAX(CASE WHEN TheStart.TheeVent ='Using Split function'
          
THEN theEnd.cpu-TheStart.cpu
          
ELSE 0
      
END) AS [Using Split function],
  
MAX(CASE WHEN TheStart.TheeVent ='Element-based XML'
          
THEN theEnd.cpu-TheStart.cpu
          
ELSE 0
      
END) AS [Element-based XML],
  
MAX(CASE WHEN TheStart.TheeVent ='Using Values script'
          
THEN theEnd.cpu-TheStart.cpu
          
ELSE 0
      
END) AS [Using Values script],
  
MAX(CASE WHEN TheStart.TheeVent ='Attribute-based XML'
          
THEN theEnd.cpu-TheStart.cpu
          
ELSE 0
      
END) AS [XML Attributes],
  
MAX(CASE WHEN TheStart.TheeVent ='Element-based XML (text())'
          
THEN theEnd.cpu-TheStart.cpu
          
ELSE 0
      
END) AS [Element-based XML (text())],
  
MAX(CASE WHEN TheStart.TheeVent ='Attribute-based XML (@)'
          
THEN theEnd.cpu-TheStart.cpu
          
ELSE 0
      
END) AS [Attribute-based XML (@)]
FROM   @log TheStart
  
INNER JOIN @log Theend
    
ON Theend.Log_Id = TheStart.Log_Id + 1
WHERE TheStart.TheEvent<> 'finished'
GROUP BY TheStart.[Values];

Comments

 

PhilQ said:

I would be very interested to see the variation of this test if a simple Schema was applied to the XML variable being processed. How would then relate to the other mechanisms demonstrated?
January 11, 2012 1:06 PM
 

wBob said:

Hi Phil,

using attribute-based XML provides comparable (although not quite as fast) results.  Can you verify?

thanks!

--in order to record the timings, we prepare a log.
DECLARE @log TABLE (
            Log_Id INT IDENTITY(1, 1),TheeVent VARCHAR( 2000 ),
            [Values] int, CPU float DEFAULT  @@CPU_BUSY,
            DateAndTime DATETIME DEFAULT GetDate()) ;
--define working variables
DECLARE @List varchar(max),@XMLList XML,@BuildScript1 NVARCHAR(max),@BuildScript2 NVARCHAR(max), @ii INT, @Values INT;
--and table variables to receive the results/relations
DECLARE @ByteBucket TABLE (TheNumeral int)
DECLARE @ByteBucket2 TABLE (TheNumeral int)
DECLARE @ByteBucket3 TABLE (TheNumeral int)
SET NOCOUNT ON
SELECT @Values=1 --start with one item in the list
WHILE @Values < 1000
  BEGIN
  --build up the list with random integers
  SELECT @List='1',@ii=@Values
 While @ii>1
    begin
    Select @List=@List+','+Convert(Varchar(3),convert(int,RAND()*100))
   SELECT @ii=@ii-1 --  .. to the required length
   END
   
 --and pre-prepare the XML List and VALUES script  
 SELECT @BuildScript1='SELECT @XMLList = ( SELECT x AS "@x" FROM (values ('+REPLACE(@List,',','),(')+'))d(x) FOR XML PATH(''y''), ROOT(''root''), TYPE )'
 EXEC sp_executesql @BuildScript1, N'@XMLList XML OUT', @XMLList OUT
 SELECT @BuildScript2='SELECT x FROM (values ('+REPLACE(@List,',','),(')+'))d(x)'

--select @XMLList

 --try doing the delimited list function
 INSERT INTO @log (TheEvent, [Values]) SELECT 'Using Split function',@Values
 INSERT INTO @ByteBucket (TheNumeral)
 SELECT item from dbo.DelimitedSplit8K (@list,',')

 --use the XML Shred trick
 INSERT INTO @log (TheEvent, [Values]) SELECT 'Using XML Shred',@Values
 INSERT INTO @ByteBucket2 (TheNumeral)
 SELECT x.y.value('@x','int')
 FROM @XMLList.nodes('root/y') AS x(y)

 --try the VALUES method
 INSERT INTO @log (TheEvent, [Values]) SELECT 'Using Values script',@Values
 INSERT INTO @ByteBucket3 (TheNumeral)
   EXECUTE (@BuildScript2)
 INSERT INTO @log (TheEvent, [Values]) SELECT 'finished',@Values

 SET @Values += 25

END        

SELECT
 TheStart.[Values],
 MAX(CASE WHEN TheStart.TheeVent ='Using Split function'
          THEN DatedIff( ms, TheStart.DateAndTime, Theend.DateAndTime )
          ELSE 0
     END) AS [Using Split function],
 MAX(CASE WHEN TheStart.TheeVent ='Using XML Shred'
          THEN DatedIff( ms, TheStart.DateAndTime, Theend.DateAndTime )
          ELSE 0
     END) AS [Using XML Shred],
 MAX(CASE WHEN TheStart.TheeVent ='Using Values script'
          THEN DatedIff( ms, TheStart.DateAndTime, Theend.DateAndTime )
          ELSE 0
     END) AS [Using Values script]
FROM   @log TheStart
 INNER JOIN @log Theend
   ON Theend.Log_Id = TheStart.Log_Id + 1
WHERE TheStart.TheEvent<>'finished'
GROUP BY TheStart.[Values];

SELECT
 TheStart.[Values],
 MAX(CASE WHEN TheStart.TheeVent ='Using Split function'
          THEN theEnd.cpu-TheStart.cpu
          ELSE 0
     END) AS [Using Split function],
 MAX(CASE WHEN TheStart.TheeVent ='Using XML Shred'
          THEN theEnd.cpu-TheStart.cpu
          ELSE 0
     END) AS [Using XML Shred],
 MAX(CASE WHEN TheStart.TheeVent ='Using Values script'
          THEN theEnd.cpu-TheStart.cpu
          ELSE 0
     END) AS [Using Values script]
FROM   @log TheStart
 INNER JOIN @log Theend
   ON Theend.Log_Id = TheStart.Log_Id + 1
WHERE TheStart.TheEvent<>'finished'
GROUP BY TheStart.[Values];
January 14, 2012 1:21 PM
 

Paul White NZ said:

Hi Phil,

As Jeff's article (http://www.sqlservercentral.com/articles/Tally+Table/72993/) shows, the fastest (2-3x better than DelimitedSplit8K) and most flexible (not limited to 8000 bytes, handles unicode & ansi in one function) way to split strings is to use a SQL CLR function.

Very interesting about the XML attribute thing though.

Paul
January 26, 2012 9:50 AM
 

Phil Factor said:

Sure, Paul. I agree entirely, but this blog was intended to focus purely on the relative performance of XML Lists, because the comments about their poor performance when used as parameters had bothered me for a while. It got more interesting after wBob's contribution that showed that XML attribute lists were, by contrast, pretty handy. I used other methods purely for comparison of performance. I'd hate to trespass on the stamping ground of experts like Jeff and Erland.
January 26, 2012 10:02 AM
 

Sviridov Konstantin said:

Hi Phil,

You will be amazed how fast elements-based XML approach could be if we change code just a little

 SELECT x.y.value('.','int') AS IDs
        FROM @XMLList.nodes('/list/i/text()') AS x ( y )
January 26, 2012 10:16 AM
 

Sviridov Konstantin said:

We can use the same method with attributes

   SELECT x.y.value('.','int')
      FROM @AttributeBasedXMLList.nodes('root/y/@x') AS x(y)
January 26, 2012 10:29 AM
 

Phil Factor said:

Yes, With Sviridov Konstantin's modification, the performance of both XML lists drop to be faster even than the amazing ‘DelimitedSplit8K'. Results to follow shortly. Maybe we'll have to bring in the CLR solution too now.
January 26, 2012 11:02 AM
 

gsacavdm said:

Inspired by this article I asked around and came across the following whitepaper which explains why such a performance difference exists:
http://msdn.microsoft.com/en-US/library/ms345118(v=SQL.90).aspx

In short, for element based, using .nodes('list/i') and then .value('.','int') equates to: retrieve contents of i and ALL ITS CHILD ELEMENTS.

Whereas .nodes('root/y') and then .value('@i','int') equates to: retrieve contents of y@i.

Given how Xml is internally structured in SQL, the first is much more demanding than the latter.

However, as Sviridov suggested, if you use .text with element-based XML, you''ll obtain the same performance as attribute based since you eliminate the "ALL OF ITS CHILD ELEMENTS" portion.
January 26, 2012 5:24 PM
 

Phil Factor said:

Ah! of course. That is brilliant detective-work. Wow, a lot of us DBAs  have been doing it wrongish.  So poor XML has been getting an underserved reputation for slowness, but what can you expect with its klingon-style complexity (and I don't exactly have any child elements!). Conclusion seems to be that, with Sviridov's tweaks, XML would seem to provide the fastest non-CLR way of passing lists as variables. - if you stick to the right syntax.
January 26, 2012 6:57 PM
 

efb060448 said:

Excellent article.  I have been using the "slow" method and these simple changes will help performance enormously. Much appreciated. Thanks
January 28, 2012 8:26 PM
You need to sign in to comment on this blog
<January 2012>
SuMoTuWeThFrSa
25262728293031
1234567
891011121314
15161718192021
22232425262728
2930311234
Exploring SSIS Architecture and Execution History Through Scripting
 When you are using SSIS, there soon comes a time when you are confronted with having to do a tricky... Read more...

A Testing Perspective of Controllers and Orchestrators
 The neat separation between processing and rendering in ASP.NET MVC guarantees you an application... Read more...

TortoiseSVN and Subversion Cookbook Part 4: Sharing Common Code
 Michael Sorens continues his series on Source Control with Subversion and TortoiseSVN by describing... Read more...

How to Kill a Company in One Step or Save it in Three
 The majority of companies that suffer a major data loss subsequently go out of business. Wesley David... Read more...

Migrating from OCS 2007 R2 to Lync: Part 4
 Having migrated the rest of our users and legacy resources across and started getting ready to... Read more...