Av rating:
Total votes: 71
Total comments: 35


Anith Sen
Concatenating Row Values in Transact-SQL
31 July 2008

It is an interesting problem in Transact SQL, for which there are a number of solutions and considerable debate. How do you go about producing a summary result in which a distinguishing column from each row in each particular category is listed in a 'aggregate' column? A simple, and intuitive way of displaying data is surprisingly difficult to achieve. Anith Sen gives a summary of different ways, and offers words of caution over the one you choose.

Contents

Introduction

Many a time, SQL programmers are faced with a requirement to generate report-like resultsets directly from a Transact SQL query. In most cases, the requirement arises from the fact that there neither sufficient tools nor in-house expertise to develop tools that can extract the data as a resultset, and then massage the data in the desired display format. Quite often folks are confused about the potential of breaking relational fundamentals such as the First Normal Form or the scalar nature of typed values. (Talking about 1NF violations in a language like SQL which lacks sufficient domain support, allows NULLs and supports duplicates is somewhat ironic to begin with, but that is a topic which requires detailed explanations.)

By 'Concatenating row values' we mean this:
You have a table, view or result that looks like this...
...and you wish to have a resultset like the one below:

In this example we are accessing the sample NorthWind database and using the following SQL


SELECT CategoryId, ProductName

      FROM Northwind..Products

The objective is to return a resultset with two columns, one with the Category Identifier, and the other with a concatenated list of all the Product Names separated by a delimiting character: such as a comma.

Concatenating column values or expressions from multiple rows are usually best done in a client side application language, since the string manipulation capabilities of Transact SQL and SQL based DBMSs are somewhat limited. However, you can do these using different approaches in Transact SQL, but it is best to avoid such methods in long-term solutions

A core issue

Even though SQL, in general, deviates considerably from the relational model, its reliance on certain core aspects of relational foundations makes SQL functional and powerful. One such core aspect is the set based nature of SQL expressions (well, multi-sets to be exact, but for the given context let us ignore the issue of duplication). The primary idea is that tables are unordered and therefore the resultsets of any query that does not have an explicit ORDER BY clause is unordered as well. In other words, the rows in a resultset of a query do not have a prescribed position, unless it is explicitly specified in the query expression.

On the other hand, a concatenated list is an ordered structure. Each element in the list has a specific position. In fact, concatenation itself is an order-utilizing operation in the sense that values can be prefixed or post fixed to an existing list. So approaches that are loosely called “concatenating row values”, “aggregate concatenation” etc. would have to make sure that some kind of an order, either explicit or implicit, should be specified prior to concatenating the row values. If such an ordering criteria is not provided, the concatenated string would be arbitrary in nature.

Considerations

Generally, requests for row value concatenations often comes in two basic flavors, when the number of rows is known and small (typically less than 10) and when the number of rows is unknown and potentially large. It may be better to look at each of them separately.

In some cases, all the programmer wants is just the list of values from a set of rows. There is no grouping or logical partitioning of values such as  the list of email addresses separated by a semicolon or some such. In such situations, the approaches can be the same except that the join conditions may vary. Minor variations of the examples list on this page illustrate such solutions as well.

For the purpose of this article the Products table from Northwind database is used to illustrate column value concatenations with a grouping column. Northwind is a sample database in SQL Server 2000 default installations. You can download a copy from from the Microsoft Downloads

Concatenating values when the number of items is small and known beforehand


When the number of rows is small and almost known beforehand, it is easier to generate the code. One common approach where there is a small set of finite rows is the pivoting method. Here is an example where only the first four alphabetically-sorted product names per categoryid is retrieved:

   SELECT CategoryId,

           MAX( CASE seq WHEN 1 THEN ProductName ELSE '' END ) + ', ' +

           MAX( CASE seq WHEN 2 THEN ProductName ELSE '' END ) + ', ' +

           MAX( CASE seq WHEN 3 THEN ProductName ELSE '' END ) + ', ' +

           MAX( CASE seq WHEN 4 THEN ProductName ELSE '' END )

      FROM ( SELECT p1.CategoryId, p1.ProductName,

                    ( SELECT COUNT(*)

                        FROM Northwind.dbo.Products p2

                        WHERE p2.CategoryId = p1.CategoryId

                        AND p2.ProductName <= p1.ProductName )

             FROM Northwind.dbo.Products p1 ) D ( CategoryId, ProductName, seq )

     GROUP BY CategoryId ;

 The idea here is to create a expression inside the correlated subquery that produces a rank (seq) based on the product names and then use it in the outer query. Using common table expressions and the ROW_NUMBER() function, you can re-write this as:

; WITH CTE ( CategoryId, ProductName, seq )

     AS ( SELECT p1.CategoryId, p1.ProductName,

            ROW_NUMBER() OVER ( PARTITION BY CategoryId ORDER BY ProductName )

           FROM Northwind.dbo.Products p1 )

SELECT CategoryId,

           MAX( CASE seq WHEN 1 THEN ProductName ELSE '' END ) + ', ' +

           MAX( CASE seq WHEN 2 THEN ProductName ELSE '' END ) + ', ' +

           MAX( CASE seq WHEN 3 THEN ProductName ELSE '' END ) + ', ' +

           MAX( CASE seq WHEN 4 THEN ProductName ELSE '' END )

     FROM CTE

     GROUP BY CategoryId ;

 Note that ROW_NUMBER() is a newly-introduced feature in SQL 2005. If you are using any previous version, you will have to use the subquery approach (You can also use a self-join, to write it a bit differently). Using the recently introduced PIVOT operator, you can write this as follows:

SELECT CategoryId,

           "1" + ', ' + "2" + ', ' + "3" + ', ' + "4" AS Product_List

      FROM ( SELECT CategoryId, ProductName,

                    ROW_NUMBER() OVER (PARTITION BY CategoryId
             ORDER
BY ProductName)

               FROM Northwind.dbo.Products ) P ( CategoryId, ProductName, seq )

     PIVOT ( MAX( ProductName ) FOR seq IN ( "1", "2", "3", "4" ) ) AS P_ ;

Not only does the syntax appear a bit confusing, but also it does not seem to offer any more functionality than the previous CASE approach. However, in rare situations, it could come in handy.

Concatenating values when the number of items is not known

When you do not know the number of items that are to be concatenated beforehand, the code can become rather  more demanding. The new features in SQL 2005 make some of the approaches easier. For instance, the recursive common table expressions (CTEs) and the FOR XML PATH('') syntax makes the server do the hard work behind the concatenation, leaving the programmer to deal with the presentation issues. The examples below make this point obvious.

Recursive CTE methods

The idea behind this method is from a newsgroup posting by Vadim Tropashko. It is similar to the ideas behind generating a materialized path for hierarchies.

WITH CTE ( CategoryId, product_list, product_name, length )

          AS ( SELECT CategoryId, CAST( '' AS VARCHAR(8000) ), CAST( '' AS VARCHAR(8000) ), 0

                 FROM Northwind..Products

                GROUP BY CategoryId

                UNION ALL

               SELECT p.CategoryId, CAST( product_list +

                      CASE WHEN length = 0 THEN '' ELSE ', ' END + ProductName AS VARCHAR(8000) ),

                      CAST( ProductName AS VARCHAR(8000)), length + 1

                 FROM CTE c

                INNER JOIN Northwind..Products p

                   ON c.CategoryId = p.CategoryId

                WHERE p.ProductName > c.product_name )

SELECT CategoryId, product_list

      FROM ( SELECT CategoryId, product_list,

                    RANK() OVER ( PARTITION BY CategoryId ORDER BY length DESC )

               FROM CTE ) D ( CategoryId, product_list, rank )

     WHERE rank = 1 ;

 The CASE in the recursive part of the CTE is used to eliminate the initial comma, but you can use RIGHT or the SUBSTRING functions instead. This may not be the best performing option, but certain additional tuning could be done to make them suitable for medium sized datasets.

Another approach using recursive common table expressions was sent in by Anub Philip, an Engineer from Sathyam Computers that uses separate common table expressions for the anchor and recursive parts.

WITH Ranked ( CategoryId, rnk, ProductName ) 

             AS ( SELECT CategoryId,

                         ROW_NUMBER() OVER( PARTITION BY CategoryId ORDER BY CategoryId ),

                         CAST( ProductName AS VARCHAR(8000) )

                    FROM Northwind..Products),

   AnchorRanked ( CategoryId, rnk, ProductName ) 

             AS ( SELECT CategoryId, rnk, ProductName

                    FROM Ranked

                   WHERE rnk = 1 ),

RecurRanked ( CategoryId, rnk, ProductName )

             AS ( SELECT CategoryId, rnk, ProductName

                    FROM AnchorRanked

                   UNION ALL

                  SELECT Ranked.CategoryId, Ranked.rnk,

                         RecurRanked.ProductName + ', ' + Ranked.ProductName

                    FROM Ranked

                   INNER JOIN RecurRanked

                      ON Ranked.CategoryId = RecurRanked.CategoryId

                     AND Ranked.rnk = RecurRanked.rnk + 1 )

SELECT CategoryId, MAX( ProductName )

      FROM RecurRanked

  GROUP BY CategoryId;

On first glance, this query may seem a bit expensive in comparison, but the reader is encouraged to check the execution plans and make any additional tweaks as needed.

The blackbox XML methods

Here is a technique for string concatenation that uses the FOR XML clause with PATH mode. It was initially posted by Eugene Kogan, and later became common in public newsgroups.

SELECT p1.CategoryId,

       ( SELECT ProductName + ','

           FROM Northwind.dbo.Products p2

          WHERE p2.CategoryId = p1.CategoryId

          ORDER BY ProductName

            FOR XML PATH('') ) AS Products

      FROM Northwind.dbo.Products p1

      GROUP BY CategoryId ;

 

There is a similar approach that was originally found in the beta newsgroups, using the CROSS APPLY operator.

SELECT DISTINCT CategoryId, ProductNames

    FROM Northwind.dbo.Products p1

   CROSS APPLY ( SELECT ProductName + ','

                     FROM Northwind.dbo.Products p2

                     WHERE p2.CategoryId = p1.CategoryId

                     ORDER BY ProductName

                     FOR XML PATH('') )  D ( ProductNames )

You may notice a comma at the end of the concatenated string, which you can remove using a STUFF, SUBSTRING or LEFT function. While the above methods are deemed reliable by many at the time of writing, there is no guarantee that it will stay that way, given that the internal workings and evaluation rules of FOR XML PATH() expression in correlated subqueries are not well documented.

Using Common Language Runtime

Though this article is about approaches using Transact SQL, this section is included due to the popularity of CLR aggregates in SQL 2005. It not only empowers the CLR programmer with new options for database development, but also, in some cases, they work at least as well as native Transact SQL approaches.

If you are familiar with .NET languages, SQL 2005 offers a convenient way to create user defined aggregate functions using C#, VB.NET or similar languages that are supported by the Common Language Runtime (CLR). Here is an example of a string concatenate aggregate function written using C#.

using System;

using System.Collections.Generic;

using System.Data.SqlTypes;

using System.IO;

using Microsoft.SqlServer.Server;

 

[Serializable]

[SqlUserDefinedAggregate(Format.UserDefined,  MaxByteSize=8000)]

public struct strconcat : IBinarySerialize{

        private List values;

 

        public void Init()    {

            this.values = new List();

        }

 

        public void Accumulate(SqlString value)    {

            this.values.Add(value.Value);

        }

 

        public void Merge(strconcat value)    {

            this.values.AddRange(value.values.ToArray());

        }

 

        public SqlString Terminate()    {

            return new SqlString(string.Join(", ", this.values.ToArray()));

        }

 

        public void Read(BinaryReader r)    {

            int itemCount = r.ReadInt32();

            this.values = new List(itemCount);

            for (int i = 0; i <= itemCount - 1; i++)    {

                this.values.Add(r.ReadString());

            }

        }

 

        public void Write(BinaryWriter w)    {

            w.Write(this.values.Count);

            foreach (string s in this.values)      {

                w.Write(s);

            }

        }

}

Once you build and deploy this assembly on the server, you should be able to execute your concatenation query as:

SELECT CategoryId,

           dbo.strconcat(ProductName)

      FROM Products

     GROUP BY CategoryId ;

 

 If you are a total newbie on CLR languages, and would like to learn more about developing database solutions using CLR languages, consider starting at Introduction to Common Language Runtime (CLR) Integration

Scalar UDF with recursion

Recursive functions in t-SQL have a drawback that the maximum nesting level is 32. So this approach is applicable only for smaller datasets, especially when the number of items within a group, that needs to be concatenated, is less than 32.

 

CREATE FUNCTION udf_recursive ( @cid INT, @i INT )

RETURNS VARCHAR(8000) AS BEGIN

        DECLARE @r VARCHAR(8000), @l VARCHAR(8000)

        SELECT @i = @i - 1,  @r = ProductName + ', '

          FROM Northwind..Products p1

         WHERE CategoryId = @cid

           AND @i = ( SELECT COUNT(*) FROM Northwind..Products p2

                       WHERE p2.CategoryId = p1.CategoryId

                         AND p2.ProductName <= p1.ProductName ) ;

        IF @i > 0 BEGIN

              EXEC @l = dbo.udf_recursive @cid, @i ;

              SET @r =  @l + @r ;

END

RETURN @r ;

END

This function can be invoked as follows:

SELECT CategoryId,

           dbo.udf_recursive( CategoryId, COUNT(ProductName) )

      FROM Northwind..Products

     GROUP BY CategoryId ;

Table valued UDF with a WHILE loop

This approach is based on the idea by Linda Wierzbecki where a table variable with three columns is used within a table-valued UDF. The first column represents the group, second represents the currently processing value within a group and the third represents the concatenated list of values.

 

CREATE FUNCTION udf_tbl_Concat() RETURNS @t TABLE(

            CategoryId INT,

            Product VARCHAR(40),

            list VARCHAR(8000) )

BEGIN

     INSERT @t (CategoryId, Product, list)

     SELECT CategoryId, MIN(ProductName),  MIN(ProductName)

       FROM Products

      GROUP BY CategoryId

WHILE ( SELECT COUNT(Product) FROM @t ) > 0 BEGIN

        UPDATE t

           SET list = list + COALESCE(

                         ( SELECT ', ' + MIN( ProductName )

                             FROM Northwind..Products

                            WHERE Products.CategoryId = t.CategoryId

                              AND Products.ProductName > t.Product), ''),

               Product = ( SELECT MIN(ProductName)

                             FROM Northwind..Products

                            WHERE Products.CategoryId = t.CategoryId

                              AND Products.ProductName > t.Product )

          FROM @t t END

RETURN

END

 

 The usage of the above function can be like:

SELECT CategoryId, list AS Products

  FROM udf_tbl_Concat() ;

 Dynamic SQL

This approach is a variation of the kludge often known using the nickname of  ‘dynamic cross tabulation’. There is enough literature out there which demonstrates the drawbacks and implications of using Dynamic SQL. A popular one, at least from Transact SQL programmer’s perspective, is Erland's Curse and Blessings of Dynamic SQL. The Dynamic SQL approaches can be developed based on creating a Transact SQL query string based on the number of groups and then use a series of CASE expressions or ROW_NUMBER() function to pivot the data for concatenation.

DECLARE @r VARCHAR(MAX), @n INT, @i INT

SELECT @i = 1,

       @r = 'SELECT CategoryId, ' + CHAR(13),

       @n = (SELECT TOP 1 COUNT( ProductName )

                   FROM Northwind..Products

                  GROUP BY CategoryId

                  ORDER BY COUNT( ProductName ) DESC ) ;     

WHILE @i <= @n BEGIN

           SET @r = @r +

           CASE WHEN @i =

                THEN 'MAX( CASE Seq WHEN ' + CAST( @i AS VARCHAR ) + '

                                 THEN ProductName

                                            ELSE SPACE(0) END ) + ' + CHAR(13)

           WHEN @i = @n

             THEN 'MAX( CASE Seq WHEN ' + CAST( @i AS VARCHAR ) + '

                                 THEN '', '' + ProductName

                                 ELSE SPACE(0) END ) ' + CHAR(13)

             ELSE 'MAX( CASE Seq WHEN ' + CAST( @i AS VARCHAR ) + '

                                 THEN '', '' + ProductName

                                 ELSE SPACE(0) END ) + ' + CHAR(13) 

           END ;

           SET @i = @i + 1 ;

END

SET @r = @r + '

    FROM ( SELECT CategoryId, ProductName,

                  ROW_NUMBER() OVER ( PARTITION BY CategoryId ORDER BY ProductName )

             FROM Northwind..Products p ) D ( CategoryId, ProductName, Seq )

           GROUP BY CategoryId;'

EXEC( @r ) ;

 The Cursor approach

The drawbacks of rampant usage of cursors are well-known among the Transact SQL community. Because they are generally resource intensive, procedural and inefficient, one should strive to avoid cursors or loop based solutions in general Transact SQL programming.

DECLARE @tbl TABLE (id INT PRIMARY KEY, list VARCHAR(8000))

SET NOCOUNT ON

DECLARE @c INT, @p VARCHAR(8000), @cNext INT, @pNext VARCHAR(40)

DECLARE c CURSOR FOR

        SELECT CategoryId, ProductName

          FROM Northwind..Products

         ORDER BY CategoryId, ProductName ;

        OPEN c ;

        FETCH NEXT FROM c INTO @cNext, @pNext ;

        SET @c = @cNext ;

        WHILE @@FETCH_STATUS = 0 BEGIN

             IF @cNext > @c BEGIN

                  INSERT @tbl SELECT @c, @p ;

                  SELECT @p = @PNext, @c = @cNext ;

             END ELSE

                  SET @p = COALESCE(@p + ',', SPACE(0)) + @pNext ;

             FETCH NEXT FROM c INTO @cNext, @pNext

        END

        INSERT @tbl SELECT @c, @p ;

        CLOSE c ;

DEALLOCATE c ;

SELECT * FROM @tbl ;

 

 Unreliable approaches

This section details a couple of notorious methods often publicized by some in public forums. The problem with these methods is that they rely on the physical implementation model; changes in indexes, statistics etc or even a change of a simple expression in the SELECT list or ORDER BY clause can change the output. Also these are undocumented, unsupported and unreliable to the point where one can consistently demonstrate failures. Therefore these methods are not recommended at all for production mode systems.

Scalar UDF with t-SQL update extension

It is rare for the usage of an expression that involves a column, a variable and an expression in the SET clause in an UPDATE statement to appear intuitive. However, in general, the optimizer often seems to process these values in the order of materialization, either in the internal work tables or any other storage structures.

CREATE FUNCTION udf_update_concat (@CategoryId INT)

        RETURNS VARCHAR(MAX) AS

BEGIN

DECLARE @t TABLE(p VARCHAR(40));

DECLARE @r VARCHAR(MAX) ;

        SET @r = SPACE(0) ;

        INSERT @t ( p ) SELECT ProductName FROM Northwind..Products

                         WHERE CategoryId = @CategoryId ;

        IF @@ROWCOUNT > 0

            UPDATE @t

               SET @r = @r + p + ',' ;

        RETURN(@r)

END

Here is how to use this function:

SELECT CategoryId, dbo.udf_update_concat(CategoryId)

      FROM Northwind..Products

     GROUP BY CategoryId ;

 

Again, it is important to consider that lack of physical independence that is being exploited here before using or recommending this as a usable and meaningful solution.

Scalar UDF with variable concatenation in SELECT

This is an approach purely dependent on the physical implementation and internal access paths. Before using this approach, make sure to refer to the relevant knowledgebase article.

CREATE FUNCTION dbo.udf_select_concat ( @c INT )

RETURNS VARCHAR(MAX) AS BEGIN

DECLARE @p VARCHAR(MAX) ;

           SET @p = '' ;

        SELECT @p = @p + ProductName + ','

          FROM Northwind..Products

         WHERE CategoryId = @c ;

RETURN @p

END

And, as for its usage:

SELECT CategoryId, dbo.udf_select_concat( CategoryId )

      FROM Northwind..Products

     GROUP BY CategoryId ;

Conclusion

Regardless of how it is used, "aggregate concatenation" of row values in Transact SQL, especially when there is a grouping, is not a simple routine. You need to consider carefully the circumstances  before you choose one method over another.  The most logical choice would to have a built-in operator with optional configurable parameters that can do the concatenation of the values depending on the type. Till then, reporting requirements and external data export routines will have to rely on such Transact SQL programming hacks.

References

Acknowledgements

Umachandar Jayachandran, Linda Wierzbecki, Bruce Margolin, Roy Harvey, Eugene Kogan, Vadim Tropashko, Anub Philip.



This article has been viewed 19033 times.
Anith Sen

Author profile: Anith Sen

Anith S Larson is an independent consultant specializing in data management primarily using SQL Server. From the mid 90s, he has been working on a variety of database design and application developement projects for clients primarily in wealth management and financial services industry. Besides having a few IT certifications and a Masters degree in Computer Applications, Anith is currently working towards his MBA. He resides in Lakeland, TN with his family.

Search for other articles by Anith Sen

Rate this article:   Avg rating: from a total of 71 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: My Solution
Posted by: DaveTheWave (view profile)
Posted on: Thursday, July 31, 2008 at 3:49 PM
Message:

I use the following. This is not completely my code. I found most if not nearly all of this on the net.

CREATE TABLE ParameterKeys
  
(
    
ParameterKey INT IDENTITY
                    
PRIMARY KEY,
    
DateCreated DATETIME NOT NULL
   )

CREATE TABLE Param_Concatenation
  
(
    
ParameterKey INT REFERENCES ParameterKeys (ParameterKey)
      
ON DELETE CASCADE,
    
ValueKey VARCHAR(50) NOT NULL,
    
ValueSubKey1 VARCHAR(50) NULL,
    
ValueSubKey2 VARCHAR(50) NULL,
    
Value VARCHAR(50) NOT NULL
   )
GO
CREATE PROCEDURE Param_GetKey
  
@ParameterKey INT OUTPUT
AS
   DELETE   FROM
ParameterKeys
  
WHERE    DateCreated < DATEADD(DAY, -7, GETDATE())
  
INSERT   INTO ParameterKeys
            
(DateCreated)
  
VALUES   (GETDATE())
  
SET @ParameterKey = SCOPE_IDENTITY()
GO

CREATE PROCEDURE Param_DeleteKey @ParameterKey INT
AS
   DELETE   FROM
ParameterKeys
  
WHERE    ParameterKey = @ParameterKey
GO

CREATE FUNCTION fn_Concatenate
  
(
    
@ParameterKey INT,
    
@ValueKey VARCHAR(50),
    
@ValueSubKey1 VARCHAR(50),
    
@ValueSubKey2 VARCHAR(50),
    
@Delimiter VARCHAR(50)
   )
RETURNS VARCHAR(8000)
AS BEGIN
      DECLARE
@tempTable TABLE (value VARCHAR(50))
      
DECLARE @result VARCHAR(8000)
      
SET @result = 'PlaceHolderDummyText'

      
INSERT   @tempTable
              
SELECT   Value
              
FROM     Param_Concatenation
              
WHERE    ParameterKey = @ParameterKey
                
AND ValueKey = @ValueKey
                
AND ValueSubKey1 = @ValueSubKey1
                
AND ValueSubKey2 = @ValueSubKey2
              
ORDER BY Value ASC

      IF
@@ROWCOUNT > 0
        
UPDATE   @tempTable
        
SET      @result = @result + @Delimiter + ' ' + value

      
RETURN (REPLACE(
              
REPLACE(@result, 'PlaceHolderDummyText, ', ''),
              
'PlaceHolderDummyText',
                
''  )
            )
  
END
/**
USAGE
If the ORDERSDETAIL table contains the following data:

ORDERID FROM_TYPE FROM_COL ...
...
65748  99        5186
65748  99        5803
65748  99        5804
65748  99        5806
65748  99        5808
65748  99        5811
65748  99        5812
65748  99        9916
65748  99        9919
65748  99        9920
44055  99        5186
44055  99        5803
44055  99        5804
44055  99        5805
44055  99        5806
44055  99        5808
44055  99        5809
44055  99        5810
44055  99        5811
44055  99        5812
...
*/


DECLARE @ParameterKey INT
EXEC
dbo.Param_GetKey @ParameterKey OUTPUT


INSERT INTO Param_Concatenation
SELECT DISTINCT
      
ParameterKey = @ParameterKey,
      
ValueKey = od.ORDERID,
      
ValueSubKey1 = 'FROM_TYPE',
      
ValueSubKey2 = 'x',
      
Value = od.FROM_COL
  
FROM ORDERSDETAIL od
  
WHERE od.ORDERID IN ('65748','44055')

SELECT dbo.fn_Concatenate(  @ParameterKey,
                          
'65748',
                          
'FROM_TYPE',
                          
'x',
                          
',')
SELECT dbo.fn_Concatenate(  @ParameterKey,
                          
'44055',
                          
'FROM_TYPE',
                          
'x',
                          
',')

--Would product the following output
--Order 65748 would results will be:
---5186, 5803, 5804, 5806, 5808, 5811, 5812, 9916, 9919, 9920
--Order 44055 would results will be:
---5186, 5803, 5804, 5805, 5806, 5808, 5809, 5810, 5811, 5812

Subject: Ooh! This looks fun
Posted by: Phil Factor (view profile)
Posted on: Thursday, July 31, 2008 at 4:16 PM
Message: Here is my method. It works in both SQL Server 2000 and 2005. Goodness me, Anith, you're going to hate it.

DECLARE @accumulation VARCHAR(7000),
--variable used for accumulating lists
@CategoryID INT
--variable used for keeping tabs on the GROUPING id

DECLARE @grouping TABLE--temporary table
(
  
MyID INT IDENTITY(1, 1) PRIMARY KEY,
  
CategoryID INT,
  
ProductName VARCHAR(255),
  
accumulation VARCHAR(7000)--used to collect the list
)
INSERT INTO @Grouping --insert raw result you want a GROUPING of
(CategoryID, ProductName)
  
SELECT CategoryID, ProductName
      
FROM Northwind..Products
      
ORDER BY CategoryID, ProductName

UPDATE @grouping --and update the table, doing the accumulation.
  
SET @Accumulation = [accumulation]
      
= COALESCE(
              
CASE WHEN CategoryID <> COALESCE(@CategoryID, 0)
              
THEN '' + productName
              
ELSE LEFT(@Accumulation + ',' + productName, 7000)
          
END,
          
''),
      
@CategoryID = CategoryID

SELECT CategoryID, MAX(accumulation)
  
FROM @grouping
  
GROUP BY CategoryID
  
ORDER BY CategoryID

Subject: CLR methods
Posted by: Adam Machanic (view profile)
Posted on: Friday, August 01, 2008 at 9:09 AM
Message: Nice to see you here, Anith!

Regarding the CLR method, if you're not afraid to work with some in-memory structures you can both greatly improve performance and exceed the 8000-byte barrier. I suspect that FOR XML PATH will still deliver better performance in many cases, though.

One caveat of FOR XML PATH method is the issue of "entitization." Try the following:---
SELECT
  
n + ',' AS [text()]
FROM
(
  
SELECT 'a<b' AS n
  
UNION ALL
  
SELECT 'b>a'
) r
FOR XML PATH('')
---

...

TO fix this problem we need to play some rather annoying games:

---
SELECT
(
  
SELECT
  
(
    
SELECT
      
n + ',' AS [text()]
    
FROM
    
(
      
SELECT 'a<b' AS n
      
UNION ALL
      
SELECT 'b>a'
    
) r
    
FOR XML PATH(''), TYPE
  
) AS concat
  
FOR XML RAW, TYPE
).value('/row[1]/concat[1]', 'varchar(max)')
---

<plug>
I wrote about the CLR method in detail in "Expert SQL Server 2005 Development" (Apress, 2007)
</plug>


Subject: Re: Ooh! This looks fun
Posted by: Anith Sen (not signed in)
Posted on: Friday, August 01, 2008 at 9:31 AM
Message: Good job, Phil.

Since you are using a table variable you have control over the underlying indexes and so you don't have to worry about somebody else changing them and affecting the output.

I never completely understood the underlying evaluation scheme in t-SQL UPDATE EXTENSION. Is it right to left, all at once, random, I have no idea. Regardless, it seems to work fine here.

Another concern is the INSERT .. SELECT statement with an ORDER BY clause. Again, in this case, since you are using a table variable it seems like there shouldn't be any problems. There has been several online discussions (for instance, http://tinyurl.com/67yovl) and I believe the consensus now is that it should work as expected.

Subject: Re: CLR methods
Posted by: Anith Sen (not signed in)
Posted on: Friday, August 01, 2008 at 10:18 AM
Message: Hello Adam,

That is an excellent point. Rather than using REPLACE function, it is better to have the value method of the xml type. One issue seems to be, as you said, the "annoying games" with multiple levels of derivation. I am not completed sure how to proceed when there is a grouping column; Do you include the grouped column as an element in your xml value like:

SELECT
categoryid, Products
  
FROM ( SELECT p1.CategoryId,
       (
SELECT ProductName + ','
          
FROM Northwind.dbo.Products p2
          
WHERE p2.CategoryId = p1.CategoryId
          
ORDER BY ProductName
            
FOR XML PATH(''), TYPE, ELEMENTS,
      
ROOT ('productname') )  AS Products
      
FROM Northwind.dbo.Products p1
      
GROUP BY CategoryId ) AS list FOR XML RAW, TYPE

Thanks

Subject: grouping
Posted by: Adam Machanic (view profile)
Posted on: Friday, August 01, 2008 at 1:41 PM
Message: Agreed, grouping is a problem, but not just with XML -- virtually all of these solutions (except the cursor and CLR UDA) have quite a bit of unnecessary overhead because the engine doesn't support this stuff.

Regarding cursors, yet another option is a CLR stored procedure. You can read the data with SqlDataReader, do the concatenation with StringBuilder, and send all of the results back to the caller in a single rowset rather than stuffing them into a temp table. Much, much more efficient than a T-SQL cursor.

Subject: Good article
Posted by: PedroJ (not signed in)
Posted on: Saturday, August 02, 2008 at 12:19 AM
Message: This is a useful article. Some methods are very new to me.

Subject: Great article
Posted by: Anonymous (not signed in)
Posted on: Thursday, August 07, 2008 at 2:28 AM
Message: Recursive CTE methods is bit hard to understand

Subject: Here is another way of doing it
Posted by: Phil Factor (view profile)
Posted on: Thursday, August 07, 2008 at 8:56 AM