31 July 2008

Concatenating Row Values in Transact-SQL

It is an interesting problem in Transact SQL, for which there are a number of solutions and considerable debate. How do you go about producing a summary result in which a distinguishing column from each row in each particular category is listed in a 'aggregate' column? A simple, and intuitive way of displaying data is surprisingly difficult to achieve. Anith Sen gives a summary of different ways, and offers words of caution over the one you choose.

Contents

Introduction

Many a time, SQL programmers are faced with a requirement to generate report-like resultsets directly from a Transact SQL query. In most cases, the requirement arises from the fact that there neither sufficient tools nor in-house expertise to develop tools that can extract the data as a resultset, and then massage the data in the desired display format. Quite often folks are confused about the potential of breaking relational fundamentals such as the First Normal Form or the scalar nature of typed values. (Talking about 1NF violations in a language like SQL which lacks sufficient domain support, allows NULLs and supports duplicates is somewhat ironic to begin with, but that is a topic which requires detailed explanations.)

539-ConcatenateClip2.jpg

By ‘Concatenating row values’ we mean this:
You have a table, view or result that looks like this…
…and you wish to have a resultset like the one below:

In this example we are accessing the sample NorthWind database and using the following SQL

The objective is to return a resultset with two columns, one with the Category Identifier, and the other with a concatenated list of all the Product Names separated by a delimiting character: such as a comma.

539-ConcatenateClip.jpg

Concatenating column values or expressions from multiple rows are usually best done in a client side application language, since the string manipulation capabilities of Transact SQL and SQL based DBMSs are somewhat limited. However, you can do these using different approaches in Transact SQL, but it is best to avoid such methods in long-term solutions

A core issue

Even though SQL, in general, deviates considerably from the relational model, its reliance on certain core aspects of relational foundations makes SQL functional and powerful. One such core aspect is the set based nature of SQL expressions (well, multi-sets to be exact, but for the given context let us ignore the issue of duplication). The primary idea is that tables are unordered and therefore the resultsets of any query that does not have an explicit ORDER BY clause is unordered as well. In other words, the rows in a resultset of a query do not have a prescribed position, unless it is explicitly specified in the query expression.

On the other hand, a concatenated list is an ordered structure. Each element in the list has a specific position. In fact, concatenation itself is an order-utilizing operation in the sense that values can be prefixed or post fixed to an existing list. So approaches that are loosely called “concatenating row values”, “aggregate concatenation” etc. would have to make sure that some kind of an order, either explicit or implicit, should be specified prior to concatenating the row values. If such an ordering criteria is not provided, the concatenated string would be arbitrary in nature.

Considerations

Generally, requests for row value concatenations often comes in two basic flavors, when the number of rows is known and small (typically less than 10) and when the number of rows is unknown and potentially large. It may be better to look at each of them separately.

In some cases, all the programmer wants is just the list of values from a set of rows. There is no grouping or logical partitioning of values such as  the list of email addresses separated by a semicolon or some such. In such situations, the approaches can be the same except that the join conditions may vary. Minor variations of the examples list on this page illustrate such solutions as well.

For the purpose of this article the Products table from Northwind database is used to illustrate column value concatenations with a grouping column. Northwind is a sample database in SQL Server 2000 default installations. You can download a copy from from the Microsoft Downloads

Concatenating values when the number of items is small and known beforehand

When the number of rows is small and almost known beforehand, it is easier to generate the code. One common approach where there is a small set of finite rows is the pivoting method. Here is an example where only the first four alphabetically-sorted product names per categoryid is retrieved:

 The idea here is to create a expression inside the correlated subquery that produces a rank (seq) based on the product names and then use it in the outer query. Using common table expressions and the ROW_NUMBER() function, you can re-write this as:

 Note that ROW_NUMBER() is a newly-introduced feature in SQL 2005. If you are using any previous version, you will have to use the subquery approach (You can also use a self-join, to write it a bit differently). Using the recently introduced PIVOT operator, you can write this as follows:

Not only does the syntax appear a bit confusing, but also it does not seem to offer any more functionality than the previous CASE approach. However, in rare situations, it could come in handy.

Concatenating values when the number of items is not known

When you do not know the number of items that are to be concatenated beforehand, the code can become rather  more demanding. The new features in SQL 2005 make some of the approaches easier. For instance, the recursive common table expressions (CTEs) and the FOR XML PATH(”) syntax makes the server do the hard work behind the concatenation, leaving the programmer to deal with the presentation issues. The examples below make this point obvious.

Recursive CTE methods

The idea behind this method is from a newsgroup posting by Vadim Tropashko. It is similar to the ideas behind generating a materialized path for hierarchies.

 The CASE in the recursive part of the CTE is used to eliminate the initial comma, but you can use RIGHT or the SUBSTRING functions instead. This may not be the best performing option, but certain additional tuning could be done to make them suitable for medium sized datasets.

Another approach using recursive common table expressions was sent in by Anub Philip, an Engineer from Sathyam Computers that uses separate common table expressions for the anchor and recursive parts.

On first glance, this query may seem a bit expensive in comparison, but the reader is encouraged to check the execution plans and make any additional tweaks as needed.

The blackbox XML methods

Here is a technique for string concatenation that uses the FOR XML clause with PATH mode. It was initially posted by Eugene Kogan, and later became common in public newsgroups.

There is a similar approach that was originally found in the beta newsgroups, using the CROSS APPLY operator.

You may notice a comma at the end of the concatenated string, which you can remove using a STUFF, SUBSTRING or LEFT function. While the above methods are deemed reliable by many at the time of writing, there is no guarantee that it will stay that way, given that the internal workings and evaluation rules of FOR XML PATH() expression in correlated subqueries are not well documented.

The problem with this approach is that the contents of the ProductName column is interpreted a XML rather than text, which will lead to  certain characters being ‘entitized’, or in some cases, leading to the SQL causing an error. (see note below, and solution by Adam Machanic in comments below) and to avoid this, it is better to use a slightly revised syntax like this..

…and this…

Using Common Language Runtime

Though this article is about approaches using Transact SQL, this section is included due to the popularity of CLR aggregates in SQL 2005. It not only empowers the CLR programmer with new options for database development, but also, in some cases, they work at least as well as native Transact SQL approaches.

If you are familiar with .NET languages, SQL 2005 offers a convenient way to create user defined aggregate functions using C#, VB.NET or similar languages that are supported by the Common Language Runtime (CLR). Here is an example of a string concatenate aggregate function written using C#.

Once you build and deploy this assembly on the server, you should be able to execute your concatenation query as:

 

 If you are a total newbie on CLR languages, and would like to learn more about developing database solutions using CLR languages, consider starting at Introduction to Common Language Runtime (CLR) Integration

Scalar UDF with recursion

Recursive functions in t-SQL have a drawback that the maximum nesting level is 32. So this approach is applicable only for smaller datasets, especially when the number of items within a group, that needs to be concatenated, is less than 32.

 

This function can be invoked as follows:

Table valued UDF with a WHILE loop

This approach is based on the idea by Linda Wierzbecki where a table variable with three columns is used within a table-valued UDF. The first column represents the group, second represents the currently processing value within a group and the third represents the concatenated list of values.

 

 The usage of the above function can be like:

 Dynamic SQL

This approach is a variation of the kludge often known using the nickname of  ’dynamic cross tabulation’. There is enough literature out there which demonstrates the drawbacks and implications of using Dynamic SQL. A popular one, at least from Transact SQL programmer’s perspective, is Erland’s Curse and Blessings of Dynamic SQL. The Dynamic SQL approaches can be developed based on creating a Transact SQL query string based on the number of groups and then use a series of CASE expressions or ROW_NUMBER() function to pivot the data for concatenation.

 The Cursor approach

The drawbacks of rampant usage of cursors are well-known among the Transact SQL community. Because they are generally resource intensive, procedural and inefficient, one should strive to avoid cursors or loop based solutions in general Transact SQL programming.

 Unreliable approaches

This section details a couple of notorious methods often publicized by some in public forums. The problem with these methods is that they rely on the physical implementation model; changes in indexes, statistics etc or even a change of a simple expression in the SELECT list or ORDER BY clause can change the output. Also these are undocumented, unsupported and unreliable to the point where one can consistently demonstrate failures. Therefore these methods are not recommended at all for production mode systems.

Scalar UDF with t-SQL update extension

It is rare for the usage of an expression that involves a column, a variable and an expression in the SET clause in an UPDATE statement to appear intuitive. However, in general, the optimizer often seems to process these values in the order of materialization, either in the internal work tables or any other storage structures.

Here is how to use this function:

 

Again, it is important to consider that lack of physical independence that is being exploited here before using or recommending this as a usable and meaningful solution.

Scalar UDF with variable concatenation in SELECT

This is an approach purely dependent on the physical implementation and internal access paths. Before using this approach, make sure to refer to the relevant knowledgebase article.

And, as for its usage:

Conclusion

Regardless of how it is used, “aggregate concatenation” of row values in Transact SQL, especially when there is a grouping, is not a simple routine. You need to consider carefully the circumstances  before you choose one method over another.  The most logical choice would to have a built-in operator with optional configurable parameters that can do the concatenation of the values depending on the type. Till then, reporting requirements and external data export routines will have to rely on such Transact SQL programming hacks.

References

Acknowledgements

Umachandar Jayachandran, Linda Wierzbecki, Bruce Margolin, Roy Harvey, Eugene Kogan, Vadim Tropashko, Anub Philip.

Additional information about the XML Blackbox approach inserted 22/03/2012.

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

This post has been viewed 711641 times – thanks for reading.

Tags: , ,

  • Rate
    [Total: 524    Average: 4.2/5]
  • Share

Anith Sen

View all articles by Anith Sen

  • DaveTheWave

    My Solution

    I use the following. This is not completely my code. I found most if not nearly all of this on the net.

    CREATE TABLE ParameterKeys
      
    (
        
    ParameterKey INT IDENTITY
                        
    PRIMARY KEY,
        
    DateCreated DATETIME NOT NULL
       )

    CREATE TABLE Param_Concatenation
      
    (
        
    ParameterKey INT REFERENCES ParameterKeys (ParameterKey)
          
    ON DELETE CASCADE,
        
    ValueKey VARCHAR(50) NOT NULL,
        
    ValueSubKey1 VARCHAR(50) NULL,
        
    ValueSubKey2 VARCHAR(50) NULL,
        
    Value VARCHAR(50) NOT NULL
       )
    GO
    CREATE PROCEDURE Param_GetKey
      
    @ParameterKey INT OUTPUT
    AS
       DELETE   FROM
    ParameterKeys
      
    WHERE    DateCreated < DATEADD(DAY, –7, GETDATE())
      
    INSERT   INTO ParameterKeys
                
    (DateCreated)
      
    VALUES   (GETDATE())
      
    SET @ParameterKey = SCOPE_IDENTITY()
    GO

    CREATE PROCEDURE Param_DeleteKey @ParameterKey INT
    AS
       DELETE   FROM
    ParameterKeys
      
    WHERE    ParameterKey = @ParameterKey
    GO

    CREATE FUNCTION fn_Concatenate
      
    (
        
    @ParameterKey INT,
        
    @ValueKey VARCHAR(50),
        
    @ValueSubKey1 VARCHAR(50),
        
    @ValueSubKey2 VARCHAR(50),
        
    @Delimiter VARCHAR(50)
       )
    RETURNS VARCHAR(8000)
    AS BEGIN
          DECLARE
    @tempTable TABLE (value VARCHAR(50))
          
    DECLARE @result VARCHAR(8000)
          
    SET @result = ‘PlaceHolderDummyText’

          INSERT   @tempTable
                  
    SELECT   Value
                  
    FROM     Param_Concatenation
                  
    WHERE    ParameterKey = @ParameterKey
                    
    AND ValueKey = @ValueKey
                    
    AND ValueSubKey1 = @ValueSubKey1
                    
    AND ValueSubKey2 = @ValueSubKey2
                  
    ORDER BY Value ASC

          IF @@ROWCOUNT > 0
            
    UPDATE   @tempTable
            
    SET      @result = @result + @Delimiter + ‘ ‘ + value

          RETURN (REPLACE(
                  
    REPLACE(@result, ‘PlaceHolderDummyText, ‘, ),
                  
    ‘PlaceHolderDummyText’,
                    
    ”  )
                )
      
    END
    /**
    USAGE
    If the ORDERSDETAIL table contains the following data:

    ORDERID FROM_TYPE FROM_COL …

    65748  99        5186
    65748  99        5803
    65748  99        5804
    65748  99        5806
    65748  99        5808
    65748  99        5811
    65748  99        5812
    65748  99        9916
    65748  99        9919
    65748  99        9920
    44055  99        5186
    44055  99        5803
    44055  99        5804
    44055  99        5805
    44055  99        5806
    44055  99        5808
    44055  99        5809
    44055  99        5810
    44055  99        5811
    44055  99        5812

    */

    DECLARE @ParameterKey INT
    EXEC
    dbo.Param_GetKey @ParameterKey OUTPUT

    INSERT INTO Param_Concatenation
    SELECT DISTINCT
          
    ParameterKey = @ParameterKey,
          
    ValueKey = od.ORDERID,
          
    ValueSubKey1 = ‘FROM_TYPE’,
          
    ValueSubKey2 = ‘x’,
          
    Value = od.FROM_COL
      
    FROM ORDERSDETAIL od
      
    WHERE od.ORDERID IN (‘65748’,‘44055’)

    SELECT dbo.fn_Concatenate(  @ParameterKey,
                              
    ‘65748’,
                              
    ‘FROM_TYPE’,
                              
    ‘x’,
                              
    ‘,’)
    SELECT dbo.fn_Concatenate(  @ParameterKey,
                              
    ‘44055’,
                              
    ‘FROM_TYPE’,
                              
    ‘x’,
                              
    ‘,’)

    –Would product the following output
    –Order 65748 would results will be:
    —5186, 5803, 5804, 5806, 5808, 5811, 5812, 9916, 9919, 9920
    –Order 44055 would results will be:
    —5186, 5803, 5804, 5805, 5806, 5808, 5809, 5810, 5811, 5812

  • Phil Factor

    Ooh! This looks fun
    Here is my method. It works in both SQL Server 2000 and 2005. Goodness me, Anith, you’re going to hate it.

    DECLARE @accumulation VARCHAR(7000),
    –variable used for accumulating lists
    @CategoryID INT
    –variable used for keeping tabs on the GROUPING id

    DECLARE @grouping TABLE–temporary table
    (
      
    MyID INT IDENTITY(1, 1) PRIMARY KEY,
      
    CategoryID INT,
      
    ProductName VARCHAR(255),
      
    accumulation VARCHAR(7000)–used to collect the list
    )
    INSERT INTO @Grouping –insert raw result you want a GROUPING of
    (CategoryID, ProductName)
      
    SELECT CategoryID, ProductName
          
    FROM Northwind..Products
          
    ORDER BY CategoryID, ProductName

    UPDATE @grouping –and update the table, doing the accumulation.
      
    SET @Accumulation = [accumulation]
          
    = COALESCE(
                  
    CASE WHEN CategoryID <> COALESCE(@CategoryID, 0)
                  
    THEN + productName
                  
    ELSE LEFT(@Accumulation + ‘,’ + productName, 7000)
              
    END,
              
    ),
          
    @CategoryID = CategoryID

    SELECT CategoryID, MAX(accumulation)
      
    FROM @grouping
      
    GROUP BY CategoryID
      
    ORDER BY CategoryID

  • Adam Machanic

    CLR methods
    Nice to see you here, Anith!

    Regarding the CLR method, if you’re not afraid to work with some in-memory structures you can both greatly improve performance and exceed the 8000-byte barrier. I suspect that FOR XML PATH will still deliver better performance in many cases, though.

    One caveat of FOR XML PATH method is the issue of “entitization.” Try the following:
    SELECT
      
    n + ‘,’ AS [text()]
    FROM
    (
      
    SELECT ‘a<b’ AS n
      
    UNION ALL
      
    SELECT ‘b>a’
    ) r
    FOR XML PATH()

    TO fix this problem we need to play some rather annoying games:


    SELECT
    (
      
    SELECT
      
    (
        
    SELECT
          
    n + ‘,’ AS [text()]
        
    FROM
        
    (
          
    SELECT ‘a<b’ AS n
          
    UNION ALL
          
    SELECT ‘b>a’
        
    ) r
        
    FOR XML PATH(), TYPE
      
    ) AS concat
      
    FOR XML RAW, TYPE
    ).value(‘/row[1]/concat[1]’, ‘varchar(max)’)
    <plug>
    I wrote about the CLR method in detail in “Expert SQL Server 2005 Development” (Apress, 2007)
    </plug>

  • Anith Sen

    Re: Ooh! This looks fun
    Good job, Phil.

    Since you are using a table variable you have control over the underlying indexes and so you don’t have to worry about somebody else changing them and affecting the output.

    I never completely understood the underlying evaluation scheme in t-SQL UPDATE EXTENSION. Is it right to left, all at once, random, I have no idea. Regardless, it seems to work fine here.

    Another concern is the INSERT .. SELECT statement with an ORDER BY clause. Again, in this case, since you are using a table variable it seems like there shouldn’t be any problems. There has been several online discussions (for instance, http://tinyurl.com/67yovl) and I believe the consensus now is that it should work as expected.

  • Anith Sen

    Re: CLR methods
    Hello Adam,

    That is an excellent point. Rather than using REPLACE function, it is better to have the value method of the xml type. One issue seems to be, as you said, the “annoying games” with multiple levels of derivation. I am not completed sure how to proceed when there is a grouping column; Do you include the grouped column as an element in your xml value like:

    SELECT categoryid, Products
      
    FROM ( SELECT p1.CategoryId,
           (
    SELECT ProductName + ‘,’
              
    FROM Northwind.dbo.Products p2
              
    WHERE p2.CategoryId = p1.CategoryId
              
    ORDER BY ProductName
                
    FOR XML PATH(), TYPE, ELEMENTS,
          
    ROOT (‘productname’) )  AS Products
          
    FROM Northwind.dbo.Products p1
          
    GROUP BY CategoryId ) AS list FOR XML RAW, TYPE


    Thanks

  • Adam Machanic

    grouping
    Agreed, grouping is a problem, but not just with XML — virtually all of these solutions (except the cursor and CLR UDA) have quite a bit of unnecessary overhead because the engine doesn’t support this stuff.

    Regarding cursors, yet another option is a CLR stored procedure. You can read the data with SqlDataReader, do the concatenation with StringBuilder, and send all of the results back to the caller in a single rowset rather than stuffing them into a temp table. Much, much more efficient than a T-SQL cursor.

  • PedroJ

    Good article
    This is a useful article. Some methods are very new to me.

  • Anonymous

    Great article
    Recursive CTE methods is bit hard to understand

  • Phil Factor

    Here is another way of doing it

    This method works in SQL Server 2000 upwards. As you’ll see, it goes about it quite a different, slightly cheating, way. As far as I know, Robyn and I invented it, though it took three cups of coffee.


    DECLARE @list VARCHAR(MAX)
    SELECT @List=COALESCE(@list+‘,’,)
       +
    ‘|’+CONVERT(VARCHAR(5),CategoryID)
       +
    ‘|’+ productName+‘|’+CONVERT(VARCHAR(5),CategoryID) +‘|’
      
    FROM northwind..products ORDER BY categoryID,productName

    SELECT CATEGORYid,
      
    [Product List]=LEFT(members,CHARINDEX(‘,|’,members+‘,|’)-1) FROM
    (
    SELECT categoryID, ‘members’=
            
    REPLACE(
              
    SUBSTRING(list,
                
    CHARINDEX(‘|’+CONVERT(VARCHAR(5),CategoryID)+‘|’,list),8000),
              
    ‘|’+CONVERT(VARCHAR(5),CategoryID)+‘|’, )
    FROM (SELECT ‘list’=@List)f
    CROSS JOIN (
      
    SELECT categoryID FROM northwind..products GROUP BY categoryID
      
    )Categories
    )g

  • rudy

    another approach…
    [code]SELECT CategoryId
    , GROUP_CONCAT(ProductName) AS ProductList
    FROM Northwind..Products
    GROUP
    BY CategoryId[/code]

    oh, wait, this how you do it in MySQL

    stupid open sores databases, making things easy!!!

    ;o)

  • Adam Machanic

    Re: another approach…
    Open sores? Sounds like a product I’d rather avoid.

  • TimothyAWiseman

    Excellent Article
    Excellent article and it had some techniques I had never considered before.

    Jeff Moden approached the same topic from a purely performance standpoint in: http://www.sqlservercentral.com/articles/Test+Data/61572/

    And some of the CTE portion is similar to http://www.sqlservercentral.com/articles/CTE/62404/ though that focuses on readability.

    Anith, your article was extremely thorough and enlightening.

  • Anith Sen

    Re: Here is another way of doing it
    That is a new one, Phil. And a good one:-) I see you build the “big” list and then split it up.

    Question, what happens if I change the ORDER BY clause as the following and why?

    ORDER BY LEFT( categoryID, 5 ) , productName

  • Anith Sen

    Re: Excellent Article
    Thanks for the links, Timothy. And for those comments as well 🙂

  • Phil Factor

    Re: : Here is another way of doing it
    Simple concatenation isn’t always reliable. There are circumstances where it simply doesn’t happen. This can vary from one release of SQL Server to another. I sometimes wish they’d tie it down! In the case you mention, there is no supported order for the concatenation to occur. The use of an expression in the ORDER BY forces a different execution plan which produces an indeterminate order for the concatenation, presumably because the order-by is postponed until after the concatenation is performed.

    I was brought up to believe that the ‘order by’ parameter should always be one of those in the result set. I’m not sure if that would help if the order by parameter was the result of a function.

    My own experience is that a simple SQL-92 query will result in a correct concatenation. The ‘Sybase’ update will also usually work reliably, but occasionally needs a ‘tickle’. If you pile in with the fancy stuff, you run a risk that the concatenation ends up with merely the last string added to it.

    Thanks for a fascinating article. It has certainly taught me several things.

  • Prashanth Kumar

    Excellent article
    This is an excellent article.

  • mjswart

    Join and Split
    A very thorough article on Joining/Concatenating strings.

    http://dbwhisperer.blogspot.com/2008/06/splitting-and-joining-strings-in-sql.html mentions joining (using xml path syntax) and it also talks about the inverse task of splitting.

    Splitting a comma-separated-values is a common task I think. Hopefully it will become less common with table valued parameters.

  • Phil Factor

    re: Join and split
    http://www.simple-talk.com/sql/t-sql-programming/the-helper-table-workbench/ might give you a few more ideas on the topic of splitting and joining strings. This article really is more about the problem of producing a list of distinguishing columns of all rows that are grouped together by each unique value in a ‘Group by’ in SQL Server, using concatenation. I’ve heard it called ‘The indexers’ grouping.

  • Anonymous

    join
    ALTER function [dbo].[GetBillStandardFeatures] (
    @UnitID int,
    @IsStandard bit
    )
    returns varchar(6000)
    as
    begin

    declare @str nvarchar(4000)
    declare @nm varchar(50)
    declare @temp varchar(100)
    set @str =”

    DECLARE RuleList CURSOR FOR
    select FeatureID from tblUnitFeature where UnitID = @UnitID
    OPEN RuleList

    FETCH NEXT FROM RuleList into @nm
    WHILE (@@FETCH_STATUS = 0)
    begin
    set @temp =’ ‘
    select @temp = Name from tblFeature where FeatureID in (@nm) and IsStandard = @IsStandard
    if @temp <> ‘ ‘
    begin
    set @str = @str + @temp + ‘ <br/> ‘
    end
    FETCH NEXT FROM RuleList into @nm
    end
    CLOSE RuleList
    DEALLOCATE RuleList

    return @str
    end

    i am using this way in this i pass the id from the out side and they join with “br” and return the string

    vipul
    Cybercom creation (Ahemdabad)
    dongavipul@gmail.com
    http://dongavipul.blogspot.com

  • Cary Hower

    Recursion and row pairing
    declare @t table
    ( CatID tinyint identity(1,1) primary key
    , CategoryID tinyint not null
    , ProductName varchar(4000) not null
    )

    declare @Incr tinyint
    , @Rows tinyint

    insert into @t (CategoryID, ProductName)
    select CategoryID
    , ProductName
    from Products
    order by CategoryID
    , ProductName

    select @Rows = @@RowCount
    select @Incr = 1

    while @Rows > 0
    begin

    update l
    set ProductName = l.ProductName + ‘,’ + r.ProductName
    from @t l
    join @t r
    on l.CategoryID = r.CategoryID
    and l.CatID + @Incr = r.CatID
    and (l.CatID – (select min(CatID) from @t where CategoryID = l.CategoryID)) % (@Incr * 2) = 0

    select @Rows = @@RowCount

    delete l
    from @t l
    where (CatID – (select min(CatID) from @t where CategoryID = l.CategoryID)) % (@Incr * 2) = case @Incr when 1 then 1 else @Incr end

    select @Incr = @Incr * 2
    end

    –A function can be created that returns @t without the identity column

  • Cary Hower

    Recursion and row pairing (correction)
    I apologize for an oversight. The delete statment case was unnecessary. “case @Incr when 1 then 1 else @Incr end” should have been simply “@Incr”

  • Cary Hower

    Recursion and row pairing (correction)
    I apologize for an oversight. The delete statment case was unnecessary. “case @Incr when 1 then 1 else @Incr end” should have been simply “@Incr”

  • Marzio Alunni

    Newbie’s consideration
    Hi,I have little experience with SQL server and I know almost nothing about CLR but reading this article I noticed that only the CLR method let encapsulate the concatenation logic in a separate function.
    It seems to me a big pro.
    The resulting query have far more grater readability, and avoid the overhead imposed by the other methods.
    I’m only concerned about performance… You think this is unmotivated, and due to my ignorance of the CLR subject?

    Thanks (I Apologize for my poor English… ;P)

  • Shamas saeed

    Shamas saeed Considration
    This is good article. You can show row values in one column using , seperated or any other character. This is good for extra knowledge but one thing must mention the we can concate these values in another way and i am doing this from year’s before.
    We can use coleasce function to do this as.

    Select @vProductName = COALESCE(@vProductName +’,’,”) + ProductName from Product

    Declare @vProductName a variable and return select result in , seperated values.

  • Anith Sen

    Re:Newbie’s consideration
    Marzio,
    As far as separating the concatenation logic, t-SQL UDFs should do it just fine. I am not familiar with the performance factors related to CLR. Perhaps, someone else may chime in,

    Anith

  • Anith Sen

    Re: Shamas saeed Considration
    Shamas,
    The method you suggested has some issues — it can be unpredictable even in some cases, doesn’t return any rows. I mentioned it in the article under the section “Unreliable approaches”. You can find several examples of failure discussed in the archives of google groups.

    Anith

  • Marzio Alunni

    Re: Re: Newbie’s consideration
    Sorry, I was not clear.
    When I wrote “only the CLR method let encapsulate the concatenation logic in a separate function”, I meant in a way could by easily reused for every query that need concatenation.
    As we can see from your examples, with UDF I can encapsulate concatenation logic for a specific column of a specific table, so if I need to concatenate another column from another table I have to write another UDF.
    Do you think its a good argument to prefer CLR aggregate on other options?

    Marzio

  • Marzio Alunni

    Re: Re: Newbie’s consideration
    Sorry, I was not clear.
    When I wrote “only the CLR method let encapsulate the concatenation logic in a separate function”, I meant in a way could by easily reused for every query that need concatenation.
    As we can see from your examples, with UDF I can encapsulate concatenation logic for a specific column of a specific table, so if I need to concatenate another column from another table I have to write another UDF.
    Do you think its a good argument to prefer CLR aggregate on other options?

    Marzio

  • niko_san

    new solution
    SELECT CategoryId,CAST(ProductName as nvarchar(max)) as ProductName into #Products from Northwind..Products
    order by CategoryId,ProductName

    declare @StringID nvarchar(max)
    declare @StringData nvarchar(max)
    SET @StringData = ”
    SET @StringID = ”

    UPDATE #Products
    SET
    @StringData = CASE WHEN CategoryId = @StringID
    THEN @StringData + ProductName+’,’
    ELSE ProductName+’,’ END,
    @StringID = CategoryId,
    ProductName = CASE WHEN CategoryId = @StringID
    THEN @StringData
    ELSE ProductName END
    select CategoryId,substring(max(ProductName),1,len(max(ProductName))-1) from #Products
    group by CategoryId
    drop table #Products

  • niko_san

    new solution
    SELECT CategoryId,CAST(ProductName as nvarchar(max)) as ProductName into #Products from Northwind..Products
    order by CategoryId,ProductName

    declare @StringID nvarchar(max)
    declare @StringData nvarchar(max)
    SET @StringData = ”
    SET @StringID = ”

    UPDATE #Products
    SET
    @StringData = CASE WHEN CategoryId = @StringID
    THEN @StringData + ProductName+’,’
    ELSE ProductName+’,’ END,
    @StringID = CategoryId,
    ProductName = CASE WHEN CategoryId = @StringID
    THEN @StringData
    ELSE ProductName END
    select CategoryId,substring(max(ProductName),1,len(max(ProductName))-1) from #Products
    group by CategoryId
    drop table #Products

  • jengo

    sybase list
    table:
    id name
    ———–
    1 test
    2 again
    1 work
    2 better

    sybase sql string:
    select id, list(name) from table_name group by id

    result:
    id list(name)
    —————–
    1 test ,work
    2 again ,better

  • Murali

    Good one
    I have learnt lot of things from this artical.

  • Chi

    length limit
    It works well and is an elegant solution, but when the result is more 255, the rest is cut off. Any idea how to work around this?

    Thanks!

  • Chi

    Re: length limit
    (I was referring to the For XML Path approach)

  • Chris Massey

    Anonymous Commenting Disabled
    Anonymous commenting has been disabled in this article due to spamming. If you want to add your comments, I’m afraid you’ll either need to to sign in or sign up. Sorry for the inconvenience.

  • Lee

    Kick Me
    Fine, I’ll paint the target on my T-shirt on this one: I have been using the “unreliable” string concatenation technique — “Scalar UDF with variable concatenation in SELECT” — since I started out doing Sybase in 1991. Years before, I might add, people began having heart murmurs over this technique. Over and over, I have been lectured on its “unreliability”, but in seventeen years, I have never seen it break.

    But… point taken. It is a good policy not to release code that relies on an unsupported behavior. I do not think it would be easy for Microsoft to break that particular behavior — they would have to change or take away altogether the ability to set a VARCHAR local variable with a SELECT, and physically do something to keep the implied loop in the SELECT from running. But that’s really beside the point. Thanks for a good article.

  • DBowlerHB

    CLR Error
    Great artical Anith!! I really liked the idea of using CLR to add this function so it could be reused easily. I tried to create the new strConcat function using your example word for word unfortunatly I’m getting an error when I build the solution.

    On the line:
    “private System.Collections.Generic.List values;”
    The word “List” is underlined with the following error:
    “Using the gerneric type ‘System.Collections.Gerneric.List<T>’ requires ‘1’ type arguments.”

    Because I am very new to C# I do not know how to fix this issue, and thought I would notify you that your sample may have a problem.

    Any ideas?

    ~DB

  • Adam Machanic

    String Concatenation Challenge/Contest
    Enjoyed this article?

    Good, now go try to solve a T-SQL string concatenation challenge and win a prize:

    http://sqlblog.com/blogs/adam_machanic/archive/2009/02/27/t-sql-challenge-grouped-string-concatenation.aspx

    🙂

  • Dennis

    Re: Shamas saeed Considration
    Like Lee, I use this method a lot. I accept that the order of items in the concatenated list is not reliable. It doesn’t even make sense to Order By on a column that couldn’t logically be present in the result set.

    Aside from that, I don’t understand any purported dependency on physical implementation, indexing or the like.

    Can anybody explain that or give a test case where the concatenation method doesn’t work?

  • RBarryYoung

    Actually, variable concatenation in SELECT is Reliable
    For SQL Server 2005 at least, it is not correct to say that variable concatenation in SELECT is either unreliable or dependent on the physical implementation. The example given is unreliable because it has an error in that it lacks an ORDER BY clause to insure the proper ordering. Here is a correct implementation:

    CREATE FUNCTION dbo.udf_select_concat ( @c INT )
    RETURNS VARCHAR(MAX) AS BEGIN
    DECLARE @p VARCHAR(MAX) ;
    SET @p = ” ;
    SELECT @p = @p + ProductName + ‘,’
    FROM Northwind..Products
    WHERE CategoryId = @c
    ORDER BY ProductName ;
    RETURN @p
    END

    This is all documented and supported straight out of BOL. Finally, the referenced knowledgebase article is not relevant because it is only for SQL Server 2000 and 7.0.

  • Dennis

    Re: Shamas saeed Considration
    Like Lee, I use this method a lot. I accept that the order of items in the concatenated list is not reliable. It doesn’t even make sense to Order By on a column that couldn’t logically be present in the result set.

    Aside from that, I don’t understand any purported dependency on physical implementation, indexing or the like.

    Can anybody explain that or give a test case where the concatenation method doesn’t work?

  • Anith

    Variable concatenation in SELECT
    RBarryYoung and Dennis,

    There are several cases where it fails. Several forum members in Microsoft newsgroups have posted several examples, at least from version 7.0 onwards. For some examples, see:

    http://groups.google.com/group/microsoft.public.sqlserver.programming/msg/70917b5ca789ea77?dmode=source

    http://groups.google.com/group/microsoft.public.sqlserver.programming/msg/371167d94545e1d7?hl=en&dmode=source

    The problem is several people ignore these warnings and keep using them only to find one day it suddenly stopped working!

    Anith

  • Stephan Koch

    Much Simpler Way
    Hi Anith

    This is how I do it. Not sure why nobody else sems to do it this way:

    — Initialise variables
    DECLARE @values varchar(MAX)
    set @values = ”
    — Get the string
    Select @values = @values + ‘[‘ + ProductName + ‘],’ from Northwind..products
    — Strip out the last comma
    select @values = left(@values, len(@values)-1)
    Print @values

    This code is not entirely complete, since it will fall over when there are no rows in the product table, but you get the idea.

    I use this approach to build values for variable PIVOT clauses.

  • Stephan Koch

    Much simpler way part 2
    Oops – just realised that I was solving a different problem – mine only works for a single group at a time… sorry for confusion.

  • Stephan Koch

    Much simpler way part 2
    Oops – just realised that I was solving a different problem – mine only works for a single group at a time… sorry for confusion.

  • GeoffS

    Anub Philip’s method
    Hi, All, first time on this site for me and very, very useful.

    I just want to add my positive comment on the method of Anub Philip – for my application, this runs in less than a second compared to ~2:35:00 (that’s 2 minutes 35 sec) for exactly the same result using the previous CTE method.

    The pre-ranking of the records for the CTE was a huge benefit.

    The slower method did run quite fast on a simpler data set, but sank on the more complex table.

    Geoff

  • suresh123

    i have a problem .. pls solve it
    Actually i have to run this method on huge table of around 65000 rows, and the column to be concatenated is ntext field , i cannot change to varchar because of data lose…
    when i use this “Scalar UDF with t-SQL update extension”
    My notes get truncated …
    Any other method to do my job

  • PeggyScott84

    How do I add new lines instead of commas?
    I am trying to concatenate row values with newlines in stead.
    I need to display the query results on a ColdFusion 8 website.

    It sounds basic but I have tried using char(10) char(13)

    n, nr, r

    tried using <br>, </br>

    nothing works.

    Any help/suggestion is welcome!
    Thanks!

  • BugMeNot

    Common Language Runtime – Error In Solution
    The code in the “Using Common Language Runtime” section of this article results in the following error:

    Using the generic type ‘System.Collections.Generic.List<T>’ requires ‘1’ type arguments

    To correct this error, replace all references to “List” with “List<string>”.

  • KLynch0803

    Very interesting article..
    I love this article problem is I dont understand it completely.. I’m far from a programmer but I’m trying to manipulate some data I have to insert into a form im updating values in. Can someone please help me edit this process to wrok in MSAccess?

    I have the following data with multiple City and state Values that are equal but unique zipcodes. This list is 43,000 rows long:

    tbl_City_State
    City StateName ZipCode

    I need an output like such for each City state without duplicate city state names…

    Raleigh NC 29706 29707 29708

    Meaning if there are 3 or 50 Raleigh NC in the table it will make one and add all the zip codes to the one cell with the city state name.

    I would appreciatte any help in advance I have been trying to figure out how to do this in excel or access for about 2 weeks now.

  • KLynch0803

    Very interesting article..
    I love this article problem is I dont understand it completely.. I’m far from a programmer but I’m trying to manipulate some data I have to insert into a form im updating values in. Can someone please help me edit this process to wrok in MSAccess?

    I have the following data with multiple City and state Values that are equal but unique zipcodes. This list is 43,000 rows long:

    tbl_City_State
    City StateName ZipCode

    I need an output like such for each City state without duplicate city state names…

    Raleigh NC 29706 29707 29708

    Meaning if there are 3 or 50 Raleigh NC in the table it will make one and add all the zip codes to the one cell with the city state name.

    I would appreciatte any help in advance I have been trying to figure out how to do this in excel or access for about 2 weeks now.

  • melhorum

    Very Good Men.
    it’s clean code,thanks.

  • dmc

    Simpler….
    Why not this solution??

    Problem: To show all the cities as comma separated list for each region

    use northwind

    –raw data…
    select region,city from customers where len(region)=2 order by 1,2

    region city
    ————— —————
    AK Anchorage
    BC Tsawassen
    BC Vancouver
    CA San Francisco
    DF Caracas
    ID Boise
    MT Butte
    NM Albuquerque
    OR Elgin
    OR Eugene
    OR Portland
    OR Portland
    RJ Rio de Janeiro
    RJ Rio de Janeiro
    RJ Rio de Janeiro
    SP Campinas
    SP Resende
    SP Sao Paulo
    SP Sao Paulo
    SP Sao Paulo
    SP Sao Paulo
    WA Kirkland
    WA Seattle
    WA Walla Walla
    WY Lander

    (25 row(s) affected)

    — cities within each region
    select
    region,
    (select stuff((select distinct ‘,’ + city as [text()]
    from customers
    where region=c.region
    order by ‘,’ + city
    for xml path(”)),1,1,”)) as ‘cities’

    from customers c
    where len(region)=2 –limiting view
    group by region

    region cities
    ————— ——————————-
    AK Anchorage
    BC Tsawassen,Vancouver
    CA San Francisco
    DF Caracas
    ID Boise
    MT Butte
    NM Albuquerque
    OR Elgin,Eugene,Portland
    RJ Rio de Janeiro
    SP Campinas,Resende,Sao Paulo
    WA Kirkland,Seattle,Walla Walla
    WY Lander

    (12 row(s) affected)

    no functions, cursors, declares, etc… am I missing something??

  • Anith

    Re: Why not this solution??
    dmc,

    No, you are not missing anything. It is already addressed in the article under the section “The blackbox XML methods”.

    Anith Larson

  • dmigliore

    Great examples
    Thanks for the examples of the various methods. This gives me more choices in how to solve the concatenation problem.

  • originator

    Concat using XQuery
    Hi guys!

    I’d like to mention another one fast, XML-based and “entity”-safe method:

    declare @iXml xml;

    SELECT @iXml = (
    SELECT ProductName + ‘,’
    FROM Northwind.dbo.Products
    FOR XML PATH);

    select @iXml.value(‘string(/)’,’nvarchar(max)’);

    Good luck!

  • kalpesh

    Migrating from Oracle to SQL Server – Aggregate function
    Hi, Need help in converting the StrAgg (ODCIAggregate from Oracle) function (http://download.oracle.com/docs/cd/B14117_01/appdev.101/b10800/dciaggfns.htm#BEJBEBHE) to SQL Server compatible.

    There are many Views which use this function in generic way. Don’t want to do it in C# so as to keep in SQL specific.

    Select
    Docid,
    Protocol,
    StrAgg(Jan) as ‘Jan’,
    StrAgg(Feb) as ‘Feb’,
    StrAgg(Mar) as ‘Mar’,
    StrAgg(Apr) as ‘Apr’,
    StrAgg(May) as ‘May’,
    StrAgg(Jun) as ‘Jun’,
    StrAgg(Jul) as ‘Jul’,
    StrAgg(Aug) as ‘Aug’,
    StrAgg(Sep) as ‘Sep’,
    StrAgg(Oct) as ‘Oct’,
    StrAgg(Nov) as ‘Nov’,
    StrAgg(Dec) as ‘Dec’
    From
    (
    Select
    Docid,
    P.Protocol,
    Case When To_Char(PL.PamDt,’Mon’) = ‘Jan’ Then To_Char(PL.ActDt,’DD-Mon’) ELSE NULL END As Jan,
    Case When To_Char(PL.PamDt,’Mon’) = ‘Feb’ Then To_Char(PL.ActDt,’DD-Mon’) ELSE NULL END As Feb,
    Case When To_Char(PL.PamDt,’Mon’) = ‘Mar’ Then To_Char(PL.ActDt,’DD-Mon’) ELSE NULL END As Mar,
    …… Dec
    From
    (
    Select
    T.DocId,
    T.Protocol
    From
    PAMSTR T,
    CITY Cy
    Where Cy.CNum <= T.CycleNum
    ) P,
    PAMSTRDTL Pl
    Where
    P.DocId = Pl.DocId (+)
    And P.CNum = Pl.ActCNum (+)
    Order By CNum,PamDt
    )
    Group by DocId, Protocol

    If I execute without Aggregate I get:
    Docid Protocol Jan Feb Mar
    1 P1 01-Jan
    1 P1 02-Jan 02-Mar
    2 P2 02-Mar
    2 P2 01-Jan 05-Mar

    If I execute with Aggregate I get:
    Docid Protocol Jan Feb Mar
    1 P1 01-Jan, 02-Jan 02-Mar
    2 P2 01-Jan 02-Mar, 05-Mar

    Any guidance appreciated to get generic solution.

  • ISAKOS

    Peeling back
    Great article,
    it solved my issue,
    I would like to say thank to all of you,
    regarding to Anith Sen

  • multispective

    create an index view using these queries
    good explanations

    since I want to join the csv results against other tables, i would like to create indexed views using the queries described in this article.
    i tried to create indexed views using the two query techniques in the article: CTE and XML.
    But sql server prevents me from creating an index on views that use CTE and subquery.
    this is for a large table so are there any other fast ways to be able to have the csv results?
    thanks

  • Halalelue

    Can you create new record
    Hi, i like to know how to create new record after the fourth records? Using the CLR routine?

  • youcantryreachingme

    Insane
    Fair dinkum, MSSQL drives me mental.

    Sybase Adaptiver Server Anywhere (ASA) has an aggregate list function.

    Your query becomes:

    SELECT CategoryId, list(ProductName)
    FROM Northwind..Products
    GROUP BY CategoryId

    How hard is that?

    It’s like how MSSQL won’t let you use an alias in a WHERE clause, arguing the values of the columns haven’t been calculated at the time the WHERE is being evaluated. Poor excuse. ASA has no trouble referring to aliases in WHERE clauses.

    http://dcx.sybase.com/1101/en/dbreference_en11/list.html