Anith Sen

Faking Arrays in Transact SQL

16 September 2008

It is a simple routine that we all need to use occasionally: parsing a delimited list of strings in TSQL. In a perfect relational world, it isn't necessary, but real-world data often comes in a form that requires one of the surprising variety of routines that Anith Sen describes, along with sage advice about their use.

Introduction

Sometimes SQL programmers come up with a requirement to use multi-valued columns or variables in an SQL query, especially when the data comes from somewhere else. Since there is no built-in support for any such datatype as an array or list in T-SQL, quite often folks use delimited VARCHAR strings to lump multiple values together.

We all know that having multiple values in a single column is against a fundamental tenet of the relational model, since an attribute in a relation can have only one value drawn from an appropriate type.

SQL tables, by their very nature, do not allow multiple values in their columns. This is a sensible stricture that programmers, even before the XML datatype appeared, have occasionally defeated with varying degrees of success. Users can interpret a single string as a combination of smaller strings, usually delimited by a character such as a comma or a space. Therefore, by leveraging the existing string functions, programmers can extract such smaller parts from the concatenated string.
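As a minimal illustration of the idea (not from the original article), CHARINDEX locates the first delimiter and SUBSTRING extracts everything before it:

    DECLARE @list VARCHAR(20) ;
    SET @list = 'red,green,blue' ;

    -- Everything before the first comma: returns 'red'
    SELECT SUBSTRING( @list, 1, CHARINDEX( ',', @list ) - 1 ) AS "first_item" ;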

It is very important to note that some of the methods we'll describe are kludgy, while others appear to violate the fundamentals of data independence altogether. Such methods are suggested here only for the sake of completeness, and are not recommended for use in production systems. If you are using any methods that are undocumented in the product manual, use them with due caution; all the relevant caveats apply. Basic recommendations, such as using proper datatype conversion techniques and avoiding positional significance for columns, must be considered for stable production code.

Considerations

Most of the methods of parsing an array and using it for data manipulation are used to insert data as multiple rows into tables. There are other occasions where the programmer has the task of cleaning and querying badly-designed data represented in a table.

The following sections illustrate a variety of methods one can employ to identify and enlist subsections of a string represented in a variable, parameter or even as a column value in a table. The examples use a comma separated list, commonly known as a CSV string where the value is represented as:
           'item-1,item-2,item-3....item-n'.
In practice, you can use any character, including a space, as the delimiter, and adapt the methods accordingly.

For the examples below, a few customer identifiers are randomly chosen from the Customers table in the Northwind database. Each example uses this list:

DECLARE @p VARCHAR(50)
SET @p = 'ALFKI,LILAS,PERIC,HUNGC,SAVEA,SPLIR,LONEP,GROSR'

Northwind is a sample database included with default installations of SQL Server 2000. You can download a copy from the Microsoft Download Center.

Direct Data comparison

For simple comparisons, there is no need for complicated routines. The inherent pattern matching features in Transact SQL can be used directly in most cases. One prime example is to pass in a list of values and use it in the IN list of a WHERE clause. Here are some common methods:

Using the CHARINDEX function:

    SELECT CustomerID, ContactName, CompanyName
      FROM Northwind.dbo.Customers
     WHERE CHARINDEX( ',' + CustomerID + ',', ',' + @p + ',' ) > 0 ;
/*
    CustomerID ContactName                    CompanyName                              
    ---------- ------------------------------ ------------------------------
    ALFKI      Maria Anders                   Alfreds Futterkiste
    GROSR      Manuel Pereira                 GROSELLA-Restaurante
    HUNGC      Yoshi Latimer                  Hungry Coyote Import Store
    LILAS      Carlos González                LILA-Supermercado
    LONEP      Fran Wilson                    Lonesome Pine Restaurant
    PERIC      Guillermo Fernández            Pericles Comidas clásicas
    SAVEA      Jose Pavarotti                 Save-a-lot Markets
    SPLIR      Art Braunschweiger             Split Rail Beer & Ale

    (8 row(s) affected)*/

Using pattern matching with PATINDEX:

    SELECT CustomerID, ContactName, CompanyName
      FROM Northwind.dbo.Customers
     WHERE PATINDEX( '%,' + CustomerID + ',%', ',' + @p + ',' ) > 0 ;

Using the LIKE operator for pattern matching:

    SELECT CustomerID, ContactName, CompanyName
      FROM Northwind.dbo.Customers
     WHERE ',' + @p + ',' LIKE '%,' + CustomerID + ',%' ;

String Parsing

In many cases, you may want to use the parsed list of values as a resultset that can be used in subsequent operations. For instance, for larger lists, it may be more effective to use a JOIN rather than an IN() clause. Another common scenario is multi-row inserts, where the list is parsed and the individual elements are inserted using a single INSERT statement.
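For example, a multi-row insert then becomes a single INSERT...SELECT over the parsed resultset. A sketch, assuming a hypothetical target table dbo.SelectedCustomers and the illustrative splitter function wrapped later in this article:

    -- dbo.SelectedCustomers and dbo.udf_parse_csv are illustrative, not from the article
    INSERT INTO dbo.SelectedCustomers ( CustomerID )
    SELECT value
      FROM dbo.udf_parse_csv( 'ALFKI,LILAS,PERIC' ) ;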

Using the PARSENAME() function

The PARSENAME function returns a SYSNAME value (equivalent to NVARCHAR(128)) and is generally used for identifying the parts of a database object name: object name, owner name, database name, and server name. However, it can also be used for parsing a string, provided the string is made up of fewer than five delimited values. Typical scenarios for this approach are splitting up a full name, identifying the parts of an IP address, and so on.

For example:

    DECLARE @FullName NVARCHAR(500)
    SET @FullName = N'Flintstone, Mr.Fred, Jr'

    SELECT PARSENAME(FName, 3) AS "Title",
           PARSENAME(FName, 2) AS "FirstName",
           PARSENAME(FName, 4) AS "LastName",
           PARSENAME(FName, 1) AS "Suffix"
      FROM ( SELECT REPLACE(@FullName, ', ', '.') ) D(FName)
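The same trick works for other dotted strings, such as the four octets of an IP address, since PARSENAME treats the period as its delimiter:

    SELECT PARSENAME( '192.168.10.24', 4 ) AS "octet1",  -- 192
           PARSENAME( '192.168.10.24', 3 ) AS "octet2",  -- 168
           PARSENAME( '192.168.10.24', 2 ) AS "octet3",  -- 10
           PARSENAME( '192.168.10.24', 1 ) AS "octet4" ; -- 24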

Solutions using a table of numbers

In most cases with larger strings, the faster solutions are often the ones using a table of sequentially incrementing numbers. However, the performance assertions in general should be taken with a grain of salt since, without testing, it is almost impossible to conclude which method performs better than another.  Before we go through the various solutions that use a table of numbers we'll run through a few of the approaches to creating a number table.

Creating a Number (Sequentially incrementing values) table in T-SQL

A table of monotonically increasing numbers can be created in a variety of ways. A base table, a view, or any expression that can create a sequence of numbers can be used in these scenarios. Some of the common methods to create a table of numbers are detailed in the next section.

Though sequentially incrementing values can be generated as part of a query, it is generally recommended that you create a permanent base table and insert as many numbers as you need for the various solutions. It is also advisable to make the number column a primary key, to avoid any potential duplication of rows.
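For instance, if a number table was created without a key, the constraint can be added afterwards; a minimal sketch, assuming the dbo.Nbrs table used in the examples below:

    ALTER TABLE dbo.Nbrs
      ADD CONSTRAINT PK_Nbrs PRIMARY KEY CLUSTERED (n) ;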

Identity based sequences

Using the default values for an identity column in a table, one can simply insert default rows to generate the sequence. One consideration with this approach is that it can be used only with permanent base tables. Note that, without any arguments, the IDENTITY property uses the value 1 for both seed and increment.

    CREATE TABLE dbo.Nbrs(n INT NOT NULL IDENTITY) ;
    GO
    SET NOCOUNT ON ;
    INSERT dbo.Nbrs DEFAULT VALUES ;
    WHILE SCOPE_IDENTITY() < 500
        INSERT dbo.Nbrs DEFAULT VALUES ;

With the IDENTITY() function, we can SELECT INTO a new table from any existing table. This example uses a CROSS JOIN between two existing tables to guarantee enough rows.

    SELECT TOP 500 IDENTITY(INT) AS n
      INTO dbo.Nbrs
      FROM Northwind.dbo.Orders o1
     CROSS JOIN Northwind.dbo.Orders o2 ;
Regular loops using WHILE

Here is a simple procedural loop that inserts one row at a time.

    CREATE TABLE dbo.Nbrs(n INT NOT NULL PRIMARY KEY) ;
    GO
    SET NOCOUNT ON ;
    DECLARE @Index INT ;
    SET @Index = 1 ;
    WHILE @Index <= 500 BEGIN
        INSERT dbo.Nbrs (n) VALUES (@Index) ;
        SET @Index = @Index + 1 ;
    END

The efficiency of the above loop can be improved using the following suggestion from a newsgroup posting by Itzik Ben-Gan, which doubles the number of rows with each pass.

    SET NOCOUNT ON ;
    DECLARE @Index AS INT ;
    SET @Index = 1 ;

    INSERT dbo.Nbrs SELECT 1 ;
    WHILE @Index * 2 < 500 BEGIN
        INSERT dbo.Nbrs
        SELECT @Index + n
          FROM dbo.Nbrs ;
        SET @Index = @Index + @@ROWCOUNT ;
    END
    INSERT dbo.Nbrs
    SELECT @Index + n
      FROM dbo.Nbrs
     WHERE @Index + n <= 500 ;
Sequences based on existing tables

Using an existing base table or view can be an easy way to generate a sequence of numbers, especially when you don't want to create a permanent table to support a single query. One drawback of such mechanisms is that, as the dataset gets larger, the comparative efficiency of the inserts can become an issue.

Here is an example using a correlated subquery based on the unique column.

    SELECT n
      FROM ( SELECT ( SELECT COUNT(*)
                        FROM Northwind.dbo.Orders o2
                       WHERE o2.OrderId <= o1.OrderId ) AS "n"
               FROM Northwind.dbo.Orders o1 ) Nbrs (n)
     WHERE n <= 500 ;

This is the same logic as above, but using a self join:

    SELECT COUNT(*) AS "n"
      FROM Northwind.dbo.Orders o1
     INNER JOIN Northwind.dbo.Orders o2
        ON o2.OrderId <= o1.OrderId
     GROUP BY o1.OrderId
    HAVING COUNT(*) <= 500 ;

Another quick way to generate the sequence is to use the ROW_NUMBER() function. This example uses a CROSS JOIN just to illustrate the ease of formulation:

    WITH Nbrs ( n ) AS (
        SELECT ROW_NUMBER() OVER (ORDER BY n) AS n
          FROM ( SELECT o1.OrderId
                   FROM Northwind.dbo.Orders o1
                  CROSS JOIN Northwind.dbo.Orders o2 ) D ( n )
        )
    SELECT n
      FROM Nbrs
     WHERE n BETWEEN 1 AND 500 ;
With ROW_NUMBER()

For each row returned in a resultset, the ROW_NUMBER function returns a sequential number, starting at 1. One can use any existing table, view or resultset to generate a ROW_NUMBER() expression. Here is an example:

    SELECT n
      FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY OrderId )
               FROM Northwind.dbo.Orders ) D ( n )
     WHERE n <= 500 ;

As a matter of fact, you can use an arbitrary expression, or built-in functions like CURRENT_TIMESTAMP or NEWID(), in the ORDER BY clause to generate the sequential numbers.
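For instance, when no meaningful ordering column exists, a constant subquery is a commonly used stand-in (a sketch along the lines of the previous example):

    SELECT n
      FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY (SELECT NULL) )
               FROM Northwind.dbo.Orders ) D ( n )
     WHERE n <= 500 ;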

Approaches using Common table expressions

Common table expressions are aliased resultsets derived from a simple query and defined within the execution scope of a single DML statement. A CTE, with an alias, an optional column list and a definition query, can help simplify complex queries and improve readability. In general, any query can be written as a CTE query, and there are certain instances where the CTE construct offers the best-performing query.

Using a CTE, some of the above methods for generating a sequence of numbers can be simplified.

Here is a simple transformation using a derived table of unit digits; the logic is pretty much self-explanatory.

    WITH Units ( nbr ) AS (
        SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION
        SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION
        SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION
        SELECT 9 )
    SELECT u3.nbr * 100 + u2.nbr * 10 + u1.nbr + 1
      FROM Units u1, Units u2, Units u3
     WHERE u3.nbr * 100 + u2.nbr * 10 + u1.nbr + 1 <= 500 ;

A recursive CTE is a common table expression that includes references to itself. With an anchor member and a recursive member, a recursive CTE can generate a number listing pretty efficiently. Also note that the default recursion limit of 100 can be changed by using OPTION (MAXRECURSION n) with the desired limit (0 denotes unlimited).

    WITH Nbrs ( n ) AS (
        SELECT 1 UNION ALL
        SELECT 1 + n FROM Nbrs WHERE n < 500 )
    SELECT n FROM Nbrs
    OPTION ( MAXRECURSION 500 ) ;

Here is a clever method, again by Itzik Ben-Gan, using a CROSS JOIN of nested CTE references. This is not only quick, but can also be used for generating larger datasets.

    WITH Nbrs_3( n ) AS ( SELECT 1 UNION SELECT 0 ),
         Nbrs_2( n ) AS ( SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2 ),
         Nbrs_1( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 ),
         Nbrs_0( n ) AS ( SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2 ),
         Nbrs  ( n ) AS ( SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2 )
    SELECT n
      FROM ( SELECT ROW_NUMBER() OVER (ORDER BY n)
               FROM Nbrs ) D ( n )
     WHERE n <= 500 ;

Parsing a delimited string using a Number Table

Now that we have our number table, we can use it in a number of different ways to split fake arrays. The general idea is to use the sequence value as the second argument of the SUBSTRING() function, and then use the WHERE clause to limit the number of splits with a terminating criterion. Note that this general logic can be written in a variety of ways.

    SELECT SUBSTRING( ',' + @p + ',', n + 1,
                CHARINDEX( ',', ',' + @p + ',', n + 1 ) - n - 1 ) AS "value"
      FROM Nbrs
     WHERE SUBSTRING( ',' + @p + ',', n, 1 ) = ','
       AND n < LEN( ',' + @p + ',' ) ;

Here is similar logic, from a query in a public newsgroup posting by Linda Wierzbecki.

    SELECT SUBSTRING( p, n + 1,
                CHARINDEX( ',', p, n + 1 ) - ( n + 1 ) ) AS "value"
      FROM Nbrs
      JOIN ( SELECT ',' + @p + ',' ) D ( p )
        ON SUBSTRING( p, n, LEN( p ) ) LIKE ',_%' ;

Here are a couple of methods that not only parse the delimited list, but return the position of each item in the list as well. This one uses a self join, and was popularized by Joe Celko in SQL newsgroup postings.

    SELECT SUBSTRING( ',' + @p + ',', MAX( n1.n + 1 ),
                n2.n - MAX( n1.n + 1 ) ) AS "value",
           COUNT( n2.n ) AS "pos"
      FROM Nbrs n1
      JOIN Nbrs n2 ON n1.n < n2.n
       AND n2.n <= LEN( ',' + @p + ',' ) + 1
     WHERE SUBSTRING( ',' + @p + ',', n1.n, 1 ) = ','
       AND SUBSTRING( ',' + @p + ',', n2.n, 1 ) = ','
     GROUP BY n2.n ;

The following is from a posting by Umachandar Jayachandran. It uses the REPLACE function to count the commas in the string and so calculate the position of each item value.

    SELECT SUBSTRING( p, n + 1, CHARINDEX( ',', p, n + 1 ) - n - 1 ) AS "value",
           LEN( SUBSTRING( p, 1, CHARINDEX( ',', p, n + 1 ) ) ) -
           LEN( REPLACE( SUBSTRING( p, 1,
                    CHARINDEX( ',', p, n + 1 ) ), ',', '' )) - 1 AS "pos"
      FROM Nbrs
      JOIN ( SELECT ',' + @p + ',' ) D ( p )
        ON n BETWEEN 1 AND LEN( p ) - 1
       AND SUBSTRING( p, n, 1 ) = ',' ;

The following method also gives you the positional value, and is simpler:

    SELECT SUBSTRING( @p, n, CHARINDEX( ',', @p + ',', n ) - n ) AS "value",
           n + 1 - LEN( REPLACE( LEFT( @p, n ), ',', '' ) ) AS "pos"
      FROM Nbrs
     WHERE SUBSTRING( ',' + @p, n, 1 ) = ','
       AND n < LEN( @p ) + 1 ;

With the ROW_NUMBER() and RANK() functions introduced in SQL Server 2005, one can create the positional values much more easily:

    SELECT SUBSTRING( ',' + @p + ',', n + 1,
                CHARINDEX( ',', ',' + @p + ',', n + 1 ) - n - 1 ) AS "value",
           ROW_NUMBER() OVER ( ORDER BY n ) AS "pos"
      FROM Nbrs
     WHERE SUBSTRING( ',' + @p + ',', n, 1 ) = ','
       AND n < LEN( ',' + @p + ',' ) ;

You can wrap any of these number-table methods in a table-valued UDF or a stored procedure, making them more generic, reusable and handy.
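As a sketch of that idea (the function name and its parameter are illustrative, not from the article), the first number-table query above can be wrapped as an inline table-valued function:

    -- Illustrative wrapper; assumes the dbo.Nbrs number table exists
    CREATE FUNCTION dbo.udf_parse_csv ( @list VARCHAR(8000) )
    RETURNS TABLE
    AS RETURN
     ( SELECT SUBSTRING( ',' + @list + ',', n + 1,
                  CHARINDEX( ',', ',' + @list + ',', n + 1 ) - n - 1 ) AS "value"
         FROM dbo.Nbrs
        WHERE SUBSTRING( ',' + @list + ',', n, 1 ) = ','
          AND n < LEN( ',' + @list + ',' ) ) ;
    GO

Usage is then, for example, SELECT value FROM dbo.udf_parse_csv( 'ALFKI,LILAS,PERIC' ).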

With Common Table Expressions

Most of the methods described above can be rewritten using common table expressions. Here is one way of using a CTE to get the starting and ending positions of each element, and then using SUBSTRING to extract the value:

    WITH CTE ( pos_begin, pos_end ) AS (
        SELECT n1.n, ( SELECT MIN( n2.n ) FROM Nbrs n2
                        WHERE SUBSTRING( ',' + @p + ',', n2.n, 1 ) = ','
                          AND n2.n > n1.n )
          FROM Nbrs n1
         WHERE n1.n <= LEN( ',' + @p + ',' ) - 1
           AND SUBSTRING( ',' + @p + ',', n1.n, 1 ) = ',' )
    SELECT SUBSTRING( @p, pos_begin, pos_end - pos_begin - 1 ) AS "Value"
      FROM CTE ;

Another concise approach with common table expressions is to use a recursive CTE. It does not require a table of sequentially incrementing numbers; however, it can be a bit inefficient for larger strings.

Here is an example.

    WITH CTE ( pos, pos_begin, pos_end ) AS (
        SELECT 0, 1, CHARINDEX( ',', @p + ',' )
        UNION ALL
        SELECT pos + 1, pos_end + 1, CHARINDEX( ',', @p + ',', pos_end + 1 )
          FROM CTE
         WHERE CHARINDEX( ',', @p + ',', pos_end + 1 ) > 0 )
    SELECT pos + 1, SUBSTRING( @p, pos_begin, pos_end - pos_begin ) AS "value"
      FROM CTE
    OPTION ( MAXRECURSION 0 ) ;

A similar approach is suggested by Hari Mohan, a systems analyst with BFL software:

    ;WITH cte (pos_begin, pos_end) AS
    (   SELECT 0, 1
         UNION ALL
        SELECT pos_end, CHARINDEX(',', @p, pos_end) + 1
          FROM cte
         WHERE pos_end > pos_begin )
    SELECT SUBSTRING(@p, pos_begin,
                CASE WHEN pos_end > 1
                     THEN pos_end - pos_begin - 1
                     ELSE LEN(@p) - pos_begin + 1
                END ) AS "value",
           RANK() OVER ( ORDER BY pos_begin ) AS "pos"
      FROM cte
     WHERE pos_begin > 0
    OPTION ( MAXRECURSION 0 ) ;

As mentioned before, any of the above-mentioned parsing routines can be written as a table-valued user-defined function or a view, and used directly in queries like:

    SELECT CustomerID, ContactName, CompanyName
      FROM Northwind.dbo.Customers c
      JOIN dbo.udf_parsed_list () p
        ON c.CustomerID = p.value ;

Methods that replace the delimiters

Apart from the approaches mentioned above, there are certain other tricks that can be used in Transact SQL. The following methods use dynamic SQL; they are a bit different from traditional queries, but can serve as an approach for smaller string-parsing requirements in certain cases.

Here is an example using dynamic SQL with an IN list in the WHERE clause:

    DECLARE @SQLx NVARCHAR(4000)
    SET @SQLx = N'
    SELECT CustomerID, ContactName, CompanyName
      FROM Northwind.dbo.Customers
     WHERE CustomerID IN ( ''' + REPLACE( @p, N',', N''',''' ) + N''' )' ;
    EXEC sp_executesql @SQLx ;

This is another illustration, replacing the element delimiters with ' UNION SELECT '. The results can be stored in a temporary table or table variable and then used in subsequent operations.

    DECLARE @tbl TABLE ( val VARCHAR(10) NOT NULL PRIMARY KEY ) ;
    DECLARE @SQLx VARCHAR(8000)
    SET @SQLx = 'SELECT ''' + REPLACE( @p, ',', ''' UNION SELECT ''') + ''''
    INSERT @tbl EXEC( @SQLx ) ;
    SELECT val FROM @tbl ;

A similar approach, proposed by Alejandro Mesa in a public newsgroup posting, uses separate INSERT statements. Here is an example, where @t holds the name of a valid table created prior to execution:

    DECLARE @SQLx VARCHAR(8000)
    SET @SQLx = 'INSERT ' + @t + ' VALUES (' +
                REPLACE( @p, ',', ' ) INSERT ' + @t + ' VALUES (') + ')'
    EXEC ( @SQLx ) ;

XML solutions

OPENXML function

An approach that is getting much attention is the OPENXML method. This method, though it may not be ideal for larger datasets, can be used effectively for a relatively small number of items, especially if the XML string is generated by a client application. The basic idea is to pass the values to the stored procedure as an XML document instead of as a CSV string. Here is an example. Note that you can use the REPLACE function to change a CSV string to XML format, say within a stored procedure; an example of that follows below.

    DECLARE @doc VARCHAR(500)
    DECLARE @XMLDoc INT
    SET @doc = '
    <ROOT>
    <Customer pos="1" id="ALFKI"></Customer>
    <Customer pos="2" id="LILAS"></Customer>
    <Customer pos="3" id="PERIC"></Customer>
    <Customer pos="4" id="HUNGC"></Customer>
    <Customer pos="5" id="SAVEA"></Customer>
    <Customer pos="6" id="SPLIR"></Customer>
    <Customer pos="7" id="LONEP"></Customer>
    <Customer pos="8" id="GROSR"></Customer>
    </ROOT>'

    EXEC sp_xml_preparedocument @XMLDoc OUTPUT, @doc ;
    SELECT pos, id
      FROM OPENXML ( @XMLDoc, '/ROOT/Customer', 1 )
           WITH ( pos INT, id VARCHAR(5) ) ;
    EXEC sp_xml_removedocument @XMLDoc ;

One could easily generate the XML document, with only the id values, from the CSV variable, like this:

    SELECT '
    <ROOT>
    <Customer id="' +
           REPLACE( @p, ',', '"></Customer>
    <Customer id="' ) + '"></Customer>
    </ROOT>'

The nodes() method

This feature was introduced in SQL Server 2005 along with the xml datatype. You can use the nodes() method to shred the document into rows, and then apply the value() method, as shown below, to extract the individual elements:

    DECLARE @doc XML ;
    SET @doc = '
    <ROOT>
    <Customer pos="1" id="ALFKI"></Customer>
    <Customer pos="2" id="LILAS"></Customer>
    <Customer pos="3" id="PERIC"></Customer>
    <Customer pos="4" id="HUNGC"></Customer>
    <Customer pos="5" id="SAVEA"></Customer>
    <Customer pos="6" id="SPLIR"></Customer>
    <Customer pos="7" id="LONEP"></Customer>
    <Customer pos="8" id="GROSR"></Customer>
    </ROOT>'
    SELECT D.element.value('@id', 'VARCHAR(5)'),
           D.element.value('@pos', 'INT')
      FROM @doc.nodes('/ROOT/Customer') AS D ( element )

Using a CLR function

With the popularity of CLR functions in SQL Server 2005 and beyond, string parsing has become much more versatile. It is easy to create a simple CLR function, deploy the assembly, and then use it as a function within T-SQL code. To run such a function you will have to enable the server's 'clr enabled' configuration option (shown after the function definitions below). Here is a C# function example that you can create in Visual Studio under a database project:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    using System.Collections;

    public class ParseList
    {
        [SqlFunction(FillRowMethodName = "FillRow")]
        public static IEnumerable CLRParseString(SqlString csv)
        {
            SqlString Delim_ = ",";
            return csv.Value.Split(Delim_.Value.ToCharArray(0, 1));
        }

        public static void FillRow(object value, out string csv)
        {
            csv = (string)value;
        }
    }

Once you have built this function and created the .dll, you can create an assembly and a table-valued function that can be accessed from within T-SQL:

    CREATE ASSEMBLY ParseList FROM 'C:\ParseList.dll'
    WITH PERMISSION_SET = SAFE
    GO
    CREATE FUNCTION CLRParseString ( @str NVARCHAR(4000))
    RETURNS TABLE ( val NVARCHAR(4000) )
    AS EXTERNAL NAME ParseList.ParseList.CLRParseString ;
    GO
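If CLR execution has not already been enabled on the server, it can be switched on with sp_configure; note that 'clr enabled' is a server-wide setting, so apply it with appropriate care:

    EXEC sp_configure 'clr enabled', 1 ;
    RECONFIGURE ;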

The usage is pretty straightforward:

SELECT * FROM CLRParseString( @p )

You can find several comparable CLR functions in newsgroups and blogs, along with heavy discussion of the performance implications of using them. For some of the popular benchmarks, I would recommend the article by Erland Sommarskog mentioned in the references section.

Simple WHILE loop

Another popular method is to use a procedural WHILE loop. Popular among programmers by virtue of its simplicity, this method is not particularly efficient for larger datasets. Here is an example (with declarations added so that the snippet runs against the @p list declared earlier):

    DECLARE @param VARCHAR(50), @val VARCHAR(10) ;
    SET @param = @p ;   -- work on a copy, since the loop consumes the string

    WHILE LEN( @param ) > 0 BEGIN
        IF CHARINDEX( ',', @param ) > 0
            SELECT @val = LEFT( @param, CHARINDEX( ',', @param ) - 1 ),
                   @param = RIGHT( @param, LEN( @param ) - CHARINDEX( ',', @param ) )
        ELSE
            SELECT @val = @param, @param = SPACE(0)
        EXEC( 'INSERT tbl VALUES (''' + @val + ''')' )
    END

Inserting parsed values into multiple columns

When the passed values are to be returned as column values in a single row, rather than as multiple rows, we can use the PIVOT operator. Since we will always know the number of columns in the resultset, formulating the query with a static PIVOT list is not too difficult.

    SELECT R.*
      FROM ( SELECT SUBSTRING( ',' + @p + ',', n + 1,
                    CHARINDEX( ',', ',' + @p + ',', n + 1 ) - n - 1 ) AS "value",
                    ROW_NUMBER() OVER ( ORDER BY n ) AS "pos"
               FROM Nbrs
              WHERE SUBSTRING( ',' + @p + ',', n, 1 ) = ','
                AND LEN( ',' + @p + ',' ) > n ) AS parsed
    PIVOT ( MAX("value") FOR pos IN ("1", "2", "3", "4", "5", "6", "7", "8") ) R

Conclusion

Be judicious, and apply common sense, when using any of these string-parsing routines in T-SQL. Each of the methods described above has benefits in certain scenarios and may not be appropriate for all situations. If you need a more detailed analysis of these and a few other methods, including performance considerations, consult the articles in the references section.

References

Acknowledgements

Aaron Bertrand, Erland Sommarskog, Umachandar Jayachandran, Linda Wierzbecki, Joe Celko, Alejandro Mesa, Kristen Hodges and Itzik Ben-Gan.

Anith Sen

Author profile:

Anith S Larson specializes in data management, primarily using SQL Server. Since the mid-1990s, he has been working on a variety of database design and systems development projects for clients, primarily in the wealth management and financial services industry. He resides in Lakeland, TN.



Subject: Great Article - Thanks
Posted by: jerryol (view profile)
Posted on: Wednesday, September 17, 2008 at 12:26 AM
Message: Great article man, thanks.

Subject: Impressive!
Posted by: KeithFletcher (view profile)
Posted on: Wednesday, September 17, 2008 at 3:19 AM
Message: Thanks for this! I use one or two of the methods you've mentioned here, a lot more often than I'd like to admit, so it's nice to see a few extra alternatives.

Never thought of going the XML route...

great article

Subject: Excellent
Posted by: Viswanath (not signed in)
Posted on: Wednesday, September 17, 2008 at 7:52 AM
Message: Excellent Study

Subject: Great article!
Posted by: Anonymous (not signed in)
Posted on: Wednesday, September 17, 2008 at 8:17 AM
Message: Thanks for the article and I just know that this information will be kept in my bag of tricks.

Subject: Sweet!
Posted by: Chad (view profile)
Posted on: Wednesday, September 17, 2008 at 9:44 AM
Message: There are several methods here I hadn't thought of!

Subject: Do you have a link to Itzik's newsgroup post?
Posted by: Trevor (not signed in)
Posted on: Wednesday, September 17, 2008 at 9:49 AM
Message: That you reference in the index + n section: "The efficiency of the above loop can be improved using the following suggestion from a newsgroup posting by Itzik Ben-Gan."

Subject: Re: Do you have a link to Itzik's newsgroup post?
Posted by: Anith Sen (not signed in)
Posted on: Wednesday, September 17, 2008 at 10:16 AM
Message: Trevor,

He has posted it several times, here is one:
http://groups.google.com/group/microsoft.public.sqlserver.programming/msg/cc9de24464c167fc

Anith

Subject: Using a UDF to parse delimited text to a table
Posted by: Anonymous (not signed in)
Posted on: Wednesday, September 17, 2008 at 10:17 AM
Message: I've been using this function for quite some time now. Works great.

http://www.codeproject.com/KB/database/SQL_UDF_to_Parse_a_String.aspx

Subject: What about xml?
Posted by: AJ Tech (not signed in)
Posted on: Wednesday, September 17, 2008 at 10:39 AM
Message: I would use xml. Code is simpler and more robust in a sense that I can validate it against a schema.

Subject: My current version
Posted by: Jason Hannas (view profile)
Posted on: Wednesday, September 17, 2008 at 12:02 PM
Message: I've been seeing a lot of these types of articles/postings lately. A few months ago, I took several ideas from the best of these articles and put together a function that's working very well for me. I don't take credit--the basic concept is someone else's idea (but I just can't remember who/where I learned it). It requires a numbers table, although you could use another method of generating a temporary sequence if you wanted.

CREATE FUNCTION dbo.fnListToTable
(
    @List varchar(max),
    @delimiter char(1)
)
RETURNS TABLE
AS
RETURN
(   SELECT
        ROW_NUMBER() OVER(ORDER BY Number) AS [Element],
        SUBSTRING(List, Number + 1, CHARINDEX(@delimiter, List, Number + 1) - Number - 1) AS [Value]
    FROM
        ( SELECT @delimiter + @List + @delimiter AS List ) AS l
        CROSS JOIN admin.tbl_Numbers AS n
    WHERE
        Number < LEN(List)
        AND SUBSTRING(List, Number, 1) = @delimiter
) ;


Subject: Use With Caution
Posted by: wbw (not signed in)
Posted on: Wednesday, September 17, 2008 at 8:41 PM
Message: I've used both the XML route and the CHARINDEX, primarily for input parameters that can be multivalued. It simplifies the capability to handle flexible data input, but potentially complicates the front-end code (i.e. non database code), especially the XML route.

Additionally, these types of fancy, "slick" and very non-standard fixes to complicated problems tend to cause nothing but confusion down the road.

And a final caveat: in the wrong hands, even the best intentions can go awry. As the author states, use sparingly.

Subject: great
Posted by: Anonymous (not signed in)
Posted on: Thursday, September 18, 2008 at 5:17 AM
Message: There are several methods here I hadn't thought of
and if you can give us more ! thanks

Subject: Jason Hannas version is best, I think
Posted by: Anonymous (not signed in)
Posted on: Thursday, September 18, 2008 at 9:06 AM
Message: I think the Jason Hannas post is close to the version found in Ben-Gan's book (Inside Microsoft SQL Server 2005: T-SQL Programming). It makes really clever use of the clustered index on the numbers table, and the performance is a million times better than any of the looping / xml versions. I no longer have the book, but I believe most of a chapter is dedicated to just this topic. Pick it up, the book is well worth it.

Subject: The best article on the subject i've ever seen.
Posted by: Theo Ekelmans (not signed in)
Posted on: Wednesday, September 24, 2008 at 2:49 AM
Message: nuff said.

Subject: I have used some in practice and tested a few others
Posted by: Anonymous (not signed in)
Posted on: Wednesday, September 24, 2008 at 3:36 AM
Message: And I found that everything based on number tables (especially dynamically generated ones) performs really badly as the string to be parsed grows.

Best all round, contrary to what the article states, is the procedural loop that returns a table (which usually needs to be indexed for best performance).

Also, I like to point out that the version of this listed in this article is about as bad (performance wise) as you can program it. There is TOO much data being moved around, like the redefinition of the input variable/parameter.

It is better to use charindex in combination with substrings and insert these substrings into the result table.

If you need to access a specific position in your delimited list, make a separate function for that; that too will perform tons better than generating a whole table only to access just 1 item in the list.

I think the article shows a lot of fancy SQL that, in practice, when found in production, people will have a hard time understanding what it does, how it does it and, most of all, what the intentions behind it are if not instructed beforehand.

Functions, in whatever form, will always be the most natural and self-describing from this point of view and should be preferred above all else!


Subject: have used some in practice and tested a few others (2)
Posted by: Anonymous (not signed in)
Posted on: Wednesday, September 24, 2008 at 3:56 AM
Message: Also really uber bad in the procedural loop example is the use of dynamic SQL. That's the real killer.

Here is a specialised function for distinct numbers (IDs):


create function dbo.fnSplitIDList( @IDs varchar(max) )
returns @parsed table( id int primary key clustered ) with schemabinding
as
begin
    -- Append comma
    set @IDs = @IDs + ','

    -- Indexes to keep the position of searching
    declare @p1 int
    declare @p2 int
    declare @v varchar(max)
    declare @n int

    -- Start from the first character
    set @p1 = 1
    set @p2 = 1

    while @p1 < len( @IDs ) begin
        set @p1 = charindex( ',', @IDs, @p1 )

        set @v = substring( @IDs, @p2, @p1 - @p2 )
        if @v != '' begin
            set @n = convert( int, @v );
            if not exists( select 1 from @parsed where id = @n ) begin
                insert into @parsed( id ) values( @n )
            end
        end

        -- Go to the next non-comma character
        set @p2 = @p1 + 1

        -- Search from the next character
        set @p1 = @p1 + 1
    end

    return
end
;



I also made a version for non-numeric substrings as well, more in line with the article, but not used in production yet and thus not heavily tested (TEST IT first).

-- Parses a list of substrings (max 896 chars per substring), delivering an indexed table to quickly search against
--
create function dbo.fnSplitList( @list varchar(max), @separator varchar(max) = ';' )
returns @parsed table
(
    elementID int identity primary key clustered with fillfactor = 100
  , elementValue varchar(896) -- max length possible due to use in index
  , unique ( elementValue, elementID )
) with schemabinding
as
begin
    -- Append separator
    set @list = @list + @separator;

    -- Indexes to keep the position of searching
    declare @p1 int;
    declare @p2 int;
    declare @v varchar(max);
    declare @ll int;
    declare @sl int;

    -- Detect list length
    set @ll = len( @list )

    -- Detect separator length
    set @sl = len( @separator );
    if isnull( @sl, 0 ) = 0 begin
        if isnull( @ll, 0 ) != 0 begin
            insert into @parsed( elementValue ) values( @list )
        end
        return
    end

    -- Start from the first character
    set @p1 = 1;
    set @p2 = 1;

    while @p1 < @ll begin
        set @p1 = charindex( @separator, @list, @p1 )

        set @v = substring( @list, @p2, @p1 - @p2 )
        insert into @parsed( elementValue ) values( @v )

        -- Go to the next non-separator character
        set @p2 = @p1 + @sl

        -- Search from the next character
        set @p1 = @p1 + @sl
    end

    return
end
;


The code here ain't for a beauty contest either, but that's due to the bad variable naming. I was in a real hurry when I wrote this, after wasting hours of time using the other, fancier stuff that was dog slow in practice!

Subject: Numbers/Tally Tables are the way
Posted by: Anonymous (not signed in)
Posted on: Wednesday, September 24, 2008 at 6:39 AM
Message: Hi there,

one of the best guides for this type of problem that I have ever seen came from Jeff Moden over at SQLServerCentral.com:

http://www.sqlservercentral.com/articles/T-SQL/63003/

Please check out any/all of his publications. I think he was raised speaking T-SQL from birth! :)

Subject: response to: Numbers/Tally Tables are the way
Posted by: Peter de Heer (not signed in)
Posted on: Wednesday, September 24, 2008 at 10:46 AM
Message: Based on the most suitable code I found at

http://www.sqlservercentral.com/articles/T-SQL/63003/


I created a new distinct ID list parser based on tally tables. This does indeed go faster in my isolated, simple test. It is worth probing deeper, though, and I do have some concerns with the idea behind tally tables.

First of all the concerns:

* Cross joining system tables to 'fake' a number of records depends on the implementation. This is tunable per system, fair enough, if you wrap the logic into an inline function.

* There is I/O involved where none needs to be from a functional point of view. Caching will alleviate this for the best part, so this is not a hot topic either if you tune right.

* The mere fact that DB tables are accessed makes any split operation non-deterministic. Hey!!! This is something to really consider. If you use tally-table techniques you risk being slower (relative to procedural functions with deterministic results), because the outcome of the function cannot be assumed to be the same even when the input is!

In my real code, this happens quite a lot and that would force anyone using it to store the outcome for reuse again...likely negating the speed benefit and adding custom code where you rather not see it.

Anyway... enough concerns... I made a few functions for people to try that solve a common theme: parsing IDs out of a comma-separated list of numbers, removing duplicates on the fly.

FUNCTION 1: The Tally table function

alter function dbo.fnTally( @input varchar(max) ) returns table
as
return
(
    select top ( isnull( len( @input ), 0 ) )
        row_number() over ( order by t1.object_id ) as 'N'
    from
        master.sys.All_Columns as t1 with( nolock )
        cross join master.sys.All_Columns as t2 with( nolock )
)
;


Function 2: A split/distinct function returning a indexed table

create function dbo.fnSplitIDList_Tally( @IDs varchar(max) )
returns @parsed table( id int primary key clustered )
as
begin
    set @IDs = ',' + @IDs + ',';

    insert into @parsed( id )
    select distinct
        convert( int, substring( @IDs, n + 1, charindex( ',', @IDs, n + 1 ) - n - 1 ) ) as 'data'
    from
        dbo.fnTally( @IDs )
    where
        n < len( @IDs ) and substring( @IDs, n, 1 ) = ',' and substring( @IDs, n + 1, 1 ) != ','
    ;

    return;
end
;


Function 3: Same as function two, but now inline and exposing the inner workings to the consuming query

create function dbo.fnSplitIDList_Tally_Inline( @IDs varchar(max) )
returns table
as
return
(
    select distinct
        convert( int, substring( ',' + @IDs + ',', n + 1, charindex( ',', ',' + @IDs + ',', n + 1 ) - n - 1 ) ) as 'data'
    from
        dbo.fnTally( ',' + @IDs + ',' )
    where
        n < len( ',' + @IDs + ',' ) and substring( ',' + @IDs + ',', n, 1 ) = ',' and substring( ',' + @IDs + ',', n + 1, 1 ) != ','
)
;


Let me know what you think of these and how they work (or not) for you!


Subject: response to: Numbers/Tally Tables are the way (2)
Posted by: Peter de Heer (not signed in)
Posted on: Wednesday, September 24, 2008 at 10:55 AM
Message: I forgot to mention one more aspect of implementation dependence.

The current and past implementations of SQL are bad at procedural code. But this is not because of inherent flaws... as in the end all code is executed procedurally (processors just work that way). Newer implementations (like Google's Chrome does now in the browser arena) can totally wipe the floor with any technique you implement today.

So don't write procedural implementations off beforehand in favor of set-based logic. If procedural works well enough for you now and is easier to implement, it is perfectly valid.

At some point DB developers will be out of tricks to optimise set-based code further, and procedural code will finally get some deserved loving, imho. It is just a matter of time!

Subject: Tough crowd...
Posted by: Jeff Moden (view profile)
Posted on: Saturday, September 27, 2008 at 12:42 PM
Message: Heh... tough crowd here. Lemme ask this of any of you who were kind enough to post code or make comments about performance... would you please post some decent test code to go along with your claims? Especially those of you who claim that procedural code actually has any chance at all of beating properly formed set based code?

Subject: Please, be real careful, folks...
Posted by: Jeff Moden (view profile)
Posted on: Saturday, September 27, 2008 at 5:33 PM
Message: Before any of you consider using any of the code presented in the article, do what the author has not... test for performance. If you want to shorten up the test list and save yourself some time, simply throw away anything and everything that has a WHILE loop in it and that includes the code allegedly written by some famous authors.

Be real careful folks... do some serious load testing before you take any of this code into production. And, if you do decide to build a Tally table, remember to put a clustered index on it. Lots of people forget to do that and then complain about how slow it is. ;-)

Once you're done checking for performance, check and make sure the code actually works the way you want it to. Some of the functions in the posts above won't return what you think they will.

Subject: No chance with the WHILE Loop
Posted by: Phil Factor (view profile)
Posted on: Thursday, October 02, 2008 at 8:03 AM
Message: I have to agree with Jeff that the WHILE loop solutions stand little chance, and the fancy XML stuff is way out of contention. I've just parsed the book of Genesis, and Chapter 1 of Moby-Dick, into individual words, using various methods. Even with a lot of text, the Number 'helper-table' technique is six times as fast as the next contender. However, with the whole of Moby-Dick in there, the WHILE loop seems to be faster. Never say 'never' to the WHILE loop.

To be fair to Anith, his article was aimed at saying what the various techniques were, not at running a performance beauty-pageant. The article was getting pretty long without it. Maybe for his next article? Maybe Jeff will pitch in with an article.

Subject: Interesting test...
Posted by: Jeff Moden (view profile)
Posted on: Sunday, October 05, 2008 at 9:14 PM
Message: I just got done doing the same test on the full text of Moby-Dick and I copied Phil's code exactly except for renaming the numbers table to "Tally" and it's column to "N". The WHILE loop lost pretty badly (While loop = 109 seconds, Tally table = 73 seconds). I'm pretty sure that changing the names aren't what did it. When Phil ran the code on his machine, the WHILE loop won quite nicely (While loop = 26 seconds, Tally table = 48 seconds). Something else is going on there but I can't speak of Phil's machine. It would well be worth a good study ;-)

I agree... Anith's fine article wasn't meant to be a "performance beauty-pageant". That's why I warned folks that some of that code listed in the article is dangerously slow and that they should test it for performance before using it.

And, my apologies to Anith... Looking back at my previous post, I used a really poor choice of words and I didn't mean to make it sound like I was coming down on him.

So far as articles go, most anything I'd write about here, I've already written on the "sister" (RedGate owns both) to this fine forum, SQLServerCentral.com. Here's the link to the "faking arrays" article I wrote there...

http://www.sqlservercentral.com/articles/T-SQL/63003/

Sorry... I don't know how to make the link above a "live" link on this forum.



Subject: Dupes
Posted by: Jeff Moden (view profile)
Posted on: Monday, October 06, 2008 at 6:15 AM
Message: Heh... and I must have really phat phingers... dunno why my last posted twice.

Subject: ::
Posted by: Niamat (not signed in)
Posted on: Tuesday, October 07, 2008 at 12:34 AM
Message: Thanks, dear Anith Sen. Knowledge sharing is double learning: a well-explained and well-exemplified article.

Wish you a good health and sharp brain that is so supportive.

Keep smiling

Subject: Fantastic article... and here's another twist!
Posted by: WinGraver (view profile)
Posted on: Tuesday, April 21, 2009 at 11:54 AM
Message: hey Anith,
This is a great article!... usually I'd skim through articles but this one I read from start to end.

I do however have a question... and maybe I'm just a wee bit lazy :) so thanks for any help here. How would you suggest handling multiple values where the string/column consists of name-value pairs? For example - 'type:sedan,colour:blue,year:1991,...etc'. I saw this done somewhere and thought it'd be interesting to tackle (though I haven't gotten around to it). Ideally, this should be converted to table(s)...

Thanks for any tips/tricks on the above and keep up the great articles!

Winston

Subject: Better late than never...
Posted by: Jeff Moden (view profile)
Posted on: Wednesday, June 24, 2009 at 1:47 AM
Message: Winston wrote: How would you suggest handling multiple values where the string/column consists of name-value pairs?

Heh... I guess they don't have email notification here.

Take a look at the following article, Winston...

http://www.sqlservercentral.com/articles/T-SQL/63003/

 
