Click here to monitor SSC
  • Av rating:
  • Total votes: 85
  • Total comments: 9
Paul Nielsen

Switching rows and columns in SQL

04 February 2010

When they use SQL Server, one the commoner questions that Ms Access programmers ask is 'Where's the TRANSFORM/PIVOT command? So how do you swap colums and rows in an aggregate table? Do you really need to use a CLR routine for this?

Recently, I came across a description of a transformation operation using SQL Server CLR – the goal is to convert, or transpose, columns into rows and rows into columns. Microsoft ACCESS has a SQL TRANSFORM /PIVOT extension to SQL that automatically produces a crosstab query, and SQL Server 2005 introduced the pivot command, but neither simply swaps columns and rows. The problem was to replicate this sort of functionality in SQL Server. It’s possible to write a CLR function to transform the data, however, I believe that a set-based T-SQL query is the preferred solution to this type of problem.

In the example of the CLR transform function, the source data included columns for PaymentDate, and three types of payments:

 

Once transformed the result would swap the columns and rows to this configuration:

The following script creates the test table and populates it with the sample test data:

Use tempdb;

 

CREATE TABLE Payment (

  PaymentID INT NOT NULL IDENTITY

    CONSTRAINT pkPayment PRIMARY KEY,

  PaymentDate Date Not Null,

  Medicare INT NOT NULL,

  Medicaid INT NOT NULL,

  VA INT

  );

   

INSERT Payment (PaymentDate, Medicare, Medicaid, VA)

  VALUES ('1/1/2009', 30, 20, 5),

         ('1/2/2009',30 ,20 ,5),

         ('1/3/2009',29, 20 ,6),

         ('1/4/2009',29, 19, 6)

The root cause behind this transformation is that the three payment columns violate first normal form. Column names that include data (Medicare, Medicaid, VA) is clear indication that first normal form is being ignored with repeating columns. Had the source data been properly normalized there would been no need to transform the columns and rows. If you find yourself needing to perform this type of transformation regularly, then you have a problem with your schema design. A normalized version of the source table should have columns for PaymentDate, PaymentType, and Amount.

But assuming the denormalized data is all that’s available, Microsoft provides a straight-forward solution using the Pivot and UnPivot commands available since SQL Server 2005. The key is recognizing that the source data violated first normal form, and that the unpivot command is designed specifically to normalize this type of unnormalized data. In the query below, the sq subquery first uses an unpivot command to normalize the data which is then passed to the pivot command in the outer query. Pairing the unpivot and pivot command makes quick work of the transformation task. The only trick in the query is appending a “d” before each date so the pivot command will accept it as a column name:

SELECT Type, d20090101 as '01-01-2009',

             d20090102 as '01-02-2009',

             d20090103 as '01-03-2009',

             d20090104 as '01-04-2009'

  FROM (SELECT PaymentDate, Amount, Type

          FROM (SELECT 'd'+ Convert (Char(10), PaymentDate, 112) as PaymentDate,

                  Medicare, Medicaid, VA

                  FROM Payment) as sq_source

          UNPIVOT (Amount FOR Type IN

            (Medicare, Medicaid, VA)) as sq_up

         ) as sq 

     PIVOT (

        MIN(Amount)

        FOR PaymentDate IN

           (d20090101, d20090102, d20090103, d20090104)

           ) as p;

We make some assumptions about the nature of the data when we use the MIN aggregate function. The routine expects to ignore duplicate entries for a particular PaymentDate if there are any.

Before SQL Server 2005, you would have had to use the slightly more convoluted code ...

-- SQL 7/2000 method

-- Transform using Select/Union - Sum(Case) Crosstab (SQL Server 7/2K Method) 

 

SELECT TYPE,

  SUM(CASE WHEN PaymentDate = '2009-01-01' THEN Amount ELSE 0 END ) AS '2009-01-01',

  SUM(CASE WHEN PaymentDate = '2009-01-02' THEN Amount ELSE 0 END ) AS '2009-01-02',

  SUM(CASE WHEN PaymentDate = '2009-01-03' THEN Amount ELSE 0 END ) AS '2009-01-03',

  SUM(CASE WHEN PaymentDate = '2009-01-04' THEN Amount ELSE 0 END ) AS '2009-01-04'

 

  FROM (SELECT PaymentDate, Medicare AS Amount, 'Medicare' as Type

          FROM Payment

        UNION

        SELECT PaymentDate, Medicaid AS Amount, 'Medicaid' as Type

          FROM Payment

        UNION

        SELECT PaymentDate, VA AS Amount, 'VA' as Type

          FROM Payment) AS NormalizedData

  GROUP BY Type             

You'll see that this would give different results from the first solution if there is duplicate data in the original aggregate table 

There have been several attempts to create a generic solution that emulates the TRANSFORM/PIVOT functionality ,(Rob Volk's being probably the best known, Dynamic Cross-Tabs/Pivot Tables, but see also Creating cross tab queries and pivot tables in SQL) but the PIVOT/UNPIVOT functionality of SQL Server 2005 onwards makes such devices unnecessary.

The example provides two lessons for SQL server designers and developers. First – don’t underestimate the power of the set-based query. Turning to the CLR for a solution when T-SQL can solve the problem is nearly always an error. Perhaps a more significant lesion is that there’s a performance cost to denormalization when additional code is required to re-normalize the data. The programmer’s cliché, “Normalize till is hurts, then denormalize till it works,” simply isn’t true.

 

Paul Nielsen

Author profile:

Paul Nielsen is a hands-on database developer, Microsoft SQL Server MVP, and trainer specializing in data architecture and database development using Microsoft SQL Server technologies. Active in the SQL Server community, Paul is the and founder of Colorado PASSCamp. Paul presents around the world at conferences such as Microsoft Tech Ed (Dev), SSWUG Virtual Conference, SQL Teach (Canada), SQL Open World (Denmark), and devLINK (Nashville), and PASS Summit. Besides holding several certifications, Paul is an instructor with Learning Tree, served on the Microsoft Education Domain Objectives panel for SQL Server 2005, and was the Design-SME (subject matter expert) for the Microsoft Official Course, 2784: Tuning and Optimizing Queries using Microsoft SQL Server 2005. Paul has been developing data-centric solutions since 1982, and was the Enterprise Data Architect for Compassion International, a SQL Server instructor with Learning Tree, the technical editor for a database magazine, and a U. S. Navy Submariner (Data Systems Tech Petty Officer).

Search for other articles by Paul Nielsen

Rate this article:   Avg rating: from a total of 85 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: Not a normalization issue
Posted by: Anonymous (not signed in)
Posted on: Monday, February 08, 2010 at 4:18 AM
Message: Fine example of how to pivot. Less good is the claim that "three payment columns violate first normal form". That isn't so and it repeats a common misunderstanding of what 1NF means.

A table is in 1NF if it accurately represents a relation - ie. no duplicate rows or nulls permitted. 1NF requires nothing about the attributes of a relation other than that their values must all be drawn from some specific domain (which is implicit in the definition of a relation or SQL table anyway).

http://www.simple-talk.com/sql/learn-sql-server/facts-and-fallacies-about-first-normal-form/

Subject: SQL Pivot
Posted by: John O'Sullivan (not signed in)
Posted on: Monday, February 08, 2010 at 3:10 PM
Message: The Pivot command in SQL is very disappointing, having to specify columns is not very good.

Subject: Ugly example
Posted by: puzsol (view profile)
Posted on: Monday, February 08, 2010 at 6:12 PM
Message: Sorry, but I can't help but think this is a horrible example... It can't be used for a different date range, without the query being re-written... and the manipulation seems to be purely for some manager who wants the data axies swapped... which if you ask me belongs in the reporting application, not in an SQL query.

Subject: Re: Ugly Example
Posted by: Phil Factor (view profile)
Posted on: Tuesday, February 09, 2010 at 6:09 AM
Message: Here's a possible solution, using dynamic SQL, that doesn't rely on you having to key in the date range.

CREATE PROCEDURE TransformPaymentInfo
AS
IF
EXISTS ( SELECT  *
            
FROM    information_Schema.tables
            
WHERE   table_Name LIKE 'Payment' )
    
BEGIN
        DECLARE
@Command VARCHAR(MAX)
        
DECLARE @SubCommand VARCHAR(MAX)
        
SELECT  @Subcommand='
Select ''$$'' as [type]'
        
SELECT  @Subcommand=@Subcommand+',
sum(case when PaymentDate=convert(DateTime,'''
                
+CONVERT(VARCHAR(20), PaymentDate, 101)
                +
''',101) then  [$$] else 0 end) as ['
                
+CONVERT(VARCHAR(20), PaymentDate, 101)+']'
        
FROM    payment GROUP BY PaymentDate
  
        
SELECT  @Command=COALESCE(@Command+'
from '
+'Payment'+'
Union All
'
, '')+REPLACE(@SubCommand, '$$', column_Name)
        
FROM    information_Schema.columns
        
WHERE   table_Name LIKE 'Payment'
                
AND column_name NOT IN ('PaymentID', 'PaymentDate')
        
SELECT  @command=@Command+'
from payment'
  
SELECT @command
  
EXECUTE (@command)
    
END




Subject: 1NF violation
Posted by: Paul Nielsen (not signed in)
Posted on: Wednesday, February 10, 2010 at 9:53 AM
Message: While the columns Medicare, Medicaid, and VA columns have different names they are indeed duplicate, repeating columns. Each column represents a payment received from a source. Rather than model the sources as a table, the souces are represented in column names - a clear indicator of a poor table design.

Subject: Poor design? Very possibly. Violation of 1NF? No.
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 10, 2010 at 3:48 PM
Message: Hi Paul,

1NF doesn't make any restrictions on column names (other than that they must be unique) or on repeating the same type of attribute in a table.

"Repeating groups" (multivalued attributes) are a violation of 1NF but SQL Server doesn't support repeating groups in tables so that concept is not applicable.

The distinction is important because 1NF frequently gets abused in all sorts of unjustified ways - hence Anith's useful little "Facts and Fallacies" article.

Subject: 1NF violation
Posted by: PhoenixR (view profile)
Posted on: Friday, May 21, 2010 at 9:29 AM
Message: In my opinion this restriction does not apply to the column name but rather to the redundant information source.In this case it seems to me that they hold the same type of information and could easily be replaced by one PaymentType column.What would happen if a new payment source were to be added ? new column ? what about 1000 ?

Subject: On a good design
Posted by: Heravar (view profile)
Posted on: Monday, January 31, 2011 at 6:40 AM
Message: Leaving aside the debate about 1NF violations, I have a doubt.

The article says: "Had the source data been properly normalized (let's call this, been designed as: PaymentDate, PaymentType, and Amount.) there would been no need to transform the columns and rows..."

In a "PaymentDate, PaymentType, Amount" schema, how would you do that Transform/Pivot?

Subject: Dynamic row transposing
Posted by: Dalex (view profile)
Posted on: Friday, April 08, 2011 at 7:17 AM
Message: To easily transpose columns into rows with its names you should use XML. In my blog I was described this with example: http://sql-tricks.blogspot.com/2011/04/sql-server-rows-transpose.html

 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

SQL Server XML Questions You Were Too Shy To Ask
 Sometimes, XML seems a bewildering convention that offers solutions to problems that the average... Read more...

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.