Robyn Page and Phil Factor

SQL Server Grouping Workbench

26 April 2007

/*This Workbench is about using the GROUP BY clause in a SQL Statement. It is
designed so it can be pasted straight into SSMS or the Query Analyser, though
to run the examples, you'll need the data file too. Just click the "CODE DOWNLOAD"
link at the top of this article. Also included in the download bundle is the
original SQL file, if you need it.

-- Contents
-----------

Introduction
Removing Duplicates
Simple Reporting
Grouping and Pivot Tables
Enumerating the Group members
Ranging
Grouping as a utility
Questions


-- Introduction
---------------

Like all these workbench articles, the whole point is to try different things
out, improve on the code, and learn in a practical way. If you're like us, and
find that you can only take in information after hands-on practice, doing things
for real, then this article is for you. 

If you think that it is too elementary, move to the end of the article and test
out your knowledge by answering all the questions (Yes, the editor is giving 
away prizes for the best answers he receives. Insist on the glorious USB Red-Gate 
pendant). 

Aggregations in SQL are very simple to do, but can cause a lot of confusion
due to the word 'Group'. The result you get from a grouping does not consist 
of the individual records but a collection or 'aggregation' of records. 

A pack of playing cards is a group of 52 playing cards.
If you were to group a pack of cards by colour, you'd have two 'rows' of 26
cards; if you grouped by suit, there would be four 'rows' with 13 in each. You
could group them by their rank/court attribute, or in other ways. The point is 
that the result is always an aggregate, even if there is only one member in the 
collection.
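
To make the analogy concrete: if the pack were a table (a purely hypothetical
Cards table, just for illustration, not in the download), each way of grouping
it would be one query:

   SELECT colour, [cards]=COUNT(*) FROM Cards GROUP BY colour --2 rows of 26
   SELECT suit,   [cards]=COUNT(*) FROM Cards GROUP BY suit   --4 rows of 13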

Each row in the result of a GROUP BY represents a collection, or grouping. 
You can get the values for the number of records in each collection, or grouping, 
or the sum, average, and so on of numeric values, or the max or min of strings,
but to ask for individual values makes no sense if you haven't grouped by
that value. 

Sometimes, you will use an aggregate function in a SELECT Statement without using
a GROUP BY clause. There is an implicit grouping of the entire result into one
row, which is why you can only use aggregate functions in such a case, and will
get one row returned.
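
For example, once the OilProducers table below is created, this query returns
exactly one row, because the whole table is treated as a single grouping:

   SELECT [producers]=COUNT(*), [world bpd]=SUM(BarrelsPerDay)
   FROM OilProducers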

GROUP BY statements are the bedrock of reporting in SQL. 

The built-in aggregate functions are... 

Common functions

sum([all|distinct] expr.)   Total of the (distinct) values in the expression
avg([all|distinct] expr.)   Average of the (distinct) values in the expression
count([all|distinct] expr.) Number of (distinct) non-null values in the expression
count(*)                    Number of selected rows
max(expr.)                  Highest value in the expression
min(expr.)                  Lowest value in the expression

Statistical functions

stdev(expr.)                The standard deviation of the values in the expression
stdevp(expr.)               The population standard deviation of the expression values
var(expr.)                  The variance of the expression values
varp(expr.)                 The population variance of the expression values

Checksum functions

binary_checksum(expr.)      Returns the binary checksum value
checksum(expr.)             Returns the checksum value
checksum(*)                 Returns the checksum over all the columns of the table
checksum_agg([all|distinct] expr.)
                            Returns the checksum of the non-NULL values in a group

(You can use sum and avg with numeric columns only--int, smallint, tinyint,
bigint, decimal, numeric, float, smallmoney and money. You cannot use min and
max with bit datatypes. You cannot use aggregate functions other than count(*)
with text and image datatypes.)
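
As a quick taste of the less common ones (using the OilProducers table that we
create below; checksum_agg(binary_checksum(...)) is a handy way of detecting
that something, anything, in a grouping has changed, without comparing every
row):

   SELECT continent,
      [spread]=STDEV(BarrelsPerDay),
      [checksum]=CHECKSUM_AGG(BINARY_CHECKSUM(country,BarrelsPerDay))
   FROM OilProducers
   GROUP BY continent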

We'll start with some simple manoeuvres and then finish off with a few wheelies and
handbrake-turns.

As sample data, we'll take a list of all the oil-producing countries in the
world and their oil production.

Create a new database or use your 'practice' development Database and then create
this */

CREATE TABLE [dbo].[OilProducers](
   [OilProducer_ID] [int] IDENTITY(1,1) NOT NULL,
   [country] [varchar](30) NOT NULL,
   [BarrelsPerDay] [int] NOT NULL,
   [Continent] [varchar](80) NOT NULL,
   [Region] [varchar](80) NOT NULL
) ON [PRIMARY]

GO
-- And we'll get our data in. Download the data file, and load the data as
-- follows (substitute the full path to your file for the name 'OilProducers.txt'):

BULK INSERT dbo.OilProducers
   FROM 'OilProducers.txt'

/* or
Declare @Command varchar(8000)
select @Command='bcp '
   +db_name()
   +'.dbo.OilProducers in OilProducers.txt -c -S '+@@Servername+' -E -T'
execute xp_cmdshell @command --assumes Windows Security
*/



--Removing Duplicates
---------------------

-- Before we get stuck into doing reporting, I'll illustrate how simple it is
-- to remove duplicates from tables using grouping

-- Of course, if you are going to remove duplicates, you will really want to
-- move them to another table for forensic analysis to find out how it happened
-- and whether they are valid entries. Here is our Duplicates table

CREATE TABLE [dbo].[Duplicates](
   [OilProducer_ID] [int] IDENTITY(1,1) NOT NULL,
   [Duplicate_ID] [int],
   [country] [varchar](30) NOT NULL,
   [BarrelsPerDay] [int] NOT NULL,
   [Continent] [varchar](80) NOT NULL,
   [Region] [varchar](80) NOT NULL
) ON [PRIMARY]


--let's maliciously create some duplicates!

INSERT INTO OilProducers(Country,BarrelsPerDay,continent,region)
   SELECT Country,BarrelsPerDay,continent,region 
     FROM OilProducers WHERE country LIKE 'A%'
INSERT INTO OilProducers(Country,BarrelsPerDay,continent,region)
   SELECT Country,BarrelsPerDay,continent,region 
     FROM OilProducers WHERE country BETWEEN 'A' AND 'E'

--you can then easily see these duplicates by 
SELECT 'First_ID'=MAX(OilProducer_ID), 'No.InTable'=COUNT(*)
   FROM OilProducers
   GROUP BY country 
   HAVING COUNT(*)>1 --notice that we use a HAVING clause. This is because 
--we are selecting only those groupings that represent more than one row in the
--original table (we could have restricted the check to part of the original
--table by using a WHERE clause). A WHERE clause cannot contain an aggregate 
--function, of course, but the HAVING clause can.
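
-- For instance, a WHERE clause restricts the rows before they are grouped,
-- and HAVING then filters the groupings afterwards (a minimal sketch of the
-- same duplicate check, restricted to countries beginning with 'A'):
SELECT country, [No.InTable]=COUNT(*)
   FROM OilProducers
   WHERE country LIKE 'A%' --filters rows BEFORE grouping
   GROUP BY country
   HAVING COUNT(*)>1       --filters groupings AFTER aggregation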


-- So we move them to another table to check them to make sure. We wrap
-- this up in a transaction just in case anything goes wrong, as we don't
-- want to lose a record that might turn out to be legit.
--start of routine

-------------------------------------------------------------
DECLARE @found INT
DECLARE @ErrorHappened INT
DECLARE @OtherErrorHappened INT
BEGIN TRANSACTION
WHILE COALESCE(@found,1)>0
       AND COALESCE(@ErrorHappened,0)=0
       AND COALESCE(@OtherErrorHappened,0)=0
   BEGIN
   INSERT INTO Duplicates (Duplicate_ID,country,BarrelsPerDay,continent,region)
       SELECT OilProducer_ID,country,BarrelsPerDay,continent,region
       FROM OilProducers WHERE oilProducer_ID IN (
           SELECT MAX(OilProducer_ID)
               FROM OilProducers
               GROUP BY country 
               HAVING COUNT(*)>1)
   SELECT @ErrorHappened=@@Error, @found=@@Rowcount
   DELETE FROM OilProducers WHERE oilProducer_ID IN (
           SELECT MAX(OilProducer_ID)
               FROM OilProducers
               GROUP BY country 
               HAVING COUNT(*)>1) --the HAVING clause has to be used as
                                  --we have to select rows after the
                                  --aggregation
   SELECT @OtherErrorHappened=@@Error
   END
IF @ErrorHappened<>0 OR @OtherErrorHappened<>0 ROLLBACK TRANSACTION
ELSE COMMIT TRANSACTION
--end of routine

-------------------------------------------------------------

--Now look in the duplicate table
SELECT * FROM Duplicates
--and check the main table for duplicates with the SQL I've already given


--Simple Reporting
------------------

-- what about getting a summary of oil production per continent?
SELECT [Barrels per day]=SUM(BarrelsPerDay), continent
   FROM OilProducers 
   GROUP BY continent --each row should be a continent
   ORDER BY SUM(BarrelsPerDay) DESC --in descending order
-- we are grouping by continent and get back seven records, each of which
-- represents a collection of oil producers in the same continent

-- ..or by region ordered by production?
SELECT [production]=SUM(BarrelsPerDay), 
       [average]=AVG(BarrelsPerDay), 
       Region,
       [Producers in region]=COUNT(*)
   FROM OilProducers 
   GROUP BY region --each row is a collection of all the records for a region
   ORDER BY SUM(BarrelsPerDay) DESC --order by production

--Grand totals are easy
SELECT  
   [continent]=COALESCE(continent,'Total (world production)'), 
   [Barrels per day]=SUM(BarrelsPerDay)
   FROM OilProducers 
   GROUP BY continent WITH ROLLUP --each row should be a continent
   ORDER BY GROUPING(continent) ASC, SUM(BarrelsPerDay) DESC
-- we used the GROUPING() function to order the total at the end
-- we used WITH ROLLUP to compute the total. It is also very useful
-- for computing subtotals over several levels of grouping, as the
-- next example shows.
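
-- (a quick extra illustration, not in the original text: ROLLUP over two
-- grouping levels gives a subtotal per continent as well as the grand total)
SELECT [continent]=COALESCE(continent,'All continents'),
   [region]=COALESCE(region,'Total'),
   [Barrels per day]=SUM(BarrelsPerDay)
   FROM OilProducers
   GROUP BY continent, region WITH ROLLUP
   ORDER BY GROUPING(continent), continent,
            GROUPING(region), SUM(BarrelsPerDay) DESC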


-- ..but someone is bound to ask for just the top five
SELECT TOP 5 [production]=SUM(BarrelsPerDay), 
       [average]=AVG(BarrelsPerDay), Region, [Producers in region]=COUNT(*)
   FROM OilProducers 
   GROUP BY region --each row is a collection of all the records for a region
   ORDER BY SUM(BarrelsPerDay) DESC --order by production

-- ..or maybe the bottom five!
SELECT * FROM
(SELECT TOP 5 [production]=SUM(BarrelsPerDay), [average]=AVG(BarrelsPerDay), 
       Region, [Producers in region]=COUNT(*)
   FROM OilProducers 
   GROUP BY region --each row is a collection of all the records for a region
   ORDER BY SUM(BarrelsPerDay) ASC --order by production
)f
ORDER BY production DESC

-- or, more tiresomely, for the top five, the others as an 'Others' row 
-- AND the sum total!
SELECT 
[region]=CASE WHEN g.n=1 THEN t ELSE Region END,
[production]=CASE WHEN t='Other' 
   THEN production -
       (SELECT SUM(production)
           FROM    
               (SELECT TOP 5 [production]=SUM(BarrelsPerDay)
                   FROM OilProducers 
                   GROUP BY region 
                   ORDER BY SUM(BarrelsPerDay) DESC)f)
   ELSE production END
FROM 
(SELECT [n]=0,[t]='',[TheOrder]=1 --trick to duplicate the rollup row
       UNION ALL                  --to get the 'Other' and 'Total' rows
 SELECT 1,'Other',2
       UNION ALL SELECT 1,'Total',3)g
INNER JOIN
(SELECT TOP 6 [production]=SUM(BarrelsPerDay), --the basic data
       [n]=GROUPING(region), 
       Region
   FROM OilProducers 
   GROUP BY region WITH ROLLUP
   ORDER BY SUM(BarrelsPerDay) DESC
)f
ON g.n=f.n
ORDER BY g.TheOrder ASC, production DESC 

--and then there will be the guy who wants a list of just those
--regions that produce more than the average

SELECT Region, [production]=SUM(BarrelsPerDay), [Producers in region]=COUNT(*)
   FROM OilProducers 
   GROUP BY region --each row is a collection of all the records for a region
   HAVING SUM(BarrelsPerDay)>
       (SELECT AVG(average) FROM
           (SELECT [average]=AVG(BarrelsPerDay) FROM OilProducers GROUP BY region)f)
   ORDER BY SUM(BarrelsPerDay) DESC --order by production

--or possibly the five countries closest to the average
SELECT TOP 5 country, BarrelsPerDay, theAverage
FROM OilProducers
CROSS JOIN
(SELECT [theAverage]=AVG(BarrelsPerDay) FROM OilProducers)f
ORDER BY ABS(BarrelsPerDay-theAverage) ASC

--or conceivably, the region closest to the average
SELECT TOP 1 * FROM 
   (SELECT region, [sumBPD]=SUM(BarrelsPerDay)
       FROM OilProducers GROUP BY region
   )h
CROSS JOIN
   (SELECT [theAverage]=AVG(production) FROM 
       (SELECT [production]=SUM(BarrelsPerDay)
           FROM OilProducers GROUP BY region
       )g
   )f
ORDER BY ABS(sumBPD-theAverage) ASC

-- Grouping and Pivot Tables
----------------------------

-- So let's get a bit harder and do a pivot table relating regions and
-- continents with column and row totals.

SELECT [region/continent]=CONVERT(CHAR(25),COALESCE(region,'Total')),
   'Asia'=SUM(CASE WHEN continent ='Asia' THEN BarrelsPerDay ELSE 0 END),
   'North A.'=SUM(CASE WHEN continent ='North America' 
                                            THEN BarrelsPerDay ELSE 0 END),
   'Latin A.'=SUM(CASE WHEN continent ='Latin America and the Caribbean' 
                                            THEN BarrelsPerDay ELSE 0 END),
   'Africa'=SUM(CASE WHEN continent ='Africa' THEN BarrelsPerDay ELSE 0 END),
   'Russia'=SUM(CASE WHEN continent ='Russia' THEN BarrelsPerDay ELSE 0 END),
   'Europe'=SUM(CASE WHEN continent ='Europe' THEN BarrelsPerDay ELSE 0 END),
   'Oceania'=SUM(CASE WHEN continent ='Oceania' THEN BarrelsPerDay ELSE 0 END),
   'sum'=SUM(BarrelsPerDay)
FROM OilProducers
GROUP BY region WITH ROLLUP
ORDER BY GROUPING(region), SUM(BarrelsPerDay) DESC

--the 'order by grouping' trick ensures that the total comes in the right place
--on the last row!
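
-- On SQL 2005 you can also use the PIVOT operator, though you must still
-- list the columns, and the row and column totals take extra work
-- (a minimal sketch, 2005 only; column list abridged):
SELECT region, [Asia], [North America], [Africa], [Europe]
FROM (SELECT region, continent, BarrelsPerDay FROM OilProducers) src
PIVOT (SUM(BarrelsPerDay)
       FOR continent IN ([Asia],[North America],[Africa],[Europe])) pvt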

--Enumerating the Group members
-------------------------------

--This is all OK but a lot of people want, and expect, to have a list of all
--the constituents of their grouping in the result. They don't like DBAs
--laughing and saying it isn't possible. There are now CLR aggregation routines
--that do it, but here is a method that works on SQL 2000. Not a cursor in 
--sight! (it works a lot better in 2005 with VARCHAR(MAX))

DECLARE @list VARCHAR(8000)
SELECT @List=COALESCE(@list+',','')+'|'+region+'|'+country+'|'+region+'|'
FROM OilProducers ORDER BY region

SELECT [region]=CONVERT(CHAR(26),region), 
   [Oil-producing countries]=LEFT(members,CHARINDEX(',|',members+',|')-1) FROM
(SELECT region, 'members'=
       REPLACE(
           SUBSTRING(list,CHARINDEX('|'+regions.region+'|',list),8000),
           '|'+regions.region+'|',
           '')
FROM (SELECT 'list'=@List)f
CROSS JOIN (
   SELECT region FROM OilProducers GROUP BY region
   )regions
)g

/*
Results (Truncated to the right)
region                     Oil-producing countries
-------------------------- ----------------------------------------------------...
Australia and New Zealand  Australia,New Zealand
Caribbean                  Virgin Islands,Cuba,Trinidad and Tobago,Aruba,Barbad...
Central America            Belize,Nicaragua,Guatemala,Mexico
Eastern Africa             Zambia,Madagascar
Eastern Asia               Korea, North,Mongolia,Taiwan,Korea, South,China,Japan
Eastern Europe             Czech Republic,Poland,Belarus,Ukraine,Hungary,Romani...
Melanesia                  Papua New Guinea
Middle Africa              Congo,Chad,Equatorial Guinea,Gabon,Angola,Cameroon
North America              Canada, United States 
Northern Africa            Algeria,Egypt,Libya,Tunisia,Sudan,Morocco
Northern Europe            Finland,Sweden,Lithuania,Estonia,United Kingdom,Norw...
Russia                     Russia
South America              Ecuador,Argentina,Brazil,Colombia,Venezuela,Peru,Sur...
South-central Asia         Kyrgyzstan,Bangladesh,Pakistan,Tajikistan,Turkmenist...
South-eastern Asia         Malaysia,Indonesia,Brunei,Vietnam,Thailand,Singapore...
Southern Africa            South Africa
Southern Europe            Italy,Croatia,Spain,Greece,Albania,Slovenia
Western Africa             Sierra Leone,Ghana,Nigeria,Mauritania,Cote d'Ivoire
Western Asia               Turkey,Bahrain,Yemen,Syria,United Arab Emirates,Saud...
Western Europe             Switzerland,Belgium,Germany,Netherlands,France,Austr...
*/
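
-- On SQL 2005, the FOR XML PATH('') trick gives the same sort of list
-- without the intermediate variable (a minimal sketch, 2005 only; not part
-- of the original workbench):
SELECT region,
   [Oil-producing countries]=
       STUFF((SELECT ','+country
                  FROM OilProducers p
                  WHERE p.region=regions.region
                  ORDER BY country
                  FOR XML PATH('')),1,1,'')
FROM (SELECT region FROM OilProducers GROUP BY region)regions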

-- now this technique could get quite interesting if combined with 'ranging',
-- where you impose categories of your own choice onto the data (date ranges
-- are very common)

-- Ranging
----------

--By ranging, I mean imposing arbitrary value ranges, and grouping by them.
--You can, of course, use a helper table to do this much more elegantly,
--as in the sketch just below.
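
-- (a sketch of the helper-table approach: the BpdRanges table is our own
-- invention for this illustration, not part of the workbench download)
CREATE TABLE BpdRanges (LowBpd INT, HighBpd INT, category VARCHAR(30))
INSERT INTO BpdRanges VALUES (0,        9999,       '1/ < 10,000 bpd')
INSERT INTO BpdRanges VALUES (10000,    100000,     '2/ 10,000 to 100,000')
INSERT INTO BpdRanges VALUES (100001,   1000000,    '3/ 100,000 to 1m bpd')
INSERT INTO BpdRanges VALUES (1000001,  10000000,   '4/ 1m to 10m bpd')
INSERT INTO BpdRanges VALUES (10000001, 2000000000, '5/ more than 10m bpd')

SELECT r.category, [total]=SUM(o.BarrelsPerDay), [No. of Countries]=COUNT(*)
   FROM OilProducers o
   INNER JOIN BpdRanges r ON o.BarrelsPerDay BETWEEN r.LowBpd AND r.HighBpd
   GROUP BY r.category
   ORDER BY r.category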

-- Ranging can be by column
SELECT
[range]='No. of Countries',
[< 10,000 bpd]=SUM(CASE WHEN Barrelsperday<10000 THEN 1 ELSE 0 END),
[10,000 to 100,000]=SUM(CASE WHEN Barrelsperday BETWEEN 10000 AND 100000  
                        THEN 1 ELSE 0 END),
[100,000 to 1m bpd]=SUM(CASE WHEN Barrelsperday BETWEEN 100001 AND 1000000 
                        THEN 1 ELSE 0 END),
[1m to 10m bpd]=SUM(CASE WHEN Barrelsperday BETWEEN 1000001 AND 10000000 
                        THEN 1 ELSE 0 END),
[more than 10m bpd]=SUM(CASE WHEN Barrelsperday>10000000 THEN 1 ELSE 0 END)
FROM OilProducers

--or by row...
SELECT [category]=CASE WHEN Barrelsperday<10000 THEN '1/ < 10,000 bpd' 
   WHEN Barrelsperday BETWEEN 10000   AND 100000   THEN '2/ 10,000 to 100,000'
   WHEN Barrelsperday BETWEEN 100001  AND 1000000  THEN '3/ 100,000 to 1m bpd'
   WHEN Barrelsperday BETWEEN 1000001 AND 10000000 THEN '4/ 1m to 10m bpd'
   WHEN Barrelsperday>10000000 THEN '5/ more than 10m bpd' END,
   [total]=SUM(BarrelsperDay),
   [No. of Countries]=COUNT(*)
FROM OilProducers
GROUP BY CASE WHEN Barrelsperday<10000 THEN '1/ < 10,000 bpd' 
   WHEN Barrelsperday BETWEEN 10000   AND 100000   THEN '2/ 10,000 to 100,000'
   WHEN Barrelsperday BETWEEN 100001  AND 1000000  THEN '3/ 100,000 to 1m bpd'
   WHEN Barrelsperday BETWEEN 1000001 AND 10000000 THEN '4/ 1m to 10m bpd'
   WHEN Barrelsperday>10000000 THEN '5/ more than 10m bpd' END

--The code above is much more efficient than it looks, but why not make a 
--User-defined function to do it?

CREATE FUNCTION OilOutputCategory
(
   @OilOutput INT
)
RETURNS VARCHAR(30)
AS
BEGIN
RETURN 
CASE WHEN @OilOutput<10000 THEN '1/ < 10,000 bpd' 
   WHEN @OilOutput BETWEEN 10000   AND 100000   THEN '2/ 10,000 to 100,000'
   WHEN @OilOutput BETWEEN 100001  AND 1000000  THEN '3/ 100,000 to 1m bpd'
   WHEN @OilOutput BETWEEN 1000001 AND 10000000 THEN '4/ 1m to 10m bpd'
   WHEN @OilOutput>10000000 THEN '5/ more than 10m bpd' END
END
GO
--so the query becomes...
SELECT [category]=dbo.OilOutputCategory(BarrelsPerDay),
   
[total]=SUM(BarrelsperDay),
   
[producers]=COUNT(*)
FROM OilProducers
GROUP BY dbo.OilOutputCategory(BarrelsPerDay)


--so let's combine ranging and enumeration!

--We can list all the countries that fall in each range category....
DECLARE @list VARCHAR(8000)
SELECT @List=COALESCE(@list+',','')+catfudge
FROM (--the concatenation of the string didn't work until we did this!
   SELECT TOP 1000 [catfudge]=
   ('|'+dbo.OilOutputCategory(BarrelsPerDay)+'|'+country+'|'
         +dbo.OilOutputCategory(BarrelsPerDay)+'|'),
   [cat]=dbo.OilOutputCategory(BarrelsPerDay)
FROM OilProducers ORDER BY cat)f

SELECT [category]=CONVERT(CHAR(26),category), 
   [Oil-producing countries]=LEFT(members,CHARINDEX(',|',members+',|')-1) FROM
(SELECT category, 'members'=
       REPLACE(
           SUBSTRING(list,CHARINDEX('|'+categories.category+'|',list),8000),
           '|'+categories.category+'|',
           '')
FROM (SELECT 'list'=@List)f
CROSS JOIN (
   SELECT [category]=dbo.OilOutputCategory(BarrelsPerDay) FROM OilProducers 
       GROUP BY dbo.OilOutputCategory(BarrelsPerDay)
   )categories
)g

/*
Results (Truncated to the right)
category                   Oil-producing countries
-------------------------- -------------------------------------------------...
1/ < 10,000 bpd            Singapore,Burma,Finland,Taiwan,Ghana,Korea, South,...
2/ 10,000 to 100,000       Tunisia,Netherlands,Ukraine,Mauritania,France,Cuba...
3/ 100,000 to 1m bpd       Syria,Sudan,Oman,Qatar,Romania,Italy,Malaysia,Chad...
4/ 1m to 10m bpd           Venezuela,United Arab Emirates,United Kingdom,Indo...
5/ more than 10m bpd       Saudi Arabia
*/


-- Grouping as a utility
------------------------

-- One can use grouping in a variety of ways that have nothing to do with 
-- reporting. Here is an example of using GROUP BY to help produce a table by 
-- splitting a delimited list. It needs the Numbers helper table (you'll have 
-- to refer to the 'Helper Table Workbench' to find out about that). 

GO
CREATE FUNCTION [dbo].[uftSplitStringGroup]
(
 @DelimitedList VARCHAR(8000),
 @Delimiter VARCHAR(20)=','
)
RETURNS
@Results TABLE
(
 SeqNo INT IDENTITY(1,1),
 Item VARCHAR(8000)
)
AS
BEGIN
DECLARE @String VARCHAR(8000)
DECLARE @LenDelimiter INT

SELECT @String=@Delimiter+@DelimitedList+@Delimiter,
   @LenDelimiter=LEN(@Delimiter)

INSERT INTO @results (item)
   SELECT SUBSTRING(@string,
                    s1.number+@LenDelimiter,
                    MIN(s2.number)-s1.number-@LenDelimiter)
     FROM (SELECT [string]=@String)f
   CROSS JOIN numbers s1 
   INNER JOIN numbers s2 ON s1.number<s2.number
   WHERE s1.number<=LEN(@string)
       AND s2.number<=LEN(@string)
       AND SUBSTRING(@string,s1.number,@LenDelimiter)=@Delimiter
       AND SUBSTRING(@string,s2.number,@LenDelimiter)=@Delimiter
   GROUP BY s1.number
RETURN
END
/*
select * from dbo.uftSplitStringGroup('first,second,third',default)
select * from dbo.uftSplitStringGroup('<>this is something<>something else<>','<>')
select * from dbo.uftSplitStringGroup(
 'Turkey,Bahrain,Yemen,Syria,United Arab Emirates,Saudi Arabia,Qatar,Kuwait',',')


-- Questions
------------

Send your answers to editor@simple-talk.com. The first three correct entries
will receive a much-coveted Simple-Talk goodie bag (polo shirt, USB key, bottle
opener, beermat, pen).


1/  How would you get the count of the number of rows in a table with NULLs in 
    a particular column, using GROUP BY, but without using a COALESCE or CASE 
    statement?

2/  What is the GROUPING() function useful for?

3/  Can a WHERE clause contain an aggregate function?

4/  When would you need to use a HAVING clause?

5/  What does the ROLLUP do? How would you use it?

6/  Can you use UDFs (user-defined scalar-valued  functions) in GROUP BY clauses? */

See also other Workbenches at Simple-Talk

Robyn Page's SQL Server DATE/TIME Workbench

Robyn Page

Date calculation and formatting in SQL Server can be surprisingly tricky. Robyn Page's "hands-on" workbench will lead you through the minefield.

Robyn Page's SQL Server String Manipulation Workbench

Robyn Page

String searching and manipulation in SQL Server can be error-prone and tedious...unless you're armed with the techniques described in Robyn's string manipulation workbench...

SQL Server Error Handling Workbench

Grant Fritchey

Grant Fritchey steps into the workbench arena, with an example-fuelled examination of catching and gracefully handling errors in SQL 2000 and 2005, including worked examples of the new TRY..CATCH capabilities.

Robyn Page's SQL Server Cursor Workbench

Robyn Page

The topic of cursors is the ultimate "hot potato" in the world of SQL Server. Everyone has a view on when they should and mainly should not be used. By example and testing Robyn Page proves that, when handled with care, cursors are not necessarily a "bad thing".

Robyn Page's SQL Server Data Validation Workbench

Robyn Page

Robyn Page provides essential techniques for ensuring the validity of the data being entered into your SQL Server tables.

Robyn Page's Excel Workbench

Robyn Page and Phil Factor

The need to produce Excel reports from SQL Server is very common. Here, Robyn Page and Phil Factor present practical techniques for creating and manipulating Excel spreadsheets from SQL Server, using linked servers and T-SQL. The pièce de résistance is a stored procedure that uses OLE Automation...

Robyn Page's SQL Server Security Workbench

Robyn Page and Phil Factor

Robyn Page and Phil Factor present practical T-SQL techniques for controlling access to sensitive information within the database, and preventing malicious SQL injection attacks.

SQL Server Helper Table Workbench

Robyn Page and Phil Factor

Cursors and iterations are both renowned for slowing down Transact SQL code, but sometimes seem unavoidable. In this workbench, Robyn Page and Phil Factor demonstrate some set-based techniques for string manipulation and time interval-based reporting, which use helper tables instead.

Robyn Page and Phil Factor

Author profile:


Robyn Page has worked as a consultant with Enformatica and USP Networks with a special interest in the provision of broadcast services over IP intranets. She was also a well known actress, being most famous for her role as Katie Williams, barmaid and man-eater in the Television Series Family Affairs. She is currently having a career break to raise a young family.

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 20 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.



Subject: Average in pivot
Posted by: Anonymous (not signed in)
Posted on: Tuesday, May 01, 2007 at 6:32 PM
Message: Be careful with zeros and nulls when pivoting using a case statement if you are going to be taking averages -- you would want to have the "else" part be null instead of zero so that the "else" rows aren't included in the count for determining the average.

Subject: wahhh.......................
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 02, 2007 at 6:19 AM
Message: Hai Hai,

I am beginners. Its really help me a lot. Thanks you so much. Please give me more samples/links like above.

Subject: On the spot
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 02, 2007 at 6:38 AM
Message: Just the reference I needed for my current project. Thank you!

Best regards
LB

Subject: What about @OtherErrorHappened?
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 02, 2007 at 7:49 AM
Message: In the Duplicates section, you look for errors when you delete the dupes from the original table, but then you don't check to see if there were any errors in @OtherErrorHappened before you commit.

Why is this?

Subject: Wrong link
Posted by: dolson (view profile)
Posted on: Wednesday, May 02, 2007 at 8:44 AM
Message: Great article.

FYI: The link for "SQL Server Helper Table Workbench" goes to the "SQL Server Security Workbench" page:
sql-server-security-workbench-part-1

Subject: Did we move
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 02, 2007 at 11:42 AM
Message: Great article, looking at the data, not sure when the United States moved to asia

Subject: Re: @OtherErrorHappened , Did We Move etc.
Posted by: Robyn Page (view profile)
Posted on: Wednesday, May 02, 2007 at 12:27 PM
Message: Oops. Phil took something out temporarily in order to test it, and forgot to put it back in! Or was it me? (senior moment)
Thanks too to an anonymous person who pointed out a confusing typo in a comment where we put GROUP BY instead of HAVING. (brain fade)
The Oil-Producers data came from several sources and needed a lot of cleaning, but I can't use the excuse of Continental Drift for the States wandering to Asia. I know we're insular, but this is ridiculous! (blond moment)

Subject: Re: Did We Move etc. Continental drift
Posted by: Phil Factor (view profile)
Posted on: Wednesday, May 02, 2007 at 2:33 PM
Message: Source files fixed. The United States is safely back in North America. Apologies. We'd like to take Britain out of Europe as easily but that is another issue!

Subject: Removing duplicates - maybe this is better idea than loop etc ;-)
Posted by: Anonymous (not signed in)
Posted on: Thursday, May 03, 2007 at 12:31 PM
Message:
BEGIN TRAN

INSERT INTO Duplicates
    (Duplicate_ID,country,BarrelsPerDay,continent,region)
    SELECT oil.OilProducer_ID,oil.country,
           oil.BarrelsPerDay,oil.continent,oil.region
    FROM oilproducers oil
    LEFT JOIN
    (SELECT MAX(oilproducer_id) oilproducer_id
         FROM oilproducers
         GROUP BY country
    )p
    ON (oil.oilproducer_id = p.oilproducer_id)
    WHERE p.oilproducer_id IS NULL

IF @@ERROR = 0 AND @@ROWCOUNT > 0
    BEGIN
        DELETE FROM oilproducers
        WHERE oilproducer_id IN
            (SELECT duplicate_id FROM duplicates)
        IF @@ERROR = 0
            COMMIT
        ELSE
            ROLLBACK
    END
ELSE
    ROLLBACK

Subject: Re: Removing Duplicates
Posted by: Robyn Page (view profile)
Posted on: Thursday, May 03, 2007 at 1:10 PM
Message: Dunno, but it is certainly cleverer!

Subject: Duplicates
Posted by: Anonymous (not signed in)
Posted on: Monday, May 07, 2007 at 3:46 AM
Message: Loops should be avoided.

INSERT INTO Duplicates
    (Duplicate_ID,country,BarrelsPerDay,continent,region)
SELECT OilProducer_ID,country,BarrelsPerDay,continent,region
    FROM OilProducers
    WHERE oilProducer_ID NOT IN
        (SELECT MIN(OilProducer_ID)
             FROM OilProducers
             GROUP BY country)

DELETE FROM OilProducers
    WHERE oilProducer_ID NOT IN
        (SELECT MIN(OilProducer_ID)
             FROM OilProducers
             GROUP BY country)

Subject: Can't download the textfile!!!!!
Posted by: Anonymous (not signed in)
Posted on: Monday, May 07, 2007 at 9:14 AM
Message: Sorry! We cannot display this page because of an error.
We have alerted the Simple Talk team, and will fix the problem as soon as we can.

In the mean time, please try exploring other parts of our site.

If you repeatedly get this error, please contact editor@simple-talk.com

Subject: Re: Can't download the textfile!!!!!
Posted by: Phil Factor (view profile)
Posted on: Monday, May 07, 2007 at 3:21 PM
Message: Whatever it was seems to have stopped happening. We'll get it checked out.

Subject: please help me !
Posted by: iman770 (view profile)
Posted on: Sunday, May 13, 2007 at 4:30 AM
Message: hi; I'm Iman, from Iran.
I'm starting a new project with SQL.
I need code for searching text in a database in any format, for example .txt or .doc or .pdf.
I want this code for the Persian language.
thanks

Subject: Variance Calculation
Posted by: Anonymous (not signed in)
Posted on: Monday, May 14, 2007 at 10:13 AM
Message: I was looking for some help with a matrix style report I am trying to develop.

Example: months across the top, columns would be product grouping, then year. The detail would be a summation of invoice value. I can achieve this no problem. The issue is when I want to get a variance between the two years, i.e. Jan 2006 vs 2007, as a row underneath each month.

Any help would be greatly appreciated.

Kind Regards

Subject: Re: Variance Calculation
Posted by: Phil Factor (view profile)
Posted on: Monday, May 14, 2007 at 1:17 PM
Message: I'm not sure what precisely you mean by variance. This is a statistical term that applies to a list of numbers, but here you just have the data that applies to just two dates. Do you mean the difference between the two? (Normally given as a percentage)

Subject: Re:Variance Calculation
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 16, 2007 at 4:51 AM
Message: By Variance I mean: there will be sales data for, say, Jan 2007 and Jan 2006 and so on throughout the year. In my matrix example the months would run along the top and the two years as rows. So I am trying to calculate the variance between Jan 2007 and Jan 2006 and display it in a row. I hope that makes more sense. Thanks Mark

Subject: Some of the 'duplicates' code needs to be updated...
Posted by: Anonymous (not signed in)
Posted on: Friday, May 18, 2007 at 3:18 AM
Message: you need to change the code:

--let's maliciously create some duplicates!
INSERT INTO OilProducers(Country,BarrelsPerDay,continent, region)
SELECT Country,BarrelsPerDay,continent, region
FROM OilProducers WHERE country LIKE 'A%'
INSERT INTO OilProducers(Country,BarrelsPerDay ,continent, region)
SELECT Country,BarrelsPerDay,continent, region
FROM OilProducers WHERE country BETWEEN 'A' AND 'E'
--you can then easily see these duplicates by
SELECT 'First_ID'=MAX(OilProducer_ID), 'No.InTable'=COUNT(*)
FROM OilProducers
GROUP BY country
HAVING COUNT(*)>1


TO:

INSERT INTO Duplicates(Country,BarrelsPerDay,continent, region)
SELECT Country,BarrelsPerDay,continent, region
FROM OilProducers WHERE country LIKE 'A%'
INSERT INTO Duplicates(Country,BarrelsPerDay ,continent, region)
SELECT Country,BarrelsPerDay,continent, region
FROM OilProducers WHERE country BETWEEN 'A' AND 'E'


SELECT 'First_ID'=MAX(OilProducer_ID), 'No.InTable'=COUNT(*)
FROM Duplicates
GROUP BY country
HAVING COUNT(*)>1

Subject: aaargh - ignore comment above ;)
Posted by: Anonymous (not signed in)
Posted on: Friday, May 18, 2007 at 3:22 AM
Message:
you don't need to change the code :)

Subject: Nice
Posted by: Anonymous (not signed in)
Posted on: Friday, May 18, 2007 at 4:30 AM
Message: Its Nice article and posting of Removing Loops is nice

Subject: FTS
Posted by: Anonymous (not signed in)
Posted on: Friday, May 18, 2007 at 9:21 AM
Message: iman770, look into Full-Text Search in Books Online. It does exactly what you're asking.

Subject: Top n and others
Posted by: Anonymous (not signed in)
Posted on: Sunday, September 09, 2007 at 3:17 AM
Message: Hi!
In the query "or, more tiresomely, for the top five, the others as an 'Others' row AND the sum total!" I want to show the columns "Continent" and "Country" as well, how can I do that?

Thanks for great tips!
Lasse
