Robyn Page and Phil Factor

SQL Server Grouping Workbench

26 April 2007

/*This Workbench is about using the GROUP BY clause in a SQL Statement. It is
designed so it can be pasted straight into SSMS or the Query Analyser, though
to run the examples, you'll need the data file too. Just click the "CODE DOWNLOAD"
link at the top of this article. Also included in the download bundle is the
original SQL file, if you need it.

-- Contents

Removing Duplicates
Simple Reporting
Grouping and Pivot Tables
Enumerating the Group members
Ranging
Grouping as a utility

-- Introduction


Like all these workbench articles, the whole point is to try different things
out, improve on the code, and learn in a practical way. If, like us, you find
that you can take in information only after real practice, doing things for
real, then this article is for you.

If you think that it is too elementary, move to the end of the article and test
out your knowledge by answering the questions. (Yes, the editor is giving
away prizes for the best answers he receives. Insist on the glorious Red Gate
USB key.)

Aggregations in SQL are very simple to do, but can cause a lot of confusion
due to the word 'Group'. The result you get from a grouping does not consist 
of the individual records but a collection or 'aggregation' of records. 

A pack of playing cards is a group of 52 playing cards.
If you were to group a pack of cards by colour, you'd have two 'rows' of 26
cards; if you grouped by suit, there would be four 'rows' with 13 in each. You
could group them by their rank/court attribute, or in other ways. The point is 
that the result is always an aggregate, even if there is only one member in the
group.

Each row in the result of a GROUP BY represents a collection, or grouping. 
You can get the values for the number of records in each collection, or grouping, 
or the sum, average, and so on of numeric values, or the max or min of strings,
but to ask for individual values makes no sense if you haven't grouped by
that value. 

Sometimes, you will use an aggregate function in a SELECT Statement without using
a GROUP BY clause. There is an implicit grouping of the entire result into one
row, which is why you can only use aggregate functions in such a case, and will
get one row returned.
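For example, this (run against the OilProducers table that we build below)
returns exactly one row, summarising the whole table as a single group. A
minimal sketch:

SELECT [Producers]=COUNT(*), [World production]=SUM(BarrelsPerDay)
FROM OilProducers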

GROUP BY statements are the bedrock of reporting in SQL. 

The built-in aggregate functions are... 

Common ones

sum([all|distinct] expr.)   Total of the (distinct) values in the expression
avg([all|distinct] expr.)   Average of the (distinct) values in the expression
count([all|distinct] expr.) Number of (distinct) non-null values in the expression
count(*)                    Number of selected rows
max(expr.)                  Highest value in the expression
min(expr.)                  Lowest value in the expression

Statistical functions

stdev(expr.)                The standard deviation of the values in the expression
stdevp(expr.)               Population standard deviation of the expression values
var(expr.)                  The variance of the expression values
varp(expr.)                 The population variance of the expression values

Checksum functions

binary_checksum(expr.)      Returns the binary checksum value
checksum(expr.)             Returns the checksum value
checksum(*)                 Checksum computed over all the columns of the table
checksum_agg([all|distinct] expr.)  Checksum of the (distinct) non-null values in a group

(You can use sum and avg with numeric columns only: int, smallint, tinyint,
bigint, decimal, numeric, float, smallmoney and money. You cannot use min and
max with bit datatypes. You cannot use aggregate functions other than count(*)
with text and image datatypes.)
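One wrinkle is worth a quick demonstration: count(expression) ignores NULLs,
whereas count(*) counts rows. A throwaway sketch:

DECLARE @t TABLE (val INT)
INSERT INTO @t (val)
SELECT 1 UNION ALL SELECT NULL UNION ALL SELECT 3
SELECT [rows]=COUNT(*), [non-null values]=COUNT(val) FROM @t
--returns 3 and 2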

We'll start with some simple manoeuvres and then finish off with a few
wheelies and stunts.

As sample data, we'll take a list of all the oil-producing countries in the
world and their oil production.

Create a new database or use your 'practice' development Database and then create
this */

CREATE TABLE [dbo].[OilProducers](
[OilProducer_ID] [int] IDENTITY(1,1) NOT NULL,
[country] [varchar](30) NOT NULL,
[BarrelsPerDay] [int] NOT NULL,
[Continent] [varchar](80) NOT NULL,
[Region] [varchar](80) NOT NULL
)

-- And we'll get our data in. Download the data file, and load the data as follows:

-- substitute the full path to your file for the name 'OilProducers.txt'
BULK INSERT dbo.OilProducers
FROM 'OilProducers.txt'

/* or
DECLARE @Command VARCHAR(8000)
SELECT @Command='bcp '+DB_NAME()
   +'.dbo.OilProducers in OilProducers.txt -c -S '+@@Servername+' -E -T'
EXECUTE xp_cmdshell @Command --assumes Windows Security
*/
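-- A quick sanity check that something got loaded (the exact count will
-- depend on your copy of the data file):
SELECT [countries loaded]=COUNT(*) FROM dbo.OilProducers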

--Removing Duplicates

-- Before we get stuck into doing reporting, I'll illustrate how simple it is
-- to remove duplicates from tables using grouping

-- Of course, if you are going to remove duplicates, you will really want to
-- move them to another table for forensic analysis to find out how it happened
-- and whether they are valid entries. Here is our Duplicates table

CREATE TABLE [dbo].[Duplicates](
[OilProducer_ID] [int] IDENTITY(1,1) NOT NULL,
[Duplicate_ID] [int],
[country] [varchar](30) NOT NULL,
[BarrelsPerDay] [int] NOT NULL,
[Continent] [varchar](80) NOT NULL,
[Region] [varchar](80) NOT NULL
)

--let's maliciously create some duplicates!

INSERT INTO OilProducers(Country,BarrelsPerDay,continent,region)
SELECT Country,BarrelsPerDay,continent,region 
FROM OilProducers WHERE country LIKE 'A%'
INSERT INTO OilProducers(Country,BarrelsPerDay,continent,region)
SELECT Country,BarrelsPerDay,continent,region 
FROM OilProducers WHERE country BETWEEN 'A' AND 'E'

--you can then easily see these duplicates by 
SELECT 'First_ID'=MAX(OilProducer_ID), 'No.InTable'=COUNT(*)
FROM OilProducers
GROUP BY country 
HAVING COUNT(*)>1 --notice that we use a HAVING clause. This is because 
--we are selecting only those groupings that have more than one row in the
--original table. (We could have checked just part of the original table
--by using a WHERE clause. A WHERE clause cannot contain an aggregate 
--function, of course, but the HAVING clause can.)
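-- To see where WHERE stops and HAVING starts, here is the same check
-- restricted to countries beginning with 'A' (a quick illustrative sketch):
SELECT 'First_ID'=MAX(OilProducer_ID), 'No.InTable'=COUNT(*)
FROM OilProducers
WHERE country LIKE 'A%' --WHERE filters the rows before they are grouped
GROUP BY country
HAVING COUNT(*)>1       --HAVING filters the groupings afterwards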

-- So we move them to another table to check them to make sure. We wrap
-- this up in a transaction just in case anything goes wrong, as we don't
-- want to lose a record that might turn out to be legit.
--start of routine
BEGIN TRANSACTION
DECLARE @Errorhappened INT
DECLARE @OtherErrorhappened INT
DECLARE @found INT
SELECT @found=1
WHILE @found>0
   AND COALESCE(@ErrorHappened,0)=0
   AND COALESCE(@OtherErrorHappened,0)=0
   BEGIN
   INSERT INTO Duplicates (Duplicate_ID,country,BarrelsPerDay,continent,region)
   SELECT OilProducer_ID,country,BarrelsPerDay,continent,region
   FROM OilProducers WHERE oilProducer_ID IN (
      SELECT MAX(OilProducer_ID)
      FROM OilProducers
      GROUP BY country 
      HAVING COUNT(*)>1)
   SELECT @Errorhappened=@@Error, @found=@@Rowcount
   DELETE FROM OilProducers WHERE oilProducer_ID IN (
      SELECT MAX(OilProducer_ID)
      FROM OilProducers
      GROUP BY country 
      HAVING COUNT(*)>1) --the HAVING clause has to be used as
                         --we have to select rows after the grouping
   SELECT @OtherErrorHappened=@@Error
   END
IF @errorHappened<>0 OR @OtherErrorHappened<>0 ROLLBACK TRANSACTION
ELSE COMMIT TRANSACTION
--end of routine


--Now look in the Duplicates table
SELECT * FROM Duplicates
--and check the main table for duplicates with the SQL I've already given

--Simple Reporting

-- what about getting a summary of oil production per continent?
SELECT [Barrels per day]=SUM(BarrelsPerDay), continent
FROM OilProducers 
GROUP BY continent --each row should be a continent
ORDER BY SUM(BarrelsPerDay) DESC --in descending order
-- we are grouping by continent and get back seven records, each of which
-- represents a collection of oil producers in the same continent

-- ..or by region, ordered by production?
SELECT Region, [production]=SUM(BarrelsPerDay), 
[Producers in region]=COUNT(*)
FROM OilProducers 
GROUP BY region --each row is a collection of all the records for a region
ORDER BY SUM(BarrelsPerDay) DESC --order by production

--Grand totals are easy
SELECT [continent]=COALESCE(continent,'Total (world production)'), 
[Barrels per day]=SUM(BarrelsPerDay)
FROM OilProducers 
GROUP BY continent WITH ROLLUP --each row should be a continent
ORDER BY GROUPING(continent) ASC, SUM(BarrelsPerDay) DESC
-- we used the GROUPING() function to order the total at the end
-- we used WITH ROLLUP to compute the total. It can be very useful
-- for subtotals at several levels of grouping.
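-- If you want to see what GROUPING() actually returns, run this quick
-- sketch: it gives 1 on the ROLLUP total row and 0 everywhere else
SELECT continent, [Barrels per day]=SUM(BarrelsPerDay),
 [is rollup row]=GROUPING(continent)
FROM OilProducers
GROUP BY continent WITH ROLLUP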

-- ..but someone is bound to ask for just the top five
SELECT TOP 5 [production]=SUM(BarrelsPerDay), 
[average]=AVG(BarrelsPerDay), Region, [Producers in region]=COUNT(*)
FROM OilProducers 
GROUP BY region --each row is a collection of all the records for a region
ORDER BY SUM(BarrelsPerDay) DESC --order by production

-- ..or maybe the bottom five!
SELECT * FROM
(SELECT TOP 5 [production]=SUM(BarrelsPerDay), [average]=AVG(BarrelsPerDay), 
Region, [Producers in region]=COUNT(*)
FROM OilProducers 
GROUP BY region --each row is a collection of all the records for a region
ORDER BY SUM(BarrelsPerDay) ASC) f --order by production
ORDER BY production DESC

-- or, more tiresomely, for the top five, the others as an 'Others' row 
-- AND the sum total!
SELECT
[region]=CASE WHEN g.n=0 THEN Region ELSE t END,
[production]=CASE WHEN t='Other' 
   THEN production -
      (SELECT SUM(production) FROM
         (SELECT TOP 5 [production]=SUM(BarrelsPerDay)
          FROM OilProducers 
          GROUP BY region 
          ORDER BY SUM(BarrelsPerDay) DESC) topfive)
   ELSE production END
FROM
(SELECT [n]=0,[t]='',[TheOrder]=1 --trick to duplicate the rollup row
 UNION ALL                        --to get the 'Other' row
 SELECT 1,'Other',2
 UNION ALL SELECT 1,'Total',3) g
INNER JOIN
(SELECT TOP 6 [production]=SUM(BarrelsPerDay), --the basic data
    [Region]=COALESCE(region,''), [n]=GROUPING(region)
 FROM OilProducers 
 GROUP BY region WITH ROLLUP
 ORDER BY GROUPING(region) DESC, SUM(BarrelsPerDay) DESC) f
ON g.n=f.n
ORDER BY g.TheOrder ASC, production DESC

--and then there will be the guy who wants a list of just those
--regions that produce more than the average

SELECT Region, [production]=SUM(BarrelsPerDay), [Producers in region]=COUNT(*)
FROM OilProducers 
GROUP BY region --each row is a collection of all the records for a region
HAVING SUM(BarrelsPerDay)>
   (SELECT AVG(production) FROM
      (SELECT [production]=SUM(BarrelsPerDay)
       FROM OilProducers GROUP BY region) f)
ORDER BY SUM(BarrelsPerDay) DESC --order by production

--or possibly the five countries closest to the average
SELECT TOP 5 country, BarrelsPerDay, theAverage
FROM OilProducers
CROSS JOIN (SELECT [theAverage]=AVG(BarrelsPerDay) FROM OilProducers) f
ORDER BY ABS(BarrelsPerDay-theAverage) ASC

--or conceivably, the region closest to the average
SELECT TOP 1 region, sumBPD, theAverage FROM
(SELECT region, [sumBPD]=SUM(BarrelsPerDay)
 FROM OilProducers GROUP BY region) g
CROSS JOIN
(SELECT [theAverage]=AVG(production) FROM 
   (SELECT [production]=SUM(BarrelsPerDay)
    FROM OilProducers GROUP BY region) f) a
ORDER BY ABS(sumBPD-theAverage) ASC

-- Grouping and Pivot Tables

-- So let's get a bit harder and do a pivot table relating regions and
-- continents with column and row totals.

SELECT [region/continent]=CONVERT(CHAR(25),COALESCE(region,'Total')),
'Asia'=SUM(CASE WHEN continent='Asia' THEN BarrelsPerDay ELSE 0 END),
'North A.'=SUM(CASE WHEN continent='North America' 
   THEN BarrelsPerDay ELSE 0 END),
'Latin A.'=SUM(CASE WHEN continent='Latin America and the Caribbean' 
   THEN BarrelsPerDay ELSE 0 END),
'Africa'=SUM(CASE WHEN continent='Africa' THEN BarrelsPerDay ELSE 0 END),
'Russia'=SUM(CASE WHEN continent='Russia' THEN BarrelsPerDay ELSE 0 END),
'Europe'=SUM(CASE WHEN continent='Europe' THEN BarrelsPerDay ELSE 0 END),
'Oceania'=SUM(CASE WHEN continent='Oceania' THEN BarrelsPerDay ELSE 0 END)
FROM OilProducers
GROUP BY region
WITH ROLLUP
ORDER BY GROUPING(region), SUM(BarrelsPerDay) DESC

--the 'order by grouping' trick ensures that the total comes in the right place
--on the last row!
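-- If you are on SQL Server 2005, the PIVOT operator will do much the same
-- job, though you lose the neat ROLLUP total row. A sketch, with just three
-- of the continents:
SELECT region, [Asia], [North America], [Africa]
FROM (SELECT region, continent, BarrelsPerDay FROM OilProducers) src
PIVOT (SUM(BarrelsPerDay)
       FOR continent IN ([Asia],[North America],[Africa])) pvt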

--Enumerating the Group members

--This is all OK but a lot of people want, and expect, to have a list of all
--the constituents of their grouping in the result. They don't like DBAs
--laughing and saying it isn't possible. There are now CLR Aggregation routines
--that do it, but here is a method that works on SQL 2000. Not a cursor in 
--sight! (It works a lot better in 2005 with VARCHAR(MAX).)

DECLARE @list VARCHAR(8000)
SELECT @List=COALESCE(@list+',','')+'|'+region+'|'+country+'|'+region+'|'
FROM OilProducers ORDER BY region

SELECT [region]=CONVERT(CHAR(26),region), 
[Oil-producing countries]=LEFT(members,CHARINDEX(',|',members+',|')-1) FROM
(SELECT region, 'members'=
   REPLACE(SUBSTRING(list,CHARINDEX('|'+regions.region+'|',list),8000),
           '|'+regions.region+'|','')
 FROM (SELECT 'list'=@List) f
 CROSS JOIN
 (SELECT region FROM OilProducers GROUP BY region) regions) g

/*Results (Truncated to the right)
region                     Oil-producing countries
-------------------------- ----------------------------------------------------...
Australia and New Zealand  Australia,New Zealand
Caribbean                  Virgin Islands,Cuba,Trinidad and Tobago,Aruba,Barbad...
Central America            Belize,Nicaragua,Guatemala,Mexico
Eastern Africa             Zambia,Madagascar
Eastern Asia               Korea, North,Mongolia,Taiwan,Korea, South,China,Japan
Eastern Europe             Czech Republic,Poland,Belarus,Ukraine,Hungary,Romani...
Melanesia                  Papua New Guinea
Middle Africa              Congo,Chad,Equatorial Guinea,Gabon,Angola,Cameroon
North America              Canada,United States
Northern Africa            Algeria,Egypt,Libya,Tunisia,Sudan,Morocco
Northern Europe            Finland,Sweden,Lithuania,Estonia,United Kingdom,Norw...
Russia                     Russia
South America              Ecuador,Argentina,Brazil,Colombia,Venezuela,Peru,Sur...
South-central Asia         Kyrgyzstan,Bangladesh,Pakistan,Tajikistan,Turkmenist...
South-eastern Asia         Malaysia,Indonesia,Brunei,Vietnam,Thailand,Singapore...
Southern Africa            South Africa
Southern Europe            Italy,Croatia,Spain,Greece,Albania,Slovenia
Western Africa             Sierra Leone,Ghana,Nigeria,Mauritania,Cote d'Ivoire
Western Asia               Turkey,Bahrain,Yemen,Syria,United Arab Emirates,Saud...
Western Europe             Switzerland,Belgium,Germany,Netherlands,France,Austr...
*/
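
--On SQL Server 2005, incidentally, you could get a similar list rather more
--directly with the FOR XML PATH concatenation trick. A sketch (not the
--method used above):
SELECT region,
 [Oil-producing countries]=STUFF(
    (SELECT ','+country FROM OilProducers i
     WHERE i.region=o.region
     FOR XML PATH('')),1,1,'')
FROM OilProducers o
GROUP BY region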

-- Now this technique could get quite interesting if combined with 'ranging',
-- where you impose categories of your own choice onto the data (date ranges
-- are very common).

-- Ranging

--By ranging, I mean imposing arbitrary value ranges and grouping by them.
--You can, of course, use a helper table to do this much more elegantly;
--a sketch of that approach follows.
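--Here, a table variable stands in for a proper helper table, and the ranges
--are assumptions chosen to match the categories used below:
DECLARE @Ranges TABLE (low INT, high INT, label VARCHAR(30))
INSERT INTO @Ranges
SELECT 0,9999,'1/ < 10,000 bpd'
UNION ALL SELECT 10000,100000,'2/ 10,000 to 100,000'
UNION ALL SELECT 100001,1000000,'3/ 100,000 to 1m bpd'
UNION ALL SELECT 1000001,10000000,'4/ 1m to 10m bpd'
UNION ALL SELECT 10000001,2147483647,'5/ more than 10m bpd'

SELECT r.label, [No. of Countries]=COUNT(*)
FROM OilProducers o
INNER JOIN @Ranges r ON o.BarrelsPerDay BETWEEN r.low AND r.high
GROUP BY r.label
ORDER BY r.label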

-- Ranging can be by column
SELECT
[range]='No. of Countries',
[< 10,000 bpd]=SUM(CASE WHEN Barrelsperday<10000 THEN 1 ELSE 0 END),
[10,000 to 100,000]=SUM(CASE WHEN Barrelsperday BETWEEN 10000 AND 100000
                        THEN 1 ELSE 0 END),
[100,000 to 1m bpd]=SUM(CASE WHEN Barrelsperday BETWEEN 100001 AND 1000000
                        THEN 1 ELSE 0 END),
[1m to 10m bpd]=SUM(CASE WHEN Barrelsperday BETWEEN 1000001 AND 10000000
                        THEN 1 ELSE 0 END),
[more than 10m bpd]=SUM(CASE WHEN Barrelsperday>10000000 THEN 1 ELSE 0 END)
FROM OilProducers

--or by row...
SELECT [category]=CASE WHEN Barrelsperday<10000 THEN '1/ < 10,000 bpd' 
WHEN Barrelsperday BETWEEN 10000   AND 100000 THEN '2/ 10,000 to 100,000'
WHEN Barrelsperday BETWEEN 100001  AND 1000000 THEN '3/ 100,000 to 1m bpd'
WHEN Barrelsperday BETWEEN 1000001 AND 10000000 THEN '4/ 1m to 10m bpd'
WHEN Barrelsperday>10000000 THEN '5/ more than 10m bpd' END,
[No. of Countries]=COUNT(*)
FROM OilProducers
GROUP BY CASE WHEN Barrelsperday<10000 THEN '1/ < 10,000 bpd' 
WHEN Barrelsperday BETWEEN 10000   AND 100000 THEN '2/ 10,000 to 100,000'
WHEN Barrelsperday BETWEEN 100001  AND 1000000 THEN '3/ 100,000 to 1m bpd'
WHEN Barrelsperday BETWEEN 1000001 AND 10000000 THEN '4/ 1m to 10m bpd'
WHEN Barrelsperday>10000000 THEN '5/ more than 10m bpd' END

--The code above is much more efficient than it looks, but why not make a 
--User-defined function to do it?

GO
CREATE FUNCTION dbo.OilOutputCategory(@OilOutput INT)
RETURNS VARCHAR(30)
AS
BEGIN
RETURN CASE WHEN @OilOutput<10000 THEN '1/ < 10,000 bpd' 
WHEN @OilOutput BETWEEN 10000   AND 100000 THEN '2/ 10,000 to 100,000'
WHEN @OilOutput BETWEEN 100001  AND 1000000 THEN '3/ 100,000 to 1m bpd'
WHEN @OilOutput BETWEEN 1000001 AND 10000000 THEN '4/ 1m to 10m bpd'
WHEN @OilOutput>10000000 THEN '5/ more than 10m bpd' END
END
GO

--so the query becomes...
SELECT [category]=dbo.OilOutputCategory(BarrelsPerDay),
[No. of Countries]=COUNT(*)
FROM OilProducers
GROUP BY dbo.OilOutputCategory(BarrelsPerDay)

--so let's combine ranging and enumeration!

--We can list all the countries that fall in each range category....
DECLARE @list VARCHAR(8000)
SELECT @List=COALESCE(@list+',','')+catfudge
FROM (--the concatenation of the string didn't work until we did this!
SELECT TOP 1000 [catfudge]=
('|'+dbo.OilOutputCategory(BarrelsPerDay)+'|'+country+'|'
+dbo.OilOutputCategory(BarrelsPerDay)+'|'),
[cat]=dbo.OilOutputCategory(BarrelsPerDay)
FROM OilProducers ORDER BY cat) f

SELECT [category]=CONVERT(CHAR(26),category), 
[Oil-producing countries]=LEFT(members,CHARINDEX(',|',members+',|')-1) FROM
(SELECT category, 'members'=
   REPLACE(SUBSTRING(list,CHARINDEX('|'+categories.category+'|',list),8000),
           '|'+categories.category+'|','')
 FROM (SELECT 'list'=@List) f
 CROSS JOIN
 (SELECT [category]=dbo.OilOutputCategory(BarrelsPerDay) FROM OilProducers 
  GROUP BY dbo.OilOutputCategory(BarrelsPerDay)) categories) g

/*Results (Truncated to the right)
category                   Oil-producing countries
-------------------------- -------------------------------------------------...
1/ < 10,000 bpd            Singapore,Burma,Finland,Taiwan,Ghana,Korea, South,...
2/ 10,000 to 100,000       Tunisia,Netherlands,Ukraine,Mauritania,France,Cuba...
3/ 100,000 to 1m bpd       Syria,Sudan,Oman,Qatar,Romania,Italy,Malaysia,Chad...
4/ 1m to 10m bpd           Venezuela,United Arab Emirates,United Kingdom,Indo...
5/ more than 10m bpd       Saudi Arabia
*/

-- Grouping as a utility

-- One can use grouping in a variety of ways that have nothing to do with 
-- reporting. Here is an example of using GROUP BY to help produce a table by 
-- splitting a delimited list. It needs the Numbers helper table (you'll have 
-- to refer to the 'Helper Table Workbench' to find out about that). 

GO
CREATE FUNCTION [dbo].[uftSplitStringGroup]
(
@DelimitedList VARCHAR(8000),
@Delimiter VARCHAR(20)=','
)
RETURNS @Results TABLE
(
Item VARCHAR(8000)
)
AS
BEGIN
DECLARE @String VARCHAR(8000)
DECLARE @LenDelimiter INT

SELECT @LenDelimiter=LEN(@Delimiter),
   @String=@Delimiter+@DelimitedList+@Delimiter --wrap the list in delimiters

INSERT INTO @results (item)
SELECT SUBSTRING(@String, s1.number+@LenDelimiter,
                 MIN(s2.number)-s1.number-@LenDelimiter)
FROM    (SELECT [string]=@String) f
CROSS JOIN numbers s1 
INNER JOIN numbers s2 ON s1.number<s2.number
WHERE s1.number<=LEN(@string)
AND s2.number<=LEN(@string)
AND SUBSTRING(@string,s1.number,@LenDelimiter)=@Delimiter --a delimiter here...
AND SUBSTRING(@string,s2.number,@LenDelimiter)=@Delimiter --...and another here
GROUP BY s1.number --so MIN(s2.number) is the very next delimiter
RETURN
END
GO
SELECT * FROM dbo.uftSplitStringGroup('first,second,third',default)
SELECT * FROM dbo.uftSplitStringGroup('<>this is something<>something else<>','<>')
SELECT * FROM dbo.uftSplitStringGroup(
 'Turkey,Bahrain,Yemen,Syria,United Arab Emirates,Saudi Arabia,Qatar,Kuwait',',')


/*Send your answers to the editor. The first three correct entries
will receive a much-coveted Simple-Talk goodie bag (polo shirt, USB key, bottle
opener, beermat, pen).

1/  How would you get the count of the number of rows in a table with NULLS in 
    a particular column, using GROUP BY, but without using a COALESCE or CASE
    expression?

2/  What is the GROUPING() function useful for?

3/  Can a WHERE clause contain an aggregate function?

4/  When would you need to use a HAVING clause?

5/  What does WITH ROLLUP do? How would you use it?

6/  Can you use UDFs (user-defined scalar-valued  functions) in GROUP BY clauses? */

See also other Workbenches at Simple-Talk

Robyn Page's SQL Server DATE/TIME Workbench

Robyn Page

Date calculation and formatting in SQL Server can be surprisingly tricky. Robyn Page's "hands-on" workbench will lead you through the minefield.

Robyn Page's SQL Server String Manipulation Workbench

Robyn Page

String searching and manipulation in SQL Server can be error-prone and tedious...unless you're armed with the techniques described in Robyn's string manipulation workbench...

SQL Server Error Handling Workbench

Grant Fritchey

Grant Fritchey steps into the workbench arena, with an example-fuelled examination of catching and gracefully handling errors in SQL 2000 and 2005, including worked examples of the new TRY..CATCH capabilities.

Robyn Page's SQL Server Cursor Workbench

Robyn Page

The topic of cursors is the ultimate "hot potato" in the world of SQL Server. Everyone has a view on when they should and mainly should not be used. By example and testing Robyn Page proves that, when handled with care, cursors are not necessarily a "bad thing".

Robyn Page's SQL Server Data Validation Workbench

Robyn Page

Robyn Page provides essential techniques for ensuring the validity of the data being entered into your SQL Server tables.

Robyn Page's Excel Workbench

Robyn Page and Phil Factor

The need to produce Excel reports from SQL Server is very common. Here, Robyn Page and Phil Factor present practical techniques for creating and manipulating Excel spreadsheets from SQL Server, using linked servers and T-SQL. The pièce de résistance is a stored procedure that uses OLE Automation...

Robyn Page's SQL Server Security Workbench

Robyn Page and Phil Factor

Robyn Page and Phil Factor present practical T-SQL techniques for controlling access to sensitive information within the database, and preventing malicious SQL injection attacks.

SQL Server Helper Table Workbench

Robyn Page and Phil Factor

Cursors and iterations are both renowned for slowing down Transact SQL code, but sometimes seem unavoidable. In this workbench, Robyn Page and Phil Factor demonstrate some set-based techniques for string manipulation and time interval-based reporting, which use helper tables instead.


Author profile:

Robyn Page has worked as a consultant with Enformatica and USP Networks with a special interest in the provision of broadcast services over IP intranets. She was also a well known actress, being most famous for her role as Katie Williams, barmaid and man-eater in the Television Series Family Affairs. She is currently having a career break to raise a young family.

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 20 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

Have Your Say

Subject: Average in pivot
Posted by: Anonymous (not signed in)
Posted on: Tuesday, May 1, 2007 at 6:32 PM
Message: Be careful with zeros and nulls when pivoting using a case statement if you are going to be taking averages -- you would want to have the "else" part be null instead of zero so that the "else" rows aren't included in the count for determining the average.

Subject: wahhh.......................
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 2, 2007 at 6:19 AM
Message: Hai Hai,

I am beginners. Its really help me a lot. Thanks you so much. Please give me more samples/links like above.

Subject: On the spot
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 2, 2007 at 6:38 AM
Message: Just the reference I needed for my current project. Thank you!

Best regards

Subject: What about @OtherErrorHappened?
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 2, 2007 at 7:49 AM
Message: In the Duplicates section, you look for errors when you delete the dupes from the original table, but then you don't check to see if there were any errors in @OtherErrorHappened before you commit.

Why is this?

Subject: Wrong link
Posted by: dolson (view profile)
Posted on: Wednesday, May 2, 2007 at 8:44 AM
Message: Great article.

FYI: The link for "SQL Server Helper Table Workbench" goes to the "SQL Server Security Workbench" page:

Subject: Did we move
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 2, 2007 at 11:42 AM
Message: Great article, looking at the data, not sure when the United States moved to asia

Subject: Re: @OtherErrorHappened , Did We Move etc.
Posted by: Robyn Page (view profile)
Posted on: Wednesday, May 2, 2007 at 12:27 PM
Message: Oops. Phil took something out temporarily in order to test it, and forgot to put it back in! Or was it me? (senior moment)
Thanks too to an anonymous person who pointed out a confusing typo in a comment where we put GROUP BY instead of HAVING. (brain fade)
The Oil-Producers data came from several sources and needed a lot of cleaning, but I can't use the excuse of Continental Drift for the States wandering to Asia. I know we're insular, but this is ridiculous! (blond moment)

Subject: Re: Did We Move etc. Continental drift
Posted by: Phil Factor (view profile)
Posted on: Wednesday, May 2, 2007 at 2:33 PM
Message: Source files fixed. The United States is safely back in North America. Apologies. We'd like to take Britain out of Europe as easily but that is another issue!

Subject: Removing duplicates - maybe this is better idea than loop etc ;-)
Posted by: Anonymous (not signed in)
Posted on: Thursday, May 3, 2007 at 12:31 PM

Message:
INSERT INTO Duplicates (Duplicate_ID,country,BarrelsPerDay,continent,region)
SELECT oil.OilProducer_ID,oil.country,oil.BarrelsPerDay,oil.continent,oil.region
FROM oilproducers oil
LEFT JOIN (SELECT MAX(oilproducer_id) AS oilproducer_id
           FROM oilproducers
           GROUP BY country) p
ON (oil.oilproducer_id = p.oilproducer_id)
WHERE p.oilproducer_id IS NULL

DELETE FROM oilproducers
WHERE oilproducer_id IN (
SELECT duplicate_id 
FROM duplicates)

Subject: Re: Removing Duplicates
Posted by: Robyn Page (view profile)
Posted on: Thursday, May 3, 2007 at 1:10 PM
Message: Dunno, but it is certainly cleverer!

Subject: Duplicates
Posted by: Anonymous (not signed in)
Posted on: Monday, May 7, 2007 at 3:46 AM
Message: Loops should be avoided.

INSERT INTO Duplicates (Duplicate_ID,country,BarrelsPerDay,continent,region)
SELECT OilProducer_ID,country,BarrelsPerDay,continent,region
FROM OilProducers 
WHERE oilProducer_ID NOT IN (
SELECT MIN(OilProducer_ID)
FROM OilProducers
GROUP BY country)

DELETE FROM OilProducers 
WHERE oilProducer_ID NOT IN (
SELECT MIN(OilProducer_ID)
FROM OilProducers
GROUP BY country)

Subject: Can't download the textfile!!!!!
Posted by: Anonymous (not signed in)
Posted on: Monday, May 7, 2007 at 9:14 AM
Message: Sorry! We cannot display this page because of an error.
We have alerted the Simple Talk team, and will fix the problem as soon as we can.

In the mean time, please try exploring other parts of our site.

If you repeatedly get this error, please contact

Subject: Re: Can't download the textfile!!!!!
Posted by: Phil Factor (view profile)
Posted on: Monday, May 7, 2007 at 3:21 PM
Message: Whatever it was seems to have stopped happening. We'll get it checked out.

Subject: please help me !
Posted by: iman770 (view profile)
Posted on: Sunday, May 13, 2007 at 4:30 AM
Message: hi; i iman from iran
i start new project with SQL
i need code about text search in database with any format in for exam .txt or .doc or .pdf
i want this code for persian language .

Subject: Variance Calculation
Posted by: Anonymous (not signed in)
Posted on: Monday, May 14, 2007 at 10:13 AM
Message: I was looking for some help with a matrix style report I am trying to develop.

Example, months accross the top, columns would be product grouping, then year. The detail would be a summation of invoice value. I can achieve this no problem. The issue is when I want to get a variance between the two years, i.e. Jan 2006 vs 2007 as row, underneath each month.

Any help would be greatly appreciated.

Kind Regards

Subject: Re: Variance Calculation
Posted by: Phil Factor (view profile)
Posted on: Monday, May 14, 2007 at 1:17 PM
Message: I'm not sure what precisely you mean by variance. This is a statistical term that applies to a list of numbers, but here you just have the data that applies to just two dates. Do you mean the difference between the two? (Normally given as a percentage)

Subject: Re:Variance Calculation
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 16, 2007 at 4:51 AM
Message: By Variance I mean. There will be sales data for say Jan 2007 and Jan 2006 and so on throughout the year. In my matrix example the months would run along the top and the two years as rows. So I trying to calculate the variance between Jan 2007 and Jan 2007 and display in a row. I hope that makes more sense. Thanks Mark

Subject: Some of the 'duplicates' code needs to be updated...
Posted by: Anonymous (not signed in)
Posted on: Friday, May 18, 2007 at 3:18 AM
Message: you need to change the code:

--let's maliciously create some duplicates!
INSERT INTO OilProducers(Country,BarrelsPerDay,continent, region)
SELECT Country,BarrelsPerDay,continent, region
FROM OilProducers WHERE country LIKE 'A%'
INSERT INTO OilProducers(Country,BarrelsPerDay ,continent, region)
SELECT Country,BarrelsPerDay,continent, region
FROM OilProducers WHERE country BETWEEN 'A' AND 'E'
--you can then easily see these duplicates by
SELECT 'First_ID'=MAX(OilProducer_ID), 'No.InTable'=COUNT(*)
FROM OilProducers
GROUP BY country


INSERT INTO Duplicates(Country,BarrelsPerDay,continent, region)
SELECT Country,BarrelsPerDay,continent, region
FROM OilProducers WHERE country LIKE 'A%'
INSERT INTO Duplicates(Country,BarrelsPerDay ,continent, region)
SELECT Country,BarrelsPerDay,continent, region
FROM OilProducers WHERE country BETWEEN 'A' AND 'E'

SELECT 'First_ID'=MAX(OilProducer_ID), 'No.InTable'=COUNT(*)
FROM Duplicates
GROUP BY country

Subject: aaargh - ignore comment above ;)
Posted by: Anonymous (not signed in)
Posted on: Friday, May 18, 2007 at 3:22 AM
Message: you don't need to change the code :)

Subject: Nice
Posted by: Anonymous (not signed in)
Posted on: Friday, May 18, 2007 at 4:30 AM
Message: Its Nice article and posting of Removing Loops is nice

Subject: FTS
Posted by: Anonymous (not signed in)
Posted on: Friday, May 18, 2007 at 9:21 AM
Message: iman770, look into Full-Text Search in Books Online. It does exactly what you're asking.

Subject: Top n and others
Posted by: Anonymous (not signed in)
Posted on: Sunday, September 9, 2007 at 3:17 AM
Message: Hi!
In the query "or, more tiresomely, for the top five, the others as an 'Others' row AND the sum total!" I want to show the columns "Continent" and "Country" as well, how can I do that?

Thanks for great tips!

Subject: Anonymous Comments Disabled
Posted by: Sarah Grady (view profile)
Posted on: Friday, December 7, 2007 at 7:42 AM
Message: Due to a large volume of spam, anonymous comments have been disabled.
