András Belokosztolszki

Removing Duplicates from a Table in SQL Server

11 February 2009

Sometimes, in SQL, it is the routine operations that turn out to be the trickiest for a DBA or developer. The cleaning up, or de-duplication, of data is one of those. András runs through a whole range of methods and tricks, and ends with a fascinating technique using CTE, ROW_NUMBER() and DELETE

Only rarely will you need to remove duplicate entries from a table on a production database. The tables in these databases should have a constraint, such as a primary key or unique constraint, to prevent these duplicate entries from occurring in the first place. However, last year at SQL Bits 3 in Reading, I asked my audience how many of them needed to remove duplicate rows from a table, and almost eighty percent raised a hand.

How is it that duplicates can get into a properly-designed table?  Most commonly, this is due to changes in the business rules that define what constitutes a duplicate, especially after the merging of two different systems.  In this article, I will look at some ways of removing duplicates from tables in SQL Server 2000 and later versions, and at some of the problems that may arise.

Checking for Duplicates

On any version of SQL Server, you can identify duplicates using a simple query, with GROUP BY and HAVING, as follows:

DECLARE @table TABLE (data VARCHAR(20))
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')

SELECT  data
      , COUNT(data) nr
FROM    @table
GROUP BY data
HAVING  COUNT(data) > 1

The result indicates that there are two occurrences of the row containing the “duplicate row” text:

data                 nr
-------------------- -----------
duplicate row        2

Removing Duplicate Rows in SQL Server

The following sections present a variety of techniques for removing duplicates from SQL Server database tables, depending on the nature of the table design.

Tables with no primary key

When you have duplicates in a table that has no primary key defined, and you are using an older version of SQL Server, such as SQL Server 2000, you do not have an easy way to identify a single row. Therefore, you cannot simply delete this row by specifying a WHERE clause in a DELETE statement.

You can, however, use the SET ROWCOUNT 1 command, which will restrict the subsequent DELETE statement to removing only one row. For example:

DECLARE @table TABLE (data VARCHAR(20))
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')

SET ROWCOUNT 1
DELETE FROM @table WHERE data = 'duplicate row'
SET ROWCOUNT 0

In the above example, only one row is deleted. Consequently, there will be one remaining row with the content “duplicate row”. If a row occurs more than twice, set ROWCOUNT to the number of occurrences minus one, so that exactly one copy survives. Note that after the delete, you should reset ROWCOUNT to 0 so that subsequent queries are not affected.
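As a minimal sketch of this adjustment, suppose a (hypothetical) row occurs three times; setting ROWCOUNT to 2 deletes two occurrences and leaves one:

DECLARE @table TABLE (data VARCHAR(20))
INSERT INTO @table VALUES ('triplicate row')
INSERT INTO @table VALUES ('triplicate row')
INSERT INTO @table VALUES ('triplicate row')

-- three occurrences, so delete two and keep one
SET ROWCOUNT 2
DELETE FROM @table WHERE data = 'triplicate row'
SET ROWCOUNT 0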

To remove all duplicates in a single pass, the following code will work, but is likely to be horrendously slow if there are a large number of duplicates and table rows:

DECLARE @table TABLE (data VARCHAR(20))
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')

SET NOCOUNT ON
SET ROWCOUNT 1
WHILE 1 = 1
   BEGIN
      DELETE   FROM @table
      WHERE    data IN (SELECT  data
                        FROM    @table
                        GROUP BY data
                        HAVING  COUNT(*) > 1)
      IF @@Rowcount = 0
         BREAK ;
   END
SET ROWCOUNT 0

When cleaning up a table that has a large number of duplicate rows, a better approach is to select just a distinct list of the duplicates, delete all occurrences of those duplicate entries from the original and then insert the list into the original table.

DECLARE @table TABLE(data VARCHAR(20))
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('second duplicate row')
INSERT INTO @table VALUES ('second duplicate row')

SELECT   data
INTO     #duplicates
FROM     @table
GROUP BY data
HAVING   COUNT(*) > 1

-- delete all rows that are duplicated
DELETE   FROM @table
FROM     @table o INNER JOIN #duplicates d
         ON d.data = o.data

-- insert one row for every duplicate set
INSERT   INTO @table(data)
         SELECT   data
         FROM     #duplicates

As a variation of this technique, you could select all the data, without duplicates, into a new table, delete the old table, and then rename the new table to match the name of the original table:

CREATE TABLE duplicateTable3(data VARCHAR(20))
INSERT INTO duplicateTable3 VALUES ('not duplicate row')
INSERT INTO duplicateTable3 VALUES ('duplicate row')
INSERT INTO duplicateTable3 VALUES ('duplicate row')
INSERT INTO duplicateTable3 VALUES ('second duplicate row')
INSERT INTO duplicateTable3 VALUES ('second duplicate row')

SELECT DISTINCT data
INTO    tempTable
FROM    duplicateTable3
GO
TRUNCATE TABLE duplicateTable3
DROP TABLE duplicateTable3
exec sp_rename 'tempTable', 'duplicateTable3'

In this solution, the SELECT DISTINCT will select all the rows from our table except for the duplicates. These rows are immediately inserted into a table named tempTable. This is a temporary table in the sense that we will use it to temporarily store the unique rows. However, it is not a true temporary table (i.e. one that lives in the temporary database), because we need the table to exist in the current database, so that it can later be renamed, using sp_Rename.

The sp_Rename command is an absolutely horrible way of renaming textual objects, such as stored procedures, because it does not update all the system tables consistently. However, it works well for non-textual schema objects, such as tables.

Note that this solution is usually used on a table that has no primary key. If there is a key, and there are foreign keys referencing the rows that are identified as duplicates, then the foreign key constraints need to be dropped and re-created during the table swap.
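As a sketch of that extra step, assuming a hypothetical child table with a foreign key to the table being rebuilt (the child table, column and constraint names here are illustrative, not from the examples above):

-- drop the constraint before the swap
ALTER TABLE childTable DROP CONSTRAINT FK_childTable_duplicateTable3

-- ... perform the SELECT DISTINCT, DROP and sp_rename steps here ...

-- re-create the constraint against the renamed table
ALTER TABLE childTable
   ADD CONSTRAINT FK_childTable_duplicateTable3
   FOREIGN KEY (parentId) REFERENCES duplicateTable3 (id)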

Tables with a primary key, but no foreign key constraints

If your table has a primary key, but no foreign key constraints, then the following solution offers a way to remove duplicates that is much quicker, as it entails less iteration:

DECLARE @table TABLE(
      id INT IDENTITY(1, 1)
    , data VARCHAR(20)
    )
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')

WHILE 1 = 1
   BEGIN
      DELETE   FROM @table
      WHERE    id IN (SELECT   MAX(id)
                      FROM     @table
                      GROUP BY data
                      HAVING   COUNT(*) > 1)
      IF @@Rowcount = 0
         BREAK ;
   END

Unfortunately, this sort of technique does not scale well.

If your table has a reliable primary key, for example one with an assigned value that can be used in a comparison, such as a numeric value in a column with the IDENTITY property enabled, then the following approach is probably the neatest and best. Essentially, it deletes all the duplicates except for the one with the highest value for the primary key. If a table has a unique column, such as a number or integer, that will reliably return just one value with MAX() or MIN(), then you can use this technique to identify the chosen survivor of the group of duplicates.

DECLARE @table TABLE (
      id INT IDENTITY(1, 1)
    , data VARCHAR(20)
    )
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('second duplicate row')
INSERT INTO @table VALUES ('second duplicate row')

DELETE  FROM @table
FROM    @table o
        INNER JOIN ( SELECT data
                     FROM   @table
                     GROUP BY data
                     HAVING COUNT(*) > 1
                   ) f ON o.data = f.data
        LEFT OUTER JOIN ( SELECT    [id] = MAX(id)
                          FROM      @table
                          GROUP BY  data
                          HAVING    COUNT(*) > 1
                        ) g ON o.id = g.id
WHERE   g.id IS NULL

This can be simplified even further, though the logic is rather harder to follow.

DELETE   FROM f
FROM     @table AS f INNER JOIN @table AS g
         ON g.data = f.data
            AND f.id < g.id

Tables that are referenced by a Foreign Key

If you’ve set up your constraints properly, then you will be unable to use the above techniques to delete duplicate rows from a table that is referenced by another table, unless you have specified cascading deletes in the foreign key constraints.

You can alter existing foreign key constraints by adding a cascading delete on the foreign key constraint. This means that rows in other tables that refer to the duplicate row via a foreign key constraint will be deleted.  Because you will lose the referenced data as well as the duplicate, you are more likely to wish to save the duplicate data in its entirety first in a holding table.  When you are dealing with real data, you are likely to need to identify the duplicate rows that are being referred to, and delete the duplicates that are not referenced, or merge duplicates and update the references. This task will probably have to be done manually in order to ensure data integrity.
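A constraint cannot be modified in place; it must be dropped and re-created with the ON DELETE CASCADE option. A sketch, using hypothetical parent and child table, column and constraint names:

ALTER TABLE childTable DROP CONSTRAINT FK_childTable_parentTable

ALTER TABLE childTable
   ADD CONSTRAINT FK_childTable_parentTable
   FOREIGN KEY (parentId) REFERENCES parentTable (id)
   ON DELETE CASCADE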

Tables with columns that cannot have a UNIQUE constraint

Sometimes, of course, you may have columns on which you cannot define a unique constraint, or on which you cannot even use the DISTINCT keyword. Large object types, like NTEXT, TEXT and IMAGE in SQL Server 2000, are good examples of this. These are data types that cannot be compared, and so the above solutions would not work.

In these situations, you will need to add an extra column to the table that you could use as a surrogate key. Such a surrogate key is not derived from the application data. Its value may be automatically generated, similarly to the identity columns in our previous examples. Unfortunately, in SQL Server, you cannot add an identity column to a table as part of the ALTER TABLE command. The only way to add such a column is to rebuild the table, using SELECT INTO and the IDENTITY() function, as follows:

CREATE TABLE duplicateTable4 (data NTEXT)
INSERT INTO duplicateTable4 VALUES ('not duplicate row')
INSERT INTO duplicateTable4 VALUES ('duplicate row')
INSERT INTO duplicateTable4 VALUES ('duplicate row')
INSERT INTO duplicateTable4 VALUES ('second duplicate row')
INSERT INTO duplicateTable4 VALUES ('second duplicate row')

SELECT  IDENTITY( INT, 1, 1 ) AS id,
        data
INTO    duplicateTable4_Copy
FROM    duplicateTable4

The above will create the duplicateTable4_Copy table. This table will have an identity column named id, which will already have unique numeric values set. Note that although we are creating an Identity column, uniqueness is not enforced in this case; you will need to add a unique index or define the id column as a primary key.
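For example, the id column could be promoted to a primary key with an ALTER TABLE statement such as the following (the constraint name is just illustrative):

ALTER TABLE duplicateTable4_Copy
   ADD CONSTRAINT PK_duplicateTable4_Copy PRIMARY KEY (id)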

Using a cursor

People with an application development background might consider using a cursor to eliminate duplicates. The basic idea is to order the contents of the table, iterate through the ordered rows, and check if the current row is equal to the previous row. If it is, delete the row. This solution could look like the following in T-SQL:

CREATE TABLE duplicateTable5 (data varchar(30))
INSERT INTO duplicateTable5 VALUES ('not duplicate row')
INSERT INTO duplicateTable5 VALUES ('duplicate row')
INSERT INTO duplicateTable5 VALUES ('duplicate row')
INSERT INTO duplicateTable5 VALUES ('second duplicate row')
INSERT INTO duplicateTable5 VALUES ('second duplicate row')

DECLARE @data VARCHAR(30),
    @previousData VARCHAR(30)
DECLARE cursor1 CURSOR SCROLL_LOCKS
    FOR SELECT  data
        FROM    duplicateTable5
        ORDER BY data
    FOR UPDATE
OPEN cursor1

FETCH NEXT FROM cursor1 INTO @data
WHILE @@FETCH_STATUS = 0
    BEGIN
          IF @previousData = @data
              DELETE  FROM duplicateTable5
              WHERE CURRENT OF cursor1

          SET @previousData = @data
          FETCH NEXT FROM cursor1 INTO @data
    END
CLOSE cursor1
DEALLOCATE cursor1

The above script will not work, because once you apply the ORDER BY clause in the cursor declaration the cursor will become read-only. If you remove the ORDER BY clause, then there will be no guarantee that the rows will be in order, and checking two subsequent rows would no longer be sufficient to identify duplicates. Interestingly, since the above example creates a small table where all the rows fit onto a single database page and duplicate rows are inserted in groups, removing the ORDER BY clause does make the cursor solution work. It will fail, however, with any table that is larger and has seen some modifications.

New Techniques for Removing Duplicate Rows in SQL Server 2005

SQL Server 2005 has introduced the row_number() function, which provides an alternative means of identifying duplicates. Rewriting the first example, for tables with no primary key, we can now assign a row number to each row in a duplicate group, with a command such as:

DECLARE @duplicateTable4 TABLE (data VARCHAR(20))
INSERT INTO @duplicateTable4 VALUES ('not duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')

SELECT  data
      , row_number() OVER ( PARTITION BY data ORDER BY data ) AS nr
FROM    @duplicateTable4

The result will show:

data                 nr
-------------------- --------------------
duplicate row        1
duplicate row        2
not duplicate row    1
second duplicate row 1
second duplicate row 2

In the above example, we specify an ordering and partitioning for the row_number() function. Note that the row_number() is a ranking window function, therefore the ORDER BY and the PARTITION BY in the OVER clause are used only to determine the value for the nr column, and they do not affect the row order of the query. Also, while the above is similar to our previous GROUP BY clause, there is a big difference concerning the returned rows. With GROUP BY you must use an aggregate on the columns that are not listed after the GROUP BY. With the OVER clause there is no such restriction, and you can get access to the individual rows in the groups specified by the PARTITION BY clause. This gives us access to the individual duplicate rows, so we can get not only the number of occurrences, but also a sequence number for the individual duplicates. To filter out the duplicate rows only, we could just put the above query into a CTE or a subquery. The CTE approach is as follows:

DECLARE @duplicateTable4 TABLE (data VARCHAR(20))
INSERT INTO @duplicateTable4 VALUES ('not duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')
;
WITH    numbered
          AS ( SELECT   data
                      , row_number() OVER ( PARTITION BY data ORDER BY data ) AS nr
               FROM     @duplicateTable4
             )
    SELECT  data
    FROM    numbered
    WHERE   nr > 1

This is not really any different from what we could do on SQL Server 2000.  However, here comes an absolutely amazing feature in SQL Server 2005 and later: We can refer to, and identify, a duplicate row based on the row_number() column and then, with the above CTE expression, we can use a DELETE statement instead of a SELECT, and directly remove the duplicate entries from our table.

We can demonstrate this technique with the following example:

DECLARE @duplicateTable4 TABLE (data VARCHAR(20))
INSERT INTO @duplicateTable4 VALUES ('not duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')
INSERT INTO @duplicateTable4 VALUES ('second duplicate row')
;
WITH    numbered
          AS ( SELECT   data
                      , row_number() OVER ( PARTITION BY data ORDER BY data ) AS nr
               FROM     @duplicateTable4
             )
    DELETE  FROM numbered
    WHERE   nr > 1

This solution will even work with large objects, if you stick to the new large object types introduced in SQL Server 2005: i.e. use VARCHAR(MAX) instead of TEXT, NVARCHAR(MAX) instead of NTEXT, and VARBINARY(MAX) instead of IMAGE. These new types are comparable to the deprecated TEXT, NTEXT and IMAGE, and they have the advantage that you will be able to use them with both DISTINCT and row_number().
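As a quick sketch, the CTE-based DELETE from the previous example works unchanged when the column is declared as VARCHAR(MAX), something that would not be possible with the deprecated TEXT type:

DECLARE @lobTable TABLE (data VARCHAR(MAX))
INSERT INTO @lobTable VALUES ('duplicate row')
INSERT INTO @lobTable VALUES ('duplicate row')
;
WITH    numbered
          AS ( SELECT   data
                      , row_number() OVER ( PARTITION BY data ORDER BY data ) AS nr
               FROM     @lobTable
             )
    DELETE  FROM numbered
    WHERE   nr > 1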

 I find this last solution, using CTE, ROW_NUMBER() and DELETE, fascinating. Partly because now we can identify rows in a table when there is no other alternative way of doing it, and partly because it is a solution to a problem that should not, in theory, exist at all since production tables will have a unique constraint or a primary key to prevent duplicates getting into the table in the first place.


Author profile:

András Belokosztolszki is a software architect at Red Gate Software Ltd. He is a frequent speaker at many UK user groups and events (VBUG, NxtGen, Developer’s Group, SQLBits). He is primarily interested in database internals and database change management. At Red Gate he has designed and led the development of many database tools that compare database schemata and enable source control for databases (SQL Compare versions 4 to 7), refactor databases (SQL Refactor) and show the history of databases by analyzing the transaction log (SQL Log Rescue). András has a PhD from Cambridge and an MSc and BSc from ELTE, Hungary. He is also a MCSD and MCPD Enterprise. See my blogs on simple-talk.




Subject: Excellent, Andras.
Posted by: Rodney (view profile)
Posted on: Wednesday, February 11, 2009 at 2:15 PM
Message: Yet another Simple-Talk quality article that I am forwarding to my DBA team!

Subject: Nice!
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 11, 2009 at 10:26 PM
Message: This is going to my scripts repository.

Subject: Nice techniques !
Posted by: randyvol (view profile)
Posted on: Tuesday, February 17, 2009 at 10:15 AM
Message: I'll have to give these a try.

With the exception of FK references, the approach I have taken is to use the group by/having count(*)>1 query to insert the rows into a table (usually the name of the table I want to cleanse with '_DUPS' added to the end of the table).
I also add a row_id column to the table I'm creating with an identity of (1,1).
Finally I use an Order by on the grouped by columns.

The result is all the duplicated rows added to the _DUPS table and nicely ordered with a sequential row_id.

Next I delete the duplicate rows from the original table.

Then I insert a single row from each duplicated rows set from the _DUPS table using the min(row_id) as the means to pick one copy of each set of duplicated rows to insert back into the original table.

Has served me well over the years.

Subject: one way dups can happen in production systems...
Posted by: randyvol (view profile)
Posted on: Tuesday, February 17, 2009 at 10:26 AM
Message: By the by -

I can provide you an example of a 'real-world' problem that has hounded us for over a year on our commercial ERP package.

As will sometimes occur with commercial packages, this one was modified by the vendor.

Somewhere in their code (either the original or the modified source, we're still waiting a resolution on where) is a bug that allows duplicate customer records to get created.

The way this happens is the code uses system generated surrogate keys to "guarantee" distinctness. But if one loses sight of what is supposed to be distinct (i.e., the natural key, a customer number) and one doesn't test to be sure the natural key is not being dup'd, then the system will happily and serenely keep duplicating, a customer record an infinite number of times, each time with a distinct surrogate key - which is not anything useful by itself.


Subject: nice explanation
Posted by: hugosheb (view profile)
Posted on: Tuesday, February 17, 2009 at 10:42 PM
Message: thank you, now I get the row_number usage much better!

Subject: Another Cursor Method using TOP
Posted by: SQLDeveloper (view profile)
Posted on: Wednesday, February 18, 2009 at 6:21 AM
Message: Hello,
I liked the methods you have demonstrated here in this article.
I liked the latest one a lot.

I can suggest one more method with cursors using TOP key that I'm using and I prefer to use against such a case. You can consider this method too, check it at http://www.kodyaz.com/articles/delete-duplicate-records-rows-in-a-table.aspx



Subject: Re: Randyvol and SQLDeveloper
Posted by: András (view profile)
Posted on: Wednesday, February 18, 2009 at 6:40 AM
Message: Hi Randyvol and SQLDeveloper,
thank you for sharing your approaches.
- Andras

Subject: Great
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 18, 2009 at 7:27 AM
Message: It's Great Andras.
Really very useful.

Thank you,
Ashok

Subject: Removing Duplicate Rows in SQL Server
Posted by: Cyrille L (not signed in)
Posted on: Wednesday, February 18, 2009 at 7:36 AM
Message: Hey Andras,

This is the most extensive coverage yet of dealing with duplicate data. You made a great effort to cover all possible scenarios and the article came at a time when I need to dedup some tables in my company's production databases. Thanks for saving my day. Thank you simple-talk.

Subject: Removing Duplicates
Posted by: James in the Midwest (not signed in)
Posted on: Wednesday, February 18, 2009 at 9:58 AM
Message: Good job...saved it in my SQL script reference folder

Subject: Awesome and very informative :)
Posted by: Anil Mahadev (not signed in)
Posted on: Wednesday, February 18, 2009 at 11:12 AM
Message: Great Article! :)

Subject: remove duplicate data
Posted by: Anonymous (not signed in)
Posted on: Wednesday, February 18, 2009 at 1:47 PM
Message: Fantastic. Thank you. I learn new technique to remove dup data.

Subject: Wish I read this last week
Posted by: monsterjta (view profile)
Posted on: Wednesday, February 18, 2009 at 11:46 PM
Message: Great post. Very complete. I was just doing this for a customer last week. Wish I would had this article handy then!

Subject: Dupe data happens
Posted by: Jeff Moden (view profile)
Posted on: Friday, February 20, 2009 at 8:58 PM
Message: One place where duplicate data happens all the time is in telephony... most companies just can't seem to generate unique data. Even when they can, there are those folks who hit redial several times in the same minute because they're desparate to get through but keep getting a recording. As soon as they hear the recording, they hit redial to try again. Some places have laws or rules that prohibit a customer from being billed more than once each minute no matter how many times they hit redial.

Of course, the whole idea is to delete all the duplicates in the staging table before you move the data to it's final resting place. Even the move has a test for existance so you don't end up putting dupe data into the final table.

Subject: nice
Posted by: Rahul Bhatt (not signed in)
Posted on: Sunday, February 22, 2009 at 11:43 PM
Message: Great Article! :)

Subject: Wonder ful
Posted by: Mega (not signed in)
Posted on: Wednesday, February 25, 2009 at 3:40 AM
Message: Great article! :)

Subject: Duplicate Rows
Posted by: amitb (view profile)
Posted on: Tuesday, August 04, 2009 at 8:38 AM
Message: Hey Excellent article.

Regarding Deleting dups from table having pk as - identity(int,1,1)
wont this query work?

DELETE FROM #table
WHERE id Not IN (SELECT MAX(id)
FROM #table
GROUP BY data
)




Subject: Duplicate Rows
Posted by: NullPointer (view profile)
Posted on: Thursday, August 20, 2009 at 8:24 AM
Message: @amitb - yes it will work in a case where you have a PK with identity 1,1 - in that case, you already have a clear identifier that you can use to identify each row uniquely.

The problem arises when you don't - for example when the supposed "unique" key is the customer account number and you discover that there are now somehow two 'ACC01' entries that are 100% identical across *all* fields - that's when you need to use row_number() to assign what is essentially a virtual identity() column to the data so you can weed out the duplicates.

Subject: Simple mathematical trick
Posted by: krishnasrikanth (view profile)
Posted on: Tuesday, August 25, 2009 at 1:06 AM
Message: Supose you have table which u forgot to add primary key on a column, and all rows are appearing twice. This trick might work. Especially with large data, around 100,000 records and u have 200,000 records in table.


select columns into temptable from maintable order by pk_forgotten_column.


add column, ID bigint identity(1,1) to the temptable


Now every second row is a duplicate one. remove them


delete from temptable where ID%2=0


now the temptable is the clean one, rename it to the main one. Simple.


Subject: NOT EXISTS
Posted by: msalim (view profile)
Posted on: Thursday, April 29, 2010 at 12:17 PM
Message: DELETE FROM t
FROM @table t
WHERE NOT EXISTS
(SELECT *
FROM @table
GROUP BY data
HAVING data = t.data
AND MAX(id) = t.id)

Subject: Very Helpful Article
Posted by: capetownnatural (view profile)
Posted on: Tuesday, May 11, 2010 at 5:32 AM
Message: Thanks András, a very helpful article!

Subject: Remove Duplicates From Table!!!
Posted by: Nirav Parikh (view profile)
Posted on: Tuesday, December 06, 2011 at 9:46 AM
Message: With duplicates
As
(Select *, ROW_NUMBER() Over (PARTITION by [Col_Name,...n] Order by [Col_Name,...n]) as Duplicate From Table_Name)
delete From duplicates
Where Duplicate > 1 ;

Subject: Remove duplicate records in a smart way
Posted by: Doron (view profile)
Posted on: Monday, January 09, 2012 at 8:06 PM
Message: Hi There,

Great examples you have there and here is another article on that subject as well.

http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx

Regards,

Doron

Subject: Thank you
Posted by: RoniVered (view profile)
Posted on: Tuesday, July 10, 2012 at 6:05 AM
Message: Very good article.
I especially liked the last example with the CTE, ROW_NUMBER() and DELETE query.

Thanks for teaching me something.
Regards,
Roni.

Subject: Special Case with Grouping
Posted by: Izaychik (view profile)
Posted on: Friday, March 01, 2013 at 1:25 PM
Message: I need to count unique transactions during the month
by client. Here how I do it:

Select a.[Client ID], a.[Transaction Code], a.[Transaction Date] from DB a
where a.[Transaction Code] in (..)
and not exists
(Select * from DB b
Where b.[Client Id] = a.[Client ID] and b.[Transaction Code] =a.[Transaction Code]
-- This line filter a maximum transaction date
and b.[Transaction Date] > a.[Transaction Date]
-- Next two lines group maximum values by year months
and Year(b.[Transaction Date]) = Year(a.[Transaction Date])
and Month(b.[Transaction Date]) = Month(a.[Transaction Date])
)

what do you think?

 
