Click here to monitor SSC
  • Av rating:
  • Total votes: 42
  • Total comments: 9
Phil Factor

On Comparing Tables in SQL Server

29 May 2014

How do you compare two SQL tables?  Every SQL Developer or DBA knows the answer, which is 'it depends'. It is not just the size of the table or the type of data in it but what you want to achieve. Phil sets about to cover the basics and point out some snags and advantages to the various techniques.

Introduction

There are several reasons why you might need to compare tables or results.

  •  Sometimes, one just needs to know whether the tables contain data that is the same or different;  No details: just yes or no.  This is typical with test assertions, where you just need to know whether your routine or batch produces a result with the right data in it. when provided with particular vales for the parameters. It is either wrong or right
  • Occasionally, you need to know what rows have changed without, maybe, being particular about which columns changed and how.
  • There are times when you have a large table in terms of both columns and rows, and you need something that shows you specifically the column(s) that changed their value. You might also want this when tracking down a bug in a routine that might otherwise require you to wasting time scanning ‘by eye’.

We’ll be tackling these three rather different tasks in SQL

If two tables have a different number of rows, they can’t of course be the same. However, there are times when you need to know whether Table_B contains all the rows of Table_A, without differences. If  you wish more detail, you might even wish to know the rows in either table that aren’t in common, or the common rows, as indicated by the primary key,  that were different.  Why stick at comparing just two tables? There are ways of comparing as many as you need. (as, for example, when you’re comparing the metadata in several database snapshots). Yes, there are many variations

You’ve got tools and features to do this stuff, surely?

There is always a place for tools like SQL Data Compare,  TableDiff, tSQLt  or  Change Data Capture. A lot depends on circumstances and the type of task.  The problem of doing audits on changes to data in a live system  is a separate topic, as is the synchronization of tables and databases.  Comparison of XML documents are also out of scope. We are going to deal purely with the routine comparison of the data in tables

I’m most likely to use TSQL techniques to compare tables when:

Developing…

In the course of developing a database, a lot of tables get compared. It’s not just the big stuff: Every table-valued function, for example, needs a test harness in the build script that makes sure it does what you think it should do  under all conceivable test circumstances, and incorporating all the nasty edge cases where it has been caught by the testers in the past.  Every stored procedure needs a test to make sure that the process that it executes does exactly what is intended and nothing else.

There was a time that the build activity was rather leisurely, but when you’ve got a nightly build and integration test, it is best to automate it entirely and be rid of the chore.

ETL

When you are automating the loading of data into a system, you often need to test various conditions. Do you need to update existing versions of the rows as well as inserting the new ones? Do you need a trap to prevent duplicate entries, or even  delete  existing entries?

Setting up the test data.

The scripts in this article all use a table from the venerable PUBS database. We’re going to use the authors table, but will beef up the number of rows a bit to 5000 in order to get a size that is a bit more realistic. I’ve provided the source for the table with the article.

I then created a copy of the table …

SELECT * INTO authorsCopy 
    
FROM authors
GO
ALTER TABLE dbo.authorsCopy 
ADD CONSTRAINT PK_authorsCopy PRIMARY KEY CLUSTERED
    
(au_id)  ON PRIMARY
GO

And then altered some of the rows.

UPDATE authorsCopy SET address=STUFF(address,1,1,'') 
    
WHERE au_ID IN (
    
SELECT TOP 10 au_id 
        
FROM authorsCopy f 
        
ORDER BY phone)

So now the two tables should be predominately the same with a few minor changes in the address field

Testing to see if tables are different.

Sometimes you just want to know if tables are the same. An example of this would be checking that a TVF is working properly by comparing its result to that of an existing table with the correct results. The usual way to do this is with the CHECKSUM()group of functions in SQL Server, because they are very quick.

Using Checksums

You can use the BINARY_CHECKSUM function to check whether tables are the same: well, roughly the same. It is fast, but it is not perfect, as I'll demonstrate in a moment. If you have a series of tests, for example it is generally sufficient.

IF (

    SELECT   CHECKSUM_AGG(BINARY_CHECKSUM(*))

        FROM   authors)=(

    SELECT   CHECKSUM_AGG(BINARY_CHECKSUM(*))

        FROM   authorsCopy)

SELECT 'they are probably the same'

ELSE

SELECT 'they are different'

 

For this to work, your table must not have TEXT, NTEXT, IMAGE or CURSOR (or a SQL_VARIANT with any of these types) as its base type. Nowadays, this is increasingly rare, but If you have any sort of complication, you can coerce any column with one of the unsupported types into a supported type. In practice, I generally use a routine that checks the metadata and does this automatically, but it isn’t pretty.

In a working version you would probably want to specify the list of columns, especially if you are having to do an explicit coercion of datatypes, or if you are checking  just certain columns,

Neither BINARY_CHECKSUM() nor its plain sister CHECKSUM() are completely accurate in telling you if something has changed in a row or table. We’ll show this by looking at the common words of the English language, contained in a table called CommonWords.. You’d expect them all to have a different checksum, but that’s not the case.

SELECT string, BINARY_CHECKSUM(string) AS "Checksum"

    FROM commonWords

    WHERE BINARY_CHECKSUM(string) IN

    (

    SELECT BINARY_CHECKSUM(string)

        FROM commonwords

        GROUP BY BINARY_CHECKSUM(string)

        HAVING COUNT(*) > 2)

    ORDER BY BINARY_CHECKSUM(string)

... giving the result ...

 

string                         Checksum

------------------------------ -----------

nerd                           426564

nest                           426564

oust                           426564

reed                           475956

stud                           475956

sued                           475956

ousts                          6825011

nests                          6825011

nerds                          6825011

Armed with this information, we can quickly demonstrate that different strings can have the same checksum

SELECT BINARY_CHECKSUM('reed the nerd'),

       BINARY_CHECKSUM('sued the nest'),

       BINARY_CHECKSUM('stud the oust') 

All these will; have the same checksum, as would ...

SELECT

     BINARY_CHECKSUM('accosted guards'),

     BINARY_CHECKSUM('accorded feasts')

....whereas...

SELECT BINARY_CHECKSUM('This looks very much like the next'),

       BINARY_CHECKSUM('this looks very much like the next'),

       BINARY_CHECKSUM('This looks very much like the Next') 

… gives you different checksums like this…

----------- ----------- -----------

-447523377  -447522865  -447654449

The sister function CHECKSUM()

 

SELECT CHECKSUM('This looks very much like the next'),

       CHECKSUM('this looks very much like the next'),

       CHECKSUM('This looks very much like the Next') 

… finds them to be all the same, because it is using the current collation and my collation for the database is case-insensitive. CHECKSUM() aims to find strings equal in checksum if they are equal in a string comparison.

----------- ----------- -----------

-943581052  -943581052  -943581052

So, the best you can say is that there is a strong likelihood that the tables will be the same but if you need to be absolutely certain, then use another algorithm.

If you don’t mind difference in case in text strings, then you can use CHECKSUM() instead of BINARY_CHECKSUM()

The great value of this technique is that, once you’ve calculated the checksum that you need, you can store it as a value in the column of  a table instead of needing the original table and therefore you can make the whole process even faster, and take less time. If you are storing the checksum value returned by CHECKSUM() make sure you  check against the live table with a checksum generated with the same collation.

Here is a simple example of a ‘what’s changed’ routine.

--we'll create a 'checksum' table 'on the fly' using a SELECT INTO.

SELECT

      au_ID,

      BINARY_CHECKSUM(au_id, au_lname, au_fname, phone, [address], city, [state], zip, [contract]) AS [checksum]

      INTO auchk

  FROM authorscopy

  ORDER BY au_ID

      /* now we'll put in a constraint just to check that we haven't won the lottery (very unlikely but not completely impossible that we have two rows with the same checksum) */

ALTER TABLE AuChk ADD CONSTRAINT IsItUnique UNIQUE ([checksum])

UPDATE authorscopy SET au_fname='Arthur'

  WHERE au_ID='327-89-2366'

SELECT authorscopy.*

  FROM authorscopy

      INNER JOIN AuChk ON authorscopy.au_ID=AuChk.au_ID

  WHERE [checksum]<>BINARY_CHECKSUM(authorscopy.au_id, au_lname, au_fname, phone, [address], city, [state], zip, [contract])

...which gives...

au_id       au_lname  au_fname  phone        address         city          state zip   contract

----------- --------- --------- ------------ --------------- ------------- ----- ----- --------

327-89-2366 Mendoza   Arthur    529275-5757  15 Hague Blvd.  Little Rock   DE    98949 1

And then we just tidy up.

 /* and we just pop it back to what it was, as part of the teardown */

UPDATE authorscopy SET au_fname='Arnold'

  WHERE au_ID='327-89-2366'

Of course, you could use a trigger but sometimes you might want just a daily or weekly report of changes without the intrusion of a trigger into a table.

Using XML

One general possibility is to compare the XML version of the two tables, since this does the datatype translation into strings for you. It is slower than the Checksum approach but more reliable.

IF CONVERT(VARCHAR(MAX),(

    SELECT *

      FROM authors ORDER BY au_id FOR XML path, root))

  =

      CONVERT(VARCHAR(MAX),(

    SELECT *

      FROM authorscopy ORDER BY au_id FOR XML path, root))

SELECT 'they are  the same'

ELSE

SELECT 'they are different'

Here, you can specify the type of comparison by specifying the collation.

or you can do this, comparing data in tables ..

IF BINARY_CHECKSUM(CONVERT(VARCHAR(MAX),(

      SELECT *

        FROM authors ORDER BY au_id FOR XML path, root)))

    =

        BINARY_CHECKSUM (CONVERT(VARCHAR(MAX),(

      SELECT *

        FROM authorscopy ORDER BY au_id FOR XML path, root)))

  SELECT 'they are pretty much the same'

ELSE

SELECT 'they are different'  SELECT 'they are different'

… by calculating a checksum of the XML version of the table. This allows you to store the checksum of the table you are comparing to.

Finding where the differences are in a table

The simplest task is where the tables have an identical number of rows, and an identical table structure. Sometimes you want to know which rows are different, and which are missing.  You have, of course, to specify what you mean by ‘the same’, particularly if the two tables have different columns. The method you choose to do the comparison is generally determined by these details.

The UNION ALL ... GROUP BY technique

The classic approach to comparing tables is to use a  UNION ALL for the SELECT statements that include the columns you want to compare, and then GROUP BY those columns. Obviously, for this to work, there must be a column with unique values in the GROUP BY, and the primary key is ideal for this. Neither table are allowed duplicates. If they have different numbers of rows, these will show up as differences.

SELECT DISTINCT au_ID 
  FROM
  (
  SELECT au_ID 
    FROM
    (
    SELECT au_id, au_lname, au_fname, phone, address, city, state, zip, contract 
      FROM authors
      UNION ALL
    SELECT au_id, au_lname, au_fname, phone, address, city, state, zip, contract 
      FROM authorsCopy) BothOfEm
    GROUP BY au_id, au_lname, au_fname, phone, address, city, state, zip, contract
    HAVING COUNT(*)<2) f

If one of the tables has a duplicate, then it will give you a false result, as here, where you have two tables that are very different  and the result tells you that they are the same! For this reason, it is a good idea to include the column(s) that  constitute the primary key, and only include the rows  once!

SELECT COUNT(*), Address_ID,TheAddress,ThePostCode

  FROM

  (

  SELECT Address_ID,TheAddress,ThePostCode

    FROM

    (

      VALUES

      (9, '929 Augustine lane,  Staple Hill Ward  South Gloucestershire  UK','BS16 4LL'),

      (10, '45 Bradfield road, Parwich  Derbyshire  UK','DE6 1QN')

    ) TableA(Address_ID,TheAddress,ThePostCode)

    UNION ALL

  SELECT Address_ID,TheAddress,ThePostCode

    FROM

    (

      VALUES

      (8, '''The Pippins'', 20 Gloucester Pl, Chirton Ward,  Tyne &amp; Wear  UK','NE29 7AD'),

      (8, '''The Pippins'', 20 Gloucester Pl, Chirton Ward,  Tyne &amp; Wear  UK','NE29 7AD'),

      (9, '929 Augustine lane,  Staple Hill Ward  South Gloucestershire  UK','BS16 4LL'),

      (10, '45 Bradfield road, Parwich  Derbyshire  UK','DE6 1QN')

    ) TableB(Address_ID,TheAddress,ThePostCode)

  )f

  GROUP BY Address_ID,TheAddress,ThePostCode

  HAVING COUNT(*)<2

... giving ...

TheCount    Address_ID  TheAddress                ThePostCode

----------- ----------- ------------------------- ------------

 

(0 row(s) affected)

The technique can be used for comparing more than two tables.  You’d just need to UNION ALL the tables you need to compare and change the HAVING clause to filter just the rows that aren’t in all the tables.

Using EXCEPT

You can now use the much cleaner and slightly faster EXCEPT.

SELECT *  from authors

EXCEPT

SELECT * from authorsCopy

This shows all the rows in authors that are not found  in authorsCopy.  If they are the same, it would return no rows

au_id       au_lname    au_fname  phone        address                    city        state zip   contract

----------- ----------- --------- ------------ -------------------------- ----------- ----- ----- --------

041-76-1076 Sosa        Sonja     000-198-8753 29 Second Avenue           Omaha       CT    23243 0

187-42-2491 Mc Connell  Trenton   0003090766   279 Hague Way              San Diego   NY    94940 1

220-43-7067 Fox         Judith    000-137-9418 269 East Hague Street      Richmond    VA    55027 0

505-28-2848 Hardy       Mitchell  001-2479822  73 Green Milton Drive      Norfolk     WA    69949 1

697-84-0401 Montes      Leanne    000-018-0454 441 East Oak Parkway       San Antonio MD    38169 1

727-35-9948 Long        Jonathon  000-8761152  280 Nobel Avenue           Anchorage   LA    NULL  1

875-54-8676 Stone       Keisha    000-107-1947 763 White Fabien Way       Fremont     ND    08520 0

884-64-5876 Keller      Steven    000-2787554  45 White Nobel Boulevard   Milwaukee   NY    29108 1

886-75-9197 Ellis       Marie     001032-5109  35 East Second Boulevard   Chicago     IL    32390 1

975-80-3567 Salazar     Johanna   001-028-0716 17 New Boulevard           Jackson     ND    71625 0

 

(10 row(s) affected)

I’m only using SELECT * to keep things simple for the article. You’d normally itemize all the columns you want to compare.

This will only work for tables with the same number of rows because, if authors had extra rows, it would still say that they were different since the rows in Authors that weren’t in authorsCopy would be returned. This is because EXCEPT returns any distinct values from the query to the left of the EXCEPT operand that are not also found from the query on the right

This, hopefully shows what I mean

SELECT Address_ID,TheAddress,ThePostCode

    FROM

       (VALUES

          (9, '929 Augustine lane, Staple Hill Ward     South Gloucestershire      UK','BS16 4LL'),

          (10, '45 Bradfield road, Parwich      Derbyshire    UK','DE6 1QN')

       ) TableA(Address_ID,TheAddress,ThePostCode)

EXCEPT

   SELECT Address_ID,TheAddress,ThePostCode from

       (VALUES

          (8, '''The Pippins'', 20 Gloucester Pl, Chirton Ward,      Tyne & Wear   UK','NE29 7AD'),

          (8, '''The Pippins'', 20 Gloucester Pl, Chirton Ward,      Tyne & Wear   UK','NE29 7AD'),

          (9, '929 Augustine lane, Staple Hill Ward     South Gloucestershire      UK','BS16 4LL'),

          (10, '45 Bradfield road, Parwich      Derbyshire    UK','DE6 1QN')

       ) TableB(Address_ID,TheAddress,ThePostCode)

...yields ...

 Address_ID  TheAddress                                  ThePostCode

----------- ---------------------------------------------- -----------

 

(0 row(s) affected)

...whereas ...

  SELECT Address_ID,TheAddress,ThePostCode FROM

       (VALUES

          (8, '''The Pippins'', 20 Gloucester Pl, Chirton Ward,      Tyne & Wear   UK','NE29 7AD'),

          (8, '''The Pippins'', 20 Gloucester Pl, Chirton Ward,      Tyne & Wear   UK','NE29 7AD'),

          (9, '929 Augustine lane, Staple Hill Ward     South Gloucestershire      UK','BS16 4LL'),

          (10, '45 Bradfield road, Parwich      Derbyshire    UK','DE6 1QN')

       ) TableB(Address_ID,TheAddress,ThePostCode)     

EXCEPT

     SELECT Address_ID,TheAddress,ThePostCode

    FROM

       (VALUES

          (9, '929 Augustine lane, Staple Hill Ward     South Gloucestershire      UK','BS16 4LL'),

          (10, '45 Bradfield road, Parwich      Derbyshire    UK','DE6 1QN')

       ) TableA(Address_ID,TheAddress,ThePostCode)

..results in ...

Address_ID  TheAddress                                                    ThePostCode

----------- ------------------------------------------------------------- -----------

8           'The Pippins', 20 Gloucester Pl, Chirton Ward, Tyne & Wear UK NE29 7AD

 

(1 row(s) affected)

This feature of EXCEPT could be used to advantage if you particularly wish to check that TableA is contained within TableB. So where the tables have a different number of rows you can still compare them.

You might not want to compare all columns.  You should always specify those columns you wish to compare to determine ‘sameness’. If you only wanted to compare the Address for example, you’d use …

SELECT address  FROM authors

EXCEPT

SELECT address FROM authorsCopy

The Outer Join technique

There is also the technique of the outer join.  This is a more general technique that give you additional facilities. If, for example, you use the full outer join then  you can get the unmatched rows in either table. This gives you a ‘before’ and ‘after’ view of alterations in the data.  It is used more generally in synchronisation to tell you what rows to delete, insert and update.

We’ll just use the technique to get the altered rows in authorsCopy

SELECT  authors.au_id, authors.au_lname, authors.au_fname, authors.phone, authors.address, authors.city, authors.state, authors.zip, authors.contract

  FROM authors

      LEFT OUTER JOIN authorsCopy

      ON authors.au_ID = AuthorsCopy.au_ID

    AND authors.au_lname =authorsCopy.au_lname

    AND  authors.au_fname =authorsCopy.au_fname

    AND  authors.phone    =authorsCopy.phone

    AND  COALESCE(authors.address,'')=COALESCE(authorsCopy.address,'')

    AND  COALESCE(authors.city,'')   =COALESCE(authorsCopy.city,'')

    AND  COALESCE(authors.state,'')  =COALESCE(authorsCopy.state,'')

    AND  COALESCE(authors.zip,'')    =COALESCE(authorsCopy.zip,'')

    AND  authors.contract =authorsCopy.contract

  WHERE authorsCopy.au_ID IS NULL

As you can see, there are difficulties with null columns with this approach, but it is as fast as the others and it gives you rather more versatility for your comparisons.

Locating the differences between tables

You may need a quick way of seeing what column and row has changed. A very ingenious way of doing this was published recently. It used XML. ‘Compare Tables And Report The Differences By Using Xml To Pivot The Data’. It is clever, but too slow. The same thing can be done purely in SQL.  Basically, you perform a column by column comparison of data based on the primary key, using a key/value pair. If you do the entire table at once it is rather slow: The best trick is to do this only on those rows where you know there is a difference.

 

DECLARE  @temp TABLE(au_id VARCHAR(11) PRIMARY KEY) /*this holds the primary keys of rows that have changed */

INSERT INTO @Temp(au_ID) --determine which rows have changed

  SELECT au_ID

  FROM --use the EXCEPT technique qhich is the quickest in our tests

  (

  SELECT au_id, au_lname, au_fname, phone, [address], city, state, zip, [contract]

    FROM authors

        EXCEPT

  SELECT au_id, au_lname, au_fname, phone, address, city, state, zip, contract

    FROM authorsCopy

  )f--now we just SELECT those columns that have changed

SELECT lefthand.au_id,lefthand.name,lefthand.value AS original,Righthand.value AS changed

  FROM (--now we just lay out the two tables as key value pairs, using the string versions of the data

  SELECT authors.au_id, 'au_lname' AS 'name',au_lname AS 'value'

    FROM authors INNER JOIN @Temp altered ON altered.au_id=authors.au_id

    UNION

  SELECT authors.au_id, 'au_fname' AS 'name',au_fname AS 'value'

    FROM authors INNER JOIN @Temp altered ON altered.au_id=authors.au_id

    UNION

  SELECT authors.au_id, 'phone',phone

    FROM authors INNER JOIN @Temp altered ON altered.au_id=authors.au_id

    UNION

  SELECT authors.au_id, 'address',address

    FROM authors INNER JOIN @Temp altered ON altered.au_id=authors.au_id

    UNION

  SELECT authors.au_id, 'City' AS 'name',City AS 'value'

    FROM authors INNER JOIN @Temp altered ON altered.au_id=authors.au_id

    UNION

  SELECT authors.au_id, 'State',state

    FROM authors INNER JOIN @Temp altered ON altered.au_id=authors.au_id

    UNION

  SELECT authors.au_id, 'zip',zip

    FROM authors INNER JOIN @Temp altered ON altered.au_id=authors.au_id

    UNION

  SELECT authors.au_id, 'contract',CONVERT(CHAR(1),contract)

    FROM authors INNER JOIN @Temp altered ON altered.au_id=authors.au_id) LeftHand

      INNER JOIN (

  SELECT authorsCopy.au_id, 'au_lname' AS 'name',au_lname AS 'value'

    FROM authorsCopy INNER JOIN @Temp altered ON altered.au_id=authorsCopy.au_id

    UNION

  SELECT authorsCopy.au_id, 'au_fname',au_fname

    FROM authorsCopy INNER JOIN @Temp altered ON altered.au_id=authorsCopy.au_id

    UNION

  SELECT authorsCopy.au_id, 'phone',phone

    FROM authorsCopy INNER JOIN @Temp altered ON altered.au_id=authorsCopy.au_id

    UNION

  SELECT authorsCopy.au_id, 'address',address

    FROM authorsCopy INNER JOIN @Temp altered ON altered.au_id=authorsCopy.au_id

    UNION

  SELECT authorsCopy.au_id, 'City' AS 'name',City AS 'value'

    FROM authorsCopy INNER JOIN @Temp altered ON altered.au_id=authorsCopy.au_id

    UNION

  SELECT authorsCopy.au_id, 'State',state

    FROM authorsCopy INNER JOIN @Temp altered ON altered.au_id=authorsCopy.au_id

    UNION

  SELECT authorsCopy.au_id, 'zip',zip

    FROM authorsCopy INNER JOIN @Temp altered ON altered.au_id=authorsCopy.au_id

    UNION

  SELECT authorsCopy.au_id, 'contract',CONVERT(CHAR(1),contract)

    FROM authorsCopy INNER JOIN @Temp altered ON altered.au_id=authorsCopy.au_id) rightHand

      ON lefthand.au_ID=righthand.au_ID

    AND lefthand.name=righthand.name

  WHERE lefthand.value<>righthand.value

in our example, this would give:

au_id       name     original                      changed

----------- -------- ---------------------------- ------------------------------------

041-76-1076 address  29 Second Avenue              9 Second Avenue

187-42-2491 address  279 Hague Way                 79 Hague Way

220-43-7067 address  269 East Hague Street         69 East Hague Street

505-28-2848 address  73 Green Milton Drive         3 Green Milton Drive

697-84-0401 address  441 East Oak Parkway          41 East Oak Parkway

727-35-9948 address  280 Nobel Avenue              80 Nobel Avenue

875-54-8676 address  763 White Fabien Way          63 White Fabien Way

884-64-5876 address  45 White Nobel Boulevard      5 White Nobel Boulevard

886-75-9197 address  35 East Second Boulevard      5 East Second Boulevard

975-80-3567 address  17 New Boulevard              7 New Boulevard

This technique rotates the rows of the tables that have differences into an Entity–attribute–value (EAV) table so that differences within a row can be compared and displayed.  It does this rotation by UNIONing the name and string-value of each column.  This technique works best where there are not a large number of differences.

Conclusions

There is no single ideal method of comparing the data in tables or results. One of a number of techniques will be the most relevant  for any particular task. It is all down to precisely the answers you need and the type of task. Do you need a quick check that a table hasn’t changed, or do you need to know precisely what the changes are?  SQL is naturally fast at doing this task and comparisons of tables and results is a familiar task to many database developers.

If there is a general rule, I’d say that  exploratory or ad-hoc work  needs a tool such as SQL Data Compare, whereas  a routine process within the database  requires a hand-cut SQL technique.

The source to the table, and the insert-statements to fill it to 5000 rows is in the link at the top of the article, or you can click here to get it.

Phil Factor

Author profile:

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career. See also :

Google + To translate this article...

Search for other articles by Phil Factor

Rate this article:   Avg rating: from a total of 42 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: Using intersection and cardinality
Posted by: Anonymous (not signed in)
Posted on: Tuesday, June 10, 2014 at 6:59 AM
Message: If the tables are true sets (i.e. not multisets or bags. uniqueness enforced by PK or unique index), they are equal if |A| = |B| = |A ∩ B|.

That is:

IF (SELECT COUNT(*) FROM A) = (SELECT COUNT(*) FROM B)
AND (SELECT COUNT(*) FROM (SELECT * FROM A INTERSECT SELECT * FROM B)

THEN ... -- tables are equal

Subject: oops -- hit enter to soon!
Posted by: Anonymous (not signed in)
Posted on: Tuesday, June 10, 2014 at 7:00 AM
Message: F (SELECT COUNT(*) FROM A) = (SELECT COUNT(*) FROM B)
AND (SELECT COUNT(*) FROM (SELECT * FROM A INTERSECT SELECT * FROM B) = (SELECT COUNT(*) FROM A)

THEN ...

Subject: Comparing tables
Posted by: Anonymous (not signed in)
Posted on: Tuesday, June 10, 2014 at 7:39 AM
Message: I have used a different technique for years that seems to perform ok and is easy to code, customize and extend. I have tables in development, acceptance and production. I want to show the rows where anything is different between the environments. In this case I want to check constraints across databases:

print '-- 1. Show differences in information_schema.CHECK_CONSTRAINTS across databases'
select 1 '1', sum(src) as srcnum,
case sum(src) when 1 then 'D__'
when 2 then '_A_'
when 3 then 'DA_'
when 4 then '__P'
when 5 then 'D_P'
when 6 then '_AP'
when 7 then 'DAP' end as src,
CONSTRAINT_SCHEMA, object_name(b.parent_object_id) table_name,
CONSTRAINT_NAME, CHECK_CLAUSE
from (
select 1 src, CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CHECK_CLAUSE
from EmChemRefDev.information_schema.CHECK_CONSTRAINTS
union
select
2 src, CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CHECK_CLAUSE
from EmChemRefAccDev.information_schema.CHECK_CONSTRAINTS
union
select
4 src, CONSTRAINT_SCHEMA, CONSTRAINT_NAME, CHECK_CLAUSE
from EmChemRefProDev.information_schema.CHECK_CONSTRAINTS
) a
left join sys.objects b
on a.CONSTRAINT_NAME = b.name
group by CONSTRAINT_SCHEMA, object_name(b.parent_object_id),
CONSTRAINT_NAME, CHECK_CLAUSE
having sum(src) < 7
order by CONSTRAINT_SCHEMA, object_name(b.parent_object_id),
CONSTRAINT_NAME, 3;

In my database this results in a case insensitive matching. The internal select statements can be subsets of columns or * for all. The list of columns must be in the top most select and group by. It has the advantage of comparing NULLs as the same.

Subject: Subject: Using intersection and cardinality
Posted by: Phil Factor (view profile)
Posted on: Tuesday, June 10, 2014 at 7:46 AM
Message:
--yours (if I've understood it right
IF (SELECT COUNT(*) FROM Authors) = (SELECT COUNT(*) FROM AuthorsCopy)
AND (
SELECT COUNT(*) FROM Authors)
= (SELECT COUNT(*) FROM
(SELECT * FROM Authors INTERSECT SELECT * FROM AuthorsCopy) f)
SELECT 'They are the same'
ELSE
SELECT
'They are different'

--but I think I'd do this which should work (it is a tiny bit quicker.
IF (SELECT COUNT(*) FROM Authors) = (SELECT COUNT(*) FROM AuthorsCopy)
AND NOT EXISTS (
SELECT * FROM AuthorsCopy EXCEPT SELECT * FROM Authors)
SELECT 'They are the same'
ELSE
SELECT
'They are different'




Subject: The Outer Join technique
Posted by: Steve Long (not signed in)
Posted on: Tuesday, June 10, 2014 at 7:49 AM
Message: As Phil said, a problem with the Outer Join technique is dealing with null values. The query shown will treat null values as identical to empty strings. If you want to preserve null values, then the query needs to be more complicated, with two tests for each field that can have a null value:

SELECT authors.au_id, authors.au_lname, authors.au_fname, authors.phone, authors.address, authors.city, authors.state, authors.zip, authors.contract
FROM authors
LEFT OUTER JOIN authorsCopy
ON authors.au_ID = AuthorsCopy.au_ID
AND authors.au_lname =authorsCopy.au_lname
AND authors.au_fname =authorsCopy.au_fname
AND authors.phone =authorsCopy.phone
AND COALESCE(authors.address,'a')=COALESCE(authorsCopy.address,'a')
AND COALESCE(authors.address,'b')=COALESCE(authorsCopy.address,'b')
AND COALESCE(authors.city,'a') =COALESCE(authorsCopy.city,'a')
AND COALESCE(authors.city,'b') =COALESCE(authorsCopy.city,'b')
AND COALESCE(authors.state,'a') =COALESCE(authorsCopy.state,'a')
AND COALESCE(authors.state,'b') =COALESCE(authorsCopy.state,'b')
AND COALESCE(authors.zip,'a') =COALESCE(authorsCopy.zip,'a')
AND COALESCE(authors.zip,'b') =COALESCE(authorsCopy.zip,'b')
AND authors.contract =authorsCopy.contract
WHERE authorsCopy.au_ID IS NULL

Subject: Thanks!
Posted by: Anonymous (not signed in)
Posted on: Tuesday, June 10, 2014 at 2:47 PM
Message: Just used this information today to compare tables in replicated databases on two servers to find mismatched rows.

Subject: Using XML without ORDER BY
Posted by: Mister Magoo (not signed in)
Posted on: Tuesday, June 10, 2014 at 4:50 PM
Message: First let me say that this is a nice overview of the various techniques, thanks.

As far as using XML to compare, are you sure that method is reliable without an ORDER BY clause ?

I cannot see any SORT operations in an execution plan for SELECT...FOR XML PATH() without an ORDER BY.

Of course, it may use the clustered index on a single thread but that is not guaranteed is it?

Even if it were guaranteed, there is no guarantee that the two tables would have the same CI, so I would definitely include an ORDER BY in that XML example. ED: The article has been updated to correct this. Many thanks for the spot!

Subject: Re: Using XML without ORDER BY
Posted by: Phil Factor (view profile)
Posted on: Wednesday, June 11, 2014 at 2:43 AM
Message: Oops. Yes, it would only have worked if they'd had the same clustered index, and then, as you say, not ultra-reliably. The editor has now fixed the article. Apologies!

Subject: Comparing tables from different databses
Posted by: manasa (view profile)
Posted on: Thursday, April 30, 2015 at 12:53 AM
Message: Hello,
I have a requirement to compare tables from 2 different databases and i have to use sql server scripts (ot stored procedures).. I SHOULD NOT USE ANY THIRD PARTY TOOL. Kindly help me to do schema level comparision like column name, column size, column datatype of two same tables from different databases. I wanna compare all columns at a time between two tables....

 
Simple-Talk Database Delivery

DLM
Patterns & Practices Library

Visit our patterns and practices library to learn more about database lifecycle management.

Find out how to automate the process of building, testing and deploying your database changes to reduce risk and make rapid releases possible.

Get started

Phil Factor
Documenting your SQL Server Database

One of the shocks that a developer can get when starting to program in T-SQL is that there is no simple way of... Read more...

 View the blog

Top Rated

A Start with Automating Database Configuration Management
 For a number of reasons, it pays to have the up-to-date source of all the databases and servers that... Read more...

Archiving Hierarchical, Deleted Transactions Using XML
 When you delete a business transaction from the database, there are times when you might want to keep a... Read more...

Rollback and Recovery Troubleshooting; Challenges and Strategies
 What happens if your database deployment goes awry? Do you restore from a backup or snapshot and lose... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Highway to Database Recovery
 Discover the best backup and recovery articles on Simple-Talk, all in one place. Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.