Click here to monitor SSC
  • Av rating:
  • Total votes: 111
  • Total comments: 15
Hilary Cotter

The identity crisis in replication

05 July 2005

This article discusses three common problems DBAs are likely to encounter when columns have the identity property, which is defined as an attribute of int, smallint, bigint, decimal, numeric or tinyint columns that will auto-increment their value when data is inserted. These problems are humorously referred to as the identity crisis.

Consider a table that looks like this:

CREATE TABLE test
(PK INT NOT NULL IDENTITY(1,1) CONSTRAINT PrimaryKey PRIMARY KEY,
CharCol CHAR(20))

To insert a row you do not have to specify a value for the PK, or identity, column like this:

INSERT INTO test (CharCol) VALUES (‘test’)
GO

The following query…

SELECT * FROM test 

…will yield:

PK CharCol
-----------
1 test
(1 row(s) affected)

You cannot supply a value for an identity column; trying to do so will result in an error as the following example illustrates:

INSERT INTO test (PK, Charcol) VALUES (1,’test’)
GO
Server: Msg 544, Level 16, State 1, Line 1
Cannot insert explicit value for identity column in table ‘test’ when
IDENTITY_INSERT is set to OFF.

If you need to assign a value to the identity column, you will have to use the SET option IDENTITY_INSERT as illustrated below:

SET IDENTITY_INSERT test ON
GO
INSERT INTO test (PK, CharCol) VALUES (2, ‘test’)
GOSET IDENTITY_INSERT test OFF
GO

Columns with the identity property—or, as they are more familiarly known, identity columns—are most frequently used to ensure uniqueness, generally as a primary key. While GUIDs, also know as unique identifiers, can also be used to ensure uniqueness, they have their own set of problems. (For more information on using GUIDs as PKs, please refer to http://www.aspfaq.com/show.asp?id=2504.)

When you create an identity column, you can specify a start value called a seed and an increment value called an increment. By default the identity seed and increment are 1. There are three caveats associated with identity columns:

You can use any real number as a seed or increment; positive and negative numbers are valid, but fractional, decimal or imaginary numbers are not.

  • You cannot alter a column to add the identity property to it.
  • You can only have one column per table with the identity property.
  • You can assign the identity property to columns when creating them by using the Create and Alter table predicates.

Here are some examples of creating columns with the identity property.

CREATE TABLE test
(pk INT NOT NULL IDENTITY)
CREATE TABLE test1
(pk INT NOT NULL IDENTITY (1,1))

These two tables above are equivalent.

CREATE TABLE test2
(pk INT NOT NULL)
ALTER TABLE test2
ADD Ident INT NOT NULL IDENTITY(-1,-1)

DBAs face several problems when replicating columns that contain the identity property. Immediate updating subscribers do not have these replication difficulties as we will see shortly, but first let’s look at some common problems.

Triple threat: common problems

There are three problems—the identity crisis—you will likely face when replicating identity columns:

  1. A replication process inserts an identity value into a column with the identity property on it, and the insert fails.
  2. When both sides of a replication solution assign the same identity value on their side, and when the replication process carries the value to the other side, a conflict arises.
  3. Updates may fail even when you are not updating the identity column in transactional replication solutions.

The first problem above is illustrated in Script 1 in the code download. It occurs when you have bi-directional replication such as merge, bi-directional transactional, or one of the subscriber variants (transactional or snapshot) that can be updated using queued updating or immediate updating with queued failover. Note that immediate updating subscribers with queued failover will only be problematic when you have modified your subscription agents for queued failover.

The second problem, known as a primary key collision, is illustrated in Script 2 in the code download. In this example we will be using bi-directional transactional replication, but the results you see here are applicable to any replication method. When an insert is replicated, Script 2 will generate the error message:

Violation of PRIMARY KEY constraint ‘primarykey’.
Cannot insert duplicate key in object ‘script2’.

The third problem is typically caused when all of the columns in a row are updated by the application, including the identity column. To resolve the problem, edit your update replication stored procedure and comment the update of your identity column in the second part of the update procedure. An alternative is to use sp_scriptdynamicupdproc (only in SQL Server SP3) to create a "corrected" update proc.

The reason the updatable subscribers using immediate updating are immune to primary key collisions is because an insert on the subscriber is performed first on the publisher and then on the subscriber using a two-phase commit process coordinated by MS DTC.

So consider the case with transactional replication with immediate updating subscribers. In our hypothetical case you run DBCC CHECKIDENT(‘TableName’) in the publication database and get a value of 100. Then you run DBCC CHECKIDENT(‘TableName’) in your subscription database and get a value of 50. This means that there are 50 inserts that have occurred in the table TableName in the publication database that have not yet been replicated to the subscription database. Consider the case where TableName looks like this:

CREATE TABLE TableName
(pk INT NOT NULL IDENTITY CONSTRAINT PrimaryKey PRIMARY KEY,
Charcol CHAR(200))

If you were to insert a row in TableName on the subscription database like this…

INSERT INTO TableName (charcoal) VALUES (‘test’) 

…the insert would be performed on the publication database first and then be applied to the subscription database. Therefore, the newly inserted row on publication and subscription databases would have a value for the PK of 101. Script 3 in the code download illustrates this.

Not for replication

The first problem is easily fixed by adding the Not For Replication clause to the identity column. Script 4 in the code download illustrates how to implement this on pre-existing tables.

The Not For Replication clause will prevent the identity property from being enforced when a replication process performs the identity insert. In other words, replication processes will not generate this message when they try to replicate the identity value from the publisher to the subscriber or vice versa:

Cannot insert explicit value for identity column 
in table ‘Table1’ when IDENTITY_INSERT is set to OFF.

The Not For Replication property can also be applied to triggers and constraints.

Avoiding primary key collisions

The problem of avoiding primary key collisions is more difficult to solve. Typically solutions to this second problem revolve around some form of partitioning or extending your primary key to include a location-specific element.

Partitioning

: Partitioning is where a column is identified that has location-specific data in it and your primary key is extended to include this column. In other words, there is something about the data in this column that enables you to tell at a glance from which server the row originated. Sometimes you will have to resort to extending your schema to include such a column.

In the example in Script 2, we could add another column with a default of db_name(), and then extend our primary key to include this column. This is enough to ensure uniqueness as Script 5 illustrates. In most real-world replication solutions, the default for this location identifier will not be the database name; ideally you will choose a numeric identifier (or small char column), since keeping your primary key as narrow as possible offers the best indexing performance, especially for clustered PKs.

Manual identity range management solutions

: Partitioning is the best approach to avoiding primary key collisions. There are times, however, when you simply don’t have the luxury of extending the schema to include an additional key, or even extending the PK to include another existing column, as it might break the application that overlies the schema. You could have the application talk to views and then modify the tables underlying the views to include these location identifier columns, or use indexed views to insulate your underlying schema. Another option is to use a GUID column instead of an identity column. Such solutions are frequently not scalable, however, and there are other issues with using a GUID column as well.

In situations like this you can use manual identity range management, in which you use different seeds on the publisher and subscribers. Immediate updating subscribers is often not an option either, as it requires a well-connected link between the publisher and subscriber, or transactions originating on the subscriber will fail and be rolled back if the publisher is off-line. Immediate updating also adds latency to transactions origination on the subscriber, which can make it an unacceptable solution.

For two nodes, you could use odd and even seeds as illustrated in Table 1.

Server

Seed

Increment

Publisher

1

2

Subscriber

0

2

Table 1 – Identity configuration for a publisher with a single subscriber

Typical values for the publisher would be 1, 3, 5, 7 and so on. Typical values for the subscriber would be 0, 2, 4, etc.

For three nodes you could use an increment of three as illustrated in Table 2. This configuration was first suggested by Kestutis Adomavicius (kicker.lt@nospaamm_tut.by).

Server

Seed

Increment

Publisher

1

3

Subscriber1

2

3

Subscriber 2

3

3

Table 2 – Identity configuration for a publisher with two subscribers

Typical values for the publisher would be 1, 4, 7, 10, etc. Typical values for the first subscriber would be 2, 5, 8, 11 and so on, and typical values for the second subscriber would be 3, 6, 9, 12, etc.

For four nodes you could use a combination of odd, even, positive and negative seeds and increments as Table 3 illustrates.

Server

Seed

Increment

Publisher

0

2

Subscriber1

1

2

Subscriber2

-1

-2

Subscriber3

-2

-2

Table 3 – Identity configuration for a publisher with three subscribers

As you can see, the more subscribers you have the more difficult it gets to develop a scheme to segregate your identity values. Another problem is that once you have implemented such a solution it becomes very difficult to modify it to add new subscribers. There are also inefficiencies with identity range consumption, since the bulk of inserts may occur at publisher or on one subscriber, and you may find that this node exhausts the data type before the other nodes. This is not generally problematic with the int data type, but it can be problematic with smallint.

Another approach is to manually adjust the identity ranges on the publisher or subscriber(s) on an as-needed basis.

Consider a publisher with three subscribers. On the publisher you could create an identity seed of 1 and an increment of 1. Let replication deploy the snapshots to the subscribers, and then before any database activity occurs on the publisher, change the seeds on each subscriber by using DBCC CHECKIDENT. Table 4 illustrates what this would look like.

Server

Original Seed

Original Incr

DBCC

New Seed

New Incr

Publisher

1

1

1

1

Subscriber1

1

1

DBCC CHECKIDENT(‘TableName’,RESEED,100)

100

1

Subscriber2

1

1

DBCC CHECKIDENT(‘TableName’,RESEED,200)

200

1

Subscriber3

1

1

DBCC CHECKIDENT(‘TableName’,RESEED,300)

300

1

Table 4 – Using DBCC CHECKIDENT to implement a manual identity range management solution

Suppose that in your first week the majority of the new rows originate on the publisher, and you run SELECT IDENT_CURRENT(‘TableName’) and discover to your horror that this returns a value of 99. You quickly run DBCC CHECKIDENT(‘TableName’, RESEED, 400) and you are fine until more than 100 rows are inserted at Subscriber1, Subscriber2 or Subscriber3.

The 100th row entered in Subscriber1 will have an identity value of 200, which will cause a PK violation with rows inserted on Subscriber2; the 100th row entered on Subscriber2 will cause a PK violation with rows inserted on Subscriber3; and the 100th row entered on Subscriber3 will cause a PK violation with the new rows entered on the publisher.

If you did choose to implement such a scheme, you would have to set an alerting threshold of perhaps 80 percent full range that would give you time to run over to the publisher or subscriber and run your DBCC CHECKIDENT reseed command. You would also want to size your reseed range for something that represents your largest batch update.

Consider the example illustrated in Table 4. Suppose the application that uses Subscriber1 generated 10,000 inserts; these inserts would have PKs from 100 to 10,100. When Subscriber1 syncs with the publisher, it is entirely possible that there will be a series of PK collisions with data originating from Subscriber2 and Subscriber3. The result will be that some of the inserts are rolled back depending on who syncs last.

With merge replication, if the publisher generates a PK value first, the publisher’s row will stay; later subscribers merging with the same PK value for their row will have their insert replaced by the publisher’s insert for that PK value. If a subscriber generates the PK value first and the row with this PK value is merged to the publisher, all other subscribers merging with the same PK value will have the row with that PK value replaced with the row for the PK value from the first merged subscriber. These PK violations will be recorded in the conflict tables and you may have the ability to roll them back.

You might, therefore, want to take preventive action by sizing your ranges (the value you put in for your RESEED) to a value that represents the largest set of inserts you might ever have between syncs of a single subscriber and the publisher. You might also want to put in a check constraint that will prevent any of the ranges from colliding with other ranges. So consider Subscriber1. We might want to put a check constraint that would limit that maximum value that the identity value could have so that inserts on Subscriber1 would not overrun Subscriber2’s range.

Let’s have a quick look at what this check constraint would look like for Subscriber1:

ALTER TABLE TableName ADD
CHECK NOT FOR REPLICATION ([PK] > 100 and [PK] < 200)

Notice how I have the Not For Replication clause on here, which means that this check constraint will not be enforced by any replication process.

There is one caveat for using a constraint like this. The check constraint will cause the insert to be rolled back, but the identity value will be incremented with each failed insert.

Script 6 in the code download illustrates this. Note how in Script 6 our check constraint is for 101 to 199. We attempted to insert 300 rows; the first 100 are kicked back by the check constraint, the next 99 make it in, and the final 101 are kicked back by the check constraint. Our current identity value would be 300. If instead we attempted to insert 500 rows instead of 300 in Script 6, our identity value would end up as 500.

Now suppose that we changed our constraint so that it is 400 to 500, which is the next free range. Any inserts on this table would fail since the current identity value would be 500. This would continue until we reseeded our identity value back to 401. Table 1.1 illustrates this. This is an important point to note as DBAs are frequently bewildered when they encounter this problem in SQL Server 2000 automatic identity range management.

Row
Inserted

Successfully
Inserted

Select @@IDENTITY

Select max(PK) from TableName

1

No, check constraint refuses it

1

0

2

No, check constraint refuses it

2

0

3

No, check constraint refuses it

3

0

.

0

.

0

.

0

99

No, check constraint refuses it

99

0

100

No, check constraint refuses it

100

0

101

Yes!

101

101

102

Yes!

102

102

.

.

.

198

Yes!

198

198

199

Yes!

199

199

200

No, check constraint refuses it

200

199

201

No, check constraint refuses it

201

199

Table 1.1

If you choose to implement a manual identity range management solution, you will need a tracking table on your publisher to know what ranges are currently in use by all subscribers so the new ranges you assign will not tread on any other subscribers. All you need to know is the last assigned identity range. You can use this max value as a basis for assigning the next range.

This manual identity range management solution is actually an implementation of how Microsoft does automatic identity range management, which is the third method for avoiding primary key collisions.

SQL Server 2000 automatic identity range management: A better solution for avoiding primary key collisions is to use Microsoft’s automatic identity range management feature, which incorporates the methodology discussed above in the manual identity range management section. To use this feature you must click on the browse button to the right of your table in the specify article dialog when you are creating your merge publication. This is illustrated in Figure 1.

Figure 1

Note that if the identity range management tab is disabled, chances are the table to which you are trying to add automatic identity range management is already part of an existing merge publication that does not use automatic identity range management. You will have to drop this table from the existing merge publication(s), enable it for automatic identity range management, and then add it back.

Notice the options for range size at the publisher and subscriber as well as the threshold. These options are symmetrical with what was discussed earlier in the section on manual identity range management.

Many DBAs will assign a range size for the publisher and subscriber for what they anticipate will be the life of their replication solution. DBAs might be tempted to assign a value of 100,000,000 or more, for example, for both the range size at publisher and range size at subscribers. The problem with this set-it-and-forget-it approach is that it is not scalable—there is not a lot of room for growth because with 100,000,000 there is room for only two subscribers. You may find that if the number of subscribers grows and there are a lot of inserts in one node, you may exhaust your range quickly and have plenty of room left over in ranges at other nodes.

A better approach is to size your ranges for the largest number of inserts that could occur when a subscriber or publisher is off-line (and then multiply by 10 or another safety factor).

If you don’t size these ranges adequately, or if you don’t run your merge agent frequently enough, you will run into the infamous error message:

Server: Msg 548, Level 16, State 2, Line 1

The identity range managed by replication is full and must be updated
by a replication agent. The INSERT conflict occurred in database 'pubs',
table 'testidentity', column 'pk'. sp_adjustpublisheridentityrange can be
called to get a new identity range. The statement has been terminated.

If you do get this message, run DBCC CHECKIDENT(‘tablename’,RESEED,100) to set your identity back to one more than the value of the identity that was inserted. In other words, do a select max(IdentityCol) from TableName. Then run sp_adjustpublisheridentityrange in the publication database. Sometimes rerunning the merge agent will solve the problem.

If you encounter this problem frequently, you may want to run your merge agent more often. If this is not possible, rebuild your publication to adjust your ranges to higher values.

Summary

In this article we looked at ways to implement identity management solutions in replication to avoid primary key conflicts. We looked at partitioning and manual and automatic identity range management solutions. Solutions using automatic identity range management can work well with careful planning.

Hilary Cotter

Author profile:

Hilary Cotter is a SQL Server MVP specializing in replication, and full-text search. He recently published a book on replication and is writing another about Microsoft search technologies. Hilary has been involved in IT for 21 years and is Director of Text Mining and Database Strategy for RelevantNoise.com - dedicated to mining blogs for business intelligence.

Search for other articles by Hilary Cotter

Rate this article:   Avg rating: from a total of 111 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: Good article
Posted by: Anonymous (not signed in)
Posted on: Thursday, August 17, 2006 at 3:21 PM
Message: Thanks for the great job! i'm buying your book!

Subject: asp
Posted by: Anonymous (not signed in)
Posted on: Thursday, August 24, 2006 at 2:42 AM
Message: ASP is interesting web page programming

Subject: A good article
Posted by: Anonymous (not signed in)
Posted on: Thursday, August 31, 2006 at 5:04 AM
Message: This is a good article. I needed more info about manual identity range management (for 2 nodes) to avoid primary key collisions during transactional replication processes and it provided it. The example of this scenario was useful though I wish it included more code of how to change the seed & increment.

Subject: Finally, Two years of searching is over!
Posted by: Anonymous (not signed in)
Posted on: Friday, September 01, 2006 at 1:48 PM
Message: I have been scouring the web on and off for nearly two years to find a well written, concise explanation about how to deal with the "Identity Crisis". I realize this was writen over a year ago, but I am very excited to have finally found it. Many Thanks!

Subject: Check Contraints
Posted by: Anonymous (not signed in)
Posted on: Thursday, September 07, 2006 at 4:53 AM
Message: Hi, thx much for this good paper.
I got a problem on 2 sites with continusly merge that occurded every minute.
A soon as the link between the 2 servers is broken, the database creates on both side 'check constraints'.
Is there anyway to avoid this or is there a confortable script to delete all constraints in a click.
Thx for answer. reply can be done at pdelsaut@itsupport.fr

Subject: Article was fine, scripts are awful
Posted by: Anonymous (not signed in)
Posted on: Friday, October 27, 2006 at 2:06 PM
Message: How about explaining what the hell is going on in Script4, and using some reasonable formatting?


Subject: Replication triggers deployed to subscribers change @@identity
Posted by: Anonymous (not signed in)
Posted on: Friday, January 19, 2007 at 8:15 AM
Message: Here is my little piece of help when dealing with replication, identities, SQL Server 2005 and using MS Access as backend: http://jagbarcelo.blogspot.com/2006/06/problems-with-identity-fields-in-ms.html

Since MS Access 2003 uses @@identity to retrieve the last inserted record for tables indexed by identity fields, every time you try to insert a new record on a subscriber of transactional replication with queued (or immediate) updates on a table with an identity primary key, the following error message appears:

"The data was added to the database but the data won't be displayed in the form because it doesn't satisfy the criteria in the underlying record source."

In the former URL, the instructions to overide that problem are explained. I hope they would help someone.

Subject: Code Download?
Posted by: Troy (not signed in)
Posted on: Saturday, August 11, 2007 at 1:32 AM
Message: looking for the code download section but I can't find it... has it been removed?

Subject: excellent article
Posted by: Anonymous (not signed in)
Posted on: Tuesday, August 21, 2007 at 5:19 AM
Message: Wow, I've been struggling with this problem for 2 weeks now and finally stumbled on this article which explains it all to me.

Thanks

Subject: Limited Inserts
Posted by: Bob (view profile)
Posted on: Monday, September 03, 2007 at 7:16 AM
Message: The "Subscriber Range size" property limits the number of rows that can be inserted in a single statement. (Illogically at the publisher too).

Attempts to edit this have no effect.

A workaround for this would be really useful, as it regularly brings an entire application to a halt.

Subject: Limited Inserts
Posted by: Bob (view profile)
Posted on: Monday, September 03, 2007 at 9:38 AM
Message: The "Subscriber Range size" property limits the number of rows that can be inserted in a single statement. (Illogically at the publisher too).

Attempts to edit this have no effect.

A workaround for this would be really useful, as it regularly brings an entire application to a halt.

Subject: handling identity columns in replication
Posted by: RT-DB (not signed in)
Posted on: Monday, January 14, 2008 at 8:03 AM
Message: I think this is one of the best papers i've every read on how to handle identity columns and overcome replication issues

Subject: Identity crisis in SQL2005 Import
Posted by: Mickeyding (view profile)
Posted on: Friday, October 10, 2008 at 2:08 AM
Message: SQL2005 has altered many things. One of the big things that has been altered is that when you perform a data import on a table with an identity column the identity values are not preserved !!!
Our work around was to change the identity column into a normal integer column through the properties dialog of the column, update the values from the original source table and re-instate the identity column properties.
It is an added complication in a replicated system so be sure to check your identity column values PRIOR to setting up replication if your data has just been imported from elsewhere.

Useful article BTW. Nice to see information like this available for everyone.

Subject: Next range starting value
Posted by: FrankG (view profile)
Posted on: Thursday, June 04, 2009 at 9:38 AM
Message: We use merge replication and automatic management of the IDs under sql2005 and we had to launch 3 times sp_adjustpublisheridentityrange today. The movements in the database can not explain this. I would like to obtain the "next range starting value" by a sql request to follow what is happening (compared to the max(id) of our articles).
I can not find the sp_.. to obtain this information which sql knows as it is indicated in the GUI.
Before today and the update with sp3+cum3 of sql2005 we did not have any troule with the pool of identities as we execute sp_adjust.. everynight.

Subject: Switching from automatic to manual identity range management
Posted by: FishNChipPapers (view profile)
Posted on: Tuesday, July 10, 2012 at 9:13 AM
Message:
I have inherited a database that uses merge replication for high availability between geographically separate sites (the database is also mirrored for intra-site failover). Currently the vast majority of articles have Automatic identity range management. This relies on the connectivity between the Publisher, the Subscriber and the Distributor (a separate SQL Server instance co-located with the Publisher). In the event of a prolonged failover to the Subscriber site we have encountered issues where the identity range at the subscriber is exhausted and/or we encounter primary key constraint violations (due to publisher and subscriber allocating the same identity value).

We believe the most effective way to deal with this is to move to manual identity range management, allocating sufficiently large ranges at the Publisher and Subscriber (an alternative of using odd and even numbers at publisher and subscriber is not feasible due to the impact on availability as we would have to drop and recreate the tables and reestablish replication).

As far as I can tell from [url=http://msdn.microsoft.com/en-us/library/ms152745(v=sql.100).aspx][/url] there is no requirement to create a new snapshot when changing the identity range management mechanism. The question is how to go about changing the identity range management mechanism, together with the required ranges for all of the articles.

sp_changemergearticle allows the identityrangemanagementoption option to be changed but that's only half-way there.

Any advice and guidance would be welcomed.

 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

SQL Server XML Questions You Were Too Shy To Ask
 Sometimes, XML seems a bewildering convention that offers solutions to problems that the average... Read more...

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.