Click here to monitor SSC
  • Av rating:
  • Total votes: 33
  • Total comments: 19
Francis Hanlon

Monitoring Transactional Replication in SQL Server

11 April 2013

If you are using replication in SQL Server, you can monitor it in SSMS, but it makes sense to monitor distribution jobs automatically, especially if you can set up alerts or even set up first-line remedial action when a problem is detected. Francis shows how to do it in TSQL as an agent job.

There are several variables that we can examine in order to determine the health of our transactional replication system. Whilst the values  of these various counters may not, by themselves, indicate a problem, they can serve to give us clues as to what we can do to get replication back into a healthy status.

Normally we can monitor replication with the Replication Monitor Tool that is part of SQL Server Management Studio.  Since we cannot watch this 24 hours a day, it is useful to supplement this with a programmatic way to monitor replication.  Even if you opt to use a 3rd party tool like Red Gate’s SQL Monitor you may need to create a custom alert. Even if your tool of choice allows ‘out-of-the-box’ monitoring of replication, you might still need to set up a specific response to the counters if they exceed a threshold value.  More on that later.

One of the more useful tools supplied by Microsoft is the system stored procedure called dbo.sp_replmonitorsubscriptionpendingcmds which is stored in the distribution database.

This procedure returns both the number of commands that are pending for the subscription and an estimate of the number of seconds required to deliver all of the pending commands to the Subscriber.  I have a job that runs every 10 minutes and saves the data in a table as an historical record of the status.  I keep this data for 14 days.

This table is created in a DBA database installed on the subscriber server.  The code to create the table is:

CREATE TABLE dbo.Replication_Qu_History(

       Subscriber_db varchar(50) NOT NULL,

       Records_In_Que numeric(18, 0) NULL,

       CatchUpTime numeric(18, 0) NULL,

       LogDate datetime NOT NULL,



       Subscriber_db ASC, LogDate DESC



Script 1

The data in this table is populated by the monitoring procedures and provides an historical context for examining issues.  But to monitor what is happening right now more is required.

There are three things that help to determine the health of replication.

  1. The status of the replication related jobs
  2. The latency (especially the distribution latency) as measured by the counter Dist:Delivery Latency  
  3. The number of outstanding commands that are pending for the subscription

I have focused on the distribution latency because past experience has shown this to be more of an issue than log reader latency. Much of the time, the distribution latency is due to an increase in  transaction volume. For example doing an index rebuild on a large table in the publisher database can cause a large increase in the volume of the transaction log which results in a higher than normal volume of data to be replicated. For further discussion of this idea see Linchi Shea’s blog from 2009 Distribution latency in transactional replication: Is a volume surge the culprit?

If there are a large  number of outstanding commands waiting to be distributed, then sometimes this is due to a distribution agent job that is not running.  On the other hand, sometimes this job is running but not keeping up. By stopping and starting the agent the job starts to process the outstanding commands.

   To start off, we need to get information about replication, such as the name of the publisher and the subscriber, the names of the distribution agent jobs etc.  Microsoft has supplied several procedures in the Distributor database to help to gather information.  My Distributor database is on the same server as my Subscriber database so my script is simpler than if they were on separate servers. Firstly executing sp_replmonitorhelppublisher returns monitoring information for all Publishers using this Distributor. Executing sp_replmonitorhelppublication returns monitoring information for all publications using this Distributor. Lastly, executing sp_replmonitorhelpsubscription returns monitoring information for all subscriptions.  This information includes some latency numbers so I after executing this procedure I already have some key information.

This is the code I use to gather this information:


DECLARE @publisher SYSNAME, @publisher_db SYSNAME, @publication SYSNAME, @pubtype INT

DECLARE @subscriber SYSNAME, @subscriber_db SYSNAME, @subtype INT

DECLARE @cmdcount INT, @processtime INT

DECLARE @ParmDefinition NVARCHAR(500)


DECLARE @minutes INT, @threshold INT, @maxCommands INT, @mail CHAR(1) = 'N'

SET @minutes = 60 --> Define how many minutes latency before you would like to be notified

SET @maxCommands = 80000  --->  change this to represent the max number of outstanding commands to be proceduresed before notification

SET @threshold = @minutes * 60


SELECT * INTO #PublisherInfo


, 'SET FMTONLY OFF EXEC distribution.dbo.sp_replmonitorhelppublisher')


SELECT @publisher = publisher FROM #PublisherInfo     



,''SET FMTONLY OFF EXEC distribution.dbo.sp_replmonitorhelppublication @publisher='

+ @publisher + ''')'

--select @cmd

EXEC sp_executesql @cmd


SELECT @publisher_db=publisher_db, @publication=publication, @pubtype=publication_type  FROM ##PublicationInfo



,''SET FMTONLY OFF EXEC distribution.dbo.sp_replmonitorhelpsubscription @publisher='

+ @publisher + ',@publication_type=' + CONVERT(CHAR(1),@pubtype) + ''')'

--select @cmd

EXEC sp_executesql @cmd



ALTER TABLE ##SubscriptionInfo

ADD  PendingCmdCount INT NULL,

EstimatedProcessTime INT NULL


Script 2

Once I know some of this basic information about my publishers and my subscribers I can then check the status of the distribution jobs.  They should all be running.  If any are not running then I need to start them.  If I need to restart a job I set a flag to force an email alert to be sent.  I don’t want to actually send the email yet since I need to check the status of all my subscriptions. Again, if this number exceeds my threshold then I also set the flag to trigger my email alert. I used a cursor to go through my subscriptions because this was the easiest way I could gather the information and then use this information as parameters to other stored procedures to determine if the distribution agent was running and potentially stop and restart the agent:



SELECT @publisher, s.publisher_db, s.publication, s.subscriber, s.subscriber_db, s.subtype, s.distribution_agentname

FROM ##SubscriptionInfo s


OPEN cur_sub  

FETCH NEXT FROM cur_sub INTO @publisher, @publisher_db, @publication, @subscriber, @subscriber_db, @subtype, @JobName




       SET @cmd = 'SELECT @cmdcount=pendingcmdcount, @processtime=estimatedprocesstime FROM OPENROWSET(''SQLOLEDB'',''SERVER=(LOCAL);TRUSTED_CONNECTION=YES''

        ,''SET FMTONLY OFF EXEC distribution.dbo.sp_replmonitorsubscriptionpendingcmds @publisher=' + @publisher

       + ',@publisher_db=' + @publisher_db + ',@publication=' + @publication

       + ',@subscriber=' + @subscriber + ',@subscriber_db=' + @subscriber_db

       + ',@subscription_type=' + CONVERT(CHAR(1),@subtype) + ';' + ''')'

       SET @ParmDefinition = N'@cmdcount INT OUTPUT,

                        @processtime INT OUTPUT'

       --select @cmd

       EXEC sp_executesql @cmd,@ParmDefinition,@cmdcount OUTPUT, @processtime OUTPUT


       UPDATE ##SubscriptionInfo

       SET PendingCmdCount = @cmdcount

        , EstimatedProcessTime = @processtime

       WHERE subscriber_db = @subscriber_db


    INSERT INTO DBA.dbo.Replication_Que_History

    VALUES(@subscriber_db, @cmdcount, @processtime, GETDATE())

       --  find out if the distribution job with the high number of outstanding commands running or not

       --  if it is running then sometimes stopping and starting the agent fixes the issue


                           DROP TABLE ##JobInfo



              ,''SET FMTONLY OFF EXEC msdb.dbo.sp_help_job @job_name='''''

              + @JobName + ''''',@job_aspect=''''JOB'''''')'

       EXEC sp_executesql @cmd


       IF @cmdcount > @maxCommands OR (@processtime > @threshold AND @cmdcount > 0)


              IF (SELECT current_execution_status FROM ##JobInfo) = 1 --  This means job is currently executing so stop/start it


                     EXEC distribution.dbo.sp_MSstopdistribution_agent

                           @publisher = @publisher

                           , @publisher_db = @publisher_db

                           , @publication = @publication

                           , @subscriber = @subscriber

                           , @subscriber_db = @subscriber_db

                     WAITFOR DELAY '00:00:05' ---- 5 Second Delay

                     SET @mail = 'Y' 



       --SELECT name, current_execution_status FROM ##JobInfo

       IF (SELECT current_execution_status FROM ##JobInfo) <> 1 -- if the job is not running start it


              EXEC distribution.dbo.sp_MSstartdistribution_agent

                     @publisher = @publisher

                     , @publisher_db = @publisher_db

                     , @publication = @publication

                     , @subscriber = @subscriber

                     , @subscriber_db = @subscriber_db

              SET @mail = 'Y'      -- Send email if job has stopped and needed to be restarted


       DROP TABLE ##JobInfo

       FETCH NEXT FROM cur_sub INTO @publisher, @publisher_db, @publication, @subscriber, @subscriber_db, @subtype, @JobName



CLOSE cur_sub  


Script 3

I run the Microsoft-supplied procedure sp_replmonitorsubscriptionpendingcmds to gather information on outstanding commands and the expected number of seconds to catch up.

This is the information I want to store in my history table so I can get an idea of how replication is performing.

We need to decide on an acceptable threshold for latency.  I use sixty minutes, meaning that I want to be notified if the replicated database is more than sixty minutes behind the publisher database.  The other decision is the maximum number of undistributed commands.  If this number trends upwards as opposed to fluctuating, then there may be a problem. It is your choice how high you want to let this number get before taking action.  I choose to let the system get to 80000 undistributed commands. 

I picked these numbers by letting my Replication Queue checker job run for a couple of weeks, ensuring all maintenance jobs like index rebuild jobs had run.  Then I looked at the periods with the greatest number of undistributed commands and the greatest latency and made sure my settings were higher.  I don’t want to get called at night if a Rebuild Index job causes the system to backup temporarily but is capable of easily recovering by itself.

The code below requires the Ad Hoc Distributed Queries server configuration option be enabled. Here I create the email to be sent assuming the previous Script 3 found an issue.

IF @mail = 'Y'


   DECLARE @msg VARCHAR(MAX) = 'Replication on ' + @@SERVERNAME

       + ' may be experiencing some problems.  Attempts to restart the distribution agent have been made. '

       + 'If this is not the first message like this that you have received within the last hour, please investigate.'







       SET @xml1 = CAST(( SELECT subscriber AS 'td','',subscriber_db AS 'td','',

       latency AS 'td','', PendingCmdCount AS 'td','', EstimatedProcessTime AS 'td'

       FROM  ##SubscriptionInfo s



       SET @tab1 ='<html><body><H4>Subscription Information </H4>

         <table border = 1> <tr>

         <th> Subscriber </th> <th> Subscriber Database </th> <th> Latency(seconds)</th>

         <th> Undistributed Commands </th> <th> Estimated Catch Up Time</th></tr>'   

--  this command gives us the last 10 measurements of latency for each subscriber

       SET @xml2 = CAST(( SELECT s.Subscriber_db AS 'td','', s.Records_In_Que AS 'td','', s.CatchUpTime AS 'td','', CONVERT(CHAR(22),LogDate, 100) AS 'td'

              FROM (SELECT ROW_NUMBER() OVER ( PARTITION BY subscriber_db ORDER BY Logdate DESC ) AS 'RowNumber',


              , Records_In_Que

              , CatchUpTime

              , Logdate

              FROM DBA.dbo.Replication_Que_History

              ) s

              WHERE RowNumber <= 8

              FOR XML PATH('tr'), ELEMENTS ) AS NVARCHAR(MAX))


       SET @tab2 ='<br><br><H4>Historical Latency Information </H4>

         <table border = 1>


         <th>Subscriber</th> <th>Undistributed Commands</th> <th> Catch Up Time </th> <th> Date\Time </th></tr>'


       SET @body = @msg + @tab1 + @xml1 + '</table>'

       + @tab2 + @xml2 + '</body></html>'


       DECLARE @to NVARCHAR(200)


       EXEC msdb.dbo.sp_send_dbmail

         @body = @body,

         @body_format ='HTML',

         @recipients = @to,

         @subject = 'Possible Replication Problem' ;


DROP TABLE #PublisherInfo

DROP TABLE ##PublicationInfo

DROP TABLE ##SubscriptionInfo


Script 4

 The last process is to periodically delete rows from the replication status table so the data does not get stale.

DECLARE @delDate datetime = getdate()-10

  DELETE FROM DBA.dbo.Replication_Que_History

  WHERE LogDate < @deldate

Script 5

If any of the thresholds you set up in this script are met, then the distribution agent corresponding to the subscription with the offending counter will be stopped and restarted unless it is already stopped in which case the job will just be started. An email will be sent to let you know that some action was taken.  In many cases, restarting the distribution agent fixed the issue and replication started to work again.  If this did not fix the issue then when this job next runs the same action will be taken and a second email sent.  At this point, if you aren’t already examining the situation then you need to start.

If you have a 3rd party alerting system you could just use a modified form of the 3rd script supplied to stop and restart the distribution agent jobs if any of the thresholds are met.  Otherwise, run script 1 to create the required table.  Create a new job and put scripts 2-4 in the first step, then put script 5 as the second step of the job and schedule this job for whatever frequency makes sense to you.  I am currently running this every 10 minutes.

This process should help with the period stoppages of transactional replication but automating the recovery process.  It is still important to periodically monitor the process using the Replication Monitor tool.  This process is only meant to help prevent the off-hours calls that can be easily fixed by kick starting a distribution agent job.

    Francis Hanlon

    Author profile:

    Francis works as the Senior SQL Server DBA for Manitoba eHealth; a Canadian organization providing IT solutions for health care facilities in the province of Manitoba. He has been work involved in IT since using punch cards was a common practice in programming. He has been involved with SQL Server databases since the 20th century.

    Search for other articles by Francis Hanlon

    Rate this article:   Avg rating: from a total of 33 votes.





    Must read
    Have Your Say
    Do you have an opinion on this article? Then add your comment below:
    You must be logged in to post to this forum

    Click here to log in.

    Subject: Server Timeout
    Posted by: scruppy43 (view profile)
    Posted on: Thursday, April 18, 2013 at 1:34 AM
    Message: what possible cause why some server are caused of server timeout..when i run my queries...

    Subject: Replication stats question?
    Posted by: Patrick Index (view profile)
    Posted on: Wednesday, April 24, 2013 at 7:24 AM
    Message: Francis
    Apart from putting triggers on the subscriber tables or doctoring the replication sp's, is there a simple way (for transactional replication) to establish how many records are replicated by article, by transaction type (i.e. UPDATE, DELETE, INSERT) for a given time period say 1 day? Thanks.

    Subject: Issue with sp_replmonitorsubscriptionpendingcmds for 2005 and 2008
    Posted by: pkpetroff (view profile)
    Posted on: Monday, May 6, 2013 at 10:36 AM
    Message: There is an issue with sp_replmonitorsubscriptionpendingcmds if you are still with SQL Server 2005 and 2008:

    Subject: Issue with sp_replmonitorsubscriptionpendingcmds for 2005 and 2008
    Posted by: fhanlon (view profile)
    Posted on: Thursday, July 11, 2013 at 2:49 PM
    Message: That link to the connect article is a very interesting read. I have noticed this behaviour at times. The suggestion to use a "fixed" version of the sp_replmonitorsubscriptionpendingcmds proc may be of benefit.

    Thanks for pointing that out

    Subject: Replication stats question?
    Posted by: fhanlon (view profile)
    Posted on: Thursday, July 11, 2013 at 2:51 PM
    Message: Patrick I don't have an easy answer for this. I would have to do some research on this.

    Subject: Im getting errors! Pls help
    Posted by: Tiana0000 (view profile)
    Posted on: Monday, November 11, 2013 at 9:52 AM
    Message: When i run script 3 i get the error msg:

    Msg 207, Level 16, State 1, Procedure Repl_Monitor, Line 0
    Invalid column name 'PendingCmdCount'.
    Msg 207, Level 16, State 1, Procedure Repl_Monitor, Line 0
    Invalid column name 'EstimatedProceduresTime'

    *Note Repl_Monitor is the name of my procedure

    From what i see, its updating temp table '##SubscriptionInfo' with column that don't exist in the table ie column 'PendingCmdCount' and 'EstimatedProceduresTime'.
    I tried selecting into different temp table name like so...

    SET @cmd = 'SELECT @cmdcount=pendingcmdcount, @procedurestime=estimatedprocedurestime INTO ##Sub FROM OPENROWSET(''SQLOLEDB'',''SERVER=(LOCAL);TRUSTED_CONNECTION=YES''..etc

    And then i updated my ##Sub temp table but it showed an error message at WHERE clause of the UPDATE statement ie ' WHERE subscriber_db = @subscriber_db' because subscriber_db column doesnt exist in the execution of stored procedure sp_replmonitorsubscriptionpendingcmds. The second thing is that your cursor is not close nor deallocated. Even after doing so an error msg still shows saying 'The cursor 'cur_sub' already exit. Pls need help. Thanks in advance.

    Subject: Im getting errors! Pls help (follow up)
    Posted by: fhanlon (view profile)
    Posted on: Thursday, November 14, 2013 at 11:35 AM
    Message: Tiana:
    Thanks for your comments. I made some corrections to the script. The corrected scripts can be downloaded from the top of the article. I will try to get these corrections into the body of the article as soon as possible

    Subject: Im getting errors! Pls help (follow up)
    Posted by: Tiana0000 (view profile)
    Posted on: Monday, November 18, 2013 at 7:23 AM
    Message: Thanks for much fhanlon, it works now :)

    Subject: SQL Server 2012 Errors!
    Posted by: sqlMilieu (view profile)
    Posted on: Monday, January 26, 2015 at 2:23 PM
    Message: Thanks for a great post!
    Getting errors in 2012:
    Msg 11514, Level 16, State 1, Procedure sp_describe_first_result_set, Line 1
    The metadata could not be determined because statement 'exec @cmd N'if is_member(N''db_owner'') = 1 or isnull(is_member(N''replmonitor''),0) = 1 set @has_ac' in procedure 'sp_MSrepl_DistributorReplMonitorAccess' contains dynamic SQL. Consider using the WITH RESULT SETS clause to explicitly describe the result set.

    Subject: SQL Server 2012 Errors!
    Posted by: fhanlon (view profile)
    Posted on: Monday, January 26, 2015 at 2:37 PM
    Message: The code associated with this article was never tested with SQL 2012. I didn't make use of sp_describe_first_result_set so I can't comment on what might need to change. There is more than one way to approach this issue. I was trying to convey how I approached replication monitoring when this article was written and there is certainly room to adapt the code and enhance the code to take advantage of constructs in later versions. I will look into what might need to change with SQL 2012 when my system gets upgraded from the current 2008 R2.

    Subject: SQL Server 2012 Errors!
    Posted by: sqlMilieu (view profile)
    Posted on: Monday, January 26, 2015 at 4:04 PM
    Message: Thanks for the prompt reply Hanlon! Not sure why my previous comments shows twice!

    Subject: The distributor_AgentName is not found
    Posted by: fhanlon (view profile)
    Posted on: Monday, March 2, 2015 at 1:40 PM
    Message: When I look at the Microsoft documentation for sp_replmonitorsubscriptionpendingcmds it indicates the @subscription_type parameter is either set to 0 for a push subscription or 1 for a pull subscription. I would check to ensure the command is created with all the correct parameters for your situation. You may need to make modifications. There is not a different system proc for pull and push subscriptions. Print out the command that is created for you and check that all the parameters are filled in correctly. There may be more than 1 issue.

    Subject: The distributor_AgentName is not found
    Posted by: benhur216 (view profile)
    Posted on: Monday, March 2, 2015 at 1:51 PM
    Message: When I run this script on my distributor database, I get an error that the Distributor_AgentName could not be found. I have looked at the job name that is being returned, and it is the job name as if I were doing a push subscription. However, my environment requires a Pull subscription. Which I have successfully configured. Is there another system stored procedure that I could use instead of "sp_replmonitorsubscriptionpendingcmds" to get the correct job name?

    Subject: The Distributor_AgentName is not found Con't
    Posted by: benhur216 (view profile)
    Posted on: Monday, March 2, 2015 at 1:58 PM
    Message: Sorry, I specified the wrong system stored procedure. The name of the stored procedure that I am having trouble with, is "sp_replmonitorhelpsubscription". Thanks in advance for your help.

    Subject: The Distributor_AgentName is not found Con't
    Posted by: fhanlon (view profile)
    Posted on: Monday, March 2, 2015 at 2:08 PM
    Message: There are many parameters that can be passed to the sp_replmonitorhelpsubscription proc. I did not use all of them. One of the parameters I notice in the Microsoft documentation is the @subtype parameter which defaults to 0 meaning push subscription. Changing this to 1 represents a Pull subscription. The code already includes a subscription type variable - @subtype. I don't remember noticing this when I originally ran the code and it worked since I had a push subscription. Adding this parameter and populating it using the existing variable @subtype would not be difficult. This way it should work for both pull and push subscriptions. Good catch

    Subject: The Distributor_AgentName is not found Con't
    Posted by: benhur216 (view profile)
    Posted on: Tuesday, March 3, 2015 at 9:06 AM
    Message: I have tried using the parameter of @SubType to the sp_replmonitorhelpsubscription and it gave me an error that the @SubType is not a valid parameter. As I look at the specification for the stored procedure, I notice that the subtype is a result variable not a parameter variable.

    Subject: The Distributor_AgentName is not found Con't
    Posted by: fhanlon (view profile)
    Posted on: Tuesday, March 3, 2015 at 9:44 AM
    Message: So you are saying when you execute the command EXEC distribution.dbo.sp_replmonitorhelpsubscription @publisher=servername,@publication_type=0
    then the publication you are interested in does not show up? If this is transactional replication then all subscriptions should show up. If its not transactional publication then the @publication_type must be changed to the appropriate type. If your subscription does not show up and replication is working then I'm sorry but I can't answer why based on the information given. As far as I know there is no other command to list subscriptions.

    Subject: The Distributor_AgentName is not found Con't
    Posted by: benhur216 (view profile)
    Posted on: Tuesday, March 3, 2015 at 10:15 AM
    Message: In further testing I hvave noticed that in the result table for the sp_replmonitorhelpsubscription stored procedure is returning a SubType of 1 which translates to a Pull subscription. However, the Distributor_AgentName is the same name as if the replication is a push subscription. So, thank you for your responses, but I believe that Microsoft has not implemented this stored procedure properly.

    Subject: The Distributor_AgentName is not found Con't
    Posted by: benhur216 (view profile)
    Posted on: Tuesday, March 3, 2015 at 10:19 AM
    Message: Sir,
    I am seeing the publication and the publication type that I expect. I receive the SubType that I expect, however, the Distributor_AgenName is not the Job name that I expect to see.

    Simple-Talk Database Delivery

    Patterns & Practices Library

    Visit our patterns and practices library to learn more about database lifecycle management.

    Find out how to automate the process of building, testing and deploying your database changes to reduce risk and make rapid releases possible.

    Get started

    Phil Factor
    How to Build and Deploy a Database from Object-Level Source in a VCS

    It is easy for someone who is developing a database to shrug and say 'if only my budget would extend to buying fancy... Read more...

     View the blog

    Top Rated

    SQL Server Statistics Basics
     Distribution statistics are used by SQL Server's Query Optimiser to determine a good execution plan for... Read more...

    Clone, Sweet Clone: Database Provisioning Made Easy?
     One of the difficulties of designing a completely different type of development tool such as SQL Clone... Read more...

    Database Lifecycle Management: Deployment and Release
     So often, the unexpected delays in delivering database code are more likely to happen after the... Read more...

    The PoSh DBA: Assigning Data to Variables Via PowerShell Common Parameters
     Sometimes, it is the small improvements in a language that can make a real difference. PowerShell is... Read more...

    Issue Tracking for Databases
     Any database development project will be hard to manage without a system for reporting bugs in the... Read more...

    Most Viewed

    Beginning SQL Server 2005 Reporting Services Part 1
     Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

    Ten Common Database Design Mistakes
     If database design is done right, then the development, deployment and subsequent performance in... Read more...

    Temporary Tables in SQL Server
     Temporary tables are used by every DB developer, but they're not likely to be too adventurous with... Read more...

    Concatenating Row Values in Transact-SQL
     It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

    SQL Server Index Basics
     Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

    Why Join

    Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.