Adam Machanic

A Primer on Managing Data Bitemporally

10 May 2007

Adapted from Expert SQL Server 2005 Development (Apress, 2007), Chapter 10, "Working with Temporal Data"

A central truth that we need to embrace to be successful as database developers is that not all data is as great as it could be (or as we might wish it to be). Sometimes, we're forced to work with incomplete or incorrect data, and correct things later as a more complete picture of reality becomes available.

Modifying data in the database is simple enough: a single DML statement and the work is done. But in systems that require, for auditing purposes, advanced logging and reproducibility of reports between runs, a straightforward insert, update, or delete may be counterproductive. Such a modification can destroy the possibility of re-creating the same output on consecutive runs of the same query.

To get around doing a simple update in the case of invalid data, some systems use the idea of offset transactions. An offset transaction uses the additive nature of summarization logic to fix the data in place. For instance, consider a financial reporting system with a table that describes customer transactions. The system should see a particular transaction as doing $1500.00 worth of business, but a data entry clerk accidentally keys it in as $1600.00. By later adding a ($100.00) transaction to the table, aggregation would result in the correct output.

Following is a highly simplified representation of what such a table might look like:

CREATE TABLE Transactions
(
   TransactionId INT,
   Customer VARCHAR(50),
   TransactionDate DATETIME,
   TransactionType VARCHAR(50),
   TransactionAmount DECIMAL(9,2)
)

On June 12, 2005, customer Smith deposited $500. However, due to a teller's keying error that was not caught in time, the amount that made it into the system when the reporting data was loaded was $5000:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-12', 'DEPOSIT', 5000.00)

The next morning, the erroneous data is detected. Although the transaction row itself could be updated in-place, this would destroy the audit trail, and so an offset transaction must be issued. There are a few ways of handling this scenario. The first method is to issue an offset transaction dated the same as the incorrect transaction:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-12', 'OFFSET', -4500.00)
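
With the offset dated June 12, any summary that groups the rows nets out correctly. For example, a quick sketch of such an aggregation (an illustration, not part of the original example):

SELECT
   TransactionId,
   SUM(TransactionAmount) AS NetAmount
FROM Transactions
WHERE TransactionDate = '2005-06-12'
GROUP BY TransactionId

For transaction 1001 this returns 500.00, the amount Smith actually deposited.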

Back-dating the offset fixes the problem in summary reports that group on any dimension (transaction number, customer, date, or transaction type), but loses the fact that the error was actually caught on June 13. Properly dating the offset record is imperative for data-auditing purposes:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-13', 'OFFSET', -4500.00)

Unfortunately, proper dating does not fix all of the issues, and it introduces new ones. After properly dating the offset, a query of the data for customer Smith, covering all business done through June 12, does not include the correction; only by including data from June 13 would the query return the correct amount. Although a correlated query could be written to return the correct summary report for June 12, the data is left in a somewhat strange state for range queries that start after June 12, e.g. June 13-15: the offset record is orphaned whenever a query covers June 13 but not June 12.
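
For example, one possible shape for such a query is a self-join that pulls in offset rows through the transaction number, regardless of their dates (a minimal sketch):

SELECT
   T1.TransactionId,
   SUM(T2.TransactionAmount) AS NetAmount
FROM Transactions AS T1
JOIN Transactions AS T2
   ON T2.TransactionId = T1.TransactionId
WHERE
   T1.TransactionDate = '2005-06-12'
   AND T1.TransactionType <> 'OFFSET'
GROUP BY T1.TransactionId

This nets transaction 1001 to the correct $500.00, but only because the offset can be reached via the TransactionId; the orphaned-offset problem for other date ranges remains.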

To get around these and similar issues, a bitemporal model is necessary. In a bitemporal table, each transaction has two dates:

  1. The actual date on which the transaction took place, and
  2. A "valid" date, representing the date as of which we know the data to be correct.

The following, modified version of the Transactions table, shows the new column:

CREATE TABLE Transactions
(
   TransactionId INT,
   Customer VARCHAR(50),
   TransactionDate DATETIME,
   TransactionType VARCHAR(50),
   TransactionAmount DECIMAL(9,2),
   ValidDate DATETIME
)

When inserting the data for Smith on June 12, a valid date of June 12 is also applied:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-12', 'DEPOSIT', 5000.00, '2005-06-12')

Effectively, this row can be read as, "as of June 12, we believe that transaction 1001, dated June 12, was a deposit for $5000.00." On June 13, when the error is caught, no offset record is inserted. Instead, a corrected deposit record is inserted, with a new valid date:

INSERT Transactions
VALUES
(1001, 'Smith', '2005-06-12', 'DEPOSIT', 500.00, '2005-06-13')

This row indicates that, as of June 13, transaction 1001 has been modified. The important difference is that the transaction keeps its correct date, so a report covering transactions that occurred on June 13 will not pick up the correction; the row still belongs to June 12. In addition, this model eliminates the need for offset transactions. Rather than relying on offsets, queries should always find the last update for any given transaction within the valid range.

To understand this a bit more, consider a report run on August 5, looking at all transactions that occurred on June 12. The person running the report wants the most "correct" version of the data; that is, all available corrections should be applied. This is done by taking the transaction data for each transaction from the row with the maximum valid date:

SELECT
   T1.TransactionId,
   T1.Customer,
   T1.TransactionType,
   T1.TransactionAmount
FROM Transactions AS T1
WHERE
   T1.TransactionDate = '2005-06-12'
   AND T1.ValidDate =
   (
      SELECT MAX(ValidDate)
      FROM Transactions AS T2
      WHERE T2.TransactionId = T1.TransactionId
   )

By modifying the subquery, it is possible to get "snapshot" reports based on data before updates were applied. For instance, assume that this same report was run on the evening of June 12. The output for Smith would show a deposit of $5000.00 for transaction 1001. To reproduce that report on August 5 (or any day after June 12), change the ValidDate subquery:

SELECT
   T1.TransactionId,
   T1.Customer,
   T1.TransactionType,
   T1.TransactionAmount
FROM Transactions AS T1
WHERE
   T1.TransactionDate = '2005-06-12'
   AND T1.ValidDate =
   (
      SELECT MAX(ValidDate)
      FROM Transactions AS T2
      WHERE
         T2.TransactionId = T1.TransactionId
         AND T2.ValidDate <= '2005-06-12'
   )

Note that, in this case, the subquery could have been eliminated altogether, and the search argument could simply have become AND T1.ValidDate = '2005-06-12'. However, the subquery is needed any time you query a range of dates, so it's a good idea to leave it in place to keep the query easy to maintain.
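
For instance, a report covering June 12 through June 15, as the data stood at the end of June 15, needs the subquery so that each transaction's latest valid row within the window is found individually. A sketch along the same lines:

SELECT
   T1.TransactionId,
   T1.Customer,
   T1.TransactionType,
   T1.TransactionAmount
FROM Transactions AS T1
WHERE
   T1.TransactionDate BETWEEN '2005-06-12' AND '2005-06-15'
   AND T1.ValidDate =
   (
      SELECT MAX(ValidDate)
      FROM Transactions AS T2
      WHERE
         T2.TransactionId = T1.TransactionId
         AND T2.ValidDate <= '2005-06-15'
   )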

Using this same pattern, data can also be booked in the future, before it is actually valid. It's common when doing wire transfers, credit card payments, and other kinds of electronic funds transactions to be able to set the "posting date" on which the business will actually be executed. By working with the valid date, Smith can make a request for an outgoing transfer on June 14, but ask that the transfer actually take place on June 16:

INSERT Transactions
VALUES
(1002, 'Smith', '2005-06-16', 'TRANSFER', -1000.00, '2005-06-14')

Since the transaction date is June 16, a report dealing with transactions that occurred between June 1 and June 15 will not show the transfer. But a business manager can query on June 15 to find out which transactions will hit in the coming days or weeks, and audit when the data entered the system.
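
Such a query might look something like the following sketch, which finds rows whose transaction date is still in the future but whose valid date shows that they have already been entered (for brevity, it ignores later corrections; the MAX(ValidDate) pattern applies here as well):

SELECT
   TransactionId,
   Customer,
   TransactionType,
   TransactionAmount,
   TransactionDate,
   ValidDate
FROM Transactions
WHERE
   TransactionDate > '2005-06-15'
   AND ValidDate <= '2005-06-15'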

The examples shown above use only one table, for the sake of simplicity. In a real system, a base transactions table would be used, along with a running ledger table in which the actual transaction data would be persisted. The base table would contain information like the customer involved and the transaction type, while the ledger table would include such attributes as the transaction date, valid date, and amounts:

CREATE TABLE Transactions
(
    TransactionId INT,
    Customer VARCHAR(50),
    TransactionType VARCHAR(50),
    BaseTransactionDate DATETIME
)

CREATE TABLE TransactionLedger
(
    TransactionId INT,
    TransactionDate DATETIME,
    ValidDate DATETIME,
    TransactionAmount DECIMAL(9,2)
)

It's important to differentiate between the BaseTransactionDate in the Transactions table and the TransactionDate in the TransactionLedger table. The former is there for record-keeping purposes only: when did we first hear about this transaction? The latter is there for bitemporal queries: when is the actual transaction date? As an example, consider an online credit card payment system. You log in and ask to have your payment posted next Tuesday (future posting dates are currently available on all three of the online credit card payment systems to which I belong). Next Monday arrives and you realize that your paycheck isn't going to clear on time, so you log in and change the posting date to Thursday. In this case, the change took place on Monday, so the transaction date must be updated to Thursday and the valid date to Monday. The base transaction date, however, does not change: the transaction was first entered the week before, and that fact should never change.
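
Under this two-table design, the "most correct" view of each transaction comes from joining the base row to its latest ledger row, using the same MAX(ValidDate) pattern shown earlier. A minimal sketch, assuming TransactionId links the two tables:

SELECT
   T.TransactionId,
   T.Customer,
   T.TransactionType,
   L.TransactionDate,
   L.TransactionAmount
FROM Transactions AS T
JOIN TransactionLedger AS L
   ON L.TransactionId = T.TransactionId
WHERE L.ValidDate =
   (
      SELECT MAX(ValidDate)
      FROM TransactionLedger AS L2
      WHERE L2.TransactionId = L.TransactionId
   )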

As a final note, when modeling bitemporal data you may want to investigate implementing cutoff-date rules, after which changes to transactions can no longer be made. For example, the system may have a policy whereby transactions are closed after 90 days; a simple CHECK constraint ensuring that the ValidDate falls within 90 days of the TransactionDate would do the trick. Another example is data that has been used to generate an official report, such as for a government agency. In that case, you'd want a rule that no transaction can be back-dated to before the report was run (lest it change the data in the report); that requires a trigger, in order to verify the date against a table containing the report run history.
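
A sketch of both rules follows, applied to the ledger table. The constraint name, the trigger name, and the ReportHistory table (with ReportDate and RunDate columns) are illustrative assumptions, not part of the schema above:

-- Cutoff rule: corrections must arrive within 90 days of the transaction date
ALTER TABLE TransactionLedger
ADD CONSTRAINT CK_Ledger_ValidWithin90Days
CHECK (ValidDate <= DATEADD(DAY, 90, TransactionDate))
GO

-- Reject rows that would retroactively change a report that has already run
-- (assumes a hypothetical ReportHistory table of report run history)
CREATE TRIGGER TRG_Ledger_NoBackdating
ON TransactionLedger
AFTER INSERT, UPDATE
AS
BEGIN
   IF EXISTS
   (
      SELECT *
      FROM inserted AS I
      JOIN ReportHistory AS R
         ON I.TransactionDate <= R.ReportDate
         AND I.ValidDate > R.RunDate
   )
   BEGIN
      RAISERROR('Transaction falls within a period that has already been reported.', 16, 1)
      ROLLBACK TRANSACTION
   END
END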

Summary

Modeling data bitemporally allows for an auditable, accurate representation of historical or future knowledge as data is updated. This can be tremendously useful in many scenarios, especially in the realm of financial reconciliation, where you can be forced to deal with backdated contracts that change the terms of previous transactions, and with business booked in the future to avoid certain types of auditing issues.

Adam Machanic

Author profile:

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007).

Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.



Subject: bitemporal relationships
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 16, 2007 at 8:48 AM
Message: First off, thanks for the interesting article. We are trying to create a database that uses bitemporal data at a deeper level than described here. How would you go about making a parent-child relationship when both tables consist of data that needs to be bitemporally tracked (keeping the integrity of the relationship intact when the parent and/or child gets new data or new rows)? We've ordered a copy of your book, in hopes that Chapter 10, "Working with Temporal Data," will be of help. Does it shed light on this issue?

Subject: Not sure
Posted by: Adam Machanic (view profile)
Posted on: Wednesday, May 16, 2007 at 9:15 AM
Message: Hi Anonymous,

I'm not sure if the chapter will help with your scenario; do you want to provide some more detail so that we can examine the situation? Please post the actual business reqs, or, if you need to keep it private, a contrived example that has the same technical problems... I don't really understand yet what you're trying to do. Perhaps we should move this over to the forums so that we can really dig in? Let me know.


Subject: Temporal Data
Posted by: Anonymous (not signed in)
Posted on: Monday, June 04, 2007 at 12:53 PM
Message: Nice article, although I found much of the material to be very similar to "Temporal Data & the Relational Model" by C.J. Date, Hugh Darwen, and Nikos Lorentzos, who take the topic even further (an entire book's worth of detail).

Subject: Agree with above reader
Posted by: Anonymous (not signed in)
Posted on: Monday, June 04, 2007 at 4:59 PM
Message:
see also: http://www.timeconsult.com/TemporalData/TemporalDB.html

and: http://www.cs.arizona.edu/people/rts/tdbbook.pdf

There is plenty of research and literature on temporal/bitemporal relations, so a discussion of the practical implications of the various techniques in the commercial world would be valuable.

Subject: Great suggestion!
Posted by: Adam Machanic (view profile)
Posted on: Monday, June 18, 2007 at 6:05 PM
Message: I haven't read that Date/Darwen/Lorentzos book, but I used the Rick Snodgrass book as one of my main research sources when writing the chapter that this article was extracted from. It's definitely a fantastic and enlightening read.


Subject: Bitemporal data and SQL Server 2008
Posted by: sagopal (view profile)
Posted on: Tuesday, January 11, 2011 at 7:45 AM
Message: Hi Adam,

Thanks for the article. We have developed a bitemporal schema and a Perl library which takes care of performing insert/update/delete (IUD) operations in bitemporal form. All the data is in SQL Server 2008, and the entire system is always busy with many readers and writers (writing in bitemporal form). Will it scale, given that SQL Server's internal indexing is B+ tree based? We had a lot of deadlocks in the past in the non-bitemporal system; they happen less frequently nowadays, after we added better indexes and applied NOLOCK and ROWLOCK judiciously. But in the current bitemporal form, all readers and writers mostly run range queries. We believe that will add to the deadlock problem, considering the internal indexing is a B+ tree.

Is our assumption right?
Any thoughts/pointers?
