Self-maintaining, Contiguous Effective Dates in Temporal Tables

'Temporal' tables contain facts that are valid for a period of time. When they are used for financial information they have to be very well constrained to prevent errors getting in and causing incorrect reporting. This makes them more difficult to maintain. Is it possible to have both the stringent constraints and simple CRUD operations? Well, yes. Dwain Camps patiently explains the whole process.

In a database that is dealing with price information where a new price immediately supersedes the previous one, you can find the correct price for any particular date when all you have is an ‘effective date’ on each row. This is because the ‘effective end date’ is implied by the effective date on the subsequent row, and the current price is always the one with the latest date on it. It is a simple system that requires few constraints to implement.

Although this allows us to reproduce account information from the data and run historic financial reports, it is often a better fit for the business requirements to have both an ‘effective start’ date and an ‘effective end’ date on each row. This can handle a range of the more generic cases of a ‘transaction time table’, and it is also much quicker to determine the current value, since that is the row with a NULL end-date.

Joe Celko’s article Contiguous Time Periods gives a good summary of this type of table, and he described an ingenious way of enforcing the basic rules of this type of table solely via constraints, in what he called “Kuznetsov’s History Table.” This is a special version of the temporal price table which has the extra ‘previous end-date’ included in the row. This allows us to use a foreign key constraint to prevent a row being inserted that doesn’t have an existing row with an end-date matching the entry in the ‘previous-end-date’ column. While it is an interesting approach, it does have some issues that he and Alex Kuznetsov both cover in good detail.

In his article “Modifying Contiguous Time Periods in a History Table” Alex Kuznetsov shows not only how to set this up but also how to maintain such a table, because with such tight constraints it is no mean task.

In this article I’ll show how it is possible to design and use such a table, but without the complexity of the ‘previous-end-date’ column, yet still using only constraints to enforce data integrity. Then I’ll show you how to use the MERGE statement to update the table with new prices, while automatically maintaining the effective end date column. Finally, I’ll show how it is possible to do updates to this table using only simple CRUD (Create, Read, Update, Delete) operations, hiding the complexity of the actual MERGE process in a trigger.

Enforcing Effective End-to-start Dates Using Constraints without the Additional Column

In order to demonstrate this approach, and show how it is used, we’ll use a simple example. Here is the Data Definition Language (DDL) to create two tables. The first is a simple Products table (which we will populate with some test data) and the second is a ProductPrices table with CONSTRAINTs sufficient to enforce this end-to-start date relationship.
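
The listing below is a minimal sketch consistent with the constraint descriptions that follow (and with the ProductPrices2 variant shown later in the comments). The ProductPrices column names appear throughout the article; the Products columns other than ProductID, and all of the product names, are illustrative assumptions.

CREATE TABLE dbo.Products
(
    ProductID INT NOT NULL
    ,ProductName VARCHAR(40) NOT NULL      -- assumed column
    ,DiscontinuedDT DATETIME NULL          -- assumed column (used later to discontinue a product)
    ,CONSTRAINT p_pk PRIMARY KEY (ProductID)
);

INSERT INTO dbo.Products (ProductID, ProductName)
VALUES (1, 'Mouse Trap'), (2, 'Better Mouse Trap'), (3, 'Mouse Bait');

CREATE TABLE dbo.ProductPrices
(
    ProductID INT NOT NULL
    ,EffectiveStartDT DATETIME NOT NULL
    ,EffectiveEndDT DATETIME NULL
    -- Time-sensitive attributes
    ,ProductPrice MONEY NOT NULL
    -- In PK, EffectiveStartDT should occur only once for each product
    ,CONSTRAINT pp_pk PRIMARY KEY (ProductID, EffectiveStartDT)
    -- EffectiveEndDT should always be greater than EffectiveStartDT (unless NULL)
    ,CONSTRAINT pp_ck1 CHECK (EffectiveEndDT > EffectiveStartDT)
    -- ProductPrice should always be greater than zero
    ,CONSTRAINT pp_ck2 CHECK (ProductPrice > 0)
    -- FK1: each non-NULL EffectiveEndDT must link to an EffectiveStartDT
    ,CONSTRAINT pp_fk1 FOREIGN KEY (ProductID, EffectiveEndDT)
        REFERENCES dbo.ProductPrices (ProductID, EffectiveStartDT)
    -- FK2: prices must reflect existing products
    ,CONSTRAINT pp_fk2 FOREIGN KEY (ProductID) REFERENCES dbo.Products (ProductID)
    -- EffectiveEndDT must be unique for each product
    ,CONSTRAINT pp_u1 UNIQUE (ProductID, EffectiveEndDT)
);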

Let’s describe the CONSTRAINTs we’ve defined on ProductPrices.

  • pp_pk – This is the PRIMARY KEY CONSTRAINT that ensures that for any product, the effective start date must be unique across all rows. That is, the same product cannot have the same effective start date appearing more than once.
  • pp_ck1 – This is a rather simple CHECK CONSTRAINT that ensures that any effective end date (on a row) will always be greater than the effective start date on that row.
  • pp_ck2 – This is another fairly simple CHECK CONSTRAINT that ensures that a product’s price is always greater than zero. Unless you are giving products away, it is unlikely the price would ever be zero, and quite unlikely that it would ever be negative. A product’s price can also not be NULL (as defined by the column itself).
  • pp_fk1 – This is a self-referencing FOREIGN KEY (FK) CONSTRAINT, that says any row that has an effective end date defined, i.e., it is not NULL, must have a matching row where that row’s effective start date matches the effective end date. This enforces the “contiguous” nature of the end-to-start date relationships in the table.
  • pp_fk2 – This is a FOREIGN KEY CONSTRAINT that ensures that any ProductID entered into the ProductPrices table exists as a product in the Products table.
  • pp_u1 – This is a UNIQUE CONSTRAINT that ensures that each ProductID/EffectiveEndDT combination exists only once.

Let’s add two additional facts about the way these CONSTRAINTs will work together:

  • When an effective end date of NULL is inserted into the table, the self-referencing FK CONSTRAINT (pp_fk1) allows it, meaning that is the only case where there need not exist a corresponding effective start date.
  • Because of the UNIQUE CONSTRAINT (pp_u1), only one such NULL effective end date can be in the table for a given product.

Populating our Product/Prices Table

Now, we will put some data into our ProductPrices table with an INSERT, ensuring the contiguous nature of the end-to-start dates by setting them appropriately.
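
A sketch of such an INSERT (the dates come from the narrative that follows; the prices are invented for illustration):

INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (1, '2013-06-04', '2013-06-17', 9.10)
      ,(1, '2013-06-17', NULL,         9.45)
      ,(2, '2013-05-01', '2013-08-09', 5.18)
      ,(2, '2013-08-09', NULL,         5.25)
      ,(3, '2013-07-01', NULL,         1.99);

SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
FROM dbo.ProductPrices
ORDER BY ProductID, EffectiveStartDT;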

Our final SELECT returns these rows from the ProductPrices table, showing that all is well:
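
With the illustrative rows above, that SELECT returns the following; keep these rows in mind, since later sections refer back to them.

ProductID  EffectiveStartDT         EffectiveEndDT           ProductPrice
1          2013-06-04 00:00:00.000  2013-06-17 00:00:00.000  9.10
1          2013-06-17 00:00:00.000  NULL                     9.45
2          2013-05-01 00:00:00.000  2013-08-09 00:00:00.000  5.18
2          2013-08-09 00:00:00.000  NULL                     5.25
3          2013-07-01 00:00:00.000  NULL                     1.99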

The NULL EffectiveEndDT on each of the products indicates an open-ended time period, meaning that final price extends out into the future without bound.

Suppose we wish to find the price of a particular product on a specific date. Let’s examine a couple of cases:
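
Both cases can be expressed along these lines (the DiscontinuedDT column on Products is an assumption of this reconstruction, as noted in the DDL sketch above):

DECLARE @DateOfInterest DATETIME = '2013-12-31';

-- Case 1: the price in effect on the date of interest
SELECT ProductID, ProductPrice
FROM dbo.ProductPrices
WHERE ProductID = 1
AND EffectiveStartDT <= @DateOfInterest
AND (@DateOfInterest < EffectiveEndDT OR EffectiveEndDT IS NULL);

-- Case 2: discontinue the product, then ask for a price after that date
UPDATE dbo.Products
SET DiscontinuedDT = '2015-01-01'
WHERE ProductID = 1;

SET @DateOfInterest = '2015-06-01';

SELECT pp.ProductID, pp.ProductPrice
FROM dbo.ProductPrices pp
INNER JOIN dbo.Products p ON p.ProductID = pp.ProductID
WHERE pp.ProductID = 1
AND pp.EffectiveStartDT <= @DateOfInterest
AND (@DateOfInterest < pp.EffectiveEndDT OR pp.EffectiveEndDT IS NULL)
AND (p.DiscontinuedDT IS NULL OR @DateOfInterest < p.DiscontinuedDT);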

We locate the proper row by finding the one for that product where the date of interest is greater than or equal to the effective start date and less than the effective end date (or the end date is NULL). Our first query reports the product price for ProductID=1 on 2013-12-31.

Suppose we no longer want to sell mouse traps in 2015, because after that time we only want to sell better mouse traps. We can set the discontinued date of ProductID=1 to 2015-01-01, and then query with a date of interest after that. The second SELECT query shows this and returns no rows, indicating no price for the product on that date because it has been discontinued.

You do need to make sure that the discontinued date is always later than the latest effective pricing record’s start date. But that isn’t really the point of today’s exercise.

Our CONSTRAINTs Maintain Data Integrity for our ProductPrices

These constraints in our simple product/price example are enforcing the essential business rules:

  • All products with a current price must have a single entry that has a NULL end date
  • An effective end date is always greater than the start date
  • A valid price is positive and greater than zero
  • All products must exist in the referenced products table
  • No product should have more than one row with a NULL end-date
  • No product should ever have more than one relevant row for any point in time.
  • For every non-NULL end-date on a row, the next row for that product must have that end-date as the new start-date.
  • No product can have duplicate start-dates or duplicate end-dates.

If we didn’t enforce these rules, we would allow bad data, such as having an end date overlapping a start date on the following row. This would allow an item to have more than one price at any point in time. Although this should be caught in the application’s front end, bad data can get through for one reason or another. Without constraints, such bad data must be identified and corrected after the fact; constraints set in this way will block it from ever entering the table.

We have not yet tested the CONSTRAINTs we’ve created to make sure that they enforce data integrity according to the business rules, so let’s do that now.

All products must exist in the referenced products table
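
Something like the following (the product name for ProductID=4 is, of course, an invention):

-- Fails: ProductID=4 does not yet exist in dbo.Products (violates pp_fk2)
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (4, '2014-01-01', NULL, 19.99);

-- Add the missing product
INSERT INTO dbo.Products (ProductID, ProductName)
VALUES (4, 'Even Better Mouse Trap');

-- Now the same INSERT succeeds; we roll it back to preserve our sample rows
BEGIN TRANSACTION;
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (4, '2014-01-01', NULL, 19.99);
ROLLBACK TRANSACTION;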

In the above example, the first INSERT fails because ProductID=4 does not exist in our Products table (CONSTRAINT pp_fk2). Once we’ve inserted ProductID=4, the exact same insert works just fine. Note how at the end we rolled back the transaction we created before our second insert into ProductPrices. We’ve done this to maintain the previously noted entries in that table, and we will continue to do this for any T-SQL query that would change the data without violating one of our constraints.

Since the INSERT of ProductID=4 into the ProductPrices table above does not fail, it is possible to seed each product with an open-ended price without any problems.

An effective end date is always greater than the start date
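
For example (illustrative values):

-- End date earlier than start date: violates pp_ck1
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (4, '2014-01-01', '2013-12-31', 19.99);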

This INSERT would violate pp_ck1.

A valid price is positive and greater than zero
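
For example:

-- A price of zero (or less): violates pp_ck2
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (4, '2014-01-01', NULL, 0);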

This insert violates pp_ck2. ProductID=4 remains in our Products table because we left it there in the previous set of queries. Remember at this time there are still no price records for ProductID=4 (both insert attempts failed).

No product can have duplicate start-dates or end-dates
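
For example, given our sample rows:

-- ProductID=1 already has a row starting 2013-06-04: violates pp_pk
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (1, '2013-06-04', NULL, 9.99);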

This insert fails on the PRIMARY KEY CONSTRAINT (pp_pk) because we’re trying to insert the same effective start date more than once.
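
And for the end dates (again, given our sample rows):

-- ProductID=1 already has a row ending 2013-06-17: violates pp_u1
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (1, '2013-06-10', '2013-06-17', 9.25);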

This fails because we’re trying to insert the same effective end date twice, violating our UNIQUE CONSTRAINT (pp_u1). Note that if both end dates were NULL, the insert would still fail, because SQL Server’s UNIQUE constraint treats NULLs as equal (only one NULL end date is allowed for each product).

For every non-NULL end-date on a row, the next row for that product must have that end-date as the new start-date
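
Two sketches against our sample rows (recall that ProductID=1 has rows starting 2013-06-04 and 2013-06-17):

-- Leaves a gap: no row for ProductID=1 starts on 2013-06-03 (violates pp_fk1)
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (1, '2013-06-01', '2013-06-03', 8.95);

-- Creates an overlap: likewise, no row starts on 2013-06-05 (violates pp_fk1)
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (1, '2013-06-01', '2013-06-05', 8.95);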

These two fail the self-referencing FOREIGN KEY CONSTRAINT (pp_fk1) because of either a gap or an overlap in the end-to-start date relationship, meaning those rows we’re trying to insert don’t have contiguous dates.

Maintaining our ProductPrices – The Dilemma

We’ve already shown that it is pretty simple to insert a new product’s prices into our table. But what of the case where we need to do any of the following:

  • Insert a new row where a product already has a price record.
  • Change a product price for an existing row.
  • Change the effective start date for an existing product/price.
  • Delete the row for an effective date.

Some of these things are easy to do, while some are less trivial. For example:
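
A sketch of the two easy cases (wrapped in a transaction and rolled back, as before, to preserve our sample rows):

BEGIN TRANSACTION;

-- Case 1: insert a row ahead of the earliest effective date; its end date
-- must exactly match the next effective start date (2013-06-04)
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
VALUES (1, '2013-06-01', '2013-06-04', 8.95);

-- Case 2: change only the price of an existing row
UPDATE dbo.ProductPrices
SET ProductPrice = 9.15
WHERE ProductID = 1 AND EffectiveStartDT = '2013-06-04';

ROLLBACK TRANSACTION;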

The insert case above works only because we are inserting the first row (prior to the earliest effective date) and because we know what the next effective start date is (2013-06-04). In order to insert a row anywhere else, it is necessary to modify at least one other row’s effective end date, and that is not something that can be done with an INSERT.

The update in Case 2 above works because all we are changing is the product’s price. Had we tried instead to modify ProductID, EffectiveStartDT or EffectiveEndDT for that row, it would have violated one of the CONSTRAINTs.

While it is possible to change the effective start date of the first row (earliest effective start date), it is not possible to modify the effective start date of any other row. In order to do that, once again it becomes necessary to update the effective end date of at least one other row, and while that could be done with a carefully crafted UPDATE, it is a non-trivial exercise.

Once again, it is possible to delete the first effective start date row for any product, but the same cannot be done if the row to be deleted is in the middle or at the end of the set without violating one or more CONSTRAINTs. Unless, of course, it were possible to modify the effective end dates of one or more rows at the same time as the DELETE (which it is not).

The astute SQL-ers will at this time raise their collective hands and say, “hey wait a minute. All those things that you said INSERT, UPDATE and DELETE cannot do could be done by using MERGE instead.” And you would be right! In fact, that is precisely how Alex proposed to manage maintenance of his history tables in the referenced articles. He simply developed a stored procedure to maintain the tables using a very clever MERGE.

So clearly we could do the same thing here, albeit a little differently due to the fact that we’ve constructed our CONSTRAINTs somewhat differently.

To demonstrate this, we’ll start with an empty ProductPrices table and insert some sample data (the same illustrative rows as before will do).

And we can now simply insert a new current price, modifying the existing current row by changing its NULL effective end date and inserting a new row carrying the new price. We have to supply both the row to be updated and the new row.
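
A hand-crafted MERGE along these lines does the trick (values assume the same illustrative rows as before, where the current row for ProductID=1 starts 2013-06-17):

MERGE dbo.ProductPrices t
USING (
    VALUES (1, CAST('2013-06-17' AS DATETIME), CAST('2014-01-01' AS DATETIME), CAST(9.45 AS MONEY)) -- close out the current row
          ,(1, CAST('2014-01-01' AS DATETIME), NULL, CAST(9.95 AS MONEY))                           -- the new current price
) s (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
    ON s.ProductID = t.ProductID AND s.EffectiveStartDT = t.EffectiveStartDT
WHEN MATCHED THEN
    UPDATE SET EffectiveEndDT = s.EffectiveEndDT, ProductPrice = s.ProductPrice
WHEN NOT MATCHED THEN
    INSERT (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
    VALUES (s.ProductID, s.EffectiveStartDT, s.EffectiveEndDT, s.ProductPrice);

Because the update and the insert occur in a single statement, the CONSTRAINTs are satisfied by the time SQL Server checks them.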

This is fine where the SQL is hand-crafted, or you are using an isolation layer of stored procedures. What if the application has direct access to this table and the ORM (Object Relational Mapper), or UI Widget, being used is able to use only simple SELECT, INSERT, UPDATE and DELETE statements?

In cases like this it would be impossible to impose any requirement on the application to do a MERGE statement.

We also may not want our application to have to decide what the correct effective end date is when we insert, update or delete any row in our product prices table. Since the title of this article includes the phrase “self-maintaining,” we want our SQL to do just that. Is it possible to isolate all the complexity from the application, yet allow it to access the table directly to maintain it?

While that sounds like a pretty tall order, it probably won’t surprise you that I’ve found a way to do all of this thanks to the wonders of the SQL 2012 LEAD analytical function.

Review of the SQL 2012 LEAD Analytical Function

Let’s show a quick and simple example of LEAD when applied to the ProductPrices table:
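
Using the column names from our table, a sketch of that query:

SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
      ,NextEffectiveStartDT = LEAD(EffectiveStartDT, 1, 0)
           OVER (PARTITION BY ProductID ORDER BY EffectiveStartDT)
FROM dbo.ProductPrices;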

The results produced by this SELECT are:
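With our illustrative data, that is:

ProductID  EffectiveStartDT         EffectiveEndDT           ProductPrice  NextEffectiveStartDT
1          2013-06-04 00:00:00.000  2013-06-17 00:00:00.000  9.10          2013-06-17 00:00:00.000
1          2013-06-17 00:00:00.000  NULL                     9.45          1900-01-01 00:00:00.000
2          2013-05-01 00:00:00.000  2013-08-09 00:00:00.000  5.18          2013-08-09 00:00:00.000
2          2013-08-09 00:00:00.000  NULL                     5.25          1900-01-01 00:00:00.000
3          2013-07-01 00:00:00.000  NULL                     1.99          1900-01-01 00:00:00.000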

Here we have used the optional second and third arguments of LEAD:

  • The second argument: the offset (number of rows) to look ahead to retrieve the value for EffectiveStartDT.
  • The third argument: by setting the third argument to 0, when there is no following row the NextEffectiveStartDT is set to 1900-01-01, which works quite well for the DATETIME data type, but not for other date or time data types. For those you’d need to use an explicit date, like ‘1900-01-01’. Ultimately, you should use any date that isn’t expected to be within the time period of your effective dates.

For each case except the latest effective start date row, the NextEffectiveStartDT matches the EffectiveStartDT of the following row. This will be quite useful in what we’re about to propose.

Building Blocks of a TRIGGER to Support Self-maintenance of our Effective Dates

The title of this subsection gives away the show; however, as always, the devil is in the details.

Before we propose a TRIGGER (an INSTEAD OF TRIGGER), let’s review a couple of features that are available in all T-SQL TRIGGERS, specifically the virtual tables.

  • INSERTED – The INSERTED virtual table contains one row for each row to be inserted into the table at the time the TRIGGER is fired. In the case of an INSTEAD OF TRIGGER, these rows have not yet been inserted. On a DELETE event, the INSERTED virtual table contains no rows, but on INSERT and UPDATE events it will always have at least one row.
  • DELETED – The DELETED virtual table contains one row for each row to be deleted from the table at the time the TRIGGER is fired. In the case of an INSTEAD OF TRIGGER, these rows have not yet been deleted. On an INSERT event, the DELETED virtual table contains no rows, but on DELETE and UPDATE events it will always have at least one row.

It is also important to note that on an UPDATE event, both INSERTED and DELETED tables contain precisely the same number of rows. Both tables have all of the columns that correspond to the underlying table on which the TRIGGER is defined.

Suppose that we’re inside of a TRIGGER, and we run the following query:
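
Such a query might look like this (INSERTED and DELETED exist only inside the TRIGGER):

WITH TargetPrices AS
(
    -- All existing rows for any product touched by the operation
    SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
    FROM dbo.ProductPrices
    WHERE ProductID IN (SELECT ProductID FROM INSERTED
                        UNION
                        SELECT ProductID FROM DELETED)
)
SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
FROM TargetPrices;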

It should be reasonably obvious, given what we’ve said about the INSERTED and DELETED virtual tables above, that this query will produce a result set that includes all rows from ProductPrices for any ProductID that is contained in either of the INSERTED or DELETED virtual tables. We won’t show that set, and if you’re unclear on this you may want to construct your own TRIGGER and put this SELECT statement in it to see the results.

We’ve called our Common Table Expression (CTE) TargetPrices for a reason. This will be the target of the actual maintenance work we’re going to perform. However, we will impose one additional requirement on our TRIGGER, and that is that it should only “touch” (maintain) rows that must be maintained based on the underlying operation. And that’s usually going to mean that not all of the rows in TargetPrices should be touched. We’ll see how this can be done in a moment.

Let’s now look at another interesting query, which will probably require some examples to understand it. Again, we must assume we are operating inside of our TRIGGER for INSERTED and DELETED to be defined, but we can simulate that case using CTEs. This also happens to be a useful way to simulate code executing in a TRIGGER that uses the INSERTED/DELETED tables, in case you’ve ever struggled with that before.
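
Here is a sketch of that simulation for a single-row insert (the new row’s values are illustrative):

WITH INSERTED AS
(
    -- Simulates the single row we wish to insert
    SELECT ProductID = 1
          ,EffectiveStartDT = CAST('2013-06-10' AS DATETIME)
          ,ProductPrice = CAST(9.25 AS MONEY)
),
DELETED AS
(
    -- No rows: simulates an INSERT event
    SELECT ProductID, EffectiveStartDT, ProductPrice
    FROM dbo.ProductPrices
    WHERE 1 = 0
)
SELECT ProductID, EffectiveStartDT, ProductPrice
FROM dbo.ProductPrices
WHERE ProductID IN (SELECT ProductID FROM INSERTED
                    UNION
                    SELECT ProductID FROM DELETED)
UNION ALL
SELECT ProductID, EffectiveStartDT, ProductPrice FROM INSERTED
EXCEPT
SELECT ProductID, EffectiveStartDT, ProductPrice FROM DELETED;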

The INSERTED CTE contains one row that we wish to insert and the DELETED CTE contains no rows, simulating what might happen on a single-row insert within the TRIGGER. The result set produced is the existing rows for that product plus the new row. If there were rows in the DELETED table, those corresponding rows would be removed from the resulting set by the EXCEPT.

Notice how the new effective start date we want to insert (2013-06-10) appears in the results set.

Now let’s consider some other cases. The first simulates updating a price for an existing row (we’ll show only the INSERTED and DELETED CTEs for brevity).

Both INSERTED and DELETED include the effective start date of 2013-06-17 for ProductID=1, with DELETED showing the old price and INSERTED showing the new price (the same way the TRIGGER would populate the virtual tables). These results appear as follows (using the full query above).

Note how the target row (effective start date = 2013-06-17) shows only the new product price.

The next case shows the same two CTEs if we were attempting to update the 2013-06-17 record’s effective start date, without wanting to concern ourselves with the effective end date.

The full query would produce these results, again eliminating the effective start date row for 2013-06-17.

These results are the final rows we’d expect.

One more, to illustrate the case of a DELETE.

These results are once again the three rows we’d like to be left with, excluding of course the requisite, contiguous effective end dates.

Now we will modify our final query slightly to recalculate (adjust) what the effective end date should be for one of these cases (we’ll choose the third case where we updated the effective start date for 2013-06-17), using the magic of the SQL 2012 LEAD analytical function.
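
A sketch for that case (the new start date of 2013-06-20 is an illustrative stand-in; DELETED must match the existing row exactly for the EXCEPT to remove it):

WITH INSERTED AS
(
    -- The row as it would look after the UPDATE
    SELECT ProductID = 1
          ,EffectiveStartDT = CAST('2013-06-20' AS DATETIME)
          ,ProductPrice = CAST(9.45 AS MONEY)
),
DELETED AS
(
    -- The row as it looked before
    SELECT ProductID = 1
          ,EffectiveStartDT = CAST('2013-06-17' AS DATETIME)
          ,ProductPrice = CAST(9.45 AS MONEY)
),
SourcePrices AS
(
    SELECT ProductID, EffectiveStartDT, ProductPrice
    FROM dbo.ProductPrices
    WHERE ProductID IN (SELECT ProductID FROM INSERTED
                        UNION
                        SELECT ProductID FROM DELETED)
    UNION ALL
    SELECT ProductID, EffectiveStartDT, ProductPrice FROM INSERTED
    EXCEPT
    SELECT ProductID, EffectiveStartDT, ProductPrice FROM DELETED
)
SELECT ProductID, EffectiveStartDT, ProductPrice
      ,NextEffectiveStartDT = LEAD(EffectiveStartDT, 1, 0)
           OVER (PARTITION BY ProductID ORDER BY EffectiveStartDT)
FROM SourcePrices;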

This produces a result set that has the contiguous end date saved in the NextEffectiveStartDT column:

Note that for the time being, the last row doesn’t have a NULL effective end date, but we’ll address that later.

By now you may have deduced what we are about to propose, which is a single INSTEAD OF TRIGGER that fires on INSERT, UPDATE and DELETE, which does its work using a MERGE statement.

The TRIGGER to Self-maintain Contiguous, Effective Dates

Without any preamble, we’ll list out our TRIGGER now.
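
The version below is a minimal sketch assembled from the building blocks above; the trigger name is illustrative, and the exact change-detection predicates are a reconstruction of the behaviour described in the bullets that follow.

CREATE TRIGGER dbo.tr_ProductPrices_IUD ON dbo.ProductPrices
INSTEAD OF INSERT, UPDATE, DELETE
AS
BEGIN
    -- Note: DML issued here against dbo.ProductPrices does not re-fire this
    -- INSTEAD OF TRIGGER; it is applied directly to the table.
    WITH TargetPrices AS
    (
        -- All existing rows for any product touched by this operation
        SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
        FROM dbo.ProductPrices
        WHERE ProductID IN (SELECT ProductID FROM INSERTED
                            UNION
                            SELECT ProductID FROM DELETED)
    ),
    SourcePrices AS
    (
        -- The rows that should remain afterwards, with end dates recalculated;
        -- LEAD's default of 0 (=1900-01-01) marks "no following row"
        SELECT ProductID, EffectiveStartDT, ProductPrice  -- + other time-sensitive attributes
              ,NextEffectiveStartDT = LEAD(EffectiveStartDT, 1, 0)
                   OVER (PARTITION BY ProductID ORDER BY EffectiveStartDT)
        FROM
        (
            SELECT ProductID, EffectiveStartDT, ProductPrice
            FROM dbo.ProductPrices
            WHERE ProductID IN (SELECT ProductID FROM INSERTED
                                UNION
                                SELECT ProductID FROM DELETED)
            UNION ALL
            SELECT ProductID, EffectiveStartDT, ProductPrice FROM INSERTED
            EXCEPT
            SELECT ProductID, EffectiveStartDT, ProductPrice FROM DELETED
        ) a
    )
    MERGE TargetPrices t
    USING SourcePrices s
        ON s.ProductID = t.ProductID AND s.EffectiveStartDT = t.EffectiveStartDT
    -- Touch only rows where the end date or a time-sensitive attribute changed
    WHEN MATCHED AND (ISNULL(t.EffectiveEndDT, 0) <> s.NextEffectiveStartDT
                      OR t.ProductPrice <> s.ProductPrice) THEN
        UPDATE SET EffectiveEndDT = NULLIF(s.NextEffectiveStartDT, 0)  -- sentinel back to NULL
                  ,ProductPrice = s.ProductPrice
    -- Insert new rows directly from the source with their calculated end dates
    WHEN NOT MATCHED THEN
        INSERT (ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice)
        VALUES (s.ProductID, s.EffectiveStartDT, NULLIF(s.NextEffectiveStartDT, 0), s.ProductPrice)
    -- Delete only rows whose PRIMARY KEYs appear in the DELETED virtual table
    WHEN NOT MATCHED BY SOURCE AND EXISTS
        (SELECT 1 FROM DELETED d
         WHERE d.ProductID = t.ProductID
           AND d.EffectiveStartDT = t.EffectiveStartDT) THEN
        DELETE;
END;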

Hopefully, the comments and the previous section’s examples of LEAD should explain the logic sufficiently to get a basic understanding of what this TRIGGER is doing. However we’ll provide some additional explanation because the MERGE itself is not particularly straightforward:

  • The results from the TargetPrices and SourcePrices CTEs were described above.
  • NULLIF appears in a couple of places. These are used to convert our invalid dates (1900-01-01) to NULL when it is appropriate to do so.
  • Our matching criteria (ProductID and EffectiveStartDT) are always the PRIMARY KEY for the underlying table.
  • The WHEN MATCHED clause identifies cases where we have a matching Source=Target row but something has changed – either EffectiveEndDT is different than the calculated, adjusted end date (=NextEffectiveStartDT), or the ProductPrice (or any of the other time-sensitive attributes were they present on this row). This is one way that we’re narrowing the actual rows affected by our TRIGGER, to only those we really need to touch.
  • The WHEN NOT MATCHED (BY TARGET) clause identifies cases where an insert is required. This is done directly from the source, using NextEffectiveStartDT (our adjusted, calculated effective end date) as the effective end date for the inserted row.
  • The WHEN NOT MATCHED BY SOURCE clause identifies rows in our target to be deleted, but only when those PRIMARY KEYs exist in the DELETED virtual table.

You may also be wondering how this is going to work, for those cases above that we said couldn’t be done for the respective INSERT, UPDATE and DELETE cases. The reason it will work is that within the MERGE statement, all necessary adjustments are made to the effective end dates prior to the process writing anything to the table, and SQL Server validating the CONSTRAINTs. Maybe it is simply best to see it in action.

Testing our Self-maintaining Effective Dates TRIGGER

Let’s start with some cases of INSERTs. In the first, we’ll truncate our table and restore the original target row set without having to specify the effective end dates for any of the inserted rows (the TRIGGER does this for us).
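
A sketch of that test (TRUNCATE is allowed here because the only FOREIGN KEY referencing ProductPrices is its self-reference):

TRUNCATE TABLE dbo.ProductPrices;

INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, ProductPrice)
VALUES (1, '2013-06-04', 9.10)
      ,(1, '2013-06-17', 9.45)
      ,(2, '2013-05-01', 5.18)
      ,(2, '2013-08-09', 5.25)
      ,(3, '2013-07-01', 1.99);

SELECT ProductID, EffectiveStartDT, EffectiveEndDT, ProductPrice
FROM dbo.ProductPrices
ORDER BY ProductID, EffectiveStartDT;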

The results you should see are the same as the original results we told you to keep in mind above, but they’re shown again here to keep them fresh in your mind.

Let’s look at some additional cases of INSERT.
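
For example (the original script inserted more rows than this; two are enough to show the effect):

-- One row lands between two existing rows; one becomes a new current price
INSERT INTO dbo.ProductPrices (ProductID, EffectiveStartDT, ProductPrice)
VALUES (1, '2013-06-10', 9.25)
      ,(2, '2013-08-10', 5.40);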

The results below highlight the rows that were inserted (in yellow) and those whose effective end dates were adjusted (in green).

If we count up the yellow and green highlighted rows, we get seven, and that interestingly corresponds to precisely the number of rows reported in the SQL Server Management Studio (SSMS) Messages pane as having been touched by our TRIGGER:

Now let’s try some updates, doing so by using a CTE to consolidate the information to update.
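
A sketch of the shape of such an UPDATE (keys and values are illustrative):

WITH Updates (ProductID, EffectiveStartDT, NewEffectiveStartDT, NewProductPrice) AS
(
    -- Change the price only
    SELECT 1, CAST('2013-06-17' AS DATETIME), CAST(NULL AS DATETIME), CAST(9.75 AS MONEY)
    UNION ALL
    -- Move the row to a new effective start date, keeping its price
    SELECT 1, '2013-06-10', '2013-06-08', NULL
)
UPDATE pp
SET EffectiveStartDT = ISNULL(u.NewEffectiveStartDT, pp.EffectiveStartDT)
   ,ProductPrice = ISNULL(u.NewProductPrice, pp.ProductPrice)
FROM dbo.ProductPrices pp
INNER JOIN Updates u
    ON u.ProductID = pp.ProductID AND u.EffectiveStartDT = pp.EffectiveStartDT;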

An important detail of this UPDATE is the use of ISNULL (or COALESCE for the purist). You should do this to ensure (for example) that moving a row without specifying the ProductPrice retains the original product price.

We’ll see this from these results, where changes are once again highlighted (yellow for moved or updated rows, and green for adjusted effective end dates):

Here, specifying the ProductPrice as NULL is an indicator that we didn’t want to change it. In this case, the SSMS Messages pane reports this action for the TRIGGER:

This is three more than the number of highlighted rows, because in the cases where a row is moved (effective start date is changed), it involves a delete and an insert.

Let’s now try some deletes, once again consolidated into a CTE.
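
Along these lines (the original run deleted four rows; one suffices to show the end date of the preceding row snapping back):

WITH Deletes (ProductID, EffectiveStartDT) AS
(
    SELECT 1, CAST('2013-06-08' AS DATETIME)
)
DELETE pp
FROM dbo.ProductPrices pp
INNER JOIN Deletes d
    ON d.ProductID = pp.ProductID AND d.EffectiveStartDT = pp.EffectiveStartDT;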

The final result set in this case looks like this, where I can’t exactly highlight the rows that have been deleted (four of those) but I can highlight in green the ones where the effective end date was adjusted.

Here the SSMS Messages Pane reports:

This makes sense considering the deleted rows (4) plus the highlighted rows (2).

Now none of this is worth a hoot if our TRIGGER doesn’t work for the case of a MERGE, where we want to insert, update and delete all at the same time. So let’s give that a try by building a transaction table in our leading CTE.
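
A sketch of such a MERGE (the Action codes and the transaction rows are illustrative reconstructions):

WITH Transactions (Action, ProductID, EffectiveStartDT, NewEffectiveStartDT, ProductPrice) AS
(
    SELECT 'I', 1, CAST('2013-06-10' AS DATETIME), CAST(NULL AS DATETIME), CAST(9.20 AS MONEY)
    UNION ALL SELECT 'I', 1, '2013-12-01', NULL, 9.95
    UNION ALL SELECT 'D', 2, '2013-08-09', NULL, NULL
    UNION ALL SELECT 'D', 3, '2013-07-01', NULL, NULL
    UNION ALL SELECT 'U', 2, '2013-08-10', '2013-04-01', NULL   -- move the row, keep its price
)
MERGE dbo.ProductPrices t
USING Transactions s
    ON s.ProductID = t.ProductID AND s.EffectiveStartDT = t.EffectiveStartDT
WHEN MATCHED AND s.Action = 'D' THEN
    DELETE
WHEN MATCHED AND s.Action = 'U' THEN
    UPDATE SET EffectiveStartDT = ISNULL(s.NewEffectiveStartDT, t.EffectiveStartDT)
              ,ProductPrice = ISNULL(s.ProductPrice, t.ProductPrice)
WHEN NOT MATCHED AND s.Action = 'I' THEN
    INSERT (ProductID, EffectiveStartDT, ProductPrice)
    VALUES (s.ProductID, s.EffectiveStartDT, s.ProductPrice);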

Let’s check to see if the following results are what we expect.

For ProductID=1, we inserted two rows which are highlighted in yellow (second and fifth). This caused updates (self-maintenance) of the two preceding rows’ effective end dates (highlighted in green). For ProductID=2 we deleted a row (2013-08-09), which we obviously can’t highlight in yellow, but we can highlight in green the row where the effective end date was adjusted (2013-05-01) as a result. Note that we also deleted the single row for ProductID=3. In this case there were no other rows where the end date needed adjustment. Finally, for ProductID=2 we also moved a row (2013-08-10) to a new effective start date (2013-04-01) and this is highlighted in yellow.

In case you’ve never worked with TRIGGERs in the case of a MERGE statement, they fire once for each clause that takes effect in the MERGE statement (we are talking here about our latest example, and not the MERGE statement in the TRIGGER). Since our CTE contained transactions that caused each of the three clauses to fire (an UPDATE, an INSERT and a DELETE), we expect that our TRIGGER was fired three times. The results in the SSMS Messages pane confirm this.

We’ll let the interested reader confirm how the row counts for each execution of the TRIGGER came about (hint: the order is INSERT, UPDATE and finally DELETE). To dig a little deeper into the workings of the TRIGGER, you may want to introduce a couple of SELECTs at the beginning of the TRIGGER and then run the MERGE above again to see what’s going on.

Some Caveats for this Approach

In a table such as ProductPrices, there may be more than one time-sensitive column, although a relational model purist would probably argue that there shouldn’t be. In the TRIGGER, you’ll note that in several places there is a comment that indicates “time-sensitive attributes.” In each case, this is a reminder that should you ALTER the table to include more time-sensitive attributes, those must be handled the same way that ProductPrice is handled in the various points within that TRIGGER. In other words, you must change the TRIGGER to include the additional columns.

If you are likely to use a system like this in a team, or if you aren’t blessed with a perfect memory, it is possible to set a reminder for anyone that modifies (ALTERs) this table. This can be done through a DATABASE TRIGGER, such as the one that follows.
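
A sketch of such a DATABASE TRIGGER (the trigger name and message text are illustrative):

CREATE TRIGGER ddltr_WarnOnAlterProductPrices ON DATABASE
FOR ALTER_TABLE
AS
BEGIN
    -- EVENTDATA() describes the DDL event that fired this trigger
    IF EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]', 'NVARCHAR(128)') = N'ProductPrices'
        PRINT 'Warning! You have ALTERed ProductPrices. If you added any time-sensitive '
            + 'attributes, remember to modify the INSTEAD OF TRIGGER that maintains the '
            + 'effective end dates.';
    -- The trigger could even issue a ROLLBACK here to block the ALTER outright
END;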

So now, when anyone comes along and ALTERs this table, a warning message is displayed.

Yes you can ROLLBACK an ALTER of a TABLE!

The other caveat to this approach, which also affects Alex’s original (stored procedure) approach, has to do with issues reported about using MERGE. See “Use Caution with SQL Server’s MERGE Statement” by SQL MVP (nominated 18 times as of this writing) and avid blogger Aaron Bertrand. I cannot honestly say how many of these issues have been addressed in SQL 2012, but many of them may not be pertinent to your case.

Conclusions, Comparisons and Final Words

It is definitely possible to use constraints to enforce the essential business rules of a temporal table. The necessary updates to the data can be done by using SQL Server’s MERGE statement, but in several types of application design, this complexity is best hidden from the application.

This article shows how this complexity can be hidden so that even where the application has full direct access to the table, simple CRUD operations can be used, but at the same time the table is still protected from bad data.

I don’t believe I’ve built a better mousetrap here. I have, rather, offered an alternative mousetrap, which may be applicable and/or effective in some of your use cases. Here is a summary of the advantages and disadvantages of the two approaches, as I see it.

Category | Original Approach | Alternate Approach | Remark
Required SQL Server Version | 2008 | 2012 | Original approach is limited by the MERGE, although Alex does offer an alternative that works in SQL 2005.
Additional Storage | One column for the previous end date | None |
Implementation | Stored procedure | TRIGGER | It might be possible to implement the original approach in a TRIGGER.
Specifying values on DML operations | Must specify the effective end date and the previous effective end date | Need not specify the effective end date | The alternate approach allows the user to issue normal Data Manipulation Language (DML) INSERTs, UPDATEs, DELETEs and MERGEs to modify data in the table without specifying the effective end date. It may be possible to implement this same effect in the original approach if done with a MERGE in a TRIGGER, as done in the alternate approach.
CONSTRAINTs | All data integrity checks are managed through CONSTRAINTs | All data integrity checks are managed through CONSTRAINTs |

With the alternative approach, you don’t need to specify any adjusted effective end dates when you perform normal DML actions. No additional storage is taken up by the additional column for the previous end date that Alex proposed.

It would also be possible to replace the LEAD analytical function in the TRIGGER with an OUTER APPLY/SELECT TOP 1/ORDER BY EffectiveStartDT construct (which effectively simulates a LEAD), making for a fully SQL 2008 compatible solution. However we will refrain from showing that, leaving it to our interested readers, because it is likely its performance would not be as good as using LEAD.

One point not described in the comparison above is how relatively easy it would be to implement this on top of your existing temporal tables. Simply create the constraints and the trigger on top of your table having effective start/end dates, and assuming there are no constraint failures (data anomalies) when you create them, your basic CRUD operations should pretty much work without code changes.

Thanks for listening folks! I hope you find this approach useful and tell me about your success stories.

  • Celko

    Minor things
    1) We use COALESCE() and not ISNULL() now.
    2) We have a DATE data type, which starts with ‘0001-01-01’; why use the old Sybase ‘1900-01-01’ stuff or casting numeric to get a fake timestamp.
    3) I prefer to use non-overlapping times. That means I can use a BETWEEN predicate and I have (date_1, date_1) pair for a one-day sale.

  • Dwain.C

    Re: Minor Things
    Hi Joe – thanks for stopping by!

    Let me address your points one at a time because I think each represents a decision rather than an issue.

    1. Using ISNULL is a choice, just as is using COALESCE. I even noted in the article that SQL purists would use COALESCE, and interestingly enough I was thinking of you when I said that.

    2. Yes there is a DATE data type (among others) but I chose to use DATETIME for two reasons: 1) I think it is more commonly used (right or wrong) in cases like this and 2) it adds the flexibility to end a period at noon (or any time of day) and start up the same way

    3. You may prefer the BETWEEN predicate but I believe that there are those that would argue that it is not appropriate for DATETIME, most specifically when the dates are precisely contiguous as I’ve set mine up. You could go with a DATE data type where the effective end date is the day before to use BETWEEN (otherwise it would overlap on the boundary points) but then you’d have some messiness when it came to checking for a DATETIME on the boundary where you’d need to truncate it to the calendar day.

    Again, we make our choices and it is my belief that if they are reasonably defensible then others can make their own.

  • Phil Factor

    RE: Minor things
    I think it is all a matter of thinking of the risks. If you are likely to be putting in prices that go back into the nineteenth century then you’d have a serious problem. Also, if you happen to be asked for the value of stock on the stroke of midnight, just as a price changes, then you might have duplicates. However, if you are prepared to take those risks, you will gain some pretty tight constraints that would be hard to circumvent. Now that would make lots of other risks a lot less likely.

  • Celko

    prices that go back into the nineteenth century
    Many decades ago, I was a PL/I contract programmer at Coca Cola HQ in Atlanta. Someone had found the original paperwork from the first deliveries made in the city in the 1890’s. It was horse drawn wagons (not as impressive as Budweiser’s) delivering firkins of cocaine laced syrup. This was really original formula coke :).

    The penmanship was beautiful, so transcriptions were easy. Converting obscure units of measurements for whiskey and beer casks and barrels, filling in missing data and finding duplicates were good exercises. We were putting this valuable data into what would be a data warehouse in later years!

    Hey, I was paid by the hour. I do not care if this made sense or not.
    🙂

  • thehitman

    Allen Intervals Algebra and Bitemporal data (linkedIn)
    Excellent article, isn’t it fundamentally based upon Allen intervals?

    In 1983 James F. Allen published a paper in which he proposed thirteen basic relations between time intervals that are distinct, exhaustive, and qualitative.
    • distinct because no pair of definite intervals can be related by more than one of the relationships
    • exhaustive because any pair of definite intervals are described by one of the relations
    • qualitative (rather than quantitative) because no numeric time spans are considered

    These relations and the operations on them form Allen’s interval algebra (http://www.ics.uci.edu/~alspaugh/cls/shr/allen.html)

    Also, Dr. Tom Johnston has published two books on "Managing time in relational databases". He discusses the difference between uni-temporal versus bi-temporal (a definition below:)

    Data in uni-temporal model are bounded by validity, whereas bi-temporal model contains validity, but also transaction time to provide opportunities for storing updates of incorrect states.

  • Dwain.C

    Allen Intervals
    Hitman asked if this is "fundamentally based upon Allen intervals?"

    Since I’m not familiar with Allen intervals, I can’t say that it is. I can say that it is a common, practical problem that systems designers have been dealing with for a very long time.

    My original thesis (rejected by Simple Talk) was how to identify data anomalies for correction, in cases where such tight constraints are not in place. I don’t know about you, but it is comforting to me to know that there are ways to instead avoid that particular problem.

  • Dwain.C

    An Addendum
    I’ve received some editorial critique from to-remain-unnamed quarters on the content of this article that I’d like to share, because someone’s bound to come along later and say the same things anyway.

    Let me address them as an addendum and invite others to comment as well.

    1. History tables should not be changed. Once history is in place (the past), it should remain so forever enshrined.

    Well, I exaggerate a little but you see the point. And to a certain extent I agree with this. However, users often make keying errors and can easily put in wrong prices effective on wrong dates. So a correction mechanism is needed. The trigger provided gives you that. I believe that it is an authorization requirement of the front-end to ensure that only users authorized to make corrections (updates/deletes) can do so, assuming that you cannot formulate other business rules that would constrain such updates.

    2. In another recent article of mine (http://www.sqlservercentral.com/articles/T-SQL/123283/), a good friend of mine pointed out my statement: "A trigger should always be minimalistic in the sense that it should only do the work it needs to do and no more. Overloading triggers with all kinds of stuff that can be done elsewhere can cause serious performance problems"

    He was alluding to the complexity of the trigger I came up with to handle this instance, to which I agree that yes it does seem rather complex. If anyone can come up with a simpler/more efficient one that does the same job, in the immortal words of H. Ross Perot: "I’m all ears." The thing is, with the tight constraints imposed by the proposed data model, maintaining the data simply is not simple. I have done my best to try to make the trigger operate as efficiently as possible, although I have not tested and confirmed precisely how efficient it is.

    Perhaps that will be the subject of future ramblings.

  • thehitman

    Link for Allen Intervals
    (http://www.ics.uci.edu/~alspaugh/cls/shr/allen.html)

    Also, you did a great job in your article as is.

  • AlexK

    Very nice!
    Dwain,

    This is a very nice simplification for the special case when there can be no gaps ever. Of course, it does not handle more general cases, such as "Who is on call" table: as in Alex is on call from Mon 7 AM – Fri 7 PM, Courtney is on call Sat 1 PM onwards, and there is a gap Fri 7 PM – Sat 1 PM.

    I concur that we do not want to use MERGE if we can avoid it. These days I am using PostgreSql, which has deferrable constraints, so that I begin a transaction, issue a simple UPDATE, then a simple INSERT, then COMMIT – and the constraints validate at COMMIT time, which is simple and robust.

  • AlexK

    This is not based on Allen Intervals at all
    thehitman,

    This solution is not based on Allen Intervals at all – the article you are referring to is purely theoretical. It has nothing to do with the practical solution which Dwain describes. Reading that theoretical article does not in any way help us solve practical problems. We can safely skip reading that article and be just as productive solving real life problems.

    In fact, this solution is based on works by Donald Knuth, the linked list to be precise.

  • Dwain.C

    Is that AlexK as in K is for Kuznetsov?
    Thanks for taking a look and offering your opinion sir, as this work is merely building on yours prior. I like your comparison to a linked list. So subtly true I chose not to mention it.

    Yes it is for the (relatively common I think) special case where there can be no gaps or overlaps. It could probably be adapted to cases where a fixed gap (or overlap) is required though.

    I’ve read about deferred commit in Oracle, and I agree that would really be the ticket here to avoid the MERGE. Too bad (so sad) that we don’t have that in T-SQL. The CONSTRAINTs could be briefly relaxed to avoid the MERGE, but that’s such a crude kludge.

  • Phil Factor

    pp_fk1
    The foreign key constraint, which is rather mysteriously named ‘pp_fk1’ (I hope you gave your cat a less geeky name) might benefit from an ON UPDATE CASCADE which would mean that once a link has been made, you can then alter any start date, and the end-date of the preceding link would magically change. This would mean you’d need no magic code for this sort of modification!

  • thehitman

    Less filling tastes great!
    In the product prices table, the pair of dates establishes when the product price was factual (EffectiveStartDT <= @somedate AND @somedate < EffectiveEndDT). This is the "meets" Allen interval where a ends and b starts. I prefer to use 12/31/9999 for the open ended date instead of a null. It reduces the predicate for testing for null.

    In either case, setting null or 12/31/9999 as the EffectiveEndDT determines the current product price.

    Dr. Tom Johnston has published two books on "Managing time in relational databases" with productive examples leveraging Allen intervals.

    Finally, I respect your opinion. Although, I do not concur with it and it is not worth continuing any additional threads on this topic from my perspective.

    So you are correct that in fact, this solution is based on works by Donald Knuth, the linked list to be precise.

  • thehitman

    Excellent article
    Dwain thanks for sharing this.

  • AlexK

    Read the book, not quite relevant
    I’ve read some of the book entitled "Managing time in relational databases". The authors describe how they validate integrity in the application, then save it in RDBMS:

    "After doing edit checks to insure that each element of the transaction, including its data values, is well formed, the AVF does validity checks on the transaction as a whole. Only if a transaction passes both edit and validity checks will the AVF map it into one or more physical SQL transactions, submit those transactions to the DBMS, and monitor the results to insure that either all of them or none of them update the database, i.e. that they make up a semantically complete atomic unit…"

    This article describes how to implement data integrity in RDBMS, which is usually preferable, cheaper and needing less maintenance, to doing it in the application layer.

    Further quoting from the book:

    "Past experience has shown us that doing our own application-developed bi-temporal data maintenance, using standard SQL, is resource-intensive and error-prone. It is a job for a company’s most experienced DBAs, and even they will have a difficult time with it. Having an enterprise standard framework like the AVF to carry out these operations significantly reduces the work involved in maintaining temporal data, and will eliminate the errors that would otherwise inevitably happen as temporal data is maintained."

    In fact, many practical scenarios are very simple, and these few constraints described in Dwain’s article will completely prevent errors, without much effort. I think the problem is not as complex as the book’s authors say it is.

  • AlexK

    A dozen or more transactions per change?
    I found another fascinating statement in this book. It may take multiple physical transactions to implement one change:

    "many temporal update transactions, as we will see, and many temporal delete cascade transactions too, can require a dozen or more physical transactions to complete"

    I’d say it is doable, and clearly safer, to complete one change in one transaction.

  • Dwain.C

    To Phil Factor on pp_fk1
    I’m assuming you mean to construct the FKs like this?

    CREATE TABLE dbo.ProductPrices2
    (
    ProductID INT NOT NULL
    ,EffectiveStartDT DATETIME NOT NULL
    ,EffectiveEndDT DATETIME NULL
    -- Time dependent attributes
    ,ProductPrice MONEY NOT NULL
    -- In PK, EffectiveStartDT should occur only once for each product
    ,CONSTRAINT pp2_pk PRIMARY KEY (ProductID, EffectiveStartDT)

    -- EffectiveEndDT should always be greater than EffectiveStartDT (unless NULL)
    ,CONSTRAINT pp2_ck1 CHECK (EffectiveEndDT > EffectiveStartDT)

    -- ProductPrice should always be greater than zero
    ,CONSTRAINT pp2_ck2 CHECK (ProductPrice > 0)

    -- FK1: Each EffectiveEndDT should be linked to an EffectiveStartDT
    ,CONSTRAINT pp2_fk1 FOREIGN KEY (ProductID, EffectiveEndDT)
    REFERENCES dbo.ProductPrices2(ProductID, EffectiveStartDT)
    ON UPDATE CASCADE

    -- FK2: Products/Prices must reflect existing Products
    ,CONSTRAINT pp2_fk2 FOREIGN KEY (ProductID)
    REFERENCES dbo.Products(ProductID)

    -- EffectiveEndDT must be unique for each row of a product
    ,CONSTRAINT pp2_u1 UNIQUE (ProductID, EffectiveEndDT)
    );

    Try running that and you get this:
    Introducing FOREIGN KEY constraint ‘pp2_fk1’ on table ‘ProductPrices2’ may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.

    Or feel free to correct me with some code that does run.

    BOL seems to suggest that this is not possible:
    ON UPDATE CASCADE
    Specifies that if an attempt is made to update a key value in a row, where the key value is referenced by foreign keys in existing rows in other tables, all the values that make up the foreign key are also updated to the new value specified for the key.
    https://technet.microsoft.com/en-us/library/ms186973(v=sql.105).aspx

    Where it states "in existing rows in other tables."

    Andrew (editor at Simple Talk) also didn’t like my "geeky" FK naming conventions. My statement to him was that when an FK violation occurs, I’d rather have the developer have to look at the code that caused the violation to understand what’s going on, rather than trying to lead him to that via the naming. I’d say to each his own.

  • Phil Factor

    Re:pp_fk1
    Oops. I forgot that no cascaded constraints are permitted on a self-referential foreign key. SQL Server assumes that it is going to go in an infinite loop otherwise.

  • Dwain.C

    To Phil F on pp_fk1
    Phil,

    I appreciate you getting back with your clarification. You had me worried there. Cascade updates was one of the things I checked out before writing this article and I was afraid I might have missed something.

    Dwain

  • dwalker07

    There’s a FAR easier way
    It would be much, much easier to create the EffectiveEndDT as a computed column using the ProductID and EffectiveStartDT.

    No triggers, no complex constraints. Why isn’t a computed column better?

  • Phil Factor

    Re: There’s a FAR easier way
    An interesting thought, but how would you do it? I’m struggling to see how.

  • Celko

    Computing end dates is not easy
    Computing end dates is not easy; what if it falls on a weekend or holiday? Then you have to do exception handling; but a calendar table can be built in an afternoon.

  • dwalker07

    The code is in your trigger
    If you can use Lead in a computed column, then use the same code from your trigger. Something like this:

    Alter Table dbo.Products Add EffectiveEndDt As (LEAD(EffectiveStartDT, 1, 0) OVER (PARTITION BY ProductID ORDER BY EffectiveStartDT))

  • dwalker07

    Hmmmm.. Have to use a UDF
    Well, you can’t put a windowed function anywhere other than in a Select statement.

    I believe the column could be computed as the result of a scalar user-defined function (and its value persisted), and the user-defined function could have what I put before.

    Whether that works better, or not, is up for consideration. It might be simpler than the triggers.

  • dwalker07

    Windowed functions
    What I don’t like is the message "Windowed functions can only appear in the Select or Order By clauses".

    Lots of people would like this restriction relaxed; I occasionally would like to use windowed functions in a Where clause, for example.

  • Dwain.C

    Effective End Date in Computed Column
    I’ll give you my take here without testing anything or writing any code.

    1. I do not believe you’ll be able to use LEAD in a computed column because as noted it can only be used in SELECT statements.
    2. You probably can use LEAD in a scalar-valued, user-defined function and put that into a computed column.
    3. I doubt you can make that column persisted because SQL will probably tell you the computed result is non-deterministic.
    4. So what you end up with is something that is probably going to be less efficient to query on than the structure/method I showed here: http://www.sqlservercentral.com/articles/T-SQL/106783/

    The reason I think it would be less efficient is because of the sUDF.

    So rather than do what you suggested, I’d probably just not include the Effective End date on the row at all and simply use LEAD as shown in that article.

    But that isn’t a CONSTRAINT-based solution to this problem. Doing it the way I suggest in this article will make your retrieval queries much more efficient.

  • Dwain.C

    Effective End Date in Computed Column
    Note that, in the above, all I was offering was my opinion. And you are of course entitled to disagree with it.

    But I’d suggest that if you do, that you dig the code out of the article on SSC whose link I provided. Everything you need is there to prove that your way works and is faster (or not).

    Post up the results and let us all know please. I for one would be interested.

  • dwalker07

    DWalker07
    I am not saying that I think the UDF is a better solution overall; it may not perform very well.

    My solution would have been far simpler IF "Lead" and similar windowing functions were allowed in places other than the Select statement. But they are not. And I really, really wish that would get fixed.

    It seems like the triggers are awfully complex, with the discussion of cascading triggers, and AlexK’s comments (dozens of transactions?). I try to stay away from triggers… In my experience and opinion, which may not apply to everyone or every situation, I find that triggers often don’t perform well and they are hard to maintain.

    I will leave it up to others, if they want to, to see if a UDF performs acceptably or not. You’re probably right that SQL won’t allow the column to be persisted.

    Let’s just say that using a UDF with a function is a possible alternative way to do what this article says. A computed column that calls a UDF would be, at least, self-maintaining and easy to understand.

  • Dwain.C

    Effective End Date in Computed Column
    Let me first say what I should have said the first time but didn’t. I thought your idea to use a computed column to calculate effective end date was a pretty clever one. I can admit to not having thought of that myself.

    I do agree that being able to use LEAD (or more especially ROW_NUMBER) in other places besides SELECT, like WHERE could be really cool. I don’t know what ANSI/ISO standards say about this, and I have no insights into Microsoft’s reasoning for not supporting it. Perhaps it is because it might produce some ambiguity in the filtering.

    I like to think TRIGGERs are complex only until you get to know them. I heard somewhere "Everything is hard until it is easy" (Albert Einstein quote perhaps). Yes there would be maintenance in a TRIGGER like the one I proposed. Mainly if you added columns to the table. But in this case, would you? Think about it. How many time-dependent attributes of a product are you going to store in a Prices table?

    In the end, the choice of approach is up to you. Some are better for certain things, and others for others. In T-SQL there is rarely the one approach that fits all.

  • dwalker07

    Thanks
    I agree with everything you said there. Yes, everything is complex until you understand it.

    I do, however, think that triggers are often an unnecessary complexity that — IF the job can be done without them — are best avoided, for maintenance and performance reasons. They are just another thing to maintain. But they have their place. I stay away from triggers and cursors both. 🙂

  • Celko

    Quotes and TRIGGERs
    "the problem, once it is solved, will be easy" Max Planck

    The fundamental problem with TRIGGERs is that in general they cannot be optimized. The reason we added ON [DELETE | UPDATE] CASCADE is that these DRI actions were 80-90% of the work done in TRIGGERs in the old days.

  • Dwain.C

    TRIGGERs
    I’ll agree with the bits that TRIGGERs, if used improperly can cause performance issues, and that TRIGGERs (such as Joe’s CASCADE DELETE example) can also implement logic that is inappropriate.

    And I’ll even confess to having done so. For example, I’ve had a case where we had a form that wasn’t updating some de-normalized data (forgive me Joe) to its child tables properly, but the devil of it was we simply could not find the issue. Enter a TRIGGER to ensure the update gets done.

    On the other hand, there are other appropriate things that can be done in TRIGGERs, where using a TRIGGER immensely helps simplify getting the job done.

  • Dwain.C

    TRIGGERs vs. CURSORs
    BTW. I certainly would not mention TRIGGERs and CURSORs in the same breath, particularly if performance appears in the same sentence.

  • dwalker07

    You could use a view
    Ah… the "alternative solution" I posed above, where the table has a computed column which uses:

    Alter Table dbo.Products Add EffectiveEndDt As (LEAD(EffectiveStartDT, 1, 0) OVER (PARTITION BY ProductID ORDER BY EffectiveStartDT))

    … and which won’t work, because a computed column can’t have windowing functions like LEAD, could be solved by:

    Creating a view which includes the table plus EffectiveEndDt.

    The rest of the columns (other than EffectiveEndDt) in the view should be updateable.

    If you don’t want to change code that refers to the table, just rename the table, create a view with the former name of the table, and have the view refer to the new table name.

    THAT approach will be self-maintaining, it should (I predict) perform well, and it doesn’t require triggers.

  • Dwain.C

    RE: You could use a view
    Yep, you could. Something like this is what I assume you mean:

    CREATE VIEW v_ProductPrices_wEffectiveEndDT
    WITH SCHEMABINDING
    AS
    SELECT ProductID, EffectiveStartDT
    ,EffectiveEndDT=LEAD(EffectiveStartDT, 1, NULL) OVER
    (PARTITION BY ProductID ORDER BY EffectiveStartDT)
    ,ProductPrice
    FROM dbo.ProductPrices;

    Now suppose you had a query like this:

    SELECT *
    FROM dbo.v_ProductPrices_wEffectiveEndDT
    WHERE EffectiveStartDT BETWEEN '2010-03-02 00:00:00.000' AND '2010-06-11 00:00:00.000';

    This will do an index scan on 1M rows (assuming that’s how many rows are in your table). Whereas the same query against the original table containing EffectiveEndDT, like this:

    SELECT ProductID, EffectiveStartDT
    ,EffectiveEndDT
    ,ProductPrice
    FROM dbo.ProductPrices
    WHERE EffectiveStartDT BETWEEN '2010-03-02 00:00:00.000' AND '2010-06-11 00:00:00.000';

    Only does an INDEX SCAN on about 100,000 rows (depends on the number of ProductIDs).

    And on the table containing the EffectiveEndDT, you can create an INDEX to help out that query:

    CREATE NONCLUSTERED INDEX pp_ix1
    ON dbo.ProductPrices(EffectiveStartDT) INCLUDE (ProductID, EffectiveEndDT, ProductPrice);

    That gets you a nice INDEX SEEK.

    Try Indexing your VIEW, for example:
    CREATE UNIQUE CLUSTERED INDEX vpp_ix1
    ON dbo.v_ProductPrices_wEffectiveEndDT(ProductID, EffectiveStartDT);

    And this is what you’ll get:
    Cannot create index on view "SANDBOX-2012.dbo.v_ProductPrices_wEffectiveEndDT" because it contains a ranking or aggregate window function. Remove the function from the view definition or, alternatively, do not index the view.

    Not saying it won’t work. Just saying that having the EffectiveEndDT available on a particular row of interest is going to generally perform better.

  • dwalker07

    Using a view
    Yep, the performance of a view that uses aggregate window functions will probably not be as good as a column in the table.

    Like everything else, there’s a tradeoff between the complexity of the triggers (which I probably think are more complex than you think they are) 🙂 … and the performance. If the table is only queried occasionally, a view might be fine. It’s another alternative.

    Still, I want SQL to allow windowing functions in computed column definitions!

    I appreciate the back and forth on this. Thanks.

  • MagK

    A possible SQL 2008 version
    Very interesting article. Thanks for taking the time to post it.

    How would one define the trigger without the LEAD? (prior 2012)
    You mention "OUTER APPLY/SELECT TOP 1/ORDER BY EffectiveStartDT construct". Have tried to make it work, but can't seem to do it right.
    Any chance the author or some other other wiz have a suggestion on how to do that correctly?

  • will

    It looks like there’s an issue with bulk updates and this trigger when the update statement doesn’t actually change the data. For example:

    update ProductPrices set ProductPrice = ProductPrice

    will cause the “not matched on source” clause to be invoked due to the “except” clause in the source, which ends up deleting all of the data in the table…