Click here to monitor SSC
  • Av rating:
  • Total votes: 44
  • Total comments: 7
Joe Celko

Temporal Data Techniques in SQL

18 January 2009

In the first part of this series on Temporal Data, Joe explained how it is that the Common Era calendar is irregular and mentioned that, although there are ANSI/ISO Standards for temporal operations in SQL, every vendor has something different.  Now, he discusses other factors to take into account when using temporal data such as Holidays, and discusses a few techniques using Calendar, Report Usage and History tables

Holidays

Nobody agrees on holidays.  There are the obvious international differences – the Fourth of July doesn’t have the same appeal to the UK that it does to the US, and most Americans have no idea what Bank holidays are, and why the banks deserve them.  This can be a real problem in certain industries.  The ocean freight and maritime industries publish a list of all the non-working holidays for every seaport on Earth.  Air freight is not so bothered by this, but strong dock worker unions mean that if you show up in a port on the wrong day, you can pay extra docking fees. 

Holidays fall into two types. Some can be computed from a rule and some are simply declared by an authority.  The declared holidays are usually civic ones and they might change.  The computed ones that involve lunar and/or solar cycles are mostly religious.  You also have holidays that appear only once, such as a “National Day of Mourning” for the death of an important political figure. 

But even the familiar holidays are harder to compute than you think.  Every programmer is aware that Easter has a pretty elaborate formula.  But nobody asks “Which Easter?” we are to compute.  Orthodox and Catholic (which is used by Protestants) dates do not agree.  For example, in 2008 the Catholic Easter was on March 10 and the Greek Orthodox Easter on April 14; but by 2010 they will both be on March 22.  You can get a side by side comparison at Side-by-side Easter calendar reference for the 21st century

Some holidays are longer than others. The obvious example is the Weekends.  That usually means Saturdays and Sundays in Western countries.  In the Islamic world, it means Fridays.  And probably in my lifetime, we will probably see the four day work week in some countries.  Then there is Hanukkah and Eid ul-Fitr, which last for several days. 

In short, holidays are just too complicated and arbitrary to put in a function, VIEW or a query.  I wish that there was even an ISO Standard for just classifying types of holidays.  What this means is that you cannot help but use a Calendar table for holidays and you have to do most of the work for yourself.  We will get to that in another section. 

Leap Seconds and Trivia  

Be grateful that most commercial work is done at the level of days and perhaps time of day, and not to the accuracy of decimal fractions of seconds. That leads to the choice of atomic versus the astronomical clocks, if you have to co-ordinate events at the nanosecond level.  Scientific work has to cope with leap-seconds.  These are positive or negative corrections defined by the International Earth Rotation and Reference System Service to keep the atomic clocks within 0.9 seconds of the planet's actual rotation.  They announced on 2008 July 4 that the world's atomic clocks will be adding another second at the end of this year.  I hope this does not spoil your New Years Eve plans. 

Calendar Tables

The simplest Calendar is just a single column with the dates for a particular range.  Since it is all key, we know that It is as normalized as it can get.  Since there are 365.2422 days per year, a 100 year calendar is pretty small and would fit into main storage in most modern machines.  It is also of pretty limited use without other data that we want to look up. 

Since we already talked about holidays, the first impulse is to add a second column with a flag for “work day” or “holiday” to the table, something like this skeleton. 

CREATE TABLE Calendar

(cal_date DATE NOT NULL PRIMARY KEY,

 business_day_flg INTEGER DEFAULT 1 NOT NULL

  CHECK (business_day_flg IN (0,1)),

 ...);

The most common calculations you need are the number of business days between two dates or the nearest business day to a given single date.  You can do this with the integer flag and earlier SQLs with a skeleton like this:

SELECT SUM(C.business_day_flg) AS business_day_cnt

  FROM Calendar AS C, Foobar AS F

 WHERE C.cal_date BETWEEN F.foo_start_date AND F.foo_end_date

   AND ..;

This interferes with the ability to get other data out of Foobar without adding GROUP BY clauses.  Today you can use the windowed aggregates functions – that is the new ones with an OVER() clause. 

SELECT F.*, -- use real column names in production code

       SUM(C.business_day_flg)

       OVER (ORDER BY C.cal_date) AS business_day_cnt

  FROM Calendar AS C, Foobar AS F

 WHERE C.cal_date BETWEEN F.foo_start_date AND F.foo_end_date

   AND ..;

This is still not as good as we can make it.  Let me take time out to discuss the Julian date, which is often misused.  Julian date is technically a count of days used by astronomers for observations (http://en.wikipedia.org/wiki/Julian_date).  It is a large integer number and has some rules for computing it.  Only astronomers care about it, and the rest of us get it confused with a Julian-IZED date,

The Julianized date is a count of days from a base day, used in business.  You see it as the "day within year" number on the bottom of the pages of a desk calendar.  If your time frame is a calendar year, it will be between 1 and 365 or 366 days.  It is called the ordinal day format (“yyyy-ddd”) in ISO-8601.  But you can Julianize any time frame you wish such as the entire range covered by your Calendar table.  Instead of a flag, you simply repeat the Julianized number for holidays and use integer math to get the counts.  Here is a skeleton. 

CREATE TABLE Calendar

(cal_date DATE NOT NULL PRIMARY KEY,

 julian_business_nbr INTEGER NOT NULL,

 ...);

 

INSERT INTO Calendar VALUES ('2007-04-05', 42);

INSERT INTO Calendar VALUES ('2007-04-06', 43); -- Good Friday

INSERT INTO Calendar VALUES ('2007-04-07', 43);

INSERT INTO Calendar VALUES ('2007-04-08', 43); -- Easter Sunday

INSERT INTO Calendar VALUES ('2007-04-09', 44);

INSERT INTO Calendar VALUES ('2007-04-10', 45); -- Tuesday, back to work

To compute the business days from '2007-04-05', Thursday before Easter in 2007 to '2007-04-10', the next Tuesday:

SELECT (C2.julian_business_nbr - C1.julian_business_nbr -1) AS business_day_cnt

  FROM Calendar AS C1, Calendar AS C2

 WHERE C1.cal_date = '2007-04-05'

   AND C2.cal_date = '2007-04-10';

This saves having to scan all of the rows within the range to get a sum.  The -1 is to avoid counting the current day as one of the business days, but you might have a different rule in your company.  

Report Range Tables

A common application is have special reporting periods, perhaps based on the fiscal calendar or business defined events (“Annual Going out Of Business Sale!” was one of my favorites). 

CREATE TABLE ReportRanges

(report_name VARCHAR(30) NOT NULL PRIMARY KEY,

 report_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,

 report_end_date DATE NOT NULL,

 CONSTRAINT date_ordering

  CHECK (report_start_date <= report_end_date),

 etc);

These reports can overlap; a fiscal quarter will be contained in the range of its fiscal year.  There can be gaps between them; we have to wait a year between each “Annual Going out Of Business Sale!” and there might be long stretches of time without any special sales.  But we want to know their ranges so that the table is fairly constant once it is created. 

The basic skeleton for use with these tables is

SELECT R.report_name, << summary computations >>

  FROM ReportRanges AS R, [Events] AS E

 WHERE E.event_date BETWEEN R.report_start_date AND report_end_date

   AND R.report_name IN (<<report name list>>)

 GROUP BY R.report_name;

The <<report name list>> would probably be events nested inside each other, like fiscal quarters inside a fiscal year.  While this is a useful and portable programming trick, you need to consider replacing it with the newer OLAP extensions to the GROUP BY clause such as ROLLUP and CUBE. 

History Tables

Another form of calendar table records the history of events.  You will see history tables used in Data Warehouses and queries dealing with things that have duration. If you go to the French Quarter in New Orleans, there is a lamp post with a historical plaque on it which gives the years of the Spanish domination, French domination, American domination, Confederate domination and finally, the second American domination.  The last entry has a blank space to be filled in later and empty room on the plaque beneath that line. 

Now look at the NULL-able end_date in the skeleton that follows.  It is useful because SQL does not have a token for “Eternity” to show that something is not yet finished like New Orleans.  Using a dummy date in the distant future for “Eternity” has portability problems.  And it can mess up temporal math if you don’t catch it in CASE expressions.  There should be, at most, one NULL for each item in this inventory history to model the current price.  It is also possible that we know a future ending data for an item – look at the expiration dates on food.  

To make this easier to see, let me use a history table for price changes.  So a basic history table looks like this skeleton:

CREATE TABLE PriceHistory

(ean CHAR(13) NOT NULL   -- industry standard barcode

  REFERENCES Inventory(ean),

 price_prev_date DATE NOT NULL,

 price_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,

 price_end_date DATE, -- NULL means current price

 item_price DECIMAL (12,4) NOT NULL

  CHECK (item_price > 0.0000),

 etc.

  -- now we add constraints

 

--uniqueness

PRIMARY KEY (ean, price_start_date),

UNIQUE (ean, price_end_date),

UNIQUE (ean, prev_end_date),

 

--ordering of events

CHECK (price_start_date <= price_end_date),

-- Optionally, CHECK (price_start_date < price_end_date

CHECK (prev_end_date <= price_start_date),

-- Optionally, CHECK (price_start_date = price_prev_date + INTERVAL ‘1’ DAY), -- prevents gaps

 

-- self-referencing constraint

CONSTRAINT ean_prev_end_date

 FOREIGN KEY (ean, prev_end_date)

 REFERENCES PriceHistory (ean, price_end_date)

 );

The first three uniqueness constraints are fairly obvious.  The EAN and the start of a price have to be unique and not NULL-able so we have a natural primary key.  The ending dates, current and previous, might have NULLs, so we need to use a UNIQUE constraint. 

The next constraints give an ordering to each event, namely, price_prev_date is on or before price_start_date, which is on or before price_end_date.  The reason for not putting this into a single BETWEEN predicate is that each constraint will have a name in production code that will show up in error messages so we want to be exact. 

The self-referencing constraint is a trick from Alex Kuznetsov.  It says that all the previous price ending dates were really ending dates for some time period.  You will want to play with options to get them to fit your own business rules. 

It is also a good idea to have a VIEW with the current data:  

CREATE VIEW CurrentPrices (..)

AS

SELECT ..

  FROM PriceHistory

 WHERE price_end_date IS NULL

    OR price_end_date >= CURRENT_TIMESTAMP;

 

You use a BETWEEN predicate to get the appropriate price for a given order on a particular date. 

SELECT ..

  FROM PriceHistory AS H, Orders AS O

 WHERE O.sales_date BETWEEN H.price_start_date

           AND COALESCE (price_end_date, CURRENT_TIMESTAMP); -- or other known data as needed

 

You can also enforce the "one NULL per item" with a trigger.  If your SQL has CREATE ASSERTION or a full implementation of the CHECK() constraints in SQL-92, you can do it with declarative code.  But there is a sneaky trick that uses the WITH CHECK OPTION on a VIEW to fake it; here is the skeleton:

CREATE VIEW Prices (..)

AS (

SELECT ...

  FROM PriceHistory AS P1

 WHERE NOT EXISTS

      (SELECT *

         FROM PriceHistory AS P2

        GROUP BY ean

       HAVING COUNT(*) - COUNT(price_end_date) > 1)

    )

WITH CHECK OPTION

This forces you to update through the VIEW so that the WITH CHECK OPTION is always used. However, if you go directly to the base table, it can be circumvented.  Now your problem is to write a stored procedure that will update the table and insert a new row.  You can do this with a single MERGE statement, or with a short block of procedural code:

CREATE PROCEDURE UpdateItemPrice

@in_ean CHAR(13), @new_item_price DECIMAL (12,4)

AS

-- LANGUAGE SQL

BEGIN ATOMIC

UPDATE PriceHistory

   SET price_end_date = CURRENT_TIMESTAMP

 WHERE ean = @in_ean;

INSERT INTO PriceHistory (ean, price_prev_date, price_start_date, price_end_date, item_price)

VALUES (@in_ean, CURRENT_TIMESTAMP, DATEADD(DAY,1,CURRENT_TIMESTAMP), NULL, @new_item_price);

END

This will make the price change go into effect tomorrow.

In his next article, Joe gets distracted from temporal data by reading the Manga Guide to Databases, and decides, instead, to write about relational division.

Joe Celko

Author profile:

Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa.
He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.
He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.
Joe is a well-known figure on Newsgroups and Forums, and he is famous for his his dry wit. He is also interested in Science Fiction.

Search for other articles by Joe Celko

Rate this article:   Avg rating: from a total of 44 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: NULL end date ???
Posted by: Anonymous (not signed in)
Posted on: Tuesday, January 27, 2009 at 5:18 AM
Message: Using a NULL end date to denote a current price seems like a questionable design decision. I'd prefer a Current Price table + a Price History table. This would have performance benefits as well as allowing more robust constraints. I know this is only an example but people will follow what they read.

Subject: attribute splitting
Posted by: Joe Celko (not signed in)
Posted on: Tuesday, January 27, 2009 at 10:40 AM
Message: I would go for the single table and a current prices view. It keeps all of data in one table; I can improve performance by partitioning the table on the starting dates. I can maintain it all with one MERGE statement. But look at what you have to do with the Price History split over two tables on a temporal attribute (look up "attribute splitting" as a design flaw) .

Subject: Question on dates spanning years
Posted by: paschott (view profile)
Posted on: Wednesday, February 18, 2009 at 3:28 PM
Message: Joe,
You gave us this query:

SELECT (C2.julian_business_nbr - C1.julian_business_nbr -1) AS business_day_cnt
FROM Calendar AS C1, Calendar AS C2
WHERE C1.cal_date = '2007-04-05'
AND C2.cal_date = '2007-04-10';

However, what if our dates are something like '2007-12-25' and '2008-01-03' ? Wouldn't that return an invalid number in this case? I've run into this situation before and am curious how you'd handle it. Thanks.

Subject: Ordinal dates versus Jualinized dates versus Julian date
Posted by: Celko (view profile)
Posted on: Monday, August 10, 2009 at 2:52 PM
Message: No, the date-within-year or ordinal day(incorrectly called the Julian date) is within a single year (001-366), while a JULIAN-IZED date is a single sequence from a starting date that youpoicked. Astronomers have the real Julian date, which is a very large number based on the days since some event. See http://en.wikipedia.org/wiki/Julian_day

Set up your calendar table for a 100 years and you will be fine.

Subject: Syntax typo errors in History Table section...
Posted by: babap (view profile)
Posted on: Monday, January 18, 2010 at 2:43 PM
Message: Small typo...column in table definition reads...<price_prev_date>, but column refrence in constraints read...<prev_end_date>.

Good article, thanks.

Subject: FK prevents INSERT for the very first row....
Posted by: Toronto (view profile)
Posted on: Friday, July 23, 2010 at 3:27 PM
Message: I have cretaed table PriceHistory, as sugested, and then tried to INSERT the first row. However, FK prevented me:

INSERT INTO PriceHistory (ean, price_prev_date, price_start_date, price_end_date, item_price)
VALUES ('1234567890123','20100701','20100701',null,1.00)

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY SAME TABLE constraint "ean_prev_end_DAte". The conflict occurred in database "DMA", table "dbo.PriceHistory".

Of course, for the very first row there is no previous ending date. Am I missing something?

Subject: FK prevents INSERT for the very first row
Posted by: AntC (view profile)
Posted on: Thursday, June 23, 2011 at 2:29 AM
Message: Toronto

The problem you describe can be fixed by making price_prev_date (aka "prev_end_date") nullable, and then ensuring (e.g. via a trigger) that there is only a single row with NULL (the starting row)

 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

SQL Server XML Questions You Were Too Shy To Ask
 Sometimes, XML seems a bewildering convention that offers solutions to problems that the average... Read more...

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.