Lionel Clarke

Software Engineer - Red Gate Software

SQL Puzzle 9

Published Monday, January 07, 2008 5:56 PM

/*
Happy New Year everybody! To help with the January blues, I thought it was time for another puzzle. The inspiration for this one comes from a SQL question posed by my girlfriend. She is a geologist and, on a recent field trip to Iceland, was wandering around with a GPS strapped to her head to make 3D representations of the side of hills. At the same time she was taking pictures of rocks on her digital camera (we have the best holiday snaps!).  Her question was this: how could she match the GPS points to the photos so that she had a rough idea of where all the photos were taken.

So, on to the puzzle. We have a whole lot of points from our GPS and a whole lot of photos. Both the photos and the points have a time and date when they were taken. The puzzle is simple. I want to know which is the nearest point to each photo. Since we only have the time to go on, I want the point that is closest in terms time.

So, for example, if we have a photo taken at 13:01 and points taken at 12:55, 13.00 and 13:06 then I want the photo to be at the point taken at 13:00. Any way of doing this in SQL is fine but the simpler your solution the better. If anyone wants to show off and do this in LINQ as well then I would be interested in your answer (as somebody who likes to write stored procedures I am a bit wary of LINQ but I am very interested in learning about it). I had to make up some imaginary data and my results are:

PhotoName x           y           z
--------- ----------- ----------- -----------
Photo01   1           1           1
Photo02   1           1           1
Photo03   2           1           1
Photo04   4           1           1
Photo05   4           1           1
Photo06   5           1           1
Photo07   8           1           1
Photo08   15          1           1
Photo09   19          1           1
Photo10   20          1           1
Photo11   20          1           1
Photo12   20          1           1
Photo13   20          1           1
Photo14   21          1           1
Photo15   21          1           1


On a final note, I would like to ask a huge favour of anyone reading this. I have been writing a tool to fill a SQL Server database with test data and have recently released a beta version. I think it is a really cool tool and is something I wanted back when I was a developer DBA. If you need a quick and easy way of filling a table with some example data you will almost certainly find this handy! We could really do with some feedback so if you have a spare moment please check it out! You can download it at:

http://www.red-gate.com/messageboard/viewtopic.php?t=6140

Most importantly, if you have any comments then please post them in the forum:

http://www.red-gate.com/messageboard/viewforum.php?f=76

Thanks and have fun,

Lionel

SET NOCOUNT ON

DECLARE @Photos TABLE
(
    PhotoName NVARCHAR(7) PRIMARY KEY,
    TimeTaken DATETIME
)

INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo01', '2008-01-02 15:01:10.12')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo02', '2008-01-07 10:21:06.13')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo03', '2008-01-07 11:15:11.42')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo04', '2008-01-07 12:38:53.653')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo05', '2008-01-07 12:38:53.657')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo06', '2008-01-07 13:25:51.12')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo07', '2008-01-07 14:35:00.00')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo08', '2008-01-07 15:04:19.55')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo09', '2008-01-07 16:01:01.17')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo10', '2008-01-07 16:10:32.92')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo11', '2008-01-07 16:10:32.92')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo12', '2008-01-07 16:10:32.92')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo13', '2008-01-07 16:33:18.35')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo14', '2008-10-15 16:15:52.45')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo15', '2010-10-07 17:53:25.26')

DECLARE @Locations TABLE
(
    LocationID INT IDENTITY(1,1) PRIMARY KEY,
    x INT,
    y INT,
    z INT,
    TimeTaken DATETIME
)

INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (1,1,1,  '2008-01-07 10:00:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (2,1,1,  '2008-01-07 11:05:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (3,1,1,  '2008-01-07 11:30:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (4,1,1,  '2008-01-07 12:30:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (5,1,1,  '2008-01-07 13:10:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (6,1,1,  '2008-01-07 14:10:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (7,1,1,  '2008-01-07 14:20:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (8,1,1,  '2008-01-07 14:30:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (9,1,1,  '2008-01-07 14:40:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (10,1,1, '2008-01-07 14:50:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (12,1,1, '2008-01-07 15:01:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (13,1,1, '2008-01-07 15:02:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (14,1,1, '2008-01-07 15:03:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (15,1,1, '2008-01-07 15:04:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (16,1,1, '2008-01-07 15:05:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (17,1,1, '2008-01-07 15:06:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (18,1,1, '2008-01-07 15:07:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (19,1,1, '2008-01-07 16:08:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (20,1,1, '2008-01-07 16:09:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (21,1,1, '2008-01-07 17:00:00.00')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (22,1,1, '2008-01-07 17:00:00.00')

SET NOCOUNT OFF

by Lionel

Comments

 

GPS » SQL Puzzle 9 said:

January 7, 2008 3:12 PM
 

» Daily Bits - January 8, 2008 Alvin Ashcraft’s Daily Geek Bits: Daily links plus random ramblings about development, gadgets and raising rugrats. said:

January 8, 2008 7:25 AM
 

Brad24 said:

INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo15', '2010-10-07 17:53:25.26')

This seems to have been taken two years after the others. Your girlfriend was there for a long time.

INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo14', '2008-10-15 16:15:52.45')

good heavens, still there next october!
January 8, 2008 10:16 AM
 

Peter Day said:

SELECT  photoname,
       [where_x] = MIN(x)
FROM    
 ( SELECT    photoname,
             [closest] = MIN(closeness),
             [whentaken] = MIN(phototimetaken)
   FROM      
     ( SELECT    photoname,
       [phototimeTaken] = p.timeTaken,
       closeness = ABS(DATEDIFF(second, l.timeTaken,p.timeTaken))
       FROM      @locations l
          CROSS JOIN @photos p
     ) f
  GROUP BY  photoname
  ) g
INNER JOIN @locations l
ON closest = ABS(DATEDIFF(second, l.timeTaken, whentaken))
GROUP BY photoname
January 8, 2008 10:48 AM
 

Lionel said:

Brad24: I meant to add in some extreme cases so you could not just do a DateDiff but I got it wrong :/. Just thought I would try to make the puzzle a bit more interesting :)

Peter Day: Looks good! There are a few things you solution doesn't quite do which
makes the whole thing a bit more interesting. So I have added a few test cases:

INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (23,1,1, '2000-01-07 17:00:00.003')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (24,1,1, '2000-01-07 17:00:00.005')
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (25,1,1, '2000-01-07 17:00:01.000')

INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo16', '2000-01-07 17:00:00.007')

INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (26,1,1, '9999-01-07 17:00:00.000')
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES ('Photo17', '9999-01-07 17:00:00.000')

Which should give you:

PhotoName x           y           z
--------- ----------- ----------- -----------
Photo16   24          1           1
Photo17   26          1           1

A good soution though!

January 8, 2008 12:26 PM
 

sangha said:

SELECT Pho.PhotoName,(SELECT TOP 1 LocationID FROM @Locations Loc
ORDER BY ABS(CAST(Loc.TimeTaken AS decimal(12,5)) - CAST(Pho.TimeTaken AS decimal(12,5)))
) AS LocationID
FROM @Photos Pho
January 8, 2008 5:50 PM
 

sangha said:

SELECT Pho.PhotoName,(SELECT TOP 1 Loc.LocationID FROM @Locations Loc
ORDER BY ABS(CAST(Pho.TimeTaken AS decimal(20,12))- CAST(Loc.TimeTaken AS decimal(20,12)))
) AS LocationID
FROM @Photos Pho
January 8, 2008 6:06 PM
 

sangha said:

The following query will return the output that you want:-

SELECT Photos.PhotoName, Locations.x, Locations.y, Locations.z FROM
(SELECT Pho.PhotoName,(SELECT TOP 1 Loc.LocationID FROM @Locations Loc
ORDER BY ABS(CAST(Pho.TimeTaken AS decimal(20,12))- CAST(Loc.TimeTaken AS decimal(20,12)))
) AS LocationID
FROM @Photos Pho) Photos
INNER JOIN @Locations Locations ON Photos.LocationID = Locations.LocationID
January 8, 2008 6:16 PM
 

BorisNikolaevich said:

Includes the extra test cases you posted for Peter!

SELECT PhotoName, x, y, z
FROM @Photos AS Photos
CROSS JOIN @Locations AS Locations
WHERE Locations.LocationID IN
(
SELECT TOP 1 LocationID FROM @Locations AS NearestLocations
ORDER BY ABS(CAST(NearestLocations.TimeTaken AS float) - CAST(Photos.TimeTaken AS float))
)
ORDER BY PhotoName
January 8, 2008 6:39 PM
 

JRMorris said:

SELECT p.PhotoName,
L.x, L.y, L.z
FROM @Photos P, @Locations L
WHERE ABS((CAST(P.TimeTaken as float)-CAST(L.TimeTaken as float))) =
(SELECT MIN (ABS((CAST(P2.TimeTaken as float)-CAST(L2.TimeTaken as float))))
FROM @Photos P2, @Locations L2 WHERE P2.PhotoName = P.PhotoName)
ORDER BY
P.PhotoName
January 9, 2008 2:55 AM
 

Anders said:

This will give the correct answer

SELECT PhotoName, PhotoTime, LocationId, LocationTime
FROM
(SELECT
A.PhotoName,
A.TimeTaken AS PhotoTime,
B.TimeTaken AS LocationTime,
B.LocationId,
ABS(DATEDIFF(ss, A.TimeTaken, B.TimeTaken)) AS TimeDiffAbs,
ROW_NUMBER() OVER (PARTITION BY PhotoName ORDER BY ABS(DATEDIFF(ss, A.TimeTaken, B.TimeTaken))) AS RANK
FROM @Photos A LEFT JOIN
@Locations B
ON A.TimeTaken <> B.TimeTaken OR A.TimeTaken = B.TimeTaken) A
WHERE RANK = 1
ORDER BY PhotoTime

The only "cheat" is that I assume more than one second timedifference. Using ms causes an overflow (but a workaround would probably not be too difficult)
January 9, 2008 4:02 AM
 

StefanG said:

SELECT photoname, x, y, z
FROM @photos a CROSS JOIN @locations b
WHERE b.locationid = (
SELECT TOP 1 locationid FROM @locations c
ORDER BY abs(cast(c.TimeTaken as float) - cast(a.TimeTaken as float))
)
January 9, 2008 4:14 AM
 

jimness said:

OK first time poster here....

I thought I'd try something different, i.e. to calculate the most likely coordinates assuming that you travelled in a straight line between taking GPS locations.  So, for example, having taken locations at 13:00 and 13:05 and you want to see where you were for a photo taken at 13:02, you should be 2/5 of the way along the line between the two points.  Here's the SQL that solves it for the above (I think):

SELECT st2.PhotoName,
CASE WHEN x1 < x2
THEN x1 + FractionOfAB * (x2 - x1)
ELSE x2 + (1 - FractionOfAB) * (x1 - x2)
END AS ActualX,
CASE WHEN y1 < y2
THEN y1 + FractionOfAB * (y2 - y1)
ELSE y2 + (1 - FractionOfAB) * (y1 - y2)
END AS ActualY,
CASE WHEN z1 < z2
THEN z1 + FractionOfAB * (z2 - z1)
ELSE z2 + (1 - FractionOfAB) * (z1 - z2)
END AS ActualZ

FROM (
SELECT st1.PhotoName,
PhotoTimeFloat - CONVERT(FLOAT, lu.TimeTaken) AS DistanceFromPointA,
CONVERT(FLOAT, lo.TimeTaken) - PhotoTimeFloat AS DistanceFromPointB,
CASE
WHEN CONVERT(FLOAT, lo.TimeTaken) = CONVERT(FLOAT, lu.TimeTaken) THEN 0
ELSE (PhotoTimeFloat - CONVERT(FLOAT, lu.TimeTaken)) / (CONVERT(FLOAT, lo.TimeTaken) - CONVERT(FLOAT, lu.TimeTaken))
END AS FractionOfAB,
lu.x AS x1,
lo.x AS x2,
lu.y AS y1,
lo.y AS y2,
lu.z AS z1,
lo.z AS z2
FROM (
SELECT p.PhotoName,
CONVERT(FLOAT, p.TimeTaken) AS PhotoTimeFloat,
(SELECT TOP 1 LocationId FROM @Locations WHERE TimeTaken <= p.TimeTaken ORDER BY TimeTaken DESC) AS ClosestLocationTimeUnderId,
(SELECT TOP 1 LocationId FROM @Locations WHERE TimeTaken >= p.TimeTaken ORDER BY TimeTaken ASC) AS ClosestLocationTimeOverId
FROM @Photos p
) st1
JOIN @Locations lu ON st1.ClosestLocationTimeUnderId = lu.LocationId
JOIN @Locations lo ON st1.ClosestLocationTimeOverId = lo.LocationId
) st2


Quite a bit more convoluted and could have been compacted more, I'm sure.  Also I'm sure there's a better way using trig but I couldn't be bothered.

Jim
January 9, 2008 6:51 AM
 

Brian@Chaco said:

I would have posted basically what BorisNikolaevich and StefanG came up with.  However, I have seen slightly faster performance with the IN construct Boris uses, especially in larger data sets, which I believe is because the SQL execution plan is able to use "if any matching record in the set exists" comparison instead of 1-to-1 equality.

However, I think Jim's interpretation about travelling in a straight line between coordinates is a really great twist on the puzzle!
January 9, 2008 10:25 AM
 

mikegood said:

I really like StefanG's solution.  Posts by JRMorris & Boris  are similar, but Stefan's post is more concise and maybe twice as efficient.

Here is solution that's about twice as efficient as StefanG's (on my server anyway), and it may scale better.  But it's not nearly as elegant.

with cte as
(
   select  row_number() over (order by TimeTaken) as RowID, x, y, z, IsFake, TimeTaken
     from (select  x, y, z, 0 as IsFake, TimeTaken from Locations
           union all
           select  0, 0, 0, 1, '1753-01-01'
           union all
           select  0, 0, 0, 1, '9999-12-31 23:59:59') as x
)
select  p.PhotoName,
       case when c2.IsFake = 1
              or abs(cast(c1.TimeTaken as float) - cast(p.TimeTaken as float)) <=
                 abs(cast(c2.TimeTaken as float) - cast(p.TimeTaken as float)) then c1.x
            else c2.x
       end as x,
       case when c2.IsFake = 1
              or abs(cast(c1.TimeTaken as float) - cast(p.TimeTaken as float)) <=
                 abs(cast(c2.TimeTaken as float) - cast(p.TimeTaken as float)) then c1.y
            else c2.y
       end as y,
       case when c2.IsFake = 1
              or abs(cast(c1.TimeTaken as float) - cast(p.TimeTaken as float)) <=
                 abs(cast(c2.TimeTaken as float) - cast(p.TimeTaken as float)) then c1.z
            else c2.z
       end as z
 from  Photos p
cross  join cte c1
 left  join cte c2 on c2.RowID = c1.RowID + 1 and c2.TimeTaken > c1.TimeTaken
where  p.TimeTaken >= c1.TimeTaken and p.TimeTaken < c2.TimeTaken
order  by p.PhotoName
January 9, 2008 11:57 AM
 

mikegood said:

Oops!  Too late I considered performance implications of table vars vs actual tables.  I'd used tables to work this out, not table vars, and performance implications are different.  

When using table vars, as the problem requires, StefanG's post proves best all around.  

At least with a small data set.  Going to check on this a little further....  I realize its silly talking performance for small data like this, was trying to consider general solution that would work well against larger data set.
January 9, 2008 12:24 PM
 

mikegood said:

StefanG's approach is even better with large data, whether using table vars or tables.  Only time my post was better was when using real tables & very little data...wish I could withdraw that post!

This puzzle looks to me like sort of problem where row_number() based solution would make a big performance improvement.  But I guess not.
January 9, 2008 12:46 PM
 

CBMcKinney said:

I like StefanG's as well, except I'd change it to
"ORDER BY abs(convert(float, c.TimeTaken - a.TimeTaken)))"
to make it match the earlier of two points when the TimeTaken is exactly halfway between GPS points.
January 9, 2008 1:42 PM
 

tbomberger said:

; WITH all_combos
   AS (
SELECT PhotoName
    , x
    , y
    , z
    , row_number() over
                (partition by PhotoName
                     order by PhotoName
                            , case when l.timetaken > p.TimeTaken
                                   then l.timetaken - p.TimeTaken
                                   else p.timetaken - l.TimeTaken end
                            , l.TimeTaken
                            , l.LocationID ) rownum
 FROM @Photos     p
CROSS
 JOIN @Locations  l
      )
SELECT *
 FROM all_combos
WHERE rownum = 1
January 9, 2008 2:39 PM
 

BorisNikolaevich said:

Heh—I wasn't thinking about whether "IN" or "=" was more efficient; I just can't wrap my head around being able to say "equals ... top 1" and have the query return more than one record!  I see it, it works, I just can't define it!  Even good SQL guys have to learn something new every day.

As far as being more concise... I'm not sure I buy that part.  The spelled-out versus single-letter table aliases and line breaks appear to be the only significant difference between the two queries, and I'll bet even the indentation was the same before pasting into the comments box!  Don't get me wrong, I'm not in any way trying to make StefanG's post less awesome; it rocks.  I'm just questioning the implication of "it's better because it uses fewer characters" regardless of the actual content & structure of the query.

As a side note (and again nothing to do with StefanG's excellent, effective, and concise solution), my developers are not allowed to use single-letter aliases, or to alias tables and columns without the AS keyword, in any production or example/demonstration code (ad-hoc queries are totally exempt, of course!)  In a team environment, it's not about saving yourself a few keystrokes, it's about saving time later when someone has to look at the code again.  The "broken-record mantra" I am known for is, "Be nice to the next developer—it might be you!"  (You never know when you are going to have to go back and look at your own code in six months or a year, and although today your brilliant code is crystal clear, when your head has been in other projects for any length of time it can take some effort to read through and remember what "t1" and "t2" referred to!)  For this reason, since I type pretty quickly anyway, I tend to spell things out even when I am just submitting a response to one of Lionel's puzzles... <grin>
January 9, 2008 3:15 PM
 

ethan said:

select *
from (select p.*, LocationID = (select top 1 locationID from @locations l order by abs(datediff(s, p.timetaken, l.timetaken)))
  from @photos p) P2,
  @locations l1
where P2.locationid = l1.locationid
January 9, 2008 3:49 PM
 

StefanG said:

Boris, funny to read that you had problems understanding the = (SELECT TOP 1 x)  syntax.

I myself had trouble grasping what your IN (SELECT TOP 1 x) syntax was really doing. :-)

Of course, both variants are functionally equivalent. For some reason though, the query plans are different.

As far as I can understand after comparing the query plans, the = version should be slightly more efficient for larger datasets.

As for the slight differences in coding style (single-letter aliases and using the AS keyword) which is best is entirely a matter of personal taste.
I myself tend to use single-letter aliases just to make it perfectly clear to the next developer that he is looking at an alias and not a real table name.

Funny that two people can come up with two so similar solutions independently.
January 9, 2008 5:34 PM
 

the_englishman said:

This should offer the best combination of conciseness (is that a word) and performance

SELECT *
FROM
(
SELECT PhotoName,
x,
y,
z,
ROW_NUMBER() OVER
(
PARTITION BY PhotoName
ORDER BY ABS( CAST(p.TimeTaken AS FLOAT)- CAST(l.TimeTaken AS FLOAT))
) AS rn
FROM @Photos p
CROSS JOIN @Locations l
) a
WHERE a.[rn] = 1
January 9, 2008 6:23 PM
 

the_englishman said:

Sorry, was not copying tbomberger, I did not see his post until now. His case statement would best be replaced with the abs function.
January 9, 2008 6:28 PM
 

rmallamace said:

Here's a solution using "not exists" so it should run on 2000 as well.
I haven't compared it to the others for speed though.

select phto.PhotoName,
      locn.x,
      locn.y,
      locn.z
from @photos phto,
    @locations locn
where not exists
     (
      select *
      from @locations locnDiff
      where abs(datediff(s, phto.timetaken, locn.timetaken))
             > abs(datediff(s, phto.timetaken, locnDiff.timetaken))
     )
 and not exists
     (
      select *
      from @locations locnSame
      where locn.LocationID != locnSame.LocationID
        and abs(datediff(s, phto.timetaken, locn.timetaken))
             = abs(datediff(s, phto.timetaken, locnSame.timetaken))
        and locnSame.LocationID < locn.LocationID
     )
January 10, 2008 2:04 AM
 

Phil Factor said:

One's first instinct is to do a linear regression in TSQL, which isn't too difficult
see http://www.webservertalk.com/archive133-2004-2-111656.html

The trouble is that the data is too poor and it does not allow us to do a Linear regression, something good like a least squares analysis, as the data isn't homoscedastic. (the errors are not normally distributed)

From the data, it is clear that she is walking in a west-east direction, but the readings don't tell us much.

Remarkably, in the final reading, she is in two places at once. (21 and 22). She also could haver wandered about all over the place from 10 am intil five past 11 am. She probably nipped off to a nice warm spot to have a cup of warm cocoa. It is difficult to make any assumptions, even when the times tally

Do we, for example, assume that she is moving slowly, or darting about from rock to rock?

Given better data, I'd do a polynomial each on X, Y, and Z and then calculate the prediction for the location for any time in the range. But that would assume she is walking constantly, rather than stopping and starting.
January 10, 2008 5:46 AM
 

tbomberger said:

Reason  I went to the case statement (I tried a number of variants of abs(cast()/convert()))...   This (the case statement version) works with the additional extreme data points that were added after the initial list of data points and it chooses the earlier of the two position data points (8,9) that are both offset by 5 minutes from the time of Photo07.  The answer set provided chooses position 8 which is 5 minutes prior to the time of the photo.  Using abs and the then required cast/convert to float selects for position 9 which is 5 minutes after time of the photo because either there is a floating point error or there is an additional millisecond or three included in the conversion to float on some datetimes.  More problematic is the fact that, with the additional rows added with extreme dates/times, it is difficult to find a combination of the arithmetic that does not cause an overflow.
January 10, 2008 7:34 AM
 

StefanG said:

Well, if you really want to be sure that you get the earliest location if the time difference to two locations is exactly the same, you could use the following order by clause:

ORDER BY abs(cast(c.TimeTaken as float) - cast(a.TimeTaken as float)), c.LocationID

No CASE expression is necessary.

casting datetime to float before subtraction also avoids any numeric overflows.
January 10, 2008 8:01 AM
 

tbomberger said:

Yes it does and this is one of the combinations that i attempted.  Perhaps one of the settings on my SQL Server instance is different than yours, but with the ORDER BY you (StefanG) provided, the result I get is x=9 for PhotoName = 'Photo07'.  It should be x=8 according to Lionel's answer set.

position 8 = 14:30
      eval of abs(cast()-cast())  8 to 7-->     0.00347222222626442
picture  7 =  14:35
      eval of abs(cast()-cast())  7 to 9-->     0.00347222221898846
position 9 =  14:40

so it chooses 9 by the barest of margins
January 10, 2008 8:26 AM
 

the_englishman said:

Concise CTE solution:

WITH cte AS
(
SELECT PhotoName,
x,
y,
z,
ABS( CAST(p.TimeTaken AS FLOAT) - CAST(l.TimeTaken AS FLOAT) ) AS diff
FROM @Photos p
CROSS JOIN @Locations l
)
SELECT a.[PhotoName], b.x, b.y, b.z
FROM
(
SELECT PhotoName, MIN(diff) AS diff
FROM cte
GROUP BY PhotoName
) a
JOIN [cte] b
ON a.PhotoName = b.PhotoName
AND a.[diff] = b.diff

I still need to fix the photo7 example
January 10, 2008 8:50 AM
 

mikegood said:

Englishman- Testing @tablevars, real tables, default data set, and default + 1000 rows added to each table, StefanG's post is still best performer (on my server).  The plans look better & cheaper & duration testing confirms this:

DataSet           TableType     Algorithm         Duration
default             table var       StefanG                  0 ms
default             table var       Englishman           16 ms
default             real table      StefanG                16 ms
default             real table      Englishman           63 ms
default+1000    table var       StefanG             1153 ms
default+1000    table var       Englishman      10690 ms
default+1000    real table      StefanG             1796 ms
default+1000    real table      Englishman        9360 ms
January 10, 2008 9:32 AM
 

mikegood said:

RMallamace - Tried to similarly test your post, but get "Difference of two datetime columns caused overflow at runtime" with default data set (data from original question plus a few more rows suggested in subsequent posts by Lionel).  

I believe you can't assume that you can successfully pull off datediff(s, ...), which is why many of the other posts instead cast the datetime as float & use abs(cast(...) - cast(...)) approach instead.  

I made this change & then re-ran tests.  Ok for small data, but does not scale.  

DataSet           TableType     Algorithm         Duration
default             table var       RMallamace      16 ms
default             real table      RMallamace      30 ms
default+1000    table var       RMallamace      247050 ms
default+1000    real table      RMallamace      DNF

January 10, 2008 9:33 AM
 

tbomberger said:

Englishman -  Concise CTE solution, You also need to check Photo14 and Photo15 as each is returning 2 rows -- positions 21 and 22
January 10, 2008 9:52 AM
 

mikegood said:

Englishman - results I posted above applied to your earlier algorithm.  After I posted them I saw your changes.  Seemed like you were going for an accuracy fix, but your new post is a bit more efficient too - nice work.

DataSet           TableType     Algorithm         Duration
default             table var       Englishman           16 ms
default             real table      Englishman           30 ms
default+1000    table var       Englishman        1296 ms
default+1000    real table      Englishman        1533 ms

Seeing this, I tried larger data sets looking to find a trend, didn't really see one.  There's quite a bit of variability and I'm not repeating tests enough to be certain of anything.  But generally the StefanG algorithm seems to perform a little better.  
January 10, 2008 10:33 AM
 

BorisNikolaevich said:

That single change from my 'IN' to StafanG's 'equals' makes a HUGE difference, and I have gone back and modified several production queries where I used the same 'IN' syntax to achieve a great performance gain!  Still having trouble with the "equals a TOP 1 result" concept, but I'm sure happy with the output.

The best puzzles are those that lend themselves to some real-life applications.  And this one was FUN, too—I will be participating in all of Lionel's puzzles from here on out!
January 10, 2008 11:41 AM
 

the_englishman said:

Ok, this works correctly, and has not duplicates. Technically I think my original answer was correct, as there is not logical way of picking between the two locations when they are the same difference. I like the ingenuity of my solution, although it's a bit hacky:

WITH cte AS
(
SELECT PhotoName,
x,
y,
z,
ABS( CAST(p.TimeTaken AS FLOAT) - CAST(l.TimeTaken AS FLOAT) ) + CAST(l.LocationID AS FLOAT)/ 100000000000
AS diff
FROM @Photos p
CROSS JOIN @Locations l
)
SELECT a.[PhotoName], a.x, a.y, a.z
FROM cte a
JOIN
(
SELECT photoname, MIN(diff) AS diff
FROM cte b
GROUP BY [photoname]
) c
ON a.photoname = c.photoname
AND a.diff = c.diff
January 10, 2008 12:09 PM
 

the_englishman said:

Actuaslly, the cast is redundant in this case it seems:

WITH cte AS
(
SELECT PhotoName,
x,
y,
z,
ABS( CAST(p.TimeTaken AS FLOAT) - CAST(l.TimeTaken AS FLOAT) ) + l.LocationID / 100000000000
AS diff
FROM @Photos p
CROSS JOIN @Locations l
)
SELECT a.[PhotoName], a.x, a.y, a.z
FROM cte a
JOIN
(
SELECT photoname, MIN(diff) AS diff
FROM cte b
GROUP BY [photoname]
) c
ON a.photoname = c.photoname
AND a.diff = c.diff

How do you get the format to work correctly?
January 10, 2008 12:12 PM
 

the_englishman said:

This looks smaller:

WITH cte AS
(
SELECT PhotoName, x, y, z,
ABS( CAST(p.TimeTaken AS FLOAT) - CAST(l.TimeTaken AS FLOAT) )
+ l.LocationID / 100000000000 AS diff
FROM @Photos p
CROSS JOIN @Locations l
)
SELECT a.[PhotoName], x, y, z
FROM cte a
JOIN
( SELECT photoname, MIN(diff) AS diff
FROM cte b
GROUP BY [photoname]
) c
ON a.photoname = c.photoname
AND a.diff = c.diff
January 10, 2008 12:15 PM
 

the_englishman said:

Ok, this is more concise, probably performs the same:

WITH cte AS
(
SELECT PhotoName, x, y, z,
ABS( CAST(p.TimeTaken AS FLOAT) - CAST(l.TimeTaken AS FLOAT) )
+ l.LocationID / 100000000000 AS diff
FROM @Photos p
CROSS JOIN @Locations l
)
SELECT a.[PhotoName], x, y, z FROM cte a
WHERE EXISTS
( SELECT 1 FROM cte b
WHERE a.photoname = b.photoname
GROUP BY [photoname] HAVING MIN(diff) = a.diff
)

I did not know I could do that with an exists. Cool
January 10, 2008 12:26 PM
 

Lionel said:

I think all of your solutions are looking a whole lot better than mine! Sorry I don't have time atm to go through and write a proper reply to all the comments (I am being a bit very useless at putting comments in my own blog). I will try and post something more tomorrow. I though I should post up my solution so you guys could see how I tried to solve it.

SELECT PhotoLocations.PhotoName, l.x, l.y, l.z

FROM @Locations l
INNER JOIN
(
SELECT PhotoDelta.PhotoName, MIN(LocationID) AS LocationID

FROM @Locations l
INNER JOIN
(

SELECT PhotoName, p.TimeTaken AS PhotoTimeTaken,
MIN(CASE WHEN l.TimeTaken > p.TimeTaken THEN l.TimeTaken - p.TimeTaken ELSE p.TimeTaken - l.TimeTaken END) AS MinDelta
FROM @Locations l CROSS JOIN @Photos p
GROUP BY PhotoName, p.TimeTaken

) PhotoDelta
ON l.TimeTaken = PhotoDelta.PhotoTimeTaken + PhotoDelta.MinDelta
OR l.TimeTaken = PhotoDelta.PhotoTimeTaken - PhotoDelta.MinDelta

GROUP BY PhotoDelta.PhotoName

) PhotoLocations
ON l.LocationID = PhotoLocations.LocationID
January 10, 2008 12:51 PM
 

BorisNikolaevich said:

Brian@Chaco — I'd love to know where you have "seen" better performance with the IN construct, since every query I have changed to use = instead of my original IN has improved.  In fact the larger the data set, the better the improvement after the change!

I think you are blowing smoke.
January 10, 2008 1:38 PM
 

StefanG said:

Okay, tbomberger is right - it is not a good idea to subtract two almost identical floats. This causes cancellation which means that we lose precision in the result.

To demonstrate this you can run the following:

declare @p1 datetime, @p2 datetime, @p3 datetime
select @p1 = '20080101 14:30', @p2 = '20080101 14:35', @p3 = '20080101 14:40'

select cast(@p2 as float)-cast(@p1 as float)
select cast(@p3 as float)-cast(@p2 as float)

select @p2-@p1
select @p3-@p2

This gives the following result:

3.4722222262644209E-3
3.4722222189884633E-3
1900-01-01 00:05:00.000
1900-01-01 00:05:00.000

As you can see, casting to float before subtracting means that we get different results which means that the my original query does not work properly.

I now think that the best query is a combination of tbombergers case technique and my original query which results in a query that is very efficient, absolutely correct, and also works perfectly on both SQL2000 and SQL2005

Like this:

SELECT photoname, x, y, z
FROM @photos a CROSS JOIN @locations b
WHERE b.locationid = (
SELECT TOP 1 locationid FROM @locations c
ORDER BY CASE
WHEN c.TimeTaken > a.TimeTaken
THEN c.TimeTaken - a.TimeTaken
ELSE a.TimeTaken - c.TimeTaken
END, c.LocationID
)

January 10, 2008 4:10 PM
 

mikegood said:

StefangG - very nice!  Took me a while to prove to myself, because was not obvious to me why case stmt better than abs when first looking at it.  

PS If you cast as numeric instead of float, the error disappears, at least for your example above.  Regarless, your last post is pretty clear case stmt is not as hackish (I think) as casting datetimes to numbers.  And I think it is faster.
January 10, 2008 5:09 PM
 

rmallamace said:

mikegood - thanks for the info on timings. I'll keep that in mind for future SQL.
And I agree with StefagG's stategy too - I think its the hit the mark!
January 11, 2008 12:21 AM
 

Outside the tent.... said:

There is a great deal of activity on the Red-Gate forums at the moment. In fact, they are a fascinating...
February 7, 2008 9:37 AM
 

http://www.simple-talk.com/community/blogs/lionel/archive/2008/01/07/42301.aspx said:

April 5, 2008 8:55 AM
You need to sign in to comment on this blog

















<January 2008>
SuMoTuWeThFrSa
303112345
6789101112
13141516171819
20212223242526
272829303112
3456789
Virtual Exchange Servers
 Microsoft now supports running Exchange Server 2007 in server virtualization environments, not just on... Read more...

Virtualizing Exchange: points for discussion
 With the increasing acceptance of the use of Virtualization as a means of providing server... Read more...

Encouraging .NET Reflector Add-ins
 Jason Haley is well-known for the resources he's provided to developers who wish to extend Reflector's... Read more...

Using .NET Reflector Add-ins
 .NET Reflector by itself is great, but it really comes into its own with the help of some add-ins. Here... Read more...

Unique Experiences!
 You'd have thought that a unique constraint was an easy concept - Not a bit of it; it can cause a lot... Read more...