Click here to monitor SSC

Software Engineer - Red Gate Software

SQL Puzzle 9

Published 7 January 2008 11:56 am

/*
Happy New Year everybody! To help with the January blues, I thought it was time for another puzzle. The inspiration for this one comes from a SQL question posed by my girlfriend. She is a geologist and, on a recent field trip to Iceland, was wandering around with a GPS strapped to her head to make 3D representations of the side of hills. At the same time she was taking pictures of rocks on her digital camera (we have the best holiday snaps!).  Her question was this: how could she match the GPS points to the photos so that she had a rough idea of where all the photos were taken.

So, on to the puzzle. We have a whole lot of points from our GPS and a whole lot of photos. Both the photos and the points have a time and date when they were taken. The puzzle is simple. I want to know which is the nearest point to each photo. Since we only have the time to go on, I want the point that is closest in terms time.

So, for example, if we have a photo taken at 13:01 and points taken at 12:55, 13.00 and 13:06 then I want the photo to be at the point taken at 13:00. Any way of doing this in SQL is fine but the simpler your solution the better. If anyone wants to show off and do this in LINQ as well then I would be interested in your answer (as somebody who likes to write stored procedures I am a bit wary of LINQ but I am very interested in learning about it). I had to make up some imaginary data and my results are:

PhotoName x           y           z
——— ———– ———– ———–
Photo01   1           1           1
Photo02   1           1           1
Photo03   2           1           1
Photo04   4           1           1
Photo05   4           1           1
Photo06   5           1           1
Photo07   8           1           1
Photo08   15          1           1
Photo09   19          1           1
Photo10   20          1           1
Photo11   20          1           1
Photo12   20          1           1
Photo13   20          1           1
Photo14   21          1           1
Photo15   21          1           1


On a final note, I would like to ask a huge favour of anyone reading this. I have been writing a tool to fill a SQL Server database with test data and have recently released a beta version. I think it is a really cool tool and is something I wanted back when I was a developer DBA. If you need a quick and easy way of filling a table with some example data you will almost certainly find this handy! We could really do with some feedback so if you have a spare moment please check it out! You can download it at:

http://www.red-gate.com/messageboard/viewtopic.php?t=6140

Most importantly, if you have any comments then please post them in the forum:

http://www.red-gate.com/messageboard/viewforum.php?f=76

Thanks and have fun,

Lionel

SET NOCOUNT ON

DECLARE @Photos TABLE
(
    PhotoName NVARCHAR(7) PRIMARY KEY,
    TimeTaken DATETIME
)

INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo01′, ’2008-01-02 15:01:10.12′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo02′, ’2008-01-07 10:21:06.13′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo03′, ’2008-01-07 11:15:11.42′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo04′, ’2008-01-07 12:38:53.653′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo05′, ’2008-01-07 12:38:53.657′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo06′, ’2008-01-07 13:25:51.12′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo07′, ’2008-01-07 14:35:00.00′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo08′, ’2008-01-07 15:04:19.55′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo09′, ’2008-01-07 16:01:01.17′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo10′, ’2008-01-07 16:10:32.92′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo11′, ’2008-01-07 16:10:32.92′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo12′, ’2008-01-07 16:10:32.92′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo13′, ’2008-01-07 16:33:18.35′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo14′, ’2008-10-15 16:15:52.45′)
INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo15′, ’2010-10-07 17:53:25.26′)

DECLARE @Locations TABLE
(
    LocationID INT IDENTITY(1,1) PRIMARY KEY,
    x INT,
    y INT,
    z INT,
    TimeTaken DATETIME
)

INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (1,1,1,  ’2008-01-07 10:00:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (2,1,1,  ’2008-01-07 11:05:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (3,1,1,  ’2008-01-07 11:30:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (4,1,1,  ’2008-01-07 12:30:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (5,1,1,  ’2008-01-07 13:10:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (6,1,1,  ’2008-01-07 14:10:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (7,1,1,  ’2008-01-07 14:20:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (8,1,1,  ’2008-01-07 14:30:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (9,1,1,  ’2008-01-07 14:40:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (10,1,1, ’2008-01-07 14:50:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (12,1,1, ’2008-01-07 15:01:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (13,1,1, ’2008-01-07 15:02:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (14,1,1, ’2008-01-07 15:03:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (15,1,1, ’2008-01-07 15:04:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (16,1,1, ’2008-01-07 15:05:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (17,1,1, ’2008-01-07 15:06:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (18,1,1, ’2008-01-07 15:07:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (19,1,1, ’2008-01-07 16:08:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (20,1,1, ’2008-01-07 16:09:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (21,1,1, ’2008-01-07 17:00:00.00′)
INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (22,1,1, ’2008-01-07 17:00:00.00′)

SET NOCOUNT OFF

47 Responses to “SQL Puzzle 9”

  1. Brad24 says:

    INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo15′, ’2010-10-07 17:53:25.26′)

    This seems to have been taken two years after the others. Your girlfriend was there for a long time.

    INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo14′, ’2008-10-15 16:15:52.45′)

    good heavens, still there next october!

  2. Peter Day says:

    SELECT photoname,
    [where_x] = MIN(x)
    FROM
    ( SELECT photoname,
    [closest] = MIN(closeness),
    [whentaken] = MIN(phototimetaken)
    FROM
    ( SELECT photoname,
    [phototimeTaken] = p.timeTaken,
    closeness = ABS(DATEDIFF(second, l.timeTaken,p.timeTaken))
    FROM @locations l
    CROSS JOIN @photos p
    ) f
    GROUP BY photoname
    ) g
    INNER JOIN @locations l
    ON closest = ABS(DATEDIFF(second, l.timeTaken, whentaken))
    GROUP BY photoname

  3. Lionel says:

    Brad24: I meant to add in some extreme cases so you could not just do a DateDiff but I got it wrong :/. Just thought I would try to make the puzzle a bit more interesting :)

    Peter Day: Looks good! There are a few things you solution doesn’t quite do which
    makes the whole thing a bit more interesting. So I have added a few test cases:

    INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (23,1,1, ’2000-01-07 17:00:00.003′)
    INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (24,1,1, ’2000-01-07 17:00:00.005′)
    INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (25,1,1, ’2000-01-07 17:00:01.000′)

    INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo16′, ’2000-01-07 17:00:00.007′)

    INSERT INTO @Locations(x,y,z, TimeTaken) VALUES (26,1,1, ’9999-01-07 17:00:00.000′)
    INSERT INTO @Photos(PhotoName, TimeTaken) VALUES (‘Photo17′, ’9999-01-07 17:00:00.000′)

    Which should give you:

    PhotoName x y z
    ——— ———– ———– ———–
    Photo16 24 1 1
    Photo17 26 1 1

    A good soution though!

  4. sangha says:

    SELECT Pho.PhotoName,(SELECT TOP 1 LocationID FROM @Locations Loc
    ORDER BY ABS(CAST(Loc.TimeTaken AS decimal(12,5)) – CAST(Pho.TimeTaken AS decimal(12,5)))
    ) AS LocationID
    FROM @Photos Pho

  5. sangha says:

    SELECT Pho.PhotoName,(SELECT TOP 1 Loc.LocationID FROM @Locations Loc
    ORDER BY ABS(CAST(Pho.TimeTaken AS decimal(20,12))- CAST(Loc.TimeTaken AS decimal(20,12)))
    ) AS LocationID
    FROM @Photos Pho

  6. sangha says:

    The following query will return the output that you want:-

    SELECT Photos.PhotoName, Locations.x, Locations.y, Locations.z FROM
    (SELECT Pho.PhotoName,(SELECT TOP 1 Loc.LocationID FROM @Locations Loc
    ORDER BY ABS(CAST(Pho.TimeTaken AS decimal(20,12))- CAST(Loc.TimeTaken AS decimal(20,12)))
    ) AS LocationID
    FROM @Photos Pho) Photos
    INNER JOIN @Locations Locations ON Photos.LocationID = Locations.LocationID

  7. BorisNikolaevich says:

    Includes the extra test cases you posted for Peter!

    SELECT PhotoName, x, y, z
    FROM @Photos AS Photos
    CROSS JOIN @Locations AS Locations
    WHERE Locations.LocationID IN
    (
    SELECT TOP 1 LocationID FROM @Locations AS NearestLocations
    ORDER BY ABS(CAST(NearestLocations.TimeTaken AS float) – CAST(Photos.TimeTaken AS float))
    )
    ORDER BY PhotoName

  8. JRMorris says:

    SELECT p.PhotoName,
    L.x, L.y, L.z
    FROM @Photos P, @Locations L
    WHERE ABS((CAST(P.TimeTaken as float)-CAST(L.TimeTaken as float))) =
    (SELECT MIN (ABS((CAST(P2.TimeTaken as float)-CAST(L2.TimeTaken as float))))
    FROM @Photos P2, @Locations L2 WHERE P2.PhotoName = P.PhotoName)
    ORDER BY
    P.PhotoName

  9. Anders says:

    This will give the correct answer

    SELECT PhotoName, PhotoTime, LocationId, LocationTime
    FROM
    (SELECT
    A.PhotoName,
    A.TimeTaken AS PhotoTime,
    B.TimeTaken AS LocationTime,
    B.LocationId,
    ABS(DATEDIFF(ss, A.TimeTaken, B.TimeTaken)) AS TimeDiffAbs,
    ROW_NUMBER() OVER (PARTITION BY PhotoName ORDER BY ABS(DATEDIFF(ss, A.TimeTaken, B.TimeTaken))) AS RANK
    FROM @Photos A LEFT JOIN
    @Locations B
    ON A.TimeTaken <> B.TimeTaken OR A.TimeTaken = B.TimeTaken) A
    WHERE RANK = 1
    ORDER BY PhotoTime

    The only “cheat” is that I assume more than one second timedifference. Using ms causes an overflow (but a workaround would probably not be too difficult)

  10. StefanG says:

    SELECT photoname, x, y, z
    FROM @photos a CROSS JOIN @locations b
    WHERE b.locationid = (
    SELECT TOP 1 locationid FROM @locations c
    ORDER BY abs(cast(c.TimeTaken as float) – cast(a.TimeTaken as float))
    )

  11. jimness says:

    OK first time poster here….

    I thought I’d try something different, i.e. to calculate the most likely coordinates assuming that you travelled in a straight line between taking GPS locations. So, for example, having taken locations at 13:00 and 13:05 and you want to see where you were for a photo taken at 13:02, you should be 2/5 of the way along the line between the two points. Here’s the SQL that solves it for the above (I think):

    SELECT st2.PhotoName,
    CASE WHEN x1 < x2
    THEN x1 + FractionOfAB * (x2 - x1)
    ELSE x2 + (1 - FractionOfAB) * (x1 - x2)
    END AS ActualX,
    CASE WHEN y1 < y2
    THEN y1 + FractionOfAB * (y2 - y1)
    ELSE y2 + (1 - FractionOfAB) * (y1 - y2)
    END AS ActualY,
    CASE WHEN z1 < z2
    THEN z1 + FractionOfAB * (z2 - z1)
    ELSE z2 + (1 - FractionOfAB) * (z1 - z2)
    END AS ActualZ

    FROM (
    SELECT st1.PhotoName,
    PhotoTimeFloat - CONVERT(FLOAT, lu.TimeTaken) AS DistanceFromPointA,
    CONVERT(FLOAT, lo.TimeTaken) - PhotoTimeFloat AS DistanceFromPointB,
    CASE
    WHEN CONVERT(FLOAT, lo.TimeTaken) = CONVERT(FLOAT, lu.TimeTaken) THEN 0
    ELSE (PhotoTimeFloat - CONVERT(FLOAT, lu.TimeTaken)) / (CONVERT(FLOAT, lo.TimeTaken) - CONVERT(FLOAT, lu.TimeTaken))
    END AS FractionOfAB,
    lu.x AS x1,
    lo.x AS x2,
    lu.y AS y1,
    lo.y AS y2,
    lu.z AS z1,
    lo.z AS z2
    FROM (
    SELECT p.PhotoName,
    CONVERT(FLOAT, p.TimeTaken) AS PhotoTimeFloat,
    (SELECT TOP 1 LocationId FROM @Locations WHERE TimeTaken <= p.TimeTaken ORDER BY TimeTaken DESC) AS ClosestLocationTimeUnderId,
    (SELECT TOP 1 LocationId FROM @Locations WHERE TimeTaken >= p.TimeTaken ORDER BY TimeTaken ASC) AS ClosestLocationTimeOverId
    FROM @Photos p
    ) st1
    JOIN @Locations lu ON st1.ClosestLocationTimeUnderId = lu.LocationId
    JOIN @Locations lo ON st1.ClosestLocationTimeOverId = lo.LocationId
    ) st2

    Quite a bit more convoluted and could have been compacted more, I’m sure. Also I’m sure there’s a better way using trig but I couldn’t be bothered.

    Jim

  12. Brian@Chaco says:

    I would have posted basically what BorisNikolaevich and StefanG came up with. However, I have seen slightly faster performance with the IN construct Boris uses, especially in larger data sets, which I believe is because the SQL execution plan is able to use “if any matching record in the set exists” comparison instead of 1-to-1 equality.

    However, I think Jim’s interpretation about travelling in a straight line between coordinates is a really great twist on the puzzle!

  13. mikegood says:

    I really like StefanG’s solution. Posts by JRMorris & Boris are similar, but Stefan’s post is more concise and maybe twice as efficient.

    Here is solution that’s about twice as efficient as StefanG’s (on my server anyway), and it may scale better. But it’s not nearly as elegant.

    with cte as
    (
    select row_number() over (order by TimeTaken) as RowID, x, y, z, IsFake, TimeTaken
    from (select x, y, z, 0 as IsFake, TimeTaken from Locations
    union all
    select 0, 0, 0, 1, ’1753-01-01′
    union all
    select 0, 0, 0, 1, ’9999-12-31 23:59:59′) as x
    )
    select p.PhotoName,
    case when c2.IsFake = 1
    or abs(cast(c1.TimeTaken as float) – cast(p.TimeTaken as float)) < =
    abs(cast(c2.TimeTaken as float) - cast(p.TimeTaken as float)) then c1.x
    else c2.x
    end as x,
    case when c2.IsFake = 1
    or abs(cast(c1.TimeTaken as float) - cast(p.TimeTaken as float)) <=
    abs(cast(c2.TimeTaken as float) - cast(p.TimeTaken as float)) then c1.y
    else c2.y
    end as y,
    case when c2.IsFake = 1
    or abs(cast(c1.TimeTaken as float) - cast(p.TimeTaken as float)) <=
    abs(cast(c2.TimeTaken as float) - cast(p.TimeTaken as float)) then c1.z
    else c2.z
    end as z
    from Photos p
    cross join cte c1
    left join cte c2 on c2.RowID = c1.RowID + 1 and c2.TimeTaken > c1.TimeTaken
    where p.TimeTaken >= c1.TimeTaken and p.TimeTaken < c2.TimeTaken
    order by p.PhotoName

  14. mikegood says:

    Oops! Too late I considered performance implications of table vars vs actual tables. I’d used tables to work this out, not table vars, and performance implications are different.

    When using table vars, as the problem requires, StefanG’s post proves best all around.

    At least with a small data set. Going to check on this a little further…. I realize its silly talking performance for small data like this, was trying to consider general solution that would work well against larger data set.

  15. mikegood says:

    StefanG’s approach is even better with large data, whether using table vars or tables. Only time my post was better was when using real tables & very little data…wish I could withdraw that post!

    This puzzle looks to me like sort of problem where row_number() based solution would make a big performance improvement. But I guess not.

  16. CBMcKinney says:

    I like StefanG’s as well, except I’d change it to
    “ORDER BY abs(convert(float, c.TimeTaken – a.TimeTaken)))”
    to make it match the earlier of two points when the TimeTaken is exactly halfway between GPS points.

  17. tbomberger says:

    ; WITH all_combos
    AS (
    SELECT PhotoName
    , x
    , y
    , z
    , row_number() over
    (partition by PhotoName
    order by PhotoName
    , case when l.timetaken > p.TimeTaken
    then l.timetaken – p.TimeTaken
    else p.timetaken – l.TimeTaken end
    , l.TimeTaken
    , l.LocationID ) rownum
    FROM @Photos p
    CROSS
    JOIN @Locations l
    )
    SELECT *
    FROM all_combos
    WHERE rownum = 1

  18. BorisNikolaevich says:

    Heh—I wasn’t thinking about whether “IN” or “=” was more efficient; I just can’t wrap my head around being able to say “equals … top 1″ and have the query return more than one record! I see it, it works, I just can’t define it! Even good SQL guys have to learn something new every day.

    As far as being more concise… I’m not sure I buy that part. The spelled-out versus single-letter table aliases and line breaks appear to be the only significant difference between the two queries, and I’ll bet even the indentation was the same before pasting into the comments box! Don’t get me wrong, I’m not in any way trying to make StefanG’s post less awesome; it rocks. I’m just questioning the implication of “it’s better because it uses fewer characters” regardless of the actual content & structure of the query.

    As a side note (and again nothing to do with StefanG’s excellent, effective, and concise solution), my developers are not allowed to use single-letter aliases, or to alias tables and columns without the AS keyword, in any production or example/demonstration code (ad-hoc queries are totally exempt, of course!) In a team environment, it’s not about saving yourself a few keystrokes, it’s about saving time later when someone has to look at the code again. The “broken-record mantra” I am known for is, “Be nice to the next developer—it might be you!” (You never know when you are going to have to go back and look at your own code in six months or a year, and although today your brilliant code is crystal clear, when your head has been in other projects for any length of time it can take some effort to read through and remember what “t1″ and “t2″ referred to!) For this reason, since I type pretty quickly anyway, I tend to spell things out even when I am just submitting a response to one of Lionel’s puzzles…

  19. ethan says:

    select *
    from (select p.*, LocationID = (select top 1 locationID from @locations l order by abs(datediff(s, p.timetaken, l.timetaken)))
    from @photos p) P2,
    @locations l1
    where P2.locationid = l1.locationid

  20. StefanG says:

    Boris, funny to read that you had problems understanding the = (SELECT TOP 1 x) syntax.

    I myself had trouble grasping what your IN (SELECT TOP 1 x) syntax was really doing. :-)

    Of course, both variants are functionally equivalent. For some reason though, the query plans are different.

    As far as I can understand after comparing the query plans, the = version should be slightly more efficient for larger datasets.

    As for the slight differences in coding style (single-letter aliases and using the AS keyword) which is best is entirely a matter of personal taste.
    I myself tend to use single-letter aliases just to make it perfectly clear to the next developer that he is looking at an alias and not a real table name.

    Funny that two people can come up with two so similar solutions independently.

  21. the_englishman says:

    This should offer the best combination of conciseness (is that a word) and performance

    SELECT *
    FROM
    (
    SELECT PhotoName,
    x,
    y,
    z,
    ROW_NUMBER() OVER
    (
    PARTITION BY PhotoName
    ORDER BY ABS( CAST(p.TimeTaken AS FLOAT)- CAST(l.TimeTaken AS FLOAT))
    ) AS rn
    FROM @Photos p
    CROSS JOIN @Locations l
    ) a
    WHERE a.[rn] = 1

  22. the_englishman says:

    Sorry, was not copying tbomberger, I did not see his post until now. His case statement would best be replaced with the abs function.

  23. rmallamace says:

    Here’s a solution using “not exists” so it should run on 2000 as well.
    I haven’t compared it to the others for speed though.

    select phto.PhotoName,
    locn.x,
    locn.y,
    locn.z
    from @photos phto,
    @locations locn
    where not exists
    (
    select *
    from @locations locnDiff
    where abs(datediff(s, phto.timetaken, locn.timetaken))
    > abs(datediff(s, phto.timetaken, locnDiff.timetaken))
    )
    and not exists
    (
    select *
    from @locations locnSame
    where locn.LocationID != locnSame.LocationID
    and abs(datediff(s, phto.timetaken, locn.timetaken))
    = abs(datediff(s, phto.timetaken, locnSame.timetaken))
    and locnSame.LocationID < locn.LocationID
    )

  24. Phil Factor says:

    One’s first instinct is to do a linear regression in TSQL, which isn’t too difficult
    see http://www.webservertalk.com/archive133-2004-2-111656.html

    The trouble is that the data is too poor and it does not allow us to do a Linear regression, something good like a least squares analysis, as the data isn’t homoscedastic. (the errors are not normally distributed)

    From the data, it is clear that she is walking in a west-east direction, but the readings don’t tell us much.

    Remarkably, in the final reading, she is in two places at once. (21 and 22). She also could haver wandered about all over the place from 10 am intil five past 11 am. She probably nipped off to a nice warm spot to have a cup of warm cocoa. It is difficult to make any assumptions, even when the times tally

    Do we, for example, assume that she is moving slowly, or darting about from rock to rock?

    Given better data, I’d do a polynomial each on X, Y, and Z and then calculate the prediction for the location for any time in the range. But that would assume she is walking constantly, rather than stopping and starting.

  25. tbomberger says:

    Reason I went to the case statement (I tried a number of variants of abs(cast()/convert()))… This (the case statement version) works with the additional extreme data points that were added after the initial list of data points and it chooses the earlier of the two position data points (8,9) that are both offset by 5 minutes from the time of Photo07. The answer set provided chooses position 8 which is 5 minutes prior to the time of the photo. Using abs and the then required cast/convert to float selects for position 9 which is 5 minutes after time of the photo because either there is a floating point error or there is an additional millisecond or three included in the conversion to float on some datetimes. More problematic is the fact that, with the additional rows added with extreme dates/times, it is difficult to find a combination of the arithmetic that does not cause an overflow.

  26. StefanG says:

    Well, if you really want to be sure that you get the earliest location if the time difference to two locations is exactly the same, you could use the following order by clause:

    ORDER BY abs(cast(c.TimeTaken as float) – cast(a.TimeTaken as float)), c.LocationID

    No CASE expression is necessary.

    casting datetime to float before subtraction also avoids any numeric overflows.

  27. tbomberger says:

    Yes it does and this is one of the combinations that i attempted. Perhaps one of the settings on my SQL Server instance is different than yours, but with the ORDER BY you (StefanG) provided, the result I get is x=9 for PhotoName = ‘Photo07′. It should be x=8 according to Lionel’s answer set.

    position 8 = 14:30
    eval of abs(cast()-cast()) 8 to 7–> 0.00347222222626442
    picture 7 = 14:35
    eval of abs(cast()-cast()) 7 to 9–> 0.00347222221898846
    position 9 = 14:40

    so it chooses 9 by the barest of margins

  28. the_englishman says:

    Concise CTE solution:

    WITH cte AS
    (
    SELECT PhotoName,
    x,
    y,
    z,
    ABS( CAST(p.TimeTaken AS FLOAT) – CAST(l.TimeTaken AS FLOAT) ) AS diff
    FROM @Photos p
    CROSS JOIN @Locations l
    )
    SELECT a.[PhotoName], b.x, b.y, b.z
    FROM
    (
    SELECT PhotoName, MIN(diff) AS diff
    FROM cte
    GROUP BY PhotoName
    ) a
    JOIN [cte] b
    ON a.PhotoName = b.PhotoName
    AND a.[diff] = b.diff

    I still need to fix the photo7 example

  29. mikegood says:

    Englishman- Testing @tablevars, real tables, default data set, and default + 1000 rows added to each table, StefanG’s post is still best performer (on my server). The plans look better & cheaper & duration testing confirms this:

    DataSet TableType Algorithm Duration
    default table var StefanG 0 ms
    default table var Englishman 16 ms
    default real table StefanG 16 ms
    default real table Englishman 63 ms
    default+1000 table var StefanG 1153 ms
    default+1000 table var Englishman 10690 ms
    default+1000 real table StefanG 1796 ms
    default+1000 real table Englishman 9360 ms

  30. mikegood says:

    RMallamace – Tried to similarly test your post, but get “Difference of two datetime columns caused overflow at runtime” with default data set (data from original question plus a few more rows suggested in subsequent posts by Lionel).

    I believe you can’t assume that you can successfully pull off datediff(s, …), which is why many of the other posts instead cast the datetime as float & use abs(cast(…) – cast(…)) approach instead.

    I made this change & then re-ran tests. Ok for small data, but does not scale.

    DataSet TableType Algorithm Duration
    default table var RMallamace 16 ms
    default real table RMallamace 30 ms
    default+1000 table var RMallamace 247050 ms
    default+1000 real table RMallamace DNF

  31. tbomberger says:

    Englishman – Concise CTE solution, You also need to check Photo14 and Photo15 as each is returning 2 rows — positions 21 and 22

  32. mikegood says:

    Englishman – results I posted above applied to your earlier algorithm. After I posted them I saw your changes. Seemed like you were going for an accuracy fix, but your new post is a bit more efficient too – nice work.

    DataSet TableType Algorithm Duration
    default table var Englishman 16 ms
    default real table Englishman 30 ms
    default+1000 table var Englishman 1296 ms
    default+1000 real table Englishman 1533 ms

    Seeing this, I tried larger data sets looking to find a trend, didn’t really see one. There’s quite a bit of variability and I’m not repeating tests enough to be certain of anything. But generally the StefanG algorithm seems to perform a little better.

  33. BorisNikolaevich says:

    That single change from my ‘IN’ to StafanG’s ‘equals’ makes a HUGE difference, and I have gone back and modified several production queries where I used the same ‘IN’ syntax to achieve a great performance gain! Still having trouble with the “equals a TOP 1 result” concept, but I’m sure happy with the output.

    The best puzzles are those that lend themselves to some real-life applications. And this one was FUN, too—I will be participating in all of Lionel’s puzzles from here on out!

  34. the_englishman says:

    Ok, this works correctly, and has not duplicates. Technically I think my original answer was correct, as there is not logical way of picking between the two locations when they are the same difference. I like the ingenuity of my solution, although it’s a bit hacky:

    WITH cte AS
    (
    SELECT PhotoName,
    x,
    y,
    z,
    ABS( CAST(p.TimeTaken AS FLOAT) – CAST(l.TimeTaken AS FLOAT) ) + CAST(l.LocationID AS FLOAT)/ 100000000000
    AS diff
    FROM @Photos p
    CROSS JOIN @Locations l
    )
    SELECT a.[PhotoName], a.x, a.y, a.z
    FROM cte a
    JOIN
    (
    SELECT photoname, MIN(diff) AS diff
    FROM cte b
    GROUP BY [photoname]
    ) c
    ON a.photoname = c.photoname
    AND a.diff = c.diff

  35. the_englishman says:

    Actuaslly, the cast is redundant in this case it seems:

    WITH cte AS
    (
    SELECT PhotoName,
    x,
    y,
    z,
    ABS( CAST(p.TimeTaken AS FLOAT) – CAST(l.TimeTaken AS FLOAT) ) + l.LocationID / 100000000000
    AS diff
    FROM @Photos p
    CROSS JOIN @Locations l
    )
    SELECT a.[PhotoName], a.x, a.y, a.z
    FROM cte a
    JOIN
    (
    SELECT photoname, MIN(diff) AS diff
    FROM cte b
    GROUP BY [photoname]
    ) c
    ON a.photoname = c.photoname
    AND a.diff = c.diff

    How do you get the format to work correctly?

  36. the_englishman says:

    This looks smaller:

    WITH cte AS
    (
    SELECT PhotoName, x, y, z,
    ABS( CAST(p.TimeTaken AS FLOAT) – CAST(l.TimeTaken AS FLOAT) )
    + l.LocationID / 100000000000 AS diff
    FROM @Photos p
    CROSS JOIN @Locations l
    )
    SELECT a.[PhotoName], x, y, z
    FROM cte a
    JOIN
    ( SELECT photoname, MIN(diff) AS diff
    FROM cte b
    GROUP BY [photoname]
    ) c
    ON a.photoname = c.photoname
    AND a.diff = c.diff

  37. the_englishman says:

    Ok, this is more concise, probably performs the same:

    WITH cte AS
    (
    SELECT PhotoName, x, y, z,
    ABS( CAST(p.TimeTaken AS FLOAT) – CAST(l.TimeTaken AS FLOAT) )
    + l.LocationID / 100000000000 AS diff
    FROM @Photos p
    CROSS JOIN @Locations l
    )
    SELECT a.[PhotoName], x, y, z FROM cte a
    WHERE EXISTS
    ( SELECT 1 FROM cte b
    WHERE a.photoname = b.photoname
    GROUP BY [photoname] HAVING MIN(diff) = a.diff
    )

    I did not know I could do that with an exists. Cool

  38. Lionel says:

    I think all of your solutions are looking a whole lot better than mine! Sorry I don’t have time atm to go through and write a proper reply to all the comments (I am being a bit very useless at putting comments in my own blog). I will try and post something more tomorrow. I though I should post up my solution so you guys could see how I tried to solve it.

    SELECT PhotoLocations.PhotoName, l.x, l.y, l.z

    FROM @Locations l
    INNER JOIN
    (
    SELECT PhotoDelta.PhotoName, MIN(LocationID) AS LocationID

    FROM @Locations l
    INNER JOIN
    (

    SELECT PhotoName, p.TimeTaken AS PhotoTimeTaken,
    MIN(CASE WHEN l.TimeTaken > p.TimeTaken THEN l.TimeTaken – p.TimeTaken ELSE p.TimeTaken – l.TimeTaken END) AS MinDelta
    FROM @Locations l CROSS JOIN @Photos p
    GROUP BY PhotoName, p.TimeTaken

    ) PhotoDelta
    ON l.TimeTaken = PhotoDelta.PhotoTimeTaken + PhotoDelta.MinDelta
    OR l.TimeTaken = PhotoDelta.PhotoTimeTaken – PhotoDelta.MinDelta

    GROUP BY PhotoDelta.PhotoName

    ) PhotoLocations
    ON l.LocationID = PhotoLocations.LocationID

  39. BorisNikolaevich says:

    Brian@Chaco — I’d love to know where you have “seen” better performance with the IN construct, since every query I have changed to use = instead of my original IN has improved. In fact the larger the data set, the better the improvement after the change!

    I think you are blowing smoke.

  40. StefanG says:

    Okay, tbomberger is right – it is not a good idea to subtract two almost identical floats. This causes cancellation which means that we lose precision in the result.

    To demonstrate this you can run the following:

    declare @p1 datetime, @p2 datetime, @p3 datetime
    select @p1 = ’20080101 14:30′, @p2 = ’20080101 14:35′, @p3 = ’20080101 14:40′

    select cast(@p2 as float)-cast(@p1 as float)
    select cast(@p3 as float)-cast(@p2 as float)

    select @p2-@p1
    select @p3-@p2

    This gives the following result:

    3.4722222262644209E-3
    3.4722222189884633E-3
    1900-01-01 00:05:00.000
    1900-01-01 00:05:00.000

    As you can see, casting to float before subtracting means that we get different results which means that the my original query does not work properly.

    I now think that the best query is a combination of tbombergers case technique and my original query which results in a query that is very efficient, absolutely correct, and also works perfectly on both SQL2000 and SQL2005

    Like this:

    SELECT photoname, x, y, z
    FROM @photos a CROSS JOIN @locations b
    WHERE b.locationid = (
    SELECT TOP 1 locationid FROM @locations c
    ORDER BY CASE
    WHEN c.TimeTaken > a.TimeTaken
    THEN c.TimeTaken – a.TimeTaken
    ELSE a.TimeTaken – c.TimeTaken
    END, c.LocationID
    )

  41. mikegood says:

    StefangG – very nice! Took me a while to prove to myself, because was not obvious to me why case stmt better than abs when first looking at it.

    PS If you cast as numeric instead of float, the error disappears, at least for your example above. Regarless, your last post is pretty clear case stmt is not as hackish (I think) as casting datetimes to numbers. And I think it is faster.

  42. rmallamace says:

    mikegood – thanks for the info on timings. I’ll keep that in mind for future SQL.
    And I agree with StefagG’s stategy too – I think its the hit the mark!

  43. Anonymous says:

    There is a great deal of activity on the Red-Gate forums at the moment. In fact, they are a fascinating…

Leave a Reply