Click here to monitor SSC

Software Engineer - Red Gate Software

SQL Puzzle 7

Published 26 October 2007 4:53 am
/*
Well, it has been a while since I posted up a puzzle. I  blame this on the very unflattering picture of myself attached to the blog. This causes me a natural reluctance to put it in a position where people might see it! It is that and …ahem… the cat ate my homework …ahem… blog entry. Anyway, this puzzle has a couple of parts. Since I am short on ideas, the first part is the same as the second part just with one extra column.

The puzzle is very simple. You have to move as much of the data as you can from the source tables to the destination tables. There is one restriction though;  you can only use one insert statement in each part! Extra points are definitely awarded for a solution that works in both SQL Server 2000 and 2005. So, to summarize, one insert statement to move data from Part1Source -> Part1Dest, and one insert stament to move Part2Source -> Part2Dest.

Have fun!

Lionel


CREATE TABLE Part1Source
(
    a INT,
    b INT,
    
    PRIMARY KEY (a,b)
)
    
INSERT INTO Part1Source VALUES (1,1)
INSERT INTO Part1Source VALUES (1,2)
INSERT INTO Part1Source VALUES (2,3)
INSERT INTO Part1Source VALUES (7,2)
INSERT INTO Part1Source VALUES (2,4)
INSERT INTO Part1Source VALUES (5,5)
INSERT INTO Part1Source VALUES (5,1)
INSERT INTO Part1Source VALUES (5,3)
INSERT INTO Part1Source VALUES (9,0)
INSERT INTO Part1Source VALUES (11,2)

CREATE TABLE Part1Dest
(
    a INT UNIQUE,
    b INT UNIQUE,
    
    FOREIGN KEY (a,b)  REFERENCES Part1Source(a,b)    
)

———————— Part 2 ————————

CREATE TABLE Part2Source
(
    a INT,
    b INT,
    c INT,
    
    PRIMARY KEY (a,b,c)
)
    
    
INSERT INTO Part2Source VALUES (1,1,2)
INSERT INTO Part2Source VALUES (1,2,2)
INSERT INTO Part2Source VALUES (2,3,1)
INSERT INTO Part2Source VALUES (7,2,1)
INSERT INTO Part2Source VALUES (2,4,7)
INSERT INTO Part2Source VALUES (5,5,6)
INSERT INTO Part2Source VALUES (5,1,4)
INSERT INTO Part2Source VALUES (5,3,4)
INSERT INTO Part2Source VALUES (9,0,1)
INSERT INTO Part2Source VALUES (11,2,9)

CREATE TABLE Part2Dest
(
    a INT UNIQUE,
    b INT UNIQUE,
    c INT UNIQUE,
    
    FOREIGN KEY (a,b,c)  REFERENCES Part2Source(a,b,c)    
)

*/

104 Responses to “SQL Puzzle 7”

  1. GSquared says:

    part 1

    INSERT INTO part1dest (ab)
    
    SELECT DISTINCT a,
        (
        
    SELECT MIN(b)
            
    FROM part1source t2
            
    WHERE part1source.a
                
    AND NOT IN
            
    (
            
    SELECT MIN(b)
                
    FROM part1source t3
                
    WHERE <> part1source.a
                
    GROUP BY a))
        
    FROM part1source
        
    WHERE
        
    (
        
    SELECT MIN(b)
            
    FROM part1source t2
            
    WHERE part1source.a
                
    AND NOT IN
            
    (
            
    SELECT MIN(b)
                
    FROM part1source t3
                
    WHERE <> part1source.a
                
    GROUP BY a)) IS NOT NULL
        
    UNION
    SELECT DISTINCT 
    a,
        (
        
    SELECT MAX(b)
            
    FROM part1source t2
            
    WHERE part1source.a
                
    AND NOT IN
            
    (
            
    SELECT MAX(b)
                
    FROM part1source t3
                
    WHERE <> part1source.a
                
    GROUP BY a))
        
    FROM part1source
        
    WHERE
        
    (
        
    SELECT MAX(b)
            
    FROM part1source t2
            
    WHERE part1source.a
                
    AND NOT IN
            
    (
            
    SELECT MAX(b)
                
    FROM part1source t3
                
    WHERE <> part1source.a
                
    GROUP BY a)) IS NOT NULL
            AND 
    NOT IN
        
    (
        
    SELECT DISTINCT a
            
    FROM part1source
            
    WHERE
            
    (
            
    SELECT MIN(b)
                
    FROM part1source t2
                
    WHERE part1source.a
                    
    AND NOT IN
                
    (
                
    SELECT MIN(b)
                    
    FROM part1source t3
                    
    WHERE <> part1source.a
                    
    GROUP BY a)) IS NOT NULL)
        
    UNION ALL
    SELECT DISTINCT MIN(a), b
        
    FROM part1source
        
    WHERE NOT IN
        
    (
        
    SELECT DISTINCT a
            
    FROM part1source
            
    WHERE
            
    (
            
    SELECT MIN(b)
                
    FROM part1source t2
                
    WHERE part1source.a
                    
    AND NOT IN
                
    (
                
    SELECT MIN(b)
                    
    FROM part1source t3
                    
    WHERE <> part1source.a
                    
    GROUP BY a)) IS NOT NULL
            
    UNION
        SELECT DISTINCT 
    a
            
    FROM part1source
            
    WHERE
            
    (
            
    SELECT MAX(b)
                
    FROM part1source t2
                
    WHERE part1source.a
                    
    AND NOT IN
                
    (
                
    SELECT MAX(b)
                    
    FROM part1source t3
                    
    WHERE <> part1source.a
                    
    GROUP BY a)) IS NOT NULL
                AND 
    NOT IN
            
    (
            
    SELECT DISTINCT a
                
    FROM part1source
                
    WHERE
                
    (
                
    SELECT MIN(b)
                    
    FROM part1source t2
                    
    WHERE part1source.a
                        
    AND NOT IN
                    
    (
                    
    SELECT MIN(b)
                        
    FROM part1source t3
                        
    WHERE <> part1source.a
                        
    GROUP BY a)) IS NOT NULL))
            AND 
    NOT IN
        
    (
        
    SELECT DISTINCT
            
    (
            
    SELECT MIN(b)
                
    FROM part1source t2
                
    WHERE part1source.a
                    
    AND NOT IN
                
    (
                
    SELECT MIN(b)
                    
    FROM part1source t3
                    
    WHERE <> part1source.a
                    
    GROUP BY a))
            
    FROM part1source
            
    WHERE
            
    (
            
    SELECT MIN(b)
                
    FROM part1source t2
                
    WHERE part1source.a
                    
    AND NOT IN
                
    (
                
    SELECT MIN(b)
                    
    FROM part1source t3
                    
    WHERE <> part1source.a
                    
    GROUP BY a)) IS NOT NULL
            
    UNION
        SELECT DISTINCT
            
    (
            
    SELECT MAX(b)
                
    FROM part1source t2
                
    WHERE part1source.a
                    
    AND NOT IN
                
    (
                
    SELECT MAX(b)
                    
    FROM part1source t3
                    
    WHERE <> part1source.a
                    
    GROUP BY a))
            
    FROM part1source
            
    WHERE
            
    (
            
    SELECT MAX(b)
                
    FROM part1source t2
                
    WHERE part1source.a
                    
    AND NOT IN
                
    (
                
    SELECT MAX(b)
                    
    FROM part1source t3
                    
    WHERE <> part1source.a
                    
    GROUP BY a)) IS NOT NULL
                AND 
    NOT IN
            
    (
            
    SELECT DISTINCT a
                
    FROM part1source
                
    WHERE
                
    (
                
    SELECT MIN(b)
                    
    FROM part1source t2
                    
    WHERE part1source.a
                        
    AND NOT IN
                    
    (
                    
    SELECT MIN(b)
                        
    FROM part1source t3
                        
    WHERE <> part1source.a
                        
    GROUP BY a)) IS NOT NULL))
        
    GROUP BY 

    If that’s the pattern you’re looking for, Part 2 could have a similar solution, just with more union statements.

  2. Phil Factor says:

    I thought I’d go straight to Part 2 as it seemed more difficult. There are 432 solutions, some of which only manage three insertions. Many manage four, but a lot manage five. I think I’ve kept to the rules as you said nothing about not using an EXECUTE statement, and the stipulation of only one INSERT seemed rather unfair, but I kept to it!

    I have a lingering suspicion that there is a simpler solution, but I can’t think of it.

    DECLARE @command VARCHAR(255)
    

    SELECT TOP @command=
    Insert into Part2dest
    select ’
    +CAST(f.[a] AS VARCHAR)+‘, ’
            
    +CAST(f.[b] AS VARCHAR)+‘, ’
            
    +CAST(f.[c] AS VARCHAR)+
    union
    select ’
    +CAST(g.[a] AS VARCHAR)+‘, ’
           
    +CAST(g.[b] AS VARCHAR)+‘, ’
           
    +CAST(g.[c] AS VARCHAR)+
    union
    select ’
    +CAST(h.[a] AS VARCHAR)+‘, ’
           
    +CAST(h.[b] AS VARCHAR)+‘, ’
           
    +CAST(h.[c] AS VARCHAR)+
     ’
    +COALESCE(‘union
    select ’
    +CAST(i.[a] AS VARCHAR)+‘, ’
           
    +CAST(i.[b] AS VARCHAR)+‘, ’
           
    +CAST(i.[c] AS VARCHAR)+
    ,)+COALESCE(‘union
    select ’
    +CAST(j.[a] AS VARCHAR)+‘, ’
           
    +CAST(j.[b] AS VARCHAR)+‘, ’
           
    +CAST(j.[c] AS VARCHAR)+
    ,) +

    FROM part2source f
       
    LEFT OUTER JOIN part2source g
       
    ON f.a<>g.a AND f.b<>g.b AND f.c<>g.c 
    LEFT OUTER JOIN part2source h
       
    ON h.a NOT IN (g.a,f.a
       AND 
    h.b NOT IN (g.b,f.b)
       AND 
    h.c NOT IN (g.c,f.c)
    LEFT OUTER JOIN part2source i
       
    ON i.a NOT IN (h.a,g.a,f.a
       AND 
    i.b NOT IN (h.b,g.b,f.b)
       AND 
    i.c NOT IN (h.c,g.c,f.c)
    LEFT OUTER JOIN part2source j
       
    ON j.a NOT IN (i.a,h.a,g.a,f.a
       AND 
    j.b NOT IN (i.b,h.b,g.b,f.b)
       AND 
    j.c NOT IN (i.c,h.c,g.c,f.c)
    –the following lines are just a check
    –that there are no s-entry solutions
    /*LEFT outer JOIN part2source k
    ON k.a not IN (j.a,i.a,h.a,g.a,f.a) 
    AND k.b NOT IN (j.b,i.b,h.b,g.b,f.b)
    AND k.c NOT IN (j.c,i.c,h.c,g.c,f.c)*/
    WHERE j.a IS NOT NULL– change this to 
    –WHERE i.a IS NULL –to get the worst solution!

    EXECUTE (@command)
  3. Phil Factor says:

    For some reason, I feel compelled to provide the part 1 ‘cheat’ solution. Is there a prize for this? I fancy a Simple-Talk goody bag after all this work

    DECLARE @command VARCHAR(255)
    
    –there are 936 possible solutions
    SELECT TOP @command=
    Insert into Part1dest
    select ’
    +CAST(f.[a] AS VARCHAR)+‘, ’
            
    +CAST(f.[b] AS VARCHAR)+
    union
    select ’
    +CAST(g.[a] AS VARCHAR)+‘, ’
           
    +CAST(g.[b] AS VARCHAR)+
    union
    select ’
    +CAST(h.[a] AS VARCHAR)+‘, ’
           
    +CAST(h.[b] AS VARCHAR)+
     ’
    +COALESCE(‘union
    select ’
    +CAST(i.[a] AS VARCHAR)+‘, ’
           
    +CAST(i.[b] AS VARCHAR)+
    ,)+COALESCE(‘union
    select ’
    +CAST(j.[a] AS VARCHAR)+‘, ’
           
    +CAST(j.[b] AS VARCHAR)+
    ,) +

    FROM part1source f
       
    LEFT OUTER JOIN part1source g
       
    ON f.a<>g.a AND f.b<>g.b 
    LEFT OUTER JOIN part1source h
       
    ON h.a NOT IN (g.a,f.a
       AND 
    h.b NOT IN (g.b,f.b)
    LEFT OUTER JOIN part1source i
       
    ON i.a NOT IN (h.a,g.a,f.a
       AND 
    i.b NOT IN (h.b,g.b,f.b)
    LEFT OUTER JOIN part1source j
       
    ON j.a NOT IN (i.a,h.a,g.a,f.a
       AND 
    j.b NOT IN (i.b,h.b,g.b,f.b)
    –the following lines are just a check
    –that there are no s-entry solutions
    LEFT OUTER JOIN part1source k
    ON k.a NOT IN (j.a,i.a,h.a,g.a,f.a
    AND 
    k.b NOT IN (j.b,i.b,h.b,g.b,f.b)
    WHERE j.a IS NOT NULL– change this to 
    –WHERE j.a IS NULL –to get the worst solution!

    EXECUTE (@command)
  4. Adam Machanic says:

    OK, I used only one insert statement ;-)

    DECLARE @i INT
    
    SET 
    @i 1
    WHILE @i >= 0
        
    BEGIN
        DECLARE 
    @s VARCHAR(200)
        
    SELECT
            
    @s ‘ALTER TABLE Part1Dest 
    DROP ’ 
    CONSTRAINT_NAME
            
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
            
    WHERE
                    
    TABLE_NAME ‘Part1Dest’
                
    AND CONSTRAINT_TYPE ‘UNIQUE’
        
    EXEC (@s)
            
    SET @i @i 1
        
    END
    INSERT 
    Part1Dest
        
    SELECT *
        
    FROM Part1Source
        
  5. Steve S says:

    Hmm, some of that SQL is a bit unwieldy. Here’s a method structured around SQL 2005 Common Table Expressions. I guess it could be done as a single big load of JOINs but I like the CTE syntax for its readability:

    ; WITH
    [Sequence] AS (
    – Get a numeric sequence at least as long as the 2^N possible combinations of data points
    select TOP (SELECT POWER(2,COUNT(1)) FROM Part1Source) -1+ROW_NUMBER() OVER (ORDER BY a.name) AS num
    from sys.sysobjects a CROSS JOIN sys.sysobjects b
    ),
    DataPoints AS (
    – Assign a sequential ID to each data point
    SELECT a, b, -1+ROW_NUMBER() OVER (ORDER BY a) AS Position FROM Part1Source
    ),
    ValidCombinations AS (
    – Determine which of the 2^N combinations are valid (i.e. meet uniqueness constraints)
    SELECT [Sequence].num, COUNT(DataPoints.a) AS NumPoints
    FROM DataPoints
    CROSS JOIN [Sequence]
    WHERE POWER(2,DataPoints.Position) & [Sequence].num > 0
    GROUP BY [Sequence].num
    HAVING COUNT(DISTINCT DataPoints.a) = COUNT(DataPoints.a) — All the a’s must be distinct
    AND COUNT(DISTINCT DataPoints.b) = COUNT(DataPoints.a) – All the b’s must be distinct
    ),
    ChosenCombination AS (
    – Pick one of the combinations that contains the most data points
    SELECT TOP 1 num
    FROM ValidCombinations
    ORDER BY NumPoints DESC
    )
    – Insert the data points that correspond to the “best” combination
    INSERT INTO Part1Dest ([a],[b])
    SELECT dp.a, dp.b
    FROM DataPoints dp
    CROSS JOIN ChosenCombination
    WHERE POWER(2, dp.Position) & ChosenCombination.num > 0

  6. SQLzombie says:

    If you dont care about  err messages , this wud be the solution for part 1..

    DECLARE @cur CURSOR 
    DECLARE 
    @c INT,@d INT
        SET 
    @cur = CURSOR fast_forward FOR
    SELECT 

        
    FROM Part1Source
    OPEN @cur
                
    FETCH next 
        
    FROM @cur INTO @c,@d
    WHILE  @@fetch_status 0
        
    BEGIN
        
    rep:
        
    INSERT INTO  Part1Dest 
            
    SELECT @c,@d
        
    IF @@error =4145
                        
    GOTO rep
                        
    FETCH next 
                
    FROM @cur INTO @c,@d
        
    END
                
    
    
    		
    				
  7. JRMorris says:

    This probably isn’t what you meant by “use only one insert”.. but:

    DECLARE @a INT
    DECLARE
    @b INT
    DECLARE
    cur_Uniques CURSOR
             FOR
    SELECT
    a, b
      
    FROM part1source
    OPEN  cur_Uniques
            
    FETCH next
      
    FROM cur_Uniques INTO @a, @b
    WHILE @@fetch_status = 0
      
    BEGIN
       INSERT INTO
    part1dest
          
    SELECT @a, @b
          
    WHERE @a NOT IN (
          
    SELECT a
            
    FROM part1dest)
             AND
    @b NOT IN (
          
    SELECT b
            
    FROM part1dest)
                
    FETCH next
          
    FROM cur_Uniques INTO @a, @b
      
    END
             CLOSE  
    cur_Uniques
            
    DEALLOCATE  cur_Uniques
  8. Drew1022 says:

    It might not be the best code, but then again, SQL isn’t involved in my day job. This is my contribution for part 1. I can probably extend this to work for part 2 as well.

    /************ SOLUTION TO PART 1 ********************/
    
        – I started off by coming up with this statement to 
        – build up a view of the table to give me something 
        – better to key off of and help with finding the 
        – combination that will allow me to insert the most 
        – rows. 
    SELECT (ABS(bCount aCount) + aCount bCountpriority,  
    (CAST(AS VARCHAR) + ‘_’ CAST(AS VARCHAR)) a_b
    (CASE WHEN aCount bCount THEN ELSE ENDaORb
    (
    CASE WHEN aCount bCount THEN ‘b’ ELSE ‘a’ ENDaORbIndicator,
     
    abCounts.*
        
    FROM
        
    (
        
    SELECT a, (
            
    SELECT COUNT(*) 
                
    FROM Part1Source p1s2 
                
    WHERE p1s2.a p1s1.aaCountb, (
            
    SELECT COUNT(*) 
                
    FROM Part1Source p1s3 
                
    WHERE p1s3.b p1s1.bbCount
            
    FROM Part1Source p1s1
        
    AS abCounts
        
    ORDER BY priorityaORb
        
    – I then applied it here to actually select a subset 
        – of the records that is *hopefully* close to the 
        – maximum I can get. I have no idea if this will 
        – always get the most rows no matter what combinations 
        – are in the source table. It gets 5 with the current 
        – data set for this problem. 
    SELECT 
        
    FROM
        
    (
        
    SELECT (ABS(bCount aCount) + aCount bCountpriority,
    (
    CAST(AS VARCHAR) + ‘_’ CAST(AS VARCHAR)) a_b,
    (
    CASE WHEN aCount bCount THEN ELSE ENDaORb,
    (
    CASE WHEN aCount bCount THEN ‘b’ ELSE ‘a’ ENDaORbIndicator,
    abCounts.*       FROM        (
            
    SELECT a, (
                
    SELECT COUNT(*) 
                    
    FROM Part1Source p1s2 
                    
    WHERE p1s2.a p1s1.aaCountb, (
                
    SELECT COUNT(*) 
                    
    FROM Part1Source p1s3 
                    
    WHERE p1s3.b p1s1.bbCount
                
    FROM Part1Source p1s1
            
    AS abCounts
        
    AS sub1
        
    WHERE NOT IN
        
    (
        
    SELECT a
            
    FROM
            
    (
            
    SELECT (ABS(bCount aCount) + aCount bCountpriority,
    (
    CAST(AS VARCHAR) + ‘_’ CAST(AS VARCHAR)) a_b,
    (
    CASE WHEN aCount bCount THEN ELSE ENDaORb,
    (
    CASE WHEN aCount bCount 
    THEN ‘b’ ELSE ‘a’ ENDaORbIndicator,
    abCounts.*
                
    FROM
                
    (
                
    SELECT a, (
                    
    SELECT COUNT(*) 
                        
    FROM Part1Source p1s2 
                        
    WHERE p1s2.a p1s1.aaCountb, (
                    
    SELECT COUNT(*) 
                        
    FROM Part1Source p1s3 
                        
    WHERE p1s3.b p1s1.bbCount
                    
    FROM Part1Source p1s1
                
    AS abCounts
            
    sub2
            
    WHERE sub2.priority sub1.priority
                
    OR
            (
                        
    sub2.priority sub1.priority
                    
    AND sub2.aORb sub1.aORb
            
    )
        )
            AND 
    NOT IN
        
    (
        
    SELECT b
            
    FROM
            
    (
            
    SELECT (ABS(bCount aCount) + aCount bCountpriority,
    (
    CAST(AS VARCHAR) + ‘_’ CAST(AS VARCHAR)) a_b
    (
    CASE WHEN aCount bCount THEN ELSE ENDaORb
    (
    CASE WHEN aCount bCount 
    THEN ‘b’ ElSE ‘a’ ENDaORbIndicator,
    abCounts.*
                
    FROM
                
    (
                
    SELECT a, (
                    
    SELECT COUNT(*) 
                        
    FROM Part1Source p1s2 
                        
    WHERE p1s2.a p1s1.aaCountb, (
                    
    SELECT COUNT(*) 
                        
    FROM Part1Source p1s3 
                        
    WHERE p1s3.b p1s1.bbCount
                    
    FROM Part1Source p1s1
                
    AS abCounts
            
    sub3
            
    WHERE sub3.priority sub1.priority
                
    OR
            (
                        
    sub3.priority sub1.priority
                    
    AND sub3.aORb sub1.aORb
            
    )
        )
     
  9. Drew1022 says:

    Oh man, you guys need an edit button here for me. I forgot to include the little INSERT line I had for it and alter the main select to just select a, b … I was testing it without doing the insert part so that I didn’t have to keep clearing the destination table. I won’t report all my code here with that little difference.

    Maybe I should get back to my COBOL and RPG instead of sitting here monkeying with these things? I don’t think this is in my job description, heh. ;)

  10. im_bob_loucans says:

    Heres what I tried – wasted first half day at work on this – but sometimes just gotta do it – it uses no insert statements and is 2000 and 2005 compatible – i dont know if its bullet proof – maybe i’ll write some unit tests around it for the second half:

    IF EXISTS (SELECT 
              
    FROM   sysobjects 
              
    WHERE  name ‘part1dest’ 
                     
    AND TYPE ‘U’
       
    BEGIN 
           DROP TABLE 
    part1dest 
       
    END 
        
    SELECT   MIN
    (aAS a
            

    INTO     part1dest 
    FROM     (SELECT   AS a
                      
    MIN(bAS 
             
    FROM     (SELECT   MIN(aAS a
                                

                       
    FROM     (SELECT   AS a
                                          
    MIN(bAS 
                                 
    FROM     part1source 
                                 
    GROUP BY a
                       
    GROUP BY 
                                                              
                       
    UNION 
                                          
                       SELECT   MAX
    (aAS a
                                

                       
    FROM     (SELECT   AS a
                                          
    MAX(bAS 
                                 
    FROM     part1source 
                                 
    GROUP BY a
                       
    GROUP BY b
             
    GROUP BY a
    GROUP BY 
    ORDER BY MIN(a), 
            

            
    IF EXISTS (SELECT 
              
    FROM   sysobjects 
              
    WHERE  name ‘part1dest’ 
                     
    AND TYPE ‘U’
       
    BEGIN 
           ALTER TABLE 
    part1dest 
            
    ADD CONSTRAINT uk_a UNIQUEa  
            
           
    ALTER TABLE part1dest 
            
    ADD CONSTRAINT uk_b UNIQUEb  
            
           
    ALTER TABLE part1dest 
            
    ADD CONSTRAINT fk_ab FOREIGN KEY
                
    REFERENCES part1source
       
    END 
  11. honga says:

    The puzzle does not require the logical layer has to be done in SQL. So, Here is my answer for both 2 parts.

    Part1:
    INSERT INTO part1dest (a, b)
    select a,b from Part1Source
    where a*b in (1,6,25,14,0)

    Part2:
    INSERT INTO part2dest (a,b,c)
    select a,b,c from Part2Source
    where a*b*c in (0,2,56,150,198)

  12. im_bob_loucans says:

    Good call Honga – sometimes we waste so much time with generic apply to all situations solutions we forget that the solution is very simple. I bow to you. But it was fun nonetheless.

  13. puzsol says:

    Ok, so I didn’t stick to the ‘only one insert’ requirement… but many of the solutions used a select into, which if you ask me is the same thing, or they were restricted to the solution of the data provided (max limits etc)… and if you are going to do that then Honga’s solution is by far the smallest and simplest (as long as a human can resolve the solution in the first place)…. I liked Phil’s solution the best – basically expanding the solution tree, then picking the rows with the most data – but it is still limited to only working for this dataset…. you could of course add another dynamic layer to make it work for any dataset…. (I’m not about to though)

    Anyway, here is a solution (that doesn’t stick to the ‘rules’, but if you replace the table vars with temp tables, and the inserts into selects, then you could make it fit the ‘rules’)…

    – Part 1
    – declare the tables used to build the solution
    declare @sol_groups table(gn int, a int, b int);
    declare @sol_step table(gn int, a int, b int, rn int);

    – Get the base data
    insert into @sol_groups
    select row_number() over (order by a, b), a, b
    from Part1Source

    Loop:

    – Get the possible values to add for each group
    insert into @sol_step
    select sg.gn, ps.a, ps.b, row_number() over (order by sg.gn, ps.a, ps.b)
    from (select distinct gn as gn from @sol_groups) as sg
    full outer join Part1Source ps on null is null
    where ps.a not in (select a from @sol_groups where gn = sg.gn) and ps.a > (select max(a) from @sol_groups where gn = sg.gn)
    and ps.b not in (select b from @sol_groups where gn = sg.gn)

    – Test if we have reached the end
    if @@rowcount < = 0
    goto Soln

    -- Add the original values to each of the new values (using the row number as the new group number)
    insert into @sol_step
    select ss.rn, sg.a, sg.b, 0
    from @sol_step ss
    inner join @sol_groups sg on sg.gn = ss.gn

    -- Update the new values group numbers to match...
    update @sol_step set gn = rn where rn != 0

    -- Swap the table data over... is there a better way to do this?
    delete from @sol_groups
    insert into @sol_groups
    select gn, a, b from @sol_step
    delete from @sol_step

    -- Keep going...
    goto Loop

    -- Present all the solutions, do the insertion, prove it.
    Soln:
    select * from @sol_groups order by gn, a, b
    insert into Part1Dest
    select a, b from @sol_groups where gn = 1
    select * from Part1Dest

    Of course the selects after the Soln tag are superfluous (just to present something to the user)... and the solution for part 2 is similar:

    declare @sol_groups table(gn int, a int, b int, c int);
    declare @sol_step table(gn int, a int, b int, c int, rn int);

    insert into @sol_groups
    select row_number() over (order by a, b, c), a, b, c
    from Part2Source

    Loop:

    insert into @sol_step
    select sg.gn, ps.a, ps.b, ps.c, row_number() over (order by sg.gn, ps.a, ps.b, ps.c)
    from (select distinct gn as gn from @sol_groups) as sg
    full outer join Part2Source ps on null is null
    where ps.a not in (select a from @sol_groups where gn = sg.gn) and ps.a > (select max(a) from @sol_groups where gn = sg.gn)
    and ps.b not in (select b from @sol_groups where gn = sg.gn)
    and ps.c not in (select c from @sol_groups where gn = sg.gn)

    if @@rowcount <= 0
    goto Soln

    insert into @sol_step
    select ss.rn, sg.a, sg.b, sg.c, 0
    from @sol_step ss
    inner join @sol_groups sg on sg.gn = ss.gn

    update @sol_step set gn = rn where rn != 0

    delete from @sol_groups
    insert into @sol_groups
    select gn, a, b, c from @sol_step

    delete from @sol_step

    goto Loop

    Soln:
    select * from @sol_groups order by gn, a, b, c;

    insert into Part2Dest
    select a, b, c from @sol_groups where gn = 1;

    select * from Part2Dest;

    Hope this stops some other silly person like me wasting time to work out a similar solution.

  14. janelle briggs says:

    – I only have 2000 – so I assume this works on 2005!
    DECLARE @c INT,@d INT
    DECLARE c1 CURSOR FOR SELECT a,b FROM Part1Source
    OPEN c1
    FETCH NEXT FROM c1 INTO @c,@d
    WHILE @@fetch_status = 0
    BEGIN
    INSERT INTO Part1Dest
    SELECT @c,@d
    WHERE NOT EXISTS (SELECT 1 FROM Part1Dest
    WHERE a = @c OR b = @d)
    FETCH NEXT FROM c1 INTO @c,@d
    END
    CLOSE c1
    DEALLOCATE c1

  15. janelle briggs says:

    Sorry – forgot part 2 before, so here it is…
    DECLARE @d INT,@e INT, @f int
    DECLARE c1 CURSOR FOR SELECT a,b,c FROM Part2Source
    OPEN c1
    FETCH NEXT FROM c1 INTO @d,@e,@f
    WHILE @@fetch_status = 0
    BEGIN
    INSERT INTO Part2Dest
    SELECT @d,@e,@f
    WHERE NOT EXISTS (SELECT 1 FROM Part2Dest
    WHERE a = @d OR b = @e OR c = @f)
    FETCH NEXT FROM c1 INTO @d,@e,@f
    END
    CLOSE c1
    DEALLOCATE c1

  16. Mikkel Toudal Kristiansen says:

    How about this one for part 1:

    insert into Part1Dest
    select a, b
    from Part1Source
    where
    ((a = 1) and (b = 1)) or
    ((a = 2) and (b = 3)) or
    ((a = 5) and (b = 5)) or
    ((a = 9) and (b = 0)) or
    ((a = 11) and (b = 2))

    … and this one for part 2:

    insert into Part2Dest
    select a, b, c
    from Part2Source
    where
    ((a = 1) and (b = 1) and (c = 2)) or
    ((a = 2) and (b = 3) and (c = 1)) or
    ((a = 5) and (b = 5) and (c = 6)) or
    ((a = 11) and (b = 2) and (c = 9))

    Of course it works on both SQL 2000 and 2005, and of course it only works on the puzzle data provided …

  17. ChuckNewark says:

    This was the easiest solution I could come up with. They both insert six records into the destinations tables and would work for any data that was loaded into the source tables.

    Insert into Part1Dest
    Select distinct a, (Select Min(b)
    from part1source
    where part1source.a = p1s.a)
    from part1source p1s

    Insert into Part2Dest
    Select distinct a,
    (Select Min(b)
    from part2source p2si
    where p2si.a = p2s.a),
    (select min(c)
    from part2source
    where part2source.b = (Select Min(b) from
    part2source p2si where p2si.a = p2s.a))
    from part2source p2s

  18. ChuckNewark says:

    Okay I was wrong, my tables didn’t have the unique on the B columns

  19. roarfred says:

    This is a slightly different approach, which I think should work…

    The first step is to produce a list of combinations from the source table, and check how many other combinations this entry will block for. Using this, we can make a select which will choose all entries with the fewest blocks, as long as there is no other entry which will block for just as few.

    The key in my solution is this rather simple statement, which gives a list of all source entries, and a number of other combinations which cannot be used if this entry is used:

    SELECT *, (SELECT count(*) FROM Part1Source Ax WHERE (Ax.A = A.a OR Ax.B = A.b) AND (Ax.A <> A.a OR Ax.B <> A.b)) AS BlockCount FROM Part1Source A

    Now, we can use all entries from this list, given:
    1) there is no other entry with same a or b with a lower blockcount
    2) if another entry with same a or same b gives the same blockcount, choose the one with the lower a or b value

    INSERT INTO Part1Dest(a,b)

    SELECT a,b
    FROM (SELECT *, (SELECT count(*) FROM Part1Source Ax WHERE (Ax.A = A.a OR Ax.B = A.b) AND (Ax.A <> A.a OR Ax.B <> A.b)) AS BlockCount FROM Part1Source A) BC
    WHERE NOT EXISTS — another combination with fewer blocks
    (
    SELECT *
    FROM (SELECT *, (SELECT count(*) FROM Part1Source Ax WHERE (Ax.A = A.a OR Ax.B = A.b) AND (Ax.A <> A.a OR Ax.B <> A.b)) AS BlockCount FROM Part1Source A) BC2
    WHERe bc2.blockcount < bc.blockcount and (bc2.a = bc.a or bc2.b = bc.b)
    )
    AND NOT EXISTS -- another combination with same blocks and lower a or b
    (
    SELECT *
    FROM (SELECT *, (SELECT count(*) FROM Part1Source Ax WHERE (Ax.A = A.a OR Ax.B = A.b) AND (Ax.A <> A.a OR Ax.B <> A.b)) AS BlockCount FROM Part1Source A) BC2
    WHERe bc2.blockcount = bc.blockcount
    and (bc2.a = bc.a or bc2.b = bc.b)
    and (bc2.a < bc.a or bc2.b < bc.b)
    )

    Thinking of this… it is probably not a good solution in a circumstances. If an entry only blocks only one other combination, the one being blocked might block for another larger number. The solution would be to recursively find the total number of blocks in the key statement and again use it in the same final statement. This would probably require a deeper level of nesting or the use of a cursor, which would be a last resort in my opinion…

    As another personal opinion, the puzzle should maybe include a few more limitations, such as avoiding shema manipulation.

  20. davekw says:

    Here’s another solution that builds on GSquared’s and roarfred’s.

    [code]
    INSERT INTO Part1Dest
    SELECT * FROM (
    SELECT DISTINCT S1.a,
    (SELECT TOP 1 S2.b FROM Part1Source S2 WHERE (S1.a = S2.a) AND (S2.b NOT IN (
    SELECT (SELECT TOP 1 T2.b FROM Part1Source T2 WHERE (T1.a = T2.a) ORDER BY T2.b)
    FROM Part1Source T1
    WHERE (T1.a < S2.a)))
    ORDER BY S2.b) b
    FROM Part1Source S1 ) X WHERE X.b IS NOT NULL
    [/code]

  21. davekw says:

    Here’s the Part2 solution that builds on the logic above.

    INSERT INTO Part2Dest
    SELECT *
    FROM (
    SELECT DISTINCT S1.a,
    ( SELECT TOP 1 S2.b
    FROM Part2Source S2
    WHERE (S1.a = S2.a) AND (S1.c = S2.c)
    AND (S2.b NOT IN (
    SELECT
    ( SELECT TOP 1 T2.b
    FROM Part2Source T2
    WHERE (T1.a = T2.a) OR (T1.c = T2.c)
    ORDER BY T2.b
    )
    FROM Part2Source T1
    WHERE (T1.a < S2.a)
    )
    )
    ORDER BY S2.b
    ) b,
    ( SELECT TOP 1 S2.c
    FROM Part2Source S2
    WHERE (S1.a = S2.a) AND (S1.b = S2.b)
    AND (S2.c NOT IN (
    SELECT
    ( SELECT TOP 1 T2.c
    FROM Part2Source T2
    WHERE (T1.a = T2.a) OR (T1.b = T2.b)
    ORDER BY T2.c
    )
    FROM Part2Source T1
    WHERE (T1.a < S2.a)
    )
    )
    ORDER BY S2.c
    ) c
    FROM Part2Source S1
    ) X
    WHERE (X.b IS NOT NULL) AND (X.c IS NOT NULL)

  22. davekw says:

    The nested SELECTs in the above solutions ensure that when selecting a “b” value you don’t select a row where an “a” value has already been used.

    Similarly for part 2, but with the added complication of the “c” value.

  23. Steve S says:

    Davekw: try your part 1 code with this input:
    INSERT INTO @Part1Source VALUES (1, 0)
    INSERT INTO @Part1Source VALUES (1, 1)
    INSERT INTO @Part1Source VALUES (2, 0)

  24. roarfred says:

    Interesting solution, davekw, but your solution seems to ensure uniqueness rather than a high number of entries…

    Add the two combinations 1,10 and 10,1 to the Source table to see what I mean.

  25. roarfred says:

    or Steve’s numbers, of course :)

  26. davekw says:

    Thanks guys, it seems you are right.

    roarfred: Your solution seems to have the same problem that mine has: ensuring uniqueness rather than high number of entries. Add the following to the original Part 1 data: (3, 2) (3, 3) (3, 4). Your solution returns 5 rows, but I think there are at least 6 number of unique entries.

  27. davekw says:

    It seems I had one of the checks the wrong way round. < instead of >

    Try this… Maybe you guys will still break it. :)

    INSERT INTO Part1Dest
    SELECT * FROM (
    SELECT DISTINCT S1.a, (SELECT TOP 1 S2.b FROM Part1Source S2 WHERE (S1.a = S2.a) AND (S2.b NOT IN (
    SELECT (SELECT TOP 1 T2.b FROM Part1Source T2 WHERE (T1.a = T2.a) ORDER BY T2.b DESC)
    FROM Part1Source T1
    WHERE (T1.a > S2.a)))
    ORDER BY S2.b DESC) b
    FROM Part1Source S1) X
    WHERE X.b IS NOT NULL

  28. roarfred says:

    davekw, it still seems like there is a problem if you use the following values:
    INSERT INTO Part1Source VALUES (1,2)
    INSERT INTO Part1Source VALUES (3,2)
    INSERT INTO Part1Source VALUES (3,4)
    INSERT INTO Part1Source VALUES (5,4)
    INSERT INTO Part1Source VALUES (5,6)
    INSERT INTO Part1Source VALUES (7,6)

    Even worse, my attempt gives only two values from this one :( There is clearely a problem with determining the consequence of choosing one entry over the other

  29. davekw says:

    Hehe, back to the drawing board for both of us. I think I’ll go for a new approach. I did like your idea of a Block Count – maybe there is something there?

  30. davekw says:

    Looking at it again… I think my solution was almost correct, except that it sometimes returned duplicate “b” values. If I’m right then the solution is to group by “b” (see below).

    Or maybe I’m wrong???

    Anyway… here it is:

    INSERT INTO Part1Dest
    SELECT MIN(a) AS a, b
    FROM (
    SELECT DISTINCT S1.a,
    ( SELECT TOP 1 S2.b
    FROM Part1Source S2
    WHERE (S1.a = S2.a)
    AND (S2.b NOT IN (
    SELECT
    ( SELECT TOP 1 T2.b
    FROM Part1Source T2
    WHERE (T1.a = T2.a)
    ORDER BY T2.b DESC
    )
    FROM Part1Source T1
    WHERE (T1.a > S2.a)
    )
    )
    ORDER BY S2.b DESC
    ) b
    FROM Part1Source S1
    ) X
    WHERE X.b IS NOT NULL
    GROUP BY b
    ORDER BY a

  31. LZ007 says:

    Here is a solution for part1:

    ———
    insert into Part1Dest
    select a, (1- sign(a-1))*min(b)+sign(a-1)*max(b)
    from part1source
    where a <10
    group by a

    ——-

    And herer is a solution for part2
    ——-
    insert into Part2Dest
    select a, (1- sign(a-1))*min(b)+sign(a-1)*max(b),(1- sign(a-1))*min(c)+sign(a-1)*max(c)
    from Part2Source
    where a !=7
    group by a

    ——-

  32. scott2718281828 says:

    Here’s a pair of SQL 2005-only solutions:

    INSERT INTO Part1Dest
    SELECT a, b
    FROM (
    SELECT p.a, p.b,
    ROW_NUMBER() OVER (PARTITION BY p.a ORDER BY na + nb) ra,
    ROW_NUMBER() OVER (PARTITION BY p.b ORDER BY na + nb) rb
    FROM Part2Source p
    INNER JOIN (SELECT a, COUNT(*) as na FROM Part2Source GROUP BY a) AS ac ON ac.a = p.a
    INNER JOIN (SELECT b, COUNT(*) as nb FROM Part2Source GROUP BY b) AS bc ON bc.b = p.b
    ) x
    WHERE ra = 1 AND rb = 1

    INSERT INTO Part2Dest
    SELECT a, b, c
    FROM (
    SELECT p.a, p.b, p.c,
    ROW_NUMBER() OVER (PARTITION BY p.a ORDER BY na + nb + nc) ra,
    ROW_NUMBER() OVER (PARTITION BY p.b ORDER BY na + nb + nc) rb,
    ROW_NUMBER() OVER (PARTITION BY p.c ORDER BY na + nb + nc) rc
    FROM Part2Source p
    INNER JOIN (SELECT a, COUNT(*) as na FROM Part2Source GROUP BY a) AS ac ON ac.a = p.a
    INNER JOIN (SELECT b, COUNT(*) as nb FROM Part2Source GROUP BY b) AS bc ON bc.b = p.b
    INNER JOIN (SELECT c, COUNT(*) as nc FROM Part2Source GROUP BY c) AS cc ON cc.c = p.c
    ) x
    WHERE ra = 1 AND rb = 1 AND rc = 1

  33. im_bob_loucans says:

    Here’s a recursive approach to part one. I fried my brain to see if there was a way to do this recursively without mulitple insert executions. but syntactically its still one insert.

    IF EXISTS (SELECT *
    FROM sysobjects
    WHERE name = ‘inserter’
    AND TYPE = ‘P’)
    BEGIN
    DROP PROC inserter
    END

    GO

    CREATE PROC inserter
    AS
    BEGIN
    DECLARE @a INT,
    @b INT

    SELECT TOP 1 @a = a,
    @b = b
    FROM part1source
    WHERE a NOT IN (SELECT DISTINCT a
    FROM part1dest)
    AND b NOT IN (SELECT DISTINCT b
    FROM part1dest)
    ORDER BY a DESC,
    b DESC

    IF @a IS NOT NULL
    AND @b IS NOT NULL
    BEGIN
    INSERT INTO part1dest
    VALUES (@a,
    @b)

    EXEC inserter
    END
    END
    GO

    EXEC inserter
    GO

  34. Steve S says:

    scott217…. try your part one solution with this data:
    INSERT INTO @Part1Source VALUES (0, 1)
    INSERT INTO @Part1Source VALUES (1, 0)
    INSERT INTO @Part1Source VALUES (1, 1)
    INSERT INTO @Part1Source VALUES (2, 0)
    INSERT INTO @Part1Source VALUES (0, 2)

  35. Marx says:

    How about this below insert statement. Optionally you can use MAX instead of MIN. Now, i read, the requirement is get ‘as much as data’ into Part1Dest and NOT ALL of them necessarily :) .

    INSERT INTO Part1Dest
    SELECT MIN(a)’a', b
    FROM (SELECT a,MIN(b)’b’ FROM Part1Source GROUP BY a) a
    GROUP BY b

    Marx

  36. Lee says:

    /* I didn’t see any rule about not using dynamic T-SQL… */

    DECLARE @sql VARCHAR (8000)
    , @delim VARCHAR (10)

    SELECT @sql = ‘INSERT INTO Part1Dest’
    , @delim = ”

    SELECT @sql = @sql
    + CASE
    WHEN @sql LIKE ‘% ‘ + CONVERT (VARCHAR, a) + ‘,%’
    OR @sql LIKE ‘%, ‘ + CONVERT (VARCHAR, b) + ‘%’
    THEN ”
    ELSE @delim
    END
    + CASE
    WHEN @sql LIKE ‘% ‘ + CONVERT (VARCHAR, a) + ‘,%’
    OR @sql LIKE ‘%, ‘ + CONVERT (VARCHAR, b) + ‘%’
    THEN ”
    ELSE ‘ SELECT ‘ + CONVERT (VARCHAR, a)
    + ‘, ‘ + CONVERT (VARCHAR, b)
    END
    , @delim = ‘ UNION ALL’
    FROM Part1Source
    ORDER BY a
    , b

    EXEC (@sql)

  37. puzsol says:

    I like the way LZ007 managed to hide his real code, using:

    insert into Part1Dest
    select a, (1- sign(a-1))*min(b)+sign(a-1)*max(b)
    from part1source
    where a <10
    group by a

    Rather than:

    insert into Part1Dest
    select a, case when a = 1 then min(b) else max(b) end
    from part1source
    where a < 10
    group by a

    Which is much more readable by us mere mortals…. lol

  38. davekw says:

    Also, LZ007′s solution doesn’t guarantee uniqueness. Try some of the additional test cases discussed above. E.g. add (3, 2) (3, 3) (3, 4) to the original Part 1 test data.

  39. Lee says:

    /* the same approach works for the second part of the puzzle */

    DECLARE @sql VARCHAR (8000)
    , @delim VARCHAR (10)

    SELECT @sql = ‘INSERT INTO Part2Dest’
    , @delim = ”

    SELECT @sql = @sql
    + CASE
    WHEN @sql LIKE ‘%SELECT ‘ + CONVERT (VARCHAR, a) + ‘,%’
    OR @sql LIKE ‘%, ‘ + CONVERT (VARCHAR, b) + ‘, %’
    OR @sql LIKE ‘%, ‘ + CONVERT (VARCHAR, c) + ‘[^,]%’
    THEN ”
    ELSE @delim
    END
    + CASE
    WHEN @sql LIKE ‘%SELECT ‘ + CONVERT (VARCHAR, a) + ‘,%’
    OR @sql LIKE ‘%, ‘ + CONVERT (VARCHAR, b) + ‘, %’
    OR @sql LIKE ‘%, ‘ + CONVERT (VARCHAR, c) + ‘[^,]%’
    THEN ”
    ELSE ‘ SELECT ‘ + CONVERT (VARCHAR, a)
    + ‘, ‘ + CONVERT (VARCHAR, b)
    + ‘, ‘ + CONVERT (VARCHAR, c)
    END
    , @delim = ‘ UNION ALL’
    FROM Part2Source
    ORDER BY a
    , b

    EXEC (@sql)

  40. bhushanvinay says:

    /****** Object: Table [dbo].[Table_1_dest] Script Date: 11/02/2007 14:42:44 ******/
    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N’[dbo].[Table_1_dest]‘) AND type in (N’U'))
    DROP TABLE [dbo].[Table_1_dest]

    /****** Object: Table [dbo].[Table_1_src] Script Date: 11/02/2007 14:42:32 ******/
    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N’[dbo].[Table_1_src]‘) AND type in (N’U'))
    DROP TABLE [dbo].[Table_1_src]

    /****** Object: Table [dbo].[Table_1_src] Script Date: 11/02/2007 14:43:08 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[Table_1_src](
    [a] [int] NOT NULL,
    [b] [int] NOT NULL,
    CONSTRAINT [PK_Table_1_src] PRIMARY KEY CLUSTERED
    (
    [a] ASC,
    [b] ASC
    )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[Table_1_dest](
    [a] [int] NOT NULL,
    [b] [int] NOT NULL
    ) ON [PRIMARY]

    GO
    ALTER TABLE [dbo].[Table_1_dest]
    WITH CHECK ADD CONSTRAINT [FK_Table_1_dest_Table_1_src]
    FOREIGN KEY([a], [b])
    REFERENCES [dbo].[Table_1_src] ([a], [b])
    GO
    ALTER TABLE [dbo].[Table_1_dest]
    CHECK CONSTRAINT [FK_Table_1_dest_Table_1_src]

    GO

    DELETE FROM TABLE_1_DEST;
    DELETE FROM TABLE_1_SRC;
    GO
    INSERT INTO TABLE_1_SRC VALUES (1,1)
    INSERT INTO TABLE_1_SRC VALUES (1,2)
    INSERT INTO TABLE_1_SRC VALUES (2,3)
    INSERT INTO TABLE_1_SRC VALUES (7,2)
    INSERT INTO TABLE_1_SRC VALUES (2,4)
    INSERT INTO TABLE_1_SRC VALUES (5,5)
    INSERT INTO TABLE_1_SRC VALUES (5,1)
    INSERT INTO TABLE_1_SRC VALUES (5,3)
    INSERT INTO TABLE_1_SRC VALUES (9,0)
    INSERT INTO TABLE_1_SRC VALUES (11,2)
    go

    INSERT INTO TABLE_1_DEST ( a,b)

    SELECT SOURCE.a, SOURCE.b
    from TABLE_1_SRC as source

  41. Lee says:

    Oops. My posted solution to #2 isn’t ideal, as coded. It gets more hits with ORDER BY DESC for each variable. I can see I didn’t really understand this problem, earlier. There is probably an optimum order to insert the input rows, and the trick is to come up with the SELECT statement that performs the best analysis. Good puzzle! Wish I had time to try again.

  42. KenJ says:

    Unlike the illustrious Mr. Factor, I decided to tackle part 1 first because it looked easier (I’m rather simple in the T-SQL department).

    My first thought was something iterative, but that’s already been well addressed. I’m also not a huge fan of the more, uh, flexible ways of executing SQL statements, so sp_executeSQL / EXEC was out.

    The following works with the puzzle’s sample, as well as the alternate data set provided by Steve S, so I hope it’s a general purpose solution.

    I used a view to keep things short, so it won’t pass the schema manipulation criterion proposed by one commenter. I guess you ‘could’ reproduce the view’s code everywhere the view is referenced, but that is just too much maintenance joy for me (as it could, no doubt, already use some maintenance).

    Well, here goes. I have this nagging feeling that I’ve missed something obvious, so let me know how you would improve it:

    --- we'll need this query two times, no, thrice
    
    --- let's use a view to stay concise
    CREATE VIEW v1 AS
    --- get one 'a' for every 'b'
    --- but this allows dupe 'a's, you'll see
      
    SELECT t0.at0.b
      
    FROM (SELECT MAX(aAS ab
            
    FROM Part1Source AS p1
            
    GROUP BY bAS t0
    --- we'll do a join, one 'b' per 'a'
         
    INNER JOIN (SELECT aMAX(b)  AS B
                     
    FROM Part1Source AS p2
                     
    GROUP BY aAS t1
    --- then in our ON, an AND we'll play
         
    ON t0.a t1.a AND t0.b t1.b
    --- this ensures our unique 'b's
    --- have unique 'a's, constraints to please
    GO
     
    INSERT Part1Dest (ab)
    --- we've cleaned up dupes, let's find the rest
    SELECT MAX(p1.aAS ap1.b
    FROM Part1Source AS p1
    --- to prevent more, we'll make a test
    WHERE
    --- we don't want 'a's we've seen before
       
    p1.a NOT IN (SELECT FROM v1)
    --- and extra 'b's, we must abhor
       
    AND p1.b NOT IN (SELECT FROM v1)
    GROUP BY p1.b
    UNION
    --- then cleaned up dupes, let's not forget
    SELECT aFROM v1
    --- now we should have a complete set
    GO
     
    SELECT AS AFROM Part1Dest
    GO
     
    DELETE Part1Dest
    GO
     
    DROP VIEW v1
    GO
  43. KenJ says:

    And, I obviously have no idea how to post formatted code in a comment. My apologies for the aberration that was once my beautifully formatted query.

    — we’ll need this query two times, no, thrice
    — let’s use a view to stay concise
    CREATE VIEW v1 AS
    — get one ‘a’ for every ‘b’
    — but this allows dupe ‘a’s, you’ll see
    SELECT t0.a, t0.b
    FROM (SELECT MAX(a) AS a, b
    FROM Part1Source AS p1
    GROUP BY b) AS t0
    — we’ll do a join, one ‘b’ per ‘a’
    INNER JOIN (SELECT a, MAX(b) AS B
    FROM Part1Source AS p2
    GROUP BY a) AS t1
    — then in our ON, an AND we’ll play
    ON t0.a = t1.a AND t0.b = t1.b
    — this ensures our unique ‘b’s
    — have unique ‘a’s, constraints to please
    GO

    INSERT Part1Dest (a, b)
    — we’ve cleaned up dupes, let’s find the rest
    SELECT MAX(p1.a) AS a, p1.b
    FROM Part1Source AS p1
    — to prevent more, we’ll make a test
    WHERE
    — we don’t want ‘a’s we’ve seen before
    p1.a NOT IN (SELECT a FROM v1)
    — and extra ‘b’s, we must abhor
    AND p1.b NOT IN (SELECT b FROM v1)
    GROUP BY p1.b
    UNION
    — then cleaned up dupes, let’s not forget
    SELECT a, b FROM v1
    — now we should have a complete set
    GO

    SELECT a AS A, b FROM Part1Dest
    GO

    DELETE Part1Dest
    GO

    DROP VIEW v1
    GO

  44. davekw says:

    Sorry to have to tell you this KenJ, but your solution doesn’t guarantee uniqueness. The following combination breaks: (2,1) (2,2) (3,2) (3,3)

  45. miro65 says:

    I found solution from scott2718281828 very elegant. So I was interested, if it general, I mean, if it still produce optimal solution if we change data. But the answer is no. If we omit data –INSERT INTO Part2Source VALUES (11,2,9), we get only 3 rows as solution. That is not OK, because if we omit 1 row in source, we should lose max 1 row in solution.
    Some idea how to generalize it?
    LP

  46. TonyBater says:

    Actually none of these solutions will work. The task was to MOVE as much data from the source tabes to the destination tables, implying that once inserted into the destination the data is deleted from the source. As there is a foreign key relationship from the destination into the source, it is not possible to ‘move’ any data from source to destination!

  47. KenJ says:

    Thanks, davekw. I figured there was something fundamentally wrong with the query :)

  48. HAL J. says:

    while 1=1 begin
    insert Part1Dest
    select top 1 *
    from Part1Source a
    where not exists( select * from Part1Dest b where b.a = a.a or a.b = b.b )
    if @@rowcount = 0 break
    end
    – and for part 2 you’d just expand the criteria to a.c = b.c
    – I figured “Move” meant “Copy”, otherwise chuck this too into the ether
    – some of you definitely win the Rube Goldberg award for T-SQL

  49. chris.leonard says:

    Unlike Phil, I started with the simple one, and unlike most all of you, I’m just too darned lazy to do a SQL 2000 solution. Well, today anyway. :o )

    Here is a SQL 2005 solution that should work for any data entered. I’m not saying the code can’t be improved upon, just that the solution seems to work, and is written from the ground up to accommodate any data.

    WITH cte AS (
    SELECT
    ps.a,
    ps.b,
    1 AS LEVEL,
    ‘.’ + CAST(ROW_NUMBER() OVER (ORDER BY a ) AS VARCHAR(MAX)) + ‘.’ AS RowID,
    ‘.’ + CAST(ps.a AS VARCHAR(MAX)) + ‘.’ AS APATH,
    ‘.’ + CAST(ps.b AS VARCHAR(MAX)) + ‘.’ AS BPATH
    FROM [dbo].[Part1Source] AS ps

    UNION ALL

    SELECT
    ps2.a,
    ps2.b,
    cte.level + 1 AS LEVEL,
    cte.RowID + CAST(ROW_NUMBER() OVER (ORDER BY ps2.a) AS VARCHAR(MAX)) + ‘.’ AS RowID,
    cte.APATH + CAST(ps2.a AS VARCHAR(MAX)) + ‘.’ AS APATH,
    cte.BPATH + CAST(ps2.b AS VARCHAR(MAX)) + ‘.’ AS BPATH
    FROM [dbo].[Part1Source] AS ps2
    JOIN cte
    ON cte.APATH NOT LIKE ‘%.’ + CAST(ps2.a AS VARCHAR(MAX)) + ‘.%’
    AND cte.BPATH NOT LIKE ‘%.’ + CAST(ps2.b AS VARCHAR(MAX)) + ‘.%’
    )
    INSERT INTO [dbo].[Part1Dest] (a,b)
    SELECT a,b
    FROM cte
    WHERE (SELECT TOP 1 RowID FROM cte ORDER BY LEVEL DESC) LIKE RowID + ‘%’
    ORDER BY a, b
    OPTION (MAXRECURSION 0)

    If you want to see how it works, replace the “INSERT INTO [dbo].[Part1Dest] (a,b) SELECT a,b” part with “SELECT *” and you should be able to work it out from there.

    Here is the same idea for part 2:

    WITH cte AS (
    SELECT
    ps.a,
    ps.b,
    ps.c,
    1 AS LEVEL,
    ‘.’ + CAST(ROW_NUMBER() OVER (ORDER BY a ) AS VARCHAR(MAX)) + ‘.’ AS RowID,
    ‘.’ + CAST(ps.a AS VARCHAR(MAX)) + ‘.’ AS APATH,
    ‘.’ + CAST(ps.b AS VARCHAR(MAX)) + ‘.’ AS BPATH,
    ‘.’ + CAST(ps.c AS VARCHAR(MAX)) + ‘.’ AS CPATH
    FROM [dbo].[Part2Source] AS ps

    UNION ALL

    SELECT
    ps2.a,
    ps2.b,
    ps2.c,
    cte.level + 1 AS LEVEL,
    cte.RowID + CAST(ROW_NUMBER() OVER (ORDER BY ps2.a) AS VARCHAR(MAX)) + ‘.’ AS RowID,
    cte.APATH + CAST(ps2.a AS VARCHAR(MAX)) + ‘.’ AS APATH,
    cte.BPATH + CAST(ps2.b AS VARCHAR(MAX)) + ‘.’ AS BPATH,
    cte.CPATH + CAST(ps2.c AS VARCHAR(MAX)) + ‘.’ AS CPATH
    FROM [dbo].[Part2Source] AS ps2
    JOIN cte
    ON cte.apath NOT LIKE ‘%.’ + CAST(ps2.a AS VARCHAR(MAX)) + ‘.%’
    AND cte.bpath NOT LIKE ‘%.’ + CAST(ps2.b AS VARCHAR(MAX)) + ‘.%’
    AND cte.cpath NOT LIKE ‘%.’ + CAST(ps2.c AS VARCHAR(MAX)) + ‘.%’
    )
    INSERT INTO Part2Dest (a,b,c)
    SELECT a,b,c
    FROM cte
    WHERE (SELECT TOP 1 RowID FROM cte ORDER BY LEVEL DESC) LIKE RowID + ‘%’
    ORDER BY a, b, c
    OPTION (MAXRECURSION 0)

    Cheers,
    Chris Leonard
    databaseguy.com / godaddy.com / [sometimes] microsoft.com

  50. Ryan Randall says:

    [code]
    --part 1
    DECLARE @i INT; SELECT @i = POWER(2, COUNT(*)) FROM Part1Source
    ; WITH
    t (i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM t WHERE i < @i-1),
    u AS (SELECT a, b, ROW_NUMBER() OVER (ORDER BY a)-1 AS c FROM Part1Source),
    v AS (SELECT * FROM t CROSS JOIN u WHERE POWER(2, u.c) & t.i > 0)
    INSERT Part1Dest
    SELECT a, b FROM v WHERE i = (SELECT TOP 1 i FROM v GROUP BY i
    HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b) ORDER BY COUNT(*) DESC)
    OPTION (MAXRECURSION 0)

    --part 2
    DECLARE @j INT; SELECT @j = POWER(2, COUNT(*)) FROM Part2Source
    ; WITH
    t (i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM t WHERE i < @j-1),
    u AS (SELECT *, ROW_NUMBER() OVER (ORDER BY a)-1 AS d FROM Part2Source),
    v AS (SELECT * FROM t CROSS JOIN u WHERE POWER(2, u.d) & t.i > 0)
    INSERT Part2Dest
    SELECT a, b, c FROM v WHERE i = (SELECT TOP 1 i FROM v GROUP BY i
    HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b)
    AND COUNT(*) = COUNT(DISTINCT c) ORDER BY COUNT(*) DESC)
    OPTION (MAXRECURSION 0)

    --results
    SELECT * FROM Part1Dest
    SELECT * FROM Part2Dest

    /*
    a b
    ----------- -----------
    1 1
    2 4
    5 3
    7 2
    9 0

    a b c
    ----------- ----------- -----------
    1 1 2
    2 4 7
    5 3 4
    9 0 1
    11 2 9
    */
    [/code]

  51. Jack the DBA says:

    This appears to work. I get 5 rows and it will work on 2005 or 2000. For a 2005 solution I am sure I could make it simpler using CTE’s and probably the INTERSECT and EXCEPT operators

    insert into part1dest
    Select
    T.*
    From
    part1source T
    Where
    T.a Not in (
    Select
    A.a
    From
    (Select
    A.*
    From
    part1source A
    Where
    A.a Not In (Select a From part1source B group By a having count(*) > 1)
    Union
    Select
    A.*
    From
    part1source A
    Where
    A.b Not In (Select b From part1source B group By b having count(*) > 1)) A
    Where
    A.a Not In
    (Select
    Min(a)
    From
    (Select
    A.*
    From
    part1source A
    Where
    A.a Not In (Select a From part1source B group By a having count(*) > 1)
    Union
    Select
    A.*
    From
    part1source A
    Where
    A.b Not In (Select b From part1source B group By b having count(*) > 1)) U
    Group By
    b
    Having Count(*) > 1)) And
    T.b not in (Select
    A.a
    From
    (Select
    A.*
    From
    part1source A
    Where
    A.a Not In (Select a From part1source B group By a having count(*) > 1)
    Union
    Select
    A.*
    From
    part1source A
    Where
    A.b Not In (Select b From part1source B group By b having count(*) > 1)) A
    Where
    A.a Not In
    (Select
    Min(a)
    From
    (Select
    A.*
    From
    part1source A
    Where
    A.a Not In (Select a From part1source B group By a having count(*) > 1)
    Union
    Select
    A.*
    From
    part1source A
    Where
    A.b Not In (Select b From part1source B group By b having count(*) > 1)) U
    Group By
    b
    Having Count(*) > 1))
    Union
    Select
    A.*
    From
    (Select
    A.*
    From
    part1source A
    Where
    A.a Not In (Select a From part1source B group By a having count(*) > 1)
    Union
    Select
    A.*
    From
    part1source A
    Where
    A.b Not In (Select b From part1source B group By b having count(*) > 1)) A
    Where
    A.a Not In
    (Select
    Min(a)
    From
    (Select
    A.*
    From
    part1source A
    Where
    A.a Not In (Select a From part1source B group By a having count(*) > 1)
    Union
    Select
    A.*
    From
    part1source A
    Where
    A.b Not In (Select b From part1source B group By b having count(*) > 1)) U
    Group By
    b
    Having Count(*) > 1)

  52. dbabob says:

    I may be mistaken, but I this this works.

    INSERT INTO part1dest
    SELECT MAX(a) AS a,b
    FROM (
    SELECT a,MAX(b) AS b
    FROM part1source
    GROUP BY a) AS c
    GROUP BY b

    INSERT INTO part2dest
    SELECT MAX(a) AS a, MAX(b) AS b, c FROM (
    SELECT MAX(a) AS a,b,MAX(c) AS c from
    (SELECT a,MAX(b) AS b,MAX(c) AS c
    FROM part2source
    GROUP BY a) AS a
    GROUP BY b) AS c
    GROUP BY c

  53. dbabob says:

    Forget that…. Missed a few values :(

  54. Dean says:

    Jack the DBA, Try this:

    INSERT INTO Part1Source VALUES (1,2)
    INSERT INTO Part1Source VALUES (1,3)

  55. Dean says:

    For all of you who think you have the solution, what do you get if you try this test case?

    INSERT INTO Part1Source VALUES (1,2)
    INSERT INTO Part1Source VALUES (8,9)
    INSERT INTO Part1Source VALUES (1,4)
    INSERT INTO Part1Source VALUES (5,2)
    INSERT INTO Part1Source VALUES (8,6)
    INSERT INTO Part1Source VALUES (7,9)

    If you get less than 4 rows, then it’s time to rethink. Remember, Lionel said, “as much of the data as you can”.

    Chris Leonard, it seems you didn’t overlook this part of the puzzle. Your SQL appears to generate every possible valid combination of rows, and then selects the largest. Nice.

  56. Ryan Randall says:

    –data
    DELETE FROM Part1Dest
    DELETE FROM Part1Source

    INSERT INTO Part1Source VALUES (1,2)
    INSERT INTO Part1Source VALUES (8,9)
    INSERT INTO Part1Source VALUES (1,4)
    INSERT INTO Part1Source VALUES (5,2)
    INSERT INTO Part1Source VALUES (8,6)
    INSERT INTO Part1Source VALUES (7,9)

    –part 1
    DECLARE @i INT; SELECT @i = POWER(2, COUNT(*)) FROM Part1Source
    ; WITH
    t (i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM t WHERE i < @i-1),
    u AS (SELECT a, b, ROW_NUMBER() OVER (ORDER BY a)-1 AS c FROM Part1Source),
    v AS (SELECT * FROM t CROSS JOIN u WHERE POWER(2, u.c) & t.i > 0)
    INSERT Part1Dest
    SELECT a, b FROM v WHERE i = (SELECT TOP 1 i FROM v GROUP BY i
    HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b) ORDER BY COUNT(*) DESC)
    OPTION (MAXRECURSION 0)

    –results
    SELECT * FROM Part1Dest

    /*
    a b
    ———– ———–
    1 4
    5 2
    7 9
    8 6
    */

  57. ralphcroce_philly says:

    Lionel,

    Here’s my shot at puzzle 7, 1st part (the part with 2 fields). Done using SS2000, so obviously works for SS2005 also. Tomorrow I will throw in my shot at the 2nd part (3 fields). Will throw in some comments to attempt at making it a bit intelligible, asap. BTW: your puzzles are pretty awesome….are you sure you are from this planet, originally? Amazing stuff. Ok, here’s my shot at puzzle 7, 1st section. With your specified data records as source data, this generates 5 inserts.

    SELECT MIN(a) AS a, b FROM
    (
    SELECT a, MIN(b) AS b FROM
    (
    SELECT x.a, x.b FROM
    (
    SELECT s1.a, s1.b, COUNT(*) AS Ct FROM part1source s1
    INNER JOIN part1source s2 ON s1.a = s2.a
    INNER JOIN part1source s3 ON s1.b = s3.b
    GROUP BY s1.a, s1.b
    ) x
    INNER JOIN
    (
    SELECT a, MIN(Ct) AS MinCt FROM
    (
    SELECT s1.a, COUNT(*) AS Ct FROM part1source s1
    INNER JOIN part1source s2 ON s1.a = s2.a
    INNER JOIN part1source s3 ON s1.b = s3.b
    GROUP BY s1.a, s1.b
    ) a1
    GROUP BY a
    ) y
    ON x.a = y.a AND x.Ct = y.MinCt
    INNER JOIN
    (
    SELECT b, MIN(Ct) AS MinCt FROM
    (
    SELECT s1.b, COUNT(*) AS Ct FROM part1source s1
    INNER JOIN part1source s2 ON s1.a = s2.a
    INNER JOIN part1source s3 ON s1.b = s3.b
    GROUP BY s1.a, s1.b
    ) b1
    GROUP BY b
    ) z
    ON x.b = z.b AND x.Ct = z.MinCt
    ) am
    GROUP BY a
    ) bm
    GROUP BY b

    The 3-field version is coming later, along with comments to explain the logic. Thanks.

  58. orcus says:

    be nice to use a table variable rather than a temporary but in 2000…
    obviously a third set of predicates into the temporary and a longer concat for three columns

    Declare @reason as sysname

    –list of possible preceding inserts
    Select cast(A.A as sysname) + ‘,’ + cast(A.B as sysname) [Exclusion] ,cast(B.A as sysname) + ‘,’ + cast(B.B as sysname) [Reason]
    INTO #Constraints
    from part1source A,
    part1source B
    WHERE ((A.A = B.A AND B.B < A.B)
    OR (A.B = B.B AND B.A < A.A))

    – remove the preceding conflicts that refer to rows all ready excluded
    While 1=1 begin
    set @reason = ”
    select top 1 @reason = reason from #constraints where reason in (select exclusion from #constraints) Group By reason ORDER by count(*) desc
    IF @reason = ” BREAK
    Delete from #constraints where reason = @reason
    END

    INSERT part1dest
    select a,b
    FROM part1source
    WHERE cast(A as sysname) + ‘,’ + cast(B as sysname) NOT IN (Select Exclusion From #Constraints)

    Drop table #Constraints

  59. Ryan Randall says:

    ralphcroce_philly – I’m afraid your suggestion falls short with…

    INSERT INTO Part1Source VALUES (1,1)
    INSERT INTO Part1Source VALUES (1,2)
    INSERT INTO Part1Source VALUES (2,3)
    INSERT INTO Part1Source VALUES (7,2)
    INSERT INTO Part1Source VALUES (2,4)
    INSERT INTO Part1Source VALUES (5,5)
    INSERT INTO Part1Source VALUES (5,1)
    INSERT INTO Part1Source VALUES (5,3)
    INSERT INTO Part1Source VALUES (9,0)
    INSERT INTO Part1Source VALUES (11,2)

    INSERT INTO Part1Source VALUES (3,2)
    INSERT INTO Part1Source VALUES (3,3)
    INSERT INTO Part1Source VALUES (3,4)

  60. Ryan Randall says:

    orcus – I’m afraid your suggestion falls short with…

    INSERT INTO Part1Source VALUES (1,0)
    INSERT INTO Part1Source VALUES (1,1)
    INSERT INTO Part1Source VALUES (2,0)

  61. Ryan Randall says:

    HAL J. – I’m afraid your suggestion falls short with…

    INSERT INTO Part1Source VALUES (1,0)
    INSERT INTO Part1Source VALUES (1,1)
    INSERT INTO Part1Source VALUES (2,0)

  62. Ryan Randall says:

    GSquared – I’m afraid your suggestion fails with…

    INSERT INTO Part1Source VALUES (1,2)
    INSERT INTO Part1Source VALUES (3,2)
    INSERT INTO Part1Source VALUES (3,4)
    INSERT INTO Part1Source VALUES (5,4)
    INSERT INTO Part1Source VALUES (5,6)
    INSERT INTO Part1Source VALUES (7,6)

  63. Ryan Randall says:

    As far as I can tell (i.e. correct me if I’m wrong), the only solutions which work (according to my tests) are:

    Steve S’s
    chris.leonard’s (nicely done!)
    mine (the same basic idea as Steve S’s)

    I think that’s because the only way to be sure you have the most rows possible is to look at _all_ possibilities, and those are the only solutions which do that.

  64. ralphcroce_philly says:

    Ryan,
    Thanks for finding a problem in my solution. I threw it together in one session yesterday and should have been more cautious before submitting. I have a variation on my original approach that I’m going to try today and will submit that one if it works out.
    Some of the various submissions from you folks are pretty amazing.
    –ralphcroce_philly

  65. SeanZ says:

    —- Here is the solution for Part2. Solution for Part1 is simular but simpler.
    ALTER TABLE Part2Source ADD isSelected INT NOT NULL DEFAULT (0);
    DECLARE @a INT, @b INT, @c INT;
    WHILE (1 = 1)
    BEGIN
    SELECT @a = a, @b = b, @c = c
    FROM
    (
    SELECT L.a, L.b, L.c, cntr = COUNT(*)
    FROM Part2Source L INNER JOIN Part2Source R ON L.a = R.a OR L.b = R.b OR L.c = R.c
    WHERE L.isSelected = 0 AND R.isSelected = 0
    GROUP BY L.a, L.b, L.c
    ) AS T
    WHERE cntr =
    (
    SELECT MIN(cntr)
    FROM
    (
    SELECT L.a, L.b, L.c, cntr = COUNT(* )
    FROM Part2Source L INNER JOIN Part2Source R ON L.a = R.a OR L.b = R.b OR L.c = R.c
    WHERE L.isSelected = 0 AND R.isSelected = 0
    GROUP BY L.a, L.b, L.c
    ) AS T
    );

    IF @@ROWCOUNT > 0
    BEGIN
    UPDATE Part2Source
    SET isSelected = 1
    WHERE a = @a AND b = @b AND c = @c;
    DELETE Part2Source WHERE isSelected = 0 AND (a = @a OR b = @b OR c = @c);
    END
    ELSE
    BREAK;
    END;

    INSERT Part2Dest (a, b, c)
    SELECT a, b, c FROM Part2Source;

    SELECT * FROM Part2Dest;

    a b c
    1 1 2
    2 4 7
    5 5 6
    9 0 1
    11 2 9

  66. SeanZ says:

    —- This is a simplified version. Also it uses UPDATE Part2Source
    – instead of DELETE Part2Source to preserve original source data.
    —- Obviously, it works for both SQL 2000 and 2005.
    —- Here is the solution for Part2. Solution for Part1 is simular but simpler.
    ALTER TABLE Part2Source ADD isSelected INT NOT NULL DEFAULT (0);
    DECLARE @a INT, @b INT, @c INT;
    WHILE (1 = 1)
    BEGIN
    SELECT TOP 1 @a = L.a, @b = L.b, @c = L.c
    FROM Part2Source L INNER JOIN Part2Source R ON L.a = R.a OR L.b = R.b OR L.c = R.c
    WHERE L.isSelected = 0 AND R.isSelected = 0
    GROUP BY L.a, L.b, L.c
    ORDER BY COUNT(*) ASC

    IF @@ROWCOUNT > 0
    BEGIN
    UPDATE Part2Source
    SET isSelected = 1
    WHERE a = @a AND b = @b AND c = @c;
    UPDATE Part2Source
    SET isSelected = -1
    WHERE isSelected = 0 AND (a = @a OR b = @b OR c = @c);
    END
    ELSE
    BREAK;
    END;

    INSERT Part2Dest (a, b, c)
    SELECT a, b, c FROM Part2Source WHERE isSelected = 1;

    SELECT * FROM Part2Dest;

    a b c
    1 1 2
    2 4 7
    5 5 6
    9 0 1
    11 2 9

  67. chris.leonard says:


    Thanks to Ryan and Dean for saying nice things about my solution. It is interesting to me conceptually because it only requires two CTE definitions (which feels “clean”), and then off you’re off and running.

    Since people are being so kind, I thought I’d speak up and say something nasty about my solution (since nobody else has!). If I had to solve this problem for real, and supposing that the source table had, say, 35 rows in it, I would choose the Ryan / Steve S (call it “RSS”) approach over mine just because of the incredible consumption of resources my solution manages to consume. On my laptop, when there are 9 rows in the source table, it takes my solution about 40 seconds to solve the first problem while RSS is still sub-second in performance.

    Of course, neither solution is going to scale to large source data sets, but I was surprised by the inefficiency of my solution.

    Maybe we could have another puzzle that asks people to come up with the most poorly-performing, but reasonable-looking solution possible?

    This makes me wonder, though, what kind of response times I might get on a real server, using a 50-row input table? I don’t want to run it on my laptop because, well, I don’t want to burn up its CPU. On the other hand, I don’t want to run it on a server at work because, well, it’s always a moral victory to stay off the VPN on the weekend. :o )

  68. ralphcroce_philly says:

    Ryan,

    I noticed that in shooting down some solutions, you endorsed only 3 solutions, all of which work only on SS2005. Is there an implicit implication that this can only be solved recursively? With all due respect, I couldn’t live with that, because I had a strong belief that it was solvable using SS2000, and without using any looping constructs such as WHILE, etc., and all as one single statement.
    Accordingly, I went back to the drawing board, to prove that this was solvable in SS2000 using only one statement. My latest suggestion follows below (for the 2-field version). It would have been far shorter, f I had used a few views to encapsulate certain sections of code that are repeated identically in several spots; however, I suspected that using views might possibly violate the spirit of Lionel’s challenge, so I did the entire solution as one statement.

    Ryan (or anyone else), please feel free to throw any data you like at this SS2000-compatible effort.

    Over the next few days, I will submit the non-SQL description of the concept, using a few simple charts and a narrative; finally, if I get some time, I will submit a version that works for the 3-field version; the essential logic will be the same.
    –ralphcroce_philly

    INSERT INTO Part1Dest
    SELECT a, b FROM
    (
    SELECT p1.a, p1.b,
    SUM(
    CASE
    WHEN p1.tCt>p2.tCt THEN 1
    WHEN p1.tCt WHEN p1.aCt>p2.aCt THEN 1
    WHEN p1.aCt
    WHEN p1.bCt>p2.bCt THEN 1
    WHEN p1.bCt
    WHEN p1.a>p2.a THEN 1
    WHEN p1.a
    WHEN p1.b>p2.b THEN 1
    WHEN p1.b
    ELSE 0
    END
    ) + 1 AS RowNum
    FROM
    (
    SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
    INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
    INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
    ) p1
    INNER JOIN
    (
    SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
    INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
    INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
    ) p2
    ON p1.a <> p2.a OR p1.b <> p2.b
    GROUP BY p1.a, p1.b
    ) r
    WHERE r.RowNum IN
    (
    SELECT MIN(RowNum) As UsableRowNum FROM
    (

    SELECT px.RowNum, px.a, px.b,
    SUM(CASE WHEN px.a = py.a AND px.RowNum > py.RowNum THEN 1 ELSE 0 END) + 1 AS A_Occurrence,
    SUM(CASE WHEN px.b = py.b AND px.RowNum > py.RowNum THEN 1 ELSE 0 END) + 1 AS B_Occurrence
    FROM
    (
    SELECT p.a, p.b, r.RowNum FROM part1source p
    INNER JOIN
    (
    SELECT p1.a, p1.b,
    SUM(
    CASE
    WHEN p1.tCt>p2.tCt THEN 1
    WHEN p1.tCt WHEN p1.aCt>p2.aCt THEN 1
    WHEN p1.aCt
    WHEN p1.bCt>p2.bCt THEN 1
    WHEN p1.bCt
    WHEN p1.a>p2.a THEN 1
    WHEN p1.a
    WHEN p1.b>p2.b THEN 1
    WHEN p1.b
    ELSE 0
    END
    ) + 1 AS RowNum
    FROM
    (
    SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
    INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
    INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
    ) p1
    INNER JOIN
    (
    SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
    INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
    INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
    ) p2
    ON p1.a <> p2.a OR p1.b <> p2.b
    GROUP BY p1.a, p1.b
    ) r
    ON p.a = r.a AND p.b = r.b
    ) px
    INNER JOIN
    (
    SELECT p.a, p.b, r.RowNum FROM part1source p
    INNER JOIN
    (
    SELECT p1.a, p1.b,
    SUM(
    CASE
    WHEN p1.tCt>p2.tCt THEN 1
    WHEN p1.tCt
    WHEN p1.aCt>p2.aCt THEN 1
    WHEN p1.aCt
    WHEN p1.bCt>p2.bCt THEN 1
    WHEN p1.bCt
    WHEN p1.a>p2.a THEN 1
    WHEN p1.a
    WHEN p1.b>p2.b THEN 1
    WHEN p1.b
    ELSE 0
    END
    ) + 1 AS RowNum
    FROM
    (
    SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
    INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
    INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
    ) p1
    INNER JOIN
    (
    SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
    INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
    INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
    ) p2
    ON p1.a <> p2.a OR p1.b <> p2.b
    GROUP BY p1.a, p1.b
    ) r

    ON p.a = r.a AND p.b = r.b
    ) py
    ON px.RowNum <> py.RowNum
    GROUP BY
    px.RowNum, px.a, px.b
    ) po
    WHERE
    A_Occurrence = B_Occurrence
    GROUP BY a
    )

    –Thanks to Lionel for a very interesting challenge.
    –ralphcroce_philly

  69. Ryan Randall says:

    Hi ralphcroce_philly,

    > I noticed that in shooting down some solutions, you endorsed only 3 solutions, all of which work only on SS2005.

    That’s true.

    > Is there an implicit implication that this can only be solved recursively?

    No. I think you have a misconception here.

    Steve S’s solution doesn’t use any recursion (as far as I can see), and mine only uses recursion to generate a ‘numbers table’ without using any systems tables.

    > I had a strong belief that it was solvable using SS2000, and without using any looping constructs such as WHILE, etc., and all as one single statement.

    Yes. It is. It’s just messier (as Steve pointed out).

    Since you insist, here are my solutions rewritten for SQL 2000 and with only one statement each:

    –part 1
    INSERT dbo.Part1Dest
    SELECT * FROM dbo.Part1Source x
    WHERE
    POWER(2, (SELECT COUNT(*) FROM dbo.Part1Source
    WHERE CAST(a AS VARCHAR(MAX)) + ‘.’ + CAST(b AS VARCHAR(MAX)) < =
    CAST(x.a AS VARCHAR(MAX)) + '.' + CAST(x.b AS VARCHAR(MAX)))-1) &
    (SELECT TOP 1 i FROM
    (SELECT * FROM
    (SELECT i*256+j AS i
    FROM (SELECT DISTINCT number AS i FROM master.dbo.spt_values WHERE number BETWEEN 0 AND 255) a,
    (SELECT DISTINCT number AS j FROM master.dbo.spt_values WHERE number BETWEEN 0 AND 255) b
    WHERE i*256+j < (SELECT POWER(2, COUNT(*)) FROM dbo.Part1Source)) t,
    (SELECT *,
    (SELECT COUNT(*) FROM dbo.Part1Source
    WHERE CAST(a AS VARCHAR(10)) + '.' + CAST(b AS VARCHAR(10)) <=
    CAST(x.a AS VARCHAR(10)) + '.' + CAST(x.b AS VARCHAR(10)))-1 AS c
    FROM dbo.Part1Source x) u
    WHERE POWER(2, u.c) & t.i > 0) v
    GROUP BY i
    HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b)
    ORDER BY COUNT(*) DESC
    ) > 0

    –part 2
    INSERT dbo.Part2Dest
    SELECT * FROM dbo.Part2Source x
    WHERE
    POWER(2, (SELECT COUNT(*) FROM dbo.Part2Source
    WHERE CAST(a AS VARCHAR(MAX)) + ‘.’ + CAST(b AS VARCHAR(MAX)) < =
    CAST(x.a AS VARCHAR(MAX)) + '.' + CAST(x.b AS VARCHAR(MAX)))-1) &
    (SELECT TOP 1 i FROM
    (SELECT * FROM
    (SELECT i*256+j AS i
    FROM (SELECT DISTINCT number AS i FROM master.dbo.spt_values WHERE number BETWEEN 0 AND 255) a,
    (SELECT DISTINCT number AS j FROM master.dbo.spt_values WHERE number BETWEEN 0 AND 255) b
    WHERE i*256+j < (SELECT POWER(2, COUNT(*)) FROM dbo.Part2Source)) t,
    (SELECT *,
    (SELECT COUNT(*) FROM dbo.Part2Source
    WHERE CAST(a AS VARCHAR(10)) + '.' + CAST(b AS VARCHAR(10)) <=
    CAST(x.a AS VARCHAR(10)) + '.' + CAST(x.b AS VARCHAR(10)))-1 AS d
    FROM dbo.Part2Source x) u
    WHERE POWER(2, u.d) & t.i > 0) v
    GROUP BY i
    HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b) AND COUNT(*) = COUNT(DISTINCT c)
    ORDER BY COUNT(*) DESC
    ) > 0

  70. Ryan Randall says:

    ralphcroce_philly

    > Ryan (or anyone else), please feel free to throw any data you like at this SS2000-compatible effort.

    I’m afraid it seems to fails with…

    INSERT INTO Part1Source VALUES (1,2)
    INSERT INTO Part1Source VALUES (3,2)
    INSERT INTO Part1Source VALUES (3,4)
    INSERT INTO Part1Source VALUES (5,4)
    INSERT INTO Part1Source VALUES (5,6)
    INSERT INTO Part1Source VALUES (7,6)

  71. orcus says:

    As an ugly string building recursive stored proc in ss2000
    [Pinching off many previous solutions]

    CREATE PROC part1ins @IN AS VARCHAR(8000) AS
    DECLARE @A VARCHAR(10), @b VARCHAR(10)
    SET @A = ” SET @B = ”
    IF @IN is null BEGIN
    SELECT top 1 @A = ‘A’ + cast(A as varchar(10)) + ‘,’, @B = ‘B’ + cast(B as varchar(10)) + ‘|’ FROM (
    SELECT R.A, R.B FROM part1source E, part1source R
    WHERE ((E.A = R.A AND R.B <> E.B)
    OR (E.B = R.B AND R.A <> E.A))
    UNION ALL SELECT A, B FROM part1source ) [Exclusions]
    GROUP BY A, B ORDER BY count(*) ASC
    SET @IN = @A + @B
    END ELSE BEGIN
    SELECT top 1 @A = ‘A’ + cast(A as varchar(10)) + ‘,’, @B = ‘B’ + cast(B as varchar(10)) + ‘|’ FROM (
    SELECT R.A, R.B FROM part1source E, part1source R
    WHERE ((E.A = R.A AND R.B <> E.B)
    OR (E.B = R.B AND R.A <> E.A))
    UNION ALL SELECT A, B FROM part1source ) [Exclusions]
    WHERE @IN not like (‘%A’ + cast(A as varchar(10)) + ‘,%’)
    AND @IN not like (‘%B’ + cast(B as varchar(10)) + ‘|%’)
    GROUP BY A, B ORDER BY count(*) ASC
    SET @IN = @IN + @A + @B
    END IF @A = ” BEGIN
    SET @IN = REPLACE(REPLACE(REPLACE(REPLACE(substring(@IN,1,len(@IN)-1),’A',’ SELECT ‘),’,',’ [A],’),’B',”),’|', ‘ [B] ‘ + CHAR(13) + ‘UNION’) + ‘ [B]‘
    SET @IN = ‘INSERT part1dest’ + char(13) + ‘ ‘ + @IN
    exec(@IN)
    END ELSE EXEC part1ins @IN
    GO
    exec part1ins null

  72. orcus says:

    or even removing an obvious unded union

    CREATE PROC part1ins @IN AS VARCHAR(8000) AS
    DECLARE @A VARCHAR(10), @b VARCHAR(10)
    SET @A = ” SET @B = ”
    IF @IN is null BEGIN
    SELECT top 1 @A = ‘A’ + cast(R.A as varchar(10)) + ‘,’, @B = ‘B’ + cast(R.B as varchar(10)) + ‘|’
    FROM part1source E, part1source R WHERE E.A = R.A OR E.B = R.B
    GROUP BY R.A, R.B ORDER BY count(*) ASC
    SET @IN = @A + @B
    END ELSE BEGIN
    SELECT top 1 @A = ‘A’ + cast(R.A as varchar(10)) + ‘,’, @B = ‘B’ + cast(R.B as varchar(10)) + ‘|’
    FROM part1source E, part1source R WHERE ( E.A = R.A OR E.B = R.B )
    AND @IN not like (‘%A’ + cast(R.A as varchar(10)) + ‘,%’)
    AND @IN not like (‘%B’ + cast(R.B as varchar(10)) + ‘|%’)
    GROUP BY R.A, R.B ORDER BY count(*) ASC
    SET @IN = @IN + @A + @B
    END IF @A = ” BEGIN
    SET @IN = REPLACE(REPLACE(REPLACE(REPLACE(substring(@IN,1,len(@IN)-1),’A',’ SELECT ‘),’,',’ [A],’),’B',”),’|', ‘ [B] ‘ + CHAR(13) + ‘UNION’) + ‘ [B]‘
    SET @IN = ‘INSERT part1dest’ + char(13) + ‘ ‘ + @IN
    exec(@IN)
    END ELSE EXEC part1ins @IN
    GO
    exec part1ins null

  73. puzsol says:

    Here’s my summary so far….

    Phil’s is probably the only one that works on SQL2000 properly (Though Ryan’s is quite capable of being non cte as it is not truely recursive – more iterative)

    chris.leonard’s solution is very cute as it builds the whole solution tree in the one cte a feat I tried and failed (is that the only type of method of building a tree with a CTE? CTE’s seem to be aimed more at traversing a tree than building one.)

    Ryan randall’s is very cute as it builds the whole combination set (every possible permutation and combination), then neatly picks the ones that match the criteria – and selects the top one. It is by far the neatest

    I’m still claiming that it would be possible to code mine as select into statements, thereby meeting the strict ‘one insert’ rule, and it does work for all combinations (same category of building the solution tree).

    However you are right… all the solutions that ‘work’ can take a long time…
    I added four data sets:
    INSERT INTO Part1Source VALUES (12,4)
    INSERT INTO Part1Source VALUES (12,7)
    INSERT INTO Part1Source VALUES (13,6)
    INSERT INTO Part1Source VALUES (13,5)
    Chris’s version took 1:18, Ryans 0:03, Mine still sat at 0:00… but add another 15 entries and it starts to crawl to 10:00+ I guess that’s the problem of exploring an exponentially expanding solution space…

    I still think it would be possible to recode Phil’s original solution as dynamic-dynamic SQL to come up with a truely generic solution – which may end up being faster still – and I’m not sure if my solution couldn’t be sped up either.

    Anyway, far too much fun and diversion…. thanks for the myriad of creative solutions guys!

  74. puzsol says:

    Sorry missed Steve S’s version… which as far as I can see is the same as Ryan’s version… except it is better commented (and came first)… so the same comments apply.

  75. Ryan Randall says:

    > Phil’s is probably the only one that works on SQL2000 properly

    I guess you’ve missed the last few posts then?

  76. puzsol says:

    >I guess you’ve missed the last few posts then?

    ok, if you are referring to orcus’s solution, I did ignore it, as I had thought that a ranking solution could not guarantee finding the solution…

    However, I now that I’ve spent a bit more time on it, I think that the iterative (or recursive as written) method of picking the next available entry with the next least amount of matching records should work… but I claim that the one presented does not….

    [code]
    delete from part1source
    insert into part1source select 1,1
    insert into part1source select 1,2
    insert into part1source select 2,1
    insert into part1source select 2,3
    insert into part1source select 3,2
    insert into part1source select 3,3
    [/code]

    This only gets two values in the solution (1,2) (2,3)… it should get 3 (1,2), (2,1), (3,3)… the reason is that ALL the original values are included in the recursive ranking counts – and as they all start with a rank of 3, it is the internal SQL grouping order that prevails (a asc, b desc) – which causes the algorithm to select (2,3) instead of (2,1) etc… I’ll admit I had to muck around with the values to find this exception, but I could see it was there… so is there a way to recalculate the counts each loop based on only the values that remain (are available) – rather than using the same counts and culling which ones you see/use?

    If this issue can be resolved it should be a lot faster than the other solutions… O(n) rather than O(n^2)… (though I would note that depending on the algorithm; n can be the full row count or the min distinct count of a or b, so even there they are not all the same)

    If, on the other hand you are referring to turning your solution into 2000 compatible code… I did mention it was possible, and I didn’t vet your solution. Now that I look at your attempt, it is limited to 16 values (at least in the solution set, if not in the source set) 256*256 is only 2^16… you could expand it of course… as you could replace varchar(max) with varchar(8000)… but it’s not a self-expanding solution like your original.

  77. rmatty says:

    Here is the solutionfor part1…works only in 2005
    INSERT Part1Dest(A,B)
    SELECT A,B
    FROM
    (SELECT A,B,ROW_NUMBER() OVER(PARTITION BY B ORDER BY B) AS Rownum
    FROM
    (SELECT A,B
    FROM
    (SELECT A,B ,ROW_NUMBER() OVER(PARTITION BY A ORDER BY A) AS Rownum
    FROM Part1Source
    ) T
    WHERE Rownum = 1
    ) T1
    ) T2
    WHERE Rownum = 1
    UNION
    SELECT A,B
    FROM Part1Source
    WHERE A=B

  78. rmatty says:

    simpler than my previous version…can avoid one derived table…
    SELECT A,B
    FROM
    (SELECT A,B,ROW_NUMBER() OVER(PARTITION BY B ORDER BY B) AS Rownum
    FROM
    (SELECT A,B ,ROW_NUMBER() OVER(PARTITION BY A ORDER BY A) AS Rownum
    FROM Part1Source
    )T
    WHERE Rownum = 1
    )T1
    WHERE Rownum = 1
    UNION
    SELECT A,B
    FROM Part1Source
    WHERE A=B

  79. rmatty says:

    part 2….

    INSERT Part2Dest
    SELECT A,B,C
    FROM
    (SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY C ORDER BY C) AS Row
    FROM
    (SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY B ORDER BY B) AS Row
    FROM
    (SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY A ORDER BY A) AS Row
    FROM
    (SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY A,C ORDER BY C) AS Row
    FROM
    (SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY B,C ORDER BY B) AS Row
    FROM
    (SELECT A,B ,C,ROW_NUMBER() OVER(PARTITION BY A,B ORDER BY A) AS Row
    FROM Part2Source
    ) T
    WHERE Row = 1
    ) T1
    )T2
    WHERE Row = 1
    ) T3
    )T4
    WHERE Row = 1
    ) T5
    WHERE Row = 1
    ORDER BY A,B,C

  80. Ryan Randall says:

    rmatty – I’m afraid your suggestion for part 1 falls short with…

    INSERT INTO Part1Source VALUES (1,0)
    INSERT INTO Part1Source VALUES (1,1)
    INSERT INTO Part1Source VALUES (2,0)

  81. puzsol says:

    Not wanting to be a total party pooper, I figured out a (very minor) alteration to orcus’s solution that makes it work (at least for the set I provided…. can anyone provide proof for or against that it will/won’t work in all circumstances?):

    Just remove the entries from both tables in the sub-loops… eg:

    ALTER PROC part1ins @IN AS VARCHAR(8000) AS
    DECLARE @A VARCHAR(10), @b VARCHAR(10)
    SET @A = ” SET @B = ”
    IF @IN is null BEGIN
    SELECT top 1 @A = ‘A’ + cast(R.A as varchar(10)) + ‘,’, @B = ‘B’ + cast(R.B as varchar(10)) + ‘|’
    FROM part1source E, part1source R WHERE E.A = R.A OR E.B = R.B
    GROUP BY R.A, R.B ORDER BY count(*) ASC
    SET @IN = @A + @B
    END ELSE BEGIN
    SELECT top 1 @A = ‘A’ + cast(R.A as varchar(10)) + ‘,’, @B = ‘B’ + cast(R.B as varchar(10)) + ‘|’
    FROM part1source E, part1source R WHERE ( E.A = R.A OR E.B = R.B )
    AND @IN not like (‘%A’ + cast(R.A as varchar(10)) + ‘,%’)
    AND @IN not like (‘%B’ + cast(R.B as varchar(10)) + ‘|%’)
    AND @IN not like (‘%A’ + cast(E.A as varchar(10)) + ‘,%’)
    AND @IN not like (‘%B’ + cast(E.B as varchar(10)) + ‘|%’)
    GROUP BY R.A, R.B ORDER BY count(*) ASC
    SET @IN = @IN + @A + @B
    END IF @A = ” BEGIN
    SET @IN = REPLACE(REPLACE(REPLACE(REPLACE(substring(@IN,1,len(@IN)-1),’A',’ SELECT ‘),’,',’ [A],’),’B',”),’|', ‘ [B] ‘ + CHAR(13) + ‘UNION’) + ‘ [B]‘
    SET @IN = ‘INSERT part1dest’ + char(13) + ‘ ‘ + @IN
    exec(@IN)
    END ELSE EXEC part1ins @IN
    GO
    exec part1ins null

    Sorry rmatty, your solution looks similar to others that don’t work generically… I could be wrong but I can’t be bothered proving it right now… maybe later?

  82. Ryan Randall says:

    > Sorry rmatty, your solution looks similar to others that don’t work generically… I could be wrong but I can’t be bothered proving it right now… maybe later?

    lol – read the previous post!

    Just testing your modification now…

  83. rmatty says:

    if i remove union,i think it will work…
    SELECT A,B
    FROM
    (SELECT A,B,ROW_NUMBER() OVER(PARTITION BY B ORDER BY B) AS Rownum
    FROM
    (SELECT A,B ,ROW_NUMBER() OVER(PARTITION BY A ORDER BY A) AS Rownum
    FROM Part1Source
    )T
    WHERE Rownum = 1
    )T1
    WHERE Rownum = 1

  84. Ryan Randall says:

    rmatty – I’m afraid not. That only gives you 1 value (1, 0), when there should be 2 (1, 1) and (2, 0).

  85. Ryan Randall says:

    orcus, puzsol

    Well I couldn’t break it the revised code. This is an interesting and original idea, orcus – and not pinching off many previous solutions as you thought!

    As far as I can tell, it’s equivalent to the following code (which also passes all my simple tests, but might be easier to understand and therefore break)…

    DECLARE @t TABLE (a INT, b INT)
    DECLARE @u TABLE (a INT, b INT)

    INSERT @t SELECT * FROM dbo.Part1Source

    WHILE (SELECT COUNT(*) FROM @t) > 0
    BEGIN
    INSERT @u SELECT TOP 1 t.a, t.b FROM @t t, @t u WHERE t.a = u.a OR t.b = u.b GROUP BY t.a, t.b ORDER BY COUNT(*)
    DELETE t FROM @t t INNER JOIN @u u ON t.a = u.a OR t.b = u.b
    END
    SELECT * FROM @u ORDER BY a

    If this type of solution does work (and I’m very interested in a proof either way), it is by far the best approach, and quite a revelation to me. Very well done!

  86. orcus says:

    I hadn’t thought of string of inserted values, not like my row test. I pinched that bit.

    a) Iteratively without a procedure declaration
    b) no casting
    c) a not in table variable rather than a not like string

    Is much better, especially for a large numbers of rows.
    (or more complicated data types)

    Given that the only criteria for excluding a row is the other rows that conflict, the maximum number transferred is the same property as the least conflicting.

  87. puzsol says:

    What I find most interesting about this exercise is the number of different approaches and techniques that were used to solve the problem… and the willingness of people to borrow bits from others, enhance, prove etc.

    I guess it shows what can be achieved as a *group* rather than as a single person.

    Hope everyone else has enjoyed it as much as I have.

  88. keith.duaime says:

    This one seems to pass all the other data values that have been tried. The order by takes a while and would rather have a least() function, which could be created for the puzzle.

    TRUNCATE TABLE Part1Dest
    GO

    BEGIN
    DECLARE
    @rc INT,
    @a INT,
    @b INT

    SELECT TOP 1 @a = a, @b = b
    FROM Part1Source
    ORDER BY
    CASE
    WHEN (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a) < (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b) THEN
    (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a)
    ELSE
    (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b)
    END,
    CASE
    WHEN (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a) < (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b) THEN
    (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b)
    ELSE
    (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a)
    END
    SET @rc = @@ROWCOUNT
    WHILE @rc > 0
    BEGIN
    INSERT INTO Part1Dest(a,b)
    VALUES(@a,@b)

    SELECT TOP 1 @a = Part1Source.a, @b = Part1Source.b
    FROM Part1Source
    LEFT OUTER JOIN Part1Dest ON
    Part1Dest.a = Part1Source.a OR
    Part1Dest.b = Part1Source.b
    WHERE
    Part1Dest.a IS null
    ORDER BY
    CASE
    WHEN (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a) < (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b) THEN
    (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a)
    ELSE
    (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b)
    END,
    CASE
    WHEN (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a) < (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b) THEN
    (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b)
    ELSE
    (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a)
    END
    SET @rc = @@ROWCOUNT
    END
    END
    GO
    SELECT *
    FROM Part1Dest
    GO

  89. keith.duaime says:

    After a little more thought and didn’t like the convoluted order by statement this should work and is for the second one.

    TRUNCATE TABLE Part2Dest
    GO

    BEGIN
    DECLARE
    @rc INT,
    @a INT,
    @b INT,
    @c INT

    SELECT TOP 1 @a = Part2Source.a, @b = Part2Source.b, @c = Part2Source.c
    FROM Part2Source
    INNER JOIN (SELECT a, COUNT(*) cntA FROM Part2Source GROUP BY a) cntA ON
    cntA.a = Part2Source.a
    INNER JOIN (SELECT b, COUNT(*) cntB FROM Part2Source GROUP BY b) cntB ON
    cntB.b = Part2Source.b
    INNER JOIN (SELECT c, COUNT(*) cntC FROM Part2Source GROUP BY c) cntC ON
    cntC.c = Part2Source.c
    ORDER BY
    CASE
    WHEN cntA.cntA < = cntB.cntB AND cntA.cntA <= cntC.cntC THEN cnta.cntA
    WHEN cntB.cntB <= cntA.cntA AND cntB.cntB <= cntC.cntC THEN cntB.cntB
    ELSE cntC.cntC
    END,
    CASE
    WHEN cntA.cntA <= cntB.cntB AND cntA.cntA > cntC.cntC THEN cnta.cntA
    WHEN cntB.cntB < = cntA.cntA AND cntB.cntB > cntC.cntC THEN cntB.cntB
    ELSE cntC.cntC
    END,
    CASE
    WHEN cntA.cntA > cntB.cntB AND cntA.cntA > cntC.cntC THEN cnta.cntA
    WHEN cntB.cntB > cntA.cntA AND cntB.cntB > cntC.cntC THEN cntB.cntB
    ELSE cntC.cntC
    END
    SET @rc = @@ROWCOUNT
    WHILE @rc > 0
    BEGIN
    INSERT INTO Part2Dest(a,b,c)
    VALUES(@a,@b,@c)

    SELECT TOP 1 @a = Part2Source.a, @b = Part2Source.b, @c = Part2Source.c
    FROM Part2Source
    INNER JOIN (SELECT a, COUNT(*) cntA FROM Part2Source GROUP BY a) cntA ON
    cntA.a = Part2Source.a
    INNER JOIN (SELECT b, COUNT(*) cntB FROM Part2Source GROUP BY b) cntB ON
    cntB.b = Part2Source.b
    INNER JOIN (SELECT c, COUNT(*) cntC FROM Part2Source GROUP BY c) cntC ON
    cntC.c = Part2Source.c
    LEFT OUTER JOIN Part2Dest ON
    Part2Dest.a = Part2Source.a OR
    Part2Dest.b = Part2Source.b OR
    Part2Dest.c = Part2Source.c
    WHERE Part2Dest.a IS NULL
    ORDER BY
    CASE
    WHEN cntA.cntA < = cntB.cntB AND cntA.cntA <= cntC.cntC THEN cnta.cntA
    WHEN cntB.cntB <= cntA.cntA AND cntB.cntB <= cntC.cntC THEN cntB.cntB
    ELSE cntC.cntC
    END,
    CASE
    WHEN cntA.cntA <= cntB.cntB AND cntA.cntA > cntC.cntC THEN cnta.cntA
    WHEN cntB.cntB < = cntA.cntA AND cntB.cntB > cntC.cntC THEN cntB.cntB
    ELSE cntC.cntC
    END,
    CASE
    WHEN cntA.cntA > cntB.cntB AND cntA.cntA > cntC.cntC THEN cnta.cntA
    WHEN cntB.cntB > cntA.cntA AND cntB.cntB > cntC.cntC THEN cntB.cntB
    ELSE cntC.cntC
    END
    SET @rc = @@ROWCOUNT
    END
    END
    GO

    SELECT *
    FROM Part2Dest
    GO

  90. Celko says:

    WITH X (a, b, row_nbr) AS
    (SELECT a, b, ROW_NUMBER () OVER (ORDER BY a ASC, b DESC)
    FROM Part1Source)
    SELECT a, b, row_nbr
    FROM X AS P1
    WHERE NOT EXISTS
    (SELECT *
    FROM X AS P2
    WHERE P2.row_nbr < P1.row_nbr
    AND (P1.a = P2.a OR P1.b = P2.b));

    This should give you one valid subset. The idea is to sort and number the rows then look from the current position P1 at all the previous rows P2 to if one which repeats a value of a or b exists. That means we reject the current row. This gives me:

    a b row_nbr
    ===========
    1 2 1
    2 4 3
    5 5 5
    9 0 9

  91. mikegood says:

    Celko, this not your best effort. Correct Part 1 answer has 5 dest rows for the specified Part1Source population. One valid solution is
    1, 1
    2, 4
    5, 3
    7, 2
    9, 0

    For other populations it does worse. I’ve been using the alternate population below to help find flaws. The answer should have 4 dest rows, but the algorithm you posted above only finds two.
    1, 1
    1, 2
    2, 1
    2, 2
    2, 3
    2, 4
    3, 3
    3, 4
    4, 1

    One valid solution is
    1, 2
    2, 3
    3, 4
    4, 1

    Finally, for any source data population we should be able to re-run test with data in source cols a and b swapped, and still get the same number of dest rows in the answer. Your algorithm gets 4 rows for specified source population, but only 3 rows when data in cols is swapped, which is a sure indication of a logical flaw.

    All that said, I cannot determine if you’re onto something here, fundamentally, or not. Maybe with a little more work?

  92. mikegood says:

    keith.duaime,

    I think yours is only solution posted thus far that is correct and directly finds the best solution withouth using brute force to generate all possible solutions and choosing the best one. I arrived at similar solution (below) based on UNION of alternate groupings. But your approach based on JOINing alternate groupings is more concise & I like it better, especially for part 2.

    I’m pretty confident that this approach is by far the best performing of all the correct approaches posted so far.

    If anyone doubts the correctness of this approach, please post a source data population that will expose an error. I cannot find one, believe it is correct. FWIW, this approach also exactly matches the way that I attempt to solve these problems by hand.

    I’m not so sure that the single INSERT inside a loop meets the true spirit of the puzzle criteria? We only wrote one INSERT stmt, but multiple INSERTs will be executed. All of the other correct brute force solutions above execute only a single INSERT. I wonder if it’s possible to apply the fundamental (efficient) approach used here in non-looping single INSERT manner? I’ve tried all day long, both with and without recursive CTEs, but I’m not even close. I think I could do it if I could use aggregates and TOP in the recursive part of the CTE, but these are not allowed. I suspect it cannot be done.

    –my approach before seeing Keith’s
    while (1 = 1)
    begin
    insert Part1Dest (a, b)
    select top 1 a, b
    from (select s.a, –try grouping by a
    min(s.b) as b,
    count(s.a) as NbrRows
    from Part1Source s
    left join Part1Dest d on ((d.a = s.a) or (d.b = s.b))
    where d.a is null
    group by s.a
    union all
    select min(s.a) as a, –try grouping by b
    s.b,
    count(s.a) as NbrRows
    from Part1Source s
    left join Part1Dest d on ((d.a = s.a) or (d.b = s.b))
    where d.a is null
    group by s.b) as x
    order by NbrRows

    if (@@rowcount = 0) –quit when done inserting rows
    break
    end

  93. mikegood says:

    Here’s a shortened version of keith’s efficient solutions. I got rid of the variables, eliminated 1/2 the code by modifying the loop control, and added a Least() function for clarity.

    I also changed the ORDER BY a little bit. Keith’s ORDER BY statements clearly work, I just do not understand them. I get the leftmost part, but not the rest. I left the leftmost part as-is, but replaced the rest with sums, which I do understand…this is how I solve these problems by hand. Both ways appear to generate correct solutions.

    Pretty sure this approach is only one posted so far that could be readily extended to many more columns and many more rows. I just ran a test with 3k rows in Part1Source in under 1 sec.

    create function dbo.Least(@a int, @b int, @c int) returns int
    as
    begin
    if (@c is null) –handle 2-argument call
    return case when @a <= @b then @a else @b end

    return case –handle 3-argument call
    when @a <= @b and @a <= @c then @a
    when @b <= @a and @b <= @c then @b
    else @c
    end
    end
    go

    –part 1 solution
    while (1 = 1)
    begin
    insert Part1Dest (a, b)
    select top 1 s.a, s.b
    from Part1Source s
    join (select a, count(a) as N from Part1Source group by a) as a
    on a.a = s.a
    join (select b, count(b) as N from Part1Source group by b) as b
    on b.b = s.b
    left outer join Part1Dest d –exclude conflicting source rows
    on ((d.a = s.a) or (d.b = s.b))
    where d.a is null
    order by dbo.Least(a.N, b.N, null), a.N + b.N

    if (@@rowcount = 0)
    break –quit when no more rows can be added
    end

    –part 2 solution
    while (1 = 1)
    begin
    insert Part2Dest (a, b, c)
    select top 1 s.a, s.b, s.c
    from Part2Source s
    join (select a, count(a) as N from Part2Source group by a) as a
    on a.a = s.a
    join (select b, count(b) as N from Part2Source group by b) as b
    on b.b = s.b
    join (select c, count(c) as N from Part2Source group by c) as c
    on c.c = s.c
    left outer join Part2Dest d –exclude conflicting source rows
    on ((d.a = s.a) or (d.b = s.b) or (d.c = s.c))
    where d.a is null
    order by dbo.Least(a.N, b.N, c.N),
    dbo.Least(a.N + b.N, a.N + c.N, b.N + c.N),
    a.N + b.N + c.N

    if (@@rowcount = 0)
    break –quit when no more rows can be added
    end

  94. Ryan Randall says:

    keith.duaime, mikegood

    I’m afraid your suggestions can break down with a large number of rows.

    For example, try the following source data…

    1,1; 1,2; 1,8; 1,19; 1,33; 2,3; 2,4; 2,44; 3,1; 3,4; 3,24; 3,30; 3,31; 3,37; 3,39; 3,43; 4,18; 4,48; 5,1; 5,3; 5,5; 5,9; 5,20; 5,21; 5,35; 5,38; 5,50; 6,14; 6,24; 6,28; 6,30; 7,2; 7,5; 7,18; 7,40; 7,44; 7,46; 8,1; 8,5; 8,20; 8,29; 8,46; 8,48; 9,0; 9,17; 9,23; 9,30; 9,31; 9,35; 9,38; 9,39; 10,11; 10,24; 10,25; 10,39; 10,47; 11,2; 11,15; 11,17; 11,18; 11,21; 11,33; 11,36; 11,40; 12,11; 12,36; 12,39; 13,12; 13,17; 13,20; 13,21; 13,23; 13,34; 13,35; 14,23; 14,26; 14,32; 14,36; 14,39; 15,20; 16,1; 16,5; 16,35; 17,7; 17,10; 17,15; 17,23; 17,30; 17,34; 17,36; 17,37; 17,41; 18,7; 18,11; 18,20; 18,29; 18,50; 19,13; 19,15; 19,18; 19,23; 19,37; 19,44; 20,4; 20,7; 20,13; 20,28; 20,36; 20,37; 20,38; 21,5; 21,12; 21,13; 21,18; 21,19; 21,29; 21,34; 21,50; 22,5; 22,14; 22,18; 22,29; 23,1; 23,17; 23,20; 23,25; 23,29; 24,9; 24,14; 24,31; 24,34; 25,11; 25,13; 25,25; 25,30; 25,45; 26,7; 26,8; 26,9; 26,39; 26,43; 26,47; 27,13; 27,19; 27,22; 27,42; 27,47; 27,48; 28,3; 28,8; 28,14; 28,17; 28,19; 28,26; 28,38; 28,47; 29,2; 29,4; 29,12; 29,18; 29,25; 29,27; 29,32; 29,37; 30,8; 30,18; 30,22; 30,42; 30,43; 30,47; 31,7; 31,10; 31,20; 31,31; 31,43; 31,50; 32,1; 32,4; 32,5; 32,21; 32,29; 32,39; 32,49; 33,10; 33,12; 33,22; 33,27; 33,34; 33,49; 34,1; 34,5; 34,9; 34,15; 34,46; 35,4; 35,5; 35,6; 35,8; 35,13; 35,19; 35,23; 35,25; 35,27; 35,31; 35,39; 35,50; 36,1; 36,14; 36,19; 36,33; 36,34; 36,41; 36,45; 36,47; 37,15; 37,21; 37,23; 37,32; 37,35; 37,44; 37,46; 37,47; 37,48; 38,2; 38,12; 38,15; 38,23; 38,36; 38,38; 38,39; 38,47; 39,16; 39,17; 39,18; 40,10; 40,11; 40,16; 40,19; 40,20; 40,27; 40,29; 40,34; 40,49; 41,8; 41,13; 41,22; 41,26; 41,33; 42,23; 42,26; 42,50; 43,14; 43,26; 43,42; 43,50; 45,1; 45,3; 45,6; 45,18; 45,19; 45,24; 45,33; 45,36; 45,41; 45,44; 46,8; 46,15; 46,16; 46,42; 47,2; 47,3; 47,5; 47,9; 47,12; 47,14; 47,18; 47,36; 48,2; 48,16; 48,18; 48,20; 48,36; 49,5; 49,9; 49,17; 49,21; 49,22; 49,35; 50,40; 50,44; 50,48;

    Your code produces an answer with 48 rows, whereas the code in my previous post shows a 49-row solution is possible.

  95. mikegood says:

    Nuts. I was afraid of this originally. But then I was unable to find a source set that broke it and I got carried away with the delusion. Most of the time I’ve spent on this has been spent trying to build a test data set to disprove the approach, I just never succeeded.

    I’d never paid much attention to that approach in your previous post because it has 2 insert stmts (you didn’t include the 2nd one that inserts into the real dest table, but it has to be there). So, knowing how all the other approaches scaled…with this large of a source table, I hit GO and figured I’d check progress next hear (in about 8hrs or so). I was stunned how fast this ran.

    So I see you’re right…48 vs 49. Now I just would like to pinpoint the root of the flaw. I can’t believe it’s simply caused by large number of rows. I hope that if we understood it better we could cause the approach to break down with 10-15 rows of data?

    And if we could break it, possibly we could fix it? Probably not. Maybe fundamentally it was naive to try to solve potentially hard problem only examining the consequences of the current choice without also considering the the next choice & the next. As you and Chris & others already did. I suspect that’s it.

    Well, there are a lot of differences between the your 49 and my 48 dest rows, nothing stands out to me. I reckon I’ll start eliminating source data rows one by one until I get a small set that still breaks the approach.

    PS – How on earth did you find such a source data set? And if it took some doing, what made you keep trying?!

  96. mikegood says:

    Ryan, am orking to narrow the dataset. About 1/2 way through my first pass, have eliminated many rows and the kd/mg approach is now off by 3 — 45 vs 48 rows.

    1,33;2,44;3,31;3,37;5,1;5,5;5,9;5,20;5,21;5,35;5,38;5,50;6,14;6,24;6,28;6,30;7,18;7,40;7,44;7,46;8,1;8,29;8,46;8,48;9,0;9,17;9,23;9,30;9,31;9,35;9,38;9,39;10,24;10,25;10,39;10,47;11,2;11,15;11,17;11,18;11,21;11,33;11,36;11,40;12,36;12,39;13,21;13,23;13,34;13,35;14,26;14,32;14,36;14,39;15,20;16,1;16,35;17,10;17,34;17,36;18,50;19,23;20,36;21,5;21,12;21,13;21,18;21,19;21,29;21,34;21,50;22,14;22,29;23,17;23,20;23,25;23,29;24,9;24,14;24,31;24,34;25,11;25,25;25,30;25,45;26,9;26,39;26,43;26,47;27,19;27,22;27,42;27,47;28,3;28,14;28,17;29,2;29,4;29,12;29,18;29,25;29,27;29,32;29,37;30,18;30,22;30,42;30,47;31,10;31,31;31,43;31,50;32,1;32,4;32,5;32,21;32,29;32,39;32,49;33,10;33,12;33,22;33,27;33,34;33,49;34,1;34,5;34,9;34,15;34,46;35,4;35,5;35,6;35,8;35,13;35,19;35,23;35,25;35,27;35,31;35,39;35,50;36,1;36,14;36,19;36,33;36,34;36,41;36,45;36,47;37,15;37,21;37,23;37,32;37,35;37,44;37,46;37,47;37,48;38,2;38,12;38,15;38,23;38,36;38,38;38,39;38,47;39,16;39,17;39,18;40,10;40,11;40,16;40,19;40,20;40,27;40,29;40,34;40,49;41,8;41,13;41,22;41,26;41,33;42,23;42,26;42,50;43,14;43,26;43,42;43,50;45,1;45,3;45,6;45,18;45,19;45,24;45,33;45,36;45,41;45,44;46,8;46,15;46,16;46,42;47,2;47,3;47,5;47,9;47,12;47,14;47,18;47,36;48,2;48,16;48,18;48,20;48,36;49,5;49,9;49,17;49,21;49,22;49,35;50,40;50,44;50,48;

    I’m not done yet, but just noticed something interesting: when I add back 31, 20 to the above source data set, your algorithm yields 47 rows. With that row absent, your approach yields 48. That’s not right is it?

  97. mikegood says:

    Here is a small source data set that will break the kd-mg approach: 1,2; 1,4; 2,1; 2,3; 3,4; 4,2; 4,3; The correct answer is 4 rows, kd-mg gets only 3.

    With the data this small, it was easy to see the flaw in both keith’s original post & my subsequent mods to it. The subqueries that determine count of affected rows were performing this count against the entire source data set, not just the remaining set of rows…the exclusionary outer join does not get applied until too late.

    This got me looking back at the original code in my first post…turns out it did not suffer from this problem. I’ve run that original approach against Ryan’s data set, as well as several others that I’ve built since working with it, and so far cannot get it to produce fewer rows than returned by the algorithm Ryan last posted.

    Fixing keith’s original or my modified versions of keith’s approach is simple, just add the exclusionary outer join to the subqueries (which eliminates need for the original outer join in the outer query).

    while (1 = 1)
    begin
    insert Part1Dest (a, b)
    select top 1 s.a, s.b
    from Part1Source s
    join (select s1.a, count(s1.a) as N from Part1Source s1
    left join Part1Dest d on ((d.a = s1.a) or (d.b = s1.b))
    where d.a is null group by s1.a) as a
    on a.a = s.a
    join (select s1.b, count(s1.b) as N from Part1Source s1
    left join Part1Dest d on ((d.a = s1.a) or (d.b = s1.b))
    where d.a is null group by s1.b) as b
    on b.b = s.b
    order by case when a.N < b.N then a.N else b.N end, a.N + b.N

    if (@@rowcount = 0) –quit when done inserting rows
    break
    end

    I think part1 is actually simple enough that looking only at immediate consquences of the current decision is good enough to find the best solution. This is may not be true for part2, though…. So far I cannot find a data set to prove this wrong, and it would be helpful to have a fast correct part2 algorithm to compare results with.

  98. mikegood says:

    This set causes approach posted November 13, 2007 5:53 AM EST by Ryan Randall to break down. Only gets 14 rows, approach in my last post gets 15 rows. Eliminate 17,3 from the source data set and both yield 15 rows.

    1,3; 1,4; 1,9; 2,5; 2,12; 2,13; 3,16; 4,10; 5,4; 5,9; 5,5; 6,13; 6,10; 6,6; 7,3; 8,10; 9,2; 9,17; 10,14; 10,16; 10,2; 11,7; 11,1; 11,8; 11,11; 12,14; 12,1; 12,16; 13,16; 13,5; 13,12; 14,5; 14,6; 14,17; 15,15; 15,10; 15,2; 16,1; 16,3; 17,3;

  99. mikegood says:

    I have spent some time trying to create a tricky source data set for part2, but no luck. Maybe this problem is not complex after all. Below is revised code based on keith.dumaine’s post to solve both parts. It will quickly find the correct solution even when there are thousands of source data rows.

    create function dbo.Least(@a int, @b int, @c int) returns int
    as
    begin
    if (@c is null) –handle 2-argument call
    return case when @a <= @b then @a else @b end

    return case –handle 3-argument call
    when @a <= @b and @a <= @c then @a
    when @b <= @a and @b <= @c then @b
    else @c
    end
    end
    go

    –part 1 solution
    while (1 = 1)
    begin
    insert Part1Dest (a, b)
    select top 1 s.a, s.b
    from Part1Source s
    join (select s1.a, count(s1.a) as N from Part1Source s1
    left join Part1Dest d on ((d.a = s1.a) or (d.b = s1.b))
    where d.a is null group by s1.a) as a
    on a.a = s.a
    join (select s1.b, count(s1.b) as N from Part1Source s1
    left join Part1Dest d on ((d.a = s1.a) or (d.b = s1.b))
    where d.a is null group by s1.b) as b
    on b.b = s.b
    order by dbo.Least(a.N, b.N, null), a.N + b.N

    if (@@rowcount = 0)
    break –quit when no more rows can be added
    end

    –part 2 solution
    while (1 = 1)
    begin
    insert Part2Dest (a, b, c)
    select top 1 s.a, s.b, s.c
    from Part2Source s
    join (select s2.a, count(s2.a) as N from Part2Source s2
    left join Part2Dest d on ((d.a = s2.a) or (d.b = s2.b) or (d.c = s2.c))
    where d.a is null group by s2.a) as a
    on a.a = s.a
    join (select s2.b, count(s2.b) as N from Part2Source s2
    left join Part2Dest d on ((d.a = s2.a) or (d.b = s2.b) or (d.c = s2.c))
    where d.a is null group by s2.b) as b
    on b.b = s.b
    join (select s2.c, count(s2.c) as N from Part2Source s2
    left join Part2Dest d on ((d.a = s2.a) or (d.b = s2.b) or (d.c = s2.c))
    where d.a is null group by s2.c) as c
    on c.c = s.c
    order by dbo.Least(a.N, b.N, c.N),
    dbo.Least(a.N + b.N, a.N + c.N, b.N + c.N),
    a.N + b.N + c.N

    if (@@rowcount = 0)
    break –quit when no more rows can be added
    end

  100. Ryan Randall says:

    > This set causes approach posted November 13, 2007 5:53 AM EST by Ryan Randall to break down.

    Great work, mikegood! I had my suspicions, and it’s good to have a proof one way or another. I’d run out of energy on this puzzle by then to figure it out for myself!

  101. Ryan Randall says:

    mikegood

    > It will quickly find the correct solution even when there are thousands of source data rows.

    Just out of interest, how are you testing for correctness? Awesome work over the last few days, by the way.

    Incidentally, I realised shortly after November 13 that this problem probably falls under graph theory (something I have never studied). If any non-brute-force approach truly works, it would be interesting to find if there was a theorem to support the key idea of the approach.

    Just a thought.

  102. mikegood says:

    Ryan, initially, with small sets I was testing against results from your original or chris leonard’s brute force approach. I’m convinced these are both accurate.

    When I posted flawed approach earlier, I clearly wasn’t testing enough. I had created what I thought were tricky source data sets, but they weren’t good enough. Since you found my flaw, I’ve been using that large set you provided & tweaking it somewhat, and comparing results of my algorithm with those generated by your Nov 13 post. Your Nov 13th post has been the standard.

    And I did had no reason to suspect it. Only reason I drilled in further on Nov 13th approach was that a fluke I happened to observe the results grow by 1 when I removed a row from the source set. Have since spent a little time trying to figure out where the problem is, and have to admit I don’t understand it well enough. It looks pretty similar to the approach I’ve been working with, doesn’t it? If so, I bet problem is with the order by stmt, maybe it’s too simple. I posted smallest source set I could find that exhibited the problem, but it is not very small, not sure if you can figure out where the root problem is?

    I really have tried to create a source set that attempts to bait the modified keith.dumaine approach into making a bad decision (one that appears to be best, but is short-sighted), but I just don’t think it’s possible. Even for part 2. I took lot of math in school but nothing like this. But I bet a math major could settle this pretty easily, determine if non-brute force approach is feasible or impossible.

    So back to your question, how am I testing for correctness? Well you caught me, I made an unsupported assertion of correctness. For anything other than small source sets, your Nov 13th post is still best alternate routine to compare results with. So I’m basically asserting correctness until someone like yourself proves otherwise. So far you’ve proven to be very good at this!

    Am about out of energy myself. Too bad I didn’t see this earlier. I only noticed when the monthly simple-talk email mentioned my old friend Chris Leonard.

  103. Celko says:

    Mike Good was right about that not being my best work. My goal was to get the first acceptable answer. When you do NP-complete problems in SQL, you usually get the entire set of candidate answers and the query runs like glue. Here is my “glue query”:

    – create the original source data
    CREATE TABLE Source
    (pair_nbr INTEGER NOT NULL UNIQUE,
    a INTEGER NOT NULL,
    b INTEGER NOT NULL,
    PRIMARY KEY (a, b));

    INSERT INTO Source VALUES (1, 1, 1);
    INSERT INTO Source VALUES (2, 1, 2);
    INSERT INTO Source VALUES (3, 2, 3);
    INSERT INTO Source VALUES (4, 7, 2);
    INSERT INTO Source VALUES (5, 2, 4);
    INSERT INTO Source VALUES (6, 5, 5);
    INSERT INTO Source VALUES (7, 5, 1);
    INSERT INTO Source VALUES (8, 5, 3);
    INSERT INTO Source VALUES (9, 9, 0);
    INSERT INTO Source VALUES (10, 11, 2);

    – CTE to set up all subsets of the ten sample pairs
    WITH Flags (i)
    AS (VALUES (‘t’), (‘f’)),

    –CROSS JOIN from Hell to get all subsets and random candidate id numbers
    Subsets (subset_nbr, f01, f02, f03, f04, f05, f06, f07, f08, f09, f10)
    AS
    (SELECT ROW_NUMBER() OVER () AS subset_nbr,
    F01.i, F02.i, F03.i, F04.i, F05.i,
    F06.i, F07.i, F08.i, F09.i, F10.i
    FROM Flags AS F01, Flags AS F02, Flags AS F03, Flags AS F04, Flags AS F05,
    Flags AS F06, Flags AS F07, Flags AS F08, Flags AS F09, Flags AS F10),

    –filter out pairs from the permutations
    Candidates(subset_nbr, a, b)
    AS (
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 1 AND P.f01 = ‘t’
    UNION ALL
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 2 AND P.f02 = ‘t’
    UNION ALL
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr =3 AND P.f03 = ‘t’
    UNION ALL
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 4 AND P.f04= ‘t’
    UNION ALL
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 5 AND P.f05 = ‘t’
    UNION ALL
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 6 AND P.f06 = ‘t’
    UNION ALL
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 7 AND P.f07 = ‘t’
    UNION ALL
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 8 AND P.f08 = ‘t’
    UNION ALL
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 9 AND P.f09 = ‘t’
    UNION ALL
    SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 10 AND P.f10 = ‘t’)

    SELECT subset_nbr, a, b — return the winners
    FROM Candidates AS C1
    WHERE subset_nbr
    IN (SELECT C2.subset_nbr — find all perms that meet uniqueness criteria
    FROM Candidates AS C2
    GROUP BY C2.subset_nbr
    HAVING COUNT(a) = COUNT(DISTINCT a)
    AND COUNT(b) = COUNT(DISTINCT b)
    — AND COUNT(*) >= 5) — so you don’t go nuts looking at the output
    ORDER BY subset_nbr, a, b;

    The bad news!
    1) This gives duplicate answers under different subset_nbrs. Yes, I could do a relational division and remove them, but that would really be messy to read.

    2) This is probably going to scale better than you think. Richard Romley wrote an 81-way self-join for a Sudoku solver which also returns all possible grids. It is quite fast, even with hundreds of solutions. This is much simpler.

  104. Celko says:

    Oops! My error; this does not generate duplicates. Here are the answers for the original problem:

    172 1 1
    172 2 4
    172 5 5
    172 9 0
    172 11 2

    =========
    186 1 1
    186 2 4
    186 5 5
    186 7 2
    186 9 0

    =========
    427 1 1
    427 2 3
    427 5 5
    427 9 0
    427 11 2

    =========
    441 1 1
    441 2 3
    441 5 5
    441 7 2
    441 9 0

    =========
    652 1 1
    652 2 4
    652 5 3
    652 9 0
    652 11 2

    =========
    666 1 1
    666 2 4
    666 5 3
    666 7 2
    666 9 0

Leave a Reply