Lionel Clarke

Software Engineer - Red Gate Software

SQL Puzzle 7

Published Friday, October 26, 2007 10:53 AM

/*
Well, it has been a while since I posted up a puzzle. I  blame this on the very unflattering picture of myself attached to the blog. This causes me a natural reluctance to put it in a position where people might see it! It is that and ...ahem... the cat ate my homework ...ahem... blog entry. Anyway, this puzzle has a couple of parts. Since I am short on ideas, the first part is the same as the second part just with one extra column.

The puzzle is very simple. You have to move as much of the data as you can from the source tables to the destination tables. There is one restriction though;  you can only use one insert statement in each part! Extra points are definitely awarded for a solution that works in both SQL Server 2000 and 2005. So, to summarize, one insert statement to move data from Part1Source -> Part1Dest, and one insert stament to move Part2Source -> Part2Dest.

Have fun!

Lionel


CREATE TABLE Part1Source
(
    a INT,
    b INT,
    
    PRIMARY KEY (a,b)
)
    
INSERT INTO Part1Source VALUES (1,1)
INSERT INTO Part1Source VALUES (1,2)
INSERT INTO Part1Source VALUES (2,3)
INSERT INTO Part1Source VALUES (7,2)
INSERT INTO Part1Source VALUES (2,4)
INSERT INTO Part1Source VALUES (5,5)
INSERT INTO Part1Source VALUES (5,1)
INSERT INTO Part1Source VALUES (5,3)
INSERT INTO Part1Source VALUES (9,0)
INSERT INTO Part1Source VALUES (11,2)

CREATE TABLE Part1Dest
(
    a INT UNIQUE,
    b INT UNIQUE,
    
    FOREIGN KEY (a,b)  REFERENCES Part1Source(a,b)    
)

------------------------ Part 2 ------------------------

CREATE TABLE Part2Source
(
    a INT,
    b INT,
    c INT,
    
    PRIMARY KEY (a,b,c)
)
    
    
INSERT INTO Part2Source VALUES (1,1,2)
INSERT INTO Part2Source VALUES (1,2,2)
INSERT INTO Part2Source VALUES (2,3,1)
INSERT INTO Part2Source VALUES (7,2,1)
INSERT INTO Part2Source VALUES (2,4,7)
INSERT INTO Part2Source VALUES (5,5,6)
INSERT INTO Part2Source VALUES (5,1,4)
INSERT INTO Part2Source VALUES (5,3,4)
INSERT INTO Part2Source VALUES (9,0,1)
INSERT INTO Part2Source VALUES (11,2,9)

CREATE TABLE Part2Dest
(
    a INT UNIQUE,
    b INT UNIQUE,
    c INT UNIQUE,
    
    FOREIGN KEY (a,b,c)  REFERENCES Part2Source(a,b,c)    
)

*/
by Lionel

Comments

 

GSquared said:

part 1
INSERT INTO part1dest (ab)

SELECT DISTINCT a,
    (
    
SELECT MIN(b)
        
FROM part1source t2
        
WHERE part1source.a
            
AND NOT IN
        
(
        
SELECT MIN(b)
            
FROM part1source t3
            
WHERE <> part1source.a
            
GROUP BY a))
    
FROM part1source
    
WHERE
    
(
    
SELECT MIN(b)
        
FROM part1source t2
        
WHERE part1source.a
            
AND NOT IN
        
(
        
SELECT MIN(b)
            
FROM part1source t3
            
WHERE <> part1source.a
            
GROUP BY a)) IS NOT NULL
    
UNION
SELECT DISTINCT 
a,
    (
    
SELECT MAX(b)
        
FROM part1source t2
        
WHERE part1source.a
            
AND NOT IN
        
(
        
SELECT MAX(b)
            
FROM part1source t3
            
WHERE <> part1source.a
            
GROUP BY a))
    
FROM part1source
    
WHERE
    
(
    
SELECT MAX(b)
        
FROM part1source t2
        
WHERE part1source.a
            
AND NOT IN
        
(
        
SELECT MAX(b)
            
FROM part1source t3
            
WHERE <> part1source.a
            
GROUP BY a)) IS NOT NULL
        AND 
NOT IN
    
(
    
SELECT DISTINCT a
        
FROM part1source
        
WHERE
        
(
        
SELECT MIN(b)
            
FROM part1source t2
            
WHERE part1source.a
                
AND NOT IN
            
(
            
SELECT MIN(b)
                
FROM part1source t3
                
WHERE <> part1source.a
                
GROUP BY a)) IS NOT NULL)
    
UNION ALL
SELECT DISTINCT MIN(a), b
    
FROM part1source
    
WHERE NOT IN
    
(
    
SELECT DISTINCT a
        
FROM part1source
        
WHERE
        
(
        
SELECT MIN(b)
            
FROM part1source t2
            
WHERE part1source.a
                
AND NOT IN
            
(
            
SELECT MIN(b)
                
FROM part1source t3
                
WHERE <> part1source.a
                
GROUP BY a)) IS NOT NULL
        
UNION
    SELECT DISTINCT 
a
        
FROM part1source
        
WHERE
        
(
        
SELECT MAX(b)
            
FROM part1source t2
            
WHERE part1source.a
                
AND NOT IN
            
(
            
SELECT MAX(b)
                
FROM part1source t3
                
WHERE <> part1source.a
                
GROUP BY a)) IS NOT NULL
            AND 
NOT IN
        
(
        
SELECT DISTINCT a
            
FROM part1source
            
WHERE
            
(
            
SELECT MIN(b)
                
FROM part1source t2
                
WHERE part1source.a
                    
AND NOT IN
                
(
                
SELECT MIN(b)
                    
FROM part1source t3
                    
WHERE <> part1source.a
                    
GROUP BY a)) IS NOT NULL))
        AND 
NOT IN
    
(
    
SELECT DISTINCT
        
(
        
SELECT MIN(b)
            
FROM part1source t2
            
WHERE part1source.a
                
AND NOT IN
            
(
            
SELECT MIN(b)
                
FROM part1source t3
                
WHERE <> part1source.a
                
GROUP BY a))
        
FROM part1source
        
WHERE
        
(
        
SELECT MIN(b)
            
FROM part1source t2
            
WHERE part1source.a
                
AND NOT IN
            
(
            
SELECT MIN(b)
                
FROM part1source t3
                
WHERE <> part1source.a
                
GROUP BY a)) IS NOT NULL
        
UNION
    SELECT DISTINCT
        
(
        
SELECT MAX(b)
            
FROM part1source t2
            
WHERE part1source.a
                
AND NOT IN
            
(
            
SELECT MAX(b)
                
FROM part1source t3
                
WHERE <> part1source.a
                
GROUP BY a))
        
FROM part1source
        
WHERE
        
(
        
SELECT MAX(b)
            
FROM part1source t2
            
WHERE part1source.a
                
AND NOT IN
            
(
            
SELECT MAX(b)
                
FROM part1source t3
                
WHERE <> part1source.a
                
GROUP BY a)) IS NOT NULL
            AND 
NOT IN
        
(
        
SELECT DISTINCT a
            
FROM part1source
            
WHERE
            
(
            
SELECT MIN(b)
                
FROM part1source t2
                
WHERE part1source.a
                    
AND NOT IN
                
(
                
SELECT MIN(b)
                    
FROM part1source t3
                    
WHERE <> part1source.a
                    
GROUP BY a)) IS NOT NULL))
    
GROUP BY 
If that's the pattern you're looking for, Part 2 could have a similar solution, just with more union statements.
October 26, 2007 10:13 AM
 

Phil Factor said:

I thought I'd go straight to Part 2 as it seemed more difficult. There are 432 solutions, some of which only manage three insertions. Many manage four, but a lot manage five. I think I've kept to the rules as you said nothing about not using an EXECUTE statement, and the stipulation of only one INSERT seemed rather unfair, but I kept to it!

I have a lingering suspicion that there is a simpler solution, but I can't think of it.

DECLARE @command VARCHAR(255)


SELECT TOP @command='
Insert into Part2dest
select '
+CAST(f.[a] AS VARCHAR)+', '
        
+CAST(f.[b] AS VARCHAR)+', '
        
+CAST(f.[c] AS VARCHAR)+'
union
select '
+CAST(g.[a] AS VARCHAR)+', '
       
+CAST(g.[b] AS VARCHAR)+', '
       
+CAST(g.[c] AS VARCHAR)+'
union
select '
+CAST(h.[a] AS VARCHAR)+', '
       
+CAST(h.[b] AS VARCHAR)+', '
       
+CAST(h.[c] AS VARCHAR)+'
 '
+COALESCE('union
select '
+CAST(i.[a] AS VARCHAR)+', '
       
+CAST(i.[b] AS VARCHAR)+', '
       
+CAST(i.[c] AS VARCHAR)+'
'
,'')+COALESCE('union
select '
+CAST(j.[a] AS VARCHAR)+', '
       
+CAST(j.[b] AS VARCHAR)+', '
       
+CAST(j.[c] AS VARCHAR)+'
'
,'') +'
'
FROM part2source f
   
LEFT OUTER JOIN part2source g
   
ON f.a<>g.a AND f.b<>g.b AND f.c<>g.c 
LEFT OUTER JOIN part2source h
   
ON h.a NOT IN (g.a,f.a
   AND 
h.b NOT IN (g.b,f.b)
   AND 
h.c NOT IN (g.c,f.c)
LEFT OUTER JOIN part2source i
   
ON i.a NOT IN (h.a,g.a,f.a
   AND 
i.b NOT IN (h.b,g.b,f.b)
   AND 
i.c NOT IN (h.c,g.c,f.c)
LEFT OUTER JOIN part2source j
   
ON j.a NOT IN (i.a,h.a,g.a,f.a
   AND 
j.b NOT IN (i.b,h.b,g.b,f.b)
   AND 
j.c NOT IN (i.c,h.c,g.c,f.c)
--the following lines are just a check
--that there are no s-entry solutions
/*LEFT outer JOIN part2source k
ON k.a not IN (j.a,i.a,h.a,g.a,f.a) 
AND k.b NOT IN (j.b,i.b,h.b,g.b,f.b)
AND k.c NOT IN (j.c,i.c,h.c,g.c,f.c)*/
WHERE j.a IS NOT NULL-- change this to 
--WHERE i.a IS NULL --to get the worst solution!

EXECUTE (@command)
October 28, 2007 1:19 PM
 

Phil Factor said:

For some reason, I feel compelled to provide the part 1 'cheat' solution. Is there a prize for this? I fancy a Simple-Talk goody bag after all this work
DECLARE @command VARCHAR(255)

--there are 936 possible solutions
SELECT TOP @command='
Insert into Part1dest
select '
+CAST(f.[a] AS VARCHAR)+', '
        
+CAST(f.[b] AS VARCHAR)+'
union
select '
+CAST(g.[a] AS VARCHAR)+', '
       
+CAST(g.[b] AS VARCHAR)+'
union
select '
+CAST(h.[a] AS VARCHAR)+', '
       
+CAST(h.[b] AS VARCHAR)+'
 '
+COALESCE('union
select '
+CAST(i.[a] AS VARCHAR)+', '
       
+CAST(i.[b] AS VARCHAR)+'
'
,'')+COALESCE('union
select '
+CAST(j.[a] AS VARCHAR)+', '
       
+CAST(j.[b] AS VARCHAR)+'
'
,'') +'
'
FROM part1source f
   
LEFT OUTER JOIN part1source g
   
ON f.a<>g.a AND f.b<>g.b 
LEFT OUTER JOIN part1source h
   
ON h.a NOT IN (g.a,f.a
   AND 
h.b NOT IN (g.b,f.b)
LEFT OUTER JOIN part1source i
   
ON i.a NOT IN (h.a,g.a,f.a
   AND 
i.b NOT IN (h.b,g.b,f.b)
LEFT OUTER JOIN part1source j
   
ON j.a NOT IN (i.a,h.a,g.a,f.a
   AND 
j.b NOT IN (i.b,h.b,g.b,f.b)
--the following lines are just a check
--that there are no s-entry solutions
LEFT OUTER JOIN part1source k
ON k.a NOT IN (j.a,i.a,h.a,g.a,f.a
AND 
k.b NOT IN (j.b,i.b,h.b,g.b,f.b)
WHERE j.a IS NOT NULL-- change this to 
--WHERE j.a IS NULL --to get the worst solution!

EXECUTE (@command)
October 28, 2007 1:49 PM
 

Adam Machanic said:

OK, I used only one insert statement ;-)

---
DECLARE @i INT

SET 
@i 1
WHILE @i >= 0
    
BEGIN
    DECLARE 
@s VARCHAR(200)
    
SELECT
        
@s 'ALTER TABLE Part1Dest 
DROP ' 
CONSTRAINT_NAME
        
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
        
WHERE
                
TABLE_NAME 'Part1Dest'
            
AND CONSTRAINT_TYPE 'UNIQUE'
    
EXEC (@s)
        
SET @i @i 1
    
END
INSERT 
Part1Dest
    
SELECT *
    
FROM Part1Source
    
---
October 28, 2007 9:10 PM
 

Steve S said:

Hmm, some of that SQL is a bit unwieldy. Here's a method structured around SQL 2005 Common Table Expressions. I guess it could be done as a single big load of JOINs but I like the CTE syntax for its readability:

; WITH
[Sequence] AS (
-- Get a numeric sequence at least as long as the 2^N possible combinations of data points
select TOP (SELECT POWER(2,COUNT(1)) FROM Part1Source) -1+ROW_NUMBER() OVER (ORDER BY a.name) AS num
from sys.sysobjects a CROSS JOIN sys.sysobjects b
),
DataPoints AS (
-- Assign a sequential ID to each data point
SELECT a, b, -1+ROW_NUMBER() OVER (ORDER BY a) AS Position FROM Part1Source
),
ValidCombinations AS (
-- Determine which of the 2^N combinations are valid (i.e. meet uniqueness constraints)
SELECT [Sequence].num, COUNT(DataPoints.a) AS NumPoints
FROM DataPoints
CROSS JOIN [Sequence]
WHERE POWER(2,DataPoints.Position) & [Sequence].num > 0
GROUP BY [Sequence].num
HAVING COUNT(DISTINCT DataPoints.a) = COUNT(DataPoints.a) -- All the a's must be distinct
AND COUNT(DISTINCT DataPoints.b) = COUNT(DataPoints.a) -- All the b's must be distinct
),
ChosenCombination AS (
-- Pick one of the combinations that contains the most data points
SELECT TOP 1 num
FROM ValidCombinations
ORDER BY NumPoints DESC
)
-- Insert the data points that correspond to the "best" combination
INSERT INTO Part1Dest ([a],[b])
SELECT dp.a, dp.b
FROM DataPoints dp
CROSS JOIN ChosenCombination
WHERE POWER(2, dp.Position) & ChosenCombination.num > 0
October 29, 2007 6:59 AM
 

SQLzombie said:

If you dont care about  err messages , this wud be the solution for part 1..
DECLARE @cur CURSOR 
DECLARE 
@c INT,@d INT
    SET 
@cur = CURSOR fast_forward FOR
SELECT 

    
FROM Part1Source
OPEN @cur
            
FETCH next 
    
FROM @cur INTO @c,@d
WHILE  @@fetch_status 0
    
BEGIN
    
rep:
    
INSERT INTO  Part1Dest 
        
SELECT @c,@d
    
IF @@error =4145
                    
GOTO rep
                    
FETCH next 
            
FROM @cur INTO @c,@d
    
END
            
October 30, 2007 8:04 PM
 

JRMorris said:

This probably isn't what you meant by "use only one insert".. but:
DECLARE @a INT
DECLARE
@b INT
DECLARE
cur_Uniques CURSOR
         FOR
SELECT
a, b
  
FROM part1source
OPEN  cur_Uniques
        
FETCH next
  
FROM cur_Uniques INTO @a, @b
WHILE @@fetch_status = 0
  
BEGIN
   INSERT INTO
part1dest
      
SELECT @a, @b
      
WHERE @a NOT IN (
      
SELECT a
        
FROM part1dest)
         AND
@b NOT IN (
      
SELECT b
        
FROM part1dest)
            
FETCH next
      
FROM cur_Uniques INTO @a, @b
  
END
         CLOSE  
cur_Uniques
        
DEALLOCATE  cur_Uniques
October 31, 2007 11:06 AM
 

Drew1022 said:

It might not be the best code, but then again, SQL isn't involved in my day job. This is my contribution for part 1. I can probably extend this to work for part 2 as well.
/************ SOLUTION TO PART 1 ********************/

    -- I started off by coming up with this statement to 
    -- build up a view of the table to give me something 
    -- better to key off of and help with finding the 
    -- combination that will allow me to insert the most 
    -- rows. 
SELECT (ABS(bCount aCount) + aCount bCountpriority,  
(CAST(AS VARCHAR) + '_' CAST(AS VARCHAR)) a_b
(CASE WHEN aCount bCount THEN ELSE ENDaORb
(
CASE WHEN aCount bCount THEN 'b' ELSE 'a' ENDaORbIndicator,
 
abCounts.*
    
FROM
    
(
    
SELECT a, (
        
SELECT COUNT(*) 
            
FROM Part1Source p1s2 
            
WHERE p1s2.a p1s1.aaCountb, (
        
SELECT COUNT(*) 
            
FROM Part1Source p1s3 
            
WHERE p1s3.b p1s1.bbCount
        
FROM Part1Source p1s1
    
AS abCounts
    
ORDER BY priorityaORb
    
-- I then applied it here to actually select a subset 
    -- of the records that is *hopefully* close to the 
    -- maximum I can get. I have no idea if this will 
    -- always get the most rows no matter what combinations 
    -- are in the source table. It gets 5 with the current 
    -- data set for this problem. 
SELECT 
    
FROM
    
(
    
SELECT (ABS(bCount aCount) + aCount bCountpriority,
(
CAST(AS VARCHAR) + '_' CAST(AS VARCHAR)) a_b,
(
CASE WHEN aCount bCount THEN ELSE ENDaORb,
(
CASE WHEN aCount bCount THEN 'b' ELSE 'a' ENDaORbIndicator,
abCounts.*       FROM        (
        
SELECT a, (
            
SELECT COUNT(*) 
                
FROM Part1Source p1s2 
                
WHERE p1s2.a p1s1.aaCountb, (
            
SELECT COUNT(*) 
                
FROM Part1Source p1s3 
                
WHERE p1s3.b p1s1.bbCount
            
FROM Part1Source p1s1
        
AS abCounts
    
AS sub1
    
WHERE NOT IN
    
(
    
SELECT a
        
FROM
        
(
        
SELECT (ABS(bCount aCount) + aCount bCountpriority,
(
CAST(AS VARCHAR) + '_' CAST(AS VARCHAR)) a_b,
(
CASE WHEN aCount bCount THEN ELSE ENDaORb,
(
CASE WHEN aCount bCount 
THEN 'b' ELSE 'a' ENDaORbIndicator,
abCounts.*
            
FROM
            
(
            
SELECT a, (
                
SELECT COUNT(*) 
                    
FROM Part1Source p1s2 
                    
WHERE p1s2.a p1s1.aaCountb, (
                
SELECT COUNT(*) 
                    
FROM Part1Source p1s3 
                    
WHERE p1s3.b p1s1.bbCount
                
FROM Part1Source p1s1
            
AS abCounts
        
sub2
        
WHERE sub2.priority sub1.priority
            
OR
        (
                    
sub2.priority sub1.priority
                
AND sub2.aORb sub1.aORb
        
)
    )
        AND 
NOT IN
    
(
    
SELECT b
        
FROM
        
(
        
SELECT (ABS(bCount aCount) + aCount bCountpriority,
(
CAST(AS VARCHAR) + '_' CAST(AS VARCHAR)) a_b
(
CASE WHEN aCount bCount THEN ELSE ENDaORb
(
CASE WHEN aCount bCount 
THEN 'b' ElSE 'a' ENDaORbIndicator,
abCounts.*
            
FROM
            
(
            
SELECT a, (
                
SELECT COUNT(*) 
                    
FROM Part1Source p1s2 
                    
WHERE p1s2.a p1s1.aaCountb, (
                
SELECT COUNT(*) 
                    
FROM Part1Source p1s3 
                    
WHERE p1s3.b p1s1.bbCount
                
FROM Part1Source p1s1
            
AS abCounts
        
sub3
        
WHERE sub3.priority sub1.priority
            
OR
        (
                    
sub3.priority sub1.priority
                
AND sub3.aORb sub1.aORb
        
)
    )
 
October 31, 2007 12:26 PM
 

Drew1022 said:

Oh man, you guys need an edit button here for me. I forgot to include the little INSERT line I had for it and alter the main select to just select a, b ... I was testing it without doing the insert part so that I didn't have to keep clearing the destination table. I won't report all my code here with that little difference.

Maybe I should get back to my COBOL and RPG instead of sitting here monkeying with these things? I don't think this is in my job description, heh. ;)
October 31, 2007 12:31 PM
 

im_bob_loucans said:

Heres what I tried - wasted first half day at work on this - but sometimes just gotta do it - it uses no insert statements and is 2000 and 2005 compatible - i dont know if its bullet proof - maybe i'll write some unit tests around it for the second half:
IF EXISTS (SELECT 
          
FROM   sysobjects 
          
WHERE  name 'part1dest' 
                 
AND TYPE 'U'
   
BEGIN 
       DROP TABLE 
part1dest 
   
END 
    
SELECT   MIN
(aAS a
        

INTO     part1dest 
FROM     (SELECT   AS a
                  
MIN(bAS 
         
FROM     (SELECT   MIN(aAS a
                            

                   
FROM     (SELECT   AS a
                                      
MIN(bAS 
                             
FROM     part1source 
                             
GROUP BY a
                   
GROUP BY 
                                                          
                   
UNION 
                                      
                   SELECT   MAX
(aAS a
                            

                   
FROM     (SELECT   AS a
                                      
MAX(bAS 
                             
FROM     part1source 
                             
GROUP BY a
                   
GROUP BY b
         
GROUP BY a
GROUP BY 
ORDER BY MIN(a), 
        

        
IF EXISTS (SELECT 
          
FROM   sysobjects 
          
WHERE  name 'part1dest' 
                 
AND TYPE 'U'
   
BEGIN 
       ALTER TABLE 
part1dest 
        
ADD CONSTRAINT uk_a UNIQUEa  
        
       
ALTER TABLE part1dest 
        
ADD CONSTRAINT uk_b UNIQUEb  
        
       
ALTER TABLE part1dest 
        
ADD CONSTRAINT fk_ab FOREIGN KEY
            
REFERENCES part1source
   
END 
October 31, 2007 2:03 PM
 

honga said:

The puzzle does not require the logical layer has to be done in SQL. So, Here is my answer for both 2 parts.

Part1:
INSERT INTO part1dest (a, b)
select a,b from Part1Source
where a*b in (1,6,25,14,0)

Part2:
INSERT INTO part2dest (a,b,c)
select a,b,c from Part2Source
where a*b*c in (0,2,56,150,198)
October 31, 2007 6:32 PM
 

im_bob_loucans said:

Good call Honga - sometimes we waste so much time with generic apply to all situations solutions we forget that the solution is very simple.  I bow to you.  But it was fun nonetheless.
October 31, 2007 8:26 PM
 

puzsol said:

Ok, so I didn't stick to the 'only one insert' requirement... but many of the solutions used a select into, which if you ask me is the same thing, or they were restricted to the solution of the data provided (max limits etc)... and if you are going to do that then Honga's solution is by far the smallest and simplest (as long as a human can resolve the solution in the first place)....  I liked Phil's solution the best - basically expanding the solution tree, then picking the rows with the most data - but it is still limited to only working for this dataset.... you could of course add another dynamic layer to make it work for any dataset.... (I'm not about to though)

Anyway, here is a solution (that doesn't stick to the 'rules', but if you replace the table vars with temp tables, and the inserts into selects, then you could make it fit the 'rules')...

-- Part 1
-- declare the tables used to build the solution
declare @sol_groups table(gn int, a int, b int);
declare @sol_step table(gn int, a int, b int, rn int);

-- Get the base data
insert into @sol_groups
select row_number() over (order by a, b), a, b
from Part1Source

Loop:

-- Get the possible values to add for each group
insert into @sol_step
select sg.gn, ps.a, ps.b, row_number() over (order by sg.gn, ps.a, ps.b)
from (select distinct gn as gn from @sol_groups) as sg
full outer join Part1Source ps on null is null
where ps.a not in (select a from @sol_groups where gn = sg.gn) and ps.a > (select max(a) from @sol_groups where gn = sg.gn)
and ps.b not in (select b from @sol_groups where gn = sg.gn)

-- Test if we have reached the end
if @@rowcount <= 0
goto Soln

-- Add the original values to each of the new values (using the row number as the new group number)
insert into @sol_step
select ss.rn, sg.a, sg.b, 0
from @sol_step ss
inner join @sol_groups sg on sg.gn = ss.gn

-- Update the new values group numbers to match...
update @sol_step set gn = rn where rn != 0

-- Swap the table data over... is there a better way to do this?
delete from @sol_groups
insert into @sol_groups
select gn, a, b from @sol_step
delete from @sol_step

-- Keep going...
goto Loop

-- Present all the solutions, do the insertion, prove it.
Soln:
select * from @sol_groups order by gn, a, b
insert into Part1Dest
select a, b from @sol_groups where gn = 1
select * from Part1Dest


Of course the selects after the Soln tag are superfluous (just to present something to the user)... and the solution for part 2 is similar:


declare @sol_groups table(gn int, a int, b int, c int);
declare @sol_step table(gn int, a int, b int, c int, rn int);

insert into @sol_groups
select row_number() over (order by a, b, c), a, b, c
from Part2Source

Loop:

insert into @sol_step
select sg.gn, ps.a, ps.b, ps.c, row_number() over (order by sg.gn, ps.a, ps.b, ps.c)
from (select distinct gn as gn from @sol_groups) as sg
full outer join Part2Source ps on null is null
where ps.a not in (select a from @sol_groups where gn = sg.gn) and ps.a > (select max(a) from @sol_groups where gn = sg.gn)
and ps.b not in (select b from @sol_groups where gn = sg.gn)
and ps.c not in (select c from @sol_groups where gn = sg.gn)

if @@rowcount <= 0
goto Soln

insert into @sol_step
select ss.rn, sg.a, sg.b, sg.c, 0
from @sol_step ss
inner join @sol_groups sg on sg.gn = ss.gn

update @sol_step set gn = rn where rn != 0

delete from @sol_groups
insert into @sol_groups
select gn, a, b, c from @sol_step

delete from @sol_step

goto Loop

Soln:
select * from @sol_groups order by gn, a, b, c;

insert into Part2Dest
select a, b, c from @sol_groups where gn = 1;

select * from Part2Dest;


Hope this stops some other silly person like me wasting time to work out a similar solution.

November 1, 2007 12:58 AM
 

janelle briggs said:

-- I only have 2000 - so I assume this works on 2005!
DECLARE @c INT,@d INT    
DECLARE c1 CURSOR FOR SELECT a,b FROM Part1Source
OPEN c1            
FETCH NEXT FROM c1 INTO @c,@d
WHILE  @@fetch_status = 0    
BEGIN
INSERT INTO  Part1Dest      
SELECT @c,@d  
WHERE NOT EXISTS (SELECT 1 FROM Part1Dest
WHERE a = @c OR b = @d)
FETCH NEXT FROM c1 INTO @c,@d    
END
CLOSE c1
DEALLOCATE c1
November 1, 2007 1:06 AM
 

janelle briggs said:

Sorry - forgot part 2 before, so here it is...
DECLARE @d INT,@e INT, @f int  
DECLARE c1 CURSOR FOR SELECT a,b,c FROM Part2Source
OPEN c1            
FETCH NEXT FROM c1 INTO @d,@e,@f
WHILE  @@fetch_status = 0    
BEGIN
INSERT INTO  Part2Dest      
SELECT @d,@e,@f
WHERE NOT EXISTS (SELECT 1 FROM Part2Dest
WHERE a = @d OR b = @e OR c = @f)
FETCH NEXT FROM c1 INTO @d,@e,@f
END
CLOSE c1
DEALLOCATE c1
November 1, 2007 1:12 AM
 

Mikkel Toudal Kristiansen said:

How about this one for part 1:

insert into Part1Dest
select a, b
from Part1Source
where
((a = 1) and (b = 1)) or
((a = 2) and (b = 3)) or
((a = 5) and (b = 5)) or
((a = 9) and (b = 0)) or
((a = 11) and (b = 2))

... and this one for part 2:

insert into Part2Dest
select a, b, c
from Part2Source
where
((a = 1) and (b = 1) and (c = 2)) or
((a = 2) and (b = 3) and (c = 1)) or
((a = 5) and (b = 5) and (c = 6)) or
((a = 11) and (b = 2) and (c = 9))

Of course it works on both SQL 2000 and 2005, and of course it only works on the puzzle data provided ...
November 1, 2007 3:12 AM
 

ChuckNewark said:

This was the easiest solution I could come up with.  They both insert six records into the destinations tables and would work for any data that was loaded into the source tables.

Insert into Part1Dest
Select distinct a, (Select Min(b)
                                         from part1source  
                                         where part1source.a = p1s.a)
from part1source p1s


Insert into Part2Dest
Select distinct a,
                                    (Select Min(b)
                                         from part2source p2si  
                                         where p2si.a = p2s.a),
                                    (select min(c)
                                        from part2source
                                        where part2source.b = (Select Min(b) from
                                                              part2source p2si where p2si.a = p2s.a))
from part2source p2s
November 1, 2007 5:39 AM
 

ChuckNewark said:

Okay I was wrong, my tables didn't have the unique on the B columns
November 1, 2007 5:43 AM
 

roarfred said:

This is a slightly different approach, which I think should work...

The first step is to produce a list of combinations from the source table, and check how many other combinations this entry will block for. Using this, we can make a select which will choose all entries with the fewest blocks, as long as there is no other entry which will block for just as few.

The key in my solution is this rather simple statement, which gives a list of all source entries, and a number of other combinations which cannot be used if this entry is used:

SELECT *, (SELECT count(*) FROM Part1Source Ax WHERE (Ax.A = A.a OR Ax.B = A.b) AND (Ax.A <> A.a OR Ax.B <> A.b)) AS BlockCount FROM Part1Source A

Now, we can use all entries from this list, given:
1) there is no other entry with same a or b with a lower blockcount
2) if another entry with same a or same b gives the same blockcount, choose the one with the lower a or b value


INSERT INTO Part1Dest(a,b)

SELECT a,b
FROM (SELECT *, (SELECT count(*) FROM Part1Source Ax WHERE (Ax.A = A.a OR Ax.B = A.b) AND (Ax.A <> A.a OR Ax.B <> A.b)) AS BlockCount FROM Part1Source A) BC
WHERE NOT EXISTS -- another combination with fewer blocks
(
SELECT *
FROM (SELECT *, (SELECT count(*) FROM Part1Source Ax WHERE (Ax.A = A.a OR Ax.B = A.b) AND (Ax.A <> A.a OR Ax.B <> A.b)) AS BlockCount FROM Part1Source A) BC2
WHERe bc2.blockcount < bc.blockcount and (bc2.a = bc.a or bc2.b = bc.b)
)
AND NOT EXISTS -- another combination with same blocks and lower a or b
(
SELECT *
FROM (SELECT *, (SELECT count(*) FROM Part1Source Ax WHERE (Ax.A = A.a OR Ax.B = A.b) AND (Ax.A <> A.a OR Ax.B <> A.b)) AS BlockCount FROM Part1Source A) BC2
WHERe bc2.blockcount = bc.blockcount
and (bc2.a = bc.a or bc2.b = bc.b)
and (bc2.a < bc.a or bc2.b < bc.b)
)


Thinking of this... it is probably not a good solution in a circumstances. If an entry only blocks only one other combination, the one being blocked might block for another larger number. The solution would be to recursively find the total number of blocks in the key statement and again use it in the same final statement. This would probably require a deeper level of nesting or the use of a cursor, which would be a last resort in my opinion...

As another personal opinion, the puzzle should maybe include a few more limitations, such as avoiding shema manipulation.
November 1, 2007 6:37 AM
 

davekw said:

Here's another solution that builds on GSquared's and roarfred's.

[code]
INSERT INTO Part1Dest
SELECT * FROM (
SELECT DISTINCT S1.a,
(SELECT TOP 1 S2.b FROM Part1Source S2 WHERE (S1.a = S2.a) AND (S2.b NOT IN (
SELECT (SELECT TOP 1 T2.b FROM Part1Source T2 WHERE (T1.a = T2.a) ORDER BY T2.b)
FROM Part1Source T1
WHERE (T1.a < S2.a)))
ORDER BY S2.b) b
FROM Part1Source S1 ) X WHERE X.b IS NOT NULL
[/code]
November 1, 2007 6:49 AM
 

davekw said:

Here's the Part2 solution that builds on the logic above.

INSERT INTO Part2Dest
SELECT *
FROM (
 SELECT DISTINCT S1.a,
   ( SELECT TOP 1 S2.b
     FROM Part2Source S2
     WHERE (S1.a = S2.a) AND (S1.c = S2.c)
     AND (S2.b NOT IN (
         SELECT
           ( SELECT TOP 1 T2.b
             FROM Part2Source T2
             WHERE (T1.a = T2.a) OR (T1.c = T2.c)
             ORDER BY T2.b
           )
         FROM Part2Source T1
         WHERE (T1.a < S2.a)
         )
       )
     ORDER BY S2.b
   ) b,
   ( SELECT TOP 1 S2.c
     FROM Part2Source S2
     WHERE (S1.a = S2.a) AND (S1.b = S2.b)
     AND (S2.c NOT IN (
         SELECT
           ( SELECT TOP 1 T2.c
             FROM Part2Source T2
             WHERE (T1.a = T2.a) OR (T1.b = T2.b)
             ORDER BY T2.c
           )
         FROM Part2Source T1
         WHERE (T1.a < S2.a)
         )
       )
     ORDER BY S2.c
   ) c
 FROM Part2Source S1
 ) X
WHERE (X.b IS NOT NULL) AND (X.c IS NOT NULL)
November 1, 2007 6:53 AM
 

davekw said:

The nested SELECTs in the above solutions ensure that when selecting a "b" value you don't select a row where an "a" value has already been used.

Similarly for part 2, but with the added complication of the "c" value.
November 1, 2007 6:57 AM
 

Steve S said:

Davekw: try your part 1 code with this input:
INSERT  INTO @Part1Source VALUES  (1, 0)
INSERT  INTO @Part1Source VALUES  (1, 1)
INSERT  INTO @Part1Source VALUES  (2, 0)
November 1, 2007 7:03 AM
 

roarfred said:

Interesting solution, davekw, but your solution seems to ensure uniqueness rather than a high number of entries...

Add the two combinations 1,10 and 10,1 to the Source table to see what I mean.
November 1, 2007 7:07 AM
 

roarfred said:

or Steve's numbers, of course :)
November 1, 2007 7:08 AM
 

davekw said:

Thanks guys, it seems you are right.

roarfred: Your solution seems to have the same problem that mine has: ensuring uniqueness rather than high number of entries. Add the following to the original Part 1 data: (3, 2) (3, 3) (3, 4). Your solution returns 5 rows, but I think there are at least 6 number of unique entries.
November 1, 2007 7:43 AM
 

davekw said:

It seems I had one of the checks the wrong way round. < instead of >

Try this... Maybe you guys will still break it.  :)

INSERT INTO Part1Dest
SELECT * FROM (
 SELECT DISTINCT S1.a, (SELECT TOP 1 S2.b FROM Part1Source S2 WHERE (S1.a = S2.a) AND (S2.b NOT IN (
   SELECT (SELECT TOP 1 T2.b FROM Part1Source T2 WHERE (T1.a = T2.a) ORDER BY T2.b DESC)
   FROM Part1Source T1
   WHERE (T1.a > S2.a)))
 ORDER BY S2.b DESC) b
FROM Part1Source S1) X
WHERE X.b IS NOT NULL
November 1, 2007 8:00 AM
 

roarfred said:

davekw, it still seems like there is a problem if you use the following values:
INSERT INTO Part1Source VALUES (1,2)
INSERT INTO Part1Source VALUES (3,2)
INSERT INTO Part1Source VALUES (3,4)
INSERT INTO Part1Source VALUES (5,4)
INSERT INTO Part1Source VALUES (5,6)
INSERT INTO Part1Source VALUES (7,6)
...

Even worse, my attempt gives only two values from this one :( There is clearely a problem with determining the consequence of choosing one entry over the other
November 1, 2007 8:19 AM
 

davekw said:

Hehe, back to the drawing board for both of us. I think I'll go for a new approach. I did like your idea of a Block Count - maybe there is something there?
November 1, 2007 8:24 AM
 

davekw said:

Looking at it again... I think my solution was almost correct, except that it sometimes returned duplicate "b" values. If I'm right then the solution is to group by "b" (see below).

Or maybe I'm wrong???

Anyway... here it is:

INSERT INTO Part1Dest
SELECT MIN(a) AS a, b
FROM (
 SELECT DISTINCT S1.a,
   ( SELECT TOP 1 S2.b
     FROM Part1Source S2
     WHERE (S1.a = S2.a)
     AND (S2.b NOT IN (
         SELECT
           ( SELECT TOP 1 T2.b
             FROM Part1Source T2
             WHERE (T1.a = T2.a)
             ORDER BY T2.b DESC
           )
         FROM Part1Source T1
         WHERE (T1.a > S2.a)
         )
       )
     ORDER BY S2.b DESC
   ) b
 FROM Part1Source S1
 ) X
WHERE X.b IS NOT NULL
GROUP BY b
ORDER BY a
November 1, 2007 9:12 AM
 

LZ007 said:

Here is a solution for part1:

---------
insert into Part1Dest
select a, (1- sign(a-1))*min(b)+sign(a-1)*max(b)
from part1source
where a <10
group by a

-------

And herer is a solution for part2
-------
insert into Part2Dest
select a, (1- sign(a-1))*min(b)+sign(a-1)*max(b),(1- sign(a-1))*min(c)+sign(a-1)*max(c)
from Part2Source
where a !=7
group by a

-------
November 1, 2007 9:49 AM
 

scott2718281828 said:

Here's a pair of SQL 2005-only solutions:

INSERT INTO Part1Dest
SELECT a, b
FROM (
SELECT p.a, p.b,
ROW_NUMBER() OVER (PARTITION BY p.a ORDER BY na + nb) ra,
ROW_NUMBER() OVER (PARTITION BY p.b ORDER BY na + nb) rb
FROM Part2Source p
INNER JOIN (SELECT a, COUNT(*) as na FROM Part2Source GROUP BY a) AS ac ON ac.a = p.a
INNER JOIN (SELECT b, COUNT(*) as nb FROM Part2Source GROUP BY b) AS bc ON bc.b = p.b
) x
WHERE ra = 1 AND rb = 1

INSERT INTO Part2Dest
SELECT a, b, c
FROM (
SELECT p.a, p.b, p.c,
ROW_NUMBER() OVER (PARTITION BY p.a ORDER BY na + nb + nc) ra,
ROW_NUMBER() OVER (PARTITION BY p.b ORDER BY na + nb + nc) rb,
ROW_NUMBER() OVER (PARTITION BY p.c ORDER BY na + nb + nc) rc
FROM Part2Source p
INNER JOIN (SELECT a, COUNT(*) as na FROM Part2Source GROUP BY a) AS ac ON ac.a = p.a
INNER JOIN (SELECT b, COUNT(*) as nb FROM Part2Source GROUP BY b) AS bc ON bc.b = p.b
INNER JOIN (SELECT c, COUNT(*) as nc FROM Part2Source GROUP BY c) AS cc ON cc.c = p.c
) x
WHERE ra = 1 AND rb = 1 AND rc = 1
November 1, 2007 10:04 AM
 

im_bob_loucans said:

Here's a recursive approach to part one.  I fried my brain to see if there was a way to do this recursively without mulitple insert executions.  but syntactically its still one insert.

IF EXISTS (SELECT *
          FROM   sysobjects
          WHERE  name = 'inserter'
                 AND TYPE = 'P')
   BEGIN
       DROP PROC inserter
   END

GO

CREATE PROC inserter
AS
   BEGIN
       DECLARE  @a INT,
                @b INT
       
       SELECT   TOP 1 @a = a,
                      @b = b
       FROM     part1source
       WHERE    a NOT IN (SELECT DISTINCT a
                          FROM   part1dest)
                AND b NOT IN (SELECT DISTINCT b
                              FROM   part1dest)
       ORDER BY a DESC,
                b DESC
       
       IF @a IS NOT NULL
          AND @b IS NOT NULL
           BEGIN
               INSERT INTO part1dest
               VALUES     (@a,
                           @b)
               
               EXEC inserter
           END
   END
GO

EXEC inserter
GO
November 1, 2007 10:28 AM
 

Steve S said:

scott217.... try your part one solution with this data:
INSERT  INTO @Part1Source VALUES  (0, 1)
INSERT  INTO @Part1Source VALUES  (1, 0)
INSERT  INTO @Part1Source VALUES  (1, 1)
INSERT  INTO @Part1Source VALUES  (2, 0)
INSERT  INTO @Part1Source VALUES  (0, 2)
November 1, 2007 10:34 AM
 

Marx said:

How about this below insert statement. Optionally you can use MAX instead of MIN. Now, i read, the requirement is get 'as much as data' into Part1Dest and NOT ALL of them necessarily :).

INSERT INTO Part1Dest
SELECT MIN(a)'a', b
FROM (SELECT a,MIN(b)'b' FROM Part1Source GROUP BY a) a
GROUP BY b

Marx
November 1, 2007 1:09 PM
 

Lee said:

/* I didn't see any rule about not using dynamic T-SQL... */

DECLARE @sql   VARCHAR (8000)
     , @delim VARCHAR (10)

SELECT @sql   = 'INSERT INTO Part1Dest'
    , @delim = ''

SELECT @sql   = @sql
             + CASE
                 WHEN @sql LIKE '% '  + CONVERT (VARCHAR, a) + ',%'
                   OR @sql LIKE '%, ' + CONVERT (VARCHAR, b) + '%'
                 THEN ''
                 ELSE @delim
               END
             + CASE
                 WHEN @sql LIKE '% '  + CONVERT (VARCHAR, a) + ',%'
                   OR @sql LIKE '%, ' + CONVERT (VARCHAR, b) + '%'
                 THEN ''
                 ELSE ' SELECT ' + CONVERT (VARCHAR, a)
                    +       ', ' + CONVERT (VARCHAR, b)
               END
    , @delim = ' UNION ALL'
FROM Part1Source
ORDER BY a
      , b

EXEC (@sql)

November 1, 2007 4:04 PM
 

puzsol said:

I like the way LZ007 managed to hide his real code, using:

insert into Part1Dest
select a, (1- sign(a-1))*min(b)+sign(a-1)*max(b)
from part1source
where a <10
group by a

Rather than:

insert into Part1Dest
select a, case when a = 1 then min(b) else max(b) end
from part1source
where a < 10
group by a

Which is much more readable by us mere mortals.... lol
November 2, 2007 1:27 AM
 

davekw said:

Also, LZ007's solution doesn't guarantee uniqueness. Try some of the additional test cases discussed above. E.g. add (3, 2) (3, 3) (3, 4) to the original Part 1 test data.
November 2, 2007 7:46 AM
 

Lee said:

/* the same approach works for the second part of the puzzle */

DECLARE @sql   VARCHAR (8000)
     , @delim VARCHAR (10)

SELECT @sql   = 'INSERT INTO Part2Dest'
    , @delim = ''

SELECT @sql   = @sql
             + CASE
                 WHEN @sql LIKE '%SELECT ' + CONVERT (VARCHAR, a) + ',%'
                   OR @sql LIKE      '%, ' + CONVERT (VARCHAR, b) + ', %'
                   OR @sql LIKE      '%, ' + CONVERT (VARCHAR, c) + '[^,]%'
                 THEN ''
                 ELSE @delim
               END
             + CASE
                 WHEN @sql LIKE '%SELECT ' + CONVERT (VARCHAR, a) + ',%'
                   OR @sql LIKE      '%, ' + CONVERT (VARCHAR, b) + ', %'
                   OR @sql LIKE      '%, ' + CONVERT (VARCHAR, c) + '[^,]%'
                 THEN ''
                 ELSE ' SELECT ' + CONVERT (VARCHAR, a)
                    +       ', ' + CONVERT (VARCHAR, b)
                    +       ', ' + CONVERT (VARCHAR, c)
               END
    , @delim = ' UNION ALL'
FROM Part2Source
ORDER BY a
      , b

EXEC (@sql)
November 2, 2007 8:21 AM
 

bhushanvinay said:


/****** Object:  Table [dbo].[Table_1_dest]    Script Date: 11/02/2007 14:42:44 ******/
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Table_1_dest]') AND type in (N'U'))
DROP TABLE [dbo].[Table_1_dest]


/****** Object:  Table [dbo].[Table_1_src]    Script Date: 11/02/2007 14:42:32 ******/
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Table_1_src]') AND type in (N'U'))
DROP TABLE [dbo].[Table_1_src]


/****** Object:  Table [dbo].[Table_1_src]    Script Date: 11/02/2007 14:43:08 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Table_1_src](
[a] [int] NOT NULL,
[b] [int] NOT NULL,
CONSTRAINT [PK_Table_1_src] PRIMARY KEY CLUSTERED
(
[a] ASC,
[b] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]


GO

CREATE TABLE [dbo].[Table_1_dest](
[a] [int] NOT NULL,
[b] [int] NOT NULL
) ON [PRIMARY]

GO
ALTER TABLE [dbo].[Table_1_dest]  
WITH CHECK ADD  CONSTRAINT [FK_Table_1_dest_Table_1_src]
FOREIGN KEY([a], [b])
REFERENCES [dbo].[Table_1_src] ([a], [b])
GO
ALTER TABLE [dbo].[Table_1_dest]
CHECK CONSTRAINT [FK_Table_1_dest_Table_1_src]


GO

DELETE FROM TABLE_1_DEST;
DELETE FROM TABLE_1_SRC;
GO
INSERT INTO TABLE_1_SRC VALUES (1,1)
INSERT INTO TABLE_1_SRC VALUES (1,2)
INSERT INTO TABLE_1_SRC VALUES (2,3)
INSERT INTO TABLE_1_SRC VALUES (7,2)
INSERT INTO TABLE_1_SRC VALUES (2,4)
INSERT INTO TABLE_1_SRC VALUES (5,5)
INSERT INTO TABLE_1_SRC VALUES (5,1)
INSERT INTO TABLE_1_SRC VALUES (5,3)
INSERT INTO TABLE_1_SRC VALUES (9,0)
INSERT INTO TABLE_1_SRC VALUES (11,2)
go


INSERT INTO TABLE_1_DEST ( a,b)

SELECT SOURCE.a, SOURCE.b
from TABLE_1_SRC as source
November 2, 2007 9:46 AM
 

Lee said:

Oops.  My posted solution to #2 isn't ideal, as coded.  It gets more hits with ORDER BY DESC for each variable.  I can see I didn't really understand this problem, earlier.  There is probably an optimum order to insert the input rows, and the trick is to come up with the SELECT statement that performs the best analysis.  Good puzzle!  Wish I had time to try again.
November 2, 2007 1:46 PM
 

KenJ said:

Unlike the illustrious Mr. Factor, I decided to tackle part 1 first because it looked easier (I'm rather simple in the T-SQL department).

My first thought was something iterative, but that's already been well addressed.  I’m also not a huge fan of the more, uh, flexible ways of executing SQL statements, so sp_executeSQL / EXEC was out.

The following works with the puzzle's sample, as well as the alternate data set provided by Steve S, so I hope it's a general purpose solution.

I used a view to keep things short, so it won't pass the schema manipulation criterion proposed by one commenter.  I guess you 'could' reproduce the view's code everywhere the view is referenced, but that is just too much maintenance joy for me (as it could, no doubt, already use some maintenance).

Well, here goes.  I have this nagging feeling that I've missed something obvious, so let me know how you would improve it:


<pre style="font-size: 12px;"><font color="green">---&nbsp;we'll&nbsp;need&nbsp;this&nbsp;query&nbsp;two&nbsp;times,&nbsp;no,&nbsp;thrice
<br>---&nbsp;let's&nbsp;use&nbsp;a&nbsp;view&nbsp;to&nbsp;stay&nbsp;concise
<br></font><font color="blue">CREATE&nbsp;VIEW&nbsp;</font><font color="black">v1&nbsp;</font><font color="blue">AS
<br></font><font color="green">---&nbsp;get&nbsp;one&nbsp;'a'&nbsp;for&nbsp;every&nbsp;'b'
<br>---&nbsp;but&nbsp;this&nbsp;allows&nbsp;dupe&nbsp;'a's,&nbsp;you'll&nbsp;see
<br>&nbsp;&nbsp;</font><font color="blue">SELECT&nbsp;</font><font color="black">t0.a</font><font color="gray">,&nbsp;</font><font color="black">t0.b
<br>&nbsp;&nbsp;</font><font color="blue">FROM&nbsp;</font><font color="gray">(</font><font color="blue">SELECT&nbsp;MAX</font><font color="gray">(</font><font color="black">a</font><font color="gray">)&nbsp;</font><font color="blue">AS&nbsp;</font><font color="black">a</font><font color="gray">,&nbsp;</font><font color="black">b
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">FROM&nbsp;</font><font color="black">Part1Source&nbsp;</font><font color="blue">AS&nbsp;</font><font color="black">p1
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">GROUP&nbsp;BY&nbsp;</font><font color="black">b</font><font color="gray">)&nbsp;</font><font color="blue">AS&nbsp;</font><font color="black">t0
<br></font><font color="green">---&nbsp;we'll&nbsp;do&nbsp;a&nbsp;join,&nbsp;one&nbsp;'b'&nbsp;per&nbsp;'a'
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">INNER&nbsp;JOIN&nbsp;</font><font color="gray">(</font><font color="blue">SELECT&nbsp;</font><font color="black">a</font><font color="gray">,&nbsp;</font><font color="blue">MAX</font><font color="gray">(</font><font color="black">b</font><font color="gray">)&nbsp;&nbsp;</font><font color="blue">AS&nbsp;</font><font color="black">B
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">FROM&nbsp;</font><font color="black">Part1Source&nbsp;</font><font color="blue">AS&nbsp;</font><font color="black">p2
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">GROUP&nbsp;BY&nbsp;</font><font color="black">a</font><font color="gray">)&nbsp;</font><font color="blue">AS&nbsp;</font><font color="black">t1
<br></font><font color="green">---&nbsp;then&nbsp;in&nbsp;our&nbsp;ON,&nbsp;an&nbsp;AND&nbsp;we'll&nbsp;play
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">ON&nbsp;</font><font color="black">t0.a&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">t1.a&nbsp;</font><font color="gray">AND&nbsp;</font><font color="black">t0.b&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">t1.b
<br></font><font color="green">---&nbsp;this&nbsp;ensures&nbsp;our&nbsp;unique&nbsp;'b's
<br>---&nbsp;have&nbsp;unique&nbsp;'a's,&nbsp;constraints&nbsp;to&nbsp;please
<br></font><font color="black">GO
<br>&nbsp;
<br></font><font color="blue">INSERT&nbsp;</font><font color="black">Part1Dest&nbsp;</font><font color="gray">(</font><font color="black">a</font><font color="gray">,&nbsp;</font><font color="black">b</font><font color="gray">)
<br></font><font color="green">---&nbsp;we've&nbsp;cleaned&nbsp;up&nbsp;dupes,&nbsp;let's&nbsp;find&nbsp;the&nbsp;rest
<br></font><font color="blue">SELECT&nbsp;MAX</font><font color="gray">(</font><font color="black">p1.a</font><font color="gray">)&nbsp;</font><font color="blue">AS&nbsp;</font><font color="black">a</font><font color="gray">,&nbsp;</font><font color="black">p1.b
<br></font><font color="blue">FROM&nbsp;</font><font color="black">Part1Source&nbsp;</font><font color="blue">AS&nbsp;</font><font color="black">p1
<br></font><font color="green">---&nbsp;to&nbsp;prevent&nbsp;more,&nbsp;we'll&nbsp;make&nbsp;a&nbsp;test
<br></font><font color="blue">WHERE
<br></font><font color="green">---&nbsp;we&nbsp;don't&nbsp;want&nbsp;'a's&nbsp;we've&nbsp;seen&nbsp;before
<br>&nbsp;&nbsp;&nbsp;</font><font color="black">p1.a&nbsp;</font><font color="gray">NOT&nbsp;</font><font color="blue">IN&nbsp;</font><font color="gray">(</font><font color="blue">SELECT&nbsp;</font><font color="black">a&nbsp;</font><font color="blue">FROM&nbsp;</font><font color="black">v1</font><font color="gray">)
<br></font><font color="green">---&nbsp;and&nbsp;extra&nbsp;'b's,&nbsp;we&nbsp;must&nbsp;abhor
<br>&nbsp;&nbsp;&nbsp;</font><font color="gray">AND&nbsp;</font><font color="black">p1.b&nbsp;</font><font color="gray">NOT&nbsp;</font><font color="blue">IN&nbsp;</font><font color="gray">(</font><font color="blue">SELECT&nbsp;</font><font color="black">b&nbsp;</font><font color="blue">FROM&nbsp;</font><font color="black">v1</font><font color="gray">)
<br></font><font color="blue">GROUP&nbsp;BY&nbsp;</font><font color="black">p1.b
<br></font><font color="blue">UNION
<br></font><font color="green">---&nbsp;then&nbsp;cleaned&nbsp;up&nbsp;dupes,&nbsp;let's&nbsp;not&nbsp;forget
<br></font><font color="blue">SELECT&nbsp;</font><font color="black">a</font><font color="gray">,&nbsp;</font><font color="black">b&nbsp;</font><font color="blue">FROM&nbsp;</font><font color="black">v1
<br></font><font color="green">---&nbsp;now&nbsp;we&nbsp;should&nbsp;have&nbsp;a&nbsp;complete&nbsp;set
<br></font><font color="black">GO
<br>&nbsp;
<br></font><font color="blue">SELECT&nbsp;</font><font color="black">a&nbsp;</font><font color="blue">AS&nbsp;</font><font color="black">A</font><font color="gray">,&nbsp;</font><font color="black">b&nbsp;</font><font color="blue">FROM&nbsp;</font><font color="black">Part1Dest
<br>GO
<br>&nbsp;
<br></font><font color="blue">DELETE&nbsp;</font><font color="black">Part1Dest
<br>GO
<br>&nbsp;
<br></font><font color="blue">DROP&nbsp;VIEW&nbsp;</font><font color="black">v1
<br>GO</font></pre>
November 2, 2007 10:48 PM
 

KenJ said:

And, I obviously have no idea how to post formatted code in a comment.  My apologies for the aberration that was once my beautifully formatted query.

--- we'll need this query two times, no, thrice
--- let's use a view to stay concise
CREATE VIEW v1 AS
--- get one 'a' for every 'b'
--- but this allows dupe 'a's, you'll see
 SELECT t0.a, t0.b
 FROM (SELECT MAX(a) AS a, b
       FROM Part1Source AS p1
       GROUP BY b) AS t0
--- we'll do a join, one 'b' per 'a'
    INNER JOIN (SELECT a, MAX(b)  AS B
                FROM Part1Source AS p2
                GROUP BY a) AS t1
--- then in our ON, an AND we'll play
    ON t0.a = t1.a AND t0.b = t1.b
--- this ensures our unique 'b's
--- have unique 'a's, constraints to please
GO

INSERT Part1Dest (a, b)
--- we've cleaned up dupes, let's find the rest
SELECT MAX(p1.a) AS a, p1.b
FROM Part1Source AS p1
--- to prevent more, we'll make a test
WHERE
--- we don't want 'a's we've seen before
  p1.a NOT IN (SELECT a FROM v1)
--- and extra 'b's, we must abhor
  AND p1.b NOT IN (SELECT b FROM v1)
GROUP BY p1.b
UNION
--- then cleaned up dupes, let's not forget
SELECT a, b FROM v1
--- now we should have a complete set
GO

SELECT a AS A, b FROM Part1Dest
GO

DELETE Part1Dest
GO

DROP VIEW v1
GO
November 2, 2007 10:51 PM
 

davekw said:

Sorry to have to tell you this KenJ, but your solution doesn't guarantee uniqueness. The following combination breaks: (2,1) (2,2) (3,2) (3,3)
November 3, 2007 3:06 AM
 

miro65 said:

I found solution from scott2718281828 very elegant. So I was interested, if it general, I mean, if it still produce optimal solution if we change data. But the answer is no. If we omit data --INSERT INTO Part2Source VALUES (11,2,9), we get only 3 rows as solution. That is not OK, because if we omit 1 row in source, we should lose max 1 row in solution.
Some idea how to generalize it?
LP
November 4, 2007 4:56 AM
 

TonyBater said:

Actually none of these solutions will work. The task was to MOVE as much data from the source tabes to the destination tables, implying that once inserted into the destination the data is deleted from the source. As there is a foreign key relationship from the destination into the source, it is not possible to 'move' any data from source to destination!
November 5, 2007 6:24 AM
 

KenJ said:

Thanks, davekw.  I figured there was something fundamentally wrong with the query :)
November 5, 2007 8:03 AM
 

HAL J. said:

while 1=1 begin
 insert Part1Dest
 select top 1 *
 from Part1Source a
 where not exists( select * from Part1Dest b where b.a = a.a or a.b = b.b )
 if @@rowcount = 0 break
end
-- and for part 2 you'd just expand the criteria to a.c = b.c
-- I figured "Move" meant "Copy", otherwise chuck this too into the ether
-- some of you definitely win the Rube Goldberg award for T-SQL
November 8, 2007 1:50 AM
 

chris.leonard said:

Unlike Phil, I started with the simple one, and unlike most all of you, I'm just too darned lazy to do a SQL 2000 solution.  Well, today anyway.  :o)

Here is a SQL 2005 solution that should work for any data entered.  I'm not saying the code can't be improved upon, just that the solution seems to work, and is written from the ground up to accommodate any data.

WITH cte AS (
  SELECT
     ps.a,
     ps.b,
     1 AS LEVEL,
     '.' + CAST(ROW_NUMBER() OVER (ORDER BY a ) AS VARCHAR(MAX)) + '.' AS RowID,
     '.' + CAST(ps.a AS VARCHAR(MAX)) + '.' AS APATH,
     '.' + CAST(ps.b AS VARCHAR(MAX)) + '.' AS BPATH
  FROM [dbo].[Part1Source] AS ps
 
  UNION ALL

  SELECT
     ps2.a,
     ps2.b,
     cte.level + 1 AS LEVEL,
     cte.RowID + CAST(ROW_NUMBER() OVER (ORDER BY ps2.a) AS VARCHAR(MAX)) + '.' AS RowID,
     cte.APATH + CAST(ps2.a AS VARCHAR(MAX)) + '.' AS APATH,
     cte.BPATH + CAST(ps2.b AS VARCHAR(MAX)) + '.' AS BPATH
  FROM [dbo].[Part1Source] AS ps2
  JOIN cte
  ON cte.APATH NOT LIKE '%.' + CAST(ps2.a AS VARCHAR(MAX)) + '.%'
  AND cte.BPATH NOT LIKE '%.' + CAST(ps2.b AS VARCHAR(MAX)) + '.%'
  )
INSERT INTO [dbo].[Part1Dest] (a,b)
SELECT a,b
FROM cte
WHERE (SELECT TOP 1 RowID FROM cte ORDER BY LEVEL DESC) LIKE RowID + '%'
ORDER BY a, b
OPTION (MAXRECURSION 0)

If you want to see how it works, replace the "INSERT INTO [dbo].[Part1Dest] (a,b) SELECT a,b" part with "SELECT *" and you should be able to work it out from there.

Here is the same idea for part 2:

WITH cte AS (
  SELECT
     ps.a,
     ps.b,
     ps.c,
     1 AS LEVEL,
     '.' + CAST(ROW_NUMBER() OVER (ORDER BY a ) AS VARCHAR(MAX)) + '.' AS RowID,
     '.' + CAST(ps.a AS VARCHAR(MAX)) + '.' AS APATH,
     '.' + CAST(ps.b AS VARCHAR(MAX)) + '.' AS BPATH,
     '.' + CAST(ps.c AS VARCHAR(MAX)) + '.' AS CPATH
  FROM [dbo].[Part2Source] AS ps
 
  UNION ALL

  SELECT
     ps2.a,
     ps2.b,
     ps2.c,
     cte.level + 1 AS LEVEL,
     cte.RowID + CAST(ROW_NUMBER() OVER (ORDER BY ps2.a) AS VARCHAR(MAX)) + '.' AS RowID,
     cte.APATH + CAST(ps2.a AS VARCHAR(MAX)) + '.' AS APATH,
     cte.BPATH + CAST(ps2.b AS VARCHAR(MAX)) + '.' AS BPATH,
     cte.CPATH + CAST(ps2.c AS VARCHAR(MAX)) + '.' AS CPATH
  FROM [dbo].[Part2Source] AS ps2
  JOIN cte
  ON cte.apath NOT LIKE '%.' + CAST(ps2.a AS VARCHAR(MAX)) + '.%'
  AND cte.bpath NOT LIKE '%.' + CAST(ps2.b AS VARCHAR(MAX)) + '.%'
  AND cte.cpath NOT LIKE '%.' + CAST(ps2.c AS VARCHAR(MAX)) + '.%'
  )
INSERT INTO Part2Dest (a,b,c)
SELECT a,b,c
FROM cte
WHERE (SELECT TOP 1 RowID FROM cte ORDER BY LEVEL DESC) LIKE RowID + '%'
ORDER BY a, b, c
OPTION (MAXRECURSION 0)

Cheers,
Chris Leonard
databaseguy.com / godaddy.com / [sometimes] microsoft.com
November 8, 2007 2:02 AM
 

Ryan Randall said:

[code]
--part 1
DECLARE @i INT; SELECT @i = POWER(2, COUNT(*)) FROM Part1Source
; WITH
t (i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM t WHERE i < @i-1),
u AS (SELECT a, b, ROW_NUMBER() OVER (ORDER BY a)-1 AS c FROM Part1Source),
v AS (SELECT * FROM t CROSS JOIN u WHERE POWER(2, u.c) & t.i > 0)
INSERT Part1Dest
SELECT a, b FROM v WHERE i = (SELECT TOP 1 i FROM v GROUP BY i
   HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b) ORDER BY COUNT(*) DESC)
OPTION (MAXRECURSION 0)

--part 2
DECLARE @j INT; SELECT @j = POWER(2, COUNT(*)) FROM Part2Source
; WITH
t (i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM t WHERE i < @j-1),
u AS (SELECT *, ROW_NUMBER() OVER (ORDER BY a)-1 AS d FROM Part2Source),
v AS (SELECT * FROM t CROSS JOIN u WHERE POWER(2, u.d) & t.i > 0)
INSERT Part2Dest
SELECT a, b, c FROM v WHERE i = (SELECT TOP 1 i FROM v GROUP BY i
   HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b)
      AND COUNT(*) = COUNT(DISTINCT c) ORDER BY COUNT(*) DESC)
OPTION (MAXRECURSION 0)

--results
SELECT * FROM Part1Dest
SELECT * FROM Part2Dest

/*
a           b
----------- -----------
1           1
2           4
5           3
7           2
9           0

a           b           c
----------- ----------- -----------
1           1           2
2           4           7
5           3           4
9           0           1
11          2           9
*/
[/code]
November 8, 2007 7:11 AM
 

Jack the DBA said:

This appears to work.  I get 5 rows and it will work on 2005 or 2000.  For a 2005 solution I am sure I could make it simpler using CTE's and probably the INTERSECT and EXCEPT operators

insert into part1dest
Select
   T.*
From
   part1source T
Where
   T.a Not in (    
                   Select
                       A.a
                   From
                    (Select
                           A.*
                       From
                           part1source A
                       Where
                           A.a Not In  (Select a From part1source B group By a having count(*) > 1)
                       Union
                       Select
                          A.*
                       From
                           part1source A
                       Where
                           A.b Not In  (Select b From part1source B group By b having count(*) > 1)) A
                   Where
                       A.a Not In
                   (Select
                       Min(a)
                   From
                        (Select
                           A.*
                       From
                           part1source A
                       Where
                           A.a Not In  (Select a From part1source B group By a having count(*) > 1)
                       Union
                       Select
                          A.*
                       From
                           part1source A
                       Where
                           A.b Not In  (Select b From part1source B group By b having count(*) > 1)) U
                   Group By
                       b
                   Having Count(*) > 1)) And
T.b not in (Select
                       A.a
                   From
                    (Select
                           A.*
                       From
                           part1source A
                       Where
                           A.a Not In  (Select a From part1source B group By a having count(*) > 1)
                       Union
                       Select
                          A.*
                       From
                           part1source A
                       Where
                           A.b Not In  (Select b From part1source B group By b having count(*) > 1)) A
                   Where
                       A.a Not In
                   (Select
                       Min(a)
                   From
                        (Select
                           A.*
                       From
                           part1source A
                       Where
                           A.a Not In  (Select a From part1source B group By a having count(*) > 1)
                       Union
                       Select
                          A.*
                       From
                           part1source A
                       Where
                           A.b Not In  (Select b From part1source B group By b having count(*) > 1)) U
                   Group By
                       b
                   Having Count(*) > 1))
Union
Select
   A.*
From
    (Select
           A.*
       From
           part1source A
       Where
           A.a Not In  (Select a From part1source B group By a having count(*) > 1)
       Union
       Select
          A.*
       From
           part1source A
       Where
           A.b Not In  (Select b From part1source B group By b having count(*) > 1)) A
Where
       A.a Not In
                       (Select
                           Min(a)
                       From
                            (Select
                               A.*
                           From
                               part1source A
                           Where
                               A.a Not In  (Select a From part1source B group By a having count(*) > 1)
                           Union
                           Select
                              A.*
                           From
                               part1source A
                           Where
                               A.b Not In  (Select b From part1source B group By b having count(*) > 1)) U
                       Group By
                           b
                       Having Count(*) > 1)

November 8, 2007 9:54 AM
 

dbabob said:

I may be mistaken, but I this this works.

INSERT INTO part1dest
SELECT MAX(a) AS a,b
FROM (
SELECT a,MAX(b) AS b
FROM part1source
GROUP BY a) AS c
GROUP BY b

INSERT INTO part2dest
SELECT MAX(a) AS a, MAX(b) AS b, c FROM (
SELECT MAX(a) AS a,b,MAX(c) AS c from
(SELECT a,MAX(b) AS b,MAX(c) AS c
FROM part2source
GROUP BY a) AS a
GROUP BY b) AS c
GROUP BY c
November 8, 2007 12:20 PM
 

dbabob said:

Forget that.... Missed a few values :(
November 8, 2007 12:27 PM
 

Dean said:

Jack the DBA, Try this:

INSERT INTO Part1Source VALUES (1,2)
INSERT INTO Part1Source VALUES (1,3)
November 8, 2007 1:39 PM
 

Dean said:

For all of you who think you have the solution, what do you get if you try this test case?

INSERT INTO Part1Source VALUES (1,2)
INSERT INTO Part1Source VALUES (8,9)
INSERT INTO Part1Source VALUES (1,4)
INSERT INTO Part1Source VALUES (5,2)
INSERT INTO Part1Source VALUES (8,6)
INSERT INTO Part1Source VALUES (7,9)

If you get less than 4 rows, then it's time to rethink.  Remember, Lionel said, "as much of the data as you can".

Chris Leonard, it seems you didn't overlook this part of the puzzle.  Your SQL appears to generate every possible valid combination of rows, and then selects the largest.  Nice.
November 8, 2007 1:53 PM
 

Ryan Randall said:

--data
DELETE FROM Part1Dest
DELETE FROM Part1Source

INSERT INTO Part1Source VALUES (1,2)
INSERT INTO Part1Source VALUES (8,9)
INSERT INTO Part1Source VALUES (1,4)
INSERT INTO Part1Source VALUES (5,2)
INSERT INTO Part1Source VALUES (8,6)
INSERT INTO Part1Source VALUES (7,9)

--part 1
DECLARE @i INT; SELECT @i = POWER(2, COUNT(*)) FROM Part1Source
; WITH
t (i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM t WHERE i < @i-1),
u AS (SELECT a, b, ROW_NUMBER() OVER (ORDER BY a)-1 AS c FROM Part1Source),
v AS (SELECT * FROM t CROSS JOIN u WHERE POWER(2, u.c) & t.i > 0)
INSERT Part1Dest
SELECT a, b FROM v WHERE i = (SELECT TOP 1 i FROM v GROUP BY i
  HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b) ORDER BY COUNT(*) DESC)
OPTION (MAXRECURSION 0)

--results
SELECT * FROM Part1Dest

/*
a           b
----------- -----------
1           4
5           2
7           9
8           6
*/
November 8, 2007 2:21 PM
 

ralphcroce_philly said:

Lionel,

Here's my shot at puzzle 7, 1st part (the part with 2 fields).  Done using SS2000, so obviously works for SS2005 also.  Tomorrow I will throw in my shot at the 2nd part (3 fields).  Will throw in some comments to attempt at making it a bit intelligible, asap.  BTW:  your puzzles are pretty awesome....are you sure you are from this planet, originally?  Amazing stuff.   Ok, here's my shot at puzzle 7, 1st section.  With your specified data records as source data, this generates 5 inserts.

SELECT MIN(a) AS a, b FROM
(
SELECT a, MIN(b) AS b FROM
(
SELECT x.a, x.b FROM
(
SELECT s1.a, s1.b, COUNT(*) AS Ct FROM part1source s1
INNER JOIN part1source s2 ON s1.a = s2.a
INNER JOIN part1source s3 ON s1.b = s3.b
GROUP BY s1.a, s1.b
) x
INNER JOIN
(
SELECT a, MIN(Ct) AS MinCt FROM
(
SELECT s1.a, COUNT(*) AS Ct FROM part1source s1
INNER JOIN part1source s2 ON s1.a = s2.a
INNER JOIN part1source s3 ON s1.b = s3.b
GROUP BY s1.a, s1.b
) a1
GROUP BY a
) y
ON x.a = y.a AND x.Ct = y.MinCt
INNER JOIN
(
SELECT b, MIN(Ct) AS MinCt FROM
(
SELECT s1.b, COUNT(*) AS Ct FROM part1source s1
INNER JOIN part1source s2 ON s1.a = s2.a
INNER JOIN part1source s3 ON s1.b = s3.b
GROUP BY s1.a, s1.b
) b1
GROUP BY b
) z
ON x.b = z.b AND x.Ct = z.MinCt
) am
GROUP BY a
) bm
GROUP BY b

The 3-field version is coming later, along with comments to explain the logic. Thanks.
November 8, 2007 7:47 PM
 

orcus said:

be nice to use a table variable rather than a temporary but in 2000...
obviously a third set of predicates into the temporary and a longer concat for three columns

Declare @reason as sysname

--list of possible preceding inserts
Select cast(A.A as sysname) + ',' + cast(A.B as sysname) [Exclusion] ,cast(B.A as sysname) + ',' + cast(B.B as sysname) [Reason]
INTO #Constraints
from part1source A,
    part1source B
WHERE ((A.A = B.A AND B.B < A.B)
  OR (A.B = B.B AND B.A < A.A))

-- remove the preceding conflicts that refer to rows all ready excluded
While 1=1 begin
 set @reason = ''
 select top 1 @reason = reason from #constraints where reason in (select exclusion from #constraints) Group By reason ORDER by count(*) desc  
 IF @reason = '' BREAK
 Delete from #constraints where reason = @reason
END

INSERT part1dest
select a,b
FROM part1source
WHERE cast(A as sysname) + ',' + cast(B as sysname) NOT IN (Select Exclusion From #Constraints)

Drop table #Constraints
November 9, 2007 1:47 AM
 

Ryan Randall said:

ralphcroce_philly - I'm afraid your suggestion falls short with...

INSERT INTO Part1Source VALUES (1,1)
INSERT INTO Part1Source VALUES (1,2)
INSERT INTO Part1Source VALUES (2,3)
INSERT INTO Part1Source VALUES (7,2)
INSERT INTO Part1Source VALUES (2,4)
INSERT INTO Part1Source VALUES (5,5)
INSERT INTO Part1Source VALUES (5,1)
INSERT INTO Part1Source VALUES (5,3)
INSERT INTO Part1Source VALUES (9,0)
INSERT INTO Part1Source VALUES (11,2)

INSERT INTO Part1Source VALUES (3,2)
INSERT INTO Part1Source VALUES (3,3)
INSERT INTO Part1Source VALUES (3,4)
November 9, 2007 4:19 AM
 

Ryan Randall said:

orcus - I'm afraid your suggestion falls short with...

INSERT INTO Part1Source VALUES (1,0)
INSERT INTO Part1Source VALUES (1,1)
INSERT INTO Part1Source VALUES (2,0)
November 9, 2007 4:20 AM
 

Ryan Randall said:

HAL J. - I'm afraid your suggestion falls short with...

INSERT INTO Part1Source VALUES (1,0)
INSERT INTO Part1Source VALUES (1,1)
INSERT INTO Part1Source VALUES (2,0)
November 9, 2007 4:26 AM
 

Ryan Randall said:

GSquared - I'm afraid your suggestion fails with...

INSERT INTO Part1Source VALUES (1,2)
INSERT INTO Part1Source VALUES (3,2)
INSERT INTO Part1Source VALUES (3,4)
INSERT INTO Part1Source VALUES (5,4)
INSERT INTO Part1Source VALUES (5,6)
INSERT INTO Part1Source VALUES (7,6)
November 9, 2007 4:31 AM
 

Ryan Randall said:

As far as I can tell (i.e. correct me if I'm wrong), the only solutions which work (according to my tests) are:

Steve S's
chris.leonard's (nicely done!)
mine (the same basic idea as Steve S's)

I think that's because the only way to be sure you have the most rows possible is to look at _all_ possibilities, and those are the only solutions which do that.
November 9, 2007 4:39 AM
 

ralphcroce_philly said:

Ryan,
Thanks for finding a problem in my solution.  I threw it together in one session yesterday and should have been more cautious before submitting.   I have a variation on my original approach that I'm going to try today and will submit that one if it works out.  
Some of the various submissions from you folks are pretty amazing.  
--ralphcroce_philly
November 9, 2007 8:55 AM
 

SeanZ said:

---- Here is the solution for Part2.  Solution for Part1 is simular but simpler.
ALTER TABLE Part2Source ADD isSelected INT NOT NULL DEFAULT (0);
DECLARE @a INT, @b INT, @c INT;
WHILE (1 = 1)
BEGIN
SELECT @a = a, @b = b, @c = c
FROM
(
SELECT L.a, L.b, L.c, cntr = COUNT(*)
FROM Part2Source L INNER JOIN Part2Source R ON L.a = R.a OR L.b = R.b OR L.c = R.c
WHERE L.isSelected = 0 AND R.isSelected = 0
GROUP BY L.a, L.b, L.c
) AS T
WHERE cntr =
(
SELECT MIN(cntr)
FROM
(
SELECT L.a, L.b, L.c, cntr = COUNT(* )
FROM Part2Source L INNER JOIN Part2Source R ON L.a = R.a OR L.b = R.b OR L.c = R.c
WHERE L.isSelected = 0 AND R.isSelected = 0
GROUP BY L.a, L.b, L.c
) AS T
);

IF @@ROWCOUNT > 0
BEGIN
UPDATE Part2Source
SET isSelected = 1
WHERE a = @a AND b = @b AND c = @c;
DELETE Part2Source WHERE isSelected = 0 AND (a = @a OR b = @b OR c = @c);
END
ELSE
BREAK;
END;

INSERT Part2Dest (a, b, c)
SELECT a, b, c FROM Part2Source;

SELECT * FROM Part2Dest;

a b c
1 1 2
2 4 7
5 5 6
9 0 1
11 2 9
November 9, 2007 11:18 AM
 

SeanZ said:

---- This is a simplified version.  Also it uses UPDATE Part2Source
--      instead of DELETE Part2Source to preserve original source data.
---- Obviously, it works for both SQL 2000 and 2005.
---- Here is the solution for Part2.  Solution for Part1 is simular but simpler.
ALTER TABLE Part2Source ADD isSelected INT NOT NULL DEFAULT (0);
DECLARE @a INT, @b INT, @c INT;
WHILE (1 = 1)
BEGIN
SELECT TOP 1 @a = L.a, @b = L.b, @c = L.c
FROM Part2Source L INNER JOIN Part2Source R ON L.a = R.a OR L.b = R.b OR L.c = R.c
WHERE L.isSelected = 0 AND R.isSelected = 0
GROUP BY L.a, L.b, L.c
ORDER BY COUNT(*) ASC

IF @@ROWCOUNT > 0
BEGIN
UPDATE Part2Source
SET isSelected = 1
WHERE a = @a AND b = @b AND c = @c;
UPDATE Part2Source
SET isSelected = -1
WHERE isSelected = 0 AND (a = @a OR b = @b OR c = @c);
END
ELSE
BREAK;
END;

INSERT Part2Dest (a, b, c)
SELECT a, b, c FROM Part2Source WHERE isSelected = 1;

SELECT * FROM Part2Dest;

a b c
1 1 2
2 4 7
5 5 6
9 0 1
11 2 9
November 9, 2007 4:52 PM
 

chris.leonard said:

<idleChatter>
Thanks to Ryan and Dean for saying nice things about my solution.  It is interesting to me conceptually because it only requires two CTE definitions (which feels "clean"), and then off you're off and running.  

Since people are being so kind, I thought I'd speak up and say something nasty about my solution (since nobody else has!).  If I had to solve this problem for real, and supposing that the source table had, say, 35 rows in it, I would choose the Ryan / Steve S (call it "RSS") approach over mine just because of the incredible consumption of resources my solution manages to consume.  On my laptop, when there are 9 rows in the source table, it takes my solution about 40 seconds to solve the first problem while RSS is still sub-second in performance.  

Of course, neither solution is going to scale to large source data sets, but I was surprised by the inefficiency of my solution.

Maybe we could have another puzzle that asks people to come up with the most poorly-performing, but reasonable-looking solution possible?  

This makes me wonder, though, what kind of response times I might get on a real server, using a 50-row input table?  I don't want to run it on my laptop because, well, I don't want to burn up its CPU.  On the other hand, I don't want to run it on a server at work because, well, it's always a moral victory to stay off the VPN on the weekend.  :o)
</idleChatter>
November 10, 2007 1:27 AM
 

ralphcroce_philly said:

Ryan,

I noticed that in shooting down some solutions, you endorsed only 3 solutions, all of which work only on SS2005.  Is there an implicit implication that this can only be solved recursively?  With all due respect,  I couldn't live with that, because I had a strong belief that it was solvable using SS2000, and without using any looping constructs such as WHILE, etc., and all as one single statement.
Accordingly, I went back to the drawing board, to prove that this was solvable in SS2000 using only one statement.    My latest suggestion follows below (for the 2-field version).   It would have been far shorter, f I had used a few views to encapsulate certain sections of code that are repeated identically in several spots; however, I suspected that using views might possibly violate the spirit of Lionel's challenge, so I did the entire solution as one statement.

Ryan (or anyone else), please feel free to throw any data you like at this SS2000-compatible effort.  

Over the next few days, I will submit the non-SQL description of the concept, using a few simple charts and a narrative; finally, if I get some time, I will submit a version that works for the 3-field version; the essential logic will be the same.
--ralphcroce_philly

INSERT INTO Part1Dest
SELECT a, b FROM
(
SELECT p1.a, p1.b,
SUM(
CASE
WHEN p1.tCt>p2.tCt THEN 1
WHEN p1.tCt<p2.tCt THEN 0
WHEN p1.aCt>p2.aCt THEN 1
WHEN p1.aCt<p2.aCt THEN 0
WHEN p1.bCt>p2.bCt THEN 1
WHEN p1.bCt<p2.bCt THEN 0
WHEN p1.a>p2.a THEN 1
WHEN p1.a<p2.a THEN 0
WHEN p1.b>p2.b THEN 1
WHEN p1.b<p2.b THEN 0
ELSE 0
END
) + 1 AS RowNum
FROM
(
SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
) p1
INNER JOIN
(
SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
) p2
ON p1.a <> p2.a OR p1.b <> p2.b
GROUP BY p1.a, p1.b
) r
WHERE r.RowNum IN
(
SELECT MIN(RowNum) As UsableRowNum FROM
(

SELECT px.RowNum, px.a, px.b,
SUM(CASE WHEN px.a = py.a AND px.RowNum > py.RowNum THEN 1 ELSE 0 END) + 1 AS A_Occurrence,
SUM(CASE WHEN px.b = py.b AND px.RowNum > py.RowNum THEN 1 ELSE 0 END) + 1 AS B_Occurrence
FROM
(
SELECT p.a, p.b, r.RowNum FROM part1source p
INNER JOIN
(
SELECT p1.a, p1.b,
SUM(
CASE
WHEN p1.tCt>p2.tCt THEN 1
WHEN p1.tCt<p2.tCt THEN 0
WHEN p1.aCt>p2.aCt THEN 1
WHEN p1.aCt<p2.aCt THEN 0
WHEN p1.bCt>p2.bCt THEN 1
WHEN p1.bCt<p2.bCt THEN 0
WHEN p1.a>p2.a THEN 1
WHEN p1.a<p2.a THEN 0
WHEN p1.b>p2.b THEN 1
WHEN p1.b<p2.b THEN 0
ELSE 0
END
) + 1 AS RowNum
FROM
(
SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
) p1
INNER JOIN
(
SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
) p2
ON p1.a <> p2.a OR p1.b <> p2.b
GROUP BY p1.a, p1.b
) r
ON p.a = r.a AND p.b = r.b
) px
INNER JOIN
(
SELECT p.a, p.b, r.RowNum FROM part1source p
INNER JOIN
(
SELECT p1.a, p1.b,
SUM(
CASE
WHEN p1.tCt>p2.tCt THEN 1
WHEN p1.tCt<p2.tCt THEN 0
WHEN p1.aCt>p2.aCt THEN 1
WHEN p1.aCt<p2.aCt THEN 0
WHEN p1.bCt>p2.bCt THEN 1
WHEN p1.bCt<p2.bCt THEN 0
WHEN p1.a>p2.a THEN 1
WHEN p1.a<p2.a THEN 0
WHEN p1.b>p2.b THEN 1
WHEN p1.b<p2.b THEN 0
ELSE 0
END
) + 1 AS RowNum
FROM
(
SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
) p1
INNER JOIN
(
SELECT p.a, p.b, aCt, bCt, aCt + bCt AS tCt FROM part1source p
INNER JOIN (SELECT a, COUNT(*) AS aCt FROM part1source a GROUP BY a) aCts ON p.a = aCts.a
INNER JOIN (SELECT b, COUNT(*) AS bCt FROM part1source b GROUP BY b) bCts ON p.b = bCts.b
) p2
ON p1.a <> p2.a OR p1.b <> p2.b
GROUP BY p1.a, p1.b
) r

ON p.a = r.a AND p.b = r.b
) py
ON px.RowNum <> py.RowNum
GROUP BY
px.RowNum, px.a, px.b
) po
WHERE
A_Occurrence = B_Occurrence
GROUP BY a
)

--Thanks to Lionel for a very interesting challenge.
--ralphcroce_philly

November 11, 2007 8:43 AM
 

Ryan Randall said:

Hi ralphcroce_philly,

> I noticed that in shooting down some solutions, you endorsed only 3 solutions, all of which work only on SS2005.

That's true.

> Is there an implicit implication that this can only be solved recursively?

No. I think you have a misconception here.

Steve S's solution doesn't use any recursion (as far as I can see), and mine only uses recursion to generate a 'numbers table' without using any systems tables.

> I had a strong belief that it was solvable using SS2000, and without using any looping constructs such as WHILE, etc., and all as one single statement.

Yes. It is. It's just messier (as Steve pointed out).


Since you insist, here are my solutions rewritten for SQL 2000 and with only one statement each:


--part 1
INSERT dbo.Part1Dest
SELECT * FROM dbo.Part1Source x
WHERE
   POWER(2, (SELECT COUNT(*) FROM dbo.Part1Source
             WHERE CAST(a AS VARCHAR(MAX))   + '.' + CAST(b AS VARCHAR(MAX)) <=
                   CAST(x.a AS VARCHAR(MAX)) + '.' + CAST(x.b AS VARCHAR(MAX)))-1) &
  (SELECT TOP 1 i FROM
          (SELECT * FROM
              (SELECT i*256+j AS i
               FROM (SELECT DISTINCT number AS i FROM master.dbo.spt_values WHERE number BETWEEN 0 AND 255) a,
                    (SELECT DISTINCT number AS j FROM master.dbo.spt_values WHERE number BETWEEN 0 AND 255) b
               WHERE i*256+j < (SELECT POWER(2, COUNT(*)) FROM dbo.Part1Source)) t,
              (SELECT *,
                   (SELECT COUNT(*) FROM dbo.Part1Source
                    WHERE CAST(a AS VARCHAR(10))   + '.' + CAST(b AS VARCHAR(10)) <=
                          CAST(x.a AS VARCHAR(10)) + '.' + CAST(x.b AS VARCHAR(10)))-1 AS c
               FROM dbo.Part1Source x) u
           WHERE POWER(2, u.c) & t.i > 0) v
   GROUP BY i
   HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b)
   ORDER BY COUNT(*) DESC
   ) > 0

--part 2
INSERT dbo.Part2Dest
SELECT * FROM dbo.Part2Source x
WHERE
   POWER(2, (SELECT COUNT(*) FROM dbo.Part2Source
             WHERE CAST(a AS VARCHAR(MAX))   + '.' + CAST(b AS VARCHAR(MAX)) <=
                   CAST(x.a AS VARCHAR(MAX)) + '.' + CAST(x.b AS VARCHAR(MAX)))-1) &
  (SELECT TOP 1 i FROM
          (SELECT * FROM
              (SELECT i*256+j AS i
               FROM (SELECT DISTINCT number AS i FROM master.dbo.spt_values WHERE number BETWEEN 0 AND 255) a,
                    (SELECT DISTINCT number AS j FROM master.dbo.spt_values WHERE number BETWEEN 0 AND 255) b
               WHERE i*256+j < (SELECT POWER(2, COUNT(*)) FROM dbo.Part2Source)) t,
              (SELECT *,
                   (SELECT COUNT(*) FROM dbo.Part2Source
                    WHERE CAST(a AS VARCHAR(10))   + '.' + CAST(b AS VARCHAR(10)) <=
                          CAST(x.a AS VARCHAR(10)) + '.' + CAST(x.b AS VARCHAR(10)))-1 AS d
               FROM dbo.Part2Source x) u
           WHERE POWER(2, u.d) & t.i > 0) v
   GROUP BY i
   HAVING COUNT(*) = COUNT(DISTINCT a) AND COUNT(*) = COUNT(DISTINCT b) AND COUNT(*) = COUNT(DISTINCT c)
   ORDER BY COUNT(*) DESC
   ) > 0
November 11, 2007 11:51 AM
 

Ryan Randall said:

ralphcroce_philly

> Ryan (or anyone else), please feel free to throw any data you like at this SS2000-compatible effort.

I'm afraid it seems to fails with...

INSERT INTO Part1Source VALUES (1,2)
INSERT INTO Part1Source VALUES (3,2)
INSERT INTO Part1Source VALUES (3,4)
INSERT INTO Part1Source VALUES (5,4)
INSERT INTO Part1Source VALUES (5,6)
INSERT INTO Part1Source VALUES (7,6)
November 11, 2007 11:57 AM
 

orcus said:

As an ugly string building recursive stored proc in ss2000
[Pinching off many previous solutions]

CREATE PROC part1ins @IN AS VARCHAR(8000) AS
DECLARE @A VARCHAR(10), @b VARCHAR(10)
SET @A = '' SET @B = ''
IF @IN is null BEGIN
 SELECT top 1 @A = 'A' + cast(A as varchar(10)) + ',', @B = 'B' + cast(B as varchar(10)) + '|' FROM (
   SELECT R.A, R.B FROM part1source E, part1source R
    WHERE ((E.A = R.A AND R.B <> E.B)
       OR  (E.B = R.B AND R.A <> E.A))
   UNION ALL SELECT A, B FROM part1source ) [Exclusions]
 GROUP BY A, B ORDER BY count(*) ASC
 SET @IN = @A + @B
END ELSE BEGIN
 SELECT top 1 @A = 'A' + cast(A as varchar(10)) + ',', @B = 'B' + cast(B as varchar(10)) + '|' FROM (
   SELECT R.A, R.B FROM part1source E, part1source R
    WHERE ((E.A = R.A AND R.B <> E.B)
       OR  (E.B = R.B AND R.A <> E.A))
    UNION ALL SELECT A, B FROM part1source ) [Exclusions]
 WHERE @IN not like ('%A' + cast(A as varchar(10)) + ',%')
   AND @IN not like ('%B' + cast(B as varchar(10)) + '|%')
 GROUP BY A, B ORDER BY count(*) ASC
 SET @IN = @IN + @A + @B
END IF @A = '' BEGIN
 SET @IN = REPLACE(REPLACE(REPLACE(REPLACE(substring(@IN,1,len(@IN)-1),'A',' SELECT '),',',' [A],'),'B',''),'|', ' [B] ' + CHAR(13) + 'UNION') + ' [B]'
 SET @IN = 'INSERT part1dest' + char(13) + '     '  + @IN
 exec(@IN)
END ELSE EXEC part1ins @IN
GO
exec part1ins null
November 11, 2007 10:22 PM
 

orcus said:

or even removing an obvious unded union

CREATE PROC part1ins @IN AS VARCHAR(8000) AS
DECLARE @A VARCHAR(10), @b VARCHAR(10)
SET @A = '' SET @B = ''
IF @IN is null BEGIN
SELECT top 1 @A = 'A' + cast(R.A as varchar(10)) + ',', @B = 'B' + cast(R.B as varchar(10)) + '|'
  FROM part1source E, part1source R WHERE E.A = R.A OR E.B = R.B
GROUP BY R.A, R.B ORDER BY count(*) ASC
SET @IN = @A + @B
END ELSE BEGIN
SELECT top 1 @A = 'A' + cast(R.A as varchar(10)) + ',', @B = 'B' + cast(R.B as varchar(10)) + '|'
  FROM part1source E, part1source R WHERE ( E.A = R.A OR E.B = R.B )
  AND @IN not like ('%A' + cast(R.A as varchar(10)) + ',%')
  AND @IN not like ('%B' + cast(R.B as varchar(10)) + '|%')
GROUP BY R.A, R.B ORDER BY count(*) ASC
SET @IN = @IN + @A + @B
END IF @A = '' BEGIN
SET @IN = REPLACE(REPLACE(REPLACE(REPLACE(substring(@IN,1,len(@IN)-1),'A',' SELECT '),',',' [A],'),'B',''),'|', ' [B] ' + CHAR(13) + 'UNION') + ' [B]'
SET @IN = 'INSERT part1dest' + char(13) + '     '  + @IN
exec(@IN)
END ELSE EXEC part1ins @IN
GO
exec part1ins null
November 11, 2007 10:58 PM
 

puzsol said:

Here's my summary so far....

Phil's is probably the only one that works on SQL2000 properly  (Though Ryan's is quite capable of being non cte as it is not truely recursive - more iterative)

chris.leonard's solution is very cute as it builds the whole solution tree in the one cte a feat I tried and failed (is that the only type of method of building a tree with a CTE?  CTE's seem to be aimed more at traversing a tree than building one.)

Ryan randall's is very cute as it builds the whole combination set (every possible permutation and combination), then neatly picks the ones that match the criteria - and selects the top one.  It is by far the neatest

I'm still claiming that it would be possible to code mine as select into statements, thereby meeting the strict 'one insert' rule, and it does work for all combinations (same category of building the solution tree).

However you are right... all the solutions that 'work' can take a long time...
I added four data sets:
INSERT INTO Part1Source VALUES (12,4)
INSERT INTO Part1Source VALUES (12,7)
INSERT INTO Part1Source VALUES (13,6)
INSERT INTO Part1Source VALUES (13,5)
Chris's version took 1:18, Ryans 0:03, Mine still sat at 0:00... but add another 15 entries and it starts to crawl to 10:00+ I guess that's the problem of exploring an exponentially expanding solution space...

I still think it would be possible to recode Phil's original solution as dynamic-dynamic SQL to come up with a truely generic solution - which may end up being faster still - and I'm not sure if my solution couldn't be sped up either.

Anyway, far too much fun and diversion.... thanks for the myriad of creative solutions guys!
November 11, 2007 11:50 PM
 

puzsol said:

Sorry missed Steve S's version... which as far as I can see is the same as Ryan's version... except it is better commented (and came first)... so the same comments apply.
November 11, 2007 11:54 PM
 

Ryan Randall said:

> Phil's is probably the only one that works on SQL2000 properly

I guess you've missed the last few posts then?
November 12, 2007 5:03 AM
 

puzsol said:

>I guess you've missed the last few posts then?

ok, if you are referring to orcus's solution, I did ignore it, as I had thought that a ranking solution could not guarantee finding the solution...

However, I now that I've spent a bit more time on it, I think that the iterative (or recursive as written) method of picking the next available entry with the next least amount of matching records should work... but I claim that the one presented does not....

[code]
delete from part1source
insert into part1source select 1,1
insert into part1source select 1,2
insert into part1source select 2,1
insert into part1source select 2,3
insert into part1source select 3,2
insert into part1source select 3,3
[/code]

This only gets two values in the solution (1,2) (2,3)... it should get 3 (1,2), (2,1), (3,3)... the reason is that ALL the original values are included in the recursive ranking counts - and as they all start with a rank of 3, it is the internal SQL grouping order that prevails (a asc, b desc) - which causes the algorithm to select (2,3) instead of (2,1) etc... I'll admit I had to muck around with the values to find this exception, but I could see it was there... so is there a way to recalculate the counts each loop based on only the values that remain (are available) - rather than using the same counts and culling which ones you see/use?

If this issue can be resolved it should be a lot faster than the other solutions... O(n) rather than O(n^2)... (though I would note that depending on the algorithm; n can be the full row count or the min distinct count of a or b, so even there they are not all the same)

If, on the other hand you are referring to turning your solution into 2000 compatible code... I did mention it was possible, and I didn't vet your solution.  Now that I look at your attempt, it is limited to 16 values (at least in the solution set, if not in the source set) 256*256 is only 2^16... you could expand it of course...  as you could replace varchar(max) with varchar(8000)... but it's not a self-expanding solution like your original.
November 12, 2007 11:03 PM
 

rmatty said:

Here is the solutionfor part1...works only in 2005
INSERT Part1Dest(A,B)
SELECT A,B
FROM
   (SELECT A,B,ROW_NUMBER() OVER(PARTITION BY B ORDER BY B) AS Rownum
   FROM
     (SELECT A,B
      FROM
   (SELECT  A,B ,ROW_NUMBER() OVER(PARTITION BY A ORDER BY A) AS Rownum
    FROM Part1Source
   ) T
   WHERE Rownum = 1
  ) T1
) T2
WHERE Rownum = 1
UNION
SELECT A,B
FROM   Part1Source
WHERE A=B
November 13, 2007 2:03 AM
 

rmatty said:

simpler than my previous version...can avoid one derived table...
SELECT A,B
FROM
  (SELECT A,B,ROW_NUMBER() OVER(PARTITION BY B ORDER BY B) AS Rownum
   FROM
    (SELECT A,B ,ROW_NUMBER() OVER(PARTITION BY A ORDER BY A) AS Rownum
     FROM Part1Source
     )T
  WHERE  Rownum = 1
 )T1
WHERE  Rownum = 1
UNION
SELECT A,B
FROM   Part1Source
WHERE A=B
November 13, 2007 3:07 AM
 

rmatty said:

part 2....

INSERT Part2Dest
SELECT A,B,C
FROM
 (SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY C ORDER BY C) AS Row
  FROM
   (SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY B ORDER BY B) AS Row
  FROM
  (SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY A ORDER BY A) AS Row
FROM
(SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY A,C ORDER BY C) AS Row
 FROM
(SELECT A,B,C,ROW_NUMBER() OVER(PARTITION BY B,C ORDER BY B) AS Row
FROM
 (SELECT  A,B ,C,ROW_NUMBER() OVER(PARTITION BY A,B ORDER BY A) AS Row
  FROM Part2Source
 ) T
WHERE Row = 1
) T1
)T2
WHERE Row = 1
 ) T3
   )T4
WHERE Row = 1
) T5
WHERE Row = 1
ORDER BY A,B,C
November 13, 2007 3:36 AM
 

Ryan Randall said:

rmatty - I'm afraid your suggestion for part 1 falls short with...

INSERT INTO Part1Source VALUES (1,0)
INSERT INTO Part1Source VALUES (1,1)
INSERT INTO Part1Source VALUES (2,0)
November 13, 2007 4:36 AM
 

puzsol said:

Not wanting to be a total party pooper, I figured out a (very minor) alteration to orcus's solution that makes it work (at least for the set I provided.... can anyone provide proof for or against that it will/won't work in all circumstances?):

Just remove the entries from both tables in the sub-loops... eg:

ALTER PROC part1ins @IN AS VARCHAR(8000) AS
DECLARE @A VARCHAR(10), @b VARCHAR(10)
SET @A = '' SET @B = ''
IF @IN is null BEGIN
SELECT top 1 @A = 'A' + cast(R.A as varchar(10)) + ',', @B = 'B' + cast(R.B as varchar(10)) + '|'
 FROM part1source E, part1source R WHERE E.A = R.A OR E.B = R.B
GROUP BY R.A, R.B ORDER BY count(*) ASC
SET @IN = @A + @B
END ELSE BEGIN
SELECT top 1 @A = 'A' + cast(R.A as varchar(10)) + ',', @B = 'B' + cast(R.B as varchar(10)) + '|'
 FROM part1source E, part1source R WHERE ( E.A = R.A OR E.B = R.B )
 AND @IN not like ('%A' + cast(R.A as varchar(10)) + ',%')
 AND @IN not like ('%B' + cast(R.B as varchar(10)) + '|%')
 AND @IN not like ('%A' + cast(E.A as varchar(10)) + ',%')
 AND @IN not like ('%B' + cast(E.B as varchar(10)) + '|%')
GROUP BY R.A, R.B ORDER BY count(*) ASC
SET @IN = @IN + @A + @B
END IF @A = '' BEGIN
SET @IN = REPLACE(REPLACE(REPLACE(REPLACE(substring(@IN,1,len(@IN)-1),'A',' SELECT '),',',' [A],'),'B',''),'|', ' [B] ' + CHAR(13) + 'UNION') + ' [B]'
SET @IN = 'INSERT part1dest' + char(13) + '     '  + @IN
exec(@IN)
END ELSE EXEC part1ins @IN
GO
exec part1ins null

Sorry rmatty, your solution looks similar to others that don't work generically... I could be wrong but I can't be bothered proving it right now... maybe later?
November 13, 2007 4:44 AM
 

Ryan Randall said:

> Sorry rmatty, your solution looks similar to others that don't work generically... I could be wrong but I can't be bothered proving it right now... maybe later?

lol - read the previous post!

Just testing your modification now...
November 13, 2007 4:48 AM
 

rmatty said:

if i remove union,i think it will work...
SELECT A,B
FROM
 (SELECT A,B,ROW_NUMBER() OVER(PARTITION BY B ORDER BY B) AS Rownum
  FROM
   (SELECT A,B ,ROW_NUMBER() OVER(PARTITION BY A ORDER BY A) AS Rownum
    FROM Part1Source
    )T
 WHERE  Rownum = 1
)T1
WHERE  Rownum = 1

November 13, 2007 5:23 AM
 

Ryan Randall said:

rmatty - I'm afraid not. That only gives you 1 value (1, 0), when there should be 2 (1, 1) and (2, 0).
November 13, 2007 5:27 AM
 

Ryan Randall said:

orcus, puzsol

Well I couldn't break it the revised code. This is an interesting and original idea, orcus - and not pinching off many previous solutions as you thought!

As far as I can tell, it's equivalent to the following code (which also passes all my simple tests, but might be easier to understand and therefore break)...

DECLARE @t TABLE (a INT, b INT)
DECLARE @u TABLE (a INT, b INT)

INSERT @t SELECT * FROM dbo.Part1Source

WHILE (SELECT COUNT(*) FROM @t) > 0
BEGIN
   INSERT @u SELECT TOP 1 t.a, t.b FROM @t t, @t u WHERE t.a = u.a OR t.b = u.b GROUP BY t.a, t.b ORDER BY COUNT(*)
   DELETE t FROM @t t INNER JOIN @u u ON t.a = u.a OR t.b = u.b
END
SELECT * FROM @u ORDER BY a

If this type of solution does work (and I'm very interested in a proof either way), it is by far the best approach, and quite a revelation to me. Very well done!
November 13, 2007 5:53 AM
 

orcus said:

I hadn't thought of string of inserted values, not like my row test.  I pinched that bit.

a) Iteratively without a procedure declaration
b) no casting
c) a not in table variable rather than a not like string

Is much better, especially for a large numbers of rows.
(or more complicated data types)

Given that the only criteria for excluding a row is the other rows that conflict, the maximum number transferred is the same property as the least conflicting.
November 13, 2007 5:17 PM
 

puzsol said:

What I find most interesting about this exercise is the number of different approaches and techniques that were used to solve the problem... and the willingness of people to borrow bits from others, enhance, prove etc.  

I guess it shows what can be achieved as a *group* rather than as a single person.

Hope everyone else has enjoyed it as much as I have.
November 13, 2007 6:08 PM
 

keith.duaime said:

This one seems to pass all the other data values that have been tried. The order by takes a while and would rather have a least() function, which could be created for the puzzle.

TRUNCATE TABLE Part1Dest
GO

BEGIN
DECLARE
@rc INT,
@a INT,
@b INT

SELECT TOP 1 @a = a, @b = b
FROM Part1Source
ORDER BY
CASE
WHEN (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a) < (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b) THEN
(SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a)
ELSE
(SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b)
END,
CASE
WHEN (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a) < (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b) THEN
(SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b)
ELSE
(SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a)
END
SET @rc = @@ROWCOUNT
WHILE @rc > 0
BEGIN
INSERT INTO Part1Dest(a,b)
VALUES(@a,@b)

SELECT TOP 1 @a = Part1Source.a, @b = Part1Source.b
FROM Part1Source
LEFT OUTER JOIN Part1Dest ON
Part1Dest.a = Part1Source.a OR
Part1Dest.b = Part1Source.b
WHERE
Part1Dest.a IS null
ORDER BY
CASE
WHEN (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a) < (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b) THEN
(SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a)
ELSE
(SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b)
END,
CASE
WHEN (SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a) < (SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b) THEN
(SELECT COUNT(DISTINCT a) FROM Part1Source s WHERE s.b = Part1Source.b)
ELSE
(SELECT COUNT(DISTINCT b) FROM Part1Source s WHERE s.a = Part1Source.a)
END
SET @rc = @@ROWCOUNT
END
END
GO
SELECT *
FROM Part1Dest
GO
November 27, 2007 7:29 AM
 

keith.duaime said:

After a little more thought and didn't like the convoluted order by statement this should work and is for the second one.

TRUNCATE TABLE Part2Dest
GO

BEGIN
   DECLARE
       @rc   INT,
       @a    INT,
       @b    INT,
       @c    INT

   SELECT TOP 1 @a = Part2Source.a, @b = Part2Source.b, @c = Part2Source.c
   FROM Part2Source
   INNER JOIN (SELECT a, COUNT(*) cntA FROM Part2Source GROUP BY a) cntA ON
       cntA.a = Part2Source.a
   INNER JOIN (SELECT b, COUNT(*) cntB FROM Part2Source GROUP BY b) cntB ON
       cntB.b = Part2Source.b
   INNER JOIN (SELECT c, COUNT(*) cntC FROM Part2Source GROUP BY c) cntC ON
       cntC.c = Part2Source.c
   ORDER BY
       CASE
           WHEN cntA.cntA <= cntB.cntB AND cntA.cntA <= cntC.cntC THEN cnta.cntA
           WHEN cntB.cntB <= cntA.cntA AND cntB.cntB <= cntC.cntC THEN cntB.cntB
           ELSE cntC.cntC
       END,
       CASE
           WHEN cntA.cntA <= cntB.cntB AND cntA.cntA > cntC.cntC THEN cnta.cntA
           WHEN cntB.cntB <= cntA.cntA AND cntB.cntB > cntC.cntC THEN cntB.cntB
           ELSE cntC.cntC
       END,
       CASE
           WHEN cntA.cntA > cntB.cntB AND cntA.cntA > cntC.cntC THEN cnta.cntA
           WHEN cntB.cntB > cntA.cntA AND cntB.cntB > cntC.cntC THEN cntB.cntB
           ELSE cntC.cntC
       END
   SET @rc = @@ROWCOUNT
   WHILE @rc > 0
   BEGIN
       INSERT INTO Part2Dest(a,b,c)
           VALUES(@a,@b,@c)

       SELECT TOP 1 @a = Part2Source.a, @b = Part2Source.b, @c = Part2Source.c
       FROM Part2Source
       INNER JOIN (SELECT a, COUNT(*) cntA FROM Part2Source GROUP BY a) cntA ON
           cntA.a = Part2Source.a
       INNER JOIN (SELECT b, COUNT(*) cntB FROM Part2Source GROUP BY b) cntB ON
           cntB.b = Part2Source.b
       INNER JOIN (SELECT c, COUNT(*) cntC FROM Part2Source GROUP BY c) cntC ON
           cntC.c = Part2Source.c
       LEFT OUTER JOIN Part2Dest ON
           Part2Dest.a = Part2Source.a OR
           Part2Dest.b = Part2Source.b OR
           Part2Dest.c = Part2Source.c
       WHERE Part2Dest.a IS NULL
       ORDER BY
           CASE
               WHEN cntA.cntA <= cntB.cntB AND cntA.cntA <= cntC.cntC THEN cnta.cntA
               WHEN cntB.cntB <= cntA.cntA AND cntB.cntB <= cntC.cntC THEN cntB.cntB
               ELSE cntC.cntC
           END,
           CASE
               WHEN cntA.cntA <= cntB.cntB AND cntA.cntA > cntC.cntC THEN cnta.cntA
               WHEN cntB.cntB <= cntA.cntA AND cntB.cntB > cntC.cntC THEN cntB.cntB
               ELSE cntC.cntC
           END,
           CASE
               WHEN cntA.cntA > cntB.cntB AND cntA.cntA > cntC.cntC THEN cnta.cntA
               WHEN cntB.cntB > cntA.cntA AND cntB.cntB > cntC.cntC THEN cntB.cntB
               ELSE cntC.cntC
           END
       SET @rc = @@ROWCOUNT
   END
END
GO

SELECT *
FROM Part2Dest
GO
November 27, 2007 11:45 AM
 

Celko said:

WITH X (a, b, row_nbr) AS
(SELECT a,  b,   ROW_NUMBER ()  OVER (ORDER BY a ASC, b DESC)
FROM Part1Source)
SELECT a, b, row_nbr
FROM X AS P1
WHERE  NOT EXISTS
 (SELECT *
     FROM X AS P2
WHERE P2.row_nbr < P1.row_nbr
  AND (P1.a = P2.a OR P1.b = P2.b));

This should give you one valid subset.  The idea is to sort and number the rows then look from the current position P1 at all the previous rows P2 to if one which repeats a value of a or b exists.  That means we reject the current row.  This gives me:

a      b      row_nbr
===========
1 2 1
2 4 3
5 5 5
9 0 9

December 21, 2007 1:37 PM
 

mikegood said:

Celko, this not your best effort.  Correct Part 1 answer has 5 dest rows for the specified Part1Source population.  One valid solution is
 1, 1
 2, 4
 5, 3
 7, 2
 9, 0

For other populations it does worse.  I've been using the alternate population below to help find flaws.  The answer should have 4 dest rows, but the algorithm you posted above only finds two.  
 1, 1
 1, 2
 2, 1
 2, 2
 2, 3
 2, 4
 3, 3
 3, 4
 4, 1
 
One valid solution is
 1, 2
 2, 3
 3, 4
 4, 1

Finally, for any source data population we should be able to re-run test with data in source cols a and b swapped, and still get the same number of dest rows in the answer.  Your algorithm gets 4 rows for specified source population, but only 3 rows when data in cols is swapped, which is a sure indication of a logical flaw.  

All that said, I cannot determine if you're onto something here, fundamentally, or not.  Maybe with a little more work?
December 30, 2007 12:34 PM
 

mikegood said:

keith.duaime,

I think yours is only solution posted thus far that is correct and directly finds the best solution withouth using brute force to generate all possible solutions and choosing the best one.   I arrived at similar solution (below) based on UNION of alternate groupings.  But your approach based on JOINing alternate groupings is more concise & I like it better, especially for part 2.

I'm pretty confident that this approach is by far the best performing of all the correct approaches posted so far.  

If anyone doubts the correctness of this approach, please post a source data population that will expose an error.  I cannot find one, believe it is correct.  FWIW, this approach also exactly matches the way that I attempt to solve these problems by hand.  

I'm not so sure that the single INSERT inside a loop meets the true spirit of the puzzle criteria?  We only wrote one INSERT stmt, but multiple INSERTs will be executed.  All of the other correct brute force solutions above execute only a single INSERT.  I wonder if it's possible to apply the fundamental (efficient) approach used here in non-looping single INSERT manner?  I've tried all day long, both with and without recursive CTEs, but I'm not even close.  I think I could do it if I could use aggregates and TOP in the recursive part of the CTE, but these are not allowed.  I suspect it cannot be done.

--my approach before seeing Keith's
while (1 = 1)
begin
   insert  Part1Dest (a, b)
   select  top 1 a, b
     from (select  s.a,                        --try grouping by a
                   min(s.b) as b,
                   count(s.a) as NbrRows
             from  Part1Source s
             left  join Part1Dest d on  ((d.a = s.a) or (d.b = s.b))
            where  d.a is null
            group  by s.a
           union all
           select  min(s.a) as a,              --try grouping by b
                   s.b,
                   count(s.a) as NbrRows
             from  Part1Source s
             left  join Part1Dest d on  ((d.a = s.a) or (d.b = s.b))
            where  d.a is null
            group  by s.b) as x
   order  by NbrRows

   if (@@rowcount = 0)                 --quit when done inserting rows
       break
end
December 30, 2007 7:34 PM
 

mikegood said:

Here's a shortened version of keith's efficient solutions.  I got rid of the variables, eliminated 1/2 the code by modifying the loop control, and added a Least() function for clarity.

I also changed the ORDER BY a little bit.  Keith's ORDER BY statements clearly work, I just do not understand them.  I get the leftmost part, but not the rest.  I left the leftmost part as-is, but replaced the rest with sums, which I do understand...this is how I solve these problems by hand.  Both ways appear to generate correct solutions.

Pretty sure this approach is only one posted so far that could be readily extended to many more columns and many more rows.  I just ran a test with 3k rows in Part1Source in under 1 sec.

create function dbo.Least(@a int, @b int, @c int) returns int
as
begin
if (@c is null) --handle 2-argument call
   return case when @a <= @b then @a else @b end

return case     --handle 3-argument call
        when @a <= @b and @a <= @c then @a
        when @b <= @a and @b <= @c then @b
        else @c
      end
end
go

--part 1 solution
while (1 = 1)
begin
   insert  Part1Dest (a, b)
   select  top 1 s.a, s.b
     from  Part1Source s
     join (select a, count(a) as N from Part1Source group by a) as a
       on  a.a = s.a
     join (select b, count(b) as N from Part1Source group by b) as b
       on  b.b = s.b
     left  outer join Part1Dest d      --exclude conflicting source rows
       on  ((d.a = s.a) or (d.b = s.b))
    where  d.a is null
    order  by dbo.Least(a.N, b.N, null), a.N + b.N

   if (@@rowcount = 0)
       break   --quit when no more rows can be added
end

--part 2 solution
while (1 = 1)
begin
   insert  Part2Dest (a, b, c)
   select  top 1 s.a, s.b, s.c
     from  Part2Source s
     join (select a, count(a) as N from Part2Source group by a) as a
       on  a.a = s.a
     join (select b, count(b) as N from Part2Source group by b) as b
       on  b.b = s.b
     join (select c, count(c) as N from Part2Source group by c) as c
       on  c.c = s.c
     left  outer join Part2Dest d      --exclude conflicting source rows
       on  ((d.a = s.a) or (d.b = s.b) or (d.c = s.c))
    where  d.a is null
    order  by dbo.Least(a.N, b.N, c.N),
              dbo.Least(a.N + b.N, a.N + c.N, b.N + c.N),
              a.N + b.N + c.N

   if (@@rowcount = 0)
       break   --quit when no more rows can be added
end
December 30, 2007 8:11 PM
 

Ryan Randall said:

keith.duaime, mikegood

I’m afraid your suggestions can break down with a large number of rows.

For example, try the following source data…

1,1; 1,2; 1,8; 1,19; 1,33; 2,3; 2,4; 2,44; 3,1; 3,4; 3,24; 3,30; 3,31; 3,37; 3,39; 3,43; 4,18; 4,48; 5,1; 5,3; 5,5; 5,9; 5,20; 5,21; 5,35; 5,38; 5,50; 6,14; 6,24; 6,28; 6,30; 7,2; 7,5; 7,18; 7,40; 7,44; 7,46; 8,1; 8,5; 8,20; 8,29; 8,46; 8,48; 9,0; 9,17; 9,23; 9,30; 9,31; 9,35; 9,38; 9,39; 10,11; 10,24; 10,25; 10,39; 10,47; 11,2; 11,15; 11,17; 11,18; 11,21; 11,33; 11,36; 11,40; 12,11; 12,36; 12,39; 13,12; 13,17; 13,20; 13,21; 13,23; 13,34; 13,35; 14,23; 14,26; 14,32; 14,36; 14,39; 15,20; 16,1; 16,5; 16,35; 17,7; 17,10; 17,15; 17,23; 17,30; 17,34; 17,36; 17,37; 17,41; 18,7; 18,11; 18,20; 18,29; 18,50; 19,13; 19,15; 19,18; 19,23; 19,37; 19,44; 20,4; 20,7; 20,13; 20,28; 20,36; 20,37; 20,38; 21,5; 21,12; 21,13; 21,18; 21,19; 21,29; 21,34; 21,50; 22,5; 22,14; 22,18; 22,29; 23,1; 23,17; 23,20; 23,25; 23,29; 24,9; 24,14; 24,31; 24,34; 25,11; 25,13; 25,25; 25,30; 25,45; 26,7; 26,8; 26,9; 26,39; 26,43; 26,47; 27,13; 27,19; 27,22; 27,42; 27,47; 27,48; 28,3; 28,8; 28,14; 28,17; 28,19; 28,26; 28,38; 28,47; 29,2; 29,4; 29,12; 29,18; 29,25; 29,27; 29,32; 29,37; 30,8; 30,18; 30,22; 30,42; 30,43; 30,47; 31,7; 31,10; 31,20; 31,31; 31,43; 31,50; 32,1; 32,4; 32,5; 32,21; 32,29; 32,39; 32,49; 33,10; 33,12; 33,22; 33,27; 33,34; 33,49; 34,1; 34,5; 34,9; 34,15; 34,46; 35,4; 35,5; 35,6; 35,8; 35,13; 35,19; 35,23; 35,25; 35,27; 35,31; 35,39; 35,50; 36,1; 36,14; 36,19; 36,33; 36,34; 36,41; 36,45; 36,47; 37,15; 37,21; 37,23; 37,32; 37,35; 37,44; 37,46; 37,47; 37,48; 38,2; 38,12; 38,15; 38,23; 38,36; 38,38; 38,39; 38,47; 39,16; 39,17; 39,18; 40,10; 40,11; 40,16; 40,19; 40,20; 40,27; 40,29; 40,34; 40,49; 41,8; 41,13; 41,22; 41,26; 41,33; 42,23; 42,26; 42,50; 43,14; 43,26; 43,42; 43,50; 45,1; 45,3; 45,6; 45,18; 45,19; 45,24; 45,33; 45,36; 45,41; 45,44; 46,8; 46,15; 46,16; 46,42; 47,2; 47,3; 47,5; 47,9; 47,12; 47,14; 47,18; 47,36; 48,2; 48,16; 48,18; 48,20; 48,36; 49,5; 49,9; 49,17; 49,21; 49,22; 49,35; 50,40; 50,44; 50,48;

Your code produces an answer with 48 rows, whereas the code in my previous post shows a 49-row solution is possible.
December 31, 2007 10:07 AM
 

mikegood said:

Nuts.  I was afraid of this originally.  But then I was unable to find a source set that broke it and I got carried away with the delusion.  Most of the time I've spent on this has been spent trying to build a test data set to disprove the approach, I just never succeeded.  

I'd never paid much attention to that approach in your previous post because it has 2 insert stmts (you didn't include the 2nd one that inserts into the real dest table, but it has to be there).  So, knowing how all the other approaches scaled...with this large of a source table, I hit GO and figured I'd check progress next hear (in about 8hrs or so).  I was stunned how fast this ran.  

So I see you're right...48 vs 49.  Now I just would like to pinpoint the root of the flaw.  I can't believe it's simply caused by large number of rows.  I hope that if we understood it better we could cause the approach to break down with 10-15 rows of data?  

And if we could break it, possibly we could fix it?  Probably not.  Maybe fundamentally it was naive to try to solve potentially hard problem only examining the consequences of the current choice without also considering the the next choice & the next.  As you and Chris & others already did.  I suspect that's it.  

Well, there are a lot of differences between the your 49 and my 48 dest rows, nothing stands out to me.  I reckon I'll start eliminating source data rows one by one until I get a small set that still breaks the approach.  

PS - How on earth did you find such a source data set?  And if it took some doing, what made you keep trying?!
December 31, 2007 4:06 PM
 

mikegood said:

Ryan, am orking to narrow the dataset.  About 1/2 way through my first pass, have eliminated many rows and the kd/mg approach is now off by 3 -- 45 vs 48 rows.

1,33;2,44;3,31;3,37;5,1;5,5;5,9;5,20;5,21;5,35;5,38;5,50;6,14;6,24;6,28;6,30;7,18;7,40;7,44;7,46;8,1;8,29;8,46;8,48;9,0;9,17;9,23;9,30;9,31;9,35;9,38;9,39;10,24;10,25;10,39;10,47;11,2;11,15;11,17;11,18;11,21;11,33;11,36;11,40;12,36;12,39;13,21;13,23;13,34;13,35;14,26;14,32;14,36;14,39;15,20;16,1;16,35;17,10;17,34;17,36;18,50;19,23;20,36;21,5;21,12;21,13;21,18;21,19;21,29;21,34;21,50;22,14;22,29;23,17;23,20;23,25;23,29;24,9;24,14;24,31;24,34;25,11;25,25;25,30;25,45;26,9;26,39;26,43;26,47;27,19;27,22;27,42;27,47;28,3;28,14;28,17;29,2;29,4;29,12;29,18;29,25;29,27;29,32;29,37;30,18;30,22;30,42;30,47;31,10;31,31;31,43;31,50;32,1;32,4;32,5;32,21;32,29;32,39;32,49;33,10;33,12;33,22;33,27;33,34;33,49;34,1;34,5;34,9;34,15;34,46;35,4;35,5;35,6;35,8;35,13;35,19;35,23;35,25;35,27;35,31;35,39;35,50;36,1;36,14;36,19;36,33;36,34;36,41;36,45;36,47;37,15;37,21;37,23;37,32;37,35;37,44;37,46;37,47;37,48;38,2;38,12;38,15;38,23;38,36;38,38;38,39;38,47;39,16;39,17;39,18;40,10;40,11;40,16;40,19;40,20;40,27;40,29;40,34;40,49;41,8;41,13;41,22;41,26;41,33;42,23;42,26;42,50;43,14;43,26;43,42;43,50;45,1;45,3;45,6;45,18;45,19;45,24;45,33;45,36;45,41;45,44;46,8;46,15;46,16;46,42;47,2;47,3;47,5;47,9;47,12;47,14;47,18;47,36;48,2;48,16;48,18;48,20;48,36;49,5;49,9;49,17;49,21;49,22;49,35;50,40;50,44;50,48;

I'm not done yet, but just noticed something interesting: when I add back 31, 20 to the above source data set, your algorithm yields 47 rows.  With that row absent, your approach yields 48.  That's not right is it?
January 1, 2008 11:47 PM
 

mikegood said:

Here is a small source data set that will break the kd-mg approach:  1,2; 1,4; 2,1; 2,3; 3,4; 4,2; 4,3;  The correct answer is 4 rows, kd-mg gets only 3.

With the data this small, it was easy to see the flaw in both keith's original post & my subsequent mods to it.  The subqueries that determine count of affected rows were performing this count against the entire source data set, not just the remaining set of rows...the exclusionary outer join does not get applied until too late.

This got me looking back at the original code in my first post...turns out it did not suffer from this problem.  I've run that original approach against Ryan's data set, as well as several others that I've built since working with it, and so far cannot get it to produce fewer rows than returned by the algorithm Ryan last posted.

Fixing keith's original or my modified versions of keith's approach is simple, just add the exclusionary outer join to the subqueries (which eliminates need for the original outer join in the outer query).

while (1 = 1)
begin
   insert  Part1Dest (a, b)
   select  top 1 s.a, s.b
     from  Part1Source s
     join (select  s1.a, count(s1.a) as N from Part1Source s1
             left  join Part1Dest d on ((d.a = s1.a) or (d.b = s1.b))
            where  d.a is null group by s1.a) as a
       on  a.a = s.a
     join (select  s1.b, count(s1.b) as N from Part1Source s1
             left  join Part1Dest d on ((d.a = s1.a) or (d.b = s1.b))
            where  d.a is null group by s1.b) as b
       on  b.b = s.b
    order  by case when a.N < b.N then a.N else b.N end, a.N + b.N

   if (@@rowcount = 0)                 --quit when done inserting rows
       break
end

I think part1 is actually simple enough that looking only at immediate consquences of the current decision is good enough to find the best solution.  This is may not be true for part2, though....  So far I cannot find a data set to prove this  wrong, and it would be helpful to have a fast correct part2 algorithm to compare results with.
January 2, 2008 3:35 AM
 

mikegood said:

This set causes approach posted November 13, 2007 5:53 AM EST by Ryan Randall to break down.  Only gets 14 rows, approach in my last post gets 15 rows.  Eliminate 17,3 from the source data set and both yield 15 rows.

1,3; 1,4; 1,9; 2,5; 2,12; 2,13; 3,16; 4,10; 5,4; 5,9; 5,5; 6,13; 6,10; 6,6; 7,3; 8,10; 9,2; 9,17; 10,14; 10,16; 10,2; 11,7; 11,1; 11,8; 11,11; 12,14; 12,1; 12,16; 13,16; 13,5; 13,12; 14,5; 14,6; 14,17; 15,15; 15,10; 15,2; 16,1; 16,3; 17,3;
January 3, 2008 3:47 PM
 

mikegood said:

I have spent some time trying to create a tricky source data set for part2, but no luck.  Maybe this problem is not complex after all.  Below is revised code based on keith.dumaine's post to solve both parts.  It will quickly find the correct solution even when there are thousands of source data rows.

create function dbo.Least(@a int, @b int, @c int) returns int
as
begin
if (@c is null) --handle 2-argument call
  return case when @a <= @b then @a else @b end

return case     --handle 3-argument call
       when @a <= @b and @a <= @c then @a
       when @b <= @a and @b <= @c then @b
       else @c
     end
end
go

--part 1 solution
while (1 = 1)
begin
  insert  Part1Dest (a, b)
     select  top 1 s.a, s.b
       from  Part1Source s
       join (select  s1.a, count(s1.a) as N from Part1Source s1
               left  join Part1Dest d on ((d.a = s1.a) or (d.b = s1.b))
              where  d.a is null group by s1.a) as a
         on  a.a = s.a
       join (select  s1.b, count(s1.b) as N from Part1Source s1
               left  join Part1Dest d on ((d.a = s1.a) or (d.b = s1.b))
              where  d.a is null group by s1.b) as b
         on  b.b = s.b
   order  by dbo.Least(a.N, b.N, null), a.N + b.N

  if (@@rowcount = 0)
      break   --quit when no more rows can be added
end

--part 2 solution
while (1 = 1)
begin
  insert  Part2Dest (a, b, c)
  select  top 1 s.a, s.b, s.c
    from  Part2Source s
    join (select s2.a, count(s2.a) as N from Part2Source s2
               left  join Part2Dest d on ((d.a = s2.a) or (d.b = s2.b) or (d.c = s2.c))
              where  d.a is null group by s2.a) as a
      on  a.a = s.a
    join (select s2.b, count(s2.b) as N from Part2Source s2
               left  join Part2Dest d on ((d.a = s2.a) or (d.b = s2.b) or (d.c = s2.c))
              where  d.a is null group by s2.b) as b
      on  b.b = s.b
    join (select s2.c, count(s2.c) as N from Part2Source s2
               left  join Part2Dest d on ((d.a = s2.a) or (d.b = s2.b) or (d.c = s2.c))
              where  d.a is null group by s2.c) as c
      on  c.c = s.c
   order  by dbo.Least(a.N, b.N, c.N),
             dbo.Least(a.N + b.N, a.N + c.N, b.N + c.N),
             a.N + b.N + c.N

  if (@@rowcount = 0)
      break   --quit when no more rows can be added
end
January 3, 2008 4:14 PM
 

Ryan Randall said:

> This set causes approach posted November 13, 2007 5:53 AM EST by Ryan Randall to break down.

Great work, mikegood! I had my suspicions, and it's good to have a proof one way or another. I'd run out of energy on this puzzle by then to figure it out for myself!
January 3, 2008 4:55 PM
 

Ryan Randall said:

mikegood

> It will quickly find the correct solution even when there are thousands of source data rows.

Just out of interest, how are you testing for correctness? Awesome work over the last few days, by the way.

Incidentally, I realised shortly after November 13 that this problem probably falls under graph theory (something I have never studied). If any non-brute-force approach truly works, it would be interesting to find if there was a theorem to support the key idea of the approach.

Just a thought.
January 3, 2008 5:02 PM
 

mikegood said:

Ryan, initially, with small sets I was testing against results from your original or chris leonard's brute force approach.  I'm convinced these are both accurate.

When I posted flawed approach earlier, I clearly wasn't testing enough.  I had created what I thought were tricky source data sets, but they weren't good enough. Since you found my flaw, I've been using that large set you provided & tweaking it somewhat, and comparing results of my algorithm with those generated by your Nov 13 post.  Your Nov 13th post has been the standard.

And I did had no reason to suspect it.  Only reason I drilled in further on Nov 13th approach was that a fluke I happened to observe the results grow by 1 when I removed a row from the source set.  Have since spent a little time trying to figure out where the problem is, and have to admit I don't understand it well enough.  It looks pretty similar to the approach I've been working with, doesn't it?  If so, I bet problem is with the order by stmt, maybe it's too simple.  I posted smallest source set I could find that exhibited the problem, but it is not very small, not sure if you can figure out where the root problem is?

I really have tried to create a source set that attempts to bait the modified keith.dumaine approach into making a bad decision (one that appears to be best, but is short-sighted), but I just don't think it's possible.  Even for part 2.   I took lot of math in school but nothing like this.  But I bet a math major could settle this pretty easily, determine if non-brute force approach is feasible or impossible.

So back to your question, how am I testing for correctness?  Well you caught me, I made an unsupported assertion of correctness.  For anything other than small source sets, your Nov 13th post is still best alternate routine to compare results with.  So I'm basically asserting correctness until someone like yourself proves otherwise.  So far you've proven to be very good at this!

Am about out of energy myself.  Too bad I didn't see this earlier.  I only noticed when the monthly simple-talk email mentioned my old friend Chris Leonard.  
January 4, 2008 1:03 AM
 

Celko said:

Mike Good was right about that not being my best work.  My goal was to get the first acceptable answer.  When you do NP-complete problems in SQL, you usually get the entire set of candidate answers and the query runs like glue.   Here is my "glue query":

-- create the original source data
CREATE TABLE Source
(pair_nbr INTEGER NOT NULL UNIQUE,
a INTEGER NOT NULL,
b INTEGER NOT NULL,
PRIMARY KEY (a, b));
   
INSERT INTO Source VALUES (1, 1, 1);
INSERT INTO Source VALUES (2, 1, 2);
INSERT INTO Source VALUES (3, 2, 3);
INSERT INTO Source VALUES (4, 7, 2);
INSERT INTO Source VALUES (5, 2, 4);
INSERT INTO Source VALUES (6, 5, 5);
INSERT INTO Source VALUES (7, 5, 1);
INSERT INTO Source VALUES (8, 5, 3);
INSERT INTO Source VALUES (9, 9, 0);
INSERT INTO Source VALUES (10, 11, 2);

-- CTE to set up all subsets of the ten sample pairs
WITH Flags (i)
AS (VALUES ('t'), ('f')),

--CROSS JOIN from Hell to get all subsets and random candidate id numbers
Subsets (subset_nbr, f01, f02, f03, f04, f05, f06, f07, f08, f09, f10)
AS
(SELECT ROW_NUMBER() OVER () AS subset_nbr,
F01.i, F02.i, F03.i, F04.i, F05.i,
F06.i, F07.i, F08.i, F09.i, F10.i
FROM Flags AS F01, Flags AS F02, Flags AS F03, Flags AS F04, Flags AS F05,
 Flags AS F06, Flags AS F07, Flags AS F08, Flags AS F09, Flags AS F10),

--filter out pairs from the permutations
Candidates(subset_nbr, a, b)
AS (
SELECT subset_nbr, S.a, S.b  FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 1 AND P.f01 = 't'
UNION ALL
SELECT subset_nbr, S.a, S.b  FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 2 AND P.f02 = 't'
UNION ALL
SELECT subset_nbr, S.a, S.b FROM Subsets AS P, Source AS S WHERE S.pair_nbr =3 AND P.f03 = 't'
UNION ALL
SELECT subset_nbr,  S.a, S.b  FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 4 AND P.f04= 't'
UNION ALL
SELECT subset_nbr, S.a, S.b  FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 5 AND P.f05 = 't'
UNION ALL
SELECT subset_nbr, S.a, S.b  FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 6 AND P.f06 = 't'
UNION ALL
SELECT subset_nbr, S.a, S.b  FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 7 AND P.f07 = 't'
UNION ALL
SELECT subset_nbr, S.a, S.b  FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 8 AND P.f08 = 't'
UNION ALL
SELECT subset_nbr, S.a, S.b  FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 9 AND P.f09 = 't'
UNION ALL
SELECT subset_nbr, S.a, S.b  FROM Subsets AS P, Source AS S WHERE S.pair_nbr = 10 AND P.f10 = 't')

SELECT subset_nbr, a, b -- return the winners
FROM Candidates AS C1
WHERE subset_nbr
IN (SELECT C2.subset_nbr  -- find all perms that meet uniqueness criteria
   FROM Candidates AS C2
GROUP BY C2.subset_nbr
HAVING COUNT(a) =  COUNT(DISTINCT a)
    AND  COUNT(b) =  COUNT(DISTINCT b)
--   AND COUNT(*) >= 5)  -- so you don't go nuts looking at the output
ORDER BY subset_nbr, a, b;

The bad news!
1) This gives duplicate answers under different subset_nbrs.  Yes, I could do a relational division and remove them, but that would really be messy to read.  

2) This is probably going to scale better than you think.  Richard Romley wrote an 81-way self-join for a Sudoku solver which also returns all possible grids.  It is quite fast, even with hundreds of solutions.  This is much simpler.  

February 5, 2008 3:04 PM
 

Celko said:

Oops! My error; this does not generate duplicates.  Here are the answers for the original problem:

172 1 1
172 2 4
172 5 5
172 9 0
172 11 2

=========
186 1 1
186 2 4
186 5 5
186 7 2
186 9 0

=========
427 1 1
427 2 3
427 5 5
427 9 0
427 11 2

=========
441 1 1
441 2 3
441 5 5
441 7 2
441 9 0

=========
652 1 1
652 2 4
652 5 3
652 9 0
652 11 2

=========
666 1 1
666 2 4
666 5 3
666 7 2
666     9       0
February 5, 2008 3:22 PM
You need to sign in to comment on this blog


















<October 2007>
SuMoTuWeThFrSa
30123456
78910111213
14151617181920
21222324252627
28293031123
45678910
Finding Stuff in SQL Server Database DDL
 You'd have thought that nothing would be easier than using SQL Server Management Studio (SSMS) for... Read more...

Mission Critical: SQL Server 2008 Performance Tuning Task List
 In which Buck Woody imagines how the US military would have tackled DBA checklists for... Read more...

Simple Query tuning with STATISTICS IO and Execution plans
 A great deal can be gleaned from the use of the STATISTICS IO and the execution plan, when you are... Read more...

Switching rows and columns in SQL
 When they use SQL Server, one the commoner questions that Ms Access programmers ask is 'Where's the... Read more...

Writing Efficient SQL: Set-Based Speed Phreakery
 Phil Factor's SQL Speed Phreak challenge is an event where coders battle to produce the fastest code to... Read more...