Click here to monitor SSC
Av rating:
Total votes: 10
Total comments: 9


Joe Celko
Voting Paradoxes: a SQL Stumper
28 October 2011

Voting systems can become very complex, and some of them are easy to manipulate by tactical voting. Joe takes a couple of voting systems and wonders how you would implement them in SQL. He's even more curious as to how you, the reader, would do so.

Right now we in the States are being bombarded with Republican Presidential candidates debating on television, immediately followed by public opinion polls. You do not known how many candidates are in the game from week to week, nor who is the front runner unless you follow the debates every day.

This kind of voting appears in a lot of business situations, even though we do not usually think of it as a ballot. Any purchase choice among several products is a vote, pushing the buttons on your television remote is a vote for one program out of the gazillion channels on your satellite dish and so many other decisions.

The voting problem has been around for centuries, but the modern answer was given by Economist Kenneth Arrow. He won the Nobel Memorial Prize in Economics and National Medal in Science (2004), so we know he is smart. His theorem is known as “Arrow’s Impossibility Theorem” in the literature. He started with some simple assumptions about rules we want in a “fair election system” and then he shows that things don't work like you think they should. Given at least two voters and at least three candidates, it is impossible to aggregate individual preferences without violating some of the desirable conditions.

The desired conditions are:

  1. Every voter has a strictly monotonic ordered preference function. In symbols, P(c1, c2) returns the best-liked candidate of the pair. Again, in symbols, P(c1, c2) Ù P(c1, c2) Þ P(c1, c3). Or in English there is no “scissors, paper, stone” circular rankings or ties.
  2. Every voter votes his conscience and locks in his preference function. That means given candidates X and Y, and a preference of X over Y, then the voter cast his vote this way in every ballot. There is no political game playing allowed.
  3. There is no dictator. No one voter holds enough votes to control the outcome by himself. You will also see this expressed as “one man, one vote”, but this condition can be weaker than that. For example, in a stock holder's meeting, one shareholder can have more votes (shares) than another.
  4. Majority rules. If every voter prefers candidate X over candidate Y, then the electorate prefers X over Y.

This seems to be a reasonable voting system. We can see that if there are two candidates, then we will get either a tie or a winner; let's ignore ties for the rest of this article. The problems start with three candidates and three voters. Here is a look-up table of the preference functions.

 

c1

c2

c3

Voter A

1

2

3

Voter B

2

3

1

Voter C

3

1

2

If you look at the table for a bit, you will see a “scissors, paper stone” relationship in this. There is no clear mandate from the electorate. This is called the Condorcet Paradox, named after the Marquis de Condorcet (1743 – 1794) who first wrote about it. It is easier to see with an election made up of pair-wise ballots.

If we hold an election between candidates c1 and c2, the winner is c1 by two votes from voters A and C. We now face off with c1 and c3 on a second ballot. The winner is c3.

Now let's have another election. But this time the first ballot pairs c1 and c3, with an come for c3. The second ballot pairs the winner, c3 against c2. The winner is c2.

Finally, let's hold a third election. The first ballot pairs c2 and c3 to get a victory for c3. The second ballot pairs the winner, c3 against c1. The winner is c1.

As a practical matter, a voter could have falsely cast a ballot for a weaker candidate in the first ballot so that his favorite would win in the final ballot. If that is not clear, look at the first election that starts with c1 and c2. Voter B falsely cast his first vote for c2, who wins. The second ballot of the election is now between c2 and c3 so the winner is c2, not the original winner c3. This is why we made rule #2 about voting your conscience.

Voting Systems

Let's consider a few different voting systems. We will give examples from a nine member electorate and a four candidate slate.

Plurality Voting

The winner is a simple Plurality Voting system is the candidate that got the most votes. Not a majority, mind you. So if one candidate has a small but loyal voter bloc, the most unpopular candidate will win.

Hare Voting

This is a “last man standing” system in which least popular candidate is removed from the next ballot until there is a winner. The two places this is used in the real world are television reality show where we vote a contestant “off the island” each week and the ballot for the Science Fiction Hugo awards.

Borda Counts

John Claude de Borda was another French mathematician who was interested in voting system. His approach was to have voters assign a number of votes to each candidate and total them. Even if there was no clear winner, the candidate with the best total of votes would be the winner. The weights assigned start at zero for the least liked candidate and increment to (n-1) votes for your favorite.

The names of the voters are not important; we really care about the number of votes that each preference function has in a given election.

Examples

We are looking at blocs of votes. Now look at a four way race with a total of nine votes in play:

 

First

Second

Third

Fourth

3 votes

c1

c4

c2

c3

1 vote

c1

c2

c3

c4

1 vote

c2

c3

c4

c1

1 vote

c2

c3

c1

c4

1 vote

c3

c2

c4

c1

1 vote

c3

c4

c2

c1

1 vote

c4

c3

c2

c1

If you use a plurality vote, sum up the First place column. The winner is Candidate c1 who does not have a majority, but holds a large bloc vote.

 

Vote count

c1

4

c2

2

c3

2

c4

1

The Hare system gives us Candidate c3. Here is how each ballot works:

 

Ballot #1

Ballot #2

Ballot #3

Ballot #4

c1

4

4

4

Loser

c2

2

2

Loser

Loser

c3

2

3

5

9

c4

1

Loser

Loser

Loser

Now let's do a Borda vote on the same electorate. The winner is Candidate c2, 15 points. It is easier to arrange the array a little differently. Here is the vote totals.

 

Bloc #1

Bloc #2

Bloc #3

Bloc #4

Bloc #5

Bloc #6

Bloc #7

Total

c1

3*3

3

0

1

0

0

0

13

c2

3*1

2

3

3

2

1

1

15

c3

3*0

1

2

2

3

3

2

13

c4

3*2

0

1

0

1

2

3

13

If you arrange the pair-wise votes in the sequence c1, c2, c3, c4 the winner is c4

c1

4

c2

5

 

c2

6

c3

3

 

c2

4

c4

5

SQL and Voting Experiments

Now that I have destroyed any hope of having a provably fair voting system, we get to write code. Let's start with tables for voters and for candidates, but not worry too much about the details since their existence is all we need for holding an election.

CREATE TABLE Voters

(voter_id CHAR(3) NOT NULL PRIMARY KEY

     CHECK(voter_id LIKE 'v[0-9][0-9]),

..);

 

INSERT INTO Voters

('v00'), ('v01'), ('v02'), ..

 

CREATE TABLE Candidates

(candidate_id CHAR(3) NOT NULL PRIMARY KEY

     CHECK(candidate_id LIKE 'c[0-9][0-9]),

..);

 

INSERT INTO Candidates ---at least three of them, please

('c00'), ('c01'), ('c02'), ..

The preference function is a relationship, so it gets its own table.

CREATE TABLE Voter_Preferences

(voter_id CHAR(3) NOT NULL

   REFERENCES Voters(voter_id)

    ON DELETE CASCADE,

  candidate_id CHAR(3) NOT NULL

    REFERENCES Candidates(candidate_id)

    ON DELETE CASCADE,

  PRIMARY KEY (voter_id, candidate_id),

  pref_rank SMALLINT NOT NULL

    CHECK(pref_rank > 0));

Now we need queries and views for the various voting methods. It would also be nice to have some data with more than two candidates. Let's try a simple plurality vote to get things started

WITH

Ballotbox (candidate_id, vote_cnt)

AS

  (SELECT candidate_id, COUNT(*)

     FROM Voter_Preferences

    WHERE pref_rank = 1

    GROUP BY candidate_id)

 

   SELECT candidate_id AS winner_candidate_id

     FROM Ballotbox AS B1

    WHERE B1.vote_cnt

         =(SELECT MAX (B2.vote_cnt)

             FROM Ballotbox AS B2);

This will give us ties. How do we want to handle them? Skipping that question, let's write a simple function to do pair-wise ballots.

CREATE PROCEDURE Pairwise_Ballot

(@c1 CHAR(3), @c2 CHAR(3))

AS

;WITH

Single_Voter_Prefs (voter_id, candidate_id, pref_rank)

AS

(SELECT voter_id, candidate_id,

         ROW_NUMBER()

         OVER (PARTITION BY voter_id

                   ORDER BY pref_rank ASC)

         AS pref_rank

    FROM Voter_Preferences

   WHERE candidate_id IN (@c1, @c2)),

Single_Voter_Winners (voter_id, candidate_id)

AS

(SELECT voter_id, candidate_id

    FROM Single_Voter_Prefs

   WHERE pref_rank = 1)

 

SELECT candidate_id, COUNT(*) AS vote_cnt

   FROM Single_Voter_Winners

  GROUP BY candidate_id;

There are other voting systems and you might want to research them. I happen to like the Russian system where you vote against candidates, not for them. Yes, it is possible for any or all candidate to go negative.

I used to favor adding “None of the above”, but then I realized that the government printing office would put this option on the first line of the ballots.

Homework

Now it is time for homework assignments:

  1. Write a Borda voting procedure.
  2. Write a Hare voting procedure.

The best answer gets a copy of the Fourth edition of SQL FOR SMARTIES and a $100 Amazon Voucher. Start your SQL engines, Gentlemen!

The best answer will be given a prize of a $100 Amazon voucher. A month after the challenge is sent out, the judge's decision and comments will be sent out in the newsletter, and published on the site.

Joe Celko will judge the answers to this puzzle. Your answer should :
   1) Solve the problem -- Duh!
   2) Avoid proprietary features in SQL Server that will not port or be good across
        all releases, present and future.
   3) Be clever but not obscure.
   4) Explain how you got your answer.  



This article has been viewed 2062 times.
Joe Celko

Author profile: Joe Celko

Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa.
He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.
He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.
Joe is a well-known figure on Newsgroups and Forums, and he is famous for his his dry wit. He is also interested in Science Fiction.

Search for other articles by Joe Celko

Rate this article:   Avg rating: from a total of 10 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: You Like Him, You Really Like Him
Posted by: BuggyFunBunny (view profile)
Posted on: Saturday, October 29, 2011 at 1:17 PM
Message: Plurality:
-- So if one candidate has a small but loyal voter bloc, the most unpopular candidate will win.

I don't think so. The winner is the least unpopular; s/he has the most remaining votes after deducting the bloc votes. For the statement to be true; well, I don't see how it could be. Ross Perot should know.

And your systems didn't include two quite common (on the right side of the Pond, anyway):
- runoff, where the two highest (if none wins a majority) are re-voted.
- coalition governance, where blocs form government post election if none has a majority.

Subject: Setup
Posted by: puzsol (view profile)
Posted on: Tuesday, November 01, 2011 at 8:26 PM
Message: Given the Celko setup:


CREATE TABLE Voters
(voter_id CHAR(3) NOT NULL PRIMARY KEY
CHECK(voter_id LIKE 'v[0-9][0-9]'),
);

INSERT INTO Voters
values ('v01'), ('v02'), ('v03'), ('v04'), ('v05'), ('v06'), ('v07'), ('v08'), ('v09')

CREATE TABLE Candidates
(candidate_id CHAR(3) NOT NULL PRIMARY KEY
CHECK(candidate_id LIKE 'c[0-9][0-9]'),
);

INSERT INTO Candidates
values ('c01'), ('c02'), ('c03'), ('c04')

CREATE TABLE Voter_Preferences
(voter_id CHAR(3) NOT NULL
REFERENCES Voters(voter_id)
ON DELETE CASCADE,
candidate_id CHAR(3) NOT NULL
REFERENCES Candidates(candidate_id)
ON DELETE CASCADE,
PRIMARY KEY (voter_id, candidate_id),
pref_rank SMALLINT NOT NULL
CHECK(pref_rank > 0));

insert into Voter_Preferences
values ('v01', 'c01', 1), ('v02', 'c01', 1), ('v03', 'c01', 1)
, ('v01', 'c04', 2), ('v02', 'c04', 2), ('v03', 'c04', 2)
, ('v01', 'c02', 3), ('v02', 'c02', 3), ('v03', 'c02', 3)
, ('v01', 'c03', 4), ('v02', 'c03', 4), ('v03', 'c03', 4)

insert into Voter_Preferences
values ('v04', 'c01', 1), ('v04', 'c02', 2), ('v04', 'c03', 3), ('v04', 'c04', 4)

insert into Voter_Preferences
values ('v05', 'c02', 1), ('v05', 'c03', 2), ('v05', 'c04', 3), ('v05', 'c01', 4)

insert into Voter_Preferences
values ('v06', 'c02', 1), ('v06', 'c03', 2), ('v06', 'c01', 3), ('v06', 'c04', 4)

insert into Voter_Preferences
values ('v07', 'c03', 1), ('v07', 'c02', 2), ('v07', 'c04', 3), ('v07', 'c01', 4)

insert into Voter_Preferences
values ('v08', 'c03', 1), ('v08', 'c04', 2), ('v08', 'c02', 3), ('v08', 'c01', 4)

insert into Voter_Preferences
values ('v09', 'c04', 1), ('v09', 'c03', 2), ('v09', 'c02', 3), ('v09', 'c01', 4)

Subject: Hare solution
Posted by: puzsol (view profile)
Posted on: Tuesday, November 01, 2011 at 8:29 PM
Message: Here is a solution to the Hare method:
declare @hare_preferences table(hare_round smallint, votes smallint, voter_block varchar(20), candidate_id CHAR(3), pref_rank smallint)
declare @rank smallint
select @rank = MAX(pref_rank) from Voter_Preferences

insert into @hare_preferences
select @rank, COUNT(*), vb.block, candidate_id, vp.pref_rank
from Voter_Preferences vp
inner join
(
select v.voter_id
, (
select candidate_id + ''
from Voter_Preferences
where voter_id = v.voter_id
and pref_rank < @rank
order by pref_rank for XML path('')
) as block
from Voters v
) as vb on vb.voter_id = vp.voter_id
where vp.pref_rank < @rank
group by vb.block, candidate_id, vp.pref_rank

loop:

select @rank = @rank - 1

;with next_round as
(
select @rank as hare_round, hpi.votes, hpi.voter_block, hpi.candidate_id, ROW_NUMBER() over(partition by hpi.voter_block order by hpi.pref_rank) as pref_rank
from @hare_preferences hpi
where hpi.hare_round = @rank+1
and hpi.candidate_id <> (select top 1 candidate_id from @hare_preferences where hare_round = @rank+1 and pref_rank = 1 group by candidate_id order by sum(votes))
), new_block as
(
select nr.voter_block
, isnull(
(
select candidate_id + ''
from next_round n
where n.voter_block = nr.voter_block
and n.pref_rank < @rank
order by nr.pref_rank for XML path('')
), '') as block
from next_round nr
where nr.pref_rank = 1
)
insert into @hare_preferences
select @rank, SUM(nr.votes), nb.block, nr.candidate_id, nr.pref_rank
from next_round nr
inner join new_block nb on nb.voter_block = nr.voter_block
where nr.pref_rank < @rank
group by nb.block, nr.candidate_id, nr.pref_rank

if @@ROWCOUNT > 0
goto loop

insert into @hare_preferences
select @rank, SUM(hp.votes), '', (select top 1 candidate_id from @hare_preferences where hare_round = @rank+1 order by votes desc), 1
from @hare_preferences hp where hp.hare_round = @rank+1

select * from @hare_preferences where hare_round = 1

You could make the table var a permanent table and just populate it once, it keeps the results for each round - so you can check the calculations if you like.

The last select statment just gets the final answer, remove the where clause to see the whole table.

I'm sure there is something better that could be done, but at least it's a start.

Subject: Voting in oz
Posted by: puzsol (view profile)
Posted on: Tuesday, November 01, 2011 at 8:45 PM
Message: Down under we use a 'two party preferred' system. where you count each voter's first preference as the primary count, then the two highest candidates are selected and all the other votes are re-distributed to the first two and the totals compared....

Eg primary votes:
c1:4
c2:2
c3:2
c4:1

(I don't know exactly what happens in a tie, but it usually boils down to the two main parties, so we'll select c1 and c2 here or it boils down to the hare method with this data)

resulting in
c1: 4
c2: 5
(as those who voted for c3 or c4 put c2 before c1)
With c2 winning the election (and c1 demanding a recount most likely)

This can also result in the 'wrong' candidate getting in if number 2 and 3 collaborate on preferences, (and together have more voters than number 1) but it works well enough.


Subject: Hare explained
Posted by: puzsol (view profile)
Posted on: Tuesday, November 01, 2011 at 8:56 PM
Message: I basically wanted to group the votes based on the preferences that would count... eg first time round, only the first 3 preferences really matter (if your first 3 preferences aren't used then it's the other guy).

By creating a composite key (eg, c01c02c04) I was able to group like preferences together as a 'block' this is reduced by one preference each round as candidates are eliminated. (eg if c03 OR c04 is eliminated the key above becomes c01c02, next round it might become c02.

In this manner you reduce the voter preference table to a candidate preference order table which reduces fairly fast and in the first round is guaranteed to have less records than the original table. (3/4 or less)

I'm thinking that by modifying the order or operations a bit it could be further streamlined, but I don't have the time to spend on it.

Subject: Borda solution
Posted by: puzsol (view profile)
Posted on: Tuesday, November 01, 2011 at 9:00 PM
Message: select v.candidate_id, SUM(4-v.pref_rank) as total
from Voter_Preferences v
group by v.candidate_id

Am I missing something?

Subject: Hare solution v2
Posted by: puzsol (view profile)
Posted on: Wednesday, November 02, 2011 at 12:37 AM
Message: Of course if you don't like the xml creation of composite keys, and you don't care about the full results each round, you could do something like this instead:


declare @hare_rank table (candidate_id char(3), hare_rank smallint, votes smallint)

declare @c smallint
select @c = COUNT(*) from Candidates

insert into @hare_rank
select top 1 vp.candidate_id, @c, COUNT(*)
from Voter_Preferences vp
where vp.pref_rank = 1
group by vp.candidate_id
order by COUNT(*)

while @@ROWCOUNT > 0
begin
set @c = @c - 1

insert into @hare_rank
select top 1 vpr.candidate_id, @c, COUNT(*)
from (
select vp.candidate_id, ROW_NUMBER() over(partition by vp.voter_id order by vp.pref_rank) as pref_rank
from Voter_Preferences vp
left join @hare_rank hr on hr.candidate_id = vp.candidate_id
where hr.candidate_id is null
) as vpr
where vpr.pref_rank = 1
group by vpr.candidate_id
order by COUNT(*)
end

select * from @hare_rank order by hare_rank

which just records the loser at each stage, and how many votes they had when they were eliminated... simpler, but requires trawling over the whole vote table once per candidate... anyone care to do a performance comparison?

Subject: Have v3
Posted by: puzsol (view profile)
Posted on: Thursday, November 03, 2011 at 6:41 AM
Message: Of course is you add a voter who votes like this:

insert into Voters values ('v10')

insert into Voter_Preferences
values ('v10', 'c02', 1), ('v10', 'c04', 2), ('v10', 'c03', 3), ('v10', 'c01', 4)

Then you get a deadlock in the second round... what to do in this case? Pick a random one? Pick your least favourite and fudge it? Alphabetical order (was there a classic programming error where candidates with double-letters were picked as the winner, or was that a TV show?)... I figure, just eliminate both... hence version 3:


declare @hare_rank table (candidate_id char(3), hare_rank smallint, votes smallint)

declare @c smallint
set @c = (select COUNT(*) from Candidates) - (select COUNT(*) from @hare_rank)

insert into @hare_rank
select cr.candidate_id, @c, cr.Votes
from
(
select vp.candidate_id, COUNT(*) as Votes, RANK() OVER(order by count(*)) as Position
from Voter_Preferences vp
where vp.pref_rank = 1
group by vp.candidate_id
) as cr
where cr.Position = 1

while @@ROWCOUNT > 0
begin
set @c = (select COUNT(*) from Candidates) - (select COUNT(*) from @hare_rank)

insert into @hare_rank
select cpr.candidate_id, @c, cpr.Votes
from (
select vpr.candidate_id, COUNT(*) as Votes, RANK() OVER(order by count(*)) as Position
from (
select vp.candidate_id, ROW_NUMBER() over(partition by vp.voter_id order by vp.pref_rank) as pref_rank
from Voter_Preferences vp
left join @hare_rank hr on hr.candidate_id = vp.candidate_id
where hr.candidate_id is null
) as vpr
where vpr.pref_rank = 1
group by vpr.candidate_id
) as cpr
where cpr.Position = 1
end

select * from @hare_rank order by hare_rank

Subject: Hare v4
Posted by: puzsol (view profile)
Posted on: Thursday, November 03, 2011 at 4:09 PM
Message: Then I pointed out to myself that the first statment is just a special case of the loop statement... Always striving for minimal SQL I figured to remove it... so here is v4:


declare @hare_rank table (candidate_id char(3), hare_rank smallint, votes smallint)
declare @c smallint

loop:
set @c = (select COUNT(*) from Candidates) - (select COUNT(*) from @hare_rank)

insert into @hare_rank
select cpr.candidate_id, @c, cpr.Votes
from (
select vpr.candidate_id, COUNT(*) as Votes, RANK() OVER(order by count(*)) as Position
from (
select vp.candidate_id, ROW_NUMBER() over(partition by vp.voter_id order by vp.pref_rank) as pref_rank
from Voter_Preferences vp
left join @hare_rank hr on hr.candidate_id = vp.candidate_id
where hr.candidate_id is null
) as vpr
where vpr.pref_rank = 1
group by vpr.candidate_id
) as cpr
where cpr.Position = 1
if @@ROWCOUNT > 0 goto loop

select * from @hare_rank order by hare_rank

 










Phil Factor
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL Server... Read more...



 View the blog
SQL Source Control: The Development Story
 Often, there is a huge difference between software being easy to use, and easy to develop. When your... Read more...

How to Import Data from HTML pages
 It turns out that there are plenty of ways to get data into SQL Server from websites, whether the data... Read more...

SQL Scripts Manager: An Appreciation
 SQL Scripts Manager is Simple-Talk's present to its readers. William Brewer was an enthusiastic... Read more...

Hosted Team Foundation Server 2010 Review
 Team Foundation Server (TFS) has expanded its remit to support the whole software development process,... Read more...

The Parodist: A SQL Server Application
 Every year, we ask Phil Factor to celebrate the holiday season with an article on SQL Server... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Creating CSV Files Using BCP and Stored Procedures
 Nigel Rivett demonstrates some core techniques for extracting SQL Server data into CSV files, focussing... Read more...

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk