It is awkward to do 'Graph databases' in SQL to explore the sort of relationships and memberships in social networks because equivalence relations are classes (a set of sets) rather than sets. However one can explore graphs in SQL if the relationship has all three of the mathematical properties needed for an equivalence relationship.
Equivalence Classes & Cliques
We keep harping that SQL is based on sets, but how many of us ever go back and reread any old text books
on set theory? In particular, one concept gets forgotten is equivalence relations on sets, which create equivalence
classes. These classes are disjoint and we can put an element from a set into one of them with some kind of rule.
Let's use ~ as the notation for an equivalence relation. The
definition is that it has three properties:
 The relation is reflexive: A ~ A. This means the relation
applies to itself.
 The relation is symmetric:
: if (A ~ B) ⇔ (B ~ A).
This means when the relation applies to two different elements in the set, it applies both ways.
 The relation is transitive: (A ~ B) ∧
(B ~ C)
⇒ (A ~ C).
This means we can deduce all elements in a class by applying the relation to any known members of the class.
This is pretty important for programming, as we will see.
So, if this is such a basic set operation, why isn't it in SQL? The problem is that it produces
classes, not a set! A class is a set of sets, but a table is just a set. The notation for each class is a pair of
square bracket that contain some representative of each set. Assume we have

a
Î A

b
Î
B
These expressions are all the same:
 A ~ B
 [a] = [b]
 [a] Ç [b] ≠ Æ
A common example is the
set Z of integers
and any MOD() operator will give you
equivalence classes.
The MOD(n,
2) operation gives you one class consisting of all even numbers, and the other consisting of all odd numbers. The nice
part is that you can compute the MOD() function with arithmetic.
An equivalence classes can also be defined by a set of sets. This is an important concept in
graph theory and graph database. If you have not seen it, Google “Six
Degrees of Kevin Bacon”. The idea is that
any individual involved in the Hollywood
film industry can be linked through his or her film roles
to Kevin Bacon within six steps.
Let's steal a sample set from a SQL Forum posting of friends, where ~ now means “is a friend of” and that
we can use “foaf”  “Friend of a friend”  to build cliches.
 {Fred, Jessica, John, Mary}
 {Albert, Nancy, Peter}
 {Abby}
 {Frank, Joe}
A complete graph means that each node is connected to every other node by one edge. If a subgraph is
complete, it is actually called a Clique in graph theory!
Obviously, if you have a clique with (k) nodes in it, you can a complete subgraph of any size (j <k). A
maximal clique is a clique that is not a subset of any other clique (some authors reserve the term clique for maximal
clique.
SQL seems to have two equivalence relations that are major parts of the language. The first is plain
old vanilla equal (=) for scalar values in the base data types. SQL is just like any other programming language, but we
also have NULLs to consider. Oops! We all know that (NULL = NULL) is not true. So we have to exclude NULLs or work
around them.
The other “almost” equivalence relation is GROUP BY in which the class is the grouping columns. A quick
example would be “SELECT city_name, state_code FROM Customers GROUP BY state_code;” and this is still not quite right
because we have to do some kind of aggregation on the nongrouping columns in the query.
Graphs in SQL
The idiom for graphs in SQL is to use a table with the nodes involved and a second table with pairs of
nodes (a, b) to model the edges, which model our ~ relation.. This is classic RDBMS; the first table is entities (e.g.
cities on a map, electronic components in a device, etc) and the second table is a relationship (e.g. roads between
cities, wiring between components, etc).
The bad news is that when you use it for equivalence relations, it can get big. A class with one member is
has one row to show the edge back to itself. A class with two members {a, b} has {(a, a), (b, b), (a, b), (b, a)}
to
show the ~ properties as edges. Three members gives us six rows; { (a, a), (b, b), (c, c), (a, b), (a, c), (b, a), (b,
c), (c, a), (c, b)}. The general formula is (n * (n1)) + n, which is pretty close to
(n!).
First, load the table a few facts we might already know:
CREATE TABLE Friends
(lft_member VARCHAR(20) NOT NULL,
rgt_member
VARCHAR(20) NOT NULL,
PRIMARY
KEY (lft_member, rgt_member))
INSERT INTO Friends (lft_member,
rgt_member)
VALUES
('John', 'Mary'),
('Mary', 'Jessica'),
('Peter', 'Albert'),
('Peter', 'Nancy'),
('Abby', 'Abby'),
('Jessica', 'Fred'),
('Joe', 'Frank');
Reflexive Rows
The first thing we noticed is that only Abby has a reflexive property row. We need to add those rows and
can do this with basic set operators. But it is not that easy, as you can see with this insertion statement:
INSERT INTO Friends
SELECT
X.lft_member, X.rgt_member
FROM
((SELECT
lft_member, lft_member FROM Friends AS F1
UNION
SELECT
rgt_member, rgt_member FROM Friends AS F2)
EXCEPT
SELECT
lft_member, rgt_member FROM Friends AS F3)
AS X
(lft_member, rgt_member);
The use of table aliases is tricky. You have to be sure that the SQL engine will construct a table in such a way that
you do not get scoping problems. The UNION and EXCEPT operators are used to assure that we do not have primary key
violations.
Abby 
Abby 
Albert 
Albert 
Frank 
Frank 
Fred 
Fred 
Jessica 
Jessica 
Jessica 
Fred 
Joe 
Joe 
Joe 
Frank 
John 
Mary 
John 
John 
Mary 
Mary 
Mary 
Jessica 
Nancy 
Nancy 
Peter 
Peter 
Peter 
Nancy 
Peter 
Albert 
Symmetric Rows
Look at the update table and there are no { (a, b), (b, a)}
pairs. We need to add this second property to the relation table. This follows the pattern of set operators we just
used:
INSERT INTO Friends
SELECT
X.lft_member, X.rgt_member
FROM
(
(SELECT
F1.rgt_member, F1.lft_member
FROM Friends AS F1
WHERE F1.lft_member <> F1.rgt_member)
EXCEPT
(SELECT
F2.lft_member, F2.rgt_member FROM Friends AS F2)
) AS X
(lft_member, rgt_member);
This will give us:
Abby 
Abby 
Albert 
Peter 
Albert 
Albert 
Frank 
Joe 
Frank 
Frank 
Fred 
Jessica 
Fred 
Fred 
Jessica 
Mary 
Jessica 
Jessica 
Jessica 
Fred 
Joe 
Joe 
Joe 
Frank 
John 
Mary 
John 
John 
Mary 
Mary 
Mary 
John 
Mary 
Jessica 
Nancy 
Peter 
Nancy 
Nancy 
Peter 
Peter 
Peter 
Nancy 
Peter 
Albert 
Transitive Rows
Transitivity goes on forever. Well, until we have what
mathematicians call a closure. This means we have gotten a set of elements that have all of the valid relationships. In
some cases, these sets are infinite. But in database, we can only have insanely large tables.
Let's pull a subset that is part oi a clique of 4 friends.
Fred 
Jessica 
Fred 
Fred 
Jessica 
Mary 
Jessica 
Jessica 
Jessica 
Fred 
John 
Mary 
John 
John 
Mary 
Mary 
Mary 
John 
Mary 
Jessica 
Now apply the transitive relation to get some of the missing edges of a graph:
INSERT
INTO Friends
SELECT
X.lft_member, X.rgt_member
FROM
((SELECT
F1.lft_member, F2.rgt_member
FROM Friends AS F1, Friends AS F2
WHERE F1.rgt_member = F2.lft_member)
EXCEPT
(SELECT
F3.lft_member, F3.rgt_member FROM Friends AS F3)
) AS X
(lft_member, rgt_member);
This is a simple statement using the definition of transitivity, without a universal quantifier or loop on it. It will
add these rows to this subset:
Fred 
Mary 
Jessica 
John 
John 
Jessica 
Mary 
Fred 
When you look at Mary and Jessica, you
have all of their friends. However, Fred and John do not know that they are friends.
So we invoke the statement again and get those two rows. If you try it a third time, there are zero rows added.
The first thought is this sounds like a job for a recursive statement. But it is not that easy. If the original graph
has a cycle in it, you can hang in an infinite loop if you try to use a recursive CTE. The assumption is that each
clique has a spanning graph in the pairs in. Oops! New term: a spanning graph is a subgraph
that includes the nodes and some or all of the edges of the graph. A complete graph is the spanning graph has all the possible edges, so
each node is directly connected to any other node.
Cliques
Now change your mindset from graphs to sets. Let's look at the size of the cliques:
WITH X
(member, clique_size)
AS
(SELECT
lft_member, COUNT(*) FROM Friends GROUP BY lft_member)
SELECT * FROM X ORDER BY clique_size;
Abby 
1 
Frank 
2 
Joe 
2 
Nancy 
3 
Peter 
3 
Albert 
3 
Fred 
4 
Jessica 
4 
John 
4 
Mary 
4 
What we want to do is to
assign a number to each clique. This sample data is biased by the fact that the cliques are all
different sizes. You cannot simply use the size to assign a clique_nbr.
Create this table load it with the names.
CREATE
TABLE Cliques
(clique_member VARCHAR(20) NOT NULL PRIMARY KEY,
clique_nbr INTEGER);
We can start by assigning everyone their own clique, using whatever your favorite method is:
INSERT
INTO Cliques
VALUES
('Abby',
1),
('Frank',
2),
('Joe',
3),
('Nancy',
4),
('Peter',
5),
('Albert', 6),
('Fred',
7),
('Jessica', 8),
('John',
9),
('Mary',
10);
Everyone is equally a member of their clique in this model. That means we could start with anyone to get the rest of
their clique. The update is very straight forward. Take any clique number, find the member to whom it belongs and use
that name to build that clique. But which number to we use? We could use the
MIN, MAX or a random number in the clique; I will use the MAX for no particular reason
The worst situation would be a bunch of hermits, so the number of clique would be the cardinality of set. That is not
likely or useful in the real world. Let's put the update in the body of a loop.
BEGIN
DECLARE
@clique_loop INTEGER;
DECLARE
@mem CHAR(10);
SET
@clique_loop = (SELECT MAX(clique_nbr) FROM Cliques);
WHILE
@clique_loop > 0
BEGIN
SET @mem
= (SELECT clique_member FROM Cliques WHERE clique_nbr = @clique_loop);
UPDATE
Cliques
SET clique_nbr
= (SELECT clique_nbr
FROM Cliques
WHERE clique_member =
@mem)
WHERE clique_member
IN (SELECT rgt_member
FROM Friends
WHERE
lft_member = @mem);
SET
@clique_loop = @clique_loop  1;
SELECT *
FROM Cliques;
END;
END;
As a programming assignment, replace the simple counting loop with one that does not do updates with
clique_loop values
that were removed in the prior iteration. This can save a lot of work in a larger social network than the sample data
used here.
Adding Members
CREATE
PROCEDURE Clique_Merge
@clique_member_1 CHAR(10),
@clique_member_2 CHAR(10)
AS
UPDATE
Cliques
SET clique_nbr
=(SELECT MAX(clique_nbr)
FROM Cliques
WHERE clique_member
IN
(@clique_member_1, @clique_member_2))
WHERE clique_nbr
=(SELECT MIN (clique_nbr)
FROM Cliques
WHERE clique_member
IN (@clique_member_1, @clique_member_2));
Deleting a member or moving him to another clique is trivial.
Conclusion
This article is full of handy tricks to use with SQL. But cliques and equivalence classes are better handled with a
graph database than with SQL. This is no surprise; that is what they were meant to do! The graph database can also
handle relationships that do not have all three of the mathematical properties needed for an equivalence relationship.
If any readers want to add to this set of tricks, please do.