Matrix Math in SQL

Relational Databases have tables as data structures, not arrays. This makes it tricky and slow to do matrix operations, but it doesn't mean it is impossible to do. Joe gives the Celko Slant on how to go about doing Matrix Math in SQL.

I’ve just finished teaching a math class at a local school at night, and we had a section on matrix math and determinants. A matrix is not quite the same thing as an array. Matrices are mathematical structures with particular properties that we cannot take the time to discuss here. You can find that information in a college freshman algebra book. This is similar to subtle differences between “hierarchies and trees”.

Whereas an array is merely a data structure who elements are accessed by a numeric value called an index, a matrix is an array with mathematical operations defined on it. A matrix can be one, two, three or more dimensional structures. The most common mathematical convention has be to use the letters i, j and k for the subscripts.

SQL had neither arrays or matrices because the only data structure is a table. Just as there is no obvious way to model trees and hierarchies in SQL, we need ways to model multi-dimensional arrays in SQL.

Arrays via Named Columns

An array in procedural programming languages has a name, and subscripts by which the array elements are referenced. The array elements are all of the same data type and the subscripts are all sequential integers. Some languages start subscripts at zero, some start at one, and some let the user set the upper and lower bounds of each subscript. Algol 60 even allowed arrays to be declared with dynamic subscripts using expressions in the declarations when program control enters a block; the array is deallocated on exciting the block. Most languages are not that complex. For example, a Pascal array declaration would look like this:

Array foobar has five elements foobar[1], foobar[2], foobar[3], foobar[4], and foobar[5]. This is most often mapped into an SQL declaration as

The elements cannot be accessed by the use of a subscript in this table as they can in a true array. That is, to set the array elements equal to zero in Pascal takes one statement with a FOR loop in it:

FOR i := 1 TO 5 DO foobar[i] := 0;

The same action in SQL would be performed with the statement ….

… because there is no subscript which can be iterated in a loop. Any access has to be based on column names and not on subscripts. These “pseudo-subscripts” lead to building column names on the fly in dynamic SQL, giving code that is both slow and dangerous. In short this is the wrong way to do this. Try again.

Arrays via Subscript Columns

Another approach of faking a multi-dimensional array is to map arrays into a table with an integer column for each subscript, thus:

This looks more complex than the first approach, but it is closer to what the original Pascal declaration was doing behind the scenes. Subscripts resolve to unique physical addresses, so it is not possible to have two values for foobar[i]; hence, i is a key. The Pascal compiler will check to see that the subscripts are within the declared range; hence the CHECK() clause.

The first advantage of this approach is that multidimensional arrays are easily handled by adding another column for each subscript. The Pascal declaration …

… is mapped over to …

If the original 'one element_value/one column' approach were used, the table declaration would have 120 columns, named “element_value_111” through “element_value_345“. This would be too many names to handle in any reasonable way.

But this is still not right. The first problem is that this allows holes in the array. We might actually want this when we have a lot of missing values and want to save storage. But most of the time, we need to have the all the cells of the array modeled in the table.

I have assumed you know the idiom …

… which holds sequence of integer from 1 to some value (n). You will see this same thing called “Tally”, “Numbers” or other names.

Matrix Operations in SQL

Though it is possible to do many matrix operations in SQL, it is not a good idea, because such queries and operations will eat up resources and run much too long. SQL was never meant to be a language for calculations. Having said that, here are some tricks for you to do.

The presence of NULLs is not defined in linear algebra and I have no desire to invent a three-valued linear algebra of my own. Another problem is that a matrix has rows and columns that are not the same as the rows and columns of an SQL table; as you read the rest of this article, be careful not to confuse the two. Let’s try some basic operations.

If you want to have flashbacks to college math, then look at the matrix calculator at http://www.bluebit.gr/matrix-calculator/

Matrix Equality

This test for matrix equality is from the article “SQL Matrix Processing” by Mrdalj, Vujovic, and Jovanovic in 1996 July issue of DATABASE PROGRAMING & DESIGN. Two matrices are equal if their cardinalities and the cardinality of the their intersection are all equal.

You have to decide how to use this query in your context. If it returns one number, they are the same and otherwise they are different.

This was written before we had the rest of the basic set operations:

There are other ways to write this.

Matrix Addition

Matrix addition and subtraction are possible only between matrices of the same dimensions. The obvious way to do the addition is simply:

But properly, you ought to add some checking to be sure the matrices match.

All of the checking for subscripts was done for you under the covers in procedural languages. But SQL has to do it explicitly. Ugh!

Matrix Multiplication

Multiplication by a scalar constant is direct and easy:

Matrix multiplication is not as big a mess as might be expected. The first matrix must have the same number of rows as the second matrix has columns. Matrix multiplication is defined as A[i, k] * B[k, j] = C[i, j]. This illustration from Wikipedia explains it in an intuitive way.

1558-LostCelko.gif

Notice that even with square matrices, A*B and B*A might not be the same. Let’s look at an example of matrix multiplication

This will give us:

Matrix Transpose

The transpose of a matrix swaps the rows and columns. It is easy to do:

Multiplication by a column or row vector is just a special case of matrix multiplication, but a bit easier. Given the vector V and MatrixA:

The mathematical notation is a superscript T after the matrix name. The only interesting rules are that (AB)T = BTAT and (AT)T = A. But the multiplication commutes: (AB)C = A(BC).   

Square Matrices

Matrices with the same number of rows as they have columns have some nice properties, so we seek them out. The n by n Identity matrix has the property that A*I = I*A = A. The identity matrix has ones in its main diagonal and zeroes everywhere else.

You could put this in a VIEW, but the advantage of a materialized base table is that the Main_Diagonal constraint is available for the optimizer. Multiplication by one and zero is really easy to optimize.

The inverse of a square matrix, shown by a superscript -1, has the property that AA-1 = A-1A = I, but A-1 might not exist at all.

Summary

It is possible to do fancy matrix operations in SQL, but the code becomes so complex, and the execution time so long, that it is simply not worth the effort. However, multiplication and addition can be very handy, specially if the original data is kept in a matrix.

If a reader would like to submit queries for eigenvalues, inverses and determinants, I will be happy to put them in future editions of one of my books. But I do not want to use it in my real work.

Tags: , ,

  • 49785 views

  • Rate
    [Total: 50    Average: 4.5/5]
  • Celko

    Wikipedia diagram is missing
    The Wikipedia diagram is missing.

    http://en.wikipedia.org/wiki/File:Matrix_multiplication_diagram_2.svg

  • Celko

    Determinants anyone?
    I consider determinants in SQL to be a waste of time. But it is a damn good puzzle! Anyoen want to try it? Look up Lewis Carroll’s method; it might be easier than the usual approach.

    There is a good story about "Dodgson’s method of Determinants". Queen Victoria saw a theater production of ALICE IN WONDERLAND and made a royal request that he dedicate his next book to her. He did; this is it 🙂

  • Celko

    Determinants anyone?
    I consider determinants in SQL to be a waste of time. But it is a damn good puzzle! Anyoen want to try it? Look up Lewis Carroll’s method; it might be easier than the usual approach.

    There is a good story about "Dodgson’s method of Determinants". Queen Victoria saw a theater production of ALICE IN WONDERLAND and made a royal request that he dedicate his next book to her. He did; this is it 🙂

  • Rockstar

    Nice!
    Thanks Joe, this reminded me of a project I worked on many years ago in grad school. I needed to build a matrix in order to simulate the population density of a globular cluster. I had often thought about trying to create a matrix using t-sql.

    I have also wondered about the ability to pass a matrix as a parameter in a stored procedure. I know t-sql isn’t the language for such things, but the thought was always intriguing!

    Best,

    Tom

  • Robert young

    The Third Way
    I don’t have the time now to fully demonstrate, but since you’re scheduled to give the keynote at the PostgreSQL conference (one announcement says it’s about, "…what we will do next in the world of databases.") how about "extending" to the world of PG + R + PL/R? R has all the linear algebra you’ll need, and PL/R provides the I/O betwixt PG and R. Might look good on the CV, too.

    Further, since PL/R just rides on top of the PG support for C as embedded language, any engine which supports C (DB2 comes to mind) could do the same. If MS gets around to it, SQL Server could too.

    IBM now has SPSS, and other engines are adopting R, more or less. Might stir things up a bit.

  • Jesse

    no more than a curiosity
    clearly SQL is not the place to be doing real matrix algebra.

    I’d add a subtitle to this article to make it more clear: "Matrix Math in SQL: don’t do it"

  • Anonymous

    Matrix
    I would use a Matrix CLR type. If I have time I will implement the type and the possible operations.

  • Anonymous

    Associates, not commutes
    (AB)C = A(BC) does not mean "the multiplication commutes" – it’s the associative property.

    The multiplication does not commute, and the author acknowledged that by saying A*B does not necessarily equal B*A

  • Anonymous

    Pedant.
    How does one excite a block? Block! Look! Cookies!

  • TurboBeagle

    Inverse
    Anyone have thoughts on how to derive an inverse (if one exists) in SQL?

  • Robert young

    Inverse
    That’s why Celko offered up calcing the determinant: it has to be non-singular, and is the ‘denominator’ in the inverse calc.

    http://en.wikipedia.org/wiki/Determinant

    Near the bottom: http://www.sosmath.com/matrix/inverse/inverse.html

    is the calc of the inverse, requiring the determinant. I’d still just as soon pass the data to R.

  • karthik

    Matrix Usage
    Joe,

    How this technique is used in real time projects?
    I never experienced such a requirement for the last 6 years. My humble request is, from your next article, you can also mention how that particulr things will be used in real time project? without this technique what are all the performance bottle neck will come? like that mention all the points. It would be really benifit for all the readers.

  • Brian from Chicago

    Determinants, Inverse Matrices, and Multivariate Regressions
    I have just finished prototyping a "SQL Regression" that involved calculating determinants and inverting matrices. It is a fairly extensive body of work, and I am planning to blog about it. But I would really like to present my solution to Mr. Celko first. Does anyone know how I should go about contacting him?

    brianchristopherbrown@hotmail.com

  • Reason

    After 12537 views
    Nobody has noticed that the result of the Matrix multiplication is matrix B, not matrix C?

    A is 2X3
    B is 3X3
    so C = AXB should be 2X3.

  • mmj

    VectorV in transpose?
    Hi Joe,

    In the matrix transpose – where is the vectorv defined as? or did you mean TransA?

    thanks
    mmj