Statistics in SQL: Simple Linear Regressions

Although linear regressions can get complicated, most jobs involving the plotting of a trendline are easy. Simple Linear Regression is handy for the SQL Programmer in making a prediction of a linear trend and giving a figure for the level probability for the prediction, and what is more, they are easy to do with the aggregation that is built into SQL.

Let’s imagine that we have two variables, X and Y which we then plot using scatter graph.

C:\Users\andrew.clarke\SkyDrive\Documents\scatterPlot.gif

It looks a bit like someone has fired a shotgun at a wall but is there a relationship between the two variables? If so, what is it? There seems to be a weak positive linear relationship between the two variables here so we can be fairly confident of plotting a trendline.

Here is the data, and we will proceed to calculate the slope and intercept. We will also calculate the correlation.

It gives the result…

Alpha was calculated by this expression, …

… Beta was calculated thus.

… and Rho by this

The Pearson’s Product Moment correlation can be used to calculate the probability of this being a significant correlation via the Fisher Transformation.

The slope (beta) and intercept (alpha) can be used to calculate any point on the trendline.

We can, for example calculate the value of Y when X is 100 by the equation
y’ = a + bx

Is, in our example

We can calculate the trendline X,Y points by slightly altering the SQL

We can then plot a trend line

C:\Users\andrew.clarke\SkyDrive\Documents\scatterPlot.gif

We are, of course, assuming a linear relationship and parametric data. With Pearson’s Rho and the number of X-Y pairs, we can go on to estimate the fisher transformation (t= 1.912) and probability of 0.061825.

The PowerShell to do the Gnuplot graphs is here

 

See also the previous in this series

Tags:

  • 2351 views

  • Rate
    [Total: 4    Average: 5/5]