Dwain Camps

Calculating the Median Value within a Partitioned Set Using T-SQL

17 December 2013

It is ironic that one of the most essential of statistical aggregations, the median, has been so difficult in the past to calculate efficiently in SQL. Although the recent window functions provide the solution, there isn't an obviously superior algorithm performance-wise, particularly when working across partitioned sets. Dwain Camps sets the candidates to work and identifies the winners and losers.

Let’s start by defining what the statistical median is. The Wikipedia page for the median describes it as follows:

“The median is the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half.”

The median is an important measure to be able to calculate for data because it underpins much of the way we reason about probability, and it is difficult to draw valid conclusions from data without it. In some statistical analyses, the median is a better predictor of the expected value for a set of observations than the mean (average). In the simplest terms: if you have an odd number (n) of items in an ordered set, the median is the value in row (n+1)/2; if there is an even number of rows, it is the average (mean) of the values in rows n/2 and 1+n/2. All divisions in this description are integer divisions, and standard operator precedence applies. A partitioned set is one where one or more columns define groups (partitions), and the aggregate is calculated within each group.
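For example, with an ordered set of n = 5 values the median is the value in row (5+1)/2 = 3, while with n = 4 values it is the mean of the values in rows 4/2 = 2 and 1 + 4/2 = 3.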

I’m not sure exactly what got me thinking about calculating the median in SQL, but it was long ago and I realized almost immediately that this could be quite a challenging assignment. SQL didn’t, at the time, have any built-in function for the purpose, and while I found quite a few competing solutions out there, there was no clarity on the performance of the various methods.

When I eventually got back to thinking about median, it was mainly because I found that SQL 2012 now offers an analytic function called PERCENTILE_CONT that is capable of calculating the median within a partitioned set when it is passed an argument of 0.5 and the ordering within that set is specified.

I decided that I’d try to find the best solution for calculating a median and see how PERCENTILE_CONT compared with other approaches. My initial investigation led me to a page on the StackOverflow web site that listed quite a few competing solutions. Close examination identified several that really didn’t work worth a hoot, some even delivering incorrect results. More importantly, it led me to Aaron Bertrand’s article, published in August 2012, entitled: What is the fastest way to calculate the median? Much to my chagrin, he had pretty well covered the topic of performance.

However, as Adam Machanic pointed out in a response to Aaron Bertrand’s article, it would be nice to see which solutions proved fastest across partitioned sets, and that was my intent all along!

Setting up Some Test Data and my Testing Methodology

The SQL script below will create and populate a table for testing that each proposed solution returns the correct result set for the median within each partition (ID). We’ve added a non-unique index on our IDs and values (N) because we expect most solutions will benefit from it. We’ll be testing our solutions with and without this index.

CREATE TABLE #MedianValues
(
   ID INT
   ,N INT
);
CREATE INDEX i1 ON #MedianValues (ID, N);

INSERT INTO #MedianValues
VALUES (1,1),(1,2),(1,3),(1,4),(1,5)
   ,(2,1),(2,2),(2,3),(2,4)
   ,(3,10),(3,2),(3,10),(3,4)
   ,(4,1),(4,5),(4,1),(4,3),(4,3);

The data contains four sets: two with an even number of rows and two with an odd number of rows, one of each containing duplicates. We’ve chosen this simple data because it is pretty easy to see by inspection what the correct median is within each partition. It is important to include duplicates because a few of the solutions I examined were failing for that case.

ID    Ordered Sets         Median
1     {1, 2, 3, 4, 5}      3
2     {1, 2, 3, 4}         2.5
3     {2, 4, 10, 10}       7
4     {1, 1, 3, 3, 5}      3

My testing methodology is to create or adapt a solution from the non-partitioned sets case, check using the test data above to make sure it generates the correct median value for each set, and then run that code through a 1,000,000 row test harness (described later) to see how it performs vs. a baseline case. To save our readers time, we’ve included only solutions here that did reasonably well in the 1M row test harness and returned correct results.

Solution #1: From Aaron Bertrand’s Article

If we take a look at Aaron Bertrand’s article linked above, we see from the performance results that there were two solutions that seemed to provide the best performance in his test cases. These were labelled as 2005_C and 2012_B. In the code below, I have adapted the 2005_C solution to the partitioned sets case and we’ll use this as the baseline for our initial performance test.

SELECT ID, Median = AVG(1.0 * N)
FROM
(
   SELECT ID, o.N, rn=ROW_NUMBER() OVER (PARTITION BY ID ORDER BY o.N), c.c
   FROM #MedianValues AS o
   CROSS APPLY (SELECT c=COUNT(*) FROM #MedianValues b WHERE o.ID=b.ID) AS c
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2)
GROUP BY ID;
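To see why the WHERE clause picks out the right rows, consider the two cases: for an odd count such as c = 5, (c + 1)/2 and (c + 2)/2 both evaluate to 3 (integer division), so only the middle row is averaged; for an even count such as c = 4, they evaluate to 2 and 3, so the two middle rows are averaged.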

Note that Aaron Bertrand attributes this solution to SQL MVP Itzik Ben-Gan, but from here on out we’ll refer to it as “Aaron Bertrand’s solution” to give credit to him for reporting on its performance.

Try as I might, I was unable to adapt case 2012_B (a paging approach also suggested by Mr. Itzik Ben-Gan) to partitioned sets. Perhaps someone more familiar with using SQL 2012 paging in that way might have more success. A pity too as it was the fastest of all the solutions he tested.
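As it turns out, such an adaptation is possible: Peso posts one in the comments at the end of this article. A minimal sketch along those lines is shown here for reference, assuming SQL 2012 for the OFFSET/FETCH syntax. It drives the paging from a pre-computed count per partition via CROSS APPLY rather than using Mr. Ben-Gan's original paging construct, so treat it as an approximation rather than a faithful adaptation of 2012_B; it is not one of the solutions benchmarked in the charts below.

-- A sketch only: per-partition OFFSET/FETCH paging driven by a pre-computed count
SELECT d.ID, Median = AVG(1.0 * w.N)
FROM
(
   SELECT ID, c = COUNT(*)
   FROM #MedianValues
   GROUP BY ID
) AS d
CROSS APPLY
(
   SELECT N
   FROM #MedianValues AS z
   WHERE z.ID = d.ID
   ORDER BY z.N
   OFFSET (d.c - 1) / 2 ROWS
   FETCH NEXT 2 - d.c % 2 ROWS ONLY
) AS w
GROUP BY d.ID;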

Solution #2: Using the SQL 2012 PERCENTILE_CONT Analytical Function

I indicated at the beginning that I was interested to see how various solutions compared performance-wise against the new SQL 2012 analytical function PERCENTILE_CONT. In his article, Aaron Bertrand showed quite clearly that it didn’t perform nearly as well as Solution #1, but just to be sure, and for comparative purposes, we coded up this solution against our test data.

SELECT ID, Median=MAX(Median)
FROM
(
   SELECT ID, N
      ,Median=PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY N) OVER (PARTITION BY ID)
   FROM #MedianValues
) a
GROUP BY ID;

Note that PERCENTILE_CONT is designed to include the partition’s median value in each row of the subject set, so in order to condense that to one row per partition we need a GROUP BY. Clearly the solution is pretty succinct and, once you get used to the new syntax, reasonably easy to understand.
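As a side note, the same condensation can also be done with SELECT DISTINCT instead of an outer GROUP BY. A minimal equivalent sketch (results are identical; performance was not tested here):

-- Equivalent condensation using DISTINCT instead of GROUP BY
SELECT DISTINCT ID
   ,Median=PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY N) OVER (PARTITION BY ID)
FROM #MedianValues;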

Using our testing methodology, we ran this solution in our 1M row test harness against the baseline (Solution #1) and tabulated the results with and without the INDEX.

Solution           With INDEX                 Without INDEX
                   CPU(ms)    Elapsed(ms)     CPU(ms)    Elapsed(ms)
#1: Baseline       2573       2894            2883       3172
#2: SQL 2012       6163       5540            13432      3947
#2 % O/(U) #1      140%       91%             366%       24%
The last row of the table (#2 % O/(U) #1) is calculated as 100*(#2 - #1)/#1, making it easy to observe the magnitude of the difference between the solutions. Positive numbers mean #2 is slower than #1. The best case (elapsed time without the INDEX) showed that PERCENTILE_CONT was 24% slower than Solution #1. This verifies the results that Aaron Bertrand published in his article.

Solution #3: Let’s Try Something New

When I write an article I try to make sure I can contribute something new. Given the plethora of solutions out there for the problem of calculating the median, I knew I’d need to use something unusual; otherwise, anything I came up with would either have already been suggested or be derivative.

I fell back upon a relatively obscure technique that became available in SQL 2008 for doing an un-pivot: using CROSS APPLY with a table row constructor (also known as the CROSS APPLY VALUES approach to UNPIVOT). The idea came to me because I needed to develop the counts across each partition and then calculate, from those counts, which rows to retrieve. In some cases (an odd number of rows) there is one row of interest, while in others (an even number of rows) there are two. Since both are calculated from the count of rows, we’ll use this approach to un-pivot them. Here is my solution to this problem:

WITH Counts AS
(
   SELECT ID, c
   FROM
   (
      SELECT ID, c1=(c+1)/2, c2=CASE c%2 WHEN 0 THEN 1+c/2 ELSE 0 END
      FROM
      (
         SELECT ID, c=COUNT(*)
         FROM #MedianValues
         GROUP BY ID
      ) a
   ) a
   CROSS APPLY (VALUES(c1),(c2)) b(c)
)
SELECT a.ID, Median=AVG(0.+b.N)
FROM
(
   SELECT ID, N
      ,rn=ROW_NUMBER() OVER (PARTITION BY ID ORDER BY N)
   FROM #MedianValues a
) a
CROSS APPLY
(
   SELECT N
   FROM Counts b
   WHERE a.ID = b.ID AND a.rn = b.c
) b
GROUP BY a.ID;

In the Counts Common Table Expression (CTE), the inner derived query calculates the count across each partition. From that we calculate c1 (the row number needed for the odd-rows case) and c2 (the additional row number needed for the even-rows case; it is set to zero for odd counts). Finally, we use CROSS APPLY VALUES on c1 and c2 to un-pivot them to separate rows, creating a table we can CROSS APPLY back to our original table on the partition elements and the calculated row numbers.
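To make that concrete, running the Counts CTE on its own against our sample data yields the values below. The zero produced for each odd-count partition never matches any ROW_NUMBER() value, so those rows simply drop out in the CROSS APPLY.

ID    c (count)    c1    c2    Un-pivoted c values
1     5            3     0     3, 0
2     4            2     3     2, 3
3     4            2     3     2, 3
4     5            3     0     3, 0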

While it looks complicated, a quick check of the results against the sample data shows that the correct results are generated, and a look at the performance against 1M rows of test data looks promising against our baseline query.

Solution #4: Another Solution From Aaron Bertrand’s Article

In the same response to Aaron Bertrand’s article where he suggested checking cases for calculating medians against partitioned sets, Adam Machanic suggested another solution. I have done my best to adapt the solution he proposed as directly as possible to the partitioned sets case. Since this was not one of the solutions Aaron Bertrand tested, I was curious how it might perform for this case.

WITH Counts AS
(
   SELECT ID, c=COUNT(*)
   FROM #MedianValues
   GROUP BY ID
)
SELECT a.ID, Median=AVG(0.+N)
FROM Counts a
CROSS APPLY
(
   SELECT TOP(((a.c - 1) / 2) + (1 + (1 - a.c % 2)))
      N, r=ROW_NUMBER() OVER (ORDER BY N)
   FROM #MedianValues b
   WHERE a.ID = b.ID
   ORDER BY N
) p
WHERE r BETWEEN ((a.c - 1) / 2) + 1 AND (((a.c - 1) / 2) + (1 + (1 - a.c % 2)))
GROUP BY a.ID;
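To see how the TOP and BETWEEN expressions cooperate, consider the two cases: for an odd count such as c = 5, TOP evaluates to ((5 - 1)/2) + (1 + (1 - 1)) = 3 and the BETWEEN range is 3 to 3, so only the middle row survives; for an even count such as c = 4, TOP evaluates to ((4 - 1)/2) + (1 + (1 - 0)) = 3 and the range is 2 to 3, so the two middle rows are averaged.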

The results from this query are correct for our four sets case and the performance results from our test with 1M rows look extremely promising.

The Performance Test Harness

In order to create a 1,000,000 row test harness, we’ll make use of our old and trusted friend the Tally table. In this case, we’ll generate it on the fly by CROSS JOINing two sys tables with a ROW_NUMBER(), but any other Tally table you prefer will do.

DECLARE @N INT = 10000; -- 10,000 = ~1,000,000 rows
WITH Tally (n) AS
(
   SELECT TOP (@N) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #MedianValues
SELECT a.n, ABS(CHECKSUM(NEWID()))%@N
-- Always create exactly 100 IDs from the Tally table
FROM (SELECT TOP 100 n FROM Tally) a
-- Generate either an odd or even number of rows within IDs
CROSS APPLY
(
   SELECT TOP (@N-a.n%2) n
   FROM Tally
)b;

Note that our test harness only generates “approximately” one million rows. Within the CROSS APPLY, there is a TOP clause that ensures that successive IDs alternate between an odd and an even number of rows within the partition. This helps to ensure that any solution that is biased towards either even or odd numbers of rows is handled fairly. So for each million rows we’re actually about 50 rows short, but that’s close enough for our purposes.
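If you’d like to confirm that alternation for yourself, an optional sanity check along these lines should show about half the IDs with @N rows and the other half with @N - 1 rows:

-- Optional sanity check: row counts per ID should alternate between @N and @N - 1
SELECT ID, Cnt = COUNT(*)
FROM #MedianValues
GROUP BY ID
ORDER BY ID;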

Performance Test Results

Before graphing the performance results, we decided to eliminate those for PERCENTILE_CONT, because at larger row counts that solution tended to obscure the differences between the other results. In all cases, solutions #1, #3 and #4 were faster in both elapsed time and CPU time.

The first graphic compares the elapsed times from 1M to 4M rows, with and without the INDEX. Solutions are numbered as above with author’s initials included (AB=Aaron Bertrand, DC=Dwain Camps and AM=Adam Machanic). I will take this opportunity to apologize to Mr. Bertrand, Mr. Machanic and Mr. Ben-Gan if I’ve adapted their solutions in a different way than they would have preferred.

We see that with an INDEX, Adam Machanic’s solution is significantly faster than the others, while without an INDEX my solution manages a slight lead.

CPU times tell a similar story to elapsed times, although mine with an index doesn’t trail in CPU time quite as much as it did in elapsed time.

All of our performance tests were done in SQL 2008 R2 (64 bit) running on Windows 7 Professional (64 bit), using a Dell Inspiron laptop with an Intel Core i5-2430M @ 2.4 GHz and 8 GB RAM.

SQL Profiler was used to capture the results of five runs for each row set and Excel was used to calculate from this an average for elapsed and CPU times.

Conclusion

We have explored some of the available T-SQL solutions for calculating the statistical median in this article. We included the built-in SQL 2012 PERCENTILE_CONT function in our initial 1M row test, but we eliminated it from our final graphed results because it generally does not appear to be a “high performance” solution, as Aaron Bertrand previously showed for the case of sets without partitions. Even though the syntax is new, this solution is rather simple to understand, so in the rare cases where understandability is more important than performance it still offers some merit.

All solutions provided focus specifically on data that contained partitioned sets.

We’ve seen that in the worst case for the solutions we explored in depth, four million data rows partitioned into 100 sets could be processed in about 12 seconds or less, which in my opinion is not too bad. We found that in the case of data that is indexed, Mr. Machanic’s solution was able to rip through that same four million rows in about 0.5 second, which was pretty remarkable.

If you have a favorite method to calculate the median that we didn’t cover, please feel free to add it to one of the test harnesses described below.

  • Basic Test Harness for Median.sql – A basic test harness that allows testing against known sets of 1M rows of test data.
  • Profiler Test Harness for Median.sql – Designed for use with SQL Profiler (to run each batch five times), this test harness suppresses the output results from each query.

If you find that it is a contender in either elapsed or CPU time, we’d love to hear about it in the comments section below.

Dwain Camps

Author profile:

Dwain Camps has been a project manager for many years. Because performance of applications can be a critical success factor for projects, he has been evangelizing on the need to develop highly performing SQL. By mentoring and authoring articles on SQL, he hopes to train a future generation of software engineers on the right and wrong ways to deliver SQL code. He also has a special interest in developing solutions to complex, data intensive problems using high performance SQL because the declarative nature of SQL allows development of algorithmically unique solutions that procedural languages may not be capable of.




Subject: Do you have the time to test this?
Posted by: Peso (view profile)
Posted on: Thursday, December 19, 2013 at 4:53 PM
Message: SELECT d.ID,
   AVG(1E * w.n)
FROM (
   SELECT ID,
      COUNT(*) AS y
   FROM #MedianValues
   GROUP BY ID
) AS d
CROSS APPLY (
   SELECT z.n
   FROM #MedianValues AS z
   WHERE z.ID = d.ID
   ORDER BY z.n
   OFFSET (d.y - 1) / 2 ROWS
   FETCH NEXT 2 - d.y % 2 ROWS ONLY
) AS w
GROUP BY d.ID;

Subject: Little less CPU
Posted by: Peso (view profile)
Posted on: Thursday, December 19, 2013 at 5:04 PM
Message: SELECT d.ID,
   w.Value
FROM (
   SELECT ID,
      COUNT(*) AS y
   FROM #MedianValues
   GROUP BY ID
) AS d
CROSS APPLY (
   SELECT AVG(0E + n)
   FROM (
      SELECT z.n
      FROM #MedianValues AS z
      WHERE z.ID = d.ID
      ORDER BY z.n
      OFFSET (d.y - 1) / 2 ROWS
      FETCH NEXT 2 - d.y % 2 ROWS ONLY
   ) AS f
) AS w(Value);

Subject: Response to PESO
Posted by: Dwain.C (view profile)
Posted on: Thursday, December 19, 2013 at 9:31 PM
Message: PESO! How gratifying to see gurus like you wasting your precious time reading my articles! :) I am particularly happy that you had time to look into Mr. Ben-Gan's solution on SQL 2012 that I couldn't get to run.

I do have time so here we go using my 1M row sniff test (results returned to the SSMS pane) with INDEX:

Solution #1: From Aaron Bertrand's MEDIAN
SQL Server Execution Times:
CPU time = 3040 ms, elapsed time = 2119 ms.

Solution #3: Dwain's MEDIAN
SQL Server Execution Times:
CPU time = 842 ms, elapsed time = 864 ms.

Solution #4: From Adam Machanic's MEDIAN
SQL Server Execution Times:
CPU time = 203 ms, elapsed time = 211 ms.

Solution #5: #1 From PESO (based on IBG FETCH)
SQL Server Execution Times:
CPU time = 171 ms, elapsed time = 175 ms.

Solution #6: #2 From PESO (FETCH improved)
SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 148 ms.

Without the INDEX:

Solution #1: From Aaron Bertrand's MEDIAN
SQL Server Execution Times:
CPU time = 2980 ms, elapsed time = 2113 ms.

Solution #3: Dwain's MEDIAN
SQL Server Execution Times:
CPU time = 4696 ms, elapsed time = 1905 ms.

Solution #4: From Adam Machanic's MEDIAN
SQL Server Execution Times:
CPU time = 3586 ms, elapsed time = 2975 ms.

Solution #5: #1 From PESO (based on IBG FETCH)
SQL Server Execution Times:
CPU time = 4057 ms, elapsed time = 3303 ms.

Solution #6: #2 From PESO (FETCH improved)
SQL Server Execution Times:
CPU time = 3495 ms, elapsed time = 3305 ms.

I also gave it a try at 4M rows to see how it scales.

With INDEX (4M rows):

Solution #1: From Aaron Bertrand's MEDIAN
SQL Server Execution Times:
CPU time = 11105 ms, elapsed time = 10341 ms.

Solution #3: Dwain's MEDIAN
SQL Server Execution Times:
CPU time = 3323 ms, elapsed time = 3328 ms.

Solution #4: From Adam Machanic's MEDIAN
SQL Server Execution Times:
CPU time = 1592 ms, elapsed time = 450 ms.

Solution #5: #1 From PESO (based on IBG FETCH)
SQL Server Execution Times:
CPU time = 1264 ms, elapsed time = 343 ms.

Solution #6: #2 From PESO (FETCH improved)
SQL Server Execution Times:
CPU time = 1372 ms, elapsed time = 390 ms.

Without INDEX (4M rows):

Solution #1: From Aaron Bertrand's MEDIAN
SQL Server Execution Times:
CPU time = 11731 ms, elapsed time = 10923 ms.

Solution #3: Dwain's MEDIAN
SQL Server Execution Times:
CPU time = 16692 ms, elapsed time = 9467 ms.

Solution #4: From Adam Machanic's MEDIAN
SQL Server Execution Times:
CPU time = 16084 ms, elapsed time = 12720 ms.

Solution #5: #1 From PESO (based on IBG FETCH)
SQL Server Execution Times:
CPU time = 16550 ms, elapsed time = 13571 ms.

Solution #6: #2 From PESO (FETCH improved)
SQL Server Execution Times:
CPU time = 16101 ms, elapsed time = 13141 ms.

Clearly it scales well too!

Subject: RE: Little less CPU
Posted by: Dwain.C (view profile)
Posted on: Thursday, December 19, 2013 at 9:38 PM
Message: I forgot to mention, that second solution you posted is now the winning entry in the with INDEX category for elapsed time @1M rows.

Don't have time to run the full profiler testing (to see how it averages out) at the moment.

Subject: Sql Clr Median
Posted by: Claude H (not signed in)
Posted on: Monday, December 23, 2013 at 6:30 AM
Message: Hi,

Good article, thanks for the info.

This got me wondering how well my sqlclr Median function would perform compared to all these methods.

SqlClr:
CPU time = 2091 ms, elapsed time = 3465 ms

#1:
CPU time = 4420 ms, elapsed time = 375 ms

#2:
CPU time = 16676 ms, elapsed time = 13096 ms

#3:
CPU time = 1903 ms, elapsed time = 3241 ms

#4:
CPU time = 530 ms, elapsed time = 537 ms


The sqlclr one is less of an eyesore to look at:
SELECT ID, Median(N)
FROM #MedianValues
GROUP BY ID;

but no match for Solution #4 for performance :)

Subject: A bit of geek history
Posted by: Joe Celko (not signed in)
Posted on: Monday, December 23, 2013 at 9:45 AM
Message: The Median problem has an interesting history. Back in the early 1990's, Chris Date and I had dueling magazine columns in the two newsstand database magazines of the day, DBMS and DATABASE PROGRAMMING & DESIGN. They wound up being owned by the same publisher, so we had to do something to assure that both magazines sold. A “feud” is a great way to do this – ever hear of the Jack Benny and Fred Allen feud on old time radio?

For several months, he would propose a median and I would counter with another. The various solutions were discussed in editions of my SQL FOR SMARTIES years later.

They all stink today and I am pulling them from the 5th edition. Hey, we had to use self-joins and other crude tricks because of the lack of features! Perhaps the worst one in the bunch was to UNION the table to itself to get a non-table with an even number of rows. That was Chris and not me. I had 3-way self joins! :(


Subject: RE: Sql Clr Median
Posted by: Dwain.C (view profile)
Posted on: Monday, December 23, 2013 at 6:26 PM
Message: Hey Claude,

Thanks for posting. Surprising the CLR didn't fare so well. Take a look at what PESO provided and tell me that doesn't rock!

Subject: RE: A bit of geek history
Posted by: Dwain.C (view profile)
Posted on: Monday, December 23, 2013 at 6:29 PM
Message: Hey Joe,

Good to hear from you again and Merry Christmas!

I like the history lesson. I know this problem goes back a long ways. Maybe not as far back as Jack Benny and Fred Allen (never heard of him but I have heard of Jack), but far enough.

I hope the solutions here give some food for thought if you replace the oldies in your book! Don't forget to spell my name right :)

Subject: Medians - an alternative approach.
Posted by: dataminor (view profile)
Posted on: Sunday, January 19, 2014 at 4:31 AM
Message: Great article as usual Dwain. Here's a viewpoint from left field if performance is important. It requires re-evaluating the misunderstood concept of "accuracy".

Now that we have datasets comprising millions and millions of rows and hugely powerful computers to trawl through all those rows in mere seconds, it's easy to lose sight of what we might be wanting to achieve. What are we going to do with the median once we have it? How accurate does it need to be, to be useful? Before we had the luxury of the processing power we have now, statisticians were able to show that representative sampling gave us usefully "accurate" results at a fraction of the cost of considering every observation in the population. If the median price of a widget is $103.10 and random sampling gives us a median widget price of between $102.20 and $104.05, 99% of the time, then it's important to know whether that variation makes any difference to the decisions being made on the data.

It would be interesting to see whether a random sample of values (say n= 4,000) drawn from the population of 1,000,000 would produce a wildly different median, and whether the overhead of the random sampling [say by abs(checksum(newid()))] would be compensated for by the faster calculation of the median.

I would have thought that you could only defend the "accuracy" of a million row median, if you could vouch for the accuracy of the underlying observations. In a lot of circumstances you simply can't. Unless you can, you're probably just as well off with a 4,000 row randomly sampled median. It would be an added bonus if turned out to require less processing.

Cheers


Subject: Medians and column stores.
Posted by: Dave.Poole (view profile)
Posted on: Wednesday, January 22, 2014 at 1:21 AM
Message: I'm wondering if a MEDIAN function will be added to SQL Server now that it contains column store indexes.

The arrangement of values within a column store makes an efficient median calculation as a built in function much more viable.

Subject: Quartile values to go with the medians
Posted by: keith.fearnley (view profile)
Posted on: Wednesday, January 22, 2014 at 3:29 AM
Message: Thanks for a great article (and to those following up, especially Peso, nice one).
I recently needed medians along with MIN, MAX, COUNT and quartile values for producing data for whisker plots. I wondered how the addition of these extra outputs would affect the choice of approach here.
I had to produce a hurried output and I'm not happy with what I did (excuses are available, of course) and it would be great to have a good quality general solution.

Subject: RE: Medians - an alternative approach.
Posted by: Dwain.C (view profile)
Posted on: Monday, February 03, 2014 at 10:29 PM
Message: Unfortunately, I'm not a statistician so I can't really comment on how much accuracy is enough. You may have a point though. Medians are usually much more important in small sample sets.

Subject: RE: Medians and column stores.
Posted by: Dwain.C (view profile)
Posted on: Monday, February 03, 2014 at 10:30 PM
Message: I'm guessing that since they already have PERCENTILE_CONT, they probably won't add MEDIAN. But who knows (except Microsoft)?

Subject: RE: Quartile values to go with the medians
Posted by: Dwain.C (view profile)
Posted on: Monday, February 03, 2014 at 10:31 PM
Message: I am curious exactly what that Quartile calculation was. Did you use PERCENTILE_CONT(0.25)?

Subject: Extension of the Median
Posted by: Dwain.C (view profile)
Posted on: Tuesday, February 11, 2014 at 6:40 PM
Message: Aaron Bertrand has just published an excellent follow on piece regarding calculating the Median using SQL:

http://www.sqlperformance.com/2014/02/t-sql-queries/grouped-median

Called "Best approaches for grouped median"

Don't miss it!

 
