Archive for January 18th, 2010
From Stack Overflow:
I have this query:WITH CTE AS ( SELECT e_id, scale, ROW_NUMBER() OVER(PARTITION BY e_id ORDER BY scale ASC) AS rn, COUNT(scale) OVER(PARTITION BY e_id) AS cn FROM ScoreMaster WHERE scale IS NOT NULL ) SELECT e_id, AVG(scale) AS [AVG], STDEV(scale) AS [StdDev], AVG(CASE WHEN 2 * rn - cn BETWEEN 0 AND 2 THEN scale END) AS [FinancialMedian] MAX(CASE WHEN 2 * rn - cn BETWEEN 0 AND 2 THEN scale END) AS [StatisticalMedian] FROM CTE GROUP BY e_id
How do I add Mode to this query?
A quick reminder: in statistics, mode is the value that occurs most frequently in a data set.
In other words, for each
e_id, mode is the (exact) value of
scale shared by most records with this
Unlike other statistical parameters used in this query, mode is not guaranteed to have a single value. If, say, 10 records have
scale = 1 and 10 other records have
scale = 2 (and all other values of
scale are shared by less than 10 records), then there are two modes in this set (and the set, hence, is called bimodal). Likewise, there can be trimodal, quadrimodal or, generally speaking, multimodal sets.
This means that we should define a way on how to choose this mode.
There can be three approaches to this:
- Return every modal value
- Return a single modal value
- Return an aggregate of all modal values
To check all queries, we will generate a simple trimodal dataset:
Read the rest of this entry »