SQL Select Statement For Calculating A Running Average Column

Question

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.

Let me explain

Id        Number       Average
 1             1          NULL
 2             3          NULL
 3             2          NULL
 4             4             2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
 5             6             3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
 .             .             .
 .             .             .

The first 3 rows of the Average column are null because there are no previous rows. The row 4 in the Average column shows the average of the Number column from the previous 3 rows.

I need some help trying to construct a SQL Select statement that will do this.

Im thinking this is one of those really rare cases where cursors are going to be fastest... just keep the last 3 rows in vars ... — Sam Saffron, May 26 '09 at 15:36
@sambo99 - Actually, there are many ways to do this in a set-based fashion, and for any large set of data they will usually be faster than using cursor — Tom H, May 26 '09 at 15:39
Any set based solution will have to do self joins, if you want to run through the data only once i suspect a cursor is the only way. — Sam Saffron, May 26 '09 at 15:48
A cursor may not be a good option here because the "average over previous n rows" need. Here, n could be any number. — HYP, May 26 '09 at 15:51

score 11 · Accepted Answer · answered May 26 '09 at 15:54

This should do it:

--Test Data
CREATE TABLE    RowsToAverage
    (
    ID int NOT NULL,
    Number int NOT NULL
    )

INSERT  RowsToAverage(ID, Number)
SELECT  1, 1
UNION ALL
SELECT  2, 3
UNION ALL
SELECT  3, 2
UNION ALL
SELECT  4, 4
UNION ALL
SELECT  5, 6
UNION ALL
SELECT  6, 8
UNION ALL
SELECT  7, 10

--The query
;WITH   NumberedRows
AS
(
SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM    RowsToAverage rta
)

SELECT  nr.ID, nr.Number,
        CASE
            WHEN nr.RowNumber <=3 THEN NULL
            ELSE (  SELECT  avg(Number) 
                    FROM    NumberedRows 
                    WHERE   RowNumber < nr.RowNumber
                    AND     RowNumber >= nr.RowNumber - 3
                )
        END AS MovingAverage
FROM    NumberedRows nr

Very elegant solution. With 9,000 rows, it takes about 45 seconds on my dev server. Is there any way to use this technique more efficiently. — Yes - that Jake., Sep 06 '11 at 20:40

ichiban · Answer 2 · 2009-05-26T17:05:39.777

8

Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":

SELECT 
    b.Id,
    b.Number,
    (
      SELECT 
       AVG(a.Number) 
      FROM 
       MyTable a 
     WHERE 
       a.id >= (b.Id - 3) 
       AND a.id < b.Id
       AND b.Id > 3 
     ) as Average
FROM 
    MyTable b;

edited May 26 '09 at 17:05

answered May 26 '09 at 16:51

ichiban

6,163
3
26
34

This can also work if no rows were deleted in the table. I accepted Aaron Alton's solution because of the row_number() OVER (ORDER BY rta.ID ASC) works for all cases. – HYP May 26 '09 at 17:27

score 2 · Answer 3 · answered May 26 '09 at 20:04

A simple self join would seem to perform much better than a row referencing subquery

Generate 10k rows of test data:

drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))

;WITH digits AS (
    SELECT 0 as Number
    UNION SELECT 1
    UNION SELECT 2
    UNION SELECT 3
    UNION SELECT 4
    UNION SELECT 5
    UNION SELECT 6
    UNION SELECT 7
    UNION SELECT 8
    UNION SELECT 9
)
,numbers as (
    SELECT 
        (thousands.Number * 1000) 
        + (hundreds.Number * 100) 
        + (tens.Number * 10) 
        + ones.Number AS Number
    FROM digits AS ones 
    CROSS JOIN digits AS tens
    CROSS JOIN digits AS hundreds
    CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers

I would pull the special case of the first 3 rows out of the main query, you can UNION ALL those back in if you really want it in the row set. Self join query:

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)

SELECT  nr.ID, nr.Number,
    avg(trailing.Number) as MovingAverage
FROM    NumberedRows nr
    join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.Number > 3
group by nr.id, nr.Number

On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)
SELECT  nr.ID, nr.Number,
    CASE
            WHEN nr.RowNumber <=3 THEN NULL
            ELSE (  SELECT  avg(Number) 
                            FROM    NumberedRows 
                            WHERE   RowNumber < nr.RowNumber
                            AND             RowNumber >= nr.RowNumber - 3
                    )
    END AS MovingAverage
FROM    NumberedRows nr

If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.

score 1 · Answer 4 · answered May 26 '09 at 15:55

1

Edit: I missed the point that it should average the three previous records...

For a general running average, I think something like this would work:

SELECT
    id, number, 
    SUM(number) OVER (ORDER BY ID) / 
       ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID

answered May 26 '09 at 15:55

richardtallent

33,506
13
81
119

when using @Aaron Alton's RowsToAverage table (I changed FROM MyTable to FROM RowsToAverage), I get an error: Msg 102, Level 15, State 1, Line 3 Incorrect syntax near 'order'. – KM. May 26 '09 at 16:10
Which RDBMS are you using? Windowing functions are only available in SQL 2005 and greater. – Aaron Alton May 26 '09 at 16:58
I should add that the OP mentioned they're using SQL 2008. – Aaron Alton May 26 '09 at 16:58

score 0 · Answer 5 · edited May 23 '17 at 11:46

0

Check out some solutions here. I'm sure that you could adapt one of them easily enough.

edited May 23 '17 at 11:46

Community

1
1

answered May 26 '09 at 15:38

Tom H

45,807
14
84
124

1

Whilst this may theoretically answer the question, [it would be preferable](http://meta.stackexchange.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – Sklivvz Feb 03 '14 at 23:41

score 0 · Answer 6 · answered May 26 '09 at 16:06

If you want this to be truly performant, and arn't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.

score 0 · Answer 7 · answered May 26 '09 at 18:43

Alternatively you can denormalize and store precalculated running values. Described here:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

Performance of selects is as fast as it goes. Of course, modifications are slower.

SQL Select Statement For Calculating A Running Average Column

7 Answers7

Linked