7

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.

Let me explain

Id        Number       Average
 1             1          NULL
 2             3          NULL
 3             2          NULL
 4             4             2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
 5             6             3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
 .             .             .
 .             .             .

The first 3 rows of the Average column are null because there are no previous rows. The row 4 in the Average column shows the average of the Number column from the previous 3 rows.

I need some help trying to construct a SQL Select statement that will do this.

Jose Basilio
  • 49,569
  • 12
  • 117
  • 116
HYP
  • 87
  • 1
  • 2
  • 8
  • What kind of SQL database are you using? – Sam Saffron May 26 '09 at 15:33
  • Im thinking this is one of those really rare cases where cursors are going to be fastest... just keep the last 3 rows in vars ... – Sam Saffron May 26 '09 at 15:36
  • @sambo99 - Actually, there are many ways to do this in a set-based fashion, and for any large set of data they will usually be faster than using cursor – Tom H May 26 '09 at 15:39
  • Any set based solution will have to do self joins, if you want to run through the data only once i suspect a cursor is the only way. – Sam Saffron May 26 '09 at 15:48
  • A cursor may not be a good option here because the "average over previous n rows" need. Here, n could be any number. – HYP May 26 '09 at 15:51
  • true ... you could use a table var for the window though .. – Sam Saffron May 26 '09 at 16:11

7 Answers7

11

This should do it:

--Test Data
CREATE TABLE    RowsToAverage
    (
    ID int NOT NULL,
    Number int NOT NULL
    )

INSERT  RowsToAverage(ID, Number)
SELECT  1, 1
UNION ALL
SELECT  2, 3
UNION ALL
SELECT  3, 2
UNION ALL
SELECT  4, 4
UNION ALL
SELECT  5, 6
UNION ALL
SELECT  6, 8
UNION ALL
SELECT  7, 10

--The query
;WITH   NumberedRows
AS
(
SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM    RowsToAverage rta
)

SELECT  nr.ID, nr.Number,
        CASE
            WHEN nr.RowNumber <=3 THEN NULL
            ELSE (  SELECT  avg(Number) 
                    FROM    NumberedRows 
                    WHERE   RowNumber < nr.RowNumber
                    AND     RowNumber >= nr.RowNumber - 3
                )
        END AS MovingAverage
FROM    NumberedRows nr
Aaron Alton
  • 22,110
  • 6
  • 33
  • 32
  • Very elegant solution. With 9,000 rows, it takes about 45 seconds on my dev server. Is there any way to use this technique more efficiently. – Yes - that Jake. Sep 06 '11 at 20:40
8

Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":

SELECT 
    b.Id,
    b.Number,
    (
      SELECT 
       AVG(a.Number) 
      FROM 
       MyTable a 
     WHERE 
       a.id >= (b.Id - 3) 
       AND a.id < b.Id
       AND b.Id > 3 
     ) as Average
FROM 
    MyTable b;
ichiban
  • 6,163
  • 3
  • 26
  • 34
  • This can also work if no rows were deleted in the table. I accepted Aaron Alton's solution because of the row_number() OVER (ORDER BY rta.ID ASC) works for all cases. – HYP May 26 '09 at 17:27
2

A simple self join would seem to perform much better than a row referencing subquery

Generate 10k rows of test data:

drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))

;WITH digits AS (
    SELECT 0 as Number
    UNION SELECT 1
    UNION SELECT 2
    UNION SELECT 3
    UNION SELECT 4
    UNION SELECT 5
    UNION SELECT 6
    UNION SELECT 7
    UNION SELECT 8
    UNION SELECT 9
)
,numbers as (
    SELECT 
        (thousands.Number * 1000) 
        + (hundreds.Number * 100) 
        + (tens.Number * 10) 
        + ones.Number AS Number
    FROM digits AS ones 
    CROSS JOIN digits AS tens
    CROSS JOIN digits AS hundreds
    CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers 

I would pull the special case of the first 3 rows out of the main query, you can UNION ALL those back in if you really want it in the row set. Self join query:

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)

SELECT  nr.ID, nr.Number,
    avg(trailing.Number) as MovingAverage
FROM    NumberedRows nr
    join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.Number > 3
group by nr.id, nr.Number

On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)
SELECT  nr.ID, nr.Number,
    CASE
            WHEN nr.RowNumber <=3 THEN NULL
            ELSE (  SELECT  avg(Number) 
                            FROM    NumberedRows 
                            WHERE   RowNumber < nr.RowNumber
                            AND             RowNumber >= nr.RowNumber - 3
                    )
    END AS MovingAverage
FROM    NumberedRows nr

If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.

ahains
  • 1,872
  • 12
  • 10
1

Edit: I missed the point that it should average the three previous records...

For a general running average, I think something like this would work:

SELECT
    id, number, 
    SUM(number) OVER (ORDER BY ID) / 
       ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
richardtallent
  • 33,506
  • 13
  • 81
  • 119
  • when using @Aaron Alton's RowsToAverage table (I changed FROM MyTable to FROM RowsToAverage), I get an error: Msg 102, Level 15, State 1, Line 3 Incorrect syntax near 'order'. – KM. May 26 '09 at 16:10
  • Which RDBMS are you using? Windowing functions are only available in SQL 2005 and greater. – Aaron Alton May 26 '09 at 16:58
  • I should add that the OP mentioned they're using SQL 2008. – Aaron Alton May 26 '09 at 16:58
0

Check out some solutions here. I'm sure that you could adapt one of them easily enough.

Community
  • 1
  • 1
Tom H
  • 45,807
  • 14
  • 84
  • 124
  • 1
    Whilst this may theoretically answer the question, [it would be preferable](http://meta.stackexchange.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – Sklivvz Feb 03 '14 at 23:41
0

If you want this to be truly performant, and arn't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.

jrista
  • 31,580
  • 14
  • 88
  • 128
0

Alternatively you can denormalize and store precalculated running values. Described here:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

Performance of selects is as fast as it goes. Of course, modifications are slower.

A-K
  • 16,510
  • 7
  • 52
  • 71