5

Background

I have time series data on a monthly basis and I would like to sum values for each ID, grouped by month and then have the month names as columns rather than as rows.

Example

+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 10    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 1  | abc        | jan   | 15    |
| 3  | ghi        | mar   | 15    |

Desired Result

+----+------------+-----+-----+-----+
| id | extra_info | jan | feb | mar |
+----+------------+-----+-----+-----+
| 1  | abc        | 25  | 20  | 0   |
| 2  | def        | 10  | 5   | 0   |
| 3  | ghi        | 0   | 0   | 15  |

Current Approach

I can easily group by month, summing the values. Which gets me to:

-----------------------------------
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 25    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 3  | ghi        | mar   | 15    |

But I now need those months as column names. Not sure where to go from here.

Additional Information

  • In terms of language, this query is to be run in postgres.
  • The months above are just examples, obviously the real data set is much larger and covers all 12 months across thousands of IDs

Any ideas from an SQL guru very much appreciated!

Erwin Brandstetter
  • 539,169
  • 125
  • 977
  • 1,137
Pete Hamilton
  • 7,402
  • 6
  • 32
  • 58

2 Answers2

9

You can use an aggregate function with a CASE expression to turn the rows into columns:

select id,
  extra_info,
  sum(case when month = 'jan' then value else 0 end) jan,
  sum(case when month = 'feb' then value else 0 end) feb,
  sum(case when month = 'mar' then value else 0 end) mar,
  sum(case when month = 'apr' then value else 0 end) apr,
  sum(case when month = 'may' then value else 0 end) may,
  sum(case when month = 'jun' then value else 0 end) jun,
  sum(case when month = 'jul' then value else 0 end) jul,
  sum(case when month = 'aug' then value else 0 end) aug,
  sum(case when month = 'sep' then value else 0 end) sep,
  sum(case when month = 'oct' then value else 0 end) oct,
  sum(case when month = 'nov' then value else 0 end) nov,
  sum(case when month = 'dec' then value else 0 end) "dec"
from yt
group by id, extra_info

See SQL Fiddle with Demo

Taryn
  • 234,956
  • 54
  • 359
  • 399
  • 1
    Thanks! I vaguely remember seeing something like this before, wish my brain was more reliable! Is there some way of doing it without manually handling the cases? Works great for months but I'm wondering if there's a more generic form which could be applied to categories with 100s of entries? – Pete Hamilton May 16 '13 at 21:59
  • @PeterHamilton Yes, you can use the `crosstab` function. Here is a great answer from another user -- http://stackoverflow.com/questions/15506199/dynamic-alternative-to-pivot-with-case-and-group-by/15514334#15514334 – Taryn May 16 '13 at 22:04
  • 1
    Exactly what i needed! Had a version that was using sub querys for each month and would take hours to build verse seconds using this. – Ominus May 01 '17 at 17:03
2

tablefunc module

I would use crosstab() for this. Install the additional module tablefunc if you don't have already:

CREATE EXTENSION tablefunc

Basics here:
PostgreSQL Crosstab Query

How to deal with extra columns:
Pivot on Multiple Columns using Tablefunc

Advanced usage:
Dynamic alternative to pivot with CASE and GROUP BY

Setup

CREATE TEMP TABLE tbl
   (id int, extra_info varchar(3), month date, value int);
   
INSERT INTO tbl (id, extra_info, month, value)
VALUES
   (1, 'abc', '2012-01-01', 10),
   (1, 'abc', '2012-02-01', 20),
   (2, 'def', '2012-01-01', 10),
   (2, 'def', '2012-02-01', 5),
   (1, 'abc', '2012-01-01', 15),
   (3, 'ghi', '2012-03-01', 15);

I am using an actual date in the base table, since I am assuming are just hiding that in a effort to simplify your question. But with just month names, there would be nothing to ORDER BY.

Query

SELECT * FROM crosstab(
     $$SELECT id, extra_info, to_char(month, 'mon'), sum(value) AS value
       FROM   tbl
       GROUP  BY 1,2,month
       ORDER  BY 1,2,month$$

    ,$$VALUES
      ('jan'::text), ('feb'), ('mar'), ('apr'), ('may'), ('jun')
    , ('jul'),       ('aug'), ('sep'), ('oct'), ('nov'), ('dec')$$
   )
AS ct (id  int, extra text
   , jan int, feb int, mar int, apr int, may int, jun int
   , jul int, aug int, sep int, oct int, nov int, dec int);

Result:

 id | extra | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
  1 | abc   |  25 |  20 |     |     |     |     |     |     |     |     |     |
  2 | def   |  10 |   5 |     |     |     |     |     |     |     |     |     |
  3 | ghi   |     |     |  15 |     |     |     |     |     |     |     |     |

Installing the tablefunc module requires some overhead and some learning, but the resulting queries are much faster and shorter and more versatile.

Community
  • 1
  • 1
Erwin Brandstetter
  • 539,169
  • 125
  • 977
  • 1,137