
I have an example table (year) as below in PostgreSQL 9.5:

Name | 2010 | 2011 | 2012
-------------------------
A    |   10 |      |   40
B    |   10 |   20 |   30 

Now, if I write a simple query as shown below to take the average of the columns (2010, 2011, 2012), I get the correct result for B, but the result for A will be NULL because of the NULL in the 2011 column:

select ("2010" + "2011" + "2012") / 3 as avg from year

Is there any way to write a query so that I can take the average of only the non-NULL values in a row?
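
Spelling out the arithmetic, the desired result for the sample above would be:

Name | avg
----------
A    |  25      -- (10 + 40) / 2, ignoring the NULL
B    |  20      -- (10 + 20 + 30) / 3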

  • It’s certainly solvable, but with that data model you might run into more challenges like that; having a Name, Year, Value table might be the easier way to go. – eckes Aug 10 '19 at 13:27
  • Unfortunately it is not that way, because there are 5000 names and the years run from 2000 to 2019, so a table with Name, Year, Value would be too long. – Jio Aug 10 '19 at 14:21
  • A normalized table would only have 100,000 rows - that's a really small table. – Aug 10 '19 at 14:30
  • Yes, it could be manageable in this case but not in every case. Therefore I am still curious to know whether there is any way to take the correct average across nullable columns such that it only considers columns with non-NULL values. – Jio Aug 10 '19 at 14:41
  • There is no advantage in having a wide table compared to a (partitioned) long table, especially if it is only a simple factor of 20. Besides, solving it for a few named columns is easy, but for even more columns it's just madness. – eckes Aug 10 '19 at 14:45
  • coalesce("2010", 0) + coalesce("2011", 0) + coalesce("2012", 0) – McNets Aug 10 '19 at 14:45
  • Well, NULL is no value, so replacing it with zero will not be correct. – Jio Aug 10 '19 at 14:47
  • With a properly normalized model avg() would also properly take care of NULL values. Adding columns like that is never a good idea. Modern hardware and databases can easily deal with much, much larger tables. – Aug 10 '19 at 15:04

3 Answers


The only correct and scalable solution to that problem is to normalize your model.

However, you can normalize the data "on the fly" and then use standard SQL aggregation on the result.

select name, avg(t.val) as avg
from the_table  -- i.e. the question's table "year"
  cross join unnest(array["2010", "2011", "2012"]) with ordinality as t(val)  -- one row per column value
group by name;  -- avg() ignores the NULLs among the unnested values

Online example: https://rextester.com/TGTC30399

But I would strongly recommend fixing your data model by properly normalizing it.
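
For reference, a normalized layout along the lines suggested in the comments might look like the sketch below (table and column names here are placeholders, not taken from the question):

create table measurement (        -- hypothetical name, one row per name and year
    name  text not null,
    year  int  not null,
    value int,
    primary key (name, year)
);

select name, avg(value) as avg    -- avg() ignores NULLs; missing years simply don't contribute
from measurement
group by name;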

  • Thanks a lot for your answer. I had a look at the online example you provided and I am wondering why the average for row "A" is 25 and not 33.333. – Jio Aug 10 '19 at 15:33
  • @Jio because I forgot to add the fourth column (another good reason not to continue with that bad data model) – Aug 10 '19 at 15:57
  • True, point noted, thanks all! – Jio Aug 10 '19 at 15:58
  • @Jio If that answer solved your question then please accept it, so that your question is marked as resolved. – Aug 11 '19 at 15:50

Postgres has the little-known function num_nonnulls() for exactly this case. The manual:

returns the number of non-null arguments

SELECT (COALESCE("2010", 0)
      + COALESCE("2011", 0)
      + COALESCE("2012", 0))
      / num_nonnulls("2010","2011","2012") AS avg
FROM tbl t;
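
One edge case to be aware of: if all three columns are NULL in a row, num_nonnulls() returns 0 and the division raises a division-by-zero error. A possible variation (not part of the query above) wraps the divisor in NULLIF() so such rows return NULL instead:

SELECT (COALESCE("2010", 0)
      + COALESCE("2011", 0)
      + COALESCE("2012", 0))
      / NULLIF(num_nonnulls("2010", "2011", "2012"), 0) AS avg  -- NULL divisor -> NULL result
FROM tbl t;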



Make sure to double-quote your otherwise illegal column names, and use COALESCE for each term, or the whole sum is NULL as soon as any of them is NULL.

For operations on many columns see my older answer. (But this is faster.)

Erwin Brandstetter

In most cases a normalized relational design would be the proper solution, as has been commented.

While you are stuck with your design - and there are cases where it makes sense (like to minimize storage for big tables) - listing all column names may be tedious and error-prone. Here is an alternative:

SELECT name, avg(value::int)  -- cast to the type actually used
FROM   tbl t, jsonb_each_text(to_jsonb(t) - 'name')  -- exclude non-value columns
GROUP  BY 1;

Instead of listing all columns to include, remove the one (or few) column(s) to exclude. It also keeps working if you add more value columns later ("2013", "2014", ...).
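
To illustrate what the lateral call produces before aggregation, here is the intermediate result for one row (a sketch assuming the table is called tbl, as in the query above):

SELECT j.key, j.value
FROM   tbl t, jsonb_each_text(to_jsonb(t) - 'name') AS j(key, value)
WHERE  t.name = 'A';

--  key  | value
-- ------+-------
--  2010 | 10
--  2011 |          (JSON null comes back as SQL NULL, which avg() skips)
--  2012 | 40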

Drawback: this casts the values to JSON and back, which adds a tiny cost.


As an aside: don't use numbers as column names; they require double-quoting. Use something like c2010, c2011, ...

Erwin Brandstetter
  • Thanks for your solution. I have a question. It sounds crazy, but is it possible to take the average of the values in the columns at positions 10 to 15? In that case I would not have to give explicit column names. I ask because the data I have has columns in the format 2010-01, 2010-02, 2010-03, ... 2010-12, 2011-01, 2011-02, and so on up to 2018, and I want to take the average for every year. – Jio Aug 12 '19 at 16:07
  • @Jio: Not that crazy. Not even that hard, once you know how. Please ask a new question with relevant details. – Erwin Brandstetter Aug 12 '19 at 21:21