
I have a table in a Postgres database with monthly columns from 2012 to the end of 2018:

create table sales_data (
  part_number text not null,
  customer text not null,
  qty_2012_01 numeric,
  qty_2012_02 numeric,
  qty_2012_03 numeric,
  ...
  qty_2018_10 numeric,
  qty_2018_11 numeric,
  qty_2018_12 numeric,
  constraint sales_data_pk primary key (part_number, customer)
);

The data is populated from a large function that pulls data from an extremely wide variety of sources. It involves many left joins -- for example, in combining history with future data, where a single item may have history but not future demand or vice versa. Or, certain customers may not have data as far back or forward as we want.

The problem I'm running into is that, due to the left joins (and the nature of the data I'm pulling), a significant number of the values I am pulling are null. I would like any null to simply be zero, to simplify queries against this table -- specifically queries that add the monthly columns together, where 1 + null + 2 = null.
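
To illustrate, with two of the columns above:

select 1 + null + 2;   -- yields null, not 3

select part_number, customer,
       qty_2012_01 + qty_2012_02 as two_month_total   -- null if either month is null
from   sales_data;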

I could modify the function and add hundreds of coalesce statements. However, I was hoping there was another way around this, even if it means modifying the values after the fact -- though that would mean adding 84 update statements at the end of the function:

update sales_data set qty_2012_01 = 0 where qty_2012_01 is null;
update sales_data set qty_2012_02 = 0 where qty_2012_02 is null;
update sales_data set qty_2012_03 = 0 where qty_2012_03 is null;
... 78 more like this...
update sales_data set qty_2018_10 = 0 where qty_2018_10 is null;
update sales_data set qty_2018_11 = 0 where qty_2018_11 is null;
update sales_data set qty_2018_12 = 0 where qty_2018_12 is null;

I'm missing something, right? Is there an easier way?

I was hoping the default setting on the column would force a zero, but it doesn't work when the function is explicitly telling it to insert a null. Likewise, if I make the column non-nullable, it just pukes on my insert -- I was hoping that might force the invocation of the default.
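
For example, even with a default of 0 on the column (sample values for illustration):

alter table sales_data alter column qty_2012_01 set default 0;

-- the default only kicks in when the column is omitted (or DEFAULT is used):
insert into sales_data (part_number, customer) values ('P1', 'ACME');  -- qty_2012_01 = 0

-- an explicit null wins; the default is ignored:
insert into sales_data (part_number, customer, qty_2012_01) values ('P2', 'ACME', null);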

By the way, the insert-then-update strategy is one I chastise others for, so I understand this is less than ideal. This function is a bit of a beast, and it does require some occasional maintenance (long story). My primary goal is to keep the function as readable and maintainable as possible -- NOT to make the function uber-efficient. The table itself is not huge -- less than a million records after all is said and done -- and we run the function to populate it once or twice a month.

Hambone
  • "*with monthly columns starting 2012 to the end of 2018:*" - why? If you properly normalize your data model, then this is just a single `update` statement. – a_horse_with_no_name Oct 23 '15 at 16:27
  • @a_horse_with_no_name -- a fair question, indeed. The data is actually rendered in Excel in the form of a chart. If we did this as normalized data, it would be millions of rows which would then have to be summarized in a pivot table/chart. The denormalization was by design, for rapid performance within the display tool. We have the data elsewhere, normalized. The function takes those various data sources and populates this table – Hambone Oct 23 '15 at 18:56

2 Answers


In the INSERT statement itself, COALESCE(col_name, 0) will fix the issue. You can also add NOT NULL to the columns to maintain data integrity.

Assuming you are inserting data from a temp table:

INSERT INTO sales_data (qty_2012_01, qty_2012_02)
SELECT COALESCE(qty_2012_01, 0), COALESCE(qty_2012_02, 0)
FROM temp_sales_data;

Single Update

UPDATE sales_data SET
qty_2012_01 = COALESCE(qty_2012_01, 0),
qty_2012_02 = COALESCE(qty_2012_02, 0)
..
..
WHERE qty_2012_01 IS NULL 
OR qty_2012_02 IS NULL 
...
....

The above query will update all of the columns in a single UPDATE statement.

Shankar
  • I think @Hambone specified that he is looking for another solution. – Nick Oct 23 '15 at 18:11
  • I got that, thanks Nick. Updating my answer. – Shankar Oct 23 '15 at 18:11
  • Thanks... while I was hoping to avoid this type of thing (a lengthy function to maintain), I did +1 your suggestion since it's an improvement over what I had, doing it in a single transaction and minimizing updates. – Hambone Oct 26 '15 at 16:00

There is no built-in feature (that I would know of). Short of spelling out COALESCE(col, 0) everywhere, you can write a function to replace all NULL values with 0 in all numeric columns of a table:

CREATE OR REPLACE FUNCTION f_convert_numeric_null(_tbl regclass)
  RETURNS void AS
$func$
BEGIN
    RAISE NOTICE '%',  -- test output for debugging
    -- EXECUTE         -- payload
   (SELECT 'UPDATE ' || _tbl
        || ' SET '   || string_agg(format('%1$s = COALESCE(%1$s, 0)', col), ', ')
        || ' WHERE ' || string_agg(col || ' IS NULL', ' OR ')
   FROM  (
      SELECT quote_ident(attname) AS col
      FROM   pg_attribute
      WHERE  attrelid = _tbl                -- valid, visible, legal table name 
      AND    attnum >= 1                    -- exclude tableoid & friends
      AND    NOT attisdropped               -- exclude dropped columns
      AND    NOT attnotnull                 -- exclude columns defined NOT NULL
      AND    atttypid = 'numeric'::regtype  -- only numeric columns
      ORDER  BY attnum
      ) sub
   );
END
$func$  LANGUAGE plpgsql;

The function concatenates and executes a query of the form:

UPDATE sales_data
SET    qty_2012_01 = COALESCE(qty_2012_01, 0)
     , qty_2012_02 = COALESCE(qty_2012_02, 0)
     , qty_2012_03 = COALESCE(qty_2012_03, 0)
       ... 
 WHERE qty_2012_01 IS NULL OR
       qty_2012_02 IS NULL OR
       qty_2012_03 IS NULL ... ;

Works for any table with any column names. All numeric columns are updated. Only rows that actually change are touched.

Since the function is massively invasive, I added a child-safety device: the payload is disarmed by default and the statement is only printed. Comment out the RAISE NOTICE line and uncomment EXECUTE to prime the bomb.

Call:

SELECT f_convert_numeric_null('sales_data');

"*My primary goal is to keep the function as readable and maintainable as possible.*"

That should do it.

SQL Fiddle.

The parameter is of type regclass, so pass the table name, possibly schema-qualified. Non-standard identifiers must be double-quoted -- names like "mySchema"."0dumb tablename".
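
For example:

SELECT f_convert_numeric_null('"mySchema"."0dumb tablename"');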

To avoid the insert-then-update pattern on the real table: write your query results to a temporary table, run the function on the temp table, and then INSERT into the actual table.
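
A minimal sketch of that workflow (the temp table name is illustrative, and the EXECUTE branch of the function must be armed first):

CREATE TEMP TABLE tmp_sales AS
SELECT * FROM sales_data WITH NO DATA;   -- copies the column structure, no rows

-- ... populate tmp_sales with the query results ...

SELECT f_convert_numeric_null('tmp_sales');  -- convert NULL to 0 in the temp table

INSERT INTO sales_data
SELECT * FROM tmp_sales;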


Erwin Brandstetter
  • This is pretty creative, and it definitely helps to keep my function clean, as well as fixing all values at the same time (less dead rows in the table when it's done). It also brings about another idea -- do you think a trigger could be used to intercept the insert and change a null to a 0? – Hambone Oct 26 '15 at 15:59
  • 1
    @Hambone: Sure. Similar to http://stackoverflow.com/a/14035890/939860 or http://stackoverflow.com/a/25797129/939860. Again, you have to spell out all columns or use (possibly more expensive) dynamic SQL. – Erwin Brandstetter Oct 26 '15 at 16:43
  • That's actually perfect... Thanks for that reference. – Hambone Oct 26 '15 at 17:13
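
For reference, a minimal sketch of the trigger approach from the comments above (the function and trigger names are illustrative, and every monthly column has to be spelled out, as Erwin notes):

CREATE OR REPLACE FUNCTION trg_sales_data_zero_nulls()
  RETURNS trigger AS
$func$
BEGIN
   NEW.qty_2012_01 := COALESCE(NEW.qty_2012_01, 0);
   NEW.qty_2012_02 := COALESCE(NEW.qty_2012_02, 0);
   -- ... one assignment per monthly column ...
   RETURN NEW;
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER sales_data_zero_nulls
BEFORE INSERT ON sales_data
FOR EACH ROW EXECUTE PROCEDURE trg_sales_data_zero_nulls();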