I need to calculate the depth of a descendant from it's ancestor. When a record has object_id = parent_id = ancestor_id, it is considered a root node (the ancestor). I have been trying to get a WITH RECURSIVE query running with PostgreSQL 9.4.
I do not control the data or the columns. The data and table schema comes from an external source. The table is growing continuously. Right now by about 30k records per day. Any node in the tree can be missing and they will be pulled from an external source at some point. They are usually pulled in created_at DESC order but the data is pulled with asynchronous background jobs.
We initially had a code solution to this problem, but now having 5M+ rows, it takes almost 30 minutes to complete.
Example table definition and test data:
CREATE TABLE objects (
id serial NOT NULL PRIMARY KEY,
customer_id integer NOT NULL,
object_id integer NOT NULL,
parent_id integer,
ancestor_id integer,
generation integer NOT NULL DEFAULT 0
);
INSERT INTO objects(id, customer_id , object_id, parent_id, ancestor_id, generation)
VALUES (2, 1, 2, 1, 1, -1), --no parent yet
(3, 2, 3, 3, 3, -1), --root node
(4, 2, 4, 3, 3, -1), --depth 1
(5, 2, 5, 4, 3, -1), --depth 2
(6, 2, 6, 5, 3, -1), --depth 3
(7, 1, 7, 7, 7, -1), --root node
(8, 1, 8, 7, 7, -1), --depth 1
(9, 1, 9, 8, 7, -1); --depth 2
Note that object_id is not unique, but the combination (customer_id, object_id) is unique.
Running a query like this:
WITH RECURSIVE descendants(id, customer_id, object_id, parent_id, ancestor_id, depth) AS (
SELECT id, customer_id, object_id, parent_id, ancestor_id, 0
FROM objects
WHERE object_id = parent_id
UNION
SELECT o.id, o.customer_id, o.object_id, o.parent_id, o.ancestor_id, d.depth + 1
FROM objects o
INNER JOIN descendants d ON d.parent_id = o.object_id
WHERE
d.id <> o.id
AND
d.customer_id = o.customer_id
) SELECT * FROM descendants d;
I would like the generation column to be set as the depth that was calculated. When a new record is added, the generation column is set as -1. There are some cases where a parent_id may not have been pulled yet. If the parent_id does not exist, it should leave the generation column set to -1.
The final data should look like:
id | customer_id | object_id | parent_id | ancestor_id | generation
2 1 2 1 1 -1
3 2 3 3 3 0
4 2 4 3 3 1
5 2 5 4 3 2
6 2 6 5 3 3
7 1 7 7 7 0
8 1 8 7 7 1
9 1 9 8 7 2
The result of the query should be to update the generation column to the correct depth.
I started working from the answers to this related question on SO.
updatethe table with the result of your recursive CTE? – Jan 12 '16 at 07:38ancestor_idis already set, so you only need to assign the generation from the CTE.depth ? – Jan 12 '16 at 16:22