2

I have a table edges, which describes relations in a graph:

CREATE TABLE IF NOT EXISTS edges (
   src INT NOT NULL REFERENCES nodes(id) ON UPDATE CASCADE ON DELETE CASCADE
  ,tgt INT NOT NULL REFERENCES nodes(id) ON UPDATE CASCADE ON DELETE CASCADE
  ,rel TEXT NOT NULL
  ,rel_type INT NOT NULL
  ,PRIMARY KEY (src, tgt, rel)
  ,UNIQUE (src, tgt, rel)
  );

After inserts:

select * from edges;
 src | tgt |    rel    | rel_type 
-----+-----+-----------+----------
   1 |   2 | 5.4.2.2   |        2
   2 |   3 | 5.3.1.9   |        2
   ...
   5 |   6 | 2.7.1.2   |        1
   5 |   6 | 2.7.1.147 |        1
   6 |   2 | 5.3.1.9   |        2
   6 |   3 | 5.3.1.9   |        2
   ...

I am using rel_type to specify edge directionality (0: undirected; 1: source to target; 2: bidirectional).

Hence, inserting (3, 2, '5.3.1.9', 2) is redundant with respect to the second entry above (for example) -- which already expresses the reciprocal relationship 2 --> 3 and 3 --> 2.

How can I add a constraint that prevents the insertion of those redundant, reciprocal relations -- ideally ON CONFLICT DO NOTHING?

Basically, something like (these don't work: first due to syntax; second due to other issues):

ALTER TABLE edges ADD CONSTRAINT duplicate_rel_check CHECK ((src, tgt) <> (tgt, src) WHERE rel_type = 2);

or

CREATE UNIQUE INDEX ON edges ( greatest(src, tgt, rel_type=2), least(tgt, src, rel_type=2) );
Victoria Stuart
  • 181
  • 2
  • 8

1 Answers1

4

A UNIQUE, multicolumn, partial expression index should do the trick:

CREATE UNIQUE INDEX ON edges (LEAST(src, tgt), GREATEST(src, tgt), rel)
WHERE rel_type = 2;

Works with INSERT ... ON CONFLICT DO NOTHING.

Related:

Aside: If those are IP addresses, consider the data type cidr or ip4 from the additional module ip4r for column rel.

Erwin Brandstetter
  • 175,982
  • 27
  • 439
  • 600
  • Awesome; thank you! Re: cidr: these are biomolecular data (5.4.2.2 etc. are enzyme codes, representing the edges: https://www.genome.jp/dbget-bin/www_bget?ec:5.4.2.2 ...). I have another table (nodes), that specifies the source and target nodes, e.g. C00113, d-glucose 1-phoshate, https://www.genome.jp/dbget-bin/www_bget?C00103) ... https://www.genome.jp/kegg-bin/show_pathway?org_name=hsadd&mapno=00010 – Victoria Stuart Jun 21 '19 at 02:08
  • 1
    @VictoriaStuart: I see. The sample just looked like IP addresses. :) – Erwin Brandstetter Jun 21 '19 at 02:34
  • I nonetheless much appreciated the heads-up on that datatype. ;-) – Victoria Stuart Jun 21 '19 at 03:27
  • @victoria: just curious - would those sorts of data not better be represented and queried as RDF triples ? – Albert Godfrind Jun 22 '19 at 07:21
  • @AlbertGodfrind: yes; I am storing data in PostgreSQL, and constructing relational property graphs from those data. – Victoria Stuart Jun 22 '19 at 13:57