0

So I need to clean up some DB entries that have mistakenly been created with trailing and/or leading whitespaces.

I wrote a small query that gives us all the duplicates, their IDs and what projects they belong to.

(cf. https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=4cf4b42210f48d571acd416123fb9cfb )

I now want to filter by how many distinct projects the duplicates belong to as we have to treat them differently if they're in different projects.

Basically, I'm looking for <unique elements of duplicates_projects> in

WITH all_entries AS (
    SELECT
        e.id AS id,
        trim(e.name) AS name,
        e.project_id AS project_id
    FROM
        entries e
),
duplicates_count AS (
    SELECT
       name,
       count(1) AS matches
    FROM all_entries
    GROUP BY name
    HAVING count(1) > 1
),
all_duplicates AS (
    SELECT
        name,
        matches,
        ARRAY (SELECT e.id FROM all_entries e WHERE e.name = dc.name) AS duplicates_ids,
        ARRAY (SELECT e.project_id FROM all_entries e WHERE e.name = dc.name) AS duplicates_projects
    FROM duplicates_count dc
)
SELECT * FROM all_duplicates WHERE array_length(<unique elements of duplicates_projects>, 1) > 1;
;

I did find eliminate duplicate array values in postgres, but

aggregate functions are not allowed in WHERE

and

SELECT * FROM all_duplicates d
WHERE
    array_length(
        ARRAY(SELECT DISTINCT UNNEST(a.duplicates_projects) FROM all_duplicates a WHERE a.name = d.name), 1
    ) > 1
;

works but seems overly complicated.

Is there a simpler way to do this?

User1291
  • 6,823
  • 6
  • 45
  • 90

0 Answers0