Postgres: Insert distinct values from one table into another table with constrains in place

Question

I'm trying to insert data from one PostgreSQL 9.3 table to another while removing duplicate entries.

I have two tables containing email addresses.

The main table contains emails and a tag for each email address. The combination (email, tag) must be unique, and for this there is a constraint Unique(email, tag) on it.

The second table get created on the fly from text files which contain only the email address. There are many duplicates there.

I need to import the data from the temp table into the main one without breaking the constraint above. For a certain file containing email addresses the tag is constant.

Table structures:

CREATE TABLE emails (   
  email character varying(128),
  tag bigint,
  CONSTRAINT "unique-tag-email" UNIQUE (email, tag) )

and

CREATE TABLE emails_temp (email character varying(128)

Here is my query:

insert into emails(tag,email) 
select 
 655,t.email 
from 
 emails_temp as t 
where 
 not exists ( select email from emails where email = t.email )

Note: 655 is just a tag of a certain group of email addresses.

This is the error I get:

ERROR: duplicate key value violates unique constraint "unique-tag-email" SQL state: 23505 Detail: Key (email, tag)=(user@hotmail.com, 655) already exists.

There are indeed, two email addresses user@hotmail.com in the file.

Useless to say, because of this error, there is nothing added in the main table (email).

What am I doing wrong?

You are not doing anything wrong. You are getting what you coded. 1st There is a unique key. 2nd The where condition in the correlated query select results that violate the unique key....Since user@hotmail.com, 655 already exists, the constraint is violated and therefore the DML won't succeed — Jason Krs, Jan 09 '17 at 21:58
well, to insert distinct values you may want to try selecting distinct values: insert ... select distinct 655, t.email ... — mustaccio, Jan 09 '17 at 22:18
Thanks for the comments.However, I was under the assumption that the clause where not exists ( select email from emails where email = t.email ) is there exactly to prevent email addresses to be duplicated.
Select distinct doesn't work either. I get the same error as before. — Sorin, Jan 10 '17 at 09:14

score 5 · Accepted Answer · edited Apr 13 '17 at 12:42

5

There are 3 possible kinds of duplicates:

Duplicates within the rows of the bulk insert.
Duplicates between inserted rows and existing rows.
Duplicates between inserted rows and concurrently inserted / updated rows from other transactions.

Just like I explained in this closely related answer:

Using EXCEPTION to ignore duplicates during bulk inserts

But things have become easier for 2. and 3. since Postgres 9.5 introduced UPSERT (INSERT .. ON CONFLICT DO NOTHING).

INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM   emails_temp
ON CONFLICT (email) DO NOTHING;

If your duplicates only stem from duplicate entries in the source (1.), like you indicated, then all you need is DISTINCT. Works in any version of Postgres:

INSERT INTO emails(tag,email)
SELECT DISTINCT 655, email
FROM   emails_temp;

edited Apr 13 '17 at 12:42

Community

1

answered Jan 10 '17 at 05:34

Erwin Brandstetter

175,982
27
439
600

Erwin, thanks for the idea. I read about it too, but I can't upgrade to PSQL 9.5. I am stuck with 9.3 for some other reasons which I can't control. – Sorin Jan 10 '17 at 09:21
@Sorin: Seems like all you need is DISTINCT, really. I added a bit above. – Erwin Brandstetter Jan 10 '17 at 14:09
1

For previous versions without ON CONFLICT: INSERT INTO emails (tag,email) SELECT DISTINCT 655, email FROM emails_temp EXCEPT SELECT tag, email FROM emails WHERE tag = 655 ; – ypercubeᵀᴹ Jan 10 '17 at 15:25

Postgres: Insert distinct values from one table into another table with constrains in place

1 Answers1