1

I had a strange error where I did something like the following:

SELECT *
FROM table1
WHERE NAME COLLATE LATIN1_General_CI_AS NOT IN (select name FROM table2)

Name was a nvarchar(50) null. Table2 had null values in the name column. It would then match names that did not exist in table2 because of the null values. Meaning that this would return 0 results. And if I did:

SELECT *
FROM table1
WHERE NAME COLLATE LATIN1_General_CI_AS NOT IN (
  select name FROM table2 where name is not null
)

I would get some results.

One table had collation Danish_Norwegian_CI_AS and the other one latin1. Maybe the COLLATE is the source of the problem?

Can anyone explain why this happens? Maybe something with set theory?

marc_s
  • 704,970
  • 168
  • 1,303
  • 1,425
Anders Rune Jensen
  • 3,707
  • 2
  • 40
  • 53

3 Answers3

1

NULL does not work as some folks would expect when used in a NOT IN or NOT EXISTS expression.

Basically, SQL Server can't say for sure that something isn't in a subset if the subset contains NULL since NULL is a state of having an unknown value.

I don't think collation even enters into it, the queries are different.

See a full explanation of NULL in NOT IN here.

Community
  • 1
  • 1
JNK
  • 60,688
  • 15
  • 118
  • 136
1

I think I would work around the issue by using

SELECT * 
  FROM table1 t1
 WHERE NOT EXISTS ( SELECT *
                      FROM table2 t2
                     WHERE (t1.NAME COLLATE LATIN1_General_CI_AS = t2.name)
                        OR (t1.NAME IS NULL AND t2.name IS NULL))

No fancy theory though, just practical experience =P

deroby
  • 5,764
  • 2
  • 18
  • 31
1

It has nothing to do with collation. Try set ansi_nulls off and the first query will work as you expected.

The reason is that when you try to compare something to NULL (e.g. a = NULL or a <> NULL) the result is UNKNOWN and the entire query fails. If you set ansi_nulls to off then the same comparison evaluates to TRUE or FALSE and the query "works".

In other words:

WHEN a NOT IN (1, 2, null)

means

WHEN a <> 1 AND a <> 2 AND a <> null

WHERE the last part a <> null evaluates to UNKNOWN and the query fails to return any rows.

lalibi
  • 3,022
  • 3
  • 31
  • 40