2

I'm querying on this table:

SKU
aaaa
bbbb
bbbb
NULL

Here's the query:

select *
from TEST as N                                      
where  N.SKU NOT IN (select SKU
                 from TEST
                 group by SKU
                 having count(*)>1);

I expect the query returns 'aaaa', however, it returns nothing.

The reason I expect that is because the subquery below only returns 'bbbb':

select SKU
from TEST
group by SKU
having count(*)>1   

Therefore, 'aaaa' NOT IN the subquery result.

To show the bug please copy and paste these statements in your MySQL IDE to create schema:

drop table if exists TEST;
create table TEST(
SKU varchar(255)
);
insert into TEST values('aaaa'),('bbbb'),('bbbb'),(NULL);

Even more confusing, try this new table below, re-run the query, and check result:

drop table if exists TEST;
create table TEST(
SKU varchar(255)
);
insert into TEST values('aaaa'),('cccc'),('dddd'),('bbbb'),('bbbb'),(NULL);

Expect 'aaaa', 'cccc', 'dddd' are all returned, however 'aaaa' is missing, yet 'cccc' and 'dddd' are returned.

Shawn
  • 21
  • 2
  • 1
    Which version of MySQL? – ypercubeᵀᴹ Feb 27 '18 at 17:36
  • @ypercubeᵀᴹ 5.6.37 – Shawn Feb 27 '18 at 17:43
  • You most probably have 2 rows with NULL. – ypercubeᵀᴹ Feb 27 '18 at 17:44
  • It could be a bug of course. I don't have 5.6.37 to test – ypercubeᵀᴹ Feb 27 '18 at 17:51
  • I just tried this out in 5.7.12 for Windows and I get nothing – RolandoMySQLDBA Feb 27 '18 at 17:52
  • @ypercubeᵀᴹ I just tested it in 5.7.19 for AWS. and I get nothing. – Shawn Feb 27 '18 at 18:23
  • Does the table have a primary key? – ypercubeᵀᴹ Feb 27 '18 at 19:06
  • Note: Deleting the NULL values from table TEST restores the expected behavior. Also, error appears to be independent of the column type. I tried a similar test with INT instead of VARCHAR and got the same behavior. – Willem Renzema Feb 27 '18 at 19:06
  • @ypercubeᵀᴹ The TEST table does not, based on the schema provided. I tried adding another column as an auto increment primary key and there was no change in behavior. – Willem Renzema Feb 27 '18 at 19:10
  • @Shawn Related: https://bugs.mysql.com/bug.php?id=39046 and https://dev.mysql.com/doc/refman/5.7/en/all-subqueries.html However, the linked bug is not exactly the same, as the situation you have reported should have no NULL values in the subquery. After digging as much as I could I could find nothing in the documentation about HAVING behaving differently in a subquery vs a normal SELECT. Hopefully this information is useful when submitting your bug report. – Willem Renzema Feb 27 '18 at 19:50
  • @WillemRenzema I see the table in question. I'm asking for the real problem because for the specific table, I'd use just select sku from test group by sku having count(*)=1;. The bug - if there is one - is probably due to one (or a combination of) NOT IN, nulls, HAVING and correlated subqueries. For the real table, I'd try a NOT EXISTS query but that would require a UNIQUE column. That's why I asked for the PK. – ypercubeᵀᴹ Feb 27 '18 at 21:22
  • @ypercubeᵀᴹ so the real scenario is to get all the items in NetSuite whose SKU has no duplicates. NetSuite internal_ID is the key. SKU arguably should be key, at least it is unique. The original query is: select *
    from (select * from NS_DOWNLOAD where NS_DOWNLOAD.INACTIVE = 'No' ) as N
    
    where  N.SKU NOT IN (select SKU
              from (select * from NS_DOWNLOAD where NS_DOWNLOAD.INACTIVE = 'No' ) as K
              group by SKU
              having count(*)>1);
    
    – Shawn Feb 27 '18 at 23:18
  • @ypercubeᵀᴹ In my case, real issue was: We have over 140000 items, I was expecting the whole query returns 72929 items, however it returns 72928 items. One expected item was missing. Each attribute of this missing item is good, I checked it out. I've also checked query, seemed correct. So I've checked several other expected items with same properties as this missing one, they were good, they were in the results. No idea why only this one is missing. Turns out one of the 140000 items has NULL in its SKU. Remove that bad one, do the query again, missing item is back, weird. – Shawn Feb 27 '18 at 23:39
  • @ypercubeᵀᴹ I reproduced the problem I met by having edited the post and attached a new test case example. – Shawn Feb 28 '18 at 00:02
  • NULLs behave differently with many operators - differently than one would expect. Esepcially Nulls and NOT IN is a common cause of trouble. I'd write your query like this: select n.* from NS_DOWNLOAD as n where n.INACTIVE = 'No' and not exists (select * from NS_DOWNLOAD as b where b.INACTIVE = 'No' and b.sku = n.sku and b.internal_ID <> n.internal_ID);. The nulls should be in the result as well - even if they are a million! - (and index on (inactive, sku, internal_id) might be useful for efficiency.) – ypercubeᵀᴹ Feb 28 '18 at 01:04
  • @Shawn - is the missing one (of 72929) the first or last -- either alphabetically or in order of insertion? (Just looking for clues.) – Rick James Mar 05 '18 at 21:22
  • @Rick James neither first nor last... nor random. The same row somewhere in the middle was missing. guessing it might relate to some hash function thing, I don't know. – Shawn Mar 15 '18 at 22:42

2 Answers2

2

This problem looks weird.

It reminds me of a post I wrote 7 years ago : Problem with MySQL subquery

Sometimes, values in subquery results may disappear intermittently while processing.

This problem was usually associated with non-SELECT queries involving subqueries.

This looks like something even worse. You should file a bug report on this one.

RolandoMySQLDBA
  • 182,700
  • 33
  • 317
  • 520
0

This works OK, if subquery returns only one row:

SELECT * 
FROM TEST as N 
WHERE N.SKU NOT IN (
  CAST((select SKU from TEST group by SKU having count(*)>1) AS CHAR)
)

However, if it returns more then 1 row, error will be triggered. Another solution (if the subquery returns more than 1 row):

SELECT * 
FROM TEST as N 
WHERE N.SKU IN (
  (select SKU from TEST group by SKU having count(*) <= 1)
)
Boolean_Type
  • 101
  • 2