35

These two querys gives me the exact same result:

select * from topics where name='Harligt';
select * from topics where name='Härligt';

How is this possible? Seems like mysql translates åäö to aao when it searches. Is there some way to turn this off?

I use utf-8 encoding everywhere as far as i know. The same problem occurs both from terminal and from php.

Josh Lee
  • 161,055
  • 37
  • 262
  • 269
Martin
  • 5,037
  • 11
  • 41
  • 60

5 Answers5

39

Yes, this is standard behaviour in the non-language-specific unicode collations.

9.1.13.1. Unicode Character Sets

To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect this has in comparisons or when doing searches, see Section 9.1.7.7, “Examples of the Effect of Collation”):

Ä = A Ö = O Ü = U

See also Examples of the effect of collation

You need to either

  • use a collation that doesn't have this "feature" (namely utf8_bin, but that has other consequences)

  • use a different collation for the query only. This should work:

     select * from topics where name='Harligt' COLLATE utf8_bin;
    

it becomes more difficult if you want to do a case insensitive LIKE but not have the Ä = A umlaut conversion. I know no mySQL collation that is case insensitive and does not do this kind of implicit umlaut conversion. If anybody knows one, I'd be interested to hear about it.

Related:

Community
  • 1
  • 1
Pekka
  • 431,103
  • 135
  • 960
  • 1,075
5

Since you are in Sweden I'd recommend using the Swedish collation. Here's an example showing the difference it makes:

CREATE TABLE topics (name varchar(100) not null) CHARACTER SET utf8;

INSERT topics (name) VALUES ('Härligt');

select * from topics where name='Harligt';
'Härligt'

select * from topics where name='Härligt';
'Härligt'    

ALTER TABLE topics MODIFY name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci;

select * from topics where name='Harligt';
<no results>

select * from topics where name='Härligt';
'Härligt'

Note that in this example I only changed the one column to Swedish collation, but you should probably do it for your entire database, all tables, all varchar columns.

Mark Byers
  • 767,688
  • 176
  • 1,542
  • 1,434
  • This should be working for other languages than swedish so i think utf8_bin is the collation i want, but thanks for letting me know that i can change collation on only one column, that will be really helpful. – Martin Apr 09 '10 at 12:27
  • Note that utf8_bin is case sensitive so "härligt" != "Härligt" (applies to unique indexes too). – serbaut Apr 09 '10 at 13:23
2

While collations are one way of solving this, the much more straightforward way seems to me to be the BINARY keyword:

 SELECT 'a' = 'ä', BINARY 'a' = 'ä'

will return 1|0

In your case:

SELECT * FROM topics WHERE BINARY name='Härligt';

See also https://www.w3schools.com/sql/func_mysql_binary.asp

E. Villiger
  • 821
  • 8
  • 25
1

you want to check your collation settings, collation is the property that sets which characters are identical.

these 2 pages should help you

http://dev.mysql.com/doc/refman/5.1/en/charset-general.html

http://dev.mysql.com/doc/refman/5.1/en/charset-mysql.html

chris
  • 9,335
  • 1
  • 25
  • 26
0

Here you can see some collation charts. http://collation-charts.org/mysql60/. I'm no sure which is the used utf8_general_ci though.

Here is the chart for utf8_swedish_ci. It shows which characters it interprets as the same. http://collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html

jiv-e
  • 465
  • 6
  • 8