hatefull mysql

GAAA!!!

Yesterday I got a "funny" bug report: If the user 'märrie' logs in, she gets the account of user 'marrie'. As the system in question gets it's data from a combination of different sources (SAP, a semi-external single sign on system, some local data) I expected some hellish encoding problems (which would be strange, because we switched the whole system over to utf8 two years ago).

So I start my bug-hunt at the lowest level and connect to the MySQL DB. I do a quick "select username from users where username = 'märrie"' (just for basic sanity checking) and get back:

marrie

märrie

WTF??

To cut a long bug-hunt short: MySQL considers 'ä' and 'a' to be the same character (at least when using default utf8 settings) - not only for sorting, where this makes a little bit sense, but also for selecting, which is totally pointless.

The solution: use a collate of 'utf8_bin':

mysql> SELECT 'ä' = 'a' COLLATE utf8_general_ci;



+------------------------------------+

| 'ä' = 'a' COLLATE utf8_general_ci |

+------------------------------------+

| 1 |

+------------------------------------+

1 row in set (0.00 sec)



mysql> SELECT 'ä' = 'a' COLLATE utf8_bin;

+-----------------------------+

| 'ä' = 'a' COLLATE utf8_bin |

+-----------------------------+

| 0 |

+-----------------------------+

1 row in set (0.00 sec)

/me hates MySQL!

Original: http://use.perl.org/~domm/journal/37070

Legacy comments

Aristotle: Mmmmh (orignal post)

$ cd projects/MahDayJobb/

$ cat sql/schemapatch-00039.sql

ALTER DATABASE CHARACTER SET = 'utf8' COLLATE = 'utf8_unicode_ci';