/ domm

I hack Perl for fun and
profit.

Atom Icom Follow me on Atom!
<<<<<<<<<<
30.09.2008: Twin City Perl Workshop 2008 CfPaper Deadline approaching
16.09.2008: TwinCity Perl Workshop "Travel Grants"
05.09.2008: Twin City Perl Workshop 2008 Call for Everything
05.09.2008: /me loves git
16.08.2008: Things I learned at YAPC::Europe 2008
15.08.2008: some impressions from YAPC::Europe 2008
10.08.2008: prepareing for YAPC::Europe 2008 in Copenhagen
31.07.2008: hatefull mysql

GAAA!!!

Yesterday I got a "funny" bug report: If the user 'märrie' logs in, she gets the account of user 'marrie'. As the system in question gets it's data from a combination of different sources (SAP, a semi-external single sign on system, some local data) I expected some hellish encoding problems (which would be strange, because we switched the whole system over to utf8 two years ago).

So I start my bug-hunt at the lowest level and connect to the MySQL DB. I do a quick "select username from users where username = 'märrie"' (just for basic sanity checking) and get back:

marrie

märrie

WTF??

To cut a long bug-hunt short: MySQL considers 'ä' and 'a' to be the same character (at least when using default utf8 settings) - not only for sorting, where this makes a little bit sense, but also for selecting, which is totally pointless.

The solution: use a collate of 'utf8_bin':

mysql> SELECT 'ä' = 'a' COLLATE utf8_general_ci;



+------------------------------------+

| 'ä' = 'a' COLLATE utf8_general_ci |

+------------------------------------+

| 1 |

+------------------------------------+

1 row in set (0.00 sec)



mysql> SELECT 'ä' = 'a' COLLATE utf8_bin;

+-----------------------------+

| 'ä' = 'a' COLLATE utf8_bin |

+-----------------------------+

| 0 |

+-----------------------------+

1 row in set (0.00 sec)

/me hates MySQL!

Original: http://use.perl.org/~domm/journal/37070

Legacy comments

Aristotle: Mmmmh (orignal post)

$ cd projects/MahDayJobb/

$ cat sql/schemapatch-00039.sql

ALTER DATABASE CHARACTER SET = 'utf8' COLLATE = 'utf8_unicode_ci';

Comments (via disqus)

22.07.2008: translations
04.07.2008: Two proposals for YAPC::Europe::2009
>>>>>>>>>>