/ domm

10.05.2012: Don't use Cache::Memcached for UTF8 strings

Yesterday we (Maroš and I) encountered a strange bug in our mail sending code. The first time a user requested a password reset, she got a nice mail. The second time, it was double-encoded garbage. After some research we figured out that the recently added caching of the mail template was causing the problem.

We put a perfectly valid UTF-8 string into memcached via Cache::Memcached, but we got back garbage. Of course we first assumed it was our mistake and added even more tests. But finally we did the sensible thing and inspected the UTF-8 flag using Encode::is_utf8 and the wonderful Devel::Peek.
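Such an inspection can be sketched roughly like this. The variable names are made up for illustration, and the byte string merely stands in for what a cache might hand back; this is not our actual mail code.

```perl
use strict;
use warnings;
use Encode ();
use Devel::Peek;

# A character string: decode() returns a scalar with the UTF8 flag set.
my $chars = Encode::decode('UTF-8', "t\xc3\xb6st");

# The very same bytes, but as a plain byte string without the flag.
my $bytes = "t\xc3\xb6st";

# Dump() writes Perl's internal view of each scalar to STDERR.
Dump($chars);
Dump($bytes);

my $chars_flagged = Encode::is_utf8($chars) ? 1 : 0;
my $bytes_flagged = Encode::is_utf8($bytes) ? 1 : 0;

printf "chars: flag=%d length=%d\n", $chars_flagged, length $chars;  # flag=1 length=4
printf "bytes: flag=%d length=%d\n", $bytes_flagged, length $bytes;  # flag=0 length=5
```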

Before caching, we got (for a scalar containing "töst"):

SV = PV(0x13fdc20) at 0x141d860
  REFCNT = 1
  PV = 0x1416b60 "t\303\266st"\0 [UTF8 "t\x{f6}st"]
  CUR = 5
  LEN = 16

After, we got:

SV = PV(0x1e65c20) at 0x1e85818
  REFCNT = 1
  PV = 0x1e7eb60 "t\303\266st"\0
  CUR = 5
  LEN = 16

Still the same bytes, but the UTF8 flag is gone, and thus the output is very broken. Remember the n-th Rule Of Unicode Handling In Perl? "Thou shalt not mix UTF-8 and non-UTF-8 strings!"
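Why the missing flag turns into double-encoded garbage can be shown without any cache at all. The sketch below assumes that the mail is encoded to UTF-8 once on its way out (our real setup is not shown here, and the variable names are invented). Without the flag, Perl treats each of the five bytes as a Latin-1 character and encodes the two bytes of the "ö" a second time.

```perl
use strict;
use warnings;
use Encode ();

# What we put into the cache: a character string (flag on, 4 characters).
my $before = Encode::decode('UTF-8', "t\xc3\xb6st");

# What came back: the same bytes, flag off (5 "characters" as far as Perl knows).
my $after = "t\xc3\xb6st";

# Assumed output step: encode to UTF-8 exactly once when sending the mail.
my $good_output = Encode::encode('UTF-8', $before);
my $bad_output  = Encode::encode('UTF-8', $after);

printf "good: %d bytes\n", length $good_output;  # 5 bytes: t, 0xC3 0xB6, s, t
printf "bad:  %d bytes\n", length $bad_output;   # 7 bytes: the "ö" got encoded twice
```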

The quick hack would have been to just add the UTF-8 flag to the scalar by doing Encode::_utf8_on($template), but this is evil, so we skipped that. Only then did we have the idea to look at the rt.cpan.org queue for Cache::Memcached. There we found a bug report covering exactly our problem, complete with a patch to fix it. We also learned that the maintainers apparently ignore the RT bug queue. Fair enough, their choice.

But as we weren't confident that this bug would be fixed soon, we looked for alternatives and found that Cache::Memcached::Fast comes with a utf8 option that fixes the handling of UTF-8 strings.
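A minimal sketch of the switch, assuming a memcached server on 127.0.0.1:11211 and a made-up cache key "mail_template" (neither is taken from our real config). The script skips the round trip if the module is not installed or the server is not reachable.

```perl
use strict;
use warnings;
use Encode ();

# Assumption: Cache::Memcached::Fast is installed; otherwise we skip.
my $have_module = eval { require Cache::Memcached::Fast; 1 };

my $roundtrip_ok;    # stays undef if we could not talk to a server
if ($have_module) {
    my $cache = Cache::Memcached::Fast->new({
        servers => ['127.0.0.1:11211'],    # assumed address, adjust as needed
        utf8    => 1,                      # convert character strings on set and get
    });

    my $template = Encode::decode('UTF-8', "t\xc3\xb6st");

    # set() returns a false value if the value could not be stored.
    if ($cache->set('mail_template', $template)) {
        my $cached = $cache->get('mail_template');
        $roundtrip_ok = defined $cached
            && Encode::is_utf8($cached)
            && $cached eq $template ? 1 : 0;
    }
}

print defined $roundtrip_ok
    ? ($roundtrip_ok ? "flag survived\n" : "flag lost\n")
    : "skipped (no module or no server)\n";
```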

So we're now using Cache::Memcached::Fast and the mails work!

P.S.: The whole lost UTF-8 flag problem only occurs when you retrieve a plain string from memcached. If you get a ref that contains UTF-8 strings, everything works, because apparently Storable (which is used by Cache::Memcached to flatten your data structure) does honor the UTF-8 flag.
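The Storable part is easy to check on its own, without memcached involved. This is just a sketch with a made-up hash and key name; it only shows that a freeze/thaw round trip keeps the flag on strings inside a data structure.

```perl
use strict;
use warnings;
use Encode ();
use Storable qw(nfreeze thaw);

# Hypothetical data structure holding a character string (flag on).
my $data = { body => Encode::decode('UTF-8', "t\xc3\xb6st") };

# Serialize and deserialize, as a cache does for references.
my $copy = thaw(nfreeze($data));

my $flag_kept = Encode::is_utf8($copy->{body}) ? 1 : 0;
my $same_text = $copy->{body} eq $data->{body} ? 1 : 0;

print "flag kept: $flag_kept, same text: $same_text\n";  # flag kept: 1, same text: 1
```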
