Fixing a non-broken search form

I recently had to take over a project from a friend of mine under not very happy circumstances. Even so, I have to say that the code looks exactly the way you would imagine after ten years of development by a single person...

My first task (after setting everything up on my laptop, deleting a lot of *.bak and *~ files, and putting the rest into git) was to handle a complaint by a user. She complained that a certain search did not yield any results, even though it should definitely find some stuff.

So I started to reproduce the bug on my local system by entering the search term ("die. Zeitung für Erwachsenenbildung"). No results. My first thought was that "die." (the German article, not the English verb!) was some sort of stop-word. But a search for "Zeitung für Erwachsenenbildung" also had no results. In fact, the system informed me that the search for "zeitung AND für AND erwachsenenbildung" had no results.

My next suspect was the Umlaut-ü in "für". I tried "zeitung erwachsenenbildung". No result. Now I started guessing. "zeitung erwachsenen*". Results! So wildcards were working! But lots of the results were irrelevant, so I tacked the "die" back in, and now I got the expected result back.

Now that I had results I took a look at the resulting articles, and copy'n'pasted the exact title of this damn Zeitung into the search box. And I got the correct results! WTF! I checked, double-checked and triple-checked the search string. Every time I typed in the query (making very sure I typed it in exactly): No hits. Every time I pasted it: Correct hits.

So I started to assume that the title of the thing I was searching for was somehow invisibly different from what I typed. Again I suspected a different encoding of the "ü". So I hit the database directly (via psql) to retrieve the value that was stored there. The "ü" was ok, but there was an extra invisible character at the end of "Erwachsenenbildung"! That explains everything: the pasted title carried the invisible character along and matched, my typed version did not, and the wildcard search only needed the beginning of the word. No idea how they entered it, but I assume a copy'n'paste mistake...
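If you want to hunt for this kind of thing without squinting at psql output, a few lines of Python can list the suspicious characters in a string. This is only a sketch: `find_invisible` is a name I made up, and since I never recorded which character was actually stored, the zero-width space (U+200B) below is just a stand-in.

```python
import unicodedata

def find_invisible(text):
    """Return (index, code point, name) for characters you cannot see on screen."""
    suspicious = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        # Cc = control, Cf = format (e.g. zero-width space),
        # Z* = separators; a plain ASCII space is expected and skipped.
        if cat in ("Cc", "Cf") or (cat.startswith("Z") and ch != " "):
            name = unicodedata.name(ch, "<unnamed>")
            suspicious.append((i, f"U+{ord(ch):04X}", name))
    return suspicious

typed = "Zeitung für Erwachsenenbildung"
# Hypothetical stored value: same title plus a trailing zero-width space.
stored = typed + "\u200b"

print(find_invisible(typed))
print(find_invisible(stored))
```

The typed string comes back clean, while the stored one reports the extra character at the very end, which is exactly the difference you cannot see in the search box.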

The fix: told them to correct the data.
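For the record, this is not what I did, but one could also clean such values before they reach the database. A minimal sketch, assuming that control and format characters never belong in a title (which is not true for every language, so treat it with care); `strip_invisible` is a hypothetical helper, not something from the project:

```python
import unicodedata

def strip_invisible(text):
    """Drop control/format characters, then trim surrounding whitespace."""
    cleaned = "".join(
        ch for ch in text
        # Cc = control characters, Cf = format characters such as U+200B
        if unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # str.strip() also removes Unicode whitespace such as the no-break space.
    return cleaned.strip()

print(repr(strip_invisible("Erwachsenenbildung\u200b")))
```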

Moral of this story: It's always the user's fault :-)