Yesterday I had to write some code that goes through ~700.000 datasets (seperated into 6 tables) and denormalise them (see the yet unwritten Part II). As we're using DBIx::Class, I first used it. Even though I avoided some in/deflators and used
columns to only get the stuff I needed, the process took ages (~50 items per second or aprox 4 hours for the whole job). Well, 4 hours might be bearable, but this was only the Swiss dataset. The German one, which we have to tackle soon, is at least 10 times as big, and 40 hours is just a way too long runtime.
So I rewrote the core of the programm using raw DBI calls (I had several flashbacks to the 90's
And that was before koki told me to set
work_mem to a bigger value. I choose 30MB, and got another speedup to ~900 items per second. Sweet!
Of course, it's completely unfair to compare DBI with DBIx::Class performance wise, because DBIx::Class is so much nicer to work with. But if speed is an issue, dropping back to raw DBI is the way to go.
If you write raw SQL, don’t use DBI itself, use DBIx::Simple [p3rl.org]. It is chock full of sensibly named methods for fetching results in various forms, making client code much nicer to read.