Bulk downloading all episodes of a podcast

2021-01-03

In some regards, I'm a very old-school person. For example, I do not like the concept of streaming audio (via Spotify et al.). I want MP3s on my hard disk (and/or vinyl on my record player). I want access to my music when I'm offline (and I'm offline a lot) and without using a so-called smart phone (I prefer vintage USB-stick MP3 players). My partner thinks the same (I guess 25+ years of my propaganda had some influence...).

But "modern" sites make it rather hard to actually download content (even if it's free). They offer links to a myriad of apps, but often no download button. At least a lot of podcasts still provide an RSS feed. So when my partner cannot download a newly discovered podcast, she asks me if I can do it for her. I'm of course happy to do that, and it often takes only a few lines of code:

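#!/usr/bin/env perl

use strict;
use warnings;
use 5.030;

use XML::Feed;
use URI;
use String::Ident;

# Fetch and parse the feed given as the first command line argument
my $feed = XML::Feed->parse( URI->new( $ARGV[0] ) )
    or die XML::Feed->errstr;

for my $entry ( $feed->entries ) {
    # Skip entries that have no audio file attached
    next unless $entry->enclosure;

    # Stringify the issue date and strip the time part
    my $date = $entry->issued;
    $date =~ s/T.*$//;
    my $filename = join( '-', $date, String::Ident->cleanup( $entry->title ) ) . '.mp3';

    # Skip episodes that are already in the current directory
    next if -f $filename;

    # Quote the URL so characters like & or ? survive the shell
    say "wget -O $filename '" . $entry->enclosure->url . "'";
}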

Here I use XML::Feed to fetch and parse the RSS feed, whose URL is passed in as the first command line argument. I create a nice filename based on the date the podcast was issued (removing the time-part) and a cleanup()ed version of the title. (String::Ident is a nice little helper module Jozef created for a project we were working on some time ago.)

If the filename already exists in the current directory, we skip the episode, because we don't need to download it again.

Then I output a wget command to download the URL (provided by $entry->enclosure->url) and save it under the nice filename.

Why do I not download the file directly in the script?

I just find it easier to use an external tool, especially as I like to pipe the output of this script into a file, so I can munge the file a bit. E.g., for this podcast, I did not download all 131 episodes, but only the 5 oldest and the 5 newest:

~/media/podcasts$ fetch_podcast.pl https://example.com/podcast.rss > all
~/media/podcasts$ head -n 5 all > test_it
~/media/podcasts$ tail -n 5 all >> test_it
~/media/podcasts$ bash test_it
...
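
Because every line in the file is a plain wget command with the date at the start of the filename, other selections are just as easy. Here is a minimal sketch of picking one year with grep. The three command lines, their filenames and URLs are made up for illustration and only stand in for the real "all" file, and I work in a scratch directory so nothing real gets overwritten:

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"

# Hypothetical stand-in for the "all" file written by the script;
# filenames and URLs are invented for this example.
cat > all <<'EOF'
wget -O 2020-12-20-some-episode.mp3 'https://example.com/ep3.mp3'
wget -O 2020-06-01-another-episode.mp3 'https://example.com/ep2.mp3'
wget -O 2019-11-15-first-episode.mp3 'https://example.com/ep1.mp3'
EOF

# Keep only the episodes issued in 2020; the date is the first part
# of the filename, right after "wget -O ".
grep '^wget -O 2020-' all > only_2020

# Look at the selection before handing it to bash
cat only_2020
```

If the selection looks right, "bash only_2020" would start the downloads, just like "bash test_it" above.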

Nice and easy!