Koha
Permalink for this talk:
https://domm.plix.at/talks/2021_cic_koha
Koha
"The world's first free and open source library system"
library as in a place to borrow books
"Koha is a fully featured, scalable library management system. Development is sponsored by libraries, volunteers, and support companies worldwide."
Perl!
Who has heard of Koha?
Who is using Koha?
Who is working on Koha?
Very big community
Yearly conferences (~ 200 attendees)
and other meetings and workshops in various countries and languages
Yesterday D-A-CH community meeting with 150 attendees
Used by more than 15,000 libraries worldwide
https://hea.koha-community.org/
"This website displays Koha usage statistics. With permission from the libraries, the following data has been collected from their installed systems."
40,000,000 users!
Steirische Landesbibliothek
Library of the state of Styria
founded 1812
regular library where you can borrow the latest crime novel
28,000 users
They are currently migrating from an old, proprietary system to Koha
and hired Mark, David and me to set up and adapt Koha and migrate all their old data
Koha
Stable release twice a year
21.05 (May 2021)
20.11 (November 2020)
Debian packages
extensive documentation
mailing lists & IRC
Perl, Apache, MariaDB
"Intranet" - for librarians
"OPAC" - for patrons
Intranet
Z39.50
protocol for searching and retrieving information from a database
from the 1980s
If you think SOAP is bad, take a look at Z39.50
SRU
Search/Retrieve via URL
fancy librarian slang for a GET request
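To make the "GET request" point concrete, here is a minimal sketch of building an SRU searchRetrieve URL with core Perl only. The base URL is a made-up placeholder, and the helper names `uri_escape_simple` and `sru_url` are hypothetical. The parameter names (`version`, `operation`, `query`, `maximumRecords`) come from the SRU specification. Which CQL indexes (here `dc.title`) a given server accepts is an assumption.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 "unreserved" characters.
sub uri_escape_simple {
    my ($text) = @_;
    utf8::encode($text);    # operate on UTF-8 bytes, not characters
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $text;
}

# Build a searchRetrieve URL; the query is written in CQL.
sub sru_url {
    my ( $base, %args ) = @_;
    my @params = (
        version        => '1.1',
        operation      => 'searchRetrieve',
        query          => $args{query},
        maximumRecords => $args{max} // 10,
    );
    my @pairs;
    while ( my ( $name, $value ) = splice @params, 0, 2 ) {
        push @pairs, "$name=" . uri_escape_simple($value);
    }
    return "$base?" . join '&', @pairs;
}

# Hypothetical endpoint, for illustration only.
my $url = sru_url(
    'https://sru.example.org/catalog',
    query => 'dc.title = "Mythologie"',
    max   => 5,
);
print "$url\n";
```

Fetching that URL with any HTTP client should return an XML response containing the matching records, if the server supports the index used.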
OPAC
online public access catalog
what patrons use to browse the library and reserve / check-out media
Define all the things
Remember the big form?
I blame this guy...
Gottfried Wilhelm Leibniz, 1646 - 1716
refined the binary number system
came up with an early library classification system (Dewey's Decimal Classification came later, in 1876)
wanted to build an "Alphabet of human thought"
a universal way to represent and analyze ideas and relationships
where each idea gets a unique identifier
should allow reasoning to be reduced to calculation
rebranded 300 years later as Semantic Web
Identifiers: yes
Unique: no
Well defined information is like crack to librarians
They want all of it, and use it everywhere
which is why this form is so big
Authorities
Normalizing this data into a proper SQL schema seems like a lot of work
Which is why Koha is not doing it
MARC
01452nz a2200385nc 4500001000400000003000800004005001700012008004100029024005100070 035002200121035002200143035002200165035002900187040002100216040004000237042000900277 065001400286065001500300075001400315075001700329079001800346083003700364083003800401 083003800439083004100477083003700518150001500555450001700570550016700587670000600754 750012300760750012500883913004301008942001501051^^201^^AT-LBST^^20210604105614.0^^88 0701n||azznnbabn | ana |c^^7 ^_a4041005-5^_0http://d-nb.info/gnd/404100 5-5^_2gnd^^ ^_a(DE-101)4041005-5^^ ^_a(DE-101)040410056^^ ^_a(DE-588)4041005-5^^ ^_z(DE-588c)4041005-5^_9v:zg^^ ^_aAT-LBST^_cAT-LBST^^ ^_aDE-101^_cDE-101^_9r:DE-1 01^_bger^_d1150^^ ^_agnd1^^ ^_a3.1^_2sswd^^ ^_a17.2^_2sswd^^ ^_bs^_2gndgen^^ ^_ bsaz^_2gndspec^^ ^_ag^_qs^_uw^_uz^_uo^^04^_a201.3^_9d:3^_9t:2007-01-01^_222/ger^^04 ^_a292.13^_9d:2^_9t:2007-01-01^_222/ger^^04^_a293.13^_9d:2^_9t:2007-01-01^_222/ger^^ 04^_a299.16113^_9d:2^_9t:2007-01-01^_222/ger^^04^_a398.2^_9d:2^_9t:2007-01-01^_222/g er^^ ^_aMythologie^^ ^_aGöttersage^^ ^_0(DE-101)040751597^_0(DE-588)4075159-4^_0h ttps://d-nb.info/gnd/4075159-4^_aMythos^_4vbal^_4https://d-nb.info/standards/element set/gnd#relatedTerm^_wr^_iVerwandter Begriff^^ ^_aM^^ 7^_aMythology^_0(DLC)sh 85089 371^_0http://lccn.loc.gov/sh85089371^_2lcsh^_9v:MACS-Mapping. Bitte keine Änderungen vornehmen.^^ 7^_aMythologie^_0(FrPBN)FRBNF119325770^_0http://data.bnf.fr/11932577^_ 2ram^_9v:MACS-Mapping. Bitte keine Änderungen vornehmen.^^ ^_Sswd^_is^_aMythologie^ _0(DE-588c)4041005-5^^ ^_aTOPIC_TERM^^^]
"When MARC was created, the Beatles were a hot new group ..." - https://jorol.github.io/processing-marc
created in the 1960s
1999 updated to MARC21
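The raw record shown above looks opaque, but its layout is regular: a 24-character leader, a directory of 12-character entries (tag, field length, offset), then the field data. The following is a minimal sketch of decoding that layout with core Perl, not a replacement for a real MARC library. `build_record` and `parse_record` are hypothetical helper names, the sample record is a tiny made-up excerpt of the one above, and lengths are counted in characters, so the sketch only works for ASCII data.

```perl
use strict;
use warnings;

use constant {
    FIELD_END    => "\x1E",    # shown as ^^ in the dump above
    SUBFIELD_SEP => "\x1F",    # shown as ^_
    RECORD_END   => "\x1D",    # shown as ^]
};

# Illustration helper: serialize [tag, content] pairs into one record.
sub build_record {
    my @fields = @_;
    my ( $directory, $data ) = ( '', '' );
    for my $field (@fields) {
        my ( $tag, $content ) = @$field;
        my $body = $content . FIELD_END;
        # directory entry: tag (3), field length (4), start offset (5)
        $directory .= sprintf '%s%04d%05d', $tag, length($body), length($data);
        $data .= $body;
    }
    $directory .= FIELD_END;
    my $base   = 24 + length $directory;        # where the field data starts
    my $total  = $base + length($data) + 1;     # +1 for the record terminator
    my $leader = sprintf '%05dnz  a22%05dnc 4500', $total, $base;
    return $leader . $directory . $data . RECORD_END;
}

# Decode leader and directory, then slice each field out of the data area.
sub parse_record {
    my ($raw) = @_;
    my $leader    = substr $raw, 0, 24;
    my $base      = 0 + substr( $leader, 12, 5 );
    my $directory = substr $raw, 24, $base - 25;    # without its terminator
    my @fields;
    while ( $directory =~ /\G(...)(\d{4})(\d{5})/g ) {
        my ( $tag, $length, $offset ) = ( $1, $2, $3 );
        my $body = substr $raw, $base + $offset, $length - 1;
        if ( $tag =~ /^00/ ) {    # control fields have no subfields
            push @fields, { tag => $tag, value => $body };
            next;
        }
        my ( $indicators, @subfields ) = split /\x1F/, $body;
        push @fields, {
            tag        => $tag,
            indicators => $indicators,
            subfields  => [ map { [ substr( $_, 0, 1 ), substr( $_, 1 ) ] } @subfields ],
        };
    }
    return { leader => $leader, fields => \@fields };
}

my $raw = build_record(
    [ '001', '201' ],
    [ '150', '  ' . SUBFIELD_SEP . 'aMythologie' ],
    [ '750', ' 7' . SUBFIELD_SEP . 'aMythology' . SUBFIELD_SEP . '2lcsh' ],
);
my $record = parse_record($raw);

for my $field ( @{ $record->{fields} } ) {
    if ( exists $field->{value} ) {
        print "$field->{tag} $field->{value}\n";
        next;
    }
    my $subs = join ' ', map { "\$$_->[0] $_->[1]" } @{ $field->{subfields} };
    print "$field->{tag} [$field->{indicators}] $subs\n";
}
```

Real records contain UTF-8 data and the directory counts bytes, which is one of several reasons to use an established module in practice.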
XML version
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
  <leader>01452nz a2200385nc 4500</leader>
  <controlfield tag="001">201</controlfield>
  <controlfield tag="003">AT-LBST</controlfield>
  <controlfield tag="005">20210604105614.0</controlfield>
  <controlfield tag="008">880701n||azznnbabn | ana |c</controlfield>
  <datafield tag="024" ind1="7" ind2=" ">
    <subfield code="a">4041005-5</subfield>
    <subfield code="0">http://d-nb.info/gnd/4041005-5</subfield>
    <subfield code="2">gnd</subfield>
  </datafield>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(DE-101)4041005-5</subfield>
  </datafield>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(DE-101)040410056</subfield>
  </datafield>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(DE-588)4041005-5</subfield>
  </datafield>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="z">(DE-588c)4041005-5</subfield>
    <subfield code="9">v:zg</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">AT-LBST</subfield>
    <subfield code="c">AT-LBST</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">DE-101</subfield>
    <subfield code="c">DE-101</subfield>
    <subfield code="9">r:DE-101</subfield>
    <subfield code="b">ger</subfield>
    <subfield code="d">1150</subfield>
  </datafield>
  <datafield tag="042" ind1=" " ind2=" ">
    <subfield code="a">gnd1</subfield>
  </datafield>
  <datafield tag="065" ind1=" " ind2=" ">
    <subfield code="a">3.1</subfield>
    <subfield code="2">sswd</subfield>
  </datafield>
  <datafield tag="065" ind1=" " ind2=" ">
    <subfield code="a">17.2</subfield>
    <subfield code="2">sswd</subfield>
  </datafield>
  <datafield tag="075" ind1=" " ind2=" ">
    <subfield code="b">s</subfield>
    <subfield code="2">gndgen</subfield>
  </datafield>
  <datafield tag="075" ind1=" " ind2=" ">
    <subfield code="b">saz</subfield>
    <subfield code="2">gndspec</subfield>
  </datafield>
  <datafield tag="079" ind1=" " ind2=" ">
    <subfield code="a">g</subfield>
    <subfield code="q">s</subfield>
    <subfield code="u">w</subfield>
    <subfield code="u">z</subfield>
    <subfield code="u">o</subfield>
  </datafield>
  <datafield tag="083" ind1="0" ind2="4">
    <subfield code="a">201.3</subfield>
    <subfield code="9">d:3</subfield>
    <subfield code="9">t:2007-01-01</subfield>
    <subfield code="2">22/ger</subfield>
  </datafield>
  <datafield tag="083" ind1="0" ind2="4">
    <subfield code="a">292.13</subfield>
    <subfield code="9">d:2</subfield>
    <subfield code="9">t:2007-01-01</subfield>
    <subfield code="2">22/ger</subfield>
  </datafield>
  <datafield tag="083" ind1="0" ind2="4">
    <subfield code="a">293.13</subfield>
    <subfield code="9">d:2</subfield>
    <subfield code="9">t:2007-01-01</subfield>
    <subfield code="2">22/ger</subfield>
  </datafield>
  <datafield tag="083" ind1="0" ind2="4">
    <subfield code="a">299.16113</subfield>
    <subfield code="9">d:2</subfield>
    <subfield code="9">t:2007-01-01</subfield>
    <subfield code="2">22/ger</subfield>
  </datafield>
  <datafield tag="083" ind1="0" ind2="4">
    <subfield code="a">398.2</subfield>
    <subfield code="9">d:2</subfield>
    <subfield code="9">t:2007-01-01</subfield>
    <subfield code="2">22/ger</subfield>
  </datafield>
  <datafield tag="150" ind1=" " ind2=" ">
    <subfield code="a">Mythologie</subfield>
  </datafield>
  <datafield tag="450" ind1=" " ind2=" ">
    <subfield code="a">Göttersage</subfield>
  </datafield>
  <datafield tag="550" ind1=" " ind2=" ">
    <subfield code="0">(DE-101)040751597</subfield>
    <subfield code="0">(DE-588)4075159-4</subfield>
    <subfield code="0">https://d-nb.info/gnd/4075159-4</subfield>
    <subfield code="a">Mythos</subfield>
    <subfield code="4">vbal</subfield>
    <subfield code="4">https://d-nb.info/standards/elementset/gnd#relatedTerm</subfield>
    <subfield code="w">r</subfield>
    <subfield code="i">Verwandter Begriff</subfield>
  </datafield>
  <datafield tag="670" ind1=" " ind2=" ">
    <subfield code="a">M</subfield>
  </datafield>
  <datafield tag="750" ind1=" " ind2="7">
    <subfield code="a">Mythology</subfield>
    <subfield code="0">(DLC)sh 85089371</subfield>
    <subfield code="0">http://lccn.loc.gov/sh85089371</subfield>
    <subfield code="2">lcsh</subfield>
    <subfield code="9">v:MACS-Mapping. Bitte keine Änderungen vornehmen.</subfield>
  </datafield>
  <datafield tag="750" ind1=" " ind2="7">
    <subfield code="a">Mythologie</subfield>
    <subfield code="0">(FrPBN)FRBNF119325770</subfield>
    <subfield code="0">http://data.bnf.fr/11932577</subfield>
    <subfield code="2">ram</subfield>
    <subfield code="9">v:MACS-Mapping. Bitte keine Änderungen vornehmen.</subfield>
  </datafield>
  <datafield tag="913" ind1=" " ind2=" ">
    <subfield code="S">swd</subfield>
    <subfield code="i">s</subfield>
    <subfield code="a">Mythologie</subfield>
    <subfield code="0">(DE-588c)4041005-5</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">TOPIC_TERM</subfield>
  </datafield>
</record>
Koha stores most of the bibliographic data as MARC21 in a text blob field
This makes the DB schema rather easy, but querying can be a bit cumbersome
Which is why they use external search engines (Elasticsearch, Zebra) for most searches
MARC is horrible and quite awesome at the same time
Because librarians invented it, it is very well documented
https://www.loc.gov/marc/bibliographic/bd100.html
Perl has a lot of good modules to work with MARC
MARC::Record
Catmandu - a data toolkit
You can use Catmandu to transform / convert various data formats used in libraries
But it's also a nice tool to convert JSON, CSV and other "normal" formats
Koha & Perl
Koha is old
first released in 2000
developed primarily by librarians
"accidental programmers"
Some of my next comments might sound harsh
But I want to make very clear that I find it amazing that the Koha community has managed to keep their project going for 20 years
And slowly modernize their code base
And to grow to 15,000 installations with 40,000,000 users
https://github.com/Koha-Community/Koha
C4  INSTALL  Koha  Koha.pm  LICENSE  MANIFEST.SKIP  Makefile.PL  README  README.md  README.robots
about.pl  acqui  admin  api  app.psgi  authorities  basket  bin  catalogue  cataloguing
changelanguage.pl  circ  clubs  course_reserves  cpanfile  debian  docs  errors  etc
fix-perl-path.PL  gulpfile.js  help.pl  ill  installer  koha-tmpl  koha_perl_deps.pl
kohaversion.pl  labels  mainpage.pl  members  misc  offline_circ  opac  package.json
patron_lists  patroncards  plugins  pos  reports  reserve  reviews  rewrite-config.PL
rotating_collections  serials  services  skel  suggestion  svc  t  tags  tmp  tools
virtualshelves  xt  yarn.lock
Not using the standard layout for Perl code (lib, bin, ...)
Usually a code smell.
Let's look at some code
Rendering and handling the BIG FORM
cataloguing/addbiblio.pl
A .pl file, not a module?
file: cataloguing/addbiblio.pl
    #!/usr/bin/perl
    use Modern::Perl;
    use CGI q(-utf8);
file: cataloguing/addbiblio.pl 695ff
    # ========================
    # MAIN
    #=========================
    my $input = CGI->new;
    my $error = $input->param('error');
    my $biblionumber = $input->param('biblionumber'); # if biblionumber exists, it's a modif, not a new biblio.
    my $parentbiblio = $input->param('parentbiblionumber');
    my $breedingid = $input->param('breedingid');
    my $z3950 = $input->param('z3950');
    my $op = $input->param('op') // q{};
    my $mode = $input->param('mode');
    my $frameworkcode = $input->param('frameworkcode');
    my $redirect = $input->param('redirect');
    my $searchid = $input->param('searchid') // "";
    my $dbh = C4::Context->dbh;
    my $hostbiblionumber = $input->param('hostbiblionumber');
    my $hostitemnumber = $input->param('hostitemnumber');
    # fast cataloguing datas in transit
    my $fa_circborrowernumber = $input->param('circborrowernumber');
    my $fa_barcode = $input->param('barcode');
    my $fa_branch = $input->param('branch');
    my $fa_stickyduedate = $input->param('stickyduedate');
    my $fa_duedatespec = $input->param('duedatespec');
    $tagslib = &GetMarcStructure( 1, $frameworkcode );
    $usedTagsLib = &GetUsedMarcStructure( $frameworkcode );
    $mandatory_z3950 = GetMandatoryFieldZ3950($frameworkcode);
    $template->param(
        popup => $mode,
        frameworkcode => $frameworkcode,
        itemtype => $frameworkcode,
        borrowernumber => $loggedinuser,
        tab => scalar $input->param('tab')
    );
    output_html_with_http_headers $input, $cookie, $template->output;
CGI
mod_perl
Plack
REST API using Mojolicious
jQuery / Template::Toolkit
Ship of Theseus
"If you change each piece of a ship over time, is it still the same ship?" (Plutarch)
    # CGI / mod_perl
    ScriptAlias /cgi-bin/koha/ "/usr/share/koha/intranet/cgi-bin/"

    my $intranet = Plack::App::CGIBin->new( root => "$home/intranet/cgi-bin" )->to_app;

    # Plack
    ProxyPass /cgi-bin/koha "http://koha:5005/intranet"
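For context on what changes when a CGI script moves behind Plack: under PSGI, the interface Plack implements, an application is just a code reference that receives the request environment and returns status, headers and body. The sketch below uses core Perl only and a hand-rolled loop as a stand-in for the server. It is not Koha's actual setup, and the paths are only examples.

```perl
use strict;
use warnings;

# A PSGI application: a code reference that is called once per request
# and returns [ status, headers, body ].
my $app = sub {
    my ($env) = @_;
    my $path = $env->{PATH_INFO} // '/';
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        ["Hello from $path\n"],
    ];
};

# Stand-in for the server: one long-lived process calls $app repeatedly.
# A real PSGI server passes a much larger environment hash.
for my $path ( '/mainpage.pl', '/about.pl' ) {
    my $response = $app->( { REQUEST_METHOD => 'GET', PATH_INFO => $path } );
    printf "%d %s", $response->[0], $response->[2][0];
}
```

The important difference from plain CGI is the lifetime: a CGI script is a fresh process per request, while here the same process serves many requests, so anything a script does to the process as a whole affects later requests as well.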
A few versions ago, RabbitMQ was added to handle long-running jobs
But not all long-running jobs have been converted to the new background-job system
Here's how "old" background jobs are implemented:
    if (my $pid = fork) {
        # parent
        # return job ID as JSON
        my $reply = CGI->new("");
        print $reply->header(-type => 'text/html');
        print '{"jobID":"' . $jobID . '"}';
        exit 0;
    } elsif (defined $pid) {
        # child
        # close STDOUT/STDERR to signal to end CGI session with Apache
        # Otherwise, the AJAX request to this script won't return properly
        close STDOUT;
        close STDERR;
    }
    # work continues here in fork ..
This does not work very well with Plack,
where exit 0 is not a good idea
because it kills the Plack server.
    ProxyPass /cgi-bin/koha "http://koha:5005/intranet"
    ProxyPass "/cgi-bin/koha/tools/manage-marc-import.pl" "!"
    ProxyPass /cgi-bin/koha "http://koha:5005/intranet"
Don't ProxyPass to Plack, but run as plain, old CGI
This is ugly, but works
(A secret Perl motto?)
And of course this script can be converted to the new background job method
changing another plank of the Koha ship
Summary
Koha is a fascinating piece of software
Used by a lot of people
Could be better integrated into the Perl community
Could probably benefit from a few more Perl / webdev experts