Koha

Perl & Raku Conference (in the Cloud) Cyberspace || Vienna 2021-06-10
Thomas Klausner https://domm.plix.at domm AT plix.at

Permalink for this talk:
https://domm.plix.at/talks/2021_cic_koha

Koha

"The world's first free and open source library system"

library as in a place to borrow books

"Koha is a fully featured, scalable library management system. Development is sponsored by libraries, volunteers, and support companies worldwide."

Perl!

https://koha-community.org/

Who has heard of Koha?

Who is using Koha?

Who is working on Koha?

Very big community

Yearly conferences (~ 200 attendees)

and other meetings and workshops in various countries and languages

Yesterday: D-A-CH community meeting with 150 attendees

Used by more than 15,000 libraries worldwide

https://hea.koha-community.org/

"This website displays Koha usage statistics. With permission from the libraries, the following data has been collected from their installed systems."

40,000,000 users!

Steirische Landesbibliothek

Library of the state of Styria

founded 1812

regular library where you can borrow the latest crime novel

28,000 users

They are currently migrating from an old, proprietary system to Koha

and hired Mark, David and me to set up and adapt Koha and migrate all their old data

Koha

Stable release twice a year

2021.05

2020.11

Debian packages

extensive documentation

mailing lists & IRC

Perl, Apache, MariaDB

"Intranet" - for librarians

"OPAC" - for patrons

Intranet

Z39.50

protocol for searching and retrieving information from a database

from the 1980s

If you think SOAP is bad, take a look at Z39.50

SRU

Search/Retrieve via URL

fancy librarian slang for a GET request
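
To make the "just a GET request" point concrete, here is a minimal sketch that builds an SRU searchRetrieve URL using core Perl only. The endpoint https://sru.example.org/catalog, the CQL index dc.title and the marcxml record schema are assumptions for illustration; real servers differ in which versions, indexes and schemas they accept. escape_param and sru_url are made-up helper names.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 "unreserved" characters.
sub escape_param {
    my ($text) = @_;
    utf8::encode($text);    # operate on UTF-8 bytes
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build a searchRetrieve URL. $base is a hypothetical SRU endpoint.
sub sru_url {
    my ($base, $cql, $max) = @_;
    my @params = (
        version        => '1.1',
        operation      => 'searchRetrieve',
        query          => $cql,          # the search itself, written in CQL
        maximumRecords => $max,
        recordSchema   => 'marcxml',     # assumed schema name
    );
    my @pairs;
    while (my ($key, $value) = splice @params, 0, 2) {
        push @pairs, "$key=" . escape_param($value);
    }
    return "$base?" . join '&', @pairs;
}

print sru_url('https://sru.example.org/catalog', 'dc.title = "Mythologie"', 5), "\n";
```

The resulting URL can be fetched with any HTTP client; the response is an XML document wrapping the matching records.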

OPAC

online public access catalog

what patrons use to browse the library and reserve / check out media

Define all the things

Remember the big form?

I blame this guy...

Gottfried Wilhelm Leibniz, 1646 - 1716

refined the binary number system

came up with the Decimal Classification System

wanted to build an "Alphabet of human thought"

a universal way to represent and analyze ideas and relationships

where each idea gets a unique identifier

should allow reasoning to be reduced to calculation

rebranded 300 years later as Semantic Web

Identifiers: yes

Unique: no

Well-defined information is like crack to librarians

They want all of it, and use it everywhere

which is why this form is so big

Authorities

Normalizing this data into a proper SQL schema seems like a lot of work

Which is why Koha is not doing it

MARC

 01452nz  a2200385nc 4500001000400000003000800004005001700012008004100029024005100070
 035002200121035002200143035002200165035002900187040002100216040004000237042000900277
 065001400286065001500300075001400315075001700329079001800346083003700364083003800401
 083003800439083004100477083003700518150001500555450001700570550016700587670000600754
 750012300760750012500883913004301008942001501051^^201^^AT-LBST^^20210604105614.0^^88
 0701n||azznnbabn           | ana    |c^^7 ^_a4041005-5^_0http://d-nb.info/gnd/404100
 5-5^_2gnd^^  ^_a(DE-101)4041005-5^^  ^_a(DE-101)040410056^^  ^_a(DE-588)4041005-5^^ 
  ^_z(DE-588c)4041005-5^_9v:zg^^  ^_aAT-LBST^_cAT-LBST^^  ^_aDE-101^_cDE-101^_9r:DE-1
 01^_bger^_d1150^^  ^_agnd1^^  ^_a3.1^_2sswd^^  ^_a17.2^_2sswd^^  ^_bs^_2gndgen^^  ^_
 bsaz^_2gndspec^^  ^_ag^_qs^_uw^_uz^_uo^^04^_a201.3^_9d:3^_9t:2007-01-01^_222/ger^^04
 ^_a292.13^_9d:2^_9t:2007-01-01^_222/ger^^04^_a293.13^_9d:2^_9t:2007-01-01^_222/ger^^
 04^_a299.16113^_9d:2^_9t:2007-01-01^_222/ger^^04^_a398.2^_9d:2^_9t:2007-01-01^_222/g
 er^^  ^_aMythologie^^  ^_aGöttersage^^  ^_0(DE-101)040751597^_0(DE-588)4075159-4^_0h
 ttps://d-nb.info/gnd/4075159-4^_aMythos^_4vbal^_4https://d-nb.info/standards/element
 set/gnd#relatedTerm^_wr^_iVerwandter Begriff^^  ^_aM^^ 7^_aMythology^_0(DLC)sh 85089
 371^_0http://lccn.loc.gov/sh85089371^_2lcsh^_9v:MACS-Mapping. Bitte keine Änderungen
  vornehmen.^^ 7^_aMythologie^_0(FrPBN)FRBNF119325770^_0http://data.bnf.fr/11932577^_
 2ram^_9v:MACS-Mapping. Bitte keine Änderungen vornehmen.^^  ^_Sswd^_is^_aMythologie^
 _0(DE-588c)4041005-5^^  ^_aTOPIC_TERM^^^]
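
The wall of digits at the start of the raw record is a 24-character leader followed by a directory of 12-character entries (tag, field length, start offset). The sketch below shows that layout using core Perl only. It is not how Koha or MARC::Record do it, and build_record / parse_record are hypothetical helpers. Lengths in real MARC count bytes, so the sketch sticks to ASCII data.

```perl
use strict;
use warnings;

# ISO 2709 control characters used by binary MARC
my $FIELD_END  = "\x1e";    # field terminator, shown as ^^ in the dump
my $SUBFIELD   = "\x1f";    # subfield delimiter, shown as ^_
my $RECORD_END = "\x1d";    # record terminator, shown as ^]

# Build a tiny record from [tag, data] pairs so the example is self-contained.
sub build_record {
    my @fields = @_;
    my ($directory, $body) = ('', '');
    for my $field (@fields) {
        my ($tag, $data) = @$field;
        $data .= $FIELD_END;
        # directory entry: 3-char tag, 4-digit length, 5-digit start offset
        $directory .= sprintf '%s%04d%05d', $tag, length $data, length $body;
        $body .= $data;
    }
    my $base  = 24 + length($directory) + 1;    # where the field data starts
    my $total = $base + length($body) + 1;
    my $leader = sprintf '%05dnz  a22%05dnc 4500', $total, $base;
    return $leader . $directory . $FIELD_END . $body . $RECORD_END;
}

# Split a raw record into leader, control fields and data fields.
sub parse_record {
    my ($raw) = @_;
    my $leader    = substr $raw, 0, 24;
    my $base      = 0 + substr($leader, 12, 5);
    my $directory = substr $raw, 24, $base - 24 - 1;
    my @fields;
    while ($directory =~ /\G(\d{3})(\d{4})(\d{5})/g) {
        my ($tag, $length, $start) = ($1, $2, $3);
        my $data = substr $raw, $base + $start, $length - 1;    # drop terminator
        if ($tag lt '010') {
            # control fields have no indicators or subfields
            push @fields, { tag => $tag, value => $data };
        }
        else {
            my ($indicators, @subfields) = split /$SUBFIELD/, $data;
            push @fields, {
                tag       => $tag,
                ind       => $indicators,
                subfields => [ map { [ substr($_, 0, 1), substr($_, 1) ] } @subfields ],
            };
        }
    }
    return { leader => $leader, fields => \@fields };
}

my $raw = build_record(
    [ '001', '201' ],
    [ '150', "  ${SUBFIELD}aMythologie" ],
    [ '750', " 7${SUBFIELD}aMythology${SUBFIELD}2lcsh" ],
);
my $record = parse_record($raw);
for my $field (@{ $record->{fields} }) {
    if (exists $field->{value}) {
        print "$field->{tag} $field->{value}\n";
        next;
    }
    my @parts = map { sprintf '$%s %s', @$_ } @{ $field->{subfields} };
    print "$field->{tag} [$field->{ind}] @parts\n";
}
```

For real work, a module such as MARC::Record handles character encodings and the many edge cases this sketch ignores.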

"When MARC was created, the Beatles were a hot new group ..." - https://jorol.github.io/processing-marc

created in the 1960s

updated to MARC21 in 1999

XML version

 <?xml version="1.0" encoding="UTF-8"?>
 <record
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
     xmlns="http://www.loc.gov/MARC21/slim">
 
   <leader>01452nz  a2200385nc 4500</leader>
   <controlfield tag="001">201</controlfield>
   <controlfield tag="003">AT-LBST</controlfield>
   <controlfield tag="005">20210604105614.0</controlfield>
   <controlfield tag="008">880701n||azznnbabn           | ana    |c</controlfield>
   <datafield tag="024" ind1="7" ind2=" ">
     <subfield code="a">4041005-5</subfield>
     <subfield code="0">http://d-nb.info/gnd/4041005-5</subfield>
     <subfield code="2">gnd</subfield>
   </datafield>
   <datafield tag="035" ind1=" " ind2=" ">
     <subfield code="a">(DE-101)4041005-5</subfield>
   </datafield>
   <datafield tag="035" ind1=" " ind2=" ">
     <subfield code="a">(DE-101)040410056</subfield>
   </datafield>
   <datafield tag="035" ind1=" " ind2=" ">
     <subfield code="a">(DE-588)4041005-5</subfield>
   </datafield>
   <datafield tag="035" ind1=" " ind2=" ">
     <subfield code="z">(DE-588c)4041005-5</subfield>
     <subfield code="9">v:zg</subfield>
   </datafield>
   <datafield tag="040" ind1=" " ind2=" ">
     <subfield code="a">AT-LBST</subfield>
     <subfield code="c">AT-LBST</subfield>
   </datafield>
   <datafield tag="040" ind1=" " ind2=" ">
     <subfield code="a">DE-101</subfield>
     <subfield code="c">DE-101</subfield>
     <subfield code="9">r:DE-101</subfield>
     <subfield code="b">ger</subfield>
     <subfield code="d">1150</subfield>
   </datafield>
   <datafield tag="042" ind1=" " ind2=" ">
     <subfield code="a">gnd1</subfield>
   </datafield>
   <datafield tag="065" ind1=" " ind2=" ">
     <subfield code="a">3.1</subfield>
     <subfield code="2">sswd</subfield>
   </datafield>
   <datafield tag="065" ind1=" " ind2=" ">
     <subfield code="a">17.2</subfield>
     <subfield code="2">sswd</subfield>
   </datafield>
   <datafield tag="075" ind1=" " ind2=" ">
     <subfield code="b">s</subfield>
     <subfield code="2">gndgen</subfield>
   </datafield>
   <datafield tag="075" ind1=" " ind2=" ">
     <subfield code="b">saz</subfield>
     <subfield code="2">gndspec</subfield>
   </datafield>
   <datafield tag="079" ind1=" " ind2=" ">
     <subfield code="a">g</subfield>
     <subfield code="q">s</subfield>
     <subfield code="u">w</subfield>
     <subfield code="u">z</subfield>
     <subfield code="u">o</subfield>
   </datafield>
   <datafield tag="083" ind1="0" ind2="4">
     <subfield code="a">201.3</subfield>
     <subfield code="9">d:3</subfield>
     <subfield code="9">t:2007-01-01</subfield>
     <subfield code="2">22/ger</subfield>
   </datafield>
   <datafield tag="083" ind1="0" ind2="4">
     <subfield code="a">292.13</subfield>
     <subfield code="9">d:2</subfield>
     <subfield code="9">t:2007-01-01</subfield>
     <subfield code="2">22/ger</subfield>
   </datafield>
   <datafield tag="083" ind1="0" ind2="4">
     <subfield code="a">293.13</subfield>
     <subfield code="9">d:2</subfield>
     <subfield code="9">t:2007-01-01</subfield>
     <subfield code="2">22/ger</subfield>
   </datafield>
   <datafield tag="083" ind1="0" ind2="4">
     <subfield code="a">299.16113</subfield>
     <subfield code="9">d:2</subfield>
     <subfield code="9">t:2007-01-01</subfield>
     <subfield code="2">22/ger</subfield>
   </datafield>
   <datafield tag="083" ind1="0" ind2="4">
     <subfield code="a">398.2</subfield>
     <subfield code="9">d:2</subfield>
     <subfield code="9">t:2007-01-01</subfield>
     <subfield code="2">22/ger</subfield>
   </datafield>
   <datafield tag="150" ind1=" " ind2=" ">
     <subfield code="a">Mythologie</subfield>
   </datafield>
   <datafield tag="450" ind1=" " ind2=" ">
     <subfield code="a">Göttersage</subfield>
   </datafield>
   <datafield tag="550" ind1=" " ind2=" ">
     <subfield code="0">(DE-101)040751597</subfield>
     <subfield code="0">(DE-588)4075159-4</subfield>
     <subfield code="0">https://d-nb.info/gnd/4075159-4</subfield>
     <subfield code="a">Mythos</subfield>
     <subfield code="4">vbal</subfield>
     <subfield code="4">https://d-nb.info/standards/elementset/gnd#relatedTerm</subfield>
     <subfield code="w">r</subfield>
     <subfield code="i">Verwandter Begriff</subfield>
   </datafield>
   <datafield tag="670" ind1=" " ind2=" ">
     <subfield code="a">M</subfield>
   </datafield>
   <datafield tag="750" ind1=" " ind2="7">
     <subfield code="a">Mythology</subfield>
     <subfield code="0">(DLC)sh 85089371</subfield>
     <subfield code="0">http://lccn.loc.gov/sh85089371</subfield>
     <subfield code="2">lcsh</subfield>
     <subfield code="9">v:MACS-Mapping. Bitte keine Änderungen vornehmen.</subfield>
   </datafield>
   <datafield tag="750" ind1=" " ind2="7">
     <subfield code="a">Mythologie</subfield>
     <subfield code="0">(FrPBN)FRBNF119325770</subfield>
     <subfield code="0">http://data.bnf.fr/11932577</subfield>
     <subfield code="2">ram</subfield>
     <subfield code="9">v:MACS-Mapping. Bitte keine Änderungen vornehmen.</subfield>
   </datafield>
   <datafield tag="913" ind1=" " ind2=" ">
     <subfield code="S">swd</subfield>
     <subfield code="i">s</subfield>
     <subfield code="a">Mythologie</subfield>
     <subfield code="0">(DE-588c)4041005-5</subfield>
   </datafield>
   <datafield tag="942" ind1=" " ind2=" ">
     <subfield code="a">TOPIC_TERM</subfield>
   </datafield>
 </record>

Koha stores most of the bibliographic data as MARC21 in a text blob field

This makes the DB schema rather easy, but querying can be a bit cumbersome

Which is why they use external search engines (Elasticsearch, Zebra) for most searches

MARC is horrible and quite awesome at the same time

Because librarians invented it, it is very well documented

https://www.loc.gov/marc/

https://www.loc.gov/marc/bibliographic/bd100.html

Perl has a lot of good modules to work with MARC

MARC::Record

Catmandu - a data toolkit

You can use Catmandu to transform / convert various data formats used in libraries

But it's also a nice tool to convert JSON, CSV and other "normal" formats

Koha & Perl

Koha is old

first released in 2000

developed primarily by librarians

"accidental programmers"

Some of my next comments might sound harsh

But I want to make very clear that I find it amazing that the Koha community has managed to keep their project going for 20 years

And slowly modernize their code base

And to grow to 15,000 installations with 40,000,000 users

https://github.com/Koha-Community/Koha

 C4             bin                koha-tmpl          rewrite-config.PL
 INSTALL        catalogue          koha_perl_deps.pl  rotating_collections
 Koha           cataloguing        kohaversion.pl     serials
 Koha.pm        changelanguage.pl  labels             services
 LICENSE        circ               mainpage.pl        skel
 MANIFEST.SKIP  clubs              members            suggestion
 Makefile.PL    course_reserves    misc               svc
 README         cpanfile           offline_circ       t
 README.md      debian             opac               tags
 README.robots  docs               package.json       tmp
 about.pl       errors             patron_lists       tools
 acqui          etc                patroncards        virtualshelves
 admin          fix-perl-path.PL   plugins            xt
 api            gulpfile.js        pos                yarn.lock
 app.psgi       help.pl            reports
 authorities    ill                reserve
 basket         installer          reviews

Not using the standard layout for Perl code

lib, bin, ..

Usually a bad code smell.

Let's look at some code

Rendering and handling the BIG FORM

cataloguing/addbiblio.pl

A .pl file, not a module?

 file: cataloguing/addbiblio.pl
 #!/usr/bin/perl
 
 use Modern::Perl;
 use CGI q(-utf8);
 file: cataloguing/addbiblio.pl 695ff
 # ========================
 #          MAIN
 #=========================
 my $input = CGI->new;
 my $error = $input->param('error');
 my $biblionumber  = $input->param('biblionumber'); # if biblionumber exists, it's a modif, not a new biblio.
 my $parentbiblio  = $input->param('parentbiblionumber');
 my $breedingid    = $input->param('breedingid');
 my $z3950         = $input->param('z3950');
 my $op            = $input->param('op') // q{};
 my $mode          = $input->param('mode');
 my $frameworkcode = $input->param('frameworkcode');
 my $redirect      = $input->param('redirect');
 my $searchid      = $input->param('searchid') // "";
 my $dbh           = C4::Context->dbh;
 my $hostbiblionumber = $input->param('hostbiblionumber');
 my $hostitemnumber = $input->param('hostitemnumber');
 # fast cataloguing datas in transit
 my $fa_circborrowernumber = $input->param('circborrowernumber');
 my $fa_barcode            = $input->param('barcode');
 my $fa_branch             = $input->param('branch');
 my $fa_stickyduedate      = $input->param('stickyduedate');
 my $fa_duedatespec        = $input->param('duedatespec');
 $tagslib         = &GetMarcStructure( 1, $frameworkcode );
 $usedTagsLib     = &GetUsedMarcStructure( $frameworkcode );
 $mandatory_z3950 = GetMandatoryFieldZ3950($frameworkcode);
 $template->param(
     popup => $mode,
     frameworkcode => $frameworkcode,
     itemtype => $frameworkcode,
     borrowernumber => $loggedinuser,
     tab => scalar $input->param('tab')
 );
 
 output_html_with_http_headers $input, $cookie, $template->output;

CGI

mod_perl

Plack

REST API using Mojolicious

jQuery / Template::Toolkit

Ship of Theseus

"If you change each piece of a ship over time, is it still the same ship?" (Plutarch)

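 # CGI / mod_perl (Apache config)
 ScriptAlias /cgi-bin/koha/ "/usr/share/koha/intranet/cgi-bin/"
 
 # Plack (Perl, PSGI app wrapping the old CGI scripts)
 my $intranet = Plack::App::CGIBin->new(
     root => "$home/intranet/cgi-bin"
 )->to_app;
 
 # Plack (Apache config, proxying to the Plack server)
 ProxyPass /cgi-bin/koha "http://koha:5005/intranet"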

A few versions ago, RabbitMQ was added to handle long-running jobs

But not all long-running jobs have been converted to the new background-job system

Here's how "old" background jobs are implemented:

  if (my $pid = fork) {
      # parent
      # return job ID as JSON
      my $reply = CGI->new("");
      print $reply->header(-type => 'text/html');
      print '{"jobID":"' . $jobID . '"}';
      exit 0;
  } elsif (defined $pid) {
      # child
      # close STDOUT/STDERR to signal to end CGI session with Apache
      # Otherwise, the AJAX request to this script won't return properly
      close STDOUT;
      close STDERR;
  }
  # work continues here in fork ..

This does not work very well with Plack

where exit 0 is not a good idea

because it kills the Plack server
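
For comparison, here is a minimal sketch of the same fork pattern without exit in the parent, using core Perl only. This is not Koha code: start_background_job is a hypothetical helper, and whether forking inside a Plack worker is acceptable at all depends on the server setup. The parent returns the reply to its caller, and only the child leaves the process, via POSIX::_exit so that no END blocks or destructors inherited from the parent run.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $work in a child process and return immediately.
sub start_background_job {
    my ($job_id, $work) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        # parent: hand the reply back to the caller instead of calling exit
        return ($pid, qq({"jobID":"$job_id"}));
    }
    # child: detach from the client connection, do the work, then leave
    close STDOUT;
    close STDERR;
    my $ok = eval { $work->(); 1 };
    POSIX::_exit($ok ? 0 : 1);
}

# Stand-in for real work such as a MARC import.
my ($pid, $reply) = start_background_job(42, sub { my $sum = 0; $sum += $_ for 1 .. 1000 });
print "$reply\n";
waitpid $pid, 0;    # a long-running server would reap its children elsewhere
```

A message queue, as in the new background-job system, avoids forking from the web process altogether.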

  ProxyPass "/cgi-bin/koha/tools/manage-marc-import.pl" "!"
  ProxyPass /cgi-bin/koha "http://koha:5005/intranet"

Don't ProxyPass this script to Plack; run it as plain old CGI

This is ugly, but works

(A secret Perl motto?)

And of course this script can be converted to the new background job method

changing another plank of the Koha ship

Summary

Koha is a fascinating piece of software

Used by a lot of people

Could be better integrated into the Perl community

Could probably benefit from a few more Perl / webdev experts

Questions