=encoding utf-8 =pod =head1 9 PostgreSQL features I wish I learned earlier =head2 META =head3 Author Thomas Klausner =head3 Email domm AT plix.at =head3 URL http://domm.plix.at =head3 URL_slides http://domm.plix.at/talks/2016_cluj_postgres_features_i_wished_i_learned_earlier =head3 Date 2016-08-25 =head3 Location Cluj =head3 Event The Europen Perl Conference aka YAPC::Europe 2016 =head2 /me Thomas Klausner domm http://domm.plix.at @domm_favors_irc =for newslide Vienna.pm =for newslide =for img alpine.png =for newslide =for img bicyclepm.png =for html s/🚗/🚲/g =head2 Intro In this talk I'll show you some real life examples of some stupid things I did And how to do them less stupid using some nifty Postgres features =for newslide I orginally wrote this talk for the PostgreSQL Conference Europe (nice name!) so it contains some Baby-Perl which I'm going to mostly skip (also because the talk was 50min there...) =head2 Connecting to Postgres Postgres supports a lot of different ways to connect to your database cluster. It also provides very fine-grained access control methods. You can C different privileges to different roles not only for schemata and tables, but even columns. Still, I recently inherited a project (using Postgres) When I tried to set it up on my laptop, I looked through the code to find out how I should connect the app to the DB =for newslide ~/project/knowledge$ ack 'DBI\-\>connect' My screen blurred with output like this: my $dbh=DBI->connect( 'dbi:Pg:dbname=knowledge;host=127.0.0.1;port=5431', 'hans', 'humptydumpty'); Hardcoded username, hardcoded password ~/project/knowledge$ ack -h -c 'DBI\-\>connect' 101 =for newslide This is of course an example of horrible programming. The following pattern is a bit less horrible: =for newslide package **KB::Stats::DBConnect**; use strict; use warnings; use DBI; sub dbh { return DBI->connect( 'dbi:Pg:dbname=' . dbname(), username(), password(), ); } =for newslide package KB::Stats::DBConnect; use strict; use warnings; use **DBI**; sub dbh { return DBI->connect( 'dbi:Pg:dbname=' . dbname(), username(), password(), ); } =for newslide package KB::Stats::DBConnect; use strict; use warnings; use DBI; sub **dbh** { return DBI->connect( 'dbi:Pg:dbname=' . dbname(), username(), password(), ); } =for newslide package KB::Stats::DBConnect; use strict; use warnings; use DBI; sub dbh { return %%DBI%%->**connect**( 'dbi:Pg:dbname=' . dbname(), username(), password(), ); } =for newslide package KB::Stats::DBConnect; use strict; use warnings; use DBI; sub dbh { return DBI->connect( '%%dbi:Pg:dbname=%%' . **dbname()**, **username()**, **password()**, ); } =for newslide sub dbname { # some code to figure out which database to use # might depend on server, envs, testing, ... } =for newslide package **SomeCode**; use strict; use warnings; use KB::Stats::DBConnect; my $dbh = KB::Stats::DBConnect->dbh; $dbh->do( ... ); =for newslide package SomeCode; use strict; use warnings; use **KB::Stats::DBConnect**; my $dbh = KB::Stats::DBConnect->dbh; $dbh->do( ... ); =for newslide package SomeCode; use strict; use warnings; use KB::Stats::DBConnect; my @@$dbh@@ = %%KB::Stats::DBConnect%%->**dbh**; $dbh->do( ... ); =for newslide package SomeCode; use strict; use warnings; use KB::Stats::DBConnect; my $dbh = KB::Stats::DBConnect->dbh; @@$dbh@@->**do**( ... ); =for newslide I used variations of this code for ages. It allows me to use different users, passwords and database names in different environments Good! =for newslide But it still only provides one global connection, using the same C (and thus the same set of C) everywhere in the application I was not using the fine-grained access control methods at all. Bad! =for newslide Then I learned about C =head3 1: The Connection Service File ~$ less .pg_service.conf [kbstats] dbname=kb host= user=domm =for newslide ~$ less .pg_service.conf [**kbstats**] dbname=kb host= user=domm =for newslide ~$ less .pg_service.conf [kbstats] @@dbname@@=%%kb%% @@host@@= @@user@@=%%domm%% =for newslide package KB::Stats::DBConnect; use strict; use warnings; use DBI; sub dbh { return DBI->connect( 'dbi:Pg:@@dbname@@=' . **dbname()**, username(), password(), ); } =for newslide package KB::Stats::DBConnect; use strict; use warnings; use DBI; sub dbh { return DBI->connect( 'dbi:Pg:@@service@@=' . **dbservice()** ); } =for newslide sub dbservice { return 'kbstats' } =for newslide sub dbservice { return '%%kbstats_test%%' if $ENV{**HARNESS_ACTIVE**}; return 'kbstats'; } =for newslide ~$ less .pg_service.conf [**kbstats**] %%dbname%%=@@kb@@ host= user=domm [**kbstats_test**] %%dbname%%=@@kb_test@@ host= user=domm =for newslide ~$ less .pg_service.conf [**kbstats**] dbname=kb host= %%user%%=@@domm@@ [**kbstats_test**] dbname=kb_test host= %%user%%=@@domm@@ =for newslide stage.company.com:~$ less .pg_service.conf [**kbstats**] dbname=knowledge host=stage.dbcluster.company.com user=stage password=Iy2sahpi [**kbstats_test**] dbname=knowledge_test host=stage.dbcluster.company.com user=stage_test password=zuxo2Vah =for newslide stage.company.com:~$ less .pg_service.conf [**kbstats**] %%dbname%%=@@knowledge@@ host=stage.dbcluster.company.com user=stage password=Iy2sahpi [**kbstats_test**] %%dbname%%=@@knowledge_test@@ host=stage.dbcluster.company.com user=stage_test password=zuxo2Vah =for newslide stage.company.com:~$ less .pg_service.conf [**kbstats**] dbname=knowledge host=stage.dbcluster.company.com %%user%%=@@stage@@ password=Iy2sahpi [**kbstats_test**] dbname=knowledge_test host=stage.dbcluster.company.com %%user%%=@@stage_test@@ password=zuxo2Vah =for newslide stage.company.com:~$ less .pg_service.conf [**kbstats**] dbname=knowledge host=stage.dbcluster.company.com user=stage %%password%%=@@Iy2sahpi@@ [**kbstats_test**] dbname=knowledge_test host=stage.dbcluster.company.com user=stage_test %%password%%=@@zuxo2Vah@@ =for newslide So we now have a clean separation of C we want to connect, and C to connect it. In the app code, we just specify the C we need. And in F<.pg_service.conf> we specify, how each C connects on the current machine / environment. =for newslide We can now use the full power of Postgres' fine-grained access control methods! We just need to make each and every distinct program, script, daemon, cron-job etc that makes up our app use a distinct service name and then define each service in F =for newslide Here's a method we use to "calculate" the service name from the currently running program: =for newslide sub dbservice { my **$base** = '@@kb@@'; } =for newslide sub dbservice { my $base = 'kb'; return $base . '@@_test@@' if $ENV{%%HARNESS_ACTIVE%%}; } =for newslide sub dbservice { my $base = 'kb'; return $base . '_test' if $ENV{HARNESS_ACTIVE}; my **$bin** = @@$0@@; } In Perl, C<$0> contains the name of the currently running program =for newslide project/bin/web_app.psgi project/cron/generate_report.pl project/daemon/do_some_work.pl =for newslide sub dbservice { my $base = 'kb'; return $base . '_test' if $ENV{HARNESS_ACTIVE}; my $bin = $0; $bin =~ **s**{^.+/(.+)?}{$1}; $bin =~ **s**{(\.+)$}{}; } =for newslide project/bin/web_app.psgi project/cron/generate_report.pl project/daemon/do_some_work.pl =for newslide web_app generate_report do_some_work =for newslide sub dbservice { my $base = 'kb'; return $base . '_test' if $ENV{HARNESS_ACTIVE}; my $bin = $0; $bin =~ s{^.+/(.+)?}{$1}; $bin =~ s{(\.+)$}{}; return **$base** . '**_**' . **$bin**; } =for newslide kb_web_app kb_generate_report kb_do_some_work =for newslide # @@live@@ [**kb_web_app**] dbname=knowledge application_name=kb_web_app host=dbcluster @@user@@=%%web_app%% password=ahN0rue1 =for newslide # @@my laptop@@ [**kb_web_app**] dbname=knowledge application_name=kb_web_app host= @@user@@=%%domm%% =for newslide GRANT **ALL** ON ALL TABLES IN SCHEMA stats TO @@web_app@@; =for newslide # @@live@@ [**kb_generate_report**] dbname=knowledge application_name=kb_generate_report host=dbcluster @@user@@=%%generate_report%% password=maH2dooX =for newslide C needs less privileges: =for newslide GRANT **SELECT** ON ALL TABLES IN SCHEMA stats TO @@generate_report@@; =for newslide GRANT SELECT ON ALL TABLES IN SCHEMA stats TO generate_report; GRANT **INSERT, UPDATE** ON stats.report TO @@generate_report@@; =for newslide F makes connecting your app to your DB easy & clean =head2 Correcting ownership Going back to the old app I inherited, all tables, sequences etc in this database belong to the role 'hans' Which sucks. So I wanted to change this to something a bit saner. I could now create a lot of statements like ALTER TABLE foo OWNER TO stats; One for each table, and more for all other things that can be owned. =for newslide I could inspect Postgres to get all the tables etc for tbl in `psql -qAt -c "select tablename from pg_tables where schemaname = 'public';" YOUR_DB` ; do psql -c "alter table \"$tbl\" owner to NEW_OWNER" YOUR_DB ; done And repeat that for sequences etc =for newslide Before doing that, I hit google, which came up with this stackoverflow post http://stackoverflow.com/questions/1348126/modify-owner-on-all-tables-simultaneously-in-postgresql Modify OWNER on all tables simultaneously in PostgreSQL which offered: =head3 2: REASSIGN OWNED REASSIGN OWNED BY old_role [, ...] TO new_role So all I had to do was REASSIGN OWNED BY hans TO stats; done! =head2 I suck at typing So I hit C a lot when typing my queries. Which works for short queries, but for longer ones it starts to get annoying soon: =for newslide select to_char(ra.created,'YYYY-MM-DD'),ba.is_verified,count(*) from realm_account ra, base_acccount ba where ba.id= ra.base_account and ra.created > '2015-06-01' and ra.realm = 4 group by 1,2 order by 1,2; =for newslide select to_char(ra.created,'YYYY-MM-DD'),ba.is_verified,count(*) from realm_account ra, base_a**ccc**ount ba where ba.id= ra.base_account and ra.created > '2015-06-01' and ra.realm = 4 group by 1,2 order by 1,2; =head3 3: \e C<\e> lets you edit the last statement with your favorite editor B So instead of hitting my arrow keys a lot to navigate to C, I can do \e :s/ccc/cc/ :wq Much nicer! =head2 From - To Dates I quite often have to limit the validity of some data to a given time range. Here's an example of a table where we store B =for newslide CREATE TABLE %%term%% ( uuid uuid primary key not null default uuid_generate_v4(), internal_name text not null, version integer not null default 1, text text, valid_from timestamp with time zone, valid_until timestamp with time zone ) =for newslide CREATE TABLE %%term%% ( **uuid** uuid primary key not null default uuid_generate_v4(), internal_name text not null, version integer not null default 1, text text, valid_from timestamp with time zone, valid_until timestamp with time zone ) =for newslide CREATE TABLE %%term%% ( uuid uuid primary key not null default uuid_generate_v4(), **internal_name** text not null, **version** integer not null default 1, **text** text, valid_from timestamp with time zone, valid_until timestamp with time zone ) =for newslide CREATE TABLE %%term%% ( uuid uuid primary key not null default uuid_generate_v4(), internal_name text not null, version integer not null default 1, text text, **valid_from** @@timestamp with time zone@@, **valid_until** @@timestamp with time zone@@ ) =for newslide We need to know from when to when a term is valid. So we can ask users to agree to the currently valid set of terms. =for newslide Now, we usually know when a term starts to be valid: valid_from timestamp with time zone **default now()**, But we don't know when it will end =for newslide So we could use C to indicate an unknown end-date But this makes querying annoying SELECT uuid FROM term WHERE valid_until > now() OR valid_until is NULL; =for newslide We could use a date in the far future as the default valid_until timestamp with time zone **default '2100-01-01'** Now we have a simple query again: SELECT uuid FROM term WHERE valid_until > now(); =for newslide But what if I happen to live until 2100-01-01? I don't want to get a call from an angry customer during my New Year Party especially not when I'm 124 years old... =head3 4: infinity I like infinity as a philosophical concept. And I also like that Postgres date/time fields provide the special values C and C<-infinity> valid_until timestamp with time zone **default 'infinity'** =for newslide We can still use the simple query: SELECT uuid FROM term WHERE valid_until > now(); But now it will work forever. Yay! =head2 Querying From-To-Dates Now, of course the real query to get the currently active terms needs to check C and C: SELECT uuid FROM term WHERE valid_from < now() AND valid_until > now(); This is not very complex, but I still mix up the correct comparison operators and/or the positioning of the fields SELECT uuid FROM term WHERE now() < valid_from AND valid_until < now(); wrong, but hard to see =head3 5: between a BETWEEN x AND y is equivalent to a >= x AND a <= y =for newslide SELECT uuid FROM term WHERE now() BETWEEN valid_from AND valid_until; Much nicer to read and understand =head2 CSV Export Sometimes a customer asks me for a one-off CSV export of some data. I used to write a short script to get the data and format it as CSV. I was young and needed the money. =head3 6: COPY .. as CSV copy (select stuff from table) to '/tmp/export.csv' (FORMAT CSV) nice copy (select stuff from table) to '/tmp/export.csv' ( FORMAT CSV, DELIMITER '|', HEADER ) =head2 Job Queues / Event Notifications Sooner or later your app will need to do something that takes a long time. Longer than a user (or your web proxy) will wait. So you want to offload this long-running task to an external program tell the user to wait a bit start the job and wait for the job to be done (or ask "Are we there yet?" every few seconds..) In the end you want to present the result to the user. =for newslide As we live in the future, we now have lots of tools and frameworks to implement queues and event sourcing AMQP, RabbitMQ, Redis, Kafka, ... Yesterday we also learned about Resque But sometimes you do not want to add yet another piece to your Jenga tower =for newslide So you implement a job queue in your Postgres database =for newslide CREATE TABLE %%job%% ( **id** serial primary key, name text, args jsonb, is_running boolean not null default false, is_finished boolean not null default false, has_error boolean not null default false, error text ) =for newslide CREATE TABLE %%job%% ( id serial primary key, **name** text, **args** jsonb, is_running boolean not null default false, is_finished boolean not null default false, has_error boolean not null default false, error text ) =for newslide CREATE TABLE %%job%% ( id serial primary key, name text, args jsonb, **is_running** boolean not null default false, **is_finished** boolean not null default false, **has_error** boolean not null default false, error text ) =for newslide INSERT INTO job (name,args) VALUES ( 'fib', '{"class":"CalcFib","number":"123"}' ); =for newslide # %%Worker%% my $sth = $dbh->prepare( "select id,name,args from job where is_running = false and is_finished = false" ); $sth->execute; while (my ($id, $name, $args) = $sth->fetchrow_array) { $dbh->do("update job set is_running = true where id = ?", undef, $id); JobRunner->start($name, $args); $dbh->do("update job set is_running = false, is_finished = true where id = ?", undef, $id); } =for newslide # Worker my **$sth** = $dbh->prepare( "**select** id,name,args from job where @@is_running@@ = %%false%% and @@is_finished@@ = %%false%%" ); $sth->execute; while (my ($id, $name, $args) = $sth->fetchrow_array) { $dbh->do("update job set is_running = true where id = ?", undef, $id); JobRunner->start($name, $args); $dbh->do("update job set is_running = false, is_finished = true where id = ?", undef, $id); } =for newslide # Worker my $sth = $dbh->prepare( "select id,name,args from job where is_running = false and is_finished = false" ); $sth->execute; @@while@@ (my ($id, $name, $args) = **$sth**->%%fetchrow_array%%) { $dbh->do("update job set is_running = true where id = ?", undef, $id); JobRunner->start($name, $args); $dbh->do("update job set is_running = false, is_finished = true where id = ?", undef, $id); } =for newslide # Worker my $sth = $dbh->prepare( "select id,name,args from job where is_running = false and is_finished = false" ); $sth->execute; while (my ($id, $name, $args) = $sth->fetchrow_array) { $dbh->**do**("**update** job set @@is_running@@ = %%true%% where id = ?", undef, $id); JobRunner->start($name, $args); $dbh->do("update job set is_running = false, is_finished = true where id = ?", undef, $id); } =for newslide # Worker my $sth = $dbh->prepare( "select id,name,args from job where is_running = false and is_finished = false" ); $sth->execute; while (my ($id, $name, $args) = $sth->fetchrow_array) { $dbh->do("update job set is_running = true where id = ?", undef, $id); @@JobRunner@@->**start**($name, $args); $dbh->do("update job set is_running = false, is_finished = true where id = ?", undef, $id); } =for newslide # Worker my $sth = $dbh->prepare( "select id,name,args from job where is_running = false and is_finished = false" ); $sth->execute; while (my ($id, $name, $args) = $sth->fetchrow_array) { $dbh->do("update job set is_running = true where id = ?", undef, $id); JobRunner->start($name, $args); $dbh->**do**("**update** job set @@is_running@@ = %%false%%, @@is_finished@@ = %%true%% where id = ?", undef, $id); } =for newslide There's so much wrong with this code... Should use transactions Should fork the JobRunner which is very hard to get right regarding the transactions it's too ugly to even think about the code any longer... =head3 7: NOTIFY Postgres provides a simple notification system that can be used to implement job queues NOTIFY jobs, '{"class":"CalcFib","number":"123"}'; To receive those notification, you have to C: LISTEN jobs; =for newslide And when a notification happens, you get: Asynchronous notification "jobs" with payload "{"class":"CalcFib","number":"123"}" received from server process with PID 4428. =for newslide # @@Worker@@ $self->dbh->do( 'LISTEN jobs' ); while ( my $notification = $self->dbh->pg_notifies ) { my ( $name, $pid, $payload ) = @$notification; JobRunner->start($name, $payload); } =for newslide # Worker $self->dbh->do( '**LISTEN jobs**' ); while ( my $notification = $self->dbh->pg_notifies ) { my ( $name, $pid, $payload ) = @$notification; JobRunner->start($name, $payload); } =for newslide # Worker $self->dbh->do( 'LISTEN jobs' ); while ( my $notification = $self->@@dbh@@->**pg_notifies** ) { my ( $name, $pid, $payload ) = @$notification; JobRunner->start($name, $payload); } =for newslide # Worker $self->dbh->do( 'LISTEN jobs' ); while ( my **$notification** = $self->dbh->pg_notifies ) { my ( $name, $pid, $payload ) = @$notification; JobRunner->start($name, $payload); } =for newslide # Worker $self->dbh->do( 'LISTEN jobs' ); while ( my $notification = $self->dbh->pg_notifies ) { my ( @@$name@@, @@$pid@@, @@$payload@@ ) = **@$notification**; JobRunner->start($name, $payload); } =for newslide # Worker $self->dbh->do( 'LISTEN jobs' ); while ( my $notification = $self->dbh->pg_notifies ) { my ( $name, $pid, $payload ) = @$notification; @@JobRunner@@->**start**($name, @@$payload@@); } =for newslide This very simple solution does not keep track of past jobs and does not report errors Job::Machine adds all of this, and some more https://metacpan.org/pod/Job::Machine it's of course implemented using C =head3 NOTIFY for events For a recent Microservice-based project, we had to implement some cross-database notifications If some fields in one service are touched, a notification should be sent to some other services =for newslide CREATE TABLE %%thing%% ( id serial primary key, modified timestamp with time zone not null default now(), type text, data jsonb ); =for newslide CREATE TABLE thing ( **id** serial primary key, **modified** timestamp with time zone not null default now(), **type** text, data jsonb ); =for newslide CREATE TABLE thing ( id serial primary key, modified timestamp with time zone not null default now(), type text, **data** jsonb ); =for newslide CREATE OR REPLACE **FUNCTION** @@thing_notify()@@ RETURNS TRIGGER AS $$ BEGIN =for newslide IF (%%TG_OP%% = '**DELETE**') THEN PERFORM pg_notify( 'thing_notify', TG_OP ||';'|| OLD.type ||';'|| OLD.id ||';'|| OLD.modified ); RETURN OLD; =for newslide IF (TG_OP = 'DELETE') THEN PERFORM **pg_notify**( '@@thing_notify@@', %%TG_OP%% ||';'|| OLD.%%type%% ||';'|| OLD.%%id%% ||';'|| OLD.%%modified%% ); RETURN OLD; =for newslide ELSIF (%%TG_OP%% = '**UPDATE**') THEN PERFORM pg_notify( 'thing_notify', TG_OP ||';'|| NEW.type ||';'|| NEW.id ||';'|| NEW.modified ); RETURN NEW; =for newslide ELSIF (TG_OP = 'UPDATE') THEN PERFORM **pg_notify**( '@@thing_notify@@', %%TG_OP%% ||';'|| NEW.%%type%% ||';'|| NEW.%%id%% ||';'|| NEW.%%modified%% ); RETURN NEW; =for newslide ELSIF (%%TG_OP%% = '**INSERT**') THEN PERFORM pg_notify( 'thing_notify', TG_OP ||';'|| NEW.type ||';'|| NEW.id ||';'|| NEW.modified ); RETURN NEW; =for newslide ELSIF (TG_OP = 'INSERT') THEN PERFORM **pg_notify**( '@@thing_notify@@', %%TG_OP%% ||';'|| NEW.%%type%% ||';'|| NEW.%%id%% ||';'|| NEW.%%modified%% ); RETURN NEW; =for newslide END IF; END; $$ LANGUAGE plpgsql; =for newslide CREATE **TRIGGER** @@trigger_thing_notify@@ AFTER INSERT OR UPDATE OR DELETE ON thing FOR EACH ROW EXECUTE PROCEDURE thing_notify(); =for newslide CREATE TRIGGER trigger_thing_notify **AFTER INSERT OR UPDATE OR DELETE** ON thing FOR EACH ROW EXECUTE PROCEDURE thing_notify(); =for newslide CREATE TRIGGER trigger_thing_notify AFTER INSERT OR UPDATE OR DELETE ON @@thing@@ FOR EACH ROW EXECUTE PROCEDURE thing_notify(); =for newslide CREATE TRIGGER trigger_thing_notify AFTER INSERT OR UPDATE OR DELETE ON thing FOR EACH ROW **EXECUTE PROCEDURE** @@thing_notify()@@; =for newslide db=# LISTEN thing_notify; =for newslide db=# LISTEN thing_notify; LISTEN =for newslide db=# LISTEN thing_notify; LISTEN db=# INSERT INTO thing (type,data) VALUES ('Fib','{"class":"CalcFib","number":"123"}'); =for newslide db=# LISTEN thing_notify; LISTEN db=# INSERT INTO thing (type,data) VALUES ('Fib','{"class":"CalcFib","number":"123"}'); Asynchronous notification "thing_notify" with payload "**INSERT**;%%Fib%%;@@4@@;**2015-10-28 20:52:41.581392+01**" received from server process with PID 4428. =for newslide domm=# UPDATE thing SET type = 'Test' WHERE id = 4; =for newslide domm=# UPDATE thing SET type = 'Test' WHERE id = 4; Asynchronous notification "thing_notify" with payload "**UPDATE**;%%Test%%;4;2015-10-28 14:47:43.866663+01" received from server process with PID 4428. =for newslide Here's some Perl code to handle those notifications in a Worker process: =for newslide sub run { my $self = shift; $log->infof( "Starting up Notifier, listening on %s", $self->channel ); $self->dbh->do( 'LISTEN ' . $self->channel ); my $checker = AnyEvent->timer( interval => 1, cb => sub { $self->get_notifications; } ); AnyEvent->condvar->recv; } =for newslide sub run { my $self = shift; $log->infof( "Starting up Notifier, listening on %s", $self->channel ); $self->dbh->do( '**LISTEN** ' . $self->@@channel@@ ); my $checker = AnyEvent->timer( interval => 1, cb => sub { $self->get_notifications; } ); AnyEvent->condvar->recv; } =for newslide sub run { my $self = shift; $log->infof( "Starting up Notifier, listening on %s", $self->channel ); $self->dbh->do( 'LISTEN ' . $self->channel ); my $checker = **AnyEvent**->timer( interval => 1, cb => sub { $self->get_notifications; } ); AnyEvent->condvar->recv; } =for newslide sub run { my $self = shift; $log->infof( "Starting up Notifier, listening on %s", $self->channel ); $self->dbh->do( 'LISTEN ' . $self->channel ); my $checker = AnyEvent->**timer**( interval => 1, cb => sub { $self->get_notifications; } ); AnyEvent->condvar->recv; } =for newslide sub run { my $self = shift; $log->infof( "Starting up Notifier, listening on %s", $self->channel ); $self->dbh->do( 'LISTEN ' . $self->channel ); my $checker = AnyEvent->timer( @@interval@@ => **1**, cb => sub { $self->get_notifications; } ); AnyEvent->condvar->recv; } =for newslide sub run { my $self = shift; $log->infof( "Starting up Notifier, listening on %s", $self->channel ); $self->dbh->do( 'LISTEN ' . $self->channel ); my $checker = AnyEvent->timer( interval => 1, @@cb@@ => **sub** %%{%% $self->get_notifications; %%}%% ); AnyEvent->condvar->recv; } =for newslide sub run { my $self = shift; $log->infof( "Starting up Notifier, listening on %s", $self->channel ); $self->dbh->do( 'LISTEN ' . $self->channel ); my $checker = AnyEvent->timer( interval => 1, cb => sub { @@$self@@->**get_notifications**; } ); AnyEvent->condvar->recv; } =for newslide sub **get_notifications** { my $self = shift; while ( my $ret = $self->dbh->pg_notifies ) { my $payload = $ret->[2]; $self->handle_notification( $payload ); } } =for newslide sub get_notifications { my $self = shift; @@while@@ ( my $ret = $self->dbh->**pg_notifies** ) { my $payload = $ret->[2]; $self->handle_notification( $payload ); } } =for newslide sub get_notifications { my $self = shift; while ( my $ret = $self->dbh->pg_notifies ) { my @@$payload@@ = **$ret->[2]**; $self->handle_notification( $payload ); } } =for newslide sub get_notifications { my $self = shift; while ( my $ret = $self->dbh->pg_notifies ) { my $payload = $ret->[2]; $self->**handle_notification**( @@$payload@@ ); } } =for newslide sub **handle_notification** { my ( $self, $payload ) = @_; my ( $op, $type, $id, $ts ) = split( /;/, $payload, 4 ); # do something with $type & $id } =for newslide sub handle_notification { my ( $self, $payload ) = @_; my ( $op, $type, $id, $ts ) = **split**( /@@;@@/, %%$payload%%, 4 ); # do something with $type & $id } =for newslide sub handle_notification { my ( $self, $payload ) = @_; my ( **$op**, %%$type%%, @@$id@@, **$ts** ) = split( /;/, $payload, 4 ); # do something with $type & $id } =for newslide sub handle_notification { my ( $self, $payload ) = @_; my ( $op, $type, $id, $ts ) = split( /;/, $payload, 4 ); # **do something** with $type & $id } =head2 Stupid queries =for img wundertuete_1.png =for newslide =for img wundertuete_detail.png =for newslide getting this status was already implemented for a single thing, so for the list we just called the relevant endpoint for each item dumb! =for newslide =for img wundertuete_viel.png =for newslide Lots of HTTP requests Very slow =for newslide So we packed all of the status-requests into one http request and used the old per-thing code It turned out that the performance problem was not caused by the HTTP overhead, or the user initiation it was caused by all of the checks taking ~.5 seconds per thing =for newslide So a HTTP request to get the status of 10 things would take 5 seconds Nginx could not be bothered to wait that long and returned a timeout Not so good. =for newslide I did some thinking and replaced the iterative per-thing status check with a rather long query so we can issue one HTTP request containing a set of ids which shall return the user-status for each campaign And this query uses... =head3 8: Common Table Expressions I like to think of CTEs as very temporary helper tables, and they can help making complex things doable. =for newslide So, with some hand-waving, here is the schema this example is based on: =for newslide CREATE TABLE %%campaign%% ( id serial primary key, uuid uuid NOT NULL, title text NOT NULL, type text NOT NULL, valid_from timestamp with time zone, valid_until timestamp with time zone, data jsonb ); =for newslide CREATE TABLE campaign ( **id** serial primary key, **uuid** uuid NOT NULL, title text NOT NULL, type text NOT NULL, valid_from timestamp with time zone, valid_until timestamp with time zone, data jsonb ); =for newslide CREATE TABLE campaign ( id serial primary key, uuid uuid NOT NULL, **title** text NOT NULL, **type** text NOT NULL, valid_from timestamp with time zone, valid_until timestamp with time zone, **data** jsonb ); =for newslide CREATE TABLE campaign ( id serial primary key, uuid uuid NOT NULL, title text NOT NULL, type text NOT NULL, **valid_from** timestamp with time zone, **valid_until** timestamp with time zone, data jsonb ); =for newslide CREATE TABLE %%usr_campaign_participation%% ( uuid uuid DEFAULT uuid_generate_v4() NOT NULL, created timestamp with time zone DEFAULT now() NOT NULL, usr integer, campaign integer ); =for newslide CREATE TABLE usr_campaign_participation ( uuid uuid DEFAULT uuid_generate_v4() NOT NULL, created timestamp with time zone DEFAULT now() NOT NULL, **usr** integer, **campaign** integer ); =for newslide The questions I want answered are: Which campaigns are currently active, expired or pending? Has the user already participated in any of those campaign? If it's a sweepstake, how many tickets does the user own? How many tickets have been issued in total? =head4 Which campaigns are currently active, expired or pending? SELECT c.uuid, c.type, now() < c.valid_from AS pending, now() > c.valid_until AS expired, c.data->'points' AS points FROM campaign c WHERE c.uuid in ($uuids) =for newslide SELECT c.uuid, c.type, **now() < c.valid_from** AS @@pending@@, %%-- boolean%% now() > c.valid_until AS expired, c.data->'points' AS points FROM campaign c WHERE c.uuid in ($uuids) =for newslide SELECT c.uuid, c.type, now() < c.valid_from AS pending, now() > c.valid_until AS expired, **c.data->'points'** AS points %%-- grab into json object%% FROM campaign c WHERE c.uuid in ($uuids) =head4 How many tickets have been issued in total? SELECT c.uuid, count(*) AS total_cnt FROM usr_campaign_participation p, campaign c WHERE c.id = p.campaign AND c.type like 'sweepstake%' AND c.uuid in ($uuids) GROUP BY 1 =head4 How many tickets have been issued to the user? special case of =head4 Which campaigns has a user participated in? SELECT c.uuid, count(*) AS usr_cnt FROM usr_campaign_participation p, campaign c WHERE c.id = p.campaign AND p.usr = ? AND c.uuid in ($uuids) GROUP BY 1 =head3 Pulling it together WITH some_name AS ( SELECT id,something FROM somewhere ) SELECT table.a_value, some_name.something FROM table, some_name WHERE table.id = some_name.id =for newslide %%WITH%% **some_name** %%AS%% ( @@SELECT id,something FROM somewhere@@ ) SELECT table.a_value, some_name.something FROM table, some_name WHERE table.id = some_name.id =for newslide WITH **some_name** AS ( SELECT something FROM somewhere ) %%SELECT%% table.a_value, **some_name**.something FROM table, **some_name** WHERE table.id = **some_name**.id =for newslide WITH participation_usr AS ( SELECT c.uuid, count(*) AS usr_cnt ... ), participation_total AS ( SELECT c.uuid, count(*) AS total_cnt ... ) SELECT participation_total.total_cnt, participation_usr.usr_cnt, c.uuid, c.type, ... FROM campaign c LEFT JOIN participation_usr ON participation_usr.uuid = c.uuid LEFT JOIN participation_total ON participation_total.uuid = c.uuid WHERE c.uuid in ($uuids) =for newslide %%WITH%% **participation_usr** %%AS%% ( SELECT c.uuid, count(*) AS usr_cnt ... ), **participation_total** %%AS%% ( SELECT c.uuid, count(*) AS total_cnt ... ) SELECT participation_total.total_cnt, participation_usr.usr_cnt, c.uuid, c.type, ... FROM campaign c LEFT JOIN participation_usr ON participation_usr.uuid = c.uuid LEFT JOIN participation_total ON participation_total.uuid = c.uuid WHERE c.uuid in ($uuids) =for newslide WITH participation_usr AS ( SELECT c.uuid, count(*) AS usr_cnt ... ), participation_total AS ( SELECT c.uuid, count(*) AS total_cnt ... ) %%SELECT%% **participation_total**.total_cnt, **participation_usr**.usr_cnt, c.uuid, c.type, ... FROM campaign c LEFT JOIN participation_usr ON participation_usr.uuid = c.uuid LEFT JOIN participation_total ON participation_total.uuid = c.uuid WHERE c.uuid in ($uuids) =for newslide WITH participation_usr AS ( SELECT c.uuid, count(*) AS usr_cnt ... ), participation_total AS ( SELECT c.uuid, count(*) AS total_cnt ... ) SELECT participation_total.total_cnt, participation_usr.usr_cnt, c.uuid, c.type, ... FROM campaign c **LEFT JOIN** participation_usr ON participation_usr.uuid = c.uuid **LEFT JOIN** participation_total ON participation_total.uuid = c.uuid WHERE c.uuid in ($uuids) =for newslide type | total | usr | pending | expired ---------------------+-------+-----+---------+--------- coupon/absolute | | | t | f coupon/absolute | | | f | f coupon/free | | 1 | f | f coupon/percent | | | f | t sweepstake/standard | 9028 | 1 | f | f sweepstake/standard | 4692 | 2 | f | f sweepstake/standard | 326 | | f | f sweepstake/standard | 2444 | 1 | f | f =for newslide =for img wundertuete_avg_req_time.png =head2 Trees =for img stats_tree.png =for newslide Generating trees is sort of hard it's easier when using recursion Postgres supports recursive queries =head3 9: WITH RECURSIVE Now, I cannot say that I really understand recursive queries completely To me it feels more like I'm trying a magic spell after seeking a glance of the instructions from my master's spell book: It seems to work, but I might summon a daemon by mistake.. =for newslide WITH RECURSIVE tree (id, parent, internal_name, level, path) AS ( SELECT root_node UNION SELECT recursion JOIN some_name on some_name.field = root.other_field ) SELECT * from tree order by path; =for newslide SELECT root_node UNION SELECT recursion JOIN some_name on some_name.field = root.other_field =for newslide -- root_node SELECT s.id, s2s.parent, s.internal_name, 0, s.internal_name FROM source s LEFT JOIN source_to_source s2s ON s.id = s2s.child WHERE s.id = 1244638 =for newslide UNION =for newslide -- recursion SELECT s2s.child, s2s.parent, s.internal_name, level +1, path || '/' || s.internal_name FROM source_to_source s2s JOIN source s ON s.id = s2s.child JOIN tree ON s2s.parent = tree.id =for newslide SELECT * from tree order by path; =for newslide WITH RECURSIVE tree (id, parent, internal_name, level, path) AS ( SELECT s.id, s2s.parent, s.internal_name, 0 ,s.internal_name FROM source s LEFT JOIN source_to_source s2s ON s.id = s2s.child WHERE s.id = 1244638 UNION SELECT s2s.child, s2s.parent, s.internal_name, level +1, path || '/' || s.internal_name FROM source_to_source s2s JOIN source s ON s.id = s2s.child JOIN tree ON s2s.parent = tree.id ) SELECT * FROM tree order by path =for newslide id | parent | internal_name | level | path ---------+---------+---------------+-------+------------------------------------------- 1254195 | | 10403 | 0 | 10403 536977 | 1254195 | 10203 | 1 | 10403/10203 1167420 | 1254195 | 10375 | 1 | 10403/10375 1244638 | 1254195 | 10400 | 1 | 10403/10400 1169241 | 1244638 | 10377 | 2 | 10403/10400/10377 1075499 | 1169241 | 10321 | 3 | 10403/10400/10377/10321 1259720 | 1169241 | 10404 | 3 | 10403/10400/10377/10404 1295846 | 1259720 | 10410 | 4 | 10403/10400/10377/10404/10410 1270124 | 1295846 | 10407 | 5 | 10403/10400/10377/10404/10410/10407 627220 | 1270124 | 10227 | 6 | 10403/10400/10377/10404/10410/10407/10227 640892 | 1270124 | 10230 | 6 | 10403/10400/10377/10404/10410/10407/10230 1181861 | 1244638 | 10378 | 2 | 10403/10400/10378 1083882 | 1181861 | 10323 | 3 | 10403/10400/10378/10323 1084352 | 1181861 | 10324 | 3 | 10403/10400/10378/10324 =head2 Questions? Or do you want to share your favourite Postgres feature? =head2 Bonus: Answers to previous questions NOTIFY payload has a max length of 8000 bytes. Instead of concatenating the payload, I could use C. Which might be problematic given the 8000 bytes limit... We can use row-level locking with the new SKIP LOCKED option to implement a saner job queue using a table. =head2 Bonus: Typos aborting transactions domm=# =for newslide domm=# BEGIN; BEGIN domm=# =for newslide domm=# BEGIN; BEGIN domm=# insert into test (foo) value ('bar'); ERROR: syntax error at or near "value" LINE 1: insert into test (foo) value ('bar'); domm=# =for newslide domm=# BEGIN; BEGIN domm=# insert into test (foo) value ('bar'); ERROR: syntax error at or near "value" LINE 1: insert into test (foo) value ('bar'); domm=# insert into test (foo) value**s** ('bar'); =for newslide domm=# BEGIN; BEGIN domm=# insert into test (foo) value ('bar'); ERROR: syntax error at or near "value" LINE 1: insert into test (foo) value ('bar'); domm=# insert into test (foo) value**s** ('bar'); ERROR: current transaction is aborted, commands ignored until end of transaction block =head3 10: ON_ERROR_ROLLBACK = interactive domm=# \set ON_ERROR_ROLLBACK interactive =for newslide domm=# \set ON_ERROR_ROLLBACK interactive domm=# BEGIN; BEGIN domm=# =for newslide domm=# \set ON_ERROR_ROLLBACK interactive domm=# BEGIN; BEGIN domm=# insert into test (foo) value ('bar'); ERROR: syntax error at or near "value" LINE 1: insert into test (foo) value ('bar'); domm=# =for newslide domm=# \set ON_ERROR_ROLLBACK interactive domm=# BEGIN; BEGIN domm=# insert into test (foo) value ('bar'); ERROR: syntax error at or near "value" LINE 1: insert into test (foo) value ('bar'); domm=# insert into test (foo) value**s** ('bar'); =for newslide domm=# \set ON_ERROR_ROLLBACK interactive domm=# BEGIN; BEGIN domm=# insert into test (foo) value ('bar'); ERROR: syntax error at or near "value" LINE 1: insert into test (foo) value ('bar'); domm=# insert into test (foo) value**s** ('bar'); INSERT 0 1 domm=# COMMIT; =for newslide I learned that during Postgres Conf 2015 from Florian Helmberger / https://25th-floor.com/ =for newslide =head2 Bonus: \x, \o, \t =head3 \x C<\x> toggle "expandend mode" Instead of one long line per row (which can be hard to read) you get tabular output =for newslide knowledge=# select * from stats.field limit 1; =for newslide id | created | modified | is_deleted | universe | internal_name | title | is_required | formula | type | description | formhandler | has_multival | valid_from | valid_until | calc_type | is_source_attribute | is_clonable | include_in_token_export ------+-------------------------------+-------------------------------+---- --------+----------+---------------+------------+-------------+---------+---- -----+--------------+---------------------------------------------------------- -------------------+--------------+------------------------+-------------+-------- ---+---------------------+-------------+------------------------- 1150 | 2015-07-14 16:34:48.02521+02 | 2015-07-14 16:34:48.02521+02 | f | 1100 | some_text | Some Text | t | | Text | a test field | {"name":"some_text","label":"Some Text","disabled":"0","type":"Text"} | f | 2015-07-01 02:00:00+02 | infinity | skip | f | f | f =for newslide knowledge=# \x =for newslide knowledge=# \x knowledge=# select * from stats.field limit 1; =for newslide -[ RECORD 1 ]-----------+---------------------------------------------------------------------- id | 1150 created | 2015-07-14 16:34:48.02521+02 modified | 2015-07-14 16:34:48.02521+02 is_deleted | f universe | 1100 internal_name | some_text title | Some Text is_required | t formula | type | Text description | a test field formhandler | {"name":"some_text","label":"Some Text","disabled":"0","type":"Text"} has_multival | f valid_from | 2015-07-01 02:00:00+02 valid_until | infinity calc_type | skip is_source_attribute | f is_clonable | f include_in_token_export | f =head3 \o output to a file =for newslide knowledge=# \o /tmp/foo =for newslide knowledge=# \o /tmp/foo knowledge=# select * from stats.field; =head3 \t Tuples only This turns off the "header" and "footer" name of the columns and number of rows =for newslide knowledge=# select id from stats.field limit 1; =for newslide knowledge=# select id from stats.field limit 1; id ------ 1150 (1 row) =for newslide knowledge=# \t =for newslide knowledge=# \t knowledge=# select id from stats.field limit 1; =for newslide knowledge=# \t knowledge=# select id from stats.field limit 1; 1150 =for newslide Great when you want to dump some ids to a file for processing with some other tools =for newslide knowledge=# \t =for newslide knowledge=# \t knowledge=# \o /tmp/foo =for newslide knowledge=# \t knowledge=# \o /tmp/foo knowledge=# select id from stats.field; Now C will only contains ids =head2 Thanks