=encoding utf-8
=pod
=head1 9 PostgreSQL features I wish I learned earlier
=head2 META
=head3 Author
Thomas Klausner
=head3 Email
domm AT plix.at
=head3 URL
http://domm.plix.at
=head3 URL_slides
http://domm.plix.at/talks/2016_cluj_postgres_features_i_wished_i_learned_earlier
=head3 Date
2016-08-25
=head3 Location
Cluj
=head3 Event
The Europen Perl Conference aka YAPC::Europe 2016
=head2 /me
Thomas Klausner
domm
http://domm.plix.at
@domm_favors_irc
=for newslide
Vienna.pm
=for newslide
=for img alpine.png
=for newslide
=for img bicyclepm.png
=for html s/🚗/🚲/g
=head2 Intro
In this talk I'll show you some real life examples of some stupid things I did
And how to do them less stupid using some nifty Postgres features
=for newslide
I orginally wrote this talk for the PostgreSQL Conference Europe
(nice name!)
so it contains some Baby-Perl which I'm going to mostly skip
(also because the talk was 50min there...)
=head2 Connecting to Postgres
Postgres supports a lot of different ways to connect to your database cluster.
It also provides very fine-grained access control methods.
You can C different privileges to different roles not only for schemata and tables, but even columns.
Still, I recently inherited a project (using Postgres)
When I tried to set it up on my laptop, I looked through the code to find out how I should connect the app to the DB
=for newslide
~/project/knowledge$ ack 'DBI\-\>connect'
My screen blurred with output like this:
my $dbh=DBI->connect(
'dbi:Pg:dbname=knowledge;host=127.0.0.1;port=5431',
'hans', 'humptydumpty');
Hardcoded username, hardcoded password
~/project/knowledge$ ack -h -c 'DBI\-\>connect'
101
=for newslide
This is of course an example of horrible programming.
The following pattern is a bit less horrible:
=for newslide
package **KB::Stats::DBConnect**;
use strict;
use warnings;
use DBI;
sub dbh {
return DBI->connect(
'dbi:Pg:dbname=' . dbname(),
username(),
password(),
);
}
=for newslide
package KB::Stats::DBConnect;
use strict;
use warnings;
use **DBI**;
sub dbh {
return DBI->connect(
'dbi:Pg:dbname=' . dbname(),
username(),
password(),
);
}
=for newslide
package KB::Stats::DBConnect;
use strict;
use warnings;
use DBI;
sub **dbh** {
return DBI->connect(
'dbi:Pg:dbname=' . dbname(),
username(),
password(),
);
}
=for newslide
package KB::Stats::DBConnect;
use strict;
use warnings;
use DBI;
sub dbh {
return %%DBI%%->**connect**(
'dbi:Pg:dbname=' . dbname(),
username(),
password(),
);
}
=for newslide
package KB::Stats::DBConnect;
use strict;
use warnings;
use DBI;
sub dbh {
return DBI->connect(
'%%dbi:Pg:dbname=%%' . **dbname()**,
**username()**,
**password()**,
);
}
=for newslide
sub dbname {
# some code to figure out which database to use
# might depend on server, envs, testing, ...
}
=for newslide
package **SomeCode**;
use strict;
use warnings;
use KB::Stats::DBConnect;
my $dbh = KB::Stats::DBConnect->dbh;
$dbh->do( ... );
=for newslide
package SomeCode;
use strict;
use warnings;
use **KB::Stats::DBConnect**;
my $dbh = KB::Stats::DBConnect->dbh;
$dbh->do( ... );
=for newslide
package SomeCode;
use strict;
use warnings;
use KB::Stats::DBConnect;
my @@$dbh@@ = %%KB::Stats::DBConnect%%->**dbh**;
$dbh->do( ... );
=for newslide
package SomeCode;
use strict;
use warnings;
use KB::Stats::DBConnect;
my $dbh = KB::Stats::DBConnect->dbh;
@@$dbh@@->**do**( ... );
=for newslide
I used variations of this code for ages.
It allows me to use different users, passwords and database names in different environments
Good!
=for newslide
But it still only provides one global connection, using the same C (and thus the same set of C) everywhere in the application
I was not using the fine-grained access control methods at all.
Bad!
=for newslide
Then I learned about C
=head3 1: The Connection Service File
~$ less .pg_service.conf
[kbstats]
dbname=kb
host=
user=domm
=for newslide
~$ less .pg_service.conf
[**kbstats**]
dbname=kb
host=
user=domm
=for newslide
~$ less .pg_service.conf
[kbstats]
@@dbname@@=%%kb%%
@@host@@=
@@user@@=%%domm%%
=for newslide
package KB::Stats::DBConnect;
use strict;
use warnings;
use DBI;
sub dbh {
return DBI->connect(
'dbi:Pg:@@dbname@@=' . **dbname()**,
username(),
password(),
);
}
=for newslide
package KB::Stats::DBConnect;
use strict;
use warnings;
use DBI;
sub dbh {
return DBI->connect(
'dbi:Pg:@@service@@=' . **dbservice()**
);
}
=for newslide
sub dbservice {
return 'kbstats'
}
=for newslide
sub dbservice {
return '%%kbstats_test%%' if $ENV{**HARNESS_ACTIVE**};
return 'kbstats';
}
=for newslide
~$ less .pg_service.conf
[**kbstats**]
%%dbname%%=@@kb@@
host=
user=domm
[**kbstats_test**]
%%dbname%%=@@kb_test@@
host=
user=domm
=for newslide
~$ less .pg_service.conf
[**kbstats**]
dbname=kb
host=
%%user%%=@@domm@@
[**kbstats_test**]
dbname=kb_test
host=
%%user%%=@@domm@@
=for newslide
stage.company.com:~$ less .pg_service.conf
[**kbstats**]
dbname=knowledge
host=stage.dbcluster.company.com
user=stage
password=Iy2sahpi
[**kbstats_test**]
dbname=knowledge_test
host=stage.dbcluster.company.com
user=stage_test
password=zuxo2Vah
=for newslide
stage.company.com:~$ less .pg_service.conf
[**kbstats**]
%%dbname%%=@@knowledge@@
host=stage.dbcluster.company.com
user=stage
password=Iy2sahpi
[**kbstats_test**]
%%dbname%%=@@knowledge_test@@
host=stage.dbcluster.company.com
user=stage_test
password=zuxo2Vah
=for newslide
stage.company.com:~$ less .pg_service.conf
[**kbstats**]
dbname=knowledge
host=stage.dbcluster.company.com
%%user%%=@@stage@@
password=Iy2sahpi
[**kbstats_test**]
dbname=knowledge_test
host=stage.dbcluster.company.com
%%user%%=@@stage_test@@
password=zuxo2Vah
=for newslide
stage.company.com:~$ less .pg_service.conf
[**kbstats**]
dbname=knowledge
host=stage.dbcluster.company.com
user=stage
%%password%%=@@Iy2sahpi@@
[**kbstats_test**]
dbname=knowledge_test
host=stage.dbcluster.company.com
user=stage_test
%%password%%=@@zuxo2Vah@@
=for newslide
So we now have a clean separation of C we want to connect, and C to connect it.
In the app code, we just specify the C we need.
And in F<.pg_service.conf> we specify, how each C connects on the current machine / environment.
=for newslide
We can now use the full power of Postgres' fine-grained access control methods!
We just need to make each and every distinct program, script, daemon, cron-job etc that makes up our app use a distinct service name
and then define each service in F
=for newslide
Here's a method we use to "calculate" the service name from the currently running program:
=for newslide
sub dbservice {
my **$base** = '@@kb@@';
}
=for newslide
sub dbservice {
my $base = 'kb';
return $base . '@@_test@@' if $ENV{%%HARNESS_ACTIVE%%};
}
=for newslide
sub dbservice {
my $base = 'kb';
return $base . '_test' if $ENV{HARNESS_ACTIVE};
my **$bin** = @@$0@@;
}
In Perl, C<$0> contains the name of the currently running program
=for newslide
project/bin/web_app.psgi
project/cron/generate_report.pl
project/daemon/do_some_work.pl
=for newslide
sub dbservice {
my $base = 'kb';
return $base . '_test' if $ENV{HARNESS_ACTIVE};
my $bin = $0;
$bin =~ **s**{^.+/(.+)?}{$1};
$bin =~ **s**{(\.+)$}{};
}
=for newslide
project/bin/web_app.psgi
project/cron/generate_report.pl
project/daemon/do_some_work.pl
=for newslide
web_app
generate_report
do_some_work
=for newslide
sub dbservice {
my $base = 'kb';
return $base . '_test' if $ENV{HARNESS_ACTIVE};
my $bin = $0;
$bin =~ s{^.+/(.+)?}{$1};
$bin =~ s{(\.+)$}{};
return **$base** . '**_**' . **$bin**;
}
=for newslide
kb_web_app
kb_generate_report
kb_do_some_work
=for newslide
# @@live@@
[**kb_web_app**]
dbname=knowledge
application_name=kb_web_app
host=dbcluster
@@user@@=%%web_app%%
password=ahN0rue1
=for newslide
# @@my laptop@@
[**kb_web_app**]
dbname=knowledge
application_name=kb_web_app
host=
@@user@@=%%domm%%
=for newslide
GRANT **ALL** ON ALL TABLES IN SCHEMA stats TO @@web_app@@;
=for newslide
# @@live@@
[**kb_generate_report**]
dbname=knowledge
application_name=kb_generate_report
host=dbcluster
@@user@@=%%generate_report%%
password=maH2dooX
=for newslide
C needs less privileges:
=for newslide
GRANT **SELECT** ON ALL TABLES IN SCHEMA stats TO @@generate_report@@;
=for newslide
GRANT SELECT ON ALL TABLES IN SCHEMA stats TO generate_report;
GRANT **INSERT, UPDATE** ON stats.report TO @@generate_report@@;
=for newslide
F makes connecting your app to your DB easy & clean
=head2 Correcting ownership
Going back to the old app I inherited, all tables, sequences etc in this database belong to the role 'hans'
Which sucks.
So I wanted to change this to something a bit saner.
I could now create a lot of statements like
ALTER TABLE foo OWNER TO stats;
One for each table, and more for all other things that can be owned.
=for newslide
I could inspect Postgres to get all the tables etc
for tbl in `psql -qAt -c "select tablename from pg_tables
where schemaname = 'public';" YOUR_DB` ;
do psql -c "alter table \"$tbl\" owner to NEW_OWNER" YOUR_DB ; done
And repeat that for sequences etc
=for newslide
Before doing that, I hit google, which came up with this stackoverflow post
http://stackoverflow.com/questions/1348126/modify-owner-on-all-tables-simultaneously-in-postgresql
Modify OWNER on all tables simultaneously in PostgreSQL
which offered:
=head3 2: REASSIGN OWNED
REASSIGN OWNED BY old_role [, ...] TO new_role
So all I had to do was
REASSIGN OWNED BY hans TO stats;
done!
=head2 I suck at typing
So I hit C a lot when typing my queries.
Which works for short queries, but for longer ones it starts to get annoying soon:
=for newslide
select to_char(ra.created,'YYYY-MM-DD'),ba.is_verified,count(*) from
realm_account ra, base_acccount ba
where ba.id= ra.base_account and ra.created > '2015-06-01'
and ra.realm = 4
group by 1,2 order by 1,2;
=for newslide
select to_char(ra.created,'YYYY-MM-DD'),ba.is_verified,count(*) from
realm_account ra, base_a**ccc**ount ba
where ba.id= ra.base_account and ra.created > '2015-06-01'
and ra.realm = 4
group by 1,2 order by 1,2;
=head3 3: \e
C<\e> lets you edit the last statement with your favorite editor
B
So instead of hitting my arrow keys a lot to navigate to C, I can do
\e
:s/ccc/cc/
:wq
Much nicer!
=head2 From - To Dates
I quite often have to limit the validity of some data to a given time range.
Here's an example of a table where we store B
=for newslide
CREATE TABLE %%term%% (
uuid uuid primary key not null default uuid_generate_v4(),
internal_name text not null,
version integer not null default 1,
text text,
valid_from timestamp with time zone,
valid_until timestamp with time zone
)
=for newslide
CREATE TABLE %%term%% (
**uuid** uuid primary key not null default uuid_generate_v4(),
internal_name text not null,
version integer not null default 1,
text text,
valid_from timestamp with time zone,
valid_until timestamp with time zone
)
=for newslide
CREATE TABLE %%term%% (
uuid uuid primary key not null default uuid_generate_v4(),
**internal_name** text not null,
**version** integer not null default 1,
**text** text,
valid_from timestamp with time zone,
valid_until timestamp with time zone
)
=for newslide
CREATE TABLE %%term%% (
uuid uuid primary key not null default uuid_generate_v4(),
internal_name text not null,
version integer not null default 1,
text text,
**valid_from** @@timestamp with time zone@@,
**valid_until** @@timestamp with time zone@@
)
=for newslide
We need to know from when to when a term is valid.
So we can ask users to agree to the currently valid set of terms.
=for newslide
Now, we usually know when a term starts to be valid:
valid_from timestamp with time zone **default now()**,
But we don't know when it will end
=for newslide
So we could use C to indicate an unknown end-date
But this makes querying annoying
SELECT uuid FROM term WHERE
valid_until > now()
OR valid_until is NULL;
=for newslide
We could use a date in the far future as the default
valid_until timestamp with time zone **default '2100-01-01'**
Now we have a simple query again:
SELECT uuid FROM term WHERE valid_until > now();
=for newslide
But what if I happen to live until 2100-01-01?
I don't want to get a call from an angry customer during my New Year Party
especially not when I'm 124 years old...
=head3 4: infinity
I like infinity as a philosophical concept.
And I also like that Postgres date/time fields provide the special values C and C<-infinity>
valid_until timestamp with time zone **default 'infinity'**
=for newslide
We can still use the simple query:
SELECT uuid FROM term WHERE valid_until > now();
But now it will work forever.
Yay!
=head2 Querying From-To-Dates
Now, of course the real query to get the currently active terms needs to check C and C:
SELECT uuid FROM term WHERE
valid_from < now() AND valid_until > now();
This is not very complex, but I still mix up the correct comparison operators and/or the positioning of the fields
SELECT uuid FROM term WHERE
now() < valid_from AND valid_until < now();
wrong, but hard to see
=head3 5: between
a BETWEEN x AND y
is equivalent to
a >= x AND a <= y
=for newslide
SELECT uuid FROM term WHERE
now() BETWEEN valid_from AND valid_until;
Much nicer to read and understand
=head2 CSV Export
Sometimes a customer asks me for a one-off CSV export of some data.
I used to write a short script to get the data and format it as CSV.
I was young and needed the money.
=head3 6: COPY .. as CSV
copy (select stuff from table) to '/tmp/export.csv' (FORMAT CSV)
nice
copy (select stuff from table) to '/tmp/export.csv' (
FORMAT CSV,
DELIMITER '|',
HEADER
)
=head2 Job Queues / Event Notifications
Sooner or later your app will need to do something that takes a long time.
Longer than a user (or your web proxy) will wait.
So you want to offload this long-running task to an external program
tell the user to wait a bit
start the job
and wait for the job to be done
(or ask "Are we there yet?" every few seconds..)
In the end you want to present the result to the user.
=for newslide
As we live in the future, we now have lots of tools and frameworks to implement queues and event sourcing
AMQP, RabbitMQ, Redis, Kafka, ...
Yesterday we also learned about Resque
But sometimes you do not want to add yet another piece to your Jenga tower
=for newslide
So you implement a job queue in your Postgres database
=for newslide
CREATE TABLE %%job%% (
**id** serial primary key,
name text,
args jsonb,
is_running boolean not null default false,
is_finished boolean not null default false,
has_error boolean not null default false,
error text
)
=for newslide
CREATE TABLE %%job%% (
id serial primary key,
**name** text,
**args** jsonb,
is_running boolean not null default false,
is_finished boolean not null default false,
has_error boolean not null default false,
error text
)
=for newslide
CREATE TABLE %%job%% (
id serial primary key,
name text,
args jsonb,
**is_running** boolean not null default false,
**is_finished** boolean not null default false,
**has_error** boolean not null default false,
error text
)
=for newslide
INSERT INTO job (name,args) VALUES (
'fib',
'{"class":"CalcFib","number":"123"}'
);
=for newslide
# %%Worker%%
my $sth = $dbh->prepare(
"select id,name,args from job where
is_running = false and is_finished = false"
);
$sth->execute;
while (my ($id, $name, $args) = $sth->fetchrow_array) {
$dbh->do("update job set is_running = true where id = ?", undef, $id);
JobRunner->start($name, $args);
$dbh->do("update job set is_running = false, is_finished = true
where id = ?", undef, $id);
}
=for newslide
# Worker
my **$sth** = $dbh->prepare(
"**select** id,name,args from job where
@@is_running@@ = %%false%% and @@is_finished@@ = %%false%%"
);
$sth->execute;
while (my ($id, $name, $args) = $sth->fetchrow_array) {
$dbh->do("update job set is_running = true where id = ?", undef, $id);
JobRunner->start($name, $args);
$dbh->do("update job set is_running = false, is_finished = true
where id = ?", undef, $id);
}
=for newslide
# Worker
my $sth = $dbh->prepare(
"select id,name,args from job where
is_running = false and is_finished = false"
);
$sth->execute;
@@while@@ (my ($id, $name, $args) = **$sth**->%%fetchrow_array%%) {
$dbh->do("update job set is_running = true where id = ?", undef, $id);
JobRunner->start($name, $args);
$dbh->do("update job set is_running = false, is_finished = true
where id = ?", undef, $id);
}
=for newslide
# Worker
my $sth = $dbh->prepare(
"select id,name,args from job where
is_running = false and is_finished = false"
);
$sth->execute;
while (my ($id, $name, $args) = $sth->fetchrow_array) {
$dbh->**do**("**update** job set @@is_running@@ = %%true%% where id = ?", undef, $id);
JobRunner->start($name, $args);
$dbh->do("update job set is_running = false, is_finished = true
where id = ?", undef, $id);
}
=for newslide
# Worker
my $sth = $dbh->prepare(
"select id,name,args from job where
is_running = false and is_finished = false"
);
$sth->execute;
while (my ($id, $name, $args) = $sth->fetchrow_array) {
$dbh->do("update job set is_running = true where id = ?", undef, $id);
@@JobRunner@@->**start**($name, $args);
$dbh->do("update job set is_running = false, is_finished = true
where id = ?", undef, $id);
}
=for newslide
# Worker
my $sth = $dbh->prepare(
"select id,name,args from job where
is_running = false and is_finished = false"
);
$sth->execute;
while (my ($id, $name, $args) = $sth->fetchrow_array) {
$dbh->do("update job set is_running = true where id = ?", undef, $id);
JobRunner->start($name, $args);
$dbh->**do**("**update** job set @@is_running@@ = %%false%%, @@is_finished@@ = %%true%%
where id = ?", undef, $id);
}
=for newslide
There's so much wrong with this code...
Should use transactions
Should fork the JobRunner
which is very hard to get right regarding the transactions
it's too ugly to even think about the code any longer...
=head3 7: NOTIFY
Postgres provides a simple notification system that can be used to implement job queues
NOTIFY jobs, '{"class":"CalcFib","number":"123"}';
To receive those notification, you have to C:
LISTEN jobs;
=for newslide
And when a notification happens, you get:
Asynchronous notification "jobs" with payload
"{"class":"CalcFib","number":"123"}"
received from server process with PID 4428.
=for newslide
# @@Worker@@
$self->dbh->do( 'LISTEN jobs' );
while ( my $notification = $self->dbh->pg_notifies ) {
my ( $name, $pid, $payload ) = @$notification;
JobRunner->start($name, $payload);
}
=for newslide
# Worker
$self->dbh->do( '**LISTEN jobs**' );
while ( my $notification = $self->dbh->pg_notifies ) {
my ( $name, $pid, $payload ) = @$notification;
JobRunner->start($name, $payload);
}
=for newslide
# Worker
$self->dbh->do( 'LISTEN jobs' );
while ( my $notification = $self->@@dbh@@->**pg_notifies** ) {
my ( $name, $pid, $payload ) = @$notification;
JobRunner->start($name, $payload);
}
=for newslide
# Worker
$self->dbh->do( 'LISTEN jobs' );
while ( my **$notification** = $self->dbh->pg_notifies ) {
my ( $name, $pid, $payload ) = @$notification;
JobRunner->start($name, $payload);
}
=for newslide
# Worker
$self->dbh->do( 'LISTEN jobs' );
while ( my $notification = $self->dbh->pg_notifies ) {
my ( @@$name@@, @@$pid@@, @@$payload@@ ) = **@$notification**;
JobRunner->start($name, $payload);
}
=for newslide
# Worker
$self->dbh->do( 'LISTEN jobs' );
while ( my $notification = $self->dbh->pg_notifies ) {
my ( $name, $pid, $payload ) = @$notification;
@@JobRunner@@->**start**($name, @@$payload@@);
}
=for newslide
This very simple solution does not keep track of past jobs
and does not report errors
Job::Machine adds all of this, and some more
https://metacpan.org/pod/Job::Machine
it's of course implemented using C
=head3 NOTIFY for events
For a recent Microservice-based project, we had to implement some cross-database notifications
If some fields in one service are touched, a notification should be sent to some other services
=for newslide
CREATE TABLE %%thing%% (
id serial primary key,
modified timestamp with time zone not null default now(),
type text,
data jsonb
);
=for newslide
CREATE TABLE thing (
**id** serial primary key,
**modified** timestamp with time zone not null default now(),
**type** text,
data jsonb
);
=for newslide
CREATE TABLE thing (
id serial primary key,
modified timestamp with time zone not null default now(),
type text,
**data** jsonb
);
=for newslide
CREATE OR REPLACE **FUNCTION** @@thing_notify()@@ RETURNS TRIGGER AS $$
BEGIN
=for newslide
IF (%%TG_OP%% = '**DELETE**') THEN
PERFORM pg_notify(
'thing_notify',
TG_OP ||';'|| OLD.type ||';'|| OLD.id ||';'|| OLD.modified
);
RETURN OLD;
=for newslide
IF (TG_OP = 'DELETE') THEN
PERFORM **pg_notify**(
'@@thing_notify@@',
%%TG_OP%% ||';'|| OLD.%%type%% ||';'|| OLD.%%id%% ||';'|| OLD.%%modified%%
);
RETURN OLD;
=for newslide
ELSIF (%%TG_OP%% = '**UPDATE**') THEN
PERFORM pg_notify(
'thing_notify',
TG_OP ||';'|| NEW.type ||';'|| NEW.id ||';'|| NEW.modified
);
RETURN NEW;
=for newslide
ELSIF (TG_OP = 'UPDATE') THEN
PERFORM **pg_notify**(
'@@thing_notify@@',
%%TG_OP%% ||';'|| NEW.%%type%% ||';'|| NEW.%%id%% ||';'|| NEW.%%modified%%
);
RETURN NEW;
=for newslide
ELSIF (%%TG_OP%% = '**INSERT**') THEN
PERFORM pg_notify(
'thing_notify',
TG_OP ||';'|| NEW.type ||';'|| NEW.id ||';'|| NEW.modified
);
RETURN NEW;
=for newslide
ELSIF (TG_OP = 'INSERT') THEN
PERFORM **pg_notify**(
'@@thing_notify@@',
%%TG_OP%% ||';'|| NEW.%%type%% ||';'|| NEW.%%id%% ||';'|| NEW.%%modified%%
);
RETURN NEW;
=for newslide
END IF;
END;
$$ LANGUAGE plpgsql;
=for newslide
CREATE **TRIGGER** @@trigger_thing_notify@@
AFTER INSERT OR UPDATE OR DELETE
ON thing
FOR EACH ROW
EXECUTE PROCEDURE thing_notify();
=for newslide
CREATE TRIGGER trigger_thing_notify
**AFTER INSERT OR UPDATE OR DELETE**
ON thing
FOR EACH ROW
EXECUTE PROCEDURE thing_notify();
=for newslide
CREATE TRIGGER trigger_thing_notify
AFTER INSERT OR UPDATE OR DELETE
ON @@thing@@
FOR EACH ROW
EXECUTE PROCEDURE thing_notify();
=for newslide
CREATE TRIGGER trigger_thing_notify
AFTER INSERT OR UPDATE OR DELETE
ON thing
FOR EACH ROW
**EXECUTE PROCEDURE** @@thing_notify()@@;
=for newslide
db=# LISTEN thing_notify;
=for newslide
db=# LISTEN thing_notify;
LISTEN
=for newslide
db=# LISTEN thing_notify;
LISTEN
db=# INSERT INTO thing (type,data) VALUES
('Fib','{"class":"CalcFib","number":"123"}');
=for newslide
db=# LISTEN thing_notify;
LISTEN
db=# INSERT INTO thing (type,data) VALUES
('Fib','{"class":"CalcFib","number":"123"}');
Asynchronous notification "thing_notify" with payload
"**INSERT**;%%Fib%%;@@4@@;**2015-10-28 20:52:41.581392+01**"
received from server process with PID 4428.
=for newslide
domm=# UPDATE thing SET type = 'Test' WHERE id = 4;
=for newslide
domm=# UPDATE thing SET type = 'Test' WHERE id = 4;
Asynchronous notification "thing_notify" with payload
"**UPDATE**;%%Test%%;4;2015-10-28 14:47:43.866663+01"
received from server process with PID 4428.
=for newslide
Here's some Perl code to handle those notifications in a Worker process:
=for newslide
sub run {
my $self = shift;
$log->infof( "Starting up Notifier, listening on %s",
$self->channel );
$self->dbh->do( 'LISTEN ' . $self->channel );
my $checker = AnyEvent->timer(
interval => 1,
cb => sub {
$self->get_notifications;
}
);
AnyEvent->condvar->recv;
}
=for newslide
sub run {
my $self = shift;
$log->infof( "Starting up Notifier, listening on %s",
$self->channel );
$self->dbh->do( '**LISTEN** ' . $self->@@channel@@ );
my $checker = AnyEvent->timer(
interval => 1,
cb => sub {
$self->get_notifications;
}
);
AnyEvent->condvar->recv;
}
=for newslide
sub run {
my $self = shift;
$log->infof( "Starting up Notifier, listening on %s",
$self->channel );
$self->dbh->do( 'LISTEN ' . $self->channel );
my $checker = **AnyEvent**->timer(
interval => 1,
cb => sub {
$self->get_notifications;
}
);
AnyEvent->condvar->recv;
}
=for newslide
sub run {
my $self = shift;
$log->infof( "Starting up Notifier, listening on %s",
$self->channel );
$self->dbh->do( 'LISTEN ' . $self->channel );
my $checker = AnyEvent->**timer**(
interval => 1,
cb => sub {
$self->get_notifications;
}
);
AnyEvent->condvar->recv;
}
=for newslide
sub run {
my $self = shift;
$log->infof( "Starting up Notifier, listening on %s",
$self->channel );
$self->dbh->do( 'LISTEN ' . $self->channel );
my $checker = AnyEvent->timer(
@@interval@@ => **1**,
cb => sub {
$self->get_notifications;
}
);
AnyEvent->condvar->recv;
}
=for newslide
sub run {
my $self = shift;
$log->infof( "Starting up Notifier, listening on %s",
$self->channel );
$self->dbh->do( 'LISTEN ' . $self->channel );
my $checker = AnyEvent->timer(
interval => 1,
@@cb@@ => **sub** %%{%%
$self->get_notifications;
%%}%%
);
AnyEvent->condvar->recv;
}
=for newslide
sub run {
my $self = shift;
$log->infof( "Starting up Notifier, listening on %s",
$self->channel );
$self->dbh->do( 'LISTEN ' . $self->channel );
my $checker = AnyEvent->timer(
interval => 1,
cb => sub {
@@$self@@->**get_notifications**;
}
);
AnyEvent->condvar->recv;
}
=for newslide
sub **get_notifications** {
my $self = shift;
while ( my $ret = $self->dbh->pg_notifies ) {
my $payload = $ret->[2];
$self->handle_notification( $payload );
}
}
=for newslide
sub get_notifications {
my $self = shift;
@@while@@ ( my $ret = $self->dbh->**pg_notifies** ) {
my $payload = $ret->[2];
$self->handle_notification( $payload );
}
}
=for newslide
sub get_notifications {
my $self = shift;
while ( my $ret = $self->dbh->pg_notifies ) {
my @@$payload@@ = **$ret->[2]**;
$self->handle_notification( $payload );
}
}
=for newslide
sub get_notifications {
my $self = shift;
while ( my $ret = $self->dbh->pg_notifies ) {
my $payload = $ret->[2];
$self->**handle_notification**( @@$payload@@ );
}
}
=for newslide
sub **handle_notification** {
my ( $self, $payload ) = @_;
my ( $op, $type, $id, $ts ) = split( /;/, $payload, 4 );
# do something with $type & $id
}
=for newslide
sub handle_notification {
my ( $self, $payload ) = @_;
my ( $op, $type, $id, $ts ) = **split**( /@@;@@/, %%$payload%%, 4 );
# do something with $type & $id
}
=for newslide
sub handle_notification {
my ( $self, $payload ) = @_;
my ( **$op**, %%$type%%, @@$id@@, **$ts** ) = split( /;/, $payload, 4 );
# do something with $type & $id
}
=for newslide
sub handle_notification {
my ( $self, $payload ) = @_;
my ( $op, $type, $id, $ts ) = split( /;/, $payload, 4 );
# **do something** with $type & $id
}
=head2 Stupid queries
=for img wundertuete_1.png
=for newslide
=for img wundertuete_detail.png
=for newslide
getting this status was already implemented for a single thing, so for the list we just called the relevant endpoint for each item
dumb!
=for newslide
=for img wundertuete_viel.png
=for newslide
Lots of HTTP requests
Very slow
=for newslide
So we packed all of the status-requests into one http request
and used the old per-thing code
It turned out that the performance problem was not caused by the HTTP overhead, or the user initiation
it was caused by all of the checks taking ~.5 seconds per thing
=for newslide
So a HTTP request to get the status of 10 things would take 5 seconds
Nginx could not be bothered to wait that long and returned a timeout
Not so good.
=for newslide
I did some thinking and replaced the iterative per-thing status check with a rather long query
so we can issue one HTTP request containing a set of ids
which shall return the user-status for each campaign
And this query uses...
=head3 8: Common Table Expressions
I like to think of CTEs as very temporary helper tables, and they can help making complex things doable.
=for newslide
So, with some hand-waving, here is the schema this example is based on:
=for newslide
CREATE TABLE %%campaign%% (
id serial primary key,
uuid uuid NOT NULL,
title text NOT NULL,
type text NOT NULL,
valid_from timestamp with time zone,
valid_until timestamp with time zone,
data jsonb
);
=for newslide
CREATE TABLE campaign (
**id** serial primary key,
**uuid** uuid NOT NULL,
title text NOT NULL,
type text NOT NULL,
valid_from timestamp with time zone,
valid_until timestamp with time zone,
data jsonb
);
=for newslide
CREATE TABLE campaign (
id serial primary key,
uuid uuid NOT NULL,
**title** text NOT NULL,
**type** text NOT NULL,
valid_from timestamp with time zone,
valid_until timestamp with time zone,
**data** jsonb
);
=for newslide
CREATE TABLE campaign (
id serial primary key,
uuid uuid NOT NULL,
title text NOT NULL,
type text NOT NULL,
**valid_from** timestamp with time zone,
**valid_until** timestamp with time zone,
data jsonb
);
=for newslide
CREATE TABLE %%usr_campaign_participation%% (
uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
created timestamp with time zone DEFAULT now() NOT NULL,
usr integer,
campaign integer
);
=for newslide
CREATE TABLE usr_campaign_participation (
uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
created timestamp with time zone DEFAULT now() NOT NULL,
**usr** integer,
**campaign** integer
);
=for newslide
The questions I want answered are:
Which campaigns are currently active, expired or pending?
Has the user already participated in any of those campaign?
If it's a sweepstake, how many tickets does the user own?
How many tickets have been issued in total?
=head4 Which campaigns are currently active, expired or pending?
SELECT
c.uuid,
c.type,
now() < c.valid_from AS pending,
now() > c.valid_until AS expired,
c.data->'points' AS points
FROM
campaign c
WHERE
c.uuid in ($uuids)
=for newslide
SELECT
c.uuid,
c.type,
**now() < c.valid_from** AS @@pending@@, %%-- boolean%%
now() > c.valid_until AS expired,
c.data->'points' AS points
FROM
campaign c
WHERE
c.uuid in ($uuids)
=for newslide
SELECT
c.uuid,
c.type,
now() < c.valid_from AS pending,
now() > c.valid_until AS expired,
**c.data->'points'** AS points %%-- grab into json object%%
FROM
campaign c
WHERE
c.uuid in ($uuids)
=head4 How many tickets have been issued in total?
SELECT
c.uuid, count(*) AS total_cnt
FROM
usr_campaign_participation p, campaign c
WHERE
c.id = p.campaign
AND c.type like 'sweepstake%'
AND c.uuid in ($uuids)
GROUP BY 1
=head4 How many tickets have been issued to the user?
special case of
=head4 Which campaigns has a user participated in?
SELECT
c.uuid, count(*) AS usr_cnt
FROM
usr_campaign_participation p, campaign c
WHERE
c.id = p.campaign
AND p.usr = ?
AND c.uuid in ($uuids)
GROUP BY 1
=head3 Pulling it together
WITH some_name AS (
SELECT id,something FROM somewhere
)
SELECT
table.a_value, some_name.something
FROM
table, some_name
WHERE
table.id = some_name.id
=for newslide
%%WITH%% **some_name** %%AS%% (
@@SELECT id,something FROM somewhere@@
)
SELECT
table.a_value, some_name.something
FROM
table, some_name
WHERE
table.id = some_name.id
=for newslide
WITH **some_name** AS (
SELECT something FROM somewhere
)
%%SELECT%%
table.a_value, **some_name**.something
FROM
table, **some_name**
WHERE
table.id = **some_name**.id
=for newslide
WITH participation_usr AS (
SELECT c.uuid, count(*) AS usr_cnt ...
),
participation_total AS (
SELECT c.uuid, count(*) AS total_cnt ...
)
SELECT
participation_total.total_cnt,
participation_usr.usr_cnt,
c.uuid, c.type, ...
FROM
campaign c
LEFT JOIN
participation_usr ON participation_usr.uuid = c.uuid
LEFT JOIN
participation_total ON participation_total.uuid = c.uuid
WHERE
c.uuid in ($uuids)
=for newslide
%%WITH%% **participation_usr** %%AS%% (
SELECT c.uuid, count(*) AS usr_cnt ...
),
**participation_total** %%AS%% (
SELECT c.uuid, count(*) AS total_cnt ...
)
SELECT
participation_total.total_cnt,
participation_usr.usr_cnt,
c.uuid, c.type, ...
FROM
campaign c
LEFT JOIN
participation_usr ON participation_usr.uuid = c.uuid
LEFT JOIN
participation_total ON participation_total.uuid = c.uuid
WHERE
c.uuid in ($uuids)
=for newslide
WITH participation_usr AS (
SELECT c.uuid, count(*) AS usr_cnt ...
),
participation_total AS (
SELECT c.uuid, count(*) AS total_cnt ...
)
%%SELECT%%
**participation_total**.total_cnt,
**participation_usr**.usr_cnt,
c.uuid, c.type, ...
FROM
campaign c
LEFT JOIN
participation_usr ON participation_usr.uuid = c.uuid
LEFT JOIN
participation_total ON participation_total.uuid = c.uuid
WHERE
c.uuid in ($uuids)
=for newslide
WITH participation_usr AS (
SELECT c.uuid, count(*) AS usr_cnt ...
),
participation_total AS (
SELECT c.uuid, count(*) AS total_cnt ...
)
SELECT
participation_total.total_cnt,
participation_usr.usr_cnt,
c.uuid, c.type, ...
FROM
campaign c
**LEFT JOIN**
participation_usr ON participation_usr.uuid = c.uuid
**LEFT JOIN**
participation_total ON participation_total.uuid = c.uuid
WHERE
c.uuid in ($uuids)
=for newslide
type | total | usr | pending | expired
---------------------+-------+-----+---------+---------
coupon/absolute | | | t | f
coupon/absolute | | | f | f
coupon/free | | 1 | f | f
coupon/percent | | | f | t
sweepstake/standard | 9028 | 1 | f | f
sweepstake/standard | 4692 | 2 | f | f
sweepstake/standard | 326 | | f | f
sweepstake/standard | 2444 | 1 | f | f
=for newslide
=for img wundertuete_avg_req_time.png
=head2 Trees
=for img stats_tree.png
=for newslide
Generating trees is sort of hard
it's easier when using recursion
Postgres supports recursive queries
=head3 9: WITH RECURSIVE
Now, I cannot say that I really understand recursive queries completely
To me it feels more like I'm trying a magic spell after seeking a glance of the instructions from my master's spell book:
It seems to work, but I might summon a daemon by mistake..
=for newslide
WITH RECURSIVE tree (id, parent, internal_name, level, path) AS (
SELECT root_node
UNION
SELECT recursion
JOIN some_name on some_name.field = root.other_field
)
SELECT * from tree order by path;
=for newslide
SELECT root_node
UNION
SELECT recursion
JOIN some_name on some_name.field = root.other_field
=for newslide
-- root_node
SELECT s.id, s2s.parent, s.internal_name, 0, s.internal_name
FROM source s
LEFT JOIN source_to_source s2s
ON s.id = s2s.child
WHERE s.id = 1244638
=for newslide
UNION
=for newslide
-- recursion
SELECT
s2s.child, s2s.parent, s.internal_name,
level +1, path || '/' || s.internal_name
FROM source_to_source s2s
JOIN source s
ON s.id = s2s.child
JOIN tree
ON s2s.parent = tree.id
=for newslide
SELECT * from tree order by path;
=for newslide
WITH RECURSIVE tree
(id, parent, internal_name, level, path) AS (
SELECT s.id, s2s.parent, s.internal_name, 0 ,s.internal_name
FROM source s
LEFT JOIN source_to_source s2s
ON s.id = s2s.child
WHERE s.id = 1244638
UNION
SELECT
s2s.child, s2s.parent, s.internal_name,
level +1, path || '/' || s.internal_name
FROM source_to_source s2s
JOIN source s
ON s.id = s2s.child
JOIN tree
ON s2s.parent = tree.id
)
SELECT * FROM tree order by path
=for newslide
id | parent | internal_name | level | path
---------+---------+---------------+-------+-------------------------------------------
1254195 | | 10403 | 0 | 10403
536977 | 1254195 | 10203 | 1 | 10403/10203
1167420 | 1254195 | 10375 | 1 | 10403/10375
1244638 | 1254195 | 10400 | 1 | 10403/10400
1169241 | 1244638 | 10377 | 2 | 10403/10400/10377
1075499 | 1169241 | 10321 | 3 | 10403/10400/10377/10321
1259720 | 1169241 | 10404 | 3 | 10403/10400/10377/10404
1295846 | 1259720 | 10410 | 4 | 10403/10400/10377/10404/10410
1270124 | 1295846 | 10407 | 5 | 10403/10400/10377/10404/10410/10407
627220 | 1270124 | 10227 | 6 | 10403/10400/10377/10404/10410/10407/10227
640892 | 1270124 | 10230 | 6 | 10403/10400/10377/10404/10410/10407/10230
1181861 | 1244638 | 10378 | 2 | 10403/10400/10378
1083882 | 1181861 | 10323 | 3 | 10403/10400/10378/10323
1084352 | 1181861 | 10324 | 3 | 10403/10400/10378/10324
=head2 Questions?
Or do you want to share your favourite Postgres feature?
=head2 Bonus: Answers to previous questions
NOTIFY payload has a max length of 8000 bytes.
Instead of concatenating the payload, I could use C. Which might be problematic given the 8000 bytes limit...
We can use row-level locking with the new SKIP LOCKED option to implement a saner job queue using a table.
=head2 Bonus: Typos aborting transactions
domm=#
=for newslide
domm=# BEGIN;
BEGIN
domm=#
=for newslide
domm=# BEGIN;
BEGIN
domm=# insert into test (foo) value ('bar');
ERROR: syntax error at or near "value"
LINE 1: insert into test (foo) value ('bar');
domm=#
=for newslide
domm=# BEGIN;
BEGIN
domm=# insert into test (foo) value ('bar');
ERROR: syntax error at or near "value"
LINE 1: insert into test (foo) value ('bar');
domm=# insert into test (foo) value**s** ('bar');
=for newslide
domm=# BEGIN;
BEGIN
domm=# insert into test (foo) value ('bar');
ERROR: syntax error at or near "value"
LINE 1: insert into test (foo) value ('bar');
domm=# insert into test (foo) value**s** ('bar');
ERROR: current transaction is aborted, commands ignored
until end of transaction block
=head3 10: ON_ERROR_ROLLBACK = interactive
domm=# \set ON_ERROR_ROLLBACK interactive
=for newslide
domm=# \set ON_ERROR_ROLLBACK interactive
domm=# BEGIN;
BEGIN
domm=#
=for newslide
domm=# \set ON_ERROR_ROLLBACK interactive
domm=# BEGIN;
BEGIN
domm=# insert into test (foo) value ('bar');
ERROR: syntax error at or near "value"
LINE 1: insert into test (foo) value ('bar');
domm=#
=for newslide
domm=# \set ON_ERROR_ROLLBACK interactive
domm=# BEGIN;
BEGIN
domm=# insert into test (foo) value ('bar');
ERROR: syntax error at or near "value"
LINE 1: insert into test (foo) value ('bar');
domm=# insert into test (foo) value**s** ('bar');
=for newslide
domm=# \set ON_ERROR_ROLLBACK interactive
domm=# BEGIN;
BEGIN
domm=# insert into test (foo) value ('bar');
ERROR: syntax error at or near "value"
LINE 1: insert into test (foo) value ('bar');
domm=# insert into test (foo) value**s** ('bar');
INSERT 0 1
domm=# COMMIT;
=for newslide
I learned that during Postgres Conf 2015 from Florian Helmberger / https://25th-floor.com/
=for newslide
=head2 Bonus: \x, \o, \t
=head3 \x
C<\x> toggle "expandend mode"
Instead of one long line per row (which can be hard to read)
you get tabular output
=for newslide
knowledge=# select * from stats.field limit 1;
=for newslide
id | created | modified |
is_deleted | universe | internal_name | title | is_required | formula
| type | description | formhandler
| has_multival | valid_from |
valid_until | calc_type | is_source_attribute | is_clonable | include_in_token_export
------+-------------------------------+-------------------------------+----
--------+----------+---------------+------------+-------------+---------+----
-----+--------------+----------------------------------------------------------
-------------------+--------------+------------------------+-------------+--------
---+---------------------+-------------+-------------------------
1150 | 2015-07-14 16:34:48.02521+02 | 2015-07-14 16:34:48.02521+02 | f
| 1100 | some_text | Some Text | t | | Text | a
test field | {"name":"some_text","label":"Some Text","disabled":"0","type":"Text"}
| f | 2015-07-01 02:00:00+02 | infinity | skip | f
| f | f
=for newslide
knowledge=# \x
=for newslide
knowledge=# \x
knowledge=# select * from stats.field limit 1;
=for newslide
-[ RECORD 1 ]-----------+----------------------------------------------------------------------
id | 1150
created | 2015-07-14 16:34:48.02521+02
modified | 2015-07-14 16:34:48.02521+02
is_deleted | f
universe | 1100
internal_name | some_text
title | Some Text
is_required | t
formula |
type | Text
description | a test field
formhandler | {"name":"some_text","label":"Some Text","disabled":"0","type":"Text"}
has_multival | f
valid_from | 2015-07-01 02:00:00+02
valid_until | infinity
calc_type | skip
is_source_attribute | f
is_clonable | f
include_in_token_export | f
=head3 \o
output to a file
=for newslide
knowledge=# \o /tmp/foo
=for newslide
knowledge=# \o /tmp/foo
knowledge=# select * from stats.field;
=head3 \t
Tuples only
This turns off the "header" and "footer"
name of the columns and number of rows
=for newslide
knowledge=# select id from stats.field limit 1;
=for newslide
knowledge=# select id from stats.field limit 1;
id
------
1150
(1 row)
=for newslide
knowledge=# \t
=for newslide
knowledge=# \t
knowledge=# select id from stats.field limit 1;
=for newslide
knowledge=# \t
knowledge=# select id from stats.field limit 1;
1150
=for newslide
Great when you want to dump some ids to a file for processing with some other tools
=for newslide
knowledge=# \t
=for newslide
knowledge=# \t
knowledge=# \o /tmp/foo
=for newslide
knowledge=# \t
knowledge=# \o /tmp/foo
knowledge=# select id from stats.field;
Now C will only contains ids
=head2 Thanks