Things I learned at PGConf.EU 2022

2022-10-28
⇦
⇧
⇨

The biggest yearly European Postgres, PGConf.EU took place from 26th to 28th October in Berlin. As I have a bunch of friends in Berlin and like Postgres, I decided to attend. I also submitted a talk but wasn't accepted (which in the end did not surprise me, as they got more than 260 submissions and could only accept ~80). So after a nice and productive 8h train ride from Vienna to Berlin, I spend three very informative days at the conference, met up with my Perl friends in a nice beer garden with a gigantic chestnut tree, and also spend a few nice evenings / breakfasts with various old friends who moved to Berlin ages ago. I got to cycle around quite a lot (very nice), but missed the Critical Mass on Friday (hanging around with friends was nicer...)

Here are my slightly edited notes about what I learned (or at least try to remember) and/or found funny, directly from the train ride back to Vienna. The notes might contain errors and misunderstandings, so please do check out the slides of the talks, which are hopefully linked from the respective talk pages.

Day1

Efficient Graph Analysis with SQL/PGQ - Peter Boncz
- Happy that I don't have enough data to worry about CPU cache hits
- Cypher (graph query language) looks .. complex
- "ASCII Art graph syntax" : MATCH (x:Account)−[:isLocatedIn]−>(g:City)<−[:isLocatedIn]−(y:Account)
- SQL/PGQ is a new standard based on Cypher and similar prior art
- could be done with with recursive but comparable recursive queries are much more complex
- But it would be nice to have it in Postgres (and I think it's planned?)
Changelog vs Two-dimensional time - what works better? - Henrietta Dombrovskaya
- Similar to "Temporal Tables" (which is an SQL standard to limit the validity of data to a given time range), but with one more dimension (hence "bi-temporal")
- https://github.com/hettie-d/pg_bitemporal
- Can be done in Postgres thanks to support for range types, infinity and GIST index
- "You can do that easily in one gigantic CTE"
- SELECT * FROM bitemp WHERE now()<@asserted and now()<@effective
BRIN improvements and new opclasses - Tomas Vondra
- BTREEs are nice and useful, but large (on disk)
- BRIN only works (well) with proper data, eg append-only timestamps
- minmax_multi to work around outliers destorying index performance
- bloom-filter, good for uuid (do we need that for access-by-uuid? - not for primary key, so still need to check)
- "It's all about being smarter by not doing work that doesn't need to be done"
Performance tips you have never seen before - Hans-Jürgen Schönig
- latency matters (connect via localhost vs socket)
- column order matters for storage (first fixed size (int) due to "alignment", then relevant (esp for broad tables))
- index: hashtext(email), then select ... where hashtext(email) = hashtext('serach@email') and email ='search@email'
- pgstattuple
- FDW fetch_size
- Fulltext index needs vacuum regularly (because new entries will be put into a pending list, which has to be seqscanned after checking the index and is only merged during vacuum)
Understanding the impact of Materialized CTEs - Divya Sharma
- CTE always materialized to temporary table (before 12), 12+ has conditional materialization
- work_mem, enable_master
- recursive (or multiple references), "with sideeffect", MATERIALZE => always materialize
- generate_series is not only used by consultants (as suggested by Hans-Jürgen Schönig earlier), but in ~50% of talks at pgconfeu :-)
- NOT MATERIALIZED -> inline even though planer would not inline, which can be faster depending on indexes and the CTE. "It depends"
No Array of Sunshine: an Introduction to Data Types from a Data Quality Perspective - Renee Phillips
- Goldilocks and the tree data types => other talk to check out?
- Postgres is using 1-based array
- Don't use "NULL" as your vanity license plate
- True duplicates vs fake duplicates (eg times without timezone on timezone change)
- "You solve one problem, you create at least one more"
- Good talks, but a bit to basic for me
Inside the machine room of a world map: PostgreSQL and OpenStreetMap - Sarah Hoffmann
- OSM is not using PostGIS!!
- it has three main data types: node, way, relation
- OSM is "a experiment to get away with as little as possible to achieve world domination"
- it is BIG!
- World-Dump takes ~2 days, so they are generated from a restored backup
- tileID bit-interleaved lat/lon
- 9.5, because of index in transaction-id (which is not allowed after 9.6)
- osm2pgsql

Day 2

A comparison of PostgreSQL backup tools - Matt Pearson
- Don't store backups on same partition as the data
- pg_dump for migrations but not (good) backup, slow
- pg_dumpall: all databases, includes roles
- Barman
- pgBackRest
- Unfortunately there were a few easily avoidable speaker errors: very small font size on the flowchart, don't turn back on audience
How to handle 1000 application users - Laurenz Albe
- always nice to hear Austrian English accents
- connection to DB is expensive, takes lot of server resources
- lot of connections makes SQL statements more expensive (because each idle connection has to be checked
- work_mem is per connections, so one cannot have high work_mem with lots of connection
- pg_stats_activity wait_events is probably bad
- no way to limit the number of active connections
- => connection pool
- less active connection = less CPU task switching
- prepared statements cannot be use with transactional pooling. Fix: put statement into PLSQL function
- application server pooling possible (but probably not for plack when we fork lots of processes)
- pgBouncer
- no more active connections than CPU-Cores / parallel I/O requests
- short queries; idle_in_transaction time should be short
"How do you put an elephant in a container in 3 steps?"
- room full, so could not attend
Neon, cloud-native storage backend for PostgreSQL - Heikki Linnakangas
- "postgres without storage" / separate compute from storage
- not in Postgres, needs patches and extension
- very cool, but probably not something I'll use
- "works very well on the powerpoint platform" :-)
MVCC Unmasked - Bruce Momjian
- Multi Version Concurrency Control
- reader don't block writers, writers don't block readers
- function vs procedure: all functions use same snapshot, procedures can "commit" snapshots
- very nice snapshot rule / row visibility slide
- transaction (ids) have an "aborted" flag, so the txnid still is in xmax, but ignored
- another nice slide about page layout, item-pointer, data; dead items, index pointer
- Very good talk explaining MVCC very well!
Google AlloyDB vs. Amazon Aurora vs. Azure Hyperscale: comparison of databases build for clouds - Taras Kloba
- Nice summary slide at the start (with different types of cars)
- Each provider seem to change quite a bit in Postgres, not sure I like that
- Not too relevant for me, to cloudy
- But I found a power outlet and migrated my mastodon account to (https://social.linux.pizza/@domm)[https://social.linux.pizza/@domm]
Time-series, PostgreSQL, and You
- small room, lots of persons => this conf needs a better "query planner" :-)
- again I could not attend
What happens when stack overflow doesn’t have an answer? comparing ST_within & H3 for spatial queries - Kshitij Purwar
- Power plant emission stats by analysing arial / satelite images and checking exhausts plumes, very smart!
- Use H3 (by Uber) to improve performance of a spatial query
- uses a hexagonal grid
- H3 is hard to install
- audience comment: alternatives could be st_simplified / st_subdivide

Berlin.pm

At the Berlin.pm meeting I had a nice few hours of chats, good food and drinks (well, Fassbrause), found a big fan of oe1.orf.at and a wrench to fix my bike and also learned about * F4Maps * The Critical Mass mobile app which allows you to share your current location and see the location of other riders in the area, so you can find where the CM is currently (nice if you missed the start or want to wait until they pass by your current location).

Day 3

Full Page Writes in PostgreSQL - Critical, Evil or Both - Why you should care and how to optimize - Grant McAlister
- If you turn off full_pages_writes, your database will be much faster. But you'll have data corruption :-)
- I know nothing about WAL/pagewrites/checkpoints, ...
- touching pages is bad, inserting/indexing uuids touches a lot of pages because they are random
- nice slide on partitioning to reduce block size
- I should read some intro docs...
Practical transactions theory for PostgreSQL users - Ilya Kosmodemiansky
- ACID (ACI = memory, D = disk)
- Phase Locking: slow, deadlock ("Reißverschlussverfahren")
- Multi Version 2 Phase Locking = MVCC
- hah, more page explanation: data in 8kb chunks
- xmin/xmax again :-)
- still hard to read xmax not as xmas
- transaction handling happens in SharedMem, so Postgres processes one one machine can talk to each other fast
- "An elephant never forgets"
- Advisory locks for rare critical cases
- select xmin from table just works, I thought it needs some magic, but those fields are just invisible internal columns
Breaking the sound barrier - Applications at light speed - Henrietta Dombrovskaya, Jan Karremans
- data -> program -> data (punchcards)
- programs <-> data, needs DBMS (concurrency, security, ..)
- DBMS = "specialised software designed to manage data in the most efficient way"
- "Database agnostic Apps" are a hoax, no one switches databases (my comment: But there are database agnostic apps like CMS that target multiple DBMS)
- 3Ps: Performance, Performance, Performance
- Devs: "Database is the source of all slowness"; DBAs: "Yes, if devs write stupid queries or let even stupider ORMs generate the stupidest SQL" (my words)
- Relational DB and OO App hate each other = object relation impedance mismatch
- Business Logic split into "Application business logic" (in the app) and "Data business logic" (in the DBMS)
- NORM - No ORM
- App <=> Transfer Model (JSON) <=> DB
- Generate DB functions from JSON Schema
- Audience Comment: Devs don't think with the relation model, Maybe relational DBMS are the problem?
Lunch break
- What are the odds of two people meeting at 600 person conference that not only know about MARC21 034$s + $t, but have recently submitted some patches regarding this field?
- Whatever the odds, it did happen :-)
Table Partitioning - Transparent but No Magic - Boriss Mejias
- The first known partitioning implementation was Moses receiving the 10 commandments on two stone tablets
- timestamp(0) - no milliseconds, not needed for a lot of things, smaller tables, better
- Very nice use of a game controller to illustrate that an index does not fix into RAM
- [di, dj[ - inclusive range vs exclusive range
- partitions are transparent, so select count(*) has to go through all partitioned tables
- dynamic pruning: select where date > now() - 1 month will pick Oct and Sept
- partition key does not need to be in where clause, could be in order_by
- might plan all the partition, but does not execute all of them ("only clean up kitchen")
- getting the query right is fundamental, CTEs help, but actual values seem to have to be used via subselect instead of join
- partition large table when the have a logical partition key; need to use that key!!
- partition by hash (eg hashed user id, modulo(n))
The curious case of the point of sales, and why we still need pglogical - Jaime Casanova
- replication data to a lot of POS over bad network
- pglogical.forward_origins as a very weird data structure (empty array, or empty array with exactly one value all)
- if central is not enough, add another central2
Lightning Talks
- "oh du lieber execute, alles ist hin"
- faceting, roaring bitmaps => pgfaceting
- There are Postgres shibboleth in hiring (just like in PERL^wperl^wPerl)
- pg_crash - if your database is too reliable: "crash as a service"
- insert .. on conflict does nothing, whereas insert with error generates dead tuples (which is bad for page writes etc)
- postgres.conf settings should be adapted for modern hard drives (because defaults target the lowest common denominator)
- random_page_cost (1 or 1.1) >= seq_page_count
- max_wal_size should be bigger than 1G ? but my DB is only 10GB?
- autovacuum_vacuum_cost_delay 2ms

General notes

Scheduling is hard
- There weren't a lot of beginner talks (but I guess the conference is not targeted at beginners..)
- Some talks referenced each other, but where in the "wrong order", i.e. a very detailed talk before a more general introduction.
I need to talk with the orgas if we (Perl) can use their conference software for our events
Some stats from the closing speech:
- 600 attendees, 599 showed up (I just wonder if these where the same 599 people...)
- 150 speakers / 260 talks submitted
My new varifocal glasses (yes, I'm that old) don't work well with the setup in the smaller rooms (flat floor and low hanging monitors) because I have to tilt my head to see above the heads in front of me, which results in me viewing through the "near" zone of the multifocals, which makes the slides (which are not near..) hard to read.
Is it now considered polite to thank speakers for their talk when asking a question, or is this a Postgresconf thing?
Lot of DBAs, few devs, and they (DBAs) don't like us (Devs) very much :-)
I used to mostly attend Perl conferences, where I'm quite well known and know a lot of people (inside jokes), and spend less time in talks and more in the hallway track. It's an interesting experience to once again attend an event as some random person with hardly any connections to other attendees.

As always: A big Thank-you to the orgas, speakers, sponsors and attendees!