/ domm

I hack Perl for fun and profit.

Follow me on twitter!
Atom Icom ... on Atom!
<<<<<<<<<<
30.01.2016: App::TimeTracker::Command::Category
24.11.2015: Vienna.pm TechMeet
04.09.2015: Things I learned at YAPC::Europe 2015 in Granada
01.08.2015: Announcing Measure::Everything
26.07.2015: Announcing InfluxDB::LineProtocol
02.07.2015: How to notify 1/14th of your users every fortnight

UPDATE: @pndc spotted a (rather obvious) bug in my code, see this tweet (which I cannot embed for some reason). I leave my failed version as is (so I can still rant about week numbers...), but add another section that is actually working.

A few days ago I had to write a new spam script notifier. The script should spam inform users of new campaigns available on the website. Each user should get a mail every 14 days, and all users should be notified in a given 14-day interval. So basically I have to partition our users into 14 more-or-less evenly sized sets, and on each day select the correct set of users to spam notify. And I don't want to store the partition decision for each user, so it has to be calculable and persistent (i.e. I want the same result for each user every time I calculate her partition).

Of course modulo is the right tool for such a problem. I just need something random enough to modulo with 14, then I will get the users partitioned into 14 sets. The user id was my first try, but it's not random enough (because we use shared sequences, and always prefetch a batch of sequences from the DB (some of which get discarded if the DB connection is reset)). But the creation time (as epoch seconds) proved to be random enough:

SELECT
    (EXTRACT(EPOCH FROM usr.created)::int % 14) AS partition,
    count
  FROM usr
  GROUP BY 1 ORDER BY 2;

SQL!!

Here's a tiny bit more complicated query that calculates the relative count (percentage) using two CTEs (Common Table Expression) / WITH clauses

WITH
  total AS (
    SELECT count(*) AS total FROM usr
  ),
  raw AS (
    SELECT
      (EXTRACT(EPOCH FROM usr.created)::int % 14) AS partition,
      count(*) AS absolute
    FROM usr GROUP BY 1 ORDER BY 2
  )
  SELECT
    raw.partition,
    100 * raw.absolute::float / total.total AS percentage
  FROM raw,total;

So now I can show you the results without leaking our user numbers:

 partition |    percentage
-----------+------------------
         0 | 7.02596825479004
         7 | 7.09154683475492
         5 | 7.10103847132879
        10 | 7.11613880224175
         1 | 7.12821906697212
        13 | 7.12951338105038
        11 | 7.14504514998943
        12 |  7.1459080260416
         9 | 7.15798829077197
         8 | 7.15885116682414
         6 | 7.17740300194579
         3 | 7.19595483706743
         4 | 7.19983777930219
         2 | 7.22658693691945

Compare this to the results based on the user id:

 partition |    percentage
-----------+------------------
         3 | 6.98109870007723
        13 | 6.98713883244241
         1 |  6.9931789648076
         7 | 7.00612210559014
         5 | 7.02639969281613
         9 |  7.0298511970248
        11 |  7.0328712632074
        10 | 7.18991470470224
         4 | 7.24298158191067
         6 | 7.28137956623221
         8 | 7.29130264083216
         2 | 7.29518558306692
         0 | 7.31718892239724
        12 | 7.32538624489285

And you see that usr.id min/max differs by 0.34% (absolute), while created only differs by 0.20%.

So I now know how to get my users for a given partion:

SELECT
    usr.id
  FROM usr
  WHERE
    (EXTRACT(EPOCH FROM usr.created)::int % 14) = ?

After boring you with some hardcore SQL (and I'm looking forward to any improvements somebody who really knows SQL will surely leave in the comments section), lets go back to..

Perl!!

But how do I figure out which partition to use on a given day? As we're basing the spamming notifications on fortnights, weekdays seem obvious. But there are only 7 days in a week. So lets fiddle a bit with week numbers0.

To get a constant streams of "ones" to "fourteens" that will repeat for each day without gaps or jumps, I just take the day-of-week (a number between 1 and 7); and the current week number modulo1 2 (which will either result in 0 or 1); if week number modulo 2 results in 0 (an even week), keep the day-of-week as it is, if it's 1 (an odd week), add 7.

Here's the code as a one-liner:

perl -MTime::Moment -E 'my $n = Time::Moment->now;
  my $day       = $n->day_of_week;
  my $week      = $n->week;
  my $partition = $day + $week % 2 * 7;
  printf("%s: w%02d d%02d => %02d\n",
    $n->strftime("%F"), $week, $day, $partition)'

The interesting bit is this line:

my $partition = $day + $week % 2 * 7;

which implements what I've described earlier. But with a slight twist: Instead of doing an if on the result of week modulo 2, I just multiply it with 7, which (as week modulo 2 can only result on zero or one) will give me 0 or 7 - just what I need to add to day-of-week2.

Here's a version of the one-liner that will return a set of 30 days starting from now:

 perl -MTime::Moment -E 'my $n = Time::Moment->now;
  for (1 .. 30) {
    my $day       = $n->day_of_week;
    my $week      = $n->week;
    my $partition = $day +  $week % 2 * 7;
    printf("%s: w%02d d%02d => %02d\n",
      $n->strftime("%F"), $week,$day,$partition);
    $n = $n->plus_days(1)
  }'

And here's another version where you can specify the start-date and the numbers of iterations on the command-line3:

 perl -MTime::Moment -E 'my $n = Time::Moment->new(
  year=>$ARGV[0], month=>$ARGV[1],day=>$ARGV[2]);
  for (1 .. $ARGV[3]) {
    my $day       = $n->day_of_week;
    my $week      = $n->week;
    my $partition = $day +  $week % 2 * 7;
    printf("%s: w%02d d%02d => %02d\n",
      $n->strftime("%F"), $week,$day,$partition);
    $n = $n->plus_days(1)
  }' 2011 12 26 42

 2011-12-26: w52 d01 => 01
 2011-12-27: w52 d02 => 02
 2011-12-28: w52 d03 => 03
 2011-12-29: w52 d04 => 04
 2011-12-30: w52 d05 => 05
 2011-12-31: w52 d06 => 06
 2012-01-01: w52 d07 => 07
 2012-01-02: w01 d01 => 08
 2012-01-03: w01 d02 => 09
 2012-01-04: w01 d03 => 10
 2012-01-05: w01 d04 => 11
 2012-01-06: w01 d05 => 12
 2012-01-07: w01 d06 => 13
 2012-01-08: w01 d07 => 14
 2012-01-09: w02 d01 => 01
 2012-01-10: w02 d02 => 02
 2012-01-11: w02 d03 => 03
 2012-01-12: w02 d04 => 04
 2012-01-13: w02 d05 => 05
 2012-01-14: w02 d06 => 06
 2012-01-15: w02 d07 => 07
 2012-01-16: w03 d01 => 08
 2012-01-17: w03 d02 => 09
 2012-01-18: w03 d03 => 10
 2012-01-19: w03 d04 => 11
 2012-01-20: w03 d05 => 12
 2012-01-21: w03 d06 => 13
 2012-01-22: w03 d07 => 14
 2012-01-23: w04 d01 => 01
 2012-01-24: w04 d02 => 02
 2012-01-25: w04 d03 => 03
 2012-01-26: w04 d04 => 04
 2012-01-27: w04 d05 => 05
 2012-01-28: w04 d06 => 06
 2012-01-29: w04 d07 => 07
 2012-01-30: w05 d01 => 08
 2012-01-31: w05 d02 => 09
 2012-02-01: w05 d03 => 10
 2012-02-02: w05 d04 => 11
 2012-02-03: w05 d05 => 12
 2012-02-04: w05 d06 => 13
 2012-02-05: w05 d07 => 14

This shows that we can handle new-years etc correctly, thanks to the ISO-8601 week number4.

I just hope our users like the new spam information!

Updated working code.

So, as spotted by @pndc, if a year has 53 week (eg 2015), the week after week 53 is week 1. So an odd week follows an odd week, which leads to this bug:

 2016-01-02: w53 d06 => 13
 2016-01-03: w53 d07 => 14
 2016-01-04: w01 d01 => 08

pndc suggest using floor(time_t/86400)%14, which in fact works:

 perl -MTime::Moment -E 'my $n = Time::Moment->new(
  year=>$ARGV[0], month=>$ARGV[1],day=>$ARGV[2]);
  for (1 .. $ARGV[3]) {
    my $epoch     = $n->epoch;
    my $partition = ($epoch/86400) % 14;
    printf("%s: %02d\n",
      $n->strftime("%F"), $partition);
    $n = $n->plus_days(1)
  }' 2015 12 26 14

 2015-12-26: 09
 2015-12-27: 10
 2015-12-28: 11
 2015-12-29: 12
 2015-12-30: 13
 2015-12-31: 00
 2016-01-01: 01
 2016-01-02: 02
 2016-01-03: 03
 2016-01-04: 04
 2016-01-05: 05
 2016-01-06: 06
 2016-01-07: 07
 2016-01-08: 08

And also works for 2012/2013:

 perl -MTime::Moment -E 'my $n = Time::Moment->new(
  year=>$ARGV[0], month=>$ARGV[1],day=>$ARGV[2]);
  for (1 .. $ARGV[3]) {
    my $epoch     = $n->epoch;
    my $partition = ($epoch/86400) % 14;
    printf("%s: %02d\n",
      $n->strftime("%F"), $partition);
    $n = $n->plus_days(1)
  }' 2012 12 26 14

 2012-12-26: 06
 2012-12-27: 07
 2012-12-28: 08
 2012-12-29: 09
 2012-12-30: 10
 2012-12-31: 11
 2013-01-01: 12
 2013-01-02: 13
 2013-01-03: 00
 2013-01-04: 01
 2013-01-05: 02
 2013-01-06: 03
 2013-01-07: 04
 2013-01-08: 05

Thanks, and I owe pndc a $favourite_beverage.

0 I "like" it that there are two ways to define week numbers. ISO-8601 (of course the sane one, as most things that start with ISO), which defines the first week of the year as the first week that features a Thursday; and a nameless non-standard (Wikipedia calls it "North American") that defines the first week of the year as the week featuring YYYY-01-01. And yes, I once had to fix a bug caused by this craziness.

1 again - I just love modulo!

2 Further golfing is left as an exercise to the reader.

3 Now I could pack it into a proper script, but...

4 Using the "North American" method, 2012-01-01 return "1" instead of "52", thus it's an odd week, resulting in the partition to be calculated as 14 instead of 7. Which is a bug (in the "standard", not the code).
P.S. Postgres shows this quite "nicely" (if you know about it - never use 'w', always 'iw'!!):
bc. domm=# select to_char('2012-01-01'::date,'w');
1
domm=# select to_char('2012-01-01'::date,'iw');
52

Comments (via disqus)

14.06.2015: Make a playlist quickly
31.05.2015: Porting a very simple script to Perl6
10.05.2015: Things I learned at German Perl Workshop 2015
14.01.2015: Picture-of-the-day helper script
>>>>>>>>>>