Announcing InfluxDB::LineProtocol

tl;dr: New on CPAN: InfluxDB::LineProtocol - Write and read InfluxDB LineProtocol

We recently moved from Graphite / Whisper to InfluxDB to store various stats generated by our applications, mostly because we were unhappy with Whisper. And InfluxDB is new and shiny, so it must be good!

Well, it turns out that InfluxDB is in fact quite good. But they recently released version 0.9 and changed the way you write data quite drastically. Instead of a JSON API, you now use something they call a Line Protocol, where you send one sort-of-simple line of plain text per thing you want to measure:

metric,tag=value,another_tag=foo value=42,another_value=1 $timestamp

One nice benefit of this method is that you can pack several (a lot!) of these lines into one batch and send them to InfluxDB in one request.
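
To make the format and the batching concrete, here is a minimal sketch in plain Perl. The helper format_line is hypothetical and not part of InfluxDB::LineProtocol; it assumes numeric field values and skips the escaping of spaces, commas and equals signs that a real generator has to handle. The measurement, tag names and timestamps are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the module's API): formats one line as
# measurement,tag=value field=value timestamp
# Simplified: no escaping, numeric field values only.
sub format_line {
    my ( $measurement, $fields, $tags, $timestamp ) = @_;
    my $tag_part   = join '', map {",$_=$tags->{$_}"} sort keys %$tags;
    my $field_part = join ',', map {"$_=$fields->{$_}"} sort keys %$fields;
    return "$measurement$tag_part $field_part $timestamp";
}

# Timestamps are kept as strings so they survive on any Perl build.
my @lines = (
    format_line( 'requests', { duration => 0.25 }, { host => 'web1' }, '1434055562000000000' ),
    format_line( 'requests', { duration => 1.5 },  { host => 'web2' }, '1434055562000000001' ),
);

# One line per point, joined by newlines, ready to be sent as one request body.
my $batch = join "\n", @lines;
print "$batch\n";
```

The resulting $batch string is what would go into the body of a single write request.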

Unfortunately for us, the InfluxDB module on CPAN does not support the new API.

Fortunately for you, I wrote a generator and parser and put it on CPAN as InfluxDB::LineProtocol. InfluxDB::LineProtocol just converts Perl data into the Influx format, and back again. You will need to use something else to actually send the line(s), for example Hijk.

Example

use InfluxDB::LineProtocol qw(data2line line2data);
use Hijk;

my $line = data2line(
    'into_the_past',
    { jigawatts => 1.21 },
    { source => 'Plutonium', target => 1955 }
);

# Hijk::request expects a single hash reference
my $res = Hijk::request({
    method       => 'POST',
    host         => 'localhost',
    port         => 8086,
    path         => "/write",
    query_string => "db=timetravel",
    body         => $line,
});

This would record a measurement named 'into_the_past' with a field named 'jigawatts' set to 1.21. The last hash passes some tags to InfluxDB, which you can later use to query your data. The current timestamp is added automatically (unless you pass one as a fourth argument, maybe because you want to fill your InfluxDB with historical data).
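
If you do want to backfill historical data, you need a timestamp to pass along. The following is a rough sketch using only core Perl, assuming nanosecond precision (the default InfluxDB uses for writes). The helper epoch_to_ns and the chosen date are made up for illustration.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: turn whole epoch seconds into a nanosecond timestamp.
# Appending nine zeros as a string avoids integer overflow on 32-bit Perls.
sub epoch_to_ns {
    my ($epoch) = @_;
    return $epoch . '000000000';
}

# 1985-10-26 01:21:00 UTC; timegm takes a zero-based month, so 9 is October.
my $epoch     = timegm( 0, 21, 1, 26, 9, 1985 );
my $timestamp = epoch_to_ns($epoch);

print "$timestamp\n";
```

The resulting $timestamp could then be handed to data2line as the fourth argument, after the tags.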

Parsing?

But why would I want to parse an InfluxDB line?

Well, we generate a lot of lines [1], so we offload the sending of the lines to an external program (more on that in some upcoming blog posts). We have one instance of this program running on each host, collecting data from various services running there. This program adds some host-specific info (hostname, data center, etc.) to each line. To do this, we need to parse the line, add some data, and generate a new one [2]:

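sub add_tags_to_line {
    my ( $self, $line ) = @_;

    # split the incoming line back into its parts
    my ( $measurement, $values, $tags, $timestamp ) = line2data($line);

    # host-specific tags come last, so they win over tags already in the line
    my $combined_tags = { %$tags, %{ $self->tags } };
    return data2line( $measurement, $values, $combined_tags, $timestamp );
}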
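
One detail worth knowing about the tag merge is the order of the two hashes: when both contain the same key, the one listed last wins. Here is a small standalone sketch of that behaviour; the tag names and values are invented for illustration.

```perl
use strict;
use warnings;

# Tags as they might come out of a parsed line (made-up values)
my $line_tags = { service => 'api', host => 'unknown' };

# Hypothetical host-specific tags added by the collector
my $host_tags = { host => 'web1', datacenter => 'vie' };

# Flattening both hashes into one list: later pairs overwrite earlier ones
my $combined = { %$line_tags, %$host_tags };

print join( ',', map {"$_=$combined->{$_}"} sort keys %$combined ), "\n";
```

Here the collector's 'host' value replaces the one from the line. Swapping the two hashes would instead let the line's own tags take precedence.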

Profit!

Since deploying InfluxDB::LineProtocol, I have already found one nasty bug that occurred zero to three times a day and involved some extremely long requests [3]. Yay!

[1] Measure Everything tends to do that.

[2] Yes, I did consider using something like Sereal to dump the data structure, send the dump to the collector, add some data, and only encode it as InfluxDB once. But I wanted to keep things simple in the beginning (premature optimization etc.).

[3] OK, I could have also found the bug by just looking at the log files, but the stats helped me locate the correct time frame in the log files...