Monday, April 9

App::WRT v4.3.0: schwartzian transforms, long-term projects

I should have been doing other things, but I spent a couple of hours over the weekend making wrt, the static site generator I use for p1k3.com, a bit more capable.

I decided I wanted to make the feed wrt generates (like this one) contain the most recent n entries instead of just the entries for the most recent month. For example, instead of just rendering a feed with the entries for this April, I wanted it to contain the last 30 days for which I’d written something.

If wrt entries lived in, say, an SQL database of some sort, this would be just a matter of changing a query. Since they’re just flat files in a directory tree without a lot of abstractions around them, it was a bit trickier but also more interesting.

Simplified a lot, the wrt repository for this site looks something like this:

/home/brennen/p1k3/
▾ archives/
  ▾ 2018/
    ▸ 1/
    ▸ 2/
    ▸ 3/
    ▾ 4/
      ▸ 5/
      ▸ 8/
      ▸ 9/

The basic idea is that a file 3 deep in the hierarchy of numerical entries—like 2018/4/9 for this entry—represents a day, inside a month, inside a year. If I wanted to put the last 30 entries into the feed, I’d need to flatten this structure out into a sorted list.

I remembered that I was already getting a list of all the entries for the wrt render-all script that renders the whole site at once, so it seemed simple enough to reuse that list, but there was a catch: Doing a simple reversed sort on that list gave me results like these:

...
2016/10/16
2016/10/14
2016/10/12
2016/10
2016/1/5
2016/1/3
2016/1/28
...

…because in a string comparison, 2018/10 follows 2018/1, not 2018/9.
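
A quick way to see the problem for yourself, using a handful of made-up entry paths (the sample list is an assumption for illustration, not real wrt data):

```perl
use strict;
use warnings;

# Hypothetical sample of month-level entry paths.
my @entries = ('2018/1', '2018/2', '2018/9', '2018/10');

# Perl's default sort compares strings character by character,
# so '2018/10' lands right after '2018/1' instead of after '2018/9'.
my @sorted = sort @entries;

print join(', ', @sorted), "\n";
# prints: 2018/1, 2018/10, 2018/2, 2018/9
```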

If I’d decided to pad the months with 0s, like 2018/01, a while back, this would have been less of a problem, but it seemed pretty solvable. I just needed to convert the entry paths to a different format and sort by that.

I wound up reading the Wikipedia entry for the Schwartzian transform, and writing something like the following:

sub get_date_entries_by_depth {
  my $self = shift;
  my ($depth) = @_;

  # Build a pattern matching exactly $depth numeric path components,
  # e.g. \d+/\d+/\d+ for a depth of 3:
  my $pattern = join '/', ('\d+') x $depth;

  # Sort matching entries by sortable_date_from_entry()
  return map  { $_->[0] }
         sort { $a->[1] cmp $b->[1] }
         map  { [$_, sortable_date_from_entry($_)] }
         grep m{^ $pattern $}x, $self->get_all_source_files();
}

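sub sortable_date_from_entry {
  my ($entry) = @_;
  # Zero-pad each path component to 4 digits so the joined key
  # (201800040009 for 2018/4/9) sorts correctly as a string:
  my @parts = map { sprintf("%04d", $_) } split '/', $entry;
  return join '', @parts;
}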

First, this builds a regular expression to match entries that are at a certain depth in the hierarchy (1 is a year, 2 is a month, 3 is a day).

Then it:

  1. greps the list returned by get_all_source_files() for entries matching the pattern.
  2. maps the matching entries to a list of two-element arrays where the 0th element is the original path to the entry (2018/4/9), and the 1st element is a format returned by sortable_date_from_entry() that will sort correctly using string comparison (201800040009).
  3. sorts the overall list by comparing the formatted values.
  4. re-maps the list to the original format stored in the 0th element.
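
Those four steps can be tried outside of wrt with a standalone sketch. The @files list and the sortable_key() name here are stand-ins I made up for illustration; the real code works on whatever get_all_source_files() returns:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the list get_all_source_files() would return.
my @files = ('2018/4/9', '2016/10/12', '2016/1/28', '2016/10', '2016/1/5', '2018/4/5');

# Zero-pad each path component to four digits:
# '2018/4/9' becomes '201800040009'.
sub sortable_key {
  my ($entry) = @_;
  return join '', map { sprintf('%04d', $_) } split m{/}, $entry;
}

my @sorted =
  map  { $_->[0] }                    # 4. unwrap: keep only the original path
  sort { $a->[1] cmp $b->[1] }        # 3. sort on the precomputed key
  map  { [ $_, sortable_key($_) ] }   # 2. pair each path with its key
  grep { m{^\d+/\d+/\d+$} }           # 1. keep only day-level entries
  @files;

print "$_\n" for @sorted;
# prints 2016/1/5, 2016/1/28, 2016/10/12, 2018/4/5, 2018/4/9, one per line
```

The point of wrapping each path with its key is that the key gets computed once per entry, rather than once per comparison inside the sort block.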

So now, in order to get the list of entries to turn into a feed, I can just call:

my @entries = reverse $self->get_date_entries_by_depth(3);

…and take the first 30 or so.
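
Taking the first n can be done with an array slice. Here's a rough sketch that also copes with a list shorter than n; take_first() is a hypothetical helper, not something from wrt, and the entries are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $n items from the front of a list.
sub take_first {
  my ($n, @items) = @_;
  my $last = $n < @items ? $n - 1 : $#items;
  return @items[0 .. $last];
}

# Made-up entries, already sorted newest first.
my @entries = ('2018/4/9', '2018/4/8', '2018/4/5', '2018/3/30');

my @feed = take_first(3, @entries);
print "@feed\n";
# prints: 2018/4/9 2018/4/8 2018/4/5
```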

Once I had the feed done, I decided to apply the same idea to the set of entries on the front page, and once I’d done that I realized that I could also use the same sorted lists to generate next/previous links for any given node in the date tree.
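
The next/previous part is just a matter of finding an entry's position in the sorted list and looking one step either way. A minimal sketch, assuming a chronologically sorted list; neighbors() is a hypothetical name and not necessarily how wrt does it:

```perl
use strict;
use warnings;

# Hypothetical helper: given one entry and a chronologically sorted list,
# return the entries immediately before and after it. Either value is
# undef at the ends of the list, and both are undef if the entry is missing.
sub neighbors {
  my ($entry, @sorted) = @_;
  for my $i (0 .. $#sorted) {
    next unless $sorted[$i] eq $entry;
    my $prev = $i > 0        ? $sorted[$i - 1] : undef;
    my $next = $i < $#sorted ? $sorted[$i + 1] : undef;
    return ($prev, $next);
  }
  return (undef, undef);
}

# Made-up day-level entries, oldest first.
my @days = ('2018/4/5', '2018/4/8', '2018/4/9');

my ($prev, $next) = neighbors('2018/4/8', @days);
print "previous: $prev, next: $next\n";
# prints: previous: 2018/4/5, next: 2018/4/9
```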

This was an interesting way to kill some time, both because I revisited an algorithm I’d forgotten about, and because every time I hack on a project like this I’m in a dialog with basic decisions I made before I knew how to write software at all, and maybe, by the same token, looking with fresh eyes at norms that I’d take for granted in any more modern context. wrt isn’t a good piece of software by any contemporary standard, and the approach it represents isn’t one I’d use for anything bigger than a trivial shell script at my day job, but there’s a curious durability to it all the same.

Every few years I revisit some facet of this tiny, mundane tool and apply a bit of understanding I lacked when it was first written, and some structure comes a little clearer that lives in the space between my ignorance at 20 and my experience, such as it is, at whatever age I’ve reached.

Everyone should have a few long-term projects, however small and unremarkable.