Sunday, April 5, 2020

wrt 7.0.0

It’s been nearly a year since I released a version of wrt, the tool I use for publishing this site from a collection of flat files. I hacked on it for a while late in 2019, and got somewhere in the neighborhood of a 7.0.0 release before getting sidetracked by illness, a fried computer, and holiday travel.

I checked on the state of the code last night and realized I’d left a bunch of changes dangling and had mostly lost track of the mental state I’d built up around my plans. I even had a release blog post mostly written. I went ahead and cleaned up a few obvious loose ends and published a release, which I’ll now attempt to describe.

new features

Minor stuff: There’s some refactoring, improvement here and there of how things outside of ASCII are handled, and probably a slightly better test suite (it’s still abysmal, though).

title extraction and entry caching

I decided a while ago that wrt should know what an entry’s title is, so that it can be used to do things like populate <title> tags, display navigation links for each entry, or generate an index for a site. I was already doing some of those things, on an ad hoc basis, but I wanted a general solution. Before this version, an entry like today’s would have been made up of the following files:

  • archives/2020/4/5/index
  • archives/2020/4/5/tag-wrt.prop
  • archives/2020/4/5/tag-technical.prop
  • archives/2020/4/5/tag-perl.prop

Where index contains the body of the entry for the 5th, and tag-wrt.prop says that the entry has been tagged “wrt”. The .prop extension indicates a “property”, and right now it just represents a boolean or a flag - either an entry has a property or it doesn’t.
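
As a rough illustration of the flag idea, here is a minimal sketch in Python (not wrt's actual Perl code). The function name entry_properties is made up for this example, and the directory is a throwaway built to mirror the layout above:

```python
import os
import tempfile

def entry_properties(entry_dir):
    """Return the set of property names for an entry directory.

    A property is present if a file named <name>.prop exists. The
    file's contents are ignored, so each property is just a flag.
    """
    props = set()
    for name in os.listdir(entry_dir):
        if name.endswith(".prop"):
            props.add(name[: -len(".prop")])
    return props

# Build a temporary entry that mirrors the file list above.
with tempfile.TemporaryDirectory() as root:
    entry = os.path.join(root, "archives", "2020", "4", "5")
    os.makedirs(entry)
    for filename in ("index", "tag-wrt.prop", "tag-technical.prop", "tag-perl.prop"):
        with open(os.path.join(entry, filename), "w"):
            pass  # empty file: only its existence matters
    found = entry_properties(entry)

print(sorted(found))
```

The index file is skipped because it has no .prop extension, so only the three tag flags come back.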

I considered adding values to properties, based on the contents of the file, and then using title.prop to specify an entry’s overall title. So, for example, 2020/4/5/title.prop would have contained the string “App::WRT 7.0.0 …”.

It was easy to implement this, and it worked, but I wasn’t happy with it as a user. I like to change entry titles as I’m writing, and I sometimes have more than one top-level heading, or a set of subheadings in an entry that I’d like the title logic to capture. I’ve also never bothered teaching wrt to display any kind of a page / date header separately from the text of an entry, and entry titles are typically just represented with inline header tags. It seemed weird to duplicate the title into another file.

Since keeping titles in separate files is cumbersome, the other obvious option is getting them out of the body of the entry itself. wrt now does this by rendering the HTML for every entry in the archive and parsing it with a library called Mojo::DOM, then extracting the text of tags <h1> through <h6> into a title cache which can be queried later.
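
The general shape of that extraction can be sketched as follows. This is a Python stand-in using the standard library's html.parser, not the Mojo::DOM code wrt actually uses. The names HeadingCollector, extract_titles, and title_cache are hypothetical, and the sample HTML is invented:

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class HeadingCollector(HTMLParser):
    """Collect the text content of every <h1> through <h6> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._depth = 0   # greater than 0 while inside a heading
        self._parts = []  # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.titles.append("".join(self._parts).strip())
                self._parts = []

    def handle_data(self, data):
        # Text inside nested inline tags (like <em>) is still heading text.
        if self._depth > 0:
            self._parts.append(data)

def extract_titles(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.titles

# Assumed input: entry path mapped to already-rendered HTML.
rendered = {
    "2020/4/5": "<h1>wrt 7.0.0</h1><p>...</p><h2>new <em>features</em></h2>",
}

# The title cache maps each entry path to its list of heading texts.
title_cache = {path: extract_titles(html) for path, html in rendered.items()}
print(title_cache)
```

Keeping every heading, rather than only the first, matches the wish above to capture more than one top-level heading or a set of subheadings.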

Out of laziness, I started adding this feature by storing the rendered HTML for each entry in memory, and accidentally discovered that by doing so I can avoid rendering most entries two or more times: once for an individual date and once for the display of every entry in a month, with a handful additionally showing up on the index page and in feeds.
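
The effect is ordinary memoization. A minimal Python sketch, assuming a hypothetical render_entry function standing in for the expensive step (none of these names come from wrt):

```python
render_calls = 0     # counts how often the expensive step actually runs
rendered_cache = {}  # entry path -> rendered HTML

def render_entry(path, source):
    """Hypothetical stand-in for an expensive render step."""
    global render_calls
    render_calls += 1
    return "<article>" + source + "</article>"

def cached_render(path, sources):
    """Render an entry once, then serve it from memory afterwards."""
    if path not in rendered_cache:
        rendered_cache[path] = render_entry(path, sources[path])
    return rendered_cache[path]

sources = {"2020/4/4": "yesterday", "2020/4/5": "today"}

# The same entry is needed for its own page and for the month page.
day_page = cached_render("2020/4/5", sources)
month_page = "".join(cached_render(p, sources) for p in sorted(sources))

print(render_calls)
```

Two entries are used three times in total, but render_entry only runs twice.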

The downside is that this is really slow for an operation like rendering a single entry, since every other entry in the archive has to be rendered first. But at least displaying an entry can now reference data extracted from all the others.

I feel a bit queasy about loading thousands of blog entries into memory at once in order to display any given one of them. But in thinking about it, I’m pretty sure it would have worked fine even on the machine I used to write the first version of wrt (originally called display.pl), circa 2001. In 2019 I guess I don’t really have a problem assuming that the systems I use for this will have at least half a gig of RAM. It would probably be good if wrt adjusted its behavior for really constrained environments, but my gut says that even a low end laptop or cheap shared hosting shouldn’t be too affected by this.

a tagging system

I’ve been using, as mentioned above, property files named like tag-foo.prop to add tags to p1k3 entries and display them on a topic index. This was partially supported (if undocumented) in wrt, but mostly made up of ad hoc stuff in the Makefile that generates p1k3.

Although it’s still not really documented and probably has lingering issues, this release of wrt now fully supports a similar scheme, where the filenames become something like:

  • archives/2020/4/5/index
  • archives/2020/4/5/tag.topics.wrt.prop
  • archives/2020/4/5/tag.topics.technical.prop
  • archives/2020/4/5/tag.topics.perl.prop

A property file starting with tag is treated as a link between the entry containing it and another entry path with dots as directory separators, so tag.topics.wrt.prop tags /2020/4/5 as related in some way to /topics/wrt. If /topics/wrt exists in the archive, it’ll be rendered like usual followed by a list of tagged entries. If it doesn’t exist, it’s treated as a “virtual” entry and the tag list still renders.
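
The filename-to-path mapping described above can be sketched like this. It is a Python illustration, not wrt's implementation. The names tag_target and tagged are hypothetical, and real wrt may handle edge cases differently:

```python
from collections import defaultdict

def tag_target(filename):
    """Map 'tag.topics.wrt.prop' to '/topics/wrt'.

    Returns None for anything that is not a tag property file.
    """
    parts = filename.split(".")
    if len(parts) < 3 or parts[0] != "tag" or parts[-1] != "prop":
        return None
    # The dots between 'tag' and 'prop' act as directory separators.
    return "/" + "/".join(parts[1:-1])

# Assumed input: entry path mapped to the filenames it contains.
entries = {
    "/2020/4/5": [
        "index",
        "tag.topics.wrt.prop",
        "tag.topics.technical.prop",
        "tag.topics.perl.prop",
    ],
}

# Invert the links: target path -> list of entries tagged with it.
tagged = defaultdict(list)
for entry_path, filenames in entries.items():
    for filename in filenames:
        target = tag_target(filename)
        if target is not None:
            tagged[target].append(entry_path)

print(dict(tagged))
```

Because the target is just a path, a "virtual" entry falls out naturally: /topics/wrt has a list of tagged entries whether or not anything exists at that path in the archive.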

This is kind of confusing, but it allows for an arbitrary number of user-defined tagging schemes.

json feed output

wrt 7 uses JSON::Feed to output JSON Feed data in addition to Atom feeds.

I’m not really sure how many feedreaders support this format, but it was relatively painless to implement, and at least NewsBlur seems to handle it.
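
For a sense of what the format looks like, here is a minimal Python sketch that builds a JSON Feed version 1 document by hand. It does not use JSON::Feed or reflect wrt's actual output. The function name build_json_feed, the example.com URLs, and the entry content are all placeholders:

```python
import json

def build_json_feed(title, home_url, entries):
    """Build a JSON Feed (version 1) structure as a plain dict.

    entries is a list of (url, entry_title, html) tuples.
    """
    return {
        "version": "https://jsonfeed.org/version/1",
        "title": title,
        "home_page_url": home_url,
        "items": [
            {"id": url, "url": url, "title": entry_title, "content_html": html}
            for url, entry_title, html in entries
        ],
    }

feed = build_json_feed(
    "example site",
    "https://example.com/",
    [("https://example.com/2020/4/5", "wrt 7.0.0", "<p>release notes</p>")],
)

feed_text = json.dumps(feed, indent=2)
print(feed_text)
```

Each item needs an id, and using the entry's permanent URL for it is a common choice.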

a repl for debugging

Running wrt repl in a repository root will now yield a simple command line where you can interactively inspect the App::WRT object. Handy for development purposes, more than anything.

breaking changes

I removed entry_map from configuration and hardcoded its assumptions about how entries are laid out. This is a major change if you were using it, but I’d already be surprised if anyone were using wrt in the first place, and more surprised still if they had been using entry_map. (As always, if I’m wrong, please do let me know.)

I got rid of the embedded_perl toggle, since turning it off would have broken templates. (The underlying embedded Perl feature is still in place, though I may deprecate it in future. It really shouldn’t be used for anything besides templates.)

The old (undocumented) tagging system has been ripped out and replaced, as described above.

Since it uses Mojo::DOM to parse the HTML of rendered entries, wrt will now issue warnings for parsing errors. For the most part, I don’t think this will break anything, but it may surface stuff like character encoding issues. It led to me noticing that I had some 20-year-old entries originally written in… Well, something that definitely wasn’t UTF-8, at any rate.
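
One way to hunt down entries like that ahead of time is to check whether their bytes decode as UTF-8 at all. This is a small Python sketch of that check, separate from anything wrt or Mojo::DOM does. The function name utf8_problem is made up:

```python
def utf8_problem(raw):
    """Return None if raw bytes are valid UTF-8, else a short description."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return "invalid UTF-8 at byte offset %d" % err.start
    return None

# The same word encoded two ways: the Latin-1 bytes are not valid UTF-8.
good = "café".encode("utf-8")
bad = "café".encode("latin-1")

print(utf8_problem(good))
print(utf8_problem(bad))
```

This only says that a file is not UTF-8, not what it was originally written in, which still takes some guessing.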

future work / observations

Apart from improving and fully documenting the tagging system, I’d like to spend some time making sure wrt could actually be used by someone else without the scaffolding and assumptions built into the one site where I routinely use it. My thought right now is to build a manual published with wrt itself. We’ll see how that goes, I guess.

In some ways this release feels a little shaky. It’s got ideas in it that deviate from the stark simplicity of most of this code’s history, and it brings the total of external library dependencies to 16, at least a couple of which are non-trivial. Mojo::DOM in particular makes me a bit nervous.

On the other hand, it adds a couple of things I’ve wanted for years, and some of the underlying changes are a good foundation for solving the problems that remain. I continue to think of wrt as both a format for storing writing and a concrete implementation of a tool for publishing that format. For what they are, I’m happy with both.

(Elsewhere: I’m thinking hard about how I take notes and conduct research, how doomed the web generally feels as a platform, and what language ecosystems I want to spend my remaining time as a programmer in. All of that might influence future extensions to the wrt format, or lead to implementations in something besides Perl. Time will tell.)

tags: topics/perl, topics/technical, topics/wrt