Sunday, April 5, 2020
wrt 7.0.0
It’s been nearly a year since I released a version of wrt, the tool I use for
publishing this site from a collection of flat files. I hacked on it for a
while late in 2019, and got somewhere in the neighborhood of a 7.0.0 release
before getting sidetracked by illness, a fried computer, and holiday travel.
I checked on the state of the code last night and realized I’d left a bunch of
changes dangling and had mostly lost track of the mental state I’d built up
around my plans. I even had a release blog post mostly written. I went ahead
and cleaned up a few obvious loose ends and published a release, which I’ll now
attempt to describe.
new features
Minor stuff: There’s some refactoring, improvement here and there of how
things outside of ASCII are handled, and probably a slightly better test suite
(it’s still abysmal, though).
title extraction and entry caching
I decided a while ago that wrt should know what an entry’s title is, so that it
can be used to do things like populate <title> tags, display navigation links
for each entry, or generate an index for a site. I was already doing some of
those things, on an ad hoc basis, but I wanted a general solution.
Before this version, an entry like today’s would have been made up of the
following files:
archives/2020/4/5/index
archives/2020/4/5/tag-wrt.prop
archives/2020/4/5/tag-technical.prop
archives/2020/4/5/tag-perl.prop
Where index contains the body of the entry for the 5th, and tag-wrt.prop
says that the entry has been tagged “wrt”. The .prop extension indicates a
“property”, and right now it just represents a boolean or a flag - either an
entry has a property or it doesn’t.
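To make the flag idea concrete, here is a minimal sketch in Python (wrt itself is Perl, and this is not its code). The function names entry_properties and has_property are made up for illustration; the only thing taken from the description above is that a property is a file ending in .prop whose mere presence is the signal.

```python
from pathlib import Path


def entry_properties(entry_dir):
    """Return the set of property names found in an entry directory.

    Assumption: a property is any file ending in .prop. File contents are
    ignored, so each property behaves as a boolean flag.
    """
    return {p.stem for p in Path(entry_dir).glob("*.prop")}


def has_property(entry_dir, name):
    """True if the entry directory contains a file called <name>.prop."""
    return name in entry_properties(entry_dir)
```

With the layout listed above, has_property("archives/2020/4/5", "tag-wrt") would be true and the index file would be ignored.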
I considered adding values to properties, based on the contents of the file,
and then using title.prop to specify an entry’s overall title. So, for
example, 2020/4/5/title.prop would have contained the string “App::WRT
7.0.0 …”.
It was easy to implement this, and it worked, but I wasn’t happy with it as a
user. I like to change entry titles as I’m writing, and I sometimes have more
than one top-level heading, or a set of subheadings in an entry that I’d like
the title logic to capture. I’ve also never bothered teaching wrt to display
any kind of a page / date header separately from the text of an entry, and
entry titles are typically just represented with inline header tags. It seemed
weird to duplicate the title into another file.
Since keeping titles in separate files is cumbersome, the other obvious option
is getting them out of the body of the entry itself. wrt now does this by
rendering the HTML for every entry in the archive and parsing it with a library
called Mojo::DOM, then extracting the text of tags <h1> through <h6> into
a title cache which can be queried later.
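The heading-extraction step can be sketched roughly as follows. This is an illustration in Python using the standard library's html.parser, not wrt's Perl or its Mojo::DOM calls; HeadingExtractor, extract_titles, and build_title_cache are hypothetical names, and the shape of the cache (entry path mapped to a list of heading strings) is an assumption.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingExtractor(HTMLParser):
    """Collect the text content of every h1..h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._depth = 0  # greater than zero while inside a heading
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            if self._depth == 0:
                self._buf = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace left over from the markup.
                text = " ".join("".join(self._buf).split())
                if text:
                    self.titles.append(text)

    def handle_data(self, data):
        # Text inside inline tags such as <em> still counts as heading text.
        if self._depth > 0:
            self._buf.append(data)


def extract_titles(html):
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.titles


def build_title_cache(rendered):
    """rendered maps entry path -> rendered HTML (assumed shape)."""
    return {path: extract_titles(html) for path, html in rendered.items()}
```

Keeping every heading rather than only the first matches the wish above to capture multiple top-level headings or a set of subheadings.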
Out of laziness, I started adding this feature by storing the rendered HTML for
each entry in memory, and accidentally discovered that by doing so I can avoid
rendering most entries at least twice - once for an individual date and once
for the display of every entry in a month, with a handful additionally showing
up on the index page and in feeds.
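A minimal sketch of that caching effect, again in Python rather than wrt's Perl. RenderCache and its methods are hypothetical, and this version fills the cache lazily on first use, whereas the description above has wrt rendering every entry up front; the saving on repeated renders is the same either way.

```python
class RenderCache:
    """Render each entry at most once and reuse the HTML everywhere else."""

    def __init__(self, render):
        self._render = render  # callable: entry path -> HTML string
        self._html = {}
        self.render_calls = 0  # counts real renders, for illustration only

    def html_for(self, path):
        if path not in self._html:
            self.render_calls += 1
            self._html[path] = self._render(path)
        return self._html[path]

    def month_page(self, paths):
        # A month view is just the already-rendered entries joined together.
        return "\n".join(self.html_for(p) for p in paths)
```

Rendering the page for a single day and then the page for its month calls the underlying renderer once per entry instead of twice.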
As a downside, this is really slow for an operation like rendering a single
entry, since every other entry has to be rendered first. But at least
displaying any one entry can now reference data extracted from all the others.
I feel a bit queasy about loading thousands of blog entries into memory at once
in order to display any given one of them. But in thinking about it, I’m
pretty sure it would have worked fine even on the machine I used to write the
first version of wrt (originally called display.pl), circa 2001. In 2019 I
guess I don’t really have a problem assuming that the systems I use for this
will have at least half a gig of RAM. It would probably be good if wrt adjusted
its behavior for really constrained environments, but my gut says that even a
low end laptop or cheap shared hosting shouldn’t be too affected by this.
a tagging system
I’ve been using, as mentioned above, property files named like tag-foo.prop
to add tags to p1k3 entries and display them on a topic index. This
was partially supported (if undocumented) in wrt, but mostly made up of ad hoc
stuff in the Makefile that generates p1k3.
Although it’s still not really documented and probably has lingering issues,
this release of wrt now fully supports a similar scheme, where the filenames
become something like:
archives/2020/4/5/index → index
archives/2020/4/5/tag-wrt.prop → tag.topics.wrt.prop
archives/2020/4/5/tag-technical.prop → tag.topics.technical.prop
archives/2020/4/5/tag-perl.prop → tag.topics.perl.prop
A property file starting with tag is treated as a link between the entry
containing it and another entry. The rest of the filename gives that entry’s
path, with dots standing in for directory separators, so tag.topics.wrt.prop
tags /2020/4/5 as related in some way to /topics/wrt.
If /topics/wrt exists in the archive, it’ll be rendered like usual followed
by a list of tagged entries. If it doesn’t exist, it’s treated as a
“virtual” entry and the tag list still renders.
This is kind of confusing, but it allows for an arbitrary number of
user-defined tagging schemes.
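The filename-to-path mapping can be sketched like this. It is an illustration in Python, not wrt's Perl; tag_target and build_tag_index are hypothetical names, and the input shape (entry path mapped to a list of property filenames) is assumed for the example.

```python
def tag_target(prop_filename):
    """Map 'tag.topics.wrt.prop' to 'topics/wrt'.

    Returns None for anything that is not a dotted tag property, such as
    'title.prop' or the older 'tag-wrt.prop' naming.
    """
    parts = prop_filename.split(".")
    if len(parts) < 3 or parts[0] != "tag" or parts[-1] != "prop":
        return None
    segments = parts[1:-1]
    if not all(segments):  # reject empty path segments like 'tag..prop'
        return None
    return "/".join(segments)


def build_tag_index(entries):
    """entries maps entry path -> iterable of property filenames.

    Returns a dict mapping each tag target to a sorted list of the entries
    that link to it. A target needs no entry of its own to appear here,
    which is what makes "virtual" entries possible.
    """
    index = {}
    for entry_path, props in entries.items():
        for prop in props:
            target = tag_target(prop)
            if target is not None:
                index.setdefault(target, []).append(entry_path)
    return {target: sorted(paths) for target, paths in index.items()}
```

Nothing here is specific to topics: a file named tag.people.alice.prop would link to people/alice in exactly the same way, which is where the arbitrary number of tagging schemes comes from. One consequence of using dots as separators is that a path segment cannot itself contain a dot.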
json feed output
wrt 7 uses JSON::Feed to output JSON Feed data in
addition to Atom feeds.
I’m not really sure how many feedreaders support this format, but it was
relatively painless to implement, and at least NewsBlur
seems to handle it.
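For a sense of what the format involves, here is a small sketch that builds a JSON Feed version 1 document with Python's json module. It is not wrt's code and does not use the Perl JSON::Feed module; the json_feed function, the entry tuple shape, and the example.com URLs are all placeholders invented for illustration. The field names (version, title, home_page_url, items, id, url, content_html) come from the JSON Feed spec.

```python
import json


def json_feed(title, home_page_url, entries):
    """Serialize entries as a JSON Feed (version 1) string.

    entries is a list of (url, entry_title, html) tuples. This tuple shape
    is an assumption for the sketch, not how wrt stores entries.
    """
    items = [
        {"id": url, "url": url, "title": entry_title, "content_html": html}
        for url, entry_title, html in entries
    ]
    feed = {
        "version": "https://jsonfeed.org/version/1",
        "title": title,
        "home_page_url": home_page_url,
        "items": items,
    }
    return json.dumps(feed, indent=2)


# Placeholder values, purely for illustration.
example = json_feed(
    "example site",
    "https://example.com/",
    [("https://example.com/2020/4/5", "wrt 7.0.0", "<h1>wrt 7.0.0</h1>")],
)
```

Since the feed is plain JSON with a handful of fields, most of the work is deciding which entries to include, which helps explain why adding it next to Atom was relatively painless.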
a repl for debugging
wrt repl in a repository root will now yield a simple commandline where you
can interactively inspect the App::WRT object. Handy for development
purposes, more than anything.
breaking changes
I removed entry_map from configuration and hardcoded its assumptions about
how entries are laid out. This is a major change if you were using it, but I’d
be even more surprised if anyone had been than I already would be if anyone
were using wrt in the first place. (As always, if I’m wrong, please do let me
know.)
I got rid of the embedded_perl toggle, since turning it off would have broken
templates. (The underlying embedded Perl feature is still in place, though I
may deprecate it in future. It really shouldn’t be used for anything besides
templates.)
The old (undocumented) tagging system has been ripped out and replaced, as
described above.
Since it uses Mojo::DOM to parse the HTML of rendered entries, wrt will now
issue warnings for parsing errors. For the most part, I don’t think this
will break anything, but it may surface stuff like character encoding issues.
It led to me noticing that I had some 20-year-old entries originally written
in… Well, something that definitely wasn’t UTF-8, at any rate.
future work / observations
Apart from improving and fully documenting the tagging system, I’d like to
spend some time making sure wrt could actually be used by someone else without
the scaffolding and assumptions built into the one site where I routinely use
it. My thought right now is to build a manual published with wrt itself.
We’ll see how that goes, I guess.
In some ways this release feels a little shaky. It’s got ideas in it that
deviate from the stark simplicity of most of this code’s history, and it brings
the total of external library dependencies to 16, at least a couple of which
are non-trivial. Mojo::DOM in particular makes me a bit nervous.
On the other hand, it adds a couple of things I’ve wanted for years, and some
of the underlying changes are a good foundation for solving the problems that
remain. I continue to think of wrt as both a format for storing writing and a
concrete implementation of a tool for publishing that format. For what they
are, I’m happy with both.
(Elsewhere: I’m thinking hard about how I take notes and conduct research, how
doomed the web generally feels as a platform, and what language ecosystems I
want to spend my remaining time as a programmer in. All of that might
influence future extensions to the wrt format, or lead to implementations in
something besides Perl. Time will tell.)
tags: topics/perl, topics/technical, topics/wrt