Thursday, August 10

catenating files in order of modification time, a bad solution

Setting: Linux - a recent Ubuntu. GNU coreutils.

I wanted to join a large set of small logfiles together into a single file, in the order they were originally written. The expanded list of filenames exceeded ARG_MAX, the kernel's limit on the total size of a command's arguments and environment, so cat * > foo would fail with:

bash: /bin/cat: Argument list too long
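
For reference, the limit can be inspected with getconf(1). This is only a small sketch: the number varies by system, and the space actually available for arguments is smaller because the environment counts against it too.

```shell
#!/bin/sh
# Print the limit on the combined size of arguments and environment, in bytes.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX is $arg_max bytes"
```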

I knew I’d probably use some sort of find | xargs combo, with NULs instead of newlines because I couldn’t be entirely sure that logfiles would never have spaces or other weirdness in the names.
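
A quick way to see why the NULs matter, sketched with two made-up filenames in a throwaway directory (mktemp -d and the names are assumptions for illustration). Plain xargs splits its input on blanks and newlines, so awkward names fall apart into several arguments; with -print0 and -0, each name arrives as exactly one argument.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real gets touched.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# One name containing a space, one containing a newline.
echo "one" > "a b.log"
echo "two" > "c
d.log"

# Newline-delimited: xargs splits on whitespace, so sh receives bogus names.
# 'echo "$#"' prints how many arguments the child shell was handed.
naive=$(find . -name '*.log' | xargs sh -c 'echo "$#"' sh)

# NUL-delimited: one argument per file, whatever the name contains.
safe=$(find . -name '*.log' -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "naive: $naive arguments, NUL-delimited: $safe arguments"
```

With two files on disk, the naive version hands four arguments to the command and the NUL-delimited one hands it two.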

As usual, there’s a set of StackExchange answers for this. I wound up writing this ridiculous variant:

find . -name '*.log' -printf '%T@ %p\0' |
  sort -nz |
  sed -Ez 's/^[^ ]+ (.*)$/\1/' |
  xargs -0 cat > all
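
One detail that makes the last stage work past ARG_MAX: xargs splits its input into as many command invocations as it needs, in input order, and the redirection belongs to xargs itself, so every cat it spawns writes to the same file in sequence. A sketch that forces small batches with -n to make the splitting visible (the file names and the batch size are made up for illustration):

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1

printf '1\n' > one
printf '2\n' > two
printf '3\n' > three

# -n 2 allows at most two arguments per invocation, standing in for the
# splitting xargs does by itself when the list is too long for one exec.
# "echo cat" shows the commands that would be run instead of running them.
batches=$(printf '%s\0' one two three | xargs -0 -n 2 echo cat)
echo "$batches"

# Same batching, real cat: both invocations share the one redirected stdout.
printf '%s\0' one two three | xargs -0 -n 2 cat > all
cat all
```

The first command prints "cat one two" and "cat three"; the resulting file still contains 1, 2, 3 in order.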

A script with a test and some explanatory comments:

#!/bin/sh

# create some test files:
echo "a" > "a a.log"
echo "b" > "b b.log"
echo "c" > "c c.log"

# print mtime, space, full path to file, separated by NULs:
find . -name '*.log' -printf '%T@ %p\0' |

  # the -z option to GNU sort(1) and sed(1) treats NUL as line delimiter

  # sort lines numerically:
  sort -nz |

  # strip leading timestamp (sed here; GNU cut(1) only gained -z in coreutils 8.25):
  sed -Ez 's/^[^ ]+ (.*)$/\1/' |

  # feed filenames, separated by NULs, to cat(1), and
  # redirect output to a file called "all":
  xargs -0 cat > all

cat all

When run, this outputs:

a
b
c

I think this works. It is, no doubt, several kinds of wrong. It does function as a useful illustration of how silly things can get when everything is a string and quoting problems take over an otherwise simple solution. find(1) and xargs(1) really seem to live in the space where classical Unix shell and filesystem approaches expose their sharp edges quickly.
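
The sed step exists only to strip the timestamp. With GNU coreutils 8.25 or later, cut accepts -z too, so the same pipeline can be sketched with cut instead. This version also sets the mtimes explicitly with touch -d (a GNU option; the dates are arbitrary), so the test no longer depends on the files happening to be created in the right order.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Create the files in the "wrong" order, then set mtimes by hand.
echo "c" > "c c.log"
echo "a" > "a a.log"
echo "b" > "b b.log"
touch -d '2017-08-01 00:00:00' "a a.log"
touch -d '2017-08-02 00:00:00' "b b.log"
touch -d '2017-08-03 00:00:00' "c c.log"

find . -name '*.log' -printf '%T@ %p\0' |
  sort -nz |
  # keep everything after the first space, i.e. the path, spaces included:
  cut -z -d ' ' -f 2- |
  xargs -0 cat > all

cat all
```

This prints a, b, c, one per line, matching the sed version.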

Friday, August 4

we're a fire
burning in the fibers
and tissues of the world

linkdump

HyperCard On The Archive (Celebrating 30 Years of HyperCard) | Internet Archive Blogs

⚓ T12961 Assess Impact of CVE-2017-1000117 et al (`ssh://-...` executing code)

When-you-have-a-nice-hat.jpg

Interoperability between IPv6 and IPv4 — "A common estimate of the length of time involved is 10 years - in terms of the history of the Internet, a very long time indeed, but probably a realistic figure in terms of the amount of installed IPv4 software and infrastructure, all of which will need to be replaced or upgraded."

The Early History of the "more" Command

Photo

unifying OS installation and configuration management

Huffy: Owner’s Manual for Multi-Speed Comfort Bicycles — This is a genuine piece of shit bike.

Nate Cull - YouTube — 'I collect and curate 1980s synthpunk. My particular thing is "concept playlists": a playlist that wants to be a concept album.'

Why Github can't host the Linux Kernel Community

Blood Oath (episode) | Memory Alpha — The Klingons in this episode are recurring characters from different episodes of TOS, all played by the original actors.

The Lost Cause Rides Again

Adafruit CircuitPython API Reference — Adafruit CircuitPython 0.0.0 documentation