Monday, July 11

extracting all (?) of the filenames from packages available in debian

I’m thinking about renaming a command-line utility, and I want to pick something that isn’t already taken. I decided that getting a list of the command names available in Debian packages, maybe with the addition of some lists like Wikipedia’s List of Unix commands, would be a decent place to start.

I thought that maybe apt-cache could give me what I wanted, but a quick look at the man page wasn’t that helpful. Google brought me to this writeup by Kevin van Zonneveld on apt-file, which lets you search for packages by the filenames they contain. Something like so:

$ sudo apt-get install apt-file
$ apt-file update
$ apt-file search /some/path

This will show you packages which provide files containing /some/path. It’s supposed to take patterns (Perl regexps? The usual grep flavor? POSIX?), but I’m not quite sure whether I ever got it to just print all filenames.

Eventually I figured out that it keeps a cache of filenames in ~/.cache/apt-file/:

$ cd ~/.cache/apt-file && ls
ftp.us.debian.org_debian_dists_jessie_contrib_Contents-amd64.gz
ftp.us.debian.org_debian_dists_jessie_main_Contents-amd64.gz
ftp.us.debian.org_debian_dists_jessie_non-free_Contents-amd64.gz
ftp.us.debian.org_debian_dists_jessie-updates_main_Contents-amd64.gz

These are gzipped Contents files. Each has some commentary at the top, followed by data in this format:

FILE                                                    LOCATION
bin/ash                                                 shells/ash
bin/bash                                                shells/bash
bin/bash-static                                         shells/bash-static
bin/bsd-csh                                             shells/csh
...
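LOCATION is always the last whitespace-separated field on a row, so awk can split each row into a path and a location even when the path itself contains spaces. A minimal sketch: `contents_rows` is a hypothetical name, and it assumes the commentary ends with a `FILE ... LOCATION` line as in the jessie files here (a file without that line produces no output).

```shell
# Hypothetical helper: read one Contents file on stdin and print
# "path<TAB>location" for each data row. Assumes the commentary at the top
# ends with a "FILE ... LOCATION" line, as in the jessie files above.
contents_rows() {
  awk '
    in_data {
      loc = $NF                             # location: last whitespace-separated field
      path = $0
      sub(/[ \t]+[^ \t]+$/, "", path)       # strip it, keeping any spaces inside the path
      print path "\t" loc
    }
    /^FILE[ \t]+LOCATION[ \t]*$/ { in_data = 1 }   # data starts after this line
  '
}

# Usage (one file at a time, since each file has its own header):
#   for f in ~/.cache/apt-file/*.gz; do zcat "$f" | contents_rows; done | head
```

Feeding the files one at a time matters here: if they were concatenated, the commentary at the top of the second file would be treated as data rows.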

Here is a dumb command to get the names of commands from these files:

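zcat ~/.cache/apt-file/*.gz |           # decompress every cached Contents file
grep -E '^(usr/bin/|sbin/|bin/)' |      # keep files under usr/bin/, sbin/ and bin/
cut -f1 -d' ' |                         # drop the LOCATION column (everything after the first space)
perl -pe 's/^(.*)\/(.*)$/$2/' |         # keep only the part after the last slash
sort -u > used_names.txt                # sort and de-duplicate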
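The pattern above skips usr/sbin/ and usr/games/, which also hold commands on Debian, and `cut -d' '` would truncate any path containing a space. A sketch of a variant that handles both; the name `command_names` and the exact directory list are my own choices, not anything apt-file defines.

```shell
# Hypothetical variant: read Contents data on stdin and print the unique
# names of files sitting directly in one of the usual command directories.
command_names() {
  # 1. drop the trailing LOCATION column (last whitespace-separated field)
  # 2. keep only files directly inside bin/, sbin/, usr/bin/, usr/sbin/, usr/games/
  #    (header commentary never matches this, so it falls out here)
  # 3. strip the directory part, leaving the command name
  # 4. sort and de-duplicate
  sed 's/[[:space:]][[:space:]]*[^[:space:]][^[:space:]]*$//' |
    grep -E '^(bin|sbin|usr/bin|usr/sbin|usr/games)/[^/]+$' |
    sed 's#.*/##' |
    sort -u
}

# Usage:
#   zcat ~/.cache/apt-file/*.gz | command_names > used_names.txt
```

This would produce a somewhat longer list than the one linked below, since it also picks up the sbin and games directories under usr/.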

Here is the resulting list: used_names.txt.
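With the list in hand, checking candidate names comes down to a fixed-string, whole-line grep. A minimal sketch: `free_names` is a hypothetical helper, and it only knows about names that made it into the list file you give it.

```shell
# Hypothetical helper: print each candidate NAME that does NOT appear as a
# whole line in LIST_FILE. grep flags: -F fixed string (no regex),
# -x whole-line match, -q quiet (exit status only).
free_names() {
  local list=$1
  shift
  # Without a readable list, every name would look free, so bail out.
  [ -r "$list" ] || { echo "free_names: cannot read $list" >&2; return 2; }
  for name in "$@"; do
    if ! grep -Fqx -e "$name" "$list"; then
      printf '%s\n' "$name"
    fi
  done
}

# Usage:
#   free_names used_names.txt mytool frob bash
```

Using `-x` matters: without it, a candidate like `l` would be reported as taken just because `ls` is in the list.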
