Wednesday, June 21
extracting your photos from flickr
Background: I’ve been meaning to delete my flickr account for a while. A recent e-mail stating that all Yahoo accounts would be subject to Verizon’s terms-of-service by the 15th of this month prompted me to actually go ahead and do it. On the off chance this is useful to anyone else, here are a few notes about exporting photo data and working with the results.
Some priors: I run Debian GNU/Linux, and have a DigitalOcean droplet that was convenient for intermediate storage. I wasn’t too concerned about metadata beyond the photos themselves, the titles I’d given them, and what sets they were in. Depending on how you use flickr, you might care about more than that.
☙
I started off by Googling around for a flickr export utility, and after trying a couple of other things I landed on a script called flickrbackup. It’s about 400 lines of Python, and packaged for Debian (and thus also Ubuntu). Here’s the package description from Debian Stretch as of this writing:
$ apt-cache show flickrbackup
Package: flickrbackup
Version: 0.2-3.1
Installed-Size: 51
Maintainer: Tiago Bortoletto Vaz <tiago@debian.org>
Architecture: all
Depends: python, python-pyexiv2
Description-en: Simple tool to perform a backup of your photos in flickr
flickrbackup is a simple python script which make a local copy of all your
pictures hosted in flickr.com.
.
It downloads the pictures and organize them using your set names. flickrbackup
is also able to store title, description, tags and other metadata from flickr
sets as EXIF data.
Description-md5: ce24e11f393b22430037469a74e4a131
Section: utils
Priority: optional
Filename: pool/main/f/flickrbackup/flickrbackup_0.2-3.1_all.deb
Size: 6420
MD5sum: 0569e3d2513c6dbd519b7b45cef9eed8
SHA256: 1e1cafa542904fa97734939adef77dd7631fb595361c67120440fc2068b889a8
On a Debian system, you can install the script with:
$ apt-get install flickrbackup
Then make a directory for photos:
$ mkdir flickr_dump
And run the script like so:
$ flickrbackup -e -o flickr_dump
The -e tells it to store flickr metadata in Exif tags on the image files, and -o flickr_dump tells it where to stash the photos.
I’m a bit fuzzy on this part and can’t easily replicate it since I already deleted my account, but the script should open a browser where you’ll be prompted to log in if necessary and authorize the script to access your account. It stashes credentials in a file called .flickrbackup.frob.cache - I had to generate this file on my laptop and then copy it over to a system on DigitalOcean to use flickrbackup from there, since attempting to run a browser on the DO box didn’t work.
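If you need to pull the same trick, copying the cache to a remote machine is a one-liner. I don’t recall exactly where the file gets written (home directory or current working directory), so check first; user@example-droplet below is a placeholder for your own server:
# copy cached flickr credentials to the remote box:
$ scp ~/.flickrbackup.frob.cache user@example-droplet:~/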
After that, image files will be downloaded into subdirectories of flickr_dump by set name. Anything not in a set should land in flickr_dump/No Set. This part takes a while.
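Once it finishes (or while it runs), standard tools will give you a rough idea of what came down. For example:
# list set directories:
$ find flickr_dump -maxdepth 1 -type d
# count downloaded photos:
$ find flickr_dump -iname '*.jpg' | wc -l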
✦
Next, I used a handful of tools to reorganize things. I’m still deciding how to self-host a photo collection, so this is fairly rough, but after some experimentation I decided to stash the set names for later use and then move everything into directories by date taken.
Before doing anything else, I used the rename utility to lowercase set names, replace spaces with underscores, and remove some extra characters:
$ cd flickr_dump
# spaces to underscores:
$ rename 's/ +/_/g' ./*
# lowercase:
$ rename 'tr/A-Z/a-z/' ./*
# zap parentheses:
$ rename 's/[()]//g' ./*
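If you’re nervous about mangling directory names, the Perl rename that Debian ships takes a -n flag, which prints what would be renamed without actually doing it:
# dry run - show renames without performing them:
$ rename -n 's/ +/_/g' ./*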
Next I made a tab-separated file containing set names in the first column and image filenames in the second:
$ find . -iname '*.jpg' | sed 's/\// /g' | awk '{ print $2 "\t" $3; }' > setlist.txt
This finds everything that ends in a .jpg extension, replaces slashes in the path with spaces, and feeds those lines to awk, which by default treats space-separated values as fields that can be referenced by number. (So the line ./sprkfn/12843074003.jpg becomes . sprkfn 12843074003.jpg, and then sprkfn 12843074003.jpg.) I’ll probably convert these to topic tags in my blog system at some point in the future. (It seems safe to just do this by filename because as far as I can tell the filenames are unique. I think.)
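(In retrospect, the sed step is redundant: awk can split fields on slashes itself. Something like this should produce the same file:)
# use / as the field separator instead of preprocessing with sed:
$ find . -iname '*.jpg' | awk -F/ '{ print $2 "\t" $3; }' > setlist.txt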
With that out of the way, I started messing around with ExifTool. ExifTool is an old-school swiss-army-knife of a utility (written in Perl, natch) for slicing and dicing photos based on Exif tags. It can manipulate the data in individual images, as well as organize sets of files based on values like creation date. I followed the examples here and copied the photos into directories like so:
$ exiftool -o . '-Directory<CreateDate' -d ~/p1k3/files/photos/%Y-%m -r .
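(The -o . is what makes exiftool copy the files instead of moving them.) Before turning this loose on the whole tree, it’s easy to spot-check that CreateDate is actually populated on a given file; the path here is borrowed from the earlier example:
# print the creation-date tag for a single photo:
$ exiftool -CreateDate ./sprkfn/12843074003.jpg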
For right now, this is the resulting set of folders, by way of some scripting I’m already using to display photos here. It’s pretty clunky, but I’ll improve on it eventually. (At least, that is, if I don’t just delete everything out of hopeless disgust with my small part in creating the emergent panopticon.)