Wednesday, June 21
extracting your photos from flickr
Background: I’ve been meaning to delete my flickr account for a while. A recent e-mail stating that all Yahoo accounts would be subject to Verizon’s terms-of-service by the 15th of this month prompted me to actually go ahead and do it. On the off chance this is useful to anyone else, here are a few notes about exporting photo data and working with the results.
Some priors: I run Debian GNU/Linux, and have a DigitalOcean droplet that was convenient for intermediate storage. I wasn’t too concerned about metadata beyond the photos themselves, the titles I’d given them, and what sets they were in. Depending on how you use flickr, you might care about more than that.
☙
I started off by Googling around for a flickr export utility, and after trying a couple of other things I landed on a script called flickrbackup. It’s about 400 lines of Python, and packaged for Debian (and thus also Ubuntu). Here’s the package description from Debian Stretch as of this writing:
$ apt-cache show flickrbackup
Package: flickrbackup
Version: 0.2-3.1
Installed-Size: 51
Maintainer: Tiago Bortoletto Vaz <tiago@debian.org>
Architecture: all
Depends: python, python-pyexiv2
Description-en: Simple tool to perform a backup of your photos in flickr
flickrbackup is a simple python script which make a local copy of all your
pictures hosted in flickr.com.
.
It downloads the pictures and organize them using your set names. flickrbackup
is also able to store title, description, tags and other metadata from flickr
sets as EXIF data.
Description-md5: ce24e11f393b22430037469a74e4a131
Section: utils
Priority: optional
Filename: pool/main/f/flickrbackup/flickrbackup_0.2-3.1_all.deb
Size: 6420
MD5sum: 0569e3d2513c6dbd519b7b45cef9eed8
SHA256: 1e1cafa542904fa97734939adef77dd7631fb595361c67120440fc2068b889a8
On a Debian system, you can install the script with:
$ apt-get install flickrbackup
Then make a directory for photos:
$ mkdir flickr_dump
And run the script like so:
$ flickrbackup -e -o flickr_dump
The -e tells it to store flickr metadata in Exif tags on the image files, and -o flickr_dump tells it where to stash the photos.
I’m a bit fuzzy on this part and can’t easily replicate it since I already deleted my account, but the script should open a browser where you’ll be prompted to log in if necessary and authorize the script to access your account. It stashes credentials in a file called .flickrbackup.frob.cache - I had to generate this file on my laptop and then copy it over to a system on DigitalOcean to use flickrbackup from there, since attempting to run a browser on the DO box didn’t work.
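If you need to pull the same trick, copying the cache to a remote machine is a one-liner. I don’t recall exactly where the file gets written (home directory or current working directory), so check first; user@example-droplet below is a placeholder for your own server:
# copy cached flickr credentials to the remote box:
$ scp ~/.flickrbackup.frob.cache user@example-droplet:~/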
After that, image files will be downloaded into subdirectories of flickr_dump by set name. Anything not in a set should land in flickr_dump/No Set. This part takes a while.
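Once it finishes (or while it runs), standard tools will give you a rough idea of what came down. For example:
# list set directories:
$ find flickr_dump -maxdepth 1 -type d
# count downloaded photos:
$ find flickr_dump -iname '*.jpg' | wc -l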
✦
Next, I used a handful of tools to reorganize things. I’m still deciding how to self-host a photo collection, so this is fairly rough, but after some experimentation I decided to stash the set names for later use and then move everything into directories by date taken.
Before doing anything else, I used the rename utility to lowercase set names, replace spaces with underscores, and remove some extra characters:
$ cd flickr_dump
# spaces to underscores:
$ rename 's/ +/_/g' ./*
# lowercase:
$ rename 'tr/A-Z/a-z/' ./*
# zap parentheses:
$ rename 's/[()]//g' ./*
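If you’re nervous about mangling directory names, the Perl rename that Debian ships takes a -n flag, which prints what would be renamed without actually doing it:
# dry run - show renames without performing them:
$ rename -n 's/ +/_/g' ./*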
Next I made a tab-separated file containing set names in the first column and image filenames in the second:
$ find . -iname '*.jpg' | sed 's/\// /g' | awk '{ print $2 "\t" $3; }' > setlist.txt
This finds everything that ends in a .jpg extension, replaces slashes in the path with spaces, and feeds those lines to awk, which by default treats space-separated values as fields that can be referenced by number. (So the line ./sprkfn/12843074003.jpg becomes . sprkfn 12843074003.jpg, and then sprkfn 12843074003.jpg.) I’ll probably convert these to topic tags in my blog system at some point in the future. (It seems safe to just do this by filename because as far as I can tell the filenames are unique. I think.)
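(In retrospect, the sed step is redundant: awk can split fields on slashes itself. Something like this should produce the same file:)
# use / as the field separator instead of preprocessing with sed:
$ find . -iname '*.jpg' | awk -F/ '{ print $2 "\t" $3; }' > setlist.txt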
With that out of the way, I started messing around with ExifTool. ExifTool is an old-school swiss-army-knife of a utility (written in Perl, natch) for slicing and dicing photos based on Exif tags. It can manipulate the data in individual images, as well as organize sets of files based on values like creation date. I followed the examples here and copied the photos into directories like so:
$ exiftool -o . '-Directory<CreateDate' -d ~/p1k3/files/photos/%Y-%m -r .
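(The -o . is what makes exiftool copy the files instead of moving them.) Before turning this loose on the whole tree, it’s easy to spot-check that CreateDate is actually populated on a given file; the path here is borrowed from the earlier example:
# print the creation-date tag for a single photo:
$ exiftool -CreateDate ./sprkfn/12843074003.jpg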
For right now, this is the resulting set of folders, by way of some scripting I’m already using to display photos here. It’s pretty clunky, but I’ll improve on it eventually. (At least, that is, if I don’t just delete everything out of hopeless disgust with my small part in creating the emergent panopticon.)