Tate: An iPhoto Gallery Exporter in Perl

The Perl Journal March, 2005

By Simon Cozens

Simon is a freelance programmer and author, whose titles include Beginning Perl (Wrox Press, 2000) and Extending and Embedding Perl (Manning Publications, 2002). He's the creator of over 30 CPAN modules and a former Parrot pumpking. Simon can be reached at simon@ simon-cozens.org.

Recently I took a photography course, and as a result I've gotten into taking a lot of photographs. I put some of them on my web site for public view, but this has always been a tedious exercise: I choose the album in iPhoto, then have it generate the scaled images, thumbnails, and HTML for the album, and then upload the whole gallery to my server via scp. Then I need to remember to update the page that contains the index of all the galleries—which is where I usually fail. I looked at a few other gallery programs, but none of them quite did what I wanted; that is, produce nice pages while minimizing the steps required to go from iPhoto to a working web site.

A recent server crash and upload gave me the opportunity to think about how I would do things in an ideal world. I would mark in iPhoto the albums that I wanted to upload, and a program would generate the appropriate HTML and images, use the iPhoto album comments as a description for each album and put together a front page, and then upload everything that needed to be uploaded to the server.

I soon found myself writing the piece of pseudocode in Figure 1. After that, writing the software wasn't very challenging at all. From my initial design, which was really only useful for my own site, I've developed a handy little iPhoto-to-web utility with a slightly more general application. Rather than take you through exactly how I built the new tool, which I've called "Tate," I'll look at some of the things that were nonobvious, and then at what I plan to do with Tate in the future.

Interacting with iPhoto

The first problem, and in fact the hardest part of writing Tate, was working out which albums to select, extracting the comments from them, and finding out which photos are in the album. I started by grabbing brian d foy's "iphoto" program from CPAN: This uses Mac::Glue to talk to iPhoto's scripting interface. I hacked around with the bits of it that I needed, to produce:

my $iPhoto = Mac::Glue->new( "iPhoto" );
my $albums = $iPhoto->prop( "albums" );
my @albums = map { 
  $albums->obj(item => $_) 
} 1..$albums->count;
for my $album (@albums) {
    my $name   = $album->prop( "name" )->get;
    my $photos = $album->prop( "photos" );
    my $count  = $photos->count;
    my @photos = map {
      $photos->obj(item => $_) 
    } 1..$photos->count;
    # ...
}

This gives us an array of photo albums and allows us to get at the object representing each photo in the album. There are only two problems with this: The first is that we don't necessarily want to publish all the albums in the library; the second is that we also need to use the album's comment as a description and, annoyingly, that isn't accessible using the scripting interface.

In fact, after closely examining file modification times while changing album comments in iPhoto, I determined that the file that contains the comments is Pictures/iPhoto Library/Library.iPhoto. This is a binary file (an Objective C "typedstream" file) that I couldn't work out how to read. In desperation, I first found that each album's data contained the string \223\231\223\204\232 at most once, and I found that the album's name was the first string of printable characters in the record.

But the position of the album's description was not obvious. I spent quite a while on it, but couldn't work out how to extract it reliably. So in the end, I used the "canary" technique. A canary is a known piece of data in an unknown location. So, as the user, I make sure that I add the string "Export:" to the comments of albums I wish to publish. This means that when I see "Export:" in the binary file, I can grab everything printable after that string and use that as the album comment; see Example 1.

Now we have a hash linking album names to descriptions; when we're iterating through the albums using Mac::Glue, we now only need to look at albums that have an entry in this hash.

When we're looking at individual photos, we can ask iPhoto for various properties, as in Example 2.

This is all the information we need from iPhoto; now that we know where the images and thumbnails live, we can work on them directly.

Resizing and Filtering

And what work on them we need to do! Generally, the original photos in iPhoto are extremely large, so we need to resize them for web use. The Imager module is what you need for this. Imager is an XS module that binds a consistent interface to all the appropriate graphics libraries on the system. So, if we have libjpeg installed, we can use it to read, scale, and write JPEG images—exactly what we need for our gallery.

Imager is very easy to use, with a nice, simple OO interface:

my $maxwidth = 800;

Imager->new
      ->open(file => $image, type => "jpeg")
      ->scale(xpixels => $maxwidth)
      ->write(file => $image, type=>"jpeg");

We can even add another step in there for thumbnails: When you're scaling down an image dramatically, you lose some of the sharpness. Imager has a Photoshop-like "unsharp mask" filter (and many other filters besides) that we can apply to make those images look crisper:

Imager->new
      ->open(file => $image, type => "jpeg")
      ->scale(xpixels => $maxwidth)
      ->filter(type=>'unsharpmask', 
               scale=>0.2, stddev=>1.5)
      ->write(file => $image, type=>"jpeg");

Imager is a handy Swiss Army Knife for throwing images around programmatically and it gives us just what we need for resizing and sharpening our gallery images.

Incidentally, we expect to be running Tate every time we've got some new photographs to go up on the Web, which means we don't want to be resizing and filtering every single image every time. Instead, we keep the output photos around and just test to see if the photo is newer than the one we created last time:

if (!-e $album_dir."/".$image_name
   or (stat($image_path))[9] > 
   (stat($album_dir."/".$image_name))[9]){
     copy($thumb_path, $thumb_dir."/".$thumb_name);
     copy($image_path, $album_dir."/".$image_name);
     # Do the Imager stuff
}

Templating and Providing Information

So now we know where to find our images and we have them in the form we want them, the next stage is to produce the HTML to go around them. There are three types of pages we want to generate: the individual pages for each picture, the page full of thumbnails for each gallery, and the title page, which is the list of all the galleries.

When I first wrote Tate, I had it create the pages by printing individual snippets of HTML. As so often happens, I ended up quickly becoming confused, fighting through a file written in two different languages, and trying to remember which bits of HTML belonged where. It was more out of frustration than a drive for customizability that lead me to template out the HTML generation to separate template files. But when I'd done this, I realized it made Tate much more useful; it means the end-user doesn't have to put up with my crazy ideas of what a photo gallery should look like and can completely redesign it.

So, for instance, with the main index page, we build up an array containing useful data for each album:

for my $album (@albums) {
    # ... 
    push @album_data, {
          name => $name,
          photos => $count,
          comment => $comment{$name},
          thumb_url => process_photos($name, 
                                      $photos)
    };
}

And then send the whole thing off to be templated:

my $template = Template->new(
  {INCLUDE_PATH => TEMPLATES}
);
$template->process("album_index.tt", 
                  { albums => \@album_data },
                  STAGING_AREA."/index.html") 
    or die $template->error;

The page for an individual gallery is similar, but for individual photos, we can do something a bit special.

Satisfying Geek Photographers

I'm trying to learn to take better pictures, and much like any other discipline, the only way to improve is to practice, to look at what went right and what went wrong, and to try to learn how to do it better next time. So when I'm browsing through some of my pictures, it's helpful to know how the camera was set up when the photo was taken: the exposure settings, aperture setting, whether the camera was in shutter or aperture priority, the zoom, the flash, and so on. Thankfully, modern cameras embed all this information in the form of "EXIF tags" into the pictures that they take. Image::Info, a generic interface to all kinds of information stored in graphic file headers, can read these tags.

I was planning to use Image::Info anyway because it gives you the size of the image, from which you can create the appropriate height and width attributes for the IMG tags. To do this, we create an Image::Info object for the photo, and pass it in to the template:

my $info = image_info($image_path);

my $page_data = {
    album => $name,
    title => $title,
    comment => $comment,
    file => $image_name,
    info => $info
};

But then, I also read in the Image::Info documentation that it lets me get at the EXIF tags. So if I put something like this in the template:

Taken: [% info.DateTime %] <br>
Camera: [% info.Make %] [% info.Model %] <br>
Exposure: [%info.ExposureMode%], 
          [% info.ExposureProgram %], 
          f[% info.FNumber %], 
          [%info.ExposureTime%]", 
          ISO [%info.ISOSpeedRatings%],
          [%info.MeteringMode%] metering<br>
Flash: [% info.Flash%]<br>

I can get a handy source of information about how the shot was taken.

Distributing Templates

The version of Tate that I've released uses a little trick to ensure that you have the templates without making it complicated to install. I just wanted Tate to be a single file, which you download and run; having to actually install it the MakeMaker way and then install the templates somewhere globally so that they could be accessed by all users who wanted to use Tate just seemed like overkill.

So, after my frustration made me rip the HTML code out of the program, a different frustration made me finally put them back in, but this time, each template in its own Perl subroutine:

sub photo_page_template{ <<'EOF'
<html>
   <head>
    ...
EOF
}

(I used this as a cheap-and-cheerful alternative to using Inline::Files to store multiple data files in the same program.)

When Tate starts, it checks to see if you have the templates installed. If you don't, it creates ~/Application Support/Tate/Templates and writes the three templates into it:

open OUT, ">".TEMPLATES."image.tt";
print OUT photo_page_template();
close OUT
# ...

The templates are still separate blocks of data from the program, so we still maintain separation of code and data. They still turn out as separate files in the filesystem so that you can customize them, but now we only have a single file for the user to download and run.

Uploading with rsync

Now we have a set of HTML files, photographs, and thumbnails arranged in a "staging area." Each album is in its own directory, and the thumbnails in subdirectories off them. The nice thing about this arrangement is that it's precisely the directory layout we're going to want on the web server. All we need to do is get the whole file tree up to the server, and we're done.

Of course, there's a dumb way to do it: I can just transfer every file up to the server. But as I mentioned earlier, I expect Tate to be run reasonably often, every time I have new photography to go up on the Web. Most of the old galleries won't change, so there's no sense sending all the files up again—I only want to upload files that have changed.

The rsync protocol, by Andrew Tridgell and Paul Mackerras, is a good solution to this problem. It exchanges lists of file information to determine which files have been changed between a server and a client, and then sends the necessary files across. If two files differ, the server chunks the file up into pieces, and only sends across the pieces that differ. This means that it's an efficient way of mirroring large sets of files that don't change very much. In fact, it's often used for mirroring CPAN, since using rsync means only the new modules get pushed out to the mirror sites, rather than everything, every night.

Best of all, there's a Perl implementation of the rsync protocol (File::RsyncP), we can use to transfer files to the web server. The idea is that you use your normal way of connecting to the server—I'm going to log in via ssh—and then execute the rsync command. Then File::RsyncP will talk with the remote rsync and exchange files.

After all that explanation, making it work is actually stunningly simple; see Example 3.

The "staging area" is where we've built our static tree of galleries; the destination is where we want to put it all on the remote filesystem.

Then we create a File::RsyncP object, telling it how to connect to the server and what to run, tell the object that it is sending to a particular destination, point it at the staging area, and tell it to go. Our beautiful photographs have made their way from the camera, to iPhoto, to a staging area on the local drive, and now finally onto the web server, for all the world to see!

Future Plans

Tate is available for download from http://simon-cozens.org/ programmer/tate.html. At some point in the future, I plan to use it to learn to program the CamelBones toolkit; CamelBones is a binding to the OS X Cocoa libraries to create a native, GUI OS X application written in Perl. When I do, I shall write about it here!

TPJ