The Perl Journal June 2003
It ought to have been very simple. I needed to produce a newsletter. The content was going to be created offline, and then uploaded to a processing script that was going to distribute it electronically. And Perl was going to help me.
It ought to have been very simple, but like so many things, it ended up being a little more complex than that. Complex enough, I hope, that there are one or two things involved in the process of creating the newsletter that we can all learn from.
The first idea I had was to produce a newsletter in HTML, which people could look at on the Web, or print off and distribute to less Internet-aware friends. And because I hate producing HTML by hand, I used the Template Toolkit to template it out. Let's begin by looking at how I did that.
Andy Wardley's Template Toolkit is a fantastically useful suite of Perl modules that implement a parser and interpreter for a little templating language. Templating languages are most often used to fill values computed by a program into some text. For instance, we could have a template like this:
[% today %]
[% title %] [% forename %] [% surname %]
[% address %]
Dear [% title %] [% surname %],
Thank you for your letter dated [% their_date %]. This is to confirm that we have received it and will respond with a more detailed response as soon as possible. In the mean time, we enclose more details of ...
We tell Template Toolkit what the various values of today, title, and so on ought to be, and it fills out the template.
Of course, as we're about to find out with our newsletter project, things that start out nice and simple have a way of getting bigger and more complex. Template Toolkit supports a lot more than just filling scalars into a form: It has support for arrays, hashes and objects, the ability to include templates inside templates, declare macros, run blocks multiple times, filter text through various functions, and much more. Thankfully, we're only going to use a small amount of this functionality in the newsletter.
The HTML page we're constructing is slightly tricky. It uses CSS to lay out text in three columns. We'll have a header, a column describing generally what the newsletter is about, a main column of news, and then a further column of other information: how to get in touch with me, and so on. At the bottom, we'll put some information about how to make sure people have the latest edition of the newsletter.
The top of the HTML is static, so we pass that out to a separate file. Let's assume we have a file called "Head" that contains all the HTML header and the initial <body> tag.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Simon's Newsletter</title>
...
</head>
<body>
<div class="box-wrap">
Similarly, we have a static "Foot" file:
</div>
</body>
</html>
Now we can forget about most of the mucky business of HTML and concentrate on the content:
[% INCLUDE Head %]
<p>
My newsletter will appear here.
</p>
[% INCLUDE Foot %]
We can process this with the Template Toolkit using the following bit of Perl:
use Template;
my $template = Template->new();
$template->process("newsletter.thtml");
(I use the extension "thtml" to remind myself that this is templated HTML.)
With that little program, the generated HTML page gets spat out to standard output. Great. In fact, Template Toolkit comes with a handy little utility called tpage, which is functionally equivalent to our Perl program above. You can just say tpage newsletter.thtml, and Template Toolkit will process the template in the same way.
So far so good. But of course, we haven't used any template variables yet. Let's add some now, by giving the newsletter an issue number and date:
[% INCLUDE Head %]
<div class="box-header">
<h1 align="center">Simon's Newsletter</h1>
<p align="right"><I>Issue [% issue %] - [% date %]</I></p>
</div>
<p>
My newsletter will appear here.
</p>
[% INCLUDE Foot %]
Now we need a way to tell the template what the values of these variables are. We do this by passing in a hash reference as the second parameter of process.
use Template;
my $template = Template->new();
my $vars = {
issue => 1,
date => scalar localtime
};
$template->process("newsletter.thtml", $vars);
Once again, the newsletter will be produced on standard output. As it happens, we can still use tpage, even if we're adding template variables. We can say:
tpage --define issue=1 --define date="May 17th" newsletter.thtml
tpage is a very handy tool for prototyping your templates in the way we're doing here.
Now we have our header out of the way. Let's move on to our three columns. We'll handle them in order of complexity. The left-hand column is just static text, so we can dispose of that trivially:
<div class="column-two">
<div class="column-two-content">
<h2>WEC Trek to Japan</h2>
[% INCLUDE trek %]
</div>
</div>
The right-hand column will be partially static text, but will also contain an array of brief news items. I'm going to omit all the div and other extraneous tags for the time being so that we can concentrate on the content.
<h2>In brief</h2>
<ul>
[% FOREACH point = brief %]
<li>[% point %]</li>
[% END %]
</ul>
<h2>Contact Details</h2>
[% INCLUDE contact %]
brief is going to be an array of pieces of news. Just like in Perl, we use FOREACH to iterate over that array; the template code is equivalent to this in Perl:
for my $point (@brief) { print "<li>$point</li>\n" }
This makes the local variable point contain the text for each news item.
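To see that equivalence in action, here is a minimal sketch that runs the same loop both ways. It is only an illustration: the template is held in a string rather than in newsletter.thtml, and the two sample items in @brief are invented. Passing a reference to a scalar as the third argument of process collects the output in a variable instead of printing it.

```perl
use strict;
use warnings;
use Template;

# Sample data, invented for this illustration.
my @brief = ("Visa arrived", "Flights booked");

# A scalar reference as the first argument is treated as template text.
# The -%] form removes the newline that follows the directive.
my $source = join "\n",
    '[% FOREACH point = brief -%]',
    '<li>[% point %]</li>',
    '[% END -%]',
    '';

my $template = Template->new();
my $from_template = '';
$template->process(\$source, { brief => \@brief }, \$from_template)
    or die $template->error();

# The plain Perl equivalent of the FOREACH loop.
my $from_perl = '';
for my $point (@brief) {
    $from_perl .= "<li>$point</li>\n";
}

print "From the template:\n$from_template";
print "From plain Perl:\n$from_perl";
```

Both halves should list the same two items; the only thing the template adds is that the markup lives outside the program.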
The middle column is very similar, but slightly more complex. We'll have a number of more substantial news items, separated by horizontal lines. These will be passed in as an array of hash references, and we let Template do the work of sorting it all out. Here's what the template looks like for the news column.
<h2 align="center">News</h2>
[% FOREACH item = news %]
<h3>[%item.title%]</h3>
[% item.content %]
<P ALIGN="right"><I>- [%item.when%]</I></P>
<BR>
<HR WIDTH="80%" ALIGN="center">
<BR>
[% END %]
What this says is that it expects an array called "news," and will iterate over the array (that's the familiar FOREACH), putting each element in a temporary variable called item. item will itself be a hash reference, and we extract the elements called title, content, and when from it.
Template's dot operator is a little like Perl 5's arrow operator (and Perl 6's dot operator), minus the worry about brackets: It can be used to retrieve elements from hashes or arrays and also call methods on objects. Template Toolkit knows how to look at item and do the right thing with it: if we put an object inside our news array with when, content, and title methods, we'd get the same results.
This works well, but there's a little bit of a bug: We want the items separated by lines, but it looks slightly ugly to have a line right at the end after the last item. So we tell Template to output the HR tag unless we're on the last item:
[% '<HR WIDTH="80%" ALIGN="center">' UNLESS item == news.last %]
Template Toolkit provides special "virtual methods" on Perl values, which allow us to do clever things like this: Arrays have methods like first and last, which are equivalent to .[0] and .[-1] respectively. There are also methods that allow you to call Perl functions such as split or join on template variables.
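A short sketch of a few of these virtual methods follows. The news and tags data are hypothetical stand-ins, and the template is again kept in a string so the example is self-contained. It also shows loop.last, the flag Template Toolkit sets on the final pass of a FOREACH, which is an alternative way to spot the last item.

```perl
use strict;
use warnings;
use Template;

# Hypothetical data standing in for the real newsletter content.
my $vars = {
    news => [
        { title => "First" },
        { title => "Second" },
        { title => "Third" },
    ],
    tags => [ "japan", "trek", "news" ],
};

# size, first, last and join are list virtual methods;
# loop.last is true only on the final iteration of a FOREACH.
my $source = join "\n",
    'Items: [% news.size %]',
    'First: [% news.first.title %]',
    'Last: [% news.last.title %]',
    'Tags: [% tags.join(", ") %]',
    'Titles: [% FOREACH item = news %][% item.title %][% ", " UNLESS loop.last %][% END %]',
    '';

my $output = '';
my $template = Template->new();
$template->process(\$source, $vars, \$output)
    or die $template->error();

print $output;
```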
This completes the template part of the newsletter; the complete template is given in Listing 1. (All code for this article is also available online at http://www.tpj.com/source/.)
Now let's start thinking about how we want to get these values into our template. We will have our "in brief" news points stored in a file, one point per line, to make it very easy to read those into an array:
open BRIEF, "inbrief" or die $!;
my @brief = <BRIEF>;
close BRIEF;
$vars = {
issue => $issue,
brief => \@brief,
date => $date
};
$template->process("newsletter.thtml", $vars, "newsletter-$issue.html");
This time, we've used a third argument to process, which tells Template Toolkit not to write to standard output, but to save the output to the named file.
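The calls to process shown so far ignore the return value. process returns false when something goes wrong, and the error method on the Template object describes the failure, so a script that runs unattended may want to check for it. The sketch below uses a deliberately broken template held in a string, purely for illustration, to show what such a check might look like.

```perl
use strict;
use warnings;
use Template;

my $template = Template->new();

# A deliberately broken template (the FOREACH has no END), kept in a
# string for illustration; the real template lives in newsletter.thtml.
my $broken = '[% FOREACH point = brief %]<li>[% point %]</li>';

my $output = '';
my $ok = $template->process(\$broken, { brief => ["one"] }, \$output);

if ($ok) {
    print $output;
}
else {
    # error() describes what went wrong during processing.
    print "Template processing failed: ", $template->error(), "\n";
}
```

In the newsletter script, the same idea is usually written as a one-liner: follow the call to process with "or die $template->error()".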
What about the main news items? Well, this is where the story starts to get a bit more complicated. I want the news articles to also appear on my blog (http://blog.simon-cozens.org/) as I upload them. My blog uses a piece of software called blosxom, written by Rael Dornfest at O'Reilly. I like blosxom because it has a UNIX nature: I put my blog items as plain-text files in a directory and it sorts them all out. So a blog entry could be a file called "1234.txt" containing this:
Head Goes Here
<p>
Here is the text of today's blog entry
</p>
Blosxom looks at the first line of the file and uses that as the entry's heading. The rest of the file is HTML text that is added verbatim into the blog page which is being constructed. Blosxom also takes a look at the file's timestamp in the filesystem and uses that as the date of the entry. Note that the name of the file ("1234.txt") is arbitrary, and isn't used in building up the entry at all.
Now, because I wanted the news articles to appear on my blog, I thought it would be sensible to use blosxom format for the articles. That way, once they've been processed into the newsletter, they can be moved across to the blog data directory and be picked up there, too. So let's read in these files the same way blosxom does:
use File::stat;
use File::Copy;
my @news;
for my $file (<*txt>) {
my $item = {};
$item->{when} = localtime(stat($file)->mtime);
$item->{when} =~ s/\d+:\d+:\d+ //;
open IN, $file or die "$file: $!";
$item->{title} = <IN>;
local $/;
$item->{content} = <IN>;
close IN;
push @news, $item;
copy $file, "/opt/blog/$file";
}
We look for all the "txt" files in the current directory, and process each one of them. First, we look at the last-modified time of the file and convert that to a string. We remove the time, leaving only the date, and use that as the when element of our hash. Now we can open up the file, read the first line as the title, and everything else goes into content. Once we've finished reading the file, we stick the item onto the array of news items and copy the entry over to the blog data directory for blosxom to pick it up.
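The same reading step can be packaged as a small function, which makes it easy to try on a single file. The sketch below is one possible variation rather than the code used for the newsletter: read_entry is a hypothetical helper name, it uses a lexical filehandle, chomps the title, and formats the date with POSIX strftime instead of editing the localtime string. The sample entry is written to a temporary directory.

```perl
use strict;
use warnings;
use File::stat;
use File::Temp qw(tempdir);
use POSIX qw(strftime);

# read_entry is a hypothetical helper, not part of blosxom itself.
sub read_entry {
    my ($file) = @_;
    open my $in, '<', $file or die "$file: $!";
    my $title = <$in>;                       # first line is the heading
    $title = '' unless defined $title;       # cope with an empty file
    chomp $title;
    my $content = do { local $/; <$in> };    # slurp everything else
    $content = '' unless defined $content;
    close $in;
    # Date only, taken from the file's last-modified time.
    my $when = strftime("%d %B %Y", localtime(stat($file)->mtime));
    return { title => $title, content => $content, when => $when };
}

# Try it on a throwaway entry in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/1234.txt";
open my $out, '>', $file or die "$file: $!";
print $out "Head Goes Here\n<p>\nHere is the text of today's blog entry\n</p>\n";
close $out;

my $item = read_entry($file);
print "$item->{title} ($item->{when})\n";
```

Limiting local $/ to a do block keeps the slurp mode from leaking into any later reads.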
Now we have all the data we need...or most of it at least.
A further wrinkle comes from the fact that I only want to create one file offline and let my processing program do the right thing with it. This actually works to our advantage because we can produce a tar file that contains all the data and metadata we need in one directory.
We'll stipulate that the tar file comes in with a known filename and known format: Each issue should be contained in a file called "issueX.tar.gz" and this should contain a directory issueX/. We can now use Archive::Tar to extract the files:
use Archive::Tar;
my $filename = shift;
my $tar = Archive::Tar->new;
$tar->read($filename, 1);
$tar->extract;
And we can grab our issue number and the directory where we expect to find our files from the name of the file:
my $dir = $filename;
$dir =~ s/\.tar\.gz//;
$dir =~ /issue(-?\d+)$/ or die "Directory name not in correct format";
my $issue = $1;
The -? is there because I wanted to produce "pre" issues of the newsletter, whimsically called "Issue -2" and "Issue -1." Because these preissues were monthly and the real issues will be weekly, I wanted to specify the date manually: "Issue -2" should have a date of "May," rather than "July 10-17" or whatever. So we read in the date of the newsletter from a file called date in our data directory:
open DATE, "$dir/date" or die "Can't open date file";
my $date = <DATE>;
So now we have all we need to produce the HTML version of the newsletter: a way to untar the input, find the issue number, look at the date, read the brief news items, and also read in the blosxom entries.
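To check the unpacking steps without a real issue to hand, the following sketch builds a tiny issue-2.tar.gz in a temporary directory, unpacks it somewhere else, and then recovers the issue number and date. Everything here is an assumption made for the demonstration: issue_from_filename is a hypothetical helper wrapping the regular expression above, and the archive contains only a date file.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# issue_from_filename is a hypothetical helper wrapping the regex above.
sub issue_from_filename {
    my ($filename) = @_;
    (my $dir = $filename) =~ s/\.tar\.gz$//;
    $dir =~ /issue(-?\d+)$/
        or die "Directory name not in correct format: $dir\n";
    return ($dir, $1);
}

my $old_cwd = getcwd();

# Build a minimal archive: issue-2/date inside issue-2.tar.gz.
my $build = tempdir(CLEANUP => 1);
chdir $build or die "chdir $build: $!";
mkdir "issue-2" or die "mkdir: $!";
open my $out, '>', "issue-2/date" or die "issue-2/date: $!";
print $out "May\n";
close $out;

my $archive = "$build/issue-2.tar.gz";
my $tar = Archive::Tar->new;
$tar->add_files("issue-2/date") or die $tar->error;
$tar->write($archive, COMPRESS_GZIP) or die $tar->error;

# Unpack it in a fresh directory, as the processing script would.
my $unpack = tempdir(CLEANUP => 1);
chdir $unpack or die "chdir $unpack: $!";
my $in = Archive::Tar->new;
$in->read($archive) or die $in->error;
$in->extract or die $in->error;

my ($dir, $issue) = issue_from_filename("issue-2.tar.gz");
open my $date_fh, '<', "$dir/date" or die "Can't open date file: $!";
chomp(my $date = <$date_fh>);
close $date_fh;

chdir $old_cwd;    # step out so the temporary directories can be removed
print "Issue $issue, dated $date\n";
```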
We have an HTML file, but it's not much good just sitting on our filesystem. We need to get it out onto the Web. My personal web site is currently externally hosted, so I have to use FTP to transfer the files up to the site. No problem: Perl has the Net::FTP module to handle this for me:
use Net::FTP;
$ftp = Net::FTP->new("simon-cozens.org");
$ftp->login("simon",$password) or die $!.$@;
print "Creating HTML version...\n";
my $output = "newsletter$issue.html";
$template->process("newsletter.thtml", $vars, $output);
print "Uploading $output...\n";
$ftp->put($output, "public_html/newsletter/$output");
And I also upload it once again calling it "latest.html" so people can make sure they're reading the most recent version.
print "Uploading as latest.html...\n";
$ftp->put($output, "public_html/newsletter/latest.html");
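Because the upload is the step most likely to fail in the field, it may be worth checking each Net::FTP call. The sketch below wraps both uploads in one function. upload_issue is a hypothetical helper name, and the host, user, password, and remote directory are whatever your own hosting requires; it assumes the local file sits in the current directory.

```perl
use strict;
use warnings;
use Net::FTP;

# upload_issue is a hypothetical helper. It uploads one local file twice:
# once under its own name and once as latest.html.
sub upload_issue {
    my ($host, $user, $password, $local, $remote_dir) = @_;

    # new() returns undef on failure and leaves the reason in $@.
    my $ftp = Net::FTP->new($host, Timeout => 30)
        or die "Cannot connect to $host: $@\n";

    # Once connected, message() holds the server's last response.
    $ftp->login($user, $password)
        or die "Login failed: ", $ftp->message;

    for my $name ($local, "latest.html") {
        print "Uploading as $name...\n";
        $ftp->put($local, "$remote_dir/$name")
            or die "Upload of $name failed: ", $ftp->message;
    }

    $ftp->quit;
    return 1;
}
```

A call would then look something like upload_issue("simon-cozens.org", "simon", $password, $output, "public_html/newsletter").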
So far we have the newsletter available as an HTML file on the Web, and also as entries on my weblog. But both of these are "pull" media: people have to keep checking the site to see if there's something new. Some people expressed a desire to have the news available as "push," where they are informed every time there's an update. The obvious way to do this is by e-mail. (Another way is via RSS, but that raises the bar a little; everyone knows how e-mail works.) And of course, I would rather die than knowingly send HTML e-mail.
Easy enough, I thought: I'll just knock up another template that will generate a plain-text e-mail and send that out to a mailing list I had set up. This template was very similar to the HTML one, but obviously, much simpler:
Issue [% issue %] - [% date -%]
News
====
[% FOREACH item = news -%]
[%item.title%]
[%- item.content -%]
- [%item.when%]
[%- '-' UNLESS item == news.last %]
[%- END %]
...
But when I processed this, I realized a slight problem. All of the news items are designed to be on the Web, in blosxom format, and so in HTML. I had to de-HTMLify these items before putting them through the processor. The HTML::TreeBuilder and HTML::FormatText modules came to my rescue here:
use HTML::TreeBuilder;
use HTML::FormatText;
for (@news) {
my $text = $_->{content};
$tree = HTML::TreeBuilder->new->parse($text) or die $!;
$formatter = HTML::FormatText->new(leftmargin => 1, rightmargin => 75);
$_->{content} =$formatter->format($tree);
}
This replaces each content with a plain-text equivalent, ready to be processed by our e-mail template.
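The conversion is easy to try on its own. This sketch runs a single invented snippet of entry HTML through the same two modules. It uses new_from_content, which parses the string and finishes the parse in one call, and it deletes the tree afterwards, as the HTML::TreeBuilder documentation recommends.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Sample entry content, invented for illustration.
my $html = "<p>Here is the <b>text</b> of today's blog entry</p>";

# Parse the HTML into a tree, then render the tree as plain text.
my $tree      = HTML::TreeBuilder->new_from_content($html);
my $formatter = HTML::FormatText->new(leftmargin => 1, rightmargin => 75);
my $text      = $formatter->format($tree);

# Release the tree explicitly once we are finished with it.
$tree->delete;

print $text;
```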
Now it's a very simple matter of using Mail::Mailer to send out the processed e-mail:
$template->process("email.template", $vars, "email.txt");
use Mail::Mailer;
$mailer = new Mail::Mailer 'smtp', Server => "localhost";
$fh = $mailer->open({Subject => "Newsletter Issue $issue",
To => 'wectrek2003@lists.netthink.co.uk'});
print "Sending...\n";
open LET, "email.txt" or die $!;
print $fh <LET>;
$fh->close;
And we're done. 76 lines of Perl code and seven modules later, we have a system that allows me to take a file full of news and metadata, say
% process-newsletter issue3.tar.gz
and magically have a web site and weblog updated and a newsletter sent out via e-mail. The whole process-newsletter program can be found in Listing 2.
Larry is wise, and strong. But remember how his one regret was he didn't get to be a Christian missionary? Guess what Ruby's creator used to be? A missionary in Hiroshima, Larry. In Hiroshima.
-Dave Green, NTK
http://www.ntk.net/index.cgi?b=02001-02-16&l=160#l
So far I've been very coy about what this newsletter is all about. Next month, I'm planning to even Larry's old score: I'll be going out on a short-term mission trip working with churches around the Shiga area of western Japan. In the field, I may not have excellent Internet connectivity, so I wanted something that would allow me to do as much of the work offline as possible; this is why I wanted only to have to deal with one file and have the processing system do the rest.
In the process of writing the newsletter system, we've seen examples of how to use Template Toolkit, how to unpack tarballs with Archive::Tar, how to upload files with Net::FTP, how to turn HTML into plain text, and how to send out mail, all from Perl. By putting in the time to create this admittedly complex processor, I'll now be able to spend less time creating the newsletter and more time creating news to go in it.
This is, I believe, exactly the kind of laziness Larry had in mind when he created Perl: laziness that requires a reasonable investment of time and effort up front, but then allows me to keep in touch with those back home, yet still have more time away from the computer, doing good things with good people.
You can keep up to date with my trip at http://simon-cozens.org/mission/latest.html, where you'll see the output of this very system.
TPJ
Listing 1

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Newsletter</title>
<link rel=stylesheet type="text/css" href="http://simon-cozens.org/mission/pl.css" title="myStyle">
</head>
<body>
<div class="box-wrap">
<div class="box-header">
<h1 align="center">Simon Cozens newsletter</h1>
<p align="right"><I>Issue [% issue %] - [% date %]</I></p>
</div>
<div class="columns-float">
<div class="column-one">
<div class="column-one-content">
<h2 align="center">News</h2>
[% FOREACH item = news %]
<h3>[%item.title%]</h3>
[% item.content %]
<P ALIGN="right"><I>- [%item.when%]</I></P>
<BR>
[% '<HR WIDTH="80%" ALIGN="center">' UNLESS item == news.last %]
<BR>
[% END %]
</div>
</div>
<div class="column-two">
<div class="column-two-content">
<h2>WEC Trek to Japan</h2>
[% INCLUDE trek %]
</div>
</div>
</div><!-- close columns-float -->
<div class="column-three">
<div class="column-three-content">
<h2>In Brief</h2>
<ul>
[% FOREACH point = brief %]
<li>[% point %]</li>
[% END %]
</ul>
[% INCLUDE phone %]
</div>
</div>
</div><!-- close box-wrap -->
</body>
</html>
Listing 2

use Template;
use File::stat;
use Net::FTP;
$ftp = Net::FTP->new("simon-cozens.org", Debug => 0);
$ftp->login("simon","xxx") or die $!.$@;
use Archive::Tar;
my $filename = shift;
my $tar = Archive::Tar->new;
$tar->read($filename, 1);
$tar->extract;
my $dir = $filename;
$dir =~ s/\.tar\.gz//;
$dir =~ /issue(-?\d+)$/ or die "Directory name not in correct format";
my $issue = $1;
open DATE, "$dir/date" or die "Can't open date file";
open BRIEF, "$dir/brief"
or die "Can't open brief news file $dir/brief: $!";
my @points = <BRIEF>;
# Now, the rest will be in blosxom format
my @news;
for my $file (<$dir/*txt>) {
my $item = {};
$item->{when} = localtime(stat($file)->mtime);
$item->{when} =~ s/\d+:\d+:\d+ //;
open IN, $file or die "$file: $!";
$item->{title} = <IN>;
local $/;
$item->{content} = <IN>;
close IN;
push @news, $item;
}
my $template = Template->new();
my $vars = {
issue => $issue,
news => \@news,
brief => \@points,
date => scalar <DATE>
};
print "Creating HTML version...\n";
my $output = "newsletter$issue.html";
$template->process("newsletter.thtml", $vars, $output);
print "Uploading $output...\n";
$ftp->put($output, "public_html/japan/$output");
print "Uploading as latest.html...\n";
$ftp->put($output, "public_html/japan/latest.html");
print "Processing email version...\n";
# Now process the email version, and send it to the list.
use HTML::TreeBuilder;
use HTML::FormatText;
for (@news) {
my $text = $_->{content};
$tree = HTML::TreeBuilder->new->parse($text) or die $!;
$formatter = HTML::FormatText->new(leftmargin => 1, rightmargin => 75);
$_->{content} =$formatter->format($tree);
}
$template->process("email.template", $vars, "email.txt");
use Mail::Mailer;
$mailer = new Mail::Mailer 'smtp', Server => "localhost";
$fh = $mailer->open({Subject => "Newsletter Issue $issue",
To => 'wectrek2003@lists.netthink.co.uk'});
print "Sending...\n";
open LET, "email.txt" or die $!;
print $fh <LET>;
$fh->close;