Extending Bryar

The Perl Journal September 2003

By Simon Cozens

Simon is a freelance programmer and author, whose titles include Beginning Perl (Wrox Press, 2000) and Extending and Embedding Perl (Manning Publications, 2002). He's the creator of over 30 CPAN modules and a former Parrot pumpking. Simon can be reached at simon@simon-cozens.org.

My boss, as I've mentioned before, has some good ideas about software design, and some of them eventually rub off on me. But there's one good idea of his that I'd forgotten about when I was writing Bryar, the blogging software we examined in my last article.

The idea is that if you're writing a class, you should always try writing another—related but functionally different—class at the same time, and then you'll see what concepts can be abstracted out. If your class deals particularly with specifics of a MySQL database, try writing another class to play with a Postgres database. Not only will it obviously help you if you do need to extend your application in the future, it'll show you if all of your concepts are at the right level.

Although I didn't do it at the time, Bryar allows us plenty of opportunities to put this into practice. So, here I present a candid case study of extending Bryar in various directions, together with all the lessons it taught me about putting code in the right place.

Speeding it All up With mod_perl

One of the things that's always been a problem with Bryar is that it's quite slow; it doesn't handle caching, the CGI script isn't persistent so everything has to be recreated from scratch every time a request is made, and so on. With quite a lot of people regularly pulling down XML from my blog, the server is doing much more work than it needs to. So let's make our next extension by converting Bryar to speak mod_perl.

We start by looking at parse_args, which receives the path, any arguments, and sometimes the text of a new comment, and turns them into a parsed set of arguments to pass to the Bryar::Collector. The parse_args subroutine in our CGI version is 44 lines long, a big hint that something is very wrong. Generally, if a subroutine doesn't fit on your screen, it's too big.

So, in our mod_perl version, we extract the path info and the query arguments:

sub parse_args {
    my ($self, $bryar) = @_;
    my $r = Apache->request;
    my $pi   = $r->path_info;
    my %args = $r->args;

And then we find that everything else we need to do will be identical to the CGI version. Oops! Maybe we should put all this into a base class, and while we're at it, we can split up that massive subroutine as well.

Our previously 44-line subroutine now looks like this in the base class:

sub parse_args {
    my $self = shift;
    my $bryar = shift;
    my %params = $self->obtain_params();
    my %args = $self->parse_path($bryar);

    if (my $search = $params{search}) {
        $args{content} = $search if $search =~ /\S{3,}/; # Avoid trivials.
    }
    $args{comments} = $params{comments} if $params{comments};
    $self->process_new_comment($bryar, %params) if $params{newcomment};
    return %args;
}

This is considerably easier to maintain. It also means that the Frontend classes can now concentrate on what they do best, which is dealing solely with the differences between interfaces. As a result, the business end of the ::CGI class simply becomes:

use CGI qw(:standard);   # imports url(), path_info(), and friends
sub obtain_url { url() }
sub obtain_path_info { path_info() }
sub obtain_params { my $cgi = new CGI;
                    map { $_ => $cgi->param($_) } $cgi->param
                  }
sub send_data { my $self = shift; print "\n",@_ }
sub send_header { my ($self, $k, $v) = @_; print "$k: $v\n"; }

Again, it's now trivial to work out what this class is doing, without having to wade through the rest of the code. That makes implementing the mod_perl class, and indeed any other front-end class you want, just as easy.
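To make the division of labor concrete, here is a minimal, self-contained sketch of the same arrangement. The package names Sketch::Frontend::Base and Sketch::Frontend::Canned are invented for illustration and are not part of Bryar; the "canned" front end simply reads its parameters from a hash, which is one way you might exercise the base class without a web server.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the refactored base class: the generic logic
# lives here, and anything interface-specific is delegated to a hook.
package Sketch::Frontend::Base;

sub parse_args {
    my ($self) = @_;
    my %params = $self->obtain_params;      # hook supplied by the subclass
    my %args;
    if (my $search = $params{search}) {
        # Same rule as the listing above: skip searches without at least
        # three consecutive non-space characters.
        $args{content} = $search if $search =~ /\S{3,}/;
    }
    $args{comments} = $params{comments} if $params{comments};
    return %args;
}

# Hypothetical front end that takes its "request" from a hash rather than
# from CGI or Apache.
package Sketch::Frontend::Canned;
@Sketch::Frontend::Canned::ISA = ('Sketch::Frontend::Base');

sub new {
    my ($class, %params) = @_;
    return bless { params => {%params} }, $class;
}
sub obtain_params { %{ $_[0]{params} } }

package main;

my $frontend = Sketch::Frontend::Canned->new(search => "perl", comments => 1);
my %args     = $frontend->parse_args;
print join(", ", map { "$_=$args{$_}" } sort keys %args), "\n";
```

The subclass is only a constructor and one hook; everything worth testing sits in the base class.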

But first, of course, we need to know how mod_perl works. mod_perl is an extension to Apache that allows much of its internal workings to be driven from Perl. Most mod_perl-based applications work by taking responsibility for the content that gets presented to the web client, but you can also use mod_perl to write Perl handlers for authentication, authorization, logging, and so on. We're just going to concentrate on content generation.

mod_perl's interface to the programmer comes through the Apache object, which represents the request that was made of the server; this is normally called $r. It's obtained through the Apache->request method, and we can use it to ask the server about the current URL, the path info and so on:

sub obtain_url { Apache->request->uri() }
sub obtain_path_info { Apache->request->path_info() }

We can also ask it for the CGI parameters, but we need to use the extension module Apache::Request to do this, because it more faithfully resembles the CGI.pm interface, and because it handles POST queries as well as GET queries. We use POST queries in Bryar to pass in comments:

sub obtain_params { 
    my $apr = Apache::Request->new(Apache->request);
    map { $_ => $apr->param($_) } $apr->param ;
}

We then use the request object to write our headers and data to the client:

sub send_data { my $self = shift;
                Apache->request->status(OK);
                Apache->request->print(@_);
              }
sub send_header {
    my ($self, $k, $v) = @_; Apache->request->header_out($k, $v) 
}   

And that's basically it, apart from one small thing: Where do we plug this thing in? Apache looks for a subroutine called handler, and passes it a request object. We'll put our handler in the Frontend::Mod_perl module too, so that our mod_perl handler can be self-contained in the one file:

sub handler ($$) {
    my ($class, $r)= @_;
    return DECLINED if $r->filename and -f $r->filename;
    Bryar->go();
}

This says that we refuse to handle this request if it's been resolved to a file on disk and that file exists; we do this so that we can have http://blog.simon-cozens.org/ handled by Bryar, but http://blog.simon-cozens.org/blog.css, the stylesheet, handled by Apache normally.

This handler will do what we want, but we can make it a bit more clever by allowing the user to configure Bryar from the Apache configuration file. In my Apache config file, I have:

PerlModule Bryar
<Location />
SetHandler perl-script
PerlHandler Bryar::Frontend::Mod_perl

PerlSetVar BryarDataDir /web/blog
PerlSetVar BryarBaseURL http://blog.simon-cozens.org/
</Location>

PerlSetVar sets a variable that we can get at from our Apache request object, if we modify the handler accordingly:

Bryar->go(datadir => $r->dir_config('BryarDataDir'),
          baseurl => $r->dir_config('BryarBaseURL'));
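The decision the handler makes can be tried out without Apache. The sketch below is only an approximation under stated assumptions: Sketch::MockRequest is an invented stand-in for the request object that offers just filename and dir_config, the OK and DECLINED constants are defined locally with the values mod_perl 1.x gets from Apache::Constants, and handler_sketch returns the options instead of calling Bryar->go.

```perl
use strict;
use warnings;

# Local stand-ins for the constants normally imported from
# Apache::Constants, so that the sketch runs without mod_perl.
use constant { OK => 0, DECLINED => -1 };

# Hypothetical mock request: only the two methods the handler touches.
package Sketch::MockRequest;
sub new        { my ($class, %self) = @_; return bless {%self}, $class }
sub filename   { $_[0]{filename} }
sub dir_config { $_[0]{config}{ $_[1] } }

package main;

# Returns DECLINED alone when the request maps to a real file on disk;
# otherwise returns OK plus the options the real handler would pass on.
sub handler_sketch {
    my ($r) = @_;
    return (DECLINED) if $r->filename and -f $r->filename;
    my %options = (
        datadir => $r->dir_config('BryarDataDir'),
        baseurl => $r->dir_config('BryarBaseURL'),
    );
    return (OK, %options);   # the real handler would call Bryar->go(%options)
}

my $r = Sketch::MockRequest->new(
    filename => '/no/such/file/blog.css',
    config   => { BryarDataDir => '/web/blog',
                  BryarBaseURL => 'http://blog.simon-cozens.org/' },
);
my ($status, %options) = handler_sketch($r);
print "status=$status datadir=$options{datadir}\n";
```

Because the path in the example does not exist on disk, the request falls through to Bryar and picks up both PerlSetVar values.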

And now, thanks to good OOP design and abstraction, we have a handler in a short, self-contained module that we can drop into a Bryar installation to turn it from a CGI program into a persistent mod_perl application.

Notice that when we refactored the frontend to a base class, the base class asked its subclasses how to do particular things, and they provided specific ways of getting information or performing appropriate tasks. This is a brilliant trick because it completely avoids the need to subclass entire methods. If you apply this appropriately, you can make your classes an absolute joy to subclass. For instance, I wrote Mail::Thread, a mail threading library, and wanted it to be useful for each of the various different mail message classes out there, each of which can have different ways of getting headers, and so on. So, when my very own Email::Simple library came along, Iain Truskett was able to subclass Mail::Thread in very few lines:

package Email::Thread;
use base 'Mail::Thread';
sub _get_hdr { my ($class, $msg, $hdr) = @_; $msg->header($hdr); }
sub _container_class { "Email::Thread::Container" }

package Email::Thread::Container;
use base 'Mail::Thread::Container';
sub subject { eval { $_[0]->message->header("Subject") } }

It's a technique we use often to help design really reusable code.

A New Data Source

However, to get any real speed benefit from our mod_perl implementation of Bryar, we will have to write an additional data source that makes use of the fact that we can store document objects in memory as they persist from request to request. But let's not go ahead and do that immediately; we'll apply that trick my boss taught me to implement a completely different data source and check our abstraction layers.

The obvious place from which to get data, if not a filesystem, is a relational database. We'll assume that we've got the following database structure in place:

CREATE TABLE posts (
      id mediumint(8) unsigned NOT NULL auto_increment,
      content text,
      title varchar(255),
      epoch timestamp,
      category varchar(255),
      author varchar(20),
      PRIMARY KEY(id)
);
CREATE TABLE comments (
      id mediumint(8) unsigned NOT NULL auto_increment,
      document mediumint(8),
      content text,
      epoch timestamp,
      url varchar(255),
      author varchar(20),
      PRIMARY KEY(id)
);

By far the easiest way to access this database from Perl is for us to create a new Bryar::Document subclass. Our new datasource, Bryar::DataSource::DBI, will not return Bryar::Documents but Bryar::Document::DBI objects. This will allow us to wrap the database tables using Class::DBI in very few lines of code:

package Bryar::Comment::DBI;
use base qw(Class::DBI::mysql Bryar::Comment);
__PACKAGE__->set_db('Main','dbi:mysql:bryar');
__PACKAGE__->set_up_table('comments');

package Bryar::Document::DBI;
use base qw(Class::DBI::mysql Bryar::Document);
__PACKAGE__->set_db('Main','dbi:mysql:bryar');
__PACKAGE__->set_up_table('posts');
__PACKAGE__->has_many('comments' => 'Bryar::Comment::DBI' => "document");

That's all we need. All of the SQL work is done by inheritance from Class::DBI::mysql, and all the Bryar side of things is handled by the inheritance from Bryar::Document and Bryar::Comment.

Except there's a slight nit: Class::DBI doesn't really like multiple inheritance and refuses to create a comments method if one already exists. So we need to do the inheritance after we've set everything up:

package Bryar::Document::DBI;
use base qw(Class::DBI::mysql);
__PACKAGE__->set_db('Main','dbi:mysql:bryar');
__PACKAGE__->set_up_table('posts');
__PACKAGE__->has_many('comments' => 'Bryar::Comment::DBI' => "document");
use Bryar::Document;
push @Bryar::Document::DBI::ISA, "Bryar::Document";
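Why the order matters is easier to see in miniature. The following is a toy model, not Class::DBI's actual code: Sketch::Generator is an invented class whose has_many declines to install an accessor that the calling class can already reach, which is the behavior described above. Sketch::Document, Sketch::Early, and Sketch::Late are likewise hypothetical.

```perl
use strict;
use warnings;

my %installed;   # records whether each class got a generated accessor

# Toy generator: will not install a method the class can already call.
package Sketch::Generator;
sub has_many {
    my ($class, $name) = @_;
    return 0 if $class->can($name);      # an inherited method blocks us
    no strict 'refs';
    *{"${class}::${name}"} = sub { "generated $name" };
    return 1;
}

# Hypothetical parent class that already provides comments().
package Sketch::Document;
sub comments { "inherited comments" }
sub title    { "inherited title" }

# Inheriting up front: the generator sees the parent's comments() and
# leaves it in place.
package Sketch::Early;
@Sketch::Early::ISA = ('Sketch::Generator', 'Sketch::Document');
$installed{Early} = __PACKAGE__->has_many('comments');

# Inheriting afterwards: the accessor is generated first, and the parent
# joins @ISA only once the set-up is finished.
package Sketch::Late;
@Sketch::Late::ISA = ('Sketch::Generator');
$installed{Late} = __PACKAGE__->has_many('comments');
push @Sketch::Late::ISA, 'Sketch::Document';

package main;
print "Early: ", Sketch::Early->comments, "\n";
print "Late:  ", Sketch::Late->comments, " / ", Sketch::Late->title, "\n";
```

The late-inheriting class keeps its generated comments method and still picks up everything else, such as title, from the parent.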

Now we need to write the Bryar::DataSource class. As it turns out, there's nothing in Bryar::DataSource::FlatFile that can be abstracted out; everything there is flat-file specific. So let's just go ahead and implement the methods we need. Retrieving all documents is easy, thanks to Class::DBI:

package Bryar::DataSource::DBI;
sub all_documents { Bryar::Document::DBI->retrieve_all() }

We have to do a little more work for searching; in fact, Class::DBI's support for searching is less extensive than some of the other database abstraction libraries. However, there's a nice plug-in by Tatsuhiko Miyagawa, called Class::DBI::AbstractSearch, which we can use to construct powerful WHERE clauses.

Let's first dispatch the easy case of finding an individual blog post by ID, and then we'll see where AbstractSearch gets us:

sub search {
    my ($self, $bryar, %params) = @_;
    if ($params{id}) { 
        return Bryar::Document::DBI->retrieve($params{id}) 
    }

AbstractSearch allows us to construct our WHERE clause in the form of a Perl data structure; if we wanted to search for all blog posts by author simon on August 15th, we'd say:

{ author => "simon",
  epoch => { "between", [20030815000000, 20030815235959] }
}

So we can write our code a little like this:

my %condition;
$condition{epoch} = {between => [ _epoch2ts($params{since}),
                     _epoch2ts($params{before}) ] } if $params{since};
$condition{"lower(content)"} = {like => "%". lc $params{content}."%"}
                     if $params{content};

and we can see if there's a limit:

my %limit;
$limit{limit} = $params{limit} if $params{limit};

Now we can just pass these two hashes to our Class::DBI-derived class:

Bryar::Document::DBI->search_where(\%condition, \%limit);

and this returns a list of document objects!

Next, a quick bit of Time::Piece hackery to convert the epoch times we're receiving into SQL-friendly format:

sub _epoch2ts { Time::Piece->new(shift)->strftime("%Y%m%d%H%M%S"); }
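To see how these pieces fit together without a database, here is a minimal sketch that only builds the two hashes and prints them. The build_search helper is invented for this illustration and is not part of Bryar; the epoch values in the example are midnight and one second before midnight UTC around 15 August 2003, and the timestamps printed depend on your local time zone.

```perl
use strict;
use warnings;
use Time::Piece;

# Same conversion as the helper above: epoch seconds to a
# YYYYMMDDHHMMSS string in the local time zone.
sub _epoch2ts { Time::Piece->new(shift)->strftime("%Y%m%d%H%M%S") }

# Hypothetical helper: gathers the WHERE-clause structure and the limit
# attributes from the search parameters.
sub build_search {
    my (%params) = @_;
    my (%condition, %limit);
    if ($params{since}) {
        $condition{epoch} = { between => [ _epoch2ts($params{since}),
                                           _epoch2ts($params{before}) ] };
    }
    if ($params{content}) {
        # Lowercase both sides so the match is case-insensitive.
        $condition{"lower(content)"} =
            { like => "%" . lc($params{content}) . "%" };
    }
    $limit{limit} = $params{limit} if $params{limit};
    return (\%condition, \%limit);
}

my ($condition, $limit) = build_search(
    since   => 1060905600,
    before  => 1060991999,
    content => "Mod_Perl",
    limit   => 5,
);
print "like:    ", $condition->{"lower(content)"}{like}, "\n";
print "between: ", join(" and ", @{ $condition->{epoch}{between} }), "\n";
print "limit:   ", $limit->{limit}, "\n";
```

Keeping the hash-building separate from the database call makes the search logic easy to check on its own.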

And now we need something to store comments; this is trivial because we can just pass everything to the new method of the comment class:

sub add_comment {
    my ($self, $bryar, %params) = @_;
    Bryar::Comment::DBI->new(\%params);
}

And it's all over bar the documentation: We've added database backing to Bryar in one single drop-in file in around 30 lines of code. This is how it should be.

Genius From Mars

Of course, at this stage, we now have a system that takes documents from a database, renders them with a template toolkit, and spits them out with mod_perl. It was at this point that I realized that Bryar may well turn into something more than just weblog software. If you think of the datasource class as data to be modeled, and the rendering layer as a view class, then Bryar acts as the controller in the classical MVC pattern. Combine this with the genericity of Bryar, and you've got something that can be used for displaying catalogues of products, for content and document management, and for all kinds of other purposes.

But let's not go down that road right now; there are three key points we want to take away from our experiences with Bryar for the moment.

The first is the technique of refactoring code by adding another implementation of a class and then abstracting out the commonalities, which can lead to cleaner and more extensible code.

The second point, as we've mentioned, is that one way this manifests itself is in having base classes ask subclasses about specific behavior, and implementing the generic behavior in the base class. This leads to wonderfully subclassable code.

The third, less-believable point is that sometimes it pays to listen to your boss.

TPJ