Something for Nothing

The Perl Journal March 2003

By Simon Cozens

Simon is a freelance programmer and author, whose titles include Beginning Perl (Wrox Press, 2000) and Extending and Embedding Perl (Manning Publications, 2002). He's the creator of over 30 CPAN modules and a former Parrot pumpking. Simon can be reached at simon@simon-cozens.org.

Everyone knows that the object of Perl modules is to make life easier for the programmer—to reduce the amount of code you end up writing. More correctly, you can think of CPAN modules as reducing the amount of auxiliary code in your programs, leaving you free to get on with the specific algorithms you wish to implement.

In this article, I'm going to embrace and extend Mark-Jason Dominus's concept of "structural code." When Mark talks about structural code in his Red Flags tutorial, he means code that doesn't get you any closer to doing what you want to do, but is required to keep the compiler happy or the code looking sane. For instance, in

sub remove_duplicates {
    my @list = @_;
    my %seen;
    return grep { !$seen{$_}++ } @list;
}

most of the code is structural. The first three lines of that code do nothing towards removing duplicates from a list; they fulfill no functional role, merely a structural one. This implementation is better, but it still contains a lot of structural code that you can't avoid when you're programming in Perl:

sub remove_duplicates {
   my %seen;
   grep { !$seen{$_}++ } @_;
}

As I said, I'm going to extend that concept. In this article, structural code is anything that is generic to programming and is not essential to the specific algorithms and functionality of the program you're writing.
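As a small illustration of generic code that a module can take off your hands, the hand-rolled remove_duplicates above does the same job as uniq from List::Util. This is a minimal sketch that assumes List::Util version 1.45 or later, because older installations don't export uniq:

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);

# The hand-rolled version from above: keeps the first occurrence of each element.
sub remove_duplicates {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# The same job done by a module; uniq also keeps the first occurrence, in order.
my @words     = qw(apple pear apple fig pear);
my @by_hand   = remove_duplicates(@words);
my @by_module = uniq(@words);
print "@by_hand\n";     # apple pear fig
print "@by_module\n";   # apple pear fig
```

Both versions preserve the original order. The only difference is who has to write and maintain the code.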

I'd like to introduce four Perl modules (three of mine, and one originally written by Michael Schwern) and show how they can be combined to reduce the amount of structural code in an application to nearly zero. We'll first take a brief look at the four modules, and then see how they worked in a recent application of mine.

Config::Auto

Almost every application needs to store a user's configuration settings, and so almost every application ends up including some code for dealing with whatever configuration format the programmer chose. Such a case is a prime candidate for modularization, and indeed there are a number of modules that can deal with various formats: XML::Simple for the ever-popular XML, Config::IniFiles for Windows-style INI, and several others. For UNIX applications, the standard configuration formats tend to be a variant of key = value (this example is from lynx):

# all cookies.
accept_all_cookies=off

# bookmark_file specifies the name and 
# location of the default bookmark file 
# into which the user can paste links for 
# easy access at a later date.

bookmark_file=lynx_bookmarks.html

or colon separated (as in /etc/group and friends):

nobody:*:-2:
nogroup:*:-1:
wheel:*:0:root
daemon:*:1:root

Or maybe space separated (this one is from gltron, a rather enjoyable OpenGL light-bikes game):

iset show_help 0
iset show_fps 1
iset show_wall 1
iset show_glow 1
iset show_2d 1

I've had to write code to deal with all of these different formats many times over, and I finally gave up: I had what I call a "once and for all" moment. I wanted to sit down and crunch out some code that would just handle whatever I threw at it, and know that I would never ever have to tackle this problem again in my Perl programming career. Try it. It's tremendously freeing.

So I wrote Config::Auto, which parses all of the above formats and more. The idea is not that it gives the user a complete free-for-all. Ideally, you would specify what format you were prepared to read, what sort of data structure you expected to get at the end of it, and then you would know that the parser would be able to handle it with no additional work needed.

In its most basic use, you would say:

use Config::Auto;
my $config = Config::Auto::parse("~/.myapprc", format => "equal");

to parse an equals-separated configuration file such as the .lynxrc above. But if we're trying to avoid extraneous code, why not have the parser work out what sort of configuration file it's been handed?

use Config::Auto;
my $config = Config::Auto::parse("~/.myapprc");

And now it'll take a long look at your rc file and determine what format it looks like it's in.
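To make the idea concrete, here is one way such a guess could work: count how many non-comment lines look like each format and pick the winner. This is a hypothetical heuristic. The guess_format name and its rules are mine, for illustration only. It is not Config::Auto's actual detection code, which handles more formats than these three.

```perl
use strict;
use warnings;

# Hypothetical heuristic: score each candidate format by how many lines match it.
sub guess_format {
    my @lines = grep { /\S/ && !/^\s*#/ } @_;    # ignore blank lines and comments
    my %score = (equal => 0, colon => 0, space => 0);
    for (@lines) {
        $score{equal}++ if /^\s*[\w.-]+\s*=/;    # key = value
        $score{colon}++ if /^[^\s:]+:/;          # field:field:field
        $score{space}++ if /^\s*\w+\s+\S/;       # word value ...
    }
    # Highest score wins; ties are broken alphabetically so the result is stable.
    my ($best) = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    return $best;
}

my @lynx   = ("# all cookies.", "accept_all_cookies=off", "bookmark_file=lynx_bookmarks.html");
my @group  = ("nobody:*:-2:", "wheel:*:0:root");
my @gltron = ("iset show_help 0", "iset show_fps 1");

print guess_format(@lynx),   "\n";   # equal
print guess_format(@group),  "\n";   # colon
print guess_format(@gltron), "\n";   # space
```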

And actually, there's no reason why, assuming standard naming conventions, you should have to tell it where the configuration file is, anyway. If your program is called myapp, then it's a reasonable guess that if the user has a ~/.myapprc file, those are the configuration settings. Config::Auto also tries a few other standard locations, to leave us with:

use Config::Auto;
my $config = Config::Auto::parse();

Configuration file handled! Structural code: zero. (Well, near zero. Future versions of Config::Auto may well declare and populate a $main::config variable for you on import. But maybe there is such a thing as Too Much Magic.)
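The location-guessing step can be sketched in a few lines too. The candidate_config_files helper and its list of paths are assumptions made for illustration. Config::Auto's real search list may be different.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: the places a program called $name might keep its
# configuration, in the order we would try them. This list is an assumption.
sub candidate_config_files {
    my ($name, $home) = @_;
    return (
        File::Spec->catfile($home, ".${name}rc"),    # e.g. ~/.myapprc
        File::Spec->catfile("/etc", "${name}rc"),
        File::Spec->catfile("/etc", "${name}.conf"),
    );
}

# Derive the program name from $0 instead of hard-coding it.
(my $program = basename($0)) =~ s/\.pl\z//;

my ($found) = grep { -r $_ } candidate_config_files($program, $ENV{HOME} || ".");
print defined $found ? "would read $found\n" : "no configuration file found\n";
```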

Attribute::Persistent

The next area that requires too much code is storing persistent data. As I've mentioned in a previous column, even with AnyDBM_File and MLDBM, handling persistent variables is still a pain in the neck. Attribute::Persistent just takes all the pain away, once and for all, with no tie and no structural code at all.

use Attribute::Persistent;
my %hash :persistent; # And that's all.

Persistent storage handled! Structural code: zero.
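For comparison, this is roughly the structural code that my %hash :persistent saves you: tying a hash to an on-disk DBM file by hand. The sketch below uses the core SDBM_File module, with an arbitrary file name in a temporary directory. SDBM stores only flat strings, which is why nested data needs something like MLDBM on top.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "counts");   # SDBM adds its own .pag/.dir suffixes

# "First run": tie the hash to the file, store a value, untie to flush it to disk.
{
    tie my %hash, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $file: $!";
    $hash{visits} = 1;
    untie %hash;
}

# "Second run": the value is still there because it lives in the DBM file.
tie my %again, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $file: $!";
$again{visits}++;
my $visits = $again{visits};
untie %again;
print "visits: $visits\n";   # visits: 2
```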

Getopt::Auto

In a recent comp.lang.perl.moderated thread, it was pointed out that there are a number of things that every novice Perl programmer reinvents, despite there being perfectly round wheels out there; a command-line options processing system was one of them. Here, I disagree.

There are a number of different styles of command-line options: GNU-style --long options, single-letter -s options, and the CVS-style "bare" command. In a similar vein to Config::Auto, I wanted a system that handled all of them.

But then I realized there was a more serious problem. If you're implementing something with an interface similar to CVS (that is, a single executable that can perform various commands, although it could be argued that this is not the UNIX Way), then you'll end up with a horrific piece of code that would look something like this:

my $command = shift @ARGV;
if ($command eq "add") {
   do_add(@ARGV);
} elsif ($command eq "subtract") {
   do_subtract(@ARGV);
} ...

Two alternatives look slightly better, but are equally dissatisfying:

my %commands = ( add => \&do_add, 
                 subtract => \&do_subtract,
                 ... 
               );
my $what = shift;
if (exists $commands{$what}) { 
   $commands{$what}->(@ARGV) 
}
else { do_help() }

Or even

no strict 'refs';
my $what = shift;
&{"do_$what"}(@ARGV);

But they have two problems. First, you still need to handle things like --help and --version separately, and your --help text will generally repeat all the possible arguments over again. Opponents of structural code know that repetition is to be avoided at all costs. This is a specific case of the Prime Rule of programming and user-interface design: "You should never tell the computer anything it already knows or can reasonably be expected to work out." If you're a manager, you might like to contemplate the fact that programmer time is expensive and computer time is cheap. Who should be doing the boring work?
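One way to honor the Prime Rule without a module is to keep each command's help text in the same table as its code, so that the help output is generated instead of typed twice. This is a sketch under that assumption, and %commands, help_text, and dispatch are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical dispatch table: each command's code and help text live together.
my %commands = (
    add      => { help => "Add two numbers together",
                  code => sub { return $_[0] + $_[1] } },
    subtract => { help => "Subtract one number from another",
                  code => sub { return $_[0] - $_[1] } },
);

# Build the help text from the same table the dispatcher uses.
sub help_text {
    my ($program) = @_;
    my $text = "";
    for my $name (sort keys %commands) {
        $text .= "$program $name - $commands{$name}{help}\n";
    }
    return $text;
}

# Run the named command, or fall back to the generated help text.
sub dispatch {
    my ($program, $what, @args) = @_;
    return help_text($program) unless defined $what && exists $commands{$what};
    return $commands{$what}{code}->(@args);
}

print dispatch("calc", "add", 3, 5), "\n";   # 8
print dispatch("calc");                      # prints the generated help
```

Adding a command now means adding one entry, and the help text can't fall out of step with what the program actually accepts.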

The bigger problem is that all this is structural code once again. Dispatching to the appropriate routine is useful, but it's not as useful as actually doing the work of your program. So I had another once-and-for-all moment and decided that something else should be implementing this structural code. That "something else" is Getopt::Auto. As you can probably tell, I'm pretty fond of the idea that computers should do things automatically—it is, after all, what they're for.

With Getopt::Auto, you simply declare what commands you're willing to process, maybe give some help text for them, and the module does the rest. For instance:

use Getopt::Auto (
   [ "--add", "Add two numbers together", \&do_add ],
   [ "--subtract", 
     "Subtract one number from another", 
     \&do_subtract 
   ], 
     ...
);

With no further code, yourapp --add 3 5 will call do_add(3,5). And, as an added bonus, you get --version and --help free of charge:

% yourapp --help
yourapp --help - This text
yourapp --version - Prints the version number

yourapp --add - Add two numbers together
yourapp --subtract - Subtract one number from another

Of course, you may not like GNU-style --long options. Let's try again with CVS-style options without specifying the subroutines explicitly:

use Getopt::Auto (
  [ "add", "Add two numbers together" ],
  [ "subtract", "Subtract one number from another" ],
    ...
);

This time, yourapp add 3 5 will call add(3,5); help will still work and will now spit out the commands in the new bare style. You write the specification, and Getopt::Auto takes care of the rest.
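When no subroutine is given, the bare command name has to be mapped to code somehow. Instead of the no strict 'refs' trick shown earlier, you can ask a package for the subroutine with can(). Here is a sketch of that idea. resolve_command is a hypothetical helper, and I'm not claiming Getopt::Auto works this way.

```perl
use strict;
use warnings;

sub add      { return $_[0] + $_[1] }
sub subtract { return $_[0] - $_[1] }

# Hypothetical helper: turn "--add" or "add" into a reference to main::add.
# can() returns a code reference, or undef if there is no such subroutine,
# so no symbolic references are needed.
sub resolve_command {
    my ($name) = @_;
    (my $bare = $name) =~ s/^-+//;            # accept GNU-style and bare names alike
    return undef unless $bare =~ /^\w+\z/;    # refuse anything that isn't a plain name
    return __PACKAGE__->can($bare);
}

my $code = resolve_command("--add") or die "No such command\n";
print $code->(3, 5), "\n";   # 8
```

can() will find any subroutine in the package, including helpers such as resolve_command itself. A real tool should therefore accept only the names it has declared or documented as commands.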

The more alert of you may well be asking "Isn't this specification structural code?" Well, yes; I thought of that. What would be really nice is if you could say:

use Getopt::Auto;

and it would just work. Well, with one proviso, it does. The proviso is that you must provide POD documentation for each subroutine you want to turn into a command. But of course, all of your subroutines are documented anyway, so that shouldn't be a problem.
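Most of the work in pulling a command list out of POD is spotting the =head2 lines. The minimal sketch below assumes commands are documented as =head2 name - description, as in the example that follows. pod_commands is a hypothetical helper. A script could pass it its own source by reading the file named in $0.

```perl
use strict;
use warnings;

# Hypothetical helper: find "=head2 name - description" lines in Perl source
# and return name => description pairs. A sketch of the idea only.
sub pod_commands {
    my ($source) = @_;
    my %commands;
    while ($source =~ /^=head2\s+(\w+)\s+-\s+(.+?)\s*$/mg) {
        $commands{$1} = $2;
    }
    return %commands;
}

# Sample source, built line by line so this file doesn't contain POD itself.
my $script = join "\n",
    "=head2 add - Adds two numbers together",
    "",
    "   calc add x y",
    "",
    "=cut",
    "",
    'sub add { print $_[0] + $_[1], "\n" }',
    "",
    "=head2 subtract - Subtracts one number from another",
    "",
    "=cut",
    "";

my %found = pod_commands($script);
print "$_ - $found{$_}\n" for sort keys %found;
```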

Here's our fully automated calculator example:

use Getopt::Auto;
our $VERSION = "1.0";

=head2 add - Adds two numbers together

   calc add x y

Adds x and y together and prints the result.

=cut

sub add { print $_[0] + $_[1], "\n" }

=head2 subtract - Subtracts one number from another

   calc subtract x y

Subtracts y from x.

=cut

sub subtract { print $_[0]-$_[1], "\n" }

Now we can say:

% calc --add 3 5
8

% calc --help
This is calc, version 1.0

calc --help - This text
calc --version - Prints the version number

calc --add - Adds two numbers together[*]
calc --subtract - Subtracts one number from another[*]

More help is available on the topics marked with [*]
Try calc --help --foo

And if we follow its suggestion:

% calc --help --add
This is calc, version 1.0

calc --add - Adds two numbers together

    calc add x y

Adds x and y together and prints the result.

Options processing and subroutine dispatch handled! Structural code: zero.

Class::DBI

The final module is not one of my own, but it's so efficient at removing structural code in database-backed applications that it absolutely has to be mentioned. Database applications with the DBI are breeding grounds for structural code: Either you spend a lot of time handling the various select, insert, update, and delete calls yourself, or you use some kind of abstraction layer that does some of the work for you.

Class::DBI is like this abstraction layer, except that in most cases, it does almost all of the work for you. With Class::DBI, you set up one subclass that represents your database:

package Myapp::DBI;
use base 'Class::DBI';

and then tell it what your DBI parameters are:

Myapp::DBI->set_db('Main', 'dbi:mysql:myapp');

No connecting, no disconnecting, no mucking about with handles. But how do you get at the data? Well, you need a class for each of the tables you want to play with:

package Myapp::Person;
use base 'Myapp::DBI';
Myapp::Person->table("person");

Next, tell it the columns you're interested in, starting with the primary key:

Myapp::Person->columns(All => qw(
                                   id 
                                   name 
                                   department
                                   salary
                                )
                       );

and away you go: Your class now has create, retrieve, and search methods to return Person objects, and you also have accessor methods for each of the columns.

# 3% raise for all programmers!
for my $person (Myapp::Person->search(department => "programming")) {
   $person->salary($person->salary() * 1.03);
   $person->update;    # write the change back to the database
}
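The accessor methods save the most typing. The sketch below generates one get/set method per column name. My::Person is a hypothetical class that keeps its values in a plain hash. This is not Class::DBI's code, and nothing here touches a database.

```perl
use strict;
use warnings;

package My::Person;

# Hypothetical: install one get/set accessor per column name, roughly the
# kind of method Class::DBI creates for you.
for my $column (qw(id name department salary)) {
    no strict 'refs';
    *{"My::Person::$column"} = sub {
        my $self = shift;
        $self->{$column} = shift if @_;   # act as a setter when given an argument
        return $self->{$column};
    };
}

sub new {
    my ($class, %fields) = @_;
    return bless { %fields }, $class;
}

package main;

my $p = My::Person->new(id => 1, name => "Ann", department => "programming", salary => 1000);
$p->salary($p->salary * 1.03);
print $p->salary, "\n";   # 1030
```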

There are good tricks for handling relationships between tables and between database and nondatabase objects; I refer you to Tony Bowden's article on Class::DBI for perl.com at http://www.perl.com/pub/a/2002/11/27/classdbi.html.

While this removes most of the rigmarole of handling data in databases, it still violates the Prime Rule because we're having to tell the computer about the columns in our database tables. In the vast majority of cases, the database can tell us what columns it has. Unfortunately, the way it tells us is generally database specific. So Class::DBI has certain database-specific add-on modules, such as Class::DBI::mysql. (It's only a matter of time before someone combines them all...)

Now we can tell our Myapp::DBI to inherit from this:

package Myapp::DBI;
use base 'Class::DBI::mysql';
...

and the need to detail the columns goes away:

package Myapp::Person;
use base 'Myapp::DBI';
__PACKAGE__->set_up_table('person');

(Class::DBI folk tend to use __PACKAGE__ instead of repeating the class name; this is slightly related to the Prime Rule. If you ever need to change the class's name, you only want to be changing it in one place.)

But even this code is reasonably structural! The computer not only knows what columns it has in its database tables, but it also knows what tables it has. With Class::DBI::Loader, we can get it down to:

use Class::DBI::Loader;
Class::DBI::Loader->new( dsn => "dbi:mysql:myapp", namespace => "MyApp");

and now we can use MyApp::Person as before.

Database access is handled with very little structural code indeed.

Putting It All Together

We've seen four tools that give us a great deal of functionality for very little cost in code. With all of these modules, what we gain in brevity, we sacrifice in flexibility. For instance, to make absolutely full use of Class::DBI requires some investment, in terms of tuning access to the columns of each table and declaring the various relationships between columns longhand.

In the code that I write from day to day, I try to strike a balance; the last thing you really want is classes and variables magicking themselves into existence without your being aware of them. So, for instance, I don't use Class::DBI::Loader. I prefer to declare each table's class manually.

Well, not exactly "manually." That wouldn't be a very good use of my time. Instead, I have a little script that produces an application template—a basis for an application that uses many of the aforementioned techniques. I spend most of my preparation time working out the best database schema, and then I type something like:

appgen PerlBooks

Anyone who bears the scars of the old dBase III+ application generator will recognize the name and the concept; appgen goes away, examines the database, and spits out a number of skeleton files, which I will turn into my eventual application.

So, first we take the name of the namespace (PerlBooks), turn it into our database name (perlbooks), and try to use Class::DBI::Loader on that database:

use Class::DBI::Loader;
use DBI;
use Carp qw(croak);

my $namespace = shift;
my $database  = lc $namespace;

my $loader = Class::DBI::Loader->new(
   dsn       => "dbi:mysql:$database",
   namespace => $namespace,
);

(The application generator itself doesn't need to be portable to multiple databases—although its output must be!—since, for better or worse, I do all my development on MySQL.)

Now we do a little ugly messing about. First, we want our own copy of the database handle, so that we can prod the database and ask it for its tables. Instead of repeating the DSN in the DBI connection, we ask $loader what DSN it used:

my $dbh = DBI->connect( 
    @{ $loader->_datasource } 
) or croak($DBI::errstr);
my %tables = map { $_ => 1 } $dbh->tables;

Now for each table, we want to spit out a module representing that table in the ordinary Class::DBI way:

foreach my $table (keys %tables) {
   my $class = $loader->_table2class($table);
   my $ref   = $dbh->selectall_arrayref(
                             "DESCRIBE $table"
                                       );

Most of this code is cobbled together from bits of Class::DBI::mysql and Class::DBI::Loader. Here, we turn the table name (say, account) into the appropriate class name, PerlBooks::Account, using Class::DBI::Loader's built-in method, and then get a description of the database table.
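The table-to-class step is simple string munging. The sketch below uses a hypothetical table_to_class function that capitalizes each underscore-separated word. Class::DBI::Loader's _table2class may follow slightly different rules.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the table-to-class mapping:
# "account" -> "PerlBooks::Account", "bank_account" -> "PerlBooks::BankAccount".
sub table_to_class {
    my ($namespace, $table) = @_;
    my $suffix = join("", map { ucfirst(lc($_)) } split(/_/, $table));
    return "${namespace}::$suffix";
}

print table_to_class("PerlBooks", "account"), "\n";        # PerlBooks::Account
print table_to_class("PerlBooks", "bank_account"), "\n";   # PerlBooks::BankAccount
```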

Now we want to know what the primary key is, so we grep that out of the table's description:

my ( @cols, $primary );
foreach my $row (@$ref) {
   my ($col) = $row->[0] =~ /(\w+)/;
   push @cols, $col;
   next unless $row->[3] eq "PRI";
   die "$table has composite primary key" if $primary;
   $primary = $col;
}
die "$table has no primary key" unless $primary;
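To show what the loop is working with, here it is wrapped in a hypothetical columns_and_primary function. It is fed made-up rows shaped like MySQL's DESCRIBE output (Field, Type, Null, Key, Default, Extra):

```perl
use strict;
use warnings;

# The same logic as the loop above, wrapped in a function so it can be tried
# on sample data. Each row is [Field, Type, Null, Key, Default, Extra].
sub columns_and_primary {
    my ($table, $rows) = @_;
    my (@cols, $primary);
    foreach my $row (@$rows) {
        my ($col) = $row->[0] =~ /(\w+)/;
        push @cols, $col;
        # The defined check avoids a warning if the Key field is ever NULL.
        next unless defined $row->[3] && $row->[3] eq "PRI";
        die "$table has composite primary key" if $primary;
        $primary = $col;
    }
    die "$table has no primary key" unless $primary;
    return ($primary, @cols);
}

my @describe = (
    [ "id",      "int(11)",      "",    "PRI", undef, "auto_increment" ],
    [ "account", "int(11)",      "YES", "",    undef, "" ],
    [ "amount",  "decimal(9,2)", "YES", "",    undef, "" ],
);
my ($primary, @cols) = columns_and_primary("transaction", \@describe);
print "primary=$primary cols=@cols\n";   # primary=id cols=id account amount
```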

This gives us $primary and a list of columns in @cols. At this point, we can write our class:

my $file = $class; $file =~ s{::}{/}g;
mkdir $namespace unless -d $namespace;   # PerlBooks/ must exist before we can write PerlBooks/Account.pm
open OUT, ">$file.pm" or die $!;
print OUT <<EOF;
package $class;
use base '${namespace}::DBI';
__PACKAGE__->table(q{$table});
__PACKAGE__->columns( Primary => q{$primary} );
__PACKAGE__->columns( All     => qw{@cols} );
EOF

We do something that Class::DBI::Loader doesn't do, which is to guess the "has-a" relationships in each table. For instance, if we have a column in transaction called account, we guess this is a reference to the primary key in the account table:

for (@cols) {
   if (exists $tables{$_}) {
      print OUT "__PACKAGE__->has_a($_ => q{".
         $loader->_table2class($_)."});\n";
   }
}

This spits out something like:

__PACKAGE__->has_a(account => q{PerlBooks::Account});

Then the account method in our PerlBooks::Transaction will no longer produce a numeric ID, but will instead produce a PerlBooks::Account object. Finally, our generator finishes off the current class:

print OUT <<EOF;

1;
EOF
   close OUT;
}
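The sketch below combines the class template and the has_a guess, and returns the generated module as a string instead of writing it to disk, which makes the output easy to inspect. class_source and table_to_class are hypothetical helpers, and the table and column names are made up:

```perl
use strict;
use warnings;

# Hypothetical table-to-class mapping (see the caveat about _table2class).
sub table_to_class {
    my ($namespace, $table) = @_;
    return $namespace . "::" . join("", map { ucfirst(lc($_)) } split(/_/, $table));
}

# Build the same module text the generator writes, plus has_a guesses.
sub class_source {
    my ($namespace, $table, $primary, $cols, $tables) = @_;
    my $class = table_to_class($namespace, $table);
    my $src = "package $class;\n"
            . "use base '${namespace}::DBI';\n"
            . "__PACKAGE__->table(q{$table});\n"
            . "__PACKAGE__->columns( Primary => q{$primary} );\n"
            . "__PACKAGE__->columns( All     => qw{@$cols} );\n";
    # A column named after another table is guessed to reference that table.
    for my $col (@$cols) {
        next unless exists $tables->{$col};
        $src .= "__PACKAGE__->has_a($col => q{" . table_to_class($namespace, $col) . "});\n";
    }
    return $src . "\n1;\n";
}

my %tables = (account => 1, transaction => 1);
print class_source("PerlBooks", "transaction", "id", [qw(id account amount)], \%tables);
```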

Now we can get onto the main PerlBooks module, which has to load up the others, and any other modules we might want to use:

open OUT, ">$namespace.pm" or die $!;
print OUT "package $namespace;\n\n";
print OUT "use Config::Auto;\n";
print OUT "use ".$loader->_table2class($_).";\n" for keys %tables;
print OUT "\n1;\n";
close OUT;

Our PerlBooks::DBI class is generated next, but this needs to be done a little carefully. As we've seen, Class::DBI expects the main class that subclasses it to tell it the connection parameters, including the username and password. Typically, though, we don't want to store the username and password in our main program files, so we bring them in from a PerlBooks::Config class:

open OUT, ">$namespace/DBI.pm" or die $!;
print OUT <<EOT;
package ${namespace}::DBI;
use ${namespace}::Config;
use base 'Class::DBI';
__PACKAGE__->set_db('Main',
   'dbi:'.\$${namespace}::Config::dbd.
   ':'.\$${namespace}::Config::db,
   \$${namespace}::Config::username, 
   \$${namespace}::Config::password);
__PACKAGE__->autocommit(1);
1;
EOT
close OUT;
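The backslashes in that heredoc are easy to misread. The snippet below shows the line they produce when the generator runs with PerlBooks as the namespace:

```perl
use strict;
use warnings;

# In an interpolating heredoc, \$ yields a literal dollar sign, while
# ${namespace} is replaced by the generator's own variable. The braces stop
# Perl from reading "::Config::dbd" as part of the variable name.
my $namespace = "PerlBooks";
my $line = <<EOT;
'dbi:'.\$${namespace}::Config::dbd.':'.\$${namespace}::Config::db,
EOT
print $line;   # 'dbi:'.$PerlBooks::Config::dbd.':'.$PerlBooks::Config::db,
```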

Finally, we write out a skeleton version of that PerlBooks::Config class to be overwritten by the real values of the username and password by our application's installer:

open OUT, ">$namespace/Config.pm" or die $!;
print OUT <<EOT;
package ${namespace}::Config;
our (\$dbd, \$db, \$username, \$password) = 
    ("mysql", "$database", "", "");

1;
EOT
close OUT;

This is as far as I've currently progressed with the application generator, and already it has saved me a lot of work. But as I look at it now, there's a lot more it ought to do. For instance, it could easily spit out an ExtUtils::MakeMaker-based installation program, which would prompt for the correct username and password and write the ::Config module. As Alan Perlis said, "Programs that write programs are the happiest programs of all"—this is a program that writes a program that writes a program!

The other obvious task for my application generator is to spit out the main application file perlbooks, containing at least:

use PerlBooks;
use Getopt::Auto;

...

But this may be overkill, and currently I'm sufficiently happy with the ability to point my application generator at a database and come out with most of what I need to start writing database-driven application code that is relatively free from structural code.

In Closing

There are a number of things you could take away from this article. You might think that I've created three really interesting modules that you should go and have a look at—but then, I know who you'll come to for help with them, so maybe that's not such a good idea.

You might take away the Prime Rule—never tell a computer what it already knows or can be reasonably expected to find out for itself. If you do, I promise it'll radically impact the way you think about user interfaces.

You could take away the fact that, with CPAN modules, there may well be More Than One Way To Do It, but there's almost always an easier way.

But what I really want you to take away is that programming really ought to be fun. If you find that your programming is becoming a drudge, see if there isn't a way you can abstract away the drudge, whether there's already a module out there that does it all for you or whether you should sit down and tackle it in a once-and-for-all moment.

Doing so will free you from banging out code for the sake of code, and allow you to get on with the interesting bit of your job—having ideas, working out the best way to get things done, solving problems—and my fervent hope is that it'll make programming fun for you once again.

TPJ