Other People's Arguments

The Perl Journal February 2003

By Simon Cozens

Simon is a freelance programmer and author, whose titles include Beginning Perl (Wrox Press, 2000) and Extending and Embedding Perl (Manning Publications, 2002). He's the creator of over 30 CPAN modules and a former Parrot pumpking. Simon can be reached at simon@ simon-cozens.org.

I'm going to let you into a secret about writing technical articles. The trick that I often use to plan an article is to think of a particular technique I want to illustrate, then find a practical use for it and, finally, find a problem that the practical application solves. Here comes the trick: You then present it all back-to-front. That way, you've got an article that looks like it's showing you how to solve a particular problem, and the technique you want to talk about pops up magically as the answer to all your problems in the end.

In this article, for instance, the technique that I want to talk about is the little-known @DB::args magic variable; the application is my recent rubyisms Perl module; the problem, if you want to call it that, is my recent dabbling with Ruby.

As you can probably tell from last month's column, I've grown fond of some of the features I've been using in Ruby. This is a common problem—or so I'm told—with love affairs: If and when you come back to your first love, you can't help but want some of the things you've left behind. Thankfully, however, programming languages are a good deal easier to change than people. So the more frustrated I got with the things from Ruby that I thought Perl lacked, the more I wanted to sit down and fix them up.

The first thing I found myself missing was the super keyword. It came up in Perl as I was specializing a Class::DBI-based module. I had a class representing a database table, which I could search by its two columns, real_name and displayed_name. But I also wanted a "magical" search term name that searched through both columns. So, in my subclass, I would say:

sub search {  
    my ($self, $terms) = @_;
    if (exists $terms->{name}) { 
        # Do our special search
    } else {
        # Call the superclass's search.
    }
}

Now, calling the superclass method is easy. We all know how to do that. Here's how you'd naturally write it in Perl:

$self->SUPER::search($terms);

However, this is what it looks like in Ruby:

super

You can probably understand why I felt spoiled by Ruby. No problem, I thought, I can find a way to do this in Perl. So I thought about what the super subroutine had to do:

It should have a prototype of () so it could be called like a built-in.
It should use caller to get at the method that called it.
It should have to somehow get at the object and the arguments to the method, so it could work out the superclass method and call it with the same arguments.

Well, the first two were pretty easy:

 sub super () { 
     my $caller= (caller(1))[3];
     $caller =~ s/.*:://;

 }

But the third had me completely confused. How on earth could I retrieve my caller's subroutine arguments? Well, the obvious place to start looking was the documentation for caller; and there I found something I had never noticed before:

"Furthermore, when called from within the DB package, caller returns more detailed information: it sets the list variable @DB::args to be the arguments with which the subroutine was invoked."

Wow, just perfect. So I wrote up a little subroutine to call caller from package DB:

 package DB;
 sub uplevel_args { my @x = caller(2); return @DB::args }

This looks two frames up the stack (DB::uplevel_args is the zeroth frame, SUPER::super is the first, and the method that called super is the second), and returns the arguments from the method. The array is needed to stop Perl from optimizing the call to caller.

So now we know how the method was called, which tells us the object.

 sub super () { 
     my $caller= (caller(1))[3];
     $caller =~ s/.*:://;

     @_ = DB::uplevel_args();
     my $self = $_[0];
 }

Unfortunately, it gets tricky again here: We'd like to say $self ->SUPER::$caller, but that gives us a "Bad name after ::" error. And we want to avoid using eval, if possible. What we need is to somehow get hold of a reference to the appropriate superclass method. Since we already know the class and the method name, we are half way there. Let's assume a putative UNIVERSAL::super subroutine that works like UNIVERSAL::can and returns a reference to the method if one is available. Then we can say:

 sub super () { 
     my $caller= (caller(1))[3];
     $caller =~ s/.*:://;

     my @their_args = DB::uplevel_args();
     my $self = $their_args[0];
     $self->UNIVERSAL::super($caller)->(@their_args);
 }

Now this is pretty clever, but it has a slight untidyness problem. Suppose we have a class Wibble::Simple that inherits from class Wibble. With our current use of super, we'd see a call stack like this:

 Wibble::Simple::do_it
    SUPER::super
       Wibble::do_it

Whereas we'd prefer to see this:

 Wibble::Simple::do_it
    Wibble::do_it

Now, there is a way to make SUPER::super morph itself into the appropriate method: the goto &subroutine technique. Since UNIVERSAL::super—when it's written—returns a subroutine reference, we merely need to set the @_ to be what we want the superclass to see, and then goto the reference:

 sub super () { 
     my $caller= (caller(1))[3];
     $caller =~ s/.*:://;

     @_ = DB::uplevel_args();
     my $self = $_[0];
     my $supermethod = $self->UNIVERSAL::super($caller);
     goto &$supermethod;
 }

Right, we're done! Well, apart from the little matter of that UNIVERSAL::super method, that is. But this isn't too much of a problem—all we need to do to work out an object's super method is to think of what Perl would do. And what Perl does is ask each member of that object's class's @ISA array if it can perform the method. This can be done with the can method, which returns a code reference if the class can perform the given method—precisely what we want!

 package UNIVERSAL;

 sub super {
     my ($class, $method) = @_;

     if (ref $class) { $class = ref $class; }
     my $x;
     for (@{$class."::ISA"}, "UNIVERSAL") {
         return $x if $x = $_->can($method);
     }
 }

And with this—and a little testing and documentation—the SUPER module was born and released onto CPAN. Now I could write things like the following:

sub search {  
    my ($self, $terms) = @_;
    if (exists $terms->{name}) { 
        # Do our special search
    } else {
        super;
    }
}

This made me happy.

But then, a few more lines of code later, there was another problem. One of the nice things about Ruby's OO model as opposed to Python's and Perl's is that the recipient of a method is implicit.

It's not necessary to say:

my $self = shift;

to get it from the argument list. It's just there, and it's possible to get at it with the self keyword.

Furthermore, you can call one method from another just by naming it, and the self is again passed around implicitly.

Here's a bit of Ruby to demonstrate this:

class Thing

  def look
      _print
  end

  def _print
      puts self
  end

end

foo = Thing.new
foo.look

We create a new object and call its look method. This then calls another method, _print, implicitly passing the object around. _print receives the object, once again implicitly, before finally referencing it as self.

Of course, this can't be done in Perl—we can't change the fact that method calls do pass around the receiver and that we need to pass the receiver to a submethod. And we can't rewrite @_ (once we know the receiver) to pretend it was never there in the first place. But we can fake it.

Using the same @DB::args trick, we can create a subroutine that returns the first argument of its caller:

sub self () {
    return (DB::uplevel_args())[0];
}
This means we can say things like:
sub look   { self->_print }
sub _print { print self, "\n" }
Not a bad start. But we'd really like to be able to say:
sub look   { _print }
sub _print { print self, "\n" }

That's right: Even though we call print with no arguments, it should still know what the current receiver is.

Once again, being able to mess about with other subroutines' arguments comes to our aid. The key to this is realizing that we don't have to look just two levels up the stack—we can look farther if we want to. And as we look up the call stack, we'll eventually come to a frame that is called "properly," with the appropriate type of object as its first argument.

So, we first modify DB::uplevel_args so that we can look up a variable number of frames:

sub uplevel_args { my @x = caller($_[0]+1); return @DB::args };

We now look up the stack until we find a subroutine whose first parameter is-a whatever class we were called by:

sub self () {
    my $call_pack = (caller())[0];

    my $level =1;
    while (caller($level)) {
        my @their_args = DB::uplevel_args($level);
        if (ref $their_args[0]
            and eval { $their_args[0]->isa($call_pack) }) {
            return $their_args[0];
        }
        $level++;
    }

    return $call_pack;
}

We're only interested in objects that are inherited from the caller, because if we have

package Thing;
sub look_to_file { my $output = new IO::File (...); 
                   _print($output)
                 }

we want the recipient of _print to be the Thing, not the IO::File. So in this case, we want to ignore the argument to _print but look back at the arguments of look_to_file.

Notice also that if we don't find any object of the relevant class at any time in the recent past, we assume that we're dealing with a class method, and that the self is the name of the calling package; this is a reasonable approach and is pretty much what Ruby does:

% ruby -e 'print self;'
main
% perl -Mrubyisms -e 'print self;'
main

So now we can use an implicit self, and pass it around between methods of the same class. Very neat, no?

But I've glossed over another little detail of my Ruby example: Our class, Thing, had a constructor, but we didn't define a new method. This is because all classes in Ruby inherit from class Class, which provides a generally-good-enough default constructor and then calls the initialize method to allow us to specify the object. This is a neat idea, so I wanted to steal that too. First, we need to make everything that imports the rubyism method inherit from class Class:

sub import {
    no strict 'refs';
    push @{(caller())[0]."::ISA"}, "Class";
    rubyisms->export_to_level(1, @_);
}

We find the calling package's package name, and slap Class onto the end of its @ISA array. Then we can use a little Exporter trick that deserves to be better known: Call Exporter's import to make super and self available to calling packages in addition to doing our own importish things. You might think that after all we have seen in this article, we could so far just say:

sub import {
    no strict 'refs';
    push @{(caller())[0]."::ISA"}, "Class";
    super;
}

to jump to our superclass. Unfortunately, that doesn't quite work. This is because Exporter's import method moves symbols from Exporter to the class calling the method—in this case, rubyisms. This isn't what we want—we want to move symbols from rubyisms to whatever used it. So we call the import_to_level method, which moves symbols around at a different calling level. This does the right thing.

Now we can populate the Class class with the methods we want. A generally-good-enough constructor in Perl blesses an empty hash and calls a specializer before returning the new object:

package Class;
sub new {
    my $class = shift;
    my $self = bless {}, $class;
    $self->initialize(@_);
    return $self;
}

(Note that we can't use self here to get the recipient because in the constructor, there isn't a recipient yet!)

We provide a dummy specializer for completeness:

sub initialize {}

We can now rewrite our Ruby example in Rubyish Perl:

package Thing;
use rubyisms;

sub _print { print self }
sub look   { _print }
my $foo = Thing->new;
$foo->look;

And I was happy again—until I found another feature from Ruby I wanted to steal...

Now that we have this technique of inspecting the caller's arguments, it is very simple to write our own keywords such as super and self that depend on the properties of a subroutine. Another subroutine-specific keyword in Ruby is yield.

As we saw last month, every method in Ruby can take an optional block, and the yield keyword calls back that block. In Perl, we don't have the same optional block syntax, but we do have something similar: If subroutines are given a prototype starting with the & character, they will behave somewhat like map or grep.

For instance, here's a truly simple array iterator. You might want to call it a "Visitor design pattern" if you're a Gang-of-Four devotee; if you're a Perl devotee, you might call it a highly redundant for loop. It simply visits each element of the array, calling a codeblock on the element:

sub each_arr (&@) { 
    my ($code, @array) = @_;
    for (@array) { 
         $code->($_);
    }
}

With the syntax-modifying & prototype, this can be called as follows:

each_arr { print $_[0], "\n" } (10, 20, 30);

But we'd prefer to write this in a more Rubyish way, like so:

sub each_arr {
    for (@_) { yield }
}

The Perl way is slightly different—instead of a block at the end, we expect a block at the beginning. So, once again, we look at our caller's arguments and ensure that the first argument is a code reference. If it isnt, we give a nice Ruby-friendly error:

sub yield {
    my @their_args = DB::uplevel_args(1);
    if ((!@their_args) or ref $their_args[0] ne "CODE") {
        croak "no block given (LocalJumpError)";
    }

And now we have the code reference; we can call it on $_:

    $their_args[0]->($_);
}

This is pretty good, but Ruby's yield doesn't just yield the default value. In fact, Ruby doesn't really have a "default value" equivalent to $_. yield can take arguments, and those arguments should be passed to the code reference. But this being Perl, we want to support both styles: implicit $_ and explicit arguments. So our yield subroutine ends up looking like this:

sub yield (@) {
    my @their_args = DB::uplevel_args(1);
    if ((!@their_args) or ref $their_args[0] ne "CODE") {
        croak "no block given (LocalJumpError)";
    }
    my @stuff = (@_||$_);
    $their_args[0]->(@stuff);
}

But there is a slight problem. If we try out our shiny new yield with the each_args example, we might see something like this:

CODE(0x10774)
10
20
30

I must stress again that we're only faking it. We can't rewrite each_arr's @_ array so that the codeblock is squirrelled away for yield and doesn't appear when we call for. The code reference is going to stay part of @_ whether you like it or not. So yield needs to be a bit tricky.

The obvious way to solve this problem in the majority of cases is to simply refuse to call the code reference on itself:

$their_args[0]->(@stuff)
    unless $stuff[0] == $their_args[0];

And that is By And Large Good Enough. The iterators now work the way we expect them to.

So that's all I've wanted from Ruby so far, and the whole module, rubyisms.pm, is available from CPAN. I'm sure that there'll be more features added as I keep dragging things across from Ruby.

But we've seen that with the knowledge gained from a relatively simple but relatively obscure technique—the interaction between caller and @we can bend Perl in some interesting and extraordinary directions without mucking about with XS, the Perl internals, or any other difficulty.

Sometimes, it seems, getting involved in other people's arguments isn't such a bad thing after all.

TPJ