Exploring Perl Libraries

Dr. Dobb's Journal February 2001

Viewing library module data

By Robert Kiesling

Robert is the maintainer of the Linux Frequently Asked Questions with Answers FAQ on the Internet. He can be contacted at rkiesling@mainmatter.com.

Much of the time it takes to learn an object-oriented, GUI-centric language (with its overhead of event-driven and graphical objects) is spent learning about the class libraries. Perl is an exception. It trades off some of the facilities of more general-purpose languages for speed and simplicity of internal data. Still, you can get a reasonable picture of an object's member classes, methods, and globals, despite the apparent lack of a mechanism for viewing the data.

A Perl library can present a hierarchy of object types and a well-defined API, but these can be traded off for the execution speed of lower-level data structures. This flexibility, almost paradoxically, gives you the ability to tinker with library and application code while it is executing.

Even though the Perl libraries do not necessarily present an object-oriented API or code base, the language can still effectively implement or simulate a class-based environment. The Perl library modules I present here (available electronically; see "Resource Center," page 5), let you view library module data within the Perl interpreter itself (using the Tk::Browser module) and look up the source code and documentation for these modules; see Figure 1.

Perl Library Structure

In Perl, a module is simply a single file in Perl's library directories. A package is synonymous with a module, but its inheritance and directory hierarchies are qualified relative to the Perl interpreter's search path. A package is declared at the beginning of most source modules using the package keyword: package Lib::Module;. This declaration identifies the package file's path name, relative to the Perl interpreter's @INC search path. More significantly, however, it tells the interpreter to create and use an additional symbol table hash (a "stash") that bears the package's name.

Like all Perl hash tables, a stash contains a number of key/value pairs, which may refer to any of Perl's recognized data types as well as file handles and other stashes.

The stashes follow the naming convention for packages: the hierarchical name of the module with the elements separated by double colons (::). The interpreter maintains each module's stash separately, identified by the package name with double colons appended to it. The package, Lib::Module, for example, uses the stash Lib::Module::. Each stash can be accessed from the calling package by using the typeglob reference to the package handle. In addition, the default stash, or main::, contains references to all of the other stashes that are in effect.
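The naming convention can be checked directly from a script. The following is a minimal sketch, assuming a hypothetical Lib::Module package declared inline (its module_info body is a placeholder); it lists the package's stash and confirms that each level of the name is reachable from its parent stash.

```perl
use strict;
use warnings;

# Hypothetical package, declared inline so the example is self-contained.
package Lib::Module;
our $VERSION = '0.01';
sub module_info { return 'info' }

package main;

# A stash is an ordinary hash whose name ends in a double colon.
my @symbols = sort keys %Lib::Module::;
print "Lib::Module:: holds: @symbols\n";

# Nested stashes appear in their parent's stash under keys ending in "::".
my $has_lib    = exists $main::{'Lib::'}   ? 1 : 0;
my $has_module = exists $Lib::{'Module::'} ? 1 : 0;
print "Lib:: found in main:: = $has_lib\n";
print "Module:: found in Lib:: = $has_module\n";
```

Because the stash is a plain hash, the usual keys, each, and exists operations apply to it without any special library support.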

In Perl terms, each stash value is a typeglob, which refers to one or more of the standard data types. A reference to a client stash would simply be another hash; see Listing One(a). The default stash effectively provides a class membership to a package even if the package doesn't explicitly declare one, contains data accessible to all library modules, and keeps track of which modules have been loaded at run time. The syntax of stash typeglobs is somewhat like that of a file-handle reference, but the stashes are still hash objects. Using the exists operator, you can test for the presence of a variable name in a given stash, as in Listing One(b).
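As a hedged illustration of the exists test, the sketch below uses a hypothetical Lib::Module package and the plain double-colon spelling accepted by current Perl releases. It checks for a symbol name without creating it, then follows the typeglob stored in the stash to the scalar it names.

```perl
use strict;
use warnings;

package Lib::Module;          # hypothetical package for illustration
our $VERSION = '1.02';

package main;

# Test for a symbol name; exists does not create the entry.
my $has_version = exists $Lib::Module::{VERSION}        ? 1 : 0;
my $has_missing = exists $Lib::Module::{no_such_symbol} ? 1 : 0;

# The stash value is a typeglob; its SCALAR slot refers to $VERSION.
my $glob    = $Lib::Module::{VERSION};
my $version = ${ *{$glob}{SCALAR} };

print "VERSION present: $has_version, value: $version\n";
print "no_such_symbol present: $has_missing\n";
```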

When you look up the elements of a symbol table, you dereference them through a typeglob that refers to the symbol table entry. An entry often consists of more than one data type: An array like that in Listing One(c), for example, contains data of type SCALAR, which typically will be the array's name, as well as data of type ARRAY. It is necessary to escape the colons with backslashes so that the interpreter does not treat them as operators. Also, the entry is treated as a SCALAR when simply looking up the variable name, as a HASH when retrieving the entry's data reference, and as an ARRAY when retrieving the data itself.

Symbol table typeglobs can have up to eight slots. Three hold the standard data types SCALAR, ARRAY, and HASH; see Listing One(d). A stash entry can also refer to the following: CODE, a compiled subroutine; IO, a file handle; PACKAGE, the name of the package that is using the data; the symbol's NAME; and FORMAT, a reference to the routine that formats the data. If not used, the values are undefined, with the exception of SCALAR, which always returns a reference, even if the scalar it refers to has never been assigned a value.
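The slots can be probed one at a time with the *name{THING} syntax. This sketch assumes a hypothetical Lib::Module package that uses the single name "list" for an array, a hash, and a subroutine, and reports which slots of that one typeglob are filled.

```perl
use strict;
use warnings;

package Lib::Module;          # hypothetical package
our @list = (1, 2, 3);
our %list = (a => 1);
sub list { return scalar @list }

package main;

# One symbol name, several slots in the same typeglob.
my %slots;
for my $type (qw(SCALAR ARRAY HASH CODE IO FORMAT)) {
    $slots{$type} = defined *Lib::Module::list{$type} ? 1 : 0;
}

# NAME and PACKAGE return plain strings rather than references.
my $name    = *Lib::Module::list{NAME};
my $package = *Lib::Module::list{PACKAGE};

print "${package}::$name fills: ",
      join(', ', grep { $slots{$_} } sort keys %slots), "\n";
```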

Perl defines the type of data lexically; stronger type checking does not occur unless you're writing to the data. But this can also contribute to confusion when performing multiple dereferences, accessing variables with similar names, and resolving language syntax ambiguities.

Static Library Declarations

The standard Exporter.pm library module lets one module's routines refer to another module's data without relying on the run-time binding of data and subroutines to a particular module. Exporter.pm provides an interface for exporting variables and functions to another module; see Listing Two(a). The receiving module, Listing Two(b), can then specify which symbols it needs from the original module. Variables and subroutines named in the @EXPORT array appear in the receiving module's symbol table as if they were declared in that module. The symbol names in @EXPORT_OK get exported only if the calling module requests them.
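The difference between @EXPORT and @EXPORT_OK can be seen in a single file. The sketch below is an assumption-laden stand-in for a real module file: ExtraModule is declared inline inside a BEGIN block, and countData is a hypothetical second routine added only to show a symbol that must be requested explicitly.

```perl
use strict;
use warnings;

# Hypothetical exporting module, wrapped in BEGIN so that its export
# lists are set before the import call below runs.
BEGIN {
    package ExtraModule;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(readData);      # exported by default
    our @EXPORT_OK = qw(countData);     # exported only on request
    sub readData  { return "data from $_[0]" }
    sub countData { return length $_[0] }
}

package main;

# Stands in for "use ExtraModule;" when the module lives in its own file.
BEGIN { ExtraModule->import(); }

my $has_default  = defined &main::readData  ? 1 : 0;
my $has_optional = defined &main::countData ? 1 : 0;

print readData('input.txt'), "\n";
print "readData imported: $has_default, countData imported: $has_optional\n";
```

Requesting ExtraModule->import('countData') instead would bring in only the optional routine, because an explicit list replaces the default one.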

When the interpreter processes a use <modulename> statement, it loads the module at compile time and then calls the module's import subroutine, which most modules inherit from Exporter.pm. The difference between the statements require <modulename> and use <modulename> is that the require statement loads the module at run time and does not call import, so it copies no symbols into the calling module's symbol table namespace. Names imported through use act as if they were defined in the calling module. This can lead to duplicate variable and subroutine names unless care is taken to import only those symbols that a module needs.

However, you can use a variable or function of another module by calling it statically, as in Listing Two(c). This calling convention creates another symbol table hash for ExtraModule::, if one does not already exist. The Perl interpreter then looks for the readfile() subroutine in ExtraModule's namespace, not in the namespace of the calling module.
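A fully qualified call can be tried against any installed module. This sketch uses the core List::Util module purely as a convenient example; it is not one of the modules discussed in the article.

```perl
use strict;
use warnings;

# Load the module at run time; nothing is imported into main::.
require List::Util;

# The fully qualified name sends the lookup straight to List::Util::.
my $total = List::Util::sum(1, 2, 3);

# The short name was never imported, so main:: has no sum().
my $sum_in_main = defined &main::sum ? 1 : 0;

print "total: $total, sum() visible in main: $sum_in_main\n";
```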

The disadvantage of importing data statically is that the Perl interpreter can completely bypass the Exporter.pm routines, which do version and name checking. It is possible to import data from another similarly named module, subroutine, or variable without the interpreter being aware of it.

Static symbol declarations can interfere with the use of the SUPER:: keyword in programs that use method calls and superclass data. Any subroutine or variable imported in this manner must be fully qualified. The Perl interpreter makes no attempt to look up the name in any of the existing symbol table hashes except the one specified, or in %main:: if no package name qualification is made. This is one way a package can override the interpreter's data-hiding mechanism.

In fact, in Perl terminology, a client module's symbol table is referred to as the "INNER namespace," relative to the calling module's, which is referred to conceptually as the "OUTER symbol table." Thus, a fully qualified data reference may have the name syntax of *OUTER::INNER::symbolname.

The conceptual view of Perl's module evaluation is that the client namespace of one package can be contained within the namespace of another, even though Perl's import mechanisms go to some pains to make each module believe it has its own stash. A call to a client package does not need to return, and the client package does not have a view of the calling package's stash, nor does the client package know how it was loaded.

The use statement checks for the module's availability when the program is first byte-compiled and the interpreter resolves external data references. Perl's AutoLoader and DynaLoader modules let references be resolved without actually loading the module. The success of a static symbol-name lookup depends on the called module's availability when the main program is executed. This can easily cause misleading results when examining the main program's stash, as the reference handles are lexically resolved.

Similarly, the scope of a variable declared with my extends only to the subroutine, loop, or conditional construct in which it is declared. The scope is lexical in nature, and the variable is not, under normal circumstances, visible in the stash of a client subroutine.
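The absence of lexical variables from the stash is easy to confirm. In this minimal sketch, the Lib::Module package and the variable names are hypothetical; the enclosing braces limit the lexical variable to the package's own block.

```perl
use strict;
use warnings;

{
    package Lib::Module;              # hypothetical package
    our $shared  = 'package variable';
    my  $private = 'lexical variable';

    # Only code inside this block can see $private.
    sub peek { return $private }
}

package main;

my $shared_in_stash  = exists $Lib::Module::{shared}  ? 1 : 0;
my $private_in_stash = exists $Lib::Module::{private} ? 1 : 0;

print "shared in stash: $shared_in_stash, private in stash: $private_in_stash\n";
print "via accessor: ", Lib::Module::peek(), "\n";
```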

The require Statement and Scoping

Instead of importing data references, the require statement evaluates the called module immediately so that a complete, separate symbol table hash is created in the main:: stash. This is the method by which modules such as Tk::Browser.pm and Devel::Symdump.pm can view unrelated modules' stashes. Instead of generating private variables, the entire namespace of the evaluated module is created as a separate environment, and compile-time pragmas may be determined independently of the calling module's run-time environment.

The use of my to declare variables satisfies a "use strict vars" pragma. Without such a declaration, variables must be declared in some other way or be statically qualified. A typeglob declaration that refers to a symbol table hash must be declared as local, so that its scope extends to a subsequently created stash; see Listing Three(a). The eval "require package" method of symbol-table creation is the same as that used by the standard base.pm module, which imports and evaluates modules when they are byte compiled, as with the use base <packagename>; statement.

To create a complete stash context, the interpreter must know which stash you're interested in. You state this with the package keyword, then you declare the package space with a use <package> statement. In many programs, a use statement is sufficient, because Perl's AutoLoader module can load functions on demand. However, because you want to view the entire stash without waiting for a program to execute, you must load the entire package with the require keyword.
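Loading a package by name and then reading its stash might look like the following sketch. It is a simplified variation on Listing Three(a), not the Tk::Browser code itself, and it uses the core File::Basename module only because it is installed with Perl.

```perl
use strict;
use warnings;

my $pkg = 'File::Basename';   # any installed module; chosen for illustration

# Load the whole module at run time; eval traps a failed load.
eval "require $pkg; 1" or die "cannot load $pkg: $@";

my (%keylist, $version, @superclasses);
{
    no strict 'refs';          # symbolic stash lookup by package name
    while ( my ($key, $val) = each %{"${pkg}::"} ) {
        $keylist{$key} = $val;
    }
    $version      = ${"${pkg}::VERSION"};
    @superclasses = @{"${pkg}::ISA"};
}

print "$pkg $version, superclasses: @superclasses\n";
print scalar(keys %keylist), " symbols in the stash\n";
```

Restricting "no strict 'refs'" to a small block keeps the symbolic lookups contained while the rest of the script stays under the strict pragma.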

When used together, the three statements provide a method to simulate a switch to a different, possibly unrelated, package; see Listing Three(b). When examining a symbol table space with a Tk::Browser, these statements must be called in the Tk::Browser.pm module, so that the stash of the client module is visible to the Tk::Browser stash. Conversely, when examining a symbol table, all that is necessary to avoid confusing one stash context with another is to match the stash's handle:

if( ($key =~ /VERSION/) && ($package =~ /New::Package/) ) {

Classes and Object References

The basic mechanisms for creating objects in Perl are the bless() and tie() functions. Each creates an association between an object reference (almost always a hash variable) and a module package. Once an object is blessed, any reference to the object or its hash keys is referred to the package. Objects are most commonly blessed when they are constructed. Perl constructors, like those in other object-oriented languages, are commonly named "new," although a package is free to follow its own naming conventions. In Listing Four(a), the constructor is called as a method. It returns the object, which is now registered with Perl, as a reference to a Lib::Module object. In effect, bless creates the new variable in the called package's symbol table, instead of the calling package's. The calling package receives a reference to the newly created object, but it refers only to the called package's newly created variable.
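The effect of bless can be observed with the ref operator. This sketch uses a constructor simplified from Listing Four(a); the packagename accessor is a hypothetical addition for the example.

```perl
use strict;
use warnings;

package Lib::Module;          # simplified from Listing Four(a)
sub new {
    my $proto = shift;
    my $class = ref($proto) || $proto;
    my $self  = { packagename => '', version => '' };
    return bless $self, $class;
}

# Hypothetical accessor, added only for this example.
sub packagename {
    my ($self, $name) = @_;
    $self->{packagename} = $name if defined $name;
    return $self->{packagename};
}

package main;

my $plain  = {};
my $before = ref $plain;              # an unblessed hash reference
my $m      = Lib::Module->new;
my $after  = ref $m;                  # now reports the package name

$m->packagename('Tk::Browser');       # resolved through the blessed package
print "unblessed: $before, blessed: $after\n";
print "packagename: ", $m->packagename, "\n";
```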

The main program does not need to explicitly name the object's class. The reference to the object knows which package it belongs to and the program will call the appropriate function as an object method; see Listing Four(b). While the symbol table value of *m{PACKAGE} will be Tk::Browser, the value of ref( $m) will be the name of its member package — in this example, Lib::Module. Of course, you still need to specify the package that contains the constructor routine, where the bless function registers package membership. In contrast, using an unblessed reference, you still could call a function statically, as in Listing Four(c).

Another method of static module loading is via the require statement, which loads a package and creates its stash, so that the package's symbols can be reached by their fully qualified names.

The tie() function provides a set of function references (for example, READ, STORE, and PRINT) in the object's package. They let a package supply user-defined implementations of common assignment and retrieval operations. The exact functions a data type recognizes are described in the perltie(1) manual page (http://www.frognet.net/help/manpages/docs/perltie.html).
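For a tied scalar, the methods involved are TIESCALAR, FETCH, and STORE, as described in perltie. The sketch below assumes a hypothetical UpperCase class that converts every stored value to uppercase.

```perl
use strict;
use warnings;

package UpperCase;            # hypothetical tie class
sub TIESCALAR {
    my ($class) = @_;
    my $value = '';
    return bless \$value, $class;
}
sub STORE { my ($self, $new) = @_; $$self = uc $new }
sub FETCH { my ($self) = @_; return $$self }

package main;

tie my $name, 'UpperCase';
$name = 'tk::browser';        # calls UpperCase::STORE
my $stored = $name;           # calls UpperCase::FETCH
print "$stored\n";
```

The calling code sees an ordinary scalar; only the tie statement reveals that assignment and retrieval are routed through the package.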

Inheriting Multiply

There are two main mechanisms for specifying a package's class membership: the well-documented @ISA array and the less well-known use base statement. The first specifies how the Perl interpreter should go about looking up symbols. The use base statement is invoked when the program is byte compiled, and reads another module's code into the symbol space before the main module's code is interpreted. (This is similar to the require statement.)

If you had a Tk module that exported symbols as well as derived some of its functions from the main widget class, you would need to specify both of its superclasses:

package Tk::NewWidget;
@ISA=qw(Exporter Tk::Widget);

This adds references to the Exporter:: and Tk::Widget:: stashes to the main stash. In effect, it provides a search path to look up symbol names from either module, even if the module does not implement an object class of its own. Perl does not require an object hierarchy to import modules, but may import symbol names statically.
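The search path that @ISA provides can be sketched without Tk installed. The packages below are hypothetical stand-ins: Base::Widget plays the role of Tk::Widget, and Base::Logger is a second, unrelated superclass.

```perl
use strict;
use warnings;

package Base::Widget;         # hypothetical stand-in for Tk::Widget
sub new       { my $class = shift; return bless {}, $class }
sub configure { return 'configured by Base::Widget' }

package Base::Logger;         # hypothetical second superclass
sub log_msg   { return "log: $_[1]" }

package New::Widget;
our @ISA = qw(Base::Logger Base::Widget);   # searched left to right

package main;

my $w          = New::Widget->new;          # new() is found in Base::Widget
my $configured = $w->configure;
my $logged     = $w->log_msg('created');    # found in Base::Logger
print "$configured\n$logged\n";
```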

In practice, Perl class hierarchies tend to be only one or two levels deep. The language resolves inconsistencies in external symbol name resolution by providing a UNIVERSAL abstract superclass that provides packages with default mechanisms for class membership and name resolution; namely, the can and isa functions. Either of these may be called as a class method or statically. The client module need not declare a class membership at all.
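Both calling styles for can and isa are shown in this sketch. Plain::Module and its read_data routine are hypothetical; the package declares no superclass, so the two functions come from UNIVERSAL alone.

```perl
use strict;
use warnings;

package Plain::Module;        # hypothetical; declares no superclass
sub new       { return bless {}, shift }
sub read_data { return 'data' }

package main;

my $obj = Plain::Module->new;

# can() called as a class method and as an object method.
my $can_read  = Plain::Module->can('read_data') ? 1 : 0;
my $can_write = $obj->can('write_data')         ? 1 : 0;

# isa() called statically, with the object as the first argument.
my $is_plain = UNIVERSAL::isa($obj, 'Plain::Module') ? 1 : 0;

print "can read_data: $can_read, can write_data: $can_write\n";
print "isa Plain::Module: $is_plain\n";
```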

Unlike other object-oriented languages, Perl does not enforce an inheritance construct. Because even relatively low-level features of the language are available to a program script, each module can use the data suited to the application, provided that the programmer is aware of the potential issues and difficulties that this flexible approach can incur.

DDJ

Listing One

(a)
*main::MainWindow:: =>
  { InitBindings => *MainWindow::InitBindings
    viewable => *MainWindow::viewable
    ::_configure => *MainWindow::_configure
    *etc*
  }

(b)
if( exists  ${*IO\:\:File\:\:}{VERSION} ) {
    my( $version_name, $module_version ) = 
           %{*IO\:\:File\:\:}{VERSION};
}

(c)
my ($array_key, $array_val) = %{*Module\:\:array};
my $array_name = ${*{$array_val}{SCALAR}};
my $array_contents = @{*{$array_val}{ARRAY}};

(d)
# Walk the stash one key/value pair at a time.
while ( my ($key, $val) = each %{*Module\:\:} ) {
  local (*entry) = $val;
  if( defined *entry{ARRAY}) { 
    foreach( @{*entry{ARRAY}} ) {print "$_, "; }
  } 
  if( defined *entry{HASH}) { 
    while( my ($key_1, $val_1) = each %{*entry{HASH}}) { 
      print "$key_1=>$val_1, ";
    }
  }
  if( defined *entry{SCALAR}) { 
    print ${*entry{SCALAR}}."\n";
  }
}


Listing Two

(a)
package ExtraModule;
require Exporter;
@ISA=qw(Exporter);   # inherit import() so the export lists take effect
@EXPORT=qw(readData);
@EXPORT_OK=qw(VERSION ISA);
sub readData{ 
   ... _program code_ ...
}

(b)
package mainModule;
use ExtraModule qw(VERSION ISA readData);

(c)
my $extra_data = ExtraModule::readfile( $filename );


Listing Three

(a)
my %keylist;
# Create a new namespace for package $pkg.
unless( exists ${"$pkg\:\:"}{VERSION} ) { 
    eval "package $pkg";
    eval "use $pkg";
    eval "require $pkg"; 
}
while( my ($key, $val) = each %{*{"$pkg\:\:"}} ) {
    if( defined $val ) {
        local (*v) = $val;
        # test for $pkg to make sure we get the right stash.
        if( ($val =~ /$pkg/) && ($val =~ /VERSION/ ) ) {
            $m -> {version} = ${*v{SCALAR}};
        }
        if( ($val =~ /$pkg/) && ($val =~ /ISA/ ) ) {
            $m -> {superclasses} = "@{*v{ARRAY}}";
        }
        $keylist{$key} = $val;
    }
}

(b)
eval "package Lib::Module";
eval "use Lib::Module";
eval "require Lib::Module";


Listing Four

(a)
package Lib::Module;
sub new {
    my $proto = shift;
    my $class = ref( $proto ) || $proto;
    my $self = {
        children => [],
        parents => '',
        pathname => '',
        basename => '',
        packagename => '',
        version => '',
        superclasses => undef,
        baseclass => '' };
    bless( $self, $class);
    return $self;
}

(b)
package Tk::Browser;
use Lib::Module;

my $m = new Lib::Module; 
$m -> module_info( $packname );

(c)
package Tk::Browser;
my $m = &Lib::Module::module_info( $packname );
