Article jan2007.tar

Show Me Your References

Randal L. Schwartz

I spend a lot of time (perhaps some would say "far too much") on chat channels helping out Perl beginners, and even the occasional expert. One of the things that still amazes me is the absolute gibberish that people come up with while trying to construct references and then dereference them. It's like they just stomp on the top row of the keyboard and then hand that to Perl and say "here, interpret this".

So, I thought that this month I'd go "back to the basics" and review the standard forms of creating references, and then using those references by dereferencing them.

The first thing to understand about a reference is that it fits wherever a scalar fits, except as the key of a hash (hash keys are always strings, so a reference used as a key is flattened to a string and can no longer be dereferenced). So we can put a reference into a scalar variable, or as an element of a list, or as a value within an array or a hash. We can also pass those lists containing references to and from subroutines. Packages like Storable and Data::Dumper can take complex bundles that include references and safely serialize and restore them.

One way to create a reference is to put a backslash in front of an existing variable or subroutine name. For example, I can create scalar, array, hash, and subroutine references from existing things, like so:

my $scalar_ref = \$scalar;
my $array_ref = \@array;
my $hash_ref = \%hash;
my $code_ref = \&marine;
            
Note that for the subroutine ("code") reference, I must include the ampersand in front of the subroutine name.
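
As a quick sanity check, here is a minimal sketch that asks Perl what each reference points to, using the built-in ref function. The variable contents and the body of marine are placeholders I made up for illustration:

```perl
use strict;
use warnings;

# Placeholder items to take references to (assumed for illustration).
my $scalar = 42;
my @array  = (3, 4, 5);
my %hash   = (login => 'merlyn');
sub marine { return "sub got: @_" }

my $scalar_ref = \$scalar;
my $array_ref  = \@array;
my $hash_ref   = \%hash;
my $code_ref   = \&marine;

# ref() returns a string naming the type of the referenced item.
my @kinds = map { ref($_) } ($scalar_ref, $array_ref, $hash_ref, $code_ref);
print "@kinds\n";   # SCALAR ARRAY HASH CODE
```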

These references could also have been placed immediately as elements into an array or hash:

my @refs = (\$scalar, \@array, \%hash, \&marine);
my %ref_map = (
  scalar => \$scalar,
  array => \@array,
  hash => \%hash,
  code => \&marine,
);
In this case, all of $refs[2], $ref_map{hash}, and \%hash contain the same reference to the hash %hash.
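
One way to convince yourself of that is to compare the references numerically. This is only a sketch with a made-up hash, and the first two slots of @refs are left empty so that the index matches the list above:

```perl
use strict;
use warnings;

my %hash = (login => 'merlyn');
my @refs = (undef, undef, \%hash);   # slot 2 holds the hash reference
my %ref_map = (hash => \%hash);

# Two references are numerically equal exactly when they point at the same item.
my $same = ($refs[2] == $ref_map{hash} && $ref_map{hash} == \%hash)
  ? 'same' : 'different';
print "$same\n";   # same
```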

I can also create a reference to an anonymous array, hash, or subroutine using the anonymous constructor syntax, like so:

my $array_ref = [3, 4, 5];
my $hash_ref = { first => 'Randal', last => 'Schwartz',
  login => 'merlyn' };
my $code_ref = sub { my $sum = 0; $sum += $_ for @_; return $sum };
In every respect, these references to anonymous items act identically to the references to named items from earlier. (Note that there is no simple syntax to create the rarely needed anonymous scalar.)
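
Because a reference fits wherever a scalar fits, the anonymous constructors nest freely. The following sketch builds a small, entirely hypothetical record and hands it to Data::Dumper to show the resulting shape. The field names are my own invention, and setting Sortkeys is just a convenience to get a stable key order:

```perl
use strict;
use warnings;
use Data::Dumper;

# A hypothetical record built entirely from anonymous constructors.
my $record = {
  name   => { first => 'Randal', last => 'Schwartz' },
  login  => 'merlyn',
  scores => [3, 4, 5],
};

local $Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order
my $dump = Dumper($record);
print $dump;
```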

To access the original item, I need to dereference the reference to it. In the case of a scalar, array, or hash, a dereference lets me get at the variable to get or set its value. In the case of a code ref, dereferencing generally invokes the corresponding subroutine.

To begin, let's look at the canonical rule of dereferencing that will always work regardless of how the reference is obtained. We start by taking the syntax as if references aren't involved, such as $some_array[$element]. We then take the name of the item out and replace it with curly braces around the expression that gives us a reference. The simplest example is scalar access. Start with a scalar variable:

$scalar = 42;  # update
print $scalar; # access
and replace the name (scalar) with curly braces around the thing holding the reference:

${$scalar_ref} = 42;  # update via $scalar_ref
print ${$scalar_ref}; # access via $scalar_ref 
${$refs[0]} = 42;   # update via $refs[0]
print ${$refs[0]};  # access via $refs[0]
${$ref_map{scalar}} = 42;   # update via $ref_map{scalar}
print ${$ref_map{scalar}};  # access via $ref_map{scalar}
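
To see that all three spellings really reach the same variable, here is a small sketch. The starting value and the arithmetic are arbitrary choices of mine, and @refs and %ref_map hold only the scalar reference to keep things short:

```perl
use strict;
use warnings;

my $scalar     = 10;
my $scalar_ref = \$scalar;
my @refs       = ($scalar_ref);
my %ref_map    = (scalar => $scalar_ref);

${$scalar_ref} = 42;          # update via $scalar_ref
${$refs[0]} += 1;             # update via $refs[0]
${$ref_map{scalar}} *= 2;     # update via $ref_map{scalar}

print "$scalar\n";            # 86
```
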
An array has more access forms, so there are more canonical dereferencing equivalents. Here are the non-reference versions:

@array           # entire array
$array[$index]   # single element of array
@array[@indices] # array slice
$#array          # index of last array element
Again, the canonical rule is the same. Replace the name with curly braces around the thing holding the reference. For $array_ref, this looks like:

@{$array_ref}           # entire array
${$array_ref}[$index]   # single element of array 
@{$array_ref}[@indices] # array slice
$#{$array_ref}          # index of last array element
And for $refs[1] and $ref_map{array}, it looks like:

@{$refs[1]}           # entire array
${$refs[1]}[$index]   # single element of array
@{$refs[1]}[@indices] # array slice
$#{$refs[1]}          # index of last array element

@{$ref_map{array}}           # entire array
${$ref_map{array}}[$index]   # single element of array
@{$ref_map{array}}[@indices] # array slice
$#{$ref_map{array}}          # index of last array element
Yes, these are admittedly rather ugly. Luckily, most of them are not common in typical Perl programs. It's important to learn this canonical rule first though, because you can always fall back on it when you get into trouble.
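
Here is a runnable sketch of the four array forms, using a reference stored in $refs[1]. The sample numbers are made up, and slot 0 is left empty just so the index matches the earlier list:

```perl
use strict;
use warnings;

my @array = (10, 20, 30, 40);
my @refs  = (undef, \@array);     # array reference in slot 1

my @copy   = @{$refs[1]};         # entire array (copied)
my $second = ${$refs[1]}[1];      # single element of array
my @pair   = @{$refs[1]}[0, 3];   # array slice
my $last   = $#{$refs[1]};        # index of last array element

${$refs[1]}[1] = 25;              # assignment changes @array itself
push @{$refs[1]}, 50;             # and so does push

print "@copy | $second | @pair | $last | @array\n";
# 10 20 30 40 | 20 | 10 40 | 3 | 10 25 30 40 50
```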

Continuing on, the hash also has a number of access forms:

%hash        # entire hash
$hash{$key}  # single element of hash
@hash{@keys} # hash slice
And the rule is again the same: replace the name of the item with curly braces around the thing holding the reference. For $hash_ref, $refs[2], and $ref_map{hash}, this looks like:

%{$hash_ref}        # entire hash
${$hash_ref}{$key}  # single element of hash
@{$hash_ref}{@keys} # hash slice

%{$refs[2]}        # entire hash
${$refs[2]}{$key}  # single element of hash
@{$refs[2]}{@keys} # hash slice 

%{$ref_map{hash}}        # entire hash 
${$ref_map{hash}}{$key}  # single element of hash 
@{$ref_map{hash}}{@keys} # hash slice
OK, now we have ugly on top of ugly. Ugly squared. I can honestly say that I don't recall ever taking a hash slice of a hash whose hashref came from an element of another hash. But if I did, that last line would be how I would need to do it.
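
For completeness, here is that ugly-squared case as a runnable sketch, along with its two siblings. The hash contents are the same made-up record as before, and uppercasing the login is just an arbitrary way to show an update going through the reference:

```perl
use strict;
use warnings;

my %hash    = (first => 'Randal', last => 'Schwartz', login => 'merlyn');
my %ref_map = (hash => \%hash);

my @sorted_keys = sort keys %{$ref_map{hash}};         # entire hash
my $login       = ${$ref_map{hash}}{login};            # single element of hash
my @names       = @{$ref_map{hash}}{qw(first last)};   # hash slice

${$ref_map{hash}}{login} = uc $login;                  # update lands in %hash

print "@sorted_keys | $login | @names | $hash{login}\n";
# first last login | merlyn | Randal Schwartz | MERLYN
```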

Finally, for code ref dereferencing, we're invoking the subroutine. For the purpose of constructing the canonical form, we'll pretend that subroutine invocations without an ampersand are forbidden:

&marine        # invoke subroutine passing current @_
&marine()      # invoke subroutine with no arguments
&marine(@args) # invoke subroutine passing @args
Again, the rule is the same (see how simple this is?). Replace the name with curly braces around the thing holding the reference:

&{$code_ref}        # invoke subroutine passing current @_ 
&{$code_ref}()      # invoke subroutine with no arguments
&{$code_ref}(@args) # invoke subroutine passing @args 

&{$refs[3]}         # invoke subroutine passing current @_
&{$refs[3]}()       # invoke subroutine with no arguments 
&{$refs[3]}(@args)  # invoke subroutine passing @args 

&{$ref_map{code}}        # invoke subroutine passing current @_ 
&{$ref_map{code}}()      # invoke subroutine with no arguments 
&{$ref_map{code}}(@args) # invoke subroutine passing @args 
And that finishes the canonical form. If this was all there was, you could do everything you wanted with references, but they'd be ugly.
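
The difference between the three invocation forms is easiest to see with a subroutine that reports how many arguments it received. In this sketch, marine is a placeholder that does exactly that, and caller_sub is a hypothetical wrapper whose only job is to have a current @_ to pass along:

```perl
use strict;
use warnings;

sub marine { return scalar @_ }   # placeholder: returns its argument count
my $code_ref = \&marine;

sub caller_sub {
  # No parentheses: marine is invoked with caller_sub's current @_.
  return &{$code_ref};
}

my $passthrough = caller_sub('a', 'b', 'c');   # marine sees 3 arguments
my $none        = &{$code_ref}();              # marine sees 0 arguments
my $two         = &{$code_ref}('x', 'y');      # marine sees 2 arguments

print "$passthrough $none $two\n";   # 3 0 2
```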

Luckily, there are a few syntax optimizations that actually end up applying about 90% of the time. For example, you can remove any curly braces you introduced for dereferencing as long as the only thing inside the braces is a simple scalar variable (not an array or hash element, or a more complex expression). That simplifies some of the items above to the following forms:

$$scalar_ref = 42;  # update via $scalar_ref 
print $$scalar_ref; # access via $scalar_ref 

@$array_ref           # entire array 
$$array_ref[$index]   # single element of array 
@$array_ref[@indices] # array slice 
$#$array_ref          # index of last array element 

%$hash_ref        # entire hash 
$$hash_ref{$key}  # single element of hash 
@$hash_ref{@keys} # hash slice 

&$code_ref        # invoke subroutine passing current @_ 
&$code_ref()      # invoke subroutine with no arguments 
&$code_ref(@args) # invoke subroutine passing @args 
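
The restriction to a simple scalar variable matters, because dropping the braces around an array or hash element silently changes the meaning. In this sketch I deliberately declare both an array @refs and an unrelated scalar $refs (both hypothetical) to show which one Perl picks:

```perl
use strict;
use warnings;

my @inner = (10, 20, 30);
my @refs  = (undef, \@inner);   # array holding an array reference in slot 1
my $refs  = ['a', 'b', 'c'];    # unrelated scalar that happens to share the name

my $with_braces    = ${$refs[1]}[0];   # element 0 of the array referenced by $refs[1]
my $without_braces = $$refs[1];        # parsed as ${$refs}[1]: element 1 via scalar $refs

print "$with_braces $without_braces\n";   # 10 b
```

With use strict in effect and no scalar $refs declared, the brace-less line would not compile at all, which is a useful early warning.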
     
Also, as an optimization along a different axis, the most common thing to do with arrays and hashes is to access a single element, so you can replace an ugly dereferencing with an equivalent arrow form:

${ UGLY_ARRAY_REF_EXPRESSION }[$index]  # canonical form for array element
UGLY_ARRAY_REF_EXPRESSION->[$index]     # arrow form for array element

${ UGLY_HASH_REF_EXPRESSION }{$key}     # canonical form for hash element
UGLY_HASH_REF_EXPRESSION->{$key}        # arrow form for hash element
Similarly, invoking a subroutine has an equivalent arrow form (thanks to Chip Salzenberg under my nudging during the 5.004 release cycle):

&{ UGLY_CODE_REF_EXPRESSION }(@args) # canonical form for code invocation
UGLY_CODE_REF_EXPRESSION->(@args)    # arrow form for code invocation
The nice thing about these arrows is that they read from left to right. For example, the code ref stored in $ref_map{code} simplifies nicely:

&{$ref_map{code}}(@args) # canonical form
$ref_map{code}->(@args)  # arrow form
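
As a sketch of the equivalence, the following pairs each canonical form with its arrow form and checks that both reach the same thing. It assumes a %ref_map like the one built earlier, with made-up contents and a placeholder marine that simply joins its arguments:

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my %hash  = (login => 'merlyn');
sub marine { return join '-', @_ }   # placeholder subroutine

my %ref_map = (array => \@array, hash => \%hash, code => \&marine);

# Each pair is: canonical form, then arrow form.
my @element = (${$ref_map{array}}[1],    $ref_map{array}->[1]);
my @value   = (${$ref_map{hash}}{login}, $ref_map{hash}->{login});
my @result  = (&{$ref_map{code}}(1, 2),  $ref_map{code}->(1, 2));

print "@element | @value | @result\n";
# 20 20 | merlyn merlyn | 1-2 1-2
```
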
And that leads us to the final optimization. If, after applying the arrow rules above, an arrow ends up between two subscripts (square brackets for an array index, curly braces for a hash key, or parentheses for subroutine arguments), we can drop that arrow:

${$refs[1]}[$index] # canonical form
$refs[1]->[$index]  # arrow form
$refs[1][$index]    # reduced arrow form
 
${$ref_map{hash}}{$key} # canonical form
$ref_map{hash}->{$key}  # arrow form
$ref_map{hash}{$key}    # reduced arrow form 

$ref_map{code}(@args)   # reduced arrow form
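
The payoff shows up with deeper nesting. Here is a sketch using a hypothetical structure of my own (a hash of arrays of hashes, plus a code ref). It also shows the one arrow that has to stay: the arrow right after a plain scalar variable is not between two subscripts:

```perl
use strict;
use warnings;

# A hypothetical nested structure, invented for this example.
my %ref_map = (
  users => [
    { login => 'merlyn', groups => ['wheel', 'staff'] },
  ],
  code => sub { return scalar @_ },
);

my $full    = $ref_map{users}->[0]->{groups}->[1];  # every arrow written out
my $reduced = $ref_map{users}[0]{groups}[1];        # arrows between subscripts dropped
my $count   = $ref_map{code}('a', 'b');             # same rule for an argument list

# The first arrow follows a scalar variable, not a subscript, so it is required.
my $map_ref = \%ref_map;
my $login   = $map_ref->{users}[0]{login};

print "$full $reduced $count $login\n";   # staff staff 2 merlyn
```
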
And there you have it, the complete set of referencing and dereferencing instructions for Perl version 5. For Perl 6, all the rules are different, of course. So for now, enjoy!

Randal L. Schwartz is a two-decade veteran of the software industry -- skilled in software design, system administration, security, technical writing, and training. He has coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming. He's also a frequent contributor to the Perl newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 1985, Randal has owned and operated Stonehenge Consulting Services, Inc.