Show Me Your References
Randal L. Schwartz
I spend a lot of time (perhaps some would say "far too much") on chat channels helping out Perl beginners,
and even the occasional expert. One of the things that still amazes me is
the absolute gibberish that people come up with while trying to construct
references and then dereference them. It's like they just stomp on
the top row of the keyboard and then hand that to Perl and say "here,
interpret this".
So, I thought that this month I'd go "back
to the basics" and review the standard forms of creating references,
and then using those references by dereferencing them.
The first thing to understand about a reference is
that it fits wherever a scalar fits, except as the key of a hash. So we can
put a reference into a scalar variable, or as an element of a list, or as a
value within an array or a hash. We can also pass those lists containing
references to and from subroutines. Packages like Storable and Data::Dumper
can take complex bundles that include references and safely serialize and
restore them.
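The hash-key exception is easy to see in a small experiment. The following is only a sketch with made-up variable names (@colors, %by_name, %by_ref): stored as a hash value the reference stays a reference, but used as a hash key it is turned into a plain string.

```perl
use strict;
use warnings;

my @colors = qw(red green blue);
my $ref    = \@colors;

# As a hash value, the reference survives intact.
my %by_name = (colors => $ref);

# As a hash key, the reference is stringified into something like
# "ARRAY(0x55d0c8a3e2f0)"; the key is a plain string from then on.
my %by_ref = ($ref => 'colors');
my ($key)  = keys %by_ref;

print ref($by_name{colors}), "\n";                           # ARRAY
print ref($key) eq '' ? "plain string\n" : "reference\n";    # plain string
```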
One way to create a reference is to put a backslash in
front of an existing variable or subroutine name. For example, I can create
scalar, array, hash, and subroutine references from existing things, like
so:
my $scalar_ref = \$scalar;
my $array_ref = \@array;
my $hash_ref = \%hash;
my $code_ref = \&marine;
Note that for the subroutine ("code")
reference, I must include the ampersand in front of the subroutine name.
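To check what the backslash produced, the built-in ref function reports the kind of item a reference points at. This is a minimal sketch; the variable contents and the body of marine are invented for illustration.

```perl
use strict;
use warnings;

my $scalar = 42;
my @array  = (3, 4, 5);
my %hash   = (first => 'Randal');
sub marine { return "sploosh" }   # hypothetical body

my $scalar_ref = \$scalar;
my $array_ref  = \@array;
my $hash_ref   = \%hash;
my $code_ref   = \&marine;

# ref() names the kind of thing each reference points at.
print join(' ', map { ref($_) } $scalar_ref, $array_ref, $hash_ref, $code_ref), "\n";
# SCALAR ARRAY HASH CODE
```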
These references could also have been placed
immediately as elements into an array or hash:
my @refs = (\$scalar, \@array, \%hash, \&marine);
my %ref_map = (
scalar => \$scalar,
array => \@array,
hash => \%hash,
code => \&marine,
);
In this case, all of $refs[2], $ref_map{hash}, and
\%hash yield the same reference to the hash %hash.
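One way to convince yourself of this is to compare the references numerically, since two references are equal under == only when they point at the same item. A small sketch, with placeholder data and an empty marine standing in for the real subroutine:

```perl
use strict;
use warnings;

my $scalar = 42;
my @array  = (3, 4, 5);
my %hash   = (login => 'merlyn');
sub marine { return }   # hypothetical empty body

my @refs    = (\$scalar, \@array, \%hash, \&marine);
my %ref_map = (scalar => \$scalar, array => \@array, hash => \%hash, code => \&marine);

# All three expressions refer to the one and only %hash.
my $same = ($refs[2] == $ref_map{hash} && $ref_map{hash} == \%hash) ? 1 : 0;
print "same hash: $same\n";   # same hash: 1

# A different hash with equal contents gives a different reference.
my %twin = %hash;
print "twin: ", (\%twin == \%hash ? 1 : 0), "\n";   # twin: 0
```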
I can also create a reference to an anonymous array,
hash, or subroutine using the anonymous constructor syntax, like so:
my $array_ref = [3, 4, 5];
my $hash_ref = { first => 'Randal', last => 'Schwartz',
login => 'merlyn' };
my $code_ref = sub { my $sum = 0; $sum += $_ for @_; return $sum };
In every respect, these references to anonymous items
act identically to the references to named items from earlier. (Note that
there is no simple syntax to create the rarely needed anonymous scalar.)
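As a quick check, ref reports the same kinds for anonymous items as for named ones. The sketch below also shows one common workaround for the anonymous scalar (a reference to a lexical that immediately goes out of scope); that idiom and the names $anon and $other are my additions here, not part of the constructor syntax above.

```perl
use strict;
use warnings;

my $array_ref = [3, 4, 5];
my $hash_ref  = { first => 'Randal', last => 'Schwartz', login => 'merlyn' };
my $code_ref  = sub { my $sum = 0; $sum += $_ for @_; return $sum };

# Workaround for an anonymous scalar: reference a lexical that
# goes out of scope at the end of the do block.
my $scalar_ref = \do { my $anon = 42 };

print join(' ', map { ref($_) } $scalar_ref, $array_ref, $hash_ref, $code_ref), "\n";
# SCALAR ARRAY HASH CODE

# Each evaluation of a constructor builds a brand-new item.
my $other = [3, 4, 5];
print $array_ref == $other ? "same\n" : "different\n";   # different
```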
To access the original item, I need to dereference the
reference to it. In the case of a scalar, array, or hash, a dereference
lets me get at the variable to get or set its value. In the case of a code
ref, dereferencing generally invokes the corresponding subroutine.
To begin, let's look at the canonical rule of
dereferencing that will always work regardless of how the reference is
obtained. We start by taking the syntax as if references aren't
involved, such as $some_array[$element]. We then take the name of the item
out and replace it with curly braces around the expression that gives us a
reference. The simplest example is scalar access. Start with a scalar
variable:
$scalar = 42; # update
print $scalar; # access
and replace the name (scalar) with curly braces around
the thing holding the reference:
${$scalar_ref} = 42; # update via $scalar_ref
print ${$scalar_ref}; # access via $scalar_ref
${$refs[0]} = 42; # update via $refs[0]
print ${$refs[0]}; # access via $refs[0]
${$ref_map{scalar}} = 42; # update via $ref_map{scalar}
print ${$ref_map{scalar}}; # access via $ref_map{scalar}
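To see that the dereferenced form really is the original variable, here is a minimal sketch (the starting values are arbitrary) that updates in both directions:

```perl
use strict;
use warnings;

my $scalar     = 10;
my $scalar_ref = \$scalar;

${$scalar_ref} = 42;          # update through the reference
print "$scalar\n";            # 42: the original variable changed

$scalar = 17;                 # update the variable directly
print ${$scalar_ref}, "\n";   # 17: the reference sees the new value
```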
An array has more access forms, so there are more
canonical dereferencing equivalents. Here are the non-reference versions:
@array # entire array
$array[$index] # single element of array
@array[@indices] # array slice
$#array # index of last array element
Again, the canonical rule is the same. Replace the
name with curly braces around the thing holding the reference. For
$array_ref, this looks like:
@{$array_ref} # entire array
${$array_ref}[$index] # single element of array
@{$array_ref}[@indices] # array slice
$#{$array_ref} # index of last array element
And for $refs[1] and $ref_map{array}, it looks like:
@{$refs[1]} # entire array
${$refs[1]}[$index] # single element of array
@{$refs[1]}[@indices] # array slice
$#{$refs[1]} # index of last array element
@{$ref_map{array}} # entire array
${$ref_map{array}}[$index] # single element of array
@{$ref_map{array}}[@indices] # array slice
$#{$ref_map{array}} # index of last array element
Yes, these are admittedly rather ugly. Luckily, most
of them are not common in typical Perl programs. It's important to
learn this canonical rule first though, because you can always fall back on
it when you get into trouble.
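Here is a runnable sketch of the four canonical array forms. The data is made up, and @refs is cut down to a single element to keep the example short, so the array reference lives in $refs[0] here rather than $refs[1].

```perl
use strict;
use warnings;

my @array     = qw(a b c d e);
my $array_ref = \@array;
my @refs      = (\@array);    # hypothetical one-element list of refs

my @all   = @{$array_ref};            # a b c d e
my $one   = ${$array_ref}[1];         # b
my @slice = @{$array_ref}[0, 2];      # a c
my $last  = $#{$array_ref};           # 4

${$refs[0]}[1] = 'B';                 # update through a ref stored in an array
push @{$refs[0]}, 'f';                # the entire-array form works with push

print "@array\n";                     # a B c d e f
print "$one @slice $last\n";          # b a c 4
```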
Continuing on, the hash also has a number of access
forms:
%hash # entire hash
$hash{$key} # single element of hash
@hash{@keys} # hash slice
And the rule is again the same: replace the name of
the item with curly braces around the thing holding the reference. For
$hash_ref, $refs[2], and $ref_map{hash}, this looks like:
%{$hash_ref} # entire hash
${$hash_ref}{$key} # single element of hash
@{$hash_ref}{@keys} # hash slice
%{$refs[2]} # entire hash
${$refs[2]}{$key} # single element of hash
@{$refs[2]}{@keys} # hash slice
%{$ref_map{hash}} # entire hash
${$ref_map{hash}}{$key} # single element of hash
@{$ref_map{hash}}{@keys} # hash slice
OK, now we have ugly on top of ugly. Ugly squared. I
can honestly say that I don't recall ever taking a hash slice of a
hash whose hashref came from an element of another hash. But if I did, that
last line would be how I would need to do it.
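The hash forms can be exercised the same way. This is a sketch only; the hash contents and the replacement login value are invented for the example.

```perl
use strict;
use warnings;

my %hash     = (first => 'Randal', last => 'Schwartz', login => 'merlyn');
my $hash_ref = \%hash;
my %ref_map  = (hash => \%hash);

my @keys   = sort keys %{$hash_ref};               # first last login
my $login  = ${$hash_ref}{login};                  # merlyn
my @names  = @{$hash_ref}{qw(first last)};         # Randal Schwartz
my @nested = @{$ref_map{hash}}{qw(first login)};   # Randal merlyn

${$ref_map{hash}}{login} = 'randal';               # update the original hash
print "$hash{login}\n";                            # randal
print "@keys | $login | @names | @nested\n";
```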
Finally, for code ref dereferencing, we're
invoking the subroutine. For the purpose of constructing the canonical
form, we'll pretend that subroutine invocations without an ampersand
are forbidden:
&marine # invoke subroutine passing current @_
&marine() # invoke subroutine with no arguments
&marine(@args) # invoke subroutine passing @args
Again, the rule is the same (see how simple this is?).
Replace the name with curly braces around the thing holding the reference:
&{$code_ref} # invoke subroutine passing current @_
&{$code_ref}() # invoke subroutine with no arguments
&{$code_ref}(@args) # invoke subroutine passing @args
&{$refs[3]} # invoke subroutine passing current @_
&{$refs[3]}() # invoke subroutine with no arguments
&{$refs[3]}(@args) # invoke subroutine passing @args
&{$ref_map{code}} # invoke subroutine passing current @_
&{$ref_map{code}}() # invoke subroutine with no arguments
&{$ref_map{code}}(@args) # invoke subroutine passing @args
And that finishes the canonical form. If this were all
there was, you could do everything you wanted with references, but
they'd be ugly.
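The difference between the three invocation forms shows up when you look at what lands in @_. In this sketch, the body of marine and the wrapper subroutine are hypothetical, written only to report the arguments received.

```perl
use strict;
use warnings;

# Hypothetical body: report the argument count and the arguments.
sub marine { return scalar(@_) . ":" . join(",", @_) }
my $code_ref = \&marine;

sub wrapper {
    # No parentheses: the called subroutine sees wrapper's own @_.
    return &{$code_ref};
}

print &{$code_ref}(), "\n";          # 0:
print &{$code_ref}(1, 2, 3), "\n";   # 3:1,2,3
print wrapper('x', 'y'), "\n";       # 2:x,y
```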
Luckily, there are a few syntax optimizations that
actually end up applying about 90% of the time. For example, you can remove
any curly braces you introduced for dereferencing as long as the only thing
inside the braces is a simple scalar (not array or hash element, or complex
expression). That simplifies some of the items above to the following forms:
$$scalar_ref = 42; # update via $scalar_ref
print $$scalar_ref; # access via $scalar_ref
@$array_ref # entire array
$$array_ref[$index] # single element of array
@$array_ref[@indices] # array slice
$#$array_ref # index of last array element
%$hash_ref # entire hash
$$hash_ref{$key} # single element of hash
@$hash_ref{@keys} # hash slice
&$code_ref # invoke subroutine passing current @_
&$code_ref() # invoke subroutine with no arguments
&$code_ref(@args) # invoke subroutine passing @args
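The "simple scalar" restriction matters because the dereference binds to the scalar before any subscript is applied. A small sketch, again using a one-element @refs of my own invention:

```perl
use strict;
use warnings;

my @array     = (3, 4, 5);
my $array_ref = \@array;
my @refs      = (\@array);   # hypothetical one-element list of refs

# The brace-free and braced forms reach the same element.
my $short = $$array_ref[0];
my $long  = ${$array_ref}[0];

# The braces must stay when the reference comes from an element.
# Without them, @$refs[0] would be parsed as @{$refs}[0], a slice
# through a scalar named $refs, which is not the array @refs.
my $via_element = ${$refs[0]}[1];

print "$short $long $via_element\n";   # 3 3 4
```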
Also, as an optimization along a different axis, the
most common thing to do with arrays and hashes is to access a single
element, so you can replace an ugly dereferencing with an equivalent arrow
form:
${ UGLY_ARRAY_REF_EXPRESSION }[$index] # canonical form for array element
UGLY_ARRAY_REF_EXPRESSION->[$index] # arrow form for array element
${ UGLY_HASH_REF_EXPRESSION }{$key} # canonical form for hash element
UGLY_HASH_REF_EXPRESSION->{$key} # arrow form for hash element
Similarly, invoking a subroutine has an equivalent
arrow form (thanks to Chip Salzenberg under my nudging during the 5.004
release cycle):
&{ UGLY_CODE_REF_EXPRESSION }(@args) # canonical form for code invocation
UGLY_CODE_REF_EXPRESSION->(@args) # arrow form for code invocation
The nice thing about these arrows is that they read
from left to right. For example, the code ref stored in $ref_map{code}
simplifies nicely:
&{$ref_map{code}}(@args) # canonical form
$ref_map{code}->(@args) # arrow form
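Putting the canonical and arrow forms side by side makes the equivalence easy to verify. This sketch fills %ref_map with anonymous items instead of the named ones used earlier, purely to keep it self-contained.

```perl
use strict;
use warnings;

my %ref_map = (
    array => [3, 4, 5],
    hash  => { login => 'merlyn' },
    code  => sub { my $sum = 0; $sum += $_ for @_; return $sum },
);

# Each line prints the canonical form, then its arrow equivalent.
print ${$ref_map{array}}[1],    " ", $ref_map{array}->[1],    "\n";   # 4 4
print ${$ref_map{hash}}{login}, " ", $ref_map{hash}->{login}, "\n";   # merlyn merlyn
print &{$ref_map{code}}(1, 2),  " ", $ref_map{code}->(1, 2),  "\n";   # 3 3
```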
And that leads us to the final optimization. If, as a
result of the previous rules for an arrow form, we end up with an arrow
between a pair of subscripts (array indices, hash keys, or subroutine
arguments), we can drop that arrow:
${$refs[1]}[$index] # canonical form
$refs[1]->[$index] # arrow form
$refs[1][$index] # reduced arrow form
${$ref_map{hash}}{$key} # canonical form
$ref_map{hash}->{$key} # arrow form
$ref_map{hash}{$key} # reduced arrow form
$ref_map{code}(@args) # reduced arrow form
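The reduced form pays off most in nested data. Here is a sketch with a made-up structure (a hash holding an array of hashes, plus a code ref); note that only arrows between subscripts can go, so the first arrow after a plain scalar such as $top has to stay.

```perl
use strict;
use warnings;

# Hypothetical nested structure, invented for this example.
my %ref_map = (
    users => [
        { login => 'merlyn', langs => ['perl', 'smalltalk'] },
    ],
    code  => sub { return join '-', @_ },
);

# The arrow may be dropped between adjacent subscripts...
my $full    = $ref_map{users}->[0]->{langs}->[1];
my $reduced = $ref_map{users}[0]{langs}[1];

# ...but not the first arrow after a scalar that holds a reference.
my $top        = \%ref_map;
my $via_scalar = $top->{users}[0]{login};

print "$full $reduced $via_scalar\n";   # smalltalk smalltalk merlyn
print $ref_map{code}(1, 2), "\n";       # 1-2
```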
And there you have it, the complete set of referencing
and dereferencing instructions for Perl version 5. For Perl 6, all the
rules are different, of course. So for now, enjoy!
Randal L. Schwartz is a two-decade veteran of the
software industry -- skilled in software design, system
administration, security, technical writing, and training. He has
coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl
Programming. He's also a frequent contributor to the Perl newsgroups,
and has moderated comp.lang.perl.announce since its inception. Since 1985,
Randal has owned and operated Stonehenge Consulting Services, Inc.