A Magic Header for Starting Perl Scripts

The Perl Journal April 2003

By Péter Szabó

Péter is studying computer science and informatics at Budapest University of Technology and Economics, Hungary. His primary interests are programming languages, model verification, and computer typesetting. He can be contacted at pts@inf.bme.hu.

Most Perl scripts start with a line beginning with #! and containing the word perl (for example, #!/usr/local/bin/perl -w). This line tells the UNIX operating system that it shouldn't treat the current file as a binary executable, but it should invoke the specified perl interpreter with the specified options, and that interpreter will take care of the Perl script. Although the concept of this first line is simple, writing it to be universally applicable tends to be very hard.

For example, specifying #! /usr/local/bin/perl -w will cause a confusing my_script.pl: No such file or directory message on many out-of-the-box Linux systems, on which perl is located in /usr/bin/. Specifying #! /usr/bin/perl -w solves the problem on those Linux boxes, but will break compatibility with most Solaris systems, on which the standard place for perl is /usr/local/bin/perl. The bad news is that there is no standard place for perl to be specified after #!. To make matters worse, some older UNIX systems impose very strict rules on what can be specified in the #! line. As a result, you may choose between:

Specifying the path that works for the majority of your user base.
Documenting the incompatibility, and politely asking the users to manually modify your scripts if there's a problem.
Writing an install script to automate the modifications. (How will the user start the install script? By typing perl install.pl, presumably.)
Finding a solution that will work anywhere.

This article describes the last option—the Magic Perl Header, a multiline solution for starting a Perl script on any UNIX operating system.

Common One-Line Pitfalls

The most obvious #! one-liners are just not good enough:

Specifying #!/usr/local/bin/perl -w -T doesn't work because some UNIX operating systems allow only a single command-line switch after #!. This one-liner won't work on Linux.
Specifying #!/usr/local/bin/perl -wT doesn't work because some UNIX operating systems expect a space after #!. This one-liner works on Linux only if /usr/local/bin/perl exists (but it usually doesn't).
Specifying #! /usr/local/bin/perl -wT doesn't work because perl might be located in /usr/bin or somewhere else, such as on Debian Linux. (Remember: The user of your script may not be educated enough to be able to find the perl binary and modify the first line of the script accordingly; and in some security configurations, they may not have the permission to do it, even when they know exactly what to change.) This one-liner rarely works on out-of-the-box Linux.
Specifying #! perl -wT doesn't work because some UNIX operating systems expect an absolute executable name (starting with /) after #!. This one-liner doesn't work on Linux.
Specifying #! /usr/bin/env perl -wT doesn't work because some systems allow only zero or one argument after the command name. (Moreover, in some systems there is a limit for the overall length of the first line—it can be as few as 32 or 64 characters.) It would be very hard to specify the -T switch from anywhere other than the command line. (The -w switch is easier: just write BEGIN{$^W=1} in front of the Perl code.) The -T switch is a security switch, and specifying it too late opens the backdoor for malicious accidents. You (the programmer) should be extremely careful here, but it is difficult because there is no place to specify the correct switches. This one-liner doesn't work on Linux.
Specifying #! /usr/bin/env perl doesn't work, either, because env might be missing or located somewhere else on some systems. This one-liner works on Linux.
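
The pitfalls above can be turned into a rough mechanical check. The following is a minimal sketch and not part of the Magic Perl Header: check_shebang is a hypothetical helper, and the 32-character limit it applies is simply the stricter of the two limits mentioned above. It flags a candidate #! line that lacks a space after #!, names a non-absolute interpreter, carries more than one argument, or is too long.

```shell
#!/bin/sh
# check_shebang LINE (hypothetical helper, for illustration only).
# Prints one warning per pitfall found in a candidate "#!" line.
check_shebang() {
    line=$1
    case $line in
        '#! '*) ;;                              # portable form: "#!" plus a space
        '#!'*)  echo "no space after #!" ;;
        *)      echo "not a #! line"; return 1 ;;
    esac

    rest=${line#??}                             # drop the leading "#!"
    set -f                                      # no filename globbing while splitting
    set -- $rest                                # $1 = interpreter, $2... = arguments
    set +f

    case ${1-} in
        /*) ;;
        *)  echo "interpreter path is not absolute" ;;
    esac
    if [ "$#" -gt 2 ]; then
        echo "more than one argument"
    fi
    if [ "${#line}" -gt 32 ]; then              # assumed limit: 32 characters
        echo "longer than 32 characters"
    fi
    return 0
}

check_shebang '#!/usr/local/bin/perl -w -T'
check_shebang '#! perl -wT'
check_shebang '#! /bin/sh'
```

Passing such a check does not make a line portable. As the last item in the list notes, even #! /usr/bin/env perl depends on where env happens to live.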

Building the Magic Perl Header

It is clear that there is no single-#!-line solution to the problem in the general case, because there is no portable way to start Perl to run a script. A multiple-line solution will be necessary. In this section, I will begin to build this solution. I will identify problems and limitations along the way, and in the next section, present the final, complete magic header that will allow you to start a Perl script on any UNIX system.

The only portable beginning for a script is:

#! /bin/sh

/bin/sh is available on all UNIX systems, but it might be a symlink to any shell, including Bourne shell variants (such as Bash and ash), Korn shell variants (such as pdksh and zsh), and C shell variants (such as csh and tcsh). Many UNIX utilities, and the libc system(3) function (conforming to ANSI C, POSIX.2, BSD 4.3) rely on a working /bin/sh. So it is fairly reasonable to assume that /bin/sh exists and is a Bourne, Korn, or C shell variant. On Linux, /bin/sh is usually a symlink to /bin/bash. (On Linux install disks, it is sometimes a symlink to /bin/ash or the built-in ash of BusyBox.) On Win32 MinGW MSYS, /bin/sh is Bash, but there is no /bin/bash. On Solaris, /bin/sh is Sun's own simplistic Bourne-shell clone, and Digital UNIX also has a simple Bourne-shell clone in /bin/sh. (The line #! /bin/sh -- that is seen in many shell scripts to allow arbitrary filenames for the executable won't work here because tcsh gives an error for the -- switch.)

We can write a simple shell wrapper that will find the perl executable in $PATH and run it with the correct switches. In fact, this is the only way that this works on Win32 systems, using .bat batch files. A candidate for the solution is:

## file my_script.sh, version 1
#! /bin/sh
perl my_script.pl

## file my_script.pl
# real Perl code begins here

This has the following problems:

1. It doesn't pass command-line arguments.

2. It doesn't propagate exit() status.

3. It cannot find the Perl script on the $PATH—it will take it from the current directory, which is usually wrong, and might also present a security issue.

4. It needs two separate files.

Problems 1-3 can be overcome quite easily:

## file my_script.sh, version 2
#! /bin/sh
exec perl -S -- my_script.pl "$@"

All Bourne and Korn shells (such as GNU Bash, ash, zsh, pdksh, and Solaris /bin/sh) can interpret my_script.sh correctly. However, C shells use a different notation for "all the arguments passed to the shell, unmodified." They use $argv:q instead of "$@". The perlrun(1) manual page describes a memorable construct that detects the C shell:

eval '(exit $?0)' && eval 'echo "Korn and Bourne"'
echo All

The message "All" gets echoed on all three shell types, but only Korn and Bourne shells print the "Korn and Bourne" message. (In zsh, the result depends on the value of $?, but it won't cause a problem since zsh understands both the csh and Bourne shell constructs we use.) The trick here is that $? is the exit status of the previous command, with the initial value of 0, but $?0 in the C shell is a test that returns "1" because the variable $0 exists.
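
To see the Bourne-shell half of this trick in isolation, here is a small sketch. The name detect_shell_family is hypothetical, and the script is meant to be run by a Bourne or Korn shell. In these shells, $?0 is simply $? followed by a literal 0, so a previous status of 0 yields exit 00, which succeeds.

```shell
#!/bin/sh
# detect_shell_family: hypothetical helper wrapping the perlrun(1) test.
detect_shell_family() {
    true                          # make sure $? is 0, as at the start of a script
    if eval '(exit $?0)'; then    # Bourne/Korn: the text expands to "(exit 00)"
        echo "Korn and Bourne"
    else
        echo "other"
    fi
}

detect_shell_family
```

If the previous command had failed with status 1, the same text would expand to exit 10 and the test would fail, which is why the construct belongs at the very start of a script.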

We can change echo in the C shell detection code to exec perl, and that's it:

## file my_script.sh, version 3
#! /bin/sh
eval '(exit $?0)' && exec perl -S -- "$0" "$@"
exec perl -S -- "$0" $argv:q

Now we're ready to make our first wizard step: Combine my_script.pl and my_script.sh into a single file, which invokes itself using perl when run from the shell. (Forget about csh-compatibility for a moment—we'll get to that later.)

A simple attempt would be:

#! /bin/sh
eval 'echo DEBUG; exec perl -S $0 ${1+"$@"}'
if 0;
# real Perl code begins here

Unfortunately, it doesn't run the real Perl code, but it produces an infinite number of DEBUG messages. That's because Perl has a built-in hack: If the first line begins with #! and it doesn't contain the word perl, Perl executes the specified program instead of parsing the script. See the beginning of the perlrun(1) manual page for further details.

In the following simple trick, suggested by the perlrun(1) manual page, we include the word perl in the first line:

#! /bin/sh -- # -*- perl -*-
eval 'exec perl -S $0 ${1+"$@"}'
if 0;
# real Perl code begins here

This fails to work on many systems, including Linux, because the OS invokes the command line (/bin/sh, -- # -*- perl -*-, ./my_script.pl), and the shell gives an unpleasant error message about the completely bogus switch.

So we can omit the first line:

eval 'exec perl -S $0 ${1+"$@"}'
if 0;
# real Perl code begins here

This solution is inspired by Thomas Esser's epstopdf utility, and it seems to work on Linux systems with both perl my_script.pl and ./my_script.pl. But we can do better. The major flaw in this script is that it relies on the fact that the operating system recognizes executables beginning with ASCII characters as scripts, and runs them through /bin/sh. On some systems, a "Cannot execute binary file" or "Exec format error" may occur.

Note that this script is quite tricky since the first line is valid in both Perl and Bourne-compatible shells. (It doesn't work in the C shell, but we'll solve that problem later on.)

The solution has another problem: If someone gives the script a weird filename with spaces and other funny characters in it, such as:

-e system(halt)

then the command

perl -S -e system(halt)

will be executed, which is a disaster when there is a dangerous program named halt on the user's $PATH. This problem can be solved easily, by quoting $0 from the shell, and prefixing it with -- to prevent Perl from recognizing further options.
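
The effect of quoting is easy to check with a stand-in for perl. In this sketch, count_args is a hypothetical helper that only reports how many arguments it received, and the variable name plays the role of $0.

```shell
#!/bin/sh
count_args() { echo "$#"; }       # hypothetical stand-in for perl

name='-e system(halt)'            # the hostile filename from the text

set -f                            # keep the demo free of filename globbing
count_args $name                  # unquoted: split into -e and system(halt)
set +f
count_args "$name"                # quoted: one argument, the filename itself
count_args -- "$name"             # as passed to perl: -- plus the filename
```

Only the unquoted form lets the shell split the filename into something perl would read as a switch and a program text.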

We have two conflicting requirements for the #! line: The portability requirement is that it must be exactly #! /bin/sh; but it must contain the word perl to avoid the infinite DEBUG loop described earlier. There is no single line that can satisfy both of these requirements, but what about having two lines, then running perl -x, so the OS will parse the first and Perl will find the second?

#! /bin/sh
eval 'exec perl -S -x -- "$0" ${1+"$@"}'
if 0;
#!perl -w
# real Perl code begins here

The trick here is that Perl, when invoked with the -x switch, ignores everything up to #!perl. Users of non-UNIX systems should invoke this script with perl -x. UNIX users may freely choose any of perl my_script.pl, perl -x my_script.pl, ./my_script.pl, and even sh my_script.pl.
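
What perl -x keeps can be approximated without running Perl at all. The sketch below is only an emulation under a stated assumption: perl_x_view is a hypothetical helper that prints everything from the first line that starts with #! and contains the string perl, which is how perlrun(1) describes the -x switch.

```shell
#!/bin/sh
# perl_x_view FILE: hypothetical helper approximating which lines "perl -x"
# keeps, namely everything from the first "#!" line mentioning perl onward.
perl_x_view() {
    sed -n '/^#!.*perl/,$p' "$1"
}

# Demo on a throwaway file shaped like the two-"#!" script above.
tmp=${TMPDIR:-/tmp}/perl_x_demo.$$
cat > "$tmp" <<'EOF'
#! /bin/sh
eval 'exec perl -S -x -- "$0" ${1+"$@"}'
if 0;
#!perl -w
# real Perl code begins here
EOF
perl_x_view "$tmp"
rm -f "$tmp"
```

The first three lines never reach the Perl parser in this mode, which is also why error line numbers end up being counted from the #!perl line.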

The subtle bilingual tricks in this script are worth studying. When the file is read by perl -x, it quickly skips to the real Perl code. When the file is read by the shell, it executes the line with eval: it calls perl -x with the script filename and command-line arguments. The double-quotes and $@ are shell script wizardry, so things will work even when arguments contain spaces or quotes. The -S option tells Perl to search for the file in $PATH again because most shells leave $0 unchanged (i.e., $0 is the command the user has typed in).
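
The ${1+"$@"} idiom deserves a closer look. It expands to "$@" only when at least one argument is set, and to nothing otherwise. The usual reason given for preferring it over a bare "$@" is that some old Bourne shells turn "$@" into a single empty argument when there are no arguments at all. A minimal sketch, with hypothetical helper names:

```shell
#!/bin/sh
count_args() { echo "$#"; }               # reports how many arguments it got
forward()    { count_args ${1+"$@"}; }    # passes its own arguments along

forward                  # no arguments in, none out
forward "a b" 'c"d'      # spaces and quotes survive: still two arguments
forward ""               # an explicitly empty argument is kept
```

Because the test is on whether $1 is set rather than whether it is nonempty, an empty first argument is still forwarded.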

Although the second and the third lines contain valid no-op Perl code, Perl never interprets these lines because of the -x switch. These lines are also completely ignored by perl my_script.pl because that immediately invokes /bin/sh. However, when the user loads this script with the do Perl built-in, the second and third lines get compiled and interpreted, a harmless no-op code is run, and no syntax error occurs.

There are still deficiencies that remain:

It doesn't work in the C shell. We have already solved this earlier in this section.
It reports line numbers in error messages relative to the #!perl -w line.
It prints warnings when locale settings are invalid. (Try setting 'export LANG=invalid' in Bash before running the script to see the ugly warning messages.)

With regard to the line number problem, the do Perl built-in can be used to reread the script with a construct like this:

BEGIN{ if(!$second_run){ $second_run=1; do($0); die $@ if $@; exit } }

BEGIN is required here to prevent Perl from compiling the whole file and possibly complaining about syntax errors with the wrong line numbers. The die $@ if $@ instruction will print runtime error messages correctly. See perlvar(1) for details about $@. Unfortunately the code

BEGIN{ if(!$second_run){ $second_run=1; do($0); die $@ if $@; exit } }
die 42;

yields an extra error message "BEGIN failed--compilation aborted." This error is confusing because die 42 causes a run-time error, not a compile-time error. To get rid of the message, we should eliminate exit somehow, and tell Perl not to continue parsing the input after } }. We'll use the __END__ token to stop parsing early enough.

The locale warning is a multiline message starting with "perl: warning: Setting locale failed." Perl emits this if the locale settings specified in the environment variables LANG, LC_ALL, and LC_* are incorrect. See perllocale(1) for details. The real fix for this warning is installing and specifying locale correctly. However, most Perl scripts don't use locale anyway, so a broken locale doesn't do any harm to them.

Although Perl is a good diagnostics tool for locale problems, most of the time we don't want such warning messages, especially not in CGI (these warnings would fill the web server's log file), or some system daemon processes, when the program is prohibited from writing to stderr on normal operation. The system administrator should really fix locale settings, but that can take time. Most users don't have time to wait weeks to run a single Perl script that doesn't depend on locale anyway.

The perllocale(1) man page says that PERL_BADLANG should be set to a true value to get rid of locale warnings. Actually, PERL_BADLANG must be set to a nonempty, nonnumeric string (for example, PERL_BADLANG=1 doesn't work). So we'll set it to PERL_BADLANG=x in the shell script section. Note that this has no effect if Perl is invoked before the shell. For example, perl, perl -x, perl -S, and perl -x -S all emit the warning long before the shell has a chance to change PERL_BADLANG.
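
In a Bourne-compatible shell, the assignment could look like the sketch below. It is an illustration rather than the header's exact text, and the child sh process merely stands in for perl to show that the variable reaches the environment.

```shell
#!/bin/sh
# Assign first, then export: the combined "export VAR=value" form is not
# understood by some older Bourne shells.
PERL_BADLANG=x
export PERL_BADLANG

# Any child process (standing in for perl here) now sees the variable.
sh -c 'echo "PERL_BADLANG=$PERL_BADLANG"'
```

C shell variants need setenv instead, so a header that serves both shell families has to carry both forms.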

The Finished Header

Combining it all together, we have the final version of the Magic Perl Header; see Example 1. The file should have the executable attribute on UNIX systems.

This header is valid in multiple languages, so its meaning depends on the interpreter. Fortunately, the final effect of the header in all interpreters is that perl gets invoked running the real Perl code after the header. Let's see how the header achieves this:

When executed with perl, without the -x switch, Perl runs /bin/sh immediately. (/bin/sh may be any type of shell.)
Bourne and Korn shell variants interpret the file as:

#! /bin/sh --
true && eval '...; exec perl -T -x -S "$0" ${1+"$@"}' # comment
garbage

So they run perl -x.

C shell variants interpret the file as:

#! /bin/sh --
false && eval '...' ;
eval '...; exec perl -T -x -S -- "$0" $argv:q'  # comment
garbage

So they run perl -x.

The backslash at the end of the second line of the header seems to be superfluous, but it isn't because csh doesn't allow the breaking of the line in the midst of the string without a backslash.
The operating system runs the file by running /bin/sh, some shell variant. This is true even for ancient systems that don't know about the #!-hack, but just treat ASCII files as shell scripts.
perl -x interprets the file as:

#!perl -w
untaint $0; do $0; die $@ if $@; __END__
garbage

So it runs the current file again, with do, not respecting the #! lines. This is a good idea to make error line numbers come out correctly.

The only way to untaint a value is regexp subexpression matching. We use it in $0=~/(.*)/s.
do $0 treats the file as:

eval 'garbage' if 0;
eval 'garbage' . q+garbage+ if 0;
# real Perl code

do $0 doesn't consult $ENV{PATH} for the location of the script (it iterates over @INC), but by the time do $0 is invoked, $0 already has the relevant component of $ENV{PATH} prepended to it if a path search was done, so @INC won't be examined here. Note that $0 may be a relative pathname, but this isn't a problem since chdir() was not called since the path search. Without the index function in the script, do would have looked at @INC and found the Perl built-in ftp.pl instead of our magic script named ftp.pl when calling perl -x ftp.pl in the current directory.
The real Perl code is compiled only once because the previous read (invoking do $0) has finished compilation at the __END__ token. The real compilation bypasses the __END__ token because it is part of the single-quoted string q+garbage+.
Error line numbers are reported correctly because compilation occurs inside do, which ignores #!. Both compile-time and run-time errors, including manual calls to die(), are caught and reported early by the die $@ if $@ statement. Each error is reported only once because the real Perl code is compiled once.
The real code may contain any number of exit(), exec(), fork(), and die() calls, and they will work as expected. return outside a subroutine is fortunately disallowed in pure Perl, so we don't have to treat this case.
push@INC,"." is required by perl 5.8.0 -T.

So the real Perl code gets executed, even on old UNIX systems, no matter how the user starts the program. The header is suitable for inclusion into CGI scripts. (In non-CGI programs, where extreme security is not important, occurrences of the -T option can be removed.)

All of the following work perfectly, without the locale warning:

DIR/nice.pl         # preferred
ash  DIR/nice.pl
sh   DIR/nice.pl
bash DIR/nice.pl
csh  DIR/nice.pl
tcsh DIR/nice.pl
ksh  DIR/nice.pl
zsh  DIR/nice.pl

The following invocations are fine:

perl -x -S DIR/nice.pl	# locale-warning
perl DIR/nice.pl	# locale-warning
perl -x DIR/nice.pl	# locale-warning
perl -x -S nice.pl	# locale-warning; only if on $PATH, recommended on Win32
perl nice.pl		# locale-warning; only from curdir
perl -x nice.pl	# locale-warning; only from curdir
nice.pl		# only if on $PATH (or $PATH contains '.')

The following don't work, because buggy Perl 5.004 tries to run /bin/sh -S nice.pl:

perl -S nice.pl	# doesn't work 
perl -S DIR/nice.pl	# doesn't work

Of course, there is a noticeable performance penalty: /bin/sh is started each time the script is invoked. This cannot be completely avoided because PERL_BADLANG has to be set before perl gets invoked. After the shell has finished running, one line of helper Perl code is parsed (after #!perl), and the do causes five lines of helper code to be parsed. The time and memory spent on these six lines is negligible. So the only action that slows script startup is the shell. If the user sets and exports PERL_BADLANG=x, fast startup is possible by calling:

perl -x -S nice.pl
perl -x DIR/nice.pl

In a Makefile, you should write:

export PERL_BADLANG=x
goal:
    perl -x DIR/nice.pl

The command-line options -n and -p would fail with this header. This is not a serious problem because -n can be implemented as wrapping the code inside while (<>) { ... }, and -p can be changed to the wrapping while (<>) { ... } continue { print }.

Header Wizard

I've implemented a Header Wizard that automatically adds the Magic Perl Header to existing Perl scripts. The Header Wizard is available from http://www.inf.bme.hu/~pts/Magic.Perl.Header/magicc.pl.zip. [For convenience, we have also posted this at http://www.tpj.com/source/, though downloading from the author's site guarantees that you get the most recent version. -Ed.]

The easy recipe for the universally executable Perl script:

1. Write your Perl script as usual. You may call exit() and die() as you like.

2. Specify the #! ... perl line as usual. You may put any number of options, but the -T option must either be missing or specified alone (separated with spaces). Example: #! /dummy/perl -wi.bak -T. See perlrun(1) and perlsec(1) for more information about the -T option.

3. Run magicc.pl (the Header Wizard), which will prepend an eight-line magic header containing the right options to the script, and it will make the script file executable (with chmod +x ...). (The -T option will be moved after both exec perls, and other options will be moved after #!perl because Perl looks for switches only there.)

4. Run your script with a slash, but without sh or perl on the command line. For example: ./my_script.pl arg0 arg1 arg2. After you have moved the script into $PATH, run it just as my_script.pl arg0 arg1 arg2. (This avoids the locale warnings and makes options take effect.) Should these invocations fail on a UNIX system for whatever reason, please feel free to e-mail me. As a quick fix, run the script with perl -x -S ./my_script.pl arg0 arg1 arg2.

5. Note that on Win32 systems, perl -x -S is the only way to run the script. You may write a separate .bat file that does this.

6. Tell your users that they should run the script the way described in Step 4. There is a high chance that it will work even for those who don't follow the documentation.

Conclusion

For such a widely implemented language, Perl can be surprisingly hard to invoke reliably on a variety of platforms. I hope this Header Wizard helps you to write Perl scripts that will start with a minimum of fuss on just about any system.
