The Perl of Wizdom

December 1995 Dr. Dobb's Developer Update

by Ray Valdés

Ray Valdes is senior technical editor for Dr. Dobb's Journal and can be contacted at ray@valdes.com
If you look at "Help Wanted" ads, there are an increasing number of companies seeking programmers who know Perl. This demand is caused by the explosive growth of the World Wide Web. Much of the back-end processing on Web servers is done with CGI (Common Gateway Interface) scripts written in Perl. The much-talked-about Satan program, which allows system administrators (and crackers) to probe networks for security holes, is written in Perl 5 and uses a Web browser as its user interface. If you work on UNIX, you likely already know Perl. UNIX users often learn Perl through osmosis: It bears a strong resemblance to the UNIX shell. But those coming to Perl from PCs--with a background in C++, Delphi, or Smalltalk--now find Perl to be appallingly retro. The version of Perl available on most PCs seems to lack essential features found in C--data structures, lexically scoped local variables, a switch/case statement, pointers, and the like--not to mention any of the fancy new constructs in C++. This is not surprising, given that the first language Perl's designer learned was Basic-Plus.

With the versions of Perl available on PCs, you don't get much beyond the basic command line--forget about a user-friendly integrated development environment (IDE) with syntax-oriented editor, code browser, GUI design tool, and copious online help. Despite this, the language does grow on you, and has an undeniable charm; it's like going back to a warm-sounding tube amplifier after years of cold, digital sound systems. And, it turns out, Perl is surprisingly powerful and easy to use. The latest version, Perl 5, even allows object-oriented programming. The older, more widespread version of Perl (version 4.039) offers associative arrays, dynamic code execution (an eval function), and a sophisticated dataflow-tracking mechanism that plugs security holes in remotely executed programs. This version of Perl comes with a debugger (perldb) that is itself written in 595 lines of Perl.

Here, I'll look at what Perl is, how it came to be, what Perl "wizards" are doing with it, and where it's going. (For some reason, UNIX experts are called "gurus," while Perl afficionados are dubbed "wizards.")

What It Is

Perl is a general-purpose scripting language created by Larry Wall about eight years ago. Its syntax is a blend of the UNIX shell, C, awk, sed, and other familiar UNIX utilities. Perl has been described as the "Swiss-Army chainsaw" of UNIX tools. Over the years, it has become the language of choice among system administrators for quickly cobbling together small utility programs.

Increasingly, you'll find Perl used in mission-critical apps across the enterprise. Some Wall Street trading firms are relying on Perl for financial analysis (in many situations, a quick result obtained in Perl is preferable to an extremely accurate, but delayed, calculation implemented in some other language). A midwestern bank tracks its multimillion dollar mutual-fund accounts with a system that combines 10,000 lines of Perl working in conjunction with Oracle and Sybase database engines. Boeing uses Perl in transferring aircraft designs from CAD stations over to CAM machines. Silicon Graphics uses Perl in its suite of chip-layout programs. Perl formed an essential part of the tools suite used to design Nintendo's Ultra 64 chip. Not bad for a language that started out as a quick-and-dirty replacement for awk.

In 1986, Larry Wall was working at Unisys on a project that involved a number of computers on a wide-area network exchanging data that needed to be stored in different locations in a synchronized manner. For this task, he used the Usenet news protocol and the rn news reader, which he had written earlier (and which is still used by many people for reading Internet newsgroups). When the time came to create reports that distilled the reams of data, Wall found that awk couldn't deal with opening and closing multiple files based on information in those files. Rather than write a dedicated application, he created Perl (short for "Practical Extraction and Report Language").

The initial design allowed Perl to function as a data-reduction language, via facilities for scanning large amounts of text, navigating multiple files, invoking external commands to obtain dynamic data, and printing out easily formatted reports. Perl also became a convenient file-manipulation language, with facilities for file renaming, deleting, moving, and attribute changing. As the language developed, Wall added features to make Perl a useful process-manipulation language: On the appropriate operating system, you can easily create and destroy processes and control the flow of data between them. Perl also later became a powerful networking language that lets you create programs to access resources on local networks and on wide-area networks like the Internet. A number of Web servers are written in Perl. The smallest Web server, TinyHttpd, consists of 190 lines of Perl. A full-featured server, Plexus, offers multithreading and authentication in 800 lines of Perl.

Syntax

Perl syntax is an unabashed crazy-quilt of constructs borrowed from other languages. Perhaps this is because Larry Wall is a linguist by training, and presumably aware of how natural languages, like English, evolve and are cross-fertilized by words and idioms from other tongues.

Regarding Perl syntax, one novice complained: "It's sort of like sed, but not. It's sort of like awk, but not. And so on." In response, Larry Wall freely admits: "Guilty as charged. Perl is happily ugly, and happily derivative." It seems that anything that's useful for quickly creating small utilities has been incorporated into the language. As far back as 1991, Wall wrote: "In general, if you think something isn't in Perl, try it out, because it usually is." The language has grown by accretion, in response to the mundane needs of real-world tasks, rather than academic notions of elegance and orthogonality. A common slogan among Perl wizards is: There's More Than One Way To Do It. (For some reason, these slogans are always written in upper and lower case.)

Some old-school UNIX gurus complain that Perl's hodgepodge of constructs goes against the time-honored practice of building small tools that do just one thing and do it well. Wall's response: "Innocent, Your Honor. Perl users build small tools all day long."

Perl makes implicit many of the common tasks explicit in writing C programs. For example, a single flag (the -p flag) enables the following functions: opening standard input, looping through the file, reading a line at a time until the end-of-file condition is reached, modifying the current line of text, and printing the modified line to standard output.

Here are several examples. First is a program that changes all instances of "June" to "July" in a bunch of text files and creates backup copies of the files: perl -p -i.bak -e "s/June/July/g;" *.txt. Here's another one-liner that goes through some text files and prints only the first occurrence of any given line of text: perl -ne 'print unless $s($_)++' *.txt. Here's something that reads a file and increments the number that appears in the first column of text: perl -p -e "s/^(\S+)/$1+1/e" afile. Lastly, an example that reads a file and reverses the order of words in each line: perl -ne 'print join("",reverse split(/(\S+)/));' afile.

These examples all run from the command line. Obviously, it is possible to write scripts and store them in files. Space precludes presenting more than a handful of lines of code in this article. The examples here may seem cryptic, but are not really difficult, once you become familiar with a few basic concepts and with the regular-expression syntax that permeates many UNIX utilities (note, however, that Perl provides many unique extensions to the usual regular-expression syntax).

Perl has some interesting conveniences. For example, it allows for multiple assignments. The following statement assigns to the variables a, b, and c, the values 1, 2, and 3, respectively: ($a, $b, $c) = (1, 2, 3). Here's a statement that swaps the values of a and b: ($a, $b) = ($b, $a).

Certain single-character variables have special meaning--for example, $_ contains the current line that is an implicit argument to many functions, such as chop, split, and join. Another single-character variable, $$, contains the current process ID. There are two variables that contain the real and effective uids of a process. If you have a Perl script running as root, and want to change its effective uid so it runs with the more-restricted privileges of a particular user, you simply write $> = $<.

Randal Schwartz, author of a popular tutorial entitled Learning Perl (O'Reilly & Associates, 1993), writes: "Sometimes Perl looks like line noise, but to the seasoned Perl programmer, it looks like checksummed line-noise with a mission in life." For code that is truly cryptic, in multiple senses of the word, see Example 1. This example, from the cypherpunks mailing list, implements RSA encryption and decryption in just three lines of Perl. (Note that the U.S. government considers the machine-readable version of this code to be a munition, subject to export control.) But to be fair, you can probably find as many examples of obfuscated code in C as in Perl.

These examples should give you a taste of what Perl programs are like. To learn it for real, you need access to a Perl interpreter, and books such as Programming Perl, by Larry Wall and Randal Schwartz (O'Reilly & Associates, 1990), or the aforementioned Schwartz book.

Efficiency and Security

Perl is a pseudocompiled language, which means it is compiled into an intermediate binary representation at run time that is then immediately interpreted. Unlike some semi-compiled languages, Perl allows your programs to construct pieces of code and evaluate them at run time, using the eval function. This is useful, for example, when you want to specify a custom regular-expression search pattern that depends on values not known until run time. This can result in a program that runs faster, even with the additional overhead of the run-time compilation.

Perl's built-in pattern-matching and textual manipulation capabilities often outperform dedicated C programs, because Wall spent a lot of time optimizing the regular-expression code. Recently posted benchmarks show a simple Perl 4 program to be more than five times faster than the equivalent in awk, nawk, mawk, or gawk. (Note that these benchmarks assume copious amounts of RAM.) The version written in Perl 5 was almost ten times faster. Because Perl is semicompiled, it can often run more than 100 times faster than interpreted languages such as Tcl and sh. In most cases, the Perl compiler does not add appreciable delay to execution; on a Sparc 20, a 10,000-line Perl program takes about a second to compile.

Regarding security, Perl 4 offers an alternative executable version of the language, called "taintperl," which uses a dataflow-tracing mechanism to determine which data is derived from insecure sources, and thus lets your program prevent dangerous operations before they happen. In Perl 5, this facility is incorporated into the standard executable, enabled via the -T flag.

Culture Clash

A favorite diversion in comp.lang.perl.misc is to figure out ways to print out "Just Another Perl Hacker" (JAPH) in one's signature. Many different one-liners can output JAPH. The variety of alternatives provides a concrete illustration of the slogan "There's More Than One Way To Do It." The most prolific author of JAPH one-liners is Randal Schwartz (merlyn@stonehenge.com). Here's one: $_ = "Jvtu Aopuifs Pfsm Hbdlfs,"; y/a-z/za-y/; print;.

The term "hacker," of course, is used in the original sense of the word--someone who is expert at quickly divining the inner nature of complex systems from the most mundane details, and who can use this intuition to make these systems behave in transcendent or unexpected ways. The mass media confuse this term with "cracker," which means someone who breaks into computer systems, possibly for malicious or illegal purposes.

Perl is a superb tool for both sysadmins and hackers: Both are often concerned with the same things. A constant concern of both has been password maintenance. Schwartz's Learning Perl introduces the Perl language in Chapter 1 by writing a simple program to guess a secret word. Likewise, the longest program in Programming Perl, coauthored by Schwartz, is a program called "passwd," that tries to make sure a user's password is not easily crackable.

Schwartz, who worked for five years as a sysadmin for a multibillion-dollar chip manufacturer, was recently convicted of three felony counts of computer crime for running a password-cracking program on a machine he had an account on (and had served as sysadmin for) at that chip manufacturer. You can get more details on this case by sending a blank e-mail message to fund@stonehenge.com.

The details of Schwartz's case are not germane to this article, but one general conclusion we can draw from his case is that a culture clash is in the offing, as Perl moves away from the free-form hacker culture from which it sprung, and as mission-critical uses are found for this language in straight-laced corporate America.

Conclusion

Perl runs counter to the trend to which the rest of us are becoming increasingly accustomed, in which Windows squeezes out all alternative platforms, and in which most apps are available only for Windows. Perl is refreshingly secular and democratic; it runs on DOS, Macintosh, Amiga, VMS, OS/2, Windows NT, Windows 95, and various flavors of UNIX. It has spread because of its hybrid, flexible, and nondenominational design, and because the source code is freely available, under the terms of Larry Wall's "Artistic License."

Because a single person is keeper of the syntax, the code you write on one platform is highly portable to all the rest. The members of the Perl community who have ported the language to the PC and Mac platform have gone to great lengths to let Perl programs access the underlying operating environment in useful ways.

Another reason for Perl's popularity is the wealth of online resources available. The most valuable resources are human, the wizards who populate comp.lang.perl and give freely of their time to apprentices. In addition to Larry Wall and Randal Schwartz, there's Tom Christensen, Stephen Potter, Mike Stok, and maybe a hundred other people across the world. There are also a number of ftp sites devoted to Perl resources, such as at the University of Florida and metronet.com. You can find pointers to some of these resources at http://www.ddj.com/articles/1995/9562/9562a/perl.htm. You can also get this list via e-mail from ray@valdes.com.

For More Information:

Software Engineering with Perl: Prototyping & Toolsmithing for Better Software
Carl Dichter and Mark Pease
Prentice Hall, 1994
ISBN 0-130-16965-X

Programming Perl
Larry Wall and Randal L. Schwartz
O'Reilly & Associates, 1991
ISBN 0-93717-564-1

Learning Perl
Randal L. Schwartz
O'Reilly & Associates, 1993
ISBN 1-56592-042-2

60 Minute Guide to CGI Programming with Perl
Robert Farrell
IDG Books, 1995
ISBN 1-56884-780-7

Teach Yourself Perl in 21 Days
David Till
Sams Publishing, 1995
ISBN 0-67230-586-0

DDJ