Sydney S. Weinstein, CDP, CCP is a consultant, columnist, author and president of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail on the Internet/Usenet mailbox syd@DSI.COM (dsinc!syd for those that cannot do Internet addressing).
Administrivia
As I have said in prior columns, I am willing to forward a list of Usenet sites near you for access to Usenet, Netnews and E-mail. However, I can only provide this service to those who send a self-addressed stamped envelope. Also, please include your area code in the request. An area code gives me a greater chance of finding a site that might be a local call for you.
Note, however, I do not contact these sites for permission. All I am doing is extracting the names and contact information from the Usenet mapping information and sending you that printout. It is up to you to contact the sites listed in the maps. Remember, they are doing you a favor if they let you connect.
Pearl Of The Month: Perl
One of the most respected freely distributed software authors on the net is Larry Wall of JPL-NASA. He has written many software tools, including the popular netnews reader RN, the source language patching program Patch, and Dist, a software configuration and distribution support toolset. His latest large effort has been Perl: Practical Extraction and Report Language, or Pathologically Eclectic Rubbish Lister. Perl was first released as version 2. This review is of his new release, version 3.
To quote the manual page: "Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). It combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC PLUS.)"
That paragraph was true for version 2, but is an understatement for version 3. Perl can now handle binary files, network sockets, and even dbm database files with ease.
Perl runs counter to the typical UNIX tool philosophy of "do one item in a tool and hook many tools together with shell scripts and pipes." Perl allows you to combine all the sections together in one efficient script. Perl's two great claims to fame in my opinion are its ability to provide the right set of features for writing useful tools for systems, and its interpretative nature to allow for easy debugging and development.
Installation
Perl is not a small program, so if snarfed off the network it comes in many parts. Also, as of this writing in mid-December (now you know: these columns have about a four-month lead time), six patches have been issued to Perl, making the current version Release 3.0 Patchlevel 6. After unpacking all the parts and applying the six patch files (using Larry's patch utility, of course), the instructions say to run a shell script called Configure. It's worth obtaining Perl just to see how this is done. Configure, a giant shell script written by the Perl program metaconfig from Larry's Dist package, analyzes your system and determines what features of Perl your system can support, where things are located on your system, and a great deal of additional information needed to make Perl install correctly on your system automatically. If only other packages used this method. (Note: Elm does.)
After Configure is run, a special version of the make script automatically figures the dependencies for each C file as you have configured them and adapts the Makefile. Then the system is compiled three times: once for normal Perl, once for a version called taintperl, and lastly for a setuidperl version. The tainted versions prevent any command line argument, environment variable, or input, as well as any result of operations on these values, from being used in subshells, system calls, or for modifying files or directories. This is used for setuid scripts.
Now, another thing I wish more authors provided: Perl has a rather complete regression test suite to validate Perl's configuration and compilation. This test suite may not catch all problems, but it goes a long way towards providing confidence that a package as large as this one was configured properly and compiled without compiler induced errors. Perl runs the regression test automatically after the system has been built, and performs over 850 separate tests.
Features
As a combination of the shell, C, sed and awk, Perl has a syntax close to C, with most of its operators, plus the ability to process variables and run subprocesses like the shell, perform pattern matching and substitution like sed, and generate reports like awk. A couple of the more interesting features include:
Associative arrays: In addition to scalar variables (single numbers or text strings) and normal arrays (vectors of numbers or strings), Perl also provides an array concept called an associative array. This array is a set of key and value tuples. The array index, called a key, is itself just a number or text string. The difference between this and an array where the index is an enumerated type is that the index is dynamic and can include any values desired at run time. Thus you could say
    $balls{'red'} = 7;
    $balls{'green'} = 34;
    $balls{'blue'} = 12;
    while (($color, $number) = each %balls) {
        print "I have $number $color balls\n";
    }
Variables are preceded by a $ and the { } array indices are for associative arrays. Standard arrays use [] for their indices. The % prefix indicates the entire associative array. Thus this program initializes the associative array and then loops using the each function to return each tuple of key and value. These tuples are assigned to the scalar variables $color and $number and then used in the print statement. An easier way to initialize the array would be to use the list construct of Perl:
    %balls = ( 'red', 7, 'green', 34, 'blue', 12 );
Open function: Perl's open call can also open pipes to or from other processes. Thus Perl can start other processes and either read their results (very useful for letting Perl figure out the SQL to run and then running SQL to obtain the data) or pass Perl's output to another program (such as the print spooler). Of course, Perl can read, write, and append to files with the open function.
Formats: Perl supports a BASIC-like format option for output files in addition to the print and printf constructs. The Perl program in Listing 1 converts UNIX System V type df (disk free) listings into the BSD type of report.
Listing 1 shows several of the features of Perl, as well as demonstrating the format capabilities.
The first three lines make sure that Perl is running the program, allowing a plain "executable" file to automatically be a Perl script. If the system supports the #! notation, then the kernel will spawn Perl to handle this file automatically, and not the shell. Otherwise the second line causes the shell to execute Perl on the script. When Perl does see the script, it treats the first line as a comment, and the second and third lines as a valid Perl statement. Since the variable $running_via_sh is zero (it isn't even defined yet), the eval statement is not executed and Perl just continues on in the script.
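Since the listing itself is not reproduced in this text, here is a minimal sketch of the kind of three-line header described above. It is an illustration of the idiom, not the listing's actual lines. The path /usr/bin/perl is an assumption; use wherever Perl lives on your system.

```perl
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if $running_via_sh;

# Perl reaches this point directly: $running_via_sh was never set,
# so the condition is false and the eval above is skipped.
# A shell reading this file would instead run the eval line as a
# command and exec Perl before ever reaching the "if".
print "Perl is running this script\n";
```

The command string is single-quoted so that Perl leaves the shell's $0 and "$@" alone; only a shell would ever expand them.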
@ARGV is the array (@ is the symbol for an entire regular array) of the arguments passed to the Perl script, not counting the name of the script. Thus the join line makes a text string of all the arguments separated by spaces. This string is used in the open call to the df process, causing it to output only the requested file systems (if arguments are given), or all the file systems (if no arguments are given). Note that the shell || construct is honored, so an error message can be output if the open of the df process fails.
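The join and the piped open can be sketched as follows. This is an illustration under stated assumptions, not the listing's code: the argument values are made up, the filehandle name CMD is hypothetical, and echo stands in for df so that the sketch runs on a system without a System V df.

```perl
# Hypothetical stand-in for the command line arguments; in a real
# script Perl fills in @ARGV itself.
@ARGV = ('/usr', '/tmp');

# Make one space-separated string out of all the arguments.
$args = join(' ', @ARGV);

# A trailing | in the name tells open to start the command and let
# the script read its output. echo stands in for df here.
open(CMD, "echo df $args |") || die "Cannot start command: $!\n";

# Read the command's output one line at a time.
while (<CMD>) {
    $seen .= $_;        # keep a copy of everything read
    print "got: $_";
}
close(CMD);
```

With no arguments, $args is an empty string and the command runs with no file systems named, which is how one script can serve both cases.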
The formats could appear anywhere in the script. The default top of page format for the STDOUT file is called top, but that association can easily be changed. In this case the top format is used to provide column headers. Note that a format continues across lines until a line with just a period is encountered, so a single format can output multiple lines. Both top of page and file formats may also include variable substitutions. The STDOUT format is used for "writes" to the standard output file. Variable substitution fields come in three types, <, |, and >, for left justify, center, and right justify respectively. All variable substitution fields start with a @ character and take as many spaces as are desired. Each line with @s in it is immediately followed by a line listing the variables to print on that line. It is not necessary to space the variables as I did, but I think the spacing improves readability.
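A small sketch of a top of page format and a line format, using all three justification types, might look like this. The column layout, variable names, and sample values are assumptions made for illustration; they are not taken from the listing.

```perl
# Top of page format: plain text used as column headers.
format top =
Filesystem    Mounted on     avail
.

# Line format: a picture line followed by the variables to print.
# @<<< is left justified, @||| is centered, @>>> is right justified.
format STDOUT =
@<<<<<<<<<<<< @||||||||| @>>>>>>>>
$name,        $mount,    $avail
.

# Name the top of page format explicitly, so the sketch does not
# depend on which default name a given release of Perl uses.
$^ = 'top';

# Hypothetical sample values.
$name = '/dev/root'; $mount = '/';    $avail = 4200;
write;
$name = '/dev/usr';  $mount = '/usr'; $avail = 15000;
write;
```

Each write fills the picture line from the current values of the variables, so the loop that reads the data only has to set the variables and call write.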
The while statement loops over the lines read from the Df file. The special symbols <> mean read a line from the file. The if block uses regular expression matching on the line just read and only performs the then clause of the if when the line contains the text string total blocks. In the else clause the s commands are string substitutions, again based on regular expression matching. These commands work on the input line by default; however, the =~ operator is used a couple of lines later to specify that the substitution should be performed on the $name variable instead of the input line. The write function is used to output a line using the format specified earlier.
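The difference between a substitution on the default input line and one aimed at a named variable with =~ can be sketched as below. The input lines are invented for the example and do not reproduce real df output, and the %size array is a hypothetical addition used only to show the result.

```perl
# Hypothetical input lines; real df output differs between systems.
@lines = (
    "/dev/dsk/0s1   1200 blocks\n",
    "/dev/dsk/0s2    800 blocks\n",
    "2000 total blocks\n",
);

foreach $line (@lines) {
    $_ = $line;     # $_ is the default target of matches and s commands
    chop;           # remove the trailing newline
    if (/total blocks/) {
        $total = $_;                     # keep the summary line
    } else {
        s/ blocks$//;                    # substitution on $_
        ($name, $count) = split(' ');    # split $_ on white space
        $name =~ s#^/dev/dsk/##;         # substitution on $name instead
        $size{$name} = $count;
    }
}

print "0s1 has $size{'0s1'} blocks\n";
print "total line: $total\n";
```

The s command accepts delimiters other than /, which keeps a pattern full of slashes such as ^/dev/dsk/ readable.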
Finally, the last if block uses the special variable syntax $#name, which references the subscript of the highest element of the array. Since arrays in this Perl script start at zero (the default), a less than zero check tells whether any arguments were passed to the script. As a result of this test, the total line is only printed if no arguments were passed and the df is for all file systems.
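As a brief illustration of the $#name test, the sketch below applies it to two made-up arrays rather than to the real argument list. The array names and contents are hypothetical.

```perl
@none = ();                 # no arguments given
@two  = ('/usr', '/tmp');   # two hypothetical arguments

print "last subscript of empty list: $#none\n";   # prints -1
print "last subscript of two items:  $#two\n";    # prints 1

# The same kind of test the column describes, applied to both cases.
foreach $last ($#none, $#two) {
    if ($last < 0) {
        print "no arguments: print the total line\n";
    } else {
        print "arguments given: skip the total line\n";
    }
}
```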
Perl scripts are also easy to debug, in part because of the debugger embedded within Perl. Adding a -d argument to the invocation line tells Perl to run the script in debug mode. Debug mode supports breakpoints, single stepping, program browsing, and "immediate mode" execution of any valid Perl statement. Thus the contents of variables can be examined or changed at any time.
I didn't even describe directory processing, BSD socket access, subroutines or much on regular expression processing. Perl does come with a complete reference manual, although a tutorial manual is not provided.
Perl, of course, is most useful on UNIX (or Xenix) systems. However, restricted portions of Perl have been compiled on VMS and on MS-DOS. Perl has gotten so popular there is now a Usenet news group comp.lang.perl. But Perl is not a small program, and its load size causes a sizeable overhead at startup. Of course, for longer scripts this delay is not a problem, but the overhead is enough to keep Perl from replacing the shell for all scripts.
There's More
comp.sources.unix was active for a short while and has again gone quiet. During its active time, Rich Salz, the moderator of comp.sources.unix, did provide some unusual postings.
From Harold Walters at Oklahoma State University came a set of 109 functions called xxalloc providing dynamic array manipulation in one, two and three dimensions. xxalloc includes routines for allocating, initializing, printing, renumbering and freeing both arrays of structures and arrays of simple types. An "edge-vector" approach is used for two- and three-dimensional arrays to allow for development of reusable subroutine libraries without regard to some "maximum" dimension. The package includes installation instructions, a test program to exercise most of the package, and a manual page. It has been tested on System V, BSD and MS-DOS machines and is available as Volume 20, Issue 28.
Chin Huang has written a program to automatically generate C function prototypes and variable declarations from C language source code. It differs from other similar programs in that it doesn't parse the function body. This package needs FLEX, which is also available from the archive sites. Cproto is Volume 20, Issue 29.
For those still using curses instead of bit-mapped screens, John Lupien at AT&T has published a curses-based digital clock for VT100 and compatibles. It's a small, simple program and is Volume 20, Issue 45.
Plum-Hall has placed into the public domain a simple set of benchmarks intended to give programmers timing information about common C operations. They were designed to be short enough to type while browsing at trade shows, and are protected from overly aggressive compiler optimizations. The plumbenchmarks are Volume 20, Issue 47.
David Curry at NASA Ames Research Center has posted Index, a program to allow you to maintain multiple databases of textual information, each with a different format. For each database Index allows insertion, deletion, edits on existing entries, searches using full regular expressions, restricted searches, pattern matching and arbitrary formatting. Index is Volume 20, Issues 56 and 57.
Richard O'Rourke of Microplex Systems, Ltd. provided a pegboard program which keeps track of who is in and out of the office, and when they are due back. The program is designed for Xenix, but should work on other flavors of UNIX and is in Volume 20, Issue 76.
For those running Xenix or UNIX V3.2.1, Eric Raymond has provided an editor/minilanguage to rebind the keyboard on the console (Volume 20, Issues 81 and 83). It is useful for Emacs users and for changing the virtual terminal selector keys.
One of the stranger programs in comp.sources.unix in that last spurt of postings was the "Reactive Keyboard". Mark James of the University of Calgary has augmented a general-purpose command line editor with predictive text generation. The program interfaces with a standard shell, allows simple editing of input lines, and will predict input lines based on previous input. It's weird to type an edit followed by a compile and have the command processor provide the file name for the compile, and then after you edit the file again, have it predict another compile. The Reactive Keyboard is Volume 20, Issues 29-32, but it requires BSD-style ptys to work properly.
Chip Salzenberg of AT-Engineering has posted his latest version of Deliver, a program which delivers electronic mail once it has arrived at a given machine. Deliver extends inflexible E-mail delivery systems to allow complete control over mail delivery through the use of delivery files. Delivery files are shell scripts which are executed during message delivery. These scripts control which people or programs get each E-mail message. Look for Volume 20, Issues 23-27.
Pcomm, v1.2, is a UNIX telecommunications program made to look like Datastorm Technologies' ProComm for MS-DOS. New in v1.2 are BSD support, auto-login scripts (using shell scripts), embedded external file transfer programs, and faster operation via I/O buffering. Emmet Gray from the US Army submitted this as Volume 20, Issues 67-75.
For those stuck with the old troff, and wanting to deal with printers other than the Wang C/A/T phototypesetter, Chris Lewis of Elegant Communications, Inc. has provided psroff. It converts the output of standard troff to PostScript or ditroff format, and makes a partial attempt at supporting the HP-LJ family of printers. Several patches are also available to further enhance this package, which is Volume 20, Issues 33-38.
And Still More: comp.sources.misc
Although Rich Salz has been intermittent with postings, Brandon Allbery, the moderator of comp.sources.misc, has been providing plenty to write about.
In Volume 8, Issue 99, Paul Blackburn of the Open Software Foundation provided a script for keeping track of changes to files. It is useful to system administrators who need to detect unwanted or unexpected changes to files.
A UNIX make work-alike, Make v1.5, was posted as Volume 8, Issues 104-106 by Greg Yachuk of Informix Software. This make is very close to the make provided on Sun systems and runs under MS-DOS or UNIX. New features include the -k, -S and -q options, support for the $(MAKE) macro, and several bug fixes.
A 16-bit MS-DOS compress is also available. Most versions for MS-DOS cannot handle 16-bit compression tables (the default on most UNIX systems). This version can, and it is based on the Compress 4.0 UNIX sources. It requires about 400K to run. It was posted as Volume 9, Issue 5 by Doug Graham.
Steve Tynor has his head in the clouds to provide us with FPLAN, a flight planning program intended for use in general aviation. It reads a file consisting of departure and destination airports, navigation aids, intermediate checkpoints, fuel consumption rates and winds aloft, and produces a flight plan with wind-corrected headings, fuel consumption for each leg, and VOR fixes for each checkpoint (Volume 9, Issues 11-16).
Last is popi, a program to perform interactive digital image transformations. Based on the program described in the book Beyond Photography: The Digital Darkroom by Gerard J. Holzmann, this implementation by Rich Burridge consists of an interactive previewer and a digital matrix transformation system. Popi can perform transformations on arbitrary images in grey scale. A sample image is included to show how to invert the grey scale (make it a negative), frame it (crop), and solarize it (fancy signal processing). Popi includes a PostScript printing facility and modules to allow it to work for Amiga, Apollo, Atari, IBM PC, Kermit, MGR, NeWS, SunView, X11 and XView systems. The nine parts are Volume 9, Issues 47-55.