Just Regular

The Perl Journal, August 2004


Regular expressions are the reason I learned Perl. The first time I saw an s// operator do its job, I was hooked. Given the mountain of data manipulation I was facing at the time, that one little operator sufficed to draw me into the whole world of Perl. Even now, I love regexes. You might even say I love them a little too much. Looking back at some of my code, I find tools I've written that amount to little more than frameworks wrapped around my regexes. They're just data-delivery systems designed to pass various things through the real workhorses: the regular expressions that filter and transform the data. But I guess there's nothing really wrong with that.

They're hard to resist because of their conciseness. A wealth of very subtle conditional activity can be contained in a very small syntactic space. Nowhere but in a regex do you get so much bang-per-character. Under the right circumstances, one-line pattern matches can do the work of 20 if..else conditions.

Regular expressions, like Perl itself, have deep ties to the past. They give UNIX graybeards a warm, fuzzy feeling, and for those of us who didn't come from UNIX, they just make us feel clever. Quick test: Can you use the phrase "zero-width negative look-ahead assertion" convincingly in a sentence? (Talking like this will either earn you the respect of your peers or make them not want to sit with you at lunch, depending on where you work.)
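For anyone who wants a pattern to go with the phrase, here is a minimal sketch of a zero-width negative look-ahead assertion, written (?!...) in Perl. The pattern, the sample strings, and the helper name has_foo_without_bar are all made up for illustration.

```perl
use strict;
use warnings;

# Illustrative pattern: match "foo" only when it is NOT immediately
# followed by "bar". The look-ahead checks what comes next without
# consuming any characters, which is why it is called zero-width.
my $not_foobar = qr/foo(?!bar)/;

# Hypothetical helper: returns 1 if the text contains a "foo" that is
# not followed by "bar", and 0 otherwise.
sub has_foo_without_bar {
    my ($text) = @_;
    return $text =~ $not_foobar ? 1 : 0;
}

for my $text ('foobar', 'foobaz', 'food') {
    printf "%-6s => %s\n", $text,
        has_foo_without_bar($text) ? 'match' : 'no match';
}
```

Of the three samples, only 'foobar' is rejected, because its one and only "foo" is followed directly by "bar".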

But even regex junkies should remember that conciseness does not equal efficiency. Perl's regular expression engine does things under the hood that aren't obvious, and these can be pitfalls for the unwary. Too much backtracking, for instance, can make matching time grow exponentially with the length of the input. It's important to remember that a regular expression might not always be the best tool for the job. Examine your options, and don't get too dependent on any one technique.
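As a rough illustration of the backtracking pitfall, the sketch below compares two patterns that accept exactly the same strings. The nested form gives the engine many ways to divide a run of a's among repetitions of the group, and when the overall match fails, a backtracking engine may have to try them before giving up. How much work a particular Perl version really does here depends on its internal optimizations, so the example makes no timing claims. The patterns and the helper name matches are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Two illustrative patterns that accept the same strings:
# one or more "a" characters and nothing else.
my $nested = qr/^(?:a+)+$/;   # a quantifier inside a quantifier
my $flat   = qr/^a+$/;        # a single quantifier, only one way to match

# Hypothetical helper: returns 1 if the text matches the pattern, else 0.
sub matches {
    my ($re, $text) = @_;
    return $text =~ $re ? 1 : 0;
}

# The second sample is a run of a's spoiled by a trailing "b", the kind
# of near-miss that forces a backtracking engine to reconsider choices.
for my $text ('aaaa', ('a' x 12) . 'b') {
    printf "%-14s nested=%d flat=%d\n",
        $text, matches($nested, $text), matches($flat, $text);
}
```

Both patterns give the same answers, so the flat form is the safer choice whenever the two are interchangeable.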

This month, Jeff "japhy" Pinyan shows off some regex tricks that will let you maximize the power of your patterns without tipping the scales toward inefficiency. He'll show you how to use delayed-execution assertions to build some nifty self-constructing and recursive regexes, and introduce you to a whole stable of regex variables that I'll bet you didn't know about.

Regular expressions have been both praised as a vitally important tool and criticized as overly abstruse, arcane, and dense. Some see them as a deterrent to learning Perl. That's a little unfair, since you can do a lot in Perl without ever writing a regex, but it's true that regexes have become more closely associated with Perl than with other languages, even though some of those languages have regular expression capabilities that rival those of Perl. In any case, what's true of Perl is true of regular expressions: You can learn as little or as much as you need to do the job in front of you, without digging into a bloated API reference. And for that I'm thankful.

Kevin Carlson
Executive Editor
The Perl Journal