A World of Text

The Perl Journal, September 2004


Just like in life, being a good citizen in the programming world requires a little work. You have to educate yourself, and find a way to bury differences so that you can unite with others in a common cause. Granted, in the programming world, these causes tend not to be of life-or-death importance (unless you happen to be coding, say, a heart monitor), but even so, making a little effort toward good programmer citizenship can help to spread some good will around the world.

Take Unicode. In the late 1980s, nearly simultaneous discussions at Xerox and Apple about the complexities of multilingual text encoding resulted, eventually, in an industry-wide collaboration on a universal standard for representing the many script systems used around the world. That standard became known as Unicode. The Xerox folks were working on a way to extend the Chinese character set for their customers in Asia, while at Apple, development of Apple File Exchange inevitably led to discussions of the same problems the Xerox team was wrestling with. Collaboration was logical, and over time, nearly every major operating-system vendor joined the effort. In this case, what was good for international communication was also good for international business.

Thankfully, the Unicode standard didn't just extend multilingual cross-compatibility to those who had enough money to be a potential customer base for the computer industry. The open and universal nature of the standard has allowed it to be extended to encompass any human language. So it truly can be a standard for the whole world.

Perl now has pretty robust support for Unicode, so there's little excuse for writing apps that aren't Unicode-aware. (See Simon Cozens's article on page 16.) So why don't more of us write Unicode-aware code? I suspect that some of us still suffer from a bit of monolingual myopia. Despite years of Spanish classes, I myself am shamefully monolingual. But that doesn't explain it all.

Perhaps this is just one area where Perl's quick-and-dirty simplicity can be a drawback. We often begin building a tool in Perl as a simple, quick solution to local and specific problems, and only later realize that our solution has wider applicability. We don't always then return to the design phase and rearchitect our tool as a proper application to serve the needs of that wider audience. Yet, this type of redesign is a prerequisite for widespread code reuse, which both accelerates the growth of a language and extends its usefulness.

Happily, Perl makes much of the work of supporting Unicode trivial, and occasionally unnecessary altogether. Most of the string-processing functions in Perl now work just fine on Unicode text. So check out your current crop of Perl apps. You may find that you are already Unicode-compliant and didn't even know it.
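To make that concrete, here is a minimal sketch of a few built-in string functions applied to non-ASCII text. The sample string (three Greek letters) and the variable names are illustrative assumptions of mine, not taken from any particular application, and the sketch assumes a Perl with character-string support (5.8 or later).

```perl
use strict;
use warnings;

# Greek "alpha beta gamma", written with \x{...} escapes so the example
# does not depend on the encoding of this source file.
my $text = "\x{3b1}\x{3b2}\x{3b3}";

my $length   = length $text;               # counts characters, not bytes
my $upper    = uc $text;                   # case mapping follows Unicode rules
my $reversed = reverse $text;              # scalar context: reverses the characters
my $is_word  = $text =~ /^\w+$/ ? 1 : 0;   # \w matches Unicode letters

# Encode characters as UTF-8 on output to avoid "Wide character" warnings.
binmode STDOUT, ':encoding(UTF-8)';
print "$text has $length characters; uppercased: $upper; reversed: $reversed\n";
```

None of these calls needed a Unicode-specific variant. The only Unicode-aware step is declaring the output encoding, which is typically where an existing script needs attention.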

Kevin Carlson
Executive Editor
The Perl Journal