PROGRAMMER'S BOOKSHELF

Writing Localized Software

Charles Pfefferkorn

Charlie is an independent consultant and the chair of the Software Forum's International Software group. He can be contacted at charlie@crystal-media.com or 73234.2154@compuserve.com.

International markets for software are big and getting bigger. Companies like Microsoft, in fact, earn over half of their revenues outside the United States. At one time, international markets were happy receiving the previous version of newly released U.S. software. Today, however, these markets want the latest version now. The participants in these markets read the most recent editions of U.S. computer magazines and actively search the Internet for the most up-to-date information. As a result, U.S. companies are forced to reduce the time between U.S. and localized release dates, and many are releasing them simultaneously. To participate in the international market, you need to understand both its business and technical aspects.

The three books I'll examine here will help you design and develop international software. While there is some overlap, they focus on different aspects of the process. Software Internationalization and Localization: An Introduction provides an overview, covering several platforms and providing information about International Standards and various business issues. Understanding Japanese Information Processing focuses on processing Japanese text. It includes C code for converting between various Japanese character-set encoding methods and special functions for repairing Japanese text damaged by e-mail programs and newsgroup readers. It also provides access to online information about Japanese. The third book is Developing International Software for Windows 95 and Windows NT. It includes code samples, tables, figures, checklists, and troubleshooting guides. All three books provide glossaries, references to additional documents, and numerous appendices.

Software Localization

Software Internationalization and Localization, by Emmanuel Uren, Robert Howard, and Tiziana Perinotti, discusses the creation of products for international markets. The book lists over 40 separate engineering issues, including different languages, character sets, writing systems, currencies, currency formats, measurement systems, number formats, calendars, date formats, standards, legal systems, and cultures.

Western European languages, for instance, use diacritical characters and additional non-English letters. Eastern European languages use Cyrillic script and Greek letters. Asian languages use thousands of ideographic characters derived from traditional Chinese characters. Arabic and Hebrew are written right to left, so text becomes bidirectional when English words are included. Different languages have different capitalization, hyphenation, spelling, and grammar rules, and call for different typography. Even number formats vary: In the U.S., the decimal separator is a period and the thousands separator is a comma. In most Western European nations, the roles are reversed. A less obvious difference is numeric rounding rules. Even colors, symbols, and sounds (like those of emergency vehicles and telephones) vary from culture to culture.
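
The separator difference is easy to sketch in C. The following is only an illustration, not code from the book: the NumberFormat structure and the format_hundredths() function are hypothetical names, and the routine assumes groups of three digits and exactly two decimal places. The point is that the separators are data supplied per locale rather than constants buried in the formatting code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one locale's number format. */
typedef struct {
    char decimal_sep;    /* '.' in the U.S., ',' in much of Western Europe */
    char thousands_sep;  /* ',' in the U.S., '.' in much of Western Europe */
} NumberFormat;

/*
 * Format a non-negative amount given in hundredths (123456789 means
 * 1234567.89) with two decimal places. Returns the number of characters
 * written, or -1 if the output buffer is too small.
 */
int format_hundredths(unsigned long value, const NumberFormat *fmt,
                      char *out, size_t out_size)
{
    char digits[32];
    char tmp[64];
    size_t n, i, pos = 0;

    assert(fmt != NULL && out != NULL);

    n = (size_t)sprintf(digits, "%lu", value / 100);   /* integer part */
    for (i = 0; i < n; i++) {
        tmp[pos++] = digits[i];
        /* Add a group separator when a multiple of three digits remains. */
        if ((n - i - 1) % 3 == 0 && i + 1 < n)
            tmp[pos++] = fmt->thousands_sep;
    }
    tmp[pos++] = fmt->decimal_sep;
    tmp[pos++] = (char)('0' + (value % 100) / 10);
    tmp[pos++] = (char)('0' + value % 10);
    tmp[pos] = '\0';

    if (pos + 1 > out_size)
        return -1;
    memcpy(out, tmp, pos + 1);
    return (int)pos;
}
```

Called with the separators '.' and ',' the value 123456789 is written as 1,234,567.89; with the two swapped it is written as 1.234.567,89.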

The book also describes issues associated with translation. Translated text is often longer than the original. Words that are different in one language may translate into the same word in another.

Software Internationalization focuses on using IBM PCs with either DOS or Windows 3.x, but the book also provides UNIX and Macintosh information. The technical information is descriptive and references are included; source code, however, is not, and some of the technical information is becoming dated.

The book also includes a chapter on non-Western European languages and a chapter on International Standards and International Standards Organizations.

Software Internationalization concludes with a discussion of international business issues: development models, business relationships, distribution channels, legal issues, logistics, government regulations, custom duties and taxes, repatriating funds, and the cost of doing international business. There is also a chapter on developing products in Europe and marketing them in the United States.

Japanese Information Processing

Japanese text is written using four types of characters: romaji (Roman characters), hiragana, katakana, and kanji. Romaji includes the standard English alphabet and numerals. Hiragana and katakana are syllabaries for Japanese sounds. Hiragana is used for grammatical words, inflectional endings for verbs and adjectives, and some nouns. Katakana is used for words of foreign origin and for emphasis. Kanji includes the characters borrowed from the Chinese over 1500 years ago.

In Understanding Japanese Information Processing, Ken Lunde carefully describes the evolution of Japanese character-set standards and their relationship to ISO character-set standards. The primary Japanese character-set standards are JIS X 0208-1990 and JIS X 0212-1990. JIS X 0208-1990 contains 6879 characters, of which 6355 are kanji, divided into two groups: 2965 in Level 1 and 3390 in Level 2. JIS X 0212-1990 contains 6067 characters, of which 5801 are supplemental kanji.

Lunde also describes other Asian language standards and international character sets including ISO 10646 and its subset, Unicode. In Unicode, 121,403 characters of Chinese origin (Chinese, Japanese, and Korean) are mapped into 20,902 unique characters using Han Unification rules.

Separate from, but related to, the character sets are the encoding methods. The three major Japanese encoding methods are JIS, Shift-JIS, and EUC. JIS is a modal system for encoding various character sets, including JIS X 0208-1990 and JIS X 0212-1990. It is used primarily for passing information between computing systems. Shift-JIS is a nonmodal modification developed by Microsoft and used by many other platforms, including Japanese PCs and KanjiTalk (the Japanese Macintosh OS). Shift-JIS supports faster internal processing, but does not support the supplemental kanji of JIS X 0212-1990. EUC (Extended UNIX Code) is the internal coding system used by most UNIX workstations and is defined by ISO 2022-1993. The appendices of Lunde's book also include information about Japanese corporate character sets and encoding methods.
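
To make the relationship among the three encodings concrete, here is a minimal sketch of converting a single JIS X 0208 two-byte code to EUC and to Shift-JIS, and back from Shift-JIS. It follows the widely published byte arithmetic for these conversions; the function names are my own, and the code assumes the input is a valid two-byte code with both bytes in the range 0x21 to 0x7E. It does not handle escape sequences, half-width katakana, or JIS X 0212.

```c
#include <assert.h>
#include <stddef.h>

/* JIS to EUC: set the high bit of each byte. */
void jis_to_euc(unsigned char *b1, unsigned char *b2)
{
    assert(b1 != NULL && b2 != NULL);
    *b1 |= 0x80;
    *b2 |= 0x80;
}

/* JIS to Shift-JIS: two JIS rows are folded into one Shift-JIS lead byte. */
void jis_to_sjis(unsigned char *b1, unsigned char *b2)
{
    int c1, c2, row_offset, cell_offset;

    assert(b1 != NULL && b2 != NULL);
    c1 = *b1;
    c2 = *b2;
    row_offset = (c1 < 0x5F) ? 0x70 : 0xB0;
    if (c1 % 2)                       /* odd row: lower half of trail range */
        cell_offset = (c2 > 0x5F) ? 0x20 : 0x1F;   /* skips over 0x7F */
    else                              /* even row: upper half of trail range */
        cell_offset = 0x7E;
    *b1 = (unsigned char)(((c1 + 1) >> 1) + row_offset);
    *b2 = (unsigned char)(c2 + cell_offset);
}

/* Shift-JIS to JIS: the inverse of the mapping above. */
void sjis_to_jis(unsigned char *b1, unsigned char *b2)
{
    int c1, c2, adjust, row_offset, cell_offset;

    assert(b1 != NULL && b2 != NULL);
    c1 = *b1;
    c2 = *b2;
    adjust = (c2 < 0x9F);             /* trail byte came from an odd JIS row */
    row_offset = (c1 < 0xA0) ? 0x70 : 0xB0;
    if (adjust)
        cell_offset = (c2 > 0x7F) ? 0x20 : 0x1F;
    else
        cell_offset = 0x7E;
    *b1 = (unsigned char)(((c1 - row_offset) << 1) - adjust);
    *b2 = (unsigned char)(c2 - cell_offset);
}
```

For example, the first Level 1 kanji, JIS 0x3021, becomes 0xB0A1 in EUC and 0x889F in Shift-JIS.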

Since the major Asian character sets are extremely large, entering characters is difficult. While kanji tablets with thousands of keys exist, other input methods for Asian languages have been developed that use combinations of software and hardware. Lunde examines these options and describes typography issues.

For some developers, the most important part of the book will be the algorithms (presented in C) for converting between different encodings, handling text streams, automatically detecting the Japanese encoding used for a text file, and repairing JIS-encoded files. These algorithms are included in the set of tools, which the author provides via the Internet.
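
To give a flavor of the detection problem, here is a simplified sketch. It is not Lunde's algorithm; it takes a different approach, checking whether a buffer parses cleanly as Shift-JIS and as EUC, and reporting the one that does. All names are hypothetical, and it assumes only standard lead-byte ranges (no vendor or user-defined Shift-JIS characters). Some byte sequences are legal in both encodings, so the function can honestly answer "ambiguous."

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ENC_ASCII, ENC_JIS, ENC_SJIS, ENC_EUC, ENC_AMBIGUOUS, ENC_UNKNOWN
} JpEncoding;

/* Does the buffer parse as Shift-JIS (standard lead-byte ranges only)? */
static int valid_sjis(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        if (c < 0x80 || (c >= 0xA1 && c <= 0xDF)) {
            i++;                         /* one byte: ASCII or half-width katakana */
        } else if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xEF)) {
            unsigned char t;
            if (i + 1 >= n) return 0;
            t = s[i + 1];
            if (t < 0x40 || t == 0x7F || t > 0xFC) return 0;
            i += 2;
        } else {
            return 0;
        }
    }
    return 1;
}

/* Does the buffer parse as EUC? */
static int valid_euc(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        if (c < 0x80) {
            i++;
        } else if (c >= 0xA1 && c <= 0xFE) {        /* JIS X 0208 */
            if (i + 1 >= n || s[i + 1] < 0xA1 || s[i + 1] > 0xFE) return 0;
            i += 2;
        } else if (c == 0x8E) {                     /* half-width katakana */
            if (i + 1 >= n || s[i + 1] < 0xA1 || s[i + 1] > 0xDF) return 0;
            i += 2;
        } else if (c == 0x8F) {                     /* JIS X 0212 */
            if (i + 2 >= n || s[i + 1] < 0xA1 || s[i + 1] > 0xFE ||
                s[i + 2] < 0xA1 || s[i + 2] > 0xFE) return 0;
            i += 3;
        } else {
            return 0;
        }
    }
    return 1;
}

JpEncoding guess_encoding(const unsigned char *s, size_t n)
{
    size_t i;
    int high = 0, sjis, euc;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (s[i] == 0x1B && i + 1 < n && s[i + 1] == '$')
            return ENC_JIS;              /* escape sequence: modal JIS */
        if (s[i] >= 0x80)
            high = 1;
    }
    if (!high)
        return ENC_ASCII;
    sjis = valid_sjis(s, n);
    euc = valid_euc(s, n);
    if (sjis && !euc) return ENC_SJIS;
    if (euc && !sjis) return ENC_EUC;
    return sjis ? ENC_AMBIGUOUS : ENC_UNKNOWN;
}
```

The ambiguous case is real: the EUC bytes 0xA4 0xA2 are also a legal pair of half-width katakana in Shift-JIS, which is why a production detector needs more context than this sketch uses.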

Lunde devotes an entire chapter to Japanese text-processing tools, including operating systems, text editors, word processors, page-layout software, online dictionaries, machine-translation software, and terminal software. The chapter on using Japanese e-mail and newsgroups includes advice on how to repair files damaged by network mail programs and newsgroup readers. In the appendices, he lists professional organizations, mailing lists, and FTP sites for additional software and documents.
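
As an illustration of what such a repair involves, the sketch below assumes one common form of damage: the escape character (0x1B) that begins each JIS shift sequence has been stripped in transit, leaving only the printable remainder ("$B" or "$@" to enter two-byte mode, "(B" or "(J" to leave it). The review does not specify the damage or the fix, so this is my own simplified reconstruction with a hypothetical function name, not the author's tool. It tracks the current mode, looks for shift-out sequences only on character boundaries, and can be fooled by one-byte text that legitimately contains "$B".

```c
#include <assert.h>
#include <stddef.h>

#define ESC '\033'

/*
 * Copy in[0..n) to out, restoring a missing ESC before each JIS shift
 * sequence. Returns the output length, or -1 if out is too small.
 */
int repair_jis(const char *in, size_t n, char *out, size_t out_size)
{
    size_t i = 0, o = 0;
    int two_byte = 0;                    /* 0: one-byte mode, 1: two-byte mode */

    assert(in != NULL && out != NULL);
    while (i < n) {
        int had_esc = (in[i] == ESC);
        size_t j = had_esc ? i + 1 : i;  /* where a sequence body would start */
        int is_seq = 0;

        if (j + 1 < n) {
            if (!two_byte && in[j] == '$' &&
                (in[j + 1] == 'B' || in[j + 1] == '@'))
                is_seq = 1;              /* shift into two-byte mode */
            else if (two_byte && in[j] == '(' &&
                     (in[j + 1] == 'B' || in[j + 1] == 'J'))
                is_seq = 1;              /* shift back to one-byte mode */
        }
        if (is_seq) {
            if (o + 3 > out_size) return -1;
            out[o++] = ESC;              /* written whether or not it survived */
            out[o++] = in[j];
            out[o++] = in[j + 1];
            two_byte = !two_byte;
            i = j + 2;
        } else if (two_byte && !had_esc && i + 1 < n &&
                   (unsigned char)in[i] >= 0x21) {
            if (o + 2 > out_size) return -1;
            out[o++] = in[i++];          /* copy one two-byte character */
            out[o++] = in[i++];
        } else {
            if (o + 1 > out_size) return -1;
            out[o++] = in[i++];          /* one-byte text or control character */
        }
    }
    return (int)o;
}
```

Text that is already intact passes through unchanged, so the routine can be applied without first checking whether a file was damaged.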

Developing International Software

In Developing International Software for Windows 95 and Windows NT, Nadine Kano begins with the general issues associated with internationalizing and localizing software. She stresses the importance of planning and of having written specifications that define localization requirements. She also describes Microsoft's experience in developing international software using a single team for both the domestic and international versions. Finally, Kano discusses the trade-offs Microsoft made in developing Windows 95.

Other issues covered include designing an international user interface, researching legal issues, setting up a development environment, testing, assisting translators, and coding practices.

Chapter 3 covers character-set encodings. Windows 95 uses a code-page model; for Japanese, it uses code page 932, a Shift-JIS encoding. Windows NT uses Unicode. To produce a single code base for both Windows 95 and Windows NT, you must use generic prototypes and compiler switches. Every Win32 API function that takes string parameters has two entry points: one for traditional code-page strings and one for Unicode strings.
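
The generic-prototype mechanism can be imitated in portable C. The sketch below is not the Windows headers: XCHAR, XTEXT, and CountChars are hypothetical stand-ins modeled on Win32's TCHAR type, TEXT() macro, and paired -A/-W entry points, with the UNICODE preprocessor symbol as the compiler switch. Application code is written once against the generic names, and the switch decides which character type and entry point it compiles to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Two entry points for one logical function, as in the -A/-W pairs. */
size_t CountCharsA(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

size_t CountCharsW(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}

/* The compiler switch selects the character type, the literal form,
 * and the entry point behind the generic name. */
#ifdef UNICODE
typedef wchar_t XCHAR;
#define XTEXT(s) L##s
#define CountChars CountCharsW
#else
typedef char XCHAR;
#define XTEXT(s) s
#define CountChars CountCharsA
#endif

/* Application code uses only the generic names. */
size_t title_length(void)
{
    const XCHAR *title = XTEXT("Untitled");
    return CountChars(title);
}
```

Compiling the same file with and without -DUNICODE produces the wide-character and code-page builds, respectively, without touching title_length().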

To localize the user interface, use resource files to define pictures, strings, messages, menus, dialog boxes, and version information. Chapter 4 describes how to organize these resources and link them to your source code. Chapter 5 describes how to use the Microsoft Win32 NLSAPI to support linguistic and cultural conventions such as date, time, calendar, number, and currency formats. This API also provides sorting and character-type information. Like the rest of the Win32 API, NLSAPI exists in two forms (-A APIs and -W APIs). On Windows NT you can use either form, but on Windows 95 you can use only the -A forms.

Chapter 6 covers multilingual input, fonts, and multilingual text layout. Chapter 7 covers processing of Far Eastern writing systems (Chinese, Japanese, and Korean), including the use of Input Method Editors (IMEs) supported by Windows NT and Windows 95. On Windows NT 3.5, the interface to the IMEs depends on the target language. A unified API is provided by Windows NT 3.51 and Windows 95.

While many coding examples are included, you will still need to use other Microsoft reference materials, including reference manuals for Windows NT and Windows 95 and the appropriate SDKs.

Conclusion

Both Software Internationalization and Localization and Developing International Software for Windows 95 and Windows NT cover the basics of developing international products. Windows 95 and Windows NT developers will prefer the latter. Developers using other platforms will probably prefer the former, as will those interested in an introduction to the business aspects of developing international software. Both books are excellent.

Ken Lunde's Understanding Japanese Information Processing is an essential reference book for developers processing Japanese text. It will also appeal to individuals interested in the Japanese language.

Software Internationalization and Localization: An Introduction

Emmanuel Uren, Robert Howard, and Tiziana Perinotti

Van Nostrand Reinhold, 1992, 300 pp., $39.95

ISBN 0-442-01498-8

Understanding Japanese Information Processing

Ken Lunde

O'Reilly & Associates, 1993, 470 pp., $29.95

ISBN 1-56592-043-0

Developing International Software for Windows 95 and Windows NT

Nadine Kano

Microsoft Press, 1995, 800 pp., $35.00

ISBN 1-55615-840-8