December 1994/A Text Translation Tool for C Programmers

International Programming

A Text Translation Tool for C Programmers

R. Scott Guthrie

Scott Guthrie has been programming in C for over ten years. He received a B.S. in Computer Science from California Polytechnic State University, San Luis Obispo in 1976. Scott resides in Colorado Springs, Colorado, and is a Senior Systems Analyst for MCI Communications Corporation, Systems Engineering. Scott can be reached on the Internet at asr._scott_guthrie@mcimail.com or by phone at (719) 535-3236.

Introduction
To compete in the international marketplace, developers must accommodate international users by building multilingual support into their applications. Evolving standards — including definitions for character sets (such as ISO 10646), system functionality (POSIX), and library-level assistance (such as setlocale, available in the Standard C language library) — serve to simplify this task somewhat, but still leave the bulk of the work to the developer.
DOS, OS/2, and UNIX support the use of "country codes" for native language keyboard and text (character set) display. Typically, applications that use country codes are developed specifically for a particular language. The country codes do not address the issue of text translation (substitution) between languages.
Some Windows applications developers use multiple resource files — one for each of the languages to be supported. StarView by Star Division takes this a step further: it allows a given resource file to include multiple text string definitions for each text instance. The string set (language) displayed can be configured as required. The disadvantages of using multiple resource files are:

it requires multiple copies of the application's executable (one for each language);

switching languages requires application exit, reconfiguration, and restart;

the person translating the language must be able to use the resource editor;

it does not address locale issues, such as date formats and monetary values;

it is not portable to non-Windows environments.
DLLs (Dynamic Link Libraries), which are also used in Windows to provide multilingual support, are subject to similar problems.
This article describes Xlate, a text substitution tool that addresses the problems noted above for country codes, resource files, and DLLs. Complete C language code for this tool is included, along with several sample applications that demonstrate its use in text translation and in formatting locale-specific dates and monetary values.

What Is Xlate?
Xlate is a set of C language functions that help developers present an application's output in a language-independent manner.
It was originally developed to facilitate the design and construction of applications to be used in several European countries, which use many different native languages. The target system consisted of a network of OS/2 workstations used to present selection menus and process simple user input. The applications were to be entirely text-windows oriented, so as to allow users to pick and choose functions from displayed lists and options.
The specific requirements were:

To support applications written in C.

To be portable between OS/2, DOS, and UNIX platforms.

To provide alternate keyboard and character-set support for non-English systems.

To incorporate support for alternate date, time, monetary, and other locale-related issues.

To allow for dynamic language switching (during application execution).

To enable implementation of additional languages after application development without substantial redevelopment effort.

To eliminate the need for application developers to be familiar with the languages the application would support.
As developed, Xlate meets all of these specifications. For one requirement, the ability to add languages with minimal redevelopment effort, up-front planning is necessary. Planning considerations and sample application examples are included later in this article. With respect to another requirement, that the developer not be required to know the target language, Xlate provides an added bonus: since language translation is performed externally to the application development, the language translators require no programming knowledge.
Further, even though it was not designed with the Windows environment in mind, the functionality provided by Xlate does not preclude its use in Windows applications.
The Xlate routines described and demonstrated in this article are written for DOS using Borland C v3.1. Since the code does not use special compiler features, it will easily migrate to other C language compilation environments, including UNIX.

Xlate's Functionality
Xlate is a text substitution tool. To use it, applications pass text strings through an Xlate function which performs a substitution based on a table loaded in memory.
The strings to be replaced are referred to as "Keys," while the substituted strings are referred to as "Results." Keys and Results are created by a language translator (human) as text files, which are called translation files. Translation files have the file extension .trn.
The Xlate call XlateSet establishes the translation table that will be used (until another call to XlateSet is made). For performance reasons, the Xlate system builds the translation table in memory instead of working strictly from the file. To perform a translation, the application calls Xlate's Xlate function, providing it with a Key value. The Result string is returned. The following example shows how this would look in an application:

XlateSet("HELLO");   /* Establish a Translate File */
printf("%s\n", Xlate("Hello World!"));
The output from the printf statement would be the Result string found in the translation file HELLO.TRN for the Key value of "Hello World!"
One additional user-callable Xlate function, XlateFree, releases the allocated memory for any currently loaded translation file. It is not necessary to issue this call when switching translation files because XlateSet does this automatically. Several other Xlate functions, which are used internally and are not user-callable, are discussed later in the "Detailed Xlate Function Descriptions" section.

Translation File Formats
The Xlate system supports two types of translation files. The first type is a text file which can be created with any text editor. This file provides the Key and Result translation strings for the application. It must have the .trn extension to be recognized by the Xlate system.
The second type of translate file is a binary representation of the text-based file, and must have the extension .trb. Using .trb files enhances the performance of the system. At run time, Xlate will attempt to find and load the .trb file. If Xlate can't find this binary file, it will load the text-based file (.trn) instead. You can create a .trb file from a .trn file by running a utility called SPEED_UP. You would normally use this utility after you have completed development and testing of your code. This utility is provided on this month's code disk.

Text-Based Translate File Format
In addition to supplying Key and Result string values, the text-based translate file format provides for comments and descriptive text. Square brackets ('[ ]') identify the Key and Result string values; other text is ignored. Additional formatting rules are listed in the source code (Listing 14) as comments under the function XlateGetString.
You can use comments on lines containing Key/Result pairs to describe the location and limitations for the text destination. For example, if there is room for only ten characters in the display area, you could note that in the comment area of the line in the translation file. Anyone later using this translation file as a model would thus learn of the limitation in the display area without having to refer to the actual display screen. If the comment described the text as representing a title that should be centered within 20 characters, the translator could create a centered title the first time by bracketing the translated title with the appropriate number of space characters.
A problem common to many text replacement tools is the lack of association of the Key value with the Result text. Because Xlate allows the key to include embedded blanks and does not limit its length, the application can be written more naturally and thus can be more easily understood when the programmer has to revisit the code. For example,
printf("Text Translation Tools\n"); /* Plain C code */
can be written as,
printf("%s\n",Xlate("Text Translation Tools")); /* Xlate style */
instead of,
char *str;
...
str = fn("Value1"); /* fn translates "Value1" to
                       "Text Translation Tools" */
printf("%s\n",str);
which illustrates a limitation of some other methods: the Key ("Value1") says nothing about the text it stands for.
The English version of the translation file line for the above example may appear as
[Text Translation Tools][Text Translation Tools] Max. 40 chars
where the Key and the Result are identical, and the Comment field describes the text limitation of 40 characters.
Xlate's use of text files to define Key/Result string pairs makes it very easy for non-programmers to help with the actual translation when a new language is to be added to an application. And, since all translated text values exist only in the translation files, the application programmer need not know anything about the target languages.
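As a rough illustration of the bracket convention described above, the following sketch pulls the Key and Result out of a single translation file line. It is not the XlateGetString routine from Listing 14. The names NextField and ParseTrnLine are hypothetical, and the simplified rules (the first two bracketed fields on a line are the Key and Result, there are no escape sequences, and everything else is comment) are assumptions made for this example only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Xlate: copy the next "[...]" field
   found at or after *pos into out (at most outsize-1 characters).
   Returns 1 and advances *pos past the closing bracket on success;
   returns 0 if no complete field is found. */
static int NextField(const char **pos, char *out, size_t outsize)
{
    const char *open;
    const char *close;
    size_t len;

    if (outsize == 0)
        return 0;
    open = strchr(*pos, '[');
    if (open == NULL)
        return 0;
    close = strchr(open + 1, ']');
    if (close == NULL)
        return 0;
    len = (size_t)(close - open - 1);
    if (len >= outsize)
        len = outsize - 1;          /* truncate rather than overflow */
    memcpy(out, open + 1, len);
    out[len] = '\0';
    *pos = close + 1;
    return 1;
}

/* Extract the Key and Result from one line; text outside the brackets
   is treated as comment. Returns 1 if both fields were found. */
int ParseTrnLine(const char *line, char *key, size_t keysize,
                 char *result, size_t resultsize)
{
    const char *pos = line;

    return NextField(&pos, key, keysize) &&
           NextField(&pos, result, resultsize);
}
```

Given the line "[Count][4] Integer number of menus available.", this sketch yields the Key "Count" and the Result "4"; a line with no bracketed fields is skipped as pure comment.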

Xlate Usage and Examples
This section presents several examples showing usage and application of the Xlate functions. The first example shows how an application can use Xlate to substitute text strings; other examples show how numeric values can be read in, date formats specified, and monetary conversions facilitated.

Simple Text Substitution
menu.c (Listing 1) represents a complete sample application. It presents the user with a menu of choices, and the user's selection determines which of the four associated translation files (b_fast.trn, lunch.trn, dinner.trn, and snack.trn; Listing 2, Listing 3, Listing 4, and Listing 5) will be loaded. Output from a sample execution of this application is shown in Listing 6.
This application incorporates some knowledge of the data that will be presented. As written, it depends on there being a total of four menu choices (or Quit), with each having only three translatable food items. It is often desirable, and sometimes necessary, for applications to be even further removed from data dependency. Examples later in the article demonstrate this technique.

Loading Numeric Values
In addition to direct text translation, translation files can function as a source for numeric values. This is useful for providing configuration information to the application, such as in this case, the number of menus or food items available.
For the purpose of reading an integer value into an application's variable, the translation file line may appear as

[Count][4] Integer number of menus available.
while the application code calling for the value appears as

iCount = atoi(Xlate("Count"));
The resultant integer value in the variable iCount will be 4.
This technique can be used for many other types of configuration information, such as date and monetary formats, as will be shown later.
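One caution: atoi gives no indication when the Result is not a number, for example when a translator mistypes an entry or the Key is missing. The following sketch shows one possible way to guard against that. ResultToInt is a hypothetical helper, not part of Xlate, and the choice to fall back to a caller-supplied default is an assumption of this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a Result string to int, returning
   fallback when the text is empty, not a number, or out of range. */
int ResultToInt(const char *text, int fallback)
{
    char *end;
    long value;

    if (text == NULL)
        return fallback;
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || errno == ERANGE ||
        value < INT_MIN || value > INT_MAX)
        return fallback;
    while (*end == ' ' || *end == '\t')
        end++;                      /* allow trailing blanks */
    if (*end != '\0')
        return fallback;            /* reject trailing junk such as "4x" */
    return (int)value;
}
```

An application could then write iCount = ResultToInt(Xlate("Count"), 0); so that a missing or malformed entry produces a known default rather than an accidental zero.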

Dealing with a Configurable Number of Entries
The data dependencies in the menu example can be eliminated by reading in the number of possible menus, and reading in the number of food items each menu contains. Reading in integer values from the translation file is useful when the design of an application must provide for an unspecified or varying number of text entries, such as List Box or Menu items.
Even though the application must be written to accommodate the possibility of a variable number of items, the actual number can be defined by the translation file without requiring later modification of the source code.
The sample code segment shown below (from Listing 7) uses this technique to display the user interface languages available. Note that the application does not contain or require knowledge of the number of possible languages in the list. This configuration information is totally defined by the translation file entries.
char LanguageString[80];   /* Language string buffer */
int LanguageCount;         /* Number of Languages */
int i;                     /* Loop counter */
...
XlateSet("LANGUAGE");
/* Translation file = LANGUAGE.TRN */

/* Get the number of Languages */
LanguageCount =
   atoi(Xlate("Number of Languages"));

/* Display Language list */
printf("There are %d Languages. "
       "These are:\n", LanguageCount);
for(i=1; i<=LanguageCount; i++)
{
 sprintf(LanguageString,
        "Language %d", i);
 printf("%s\n",Xlate(LanguageString));
}
For the translation file in Listing 8, the output would be as in Listing 9.

Data Formatting
Text translation does not address the problem of differences in data formatting. For example, the representation of dates and monetary values varies from country to country, and even from area to area within some countries. A combination of some fairly simple utility functions and some translation file entries allows data formats to be defined as needed. The following example shows how this is done for various date formats.

Date Formatting
The format used to represent dates varies widely. Possible date formats include:

YY/MM/DD MM/DD/YY DD/MM/YY YY.MM.DD DD.MM.YY MM-DD-YY YY-MM-DD YYDDD YYYY/MM/DD
Applications must be able to display the date in the format expected by the user, and that format is typically a function of language or region. Xlate can be used to provide this formatting functionality.
The DateFormatter function in demodate.c (not listed here, but provided on this month's code disk), accepts month, day, and year values and builds the date string according to four entries in the translation file. The four translation file lines define the sequence of the month, day, and year in the date string, and provide the format string that will define the separators.
Time values (and other locale-specific data items) can be formatted for presentation using these techniques as well.
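To make the idea concrete, here is a minimal sketch of a date formatter driven by four configuration values. It is not the DateFormatter function from demodate.c, which is not reproduced here. The name FormatDate, the use of position numbers 1 through 3, and the use of a plain separator string (rather than a printf-style format string, so that text read from a file is never used as a format) are all assumptions of this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch. monthPos, dayPos and yearPos give each field's
   position in the output (1, 2 or 3); sep is the separator text.
   Returns 1 on success, 0 if the date is implausible, the positions
   are not a permutation of 1..3, or the buffer is too small. */
int FormatDate(char *out, size_t outsize, int month, int day, int year,
               int monthPos, int dayPos, int yearPos, const char *sep)
{
    int pos[3];
    int value[3];
    int field[3] = {0, 0, 0};
    int seen[3] = {0, 0, 0};
    int i;

    if (month < 1 || month > 12 || day < 1 || day > 31 || year < 0)
        return 0;
    pos[0] = monthPos;  value[0] = month;
    pos[1] = dayPos;    value[1] = day;
    pos[2] = yearPos;   value[2] = year % 100;   /* two-digit year */

    for (i = 0; i < 3; i++) {
        if (pos[i] < 1 || pos[i] > 3 || seen[pos[i] - 1])
            return 0;
        seen[pos[i] - 1] = 1;
        field[pos[i] - 1] = value[i];
    }
    /* Three two-digit fields, two separators, one terminator. */
    if (6 + 2 * strlen(sep) + 1 > outsize)
        return 0;
    sprintf(out, "%02d%s%02d%s%02d",
            field[0], sep, field[1], sep, field[2]);
    return 1;
}
```

In an Xlate application, the three positions and the separator would be read from the translation file, for example with atoi(Xlate(...)) for the positions, using whatever Key names the application defines. Switching from MM/DD/YY to DD.MM.YY then requires only a different translation file.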

Monetary Formatting
The only difference between the formatting of date and monetary values is the involvement of exchange rates. Because exchange rates are volatile, the conversion factors must be supplied from some external source, not hard-coded in the application or the translation file.
currency.c (Listing 10) displays the value of 2.8 Bananas (one hypothetical island's form of currency) in two other hypothetical islands' currency (Coconuts and Sea Shells, respectively).
The translation file simply contains the denominations (Bananas, Coconuts, and Sea Shells). The application must obtain the conversion factors from some other data source.
This sample application does provide an interesting example of how to achieve data independence. The Currency application doesn't even know what our currency units are! The translation file (currency.trn) and sample output for the application are in Listing 11 and Listing 12.

Detailed Xlate Function Descriptions
Three of the Xlate functions are directly callable by the user and the others are used internally. A description of each function and its parameters is provided below. The full source code for Xlate is found in Listing 13 (xlate.h) and Listing 14 (xlate.c).

Function XlateSet
XlateSet clears any current translate table from memory, and establishes the translation file name passed as the new one to be loaded. This routine does not validate the file name or perform the table load.

Function Xlate
Xlate is used by applications to retrieve the Result string for a given Key value. When called, it checks to see if the translation file has been loaded into memory as a translation table. If not, it calls XlateLoad to do this.
The actual translation (substitution) is triggered by a call to bsearch, which uses the passed-in Key string as the search value, and the translation table as the subject of the search. The XlateSearchCompare function is passed to the bsearch function to define the search method.
If the search fails to find a match, the default Key is used as the Key and the search is performed for that value. (If no default Key is found either, an error string is returned in English.)
Xlate returns a pointer to the Result for the key provided, or if not found, a pointer to the string "?".
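The lookup-with-fallback logic can be sketched in a few lines. The code below is an illustration only, not the code in Listing 14: the PAIR structure, the Lookup function, and the idea of passing the default Key as a parameter are assumptions of this example (the real table type is XLATE, declared in xlate.h). The table must already be sorted by Key for bsearch to work.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry; stands in for the XLATE type in xlate.h. */
typedef struct {
    const char *key;
    const char *result;
} PAIR;

/* Comparison used by bsearch: order entries by Key text. */
static int ComparePair(const void *a, const void *b)
{
    const PAIR *pa = (const PAIR *)a;
    const PAIR *pb = (const PAIR *)b;

    return strcmp(pa->key, pb->key);
}

/* Search a Key-sorted table. If key is absent, retry with defaultKey
   (when one is given); if that also fails, return "?". */
const char *Lookup(const PAIR *table, size_t count,
                   const char *key, const char *defaultKey)
{
    PAIR probe;
    const PAIR *hit;

    probe.key = key;
    probe.result = NULL;
    hit = bsearch(&probe, table, count, sizeof table[0], ComparePair);
    if (hit == NULL && defaultKey != NULL) {
        probe.key = defaultKey;
        hit = bsearch(&probe, table, count, sizeof table[0], ComparePair);
    }
    return hit != NULL ? hit->result : "?";
}
```

Because the search is binary, each translation costs on the order of log2(n) string comparisons for a table of n entries, which is why the table is sorted once at load time.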

Function XlateLoad
XlateLoad is called by Xlate when no translation table is loaded in memory.
XlateLoad first calls XlateFree to remove any existing translation table from memory. It then attempts to open a binary version (.trb) of the translation file. If XlateLoad can open the binary file it calls function XlateLoadBinary to load the translation table. If XlateLoad can't open the binary file, it attempts to open the text version (.trn) and calls function XlateLoadText if successful.
The binary file's load process is a little more straightforward than its text-based counterpart. This is because the binary file contains a record explicitly indicating the number of Key/Result pairs to be loaded, while the text file does not. So the binary load can be done in one pass, while the text load requires two passes through the text file, once to count Key/Result pairs, and once to actually load them. Also, the text load function XlateLoadText must perform a sort step (done via qsort) which is not required of XlateLoadBinary. Note that XlateLoadText sorts only the Xlate pointers; the Key and Result values are not moved.

Function XlateLoadBinary
XlateLoadBinary loads the binary version (created via the SPEED_UP utility) of the translation file. Binary translation files must be in the following format:

".TRB" as a file signature. (4 bytes)

Integer number of translate Key/Result pairs. (2 bytes)

For each Key/Result pair (repeated the number of times given above), the following:

Integer length of Key string including NULL terminator. (2 bytes)

NULL-terminated Key string. (Length in bytes as indicated above.)

Integer length of Result string including NULL terminator. (2 bytes)

NULL-terminated Result string. (Length in bytes as indicated above.)

EOF
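The layout above can be exercised with a small writer and reader. This is a sketch under stated assumptions, not the SPEED_UP or XlateLoadBinary code: the function names are hypothetical, and the 2-byte integers are written low byte first, which matches a DOS PC but is an assumption here since the byte order is not specified above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write a 16-bit value, low byte first (byte order is an assumption). */
static int Put16(FILE *fp, unsigned value)
{
    return fputc((int)(value & 0xFF), fp) != EOF &&
           fputc((int)((value >> 8) & 0xFF), fp) != EOF;
}

/* Read a 16-bit value written by Put16. Returns 1 on success. */
static int Get16(FILE *fp, unsigned *value)
{
    int lo = fgetc(fp);
    int hi = fgetc(fp);

    if (lo == EOF || hi == EOF)
        return 0;
    *value = (unsigned)lo | ((unsigned)hi << 8);
    return 1;
}

/* Write one string as: 2-byte length (including terminator), bytes. */
static int PutString(FILE *fp, const char *s)
{
    size_t len = strlen(s) + 1;

    if (len > 0xFFFF)
        return 0;
    return Put16(fp, (unsigned)len) && fwrite(s, 1, len, fp) == len;
}

/* Write signature, pair count, then each Key and Result. */
int WriteTrb(FILE *fp, const char *keys[], const char *results[],
             unsigned count)
{
    unsigned i;

    if (fwrite(".TRB", 1, 4, fp) != 4 || !Put16(fp, count))
        return 0;
    for (i = 0; i < count; i++)
        if (!PutString(fp, keys[i]) || !PutString(fp, results[i]))
            return 0;
    return 1;
}

/* Check the signature and read the pair count. */
int ReadTrbHeader(FILE *fp, unsigned *count)
{
    char sig[4];

    return fread(sig, 1, 4, fp) == 4 &&
           memcmp(sig, ".TRB", 4) == 0 && Get16(fp, count);
}

/* Read the next length-prefixed string; the caller frees the result.
   Returns NULL on a short read or a missing terminator. */
char *ReadTrbString(FILE *fp)
{
    unsigned len;
    char *s;

    if (!Get16(fp, &len) || len == 0)
        return NULL;
    s = malloc(len);
    if (s == NULL)
        return NULL;
    if (fread(s, 1, len, fp) != len || s[len - 1] != '\0') {
        free(s);
        return NULL;
    }
    return s;
}
```

Because each string is preceded by its length, the reader can allocate exactly the memory it needs and copy the bytes without scanning for brackets, which is where the load-time saving comes from. The file should be opened in binary mode ("wb" or "rb") on systems that distinguish text from binary streams.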

Function XlateLoadText
XlateLoadText loads the text version of the translation file. XlateLoadText repeatedly calls function XlateGetString to parse the file and determine the number of Key/Result lines. After it counts Key/Result lines, XlateLoadText rewinds the file and allocates memory for the translation table. XlateLoadText then reads in the file contents, allocating additional memory for each Key and Result string found. It reads the strings into the allocated memory (again via XlateGetString), and finally, sorts the pointers via qsort.

Function XlateGetString
XlateGetString extracts strings from the text version of the translation file. (XlateGetString's parsing rules are documented in Listing 14.)

Function XlateFree
XlateFree is used to free the memory allocated for the translation table and allocated to the Key and Result strings. It can be called by the user if additional memory is required by the application and translation is not immediately needed. When the Xlate function call is made later, the current translation file will be reloaded.

Function XlateSearchCompare
XlateSearchCompare is used internally by the Xlate function. It defines the search method bsearch will use to search the translation table.

Function XlateSortCompare
XlateSortCompare is used internally by the XlateLoadText function. It defines the sort method that qsort uses in sorting the entries of the translation table by key value.
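The following sketch shows the general shape of such a sort. It is an illustration, not the code from Listing 14: the ENTRY type and the function names are hypothetical stand-ins. It demonstrates the point made earlier that qsort rearranges only the table entries (pairs of pointers), leaving the Key and Result strings where they were allocated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: two pointers to separately allocated text. */
typedef struct {
    char *key;
    char *result;
} ENTRY;

/* Comparison used by qsort: order entries by Key text. */
static int CompareEntries(const void *a, const void *b)
{
    const ENTRY *ea = (const ENTRY *)a;
    const ENTRY *eb = (const ENTRY *)b;

    return strcmp(ea->key, eb->key);
}

/* Sort the table in place by Key. Only the pointer pairs are swapped;
   the strings themselves are never copied or moved. */
void SortTable(ENTRY *table, size_t count)
{
    qsort(table, count, sizeof table[0], CompareEntries);
}
```

Using the same strcmp ordering for both the sort and the later bsearch is what makes the binary search valid; if the two comparison functions disagreed, lookups could miss Keys that are present.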

Global Values
In addition to the functions defined above, Xlate internally uses three globally available values. These are:
XLATE *Xlatebase — a pointer to the translation table allocated in memory;
int XlateLines — the number of entries in the translation table, and
char XlateFile — the current translation file's name.

Speeding Up Xlate
Xlate's slowest operation is the creation of a translation table from a text-based (.trn) translation file. This operation requires the following steps:
1. Open the translation file,
2. Read and parse the translation file, counting strings and passing over comments,
3. Rewind the translation file,
4. Allocate memory for the translation table,
5. Read and parse the translation file for Key and Result strings,
6. Allocate memory for the Key and Result strings, and finally,
7. Sort the translation table.
To speed up this process, I've created a utility called SPEED_UP which performs some of these steps for an application before the application is run. SPEED_UP reads a .trn file as input, and by borrowing some of Xlate's functions, builds an image of the translation table in memory. This image is identical to the one that would have been built by the application. After building the table, SPEED_UP writes it to a binary (.trb) file. When the application runs, it will perform only the following steps:
1. Open the translation file, (binary version)
2. Read the size of the table,
3. Allocate table memory,
4. Read Key and Result string lengths and values (No parsing required),
5. Allocate memory and copy Key and Result strings.
The advantage of using SPEED_UP is faster loading of the translation table at run time. The disadvantage is more difficulty in modifying and maintaining translation files. To minimize this disadvantage during development, I've made the Xlate system smart enough to use whichever version of the translation file it can find. (However, the Xlate system will always attempt to load the binary version first.) During development, the programmer need only supply text-based translation files. As long as no binary files are available, the system will automatically use the text-based files.
The source code for SPEED_UP is included on this month's code disk. Since SPEED_UP uses several of the Xlate functions, module xlate.obj must be included in the link step.

Conclusions
The Xlate C language function set provides a very flexible method for replacing text strings in any language where the characters are available as displayable codes (as in the IBM Extended ASCII set for PC workstation environments). Application development and code maintainability are not dependent on the programmer's knowledge of any target language. An entire application can be programmed in American English, leaving the foreign language strings external to the application development effort. In addition, those responsible for creating the foreign language text do not need to know about the application's internals or, for that matter, any aspect of programming. If appropriate care is taken in isolating display formats and if text string length maximums are taken into account, additional language translation files can be added after the application is complete without requiring any modifications. The ability to embed comments and documentation in the translation file itself helps to ensure the ability to add additional languages quickly.