Portability


UNIX 'termcap' Facility Improves Portability By Hiding Terminal Dependencies

Ronald Florence


Ronald Florence is a novelist, sheep farmer, occasional computer consultant, and UNIX addict. He can be reached at ron@mlfarm or ... {hsi,rayssd}!mlfarm!ron.

For programmers accustomed to writing for single-user systems, UNIX (and Xenix) holds some quick surprises. All those carefully optimized, hand-coded screens, the lightning-fast displays that write to the screen buffer, even "well-behaved" routines that rely on BIOS calls, are suddenly useless. Terminal displays, including the console, are treated as teletype devices under UNIX. To perform even the simplest screen display function, such as clearing the screen, the program must send the proper screen control sequence. In effect, all screen displays are comparable to using the ANSI.SYS driver under MS-DOS.

If the UNIX system had only a single terminal or if only one type of terminal were used on the system, it would be easy enough to hand-code the proper screen control sequences. Indeed, even if several different terminals are used on a system, the screen control sequences can be hand coded. For example, the function in Listing 1 could be used to clear screens.

For a closed system where most of the output is teletype format, with only simple screen display commands, your programs may not need much more.
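
A hand-coded routine of this kind might look something like the following sketch. It is an illustration, not the code of Listing 1; the table, the function name clear_sequence(), and the choice of three terminals are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: terminal name -> hand-coded clear-screen sequence. */
struct clear_seq {
    const char *term;
    const char *seq;
};

static const struct clear_seq clear_table[] = {
    { "vt100", "\033[H\033[2J" },   /* home the cursor, then erase the display */
    { "ansi",  "\033[H\033[2J" },
    { "adm3a", "\032" },            /* Control-Z */
};

/* Return the clear-screen sequence for a terminal name, or NULL if unknown. */
const char *clear_sequence(const char *term)
{
    size_t i;

    if (term == NULL)
        return NULL;
    for (i = 0; i < sizeof clear_table / sizeof clear_table[0]; i++)
        if (strcmp(term, clear_table[i].term) == 0)
            return clear_table[i].seq;
    return NULL;
}
```

A caller would pass the value of getenv("TERM") and write the returned string to the terminal. Every new terminal type means another row in the table, and every new screen function means another table.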

But what if the system is not closed? What if there are outside logins using a variety of terminals? And what if you want to write screen displays that utilize a wide range of terminal capabilities, including automargins and optimized cursor motion, and make sure those displays are scaled to the size of the terminal display? And what if some of the terminals using the system require padding at certain speeds or have other quirks that make them unsuitable or tricky to use with fancy screen display programs? It is possible to keep adding options to code like Listing 1, but by the tenth terminal type, the code starts to look like linguini.

The alternative is to use the termcap and terminfo databases of screen display parameters and control sequences which are provided with most UNIX systems. Termcap, which was developed at Berkeley, uses an ASCII database; the terminfo database is compiled. A curses library of screen display and terminal input functions is supplied with both systems. Terminfo is theoretically faster; it supports many terminal capabilities which are normally not encoded into the termcap database, and the curses library supplied with terminfo has many capabilities which are not supported under termcap curses. The termcap database is substantially easier to modify, and there are ways to incorporate many of the capabilities of the terminfo curses into programs running on termcap systems. This article will discuss only termcap, which is used by Xenix and by most BSD systems.

The UNIX documentation describes the termcap routines as "low level" and the curses routines as "higher level," in much the way that troff/nroff is a low level formatting package, and the formatting macro packages (MM or MS) are high level. Actually, the analogy is not really appropriate. Curses is a screen optimization package with some convenient windowing functions. Termcap is a straightforward package of functions to access the database of screen and keyboard control sequences.

The termcap database is normally in the file /etc/termcap. Comments in the file are prefaced with a # character. All lines which do not begin with the # are considered part of the database.

Each entry in the database represents a different terminal. The entry begins with alternate names of the terminal, separated by | characters. Usually the first name listed for the terminal is a special two-character abbreviation, used by some older programs. The second name is used by most utilities, such as the editor vi. The last name listed is the full name of the terminal, and is the only name which can have blanks inserted for readability. Thus:

d1|vt100|vt-100|pt100|pt-100|dec vt100:
are the names of a DEC vt-100. If you add terminal descriptions to the termcap database, make sure that every name in your addition is unique.

The capabilities of the terminal are listed after the name, separated from one another by colons. Newlines in the entry must be escaped with a backslash. The capabilities are strings, booleans, or integers. Most are mnemonic. Boolean capabilities are true if named. Strings follow an equals sign (=). Integers follow a #. There are no spaces or tabs within capabilities or between them, and an entry carried to a second line must repeat the :. Thus:

MT|myterm|My Special Terminal:\
    :bs:am:cl=\E[J:ho=\E[H:li#24:
indicates that myterm can backspace (bs), has automatic margins (am), that there are 24 lines displayed on the screen (li), and gives the sequences that should be sent to clear the screen (cl) and home the cursor (ho).
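
To make the three capability types concrete, here is a minimal sketch of how a program could pick them out of an entry by hand. It is not the termcap library's code. It assumes the entry has already been joined into one line, that capability names are the conventional two characters, and that string values are returned undecoded; the names find_cap(), cap_flag(), cap_num(), and cap_str() are invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Locate the two-character capability `id` in a one-line entry.
   Returns a pointer to the character just after the name, or NULL. */
static const char *find_cap(const char *entry, const char *id)
{
    const char *p = strchr(entry, ':');      /* skip the terminal names */

    while (p != NULL && p[1] != '\0') {
        p++;                                 /* first character of a field */
        if (p[0] == id[0] && p[1] == id[1])
            return p + 2;
        p = strchr(p, ':');
    }
    return NULL;
}

/* Boolean: true if the name appears with nothing after it. */
int cap_flag(const char *entry, const char *id)
{
    const char *c = find_cap(entry, id);
    return c != NULL && (*c == ':' || *c == '\0');
}

/* Integer: the value after the #, or -1 if absent. */
int cap_num(const char *entry, const char *id)
{
    const char *c = find_cap(entry, id);
    return (c != NULL && *c == '#') ? atoi(c + 1) : -1;
}

/* String: copy the raw text after the = into buf; return its length or -1. */
int cap_str(const char *entry, const char *id, char *buf, size_t size)
{
    const char *c = find_cap(entry, id);
    size_t len;

    if (c == NULL || *c != '=' || size == 0)
        return -1;
    len = strcspn(c + 1, ":");
    if (len >= size)
        return -1;
    memcpy(buf, c + 1, len);
    buf[len] = '\0';
    return (int)len;
}
```

Because a colon inside a string must be written as an octal escape, splitting the entry on colons is safe.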

Several special sequences are used to encode the strings: \E is the escape character (0x1b); ^X is "Control-X", and likewise for any other control key; \n, \r, \t, \b, and \f are newline, carriage return, tab, backspace, and form feed; \^ is ^, and \\ is \. All non-printing characters may be represented as octal escapes; the :, which is used to separate capabilities in each entry, must be entered as \072 if used in a string. Null characters can be entered as \200, because the routines that process termcap entries strip the high bit of each character just before output, so that \200 comes out as \000.
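
The decoding rules can be sketched in a few lines of C. The function below, decode_cap(), is a hypothetical illustration and not the library's decoder: it handles only the escapes listed above, ignores any padding prefix, and leaves \200 as the byte 0200 on the assumption that the high bit is masked off at output time.

```c
#include <stddef.h>

/* Decode termcap string escapes from src into dst (at most size-1 bytes).
   Returns the number of bytes stored; dst is also NUL-terminated. */
size_t decode_cap(const char *src, char *dst, size_t size)
{
    size_t n = 0;

    if (size == 0)
        return 0;
    while (*src != '\0' && n + 1 < size) {
        int c = (unsigned char)*src++;

        if (c == '^' && *src != '\0') {
            c = *src++ & 037;                 /* ^X becomes Control-X */
        } else if (c == '\\' && *src != '\0') {
            c = (unsigned char)*src++;
            switch (c) {
            case 'E': c = 033;  break;
            case 'n': c = '\n'; break;
            case 'r': c = '\r'; break;
            case 't': c = '\t'; break;
            case 'b': c = '\b'; break;
            case 'f': c = '\f'; break;
            default:
                if (c >= '0' && c <= '7') {   /* up to three octal digits */
                    int digits = 1;

                    c -= '0';
                    while (digits < 3 && *src >= '0' && *src <= '7') {
                        c = c * 8 + (*src++ - '0');
                        digits++;
                    }
                }
                /* anything else, such as \^ or \\, stands for itself */
                break;
            }
        }
        dst[n++] = (char)c;
    }
    dst[n] = '\0';
    return n;
}
```

The length is returned separately because a decoded string may legitimately contain bytes that a later stage turns into nulls.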

Padding can be encoded into the strings by prefacing the string with an integer, representing milliseconds of delay. An integer and a * indicate that the delay is proportional to the number of lines involved in the execution of the command. When the * is used, the delay can be stated in tenths of a millisecond, so that 3.5* before the string for ce (clear to end of line) would mean that the command requires 3.5 milliseconds of padding for each line that is to be cleared.
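
As a worked example of the arithmetic, the sketch below reads a padding prefix and returns the total delay in tenths of a millisecond. The function name pad_tenths() and its interface are assumptions for illustration; the library performs this step internally when a string is sent.

```c
#include <ctype.h>
#include <stddef.h>

/* Parse the padding prefix of a capability string.
   Returns the delay in tenths of a millisecond for `lines` affected lines
   and, if rest is not NULL, stores a pointer to the text after the prefix. */
long pad_tenths(const char *cap, int lines, const char **rest)
{
    long tenths = 0;

    while (isdigit((unsigned char)*cap))         /* whole milliseconds */
        tenths = tenths * 10 + (*cap++ - '0');
    tenths *= 10;

    if (*cap == '.') {                           /* one digit of tenths */
        cap++;
        if (isdigit((unsigned char)*cap))
            tenths += *cap++ - '0';
        while (isdigit((unsigned char)*cap))     /* ignore finer precision */
            cap++;
    }
    if (*cap == '*') {                           /* proportional to lines */
        cap++;
        tenths *= lines;
    }
    if (rest != NULL)
        *rest = cap;
    return tenths;
}
```

With the prefix 3.5* and four lines to clear, the result is 140 tenths, or 14 milliseconds; a plain prefix of 20 gives 200 tenths regardless of the line count.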

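A terminal that is identical to another entry except for a few capabilities can make use of the tc string and the @ negator. The @ follows the name of the capability being cancelled, and tc must be the last capability in the entry.

NT|newterm|My alternate terminal:li#25:bs@:tc=vt100: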
describes a terminal with 25 lines, no backspace capability, but otherwise identical to a vt100.
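
The lookup order matters: fields are scanned left to right, so a cancelled or overriding capability is seen before the tc link is followed. The sketch below shows the idea for boolean capabilities only. It is not the library's implementation; the two-entry in-memory database, the function names, and the limit of eight links are assumptions, and the entries use two-character names with the trailing @ form.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical database; each entry is already joined into one line. */
static const char *const db[] = {
    "d1|vt100|dec vt100:bs:am:li#24:co#80:",
    "NT|newterm|My alternate terminal:li#25:bs@:tc=vt100:",
};

/* Does the name list that starts `entry` include `name`? */
static int has_name(const char *entry, const char *name)
{
    size_t len = strlen(name);
    const char *end = entry + strcspn(entry, ":");
    const char *p = entry;

    while (p < end) {
        size_t n = strcspn(p, "|:");

        if (n == len && strncmp(p, name, len) == 0)
            return 1;
        p += n + 1;
    }
    return 0;
}

static const char *find_entry(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof db / sizeof db[0]; i++)
        if (has_name(db[i], name))
            return db[i];
    return NULL;
}

/* Return 1 if boolean `id` is set for `term`, following tc= links. */
int flag_set(const char *term, const char *id)
{
    char name[32];
    int depth;

    strncpy(name, term, sizeof name - 1);
    name[sizeof name - 1] = '\0';

    for (depth = 0; depth < 8; depth++) {        /* guard against tc loops */
        const char *entry = find_entry(name);
        const char *p = entry ? strchr(entry, ':') : NULL;
        int linked = 0;

        while (p != NULL && p[1] != '\0') {
            p++;
            if (p[0] == id[0] && p[1] == id[1])  /* "bs" sets, "bs@" cancels */
                return p[2] == ':' || p[2] == '\0';
            if (strncmp(p, "tc=", 3) == 0) {     /* continue in linked entry */
                size_t len = strcspn(p + 3, ":");

                if (len >= sizeof name)
                    len = sizeof name - 1;
                memcpy(name, p + 3, len);
                name[len] = '\0';
                linked = 1;
                break;
            }
            p = strchr(p, ':');
        }
        if (!linked)
            return 0;
    }
    return 0;
}
```

Here flag_set("newterm", "bs") is false because bs@ is met first, while flag_set("newterm", "am") is true because the search continues into the vt100 entry.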

One caution in using entries with tc encoding: programs built with a fixed-size stack (as on Xenix 286) may crash when reading tc-encoded entries. The cure is to make the stack larger with the -F option on the compile command line.

The cursor addressing string (cm) is coded with printf-like escapes. These are described in detail in the termcap(M) entry in the UNIX documentation.
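
To give a feel for the escapes, here is a deliberately simplified expander. It is a sketch, not a replacement for the library routine: the name expand_cm() is invented, only %d, %2, %3, %., %+x, %i, %r, and %% are handled, and no attempt is made to avoid emitting awkward characters such as nulls or newlines. Column and line are counted from zero, and the line is encoded first unless %r reverses the order.

```c
#include <stdio.h>
#include <string.h>

/* Expand cm for (col, line) into out. Returns the length, or -1 on error. */
int expand_cm(const char *cm, int col, int line, char *out, size_t size)
{
    int args[2];
    int which = 0;                  /* index of the next value to encode */
    size_t n = 0;

    if (size == 0)
        return -1;
    args[0] = line;
    args[1] = col;

    while (*cm != '\0') {
        char piece[16];
        int len = 0;

        if (*cm != '%') {
            piece[len++] = *cm++;
        } else {
            char op = *++cm;

            if (op == '\0')
                return -1;          /* dangling % */
            cm++;
            switch (op) {
            case 'd': len = sprintf(piece, "%d", args[which++ % 2]); break;
            case '2': len = sprintf(piece, "%02d", args[which++ % 2] % 100); break;
            case '3': len = sprintf(piece, "%03d", args[which++ % 2] % 1000); break;
            case '.': piece[len++] = (char)args[which++ % 2]; break;
            case '+':               /* add the next character, send as a byte */
                if (*cm == '\0')
                    return -1;
                piece[len++] = (char)(args[which++ % 2] + *cm++);
                break;
            case 'i': args[0]++; args[1]++; break;   /* origin is 1, not 0 */
            case 'r': { int t = args[0]; args[0] = args[1]; args[1] = t; } break;
            case '%': piece[len++] = '%'; break;
            default:  return -1;    /* escape not handled in this sketch */
            }
        }
        if (n + (size_t)len >= size)
            return -1;
        memcpy(out + n, piece, (size_t)len);
        n += (size_t)len;
    }
    out[n] = '\0';
    return (int)n;
}
```

For an ANSI-style string such as \E[%i%d;%dH, column 9 on line 4 expands to ESC [ 5 ; 1 0 H. For a string of the form \E=%+ %+ (each value added to a space), column 2 on line 1 expands to ESC = ! ".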

In addition to the regular termcap capabilities, which begin with lower case letters, some UNIX systems utilize extensions. Xenix uses a variety of upper case termcap entries to indicate special PC keys: PU for Page Up, EN for End, GS for Start-Character-Graphics-Mode, and pseudo-mnemonics for eight-bit PC graphics drawing characters. GNU Emacs uses upper-case capabilities to describe terminal command sequences which are not generally used in termcap, such as AL and DL for adding and deleting multiple lines. Programs which use these extended termcap capabilities may not be portable to other UNIX systems.

The termcap library provides functions to retrieve the encoded information from the database. The termcap routines first search the environment for a TERMCAP variable. If it is found, does not begin with a slash, and the terminal type matches the environment string TERM, the TERMCAP string is read. If it begins with a slash, it is read as the pathname of the termcap database (instead of the default /etc/termcap). Using the environment variable instead of searching the database will speed up the development of new termcap entries. If your system has a tset command which supports separate TERM and TERMCAP environment entries, it will also speed the startup of programs which use termcap.
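
The decision described above can be written out as a small function. This is a sketch of the rule as stated in the text, not the library's code; the enum, the function names, and the sample pathname in the note that follows are assumptions.

```c
#include <stddef.h>
#include <string.h>

enum cap_source {
    FROM_DEFAULT_FILE,    /* search /etc/termcap */
    FROM_NAMED_FILE,      /* TERMCAP holds the pathname of a database */
    FROM_ENVIRONMENT      /* TERMCAP holds the entry itself */
};

/* Does the name list that starts `entry` include `term`? */
static int names_include(const char *entry, const char *term)
{
    size_t len = strlen(term);
    const char *end = entry + strcspn(entry, ":");
    const char *p = entry;

    while (p < end) {
        size_t n = strcspn(p, "|:");

        if (n == len && strncmp(p, term, len) == 0)
            return 1;
        p += n + 1;
    }
    return 0;
}

/* Decide where the entry for `term` should come from, given the value
   of the TERMCAP environment variable (NULL if it is not set). */
enum cap_source choose_source(const char *termcap_env, const char *term)
{
    if (termcap_env == NULL || termcap_env[0] == '\0')
        return FROM_DEFAULT_FILE;
    if (termcap_env[0] == '/')
        return FROM_NAMED_FILE;
    if (term != NULL && term[0] != '\0' && names_include(termcap_env, term))
        return FROM_ENVIRONMENT;
    return FROM_DEFAULT_FILE;
}
```

A program would call choose_source(getenv("TERMCAP"), getenv("TERM")). A value such as /usr/ron/termcap selects a private database, which is convenient while an entry is still being debugged.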

One obvious use for the termcap database is in displaying formatted text to the screen. Although there are word processing programs available to run under UNIX and/or Xenix, much text processing in UNIX systems is done by using an editor (vi or emacs) to prepare the text with nroff/troff formatting codes, usually with one of the macro packages such as MM. The formatted file is then piped to a printer or typesetter, or to a screen display for proofing.

Although it is possible to prepare nroff terminal driving tables to encode the screen control sequences needed for such formatting features as bold type, italics or underlining, a different table would have to be encoded and compiled for each terminal, and the user would have to indicate the terminal type on the nroff command line:

nroff -cm -Tmyterm myfile
Also, the nroff terminal driving table format was created when daisy-wheel printers were the cutting edge of desktop hardcopy capabilities, and the coding is sometimes awkward to adapt to the capabilities of a terminal display.

For simple text formatting, it is easier to parse the default nroff output, which uses backspaces and overstrikes to generate underlined or bold characters, and use termcap to look up the appropriate standout (bold) and underline sequences. The program in Listing 2 (Bold.c) uses termcap library functions to look up the terminal screen control sequences for so and se (standout start and standout end), us and ue (underline start and underline end), and sg, which is an integer coded quantity indicating how many spaces the attribute change to standout mode requires. For terminals with multiple fonts, the switchover to italic font could be encoded in us, so that underlined text would be displayed in italics. A bold screen attribute could be encoded in so and se, so that bold text would be displayed in bold font, instead of in reverse video. Alternatively, new termcap entries could be created to hold the screen control sequences for bold or italic fonts.
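
The overstrike parsing itself is easy to sketch. The function below is an illustration of the technique, not the code of Bold.c: it treats an underscore, a backspace, and a character as underlined text, and a character struck over itself as bold, and it brackets each run with strings supplied by the caller. The struct and function names are invented, the sg adjustment is omitted, and output that does not fit in the buffer is silently dropped.

```c
#include <stddef.h>
#include <string.h>

enum attr { PLAIN, BOLD, UNDER };

/* Sequences the caller has looked up (so, se, us, ue). */
struct seqs { const char *so, *se, *us, *ue; };

/* Append s to out if it fits; return the new length. */
static size_t put(char *out, size_t size, size_t n, const char *s)
{
    size_t len = strlen(s);

    if (n + len < size) {
        memcpy(out + n, s, len);
        n += len;
    }
    return n;
}

/* Translate nroff overstrikes in `in` into attribute runs in `out`. */
size_t render(const char *in, const struct seqs *q, char *out, size_t size)
{
    enum attr cur = PLAIN;
    size_t n = 0;

    if (size == 0)
        return 0;
    while (*in != '\0') {
        enum attr want = PLAIN;
        char glyph[2];

        glyph[0] = *in;
        glyph[1] = '\0';
        if (in[1] == '\b' && in[2] != '\0') {
            if (in[0] == '_')
                want = UNDER;               /* _ BS c  : underlined c */
            else if (in[2] == in[0])
                want = BOLD;                /* c BS c  : bold c */
            glyph[0] = in[2];
            in += 3;
            /* a bold character may be struck more than twice */
            while (want == BOLD && in[0] == '\b' && in[1] == glyph[0])
                in += 2;
        } else {
            in++;
        }
        if (want != cur) {                  /* close the old run, open the new */
            if (cur == BOLD)   n = put(out, size, n, q->se);
            if (cur == UNDER)  n = put(out, size, n, q->ue);
            if (want == BOLD)  n = put(out, size, n, q->so);
            if (want == UNDER) n = put(out, size, n, q->us);
            cur = want;
        }
        n = put(out, size, n, glyph);
    }
    if (cur == BOLD)  n = put(out, size, n, q->se);
    if (cur == UNDER) n = put(out, size, n, q->ue);
    out[n] = '\0';
    return n;
}
```

In a real filter the four strings would be the values retrieved for so, se, us, and ue, and the result would be sent through the library's output routine so that padding is honored.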

The termcap access functions are simple and straightforward. To parse the database, you need to allocate a buffer of 1024 characters (tbuf in Listing 2), to hold the entire termcap entry as it is retrieved by tgetent(). This buffer must be retained through all calls to the three functions which parse capabilities: tgetstr(), tgetflag(), and tgetnum(). Another buffer (sbuf in Listing 2) should be allocated for the strings which will be retrieved by tgetstr(). This should be a static buffer. The tgetstr() function is passed the address of a pointer to this buffer. As string capabilities are read, they are placed in the buffer, and the pointer is advanced. Using a static buffer saves the overhead of allocating space for each string as it is retrieved.
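
The advancing-pointer convention is easiest to see in isolation. The function store_cap() below is a hypothetical stand-in that mimics only that convention; it is not tgetstr(), which takes a capability name rather than a value and has no end-of-buffer argument. The buffer size of 256 is also an assumption.

```c
#include <stddef.h>
#include <string.h>

static char sbuf[256];      /* static storage shared by all string capabilities */

/* Copy `value` to *area, advance *area past the terminating NUL, and
   return a pointer to the stored copy. Returns NULL if it will not fit. */
char *store_cap(const char *value, char **area, const char *end)
{
    size_t len = strlen(value) + 1;
    char *start = *area;

    if ((size_t)(end - start) < len)
        return NULL;
    memcpy(start, value, len);
    *area = start + len;
    return start;
}
```

A caller sets a pointer to the start of sbuf once and passes its address on every call. After two calls, the first string sits at the start of sbuf, the second immediately follows it, and the pointer marks the next free byte. Nothing is ever freed, which is why the buffer has to outlive every use of the returned pointers.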

The termcap library also provides a function tputs(), which correctly sends screen control sequences to the display, including any needed padding. tputs() requires a pointer to a user-supplied function which can display a single character. The function prch() (Listing 1) invokes the macro putchar(). Although it is not used here, the termcap library includes one other function, tgoto(), which decodes the cm (cursor movement) string into the sequence that moves the cursor to a desired column and line. Because tgoto() will output tabs, programs which make tgoto() calls should turn off the TAB3 bit when setting the line protocol.

The function putout() in Listing 2 is not really necessary. It is used here to check for insertions of ^G (0x7) in the text files. ^G was chosen because it passes through nroff transparently. It is used to trigger expanded font in files sent to the printer. In Bold.c, it triggers the insertion of a space between characters to simulate expanded font.

Termcap can also be used to retrieve the sequences sent by non-ASCII keys, like the arrow or function keys. Although the termcap curses library does not use the arrow or function keys, the keys can be added to programs which use curses for screen control by making a second set of termcap calls (curses makes its own calls to termcap), and then reading for the arrow or function key sequences in a getkey() routine (see Listing 3, keys.c).

Reading arrow keys for terminals which use a single character code for each arrow (such as ^H, ^J, ^K, ^L) is simple, but many terminals, such as the PC console, send escape-prefaced strings (ESC[A, ESC[B, etc.) when the arrow keys or other non-ASCII keys are pressed. Some systems may balk at reading strings with a simple read() system call. It is worth fiddling with the VMIN and VTIME values in structure termio if you cannot read key sequences with the code in getkey(). The values in function fixquit() in Listing 3 are a good start.

The alternative is to put the strings together out of characters read one at a time. This may be the most reliable technique for an editor or other program that reads repeated sequences of fast input characters that might be misinterpreted, such as an ESC followed by a [ and an alphabetic character, which an ANSI terminal might interpret as a screen control sequence. The trick if you are reading a character at a time is to distinguish between a lone ESC (0x1b) and an ESC sent as the first character of an escape sequence. One technique is to set a timeout alarm. If you get the characters that would constitute a key string before the timeout, return the key string, otherwise return an ESC followed by individual characters. The whole procedure takes tinkering, and fast typists can foul it up. Hence, using a read() call is simpler.
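
The heart of the character-at-a-time approach is a three-way test on the characters collected so far: they form a complete key string, they could still grow into one, or they cannot be a key at all. The sketch below shows that test. The table of ANSI arrow sequences, the K_ names, and their octal values (chosen to mirror the usual curses key codes) are assumptions for the example; a real program would fill the table from the strings termcap returns.

```c
#include <stddef.h>
#include <string.h>

enum match { NO_MATCH, PARTIAL, FULL };

/* Hypothetical key codes for the example. */
enum { K_DOWN = 0402, K_UP = 0403, K_LEFT = 0404, K_RIGHT = 0405 };

struct keydef { const char *seq; int code; };

static const struct keydef keys[] = {
    { "\033[A", K_UP },    { "\033[B", K_DOWN },
    { "\033[C", K_RIGHT }, { "\033[D", K_LEFT },
};

/* Compare the len characters read so far against the key table.
   On FULL, *code receives the key code. */
enum match match_key(const char *buf, size_t len, int *code)
{
    enum match best = NO_MATCH;
    size_t i;

    for (i = 0; i < sizeof keys / sizeof keys[0]; i++) {
        size_t klen = strlen(keys[i].seq);

        if (len == klen && memcmp(buf, keys[i].seq, len) == 0) {
            *code = keys[i].code;
            return FULL;
        }
        if (len < klen && memcmp(buf, keys[i].seq, len) == 0)
            best = PARTIAL;         /* a prefix of some key string */
    }
    return best;
}
```

The reading loop appends each character and calls match_key(). On PARTIAL it waits for the next character with the alarm running; on NO_MATCH, or when the alarm fires, it hands back the buffered characters one at a time, so that a lone ESC still reaches the program.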

One problem that can arise with the arrow keys is that ^\, the UNIX "quit" character, is used as an arrow key on some terminals. Even if the "quit" signal is disabled, the keys will still be intercepted. The easiest fix to the problem is to change the "quit" key to an impossible value. The function fixquit() does this.
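
The terminal settings involved can be sketched as follows. This is not the code of fixquit(): it uses the POSIX termios structure rather than System V termio, and the function name, the VMIN value of 8 (taken as the longest key string expected), and the VTIME value of 1 are assumptions to be tuned for the system at hand.

```c
#include <termios.h>
#include <unistd.h>

#define MAX_KEY_LEN 8       /* hypothetical longest key string */

/* Adjust a settings structure for reading key strings with read(). */
void prepare_key_input(struct termios *t)
{
    /* deliver characters as typed, without line editing or echo */
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO);

    /* read() returns once MAX_KEY_LEN characters have arrived, or
       sooner if 0.1 second passes after a character with no successor */
    t->c_cc[VMIN] = MAX_KEY_LEN;
    t->c_cc[VTIME] = 1;

    /* no key generates "quit", so ^\ reaches the program as data */
    t->c_cc[VQUIT] = _POSIX_VDISABLE;
}
```

A program would fetch the current settings with tcgetattr(), keep a copy to restore on exit, pass the working copy to prepare_key_input(), and install it with tcsetattr(). With these values a single keystroke is returned after a tenth of a second, while the characters of an escape sequence, which arrive back to back, come out of one read() call.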

The global variable ttytype is set by the curses termcap routines, which in this program are called before lookupkeys(). The ttytype could be set by a call to getenv(), as in the code for Listing 1. The header file in Listing 4 (keys.h) defines integer equivalents for the arrow and function keys; these defines can be used in switch statements. (The values given are those used in the terminfo header files.)

What termcap cannot do is to optimize screen output by cutting down the overhead of repeated cursor movement sequences. The output routines in the curses library do a fair job and are simple to use. The code for life.c in Listing 5 uses these routines along with the arrow key routines from keys.c, and while the speed of output cannot compare with an optimized routine writing directly to screen memory, it is quick enough on a console or a terminal running at 19,200 baud.

Highly optimized screen output which requires even more efficiency could mean a journey into the treacherous code of screen display routines which calculate the cost of each move. One such package is the display routines in the Gosling Emacs code, which quite properly carries a dire warning to those who would venture into the tangles of the code.