SPELUNKING MS-DOS: DOCUMENTING THE UNDOCUMENTED

Scott Robert Ladd

Scott Ladd is a freelance computer journalist and the C language columnist for Micro Cornucopia. He can be reached at P.O. Box 61425, Denver, CO 82026.

Spelunking -- the art of exploring caves -- requires a sense of adventure, a willingness to follow hunches, and a knowledge of how caves are constructed. Often the spelunker is entering uncharted territory, and must explore dead ends and dangerous areas. Finding the cavern filled with glistening crystals makes all the effort worthwhile.

Exploring the undocumented internals of MS-DOS is a process similar to spelunking. Debuggers and disassemblers are the tools, and working DOS utilities are the mountains containing the caves to be explored. You begin with a guess as to where you might find something interesting, and then follow a chain of logic from there. Sometimes you hit a dead end and have to back up and try a different path. Often, though, you find a cave filled with wonders -- a function or feature of value.

This article details several useful, and sometimes essential, undocumented DOS functions and features I have uncovered while spelunking. Although I have confidence in my information, and have used these functions in my own programs, I recommend caution in using them. There are reasons why features are undocumented. In any version of MS-DOS some functions may disappear, or their behavior may change. Some OEM versions of MS-DOS may work differently than others. Using undocumented information can at times be dangerous, allowing you to manipulate or change certain aspects of the MS-DOS environment, which could lead to data loss or other problems.

So much for the obligatory warnings. In the text, I use the term function to refer to an MS-DOS service invoked through the INT 0x21 interrupt, and the term interrupt to refer to a specific MS-DOS software interrupt. I assume that you're familiar with MS-DOS, the Intel 80x86 family of microprocessors, and an assembler or a high-level language that can invoke software interrupts (such as Turbo Pascal or C).

All of this information works in DOS, Versions 2.11, 3.10, 3.21, and 3.30. Because some DOS utilities (such as PRINT) use these capabilities, you can hope that things won't change with later versions of MS-DOS.

The Program Segment Prefix

Every time DOS loads a program, it creates a data structure called the program segment prefix (PSP). This is a 256-byte area located in memory just below the actual program. The PSP contains the program's command-line arguments, file tables, and other associated information. Although the PSP itself is documented, many of its fields are not. Table 1 shows the format of the PSP, including those undocumented fields I currently know of.

Table 1: The PSP format

Field      Byte     Byte
No.       Offset   Length      Field Name

 1         0x00      2         CP/M exit
 2         0x02      2         Memory size
 3         0x04      1         Unknown
 4         0x05      5         Function dispatcher call
 5         0x0A      4         Saved DOS terminate vector
 6         0x0E      4         Saved Ctrl-Break vector
 7         0x12      4         Saved critical error vector
 8         0x16      2         Parent PSP segment (undocumented)
 9         0x18     20         Local file handle table (undocumented)
 10        0x2C      2         Local environment segment
 11        0x2E      4         Local stack storage (undocumented)
 12        0x32      2         File handle table size (undocumented)
 13        0x34      4         Address (segment:offset) of local file
                               handle table (undocumented)
 14        0x38     24         Unknown
 15        0x50      3         MS-DOS function call
 16        0x53      9         Unknown
 17        0x5C     16         Default file control block 1
 18        0x6C     16         Default file control block 2
 19        0x7C      4         Unknown
 20        0x80      1         Command tail length
 21        0x81    127         Command tail
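As a rough sketch, the layout in Table 1 can be written down as a C structure. The member names are invented here for illustration and do not come from any Microsoft header. The fields have to be contiguous to fill exactly 256 bytes, so this sketch places the two large unknown regions at 0x38 (24 bytes) and 0x53 (9 bytes). Every member is an array of unsigned char, which keeps the compiler from padding the structure, and multi-byte values are read with a small helper instead of relying on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-for-byte model of the 256-byte PSP described in Table 1.
   Member names are illustrative only; the comments give the offsets. */
struct psp {
    unsigned char cpm_exit[2];          /* 0x00 */
    unsigned char memory_size[2];       /* 0x02 */
    unsigned char unknown_04[1];        /* 0x04 */
    unsigned char dispatcher_call[5];   /* 0x05 */
    unsigned char terminate_vector[4];  /* 0x0A */
    unsigned char break_vector[4];      /* 0x0E */
    unsigned char critical_vector[4];   /* 0x12 */
    unsigned char parent_psp[2];        /* 0x16 */
    unsigned char handle_table[20];     /* 0x18 */
    unsigned char environment_seg[2];   /* 0x2C */
    unsigned char stack_storage[4];     /* 0x2E */
    unsigned char handle_count[2];      /* 0x32 */
    unsigned char handle_table_ptr[4];  /* 0x34 */
    unsigned char unknown_38[24];       /* 0x38 */
    unsigned char dos_call[3];          /* 0x50 */
    unsigned char unknown_53[9];        /* 0x53 */
    unsigned char fcb1[16];             /* 0x5C */
    unsigned char fcb2[16];             /* 0x6C */
    unsigned char unknown_7c[4];        /* 0x7C */
    unsigned char tail_length[1];       /* 0x80 */
    unsigned char tail[127];            /* 0x81 */
};

/* Read a 16-bit word stored low byte first (the 80x86 byte order). */
unsigned psp_word(const unsigned char *field)
{
    assert(field != NULL);
    return (unsigned)field[0] | ((unsigned)field[1] << 8);
}
```

Given a pointer p to a copy of a PSP, psp_word(p->parent_psp) would yield the parent's PSP segment, and psp_word(p->environment_seg) the environment segment.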

Let's look at each of the known fields individually:

Field 1 (CP/M exit) -- This is one of several holdovers from the venerable CP/M operating system. Any program that wishes to exit can do so by jumping to offset 0 of the PSP. This is not a recommended way for a program to exit; use function 0x4C of INT 0x21 instead.

Field 2 (Memory size) -- This number represents the number of paragraphs (16-byte chunks) of memory available to the program. It can be handy to check this number if you have a memory-intensive program, just to be sure you have enough room.

Field 4 (Function dispatcher call) -- A FAR CALL to the MS-DOS function dispatcher, which is normally accessed through a software interrupt (INT 0x21).

Fields 5 through 7 (Saved interrupt vectors) -- MS-DOS stores the values of these important interrupt vectors when a program starts up, and then restores them from this area when the program exits. This is insurance in case the program modifies these vectors without restoring them later.

Field 8 (Parent PSP segment) -- Here we find the PSP segment of the parent program, which is normally COMMAND.COM.

Field 9 (Local file handle table) -- If you've ever wondered why a program is limited to only 20 open files, this area explains it. This table contains 20 1-byte entries, each of which is an index into the internal MS-DOS file handle table. A value of 0xFF in the local table indicates that the corresponding handle has not been opened. The first five entries are assigned the default standard input, standard output, standard error, standard aux, and standard printer handles. This explains how MS-DOS does I/O redirection -- by placing the "internal" handle of a file in, say, standard input, rather than the handle that represents the CRT. Note that you can (carefully) change these values, doing your own I/O redirection from inside your program.
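The bookkeeping behind that kind of redirection can be sketched on an in-memory copy of a PSP. The function and macro names below are hypothetical, and the code only models the 20-byte table; it makes no DOS calls and says nothing about what the internal MS-DOS table must look like for the change to be safe.

```c
#include <assert.h>

#define PSP_HANDLE_TABLE_OFFSET 0x18  /* field 9 in Table 1 */
#define PSP_HANDLE_COUNT        20
#define PSP_HANDLE_UNUSED       0xFF

/* Count the local handles currently in use in a PSP image. */
int psp_open_handles(const unsigned char *psp)
{
    int i, n = 0;
    assert(psp != 0);
    for (i = 0; i < PSP_HANDLE_COUNT; i++)
        if (psp[PSP_HANDLE_TABLE_OFFSET + i] != PSP_HANDLE_UNUSED)
            n++;
    return n;
}

/* Point local handle `dst` at whatever local handle `src` refers to.
   Returns the previous entry so the caller can put it back later. */
unsigned char psp_redirect(unsigned char *psp, int dst, int src)
{
    unsigned char *table;
    unsigned char old;
    assert(psp != 0);
    assert(dst >= 0 && dst < PSP_HANDLE_COUNT);
    assert(src >= 0 && src < PSP_HANDLE_COUNT);
    table = psp + PSP_HANDLE_TABLE_OFFSET;
    old = table[dst];
    table[dst] = table[src];
    return old;
}
```

Redirecting standard output (local handle 1) to an already open file would then be psp_redirect(psp, 1, file_handle), with the returned byte written back into entry 1 once the redirection is no longer wanted.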

Field 10 (Local environment segment) -- This word is used to find the local copy of the environment, which MS-DOS creates when starting the program.

Field 11 (Local stack storage) -- When an MS-DOS interrupt is invoked, the stack segment and pointer of the program are stored here, while MS-DOS switches to an internal stack. When the interrupt is done, the program's stack information is restored.

Field 12 (File handle table size) -- Contains the total number of entries in the local file handle table.

Field 13 (Address of local file handle table) -- Changing this value seems to have no effect in versions of MS-DOS prior to 3.30. Version 3.30 adds a documented function (0x67) that expands the local file handle table: it allocates a new memory block to hold the new table via function 0x48 and, if successful, stores the address of that block here.

Field 15 (MS-DOS function call) -- Yet another way of getting to the MS-DOS function dispatcher, these three bytes contain the instructions INT 0x21 and FAR RET.

Fields 17 and 18 (Default file control blocks, or FCBs) -- These are remnants of CP/M compatibility, and are virtually useless. The first two command-line parameters are parsed into this area under the (usually false) assumption that they are file names. Apparently the undocumented areas on either side of these FCBs are used by the extended versions of FCBs. Starting with MS-DOS Version 2.x, the only real use of FCBs is to access special files, such as volume labels.

Field 20 (Command tail length) -- Contains the length of the command tail.

Field 21 (Command tail) -- The command tail is a modified version of the command line used to start the program. It begins with the first space after the program name and contains everything up to and including the carriage return. Any I/O redirection operations (for example, > output.dat) are removed. It is terminated by a null (0 byte). Although many programming languages (most notably C) automatically parse the command line for you, it is sometimes necessary to parse the command tail yourself for special purposes.
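A minimal sketch of pulling the command tail out of a PSP image follows. The function name is hypothetical. As a defensive choice in this sketch, copying stops at whichever comes first: the count in field 20, a carriage return, or a null byte, so the result does not depend on any single terminator being present.

```c
#include <assert.h>
#include <stddef.h>

#define PSP_TAIL_LENGTH_OFFSET 0x80  /* field 20 */
#define PSP_TAIL_OFFSET        0x81  /* field 21 */
#define PSP_TAIL_MAX           127

/* Copy the command tail from a PSP image into `out` as a C string.
   Leading blanks and tabs are skipped. Returns the number of
   characters stored, not counting the terminating null. */
size_t psp_command_tail(const unsigned char *psp, char *out, size_t out_size)
{
    const unsigned char *tail;
    size_t len, i = 0, n = 0;

    assert(psp != NULL && out != NULL && out_size > 0);

    tail = psp + PSP_TAIL_OFFSET;
    len = psp[PSP_TAIL_LENGTH_OFFSET];
    if (len > PSP_TAIL_MAX)
        len = PSP_TAIL_MAX;          /* distrust a corrupt length byte */

    while (i < len && (tail[i] == ' ' || tail[i] == '\t'))
        i++;

    for (; i < len && n + 1 < out_size; i++) {
        if (tail[i] == '\r' || tail[i] == '\0')
            break;
        out[n++] = (char)tail[i];
    }
    out[n] = '\0';
    return n;
}
```

The result is an ordinary C string that can be handed to whatever special-purpose parser the program needs.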

TSR Functions

Everyone these days seems to be writing terminate-and-stay-resident (TSR) utilities. These programs range from simple time-display utilities up to system tyrants, such as SideKick Plus. I've written several myself, and have found that well-behaved TSRs must use undocumented MS-DOS features.

A well-behaved TSR is one which gracefully and cooperatively co-exists with MS-DOS, other TSRs, and regular applications. Unfortunately, Microsoft has never published the complete details of how to go about writing such a program.

Many TSRs are designed to be "popup" utilities, which open a processing window whenever a special "hot key" is pressed. Such programs provide a limited form of multiprocessing, wherein the user can call up a TSR function (such as a spelling checker or notepad) from within any application. But MS-DOS is not re-entrant; that is, it cannot be interrupted while performing a critical operation such as writing to the disk. Because a TSR is activated by an asynchronous event (such as a timer tick or a keystroke), it is vital for the TSR to know when it can't interrupt MS-DOS.

MS-DOS maintains a special undocumented variable known popularly as the INDOS flag. When MS-DOS is doing something critical, it increments this one-byte flag. Thus MS-DOS can be safely interrupted when the flag is zero. The location of this flag can be found through function 0x34, which has these particulars:

Function 0x34: Get INDOS Flag Address

     Interrupt:0x21      Function:0x34      AH = 0x34

Registers upon return:

     ES = segment of INDOS flag      BX = offset of INDOS flag

Using ES:BX as a pointer, a program can check that it is safe to interrupt MS-DOS. Not only is this useful for TSRs, but it is also necessary for multitasking programs.
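To make the pointer arithmetic concrete, here is a small model of testing the flag through a segment:offset pair. It is only a sketch: `memory` is a caller-supplied array standing in for real-mode memory, all function names are invented for illustration, and the flag is read as a single byte at the returned address. On a real DOS system the compiler's far-pointer facility would be used instead of an array.

```c
#include <assert.h>
#include <stddef.h>

/* Real-mode address arithmetic: physical = segment * 16 + offset. */
unsigned long linear_address(unsigned segment, unsigned offset)
{
    return ((unsigned long)(segment & 0xFFFFu) << 4) + (offset & 0xFFFFu);
}

/* Read the byte at segment:offset from `memory`, an image of
   real-mode memory supplied by the caller (hypothetical stand-in). */
unsigned char peek_far(const unsigned char *memory, size_t size,
                       unsigned segment, unsigned offset)
{
    unsigned long addr = linear_address(segment, offset);
    assert(memory != NULL && addr < size);
    return memory[addr];
}

/* Returns 1 when the INDOS flag at ES:BX is nonzero, meaning
   MS-DOS should not be interrupted; returns 0 otherwise. */
int dos_is_busy(const unsigned char *memory, size_t size,
                unsigned es, unsigned bx)
{
    return peek_far(memory, size, es, bx) != 0;
}
```

A TSR would call function 0x34 once during initialization, save ES and BX, and then test the flag each time it is about to become active.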

One complexity with using the INDOS flag is that it is set to a value of 1 while MS-DOS is waiting at the command prompt. Obviously MS-DOS isn't really doing anything --it's just waiting for input. Luckily we have another undocumented facility at our disposal: the idle interrupt.

Whenever MS-DOS gets bored and has nothing else to do, it invokes an INT 0x28. Programs can chain to this interrupt, and know when they receive it that MS-DOS is not doing anything important. When I write a TSR that is activated from the keyboard, I capture both the hardware keyboard interrupt (INT 0x09) and INT 0x28. When I get the INT 0x09, I examine the INDOS flag to see if DOS is busy before I do anything. Whenever I see an INT 0x28, I ignore the INDOS flag and just look to see if the hot key has been pressed.
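The two-handler rule just described reduces to a small decision function, sketched below. The names are hypothetical, and the sketch assumes the keyboard handler records a hot-key press in a flag (`hotkey_pending`) so that the idle handler can act on it later; the text does not spell out that bookkeeping.

```c
#include <assert.h>

/* Which of the two captured interrupts is asking. */
enum tsr_event {
    TSR_EVENT_KEYBOARD,  /* INT 0x09: a key was pressed   */
    TSR_EVENT_IDLE       /* INT 0x28: MS-DOS says it is idle */
};

/* Decide whether the TSR may pop up right now.
   indos          - current value of the INDOS flag
   hotkey_pending - nonzero once the hot key has been seen */
int tsr_may_pop_up(enum tsr_event event, unsigned indos, int hotkey_pending)
{
    if (!hotkey_pending)
        return 0;                 /* nothing was requested */

    switch (event) {
    case TSR_EVENT_KEYBOARD:
        return indos == 0;        /* only safe if DOS is not busy */
    case TSR_EVENT_IDLE:
        return 1;                 /* INDOS is ignored during INT 0x28 */
    }

    assert(!"unknown TSR event");
    return 0;
}
```

With this rule, a hot key pressed at the command prompt (where the INDOS flag is 1) is refused by the keyboard handler but accepted on the next INT 0x28.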

MS-DOS tracks (internally) pieces of information about the currently active program. When a TSR is activated, it is necessary to change MS-DOS's information so that the TSR is not using the resources of the currently active application.

One resource of particular interest is the program segment prefix, discussed earlier. MS-DOS assumes that the program most recently loaded is the currently active program and stores this PSP in an internal variable. For example, when you open a file, MS-DOS uses the handle table in the PSP of the most recently executed program.

This can cause some subtle problems when using TSRs. When a TSR is activated, we need to save the current PSP, and tell MS-DOS to use the PSP of the TSR. Otherwise, when the TSR uses a PSP resource (such as in opening a file), it could corrupt the PSP of the program most recently loaded. This could fill up the most recent application's file handle table or crash its stack.

There are ways around this problem. MS-DOS function 0x50 sets the current PSP, and functions 0x51 and 0x62 retrieve the PSP of the current program. These functions are accessed as follows:

Function 0x50: Set Current PSP

     Interrupt: 0x21      Function: 0x50

Registers on entry:

     AH = 0x50      BX = segment of the PSP to be made current

Registers on return:

     None

Functions 0x51 and 0x62: Get PSP Segment

     Interrupt: 0x21      Function: 0x51 (undocumented, MS-DOS 2.x and later)
                                    0x62 (documented, MS-DOS 3.x and later)

Registers on entry:

     AH = 0x51 or 0x62

Registers on return:

     BX = current PSP segment

One of the first acts of a TSR during its initialization should be to retrieve and store internally the segment of its own PSP. Some high-level languages (for example, Turbo Pascal and most C compilers) have a global variable which is automatically set to the segment of the PSP. Either way, the segment should be retrieved before the TSR becomes resident.

Why are there two functions used to retrieve the current PSP? Well, function 0x51 existed beginning with MS-DOS 2.0 and had a serious bug when used with an INT 0x28 handler (it did not restore the stack properly). Function 0x62 was introduced with MS-DOS 3.0 and does not have the bug. Why Microsoft didn't just fix the bug with 0x51, rather than adding a new function, is anybody's guess.
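The save, switch, and restore sequence can be sketched as follows. This is a model, not DOS code: `struct dos_state` is a hypothetical stand-in for the internal variable MS-DOS keeps, and dos_get_psp and dos_set_psp stand in for the INT 0x21 calls (functions 0x51 or 0x62, and 0x50) that a real TSR would issue through its compiler's interrupt facility.

```c
#include <assert.h>

/* Stand-in for the one piece of DOS state involved here:
   the segment of the PSP that MS-DOS considers current. */
struct dos_state {
    unsigned current_psp;
};

/* Function number for "get PSP": 0x62 on MS-DOS 3.x and later,
   otherwise the undocumented 0x51. */
unsigned get_psp_function(unsigned dos_major)
{
    return dos_major >= 3 ? 0x62u : 0x51u;
}

/* Models functions 0x51/0x62: return the current PSP segment. */
unsigned dos_get_psp(const struct dos_state *dos)
{
    assert(dos != 0);
    return dos->current_psp;
}

/* Models function 0x50: make `segment` the current PSP. */
void dos_set_psp(struct dos_state *dos, unsigned segment)
{
    assert(dos != 0);
    dos->current_psp = segment;
}

/* On activation: remember the interrupted program's PSP,
   then switch MS-DOS over to the TSR's own PSP. */
unsigned tsr_enter(struct dos_state *dos, unsigned tsr_psp)
{
    unsigned saved = dos_get_psp(dos);
    dos_set_psp(dos, tsr_psp);
    return saved;
}

/* On exit: hand the PSP back to the interrupted program. */
void tsr_leave(struct dos_state *dos, unsigned saved_psp)
{
    dos_set_psp(dos, saved_psp);
}
```

The TSR brackets its work with tsr_enter and tsr_leave, passing tsr_enter the PSP segment it stored during initialization, so that any files it opens land in its own handle table.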

Conclusion

I've covered some of the more useful and interesting undocumented (or poorly documented) functions and features of MS-DOS. The PSP is an integral part of a program, and proper use of its resources can enhance and extend the performance of a program.

As for the TSR information I presented earlier, it seems odd that Microsoft would document how to make a program resident, but not how to make it perform in a well-behaved manner! If you're writing a TSR, you can use this information to avoid the numerous pitfalls involved in designing memory-resident programs.

Bibliography

MS-DOS 2.1 Programmer's Reference. Redmond, Wash.: Microsoft, 1983.

Duncan, Ray. Advanced MS-DOS. Redmond, Wash.: Microsoft Press, 1986.

Ladd, Scott Robert. "A Turbo TSR." BYTE 13(7) (July 1988): 301-304.

Young, Michael J. Performance Programming Under MS-DOS. Alameda, Calif.: Sybex, 1987.