March 1994/ROMLDR, an Embedded System Program Locator

Real-Time/Embedded Systems

ROMLDR, an Embedded System Program Locator

Charles B. Allison

Charles Allison has been working with microprocessor hardware and firmware in embedded systems since 1976. He has a Bachelor of Science degree in physics and a Master of Business Administration degree. Charles has a microprocessor consulting business, Allison Technical Services, where he has been developing embedded control and monitoring products for clients since 1984. Charles can be reached through CompuServe 71005,1502, his BBS/FAX line at (713)-777-4746 or his company, ATS, 8343 Carvel, Houston, TX 77036.
MS-DOS software development tools such as C compilers and debuggers have become marvelously sophisticated and useful. They can also provide a cost-effective means for developing embedded systems. The purpose of this article is to introduce one of the aspects of using a high performance DOS-based C compiler for embedded systems, that of relocating code and data segments. I provide a program, ROMLDR, which can modify a program from the MS-DOS EXE format to a located binary file format.
ROMLDR's principal purpose is to adapt DOS-based C compiler output for use in EPROM-based embedded systems. Other uses include BIOS extensions, EPROM-based MS-DOS applications, and relocation of MS-DOS programs in PC memory, such as above the 640k MS-DOS memory limit. ROMLDR can also be used with EXE files generated by languages other than C.
ROMLDR was written using Borland C 3.1. It should be possible to build ROMLDR, and the embedded programs it processes, with either Borland Turbo C or Microsoft C. However, I have tested ROMLDR only with the Borland C 3.1 compiler.

Embedding DOS-Compiled Programs
To use DOS-compiled programs in EPROM-based embedded systems, you must provide several additional components, as well as resolve some unique programming issues. A number of the required components, such as the startup code, depend on the compiler, target hardware, and application. (See the sidebar "Coding for Embedded Applications" for a brief discussion on embedded code requirements.)
Once you have attended to all these details, you can program, compile, and link the various program files into an MS-DOS EXE file. However, you can't just burn your EXE file into EPROM and go. You must explicitly perform a step that MS-DOS performs implicitly when it loads an EXE file.
MS-DOS modifies programs with the EXE extension when it loads them into memory. This modification, often referred to as a fix up (also known as the locate function), consists of adjusting the program's segment values to match the actual memory address where it is to run. By performing fix ups, DOS can load an executable almost anywhere in real-mode memory. (DOS can load small programs with the COM extension anywhere and run them without modification.)
DOS performs fix ups by adding the file's load-address segment value to all segment addresses stored in the code and data areas of the program.
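The arithmetic of a single fix up can be sketched in a few lines of C. This is a minimal illustration, not code from ROMLDR: the function name apply_fixup and its parameters are hypothetical, and the program image is assumed to be held in an ordinary byte buffer with segment words stored low byte first, as on the 80x86.

```c
#include <stdint.h>
#include <stddef.h>

/* Add load_seg to the 16-bit segment word stored at byte offset pos
   in the program image. The word is little-endian, as on the 80x86.
   apply_fixup is a hypothetical name used only for this sketch. */
void apply_fixup(uint8_t *image, size_t pos, uint16_t load_seg)
{
    uint16_t seg = (uint16_t)(image[pos] | (image[pos + 1] << 8));

    seg = (uint16_t)(seg + load_seg);      /* wraps modulo 0x10000 */
    image[pos]     = (uint8_t)(seg & 0xFF);
    image[pos + 1] = (uint8_t)(seg >> 8);
}
```

For example, a stored segment value of 0x1234 combined with a load segment of 0x1000 becomes 0x2234.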
Unlike DOS programs, most embedded systems programs reside in and execute from EPROM. Embedded system locators must perform fix ups prior to placing the program in EPROM and must take into account that variables will be located in RAM, though their initialized values for startup are still in EPROM.

Locating for Embedded Systems
Before I present my program in detail, I want to outline the major parts of the location process. I will first describe the structure of a DOS EXE file, and how that structure is reflected in my code. Next, I will describe the location process.

EXE File Format
The EXE file consists of several sections, including a header, program code, data initialization values, and optional program debug information. Refer to structure EXE_HDR at the beginning of Listing 1 for the layout and definitions of the various parameters. The header section consists of several parameters which define the size of the file and the size of the header. Following these parameters is a section of fix-up far pointers.
Each of these pointers targets a location in the program code containing a segment address value that must be modified with the correct load address. These pointers are stored in standard 80x86 segment:offset format relative to the beginning of the code section. The pointers' segment values are derived from the MAP file segment table by using the top four hexadecimal digits from the beginning address listed for each segment. (MAP files are generated by the compiler.)
There are num_reloc fix-up pointers in the header section. These pointers begin at the offset off_reloc from the beginning of the header. (Note that fix-up pointers may not be sorted by address as they occur in the EXE file.) Following the header is the program's code section. This section consists of one or more separate segments, the number and type of which depend on the program and its memory model. Following the code is the initialized data section. Then comes the uninitialized data section, and finally the stack.
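The header layout described above can be sketched as follows. This is an illustration under stated assumptions rather than a copy of EXE_HDR from Listing 1: the byte offsets are those of the standard MS-DOS EXE header, the field names num_reloc, off_reloc, disp_stack_seg, sp, rel_cs_seg, and ip follow the article, and the names exe_info, hdr_paras, rd16, parse_exe_header, and fixup_target are hypothetical. Fields are read byte by byte so the sketch does not depend on structure packing.

```c
#include <stdint.h>
#include <stddef.h>

/* Selected EXE header fields (exe_info and hdr_paras are assumed names). */
typedef struct {
    uint16_t num_reloc;       /* number of fix-up pointers            */
    uint16_t hdr_paras;       /* header size in 16-byte paragraphs    */
    uint16_t disp_stack_seg;  /* initial SS, relative to load segment */
    uint16_t sp;              /* initial SP                           */
    uint16_t ip;              /* initial IP                           */
    uint16_t rel_cs_seg;      /* initial CS, relative to load segment */
    uint16_t off_reloc;       /* offset of fix-up table in the header */
} exe_info;

/* Read a little-endian 16-bit word. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Extract the fields above from a raw header; returns 0 on success. */
int parse_exe_header(const uint8_t *hdr, size_t len, exe_info *out)
{
    if (len < 28 || hdr[0] != 'M' || hdr[1] != 'Z')
        return -1;                        /* too short or bad signature */
    out->num_reloc      = rd16(hdr + 0x06);
    out->hdr_paras      = rd16(hdr + 0x08);
    out->disp_stack_seg = rd16(hdr + 0x0E);
    out->sp             = rd16(hdr + 0x10);
    out->ip             = rd16(hdr + 0x14);
    out->rel_cs_seg     = rd16(hdr + 0x16);
    out->off_reloc      = rd16(hdr + 0x18);
    return 0;
}

/* Linear offset, from the start of the code section, of the location
   targeted by fix-up pointer n. Each table entry is four bytes:
   the offset word first, then the segment word. */
long fixup_target(const uint8_t *hdr, const exe_info *h, unsigned n)
{
    const uint8_t *p = hdr + h->off_reloc + 4u * n;

    return ((long)rd16(p + 2) << 4) + rd16(p);
}
```

The code section itself begins hdr_paras * 16 bytes into the file, so a fix-up target's position in the file is that value plus the linear offset returned by fixup_target.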

The Location Process
Once it has loaded a program into memory, the MS-DOS loader adds the code section's segment address to the segment value stored in each location requiring a fix up. The loader finds these locations by dereferencing each fix-up pointer in the header. MS-DOS sets the CPU's stack registers to disp_stack_seg:sp and calls the program at address rel_cs_seg:ip. (Note: For the sake of illustration, I use disp_stack_seg, sp, rel_cs_seg, and ip to represent values stored at specific offsets within the EXE header. By referring to fields of the same name in my struct, EXE_HDR, you can see where these values are stored in the header.) MS-DOS also provides some environment and header information to the loaded program through register contents.
Most, but not quite all, of the information necessary to generate rommable absolute binary files already exists in the EXE file. The rest of the information must come from the segment data in the compiler's MAP file and from configuration information provided by the user in a loader configuration file.

Program Description
ROMLDR uses the linked EXE file and its MAP file to create a binary file that can be programmed into EPROMs.
Function main begins by allocating a far buffer to contain program segments. This buffer should be 0x10000 bytes in length to ensure that any size code segment can be processed. (While testing from within the Borland IDE, I had to reduce the buffer's size significantly due to memory constraints.) A simple error routine, term_error, generates error messages for a variety of potential problems, and provides for program termination.
After allocating a buffer, ROMLDR reads the configuration file (CFG) specified on the command line. This file contains the names for the EXE, MAP, and BIN output files, EPROM and RAM hexadecimal load addresses, and the class name of the first RAM segment. Table 1 shows the CFG file format. CFG file parameters must be located on separate lines and separated by spaces. On each line, ROMLDR ignores any characters occurring after the list of required parameters.

Reading the MAP File
ROMLDR executes a while loop to read and process lines of text from the MAP file. (The MAP file used with ROMLDR should be the short version, which contains only the segment table.) ROMLDR extracts memory allocation class names, plus their starting and ending addresses, and stores them in an array of structures called maptable. ROMLDR performs a simple length check using configuration variable class_loc to determine if the current line contains segment information. ROMLDR expects the line to be in a fixed column format, with the class name occurring at offset class_loc. ROMLDR converts address values to long integers, and compares class names with ram_class, a configuration variable used to define the beginning of RAM. The beginning segment address is stored in ramdata. ROMLDR currently will process a maximum of 120 segments.
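Extracting one segment-table entry from a MAP line might look like the sketch below. It deliberately swaps in sscanf for the fixed-column test with class_loc that ROMLDR uses, and it assumes a line layout of three hexadecimal values with an H suffix followed by the segment name and class name. The names map_entry, parse_map_line, and map_segment are hypothetical.

```c
#include <stdio.h>

/* One entry of the segment table (map_entry is an assumed name). */
typedef struct {
    unsigned long start;      /* beginning address, 20-bit linear */
    unsigned long stop;       /* ending address                   */
    char name[32];            /* segment name                     */
    char class_name[32];      /* memory allocation class name     */
} map_entry;

/* Parse a line such as " 00000H 01A2FH 01A30H _TEXT  CODE".
   Returns 0 if the line held segment information, -1 otherwise. */
int parse_map_line(const char *line, map_entry *e)
{
    unsigned long length;     /* read but not kept in this sketch */
    int n = sscanf(line, " %lxH %lxH %lxH %31s %31s",
                   &e->start, &e->stop, &length,
                   e->name, e->class_name);

    return n == 5 ? 0 : -1;
}

/* Segment value: the top four hex digits of the five-digit address. */
unsigned map_segment(const map_entry *e)
{
    return (unsigned)(e->start >> 4);
}
```

Heading and blank lines fail the conversion and are rejected, which plays the same role as the length check in ROMLDR.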

Processing the Header
Once ROMLDR has acquired the MAP file, it reads the EXE header portion via function gethdr. This function first reads in 32 bytes of the header to determine the header's size and then loads the rest of the header. This function stores header information in an array named header which can contain a maximum of 10,000 bytes.
ROMLDR then sorts the fix-up pointers into ascending address order. Function sort_table uses qsort to sort the pointers. Function cmp_ptr, supplied as an argument to qsort, compares values for qsort by converting pointers from segment:offset to long integer form.
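A comparison function of this kind can be sketched as follows. This is not the cmp_ptr from the listing: the names fixup_ptr, linear, cmp_fixup, and sort_fixups are hypothetical, and the table entries are assumed to hold the offset word followed by the segment word.

```c
#include <stdlib.h>
#include <stdint.h>

/* One fix-up pointer as stored in the EXE header (assumed name). */
typedef struct {
    uint16_t off;
    uint16_t seg;
} fixup_ptr;

/* Convert segment:offset to a long linear address. */
static long linear(const fixup_ptr *p)
{
    return ((long)p->seg << 4) + p->off;
}

/* qsort comparison: order entries by ascending linear address. */
static int cmp_fixup(const void *a, const void *b)
{
    long la = linear(a);
    long lb = linear(b);

    return (la > lb) - (la < lb);   /* -1, 0, or 1 without overflow */
}

void sort_fixups(fixup_ptr *tab, size_t n)
{
    qsort(tab, n, sizeof tab[0], cmp_fixup);
}
```

Comparing linear addresses rather than raw segment and offset words matters because two different segment:offset pairs can name the same location.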
After sorting the pointers, ROMLDR performs fix ups on each segment, using a for loop to iterate through all segments. Function read_segm reads each segment from the EXE file and returns the segment size. If the segment length is non-zero, ROMLDR calls function fix_segm to run through any fix ups needed for the segment and then calls write_segm to output the processed segment to the BIN file. (A logical enhancement to ROMLDR would be to output the segments in a standard hex format, such as Intel hex format.)
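The benefit of sorting first is that each segment buffer can be patched in a single forward pass over the fix-up list. The sketch below shows one way to do that. It is not the fix_segm from the listing: the name patch_segment and all of its parameters are hypothetical, the fix-up targets are assumed to be already converted to linear offsets, and a single adjustment value delta is applied to every word in the segment.

```c
#include <stdint.h>
#include <stddef.h>

/* Patch every fix-up target that lies wholly inside the segment buffer.
   buf holds seg_len bytes starting at linear offset seg_start.
   targets[] holds n linear offsets sorted in ascending order, and *next
   is the index of the first target not yet consumed. Targets below
   seg_start are consumed without being patched. Returns the number of
   words patched. */
size_t patch_segment(uint8_t *buf, long seg_start, long seg_len,
                     const long *targets, size_t n, size_t *next,
                     uint16_t delta)
{
    size_t patched = 0;
    long seg_end = seg_start + seg_len;     /* one past the last byte */

    while (*next < n && targets[*next] + 2 <= seg_end) {
        long pos = targets[*next] - seg_start;

        if (pos >= 0) {
            uint16_t v = (uint16_t)(buf[pos] | (buf[pos + 1] << 8));

            v = (uint16_t)(v + delta);
            buf[pos]     = (uint8_t)(v & 0xFF);
            buf[pos + 1] = (uint8_t)(v >> 8);
            patched++;
        }
        (*next)++;
    }
    return patched;
}
```

Because *next is carried from one call to the next, the caller can walk the segments in order and never rescan fix ups that earlier segments have already consumed.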

How ROMLDR Handles EPROM and RAM
EPROM segments require different modifications than RAM segments. ROMLDR treats all segments at or below the class named ENDCODE as EPROM and treats those above ENDCODE as RAM. The location process currently terminates on reaching the last segment, the STACK class. (The BIN file, however, needs only to contain code and data segments up to the last initialized data value.)
The ENDCODE class name is special for another reason. ROM-based systems typically must transfer initialized data from EPROM to RAM. Therefore, ROMLDR will modify all references to data segments to refer to the RAM locations and not to the initial values located in the EPROM. (The location for the initial values must be used in the startup code so that they can be transferred to RAM.) Segment ENDCODE is used for this purpose. I make the ENDCODE segment's length less than 16 bytes, locate it on a paragraph boundary, and ensure that the beginning data segment is also aligned by paragraph. As a result, the beginning ROM location for initialized data becomes ENDCODE+1. Since ENDCODE's address is less than the beginning of the RAM segment, it will refer to the ROM address just below the initialized data values.
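The ENDCODE+1 relationship is simple paragraph arithmetic, sketched below. The function name init_data_rom_seg and its parameters are hypothetical. The sketch assumes, as described above, that ENDCODE is shorter than 16 bytes and paragraph aligned, that the initialized data follows on the next paragraph, and that ENDCODE receives the EPROM fix up like any other code-side segment.

```c
#include <stdint.h>

/* Segment of the initialized data image in EPROM.
   endcode_seg is ENDCODE's segment value from the MAP file
   and rom_base is the EPROM load segment. */
uint16_t init_data_rom_seg(uint16_t endcode_seg, uint16_t rom_base)
{
    uint16_t endcode_rom = (uint16_t)(endcode_seg + rom_base);

    return (uint16_t)(endcode_rom + 1);   /* next 16-byte paragraph */
}
```

With ENDCODE at segment 0x01A3 and an EPROM load segment of 0xF000, the startup code would copy initial values starting from segment 0xF1A4.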
ROMLDR modifies EPROM data by adding the configuration file's EPROM segment address, stored in romsadr, to the code's existing segment value. (A more sophisticated version of the program could offer the option of several user-defined addresses and the names of the classes that would reside in each.)
ROMLDR modifies RAM segments by first subtracting out the value of the first RAM segment and then adding the configuration file's RAM segment location value. This method allows the RAM locations to begin at the configuration-defined starting value. The subtraction was not necessary for code segments since they began with a zero segment value.
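The two adjustment rules can be put side by side in one function. This is a minimal sketch, not code from the listing: the name relocate_seg and its parameters are hypothetical, with first_ram_seg standing in for the value the article calls ramdata and rom_base for romsadr.

```c
#include <stdint.h>

/* Adjust one segment value found at a fix-up location.
   Values below first_ram_seg refer to EPROM, the rest to RAM. */
uint16_t relocate_seg(uint16_t seg, uint16_t first_ram_seg,
                      uint16_t rom_base, uint16_t ram_base)
{
    if (seg < first_ram_seg)
        return (uint16_t)(seg + rom_base);            /* EPROM side */

    /* RAM side: rebase so the first RAM segment lands on ram_base. */
    return (uint16_t)(seg - first_ram_seg + ram_base);
}
```

With the first RAM segment at 0x01A4, an EPROM base of 0xF000, and a RAM base of 0x0040, code segment 0x0010 becomes 0xF010, while data segment 0x01B0 becomes 0x004C.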

Example Code
Listing 2 is DEMO.C, a typical "Hello World" program with some added items to provide examples of values for several segment classes. Listing 3, DEMO.MAP, is the map file generated for DEMO.C using the example startup code in Listing 4 instead of the standard Borland startup code. Note that the _INIT_ segment contains some values which are addresses of library initialization routines that should be called in order of priority. The example startup code does not yet include this section or the interrupt vector initialization section. Examples of these can be found in your compiler's startup code. The sidebar "Startup Code for Embedded Systems" provides a discussion of startup code requirements.

ROMLDR Versus Commercial Products
ROMLDR has several shortcomings when compared to commercial locator packages. When you use ROMLDR you must provide startup code for your application; commercial products usually provide the basic startup code required as well as code solutions for a variety of problems which must be overcome in various embedded configurations. ROMLDR does not provide debugging support, but commercial products usually provide some capabilities for the debugging of application programs. Finally, vendors of commercial locator packages often provide technical support; when you use a non-commercial package such as ROMLDR you must solve all problems on your own.

Conclusions
While ROMLDR is intended primarily as a learning tool and an introduction to embedded systems, it can prove useful for some low-end applications. ROMLDR should be able to handle straightforward applications where there is one EPROM and one RAM memory space. It can easily be modified for more complex configurations, especially those which have specific fixed requirements.
Embedded systems often monitor and control equipment other than normal computer peripherals. Embedded systems programmers must be extra cautious, since bugs in their programs can place property and lives at risk. In this situation, there is no substitute for understanding both the application and the tools. Understanding how ROMLDR works may give you insight into how more complex systems operate.