Examining OPTLINK for Windows

Linker optimizations that increase speed while reducing size

Matt Pietrek

Matt, author of Windows Internals, is a programmer at Nu-Mega specializing in debuggers and file formats. He can be reached on CompuServe at 71774,362.

Third-party tools that replace and extend the standard development environment can greatly enhance the productivity of DOS and Windows programmers. Yet there can be pitfalls when you replace standard machinery. For example, if you're a Borland C++ developer using the standard Turbo Debugger for Windows (TDW), you can reasonably expect full technical support from Borland when a debugging problem arises. However, if you've replaced TDW with, say, Symantec's Multiscope debugger, who do you call in the event of a problem? At best, the product support will be fragmented. At worst, both companies may point fingers at each other, leaving you somewhere in the middle.

Also, the executable file format or debug specification that's in vogue today may be obsolete tomorrow. If you commit to using a third-party tool that doesn't keep up with the latest industry standards, you're stuck. Therefore, there has to be a compelling reason for a user to switch to a new set of tools. A speed increase of 10 percent or a file size reduction of 2 percent may not be enough to convince you to give up the security of the programs you already use. Instead, a third-party tool not only has to provide compatibility with your current tools, it also needs to offer significant advantages. I've put OPTLINK for Windows version 4.01, from SLR Systems, to the test to see if it meets these criteria.

What is OPTLINK?

OPTLINK for Windows is intended as a drop-in replacement for Microsoft's LINK.EXE and Borland's TLINK.EXE. OPTLINK runs from the DOS command line, and generates DOS executables, as well as DLLs and executables for Windows and OS/2 1.x. It does not generate OS/2 2.x LX format files, nor the PE format files used by Win32 operating systems such as Windows NT. However, SLR has indicated that it intends to support PE format files soon.

OPTLINK performs all the standard optimizations that LINK and TLINK perform, including far call translation, segment packing, and fixup chaining. Far call translation occurs when the linker sees a far call instruction to a procedure that's in the same code segment. For example, given a call of the form call far ptr xxxx:yyyy where xxxx is the same as the current code segment, the linker can replace that one instruction with:

  NOP
  PUSH   CS
  CALL   NEAR PTR YYYY

This second sequence is faster to execute because it avoids a costly segment register load. It also avoids the need for a fixup in the .EXE or .DLL file, thus shrinking the file size and speeding up load time.
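To make the byte-level effect concrete, here is a minimal Python sketch of the rewrite. It assumes the standard 8086 encodings (a far call is opcode 9Ah plus a 4-byte offset:segment operand; the replacement is NOP, PUSH CS, and a near call with a 16-bit displacement), so both forms occupy exactly five bytes. The function name and the toy segment are hypothetical, and a real linker works from .OBJ fixup records rather than scanning finished code.

```python
import struct

FAR_CALL = 0x9A   # CALL FAR ptr16:16 (opcode + offset word + segment word)
NOP = 0x90
PUSH_CS = 0x0E
NEAR_CALL = 0xE8  # CALL NEAR rel16 (opcode + displacement word)

def translate_far_call(code, at, target_offset):
    """Rewrite the 5-byte far call at code[at] as NOP / PUSH CS / CALL NEAR.

    Hypothetical helper: assumes the caller already knows the target lives
    in this same segment, at target_offset.
    """
    if code[at] != FAR_CALL:
        raise ValueError("no far call at offset %#x" % at)
    next_ip = at + 5  # the near call ends where the far call used to end
    rel16 = (target_offset - next_ip) & 0xFFFF  # displacement is IP-relative
    code[at:at + 5] = bytes([NOP, PUSH_CS, NEAR_CALL]) + struct.pack("<H", rel16)

# Toy segment: a far call at offset 0, padding, and a RETF at offset 10h.
segment = bytearray([FAR_CALL, 0, 0, 0, 0]) + bytearray(11) + bytearray([0xCB])
translate_far_call(segment, 0, 0x10)
print(segment[:5].hex())  # 900ee80b00
```

The PUSH CS is what keeps the called procedure's far return working: it still pops both an offset and a segment off the stack.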

Segment packing occurs when the linker takes segments of the same class and concatenates them together. For instance, if you were using the medium or large memory models, and had files A.C, B.C, and C.C, the resulting code segments in the .OBJs would be A_TEXT, B_TEXT, and C_TEXT. Without segment packing, the linker would produce three separate code segments in the .EXE. While not really a problem for DOS executables, in Windows this wastes space in the file and forces Windows to use more selectors when it loads the program. In addition, segment packing affords the linker additional opportunities to perform far call translations, saving even more space.

Fixup chaining is a method of compressing the load-time relocation information in NE format files. (NE files are Windows and OS/2 1.x files.) To give an example, consider a program that makes 20 calls to the Windows BeginPaint() API. Without fixup chaining there would be 20 fixups referring to BeginPaint() in the .EXE. Each fixup is eight bytes in length, so the total space used for relocations is 160 bytes. A linker that does fixup chaining (such as OPTLINK) can get away with only putting one fixup record in the file. How's this? The NE format has a clever method of letting fixups be applied in a linked-list fashion. The head of the list is pointed to by the single relocation record. At the spot in the segment where the address of BeginPaint() will be plugged in is a 16-bit offset to another place where BeginPaint()'s address also needs to be applied. When the operating system loader brings the file into memory, it just visits each node of the chain and leaves behind a copy of the necessary information (the target address). Not only does fixup chaining save space by eliminating redundant fixup records, it can also speed up load times significantly. For more information on fixup chaining (as well as segment packing), see my article "Liposuction Your Corpulent Executables and Remove Excess Fat" (Microsoft Systems Journal, July 1993).
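The chain walk described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Windows loader's code: each fixup site is taken to be a 4-byte far pointer whose low word holds the link to the next site, and FFFFh is assumed to mark the end of the chain. The function names, the site offsets, and the target address are all hypothetical.

```python
import struct

CHAIN_END = 0xFFFF       # assumed end-of-chain marker
FIXUP_RECORD_SIZE = 8    # bytes per relocation record, as stated in the text

def build_chain(segment, sites):
    """Linker side: store in each site the offset of the next site."""
    for here, nxt in zip(sites, sites[1:] + [CHAIN_END]):
        struct.pack_into("<H", segment, here, nxt)
    return sites[0]  # the single relocation record points at the head

def apply_chain(segment, head, target_seg, target_off):
    """Loader side: visit each node and leave the target address behind."""
    patched = 0
    site = head
    while site != CHAIN_END:
        (nxt,) = struct.unpack_from("<H", segment, site)  # read link first
        struct.pack_into("<HH", segment, site, target_off, target_seg)
        site = nxt
        patched += 1
    return patched

# Twenty call sites, spaced 10 bytes apart in a toy 200-byte segment.
sites = list(range(0, 200, 10))
segment = bytearray(200)
head = build_chain(segment, sites)
count = apply_chain(segment, head, target_seg=0x1234, target_off=0x5678)

print(count)                                     # 20
print(len(sites) * FIXUP_RECORD_SIZE)            # 160 bytes without chaining
print(FIXUP_RECORD_SIZE)                         # 8 bytes with chaining
```

Note that the links cost nothing extra in the file: they occupy the very bytes the loader is about to overwrite.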

In addition to the main program (OPTLINKS.EXE), the package comes with a few other programs. OPTIMP is a superset of the IMPLIBs shipped with the Borland and Microsoft development environments. STRIPDEB removes the debug information from the end of an executable, similar to Borland's TDSTRIP and Microsoft's CVPACK /STRIP. FIXLIB accepts a Borland-produced .LIB file and modifies the dictionary so that LINK, TLINK, and OPTLINK can all use it. According to the SLR folks, the dictionary in Borland .LIBs is incorrect at times, and both LINK and OPTLINK are unable to use it. I personally love FIXLIB because I can now use Borland's IMPORT.LIB with LINK and OPTLINK. IMPORT.LIB has all the exported Windows functions, not just those documented in Microsoft's LIBW.LIB.

OPTLINK vs. TLINK

Before Borland became a major presence in the C/C++ market, OPTLINK was targeted at users of Microsoft C who wanted smaller .EXEs and faster linking. However, SLR now appears to be targeting users of Borland's TLINK. The reason can be summarized in two words: debug capacity. As all too many users of Borland's TLINK 5.x know, when building a program with debugging information, TLINK can run out of memory amazingly early. This is especially true with C++ programs. The use of class hierarchies leads to much more debugging information than the equivalent C code would produce. Borland users who have stuck with TLINK are getting increasingly frustrated with turning on debugging information in just select modules to prevent TLINK from running out of memory. OPTLINK has a much greater capacity when processing Borland's debugging information, so it has a major inroad with Borland's customer base. In fact, Borland representatives have themselves recommended OPTLINK when pressured about TLINK's capacity problems.

One of TLINK's attempts to deal with the sheer volume of debugging information was to introduce symbol table compression (using the /Vt switch). A compressed symbol table is in the same format as a non-compressed symbol table. The compression that occurs is more a matter of eliminating duplicate type information. For instance, if you defined a struct in an .H file and included that file in three separate .CPP files, the type information describing the structure will show up three times in an uncompressed symbol table. By using /Vt with TLINK, there would only be one copy of the struct's type information.
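The idea of eliminating duplicate type information can be illustrated with a small Python sketch. It does not model Borland's actual symbol table format. Type records are represented here as plain tuples, and the function name and the sample records are hypothetical. Real debug formats refer to types by index, and records refer to one another, so a real linker's merge is considerably more involved.

```python
def merge_type_tables(modules):
    """Merge per-module type records, keeping one copy of each distinct record.

    modules is a list of lists of hashable records. Returns the merged table
    and, for each module, a map from its local type index to the merged index.
    """
    merged = []
    index_of = {}
    remap = []
    for records in modules:
        local = []
        for rec in records:
            if rec not in index_of:      # first time this type is seen
                index_of[rec] = len(merged)
                merged.append(rec)
            local.append(index_of[rec])  # later copies reuse the first one
        remap.append(local)
    return merged, remap

# One struct from a shared .H file, seen by three .CPP files.
point = ("struct", "POINT", (("x", "int"), ("y", "int")))
rect = ("struct", "RECT", (("left", "int"), ("top", "int")))
merged, remap = merge_type_tables([[point], [point, rect], [point]])

print(len(merged))  # 2
print(remap)        # [[0], [0, 1], [0]]
```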

OPTLINK performs debug information compression implicitly as part of the link process. In fact, OPTLINK does a better job of eliminating redundant information than TLINK /Vt does. I determined this by linking a couple of programs with both TLINK /Vt and OPTLINK. To see the resulting debug information, I ran TDUMP -v -ex on the two executable files. I then compared each debug information subsection table in the two .EXEs. The detailed results are breathtakingly dull, so I'll spare you a recitation of them here. The short summary is that OPTLINK was more aggressive in eliminating types, member definitions, class definitions, and so on. Table 1 shows the file sizes with and without debug information; the difference works out to 145,938 bytes of debug information for OPTLINK versus 232,501 bytes for TLINK. With one minor exception, I noted that OPTLINK fully supports the Borland debug specification, down to inclusion of the browser symbols and code coverage tables. The minor exception is that OPTLINK doesn't output browser information for local symbols.

Another compelling reason for Windows programmers to consider OPTLINK is that it produces significantly smaller executable files and DLLs for Windows. The primary size reduction comes from OPTLINK's ability to chain fixups as noted earlier. In linking the OWL WCHESS.EXE sample program, OPTLINK produces 503 fixups as compared to 3412 by TLINK. At eight bytes per fixup, that's a savings of over 22K, and more than 10 percent of the .EXE's size. Needless to say, the OPTLINK version will load faster as well.
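The arithmetic behind those figures is easy to check. The short Python sketch below uses the fixup counts quoted above and, as an assumption, measures the percentage against the TLINK-built .EXE without debug information (189,046 bytes in Table 1), since the text doesn't say which file size the 10 percent refers to.

```python
FIXUP_SIZE = 8                 # bytes per fixup record
tlink_fixups = 3412
optlink_fixups = 503
tlink_exe_no_debug = 189046    # assumed baseline, taken from Table 1

saved_bytes = (tlink_fixups - optlink_fixups) * FIXUP_SIZE
saved_kbytes = round(saved_bytes / 1024, 1)
saved_percent = round(100 * saved_bytes / tlink_exe_no_debug, 1)

print(saved_bytes)    # 23272
print(saved_kbytes)   # 22.7
print(saved_percent)  # 12.3
```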

Since OPTLINK defaults to producing Windows files that will only run in protected mode, it makes all entries in the NE entry table FIXED, even if the function is in a MOVEABLE segment. By using FIXED entries instead of MOVEABLE, OPTLINK can eliminate three bytes of overhead per entry. TLINK also defaults to PROTMODE operation, but generates MOVEABLE entries if the function is in a MOVEABLE segment. OPTLINK also saves space with a smaller DOS stub, if you let it provide the default stub.
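To see where the three bytes come from, here is a Python sketch of the two entry layouts as I understand the NE format; the field layout is my assumption and isn't spelled out above. A FIXED entry is a flags byte plus a 16-bit offset, while a MOVEABLE entry also carries an INT 3Fh instruction and a segment number. The function names and sample values are hypothetical, and the per-bundle headers of the entry table are left out.

```python
import struct

INT_3F = 0x3FCD  # stored little-endian, so the bytes on disk are CD 3F

def fixed_entry(flags, offset):
    """Assumed FIXED layout: flags byte, offset word (3 bytes)."""
    return struct.pack("<BH", flags, offset)

def moveable_entry(flags, segment, offset):
    """Assumed MOVEABLE layout: flags, INT 3Fh, segment byte, offset word."""
    return struct.pack("<BHBH", flags, INT_3F, segment, offset)

fixed = fixed_entry(0x01, 0x0100)
moveable = moveable_entry(0x01, 2, 0x0100)

print(len(fixed), len(moveable))           # 3 6
print(100 * (len(moveable) - len(fixed)))  # 300 bytes saved over 100 entries
```

The INT 3Fh is only needed when a real-mode Windows has to reload a discarded segment, which is why a protected-mode-only file can do without it.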

Despite all the benefits OPTLINK offers, there are a few rough edges if you're a TLINK user. OPTLINK was originally developed as a Microsoft LINK replacement; Borland support was added later. As such, it doesn't appear that OPTLINK has been "burned in" as much for TLINK replacement as it has for LINK replacement. For example, in a linker response file, it's legal for a program to specify only the base file name for the target to be built (for instance, "FOO", rather than "FOO.EXE"). When I passed OPTLINK such a response file and told it to build a DLL, it created the file with a .EXE extension, rather than .DLL. The bit indicating that the file was a DLL was set inside the NE file, but the file's extension was wrong. TLINK handles this situation correctly.

Another quirk is OPTLINK's response file handling. I'm in the habit of invoking Borland's command-line compiler (BCC.EXE) with just a .C or .CPP file, and letting it supply the defaults when invoking TLINK. To make BCC work with OPTLINK, I made a copy of OPTLINKS.EXE called TLINK.EXE, and supplied an appropriate /TLINK mode OPTLINKS.CFG file. For a test, I ran BCC A.C, where A.C was a minimal DOS program. When using Borland's TLINK, the linker accepted the output from BCC without a peep. When using the renamed OPTLINKS, it prompted me for both library files and a .DEF file (à la LINK). Pressing the Enter key at each prompt yielded an .EXE file, but this prompting is annoying when it happens on every build. Since the program was a DOS program, OPTLINK shouldn't have asked for a .DEF file (TLINK doesn't).

Another problem I encountered with TLINK compatibility had to do with default .DEF files for Windows .EXEs. If you don't specify a .DEF file when using TLINK, it uses a set of defaults, including a 5K program stack. While OPTLINK will also use defaults, it has a nasty habit of not specifying any stack at all for the generated .EXE. To circumvent this problem, I tried putting a /STACK:5120 directive in the OPTLINKS.CFG file. While this worked for Windows programs, it also gave DOS programs a 5K stack. Borland-produced DOS programs start out with an initial small stack, and at run time switch the SS:SP to a larger stack. Creating a DOS .EXE with an initial 5K stack was certainly not the behavior I desired from OPTLINK. The point of all this is that although SLR has put on a snazzy coat of TLINK paint, some areas appear to be lightly tested. In addition, OPTLINK seems to want to revert to LINK compatibility mode whenever it gets a chance.

OPTLINK vs. Microsoft's LINK

In the past, OPTLINK's primary target audience was Microsoft C and MASM developers who needed faster link times and increased capacity. With LINK 5.50 from the Visual C++ package, Microsoft has significantly narrowed both gaps. However, OPTLINK still holds some advantages for Microsoft users.

To a certain extent, debug information capacity is less of a problem with Microsoft tools than with the corresponding tools offered by Borland. The reason is that the linker doesn't have to do all the work of massaging the debug information into its final form. When producing CodeView-style information, the compiler emits a preliminary version of the debug information that's relatively easy for the linker to process. Afterwards, OPTLINK invokes CVPACK.EXE, which takes care of merging all the debug information into one unit and eliminating duplicate information. Interestingly, OPTLINK doesn't complain if it can't execute CVPACK. If you have older tools that only recognize the CodeView 3.0 debug specification, OPTLINK can produce this format as well as the default CodeView 4.0 debug information.

In the speed category, OPTLINK was just slightly faster than LINK on my test executable, but not enough to get excited about; see Table 2. In all fairness, the test .EXE wasn't large enough to test the virtual memory systems of either OPTLINK or LINK. On large industrial-grade applications, SLR claims some users see performance gains of up to 50 percent over LINK.

Regarding the parts of the .EXE used by the operating system, OPTLINK produces NE files that aren't dramatically different from what LINK produces. LINK chains fixups, so you won't see the dramatic space savings that you would when comparing OPTLINK to TLINK. In fact, OPTLINK appears to produce the identical fixups to LINK, although in a different order. Two other NE tables where there's a difference between the two linkers are the resident and non-resident names tables (where the names of your exported functions live). LINK puts entries in these tables in a seemingly random order, while OPTLINK sorts the names in the reverse order of the entry table (for example, 15, 14, 13, and so on).

Other differences between OPTLINK and LINK-produced Windows executables include the entry table. Like TLINK, LINK defaults to PROTMODE, yet still generates MOVEABLE entries where appropriate. OPTLINK always appears to generate the smaller FIXED entries, thereby saving three bytes per entry. In addition, some segments in NE files are a few bytes larger in the OPTLINK-created executable than in the LINK-produced .EXE. While this may just be an effect of rounding-up segment sizes, it could potentially be the source of different behaviors when comparing the two linkers. For example, you might have a fence-post error and try to read one byte past the end of a data structure at the end of a segment. The LINK-produced program could GP fault, while the OPTLINK-produced program might not.

Table 1: Comparisons for OPTLINK and TLINK. Program was the OWL CHESS example compiled with BC++ 3.1. OPTLINK produces far fewer fixups as compared to TLINK, saving more than 10 percent of the .EXE's size. OPTLINK switches: /NOREL /TLINK /Twe /c /x /n /v /Vt /A=16 /P=65535. TLINK switches: /Twe /c /x /n /v /Vt /A=16 /P=65535. All tests run on a Gateway 4DX2-66V in non-turbo mode; 16 Mbytes installed; Windows was not running; Memory Manager was 386MAX 6.02; disk cache was Hyperdisk 4.21 with 7168 Kbytes in the cache; the times are the average of several runs, with the first run discarded; link times do not include resource binding; file sizes do not include resources.

                         OPTLINK 4.01    TLINK 5.1
File size (w/debug)      297570          421547
File size (no debug)     151632          189046
Link time (w/debug)      6.5 sec         9.1 sec
Number of fixups         503             3412

Table 2: Comparing OPTLINK with LINK. Program was a mixed MSVC C and MASM 5.1 program. Both times include the time to run CVPACK. LINK 5.50 ordinarily does debug compression internally, but the presence of MASM 5.1 information may have forced it to use CVPACK. Link times do not include resource binding and file sizes do not include resources. OPTLINK switches: /NOMAP /NOREL /NOLOGO /SI /CO /NOD /align:16 /CVVERSION:4. LINK switches: /NOLOGO /BAT /CO /NOD /align:16.

                         OPTLINK 4.01    LINK 5.50
File size (w/debug)      389296          390484
File size (no debug)     161776          162697
Link time (w/debug)      5.7 sec         6.3 sec
Number of fixups         446             446

For More Information

OPTLINK for Windows
SLR Systems
1622 North Main Street
Butler, PA 16001
412-282-0864
$350.00