MIXED-LANGUAGE PROGRAMMING WITH ASM

Getting the job done often requires blending models and languages

Karl Wright and Rick Schell

Karl is the principal developer of Turbo Assembler and he can be reached at P.O. Box 39, Bedford, MA 01730. Rick is director of language development for Borland International and can be reached at 1800 Green Hills Road, Scotts Valley, CA 95065.


As applications get larger, fewer and fewer are written in a single language. Large software projects tend to come together in a piecemeal fashion -- some parts are borrowed from previous projects, other parts may be purchased from various vendor sources, and, let's face it, every programmer has a favorite language. Assembly languages have made great strides recently in the area of mixed language programming. Now more than ever before, it makes sense to write applications with more than one language and to include assembly language in the mix.

Furthermore, every programming language ever created has inherent strengths and weaknesses. One area in which different languages have distinct strengths is in how procedures are called. This is an extremely important issue, because in many applications more time and effort is spent getting in and out of procedures than doing anything else! Conversely, a good choice of procedure calling conventions can actually make the difference between an application that can be written quickly and one which cannot be written at all.

Usually, higher-level languages such as C and Pascal use an argument passing technique known as the "stack frame method," where arguments are pushed onto a stack and addressed as an offset from some "frame" pointer. It is a good general technique in that it allows for an unlimited number of arguments with built-in recursion.

C and Pascal each make use of a slightly different flavor of the stack frame method. The C-style stack frame permits a variable number of arguments to be passed to a procedure. This requires that the caller remove the arguments from the stack after the procedure call, because it is the caller who knows best how many arguments were passed. In Pascal, on the other hand, the number of arguments is fixed, so the procedure itself is responsible for removing its arguments from the stack. Typically, this is done efficiently with the single machine instruction RET xx.

Until recently, assembly language was generally limited to what is known as the "register passing method" of passing arguments. With register passing, arguments are passed to procedures in machine registers or at fixed memory locations. (Stack frames could be constructed in assembly language, but with considerable effort on the part of the programmer.) Register passing is not a general argument passing method. There are a limited number of registers in any machine, and explicit PUSH and POP instructions must be used to retain the availability of arguments during recursion. Nevertheless, register passing is a much more efficient method of passing arguments than the stack frame when the number of arguments to a procedure is small and the particular argument registers are chosen carefully in light of the instructions, which are to be done inside the procedure.

A Text "Spectrum Analyzer" Example

The example used to illustrate this point is a program that reads one or more text files, breaks them into words, and counts the individual words. It then sorts the resulting array by word count, and displays the word and the associated count together in a neat, tabular form.

This example emphasizes speed of execution, with the additional criteria that modularity is preserved and nasty tricks like self-modifying code are not used. This will permit the program to be relatively easy to change or to upgrade, and still be considerably faster than anything written wholly in a single language.

The major points that need to be covered are the interfaces between modules and what each module is responsible for, as well as the overall organization of the application.

The command line that this program will accept has the following format: SPECTRUM <file_spec> <file_spec> ... where each <file_spec> can include wild cards. If a file name is given more than once, its spectrum will be taken more than once. The output of the application will be a table that is written to Standard Out and is sorted in order of reference count, the most referenced words being listed first.

The basic steps are: 1. Initialize all data structures. 2. Parse the command line. For each file spec, read the file(s) and break it (them) into words. Keep a reference count for each unique word. 3. Build a list of unique words and sort it by reference count. 4. Scan the sorted list and print out the reference count and associated word for each list element.

For the sake of performance, the work of reading a file, breaking it into words, and hashing them into a symbol table is best handled in assembly language, as is the other time bottleneck that occurs when the sort is done. Less time-critical areas, such as command-line parsing and table formatting, are written in C to provide greater flexibility in the user interface. Finally, the generality of assembly language, another inherent strength, makes it best for dealing with the heap and error handling modules.

The major modules we need and their respective languages are: ERROR.ASM, the assembly language error handler (see Listing One, page 116); HEAP.ASM, the assembly language memory allocator (Listing Two, page 116); WORD.ASM, the assembly language lexer/word, table/file input (Listing Three, page 116); SORT.ASM, the assembly language general-sort procedure (Listing Four, page 119); and SPECTRUM.C, the command-line parsing, text formatting, and output written in C (Listing Five, page 120). The make file is shown in Listing Six, page 121.

Throughout the program, we've made every effort to use an appropriate calling convention for the situation. On procedures with stack frames, Pascal-style calling conventions are most frequently used because of their inherently faster execution and smaller code requirements. Only on procedures that require a variable number of arguments do we use a C-style stack frame.

The extensive modularity we use in this application is not absolutely necessary given its small size. We have tried, however, to put forth as general a treatment as possible, demonstrating techniques that are appropriate even for very large applications. The use of strong data abstraction is one of these techniques. In strong data abstraction, the details of an actual data structure are known only to a small set of procedures that manage that data structure. The data structure and the procedures that manage it are taken together to form a module. Any other code in the program that deals with the data structure must do so through the appropriate procedures -- any other access is considered to be a breach of modularity. In this application, the HEAP and WORD modules are good examples of strong data abstractions.

The program uses SMALL model with a NEAR stack. All of the code is in segment _TEXT (except for any code in the C libraries), so CS is always set to_TEXT. Data, uninitialized data, and stacks are all in DGROUP, so SS must always be set to DGROUP. DS is also set to DGROUP in the C sections of the program, but is used as a general segment register in the assembly language code.

The interfaces to the procedures in the various modules pretty well spell out the function of each module:

More Details.

Error Handling Module Because errors need only to be caught and displayed without the ability to resume execution of the application, the error handling scheme this program uses is a mechanism whereby the stack pointer is saved at some point in the execution of the program, and if an error is encountered, the program is resumed at that point. The required procedures are listed in Table 1.

Table 1: Required procedures for error handling

  void pascal ERROR_INIT (void)
         Initializes error module.

  unsigned pascal ERROR_TRAP (void pascal (*execution_procedure)() )
         Returns 0 if no error occurred in the execution of
         EXECUTION_PROCEDURE or any procedures it calls.  (Otherwise,
         an error code is returned.)  EXECUTION_PROCEDURE is a
         generic procedure which can generate errors in its execution
         (via ERROR_LOG) and might be declared in C as follows:
         void pascal execution_procedure(void)

  void pascal ERROR_LOG (unsigned error_code)
         Causes control to pass to the nearest enclosing ERROR_TRAP.
         Execution resumes with that instance of function ERROR_TRAP
         returning error_code.

Heap Module Because data structures are allocated but never freed, a simple stack heap is the best choice for both performance and simplicity. The application uses a paragraph-based heap where memory is allocated with 16-byte granularity. This turns out to be useful because it permits any data item allocated from the heap to be described with a single 16-bit segment address. See Table 2.

Table 2: Required procedures for stack heap

  void pascal HEAP_INIT (unsigned starting_segment, unsigned segment_count)
         Initializes the heap to start at a certain segment and be
         a certain size.

  void far * pascal HEAP_ALLOC (unsigned paragraph_count)
         Allocates the requested number of paragraphs from the
         heap and returns the far address of the memory in DX:AX.
         NOTE: The offset part of the address is always 0.

Symbol Table Module The symbol table module is responsible for much of the actual work of reading in a file, converting it to words, and recording the word usage information. After it is read in, each symbol is represented by an area of memory allocated from the heap containing the reference count for the symbol and the actual text of the symbol. Because it is allocated from the heap, each symbol can be addressed by using a 16-bit word descriptor. Refer to Table 3.

Table 3: Procedures for symbol table

  void pascal WORD_INIT (unsigned maximum_word_count)
         Initializes symbol table.  The maximum number of
         different words allowed is passed so that a hash table
         can be initialized.

  void pascal WORD_READ (unsigned file_handle)
         Reads all the text there is from the specified file
         handle and analyzes it.

  void pascal WORD_SCAN (void pascal (*word_procedure)())
         Calls the specified procedure once for each individual
         symbol.  The word descriptor for the symbol is passed to
         WORD_PROCEDURE as an argument.  WORD_PROCEDURE might
         be declared in C as follows:
         void pascal word_procedure(unsigned word_descriptor).

  char far * pascal WORD_NAME (unsigned word_descriptor)
         Returns the FAR address of the name of the described symbol.

  unsigned pascal WORD_REFCOUNT (unsigned word_descriptor)
         Returns the total reference count of the described symbol.

  unsigned pascal WORD_COUNT(void)
         Returns the total number of distinct words processed so far.

  int pascal WORD_COMPREF (unsigned word_descriptor1, unsigned
  word_descriptor2)
         Compares the reference counts of two word descriptors.
         Returns flags for
         refcount(word_descriptor2) - refcount(word_descriptor1).  NOTE:
         This procedure, while it obeys Pascal calling conventions, is
         not callable directly from C because it returns its result in
         the flag register.  It also has the requirement that the
         registers CX and DX are preserved.

         This procedure might be described as using a sort of
         "hybrid" calling convention, where a stack frame is
         used but high-level language register conventions are not
         obeyed.

Sorting Module The sort routine is written in assembly language because a recursive algorithm was chosen and recursion tends to be faster if register passing can be used appropriately. In this case, there are a small number of registers that are used directly; more importantly, during the innermost step of the recursion (which is done most often) no registers whatsoever need to be saved on the stack. Recursion with a stack frame can't make a decision this intelligent, because access to the arguments is needed first.

The sort procedure operates on an array of words, calling a generic comparison routine whose address is passed as an argument. This comparison routine uses a hybrid calling convention, where a stack frame is present but registers are not necessarily consistent with C. The level of generality this arrangement achieves is high, but it does require that the comparison routine be written in assembly language. See Table 4.

Table 4: Procedures for sorting

  void pascal SORT_DO (unsigned far *sort_array, unsigned sort_count,
         int pascal (*compare_procedure)())
         Uses the specified compare procedure to order the array.
         COMPARE_PROCEDURE is called with two array values, and
         returns flags appropriate to a comparison of those
         values.  Note that compare_procedure cannot be written in
         C because the value is returned in the machine flags.  In
         addition, the segment registers are not guaranteed to be
         set up in a manner consistent with C when
         compare_procedure is called.  Compare_procedure itself is
         expected to preserve CX and DX.  The definition for
         compare_procedure might be stated:
         int pascal compare_procedure(unsigned value1, unsigned value2)

If raw speed were the only concern, the SORT_DO procedure might best be integrated entirely into the symbol table module, which would permit the comparison to be performed directly and would remove the need to call the comparison routine. But we felt that a more general treatment was superior in terms of modifiability -- it is relatively straightforward to add a switch to control the particular sorting method, for example.

The Command-line Parsing and Text Formatting Module We are now ready to lay out the full-scale sequencing of the program. Given the assembly language interface listed earlier, the following steps should be taken by the C portion of the program:

    1. Allocate memory from DOS, call ERROR_INIT, and set up an error trap using ERROR_TRAP.

    2. Call HEAP_INIT and WORD_INIT appropriately.

    3. Parse the command line. For each file spec, call WORD_READ for all files matching the file spec (the C code is responsible for resolving all wild cards and for opening and closing each file).

    4. Request the total number of unique words using WORD_COUNT, and allocate an array of 16-bit word descriptors using HEAP_ALLOC that is large enough to hold them. Call WORD_SCAN appropriately to fill up the array with word descriptors.

    5. Sort the array using SORT_DO with the comparison routine WORD_COMP REF, which compares the count of references for two word descriptors.

    6. Write the table title.

    7. Scan the array to write out the table entries. Use WORD_REFCOUNT to get the reference count for each word descriptor, and WORD_NAME to get the name string for each word descriptor.

Theory of Operation

The SPECTRUM program uses a hash function and hash table to achieve its level of performance. Inside the WORD module, the procedure WORD_READ reads text into a buffer. This text is copied to a storage area one word at a time. During the copy operation, which uses the LODSB and STOSB instructions, the text is converted to uppercase and the hash value for the word is calculated, all on-the-fly.

The hash table is an array of word descriptors. An element in the hash table is 0 if there is not yet an associated symbol. The hash function is calculated by looking at each character in the word, rotating the previous hash value circularly left by five, and XORing in the character value. The final hash value is masked off to become an index into the hash table.

After the hash index is calculated, the corresponding hash table entry is checked. If it is 0, a new symbol is created, and its reference count is initialized to 1. Otherwise, the text of the word is compared against the text stored in the symbol whose word descriptor is found in the hash table. If it agrees, the correct symbol has been located, and its reference count is incremented. If not, a collision has occurred, and the next hash value is calculated by adding 11*2 to the current hash index (this number must be relatively prime to the size of the hash table). The process then repeats until the correct hash table entry or a 0 is found.

An unusual technique is used to speed the recognition of the various different character types during the lexing process. BX is initialized to point to a translation table, which contains a bit for each pertinent character type. An XLAT instruction followed by a TEST AL,xxx is then all that is needed to identify a character as a numeral, delimiter, lowercase alphabetic, and so on.

Another unusual technique is used to describe objects in the assembly language section of the program. Rather than use a full 32 bits to describe the address of a data object, which is somewhat cumbersome, a paragraph address is used instead. This paragraph address becomes the "descriptor" for the object. Data within the object is addressed by loading an appropriate segment register with the object descriptor and accessing the data with a constant offset using that segment register.

After all files have been read in and parsed, an array of word descriptors is built using the routine WORD_SCAN. This array is then sorted using SORT_DO with the comparison routine WORD_COMPREF. SORT_DO is a recursive sort that requires N*LOG(N) comparisons. It operates by dividing the array into two roughly equal parts, recursively sorting each part, and then merging the two parts in place.

Finally, to output the table, the array is scanned sequentially. For each word descriptor in the array, WORD_NAME is used to obtain the actual text of the word, and WORD_REFCOUNT is used to obtain the reference count. These values are displayed using PRINTF.

Conclusion

It is not only practical but advisable to mix languages and models in order to achieve the best results. Modern assembly language is a vital part of this mix, and will continue to be important in the future, because space and performance are always important for competitive software, no matter how powerful the hardware becomes. Assembly language's flexibility can assist in everything from optimization to the creation of programs using more than one interfacing convention.

Assembler Specific Features

The assembly language section of the application was written in Borland's Turbo Assembler 2.0 and uses several features unique to that assembler. If you are using another assembler, you may need to modify portions of the example so that your assembler will accept it. The following are the features I used and how you can work around them in your assembler.

Extended CALL automatically builds a calling stack frame by generating a series of PUSHes in the order appropriate to the specified language. For example, CALL foo pascal, ax, bx, wordptr would PUSH the three arguments AX, BX, and WORDPTR onto the stack in the order appropriate for Pascal stack frames, and is equivalent to

  PUSH ax
  PUSH bx
  PUSH wordptr
  CALL foo

Multiple PUSHes/POPs permit more than one item at a time to be PUSHed or POPed with a single instruction. For example,

  PUSH AX BX
  POP BX AX

is equivalent to

  PUSH AX
  PUSH BX
  POP BX
  POP AX

Local Symbols are enabled with the LOCALS directive. All local symbols begin with the two characters @@. They are scoped to be local to the enclosing procedure. For example

  fool proc
    jmp @@exit
  @@exit: ret
  endp

  foo2 proc
    jmp @@exit
  @@exit: ret ;This @@EXIT can
  co-exist amicably with the former one.
  endp

If you are using an assembler that does not support this feature, one way to work around it is to change the .MODEL statement at the start of each module to .MODEL SMALL, PASCAL. This will cause all symbols within a procedure to become local.

ARG and USES Statements the assembler used for the example has a way of setting up procedure stack frames that is somewhat easier to read than the standard method. For example:

  foo proc pascal
  arg a1,a2
  uses ds,si

      ...

is equivalent to the statement:

  foo proc pascal uses ds si,a1,a2
      ...

Some assemblers require a language to be specified in the .MODEL statement before the language keyword PASCAL is recognized. If this is true for your assembler, you will need to change the .MODEL statement at the start of each module to .MODEL SMALL,PASCAL.

The CODEPTR type is used occasionally in the example. It means either WORD or DWORD depending on whether the selected model has NEAR or FAR code, respectively. Because the example is SMALL model, you may replace CODEPTR with WORD wherever it is found.

-- R.S.

MIXED-LANGUAGE PROGRAMMING WITH ASM by Karl Wright and Rick Schell [LISTING ONE]




;* Module description * This module takes care of error trapping. The scheme
;used records the trapping routine stack pointer so that an error can cause
;the stack to return to a consistent state. This module was written using
;Borland's Turbo Assembler 2.0.

;** Environment **
.model small   ;Set up for SMALL model.
locals      ;Enable local symbols.

;** Macros **
;<<Generate correct return based on model>>
procret macro
if @codesize
   retf
else
   retn
endif
endm

;** Public operations **
public pascal ERROR_INIT   ;Initialize error handler.
public pascal ERROR_TRAP   ;Set up error trap.
public pascal ERROR_LOG      ;Log error.

;** Uninitialized data **
.data?
errstk   dw ?   ;SP at last error log (-1 if none).

;** Code **
.code
;Set up DS to nothing since that is the typical arrangement.
assume ds:nothing

;[Initialize error manager]
error_init proc pascal      ;Declare proc with PASCAL calling conventions.
   mov errstk,-1
   ret
endp

;[Set up error trap]
;This procedure preserves the previous ERRSTK, sets up a new ERRSTK, and
;calls the passed procedure.  On exit, the previous ERRSTK is restored.
error_trap proc pascal      ;Pascal calling conventions.
arg @@proc:codeptr      ;Only argument is procedure to call.
uses ds,si,es,di      ;Force a save of all registers C cares for.
   push errstk
   ;Call internal routine to record return address on stack.
   call @@rtn
   pop errstk
   ret
@@rtn   label proc
   mov errstk,sp      ;Save SP so we can restore it later.
   call @@proc pascal   ;Call procedure.
   xor ax,ax      ;Return code = 0 for normal return.
   procret
endp

;[Log error]
;Control is passed to the last ERROR_TRAP, if any.
;Error code is passed and returned in AX.
error_log proc pascal
arg @@error_code:word
   cmp errstk,-1      ;Lock up if no error address.
@@1:   jz @@1
   mov ax,@@error_code
   mov sp,errstk
   procret
endp
end





[LISTING TWO]


;* Module description * This module manages a simple stack-based heap.
;Deallocation is not supported. NOTE: This module must be assembled with /MX
;to publish symbols in the correct case. This module is written using
;Borland's Turbo Assembler 2.0.

;** Environment **
.model small   ;Set up for SMALL model.
locals      ;Enable local symbols.

;** Equates **
err_memory = 1   ;Out of memory error number.

;** Public operations **
public pascal HEAP_INIT      ;Initialize heap.
public pascal HEAP_ALLOC   ;Allocate memory from heap.

;** External operations **

;<<Error handler>>
extrn pascal ERROR_LOG:proc   ;Long jump library procedure for errors.

;** Uninitialized data **
.data?
memptr   dw ?   ;Pointer to first free segment.
memsiz   dw ?   ;Remaining paragraphs in heap.

;** Code **
.code
;Set up DS to nothing since that is the typical arrangement.
assume ds:nothing

;[Initialize the heap]
heap_init proc pascal      ;Declare proc with PASCAL calling conventions.
arg @@start_seg:word,@@para_size:word
            ;Arguments are starting segment and para count.
   mov ax,@@start_seg
   mov memptr,ax
   mov ax,@@para_size
   mov memsiz,ax
   ret
heap_init endp

;[Allocate memory from the heap]
heap_alloc proc pascal      ;Declare proc with PASCAL calling conventions.
arg @@para_count:word      ;Only argument is count of paragraphs.
   ;See if there is enough remaining.
   mov ax,@@para_count
   cmp memsiz,ax
   jc @@err
   sub memsiz,ax
   add ax,memptr
   xchg ax,memptr
   mov dx,ax
   xor ax,ax
   ret
@@err:   ;Out-of-memory error.
   mov ax,err_memory
   call error_log pascal,ax
   ;Never returns.
heap_alloc endp

end







[LISTING THREE]


;* Module description * This module reads source files and converts them into
;words, then files the words away in a symbol table with the help of a hash
;function. This module was written using Borland's Turbo Assembler 2.0.

;** Environment **
.model small   ;Set up for SMALL model.
locals      ;Enable local symbols.

;** Equates **
;<<Error numbers>>
err_hash = 2   ;Out of hash space error number.
err_read = 3   ;Read error.

;<<Hash function>>
hash_rotate = 5   ;Amount to rotate for hash function.
hash_skip   = 11;Number of entries to skip on hash collision.

;<<Read buffer>>
rbf_size = 800h   ;Size of read buffer in paragraphs.

;** Public operations **
public pascal WORD_INIT      ;Initialize hash table.
public pascal WORD_READ      ;Read file, convert to words, and hash them.
public pascal WORD_COUNT   ;Get total word count.
public pascal WORD_NAME      ;Get name of word.
public pascal WORD_REFCOUNT   ;Get reference count of word.
public pascal WORD_SCAN      ;Scan all words.
public pascal WORD_COMPREF   ;Compare word reference counts.

;** External operations **
;<<Heap>>
extrn pascal HEAP_ALLOC:proc   ;Heap allocation.

;<<Error handling>>
extrn pascal ERROR_LOG:proc   ;Trap an error.

;** Data structure **
;<<Symbol table entry>>
symtbl struc
symref   dw ?   ;Reference count.
symsiz   dw ?   ;Length of word.
ends
symnam   = size symtbl   ;Offset of start of name text.

;** Initialized data **
.data
;<<Translation character type table>>
typdlm   = 1   ;Delimiter bit.
typnum   = 2   ;Numerical digit.
typcas   = 20h   ;Lower case bit: Set if lower case letter.
xlttbl   label byte
   db '0' dup (typdlm)
   db 10 dup (typnum)
   db ('A'-1)-'9' dup (typdlm)
   db 'Z'-('A'-1) dup (0)
   db ('a'-1)-'Z' dup (typdlm)
   db 'z'-('a'-1) dup (typcas)
   db 255-'z' dup (typdlm)

;** Uninitialized data **
.data?

;<<Hash table values>>
hshptr   dw ?   ;Segment address of hash table.
hshsiz   dw ?   ;Total number of hash entries.  Must be a power of 2!
hshcnt   dw ?   ;Total free entries remaining in hash table.
hshmsk   dw ?   ;Mask for converting hash value to address.

;<<Read buffer values>>
rbfptr   dw ?   ;Segment address of read buffer.

;<<Word buffer>>
wrdbuf   db 256 dup (?)

;** Code **
.code
;Set up DS to nothing since that is the typical arrangement.
assume ds:nothing

;[Initialize hash table]
word_init proc pascal
arg @@max_word_count:word   ;Argument: Maximum number of words.
uses es,di
   ;First, allocate read buffer.
   mov ax,rbf_size
   call heap_alloc pascal,ax
   mov rbfptr,dx
   ;Now convert maximum word count to power of 2.
   mov ax,@@max_word_count
   mov cl,16+1
@@l1:   dec cl
   shl ax,1
   jnc @@l1
   mov ax,1
   shl ax,cl
   ;Initialize some hash parameters.
   mov hshsiz,ax
   mov hshcnt,ax
   dec ax
   shl ax,1
   mov hshmsk,ax
   ;Now, allocate hash table from heap.
   mov ax,hshsiz      ;Size of hash table in words.
   add ax,7
   mov cl,3
   shr ax,cl      ;Convert to paragraphs.
   call heap_alloc pascal,ax
   mov hshptr,dx
   ;Clear out hash table: 0 means 'no value'.
   mov es,dx
   xor di,di
   cld
   mov cx,hshsiz
   xor ax,ax
   rep stosw
   ret
word_init endp

;[Read file and assimilate all words]
word_read proc pascal
arg @@handle:word      ;Argument is file handle.
uses ds,si,es,di
   ;Load XLAT buffer address.  The XLAT table is used for case conversion
   ;and for character type identification.
   mov bx,offset xlttbl
@@read:   ;Read next buffer while delimiter processing.
   call @@brd
   jcxz @@done
@@skip:   ;Skip all delimeters, etc.
   lodsb
   xlat xlttbl
   test al,typdlm
   loopnz @@skip
   jnz @@read
   ;Adjust pointer & count.
   dec si
   inc cx
   ;If it is a number, skip to end.
   test al,typnum
   jnz @@num
   ;It is a word. We'll transfer a word at a time to the word buffer,
        ;hashing it as we go. DX will be the current hash value. CX is the
        ;amount remaining in the buffer.
   xor dx,dx
   ;Initialize output address.
   push ss
   pop es
   mov di,offset wrdbuf
@@clp:   ;Transfer.  This is THE most time-critical loop in the program.
   lodsb         ;Read character.
   mov ah,al
   xlat xlttbl      ;Get its type.
   test al,typdlm      ;Abort if delimiter.
   jnz @@wend
   and al,typcas      ;Use case bit to convert to upper case.
   neg al
   add al,ah
   stosb         ;Save it in word buffer.
   ;Calculate hash value.
   mov ah,cl
   mov cl,hash_rotate
   rol dx,cl
   mov cl,ah
   xor dl,al
   loop @@clp      ;Keep going until end of buffer.
   ;End of buffer while word processing.  Read more.
   call @@brd
   jcxz @@wnd2
   jmp @@clp
@@nrd:   ;Read next buffer while number processing.
   call @@brd
   jcxz @@done
@@num:   ;Numbers are not considered 'words' and should be skipped.
   ;Skip up to first delimiter.
   lodsb
   xlat xlttbl
   test al,typdlm
   loopz @@num
   jz @@nrd
   ;Adjust pointer and count.
   dec si
   inc cx
   jmp @@skip
@@done:   ret
@@wend:   ;End of word.  Adjust buffer pointer.
   dec si
@@wnd2:   ;End of word.  Hash value is in DX, upper-case word is in WRDBUF,
   ;DI points to end of word + 1.
   push ds si cx bx   ;Save the registers we will use for this step.
   xor al,al      ;Null-terminate the word.
   stosb
   mov cx,di      ;Calculate the word's length.
   sub cx,offset wrdbuf
   mov bx,dx      ;Put the hash value in a useable register.
   shl bx,1      ;Lower bit will be discarded, so shift.
   push ss         ;Initialize DS.
   pop ds
   assume ds:dgroup
   ;Now it is time to locate the word in the hash table if it is there,
   ;or create an entry if it is not.
@@hlp:   mov es,hshptr
   and bx,hshmsk
   mov ax,es:[bx]
   and ax,ax
   jz @@make
   ;Verify that the hash entry is the correct one.
   mov es,ax
   mov ax,cx
   cmp es:[symsiz],ax   ;Compare length of word.
   jnz @@coll
   mov si,offset wrdbuf   ;Compare actual text if that agrees.
   mov di,symnam
   repz cmpsb
   mov cx,ax
   jz @@fd
@@coll:   ;Collision!  Advance to the next candidate hash entry.
   add bx,hash_skip*2
   jmp @@hlp
@@dne2:   ret
@@make:   ;We have encountered this word for the first time.
   ;We must create a new symbol entry of the appropriate size.
   ;First decrement remaining free hash count.
   dec hshcnt
   jz @@herr
   push cx
   push bx
   mov ax,cx      ;Calculate length of symbol descriptor.
   add ax,symnam+15
   mov cl,4
   shr ax,cl
   call heap_alloc pascal,ax
   pop bx         ;Record symbol descriptor in hash table.
   mov es:[bx],dx
   pop cx         ;Record length.
   mov es,dx
   mov es:[symsiz],cx
   mov di,symnam      ;Move text of word into symbol table.
   mov si,offset wrdbuf
   shr cx,1
   rep movsw
   rcl cx,1
   rep movsb
   mov es:[symref],0   ;Clear reference count.
@@fd:   ;Matching entry found!  Increment reference count.
   inc es:[symref]
@@nwd:   ;Go on to the next word in the buffer, if any.
   pop bx cx si ds
   assume ds:nothing
   jcxz @@dne2
   jmp @@skip
@@herr:   ;Out of hash space error.
   mov ax,err_hash
   call error_log pascal,ax
   ;No return from ERROR_LOG.
;(Read buffer)
;Reads the next hunk of buffer.  Returns actual amount read in CX,
;DS:SI as start of data to read.
@@brd:   push dx bx
   mov cx,rbf_size*16
   mov bx,@@handle
   mov ah,3fh
   mov ds,rbfptr
   xor dx,dx
   int 21h
   jc @@err
   mov cx,ax
   xor si,si
   pop bx dx
   cld
   retn      ;Use RETN so stack frame return won't be generated.
@@err:   ;Read error.
   mov ax,err_read
   call error_log pascal,ax
   ;No return is needed because ERROR_LOG never returns.
word_read endp

;[Get total word count]
word_count proc pascal
   mov ax,hshsiz      ;Load total word capacity.
   sub ax,hshcnt      ;Subtract actual remaining free words.
   ret
word_count endp

;[Get address of name of word]
word_name proc pascal
arg @@word_desc:word      ;Argument is word descriptor.
   mov dx,@@word_desc
   mov ax,symnam
   ret
word_name endp

;[Get refcount for word]
word_refcount proc pascal
arg @@word_desc:word      ;Argument is word descriptor.
uses ds
   mov ds,@@word_desc
   mov ax,ds:[symref]
   ret
word_refcount endp

;[Scan all words]
word_scan proc pascal
arg @@scan_proc:codeptr      ;Argument is procedure to call for each word.
uses ds,si
   mov ds,hshptr
   xor si,si
   mov cx,hshsiz
   cld
@@l1:   lodsw
   and ax,ax
   jnz @@take
@@next:   loop @@l1
   ret
@@take:   push cx ds
   push ss
   pop ds
   call @@scan_proc pascal,ax
   pop ds cx
   cld
   jmp @@next
word_scan endp

;[Compare reference counts for two word descriptors]
word_compref proc pascal
arg @@word_desc1:word,@@word_desc2:word
uses ds
   mov ds,@@word_desc2
   mov ax,ds:[symref]
   mov ds,@@word_desc1
   sub ax,ds:[symref]
   ret
endp
end





[LISTING FOUR]


;* Module description * This module contains the sort routine for SPECTRUM.
;This module was written using Borland's Turbo Assembler 2.0.

;** Environment **
.model small   ;Set up for SMALL model.
locals      ;Enable local symbols.

;** Public operations **
public pascal SORT_DO      ;Perform sort.

;** Code **
.code
;Set up DS to nothing since that is the typical arrangement.
assume ds:nothing

;[Sort procedure]
sort_do proc pascal
arg @@array:dword,@@count:word,@@compare_proc:codeptr
uses ds,si,di

   ;First load up registers for internal recursion.  DS:SI will be
   ;the current sort array address, CX the count of elements to sort.
   lds si,@@array
   mov cx,@@count
   call @@sort
   ret

;Internally recursive sort routine. This routine accepts DS:SI as the sort
;array address, and CX as the count of elements to sort.
@@sort:   cmp cx,2
   jnc @@go
   retn
@@go:   ;Save all registers we will change.
   ;Internally, DI and DX will be start and count of second merge area.
   push si cx di dx
   ;Divide into two parts and sort each one.
   mov dx,cx
   shr cx,1
   sub dx,cx
   call @@sort
   mov di,si
   add di,cx
   add di,cx
   xchg si,di
   xchg cx,dx
   call @@sort
   xchg cx,dx
   xchg si,di
   ;Now, merge the two areas in place.
   ;Each area must be at least size 1.
@@mrgl:   ;Compare - DS:DI - DS:SI.
   call @@compare_proc pascal,ds:[di],ds:[si]
;;The following commented-out sequence is the code that would be required
;;if strict Pascal calling conventions were adhered to for calling
;;COMPARE_PROC.  You can see how much extra work this is!!
;;   push cx dx
;;   push ds
;;   mov ax,ds:[di]
;;   mov bx,ds:[si]
;;   push ss
;;   pop ds
;;   call @@compare_proc pascal,ax,bx
;;   pop ds
;;   pop dx cx
;;   and ax,ax
   jns @@ok
   ;Slide up first merge area using starting value from DI.
   mov ax,ds:[di]
   push si cx
@@sllp:   xchg ax,ds:[si]
   add si,2
   loop @@sllp
   xchg ax,ds:[si]
   pop cx si
   add si,2
   add di,2
   dec dx
   jnz @@mrgl
   jmp short @@exi
@@ok:   ;Correct so far.  Advance SI.
   add si,2
   loop @@mrgl
@@exi:   ;Restore registers.
   pop dx di cx si
   retn
sort_do endp

end





[LISTING FIVE]


/***** File: SPECTRUM.C *****/
/* This C module is written using Borland's Turbo C 2.0 and can be
   compiled using the default switches.  It should be linked with the file
   WILDARGS.OBJ from the Turbo C examples directory to enable the wildcard
   file name expansion facility.  Without WILDARGS, SPECTRUM will still work
   but will not be capable of expanding file names with wildcards.

   The following is an example make file, where TA is the assembler name, TCC
   is the C compiler name, TLINK is the linker name, \TC\LIB contains the C
   libraries, and \TC\EXA contains the Turbo C examples:

spectrum.exe: spectrum.obj heap.obj word.obj error.obj sort.obj
   tlink \tc\lib\c0s+\tc\exa\wildargs+spectrum+heap+word+error+sort,spectrum,, \tc\lib\cs.lib;
heap.obj: heap.asm
   ta heap /mx;
word.obj: word.asm
   ta word /mx;
error.obj: error.asm
   ta error /mx;
sort.obj: sort.asm
   ta sort /mx;
spectrum.obj: spectrum.c
   tcc -c spectrum
*/

/*** Header Files ***/
#include <dos.h>
#include <stdio.h>
#include <fcntl.h>

/*** Function Protypes ***/
/* Used Locally */
int allocmem( unsigned, unsigned * );
int freemem ( unsigned );
int _open( const char *, int oflags );
int _close( int );
/* Error trapper */
extern void pascal error_init (void);
extern unsigned pascal error_trap (void pascal (*execution_procedure)() );
extern void pascal error_log (unsigned error_code);
/* Heap */
extern void pascal heap_init (unsigned starting_segment, unsigned segment_count);
extern void far * pascal heap_alloc (unsigned paragraph_count);
/* Symbol table */
extern void pascal word_init (unsigned maximum_word_count);
extern void pascal word_read (unsigned file_handle);
extern void pascal word_scan (void pascal (*word_procedure)() );
extern char far * pascal word_name (unsigned word_descriptor);
extern unsigned pascal word_refcount (unsigned word_descriptor);
extern unsigned pascal word_count (void);
extern int pascal word_compref (unsigned word_desc1, unsigned word_desc2);
/* Sorting procedure */
extern void pascal sort_do (unsigned far *sort_array, unsigned sort_count,int pascal (*compare_procedure)() );

/*** Global Variables ***/
/* Error table */
char * error_table [] = {
"Insufficient Memory\n",
"Out of Hash Space\n",
"File Read Error\n",
"Usage: SPECTRUM filespec [filespec] ... [filespec]\n(filespec may have ?,*)\n"
};

/* Arguments */
int global_argc;
char **global_argv;

/* Memory */
unsigned segment_count;
unsigned starting_segment;

/* Sort array */
unsigned sort_index;
unsigned far *sort_array;

/**** Procedures ****/
/* Fill sort array with descriptors */
void pascal array_fill(unsigned word_desc)
{
    sort_array[sort_index++] = word_desc;
}

/* Main execution procedure */
void pascal main2 (void)
{
    int i;
    unsigned j;
    int words = 0;
    int file_handle;
    if( global_argc < 2 ) {
   error_log(4);
   }
    heap_init (starting_segment, segment_count);
    word_init (32767);
    for( i=1 ; i<global_argc ; i++ ) {
   file_handle = _open (global_argv[i], O_RDONLY);
   if (file_handle != -1 ) {
       word_read( file_handle);
       _close( file_handle );
       } else {
       error_log(3);
       }
   }

    /* Obtain array address */
    sort_array = (unsigned far *)heap_alloc((word_count()+7)/8);
    /* Fill array */
    sort_index = 0;
    word_scan(array_fill);
    /* Sort array */
    printf ("Sorting...\n");
    sort_do (sort_array, sort_index, word_compref);

    /* Display output */
    printf ("\nCount\tWord\n");
    printf ("-----\t----\n");
    for (i=0 ; i<sort_index-1 ; i++) {
   j = word_refcount(sort_array[i]);
   words = words + j;
   printf ("%d",j);
   printf ("\t");
   printf ("%Fs",word_name(sort_array[i]));
   printf ("\n");
   }
    printf ("\nTotal unique words:\t%d\n",sort_index);
    printf ("Total words:\t\t%d\n",words);
}

/* Main procedure */
int main( int argc, char *argv[] )
{
    int i;
    /* Copy arguments */
    global_argc = argc;
    global_argv = argv;
    error_init();
    segment_count = allocmem(65535,&starting_segment);
    allocmem( segment_count, &starting_segment );
    i = error_trap ( main2 );
    if (i != 0) {
   /* Print error message */
   printf (error_table[i-1]);
   }
    freemem (starting_segment);
    return (i);
}





[LISTING SIX]


spectrum.exe: spectrum.obj heap.obj word.obj error.obj sort.obj
  tlink /v \tc\lib\c0s+\tc\exa\wildargs+spectrum+heap+word+error+sort, spectrum,,\tc\lib\cs.lib;
heap.obj: heap.asm
   ta heap /mx /zi
word.obj: word.asm
   ta word /mx /zi
error.obj: error.asm
   ta error /mx /zi
sort.obj: sort.asm
   ta sort /mx /zi
spectrum.obj: spectrum.c
   tcc -c -v spectrum