Karl is the principal developer of Turbo Assembler and he can be reached at P.O. Box 39, Bedford, MA 01730. Rick is director of language development for Borland International and can be reached at 1800 Green Hills Road, Scotts Valley, CA 95065.
As applications get larger, fewer and fewer are written in a single language. Large software projects tend to come together in a piecemeal fashion -- some parts are borrowed from previous projects, other parts may be purchased from various vendor sources, and, let's face it, every programmer has a favorite language. Assembly languages have made great strides recently in the area of mixed language programming. Now more than ever before, it makes sense to write applications with more than one language and to include assembly language in the mix.
Furthermore, every programming language ever created has inherent strengths and weaknesses. One area in which different languages have distinct strengths is in how procedures are called. This is an extremely important issue, because in many applications more time and effort is spent getting in and out of procedures than doing anything else! Conversely, a good choice of procedure calling conventions can actually make the difference between an application that can be written quickly and one which cannot be written at all.
Usually, higher-level languages such as C and Pascal use an argument passing technique known as the "stack frame method," where arguments are pushed onto a stack and addressed as an offset from some "frame" pointer. It is a good general technique in that it allows for an unlimited number of arguments with built-in recursion.
C and Pascal each make use of a slightly different flavor of the stack frame method. The C-style stack frame permits a variable number of arguments to be passed to a procedure. This requires that the caller remove the arguments from the stack after the procedure call, because it is the caller who knows best how many arguments were passed. In Pascal, on the other hand, the number of arguments is fixed, so the procedure itself is responsible for removing its arguments from the stack. Typically, this is done efficiently with the single machine instruction RET xx.
Until recently, assembly language was generally limited to what is known as the "register passing method" of passing arguments. With register passing, arguments are passed to procedures in machine registers or at fixed memory locations. (Stack frames could be constructed in assembly language, but with considerable effort on the part of the programmer.) Register passing is not a general argument passing method. There are a limited number of registers in any machine, and explicit PUSH and POP instructions must be used to retain the availability of arguments during recursion. Nevertheless, register passing is a much more efficient method of passing arguments than the stack frame when the number of arguments to a procedure is small and the particular argument registers are chosen carefully in light of the instructions, which are to be done inside the procedure.
The example used to illustrate this point is a program that reads one or more text files, breaks them into words, and counts the individual words. It then sorts the resulting array by word count, and displays the word and the associated count together in a neat, tabular form.
This example emphasizes speed of execution, with the additional criteria that modularity is preserved and nasty tricks like self-modifying code are not used. This will permit the program to be relatively easy to change or to upgrade, and still be considerably faster than anything written wholly in a single language.
The major points that need to be covered are the interfaces between modules and what each module is responsible for, as well as the overall organization of the application.
The command line that this program will accept has the following format: SPECTRUM <file_spec> <file_spec> ... where each <file_spec> can include wild cards. If a file name is given more than once, its spectrum will be taken more than once. The output of the application will be a table that is written to Standard Out and is sorted in order of reference count, the most referenced words being listed first.
The basic steps are: 1. Initialize all data structures. 2. Parse the command line. For each file spec, read the file(s) and break it (them) into words. Keep a reference count for each unique word. 3. Build a list of unique words and sort it by reference count. 4. Scan the sorted list and print out the reference count and associated word for each list element.
For the sake of performance, the work of reading a file, breaking it into words, and hashing them into a symbol table is best handled in assembly language, as is the other time bottleneck that occurs when the sort is done. Less time-critical areas, such as command-line parsing and table formatting, are written in C to provide greater flexibility in the user interface. Finally, the generality of assembly language, another inherent strength, makes it best for dealing with the heap and error handling modules.
The major modules we need and their respective languages are: ERROR.ASM, the assembly language error handler (see Listing One, page 116); HEAP.ASM, the assembly language memory allocator (Listing Two, page 116); WORD.ASM, the assembly language lexer/word, table/file input (Listing Three, page 116); SORT.ASM, the assembly language general-sort procedure (Listing Four, page 119); and SPECTRUM.C, the command-line parsing, text formatting, and output written in C (Listing Five, page 120). The make file is shown in Listing Six, page 121.
Throughout the program, we've made every effort to use an appropriate calling convention for the situation. On procedures with stack frames, Pascal-style calling conventions are most frequently used because of their inherently faster execution and smaller code requirements. Only on procedures that require a variable number of arguments do we use a C-style stack frame.
The extensive modularity we use in this application is not absolutely necessary given its small size. We have tried, however, to put forth as general a treatment as possible, demonstrating techniques that are appropriate even for very large applications. The use of strong data abstraction is one of these techniques. In strong data abstraction, the details of an actual data structure are known only to a small set of procedures that manage that data structure. The data structure and the procedures that manage it are taken together to form a module. Any other code in the program that deals with the data structure must do so through the appropriate procedures -- any other access is considered to be a breach of modularity. In this application, the HEAP and WORD modules are good examples of strong data abstractions.
The program uses SMALL model with a NEAR stack. All of the code is in segment _TEXT (except for any code in the C libraries), so CS is always set to_TEXT. Data, uninitialized data, and stacks are all in DGROUP, so SS must always be set to DGROUP. DS is also set to DGROUP in the C sections of the program, but is used as a general segment register in the assembly language code.
The interfaces to the procedures in the various modules pretty well spell out the function of each module:
Error Handling Module Because errors need only to be caught and displayed without the ability to resume execution of the application, the error handling scheme this program uses is a mechanism whereby the stack pointer is saved at some point in the execution of the program, and if an error is encountered, the program is resumed at that point. The required procedures are listed in Table 1.
void pascal ERROR_INIT (void)
Initializes error module.
unsigned pascal ERROR_TRAP (void pascal (*execution_procedure)() )
Returns 0 if no error occurred in the execution of
EXECUTION_PROCEDURE or any procedures it calls. (Otherwise,
an error code is returned.) EXECUTION_PROCEDURE is a
generic procedure which can generate errors in its execution
(via ERROR_LOG) and might be declared in C as follows:
void pascal execution_procedure(void)
void pascal ERROR_LOG (unsigned error_code)
Causes control to pass to the nearest enclosing ERROR_TRAP.
Execution resumes with that instance of function ERROR_TRAP
returning error_code.
Heap Module Because data structures are allocated but never freed, a simple stack heap is the best choice for both performance and simplicity. The application uses a paragraph-based heap where memory is allocated with 16-byte granularity. This turns out to be useful because it permits any data item allocated from the heap to be described with a single 16-bit segment address. See Table 2.
void pascal HEAP_INIT (unsigned starting_segment, unsigned segment_count)
Initializes the heap to start at a certain segment and be
a certain size.
void far * pascal HEAP_ALLOC (unsigned paragraph_count)
Allocates the requested number of paragraphs from the
heap and returns the far address of the memory in DX:AX.
NOTE: The offset part of the address is always 0.
Symbol Table Module The symbol table module is responsible for much of the actual work of reading in a file, converting it to words, and recording the word usage information. After it is read in, each symbol is represented by an area of memory allocated from the heap containing the reference count for the symbol and the actual text of the symbol. Because it is allocated from the heap, each symbol can be addressed by using a 16-bit word descriptor. Refer to Table 3.
void pascal WORD_INIT (unsigned maximum_word_count)
Initializes symbol table. The maximum number of
different words allowed is passed so that a hash table
can be initialized.
void pascal WORD_READ (unsigned file_handle)
Reads all the text there is from the specified file
handle and analyzes it.
void pascal WORD_SCAN (void pascal (*word_procedure)())
Calls the specified procedure once for each individual
symbol. The word descriptor for the symbol is passed to
WORD_PROCEDURE as an argument. WORD_PROCEDURE might
be declared in C as follows:
void pascal word_procedure(unsigned word_descriptor).
char far * pascal WORD_NAME (unsigned word_descriptor)
Returns the FAR address of the name of the described symbol.
unsigned pascal WORD_REFCOUNT (unsigned word_descriptor)
Returns the total reference count of the described symbol.
unsigned pascal WORD_COUNT(void)
Returns the total number of distinct words processed so far.
int pascal WORD_COMPREF (unsigned word_descriptor1, unsigned
word_descriptor2)
Compares the reference counts of two word descriptors.
Returns flags for
refcount(word_descriptor2) - refcount(word_descriptor1). NOTE:
This procedure, while it obeys Pascal calling conventions, is
not callable directly from C because it returns its result in
the flag register. It also has the requirement that the
registers CX and DX are preserved.
This procedure might be described as using a sort of
"hybrid" calling convention, where a stack frame is
used but high-level language register conventions are not
obeyed.
Sorting Module The sort routine is written in assembly language because a recursive algorithm was chosen and recursion tends to be faster if register passing can be used appropriately. In this case, there are a small number of registers that are used directly; more importantly, during the innermost step of the recursion (which is done most often) no registers whatsoever need to be saved on the stack. Recursion with a stack frame can't make a decision this intelligent, because access to the arguments is needed first.
The sort procedure operates on an array of words, calling a generic comparison routine whose address is passed as an argument. This comparison routine uses a hybrid calling convention, where a stack frame is present but registers are not necessarily consistent with C. The level of generality this arrangement achieves is high, but it does require that the comparison routine be written in assembly language. See Table 4.
void pascal SORT_DO (unsigned far *sort_array, unsigned sort_count,
int pascal (*compare_procedure)())
Uses the specified compare procedure to order the array.
COMPARE_PROCEDURE is called with two array values, and
returns flags appropriate to a comparison of those
values. Note that compare_procedure cannot be written in
C because the value is returned in the machine flags. In
addition, the segment registers are not guaranteed to be
set up in a manner consistent with C when
compare_procedure is called. Compare_procedure itself is
expected to preserve CX and DX. The definition for
compare_procedure might be stated:
int pascal compare_procedure(unsigned value1, unsigned value2)
If raw speed were the only concern, the SORT_DO procedure might best be integrated entirely into the symbol table module, which would permit the comparison to be performed directly and would remove the need to call the comparison routine. But we felt that a more general treatment was superior in terms of modifiability -- it is relatively straightforward to add a switch to control the particular sorting method, for example.
The Command-line Parsing and Text Formatting Module We are now ready to lay out the full-scale sequencing of the program. Given the assembly language interface listed earlier, the following steps should be taken by the C portion of the program:
The SPECTRUM program uses a hash function and hash table to achieve its level of performance. Inside the WORD module, the procedure WORD_READ reads text into a buffer. This text is copied to a storage area one word at a time. During the copy operation, which uses the LODSB and STOSB instructions, the text is converted to uppercase and the hash value for the word is calculated, all on-the-fly.
The hash table is an array of word descriptors. An element in the hash table is 0 if there is not yet an associated symbol. The hash function is calculated by looking at each character in the word, rotating the previous hash value circularly left by five, and XORing in the character value. The final hash value is masked off to become an index into the hash table.
After the hash index is calculated, the corresponding hash table entry is checked. If it is 0, a new symbol is created, and its reference count is initialized to 1. Otherwise, the text of the word is compared against the text stored in the symbol whose word descriptor is found in the hash table. If it agrees, the correct symbol has been located, and its reference count is incremented. If not, a collision has occurred, and the next hash value is calculated by adding 11*2 to the current hash index (this number must be relatively prime to the size of the hash table). The process then repeats until the correct hash table entry or a 0 is found.
An unusual technique is used to speed the recognition of the various different character types during the lexing process. BX is initialized to point to a translation table, which contains a bit for each pertinent character type. An XLAT instruction followed by a TEST AL,xxx is then all that is needed to identify a character as a numeral, delimiter, lowercase alphabetic, and so on.
Another unusual technique is used to describe objects in the assembly language section of the program. Rather than use a full 32 bits to describe the address of a data object, which is somewhat cumbersome, a paragraph address is used instead. This paragraph address becomes the "descriptor" for the object. Data within the object is addressed by loading an appropriate segment register with the object descriptor and accessing the data with a constant offset using that segment register.
After all files have been read in and parsed, an array of word descriptors is built using the routine WORD_SCAN. This array is then sorted using SORT_DO with the comparison routine WORD_COMPREF. SORT_DO is a recursive sort that requires N*LOG(N) comparisons. It operates by dividing the array into two roughly equal parts, recursively sorting each part, and then merging the two parts in place.
Finally, to output the table, the array is scanned sequentially. For each word descriptor in the array, WORD_NAME is used to obtain the actual text of the word, and WORD_REFCOUNT is used to obtain the reference count. These values are displayed using PRINTF.
It is not only practical but advisable to mix languages and models in order to achieve the best results. Modern assembly language is a vital part of this mix, and will continue to be important in the future, because space and performance are always important for competitive software, no matter how powerful the hardware becomes. Assembly language's flexibility can assist in everything from optimization to the creation of programs using more than one interfacing convention.
The assembly language section of the application was written in Borland's Turbo Assembler 2.0 and uses several features unique to that assembler. If you are using another assembler, you may need to modify portions of the example so that your assembler will accept it. The following are the features I used and how you can work around them in your assembler.
Extended CALL automatically builds a calling stack frame by generating a series of PUSHes in the order appropriate to the specified language. For example, CALL foo pascal, ax, bx, wordptr would PUSH the three arguments AX, BX, and WORDPTR onto the stack in the order appropriate for Pascal stack frames, and is equivalent to
PUSH ax PUSH bx PUSH wordptr CALL foo
Multiple PUSHes/POPs permit more than one item at a time to be PUSHed or POPed with a single instruction. For example,
PUSH AX BX POP BX AX
is equivalent to
PUSH AX PUSH BX POP BX POP AX
Local Symbols are enabled with the LOCALS directive. All local symbols begin with the two characters @@. They are scoped to be local to the enclosing procedure. For example
fool proc
jmp @@exit
@@exit: ret
endp
foo2 proc
jmp @@exit
@@exit: ret ;This @@EXIT can
co-exist amicably with the former one.
endp
If you are using an assembler that does not support this feature, one way to work around it is to change the .MODEL statement at the start of each module to .MODEL SMALL, PASCAL. This will cause all symbols within a procedure to become local.
ARG and USES Statements the assembler used for the example has a way of setting up procedure stack frames that is somewhat easier to read than the standard method. For example:
foo proc pascal
arg a1,a2
uses ds,si
...
is equivalent to the statement:
foo proc pascal uses ds si,a1,a2
...
Some assemblers require a language to be specified in the .MODEL statement before the language keyword PASCAL is recognized. If this is true for your assembler, you will need to change the .MODEL statement at the start of each module to .MODEL SMALL,PASCAL.
The CODEPTR type is used occasionally in the example. It means either WORD or DWORD depending on whether the selected model has NEAR or FAR code, respectively. Because the example is SMALL model, you may replace CODEPTR with WORD wherever it is found.
-- R.S.
MIXED-LANGUAGE PROGRAMMING WITH ASM
by Karl Wright and Rick Schell
[LISTING ONE]
;* Module description * This module takes care of error trapping. The scheme
;used records the trapping routine stack pointer so that an error can cause
;the stack to return to a consistent state. This module was written using
;Borland's Turbo Assembler 2.0.
;** Environment **
.model small ;Set up for SMALL model.
locals ;Enable local symbols.
;** Macros **
;<<Generate correct return based on model>>
procret macro
if @codesize
retf
else
retn
endif
endm
;** Public operations **
public pascal ERROR_INIT ;Initialize error handler.
public pascal ERROR_TRAP ;Set up error trap.
public pascal ERROR_LOG ;Log error.
;** Uninitialized data **
.data?
errstk dw ? ;SP at last error log (-1 if none).
;** Code **
.code
;Set up DS to nothing since that is the typical arrangement.
assume ds:nothing
;[Initialize error manager]
error_init proc pascal ;Declare proc with PASCAL calling conventions.
mov errstk,-1
ret
endp
;[Set up error trap]
;This procedure preserves the previous ERRSTK, sets up a new ERRSTK, and
;calls the passed procedure. On exit, the previous ERRSTK is restored.
error_trap proc pascal ;Pascal calling conventions.
arg @@proc:codeptr ;Only argument is procedure to call.
uses ds,si,es,di ;Force a save of all registers C cares for.
push errstk
;Call internal routine to record return address on stack.
call @@rtn
pop errstk
ret
@@rtn label proc
mov errstk,sp ;Save SP so we can restore it later.
call @@proc pascal ;Call procedure.
xor ax,ax ;Return code = 0 for normal return.
procret
endp
;[Log error]
;Control is passed to the last ERROR_TRAP, if any.
;Error code is passed and returned in AX.
error_log proc pascal
arg @@error_code:word
cmp errstk,-1 ;Lock up if no error address.
@@1: jz @@1
mov ax,@@error_code
mov sp,errstk
procret
endp
end
[LISTING TWO]
;* Module description * This module manages a simple stack-based heap.
;Deallocation is not supported. NOTE: This module must be assembled with /MX
;to publish symbols in the correct case. This module is written using
;Borland's Turbo Assembler 2.0.
;** Environment **
.model small ;Set up for SMALL model.
locals ;Enable local symbols.
;** Equates **
err_memory = 1 ;Out of memory error number.
;** Public operations **
public pascal HEAP_INIT ;Initialize heap.
public pascal HEAP_ALLOC ;Allocate memory from heap.
;** External operations **
;<<Error handler>>
extrn pascal ERROR_LOG:proc ;Long jump library procedure for errors.
;** Uninitialized data **
.data?
memptr dw ? ;Pointer to first free segment.
memsiz dw ? ;Remaining paragraphs in heap.
;** Code **
.code
;Set up DS to nothing since that is the typical arrangement.
assume ds:nothing
;[Initialize the heap]
heap_init proc pascal ;Declare proc with PASCAL calling conventions.
arg @@start_seg:word,@@para_size:word
;Arguments are starting segment and para count.
mov ax,@@start_seg
mov memptr,ax
mov ax,@@para_size
mov memsiz,ax
ret
heap_init endp
;[Allocate memory from the heap]
heap_alloc proc pascal ;Declare proc with PASCAL calling conventions.
arg @@para_count:word ;Only argument is count of paragraphs.
;See if there is enough remaining.
mov ax,@@para_count
cmp memsiz,ax
jc @@err
sub memsiz,ax
add ax,memptr
xchg ax,memptr
mov dx,ax
xor ax,ax
ret
@@err: ;Out-of-memory error.
mov ax,err_memory
call error_log pascal,ax
;Never returns.
heap_alloc endp
end
[LISTING THREE]
;* Module description * This module reads source files and converts them into
;words, then files the words away in a symbol table with the help of a hash
;function. This module was written using Borland's Turbo Assembler 2.0.
;** Environment **
.model small ;Set up for SMALL model.
locals ;Enable local symbols.
;** Equates **
;<<Error numbers>>
err_hash = 2 ;Out of hash space error number.
err_read = 3 ;Read error.
;<<Hash function>>
hash_rotate = 5 ;Amount to rotate for hash function.
hash_skip = 11;Number of entries to skip on hash collision.
;<<Read buffer>>
rbf_size = 800h ;Size of read buffer in paragraphs.
;** Public operations **
public pascal WORD_INIT ;Initialize hash table.
public pascal WORD_READ ;Read file, convert to words, and hash them.
public pascal WORD_COUNT ;Get total word count.
public pascal WORD_NAME ;Get name of word.
public pascal WORD_REFCOUNT ;Get reference count of word.
public pascal WORD_SCAN ;Scan all words.
public pascal WORD_COMPREF ;Compare word reference counts.
;** External operations **
;<<Heap>>
extrn pascal HEAP_ALLOC:proc ;Heap allocation.
;<<Error handling>>
extrn pascal ERROR_LOG:proc ;Trap an error.
;** Data structure **
;<<Symbol table entry>>
symtbl struc
symref dw ? ;Reference count.
symsiz dw ? ;Length of word.
ends
symnam = size symtbl ;Offset of start of name text.
;** Initialized data **
.data
;<<Translation character type table>>
typdlm = 1 ;Delimiter bit.
typnum = 2 ;Numerical digit.
typcas = 20h ;Lower case bit: Set if lower case letter.
xlttbl label byte
db '0' dup (typdlm)
db 10 dup (typnum)
db ('A'-1)-'9' dup (typdlm)
db 'Z'-('A'-1) dup (0)
db ('a'-1)-'Z' dup (typdlm)
db 'z'-('a'-1) dup (typcas)
db 255-'z' dup (typdlm)
;** Uninitialized data **
.data?
;<<Hash table values>>
hshptr dw ? ;Segment address of hash table.
hshsiz dw ? ;Total number of hash entries. Must be a power of 2!
hshcnt dw ? ;Total free entries remaining in hash table.
hshmsk dw ? ;Mask for converting hash value to address.
;<<Read buffer values>>
rbfptr dw ? ;Segment address of read buffer.
;<<Word buffer>>
wrdbuf db 256 dup (?)
;** Code **
.code
;Set up DS to nothing since that is the typical arrangement.
assume ds:nothing
;[Initialize hash table]
word_init proc pascal
arg @@max_word_count:word ;Argument: Maximum number of words.
uses es,di
;First, allocate read buffer.
mov ax,rbf_size
call heap_alloc pascal,ax
mov rbfptr,dx
;Now convert maximum word count to power of 2.
mov ax,@@max_word_count
mov cl,16+1
@@l1: dec cl
shl ax,1
jnc @@l1
mov ax,1
shl ax,cl
;Initialize some hash parameters.
mov hshsiz,ax
mov hshcnt,ax
dec ax
shl ax,1
mov hshmsk,ax
;Now, allocate hash table from heap.
mov ax,hshsiz ;Size of hash table in words.
add ax,7
mov cl,3
shr ax,cl ;Convert to paragraphs.
call heap_alloc pascal,ax
mov hshptr,dx
;Clear out hash table: 0 means 'no value'.
mov es,dx
xor di,di
cld
mov cx,hshsiz
xor ax,ax
rep stosw
ret
word_init endp
;[Read file and assimilate all words]
word_read proc pascal
arg @@handle:word ;Argument is file handle.
uses ds,si,es,di
;Load XLAT buffer address. The XLAT table is used for case conversion
;and for character type identification.
mov bx,offset xlttbl
@@read: ;Read next buffer while delimiter processing.
call @@brd
jcxz @@done
@@skip: ;Skip all delimeters, etc.
lodsb
xlat xlttbl
test al,typdlm
loopnz @@skip
jnz @@read
;Adjust pointer & count.
dec si
inc cx
;If it is a number, skip to end.
test al,typnum
jnz @@num
;It is a word. We'll transfer a word at a time to the word buffer,
;hashing it as we go. DX will be the current hash value. CX is the
;amount remaining in the buffer.
xor dx,dx
;Initialize output address.
push ss
pop es
mov di,offset wrdbuf
@@clp: ;Transfer. This is THE most time-critical loop in the program.
lodsb ;Read character.
mov ah,al
xlat xlttbl ;Get its type.
test al,typdlm ;Abort if delimiter.
jnz @@wend
and al,typcas ;Use case bit to convert to upper case.
neg al
add al,ah
stosb ;Save it in word buffer.
;Calculate hash value.
mov ah,cl
mov cl,hash_rotate
rol dx,cl
mov cl,ah
xor dl,al
loop @@clp ;Keep going until end of buffer.
;End of buffer while word processing. Read more.
call @@brd
jcxz @@wnd2
jmp @@clp
@@nrd: ;Read next buffer while number processing.
call @@brd
jcxz @@done
@@num: ;Numbers are not considered 'words' and should be skipped.
;Skip up to first delimiter.
lodsb
xlat xlttbl
test al,typdlm
loopz @@num
jz @@nrd
;Adjust pointer and count.
dec si
inc cx
jmp @@skip
@@done: ret
@@wend: ;End of word. Adjust buffer pointer.
dec si
@@wnd2: ;End of word. Hash value is in DX, upper-case word is in WRDBUF,
;DI points to end of word + 1.
push ds si cx bx ;Save the registers we will use for this step.
xor al,al ;Null-terminate the word.
stosb
mov cx,di ;Calculate the word's length.
sub cx,offset wrdbuf
mov bx,dx ;Put the hash value in a useable register.
shl bx,1 ;Lower bit will be discarded, so shift.
push ss ;Initialize DS.
pop ds
assume ds:dgroup
;Now it is time to locate the word in the hash table if it is there,
;or create an entry if it is not.
@@hlp: mov es,hshptr
and bx,hshmsk
mov ax,es:[bx]
and ax,ax
jz @@make
;Verify that the hash entry is the correct one.
mov es,ax
mov ax,cx
cmp es:[symsiz],ax ;Compare length of word.
jnz @@coll
mov si,offset wrdbuf ;Compare actual text if that agrees.
mov di,symnam
repz cmpsb
mov cx,ax
jz @@fd
@@coll: ;Collision! Advance to the next candidate hash entry.
add bx,hash_skip*2
jmp @@hlp
@@dne2: ret
@@make: ;We have encountered this word for the first time.
;We must create a new symbol entry of the appropriate size.
;First decrement remaining free hash count.
dec hshcnt
jz @@herr
push cx
push bx
mov ax,cx ;Calculate length of symbol descriptor.
add ax,symnam+15
mov cl,4
shr ax,cl
call heap_alloc pascal,ax
pop bx ;Record symbol descriptor in hash table.
mov es:[bx],dx
pop cx ;Record length.
mov es,dx
mov es:[symsiz],cx
mov di,symnam ;Move text of word into symbol table.
mov si,offset wrdbuf
shr cx,1
rep movsw
rcl cx,1
rep movsb
mov es:[symref],0 ;Clear reference count.
@@fd: ;Matching entry found! Increment reference count.
inc es:[symref]
@@nwd: ;Go on to the next word in the buffer, if any.
pop bx cx si ds
assume ds:nothing
jcxz @@dne2
jmp @@skip
@@herr: ;Out of hash space error.
mov ax,err_hash
call error_log pascal,ax
;No return from ERROR_LOG.
;(Read buffer)
;Reads the next hunk of buffer. Returns actual amount read in CX,
;DS:SI as start of data to read.
@@brd: push dx bx
mov cx,rbf_size*16
mov bx,@@handle
mov ah,3fh
mov ds,rbfptr
xor dx,dx
int 21h
jc @@err
mov cx,ax
xor si,si
pop bx dx
cld
retn ;Use RETN so stack frame return won't be generated.
@@err: ;Read error.
mov ax,err_read
call error_log pascal,ax
;No return is needed because ERROR_LOG never returns.
word_read endp
;[Get total word count]
word_count proc pascal
mov ax,hshsiz ;Load total word capacity.
sub ax,hshcnt ;Subtract actual remaining free words.
ret
word_count endp
;[Get address of name of word]
word_name proc pascal
arg @@word_desc:word ;Argument is word descriptor.
mov dx,@@word_desc
mov ax,symnam
ret
word_name endp
;[Get refcount for word]
word_refcount proc pascal
arg @@word_desc:word ;Argument is word descriptor.
uses ds
mov ds,@@word_desc
mov ax,ds:[symref]
ret
word_refcount endp
;[Scan all words]
word_scan proc pascal
arg @@scan_proc:codeptr ;Argument is procedure to call for each word.
uses ds,si
mov ds,hshptr
xor si,si
mov cx,hshsiz
cld
@@l1: lodsw
and ax,ax
jnz @@take
@@next: loop @@l1
ret
@@take: push cx ds
push ss
pop ds
call @@scan_proc pascal,ax
pop ds cx
cld
jmp @@next
word_scan endp
;[Compare reference counts for two word descriptors]
word_compref proc pascal
arg @@word_desc1:word,@@word_desc2:word
uses ds
mov ds,@@word_desc2
mov ax,ds:[symref]
mov ds,@@word_desc1
sub ax,ds:[symref]
ret
endp
end
[LISTING FOUR]
;* Module description * This module contains the sort routine for SPECTRUM.
;This module was written using Borland's Turbo Assembler 2.0.
;** Environment **
.model small ;Set up for SMALL model.
locals ;Enable local symbols.
;** Public operations **
public pascal SORT_DO ;Perform sort.
;** Code **
.code
;Set up DS to nothing since that is the typical arrangement.
assume ds:nothing
;[Sort procedure]
sort_do proc pascal
arg @@array:dword,@@count:word,@@compare_proc:codeptr
uses ds,si,di
;First load up registers for internal recursion. DS:SI will be
;the current sort array address, CX the count of elements to sort.
lds si,@@array
mov cx,@@count
call @@sort
ret
;Internally recursive sort routine. This routine accepts DS:SI as the sort
;array address, and CX as the count of elements to sort.
@@sort: cmp cx,2
jnc @@go
retn
@@go: ;Save all registers we will change.
;Internally, DI and DX will be start and count of second merge area.
push si cx di dx
;Divide into two parts and sort each one.
mov dx,cx
shr cx,1
sub dx,cx
call @@sort
mov di,si
add di,cx
add di,cx
xchg si,di
xchg cx,dx
call @@sort
xchg cx,dx
xchg si,di
;Now, merge the two areas in place.
;Each area must be at least size 1.
@@mrgl: ;Compare - DS:DI - DS:SI.
call @@compare_proc pascal,ds:[di],ds:[si]
;;The following commented-out sequence is the code that would be required
;;if strict Pascal calling conventions were adhered to for calling
;;COMPARE_PROC. You can see how much extra work this is!!
;; push cx dx
;; push ds
;; mov ax,ds:[di]
;; mov bx,ds:[si]
;; push ss
;; pop ds
;; call @@compare_proc pascal,ax,bx
;; pop ds
;; pop dx cx
;; and ax,ax
jns @@ok
;Slide up first merge area using starting value from DI.
mov ax,ds:[di]
push si cx
@@sllp: xchg ax,ds:[si]
add si,2
loop @@sllp
xchg ax,ds:[si]
pop cx si
add si,2
add di,2
dec dx
jnz @@mrgl
jmp short @@exi
@@ok: ;Correct so far. Advance SI.
add si,2
loop @@mrgl
@@exi: ;Restore registers.
pop dx di cx si
retn
sort_do endp
end
[LISTING FIVE]
/***** File: SPECTRUM.C *****/
/* This C module is written using Borland's Turbo C 2.0 and can be
compiled using the default switches. It should be linked with the file
WILDARGS.OBJ from the Turbo C examples directory to enable the wildcard
file name expansion facility. Without WILDARGS, SPECTRUM will still work
but will not be capable of expanding file names with wildcards.
The following is an example make file, where TA is the assembler name, TCC
is the C compiler name, TLINK is the linker name, \TC\LIB contains the C
libraries, and \TC\EXA contains the Turbo C examples:
spectrum.exe: spectrum.obj heap.obj word.obj error.obj sort.obj
tlink \tc\lib\c0s+\tc\exa\wildargs+spectrum+heap+word+error+sort,spectrum,, \tc\lib\cs.lib;
heap.obj: heap.asm
ta heap /mx;
word.obj: word.asm
ta word /mx;
error.obj: error.asm
ta error /mx;
sort.obj: sort.asm
ta sort /mx;
spectrum.obj: spectrum.c
tcc -c spectrum
*/
/*** Header Files ***/
#include <dos.h>
#include <stdio.h>
#include <fcntl.h>
/*** Function Protypes ***/
/* Used Locally */
int allocmem( unsigned, unsigned * );
int freemem ( unsigned );
int _open( const char *, int oflags );
int _close( int );
/* Error trapper */
extern void pascal error_init (void);
extern unsigned pascal error_trap (void pascal (*execution_procedure)() );
extern void pascal error_log (unsigned error_code);
/* Heap */
extern void pascal heap_init (unsigned starting_segment, unsigned segment_count);
extern void far * pascal heap_alloc (unsigned paragraph_count);
/* Symbol table */
extern void pascal word_init (unsigned maximum_word_count);
extern void pascal word_read (unsigned file_handle);
extern void pascal word_scan (void pascal (*word_procedure)() );
extern char far * pascal word_name (unsigned word_descriptor);
extern unsigned pascal word_refcount (unsigned word_descriptor);
extern unsigned pascal word_count (void);
extern int pascal word_compref (unsigned word_desc1, unsigned word_desc2);
/* Sorting procedure */
extern void pascal sort_do (unsigned far *sort_array, unsigned sort_count,int pascal (*compare_procedure)() );
/*** Global Variables ***/
/* Error table */
char * error_table [] = {
"Insufficient Memory\n",
"Out of Hash Space\n",
"File Read Error\n",
"Usage: SPECTRUM filespec [filespec] ... [filespec]\n(filespec may have ?,*)\n"
};
/* Arguments */
int global_argc;
char **global_argv;
/* Memory */
unsigned segment_count;
unsigned starting_segment;
/* Sort array */
unsigned sort_index;
unsigned far *sort_array;
/**** Procedures ****/
/* Fill sort array with descriptors */
void pascal array_fill(unsigned word_desc)
{
sort_array[sort_index++] = word_desc;
}
/* Main execution procedure */
void pascal main2 (void)
{
int i;
unsigned j;
int words = 0;
int file_handle;
if( global_argc < 2 ) {
error_log(4);
}
heap_init (starting_segment, segment_count);
word_init (32767);
for( i=1 ; i<global_argc ; i++ ) {
file_handle = _open (global_argv[i], O_RDONLY);
if (file_handle != -1 ) {
word_read( file_handle);
_close( file_handle );
} else {
error_log(3);
}
}
/* Obtain array address */
sort_array = (unsigned far *)heap_alloc((word_count()+7)/8);
/* Fill array */
sort_index = 0;
word_scan(array_fill);
/* Sort array */
printf ("Sorting...\n");
sort_do (sort_array, sort_index, word_compref);
/* Display output */
printf ("\nCount\tWord\n");
printf ("-----\t----\n");
for (i=0 ; i<sort_index-1 ; i++) {
j = word_refcount(sort_array[i]);
words = words + j;
printf ("%d",j);
printf ("\t");
printf ("%Fs",word_name(sort_array[i]));
printf ("\n");
}
printf ("\nTotal unique words:\t%d\n",sort_index);
printf ("Total words:\t\t%d\n",words);
}
/* Main procedure */
int main( int argc, char *argv[] )
{
int i;
/* Copy arguments */
global_argc = argc;
global_argv = argv;
error_init();
segment_count = allocmem(65535,&starting_segment);
allocmem( segment_count, &starting_segment );
i = error_trap ( main2 );
if (i != 0) {
/* Print error message */
printf (error_table[i-1]);
}
freemem (starting_segment);
return (i);
}
[LISTING SIX]
spectrum.exe: spectrum.obj heap.obj word.obj error.obj sort.obj
tlink /v \tc\lib\c0s+\tc\exa\wildargs+spectrum+heap+word+error+sort, spectrum,,\tc\lib\cs.lib;
heap.obj: heap.asm
ta heap /mx /zi
word.obj: word.asm
ta word /mx /zi
error.obj: error.asm
ta error /mx /zi
sort.obj: sort.asm
ta sort /mx /zi
spectrum.obj: spectrum.c
tcc -c -v spectrum