This article contains the following executables: UCRASM.ARC
Randy is the designer of numerous hardware and software projects, including assemblers. In addition to consulting, he is currently an instructor in computer science at California Polytechnic University, Pomona and UC Riverside. He can be contacted at 16217 Sunset Trail, Riverside, CA 92506.
Given the obvious benefits of space and speed, it's sad that assembly language has such a bad reputation. Some people cite portability, others complain about how hard it is to learn, but the real reason people don't write more code in assembly language can be summed up in one word: effort. Assembly language will never gain ground against high-level languages unless we can make it easier to program.
There are many reasons why people feel that programming in assembly language is much more difficult than programming in a high-level language such as C. A few points, however, top the list:
I've had quite a bit of success teaching students assembly language at the University of California Riverside using this library. It definitely eases the transition from a high-level language to assembly. Of course, the best assembly language programs are not "C programs written with MOV instructions," but my experience shows that getting students to write real programs in assembly language as soon as possible improves the quality of their education. For professional assembly language programmers, the UCR Standard Library package provides the opportunity to write code in assembly which would be too kludgey in a high-level language (yes, assembly is more appropriate for many applications), but require too much unwritten support code in a typical assembly language application.
While space restrictions prevent me from describing each entry in detail, a few examples will give you an idea of the types of routines available in the library. See Table 1, which lists the names and brief descriptions of all the routines currently in the library. I must emphasize that current means as of the writing of this article. There may be even more routines available by the time you read this. The library is in a state of constant flux.
Input Routines
Getc Reads a single character from the (library) standard
input.
GetcStdIn Reads a single character from the DOS standard input.
GetcBIOS Reads a single character from the keyboard by calling
BIOS.
SetInAdrs }
GetInAdrs } Maintain pointers to the current input routine. By
changing
PushInAdrs } them around, you can redirect the input obtained
through
PopInAdrs } the Getc routine.
Gets (m) Reads a string from the keyboard.
Scanf Formatted input routine, similar to C.
Output Routines
Putc Writes a single character to the (library) standard
output.
PutCR Writes a CR/LF pair using Putc.
PutcStdOut Calls DOS to print a character to the DOS standard
output.
PutcBIOS Calls BIOS to print a character to the display.
GetOutAdrs, }
SetOutAdrs, } Like the corresponding routines above, these
routines
PushOutAdrs, } maintain the stdlib standard output pointers.
PopOutAdrs- }
Puts Prints string pointed at by ES:DI.
Puth Prints value in AL as two hex digits.
Putw Prints value in AX as four hex digits.
Puti Prints value in AX as signed decimal number.
Putu Prints value in AX as unsigned decimal number.
Putl Prints 32-bit value in DX:AX as signed decimal
integer.
Putul Prints 32-bit value in DX:AX as unsigned decimal
integer.
PutlSize Prints integer value in AX using CX print positions.
PutUSize Prints unsigned value in AX using CX print positions.
PutLSize Prints long integer value in DX:AX using CX print
positions,
PutULSize Prints unsigned long value in DX:AX using CX print
positions,
Print Prints zero-terminated literal string constant
following the call.
Printf Similar to the corresponding C routine.
File I/O Routines
Fcreate Creates (and opens for writing) a new file.
ES:DI points at the name.
FOpen Opens an existing file for reading or writing.
ES:DI points at the name.
FClose Closes a file.
FReadOn, } Redirect the standard input routines so that they
take
FReadOff } their input from a file.
FWriteOn, } Redirect the standard output routines so that they
send
FWriteOff } their output to a file.
FSeek Moves the file pointer around in the file.
DOSHandle Converts STDLIB file handle to DOS file handle.
FDelete Deletes a file.
FRename Renames a file.
Conversion Routines
Atol (2) Converts ASCII string to long integer returned in
DX:AX.
Atoul (2) Converts ASCII string to unsigned 32-bit value in
DX:AX.
Atou (2) Converts ASCII string to 16-bit unsigned integer in
AX.
Atoh (2) Converts ASCII string of hex digits to 16-bit value in
AX.
Atolh (2) Converts ASCII string of hex digits to 32-bit value
in DX:AX.
Atoi (2) Converts ASCII string to 16-bit signed integer in AX.
Itoa (2,m) Converts signed integer value in AX to string.
Utoa (2,m) Converts unsigned value in AX to string, similar to
Itoa/2.
Htoa (2,m) Converts binary value in AL to exactly two hexadecimal
characters.
Wtoa (2,m) Converts binary value in AX to exactly four
hexadecimal characters.
Ltoa (2,m) Converts 32-bit signed value in DX:AX to string, like
Itoa.
Ultoa (2,m) Converts 32-bit unsigned value in DX:AX to string.
Sprintf (m) Performs in-core formatting operation, like its C
counterpart.
Sscanf Like the C sscanf routine.
ToLower If character in AL is upper case, converts it to lower
case.
ToUpper Converts lower-case chars in AL to upper case.
Utility Routines
ISize On input, AX contains signed decimal integer. On
output,
AX contains number of print positions this integer
would require (including sign).
USize Returns output size of unsigned value in AX.
LSize Returns output size of 32-bit signed value in DX:AX.
ULSize Returns output size of 32-bit unsigned value in DX:AX.
IsAINum Returns Z-flag set if character is alphanumeric.
IsXDigit Returns Z-flag set if character is a hex digit.
IsDigit Returns Z-flag set if character is a decimal digit.
IsAlpha Returns Z-flag set if character is alphabetic.
IsLower Returns Z-flag set if character is a lowercase
alphabetic.
IsUpper Returns Z-flag set if character is an uppercase
alphabetic.
String Routines
Strcpy (I) Copies string from ES:DI to DX:SI (including 0 byte).
StrDup (I) Copies string from ES:DI to newly allocate storage on
heap. Returns the pointer to the new string in ES:DI.
StrLen Computes the length of a string.
Strcat (m, I, mI) Concatenates two strings.
StrChr Searches for a character within a string.
StrStr (I) Searches for a substring within a string.
Strcmp (I) Compares two strings.
Strupr (m) Converts all lowercase characters in string to upper
case.
Strlwr (m) Converts all uppercase characters in string to lower
case.
Strset (m) Overwrites all characters in string with character
passed in AL.
Strspan (I) Skips over characters in string which belong to a
specified set. Returns number of skipped characters
in CX.
Strcspan (I) Like the routines above, but skips characters not in
the set.
Strins (m, I,mI) Inserts one string into another.
Strdel (m) Deletes characters from a string.
StrRev (m) Reverses characters in string (submitted by Mike
Blaszczak).
Memory Management Routines
MemInit Initializes the memory manager.
Malloc Allocates blocks of memory.
Realloc Resizes an allocated block.
Free Deallocates a block allocated via malloc.
Dupptr Informs the memory manager that two different pointers
refer to the same block. Free will not deallocate
the block until you call it once for each duplicated
pointer.
IsInHeap Returns true if a pointer points anywhere within the
heap area.
IsPtr Returns true if ES:DI points at a currently allocated
object.
Character Set Routines
Set Macro which lets you define up to eight character sets.
CreateSets Allocates storage and intializes up to eight sets on
the heap.
EmptySet Sets a character set to the empty set.
RangeSet Builds character set from lower-bound and upper-bound
characters.
AddStr (I) Adds the characters in a string to a character set.
RmvStr (I) Removes characters specified in string from a character
set.
AddChar Adds a single character (in AL) to a character set.
RmvChar Removes a single character from a character set.
Member Sets Z-flag if character in AL is a member of specified
character set.
CopySet Copies the bits from one set to another.
SetUnion Computes the union of two character sets.
SetIntersect Computes the intersection of two a character sets.
SetDifference Computes the set difference of two character sets.
NextItem Returns the ASCII code of the first item found in the
set.
RmvItem Removes next available item from set and returns this
value.
Floating-Point Routines
lsfpa Loads floating-point accumulator (fpacc) with 32-bit
single precision value.
ssfpa Stores fpacc into a single precision variable.
ldfpa Loads fpacc from a 64-bit double precision variable.
sdfpa Stores fpacc from a double precision variable.
lefpa (I) Loads fpacc from an 80-bit extended precision variable.
sefpa Stores fpacc into an extended precision variable.
lsfpo Loads floating-point operand (fpop) from single
precision variable.
ldfpo Loads fpop from a double precision variable.
lefpo (I) Loads fpop from an extended precision variable.
itof Converts 16-bit integer to floating-point value in
fpacc.
utof Converts unsigned 16-bit value to floating point.
ultof Converts unsigned 32-bit value to floating point.
ltof Converts signed 32-bit value to floating point.
fpadd Adds fpop to fpacc.
fpsub Subtracts fpop from fpacc.
fpcmp Compares fpacc to fpop.
fpmul Multiplies fpacc by fpop.
fpdiv Divides fpacc by fpop.
ftoa Converts fpacc to a decimal string.
etoa Converts fpacc to ASCII string in exponent form.
Notes: Routines with the "2" suffix imply that the routine does not preserve the DI register. The routine returns DI pointing at the first character beyond the string. If the routine does not have the "2" suffix, then it preserves the ES:DI registers.
The "m" suffix implies that the routines automatically allocate storage for their string results on the heap by calling malloc. These routines return a pointer to the new string in ES:DI. Routines without the "m" suffix expect you to pass the address of a sufficiently large buffer in the ES:DI register pair.
Routines with the "I" suffix accept literal string constants in the code stream (that is, the string constant appears after the call to the specified routine).
If the routine you want isn't in the library, write it up, send it in, and we'll make sure it appears in the next release!
The library itself is currently divided into nine categories: input, output, file I/0, conversions, utilities, string operations, memory management, character-set operations, and floating-point operations. Most of these routines are high level; that is, they are nontrivial and typically require many lines of code and considerable knowledge to implement.
The printf routine is a good example of a high-level routine. It's something C programmers take for granted, but you rarely see it in typical assembly language programs. A typical call to the printf routine is shown in Example 1. As you can see, this code is close to that used by C and much more convenient than the usual methods assembly language programmers employ. Except for the DB and DD directives, this probably doesn't look a whole lot like assembly language, though. The printf statement invokes a macro found in the distribution stdlib.a file. Among other things, this macro issues a call to the sl_printf subroutine in the library. I use macros to invoke the library routines, not because I'm trying to be cute, but as a way to provide "smart linking" facilities in MASM 5.1. See Listing One (page 80) for a sample macro definition from the library.
printf db "The value of I is %3d, string s is %s\n",O dd I, S
Not all routines in the UCR StdLib completely mimic their C counterparts. For example, itoa accepts an integer value in AX and converts it to a zero-terminated string of characters, storing the string starting at ES:DI. Like C, ES:DI must point at an array large enough to hold the result, or bad things may happen. However, it's often inconvenient to declare an array and load its address into ES:DI prior to calling itoa. Therefore, a second routine, itoam, is available to simplify the calling sequence. You pass itoam an integer value in AX and it returns a pointer to the converted string in ES:DI. You do not have to preallocate storage for the call; itoam does this automatically by calling malloc. At other times, it would be nice if itoa returned a pointer to the end of the converted string rather than to the beginning of it. (This is useful, for example, when building long strings via successive calls to routines such as itoa.) A third routine, itoa2, provides this facility. Having different routines such as itoa, itoa2, and itoam reduces the size of a program because you don't need as many "glue" instructions between library calls to massage the parameters.
The string routines should look familiar to C programmers. Almost all C string-handling routines (and then some!) are present in the UCR StdLib. Like itoa, there are various forms of some of the string functions, mainly for convenience. For example, strcat takes four different forms: strcat, strcatl, strcatm, and strcatml. These variants provide different ways of passing parameters to and from the strcat routine. strcat concatenates two strings: It copies the string pointed at by DX:SI to the end of the string pointed at by ES:DI. The destination string (ES:DI) must have sufficient memory allocated to hold the concatenation. strcatm also concatenates the string pointed at by ES:DI and DX:SI, but it does not treat the string pointed at by ES:DI as the target location. Instead, it (essentially) does what's shown in Example 2. strcatl links a literal string constant (the "I" suffix stands for "literal") to the end of the string referenced by ES:DI. Likewise, strcatml copies the source string (ES:DI) onto the heap, appends the literal string constant following the call, and returns the address of this new string in ES:DI. For example, if filename points at a DOS filename and you want to append ".EXE" to the end of it, you could use the code in Example 3.
temp = malloc( strlen(es:di) + strlen(dx:si) +1); strcpy(temp, es:di); strcpy(temp + strlen(es:di), dx:si); es:di = temp;
les di, filename strcatml db ".EXE", 0 mov word ptr FullFileName, di mov word ptr FullFileName+2, es
Listing Two (page 80) provides a flavor of UCR Stdlib routine use. This is a short program which computes various statistics about the information in a text file and prints the results. The distribution package also includes test.asm, a file that demonstrates use of each routine in the library. Hopefully, Listing One will show you that it's worthwhile to obtain a copy of the library, and the test file will exemplify the use of each routine in the library.
To keep the library growing, I assign my assembly language students at UC Riverside a final project of adding one new, useful routine to the library. My Commercial Software Development class gets charged with testing, documenting, and otherwise cleaning up the work of the assembly language students. As such, new routines appear in the library on a periodic basis. The text box entitled "Obtaining and Licensing the UCR Standard Library" provides details on obtaining the latest version of the library.
The UCR Standard Library has proven to be of considerable aid in getting students up to speed with 80x86 assembly language. Beginners are finding assembly language easier to deal with than ever before, particularly when combined with MASM 6.0's high-level constructs. Advanced assembly language programmers use the library to help them quickly develop new programs in record time. Given that the library is free for all to use, it is definitely worth adding to your personal development suite if you use or are interested in learning how to use 80x86 assembly language.
There are two "official" distribution points for the UCR StdLib distribution package: BIX and Internet. If you have access to the BIX telecommunications service, you may obtain a copy of "UCRSTDLB.ZIP" from the IBM.PC/LISTINGS area. If you have ftp access to the Internet, you may obtain a copy of UCRSTDLB.ZIP via anonymous ftp to ucrmath.ucr.edu. After logging on, switch to the "PC" subdirectory (this is UNIX, so case counts!) and the rest should be fairly obvious.
In addition to the two official distribution points, the UCR StdLib routines are also available from many other telecommunications services, BBSs, and shareware distributors, including M&T Online and the DDJ Forum on CompuServe. However, there is a version control problem and I cannot guarantee that any service other than the ones mentioned above will have the latest version of the software. Given the way shareware and freeware software propagate around the world, I do not see a reasonable solution to this problem.
All replies and comments should be sent to the following address:
Randall Hyde Department of Computer Science UC Riverside, Riverside, CA 92521 Internet e-mail: rhyde@ucrmath.ucr.edu BIXmail: rhyde
_THE UCR STANDARD ASSEMBLY LANGUAGE LIBRARY_ by Randall Hyde[LISTING ONE]
Printf macro
; The following extrn declaration only occurs if this is the first time
; printf appears in the source file.
;
ifndef sl_printf
stdlib segment para public 'slcode'
extrn sl_printf:far
stdlib ends
endif
;
; Perform the call to the actual sl_printf routine:
call far ptr sl_printf
endm
;
[LISTING TWO]
;**************************************************************************
; TS: A "Text Statistics" package which demonstrates the use of the UCR
; Standard Library Package.
; Note: The purpose of this program is not to demonstrate blazing fast
; assembly code (it's not particularly fast) but rather to demonstrate how
; easy it is to write assembly code using the standard library and MASM 6.0.
; Randall Hyde
;**************************************************************************
; The following include must be outside any segment and before the
; ZZZZZZSEG segment. It includes all the macro definitions for the
; UCR Standard Library.
include stdlib.a ;Links into the UCR Standard
includelib stdlib.lib ; Library package.
;
dseg segment para public 'data'
;
WordCount dw 0 ;Holds file word count value
LineCnt dw 0 ;Holds # of lines in file
ControlCnt dw 0 ;Counts # of control characters
Punct dw 0 ;Counts # of punctuation characters
AlphaCnt dw 0 ;Counts # of alphabetic characters
NumericCnt dw 0 ;Counts numeric digits in file
Other dw 0 ;Counts other chars in file
MemorySize dw 0 ;# of paragraphs of free memory
Chars dw 0 ;Total number of chars in file
TotalChars dq 0.0 ;FP version of the above
FileHandle dw 0 ;STDLIB file handle
Const100 dq 100.0
;
; Create some sets to use in this program:
set CharSet,Alphabetic,Punctuation,Control
;
; Character Counter array. CharCnt [ch] contains the number of "ch"
; characters appearing in the file.
CharCnt dw 256 dup (0)
;
; Boolean flag to denote in/not in a word:
InWord db 0
;
dseg ends
;
cseg segment para public 'code'
assume cs:cseg, ds:dseg
;
; LESI is a macro which loads a 32-bit immediate value into ES:DI.
lesi macro adrs
mov di, seg adrs
mov es, di
lea di, adrs
endm
;
; ldxi loads a 32-bit immediate value into dx:si:
ldxi macro adrs
mov dx, seg adrs
lea si, adrs
endm
;
; Some useful constants-
cr equ 13
lf equ 10
EOFError equ 8
;
public PSP ;DOS Program Segment Prefix
PSP dw ? ;Needed by StdLib MemInit routine
;
; Okay, here's the main program which does the job:
TS proc
mov cs:PSP, es ;Save pgm seg prefix
mov ax, seg dseg ;Set up the segment registers
mov ds, ax
mov es, ax
;
; Initialize the memory manager, giving all free memory to the heap.
mov dx, 0
meminit
mov MemorySize, cx ;Save # of available paragraphs.
;
; Set up the character sets:
; First, build the Alphabetic set:
mov al, "A"
mov ah, "Z"
lesi Alphabetic
RangeSet
AddStrL
db "abcdefghijklmnopqrstuvwxyz",0
;
; Create the set with the punctuation characters:
lesi Punctuation
AddStrL
db "!@#$%^&*()_-+={[}]|\':;<,>.?/~`", '"', 0
;
; Create the control character set:
lesi Control
mov al,0
mov ah, 1fh
RangeSet
mov al, 7fh
AddChar
;
; Print the amount of available memory and prompt the user to enter a file name.
printf
db "Text Statistics Program",cr,lf
db "There are %d paragraphs of memory available",cr,lf
db lf
db "Enter a filename:",0
dd MemorySize
;
getsm ;Get the filename.
;
; Open the file.
mov al, 0 ;Open for reading.
fopen ;Open the file.
jnc GoodOpen
;
; If the carry flag comes back set, we've got an error, print an appropriate
; message and quit:
print
db "DOS error #",0
puti ;Error code is in AX.
putcr
jmp Return2DOS
;
; If the carry flag comes back clear, we've successfully opened the file. AX
; contains the STDLIB filehandle, ES:DI still points at the filename
; allocated on the heap:
GoodOpen: mov FileHandle, AX ;Save STDLIB file handle.
print
db "Computing text statistics for ",0
puts ;Print filename
free ;Dispose of space on heap
putcr
putcr
;
; The following loops check for transitions between words and delimiters. Each
; time we go from "not a word" -> "word" this code bumps up word count by one.
mov ax, FileHandle
fReadOn
;
TSLoop: getc
jnc NoError
jmp ReadError
;
; See if the character is alphabetic
NoError: lesi Alphabetic ;Set contains A-Z, a-z
Member
jz NotAlphabetic
inc AlphaCnt
jmp StatDone
;
; See if the character is a digit:
NotAlphabetic: cmp al, "0"
jb NotNumeric
cmp al, "9"
ja NotNumeric
inc NumericCnt
jmp StatDone
;
; See if the character is a punctuation character
NotNumeric: lesi Punctuation
Member
jz NotPunctuation
inc Punct
jmp StatDone
;
; See if this is a control character:
NotPunctuation: lesi Control
Member
jz NotControl
inc ControlCnt
jmp StatDone
;
NotControl: inc Other
StatDone: mov bl, al ;Use char as index into CharCnt
mov bh, 0
shl bx, 1 ;Convert word index to byte index
inc CharCnt [bx]
;
; Count lines and characters here:
cmp al, lf
jne NotNewLine
inc LineCnt
;
NotNewLine: inc Chars
; Count words down here
cmp InWord, 0 ;See if we're in a word.
je NotInAWord
cmp al, " "
ja WCDone
mov InWord, 0 ;Just left a word
jmp WCDone
;
NotInAWord: cmp al, " "
jbe WCDone
mov InWord, 1 ;Just entered a word
inc WordCount
;
WCDone:
;
; Okay, or the current character into the character set so we can keep
; track of the characters which appear in this file.
lesi CharSet
AddChar
jmp TSLoop
;
; Come down here on EOF or other read error.
ReadError: cmp ax, EOFError
je Quit
print
db "DOS Error ",0
puti
putcr
jmp Return2DOS
;
; Return to DOS.
Quit: freadoff
mov ax, FileHandle
fclose
printf
db cr,lf,lf
db "Number of words in this file is %d",cr,lf
db "Number of lines in this file is %d",cr,lf
db "Number of control characters is %d",cr,lf
db "Number of punctuation characters is %d",cr,lf
db "Number of alphabetic characters is %d",cr,lf
db "Number of numeric characters is %d",cr,lf
db "Number of other characters is %d",cr,lf
db "Total number of characters is %d",cr,lf
db lf, 0
dd WordCount,LineCnt,ControlCnt,Punct
dd AlphaCnt,NumericCnt,Other,Chars
;
; Print the characters that actually appeared in the file.
lesi CharSet
EC64: mov cx, 64 ;Chars/line on output.
EachChar: RmvItem
cmp al, 0
je CSDone
cmp al, " "
jbe EachChar
putc
loop EachChar
putcr
jmp EC64
;
CSDone: print
db cr,lf,lf
db "Press any key to continue:",0
getc
putcr
putcr
;
; Now print the statistics for each character:
mov ax, Chars ;Get character count,
utof ; convert it to a floating
lesi TotalChars ; point value, and save this
sdfpa ; value in "TotalChars".
;
; Print out each character, the number of occurrences, and the ratio of
; this character's count to the total number of characters.
mov bx, " "*2 ;Start output with spaces.
ComputeRatios: cmp CharCnt[bx], 0
je SkipThisChar
mov ax, bx
shr ax, 1 ;Convert index to character
putc ; and print it.
print
db " = ",0
mov ax, CharCnt [bx]
mov cx, 4
putisize
print
db " Percentage of total is ",0
;
utof
;
; Divide by the total number of characters in the file:
lesi TotalChars
ldfpo
fpdiv
;
; Multiply by 100 to get a percentage
lesi Const100
ldfpo
fpmul
;
; Print the ratio:
mov al, 7
mov ah, 3
ftoam
puts
free
print
db "%",cr,lf,0
;
SkipThisChar: inc bx
inc bx
cmp bx, 200h
jb ComputeRatios
putcr
;
Return2DOS: mov ah, 4ch
int 21h
;
TS endp
;
cseg ends
;
; Allocate a reasonable amount of space for the stack (2k).
sseg segment para stack 'stack'
stk db 256 dup ("stack ")
sseg ends
;
; zzzzzzseg must be the last segment that gets loaded into memory!
; The UCR Standard Library package uses this segment to determine where
; the end of the program lies.
zzzzzzseg segment para public 'zzzzzz'
LastBytes db 16 dup (?)
zzzzzzseg ends
end TS
Copyright © 1992, Dr. Dobb's Journal