Dale is a programmer at AGC Corp., where he specializes in optimization and portability of Basic programs. He can be reached at 1001 AGC Drive, Cleveland, TN 37312.
Basic has a reputation for making it possible to get small programs up and running quickly. But on the other hand, the language has gained a reputation for limited functionality, particularly when it comes to pointers and large-scale memory management.
Most of what needs to be done regarding pointer types can be accomplished with ordinary 16- and 32-bit integers; when formal identification is required, variable names such as ptr.databuffer or ptr.indexbuffer will suffice.
As for memory management, the sorting routine described in this article provides additional functionality by using large, single-string buffers and integer variables as pointers to the buffers. The design objective of this sort routine was to read data from files or file keys, send it one record at a time to the sort, and begin retrieval when the time between the first record sent and the first record retrieved is reduced to the absolute minimum. The current version of the routine will accommodate more than 32,000 records; you can modify the routine to handle several million records by creating index buffers with a segment length of three instead of two.
This version is compatible with Microsoft Basic 4.0 (and up) where the binary file mode provides the advantage of not closing and reopening files when writing to them with different lengths. Porting to Basic implementations compatible with Basic 4.0 and up shouldn't be a problem; moving to older dialects will likely involve changes. (For instance, you may have to change deflng x to defsng x if long integers are not available, then change the affected clng, mod, and \ statements accordingly; use gotos without changing the visible structure of the code when block if/then/else isn't available; and make the main buffers [sbuf$, sndx$] the field variables in corresponding random file opens if binary file mode is not available.) Porting to other languages (especially C) may involve more work, but I've written the code in a format designed to minimize porting problems. For example, I use nonstring operations (such as memcpy) and buffers and other entities that allow maximum flexibility in moving memory blocks.
Most Basic programs that sort string data use string arrays to hold the data to be sorted. While individual array elements can be swapped quickly as the result of a comparison, other problems (multilevel sorting, array element assignment, writing arrays to disk, and so on) tend to slow the process to the point where assembler routines are invariably required for performance. The sort routine in Listing Two (page 95) takes a somewhat radical approach to the problem by combining several techniques into one.
The first step in building this routine was to create a large string buffer to hold all the sort data, where every n character is a data element including blank-space padding, then insert each data element in sequence after first shifting any greater values upward in the buffer with a mid$ command: mid$ (buffer$, x + n) = mid$(buffer$, x). This technique has two problems that led to the addition of an indexing buffer in the current routine. First, the mid$ command requires swap space in string memory equal to the size of the shift on the right side of the equal sign, which could double the memory requirement or slow the process down by having to shift in segments. Secondly, the time required to shift the buffer for each insertion was prohibitive.
My solution was to write each data element into the main buffer (sbuf$) in sequential order while performing the aforementioned insert on the associated index buffer (sndx$). The Basic stringstack technique used here can best be described by analogy: Imagine a library with three trays of index cards (ordered by author, title, and subject) where each card contains the data and an exact physical location for the book. Now remove all data from the cards except the physical location pointer. This makes for a very compact index (in terms of the size of index data stored). The major disadvantage is that to find a specific title, a library patron will have to perform a binary search on an index tray and walk over to the shelves for each index card examined, just to make a comparison. Then the patron will have to determine whether to move up or down the index stack before making another trip to the bookshelves. Although impractical for library patrons, this approach works very well on computers because computers get the data record directly from the index pointer.
The third major technique used here is the group merge, where memory is not sufficient to sort all data in one pass and each group is dumped to disk files until the sort process is completed. The sort routine then loads one record from each group and outputs the lowest (highest if descending order) of the batch.
Probably the biggest time-killer in most Basic programs is the creation of string variables, both real and virtual. Creation of real strings cannot be avoided, but when a string is recreated to the same length over and over instead of just being cleared, it wastes time and forces more garbage collection. The virtual strings, as I call them, do not waste as much time as the assigned strings because they disappear after the current execution statement and do not work their way down into the string heap. Mid$ commands, str$, chr$, and so forth, all create virtual strings; when the number of incidents is very large, we have, in certain cases, an opportunity to save some time.
The first technique is to create a substitute for the Basic function chr$. The char$ array in Listing One (page 94) requires about 800 bytes of memory, and once created (see Listing Two), a call to char$ should be about four times faster than a call to chr$.
The second technique is not available to current versions of Basic, but because it is available in a relatively inexpensive replacement library (see "Declarations for the Sample Program"), I think it worthwhile to describe here. The midchar function gets the ASCII value of any character in a string without having Basic create the single mid$ character first, and does it about five times faster if linking with Crescent Software's PDQ replacement library.
The third technique used in this sort is based on the second, and converts 16-bit integer values from string to numeric - cvi(mid$(xxx$, midpos, 2))--without having Basic create the mid$ characters; get the midchar of the first byte and add the midchar of the second byte after multiplying it by 256. This third technique is approximately three times faster than the cvi(mid$(...function. For maximum portability, I keep a set of these replaceable routines handy to append to Basic programs when compiling in situations where PDQ and other proprietary add-ons are not available or applicable.
The sample program in Listing One (which call the sort in Listing Two) has four main functions: send data to sort, create external index (optional), resort from workfile (optional), and retrieve data from sort. From the top of the listing, you will note the midchar declaration. The mid$ call it replaces could be ignored in the code without requiring a separate function, but when the function (see the end of the listing) is eliminated and the program is compiled with Crescent Software's PDQ library, the speed increase is substantial, so I keep in this format for maximum portability.
The next bit of code is the DIM line; the subscripts equal to 10 may be raised or lowered if desired, and the subscript of 100 may have to be increased if the number of sort groups could be greater than 100. As an example, if the length of each sort string (sdat$) were 100 bytes, and the maximum sort buffer (sbuf$) size were 30,000, only 300 records could be held in memory in a single sort group. If the total number of records to be sorted were more than 30,000, more than 100 groups would be required, and the aforementioned subscript would have to be increased, although this would be an extreme case. The subscript of 255 for char$ cannot be lessened because it is a replacement for the Basic chr$ function, and requires all 256 characters of the ASCII collating set.
Next, I used common shared to pass a group of variables to the main sorting routine (Listing Two), but you will want to employ a named common block in real applications so that the variables are not passed when chaining. Most of the variable declarations below the common lines are required. The same list, along with a few others, can be found at the top of Listing Two. The three lines marked with an asterisk at the right side are the only lines that required the programmer's attention, the sort-string length, and the integer masking.
Two of the data sections (sortdata1 and sortdata3) are examples only and are usually provided to the executing program by configuration files or by a data dictionary. The data in sortdata2 is actual sort data and will generally come from database files and the like. The remaining information for configuring a sort can be found at the top of Listing Two.
--D.T.
The sortsq variable in Listing Two indicates the order of output (ascending/ descending), and resets the output pointer on each retrieval. nvflag is used to minimize the character inversions for ascending/descending sequences, and when nvflag matches the iseq() flag for a sort data segment, the characters in that segment are inverted (subtracted from 255) to facilitate the straightforward string comparisons that are the heart of the routine (see the fillproc subroutine). For reasons that pertain to Basic's internal structure, this inversion method, along with the swapping of bytes in 16-bit integer string, provides better performance than a multistage comparison where each segment is compared in turn, and the comparison terminates early if a difference is found. This advantages applies even when the char$ and midchar techniques are not included.
One place in the code where you will notice some cryptic and mostly undocumented variables is about 85 lines after the beginning line sub nsort...); ixx1, ixx2, and so on. These variables were built from a complex set of sort-group initialization parameters, and they represent the only efficient means I could find to put the intended idea into code. If this had to be rewritten, I'd recommend finding another algorithm and avoiding any modifications here.
The calls to memfree (25, 70, and 72 lines from the beginning) specify a byte exclusion value as the first parameter. For reasons I could not deduce, older versions of Basic seemed to want at least 4000 bytes overhead in the string heap beyond any amount calculated for temporary (virtual) strings and so on in order to prevent a string space corrupt error. Other Basic versions may not have this limitation, but on the other hand, they have the memory to spare, so I prefer to stay with the full exclusion for maximum portability. --D.T.
_FAST SORTING USING LARGE STRING BUFFERS_ by Dale Thorn[LISTING ONE]
'===========================================================================
'NSORT.BAS Sort/retrieve/index data; ascending/descending; mixed data types
' By: Dale Thorn
' Rev. 03/26/91
'===========================================================================
main:
defint a-w
deflng x
defsng y
defdbl z
declare function midchar(i$, i) 'use Basic function (listed) if PDQ not avail.
dim ibeg(10), ilen(10), iptx(100, 1), iseq(10), char$(255)
common shared compln, ddunit, grpptr, grptot, maxrcd, memndx, ndunit
common shared ndxgrp, ndxlen, nosegs, nvflag, offset, opcode, opinit
common shared outptr, outtot, rcdptr, rcdtot, recptr, sdunit, sortln
common shared sortsq, subtot, ibeg(), ilen(), iptx(), iseq(), char$()
compln = 0 'comparison length in sort data (sdat$); may be less than sortln
ddunit = 0 'file channel/unit number for index-building (opcode = -3)
grpptr = 0 'sort group record pointer/sort buffer pointer
grptot = 0 'internal sort group size
maxrcd = 0 'internal maximum sort group size
memndx = 0 'internal index-load flag
ndunit = 0 'file channel/unit number for sort index files
ndxgrp = 0 'internal index file group record counter
ndxlen = 0 'internal index file record size
nosegs = 0 'no. of sort segments in sdat$; total length of segments = compln
nvflag = 0 'internal optimization for least ascending/descending inversions
offset = 0 'internal group-to-record offset counter
opcode = 0 'sort operation (0 to -3)
opinit = 0 'internal sort operation data initialization flag
outptr = 0 'internal data output record pointer
outtot = 0 'internal data output record counter
rcdptr = 0 'internal sort data record counter (all records)
rcdtot = 0 'internal sort data record total (final count)
recptr = 0 'internal sort data record counter (group records)
sdunit = 0 'file channel/unit number for sort data file (.sdx)
sortln = 0 'length of sort data buffer (sdat$); may be greater than compln
sortsq = 0 'internal sort sequence (ascending/descending) flag
subtot = 0 'internal partial group data record total (final count)
drcd$ = "" 'temp. sort data record buffer
nrcd$ = "" 'sort index file buffer
sbuf$ = "" 'main sort group memory buffer
sdat$ = "" 'main sort data record buffer
smsk$ = "" 'sort data mask (must be uppercased) [BBXXXBBXXXXXBB.....]
sndx$ = "" 'sort index-pointer memory buffer
'// NOTE: Any lines below with an asterisk (*) on the extreme /////
' right will require a modification or replacement. /////
'/////// Modification applies to DATA statements as well. /////
sortln = 40 'total sort buffer length*
pfmt$ = space$(5) 'output format buffer for integer strings
sdat$ = space$(sortln) 'sort data record buffer
restore sortdata1 'first tablespec to sort from
read sdunit, ndunit, ddunit 'file channel/unit numbers used by NSORT.SUB
read ibeg(0), ilen(0), iseq(0) 'test values from table sortdata1
nosegs = 0 'initialize total no. of sort segments
while ibeg(0) 'begin loop to load segment pointers and flags
nosegs = nosegs + 1 'increment total sort segments
ibeg(nosegs) = ibeg(0) 'segment begin pointer for sdat$ buffer
ilen(nosegs) = ilen(0) 'segment length
iseq(nosegs) = iseq(0) 'segment sort sequence (ascending/descending)
compln = compln + ilen(0) 'total sort compare length
read ibeg(0), ilen(0), iseq(0) 'read next set of test values
wend
smsk$ = string$(compln, "X") 'allocate masking buffer (default type=character)
mid$(smsk$, 21) = "BB" '"binary" position specified*
mid$(smsk$, 33) = "BB" '"binary" position specified*
restore sortdata2 'sample sort data table
opcode = 0 'set flag to add records to sort (initial operation)
nrcds = 0 'number of records added to the sort
do 'begin loop to read data and add to sort
segptr = 1 'set segment position pointer for sdat$
lset sdat$ = "" 'clear the sort data buffer prior to loading
for segno = 1 to nosegs 'begin loop to load each data segment
read segdata$ 'read data segment from table sortdata2
if len(segdata$) = 0 then exit do 'exit read-data loop at end-of-data
if midchar(smsk$, segptr) = 66 then '16-bit integer <BB> segment
mid$(sdat$, segptr) = mki$(val(segdata$)) 'convert data to integer
else 'character <XX....> segment
mid$(sdat$, segptr) = segdata$ 'put character segment to sort buffer
end if
segptr = segptr + ilen(segno) 'increment segment position pointer
next
call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'add record to sort
nrcds = nrcds + 1 'total records added to the sort
loop
opcode = -3 'set flag to build an external index to the sortdata file
call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'build the index file
open "sortdata.ddx" for binary as #ddunit 'open the external index file
ddxrcd$ = space$(2) 'allocate the index buffer
for rcdno = 1 to nrcds 'begin loop to retrieve and display indexed data
call fileio(ddunit, 2, clng(rcdno), ddxrcd$, 0) 'retrieve an index record
call fileio(sdunit, sortln, clng(cvi(ddxrcd$)), sdat$, 0) 'retrieve data
for segno = 1 to nosegs 'begin loop to display sort segments
if midchar(smsk$, ibeg(segno)) = 66 then '16-bit integer <BB> segment
rset pfmt$ = right$(str$(cvi(mid$(sdat$, ibeg(segno), 2))), 5)
print pfmt$; " "; 'print integer data
else 'character segment
print mid$(sdat$, ibeg(segno), ilen(segno)); " "; 'print char. data
end if
next
print 'terminate print line
next
call killfile("sortdata.ddx", ddunit) 'index file closed and removed
restore sortdata3 'next tablespec to sort from
read ibeg(0), ilen(0), iseq(0) 'test values from table sortdata3
compln = 0 'comparison length in sort data (sdat$)
nosegs = 0 'initialize total no. of sort segments
while ibeg(0) 'begin loop to load segment pointers and flags
nosegs = nosegs + 1 'increment total sort segments
ibeg(nosegs) = ibeg(0) 'segment begin pointer for sdat$ buffer
ilen(nosegs) = ilen(0) 'segment length
iseq(nosegs) = iseq(0) 'segment sort sequence (ascending/descending)
compln = compln + ilen(0) 'total sort compare length
read ibeg(0), ilen(0), iseq(0) 'read next set of test values
wend
opcode = -1 'set flag to resort data from existing sort file
call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'resort the data
opcode = -2 'set flag to retrieve records from sort (final operation)
call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'retrieve 1st data record
while len(sdat$) 'begin loop to display sort data
for segno = 1 to nosegs 'begin loop to display sort segments
if midchar(smsk$, ibeg(segno)) = 66 then '16-bit integer <BB> segment
rset pfmt$ = right$(str$(cvi(mid$(sdat$, ibeg(segno), 2))), 5)
print pfmt$; " "; 'print integer data
else 'character segment
print mid$(sdat$, ibeg(segno), ilen(segno)); " "; 'print char. data
end if
next
print 'terminate print line
call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'retrieve next record
wend
close 'close all files
system 'return to DOS
'---------------------------------------------------------------------------
sortdata1: 'initial sort specifications
'---------------------------------------------------------------------------
'_____datafile____indexfile____buildfile :'File channel/unit numbers;
data 1, 2, 3 :'may be found using FREEFILE
'_____segbegin____seglength____segsequence :'Segment begin pointers, lengths
data 1, 20, 1 :'and sort sequences for sort
data 21, 2, -1 :'data buffer (sdat$).
data 23, 10, 1 :' sequence = 1; ascending
data 33, 2, -1 :' sequence = -1; descending
data 35, 6, 1 :'
data 0, 0, 0 :'end-of-data markers
'---------------------------------------------------------------------------
sortdata2: 'example sort data
'---------------------------------------------------------------------------
'_______Alpha data, len=20______Num.(2)______Alpha (10)____Num.(2)____Alpha (6)
data "Petrol Chemicals Ltd", "3576", "London SW3", "588", "A23456"
data "Associated Factories", "112", "Richmond", "1313", "XNA"
data "Dale's Containers", "12343", "Devonshire", "55", "DALE"
data "", "", "", "", ""
'---------------------------------------------------------------------------
sortdata3: 'specifications for alternate sorting order
'---------------------------------------------------------------------------
'_____segbegin____seglength____segsequence :'Segment begin pointers, lengths
data 33, 2, 1 :'and sort sequences for sort
data 1, 10, 1 :'data buffer (sdat$).
data 0, 0, 0 :'end-of-data markers
function midchar (i$, i) static 'find ASCII value of a single character in i$
midchar = asc(mid$(i$, i, 1)) 'set midchar value
end function 'return to calling program
rem $include: 'nsort.sub'
[LISTING TWO]
'===========================================================================
'NSORT.SUB Sort/retrieve/index data; ascending/descending; mixed data types
' By: Dale Thorn
' Rev. 03/24/91
'---------------------------------------------------------------------------
' compln - comparison length in sort data (sdat$); may be less than sortln
' ddunit - file channel/unit number for index-building (opcode = -3)
' grpptr - sort group record pointer/sort buffer pointer
' grptot - internal sort group size
' maxrcd - internal maximum sort group size
' memndx - internal index-load flag
' ndunit - file channel/unit number for sort index files
' ndxgrp - internal index file group record counter
' ndxlen - internal index file record size
' nosegs - no. of sort segments in sdat$; total length of segments = compln
' nvflag - internal optimization for least ascending/descending data inversions
' offset - internal group-to-record offset counter
' opcode - sort operation (0 to -3)
' opinit - internal sort operation data initialization flag
' outptr - internal data output record pointer
' outtot - internal data output record counter
' rcdptr - internal sort data record counter (all records)
' rcdtot - internal sort data record total (final count)
' recptr - internal sort data record counter (group records)
' sdunit - file channel/unit number for sort data file (.sdx)
' sortln - length of sort data buffer (sdat$); may be greater than compln
' sortsq - internal sort sequence (ascending/descending) flag
' subtot - internal partial group data record total (final count)
'
' ibeg() - segment begin pointers for sort data buffer (sdat$)
' ilen() - segment length pointers for sort data buffer (sdat$)
' iptx() - pointers used if merge-sort req'd. (set internally)
' iseq() - segment sequence pointers for sort data buffer (sdat$)
' 1 = ascending; -1 = descending
' char$() - high-performance substitute for Basic chr$() function
'
' drcd$ - temp. sort data record buffer (set to "" on first call)
' nrcd$ - sort index file buffer (set to "" on first call)
' sbuf$ - main sort group memory buffer (set to "" on first call)
' sdat$ - main sort data record buffer (set to actual value on first call)
' smsk$ - sort data mask (must be uppercased)
' BB = integer string; XXX.... all other bytes
' sndx$ - sort index-pointer memory buffer (set to "" on first call)
'
'
' set opcode = 0 on first call to add records to sort.
' set opcode = -1 to resort data from existing sort work file (sortdata.sdx).
' set opcode = -2 on first call to retrieve records from sort.
' set opcode = -3 to build index file (sortdata.ddx).
'
' *** Notes: opcode = 0 is always the first process (add records).
' opcode = -1 may be set to resort data, but only following
' the creation of an index with opcode set to -3.
' opcode = -2 may be set to retrieve records once all records
' have been added with opcode set to 0, or after
' a resort with opcode set to -1. Once opcode is
' set to -2 and all records are retrieved, the
' sort routine is terminated and all sort memory
' is returned to the calling program. If further
' sorting is required, begin anew with opcode = 0.
' opcode = -3 may be set to build an index file following an
' initial sort with opcode set to 0, or a resort
' with opcode set to -1. If more than 2 sorting
' sequences are required, where 2 or more index
' files are needed, rename each .ddx file to save it.
' The final sort sequence may be obtained using
' opcode = -2, and thus eliminate the need for a
' corresponding index file. Each 2 bytes in the index
' file are a pointer to a record in the .sdx file.
'
' For the first sort (opcode = 0), place all sort segments of sdat$
' into the left part of sdat$ in sequential order (1, 2, 3, etc.).
' When re-sorting using opcode = -1, segments may be in any order.
' All data stored in sortdata.sdx will be in the original sequence.
'
' ***** Important: Minimum sort length is 2 bytes.
' ***** If free memory is minimal, more sort groups may
' ***** be needed, and dim iptx(nnn) may be too small.
' ***** Each opcode process must be completed for all
' ***** records before switching to another process.
' ***** Use named common block if chaining programs.
'---------------------------------------------------------------------------
sub nsort (drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) static
if opcode > -2 then 'insert a record <add to the sort>
if opinit mod 2 = 0 then 'first-sort-record initialization
opinit = opinit - 1 'adjust initialization flag
sortsq = iseq(1) 'primary output sequence
nvflag = 0 'data inversion flag
for segno = 1 to nosegs 'build data inversion spec
nvflag = nvflag + ilen(segno) * iseq(segno) 'bytes above/below 0
next
if nvflag < 0 then 'data inversion optimization
nvflag = 1 'set inversion flag plus
else
nvflag = -1 'set inversion flag minus
end if '[see fillproc & writeproc subroutines]
if nvflag = sortsq then sortsq = -sortsq 'primary output sequence
call killfile("sortdata.ndx", ndunit) 'kill work index file
open "sortdata.ndx" for binary as #ndunit 'open work index file
if opcode = 0 then 'initial (add records) operation
call killfile("sortdata.sdx", sdunit) 'kill work data file
open "sortdata.sdx" for binary as #sdunit 'open work data file
drcd$ = space$(sortln) 'temporary sort data buffer
for ichr = 0 to 255 'create substitute character set
char$(ichr) = chr$(ichr) 'substitute for Basic chr$() function
next
end if
call memfree(clng(4096), clng(195840), xfree) 'reserve 4 kb memory
maxrcd = xfree \ (sortln + 4) 'maximum records per memory group
if maxrcd > 32640 \ sortln then maxrcd = 32640 \ sortln 'buffer size
sbuf$ = space$(maxrcd * sortln) 'main sort data buffer
sndx$ = space$(maxrcd * 2 + 2) 'reorderable/shiftable index buffer
rcdptr = 1 'used to count total records
recptr = 1 'used to count records within a sort group
grpptr = 1 'sort buffer pointer
end if
if opcode = -1 then 'resort from existing workfile (.sdx)
ndxgrp = 0 'total number of sort groups
offset = 0 'internal group-to-record offset counter
while rcdptr <= rcdtot 'loop until all records are read
call fileio(sdunit, sortln, clng(rcdptr), sdat$, 0) 'get sort data
gosub putproc 'add records in new sort sequence
wend
else 'original (insert) sequence
gosub putproc 'add records to sort
end if
else 'retrieve a record or build an index
offset = 0 'group-to-record offset counter
if opinit mod 2 then 'first retrieval record initialization
opinit = opinit - 1 'adjust initialization flag
if opinit = -2 then 'first operation after original sort
rcdtot = rcdptr - 1 'total records from original sort
subtot = recptr - 1 'partial-group subtotal from original sort
end if
outptr = 1 'beginning pointer for data output
outtot = rcdtot 'total records to output
if ndxgrp then 'sorting was done in groups
gosub writeproc 'save data left over from previous operation
else 'all sorting was done in memory
maxrcd = rcdtot 'reset maximum records for file write
ndxlen = maxrcd * 2 'length of index data to write
gosub writeproc 'save sort data
ndxgrp = 0 'reset index group count to zero
end if
sbuf$ = "" 'erase buffer to reclaim memory
sndx$ = "" 'erase buffer to reclaim memory
if ndxgrp then 'merge-sort required
grplen = ndxlen 'group size * 2
sbuf$ = space$(ndxgrp * sortln) 'buffer holds 1 record per group
sndx$ = space$(ndxgrp * 2 + 2) 'buffer holds 1 record per group
end if
if opcode = -3 then 'build index from sorted data
call memfree(clng(6144), clng(32640), xfree) 'reserve 2kb for .ddx
else 'normal retrieval [return each record to calling program]
call memfree(clng(4096), clng(32640), xfree) 'reserve normal 4 kb
end if
xsize = clng(outtot) * 2 'total records * 2
memndx = (xsize <= 32640 and xsize <= xfree) 'index-in-memory flag
if memndx then 'retrieval index fits entirely in memory
ndxlen = xsize 'buffer length is index file length
else 'retrieval index does not fit in memory
ndxlen = 2 'buffer length is 16-bit integer length
end if
nrcd$ = space$(ndxlen) 'allocate index file buffer
if memndx then call fileio(ndunit, ndxlen, clng(1), nrcd$, 0)'fill it
if ndxgrp then 'merge-sort initialization
ixx1 = (sortsq > 0) 'used locally to shorten line
ixx2 = (sortsq < 0) 'used locally to shorten line
ixx3 = (memndx and ixx1) 'used locally to shorten line
ixx4 = (memndx and ixx2) 'used locally to shorten line
iyy1 = 1 - memndx 'used locally to shorten line
iyy2 = grplen \ (1 - not memndx) 'used locally to shorten line
for recptr = 1 to ndxgrp 'loop thru each index group
grpptr = recptr 'sort group record pointer
iyy3 = (grptot - subtot) * (ixx2 and (recptr = ndxgrp))
iyy4 = (grptot - subtot) * (ixx1 and (recptr = ndxgrp))
ircd = (recptr + ixx1) * iyy2 + iyy3 * iyy1 + ixx4 - ixx1
ircx = (recptr + ixx2) * iyy2 + iyy4 * iyy1 + ixx3 - ixx2
if memndx then 'get index pointer from memory buffer
ichr = midchar(nrcd$, ircd + 1) * 256 'high byte of index
rcdptr = midchar(nrcd$, ircd) + ichr 'same as cvi(mid$(...
else 'get index pointer from file
call fileio(ndunit, ndxlen, clng(ircd), nrcd$, 0)
rcdptr = cvi(nrcd$) 'set pointer to retrieve data
end if
call fileio(sdunit, sortln, clng(rcdptr), sdat$, 0) 'get data
gosub fillproc 'add 1 record from each sort group to buffer
iptx(recptr, 0) = ircd 'begin ptr.to load ndx.rcd. from group
iptx(recptr, 1) = ircx 'end ptr.to load ndx.rcd. from group
next
recptr = ndxgrp 'reset groups-pointer to begin output
if sortsq < 0 then outptr = recptr 'begin output in reverse order
else 'non-merge; all output from memory
if sortsq < 0 then outptr = outtot 'begin output in reverse order
end if
end if
if opcode = -3 then 'build index from sorted data
call killfile("sortdata.ddx", ddunit) 'kill user index file
open "sortdata.ddx" for binary as #ddunit 'open user index file
ddxrcd$ = space$(2048) 'collection buffer for index-build
filptr = 0 'record pointer for writing .ddx buffer to file
ddxptr = 1 'buffer pointer for adding index values to ddxrcd$
gosub getproc 'get first index record
while not closed 'retrieve index pointers and save to .ddx file
mid$(ddxrcd$, ddxptr) = mki$(rcdptr) 'copy index to .ddx buffer
ddxptr = ddxptr + 2 'increment buffer pointer
if ddxptr > 2048 then 'write a group of data to file
filptr = filptr + 1 'increment file pointer
call fileio(ddunit, 2048, clng(filptr), ddxrcd$, -1) 'put data
ddxptr = 1 'reset buffer pointer to beginning of buffer
end if
gosub getproc 'get next index records
wend
if ddxptr > 1 then 'save leftover index pointers
call fileio(ddunit, 2048, clng(filptr + 1), ddxrcd$, -1) 'put data
end if
close #ddunit 'close the .ddx file
ddxrcd$ = "" 'reclaim memory from .ddx buffer
else 'retrieve a single sort record and return to calling program
gosub getproc 'get a record pointer
if not closed then 'retrieval OK as long as more records available
call fileio(sdunit, sortln, clng(rcdptr), sdat$, 0) 'retrieve data
end if
end if
if closed then 'retrieval/index completed
if opcode = -2 then 'final (single-record retrieval) sequence
call killfile("sortdata.ndx", ndunit) 'kill sort index workfile
call killfile("sortdata.sdx", sdunit) 'kill sort data file
sdat$ = "" 'kill sort data buffer
end if
nrcd$ = "" 'kill index file buffer
sbuf$ = "" 'kill main sort group buffer
sndx$ = "" 'kill sort index buffer
end if
end if
exit sub 'return to calling program
'----------------------------------------------------------------------
fillproc: 'put sort data into sbuf$, sndx$
'----------------------------------------------------------------------
if opcode = 0 then lset drcd$ = sdat$ 'load all segments at once
iptr = 1 'initialize work buffer pointer
for segno = 1 to nosegs 'load segments into work buffer and/or do invert
if midchar(smsk$, ibeg(segno)) = 66 then 'invert 16-bit integer strings
ichr = midchar(sdat$, ibeg(segno)) 'save first byte, then swap
mid$(drcd$, iptr) = char$(midchar(sdat$, ibeg(segno) + 1)) '2nd byte
mid$(drcd$, iptr + 1) = char$(ichr) 'put 1st byte in 2nd position
else 'non-integer (character) sort segment
if opcode then 'segments not in original (contiguous) sequence
mid$(drcd$, iptr) = mid$(sdat$, ibeg(segno), ilen(segno))
end if 'insert each sort segment into temp. buffer [above]
end if
if iseq(segno) = nvflag then 'invert data for ascend/descend sequence
for ichr = iptr to iptr + ilen(segno) - 1 'do each byte in segment
mid$(drcd$, ichr) = char$(255 - midchar(drcd$, ichr))
next 'data will be re-inverted before writing to file
end if
iptr = iptr + ilen(segno) 'increment work buffer segment pointer
next 'begin binary search for sort compare [below]
topptr = recptr 'set top end of binary search
lowptr = 0 'set low end of binary search
while topptr - lowptr > 1 'search work data buffer using work index buffer
midptr = lowptr + (topptr - lowptr) \ 2 'set mid point for compare
ichx = midptr * 2 'mid-position incorporating 16-bit index width
ichr = midchar(sndx$, ichx) * 256 'same as cvi(mid$(.....))
iptr = (midchar(sndx$, ichx - 1) + ichr - offset - 1) * sortln 'mid-
if left$(drcd$, compln) <= mid$(sbuf$, iptr + 1, compln) then '-buff.pos
topptr = midptr 'move search lower
else 'sort record value > compare value in sort memory buffer
lowptr = midptr 'move search higher
end if
wend
iptr = topptr * 2 - 1 'current index-"stack" insert position
mid$(sbuf$, (grpptr - 1) * sortln + 1) = drcd$ 'write sort data to buffer
mid$(sndx$, iptr + 2) = mid$(sndx$, iptr, (recptr - topptr) * 2) 'shift ndx
mid$(sndx$, iptr) = mki$(grpptr + offset) 'write current pointer to index
return 'return to calling routine
'-----------------------------------------------------------------------
getproc: 'retrieve a record from the sort
'-----------------------------------------------------------------------
if ndxgrp then 'merge-retrieval from sort groups
if recptr then 'sort records are still available
ichr = outptr * 2 'mid-position based on 16-bit index width
grpptr = midchar(sndx$, ichr - 1) + midchar(sndx$, ichr) * 256
if memndx then 'get group pointer from work index [above]
ichr = midchar(nrcd$, iptx(grpptr, 0) + 1) * 256 'get record ptr
rcdptr = midchar(nrcd$, iptx(grpptr, 0)) + ichr 'from memory-index
else 'get record pointer from index file
call fileio(ndunit, ndxlen, clng(iptx(grpptr, 0)), nrcd$, 0)
rcdptr = cvi(nrcd$) 'nrcd$ is a 16-bit integer record
end if
if sortsq > 0 then mid$(sndx$, 1) = mid$(sndx$, 3) 'shift work index
if iptx(grpptr, 0) = iptx(grpptr, 1) then 'end of group reached
recptr = recptr - 1 'decrement group stack pointer
if sortsq < 0 then outptr = recptr 'set output pointer if appl.
else 'end of group not yet reached
iptx(grpptr, 0) = iptx(grpptr, 0) + (1 - memndx) * sortsq'move ptr
if memndx then 'get a data record using a pointer from memory
ichr = midchar(nrcd$, iptx(grpptr, 0)) 'get the record pointer
ichx = midchar(nrcd$, iptx(grpptr, 0) + 1) * 256 '..from memory
call fileio(sdunit, sortln, clng(ichr + ichx), sdat$, 0)
else 'get a data record using a pointer from the index file
call fileio(ndunit, ndxlen, clng(iptx(grpptr, 0)), nrcd$, 0)
call fileio(sdunit, sortln, clng(cvi(nrcd$)), sdat$, 0)
end if
gosub fillproc 'add the data record to the merge-sort
end if
closed = 0 'retrieval process not closed
else 'no more records available
closed = not 0 'retrieval process closed
end if
else 'non-merge sort retrieval; all data is in memory
if outtot then 'sort records are still available
ichr = outptr * 2 'mid-position based on 16-bit index width
rcdptr = midchar(nrcd$, ichr - 1) + midchar(nrcd$, ichr) * 256
outptr = outptr + sortsq 'increment or decrement index pointer
outtot = outtot - 1 'decrement remaining records
closed = 0 'retrieval process not closed
else 'no more records available
closed = not 0 'retrieval process closed
end if
end if
return 'return to calling routine
'----------------------------------------------------------------------
putproc: 'add a record to the sort
'----------------------------------------------------------------------
if recptr > maxrcd then 'too many records to fit in memory
if ndxgrp = 0 then 'first group; initialize index group variables
grptot = recptr - 1 'number of records per group
ndxlen = grptot * 2 'size of index file buffer
end if
gosub writeproc 'save data group and index group
offset = rcdptr - 1 'group-to-record offset counter
recptr = 1 'reset group record counter
grpptr = 1 'sort buffer pointer
end if
gosub fillproc 'add current record to sort
rcdptr = rcdptr + 1 'increment total records counter
recptr = recptr + 1 'increment group record counter
grpptr = recptr 'sort buffer pointer
return 'return to calling routine
'-----------------------------------------------------------------------
writeproc: 'write index and sort data to files
'-----------------------------------------------------------------------
ndxgrp = ndxgrp + 1 'increment the index group number
call fileio(ndunit, ndxlen, clng(ndxgrp), left$(sndx$, ndxlen), -1)
if opinit > -3 then 'initial sequences; save sort data to .sdx file
for iptr = 0 to (maxrcd - 1) * sortln step sortln 'loop thru mem.buffer
for segno = 1 to nosegs 're-invert data as appropriate
iptz = iptr + ibeg(segno) 'sort group memory buffer pointer
if midchar(smsk$, ibeg(segno)) = 66 then 'invert integer string
ichr = midchar(sbuf$, iptz) 'save first byte, then swap
mid$(sbuf$, iptz) = char$(midchar(sbuf$, iptz + 1)) '2nd byte
mid$(sbuf$, iptz + 1) = char$(ichr) 'put 1st byte in 2nd pos.
end if
if iseq(segno) = nvflag then 'invert data for ascend/descend seq
for ichr = iptz to iptz + ilen(segno) - 1 'invert each byte
mid$(sbuf$, ichr) = char$(255 - midchar(sbuf$, ichr))
next
end if
next
next
sdxlen = maxrcd * sortln 'size of group memory buffer
xflptr = lof(sdunit) \ sdxlen + 1 'current data "record"
call fileio(sdunit, sdxlen, xflptr, sbuf$, -1) 'put data group to file
end if
return
end sub 'return to calling program
sub fileio (fcno, flen, xrec, fbuf$, fopr) static 'read/write file data
'int fcno 'file unit/channel no.
'int flen '"record" length used for positioning only
'int fopr '0 = read; non-0 = write
'long xrec 'logical "record" number
'char fbuf$ 'read/write data buffer
xpos = (xrec - 1) * flen + 1 'absolute byte position in file
if fopr then 'operation = write
put #fcno, xpos, fbuf$ 'write data to file
else 'operation = read
get #fcno, xpos, fbuf$ 'read data from file
end if
end sub 'return to calling program
sub killfile (ffil$, fcno) static 'kill a DOS file
'int fcno 'file unit/channel no.
'char ffil$ 'file name
close #fcno 'close file if open
open ffil$ for binary as #fcno 'open file in binary mode
close #fcno 'close the file
kill ffil$ 'kill the file
end sub 'return to calling program
sub memfree (xexc, xmax, xfree) static 'get max. free memory less exclusion
'long xexc 'amount of memory to reserve/exclude
'long xmax 'upper limit for xfree (or zero)
xfree = fre("") - xexc 'total free memory less exclusion
if xmax > 0 and xfree > xmax then xfree = xmax 'set maximum if applicable
end sub 'return to calling program
Copyright © 1991, Dr. Dobb's Journal