Features


An Applied File I/O Tutorial: The Mini-Database System

Leor Zolman


Leor Zolman wrote "BDS C", the first C compiler designed exclusively for personal computers. Since then he has designed and taught programming workshops and has also been involved in personal growth workshops as both participant and staff member. He still doesn't hold any degrees. His latest incarnation is as a CUJ staff member.

Last month I began this series on a special-purpose small database system by presenting the global data structures and main program. While my main goal is to present an applied example of the design interdependencies between data structures and disk I/O techniques, there is still one major component of the system to tackle before sinking our teeth into the actual file I/O code: the data-entry and editing module.

The entire listing this month is MDBEDIT.C (Listing 1), the record editing module. It consists primarily of the function edit_db(), most of which is a large switch statement to handle all the available editing options. edit_db() is called twice from the main program: to edit a database (via the EDIT main menu command) and to begin editing a newly created database (after the CREATE command has been issued).

edit_db() maintains an internal state variable, cur_rec, identifying the currently selected record. Upon entry to edit_db(), cur_rec is set to zero (the number of the first record of the file). For coding convenience, the variable rp (a pointer to structures of type record) holds a pointer to the cur_rec'th element of the RECS array; this allows any expression of the form

RECS [cur_rec] ->member
to be written more simply as

rp->member
provided, of course, rp is updated to reflect any changes made to the value of cur_rec.

The global variable n_recs (declared externally in header file mdb.h) is also vital to the record editing process: whenever a record is added or purged from the database, the value of n_recs is adjusted accordingly. n_recs always reflects the total number of records, both active and inactive, currently stored in the database.

Record Management

To simplify the task of inserting and deleting records, I've selected a simple scheme for record management: each record is either "active" or "inactive", as reflected by the value of the active flag associated with the record. When a new record is created, its active flag is set to TRUE (1). To delete a record, the DELETE command merely changes the record's active flag to FALSE (0). The (now inactive) record's other data remains intact, however, until the FIX command is used at some later time to release the storage of all inactive records en masse.

The commands for perusing data records sequentially — NEXT and PREVIOUS — both ignore inactive records. The LIST command, however, lists all records in the database, both active and inactive. To access an inactive record, the SELECT command may be used to address the record directly via the record number shown by the LIST command. Once an inactive record has been SELECTed, the UNDELETE command may be used to make the record active once again, if desired.

Inactive records only "stay around" until the next time the FIX command is invoked. Each time FIX is run, the storage for all deleted records is freed and the record pointers are compacted and sorted.

The User Interface And Main Editing Loop

A menu structure, similar to the one shown last month for the main menu, controls the option menu for edit_db(). The record editing options I chose to implement under this menu are rudimentary to any sort of data-retreival system and, for brevity's sake, are operationally limited to a line-oriented interface.

Upon entry to edit_db(), cur_rec is initialized to the first record of the database and an infinite loop is begun. For each iteration of the loop, rp is set to point to the new current record's data, and the values of all fields in the current record are displayed on the console output. If there are currently no records in the database (determined by checking the value of n_recs for 0), then the current record display sequence is skipped.

Next, the menu is displayed and the user selects from the list of editing options. The options are:

LIST_RECS

This option displays a "short form" listing of all records in the database. For each record, the record number (equal to the record pointer's subscript number in the RECS array) and name fields are always listed. If the value of the variable active for the current record is false (indicating that the record is inactive), then the message "(deleted)" is also displayed. Records are displayed in their physical order in the RECS array; this will correspond to the order of entry until the first time the FIX option is invoked (see below.)

NEXT, PREVIOUS

These two options select the next or previous active record (in physical sequence) to be the "current" record by adjusting the value of cur_rec as required. Before adjusting cur_rec, however, we check that we're not sending its value out of bounds, by iterating the variable i forward or backward through the records. If an active record is encountered, then cur_rec is set accordingly; if, however, the end-of-file (or beginning-of-file) is reached, then a message to that effect is displayed and cur_rec remains unchanged.

NEW

Creates a new record. If we have reached maximum record capacity, then an error message is displayed and the option is aborted. The alloc_rec() function returns a pointer to storage for the new record; if the return value is NULL, then memory could not be allocated and the option is aborted.

Once storage is acquired, cur_rec is set to the old value of n_recs (the record count) and n_recs is incremented to reflect the new number of records (line 125). After this operation, cur_rec is the subscript of the slot in RECS that will hold the new record pointer. Next, rp is copied into the new slot (line 126) and the element active is made TRUE (line 127). In lines 129 though 134, all data elements of the new record are initialized to "empty" values and control falls through to the next case, MODIFY.

MODIFY

This section is a quick-and-simple editing sequence for the data elements of the current record. For each element in sequence, the following is performed:

In a "real" application additional input validations would occur at this point. For example, to limit the gender field to only the values M or F, you could change the if sequence in lines 154 and 155 to:

while (strlen(gets(buf)) > 0 /* if not empty line */
{
    char c = toupper(*buf);
    if (!(c == 'M' || c == 'F')) /* if not legal */
    {
        printf("Please enter 'M' or 'F': ");
        continue; /* ask again */
    }
    rp->gender = c; /* assign the value */
    break; /* exit the loop */
}

DELETE

Deleting a record is simply a matter of setting the active flag to 0. If the flag is already 0, we display a message to that effect. Otherwise, we make the user confirm the deletion by typing a 'y'.

UNDELETE

This option restores a "deleted" record by setting its active flag back to 1. If the record was active in the first place, a message is issued.

SELECT:

Used in conjunction with the LIST option, this option can directly select any record in the database. SELECT asks for a record number and checks to make sure the value is within bounds. If the record number is valid, it is assigned to cur_rec. Since we wish to allow both active and inactive records to be selected, there are no further validity checks necessary.

FIX

The FIX options serves a dual purpose: first, to purge all inactive (deleted) records from the database; second, to sort the remaining records. The fix_db() function does the work, and herein lies the only tricky code in this installment.

To purge deleted records, a single pass is made over the database records using two indices. The first index, the loop variable i, iterates over the entire array of record pointers (that is, all n_recs records). The second index, new_n_recs, is incremented only for active records; when an inactive record is encountered, new_n__recs is not incremented, and the pointer to the inactive record's data is passed to the free() function to deallocate the storage.

At the start of each iteration through the loop, record i is copied into position new_n_recs (line 222), and then new_n_recs is adjusted as described. The result is that the memory used by all inactive records is de-allocated, and the positions in the RECS array formerly occupied by inactive record pointers are filled in with active pointers.

After the purge loop completes, a single call to the qsort() library function sorts the remaining records. qsort() is a wonderfully versatile function, but the interface can be a bit confusing. In this case, we wish to sort the pointers in the RECS array by the last field; we first set up the call to qsort() and then set up a comparison function that qsort() uses to determine the relationship between two pointers.

qsort() takes four parameters: a pointer to an array of data items to be sorted, the number of items in the array, the size of each item, and a pointer to a user-supplied function (to be called by qsort() directly) that accepts two pointers to the data array elements as parameters, compares the two data items specified by the pointers, and returns (to qsort() one of several pre-defined values indicating the relative ordering of those two data items.

Since we are sorting the pointers in the RECS array, we pass RECS as the first parameter. The number of items in the array is new_n_recs (as a result of the purge loop just completed.) As we are sorting pointers, we pass the expression

sizeof(struct record *)
as the data item size (by using the sizeof operator instead of hard-wiring in a constant value such as two for this parameter, we insure portability of this code among different memory models or processor families.) For the final parameter, we supply the name of a comparison function named compar().

The comparison function, as described above, must be designed to accept parameters that are pointers to the data type being sorted. Since the data we are sorting have type "pointer to structure", the parameters to the comparison function must be of type "pointer to pointer to structure." For a return value, the comparison function must return either zero (if the data items are equivalent, using whatever arbitrary ordering rules we wish to apply), a negative value (if the first item is "less than" the second by those same rules), or a positive value (if the first item is "greater than" the second.) The function header (line 242) reflects these specifications.

Notice that a prototype for the compar() function is given (line 214) so that the identifier compar is recognized as a function having the proper characteristics when fix_db() calls qsort() (line 230). Alternatively, the entire definition of compar() could have been placed where the prototype is now located, eliminating any need for the separate prototype. I prefer to list functions in top-down order, however, so the prototype becommes necessary.

The body of the compar() function compares the element last (a string representing the last name of the person described in the record) from both records using the strcmp() standard library function. strcmp() is ideally suited for use with qsort(), because its return value is defined exactly as specified for the qsort() comparison function. The tricky part of compar() is remembering to de-reference the "structure pointer pointer" parameters in order to use them as simple "structure pointers" (line 245.) Note the need for parentheses around the indirection sub-expression of each parameter to strcmp(); without the parentheses, the selection operator (->) would take precedence over the indirection operator (*) and draw a compiler error.

As For The File I/O...

Between this and last month's listings, the system is now complete except for the read_db() and write_db() functions. Before the next installment, you may wish to key in the system as presented so far (inserting a pair of dummy functions in lieu of the real read_db() and write_db()) and play around a bit with entering, deleting and editing records in order to get a feel for the system's operation.

I'll devote future installments to showing some different approaches to implement read_db() and write_db(). As a challenging exercise, see if you can come up with a scheme for implementing these functions on your own. Would you save the data on disk in the form of ASCII text or binary data? Which library functions would you use in each case? What would be the advantages of ASCII over binary, or vice versa?

For a smaller-scale exercise, see if you can develop a replacement for the do_menu() function that has a slicker user interface. For example, you might include the ability to specify commands by a highlighted letter within the text of the command, as well as by a sequential number. Part of the menu specification would then have to include information about which letter to highlight — perhaps by including a "magic" character in the menu item text that flags the subsequent character as special. Or, you may choose to adapt a windowing menu routine from some general C library function package.