Leor Zolman wrote "BDS C", the first C compiler designed exclusively for personal computers. Since then he has designed and taught programming workshops and has also been involved in personal growth workshops as both participant and staff member. He still doesn't hold any degrees. His latest incarnation is as a CUJ staff member.
Series Introduction
If you're a recent convert to C from any other high-level language and you've tried to write programs that do any serious file input/output, then chances are you've experienced more than a little bit of frustration. The C standard library, in keeping with the general philosophy of the C language, provides tools powerful enough for doing anything you want, provided you know how to correctly combine those tools. In the case of file I/O, about the only operations supported in a "trivial" manner are:
- reading and writing bytes
- reading and writing single lines of text
For reading and writing any other flavor of data structure to or from the disk, a certain level of "C sophistication" is required. Often, the task quickly moves beyond "How do I read and write this data?" toward the more general problem, "What is the most appropriate way to represent this data in order to facilitate efficient means of reading and writing it to disk?"
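As a point of reference, the two "trivial" operations might look like the sketch below. The function names count_bytes and count_lines are hypothetical and are not part of the mini-database listings; they use only the standard getc() and fgets() calls.

```c
#include <stdio.h>
#include <string.h>

/* Count the bytes in a stream, reading one byte per getc() call. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;      /* int, not char, so EOF can be told apart from data */

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}

/* Count the newline-terminated lines in a text stream using fgets().
   A line longer than the buffer arrives in several pieces, so only
   pieces that end in a newline are counted. A final line with no
   trailing newline is not counted. */
long count_lines(FILE *fp)
{
    char buf[256];
    long n = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strchr(buf, '\n') != NULL)
            n++;
    }
    return n;
}
```

Anything more structured than bytes or lines (a personnel record, say) has to be built up from calls like these, which is where the design questions begin.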
In this series of tutorial articles, I will develop from scratch a complete special-purpose "mini-database" system in order to illustrate the process of designing file-based C applications. The resulting system will be functional but intentionally inadequate for any particular task.
This first installment will consist of an operational description of the database, broken into the following areas:
- data structures
- functional description
- user interface (the menu system)
The second installment will present the database record editing and management mechanism.
Later installments will present several different approaches to storing the data on disk and will discuss the relative merits of each approach. The first version will store all data to disk as user-readable text and will use statically-allocated arrays in memory.
The second version will store all data to disk in binary format for rapid transfer. I'll also develop two memory allocation systems for the binary version: static array allocation (same as for the textual disk format) and dynamic array allocation (to optimize the use of system memory).
Mini-Database Data Record Structure
This will not be a "general-purpose" database system, but rather a program built to handle only one specific record format: a personnel record as in Table 1. The definition of the structure tag for this record, named record, is shown in the header file (Listing 1, lines 30-37).
The system will be able to handle only one active database at a time. We'll use dynamic memory allocation to obtain storage for the data records, so that data memory is allocated only when necessary. For the first version of the system, the list of data record pointers will be kept in a statically-allocated (i.e., fixed-length) array. The definition of this array is shown on line 49 of the header file. The name of the array is recs, and its type is
array (of MAX_RECS elements) of pointers to structures of type record
The programmer must explicitly size a fixed-length array. In my code the size is MAX_RECS. Thus, the total amount of fixed memory needed for storing the records of the database is MAX_RECS times the size of a single record pointer. (In later versions, I'll even show how to dynamically allocate the storage for the recs array itself. To facilitate this modification, the symbol RECS is introduced (Listing 1, line 50) as an alias for recs.)
Lines 42-46 (of Listing 1) illustrate a necessary complication when writing multiple-source-file programs: global data must be defined in one module and one module only. If the data is to be known in any other modules of the program, it must be declared in those other modules. Definitions actually reserve storage for the specified data, while declarations only serve to inform the compiler about the nature of data defined elsewhere. This simplistic rule of thumb will usually differentiate between definitions and declarations appearing in header files:
If the extern modifier is used, you're probably looking at a declaration; otherwise, you're probably looking at a definition.
To conform with the ANSI Standard, each data item should be defined only once among all the source modules of a program. At first it might seem one need only insert an extern keyword in front of all but one declaration, making it the definition. Unfortunately, this is not easily done. Typical multiple-module programs use lots of shared data; do we really want to maintain separate lists of declarations in separate modules, some having the extern keyword and some not? Of course not; we'd rather have all the data included within a single .h file. But if the declarations/definitions must be written differently in separate files, can we really use a single header file? Yes. Lines 42-46 illustrate a symbolic constant to control whether the extern keyword is generated for the critical declarations. If MAIN_MODULE exists, then we are compiling the main module of the program and the symbolic constant EXTERN is defined to nothing (so the items in lines 49, 52 and 53 are defined). Otherwise EXTERN is defined to extern and the lines are treated as declarations of external data. To force definitions to be created as the main module is compiled, we #define MAIN_MODULE (see Listing 2, line 24) before the inclusion of the header file. The other modules of the program do not contain such a definition.
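The EXTERN mechanism can be sketched in a single file as follows. This is not a copy of Listing 1: the fields of struct record, the value of MAX_RECS, and the assumption that n_recs and max_recs are among the EXTERN-qualified items are all hypothetical, chosen only to make the sketch compile.

```c
/* In the real program this #define appears only in the main module,
   ahead of the #include of the shared header. */
#define MAIN_MODULE

/* ---- what the shared header might contain (layout is an assumption) ---- */
#ifdef MAIN_MODULE
#define EXTERN              /* main module: the lines below are definitions */
#else
#define EXTERN extern       /* other modules: the same lines are declarations */
#endif

#define MAX_RECS 100        /* hypothetical capacity */

struct record {             /* hypothetical fields, not those of Table 1 */
    char name[30];
    int  id;
};

/* array (of MAX_RECS elements) of pointers to structures of type record */
EXTERN struct record *recs[MAX_RECS];
#define RECS recs           /* alias, so the storage scheme can change later */

EXTERN int n_recs;          /* records currently held in memory */
EXTERN int max_recs;        /* maximum number of records representable */
```

With MAIN_MODULE defined, the three EXTERN lines reserve storage. Remove the #define (as every other module would) and the same text expands to extern declarations that refer to the main module's storage.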
(Note: The difference between definitions and declarations has been rendered fuzzy by variations among C compilers over the years. Microsoft, perhaps to eliminate the need for exactly the sort of mechanism just presented, decided to make its linker allow multiple definitions of the same piece of data among source files of a program (although multiple initializations were still flagged as errors). While this does simplify development in some cases, it renders C programs relying on this "feature" non-portable. Turbo C 2.0, under which this database program was developed, makes you "do it right", even if doing so requires a little bit more thought.)
Other Global Data
The system maintains a minimal amount of global data to describe the currently open database's state. The variable n_recs tells how many records are currently held in memory. The variable max_recs contains the maximum number of records that can be represented. For the fixed-length array version, max_recs is simply assigned the value of the symbolic constant MAX_RECS (Listing 2, line 72).
A Simple Menu System
A simple line-oriented menu system serves as the user interface. The menu function do_menu() is shown in MDBUTIL.C (Listing 3, lines 21-39). The menu consists of a list of pointers to structures of type menu_item (Listing 1, lines 55-58), where each menu_item consists of an integer action code and a string description of the action. do_menu() simply numbers and lists each description (up to but not including the first entry with an action_code of 0), asks the user to pick one of the choices, and returns the action_code value associated with the selected item. Note that the action_code values need not correspond to the choice numbers displayed by the function.
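A menu function along these lines might look like the sketch below. It is not the do_menu() of Listing 3: the name do_menu_sketch, the member name descrip, the use of a plain array of structures, and the explicit input and output streams are all assumptions made to keep the example small and easy to exercise.

```c
#include <stdio.h>

struct menu_item {
    int action_code;        /* value returned to the caller; 0 ends the list */
    const char *descrip;    /* text shown to the user (hypothetical name) */
};

/* Display the menu on `out`, read a choice from `in`, and return the
   action_code of the chosen entry. Returns 0 if input ends before a
   valid choice is made. */
int do_menu_sketch(const struct menu_item *menu, FILE *in, FILE *out)
{
    char line[80];
    int i, n, choice;

    /* Count the entries in front of the terminating action_code of 0. */
    for (n = 0; menu[n].action_code != 0; n++)
        ;

    for (;;) {
        for (i = 0; i < n; i++)
            fprintf(out, "%2d) %s\n", i + 1, menu[i].descrip);
        fprintf(out, "Choice (1-%d)? ", n);

        if (fgets(line, sizeof line, in) == NULL)
            return 0;       /* end of input: no selection made */

        if (sscanf(line, "%d", &choice) == 1 && choice >= 1 && choice <= n)
            return menu[choice - 1].action_code;

        fprintf(out, "Please enter a number from 1 to %d.\n", n);
    }
}
```

A caller would pass stdin and stdout and then switch on the returned action code. Because the codes are independent of the displayed numbers, entries can be reordered in the table without touching the code that acts on them.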
The Main Menu Options
The database operations are divided into two menus. The first menu (Listing 2, lines 37-48) contains the options for controlling database selection, disk I/O and program termination. The second menu, within the MDBEDIT.C module (shown in a future article), controls all the options associated with editing the data records of the currently active database.
The main menu controls the top-level system functions. A variable, db_active, tells whether a database is currently open, and thus whether certain operations are appropriate. For example, we don't want to allow the user to open a new database if another is currently open. The main menu options are as follows:
CREATE:
Initialize a new database. Ask the user for a name for the database (this will also be the name of the file used to store the database on disk) and check to make sure another file does not already exist by that name. If the name is OK, then initialize max_recs, n_recs and db_active.
EDIT:
Call the edit_db() function to edit the records of the database.
OPEN:
Load a previously stored database from disk (via the read_db() function), then go immediately into editing that database by calling edit_db(). read_db() allocates the appropriate amount of memory for the database records, assigns the pointers to elements in the RECS array, and returns the number of records loaded. We announce the number of records before calling edit_db().
BACKUP:
This menu entry is included to encourage backup facilities in your applications. The backup function, backup_db(), is just a dummy.
CLOSE:
Terminate operations on the current database, write it to disk and free up all associated storage.
SAVE:
Write the database to disk, preventing loss of work "so far" in case of a system crash. Do not close the database.
ABANDON:
Close the database without saving it to disk: free up all storage.
QUIT:
Exit the program.
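The CREATE option needs a way to find out whether a file of the chosen name already exists. One common approach, sketched below, is simply to try opening the file for reading. The function name file_exists is hypothetical and may not match what Listing 2 actually does.

```c
#include <stdio.h>

/* Return 1 if a file named `name` can be opened for reading, else 0.
   A file that exists but cannot be read is reported as absent. */
int file_exists(const char *name)
{
    FILE *fp = fopen(name, "r");

    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}
```

CREATE would call such a function with the name the user typed and refuse to proceed if it returns 1, so that an existing database is never silently overwritten.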
Utility Functions
Listing 3 shows the source module MDBUTIL.C, containing utility functions used throughout the program. In addition to the do_menu() function (already described), this module includes error(), alloc_rec() and free_up().
The error() function is a general-purpose fatal exit. It prints a message and exits.
The alloc_rec() function is not used by any of the code in this month's listing, but is basic to the operation of the program. alloc_rec() is called to obtain memory from the system to store a single record of database data. The malloc() function is called to actually obtain the block of storage. alloc_rec returns either NULL, signaling that the system has no more storage to spare, or a valid memory pointer obtained from malloc().
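A minimal sketch of such an allocator follows. The name alloc_rec_sketch and the fields of struct record are hypothetical stand-ins for the definitions in Listings 1 and 3.

```c
#include <stdlib.h>

struct record {             /* hypothetical fields, not those of Table 1 */
    char name[30];
    int  id;
};

/* Obtain storage for one record. Returns NULL when malloc() reports
   that no more memory is available; the caller must check for this. */
struct record *alloc_rec_sketch(void)
{
    return malloc(sizeof(struct record));
}
```

The caller stores the returned pointer in the next free slot of the record pointer array, after checking both that the pointer is not NULL and that the array still has room.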
The free_up() function returns all storage (obtained through calls to alloc_rec()) to the system. In this system, storage is always freed for the entire database at one time (when the current database file is closed or abandoned). Freeing that storage is simply a matter of walking through all the records of the database and calling the free() function for each pointer.
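The walk through the record pointers might be sketched as shown below. The name free_up_sketch, the record fields, and the file-local copies of recs and n_recs are assumptions made so that the fragment stands alone. Resetting the pointers and the count is an extra precaution in this sketch and may not be what Listing 3 does.

```c
#include <stdlib.h>

#define MAX_RECS 100                        /* hypothetical capacity */

struct record {                             /* hypothetical fields */
    char name[30];
    int  id;
};

static struct record *recs[MAX_RECS];       /* stand-in for the global array */
static int n_recs;                          /* records currently in memory */

/* Release every record obtained from malloc() and mark the database empty. */
void free_up_sketch(void)
{
    int i;

    for (i = 0; i < n_recs; i++) {
        free(recs[i]);
        recs[i] = NULL;     /* guard against a stale pointer being reused */
    }
    n_recs = 0;
}
```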
Next month: Editing the database records.