October 1991/A Generic Search Routine

Features

A Generic Search Routine

Matt Weisfeld and Michael J. Gilson

Matt Weisfeld and Michael J. Gilson are both currently employed by the Allen-Bradley Company as senior system test engineers and are responsible for the development of test software on VAX/VMS, DOS and UNIX platforms. Matt has ten years of programming experience and has taught computer programming and software design at two universities. Michael Gilson has five years of experience as a C developer, and worked for four years as a test engineer developing assembly code for production test equipment at Honeywell Industrial Programmable Controls Division.
Because of the way the internal master structure of the database is built, text books on database development using C typically include code for more than one search routine. Each transaction read into the database conforms to this structure, which may contain any number of fields. Since most database programs allow a user to query all of these fields, a search mechanism must be in place to handle all possibilities. This problem is usually solved by writing separate code to search each field in the master structure array, but this results in redundant code and unwanted maintenance headaches.
In this article, we'll show how to use void pointers and pointer arithmetic to implement a single search routine that will handle all variables, regardless of scalar data type. By taking advantage of these tools, a programmer can save time, reduce code, and simplify program maintenance. We'll illustrate this generic search solution with a sample database application.

Intended Application
Suppose that a database application requires a master record with four fields: id (an integer), number (an integer), price (a floating point) and code (a string of 10 characters). The master structure declaration is as follows:

struct record { int id; int number; float price; char code[10]; };
Also assume that the database program can perform a query on all of these fields.
For example, one legal query could be

find a price = 33.3
You could satisfy this query by creating a separate search routine to loop through all master records looking for a price of 33.3. In the same vein, you could write a separate search routine to handle the other fields in the structure — id, number, and code. This approach would be reasonable if all master structures were this small. However, imagine the amount of code necessary for a master structure of 50 fields? Worse still is the fact that a code change is required every time a new field is added or an existing one is modified. How can this query, or any other query be satisfied with one search routine?
The query must be parsed and the information it contains extracted. How the parsing is done is not a major concern for this discussion. It is part of the user interface, not the database system. However, the information that the line contains is crucial. The search is interested in two pieces of information: the keyword price and the value 33.3. First, you must determine if this information is valid. If price is not a field in the master structure, there is nothing with which to compare it. Second, it must be predetermined that, at least in this case, price is a floating-point number. Under no circumstances can price represent more than one scalar data type in the same application. Third, keyword price in the master structure must be located such that the search can operate on the proper field.
In short, you must define whether a keyword is valid, define its scalar data type, and determine where it is located in the master structure.

Data Definition Structure
All of the information needed to solve this problem can reside in a data definition structure. For example, a structure called field_definition is defined as

struct field_definition { char keyword[20]; int type; struct record *position_ptr; };
In essence, a field_definition defines a legal keyword, the data type that is assigned to it, and the position in the array where to commence the search for that keyword.
The field keyword defines a legal field name. By using the master structure as a guide, the valid keywords acceptable to this application are id, number, price, and code. The field type defines what scalar data type the keyword is. The example master structure contains keywords that are integers, floating-point numbers, and character strings. (These are not the only scalar data types available to the programmer.) Finally, the pointer position_ptr locates the keyword in the first master record. This field specifies where the search will begin. For example, if the current task is to search all records to find a price of 33.3, this pointer would contain the address of the field price in the first transaction of the master record array. Listing 1 presents the file common.h that contains the structure definitions and program constants.

Generic Search Algorithm
Now that the two main structures are defined, it is possible to construct samples for both. Because this article centers around the search algorithm, the structures are built internally. In any practical application, all the information loaded into the master record array would be read in from a file. A basic linear search will search the arrays in this program, but any search algorithm would suffice.
The sample field_definition structure is listed below, the array is called definitions:

struct field_definition definitions[] = { "id", INT, &tran_record [0].id, "number", INT, &tran_record [0].number, "price", FLOAT, &tran_record[0].price, "code", STRING, &tran_record[0].code, };
The structure that follows shows what a transaction looks like. Each entry of the sample master structure array contains four fields: id, number, price, and code. Each field has two attributes associated with it, a scalar data type and an address. To relate this sample field_definition to a sample master structure array consider the following:

struct record tran_record[]= { 0, 0, 0.0, "zero", 1, 11, 11.1, "one" 2, 22, 22.2, "two", 3, 33, 33.3, "three", 4, 44, 44.4, "four", };
The first transaction of the tran_record array contains the id (int) 0, the number (int) 0, the price (float) 0.0, and the code (string) "zero". Notice that this transaction conforms to the rules for valid keywords and scalar data type in the field_definition structure. The other rule in field_definition concerns the pointer. This rule generates the most interest and perhaps the most confusion.
Figure 1 illustrates how the pointer field is used. Notice that this field points to the corresponding keyword in the first transaction.
In effect, the pointer field in definitions is the address of the starting point of the search. The fact that it is defined as a structure pointer represents the key to this algorithm. When a structure pointer is incremented, it is bumped up by the size of the structure itself. Thus, if a pointer is assigned to the following address:
struct record *tran_ptr;
tran_ptr = &tran_record[0].price;
the next value of price can be accessed by executing the command:
tran_ptr++;
To illustrate further, assume that the search is for a keyword price with the value of 22.2. The address of the first value of price is already known by inspecting the initial address. See Figure 2. A comparison reveals that tran_record[0] has a price of 0.0, not 22.2.
Now the following command is executed:
tran_ptr++;
The pointer has been incremented by the size of the structure record, keeping it on the same word boundary. See Figure 3. Another comparison reveals that tran_record[1] does not have a price of 22.2.
By incrementing the pointer once more the correct value is found. See Figure 4.
The listings that accompany this article demonstrate how this algorithm is implemented with code.

Determining The Type
There are two routines at the heart of the search algorithm. The actual search function is lookup. Because lookup has no knowledge of the scalar data type that is passed to it, the function type_check provides the information.
The prototype for type_check is

FUNC type_check( struct field_definition *definitions, struct record *tran_ptr, int transaction_size, char *user_keyword, char *user_value);
The new items here are transaction_size, user_keyword, and user_value. The integer transaction_size in this case is 5 (five records in the array) and is calculated as

transaction_size = sizeof (tran_record)/sizeof (tran_record[0]);
In essence, this line divides the size of the entire array by the size of one tran_record, thus providing the number of members of the array.
The two character pointers make up the user-supplied key. For example, if the search criterion was to find all prices that are equal to 33.0, then 33.0 would be the user_value and price would be the user_keyword. A user query such as

find all price = 33.0
could be parsed to supply this information. Again, because the search algorithm is the primary focus here, this information will be generated internally, relieving us of the user interface.
type_check performs two functions. It determines whether the user_keyword passed to it actually exists in the structure definitions. If it does not exist, an error occurs. For example, the following query would be illegal:

find all amounts = 33.3
because amounts is not a valid keyword. When a match is made, the second function is performed. With the keyword known, the type is easily determined and can be converted to its proper format. For example,

integer = atoi (user_value); floater = atof (user_value); string = user_value;
Notice that the user_value was already a string.
At this point, all the information necessary to call lookup has been determined.
Listing 2 shows the complete type_check routine.

Performing The Lookup
The lookup routine is straightforward. It only requires a compact switch statement.
The prototype for lookup is

FUNC lookup (int type, struct record *position_ptr, struct record *tran_start, int transaction_size, void *user_value);
The two structure pointers reference the same array, but at different addresses. As explained previously, position_ptr addresses the proper keyword in the first member of tran_record. The second pointer addresses the beginning of the tran_record array. The second pointer has nothing to do with the search itself, but after a match is made, another routine is called to process the information. The lookup routine passes this pointer to the other routine. The processing routine is discussed in the next section.
The void pointer user_value represents the search key and provides the mechanism that makes this program independent of scalar data type.
When a variable is declared, it is assigned an address that points to a memory location containing a value. An integer pointer points to an integer value, etc. However, it is the size of the memory location that varies among different scalar data types. The address takes up much the same space regardless of what it points to. In most languages, the type of pointer must be declared at compile time. For example, an int pointer cannot be used as a float pointer at any time. However, C allows the programmer to declare a pointer to void. This directs the compiler to provide the space for a generic data pointer and allows the programmer to decide at a later time what to use it for. Of course the memory associated with the value must be allocated by the programmer.
Listing 3 provides a sample case statement that illustrates how the search algorithm works in the case of an integer.
The statement in the for loop:
integer = *((int *) position_ptr);
casts position_ptr to an int pointer. At first glance this may seem improper, because position_ptr is actually defined as a structure pointer. However, remember that the structure begins with an int member. This statement directs that the memory location addressed by position_ptr be treated as an int. In this case position_ptr was loaded with the address of number. By casting position_ptr to an int pointer, the integer value it holds can be accessed. You should understand why the * is present to the left of the first left parenthesis. If the following instruction had been executed:
integer = (int *) position_ptr;
integer would have been assigned the address of position_ptr, not its value. The * is needed to extract the actual value residing at that address.
The if statement:
if (integer == *( (int *) user_value) )
performs the same function as the statement in the for loop. In this case, user_value is a void pointer and must be cast to the appropriate type. Again the * is needed because you are extracting the actual value.
Finally, after a loop iteration is complete, the for loop performs
position_ptr++;
This increments position_ptr by the size of the master record structure, which in effect, points it to the corresponding keyword in the next member of the array.
If a match is made within the loop, a separate routine is called to process the information. In this program the search is for all matches, not just the first occurrence. A print routine is called in this example to print the matched transactions.
Listing 4 shows the complete lookup routine.

Processing The Information
Calling a separate routine to process the information provides more flexibility for the lookup routine. In this way, lookup need not be changed, if, for example, a change is made in the structure of a report. The report function is called lookup_print. This routine is passed the address of each matched transaction and prints it to a file (or the screen). Listing 5 shows the lookup_print program. This independence issue leads to the question of what the lookup routine should return to type_check. With simplicity in mind, the lookup routine in this example passes back a Boolean variable signifying whether or not a match was made. Because it is desirable to have lookup and type_check perform only their basic functions, any further activity is done in the routine that initiates the search.
The complete test program is shown in Listing 6, and the output it produces is in Listing 7.

Conclusion
This generic search program can be adapted to suit individual applications. For example, the sample program never actually calls lookup, only type_check. This was done to illustrate the sequence of events that take place. However, it may be more logical to have the application call a routine named lookup that performs both the type checking and searching functions. Also, individual needs may dictate that the value returned to the application differ.