Choosing a Table Search Algorithm


Choosing the proper search algorithm can greatly impact the efficiency of a program. This is especially true as a table grows very large. For example, consider an array with n elements (size n, assume elements start at 1). Each element of the array is a structure with the first field designated as the key. The search algorithm will compare a search value to each key in the table until it finds a match or the table is exhausted. I describe two types of search techniques here: the linear search and the binary search. (Hash tables are also an alternative to certain searching applications; however, they are mainly used for tables that are highly volatile or built at run time. The table in this article is fairly static and is built at compile time.)

Linear Search

Linear searches are the simpler of the two and normally considered the least efficient. The search progresses in a linear fashion (i.e. table position 1, position 2, ... position n). Processing time increases linearly with n. As a measure of efficiency, I present figures in terms of number of accesses required:

Best Case     : 1    —  a search for the first element in the table
Worst Case    : n    —  a search for the last element in the table
Average Case  : n/2  —  on average half the numbers are searched
A simple technique to eliminate a comparison step can increase the efficiency of a linear search by 20 to 50 percent. Most unimproved linear searches will require a simple comparison to limit the loop (ex: if i <= n) at each iteration. An improved algorithm elimates these comparisons by first placing a copy of the search value at location n+1. The algorithm can now walk through the array without checking if it has exceeded maximum table size, since it is certain stop its search at location n+1, if not before. If the search algorithm progressed as far as location n+1, then the search value was not originally in the table.

It is also possible to order the array so that the most frequently accessed values are at the top of the table. However, this technique requires the program to keep statistics as the table is continuously searched, and to perform periodic re-ordering of the table.

Binary Search

Binary searches are in many cases more efficient than linear searches. However, they have one major drawback: the table must be pre-sorted. Binary searches use the bisection method to search the table, much like a human looks through an alphabetized index. For example, if the table is of size 10, the first element searched is at the mid-point (1+10)/2 = 5. If element 5 is not a match, a simple comparison tells us whether the key is above or below element 5 (since the table is sorted). At this point half the table is eliminated from further consideration. If the value is greater than the key in element 5, then the search is confined to locations 6 to 10. The new mid-point is (6+10)/2 = 8. This bisection process continues until either a match is made or the last element is searched. For a listing of the actual algorithm, consult the references provided. In this example, searching the entire table takes at most four iterations (compared to ten for the linear search). The access figures provided here assume that the bisection splits the remaining table into two parts that differ in size by at most one. The figures for binary searches are as follows:

Best Case     : 1           —  a search finds the first mid-point searched in
                               the table
Worst Case    : log2 n      —  a search finds the last element searched in the
                               table
Average Case  : (log2 n)-1  —  if each element is searched with equal
                               frequency

Comparing Linear and Binary Search

For large tables, the binary search requires much less time than the linear search in the worst case. Consider a table of 50,000 items. The worst-case scenario for a binary search requires no more than 16 accesses whereas the worst scenario for the linear search takes 50,000 accesses! In situations where the majority of searches are likely to fail, the efficiency of the binary search is very compelling. However, the logic of the binary search is much more complex (and thus much more time-consuming) than the logic of the linear search. Also, as already mentioned, the requirement for a sorted table is a drawback of the binary search. In this case, efficiency is impacted by the sorting method employed. Thus, for certain applications with small tables, the linear search may well be the faster of the two.

The volatility of the table also influences the choice of search technique. If the table is constantly being updated, then the sorts required by the binary search will occur with greater frequency, decreasing its efficiency relative to the linear search. However, if the table is static, then the sort is performed only once, thus increasing the attractiveness of the binary search.

In the context of the article, the size of the table is relatively small, so it might be best to use a linear search. Even in larger Windows applications, the number of case statements in a single switch structure will most certainly be less than 1000. However, the internal function table is very static and thus eliminates the primary objection to using a binary search.