Kenneth Van Camp is a project engineer and computer programmer at the Xicis Corporation. He is also the author of SURFMODL, a public domain 3-D rendering system,and SWIVEL, a public-domain program for displaying world maps. He holds a BS in Mechanical Engineering from the State University of New York at Stony Brook.
Introduction
When building databases for territory mapping systems, designers usually store a list of points. These (X,Y) coordinate pairs define the outline of the territory and allow for easy graphing of the territory boundary. The program simply draws lines connecting the points. Unfortunately, this straightforward representation makes it very difficult to answer some simple questions about the database.For instance, suppose the coordinates of Company A are known. The database contains the territory boundaries for numerous sales representatives, and the computer must assign a representative to Company A. Whose territory does it lie in?
Or, suppose a district contains several territories and the user needs to ask some questions: Do any two territories overlap? Are there gaps present between boundaries, which have not been assigned to any representative? Does Territory B lie completely within District C?
Such questions are not easily answered because the designers selected a boundary representation of the territories. However, the questions just posed assume that an area representation is known. Line segments have no thickness, yet the database is expected to understand the relation between points and lines and between lines and areas. All of these questions can be answered when using a boundary-represented database, but they are time-consuming queries.
This article demonstrates an alternate method for representing territory databases using a hierarchical data structure known as a quadcode. It has some properties that make it a natural for territory mapping applications. Among these are:
- A single quadcode represents both area and location, so either type of information may be quickly retrieved.
- A territory can be represented at different resolutions in different areas, so details are preserved where necessary but storage is not wasted where they are not. Consequently, quadcode-represented territories can require significantly less storage than other methods.
Quadcode Fundamentals
The quadcode is a base-4 code representing the quadrants of an image. Figure 1 shows a box divided into quadrants numbered 0, 1, 2, and 3 for the upper left, upper right, lower left, and lower right quadrants, respectively. The idea behind quadcodes is to subdivide each of these quadrants repeatedly. The numbers of the new quadrants are appended to the numbers of the quadrants from the earlier subdivisions.For instance, if each of the quadrants in Figure 1 is subdivided into four more quadrants, Figure 2 is created. The address, or quadcode, of each of the resulting 16 boxes is a two-digit base-4 code. I will call each of these digits a quit (quaternary digit), so each small box in Figure 2 has two quits.
The first quit of each small box shows the quadrant number of the original, outer box. The second quit shows the quadrant after subdividing each of these quadrants. This process of subdivision may be repeated as many times as required to obtain the desired resolution of the picture. For instance, the box labelled A in Figure 3 would have a quadcode of 213. The first quit (2) is the most significant one, because it makes the largest subdivision. Each following quit helps to further subdivide the image. Therefore, the more quits a quadcode has, the smaller the box and the greater its resolution. By using an implementation that allows a variable number of quits, different resolutions can be attained at different areas of maps.
Figure 4 shows an example of this. The larger boxes have fewer quits, and therefore represent larger areas. The smaller boxes increase the resolution of the picture where it is needed.
Before discussing my implementation of quadcodes, I must cover a little terminology. When talking about relationships between quadcodes, it is common to use the family analogy that is commonly used in describing tree structures.
In Figure 2, quadcode 0 is said to be a parent of quadcodes 00, 01, 02, and 03. This is because it fully contains these smaller quadrants, and they are only one level higher in resolution. Conversely, quadcodes 00, 01, 02, and 03 are the children of quadcode 0. The more general term is descendant. Any quadcode that is completely contained within another quadcode is its descendant, and the quadcode that contains it is its ancestor. In Figure 3, quadcode A (213) is a descendant of quadcode 2, and also of quadcode 21.
Finally, two quadcodes are siblings if they have the same number of quits, and are identical in all but the last quit (i.e., they have the same parent).
Primitive Operations with Quadcodes
Listing 1 shows QC.H, the header file for my implementation of a QuadCode class. I have chosen to store them in a compact form requiring two bits per quit. The most significant quit is stored in the highest bits of the first byte of storage, and unused quits in a byte are always set to zero. This makes it easy to compare one quadcode to another, because entire bytes can be compared simultaneously. (Ordering of quadcodes is important when building territory databases, for example.)This method of storage, using bit 7 as the most-significant bit, is consistent with that used in normal integer storage. However, the difference is that a standard integer always has its least-significant bit in position 0 and works from right to left. I have chosen to place the most-significant bit in position 7 and work from left to right. This scheme is necessary, since leading zeros in a quadcode are significant. A separate variable in the data structure holds a count of the number of quits in use.
Some basic functions are required that allow conversion between quadcodes and the familiar Cartesian coordinate system. For this purpose, the first constructor converts a (row, column) coordinate to a quadcode. For programmers who are used to working in (x, y) coordinate systems, note that the (I,J) representation is reversed. The first coordinate is the row number, or y coordinate, while the second coordinate is the column number, or x coordinate. The origin is at the upper left.
The row and column numbers give the coordinates of the upper left corner of the quadcode, and the number of quits must be specified when constructing the quadcode to resolve ambiguity. The body of the conversion function is shown in the constructor QuadCode: :QuadCode, in QC.CPP (Listing 2) . The reverse conversion (from quadcode to Cartesian coordinates) is in the function QuadCode::ToIJ. For details on how these functions are implemented, see the sidebar on "Cartesian To Quadcode Conversion."
Storage Considerations
An important feature of the binary storage scheme is that one quadcode requires exactly the same storage as one (I,J) coordinate pair. Therefore, two 32-bit coordinates can be stored in one 64-bit quadcode (which holds a maximum of 32 quits). In QC.H, I have defined a few constants that determine the storage limitations of the QuadCode class. If the macro STAT_QC is defined, then all storage is statically allocated and the number of bytes used is NBYTE_QC. If not, all storage is dynamically allocated and the maximum number of quits is unlimited.Another option in my implementation is the definition of the macro MSB_FIRST. Computers like the IBM PC store the least-significant byte first in integer data types. Other computers reverse this ordering. Defining MSB_FIRST will make my class library work on these systems as well.
In addition to the functions for converting between Cartesian and quadcode coordinates, there are also functions in the QuadCode class library to convert quadcodes to and from a character representation, for ASCII database storage or debugging output. One additional construction-type function is MakeParent, which changes any quadcode into its parent (i.e., removes its last quit).
Several comparison functions are provided, including all of the normal operators (greater than, less than, equal to, etc.). Two special comparison functions, Sibling and Contains, are important for generation of quadcode lists. They determine if one quadcode is a sibling of another, or if one fully contains another.
Region Representation
Since territories are rarely square, a single quadcode is insufficient to represent an arbitrary one. Instead, quadcodes must be grouped into lists known as regions. Listing 1 contains the definitions of my Region class, although space does not permit publishing a listing of the full implementation. The complete listing, with a demo program, are available in the monthly Code Disk published by The C Users Journal.In my implementation of the Region class, quadcodes are stored in a singly-linked list for economy of storage. The list is maintained in sorted order, using the QuadCode::Compare function as the basis for ordering.
To apply quadcodes to territory mapping applications, I needed some basic functions to convert a boundary-represented region into a quadcode representation. I found an interesting paper that discussed conversion from boundary representations to quadtrees (a data structure that can be used to represent territories in the same way as quadcodes). However, the algorithm required extensive searching while building the database. This searching was quite fast with a quadtree representation, but was very slow using a linked list of quadcodes.
Instead, I decided to use a technique that is very familiar to graphics programmers. I implemented an algorithm for filling polygons on a graphics screen, using this to generate a list of quadcodes that fill the region. There are two steps to generating the list of quadcodes in this manner:
1. Create a list of all the quadcodes necessary to fill the area, using only quadcodes at the highest resolution required to store the area.
2. Compress the region by combining quadcodes wherever possible.
An example is shown in Figure 5, where the darkened lines show the border of the region of interest. After applying step 1, we get a list of quadcodes that looks something like this:
01, 10, 03, 21, 30, 31, 23, 32, 33I have shown the list in the order it would normally be generated by a polygon filling routine, which traverses the polygon from left to right and top to bottom. In the interest of maintaining a sorted list of quadcodes, however, we would normally store this in the order:
01, 03, 10, 21, 23, 30, 31, 32, 33Notice that this order is very close to the original order. This is because of the original choice of quadrant numbering from 0 to 3, which was also done left to right and top to bottom. The problem with adding quadcodes in the order shown in the first list is in the search time to insert new nodes. Since we are using a singly-linked list maintained in sorted order, we must search almost the entire list to find the insertion point.Fortunately, there is a simple solution to the problem. I modified the polygon-filling routine to insert quadcodes in the opposite order. That is, I inserted quadcodes from right to left and bottom to top. This dramatically cuts the search time for inserting new quadcodes, because each new quadcode is normally inserted near the beginning of the list. I found that regions were built 60% faster in the reverse order than in the original order.
In a real territory-mapping system, this list of quadcodes would be maintained in a database. In this case, search times are controlled by the binary tree (or other indexing scheme) that controls the ordering of the database. However, many databases are sensitive to the order in which records are added. My experience has shown that database building can be significantly accelerated by adding records in an order that is well suited to the internal structure of the database. The reverse ordering used in my implementation usually produces good results.
If the programmer intends to maintain a sizable region list in memory, it would probably make more sense to use a binary tree. For purposes of this article and demonstration, however, a linked list is the easiest to show. As you will see if you run my demonstration program, this data structure is quite adequate for regions of reasonable size and complexity.
Region Compression
As noted above, there are two steps to region generation. So far, I have only discussed the first step, which generates an ordered list of quadcodes. The second step is to compress the region by removing unnecessary data. For instance, in Figure 5 the region fills the entire lower right quadrant. The four quadcodes 30, 31, 32, and 33 can actually be removed from the list and replaced with the single quadcode 3, with no loss of accuracy. Therefore, the second step of region generation is to scan the region list for any four consecutive siblings, then replace them with their parent. On one pass through the compression function, a quadcode can only be compressed through one generation. The region constructor performs multiple calls to the compression function until there is no more compression to perform, to make all of the quadcodes "bubble up" to their most compact level. The final list of quadcodes for Figure 5 is:
01, 03, 10, 21, 23, 3Note that my choice of ordering constraints (by most-significant quit first) makes it easy to find the four siblings, because they are always grouped together. I also used a uniqueness constraint when adding quadcodes to the list in Step 1, which prevented addition of any quadcode that was already in the list.If a region is being generated at a high resolution, the list of quadcodes generated in Step 1 can be very large. Normally, the vast majority of these will be compressed out in Step 2, but in the mean time a great deal of temporary storage is consumed and the linked list grows very long. To prevent excessive waste of space, I make a compression pass during Step 1 after every 100 quadcodes are added, or at the end of every row, whichever comes last. It is not necessary to perform a full compression loop here. A single pass causes quadcodes to bubble up only a single level. The next time compression is performed, they may bubble up an additional level, and so on. The result is that by the end of Step 1, all of the quadcodes added in the beginning (i.e., near the end of the list) have been fully compressed. The only remaining compression to perform is on the quadcodes near the beginning of the list.
Quadcode compression is an important step. For regions of reasonable complexity, I have found that storage is reduced by more than 90% by compressing the region. A possible way to speed up region generation would be to use a water mark to keep track of the last place in the list where compression was performed. You don't compress beyond that point unless a new quadcode has been added after it. However, my tests with Turbo Profiler indicate that compression currently takes up less than 5% of region-generation time, so this is not necessary.
How compact is a quadcode-represented region? For simple examples like the demonstration provided with this article, quadcodes require much more memory than the equivalent boundary list. Storage savings with quadcodes will only be realized if the equivalent boundary list is large, or if the region is generated at low resolution. Another approach to reducing storage requirements of regions is to reduce the number of bytes used in each quadcode (by reducing NBYTE_QC). This can only be done by carefully examining application requirements for the total area represented and the resolution required.
There is one other important member function in the Region class: InRegion. This is the main interrogation function. It accepts a quadcode as its argument, and returns TRUE if the quadcode is within the region or FALSE if it is not. It works by calling the QuadCode::Contains function on each quadcode in the region list, to see if the selected quadcode is contained within any of the quadcodes in the region. It stops searching when it finds either a quadcode that contains the selected quadcode, or one with a larger value (because the list is sorted).
The InRegion function would be much faster if it used directed search, like a binary tree. The database could be queried to find the insertion point for the quadcode. If the quadcode already exists, it is in the region. If it does not exist, the preceding record is checked to see if it contains the quadcode. If so, it is within the region; otherwise it is not.
A Graphic Demonstration Program
All of the code for my QuadCode and Region classes is generic C++. It has been tested with the Borland and Zortech compilers (both were version 2.0). The demonstration program QCDEMO (available on the monthly Code Disk) calls functions in the Borland graphics library and therefore requires that compiler. It runs under MS-DOS with any of the popular graphics adapters, and uses the standard .BGI (Borland Graphic Interface) files. They must either be in the current directory or in a directory designated by the environment variable BGIDIR.QCDEMO is a demonstration of the power and speed of quadcodes. It builds a region from a list of perimeter points defining a familiar territory, then displays it graphically on the screen. To demonstrate the generalized ability of the region filling algorithm, one small lake is included in the territory (i.e., a hole). In this case, the hole was made by including it in the boundary list. A straight line was run from one of the boundary points to an edge of the hole, the hole was traced, then the straight line was re-traced back to the boundary. Holes can also be made after region generation through the use of Boolean operators.
After QCDEMO displays the territory, a small crosshair can be moved by the user with the arrow keys. Every time the crosshair is moved, the region list is interrogated to determine if the crosshair is inside or outside of the region. Then a small box in the upper left corner of the screen is updated. If the crosshair is within the region, the box is filled; if not then the box is drawn empty.
The fact that this box can be updated every time the crosshair is moved, with no noticeable delay, is testimony to the speed of quadcodes. If a binary tree were used instead of a linked list, searching would be even faster.
There are a couple of features worth mentioning about QCDEMO. First, pressing the S key while in the graphic display will toggle the speed of cursor movement between fast and slow. (The initial setting is fast.) When moving near the boundary, you will want to put the cursor in slow speed to allow accurate positioning (to a single pixel). Pressing the ESC key exits the program.
There is a single command-line option to QCDEMO. By default, QCDEMO displays all quadcodes shrunken by one pixel. This causes the boundaries between quadcodes to be displayed, so the user can see how the region is stored internally. If you specify -n on the command line, then no shrinkage occurs and the region will appear as a solid area.
Conclusions
QCDEMO shows that quadcodes have practical applications, and are a good data structure to add to the repertoire of any graphics programmer. By using algorithms that can handle regions of any geometry, this implementation allows users to quickly query complex territories. It could be made even faster through the use of some standard indexing schemes in a database manager, and the capabilities could be extended through the use of Boolean operators on regions. The addition of Boolean operators is an area I am currently working on, and may be the subject of a future article.
Bibliography
Abrash, Michael, "Graphics Programming: Of Songs, Taxes and the Simplicity of Complex Polygons," Dr. Dobbs Journal #177 (June 1991), 139-143.Gargantini, Irene, "An Effective Way to Represent Quadtrees," Communications of the ACM, Vol. 25 No. 12 (December 1982), 905-910.
Li, Shu-Xiang and Loew, Murray H., "The Quadcode and Its Arithmetic," Communications of the ACM, Vol. 30 No. 7 (July 1987), 621-626.
Monmonier, Mark S., Computer-Assisted Cartography: Principles and Prospects, Prentice-Hall, 1982, 118-120.
Samet, Hanan, "Region Representation: Quadtrees from Boundary Codes," Communications of the ACM, Vol. 23 No. 3 (March 1980), 163-170.
Sidebar: "Cartesian to Quadcode Conversion"