BOUNDING BOX DATA COMPRESSION

An optimized font data compression method for fast screen I/O environments

Glenn Searfoss

Glenn works at Data Transforms Inc., and can be reached at 616 Washington Street, Denver, CO 80203.

The optimal data compression method for a given situation is determined by the type of data to be used and its intended application. To date, a variety of methods have been used to effect a balance between compression and display speed of bit-mapped font data.

One school of thought maintains that only noncompressed font data be used. While this data can be displayed rapidly, the amount of data storage required for fonts can be prohibitive, particularly when used with higher display resolutions and colors.

Another school stresses maximum compression of font data at all times. In this situation, each time a character is accessed, the cell data must be reconstituted. This saves data storage space but sacrifices access speed during screen display.

A third approach attempts to incorporate both aspects. Here font data remains compressed until accessed, at which point all data is uncompressed. A font may then be used at optimal speed. As additional fonts are uncompressed, the data storage limitation of the first method is encountered. This limitation can be ameliorated by each caching system in which only the most recently used characters are available in uncompressed form (at the expense of additional code complexity).

Because none of these methods are completely satisfactory for font screen display, it is important to develop a better balance between efficient access and efficient storage.

The terms "fast-access" or "on-the-fly" have been used to describe bit-mapped font data optimized for fast screen I/O. It is ultimately desirable in this environment to achieve maximum data compression while maintaining or increasing data access speed. To this end, it is essential to understand certain restrictions inherent with the graphics display of bit-mapped font data.

First, there must be minimal calculation of font data. Vector format and highly compressed bit-mapped fonts both require recalculation of character data at display time. This greatly restricts their usefulness in a fast-access environment. Speed optimization occurs when there is little or no data compression and reconstruction.

Second, character size must relate to the display resolution. As screen display resolution increases, larger characters are required to maintain the same relative size and quality as characters used on lower-resolution displays.

Third, font data storage requirements must not impinge upon code space. As bit-mapped fonts increase in size, their data storage needs quickly reach mammoth proportions. Realistically, some form of data compression is required for a program and several large font sets to coexist in memory.

Fourth, the amount of code necessary must be functional, effective, and preferably compact. Coding requirements are reduced with minimal data compression and decompression.

The data compression method best suited for this application will achieve a dynamic balance between these four criteria. The Bounding Box method of data compression is such a method.

To illustrate its effectiveness in this situation, a comparison with a commonly used method of data compression, run length bit encoding (RLE), is useful. This comparison is useful even though the two approaches are not mutually exclusive: One can store run length encoded characters in the Bounding Box format.

A valid comparison depends upon establishing a common reference point. Because noncompressed data is the basic requirement for both compression schemes, a brief outline of this font data format may be useful.

Noncompressed fonts are used as is. The font data is comprised of header information and complete character cell data. The header may detail as many character formatting aspects as desired.

A typical header for standard noncompressed font data is shown in Listing One, page 108. In this case, I'm using the C language data structure of Data Transforms' Fontrix (Font1) Format.

Each font has a header that defines the font and points to the character bitmaps. Immediately following the font header is the character bitmap data. Character bitmaps are stored as scanrects in scanline order from top to bottom. Each scanline is stored byte wise left to right, left justified, and rounded to byte length. Each byte is stored 8 bits per byte where MSB is the leftmost pixel. Using this as a standard font header to describe Figure 1, we can proceed to specifics concerning methods of compressing this data.

Bounding Box Compression

The Bounding Box Compression Method (see Figure 2) involves outlining a cell's "bit-on" character data with the smallest box possible. Coordinates that position the "box" relative to the original character cell are saved in the font header. This compression method is automatic within the Data Transforms Font Editor when a font over 32 x 32 pixels in size is saved. A sample header for a font compressed using the Bounding Box method is shown in Listing Two, page 108 (again using Data Transforms' Fontrix [Font2] Format).

The data listed in the struct font2 portion of the (Font2) font structure above is defined as follows:

The font header is the same as described in the struct font1head of the noncompressed font data format; font cell segments are an array of segment pointers to characters kept as offsets from the beginning of the font file. For example, if an array value for a character = 10, then the starting address of a character's "bounding box" data = (the start of file address + size of [struct font2]) + (10 x 16), where 1 segment = 16 bytes.
The horizontal size is the actual width in bits of the bounding box.
The horizontal offset is the distance in bits from the left edge of the cell to the upper lefthand corner of the bounding box.
Horizontal bytes refers to the actual size in bytes of the interior of the bounding box.
The vertical size is the actual height in bits of the bounding box.
The vertical offset is the distance in bits from the top edge of the cell to the upper edge of the bounding box.

The character data is never compressed; rather, the empty space outside the "bounding box" is discarded. The analogy of a shrinkwrap bag can be used to illustrate.

Imagine a character cell placed within a shrink-to-fit bag. The noncompressed cell is the unshrunk bag. Now shrink the bag until all edges contact the outer limits of bit-on data, forming a rectangular bounding box. For some characters -- such as a lowercase "i" -- this correlates to a major amount of shrinkage, and the shrinkage correlates to saved data space, hence, compression. An uppercase "W" may fill most of a cell. In this instance minimal shrinkage will occur, with little or no saved data space. However, the overall net savings in data space for an entire font set will be great because few characters fill an entire cell (see Figure 2, Figure 3, and Figure 4).

Run Length Bit Encoding

A RLE font would possess a header similar to the font header described in the "struct font1head" of the noncompressed font data format. The main differences between RLE and the Bounding Box lies in the method of character data storage and how code points to it.

In a simple case of run length bit encoding, bit mapped data is compressed by reading each scanline of a character cell, grouping adjacent bit-on or bit-off data and saving the information as pairs of ASCII digits. For example, by using RLE the marked scanline in Figure 5 could be compressed as follows:

0x06 (6 identical bits in a row), 0x00 (those bits are bit-off data)
0x02 (2 identical bits in a row), 0x01 (those bits are bit-on data)
0x09 (9 identical bits in a row), 0x00 (those bits are bit-off data)

Having established the basic structure of the two methods, it now remains to compare their differences in access speed, compression efficiency, and coding requirements.

Speed of Compressed Data Access

When a Bounding Box compressed character is used, there is no run-time penalty during screen display. A character cell is not reconstituted. Rather, the character data is used as is and positioned relative to the original cell size information.

Fonts compressed with run length bit encoding pay a run-time penalty during screen display. If a font remains compressed while being accessed, each time a character is displayed to the screen, the entire character cell must be reconstructed. In a case like this, you'll be watching the clock.

Compression Efficiency

The Bounding Box routine can run at up to 90 percent of the overall compression efficiency of more computationally expensive compression algorithms. The actual character bits-on data is never compressed.

For this reason, the Bounding Box approach is more properly called a technique or a character storage format, rather than an algorithm. Information regarding the bounding box (size, [x,y] position within the original cell, and so on) is kept for each character in a lookup table in the font header (see Figure 2 and Figure 4).

Using this compression method on the character cell in Figure 1 (lowercase i), the net gain in savings is 94 percent of the original character cell size. For Figure 3 (uppercase W), the net savings is 31 percent of the original character cell size. The percentage in savings correlates to the discarded zero (bit off) data outside the bounding box (see Figure 4).

The run length bit encoding (simple case) compression method can run at up to 98 percent of overall compression efficiency, but will be computationally expensive. Using this compression method on the character cell in Figure 1 (lowercase i), the net gain in savings is 78 percent of the original character cell size. For Figure 3 (uppercase W), the net savings is 57 percent of the original character cell size (see Figure 5 and Figure 6). Table 1 provides a complete RLE table for Figure 5. Read each scan line left to right, top to bottom, numbered 0 to 15. Table 2 is the complete RLE table for Figure 6. Again, read each scan line left to right, top to bottom, 0 to 15.

Table 1: RLE table for Figure 5. Reading each scan line left to right, top to bottom, and numbered 0 - 15

   0] 0x17 0x00
   1] 0x17 0x00
   2] 0x17 0x00
   3] 0x17 0x00
   4] 0x06 0x00 0x02 0x01 0x09 0x00
   5] 0x17 0x00
   6] 0x06 0x00 0x02 0x01 0x09 0x00
   7] 0x06 0x00 0x02 0x01 0x09 0x00
   8] 0x06 0x00 0x02 0x01 0x09 0x00
   9] 0x06 0x00 0x02 0x01 0x09 0x00
  10] 0x06 0x00 0x02 0x01 0x09 0x00
  11] 0x17 0x00
  12] 0x17 0x00
  13] 0x17 0x00
  14] 0x17 0x00
  15] 0x17 0x00

Table 2: RLE table for Figure 6. Reading each scan line left to right, top to bottom, and numbered 0 - 15

   0] 0x17 0x00
   1] 0x02 0x01 0x13 0x00 0x02 0x01
   2] 0x02 0x01 0x13 0x00 0x02 0x01
   3] 0x01 0x00 0x02 0x01 0x13 0x00 0x02 0x01 0x01 0x00
   4] 0x01 0x00 0x02 0x01 0x13 0x00 0x02 0x01 0x01 0x00
   5] 0x02 0x00 0x02 0x01 0x13 0x00 0x02 0x01 0x02 0x00
   6] 0x02 0x00 0x02 0x01 0x04 0x00 0x01 0x01 0x04 0x00 0x02 0x01 0x02 0x00
   7] 0x03 0x00 0x02 0x01 0x02 0x00 0x03 0x01 0x02 0x00 0x02 0x01 0x03 0x00
   8] 0x03 0x00 0x02 0x01 0x01 0x00 0x02 0x01 0x01 0x00 0x02 0x01 0x01 0x00
      0x02 0x01 0x03 0x00
   9] 0x04 0x00 0x03 0x01 0x03 0x00 0x03 0x01 0x03 0x00
  10] 0x04 0x00 0x03 0x01 0x03 0x00 0c03 0x01 0x04 0x00
  11] 0x05 0x00 0x01 0x01 0x05 0x00 0x01 0x01 0x05 0x00
  12] 0x17 0x00
  13] 0x17 0x00
  14] 0x17 0x00
  15] 0x17 0x00

Coding Requirements

The Bounding Box method is invoked at a time when display speed is not an issue -- during font creation. All the data needed to use a character is saved in a lookup table in the font header at this time. Because the actual data within the bounding box is not compressed, no coding is required to reconstruct a character cell. Code need only index into the data and index the screen pixel position.

Run length bit encoding requires code to do more than access font data. It must also handle the peculiarities involved with compressing and uncompressing data. Font characters that remain compressed while being used must be reconstructed each time they are accessed. Font data that is reconstituted once and thereafter accessed as noncompressed data, can instigate a data storage conflict. As additional fonts are uncompressed, their combined data storage requirements begin to compete with code and operational program space.

The Nitty Gritty

The Bounding Box method is most effective when the data to be compressed is bit-mapped data; all bit-on data is concentrated and confined in one area, as is often the case with alphanumeric character fonts; the ratio of bit-off to bit-on data is high; fast usage of compressed data is of most importance (for example, screen I/O, graphics printing, and so on); and it is preferred that font data remain compressed when being accessed.

Run length bit encoding is most effective when all bit-on data is widely dispersed or present at a majority of the cell edges; the ratio of bit-off to bit-on data is low; and the usage of data storage must be maximized or is of higher priority.

RLE and the Bounding Box are two good methods of data compression. Each has limits in its ability to handle graphics information. The decision to use a RLE or Bounding Box compression scheme should be based on the likely distribution of bit-on data and the preferred ratio of data compression to access speed.

As mentioned earlier, in situations where memory is extremely tight, the Bounding Box method can profitably be used in conjunction with RLE or other algorithms. Of the various compression methods currently used in fast-access screen I/O environments, the Bounding Box method is recommended for maintaining maximum data compression while allowing the greatest access speed of bit-mapped font data.

BOUNDING BOX DATA COMPRESSION by Glenn Searfoss [LISTING ONE]



/* FONT1 STRUCTURE DEFINITIONS */

struct font1head {   /* standard character font header */
   unsigned char fnttype;   /* font structure type: */
   /* non-compressed type = 0x16 or compressed type = 0x14 */
   char fntname[13];   /* font name: always followed with a '.set' extension */
   unsigned char fntcheck; /* check digit: verifies a Data Transforms font: */
        /* non-compressed font = 0xba, compressed font = 0xdc */
   unsigned char fntbase;
                  /* baseline count (in pixels) from top to bottom, top = 0 */
   unsigned char fnttotal; /* total characters in font:*/
        /* limited to the lower 94 ASCII characters: 0x21 - 0x7E */
   unsigned char fntstart;   /* starting character */
   unsigned char fntstatus; /* proportional or non-proportional: 0=non-prop.*/
   unsigned char fnthsize;     /* horizontal cell size in pixels */
   unsigned char fntvsize;     /* vertical cell size in pixels */
   unsigned char fntbytes;     /* number of horizontal bytes in current cell */
   unsigned char fntspaceh;    /* space bar horizontal size in pixels */
   unsigned char fntchargap;   /* pixels between characters default */
   unsigned char fntlfgap;     /* pixels between linefeeds default */
   int fntlength;          /* total length of file */
   unsigned char fntpitch;     /* italics pitch (0 = none) */
      /* bits 0 - 6 ... number of scanlines to skip */
      /* bit  7 ... 0 = decrement xpos, 1 = increment xpos */
   unsigned char fntinvert;   /* 0 = dont invert, 1 = invert */
   unsigned char fnthbold;   /* number of overlapping bits horizontal */
   unsigned char fntvbold;   /* number of overlapping bits vertical */
   unsigned char fnthmag;   /* integral horizontal bit magnification */
   unsigned char fntvmag;   /* integral vertical bit magnification */
   unsigned char fnthfract;   /* fractional horizontal bit magnification */
   unsigned char fntvfract;   /* fractional vertical bit magnification */
   unsigned char fntdirection;
                       /* Print direction 0=left to right,1..3=counterclock */
   unsigned char fntrot90;   /* rotation 0=up, 1..3=counterclock 1...3 */
   unsigned char fnthflip;   /* horizontal flip 0 = no, 1 = yes */
   unsigned char fntvflip;   /* vertical flip 0 = no, 1 = yes */
   unsigned char fntcolor;   /* color of font */
              /* bits 0 - 3 ... foreground color */
              /* bit 0 ... strike black ribbon */
         /* bit 1 ... strike blue ribbon */
              /* bit 2 ... strike red ribbon */
              /* bit 3 ... strike yellow ribbon */
              /* bits 4 - 7 ... background color */
              /* bit 4 ... strike black ribbon */
              /* bit 5 ... strike blue ribbon */
              /* bit 6 ... strike red ribbon */
              /* bit 7 ... strike yellow ribbon */
   unsigned char fntsubtype;   /* subcategory type of this font */
              /* 0 = normal font subtype */
              /* 1 = equation roman font subtype */
              /* 2 = equation symbol font subtype */
   unsigned char fntunused[18];   /* unused bytes */
};

struct font1 {   /* standard character font (type = 0x16) */
   struct font1head fhd;   /* font header */
   char *fntcellptr[FONT1TOTAL + 1];
                     /* offsets from beginning of file to character bitmaps */
   char fntcellwidth[FONT1TOTAL + 1];   /* cell widths if proportional */
};

[LISTING TWO]




/* FONT2 STRUCTURE DEFINITIONS */

struct font2 {
   struct font1head fhd2;           /* font header */
   unsigned int fnt2cellseg[FONT2TOTAL + 1];
                                    /* array: segment pointers to characters */
   int fnt2cellhsize[FONT2TOTAL + 1];   /* array: cell horiz sizes in bits */
   int fnt2cellhoffset[FONT2TOTAL + 1];   /* array: cell horiz offsets in bits */
   int fnt2cellhbytes[FONT2TOTAL + 1];   /* array: cell horiz size in bytes */
   int fnt2cellvsize[FONT2TOTAL + 1];   /* array: cell vert sizes in bits */
   int fnt2cellvoffset[FONT2TOTAL + 1];   /* array: cell vert offsets in bits */
};