THE I860 AS A GRAPHICS CONTROLLER

3-D graphics transformations and rendering

We begin by calculating the blue (B), green (G), and red (R) intensities for the first two pixels (i and i+1) in a triangle scan line; see Figure 1. For purposes of calculation, each color intensity is represented as an 8-bit integer portion and a 24-bit binary fraction. Let's assume that we've calculated the total color delta across the current triangle scan line for each color component R, G, and B (for example, B_color_delta = B[i+n]-B[i]) and have divided that color delta by the number of pixels to be interpolated across (pixel_delta = n). The result of this division, also represented as an 8-bit integer and a 24-bit fraction, is the incremental color delta.
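The fixed-point arithmetic behind the incremental color delta can be sketched in Python. This is a minimal model, assuming an unsigned 8-bit intensity scaled into the 8.24 format described above; `to_fixed` and `incremental_delta` are hypothetical helper names, not i860 instructions.

```python
FRAC_BITS = 24  # 8.24 format: 8-bit integer portion, 24-bit binary fraction


def to_fixed(intensity: int) -> int:
    """Scale an 8-bit intensity (0-255) into 8.24 fixed point."""
    return intensity << FRAC_BITS


def incremental_delta(start: int, end: int, pixel_delta: int) -> int:
    """Return (end - start) / pixel_delta in 8.24 fixed point, truncated toward zero."""
    color_delta = to_fixed(end) - to_fixed(start)  # e.g. B_color_delta = B[i+n] - B[i]
    step = abs(color_delta) // pixel_delta
    return step if color_delta >= 0 else -step


# Blue ramps from 10 to 250 across n = 8 pixels: 30.0 per pixel.
delta = incremental_delta(10, 250, 8)
b_i3 = to_fixed(10) + 3 * delta  # B value three pixels along the scan line
print(b_i3 >> FRAC_BITS)         # integer portion: 100
```

Keeping 24 fraction bits means the truncation error per step is below 2^-24, so it stays far below one intensity level even across long scan lines.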

Now we're ready to repeatedly add the incremental color delta for each color component to the running values, so that the RGB values of each successive pixel along the triangle scan line are calculated in turn. Here's where faddp.d helps, by automating and speeding up the process.

First let's calculate the B values for the next two pixels (i+2 and i+3) in the triangle scan line. To do that, put the initial B values for the first two pixels of the scan line (i and i+1) side by side in faddp.d's 64-bit op1 register pair, as shown in Figure 2. For the instruction to work properly, each value must be in the predefined format of an 8-bit integer portion and a 24-bit fractional portion. Then set op2 to two copies of 2*B_color_delta/pixel_delta, again side by side in the op2 register pair. The interpolant is 2*B_color_delta/pixel_delta rather than simply B_color_delta/pixel_delta because you are interpolating from pixel i to pixel i+2 in one half of the register pair, and from pixel i+1 to pixel i+3 in the other half.

In one clock, faddp.d adds the color fields, generating the B values for the next two pixels. (In fact, like most i860 CPU instructions, all the graphics instructions execute in just one clock.) The result is placed in the fdest register pair so that it can serve as op1 the next time around, generating the B values for pixels i+4 and i+5.
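The two-lane add and the reuse of fdest as the next op1 can be modeled in Python. This sketch assumes that faddp.d adds the two 32-bit fields independently (no carry from one field into the other) and that pixel i sits in the low half of the register pair; `pack2`, `unpack2`, `faddp_fields32`, and `blue_values` are hypothetical names.

```python
MASK32 = 0xFFFF_FFFF
FRAC_BITS = 24  # 8.24 fields


def pack2(hi: int, lo: int) -> int:
    """Place two 32-bit 8.24 values side by side in a 64-bit register pair."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack2(pair: int) -> tuple[int, int]:
    return (pair >> 32) & MASK32, pair & MASK32


def faddp_fields32(op1: int, op2: int) -> int:
    """Add the two 32-bit fields independently (assumed: no carry between them)."""
    hi1, lo1 = unpack2(op1)
    hi2, lo2 = unpack2(op2)
    return pack2(hi1 + hi2, lo1 + lo2)


def blue_values(b_i: int, delta: int, count: int) -> list[int]:
    """Integer B values for `count` pixels, two per simulated faddp.d."""
    op1 = pack2(b_i + delta, b_i)      # pixel i+1 in the high half, pixel i in the low half
    op2 = pack2(2 * delta, 2 * delta)  # 2 * B_color_delta / pixel_delta in both halves
    out = []
    for _ in range(count // 2):
        hi, lo = unpack2(op1)
        out += [lo >> FRAC_BITS, hi >> FRAC_BITS]
        op1 = faddp_fields32(op1, op2)  # fdest becomes the next op1
    return out


print(blue_values(10 << FRAC_BITS, 30 << FRAC_BITS, 8))
# [10, 40, 70, 100, 130, 160, 190, 220]
```

Because each half is masked to 32 bits, a negative delta works as a two's complement value, so decreasing intensities need no special handling in this model.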

In addition, when PS (the pixel-size field of the PSR) is set for 32-bit pixels, faddp.d shifts the MERGE register right by eight bits and then updates certain MERGE fields with the integer portions of the faddp.d result. As a result, after three applications of faddp.d (once for the B values, once for the Gs, and once for the Rs), the RGB values for two pixels are consolidated ("merged") in the MERGE register in precisely the arrangement (packed-pixel format) that graphics hardware typically requires.
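A bit-level model of this MERGE update is sketched below. It assumes that the integer byte of each 8.24 field is the top byte of its 32-bit half and that faddp.d copies it into the same bit position of MERGE after the shift; `merge_update_32` and `fields` are hypothetical helpers.

```python
MASK64 = (1 << 64) - 1
INT_BYTES = 0xFF000000_FF000000  # integer portion of each 8.24 field (assumed top byte)


def merge_update_32(merge: int, result: int) -> int:
    """Shift MERGE right 8 bits, then copy in the integer bytes of the faddp.d result."""
    merge = (merge >> 8) & MASK64
    return (merge & ~INT_BYTES & MASK64) | (result & INT_BYTES)


def fields(hi_int: int, lo_int: int, frac: int = 0x123456) -> int:
    """Build a faddp.d result holding two 8.24 values (fraction bits are not merged)."""
    return (((hi_int << 24) | frac) << 32) | (lo_int << 24) | frac


merge = 0  # starts cleared
# B, then G, then R; pixel i+1 in the high half, pixel i in the low half
for hi, lo in [(0x66, 0x33), (0x55, 0x22), (0x44, 0x11)]:
    merge = merge_update_32(merge, fields(hi, lo))
print(hex(merge))  # 0x4455660011223300: each pixel is R, G, B, then an unused byte
```

Under these assumptions, whichever component is processed first ends up in the lowest occupied byte, which is why the order of the three faddp.d instructions fixes the packed-pixel layout.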

After three faddp.d instructions, one 8-bit field in each 32-bit pixel of the MERGE register is left unused. That field can have any other attribute (such as texture) ORed into it with the form (floating-point OR with merge) instruction. Form also transfers the MERGE register contents into a floating-point register pair in preparation for storing to the frame buffer, and it clears the MERGE register for the next set of interpolations.
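The effect of form reduces to an OR followed by a clear. This sketch assumes the unused field is the low byte of each 32-bit pixel and uses a hypothetical attribute value; `form` here is a model of the instruction, not a real API.

```python
MASK64 = (1 << 64) - 1


def form(op1: int, merge: int) -> tuple[int, int]:
    """Model of form: fdest = op1 OR MERGE, and MERGE is cleared afterwards."""
    fdest = (op1 | merge) & MASK64
    return fdest, 0


merge = 0x44556600_11223300       # two packed 32-bit pixels, low byte of each unused
attributes = 0x00000007_00000005  # hypothetical per-pixel attribute in the low bytes
fdest, merge = form(attributes, merge)
print(hex(fdest), merge)          # 0x4455660711223305 0
```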

With the RGB values for pixels i, i+1, i+2, and i+3 calculated, the next op1 of the faddp.d instruction will be the B values of pixels i+2 and i+3; the B interpolants in op2 remain the same as they were in the first set of B interpolations. Likewise, after the B values for pixels i+4 and i+5 are obtained, their G and R values are interpolated. In this way, the RGB values for all pixels within a triangle scan line can be quickly and efficiently calculated.

Sixteen-bit pixels are handled similarly to 32-bit pixels, except that for purposes of calculation, colors are represented by an integer portion (for example, Int[Bi]) of six bits and a fractional portion (Frac[Bi]) of ten bits. As illustrated in Figure 3, one faddp.d adds four pairs of 16-bit color fields (blue, for instance), shifts MERGE right by six bits, and updates four 6-bit fields of the MERGE register. After two more such instructions, one for green and one for red, the MERGE register contains RGB values for four pixels and is ready to be stored out. One difference for 16-bit pixels, however, is that a 16-bit pixel has no room for six bits each of R, G, and B intensities (that would take 18 bits). Instead, two fields (normally R and G) are allocated six bits each, while the third field (B) is truncated to just four bits during shifting of the MERGE register. The bits are allocated this way because the human eye is significantly less sensitive to differing shades of blue than to those of red or green.
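The 6-6-4 packing falls out of the shift-and-merge sequence, as this sketch shows. It assumes that the 6-bit integer portion is the top six bits of each 16-bit field and is copied to the same position of MERGE after the 6-bit shift; `merge_update_16` and `pack_6_10` are hypothetical helpers.

```python
MASK64 = (1 << 64) - 1
INT_FIELDS_16 = 0xFC00_FC00_FC00_FC00  # top 6 bits (integer portion) of each 6.10 field


def merge_update_16(merge: int, result: int) -> int:
    """Shift MERGE right 6 bits, then copy in the 6-bit integer portions of the result."""
    merge = (merge >> 6) & MASK64
    return (merge & ~INT_FIELDS_16 & MASK64) | (result & INT_FIELDS_16)


def pack_6_10(values: list[int]) -> int:
    """Four 6-bit intensities as 6.10 fields, pixel 0 in the low-order field."""
    reg = 0
    for k, v in enumerate(values):
        reg |= (v & 0x3F) << (16 * k + 10)
    return reg


pixels = [(63, 42, 53), (1, 2, 3), (0, 0, 0), (0, 0, 0)]  # (R, G, B) per pixel
merge = 0
for channel in (2, 1, 0):  # blue first, then green, then red
    merge = merge_update_16(merge, pack_6_10([p[channel] for p in pixels]))
print(hex(merge))  # 0x420fead: pixel 0 = 0xFEAD, pixel 1 = 0x0420
```

The second shift pushes the two low-order bits of each blue value out of its field, so pixel 0's blue of 53 survives only as its top four bits (53 >> 2 = 13).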

Because there is no standard format for 8-bit pixels, color interpolation for them is often platform dependent. However, because the i860 CPU pixel interpolation instructions define only operand field sizes, and not their uses, faddp.d with 8-bit pixels can easily be adapted to a wide variety of implementations.

Z-value Interpolation

In 3-D graphics applications, objects' surfaces, and the pixels that represent those surfaces, have depth values (Z-values) associated with them. As with color values, however, Z-values are given explicitly only for the vertices of the triangles on objects' surfaces. Z-values for pixels on or inside the triangles must be interpolated from the vertex values.

Z-values can be either 16 or 32 bits long. To accelerate interpolation, the graphics instruction faddz (floating-point add with Z merge) interpolates two 16-bit Z-values at a time. Just as in color interpolation, a Z-value interpolant is repeatedly added to the initial Z-values of pixels at one end of a triangle scan line to generate the Z-values of successive pixels along the scan line.

As shown in Figure 4, the interpolation results are stored in a floating-point register pair. Additionally, the MERGE register is shifted right 16 bits and then updated with the integer portions of the interpolation sums. That way, after two successive faddz instructions, the MERGE register contains 16-bit Z-values for four pixels in a row.
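The faddz mechanism can be sketched the same way. This model assumes each 32-bit half of the operands holds a Z-value as a 16-bit integer portion plus a 16-bit fraction, and that the integer portions are copied into the upper 16 bits of each MERGE half after the shift; `faddz` and `z_pair` are hypothetical model functions, and the lane arrangement in the usage is just one possibility.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1
Z_INT_FIELDS = 0xFFFF0000_FFFF0000  # integer portion of each 16.16 field (assumed)


def faddz(op1: int, op2: int, merge: int) -> tuple[int, int]:
    """Add two 16.16 Z fields independently, then shift MERGE right 16 and merge."""
    hi = ((op1 >> 32) + (op2 >> 32)) & MASK32
    lo = ((op1 & MASK32) + (op2 & MASK32)) & MASK32
    result = (hi << 32) | lo
    merge = (merge >> 16) & MASK64
    merge = (merge & ~Z_INT_FIELDS & MASK64) | (result & Z_INT_FIELDS)
    return result, merge


def z_pair(hi_z: int, lo_z: int) -> int:
    """Two integer Z-values as 16.16 fixed point in one register pair."""
    return ((hi_z << 16) << 32) | (lo_z << 16)


# One possible lane arrangement: the halves are two pixels apart and each faddz
# steps one pixel, so four consecutive Z-values land in MERGE in order.
op1, op2, merge = z_pair(110, 90), z_pair(10, 10), 0
for _ in range(2):
    op1, merge = faddz(op1, op2, merge)
print([(merge >> (16 * k)) & 0xFFFF for k in range(4)])  # [100, 110, 120, 130]
```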

Because 32-bit Z-buffer calculations require more bits of precision than can be accommodated with faddz, they are more efficiently interpolated using the 64-bit integer add instruction, fiadd.dd.
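A 64-bit integer add gives each 32-bit Z-value a full 32-bit integer portion plus a 32-bit fraction. The layout below (32.32 fixed point, one Z-value per add) is an assumption for illustration, and `interpolate_z32` is a hypothetical name; each loop iteration stands in for one fiadd.dd.

```python
MASK64 = (1 << 64) - 1


def interpolate_z32(z_start: int, z_end: int, pixel_delta: int) -> list[int]:
    """32-bit Z-values along a span, one 64-bit integer add per pixel (32.32 fixed point)."""
    step = ((z_end - z_start) << 32) // pixel_delta  # per-pixel interpolant
    z = z_start << 32                                # 32-bit integer part, 32-bit fraction
    out = []
    for _ in range(pixel_delta):
        out.append(z >> 32)                          # integer part is the stored Z-value
        z = (z + step) & MASK64                      # models one fiadd.dd
    return out


print(interpolate_z32(0x1000_0000, 0x1000_0003, 3))  # [268435456, 268435457, 268435458]
```

Z-values as large as 0x10000000 cannot fit in a 16-bit integer portion, which is the precision problem the text describes.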

Z-value Comparisons and Pixel Display

When a 3-D object is displayed, not all of its surfaces should be visible; otherwise, the back of the object (with respect to the viewer) might overwrite the front. Likewise, in a scene consisting of multiple objects, some objects' surfaces may obscure others. This is why we calculate Z-values during rendering: once Z-values have been calculated for all the objects' surfaces, they can be used to decide which surfaces to display. Selecting which pixels to display is known as "hidden surface removal."

If a newly computed pixel's Z-value is smaller (closer to the viewer) than the Z-value of the pixel already displayed at the same (x,y) coordinates, the newly computed pixel is displayed in place of the previous one, and the Z-buffer (which holds the Z-value of the pixel currently displayed at each location) is updated with the new pixel's Z-value. If the newly computed pixel's Z-value is larger, the newly computed pixel is not displayed at all, and the Z-buffer retains its value for that pixel location.
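The per-pixel rule above is the classic Z-buffer test, sketched here one pixel at a time in Python. `zbuffer_write` is a hypothetical helper, and treating equal Z-values as "not closer" is this sketch's own choice, since the rule above does not cover that case.

```python
def zbuffer_write(frame: list[int], zbuf: list[int], index: int, color: int, z: int) -> bool:
    """Display a newly computed pixel only if it is closer than the stored one."""
    if z < zbuf[index]:       # smaller Z = closer to the viewer
        frame[index] = color  # display the new pixel
        zbuf[index] = z       # and record its depth in the Z-buffer
        return True
    return False              # farther (or equal, by this sketch's choice): keep old pixel


frame, zbuf = [0] * 4, [100] * 4
print(zbuffer_write(frame, zbuf, 1, 0xAA, 50))  # True: closer, drawn
print(zbuffer_write(frame, zbuf, 1, 0xBB, 70))  # False: behind the pixel just drawn
print(frame, zbuf)                              # [0, 170, 0, 0] [100, 50, 100, 100]
```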

The i860 CPU has two kinds of special graphics instructions, fzchks/fzchkl (floating-point Z-buffer check short/long) and pst.d (double-word pixel store), which expedite the Z-value comparison and subsequent store operations.

Fzchks compares four pairs of 16-bit Z-values in a single instruction. Normally one set of four Z-values comes from newly computed pixels and the other from the Z-buffer. Fzchks first shifts the contents of the 8-bit PM (pixel mask) field in the PSR control register right by four bits. Then, for each of the four comparisons in which the newly computed pixel has a smaller Z-value than the corresponding one stored in the Z-buffer, it sets the corresponding one of PM's four high-order bits.

PM is shifted right so that the results of two successive fzchks instructions accumulate in the 8-bit PM field. The PM field is used by the pst.d instruction, which examines the contents of PM and stores to the frame buffer only those pixels within its 64-bit register pair operand that correspond to set bits in PM. Thus only those pixels which need to be updated in the frame buffer are actually written out.
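The PM accumulation can be modeled in Python. This sketch assumes that 16-bit field k (bits 16k+15 through 16k) sets PM bit 4+k and that the comparison is unsigned and strict; `pack16` and `fzchks` are hypothetical model functions.

```python
MASK8 = 0xFF


def pack16(values: list[int]) -> int:
    """Four 16-bit Z-values in one register pair, the first in the low-order field."""
    return sum((v & 0xFFFF) << (16 * k) for k, v in enumerate(values))


def fzchks(new_z: int, buf_z: int, pm: int) -> int:
    """Shift PM right 4, then set PM bit 4+k when field k of new_z is the smaller."""
    pm = (pm >> 4) & MASK8
    for k in range(4):
        nz = (new_z >> (16 * k)) & 0xFFFF
        bz = (buf_z >> (16 * k)) & 0xFFFF
        if nz < bz:  # newly computed pixel is closer
            pm |= 1 << (4 + k)
    return pm


new = [10, 200, 30, 400, 50, 600, 70, 800]
buf = [100] * 8
pm = 0
pm = fzchks(pack16(new[0:4]), pack16(buf[0:4]), pm)  # pixels 0-3 land in PM bits 7..4
pm = fzchks(pack16(new[4:8]), pack16(buf[4:8]), pm)  # those move to bits 3..0; pixels 4-7 fill 7..4
print(bin(pm))  # 0b1010101: pixels 0, 2, 4, 6 are closer
```

After two calls, PM bit j corresponds to pixel j, which is the order pst.d consumes it in.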

Fzchkl (l for long) is identical to fzchks (short) except that it compares two pairs of 32-bit Z-values at a time, shifts PM right by only two bits, and only updates the two high-order bits of PM corresponding to the results of the two 32-bit comparisons.
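The long variant differs only in field width and shift amount. As before, the bit mapping (32-bit half k sets PM bit 6+k) and the unsigned, strict comparison are assumptions of this sketch; `pack32` and `fzchkl` are hypothetical model functions.

```python
MASK8 = 0xFF
MASK32 = 0xFFFF_FFFF


def pack32(values: list[int]) -> int:
    """Two 32-bit Z-values in one register pair, the first in the low-order half."""
    return ((values[1] & MASK32) << 32) | (values[0] & MASK32)


def fzchkl(new_z: int, buf_z: int, pm: int) -> int:
    """Shift PM right 2, then set PM bit 6+k when half k of new_z is the smaller."""
    pm = (pm >> 2) & MASK8
    for k in range(2):
        if ((new_z >> (32 * k)) & MASK32) < ((buf_z >> (32 * k)) & MASK32):
            pm |= 1 << (6 + k)
    return pm


pm = 0
for pair in ([5, 500], [500, 500], [500, 500], [500, 500]):  # only pixel 0 is closer
    pm = fzchkl(pack32(pair), pack32([100, 100]), pm)
print(bin(pm))  # 0b1: four fzchkl shifts move pixel 0's bit down to PM bit 0
```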

PS and PM Unrelated

Here's the only potentially confusing piece of the puzzle. Although PS and PM are both used by pst.d, they are unrelated: PS is set according to the pixel size, whereas PM is filled in by fzchks or fzchkl according to the Z-value size, and those two sizes are independent of each other. You can have an 8-bit pixel with a 32-bit Z-buffer, a 32-bit pixel with a 16-bit Z-buffer, or any other combination you please.

Pst.d stores 64 bits at a time, which represents eight pixels if your pixel size is 8 bits, four pixels if it is 16 bits, or two pixels if it is 32 bits. Although PM presumably holds eight bits (eight pixels' worth) of information from multiple fzchks/fzchkl instructions, pst.d examines only the appropriate number of low-order bits of PM. (The "appropriate" number is the number of pixels stored, which depends on the pixel size, as the examples in the next section show.) Pst.d also shifts PM right by 8/pixel_size_in_bytes bits, where pixel_size_in_bytes is determined by PS. That sets up PM for the next pst.d. Multiple pst.d instructions are executed until eight pixels in a row have been stored to the frame buffer (or not stored, depending on the contents of PM).
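The masked store and the PM shift can be sketched in Python. Here the hypothetical `pixel_bits` parameter (8, 16, or 32) stands in for PS, and the frame buffer is modeled as a single 64-bit word that is read, partially overwritten, and returned; the real instruction simply skips writing the unselected pixels.

```python
def pst_d(frame: int, pixels: int, pm: int, pixel_bits: int) -> tuple[int, int]:
    """Store only the pixels whose PM bit is set; return (new frame word, shifted PM)."""
    per_store = 64 // pixel_bits    # 8, 4 or 2 pixels per 64-bit store
    field = (1 << pixel_bits) - 1
    for j in range(per_store):      # only the low-order per_store bits of PM matter
        if pm & (1 << j):
            mask = field << (pixel_bits * j)
            frame = (frame & ~mask) | (pixels & mask)
    return frame, pm >> per_store   # 8 / pixel_size_in_bytes == 64 / pixel_bits


frame, pm = pst_d(0, 0xAAAA_BBBB_CCCC_DDDD, 0b1010_0101, 16)
print(hex(frame), bin(pm))  # 0xbbbb0000dddd 0b1010
```

Note that 8/pixel_size_in_bytes and 64/pixel_bits are the same number: the count of pixels in one 64-bit store.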

Examples

Assume your pixel size is 8 bits (as determined by the PS field of PSR) and your Z-values are 16 bits. To generate eight pixels' worth of pixel mask information, you must perform two fzchks instructions, each of which compares four Z-value pairs. Then you must execute one pst.d, which stores (or doesn't store, depending on PM) eight 8-bit pixels, exploiting all eight bits of PM. All eight bits of PM have now been "used up," so you must proceed to the next round of fzchks instructions before executing another pst.d. This matches the fact that one pst.d shifts PM right by 8/1 = 8 bits, effectively shifting all eight bits out.

Alternatively, say your pixel size is 16 bits and your Z-values are 32 bits. Set up PM with four consecutive fzchkl instructions, each of which compares two Z-value pairs. Then, because one pst.d stores (potentially) only four pixels, exploiting only the low-order four bits of PM, you'll need to execute two pst.d instructions in a row before proceeding to the next fzchkl instructions. Again, this makes sense because pst.d with 16-bit pixels shifts PM right by 8/2 = 4 bits.
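Both examples can be run through one model that counts the instructions used. The sketch reuses the same assumptions as simple models of fzchks/fzchkl and pst.d (PM bit 8-count+k for Z field k, first pixel in the low-order field, frame buffer modeled as 64-bit words); `zcheck`, `render_eight`, and the other names are hypothetical.

```python
def pack(values: list[int], bits: int) -> int:
    """Pack values into one 64-bit register pair, the first in the low-order field."""
    return sum(v << (bits * k) for k, v in enumerate(values))


def unpack(reg: int, bits: int) -> list[int]:
    return [(reg >> (bits * k)) & ((1 << bits) - 1) for k in range(64 // bits)]


def zcheck(new_z: int, buf_z: int, pm: int, z_bits: int) -> int:
    """fzchks (z_bits=16) or fzchkl (z_bits=32): shift PM, then flag closer pixels."""
    count = 64 // z_bits
    pm = (pm >> count) & 0xFF
    for k, (nz, bz) in enumerate(zip(unpack(new_z, z_bits), unpack(buf_z, z_bits))):
        if nz < bz:
            pm |= 1 << (8 - count + k)
    return pm


def pst_d(frame: int, pixels: int, pm: int, pixel_bits: int) -> tuple[int, int]:
    """Masked 64-bit store; PM shifts right by the number of pixels stored."""
    per_store = 64 // pixel_bits
    for j in range(per_store):
        if pm & (1 << j):
            mask = ((1 << pixel_bits) - 1) << (pixel_bits * j)
            frame = (frame & ~mask) | (pixels & mask)
    return frame, pm >> per_store


def render_eight(new_z, buf_z, colors, old, pixel_bits, z_bits):
    """Z-check eight pixels, then store them; also count the instructions used."""
    per_check, per_store = 64 // z_bits, 64 // pixel_bits
    pm, checks, stores, out = 0, 0, 0, []
    for s in range(0, 8, per_check):
        pm = zcheck(pack(new_z[s:s + per_check], z_bits),
                    pack(buf_z[s:s + per_check], z_bits), pm, z_bits)
        checks += 1
    for s in range(0, 8, per_store):
        frame, pm = pst_d(pack(old[s:s + per_store], pixel_bits),
                          pack(colors[s:s + per_store], pixel_bits), pm, pixel_bits)
        out += unpack(frame, pixel_bits)
        stores += 1
    return out, checks, stores


new_z, buf_z = [10, 200, 30, 400, 50, 600, 70, 800], [100] * 8
# 8-bit pixels, 16-bit Z: two fzchks, one pst.d
print(render_eight(new_z, buf_z, [0xAA] * 8, [0x11] * 8, 8, 16))
# 16-bit pixels, 32-bit Z: four fzchkl, two pst.d
print(render_eight(new_z, buf_z, [0xBEEF] * 8, [0x1234] * 8, 16, 32))
```

In both cases only pixels 0, 2, 4, and 6 are overwritten. The Z-check count depends only on the Z-value size, and the store count depends only on the pixel size, which is the independence of PS and PM described earlier.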

Summary

Because they provide hardware support for rendering as well as fast transformations, the i860 CPUs are well suited to demanding graphics applications. Scientific visualization, CAD/CAM, animation, and other graphics-oriented applications can all benefit from the i860 CPUs' graphics features, with performance improvements of up to ten times over conventional integer operations.